microsoft/hve-core
Public repository, mirrored from https://github.com/microsoft/hve-core
evals/agent-behavior/eval.yaml
````yaml
name: agent-behavior
description: >
  Evaluate hve-core skill+agent behavior via copilot-sdk. Tests that the
  combination of skills loaded in an agent context produces correct structure,
  applies specialized perspectives, and stays within defined boundaries.
  Note: Tests skill behavior under agent-style prompts rather than invoking
  a specific .agent.md file directly (Vally does not yet support agent routing).
type: capability
config:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

environment: security-and-coding

scoring:
  threshold: 0.7

stimuli:
  - name: pr-review-identifies-security
    prompt: |
      Review this code change for a Python web application:
      ```python
      @app.route('/user/<id>')
      def get_user(id):
          query = f"SELECT * FROM users WHERE id = {id}"
          return db.execute(query).fetchone()
      ```
      Focus on security and code quality.
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-sql-injection
        config:
          pattern: "(?i)\\bsql\\s*injection\\b|\\binjection\\b"
      - type: output-matches
        name: provides-remediation
        config:
          pattern: "(?i)parameterized|prepared|placeholder|bind"

  - name: pr-review-identifies-error-handling
    prompt: |
      Review this code change:
      ```python
      def process_payment(amount):
          response = requests.post(PAYMENT_API, json={"amount": amount})
          return response.json()["transaction_id"]
      ```
      What issues do you see?
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-missing-error-handling
        config:
          pattern: "(?i)error.handling|exception|try|status.code|timeout"
      - type: output-matches
        name: identifies-missing-validation
        config:
          # cspell:disable-next-line
          pattern: "(?i)validat|check|verify|amount|negative"
````
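Each `output-matches` grader in this eval is a regular expression applied to the agent's reply. The sketch below assumes Vally passes a grader when its pattern matches anywhere in the output (Python `re.search` semantics); the real matching engine may differ. Because the patterns are double-quoted YAML strings, `\\b` in the file decodes to the regex token `\b`, which is why they appear here as raw strings with single backslashes. The `grade` helper and the sample reply are hypothetical.

```python
import re

# Grader patterns copied from eval.yaml, with YAML "\\" escapes decoded to "\".
GRADERS = {
    "identifies-sql-injection": r"(?i)\bsql\s*injection\b|\binjection\b",
    "provides-remediation": r"(?i)parameterized|prepared|placeholder|bind",
    "identifies-missing-error-handling": r"(?i)error.handling|exception|try|status.code|timeout",
    "identifies-missing-validation": r"(?i)validat|check|verify|amount|negative",
}


def grade(output: str, names: list[str]) -> dict[str, bool]:
    """Hypothetical helper: a grader passes if its pattern is found anywhere in output."""
    return {name: re.search(GRADERS[name], output) is not None for name in names}


security_reply = (
    "This endpoint is vulnerable to SQL injection because `id` is interpolated "
    "into the query. Use a parameterized query instead."
)
print(grade(security_reply, ["identifies-sql-injection", "provides-remediation"]))
# {'identifies-sql-injection': True, 'provides-remediation': True}
```

Under these assumed semantics some alternatives are broad: `amount` in `identifies-missing-validation` matches any reply that merely repeats the parameter name, and `try` in `identifies-missing-error-handling` also matches inside words such as "retry" or "entry".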
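The `provides-remediation` grader in the `pr-review-identifies-security` stimulus accepts replies that mention parameterized or prepared statements, placeholders, or binding. A minimal sketch of that fix follows, using the standard-library `sqlite3` module and assuming the stimulus's `db` is a DB-API connection whose placeholder style is `?` (other drivers differ, for example `%s` in psycopg). The Flask route is omitted, and the in-memory `users` table and its rows are hypothetical test data.

```python
import sqlite3


def get_user(db: sqlite3.Connection, user_id):
    """Fetch one user row, passing user_id as a bound parameter instead of formatting it into SQL."""
    # The "?" placeholder makes the driver treat user_id strictly as data, so input
    # such as "1 OR 1=1" is compared as a literal value rather than parsed as SQL.
    return db.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()


# Hypothetical throwaway database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])

print(get_user(conn, 1))           # (1, 'alice')
print(get_user(conn, "1 OR 1=1"))  # None
```

The key change is passing the values as a separate tuple; formatting them into the SQL text with f-strings, `%`, or `.format` reintroduces the injection.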
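The graders in the `pr-review-identifies-error-handling` stimulus look for mentions of error handling, status codes, timeouts, and input validation. One way to address all four is sketched here. It swaps the stimulus's third-party `requests` call for the standard-library `urllib.request` so the example is self-contained; the endpoint URL, the `PaymentError` class, and the 10-second default timeout are hypothetical choices, not part of hve-core.

```python
import json
import math
import urllib.error
import urllib.request

# Hypothetical endpoint; the stimulus leaves PAYMENT_API undefined.
PAYMENT_API = "https://payments.example.invalid/charge"


class PaymentError(Exception):
    """Hypothetical error type for failed or malformed payment responses."""


def process_payment(amount, api_url=PAYMENT_API, timeout=10.0):
    # Validate before any network call: numbers only (bool excluded), finite, positive.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise ValueError(f"amount must be a number, got {type(amount).__name__}")
    if not math.isfinite(amount) or amount <= 0:
        raise ValueError(f"amount must be a positive finite number, got {amount}")

    request = urllib.request.Request(
        api_url,
        data=json.dumps({"amount": amount}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        # A bounded timeout keeps a hung payment API from blocking the caller indefinitely.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx status codes; it is itself an OSError, so catch it first.
        raise PaymentError(f"payment API returned HTTP {exc.code}") from exc
    except OSError as exc:
        # URLError, refused connections, and socket timeouts all subclass OSError.
        raise PaymentError(f"payment API unreachable: {exc}") from exc
    except ValueError as exc:
        # json.JSONDecodeError subclasses ValueError.
        raise PaymentError("payment API returned invalid JSON") from exc

    if not isinstance(payload, dict) or "transaction_id" not in payload:
        raise PaymentError("response is missing transaction_id")
    return payload["transaction_id"]
```

Validation runs before the request is built, so bad input fails fast without touching the network.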