microsoft/hve-core

Mirrored from https://github.com/microsoft/hve-core

3e27fc3012f44412a72bda3c9a4f55fd27eeb5a3

evals/agent-behavior/eval.yaml

name: agent-behavior
description: >
  Evaluate hve-core skill+agent behavior via copilot-sdk. Tests that the
  combination of skills loaded in an agent context produces correct structure,
  applies specialized perspectives, and stays within defined boundaries.
  Note: Tests skill behavior under agent-style prompts rather than invoking
  a specific .agent.md file directly (Vally does not yet support agent routing).
type: capability
config:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

environment: security-and-coding

scoring:
  threshold: 0.7

stimuli:
  - name: pr-review-identifies-security
    prompt: |
      Review this code change for a Python web application:
      ```python
      @app.route('/user/<id>')
      def get_user(id):
          query = f"SELECT * FROM users WHERE id = {id}"
          return db.execute(query).fetchone()
      ```
      Focus on security and code quality.
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-sql-injection
        config:
          pattern: "(?i)\\bsql\\s*injection\\b|\\binjection\\b"
      - type: output-matches
        name: provides-remediation
        config:
          pattern: "(?i)parameterized|prepared|placeholder|bind"

  - name: pr-review-identifies-error-handling
    prompt: |
      Review this code change:
      ```python
      def process_payment(amount):
          response = requests.post(PAYMENT_API, json={"amount": amount})
          return response.json()["transaction_id"]
      ```
      What issues do you see?
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-missing-error-handling
        config:
          pattern: "(?i)error.handling|exception|try|status.code|timeout"
      - type: output-matches
        name: identifies-missing-validation
        config:
          # cspell:disable-next-line
          pattern: "(?i)validat|check|verify|amount|negative"
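
The grader patterns above can be tried out locally before running the eval. The following is a minimal sketch, not the Vally implementation: it assumes an `output-matches` grader passes when its pattern is found anywhere in the model output (a regex search), which this file does not confirm. The `GRADER_PATTERNS` table and the `grade` helper are hypothetical names introduced for illustration; the patterns themselves are copied from the config, with the YAML `\\b` escapes decoded to regex `\b`.

```python
import re

# Patterns copied from the graders in eval.yaml.
# In the YAML double-quoted strings, "\\b" decodes to the regex token \b.
GRADER_PATTERNS = {
    "identifies-sql-injection": r"(?i)\bsql\s*injection\b|\binjection\b",
    "provides-remediation": r"(?i)parameterized|prepared|placeholder|bind",
    "identifies-missing-error-handling": r"(?i)error.handling|exception|try|status.code|timeout",
    "identifies-missing-validation": r"(?i)validat|check|verify|amount|negative",
}


def grade(output, grader_names):
    """Return {grader name: True/False} for one model output.

    Assumption: a grader passes when its pattern matches anywhere
    in the output (re.search semantics).
    """
    return {
        name: re.search(GRADER_PATTERNS[name], output) is not None
        for name in grader_names
    }


if __name__ == "__main__":
    security_graders = ["identifies-sql-injection", "provides-remediation"]
    review = (
        "This endpoint is vulnerable to SQL injection. "
        "Use a parameterized query instead."
    )
    print(grade(review, security_graders))
    print(grade("Looks fine to me.", security_graders))
```

Under the same search assumption, a quick check like this also shows how permissive a pattern is. For example, `grade("The amount parameter looks fine.", ["identifies-missing-validation"])` reports a pass, because the `amount` alternative matches any output that repeats the parameter name from the prompt.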