microsoft/hve-core

Mirrored from https://github.com/microsoft/hve-core

3e27fc3012f44412a72bda3c9a4f55fd27eeb5a3

evals/agent-behavior/eval.yaml

````yaml
name: agent-behavior
description: >
  Evaluate hve-core skill+agent behavior via copilot-sdk. Tests that the
  combination of skills loaded in an agent context produces correct structure,
  applies specialized perspectives, and stays within defined boundaries.
  Note: Tests skill behavior under agent-style prompts rather than invoking
  a specific .agent.md file directly (Vally does not yet support agent routing).
type: capability
config:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

environment: security-and-coding

scoring:
  threshold: 0.7

stimuli:
  - name: pr-review-identifies-security
    prompt: |
      Review this code change for a Python web application:
      ```python
      @app.route('/user/<id>')
      def get_user(id):
          query = f"SELECT * FROM users WHERE id = {id}"
          return db.execute(query).fetchone()
      ```
      Focus on security and code quality.
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-sql-injection
        config:
          pattern: "(?i)\\bsql\\s*injection\\b|\\binjection\\b"
      - type: output-matches
        name: provides-remediation
        config:
          pattern: "(?i)parameterized|prepared|placeholder|bind"

  - name: pr-review-identifies-error-handling
    prompt: |
      Review this code change:
      ```python
      def process_payment(amount):
          response = requests.post(PAYMENT_API, json={"amount": amount})
          return response.json()["transaction_id"]
      ```
      What issues do you see?
    tags:
      category: agent-behavior
      agent: pr-review
    graders:
      - type: output-matches
        name: identifies-missing-error-handling
        config:
          pattern: "(?i)error.handling|exception|try|status.code|timeout"
      - type: output-matches
        name: identifies-missing-validation
        config:
          # cspell:disable-next-line
          pattern: "(?i)validat|check|verify|amount|negative"
````
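The `output-matches` graders above can be sanity-checked locally before running the full eval. The following is a minimal sketch, not the harness's actual implementation: it assumes each grader passes when its pattern is found anywhere in the model output (the equivalent of Python's `re.search`), which this file does not confirm. The `GRADERS` table and the `grade` helper are hypothetical names introduced only for illustration; the patterns themselves are copied from the config after YAML unescaping.

```python
import re

# Patterns copied from the graders in eval.yaml (YAML "\\b" becomes regex \b).
GRADERS = {
    "identifies-sql-injection": r"(?i)\bsql\s*injection\b|\binjection\b",
    "provides-remediation": r"(?i)parameterized|prepared|placeholder|bind",
    "identifies-missing-error-handling": r"(?i)error.handling|exception|try|status.code|timeout",
    "identifies-missing-validation": r"(?i)validat|check|verify|amount|negative",
}


def grade(output: str, names: list[str]) -> dict[str, bool]:
    """Hypothetical helper. Assumed semantics: a grader passes if its
    pattern matches anywhere in the output (re.search, not a full match)."""
    return {name: re.search(GRADERS[name], output) is not None for name in names}


sample = "This query is vulnerable to SQL injection; use a parameterized query instead."
print(grade(sample, ["identifies-sql-injection", "provides-remediation"]))
# {'identifies-sql-injection': True, 'provides-remediation': True}

# Unanchored alternatives also match inside unrelated words: "country" contains "try".
print(grade("Which country is this for?", ["identifies-missing-error-handling"]))
# {'identifies-missing-error-handling': True}
```

Under the same assumption, the second call suggests that short unanchored alternatives such as `try` and `amount` may pass on responses that never discuss the intended issue. Adding word boundaries (for example `\btry\b`) would be one way to tighten them if that turns out to matter in practice.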
