microsoft/hve-core

Public

mirrored from https://github.com/microsoft/hve-core

ci/2086-enforce-powershell-coverage

evals/agent-behavior/expectations/code-review-functional.expectations.yml

```yaml
# Bucket-A expectations for code-review-functional
# Format: per-agent YAML, 5–10 grader-worthy expectations grounded in the agent
# file's explicit promises and/or current matrix failures. This file is consumed
# by the next pass that rewrites stimuli + graders end-to-end; do not treat it
# as a Vally grader file directly.
#
# Note: code-review-functional is the functional-correctness sibling of
# code-review-standards. It reviews behavior, edge cases, error handling,
# concurrency, and security risk (NOT language style). Findings should be
# scoped to the diff and persisted under
#   `.copilot-tracking/reviews/code-reviews/<branch>/<run>/functional-review.md`.
slug: code-review-functional
class: code-reviewer
agent_file: .github/agents/coding-standards/code-review-functional.agent.md
stimulus_file: evals/agent-behavior/stimuli/code-review-functional.yml
latest_result: evals/results/agent-matrix/2026-05-28/code-review-functional.json
source_review_date: 2026-05-28

expectations:
  - expectation_id: functional-scope-only
    summary: Findings address behavior/correctness, not language style.
    signal: Output focuses on behavior, edge cases, error handling, concurrency, security, or contracts.
    pass_criteria: |
      Findings name functional concerns (incorrect behavior, missing edge
      cases, error handling, race conditions, security risk, contract
      violations, performance correctness). Pure style findings (naming,
      formatting, idiom preference) are absent or deferred to
      `code-review-standards`.
    failure_modes:
      - Findings list formatting/naming/style issues as primary findings.
      - Mixes language-standards findings into functional review.
    priority: high
    contract_ref: "agent §Scope (functional correctness only; style is owned by code-review-standards)"

  - expectation_id: severity-per-finding
    summary: Each functional finding carries a severity label.
    signal: Output applies severity words per finding.
    pass_criteria: |
      Each functional finding has a case-insensitive severity from
      `critical|high|medium|low|info|warning`. Severity is per-finding.
    failure_modes:
      - Findings unlabeled.
      - Severities used only in a summary block.
    priority: high
    contract_ref: "agent §Output Contract (severity per finding); current `severity-vocab` grader"

  - expectation_id: findings-structure-present
    summary: Output presents findings in a structured form.
    signal: Output contains a severity-labeled table or per-finding sections.
    pass_criteria: |
      Output uses a markdown table with severity column OR per-finding
      sections using `finding|issue|concern|recommendation` language with
      each finding tied to a file path and line range when possible.
    failure_modes:
      - Single paragraph with no per-finding structure.
      - Bulleted list with no severity framing.
    priority: high
    contract_ref: "agent §Output Contract; current `findings-table-present` grader"

  - expectation_id: diff-scoped-findings
    summary: Findings are scoped to the reviewed diff.
    signal: Findings reference changed files or hunks from the diff.
    pass_criteria: |
      Findings cite changed files, line ranges, or hunks from the supplied
      diff. Findings that step outside the diff are explicitly marked as
      out-of-scope context or pre-existing risk.
    failure_modes:
      - Findings invented for files not in the diff.
      - Bulk findings about unrelated subsystems.
    priority: medium
    contract_ref: "agent §Scope (diff-scoped functional review)"

  - expectation_id: tracking-path-shape
    summary: Functional review artifact lives at the documented path.
    signal: Output names a path matching `.copilot-tracking/reviews/code-reviews/<branch>/<run>/functional-review.md`.
    pass_criteria: |
      When the agent reports persisting a functional review, the path
      starts with `.copilot-tracking/reviews/code-reviews/`, includes a
      normalized branch segment, includes a run identifier, and ends in
      `functional-review.md`.
    failure_modes:
      - Artifact written outside `.copilot-tracking/reviews/code-reviews/`.
      - Filename other than `functional-review.md`.
    priority: medium
    applies_when: "agent reports artifact creation"
    contract_ref: "agent §Tracking Artifact (functional-review.md)"

  - expectation_id: verdict-stated
    summary: Functional review ends with a verdict from the documented vocabulary.
    signal: Output names an overall verdict.
    pass_criteria: |
      Response concludes with an overall functional verdict drawn from
      `approve|approve with changes|request changes|block`. Verdict reflects
      the highest-severity finding.
    failure_modes:
      - No final verdict.
      - Verdict expressed only in informal prose.
    priority: medium
    contract_ref: "agent §Output Contract (functional verdict)"

  - expectation_id: no-source-edit
    summary: Review-only; no edits to source code or build manifests.
    signal: Output does not reference modifications to source-tree files.
    pass_criteria: |
      No occurrences of edit/create verbs paired with `.cs`/`.py`/`.ts`/`.js`/
      `.go`/`.rs`/`.java`/`package.json`/`pyproject.toml`/`Cargo.toml` paths.
      Proposed fixes appear as recommendations or fenced snippets, not as
      claimed edits.
    failure_modes:
      - Agent claims to apply a fix during functional review.
      - Edits build manifests while reviewing.
    priority: high
    contract_ref: "agent scope (review-only); current `no-source-edit` grader"
```
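The header comment and the entries in this file imply a fixed shape: a set of top-level keys, 5–10 expectations, and the same fields on every entry (`applies_when` being the only optional one). A minimal Python sketch of a shape check, assuming the file has already been parsed into plain dicts and lists (for example with PyYAML's `yaml.safe_load`). The function name `validate_expectations` is hypothetical, and so is the allowed priority set: this file only uses `high` and `medium`, and `low` is a guess.

```python
from typing import Any

# Top-level keys present in this expectations file.
REQUIRED_TOP_LEVEL = (
    "slug", "class", "agent_file", "stimulus_file",
    "latest_result", "source_review_date", "expectations",
)
# Fields every expectation carries; `applies_when` is optional.
REQUIRED_PER_EXPECTATION = (
    "expectation_id", "summary", "signal", "pass_criteria",
    "failure_modes", "priority", "contract_ref",
)
# Assumption: "low" is allowed; only high/medium appear in this file.
ALLOWED_PRIORITIES = {"high", "medium", "low"}


def validate_expectations(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems: list[str] = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level key: {key}")

    expectations = doc.get("expectations") or []
    # The header comment asks for 5-10 expectations per agent.
    if not 5 <= len(expectations) <= 10:
        problems.append(f"expected 5-10 expectations, found {len(expectations)}")

    seen: set[str] = set()
    for index, exp in enumerate(expectations):
        label = exp.get("expectation_id", f"#{index}")
        for key in REQUIRED_PER_EXPECTATION:
            if key not in exp:
                problems.append(f"{label}: missing {key}")
        if label in seen:
            problems.append(f"{label}: duplicate expectation_id")
        seen.add(label)
        if exp.get("priority") not in ALLOWED_PRIORITIES:
            problems.append(f"{label}: unknown priority {exp.get('priority')!r}")
        modes = exp.get("failure_modes")
        if not isinstance(modes, list) or not modes:
            problems.append(f"{label}: failure_modes must be a non-empty list")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every defect in the file at once.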

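The `severity-per-finding` and `verdict-stated` expectations each name a closed, case-insensitive vocabulary, so a regex check is a natural first pass. The Python sketch below is a hypothetical heuristic, not the `severity-vocab` grader or any Vally API. It assumes findings have already been split into separate strings, and it only checks that a verdict phrase is present, because the file does not define how a severity maps to a verdict.

```python
import re
from typing import Optional

# Vocabularies copied from the severity-per-finding and verdict-stated pass criteria.
SEVERITY = re.compile(r"\b(critical|high|medium|low|info|warning)\b", re.IGNORECASE)
# Longest alternatives first so "approve with changes" is not read as "approve".
VERDICT = re.compile(
    r"\b(approve with changes|request changes|approve|block)\b", re.IGNORECASE
)


def unlabeled_findings(findings: list[str]) -> list[int]:
    """Indexes of findings that carry no severity word of their own."""
    return [i for i, text in enumerate(findings) if not SEVERITY.search(text)]


def final_verdict(response: str) -> Optional[str]:
    """Return the last verdict phrase in the response, lower-cased, or None."""
    matches = VERDICT.findall(response)
    return matches[-1].lower() if matches else None
```

Taking the last match approximates "concludes with" in the pass criteria. Plain words such as "block" or "high" in ordinary prose can still produce false positives, so this works better as a pre-filter than as a final grade.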

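`tracking-path-shape` and `no-source-edit` both reduce to pattern matches, one on the reported artifact path and one on the response text. The following is a rough Python sketch that makes two assumptions. First, the normalized branch segment and the run identifier are each a single path segment. Second, the list of edit/create verbs is hypothetical, since the expectation does not enumerate them. The function names are illustrative, and this is not the repository's `no-source-edit` grader.

```python
import re

# Assumption: the normalized branch and the run id are single segments (no "/").
TRACKING_PATH = re.compile(
    r"\.copilot-tracking/reviews/code-reviews/[^/]+/[^/]+/functional-review\.md"
)
# Source extensions and manifests listed in the no-source-edit pass criteria.
SOURCE_TARGET = (
    r"(?:\S+\.(?:cs|py|ts|js|go|rs|java)"
    r"|\S*(?:package\.json|pyproject\.toml|Cargo\.toml))\b"
)
# Hypothetical verb list; the expectation only says "edit/create verbs".
EDIT_VERB = (
    r"\b(?:edit(?:ed)?|modif(?:y|ied)|creat(?:e|ed)"
    r"|updat(?:e|ed)|wr(?:ite|ote)|appl(?:y|ied))\b"
)
CLAIMED_EDIT = re.compile(EDIT_VERB + r"[^\n]*?" + SOURCE_TARGET, re.IGNORECASE)


def valid_tracking_path(path: str) -> bool:
    """True when the whole path matches the documented artifact location."""
    return TRACKING_PATH.fullmatch(path) is not None


def claimed_source_edits(response: str) -> list[str]:
    """Lines that pair an edit/create verb with a source or manifest path."""
    return [line for line in response.splitlines() if CLAIMED_EDIT.search(line)]
```

Because the verb match is line-based, a recommendation worded as "update `src/app.py`" would still be flagged. A stricter grader would probably also need to skip fenced snippets and recommendation sections, which the pass criteria explicitly allow.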