microsoft/hve-core

Public

mirrored from https://github.com/microsoft/hve-coreAvailable

Watch0 Fork0 Star0

Code Commits Issues Pull requests Actions Insights Security

ci/2086-enforce-powershell-coverage

Find a branch or tag

Branches

ci/2086-enforce-powershell-coverage

Clone

HTTPS

Download ZIP

hve-core/evals/script-validation

evals/script-validation/eval.yaml

91lines · modecode

Raw Download

Latest commit unavailable.

unknown

1	`name: script-validation`
2	`description: >`
3	`Validate that agents correctly identify valid and invalid skill`
4	`structures when presented with file listings. Uses copilot-sdk`
5	`to test real agent reasoning about validation rules. Will migrate`
6	`to the mock executor when the deterministic plugin ships.`
7	`type: capability`
8	`defaults:`
9	`runs: 3`
10	`timeout: 120s`
11	`executor: copilot-sdk`
12
13	`# Skill paths are resolved relative to this spec's directory (evals/script-validation/),`
14	`# so they ascend to the repo root before descending into .github/skills.`
15	`environment:`
16	`skills:`
17	`- ../../.github/skills/coding-standards/python-foundational`
18
19	`scoring:`
20	`threshold: 0.7`
21
22	`stimuli:`
23	`- name: identify-valid-skill-structure`
24	`prompt: \|`
25	`I have a skill directory with the following structure:`
26	```
27	`my-skill/`
28	`├── SKILL.md (contains frontmatter with name, description, license)`
29	`├── references/`
30	`│ └── api-guide.md`
31	`└── scripts/`
32	`└── validate.py`
33	```
34	`Does this follow valid skill structure conventions? What required`
35	`elements are present?`
36	`tags:`
37	`category: script-validation`
38	`script: skill-validation`
39	`graders:`
40	`- type: output-matches`
41	`name: identifies-valid-structure`
42	`config:`
43	`pattern: "(?i)valid\|correct\|proper\|follows.*convention"`
44	`- type: output-matches`
45	`name: identifies-skill-md`
46	`config:`
47	`pattern: "(?i)SKILL\\.md\|skill.entry\|entrypoint"`
48
49	`- name: identify-missing-skill-md`
50	`prompt: \|`
51	`I have a skill directory with this structure:`
52	```
53	`broken-skill/`
54	`├── README.md`
55	`├── references/`
56	`│ └── guide.md`
57	`└── scripts/`
58	`└── run.py`
59	```
60	`Is this a valid skill structure? What's missing?`
61	`tags:`
62	`category: script-validation`
63	`script: skill-validation`
64	`graders:`
65	`- type: output-matches`
66	`name: identifies-missing-skillmd`
67	`config:`
68	`pattern: "(?i)missing.SKILL\\.md\|no.SKILL\\.md\|SKILL\\.md.required\|SKILL\\.md.missing"`
69	`- type: output-matches`
70	`name: identifies-invalid`
71	`config:`
72	`pattern: "(?i)invalid\|not.valid\|missing.required\|incomplete"`
73
74	`- name: identify-frontmatter-issues`
75	`prompt: \|`
76	`This SKILL.md file has the following frontmatter:`
77	```yaml
78	`---`
79	`name: my-tool`
80	`---`
81	```
82	`Is this frontmatter complete for a skill file? What fields`
83	`are missing according to skill conventions?`
84	`tags:`
85	`category: script-validation`
86	`script: frontmatter-validation`
87	`graders:`
88	`- type: output-matches`
89	`name: identifies-missing-description`
90	`config:`
91	`pattern: "(?i)description.missing\|missing.description\|need.description\|no.description\|description.required\|lacks.description\|without.description\|description.absent\|description.*not"`

microsoft/hve-core

Branches

Tags

Clone