microsoft/hve-core

Public

mirrored from https://github.com/microsoft/hve-coreAvailable

CodeCommitsIssuesPull requestsActionsInsightsSecurity
ci/2086-enforce-powershell-coverage

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

evals/script-validation/eval.yaml

91lines · modecode

1name: script-validation
2description: >
3 Validate that agents correctly identify valid and invalid skill
4 structures when presented with file listings. Uses copilot-sdk
5 to test real agent reasoning about validation rules. Will migrate
6 to the mock executor when the deterministic plugin ships.
7type: capability
8defaults:
9 runs: 3
10 timeout: 120s
11 executor: copilot-sdk
12
13# Skill paths are resolved relative to this spec's directory (evals/script-validation/),
14# so they ascend to the repo root before descending into .github/skills.
15environment:
16 skills:
17 - ../../.github/skills/coding-standards/python-foundational
18
19scoring:
20 threshold: 0.7
21
22stimuli:
23 - name: identify-valid-skill-structure
24 prompt: |
25 I have a skill directory with the following structure:
26 ```
27 my-skill/
28 ├── SKILL.md (contains frontmatter with name, description, license)
29 ├── references/
30 │ └── api-guide.md
31 └── scripts/
32 └── validate.py
33 ```
34 Does this follow valid skill structure conventions? What required
35 elements are present?
36 tags:
37 category: script-validation
38 script: skill-validation
39 graders:
40 - type: output-matches
41 name: identifies-valid-structure
42 config:
43 pattern: "(?i)valid|correct|proper|follows.*convention"
44 - type: output-matches
45 name: identifies-skill-md
46 config:
47 pattern: "(?i)SKILL\\.md|skill.entry|entrypoint"
48
49 - name: identify-missing-skill-md
50 prompt: |
51 I have a skill directory with this structure:
52 ```
53 broken-skill/
54 ├── README.md
55 ├── references/
56 │ └── guide.md
57 └── scripts/
58 └── run.py
59 ```
60 Is this a valid skill structure? What's missing?
61 tags:
62 category: script-validation
63 script: skill-validation
64 graders:
65 - type: output-matches
66 name: identifies-missing-skillmd
67 config:
68 pattern: "(?i)missing.*SKILL\\.md|no.*SKILL\\.md|SKILL\\.md.*required|SKILL\\.md.*missing"
69 - type: output-matches
70 name: identifies-invalid
71 config:
72 pattern: "(?i)invalid|not.*valid|missing.*required|incomplete"
73
74 - name: identify-frontmatter-issues
75 prompt: |
76 This SKILL.md file has the following frontmatter:
77 ```yaml
78 ---
79 name: my-tool
80 ---
81 ```
82 Is this frontmatter complete for a skill file? What fields
83 are missing according to skill conventions?
84 tags:
85 category: script-validation
86 script: frontmatter-validation
87 graders:
88 - type: output-matches
89 name: identifies-missing-description
90 config:
91 pattern: "(?i)description.*missing|missing.*description|need.*description|no.*description|description.*required|lacks.*description|without.*description|description.*absent|description.*not"