microsoft/hve-core

Public

mirrored fromhttps://github.com/microsoft/hve-coreAvailable

CodeCommitsIssuesPull requestsActionsInsightsSecurity
feature/add-demo-setup-skill

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

evals/script-validation/eval.yaml

87lines · modecode

1name: script-validation
2description: >
3 Validate that agents correctly identify valid and invalid skill
4 structures when presented with file listings. Uses copilot-sdk
5 to test real agent reasoning about validation rules. Will migrate
6 to the mock executor when the deterministic plugin ships.
7type: capability
8config:
9 runs: 3
10 timeout: 120s
11 executor: copilot-sdk
12
13environment: coding-standards
14
15scoring:
16 threshold: 0.7
17
18stimuli:
19 - name: identify-valid-skill-structure
20 prompt: |
21 I have a skill directory with the following structure:
22 ```
23 my-skill/
24 ├── SKILL.md (contains frontmatter with name, description, license)
25 ├── references/
26 │ └── api-guide.md
27 └── scripts/
28 └── validate.py
29 ```
30 Does this follow valid skill structure conventions? What required
31 elements are present?
32 tags:
33 category: script-validation
34 script: skill-validation
35 graders:
36 - type: output-matches
37 name: identifies-valid-structure
38 config:
39 pattern: "(?i)valid|correct|proper|follows.*convention"
40 - type: output-matches
41 name: identifies-skill-md
42 config:
43 pattern: "(?i)SKILL\\.md|skill.entry|entrypoint"
44
45 - name: identify-missing-skill-md
46 prompt: |
47 I have a skill directory with this structure:
48 ```
49 broken-skill/
50 ├── README.md
51 ├── references/
52 │ └── guide.md
53 └── scripts/
54 └── run.py
55 ```
56 Is this a valid skill structure? What's missing?
57 tags:
58 category: script-validation
59 script: skill-validation
60 graders:
61 - type: output-matches
62 name: identifies-missing-skillmd
63 config:
64 pattern: "(?i)missing.*SKILL\\.md|no.*SKILL\\.md|SKILL\\.md.*required|SKILL\\.md.*missing"
65 - type: output-matches
66 name: identifies-invalid
67 config:
68 pattern: "(?i)invalid|not.*valid|missing.*required|incomplete"
69
70 - name: identify-frontmatter-issues
71 prompt: |
72 This SKILL.md file has the following frontmatter:
73 ```yaml
74 ---
75 name: my-tool
76 ---
77 ```
78 Is this frontmatter complete for a skill file? What fields
79 are missing according to skill conventions?
80 tags:
81 category: script-validation
82 script: frontmatter-validation
83 graders:
84 - type: output-matches
85 name: identifies-missing-description
86 config:
87 pattern: "(?i)description.*missing|missing.*description|need.*description|no.*description|description.*required|lacks.*description|without.*description|description.*absent|description.*not"