microsoft/hve-core

Mirrored from https://github.com/microsoft/hve-core

evals/skill-quality/eval.yaml


name: skill-quality
description: >
  Evaluate hve-core skill behavior via copilot-sdk. Tests that skills
  provide accurate, structured guidance when invoked with domain-specific
  prompts. Uses multiple runs to account for non-deterministic output.
type: capability
config:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

environment: security

scoring:
  threshold: 0.7

stimuli:
  - name: owasp-top10-identify-injection
    prompt: |
      I have a web application that constructs SQL queries by concatenating
      user input directly into the query string. What OWASP Top 10 category
      does this fall under, and what are the recommended mitigations?
    tags:
      category: skill-quality
      skill: owasp-top-10
    graders:
      - type: output-matches
        name: identifies-injection-category
        config:
          pattern: "(?i)\\binjection\\b"
      - type: output-matches
        name: references-parameterized-queries
        config:
          pattern: "(?i)parameterized|prepared.statement"

  - name: owasp-top10-identify-broken-access
    prompt: |
      A user can modify the URL parameter from /api/users/123 to /api/users/456
      and access another user's data. What OWASP Top 10 vulnerability is this?
    tags:
      category: skill-quality
      skill: owasp-top-10
    graders:
      - type: output-matches
        name: identifies-broken-access-control
        config:
          pattern: "(?i)broken.access.control|insecure.direct.object.reference|IDOR"
      - type: output-matches
        name: suggests-authorization-check
        config:
          # cspell:disable-next-line
          pattern: "(?i)authori[sz]ation|access.control|ownership.check|server.side"

  - name: owasp-cicd-pipeline-poisoning
    prompt: |
      Our CI/CD pipeline pulls build scripts from an external repository
      that multiple developers can push to without review. What security
      risks does this introduce according to OWASP CI/CD guidelines?
    tags:
      category: skill-quality
      skill: owasp-cicd
    graders:
      - type: output-matches
        name: identifies-pipeline-poisoning
        config:
          pattern: "(?i)pipeline.poison|code.injection|supply.chain|untrusted"
      - type: output-matches
        name: recommends-controls
        config:
          pattern: "(?i)branch.protect|code.review|sign|pin|restrict"
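The `output-matches` graders above can be tried out locally before running the eval. The following is a minimal sketch, assuming the grader performs an unanchored regex search over the model output with semantics close to Python's `re.search`; the actual engine behind `copilot-sdk` is not shown in this file and may differ. The `grade` helper and the sample strings are hypothetical and exist only for illustration.

```python
import re

# Patterns copied from the graders above. In the YAML double-quoted string,
# "\\b" unescapes to the regex word boundary \b.
PATTERNS = {
    "identifies-injection-category": r"(?i)\binjection\b",
    "references-parameterized-queries": r"(?i)parameterized|prepared.statement",
    "recommends-controls": r"(?i)branch.protect|code.review|sign|pin|restrict",
}


def grade(output: str) -> dict[str, bool]:
    """Hypothetical helper: True for each grader whose pattern occurs anywhere in output."""
    return {
        name: re.search(pattern, output) is not None
        for name, pattern in PATTERNS.items()
    }


# Hypothetical model output for the SQL injection stimulus.
sample = "This is SQL Injection (A03). Use prepared statements and parameterized queries."
print(grade(sample))

# Short alternatives without word boundaries also match inside longer words:
# "designing" contains "sign", so this unrelated sentence satisfies the pattern.
print(grade("We are designing a survey.")["recommends-controls"])
```

Under these assumptions, short unanchored alternatives such as `sign` and `pin` can match unrelated words, which is worth keeping in mind when reading pass rates for `recommends-controls`.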