# microsoft/hve-core
# evals/skill-quality/eval.yaml
name: skill-quality
description: >
  Evaluate hve-core skill behavior via copilot-sdk. Tests that skills
  provide accurate, structured guidance when invoked with domain-specific
  prompts. Uses multiple runs to account for non-deterministic output.
type: capability
config:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

environment: security

scoring:
  threshold: 0.7

stimuli:
  - name: owasp-top10-identify-injection
    prompt: |
      I have a web application that constructs SQL queries by concatenating
      user input directly into the query string. What OWASP Top 10 category
      does this fall under, and what are the recommended mitigations?
    tags:
      category: skill-quality
      skill: owasp-top-10
    graders:
      - type: output-matches
        name: identifies-injection-category
        config:
          pattern: "(?i)\\binjection\\b"
      - type: output-matches
        name: references-parameterized-queries
        config:
          pattern: "(?i)parameterized|prepared.statement"

  - name: owasp-top10-identify-broken-access
    prompt: |
      A user can modify the URL parameter from /api/users/123 to /api/users/456
      and access another user's data. What OWASP Top 10 vulnerability is this?
    tags:
      category: skill-quality
      skill: owasp-top-10
    graders:
      - type: output-matches
        name: identifies-broken-access-control
        config:
          pattern: "(?i)broken.access.control|insecure.direct.object.reference|IDOR"
      - type: output-matches
        name: suggests-authorization-check
        config:
          # cspell:disable-next-line
          pattern: "(?i)authori[sz]ation|access.control|ownership.check|server.side"

  - name: owasp-cicd-pipeline-poisoning
    prompt: |
      Our CI/CD pipeline pulls build scripts from an external repository
      that multiple developers can push to without review. What security
      risks does this introduce according to OWASP CI/CD guidelines?
    tags:
      category: skill-quality
      skill: owasp-cicd
    graders:
      - type: output-matches
        name: identifies-pipeline-poisoning
        config:
          pattern: "(?i)pipeline.poison|code.injection|supply.chain|untrusted"
      - type: output-matches
        name: recommends-controls
        config:
          pattern: "(?i)branch.protect|code.review|sign|pin|restrict"