microsoft/hve-core

mirrored from https://github.com/microsoft/hve-core

ci/2086-enforce-powershell-coverage

evals/behavior-conformance/skill-behavior.eval.yaml

name: behavior-conformance-skills
description: >
  Advisory-tier behavior conformance evals for 25 skills exercised across three
  stimulus shapes: knowledge (canonical concept attribution), tool-trigger
  (domain-intent skill attribution from a working context), and bleed-detection
  (correct refusal when an off-topic context superficially resembles the
  skill's domain). Total: 75 stimuli. Each tool-trigger stimulus uses two
  graders with AND logic, and the suite-level scoring threshold gates the
  aggregate pass rate across stimuli.
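# Worked example, under an assumption this file does not confirm: if the
# harness computes the aggregate pass rate as passing stimuli / total stimuli,
# then with the 75 stimuli described above, a scoring threshold of 0.6 requires
# at least 45 passing stimuli (45 / 75 = 0.6). A tool-trigger stimulus would
# count as passing only when both of its graders match (AND logic). How the
# configured runs per stimulus fold into that rate is not stated here.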
type: capability
defaults:
  runs: 3
  timeout: 120s
  executor: copilot-sdk

scoring:
  threshold: 0.6

stimuli:

  - name: skill-python-foundational-knowledge
    prompt: |
      Summarize the canonical Python idioms championed by the
      `python-foundational` skill. Cite at least three patterns it advises.
    tags:
      category: behavior-conformance
      skill: python-foundational
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(dataclass|pathlib|type\s+hint|comprehension|context\s+manager)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(python|idiom|foundational|best\s+practice)'
  - name: skill-python-foundational-tool-trigger
    prompt: |
      I am authoring a new module at `scripts/utilities/helper.py` and want
      idiomatic, foundational Python patterns applied. Which skill under
      `.github/skills/**/SKILL.md` applies and what does it advise?
    tags:
      category: behavior-conformance
      skill: python-foundational
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(python|foundational|idiom|skill|pattern)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(python|idiom|skill|pattern)'
  - name: skill-python-foundational-bleed-detection
    prompt: |
      I am authoring a new TypeScript module at `extension/src/helper.ts`.
      Does the `python-foundational` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: python-foundational
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|N\/A)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(typescript|javascript|python|scope)'
  - name: skill-customer-card-render-knowledge
    prompt: |
      What inputs does the `customer-card-render` skill consume and what
      output artifact does it produce?
    tags:
      category: behavior-conformance
      skill: customer-card-render
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(content\.yaml|customer[-\s]card|design\s+thinking|powerpoint)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(skill|render|design\s+thinking|deck|powerpoint)'
  - name: skill-customer-card-render-tool-trigger
    prompt: |
      I have completed Design Thinking canonical artifacts under
      `.copilot-tracking/dt/methods/method-08/` and need to generate a
      customer-card deck from them. Which skill applies?
    tags:
      category: behavior-conformance
      skill: customer-card-render
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(customer|card|render|design\s+thinking|customer[-\s]card|deck|powerpoint)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|customer[-\s]card|deck|powerpoint)'
  - name: skill-customer-card-render-bleed-detection
    prompt: |
      I need to generate a generic project status PowerPoint with no Design
      Thinking inputs involved. Does the `customer-card-render` skill apply?
    tags:
      category: behavior-conformance
      skill: customer-card-render
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|powerpoint\s+skill)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|generic|status|powerpoint)'
  - name: skill-powerpoint-knowledge
    prompt: |
      Summarize the `powerpoint` skill's content-and-style pipeline. Cite the
      YAML files it consumes and the Python library it depends on.
    tags:
      category: behavior-conformance
      skill: powerpoint
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(python-pptx|content\.yaml|style\.yaml)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(powerpoint|slide|deck|yaml)'
  - name: skill-powerpoint-tool-trigger
    prompt: |
      I need to build a slide deck programmatically from YAML inputs in a
      Python environment. Which skill applies and what does it scaffold?
    tags:
      category: behavior-conformance
      skill: powerpoint
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(powerpoint|slide|deck|yaml|pptx)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(slide|deck|yaml|pptx)'
  - name: skill-powerpoint-bleed-detection
    prompt: |
      I need to generate a Microsoft Word document from a structured template.
      Does the `powerpoint` skill apply?
    tags:
      category: behavior-conformance
      skill: powerpoint
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|\bword\b)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(word|docx|powerpoint|slide)'
  - name: skill-tts-voiceover-knowledge
    prompt: |
      Describe the `tts-voiceover` skill's input format and the speech engine
      it relies on.
    tags:
      category: behavior-conformance
      skill: tts-voiceover
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(azure\s+speech|ssml|speaker_notes|tts|wav)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(voice|speech|tts|narration)'
  - name: skill-tts-voiceover-tool-trigger
    prompt: |
      I have a `content.yaml` with `speaker_notes` per slide and want
      narration WAV files generated from those notes. Which skill applies?
    tags:
      category: behavior-conformance
      skill: tts-voiceover
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(tts|voiceover|voice|speech|narration|wav)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(voice|speech|narration|wav)'
  - name: skill-tts-voiceover-bleed-detection
    prompt: |
      I need to synthesize background music for a video (no spoken narration).
      Does the `tts-voiceover` skill apply?
    tags:
      category: behavior-conformance
      skill: tts-voiceover
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|music)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(music|narration|voice|speech)'
  - name: skill-video-to-gif-knowledge
    prompt: |
      Summarize the `video-to-gif` skill's conversion approach and the tool
      it relies on.
    tags:
      category: behavior-conformance
      skill: video-to-gif
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(ffmpeg|two-pass|palette)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(gif|video|convert|optimi[sz]e)'
  - name: skill-video-to-gif-tool-trigger
    prompt: |
      I have a recorded screencast `demo.mp4` and need an optimized animated
      GIF embedded in documentation. Which skill applies?
    tags:
      category: behavior-conformance
      skill: video-to-gif
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(video|gif|screencast|convert)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(gif|video|screencast|convert)'
  - name: skill-video-to-gif-bleed-detection
    prompt: |
      I need to convert a single PNG image into a short looping MP4 video.
      Does the `video-to-gif` skill apply?
    tags:
      category: behavior-conformance
      skill: video-to-gif
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|image[-\s]to[-\s]video)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(image|video|mp4|gif)'
  - name: skill-vscode-playwright-knowledge
    prompt: |
      Describe what the `vscode-playwright` skill captures and the toolchain
      it composes.
    tags:
      category: behavior-conformance
      skill: vscode-playwright
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(playwright|serve-web|vs\s*code|screenshot)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(screenshot|capture|browser|automation)'
  - name: skill-vscode-playwright-tool-trigger
    prompt: |
      I need reproducible VS Code screenshots of a slide deck rendered in the
      editor for documentation. Which skill applies?
    tags:
      category: behavior-conformance
      skill: vscode-playwright
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(vscode|playwright|vs\s*code|screenshot|capture)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(vs\s*code|screenshot|capture|playwright)'
  - name: skill-vscode-playwright-bleed-detection
    prompt: |
      I need to scrape a public news website for headlines (no VS Code in
      scope). Does the `vscode-playwright` skill apply?
    tags:
      category: behavior-conformance
      skill: vscode-playwright
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|generic\s+playwright|web\s+scrap)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(scrape|web|vs\s*code|news)'
  - name: skill-gh-code-scanning-knowledge
    prompt: |
      What does the `gh-code-scanning` skill retrieve, which CLI does it wrap,
      and which GitHub token scope is required?
    tags:
      category: behavior-conformance
      skill: gh-code-scanning
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(gh\s+cli|code\s+scanning|security_events)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(alert|scanning|github|rule|severity)'
  - name: skill-gh-code-scanning-tool-trigger
    prompt: |
      I need to fetch open GitHub code scanning alerts for the current repo
      and group them by rule and severity. Which skill applies?
    tags:
      category: behavior-conformance
      skill: gh-code-scanning
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(gh|code|scanning|code\s+scanning|alert|rule|severity|github)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(code\s+scanning|alert|rule|severity|github)'
  - name: skill-gh-code-scanning-bleed-detection
    prompt: |
      I need to list open Dependabot alerts for the current repo (not code
      scanning alerts). Does the `gh-code-scanning` skill apply?
    tags:
      category: behavior-conformance
      skill: gh-code-scanning
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|dependabot)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(dependabot|alert|code\s+scanning|github)'
  - name: skill-gitlab-knowledge
    prompt: |
      What does the `gitlab` skill manage and which environment variables
      does its CLI require?
    tags:
      category: behavior-conformance
      skill: gitlab
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(gitlab|merge\s+request|pipeline|GITLAB_TOKEN|GITLAB_URL)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(gitlab|merge\s+request|pipeline|cli)'
  - name: skill-gitlab-tool-trigger
    prompt: |
      I need to list and update merge requests in a GitLab project via a
      Python CLI. Which skill applies?
    tags:
      category: behavior-conformance
      skill: gitlab
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(gitlab|merge\s+request|pipeline|cli)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(gitlab|merge\s+request|pipeline|cli)'
  - name: skill-gitlab-bleed-detection
    prompt: |
      I need to list open pull requests on a GitHub repository. Does the
      `gitlab` skill apply?
    tags:
      category: behavior-conformance
      skill: gitlab
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|github)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(github|gitlab|pull\s+request|merge\s+request)'
  - name: skill-hve-core-installer-knowledge
    prompt: |
      Describe the `hve-core-installer` skill's clone-method options and the
      two personas it offers.
    tags:
      category: behavior-conformance
      skill: hve-core-installer
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(installer|validator|clone|six)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(install|setup|hve[-\s]?core|persona)'
  - name: skill-hve-core-installer-tool-trigger
    prompt: |
      A user wants to install hve-core into a fresh workspace and validate
      the setup end-to-end. Which skill applies?
    tags:
      category: behavior-conformance
      skill: hve-core-installer
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(hve|core|installer|install|setup|hve[-\s]?core|validate)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(install|setup|hve[-\s]?core|validate)'
  - name: skill-hve-core-installer-bleed-detection
    prompt: |
      A user wants to uninstall hve-core and remove all of its artifacts from
      their workspace. Does the `hve-core-installer` skill apply?
    tags:
      category: behavior-conformance
      skill: hve-core-installer
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|uninstall)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(uninstall|install|remove|hve[-\s]?core)'
  - name: skill-jira-knowledge
    prompt: |
      What does the `jira` skill expose and which authentication environment
      variables does it require?
    tags:
      category: behavior-conformance
      skill: jira
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(jira|JIRA_BASE_URL|JIRA_PAT|jql)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(jira|issue|jql|rest)'
  - name: skill-jira-tool-trigger
    prompt: |
      I need to search Jira issues by JQL, transition one to In Progress, and
      post a comment. Which skill applies?
    tags:
      category: behavior-conformance
      skill: jira
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(jira|issue|jql|transition)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(jira|issue|jql|transition)'
  - name: skill-jira-bleed-detection
    prompt: |
      I need to query Azure DevOps work items by WIQL. Does the `jira` skill
      apply?
    tags:
      category: behavior-conformance
      skill: jira
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|azure\s+devops|\bado\b)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(ado|azure\s+devops|jira|work\s+item)'
  - name: skill-owasp-agentic-knowledge
    prompt: |
      What body of knowledge does the `owasp-agentic` skill encode and how
      many top risks does it enumerate?
    tags:
      category: behavior-conformance
      skill: owasp-agentic
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+agentic|agentic\s+top|ai\s+agent)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(agent|risk|vulnerability|owasp)'
  - name: skill-owasp-agentic-tool-trigger
    prompt: |
      I am reviewing the security posture of a multi-agent autonomous AI
      system. Which OWASP skill under `.github/skills/security/**` applies?
    tags:
      category: behavior-conformance
      skill: owasp-agentic
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|agentic|agent|risk|review)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(agent|owasp|risk|review)'
  - name: skill-owasp-agentic-bleed-detection
    prompt: |
      I am reviewing a traditional web form for SQL injection and XSS risks
      (no AI agent involved). Does the `owasp-agentic` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-agentic
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|owasp\s+top\s+10|web)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(web|owasp|injection|xss|agentic)'
  - name: skill-owasp-cicd-knowledge
    prompt: |
      What body of knowledge does the `owasp-cicd` skill encode and what
      kinds of risks does it cover?
    tags:
      category: behavior-conformance
      skill: owasp-cicd
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+ci\/?cd|ci\/?cd\s+top|pipeline)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(ci\/?cd|pipeline|owasp|risk)'
  - name: skill-owasp-cicd-tool-trigger
    prompt: |
      I am hardening a GitHub Actions pipeline against poisoned dependency
      chain and IAM misconfiguration. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-cicd
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|cicd|ci\/?cd|pipeline|github\s+actions)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(ci\/?cd|pipeline|owasp|github\s+actions)'
  - name: skill-owasp-cicd-bleed-detection
    prompt: |
      I am hardening a running web API against prompt injection from
      end-user input. Does the `owasp-cicd` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-cicd
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|owasp\s+llm|owasp\s+top\s+10)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(ci\/?cd|runtime|prompt|web|owasp)'
  - name: skill-owasp-docker-knowledge
    prompt: |
      What body of knowledge does the `owasp-docker` skill encode and how
      many top risks does it enumerate?
    tags:
      category: behavior-conformance
      skill: owasp-docker
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+docker|docker\s+top|container)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(docker|container|owasp|risk)'
  - name: skill-owasp-docker-tool-trigger
    prompt: |
      I am reviewing the security configuration of a production Dockerfile
      and the resulting container image. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-docker
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|docker|container|image)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(docker|container|image|owasp)'
  - name: skill-owasp-docker-bleed-detection
    prompt: |
      I am reviewing a Kubernetes cluster's network policies and RBAC
      configuration. Does the `owasp-docker` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-docker
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|kubernetes)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(kubernetes|docker|container|owasp)'
  - name: skill-owasp-infrastructure-knowledge
    prompt: |
      What body of knowledge does the `owasp-infrastructure` skill encode
      and what kinds of risks does it cover?
    tags:
      category: behavior-conformance
      skill: owasp-infrastructure
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+infrastructure|infrastructure\s+top|outdated\s+software)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(infrastructure|owasp|risk|internal)'
  - name: skill-owasp-infrastructure-tool-trigger
    prompt: |
      I am reviewing an on-prem IT infrastructure for outdated software and
      weak threat detection. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-infrastructure
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|infrastructure|on[-\s]?prem|review)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(infrastructure|owasp|on[-\s]?prem|review)'
  - name: skill-owasp-infrastructure-bleed-detection
    prompt: |
      I am reviewing a large language model deployment for prompt injection.
      Does the `owasp-infrastructure` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-infrastructure
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|owasp\s+llm)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(llm|infrastructure|prompt|owasp)'
  - name: skill-owasp-llm-knowledge
    prompt: |
      What body of knowledge does the `owasp-llm` skill encode and what is
      risk #1 in its 2025 list?
    tags:
      category: behavior-conformance
      skill: owasp-llm
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+llm|llm\s+top|prompt\s+injection)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(llm|prompt|injection|owasp)'
  - name: skill-owasp-llm-tool-trigger
    prompt: |
      I am reviewing an LLM-backed chatbot for prompt injection and sensitive
      information disclosure risks. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-llm
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|llm|prompt|chatbot)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(llm|prompt|chatbot|owasp)'
  - name: skill-owasp-llm-bleed-detection
    prompt: |
      I am reviewing a base container image for outdated packages. Does the
      `owasp-llm` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-llm
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|owasp\s+docker|owasp\s+infrastructure)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(container|llm|docker|owasp)'
  - name: skill-owasp-mcp-knowledge
    prompt: |
      What body of knowledge does the `owasp-mcp` skill encode? Name one
      of its top risks.
    tags:
      category: behavior-conformance
      skill: owasp-mcp
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+mcp|mcp\s+top|tool\s+poison|token\s+mismanagement)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(mcp|model\s+context|owasp|risk)'
  - name: skill-owasp-mcp-tool-trigger
    prompt: |
      I am reviewing a Model Context Protocol server for tool poisoning and
      token mismanagement risks. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-mcp
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp|mcp|model\s+context|tool)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(mcp|model\s+context|tool|owasp)'
  - name: skill-owasp-mcp-bleed-detection
    prompt: |
      I am reviewing an OAuth-only REST API for token handling weaknesses
      (no MCP involved). Does the `owasp-mcp` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-mcp
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|oauth|owasp\s+top\s+10)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(oauth|mcp|api|owasp)'
  - name: skill-owasp-top-10-knowledge
    prompt: |
      What body of knowledge does the `owasp-top-10` skill encode and what
      is risk #1 in its 2025 list for web applications?
    tags:
      category: behavior-conformance
      skill: owasp-top-10
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(owasp\s+top\s+10|broken\s+access\s+control|web\s+application)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(web|application|owasp|risk)'
  - name: skill-owasp-top-10-tool-trigger
    prompt: |
      I am reviewing a public web application for broken access control and
      injection risks. Which OWASP skill applies?
    tags:
      category: behavior-conformance
      skill: owasp-top-10
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(owasp\s+top\s+10|owasp|web|application|access\s+control)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(web|application|owasp|access\s+control)'
  - name: skill-owasp-top-10-bleed-detection
    prompt: |
      I am reviewing the autonomous decision boundary of a multi-agent AI
      system. Does the `owasp-top-10` skill apply?
    tags:
      category: behavior-conformance
      skill: owasp-top-10
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|owasp\s+agentic|agent)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(agent|owasp|web|autonomous)'
  - name: skill-secure-by-design-knowledge
    prompt: |
      What frameworks does the `secure-by-design` skill draw from and what
      lens does it apply when assessing a system?
    tags:
      category: behavior-conformance
      skill: secure-by-design
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(secure[-\s]by[-\s]design|uk\s+10|asd\s+6|principle|foundation)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(secure|design|principle|lifecycle)'
  - name: skill-secure-by-design-tool-trigger
    prompt: |
      I am assessing a new product's lifecycle practices against
      secure-by-design principles. Which skill applies?
    tags:
      category: behavior-conformance
      skill: secure-by-design
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(secure[-\s]?by[-\s]?design|secure|design|principle|lifecycle|assessment)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(secure|design|principle|lifecycle|assessment)'
  - name: skill-secure-by-design-bleed-detection
    prompt: |
      I am writing runtime intrusion detection rules for a production host.
      Does the `secure-by-design` skill apply?
    tags:
      category: behavior-conformance
      skill: secure-by-design
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|runtime|detection)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(runtime|detection|secure|design)'
  - name: skill-security-reviewer-formats-knowledge
    prompt: |
      What output contracts does the `security-reviewer-formats` skill
      define for the security reviewer orchestrator and its subagents?
    tags:
      category: behavior-conformance
      skill: security-reviewer-formats
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(VULN_REPORT_V1|PLAN_REPORT_V1|reviewer|orchestrator|severity)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(format|contract|reviewer|report)'
  - name: skill-security-reviewer-formats-tool-trigger
    prompt: |
      I am implementing a new security reviewer subagent and need the
      canonical output format and severity vocabulary. Which skill applies?
    tags:
      category: behavior-conformance
      skill: security-reviewer-formats
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(security|reviewer|formats|format|subagent|severity)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(reviewer|subagent|format|severity)'
  - name: skill-security-reviewer-formats-bleed-detection
    prompt: |
      I want to tighten the markdown linting rules across the repo. Does the
      `security-reviewer-formats` skill apply?
    tags:
      category: behavior-conformance
      skill: security-reviewer-formats
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|linting|markdown)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(lint|markdown|reviewer|format)'
  - name: skill-pr-reference-knowledge
    prompt: |
      What does the `pr-reference` skill generate and which two scripting
      languages does it provide for that generation?
    tags:
      category: behavior-conformance
      skill: pr-reference
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(pr[-\s]reference|git\s+diff|commit\s+history|xml)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(pull\s+request|\bpr\b|diff|reference)'
  - name: skill-pr-reference-tool-trigger
    prompt: |
      I am preparing a pull request description and need a structured XML
      reference of commits and unified diffs between two branches. Which
      skill applies?
    tags:
      category: behavior-conformance
      skill: pr-reference
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(pr|reference|pull\s+request|\bpr\b|diff|branch|xml)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(pull\s+request|\bpr\b|diff|branch|xml)'
  - name: skill-pr-reference-bleed-detection
    prompt: |
      I need to query GitHub issues for triage (no diff or commit analysis
      involved). Does the `pr-reference` skill apply?
    tags:
      category: behavior-conformance
      skill: pr-reference
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|issue|triage)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(issue|triage|\bpr\b|github)'
  - name: skill-dt-coaching-foundation-knowledge
    prompt: |
      Summarize the foundational knowledge encoded by the
      `dt-coaching-foundation` skill. Name at least three of its core areas.
    tags:
      category: behavior-conformance
      skill: dt-coaching-foundation
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(coach\s+identity|fidelity|method\s+sequencing|coaching\s+state|deck\s+workflow)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|coaching|facilitat|foundation)'
  - name: skill-dt-coaching-foundation-tool-trigger
    prompt: |
      I am persisting Design Thinking coaching state and need the canonical
      coaching-state schema and method-sequencing rules. Which skill under
      `.github/skills/**/SKILL.md` applies?
    tags:
      category: behavior-conformance
      skill: dt-coaching-foundation
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(dt|coaching|foundation|coaching\s+state|sequencing|design\s+thinking|skill)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(coaching\s+state|sequencing|design\s+thinking|skill)'
  - name: skill-dt-coaching-foundation-bleed-detection
    prompt: |
      I am configuring OpenTelemetry trace spans for a backend service.
      Does the `dt-coaching-foundation` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: dt-coaching-foundation
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|telemetry)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(telemetry|coaching|design\s+thinking|scope)'
  - name: skill-dt-methods-knowledge
    prompt: |
      What body of knowledge does the `dt-methods` skill encode across the
      Design Thinking methods? Mention its per-method and industry coverage.
    tags:
      category: behavior-conformance
      skill: dt-methods
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(nine\s+methods|per-method|deep\s+expertise|industry\s+context)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|method|technique|industry)'
  - name: skill-dt-methods-tool-trigger
    prompt: |
      I am coaching a Method 3 synthesis session and need method-specific
      techniques plus healthcare industry context. Which skill under
      `.github/skills/**/SKILL.md` applies?
    tags:
      category: behavior-conformance
      skill: dt-methods
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(dt|methods|method|technique|industry|design\s+thinking)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(method|technique|industry|design\s+thinking)'
  - name: skill-dt-methods-bleed-detection
    prompt: |
      I am writing Pester tests for a PowerShell linting module. Does the
      `dt-methods` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: dt-methods
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|pester)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(pester|powershell|design\s+thinking|scope)'
  - name: skill-dt-rpi-integration-knowledge
    prompt: |
      Summarize what the `dt-rpi-integration` skill covers for handing off
      Design Thinking outputs into the RPI workflow.
    tags:
      category: behavior-conformance
      skill: dt-rpi-integration
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(handoff\s+contract|subagent\s+handoff|method\s+5|image\s+prompt|rpi)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|rpi|handoff|research)'
  - name: skill-dt-rpi-integration-tool-trigger
    prompt: |
      I have completed a Design Thinking session and need to hand its outputs
      into the RPI research and planning phases via a subagent. Which skill
      under `.github/skills/**/SKILL.md` applies?
    tags:
      category: behavior-conformance
      skill: dt-rpi-integration
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(dt|rpi|integration|handoff|design\s+thinking|skill)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(handoff|rpi|design\s+thinking|skill)'
  - name: skill-dt-rpi-integration-bleed-detection
    prompt: |
      I am drafting an Azure DevOps work item description for a sprint. Does
      the `dt-rpi-integration` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: dt-rpi-integration
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|work\s+item)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(work\s+item|azure\s+devops|design\s+thinking|scope)'
  - name: skill-dt-curriculum-knowledge
    prompt: |
      What does the `dt-curriculum` skill provide for teaching Design
      Thinking? Describe its module structure and reference scenario.
    tags:
      category: behavior-conformance
      skill: dt-curriculum
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(nine\s+(progressive\s+)?modules|problem\s+space|solution\s+space|manufacturing\s+reference)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(design\s+thinking|curriculum|module|learn)'
  - name: skill-dt-curriculum-tool-trigger
    prompt: |
      I am building a progressive Design Thinking training course with a
      shared manufacturing scenario across modules. Which skill under
      `.github/skills/**/SKILL.md` applies?
    tags:
      category: behavior-conformance
      skill: dt-curriculum
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(dt|curriculum|module|design\s+thinking|skill)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(curriculum|module|design\s+thinking|skill)'
  - name: skill-dt-curriculum-bleed-detection
    prompt: |
      I am writing a Terraform module for an Azure storage account. Does the
      `dt-curriculum` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: dt-curriculum
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|different\s+skill|terraform)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(terraform|infrastructure|design\s+thinking|scope)'
  - name: skill-vally-tests-knowledge
    prompt: |
      What artifact kinds does the `vally-tests` skill author conformance
      tests for, and what bounds its test authoring?
    tags:
      category: behavior-conformance
      skill: vally-tests
      shape: knowledge
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(vally|conformance|stimulus|grader)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(prompt|instruction|agent|skill|refusal|safety)'
  - name: skill-vally-tests-tool-trigger
    prompt: |
      I want to author conformance test stimuli for an existing
      `.prompt.md` artifact and route the results through Vally graders.
      Which skill under `.github/skills/**/SKILL.md` applies?
    tags:
      category: behavior-conformance
      skill: vally-tests
      shape: tool-trigger
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-domain-attribution
        config:
          pattern: '(?i)(vally|tests|conformance|stimulus|grader|skill)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(conformance|stimulus|grader|vally|skill)'
  - name: skill-vally-tests-bleed-detection
    prompt: |
      I need to generate jailbreak and prompt-injection probes to red-team a
      model. Does the `vally-tests` skill apply here? Justify briefly.
    tags:
      category: behavior-conformance
      skill: vally-tests
      shape: bleed-detection
      advisory: "true"
    graders:
      - type: output-matches
        name: skill-attribution
        config:
          pattern: '(?i)(not\s+apply|does\s+not|inapplicable|refusal|out\s+of\s+scope)'
      - type: output-matches
        name: scope-language
        config:
          pattern: '(?i)(jailbreak|prompt-injection|adversarial|red-team|scope)'