microsoft/hve-core
Public · mirrored from https://github.com/microsoft/hve-core
Commit
feat(scripts): add evals orchestration modules and runners (#1831)
# Pull Request

## Description

Added the orchestration layer that drives Vally evaluations under *scripts/evals/*. This includes:

* the top-level runners (*Invoke-VallyEvals.ps1*, *Invoke-AgentMatrix.ps1*, *Invoke-BaselineEquivalence.ps1*, *Invoke-ContentModeration.ps1*, *Invoke-CorpusModeration.ps1*)
* the spec and inventory builders (*Build-AgentBehaviorSpec.ps1*, *Build-AgentInventory.ps1*, *Get-AgentDependencyMap.ps1*, *Get-ChangedAIArtifact.ps1*)
* validation and dashboard tooling (*Test-EvalSpec.ps1*, *Test-StimulusPresence.ps1*, *New-AgentMatrixDashboard.ps1*, *New-EquivalenceDashboard.ps1*)
* shared modules under *Modules/* and *lib/*

A Python content-moderation runner (*moderation/moderate.py*) with its own *pyproject.toml*, *uv.lock*, and tests rounds out the layer.

## Related Issue(s)

Closes #1816

## Type of Change

Select all that apply:

**Code & Documentation:**

* [ ] Bug fix (non-breaking change fixing an issue)
* [x] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to change)
* [ ] Documentation update

**Infrastructure & Configuration:**

* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update

**AI Artifacts:**

* [ ] Reviewed contribution with `prompt-builder` agent and addressed all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)

**Other:**

* [x] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):

## Testing

Validated via `npm run lint:all` (exit 0), including `npm run lint:ps` (PSScriptAnalyzer) and `npm run lint:py`. The Pester suite covering these modules lands in the next stacked PR (#1817).

## Checklist

### Required Checks

* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable)

### AI Artifact Contributions

* [ ] Used `/prompt-analyze` to review contribution
* [ ] Addressed all feedback from `prompt-builder` review
* [ ] Verified contribution follows common standards and type-specific requirements

### Required Automated Checks

* [x] Markdown linting: `npm run lint:md`
* [ ] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [ ] Link validation: `npm run lint:md-links`
* [x] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`

## Security Considerations

* [x] This PR does not contain any sensitive or NDA information
* [x] Any new dependencies have been reviewed for security issues
* [x] Security-related scripts follow the principle of least privilege

## Additional Notes

Fourth PR in the #1637 stack. **Base branch: `feat/1637-l2-skill`.** Tests for this orchestration layer are intentionally separated into the following PR to keep each diff reviewable.
Changed files
10 of 10 files listed · truncated