microsoft/hve-core

Mirrored from https://github.com/microsoft/hve-core

feat(scripts): add evals orchestration modules and runners (#1831)

# Pull Request

## Description

Added the orchestration layer that drives Vally evaluations under
*scripts/evals/*:

* Top-level runners: *Invoke-VallyEvals.ps1*, *Invoke-AgentMatrix.ps1*,
  *Invoke-BaselineEquivalence.ps1*, *Invoke-ContentModeration.ps1*, and
  *Invoke-CorpusModeration.ps1*
* Spec and inventory builders: *Build-AgentBehaviorSpec.ps1*,
  *Build-AgentInventory.ps1*, *Get-AgentDependencyMap.ps1*, and
  *Get-ChangedAIArtifact.ps1*
* Validation and dashboard tooling: *Test-EvalSpec.ps1*,
  *Test-StimulusPresence.ps1*, *New-AgentMatrixDashboard.ps1*, and
  *New-EquivalenceDashboard.ps1*
* Shared modules under *Modules/* and *lib/*
* A Python content-moderation runner (*moderation/moderate.py*) with its
  own *pyproject.toml*, *uv.lock*, and tests

## Related Issue(s)

Closes #1816

## Type of Change

Select all that apply:

**Code & Documentation:**

* [ ] Bug fix (non-breaking change fixing an issue)
* [x] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to
change)
* [ ] Documentation update

**Infrastructure & Configuration:**

* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update

**AI Artifacts:**

* [ ] Reviewed contribution with `prompt-builder` agent and addressed
all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)

**Other:**

* [x] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):

## Testing

Validated via `npm run lint:all` (exit 0), including `npm run lint:ps`
(PSScriptAnalyzer) and `npm run lint:py`. The Pester suite covering
these modules lands in the next stacked PR (#1817).

## Checklist

### Required Checks

* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable)

### AI Artifact Contributions

* [ ] Used `/prompt-analyze` to review contribution
* [ ] Addressed all feedback from `prompt-builder` review
* [ ] Verified contribution follows common standards and type-specific
requirements

### Required Automated Checks

* [x] Markdown linting: `npm run lint:md`
* [ ] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [ ] Link validation: `npm run lint:md-links`
* [x] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`

## Security Considerations

* [x] This PR does not contain any sensitive or NDA information
* [x] Any new dependencies have been reviewed for security issues
* [x] Security-related scripts follow the principle of least privilege

## Additional Notes

Fourth PR in the #1637 stack. **Base branch: `feat/1637-l2-skill`.**
Tests for this orchestration layer are intentionally separated into the
following PR to keep each diff reviewable.

Bill Berry · 1 week ago · main · 3175b5c5b2db

Changed files

10 of 10 files listed · truncated