microsoft/hve-core

feat(scripts): add evals orchestration modules and runners (#1831)

# Pull Request

## Description

Added the orchestration layer that drives Vally evaluations under
*scripts/evals/*. This includes the top-level runners
(*Invoke-VallyEvals.ps1*, *Invoke-AgentMatrix.ps1*,
*Invoke-BaselineEquivalence.ps1*, *Invoke-ContentModeration.ps1*,
*Invoke-CorpusModeration.ps1*), the spec and inventory builders
(*Build-AgentBehaviorSpec.ps1*, *Build-AgentInventory.ps1*,
*Get-AgentDependencyMap.ps1*, *Get-ChangedAIArtifact.ps1*), validation
and dashboard tooling (*Test-EvalSpec.ps1*, *Test-StimulusPresence.ps1*,
*New-AgentMatrixDashboard.ps1*, *New-EquivalenceDashboard.ps1*), and
shared modules under *Modules/* and *lib/*. A Python content-moderation
runner (*moderation/moderate.py*) with its own *pyproject.toml*,
*uv.lock*, and tests rounds out the layer.

## Related Issue(s)

Closes #1816

## Type of Change

Select all that apply:

**Code & Documentation:**

* [ ] Bug fix (non-breaking change fixing an issue)
* [x] New feature (non-breaking change adding functionality)
* [ ] Breaking change (fix or feature causing existing functionality to
change)
* [ ] Documentation update

**Infrastructure & Configuration:**

* [ ] GitHub Actions workflow
* [ ] Linting configuration (markdown, PowerShell, etc.)
* [ ] Security configuration
* [ ] DevContainer configuration
* [ ] Dependency update

**AI Artifacts:**

* [ ] Reviewed contribution with `prompt-builder` agent and addressed
all feedback
* [ ] Copilot instructions (`.github/instructions/*.instructions.md`)
* [ ] Copilot prompt (`.github/prompts/*.prompt.md`)
* [ ] Copilot agent (`.github/agents/*.agent.md`)
* [ ] Copilot skill (`.github/skills/*/SKILL.md`)

**Other:**

* [x] Script/automation (`.ps1`, `.sh`, `.py`)
* [ ] Other (please describe):

## Testing

Validated via `npm run lint:all` (exit 0), including `npm run lint:ps`
(PSScriptAnalyzer) and `npm run lint:py`. The Pester suite covering
these modules lands in the next stacked PR (#1817).

## Checklist

### Required Checks

* [x] Documentation is updated (if applicable)
* [x] Files follow existing naming conventions
* [x] Changes are backwards compatible (if applicable)
* [ ] Tests added for new functionality (if applicable)

### AI Artifact Contributions

* [ ] Used `/prompt-analyze` to review contribution
* [ ] Addressed all feedback from `prompt-builder` review
* [ ] Verified contribution follows common standards and type-specific
requirements

### Required Automated Checks

* [x] Markdown linting: `npm run lint:md`
* [ ] Spell checking: `npm run spell-check`
* [x] Frontmatter validation: `npm run lint:frontmatter`
* [ ] Skill structure validation: `npm run validate:skills`
* [ ] Link validation: `npm run lint:md-links`
* [x] PowerShell analysis: `npm run lint:ps`
* [ ] Plugin freshness: `npm run plugin:generate`
* [ ] Docusaurus tests: `npm run docs:test`

## Security Considerations

* [x] This PR does not contain any sensitive or NDA information
* [x] Any new dependencies have been reviewed for security issues
* [x] Security-related scripts follow the principle of least privilege

## Additional Notes

Fourth PR in the #1637 stack. **Base branch: `feat/1637-l2-skill`.**
Tests for this orchestration layer are intentionally deferred to the next
PR in the stack (#1817) to keep each diff reviewable.