---
description: 'RAI impact assessment for Phase 5: control surface taxonomy, evidence register, tradeoff documentation, and work item generation'
applyTo: '**/.copilot-tracking/rai-plans/**'
---

# RAI Impact Assessment and Controls

Phase 5 evaluates control surface completeness for each identified threat, documents evidence of existing mitigations, identifies coverage gaps, and analyzes tradeoffs between competing RAI principles. This file defines the taxonomy, templates, and rules that govern those activities.

## Control Surface Taxonomy

The taxonomy maps six Microsoft RAI Standard v2 principles against three control types. Each cell represents a control surface that may contain one or more mitigations.

| Principle | Prevent | Detect | Respond |
|------------------------|----------------------------------------------------------------------|---------------------------------------------------------------------|-------------------------------------------------------------|
| Fairness | Bias testing, balanced training data, algorithmic audits | Demographic parity monitoring, disparate impact alerts | Retraining pipelines, model rollback, remediation workflows |
| Reliability and Safety | Input validation, adversarial robustness testing, failsafe defaults | Drift detection, performance degradation alerts, anomaly monitoring | Graceful degradation, fallback models, incident response |
| Privacy and Security | Differential privacy, data minimization, access controls | Data leakage detection, membership inference monitoring | Breach response, data deletion, re-anonymization |
| Inclusiveness | Accessibility testing, diverse user research, multi-language support | Usage gap analysis, accessibility compliance monitoring | Content adaptation, alternative interaction modes |
| Transparency | Model cards, explanation interfaces, decision audit trails | Explanation quality monitoring, user comprehension testing | Explanation correction, model documentation updates |
| Accountability | Role-based access, approval workflows, audit logging | Compliance monitoring, audit trail verification | Escalation procedures, corrective action tracking |

### Prevent Controls

Prevent controls stop harm before it occurs. They apply during design, training, and deployment stages. Evaluate each prevent control against the threat it addresses and confirm that the control operates before the threat materializes.

### Detect Controls

Detect controls identify harm during operation. They apply after deployment through monitoring, alerting, and periodic assessment. Evaluate each detect control for coverage of the associated threat and confirm that detection latency meets acceptable thresholds.

### Respond Controls

Respond controls mitigate harm after detection. They apply through incident response, remediation pipelines, and corrective actions. Evaluate each respond control for time-to-remediation and confirm that response procedures are documented and tested.

## Evidence Register

The evidence register catalogs all mitigations, their coverage status, and supporting documentation. Each entry maps a control to the threat it addresses and the principle it serves.

### Evidence Fields

Each evidence entry requires these fields:

* Evidence ID: format `EV-{PRINCIPLE_ABBR}-{NNN}` where abbreviations are FAIR, REL, PRIV, INCL, TRAN, ACCT
* Threat ID: the `T-RAI-{NNN}` identifier from Phase 4 security model analysis
* Cross-Reference Threat ID: the `T-{BUCKET}-AI-{NNN}` identifier when a corresponding Security Planner threat exists
* Principle: one of the six Microsoft RAI Standard v2 principles
* Control Type: Prevent, Detect, or Respond
* Control Description: what the mitigation does and how it operates
* Coverage Status: Full, Partial, or Gap
* Evidence Source: document, test result, audit log, or other artifact that demonstrates the control exists
* Verification Status: Verified, Unverified, Partially Verified, or N/A. This field tracks whether the control has been tested and confirmed to work. Coverage Status, by contrast, tracks whether the control exists.
* Notes: additional context including dependencies, assumptions, or known limitations

### Evidence Register Rules

* Assign a unique Evidence ID to every control entry.
* Reference exactly one Threat ID per entry. Controls that address multiple threats require separate entries.
* Set Coverage Status to Gap when no evidence source exists for the control.
* Review all Gap entries during work item generation.
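
The fields and rules above can be checked mechanically. The following is a minimal sketch, not part of the instructions themselves: the `EvidenceEntry` class and `validate_register` function are hypothetical names, `{NNN}` is assumed to mean exactly three digits, and the check that the ID abbreviation matches the entry's principle is an inferred convention rather than a stated rule.

```python
import re
from dataclasses import dataclass

# Principle names and abbreviations as listed under Evidence Fields.
PRINCIPLE_ABBR = {
    "Fairness": "FAIR",
    "Reliability and Safety": "REL",
    "Privacy and Security": "PRIV",
    "Inclusiveness": "INCL",
    "Transparency": "TRAN",
    "Accountability": "ACCT",
}
CONTROL_TYPES = {"Prevent", "Detect", "Respond"}
COVERAGE_STATUSES = {"Full", "Partial", "Gap"}
VERIFICATION_STATUSES = {"Verified", "Unverified", "Partially Verified", "N/A"}

# Assumption: {NNN} is a zero-padded three-digit number.
EVIDENCE_ID = re.compile(r"^EV-(FAIR|REL|PRIV|INCL|TRAN|ACCT)-\d{3}$")
THREAT_ID = re.compile(r"^T-RAI-\d{3}$")


@dataclass
class EvidenceEntry:
    evidence_id: str
    threat_id: str
    principle: str
    control_type: str
    control_description: str
    coverage_status: str
    evidence_source: str = ""
    verification_status: str = "Unverified"
    cross_ref_threat_id: str = ""
    notes: str = ""


def validate_register(entries):
    """Return a list of rule violations; an empty list means the register passes."""
    problems = []
    seen_ids = set()
    for e in entries:
        match = EVIDENCE_ID.match(e.evidence_id)
        if not match:
            problems.append(f"{e.evidence_id}: malformed Evidence ID")
        elif PRINCIPLE_ABBR.get(e.principle) != match.group(1):
            # Inferred convention: the abbreviation names the entry's principle.
            problems.append(f"{e.evidence_id}: abbreviation does not match principle")
        if e.evidence_id in seen_ids:
            problems.append(f"{e.evidence_id}: duplicate Evidence ID")
        seen_ids.add(e.evidence_id)
        if not THREAT_ID.match(e.threat_id):
            problems.append(f"{e.evidence_id}: expected exactly one T-RAI-NNN Threat ID")
        if e.principle not in PRINCIPLE_ABBR:
            problems.append(f"{e.evidence_id}: unknown principle")
        if e.control_type not in CONTROL_TYPES:
            problems.append(f"{e.evidence_id}: unknown control type")
        if e.coverage_status not in COVERAGE_STATUSES:
            problems.append(f"{e.evidence_id}: unknown coverage status")
        if e.verification_status not in VERIFICATION_STATUSES:
            problems.append(f"{e.evidence_id}: unknown verification status")
        if not e.evidence_source.strip() and e.coverage_status != "Gap":
            problems.append(
                f"{e.evidence_id}: Coverage Status must be Gap when no evidence source exists"
            )
    return problems
```

A control that addresses two threats would appear as two `EvidenceEntry` objects with distinct Evidence IDs, which keeps the one-threat-per-entry rule intact.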

### Evidence Summary Table

| Evidence ID | Threat ID | Principle | Control Type | Coverage Status |
|-------------|-----------|------------------------|--------------|-----------------|
| EV-FAIR-001 | T-RAI-001 | Fairness | Prevent | Full |
| EV-REL-001 | T-RAI-002 | Reliability and Safety | Detect | Partial |
| EV-PRIV-001 | T-RAI-003 | Privacy and Security | Respond | Gap |

## Guardrail Verification Checklist

The guardrail verification checklist confirms that cataloged controls function as intended, not only that they exist. Walk through each category with the user and record verification findings in the evidence register using the Verification Status field.

### Input Guardrails

* Prompt injection filters: Are injection attempts detected and blocked before reaching the model? What test cases validate filter coverage?
* Input schema validation: Does the system enforce expected input formats, types, and length constraints? What happens when malformed input bypasses validation?
* Adversarial input detection: Are adversarial perturbations (jailbreaks, encoding tricks, indirect injection via retrieved content) tested against the system's input pipeline?

### Output Guardrails

* Content filters: Are output moderation filters active and tested against known harmful content categories? What is the false-positive rate?
* Grounding checks: Does the system verify that generated outputs are grounded in provided context? How are hallucinated claims detected?
* Output format validation: Are structured outputs validated against expected schemas before delivery to users or downstream systems?
* PII redaction: Does the system detect and redact personally identifiable information in outputs? What PII categories are covered and what detection method is used?

### Verification Recording

For each guardrail evaluated, update the corresponding evidence register entry:

* Set Verification Status to Verified when testing confirms the control works as documented.
* Set Verification Status to Partially Verified when some test cases pass but coverage is incomplete.
* Set Verification Status to Unverified when no testing has been performed.
* Create new evidence entries for guardrails discovered during verification that lack existing catalog entries.

## Appropriate Reliance Assessment

Appropriate reliance ensures users neither over-trust nor under-trust AI-generated outputs. This assessment evaluates whether the system's design calibrates user trust to match the system's actual reliability. Findings produce evidence register entries in the standard `EV-{PRINCIPLE_ABBR}-{NNN}` format, filed under either the Reliability and Safety principle or the Transparency principle.

### Trust Calibration

* How does the system communicate its confidence level for individual outputs? Are uncertainty indicators (confidence scores, probability ranges, hedging language) visible to users?
* When the system operates outside its training distribution or encounters novel inputs, does the interface signal reduced reliability?
* Do confidence indicators correlate with actual accuracy? Has calibration been measured?
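
One common way to answer the calibration question is expected calibration error (ECE), which compares stated confidence with observed accuracy inside equal-width confidence bins. The sketch below is illustrative only: the instructions do not prescribe a metric, and the function name, the ten-bin default, and the input shape (one confidence in the range 0 to 1 and one correctness flag per output) are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width bins; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, bool(ok)))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_confidence = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(accuracy - mean_confidence)
    return ece
```

A value near zero suggests that displayed confidence tracks accuracy. A large value, such as 0.4 for outputs shown at 90% confidence that are right half the time, is the kind of finding that could be recorded as a Partial or Gap entry in the evidence register.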

### Human-in-the-Loop Design

* Which decisions require human review before the system takes action? Document the boundary between automated and human-gated decisions.
* For high-stakes outputs (safety-critical, financially significant, legally binding), what review checkpoints exist before action?
* Can users override or modify AI recommendations before they take effect?

### UX Patterns for AI Transparency

* Does the interface clearly communicate that outputs are AI-generated?
* Are the system's capabilities and limitations described where users encounter AI outputs?
* When the system produces explanations or reasoning, are those explanations faithful to the actual decision process?

### Over-Reliance Prevention

* What mechanisms prevent users from accepting AI outputs without critical evaluation? Consider friction patterns (confirmation steps, mandatory review periods) and cognitive prompts ("Did you verify this output?").
* Does the system present alternative outputs or counterarguments to encourage independent assessment?
* For repetitive tasks, does the interface vary its presentation to prevent automation complacency?

### Under-Reliance Detection

* How does the system detect when users systematically ignore AI recommendations? Are override rates or dismissal patterns monitored?
* When under-reliance is detected, what intervention is available (contextual guidance, accuracy demonstrations, workflow adjustments)?
* Is there a feedback mechanism for users to report why they distrust specific outputs?
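
Override-rate monitoring of the kind asked about above might look like the following sketch. Everything in it is hypothetical: the event shape (a user ID plus a flag for whether the recommendation was accepted), the function names, and the default threshold and minimum sample size, which a real system would need to set from its own baseline data.

```python
from collections import Counter


def override_rates(events):
    """Map each user to (override rate, number of recommendations shown).

    events: iterable of (user_id, accepted) pairs, one per recommendation.
    """
    shown = Counter()
    overridden = Counter()
    for user_id, accepted in events:
        shown[user_id] += 1
        if not accepted:
            overridden[user_id] += 1
    return {user: (overridden[user] / shown[user], shown[user]) for user in shown}


def flag_under_reliance(events, threshold=0.8, min_events=20):
    """Return users who override at least `threshold` of recommendations.

    Users with fewer than `min_events` recommendations are skipped so that a
    handful of dismissals does not trigger an intervention.
    """
    return sorted(
        user
        for user, (rate, count) in override_rates(events).items()
        if count >= min_events and rate >= threshold
    )
```

Flagged users are candidates for the interventions listed above. The sketch does not explain why a user distrusts the output, which is what the feedback mechanism is for.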

## Fairness-Weighted Difficulty Assessment

The Fairness-Weighted Difficulty (FWD) assessment scores each threat-control pairing on two dimensions: difficulty of implementation and fairness impact. The product of these scores determines remediation priority.

### FWD Scoring

| Dimension | Score 1 | Score 2 | Score 3 |
|---------------------------|---------------------------------------|---------------------------------------------|------------------------------------------|
| Implementation Difficulty | Low: standard tooling, minimal effort | Medium: custom development, moderate effort | High: novel research, significant effort |
| Fairness Impact | Low: limited demographic effect | Medium: measurable disparate impact | High: systemic exclusion or harm |

The combined FWD score is the product of the two dimension scores, so it falls between 1 and 9 and takes one of the values 1, 2, 3, 4, 6, or 9. Higher scores indicate greater urgency.

| FWD Score Range | Priority |
|-----------------|----------|
| 7-9 | Critical |
| 4-6 | High |
| 2-3 | Medium |
| 1 | Low |

### FWD Assessment Table

| Evidence ID | Threat ID | Implementation Difficulty | Fairness Impact | FWD Score |
|-------------|-----------|---------------------------|-----------------|-----------|
| EV-FAIR-001 | T-RAI-001 | 2 | 3 | 6 |
| EV-REL-001 | T-RAI-002 | 1 | 2 | 2 |
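
The scoring and priority tables above reduce to a small calculation. The sketch below uses hypothetical function names and shows how the two example rows arrive at their scores and priorities.

```python
def fwd_score(implementation_difficulty, fairness_impact):
    """Multiply the two dimension scores, each of which must be 1, 2, or 3."""
    for name, value in (
        ("implementation_difficulty", implementation_difficulty),
        ("fairness_impact", fairness_impact),
    ):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return implementation_difficulty * fairness_impact


def fwd_priority(score):
    """Map an FWD score to a priority using the score range table."""
    if not 1 <= score <= 9:
        raise ValueError(f"FWD score must be between 1 and 9, got {score!r}")
    if score >= 7:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"


# The two rows of the FWD Assessment Table.
for evidence_id, difficulty, impact in [("EV-FAIR-001", 2, 3), ("EV-REL-001", 1, 2)]:
    score = fwd_score(difficulty, impact)
    print(evidence_id, score, fwd_priority(score))
```

With these inputs, EV-FAIR-001 scores 6 (High) and EV-REL-001 scores 2 (Medium).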

## Tradeoff Documentation

Tradeoffs arise when mitigating one RAI principle creates tension with another. Document each tradeoff with its competing principles, the decision rationale, and any compensating controls.

### Tradeoff Entry Template

Each tradeoff entry includes:

* Tradeoff ID: format `TO-{NNN}`
* Competing Principles: the two principles in tension
* Description: what creates the tension and why both principles cannot be fully satisfied simultaneously
* Decision: which principle takes priority and under what conditions
* Compensating Controls: mitigations that reduce the impact on the deprioritized principle
* Residual Risk: remaining exposure after compensating controls are applied

### Common Tradeoffs

#### TO-001: Privacy vs. Accuracy

Differential privacy techniques add calibrated noise during training or to released outputs, which typically reduces model accuracy. In systems where prediction accuracy affects safety, this tradeoff requires explicit threshold negotiation between privacy guarantees and acceptable accuracy loss.

#### TO-002: Interpretability vs. Performance

Simpler, interpretable models often underperform complex models. When transparency requirements mandate explainable outputs, document the performance delta and confirm that the accuracy reduction falls within acceptable bounds.

#### TO-003: Fairness vs. Complexity

Fairness constraints (demographic parity, equalized odds) increase model complexity and may reduce overall accuracy. Document the specific fairness metric chosen, the accuracy impact, and the stakeholder approval for the selected operating point.
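
As an illustration of what documenting a fairness metric can involve, the sketch below computes the demographic parity difference, the gap between the highest and lowest positive-prediction rates across groups. The function name and the input shape (binary predictions with one group label each) are assumptions, and the source does not mandate this metric over equalized odds or others.

```python
def demographic_parity_difference(predictions, groups):
    """Return (largest gap in positive-prediction rate, rate per group).

    predictions: iterable of 0/1 model decisions.
    groups: iterable of group labels, aligned with predictions.
    """
    by_group = {}
    for prediction, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(prediction)
    if not by_group:
        raise ValueError("at least one prediction is required")

    rates = {group: sum(values) / len(values) for group, values in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates
```

Recording this value before and after a fairness constraint is applied, alongside overall accuracy, gives stakeholders concrete numbers for the operating point they are asked to approve.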

#### TO-004: Safety vs. Utility

Conservative safety thresholds (input filtering, output clamping) reduce system utility by rejecting valid inputs or constraining outputs. Document the threshold values, false-positive rates, and conditions under which thresholds may be adjusted.

#### TO-005: Transparency vs. Security

Detailed model explanations can expose proprietary logic or create adversarial attack vectors. When explanation depth conflicts with security requirements, document the information boundary and the approved level of explanation granularity.

#### TO-006: Monitoring vs. Privacy

Comprehensive monitoring generates usage data that may conflict with data minimization requirements. Document the monitoring scope, data retention policies, and any anonymization applied to monitoring outputs.

## Per-Principle Rubrics

Score each principle from 1 to 5 based on control surface coverage, evidence completeness, and tradeoff documentation. The rubric below defines thresholds for each score level.

| Score | Fairness | Reliability and Safety | Privacy and Security | Inclusiveness | Transparency | Accountability |
|-------|-------------------------------------------------------------------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| 1 | No bias testing or fairness metrics defined | No input validation or failsafe mechanisms | No privacy controls or data minimization | No accessibility testing or diverse user research | No model documentation or explanation capability | No audit logging or approval workflows |
| 2 | Bias testing planned but not yet executed | Basic input validation only | Data minimization policy exists but is not enforced | Accessibility guidelines documented but not tested | Model card exists but lacks explanation interfaces | Audit logging exists but is not monitored |
| 3 | Bias testing executed with partial demographic coverage | Input validation and basic drift detection in place | Data minimization enforced with periodic access reviews | Accessibility testing on primary interaction modes | Model card and basic explanation interface available | Audit logging with periodic review cycles |
| 4 | Comprehensive bias testing with ongoing monitoring | Full input validation, drift detection, and anomaly alerts | Differential privacy applied with continuous monitoring | Multi-modal accessibility tested with diverse user groups | Detailed model card, explanation interface, and decision trails | Role-based access with automated compliance monitoring |
| 5 | Continuous fairness monitoring with automated retraining triggers | Adversarial robustness testing, failsafe defaults, and tested incident response | Full privacy stack with breach response tested and data deletion verified | Inclusive design validated through ongoing diverse user research and feedback | Complete transparency stack with user comprehension testing and explanation quality monitoring | Full accountability chain with escalation procedures tested and corrective actions tracked |

### Scoring Rules

* Score each principle independently based on the rubric thresholds.
* Use evidence from the evidence register to justify the assigned score.
* A score of 3 represents the minimum acceptable baseline for production deployment.
* Principles scoring below 3 require work items with Critical or High priority.
* Document the rationale for each score in the assessment output.
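
The baseline rule can be applied with a short check over the six scores. This is a sketch under assumptions: the function name and the dictionary input are hypothetical, and the rubric itself remains a human judgment that this code only post-processes.

```python
PRINCIPLES = (
    "Fairness",
    "Reliability and Safety",
    "Privacy and Security",
    "Inclusiveness",
    "Transparency",
    "Accountability",
)

PRODUCTION_BASELINE = 3  # minimum acceptable score per the scoring rules


def principles_below_baseline(scores, baseline=PRODUCTION_BASELINE):
    """Return the principles whose rubric score falls below the baseline.

    scores: dict mapping each of the six principle names to an integer 1-5.
    """
    missing = [p for p in PRINCIPLES if p not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for principle in PRINCIPLES:
        value = scores[principle]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{principle}: score must be an integer from 1 to 5")
    return [p for p in PRINCIPLES if scores[p] < baseline]
```

Each principle returned here needs work items with Critical or High priority, and the rationale for its score still has to be written into the assessment output by the assessor.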

## Work Item Generation

Generate work items from evidence register entries with a Coverage Status of Gap, and from entries with a Coverage Status of Partial when the associated principle scores below 3 on the rubric.

### Generation Rules

* Create one work item per evidence register entry with Coverage Status of Gap.
* Create one work item per evidence register entry with Coverage Status of Partial when the associated principle scores below 3.
* Include the Evidence ID, Threat ID, Principle, Control Type, and Control Description in the work item body.
* Reference the Tradeoff ID when the work item involves a documented tradeoff.
* Map the FWD score to the work item priority using the Priority Mapping table.

### Priority Mapping

| FWD Score Range | Work Item Priority |
|-----------------|----------------------------------------|
| 7-9 | Critical: address before deployment |
| 4-6 | High: address within current iteration |
| 2-3 | Medium: schedule for next iteration |
| 1 | Low: add to backlog |

### Work Item Fields

* Title: `[RAI] {Principle}: {Control Description summary}`
* Priority: mapped from FWD score
* Evidence ID: the associated evidence register entry
* Threat ID: the associated threat from Phase 4
* Principle: the RAI principle
* Control Type: Prevent, Detect, or Respond
* Acceptance Criteria: the condition that moves Coverage Status from Gap or Partial to Full
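
Taken together, the generation rules, priority mapping, and field list can be sketched as one function. The dictionary keys, function names, and the wording of the generated acceptance criteria are assumptions made for illustration. The sketch also uses the full Control Description as the title summary, and it does not enforce the separate scoring rule that principles below 3 require Critical or High priority.

```python
def priority_from_fwd(score):
    """Map an FWD score (1-9) to a work item priority per the Priority Mapping table."""
    if not 1 <= score <= 9:
        raise ValueError(f"FWD score must be between 1 and 9, got {score!r}")
    if score >= 7:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"


def generate_work_items(entries, principle_scores, fwd_scores, baseline=3):
    """Build work items for Gap entries and for Partial entries below baseline.

    entries: list of dicts with evidence_id, threat_id, principle,
        control_type, control_description, and coverage_status keys.
    principle_scores: dict of principle name -> rubric score (1-5).
    fwd_scores: dict of evidence_id -> FWD score (1-9).
    """
    items = []
    for entry in entries:
        status = entry["coverage_status"]
        below_baseline = principle_scores[entry["principle"]] < baseline
        if status != "Gap" and not (status == "Partial" and below_baseline):
            continue
        items.append(
            {
                "title": f"[RAI] {entry['principle']}: {entry['control_description']}",
                "priority": priority_from_fwd(fwd_scores[entry["evidence_id"]]),
                "evidence_id": entry["evidence_id"],
                "threat_id": entry["threat_id"],
                "principle": entry["principle"],
                "control_type": entry["control_type"],
                # Placeholder wording; real criteria should name the evidence required.
                "acceptance_criteria": (
                    f"Coverage Status for {entry['evidence_id']} moves from {status} to Full"
                ),
            }
        )
    return items
```

A work item that touches a documented tradeoff would also carry the relevant Tradeoff ID, which this sketch leaves for the caller to add.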

## Artifact Templates

Phase 5 produces three artifacts. Use these templates to structure the output files.

### Control Surface Catalog

The control surface catalog documents all evaluated controls per principle and control type.

```markdown
---
title: Control Surface Catalog
rai-plan: '{plan-id}'
phase: 5
---

# Control Surface Catalog

## {Principle}

### Prevent

* {Control Description} (Coverage: {Full|Partial|Gap})

### Detect

* {Control Description} (Coverage: {Full|Partial|Gap})

### Respond

* {Control Description} (Coverage: {Full|Partial|Gap})

<!-- Repeat for each principle -->
```

### Evidence Register

The evidence register artifact provides the full listing of all evidence entries with supporting details.

```markdown
---
title: Evidence Register
rai-plan: '{plan-id}'
phase: 5
---

# Evidence Register

| Evidence ID | Threat ID | Cross-Ref ID | Principle | Control Type | Control Description | Coverage | Evidence Source | Verification | Notes |
|-------------|-------------|-------------------|-------------|--------------|---------------------|----------|-----------------|--------------|---------|
| {EV-ID} | {T-RAI-NNN} | {T-BUCKET-AI-NNN} | {Principle} | {Type} | {Description} | {Coverage Status} | {Source} | {Verification Status} | {Notes} |
```

### RAI Tradeoffs

The tradeoff artifact documents all identified tensions between principles and the decisions made to resolve them.

```markdown
---
title: RAI Tradeoffs
rai-plan: '{plan-id}'
phase: 5
---

# RAI Tradeoffs

## {TO-NNN}: {Principle A} vs. {Principle B}

* Competing Principles: {Principle A}, {Principle B}
* Description: {what creates the tension}
* Decision: {which principle takes priority and conditions}
* Compensating Controls: {mitigations for the deprioritized principle}
* Residual Risk: {remaining exposure}
```
