microsoft/hve-core

Mirrored from https://github.com/microsoft/hve-core

.github/instructions/rai-planning/rai-impact-assessment.instructions.md

---
description: 'RAI impact assessment for Phase 5: control surface taxonomy, evidence register, tradeoff documentation, and work item generation'
applyTo: '/.copilot-tracking/rai-plans/'
---

# RAI Impact Assessment and Controls

Phase 5 evaluates control surface completeness for each identified threat, documents evidence of existing mitigations, identifies coverage gaps, and analyzes tradeoffs between competing RAI principles. This file defines the taxonomy, templates, and rules that govern those activities.

## Control Surface Taxonomy

The taxonomy maps six Microsoft RAI Standard v2 principles against three control types. Each cell represents a control surface that may contain one or more mitigations.

| Principle | Prevent | Detect | Respond |
|---|---|---|---|
| Fairness | Bias testing, balanced training data, algorithmic audits | Demographic parity monitoring, disparate impact alerts | Retraining pipelines, model rollback, remediation workflows |
| Reliability and Safety | Input validation, adversarial robustness testing, failsafe defaults | Drift detection, performance degradation alerts, anomaly monitoring | Graceful degradation, fallback models, incident response |
| Privacy and Security | Differential privacy, data minimization, access controls | Data leakage detection, membership inference monitoring | Breach response, data deletion, re-anonymization |
| Inclusiveness | Accessibility testing, diverse user research, multi-language support | Usage gap analysis, accessibility compliance monitoring | Content adaptation, alternative interaction modes |
| Transparency | Model cards, explanation interfaces, decision audit trails | Explanation quality monitoring, user comprehension testing | Explanation correction, model documentation updates |
| Accountability | Role-based access, approval workflows, audit logging | Compliance monitoring, audit trail verification | Escalation procedures, corrective action tracking |

### Prevent Controls

Prevent controls stop harm before it occurs. They apply during design, training, and deployment stages. Evaluate each prevent control against the threat it addresses and confirm that the control operates before the threat materializes.

### Detect Controls

Detect controls identify harm during operation. They apply after deployment through monitoring, alerting, and periodic assessment. Evaluate each detect control for coverage of the associated threat and confirm that detection latency meets acceptable thresholds.

### Respond Controls

Respond controls mitigate harm after detection. They apply through incident response, remediation pipelines, and corrective actions. Evaluate each respond control for time-to-remediation and confirm that response procedures are documented and tested.

## Evidence Register

The evidence register catalogs all mitigations, their coverage status, and supporting documentation. Each entry maps a control to the threat it addresses and the principle it serves.

### Evidence Fields

Each evidence entry requires these fields:

* Evidence ID: format `EV-{PRINCIPLE_ABBR}-{NNN}` where abbreviations are FAIR, REL, PRIV, INCL, TRAN, ACCT
* Threat ID: the `T-RAI-{NNN}` identifier from Phase 4 security model analysis
* Cross-Reference Threat ID: the `T-{BUCKET}-AI-{NNN}` identifier when a Security Planner threat exists
* Principle: one of the six MS RAI Standard v2 principles
* Control Type: Prevent, Detect, or Respond
* Control Description: what the mitigation does and how it operates
* Coverage Status: Full, Partial, or Gap
* Evidence Source: document, test result, audit log, or other artifact that demonstrates the control exists
* Verification Status: Verified, Unverified, Partially Verified, or N/A. Tracks whether the control has been tested and confirmed to work, distinct from Coverage Status which tracks whether the control exists.
* Notes: additional context including dependencies, assumptions, or known limitations
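
The identifier formats above can be checked mechanically. The following is a minimal sketch, assuming `{NNN}` means exactly three decimal digits (as in `EV-FAIR-001`); the function names are hypothetical and not part of this instruction set.

```python
import re

# Abbreviations listed in the Evidence ID field definition.
PRINCIPLE_ABBRS = ("FAIR", "REL", "PRIV", "INCL", "TRAN", "ACCT")

# Assumption: {NNN} is exactly three decimal digits, as in EV-FAIR-001.
EVIDENCE_ID_RE = re.compile(r"EV-(%s)-\d{3}" % "|".join(PRINCIPLE_ABBRS))
THREAT_ID_RE = re.compile(r"T-RAI-\d{3}")


def is_valid_evidence_id(value: str) -> bool:
    """Return True when value matches EV-{PRINCIPLE_ABBR}-{NNN}."""
    return EVIDENCE_ID_RE.fullmatch(value) is not None


def is_valid_threat_id(value: str) -> bool:
    """Return True when value matches T-RAI-{NNN}."""
    return THREAT_ID_RE.fullmatch(value) is not None
```

The cross-reference format `T-{BUCKET}-AI-{NNN}` is left out because the allowed bucket names are not defined in this file.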

### Evidence Register Rules

* Assign a unique Evidence ID to every control entry.
* Reference exactly one Threat ID per entry. Controls that address multiple threats require separate entries.
* Set Coverage Status to Gap when no evidence source exists for the control.
* Review all Gap entries during work item generation.
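
As an illustration of these rules, the sketch below checks a register for duplicate Evidence IDs and for entries that lack an evidence source but are not marked Gap. The `EvidenceEntry` class and `check_register` function are hypothetical names used only for this example; modeling `threat_id` as a single string is one way to reflect the one-threat-per-entry rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvidenceEntry:
    evidence_id: str
    threat_id: str  # a single string: exactly one Threat ID per entry
    coverage_status: str  # "Full", "Partial", or "Gap"
    evidence_source: Optional[str] = None


def check_register(entries: List[EvidenceEntry]) -> List[str]:
    """Return human-readable rule violations; an empty list means none found."""
    problems = []

    # Rule: every control entry has a unique Evidence ID.
    counts = Counter(entry.evidence_id for entry in entries)
    for evidence_id, count in counts.items():
        if count > 1:
            problems.append(f"{evidence_id}: Evidence ID is used {count} times")

    # Rule: Coverage Status is Gap when no evidence source exists.
    for entry in entries:
        if not entry.evidence_source and entry.coverage_status != "Gap":
            problems.append(
                f"{entry.evidence_id}: no evidence source, so Coverage Status should be Gap"
            )
    return problems
```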

### Evidence Summary Table

| Evidence ID | Threat ID | Principle | Control Type | Coverage Status |
|---|---|---|---|---|
| EV-FAIR-001 | T-RAI-001 | Fairness | Prevent | Full |
| EV-REL-001 | T-RAI-002 | Reliability and Safety | Detect | Partial |
| EV-PRIV-001 | T-RAI-003 | Privacy and Security | Respond | Gap |

## Guardrail Verification Checklist

The guardrail verification checklist confirms that cataloged controls function as intended, not only that they exist. Walk through each category with the user and record verification findings in the evidence register using the Verification Status field.

### Input Guardrails

* Prompt injection filters: Are injection attempts detected and blocked before reaching the model? What test cases validate filter coverage?
* Input schema validation: Does the system enforce expected input formats, types, and length constraints? What happens when malformed input bypasses validation?
* Adversarial input detection: Are adversarial perturbations (jailbreaks, encoding tricks, indirect injection via retrieved content) tested against the system's input pipeline?

### Output Guardrails

* Content filters: Are output moderation filters active and tested against known harmful content categories? What is the false-positive rate?
* Grounding checks: Does the system verify that generated outputs are grounded in provided context? How are hallucinated claims detected?
* Output format validation: Are structured outputs validated against expected schemas before delivery to users or downstream systems?
* PII redaction: Does the system detect and redact personally identifiable information in outputs? What PII categories are covered and what detection method is used?

### Verification Recording

For each guardrail evaluated, update the corresponding evidence register entry:

* Set Verification Status to Verified when testing confirms the control works as documented.
* Set Verification Status to Partially Verified when some test cases pass but coverage is incomplete.
* Set Verification Status to Unverified when no testing has been performed.
* Create new evidence entries for guardrails discovered during verification that lack existing catalog entries.

## Appropriate Reliance Assessment

Appropriate reliance ensures users neither over-trust nor under-trust AI-generated outputs. This assessment evaluates whether the system's design calibrates user trust to match the system's actual reliability. Findings produce evidence register entries using the standard `EV-{PRINCIPLE_ABBR}-{NNN}` format under Reliability and Safety or Transparency principles.

### Trust Calibration

* How does the system communicate its confidence level for individual outputs? Are uncertainty indicators (confidence scores, probability ranges, hedging language) visible to users?
* When the system operates outside its training distribution or encounters novel inputs, does the interface signal reduced reliability?
* Do confidence indicators correlate with actual accuracy? Has calibration been measured?
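
One common way to answer the last question is expected calibration error (ECE), which bins predictions by confidence and compares average confidence with observed accuracy in each bin. This file does not prescribe a calibration metric, so the sketch below is an assumption about how a team might measure it; the function name and the default of ten equal-width bins are illustrative choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean of |avg confidence - accuracy| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 suggests that displayed confidence tracks actual accuracy; larger values indicate over- or under-confidence that the interface may be passing on to users.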

### Human-in-the-Loop Design

* Which decisions require human review before the system takes action? Document the boundary between automated and human-gated decisions.
* For high-stakes outputs (safety-critical, financially significant, legally binding), what review checkpoints exist before action?
* Can users override or modify AI recommendations before they take effect?

### UX Patterns for AI Transparency

* Does the interface clearly communicate that outputs are AI-generated?
* Are the system's capabilities and limitations described where users encounter AI outputs?
* When the system produces explanations or reasoning, are those explanations faithful to the actual decision process?

### Over-Reliance Prevention

* What mechanisms prevent users from accepting AI outputs without critical evaluation? Consider friction patterns (confirmation steps, mandatory review periods) and cognitive prompts ("Did you verify this output?").
* Does the system present alternative outputs or counterarguments to encourage independent assessment?
* For repetitive tasks, does the interface vary its presentation to prevent automation complacency?

### Under-Reliance Detection

* How does the system detect when users systematically ignore AI recommendations? Are override rates or dismissal patterns monitored?
* When under-reliance is detected, what intervention is available (contextual guidance, accuracy demonstrations, workflow adjustments)?
* Is there a feedback mechanism for users to report why they distrust specific outputs?
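
Override-rate monitoring can be sketched as follows. The heuristic here (flag under-reliance when users override noticeably more often than the system is wrong) and the `margin` default are assumptions made for illustration, not thresholds defined by this file.

```python
from typing import Sequence


def override_rate(accepted: Sequence[bool]) -> float:
    """Fraction of AI recommendations that the user overrode or dismissed."""
    if not accepted:
        raise ValueError("no recommendations recorded")
    return sum(1 for was_accepted in accepted if not was_accepted) / len(accepted)


def flags_under_reliance(
    accepted: Sequence[bool],
    measured_accuracy: float,
    margin: float = 0.2,  # hypothetical tolerance, tune per system
) -> bool:
    """Flag when the override rate exceeds the measured error rate by more than margin."""
    error_rate = 1.0 - measured_accuracy
    return override_rate(accepted) > error_rate + margin
```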

## Fairness-Weighted Difficulty Assessment

The FWD assessment scores each threat-control pairing on two dimensions: difficulty of implementation and fairness impact. The product of these scores determines remediation priority.

### FWD Scoring

| Dimension | Score 1 | Score 2 | Score 3 |
|---|---|---|---|
| Implementation Difficulty | Low: standard tooling, minimal effort | Medium: custom development, moderate effort | High: novel research, significant effort |
| Fairness Impact | Low: limited demographic effect | Medium: measurable disparate impact | High: systemic exclusion or harm |

The combined FWD score ranges from 1 to 9. Higher scores indicate greater urgency.

| FWD Score Range | Priority |
|---|---|
| 7-9 | Critical |
| 4-6 | High |
| 2-3 | Medium |
| 1 | Low |
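
The scoring and priority lookup can be expressed as a short sketch; the function names are hypothetical. Note that because each dimension is scored 1 to 3, the product can only be 1, 2, 3, 4, 6, or 9, so the Critical range is reached only when both dimensions score 3.

```python
def fwd_score(implementation_difficulty: int, fairness_impact: int) -> int:
    """Multiply the two dimension scores; each must be 1, 2, or 3."""
    for name, value in (
        ("implementation_difficulty", implementation_difficulty),
        ("fairness_impact", fairness_impact),
    ):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return implementation_difficulty * fairness_impact


def fwd_priority(score: int) -> str:
    """Map an FWD score to the priority defined in the FWD score range table."""
    if not 1 <= score <= 9:
        raise ValueError(f"FWD score must be between 1 and 9, got {score!r}")
    if score >= 7:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"
```

For example, `fwd_priority(fwd_score(2, 3))` evaluates to `"High"` and `fwd_priority(fwd_score(1, 2))` to `"Medium"`.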

### FWD Assessment Table

| Evidence ID | Threat ID | Implementation Difficulty | Fairness Impact | FWD Score |
|---|---|---|---|---|
| EV-FAIR-001 | T-RAI-001 | 2 | 3 | 6 |
| EV-REL-001 | T-RAI-002 | 1 | 2 | 2 |

## Tradeoff Documentation

Tradeoffs arise when a mitigation that serves one RAI principle creates tension with another. Document each tradeoff with its competing principles, the decision rationale, and any compensating controls.

### Tradeoff Entry Template

Each tradeoff entry includes:

* Tradeoff ID: format `TO-{NNN}`
* Competing Principles: the two principles in tension
* Description: what creates the tension and why both principles cannot be fully satisfied simultaneously
* Decision: which principle takes priority and under what conditions
* Compensating Controls: mitigations that reduce the impact on the deprioritized principle
* Residual Risk: remaining exposure after compensating controls are applied
171	`### Common Tradeoffs`
172
173	`#### TO-001: Privacy vs. Accuracy`
174
175	`Differential privacy techniques reduce model accuracy by adding noise to training data. In systems where prediction accuracy affects safety, this tradeoff requires explicit threshold negotiation between privacy guarantees and acceptable accuracy loss.`
176
177	`#### TO-002: Interpretability vs. Performance`
178
179	`Simpler, interpretable models often underperform complex models. When transparency requirements mandate explainable outputs, document the performance delta and confirm that the accuracy reduction falls within acceptable bounds.`
180
181	`#### TO-003: Fairness vs. Complexity`
182
183	`Fairness constraints (demographic parity, equalized odds) increase model complexity and may reduce overall accuracy. Document the specific fairness metric chosen, the accuracy impact, and the stakeholder approval for the selected operating point.`
184
185	`#### TO-004: Safety vs. Utility`
186
187	`Conservative safety thresholds (input filtering, output clamping) reduce system utility by rejecting valid inputs or constraining outputs. Document the threshold values, false-positive rates, and conditions under which thresholds may be adjusted.`
188
189	`#### TO-005: Transparency vs. Security`
190
191	`Detailed model explanations can expose proprietary logic or create adversarial attack vectors. When explanation depth conflicts with security requirements, document the information boundary and the approved level of explanation granularity.`
192
193	`#### TO-006: Monitoring vs. Privacy`
194
195	`Comprehensive monitoring generates usage data that may conflict with data minimization requirements. Document the monitoring scope, data retention policies, and any anonymization applied to monitoring outputs.`

## Per-Principle Rubrics

Score each principle from 1 to 5 based on control surface coverage, evidence completeness, and tradeoff documentation. The rubric below defines thresholds for each score level.

| Score | Fairness | Reliability and Safety | Privacy and Security | Inclusiveness | Transparency | Accountability |
|---|---|---|---|---|---|---|
| 1 | No bias testing or fairness metrics defined | No input validation or failsafe mechanisms | No privacy controls or data minimization | No accessibility testing or diverse user research | No model documentation or explanation capability | No audit logging or approval workflows |
| 2 | Bias testing planned but not yet executed | Basic input validation only | Data minimization policy exists but is not enforced | Accessibility guidelines documented but not tested | Model card exists but lacks explanation interfaces | Audit logging exists but is not monitored |
| 3 | Bias testing executed with partial demographic coverage | Input validation and basic drift detection in place | Data minimization enforced with periodic access reviews | Accessibility testing on primary interaction modes | Model card and basic explanation interface available | Audit logging with periodic review cycles |
| 4 | Comprehensive bias testing with ongoing monitoring | Full input validation, drift detection, and anomaly alerts | Differential privacy applied with continuous monitoring | Multi-modal accessibility tested with diverse user groups | Detailed model card, explanation interface, and decision trails | Role-based access with automated compliance monitoring |
| 5 | Continuous fairness monitoring with automated retraining triggers | Adversarial robustness testing, failsafe defaults, and tested incident response | Full privacy stack with breach response tested and data deletion verified | Inclusive design validated through ongoing diverse user research and feedback | Complete transparency stack with user comprehension testing and explanation quality monitoring | Full accountability chain with escalation procedures tested and corrective actions tracked |

### Scoring Rules

* Score each principle independently based on the rubric thresholds.
* Use evidence from the evidence register to justify the assigned score.
* A score of 3 represents the minimum acceptable baseline for production deployment.
* Principles scoring below 3 require work items with Critical or High priority.
* Document the rationale for each score in the assessment output.
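
The baseline rule can be applied to a set of rubric scores as in the sketch below. The function name and the dictionary shape (principle name mapped to its 1 to 5 score) are assumptions made for illustration.

```python
from typing import Dict, List

# A score of 3 is the minimum acceptable baseline for production deployment.
MINIMUM_BASELINE = 3


def principles_below_baseline(scores: Dict[str, int]) -> List[str]:
    """Return the principles whose rubric score falls below the baseline."""
    for principle, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{principle}: score must be 1 to 5, got {score!r}")
    return [principle for principle, score in scores.items() if score < MINIMUM_BASELINE]
```

Each principle returned by this check requires work items with Critical or High priority under the scoring rules.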

## Work Item Generation

Generate work items from the evidence register for entries with Coverage Status of Gap or Partial and for principles scoring below 3 on the rubric.

### Generation Rules

* Create one work item per evidence register entry with Coverage Status of Gap.
* Create one work item per evidence register entry with Coverage Status of Partial when the associated principle scores below 3.
* Include the Evidence ID, Threat ID, Principle, Control Type, and Control Description in the work item body.
* Reference the Tradeoff ID when the work item involves a documented tradeoff.
* Map the FWD score to the priority using the FWD priority table.
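
The first two generation rules amount to a filter over the evidence register. A minimal sketch, assuming a hypothetical `Entry` record that carries only the fields the filter needs and a dictionary of rubric scores keyed by principle name:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    evidence_id: str
    principle: str
    coverage_status: str  # "Full", "Partial", or "Gap"


def entries_needing_work_items(
    entries: List[Entry], rubric_scores: Dict[str, int]
) -> List[Entry]:
    """Select every Gap entry, plus Partial entries whose principle scores below 3."""
    selected = []
    for entry in entries:
        if entry.coverage_status == "Gap":
            selected.append(entry)
        elif entry.coverage_status == "Partial" and rubric_scores[entry.principle] < 3:
            selected.append(entry)
    return selected
```

Each selected entry yields exactly one work item, which keeps the mapping between the register and the backlog one-to-one.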

### Priority Mapping

| FWD Score Range | Work Item Priority |
|---|---|
| 7-9 | Critical: address before deployment |
| 4-6 | High: address within current iteration |
| 2-3 | Medium: schedule for next iteration |
| 1 | Low: add to backlog |

### Work Item Fields

* Title: `[RAI] {Principle}: {Control Description summary}`
* Priority: mapped from FWD score
* Evidence ID: the associated evidence register entry
* Threat ID: the associated threat from Phase 4
* Principle: the RAI principle
* Control Type: Prevent, Detect, or Respond
* Acceptance Criteria: the condition that moves Coverage Status from Gap or Partial to Full

## Artifact Templates

Phase 5 produces three artifacts. Use these templates to structure the output files.

### Control Surface Catalog

The control surface catalog documents all evaluated controls per principle and control type.

```markdown
---
title: Control Surface Catalog
rai-plan: '{plan-id}'
phase: 5
---

# Control Surface Catalog

## {Principle}

### Prevent

* {Control Description} (Coverage: {Full|Partial|Gap})

### Detect

* {Control Description} (Coverage: {Full|Partial|Gap})

### Respond

* {Control Description} (Coverage: {Full|Partial|Gap})

<!-- Repeat for each principle -->
```

### Evidence Register

The evidence register artifact provides the full listing of all evidence entries with supporting details.

```markdown
---
title: Evidence Register
rai-plan: '{plan-id}'
phase: 5
---

# Evidence Register

| Evidence ID | Threat ID | Cross-Ref ID | Principle | Control Type | Control Description | Coverage | Evidence Source | Verification | Notes |
|---|---|---|---|---|---|---|---|---|---|
| {EV-ID} | {T-RAI-NNN} | {T-BUCKET-AI-NNN} | {Principle} | {Type} | {Description} | {Status} | {Source} | {Status} | {Notes} |
```

### RAI Tradeoffs

The tradeoff artifact documents all identified tensions between principles and the decisions made to resolve them.

```markdown
---
title: RAI Tradeoffs
rai-plan: '{plan-id}'
phase: 5
---

# RAI Tradeoffs

## {TO-NNN}: {Principle A} vs. {Principle B}

* Competing Principles: {Principle A}, {Principle B}
* Description: {what creates the tension}
* Decision: {which principle takes priority and conditions}
* Compensating Controls: {mitigations for the deprioritized principle}
* Residual Risk: {remaining exposure}
```