microsoft/hve-core (mirrored from https://github.com/microsoft/hve-core)
.github/instructions/rai-planning/rai-impact-assessment.instructions.md

---
description: 'RAI impact assessment for Phase 5: control surface taxonomy, evidence register, tradeoff documentation, and work item generation'
applyTo: '**/.copilot-tracking/rai-plans/**'
---

# RAI Impact Assessment and Controls

Phase 5 evaluates control surface completeness for each identified threat, documents evidence of existing mitigations, identifies coverage gaps, and analyzes tradeoffs between competing RAI principles. This file defines the taxonomy, templates, and rules that govern those activities.

## Control Surface Taxonomy

The taxonomy maps six Microsoft RAI Standard v2 principles against three control types. Each cell represents a control surface that may contain one or more mitigations.

| Principle              | Prevent                                                              | Detect                                                              | Respond                                                     |
|------------------------|----------------------------------------------------------------------|---------------------------------------------------------------------|-------------------------------------------------------------|
| Fairness               | Bias testing, balanced training data, algorithmic audits             | Demographic parity monitoring, disparate impact alerts              | Retraining pipelines, model rollback, remediation workflows |
| Reliability and Safety | Input validation, adversarial robustness testing, failsafe defaults  | Drift detection, performance degradation alerts, anomaly monitoring | Graceful degradation, fallback models, incident response    |
| Privacy and Security   | Differential privacy, data minimization, access controls             | Data leakage detection, membership inference monitoring             | Breach response, data deletion, re-anonymization            |
| Inclusiveness          | Accessibility testing, diverse user research, multi-language support | Usage gap analysis, accessibility compliance monitoring             | Content adaptation, alternative interaction modes           |
| Transparency           | Model cards, explanation interfaces, decision audit trails           | Explanation quality monitoring, user comprehension testing          | Explanation correction, model documentation updates         |
| Accountability         | Role-based access, approval workflows, audit logging                 | Compliance monitoring, audit trail verification                     | Escalation procedures, corrective action tracking           |

### Prevent Controls

Prevent controls stop harm before it occurs. They apply during design, training, and deployment stages. Evaluate each prevent control against the threat it addresses and confirm that the control operates before the threat materializes.

### Detect Controls

Detect controls identify harm during operation. They apply after deployment through monitoring, alerting, and periodic assessment. Evaluate each detect control for coverage of the associated threat and confirm that detection latency meets acceptable thresholds.

### Respond Controls

Respond controls mitigate harm after detection. They apply through incident response, remediation pipelines, and corrective actions. Evaluate each respond control for time-to-remediation and confirm that response procedures are documented and tested.

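The six-by-three taxonomy yields 18 control surfaces. As a minimal sketch, the Python below lists which surfaces have no cataloged control yet. The `controls` tuple layout and the function name `empty_control_surfaces` are hypothetical choices made for illustration, not part of this file's templates.

```python
from itertools import product

PRINCIPLES = [
    "Fairness",
    "Reliability and Safety",
    "Privacy and Security",
    "Inclusiveness",
    "Transparency",
    "Accountability",
]
CONTROL_TYPES = ["Prevent", "Detect", "Respond"]


def empty_control_surfaces(controls):
    """Return the (principle, control_type) cells that have no cataloged control.

    `controls` is a hypothetical list of (principle, control_type, description) tuples.
    """
    covered = {(principle, control_type) for principle, control_type, _ in controls}
    return [cell for cell in product(PRINCIPLES, CONTROL_TYPES) if cell not in covered]


# Illustrative catalog with three controls drawn from the taxonomy table.
catalog = [
    ("Fairness", "Prevent", "Bias testing"),
    ("Fairness", "Detect", "Demographic parity monitoring"),
    ("Privacy and Security", "Prevent", "Access controls"),
]
gaps = empty_control_surfaces(catalog)
print(len(gaps))  # 15 of the 18 cells remain empty
```

An empty cell is only a prompt for review: a surface with no control may be acceptable when no identified threat targets it.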
## Evidence Register

The evidence register catalogs all mitigations, their coverage status, and supporting documentation. Each entry maps a control to the threat it addresses and the principle it serves.

### Evidence Fields

Each evidence entry requires these fields:

* Evidence ID: format `EV-{PRINCIPLE_ABBR}-{NNN}` where abbreviations are FAIR, REL, PRIV, INCL, TRAN, ACCT
* Threat ID: the `T-RAI-{NNN}` identifier from Phase 4 security model analysis
* Cross-Reference Threat ID: the `T-{BUCKET}-AI-{NNN}` identifier when a Security Planner threat exists
* Principle: one of the six MS RAI Standard v2 principles
* Control Type: Prevent, Detect, or Respond
* Control Description: what the mitigation does and how it operates
* Coverage Status: Full, Partial, or Gap
* Evidence Source: document, test result, audit log, or other artifact that demonstrates the control exists
* Verification Status: Verified, Unverified, Partially Verified, or N/A. Tracks whether the control has been tested and confirmed to work, distinct from Coverage Status which tracks whether the control exists.
* Notes: additional context including dependencies, assumptions, or known limitations

### Evidence Register Rules

* Assign a unique Evidence ID to every control entry.
* Reference exactly one Threat ID per entry. Controls that address multiple threats require separate entries.
* Set Coverage Status to Gap when no evidence source exists for the control.
* Review all Gap entries during work item generation.

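The ID formats and register rules above can be checked mechanically. The following is a minimal sketch, assuming that `{NNN}` means exactly three digits and that an empty `evidence_source` string stands for "no evidence source". The `EvidenceEntry` class and `validate_register` function are hypothetical names, and only a subset of the fields is modeled.

```python
import re
from dataclasses import dataclass

# Assumption: {NNN} is exactly three digits, as in the EV-FAIR-001 examples.
EVIDENCE_ID = re.compile(r"^EV-(FAIR|REL|PRIV|INCL|TRAN|ACCT)-\d{3}$")
THREAT_ID = re.compile(r"^T-RAI-\d{3}$")


@dataclass
class EvidenceEntry:
    evidence_id: str
    threat_id: str
    coverage_status: str  # Full, Partial, or Gap
    evidence_source: str = ""  # empty string means no evidence source


def validate_register(entries):
    """Return a list of rule violations; an empty list means the register is clean."""
    problems = []
    seen = set()
    for entry in entries:
        if not EVIDENCE_ID.match(entry.evidence_id):
            problems.append(f"{entry.evidence_id}: malformed Evidence ID")
        if entry.evidence_id in seen:
            problems.append(f"{entry.evidence_id}: duplicate Evidence ID")
        seen.add(entry.evidence_id)
        # A comma-separated list of threats fails this match, enforcing one threat per entry.
        if not THREAT_ID.match(entry.threat_id):
            problems.append(f"{entry.evidence_id}: expected exactly one T-RAI-NNN Threat ID")
        if not entry.evidence_source and entry.coverage_status != "Gap":
            problems.append(f"{entry.evidence_id}: no evidence source, so Coverage Status must be Gap")
    return problems


entries = [
    EvidenceEntry("EV-FAIR-001", "T-RAI-001", "Full", "bias-report.md"),
    EvidenceEntry("EV-PRIV-001", "T-RAI-003", "Partial"),
    EvidenceEntry("EV-REL-001", "T-RAI-002, T-RAI-004", "Partial", "drift-dashboard"),
]
problems = validate_register(entries)
for line in problems:
    print(line)
```

In this illustrative register, the second entry is flagged because it claims Partial coverage without an evidence source, and the third because it references two threats in one entry.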
### Evidence Summary Table

| Evidence ID | Threat ID | Principle              | Control Type | Coverage Status |
|-------------|-----------|------------------------|--------------|-----------------|
| EV-FAIR-001 | T-RAI-001 | Fairness               | Prevent      | Full            |
| EV-REL-001  | T-RAI-002 | Reliability and Safety | Detect       | Partial         |
| EV-PRIV-001 | T-RAI-003 | Privacy and Security   | Respond      | Gap             |

## Guardrail Verification Checklist

The guardrail verification checklist confirms that cataloged controls function as intended, not only that they exist. Walk through each category with the user and record verification findings in the evidence register using the Verification Status field.

### Input Guardrails

* Prompt injection filters: Are injection attempts detected and blocked before reaching the model? What test cases validate filter coverage?
* Input schema validation: Does the system enforce expected input formats, types, and length constraints? What happens when malformed input bypasses validation?
* Adversarial input detection: Are adversarial perturbations (jailbreaks, encoding tricks, indirect injection via retrieved content) tested against the system's input pipeline?

### Output Guardrails

* Content filters: Are output moderation filters active and tested against known harmful content categories? What is the false-positive rate?
* Grounding checks: Does the system verify that generated outputs are grounded in provided context? How are hallucinated claims detected?
* Output format validation: Are structured outputs validated against expected schemas before delivery to users or downstream systems?
* PII redaction: Does the system detect and redact personally identifiable information in outputs? What PII categories are covered and what detection method is used?

### Verification Recording

For each guardrail evaluated, update the corresponding evidence register entry:

* Set Verification Status to Verified when testing confirms the control works as documented.
* Set Verification Status to Partially Verified when some test cases pass but coverage is incomplete.
* Set Verification Status to Unverified when no testing has been performed.
* Create new evidence entries for guardrails discovered during verification that lack existing catalog entries.

## Appropriate Reliance Assessment

Appropriate reliance ensures users neither over-trust nor under-trust AI-generated outputs. This assessment evaluates whether the system's design calibrates user trust to match the system's actual reliability. Findings produce evidence register entries using the standard `EV-{PRINCIPLE_ABBR}-{NNN}` format under Reliability and Safety or Transparency principles.

### Trust Calibration

* How does the system communicate its confidence level for individual outputs? Are uncertainty indicators (confidence scores, probability ranges, hedging language) visible to users?
* When the system operates outside its training distribution or encounters novel inputs, does the interface signal reduced reliability?
* Do confidence indicators correlate with actual accuracy? Has calibration been measured?

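One common way to answer the last question is expected calibration error (ECE), which compares stated confidence with observed accuracy inside confidence bins. This file does not prescribe a calibration metric, so the sketch below is only one possible measurement. The function name, the equal-width binning, and the default of ten bins are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy across equal-width bins.

    `confidences` are values in [0, 1]; `correct` are booleans of the same length.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue  # empty bins contribute nothing
        mean_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece


# Four outputs shown at 90% confidence, three of which were actually correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(round(ece, 2))  # 0.15: confidence overstates accuracy by 15 points
```

A value near zero suggests the confidence indicators track accuracy; a measured result could be recorded as the Evidence Source for a Transparency or Reliability and Safety entry.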
### Human-in-the-Loop Design

* Which decisions require human review before the system takes action? Document the boundary between automated and human-gated decisions.
* For high-stakes outputs (safety-critical, financially significant, legally binding), what review checkpoints exist before action?
* Can users override or modify AI recommendations before they take effect?

### UX Patterns for AI Transparency

* Does the interface clearly communicate that outputs are AI-generated?
* Are the system's capabilities and limitations described where users encounter AI outputs?
* When the system produces explanations or reasoning, are those explanations faithful to the actual decision process?

### Over-Reliance Prevention

* What mechanisms prevent users from accepting AI outputs without critical evaluation? Consider friction patterns (confirmation steps, mandatory review periods) and cognitive prompts ("Did you verify this output?").
* Does the system present alternative outputs or counterarguments to encourage independent assessment?
* For repetitive tasks, does the interface vary its presentation to prevent automation complacency?

### Under-Reliance Detection

* How does the system detect when users systematically ignore AI recommendations? Are override rates or dismissal patterns monitored?
* When under-reliance is detected, what intervention is available (contextual guidance, accuracy demonstrations, workflow adjustments)?
* Is there a feedback mechanism for users to report why they distrust specific outputs?

## Fairness-Weighted Difficulty Assessment

The Fairness-Weighted Difficulty (FWD) assessment scores each threat-control pairing on two dimensions: difficulty of implementation and fairness impact. The product of these scores determines remediation priority.

### FWD Scoring

| Dimension                 | Score 1                               | Score 2                                     | Score 3                                  |
|---------------------------|---------------------------------------|---------------------------------------------|------------------------------------------|
| Implementation Difficulty | Low: standard tooling, minimal effort | Medium: custom development, moderate effort | High: novel research, significant effort |
| Fairness Impact           | Low: limited demographic effect       | Medium: measurable disparate impact         | High: systemic exclusion or harm         |

The combined FWD score ranges from 1 to 9. Because it is the product of two scores from 1 to 3, only the values 1, 2, 3, 4, 6, and 9 can occur. Higher scores indicate greater urgency.

| FWD Score Range | Priority |
|-----------------|----------|
| 7-9             | Critical |
| 4-6             | High     |
| 2-3             | Medium   |
| 1               | Low      |

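The scoring arithmetic and the priority table above can be sketched in a few lines of Python. The function names `fwd_score` and `fwd_priority` are hypothetical and chosen for illustration only.

```python
def fwd_score(difficulty, fairness_impact):
    """Multiply the two dimension scores, each of which must be 1, 2, or 3."""
    for name, value in (("difficulty", difficulty), ("fairness_impact", fairness_impact)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return difficulty * fairness_impact


def fwd_priority(score):
    """Map a combined FWD score to the priority from the FWD score range table."""
    if score >= 7:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"


# Medium difficulty (2) with high fairness impact (3).
score = fwd_score(2, 3)
print(score, fwd_priority(score))  # 6 High

# Every score the product can actually produce.
reachable = sorted({fwd_score(d, f) for d in (1, 2, 3) for f in (1, 2, 3)})
print(reachable)  # [1, 2, 3, 4, 6, 9]
```

Since 9 is the only reachable score in the 7-9 range, a pairing is Critical only when both dimensions score 3.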
### FWD Assessment Table

| Evidence ID | Threat ID | Implementation Difficulty | Fairness Impact | FWD Score |
|-------------|-----------|---------------------------|-----------------|-----------|
| EV-FAIR-001 | T-RAI-001 | 2                         | 3               | 6         |
| EV-REL-001  | T-RAI-002 | 1                         | 2               | 2         |

## Tradeoff Documentation

Tradeoffs arise when mitigating one RAI principle creates tension with another. Document each tradeoff with its competing principles, the decision rationale, and any compensating controls.

### Tradeoff Entry Template

Each tradeoff entry includes:

* Tradeoff ID: format `TO-{NNN}`
* Competing Principles: the two principles in tension
* Description: what creates the tension and why both principles cannot be fully satisfied simultaneously
* Decision: which principle takes priority and under what conditions
* Compensating Controls: mitigations that reduce the impact on the deprioritized principle
* Residual Risk: remaining exposure after compensating controls are applied

### Common Tradeoffs

#### TO-001: Privacy vs. Accuracy

Differential privacy techniques typically reduce model accuracy because they inject calibrated noise, for example into training data, gradients, or query results. In systems where prediction accuracy affects safety, this tradeoff requires explicit threshold negotiation between privacy guarantees and acceptable accuracy loss.

#### TO-002: Interpretability vs. Performance

Simpler, interpretable models often underperform complex models. When transparency requirements mandate explainable outputs, document the performance delta and confirm that the accuracy reduction falls within acceptable bounds.

#### TO-003: Fairness vs. Complexity

Fairness constraints (demographic parity, equalized odds) increase model complexity and may reduce overall accuracy. Document the specific fairness metric chosen, the accuracy impact, and the stakeholder approval for the selected operating point.

#### TO-004: Safety vs. Utility

Conservative safety thresholds (input filtering, output clamping) reduce system utility by rejecting valid inputs or constraining outputs. Document the threshold values, false-positive rates, and conditions under which thresholds may be adjusted.

#### TO-005: Transparency vs. Security

Detailed model explanations can expose proprietary logic or create adversarial attack vectors. When explanation depth conflicts with security requirements, document the information boundary and the approved level of explanation granularity.

#### TO-006: Monitoring vs. Privacy

Comprehensive monitoring generates usage data that may conflict with data minimization requirements. Document the monitoring scope, data retention policies, and any anonymization applied to monitoring outputs.

## Per-Principle Rubrics

Score each principle from 1 to 5 based on control surface coverage, evidence completeness, and tradeoff documentation. The rubric below defines thresholds for each score level.

| Score | Fairness                                                          | Reliability and Safety                                                          | Privacy and Security                                                      | Inclusiveness                                                                 | Transparency                                                                                   | Accountability                                                                             |
|-------|-------------------------------------------------------------------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| 1     | No bias testing or fairness metrics defined                       | No input validation or failsafe mechanisms                                      | No privacy controls or data minimization                                  | No accessibility testing or diverse user research                             | No model documentation or explanation capability                                               | No audit logging or approval workflows                                                     |
| 2     | Bias testing planned but not yet executed                         | Basic input validation only                                                     | Data minimization policy exists but is not enforced                       | Accessibility guidelines documented but not tested                            | Model card exists but lacks explanation interfaces                                             | Audit logging exists but is not monitored                                                  |
| 3     | Bias testing executed with partial demographic coverage           | Input validation and basic drift detection in place                             | Data minimization enforced with periodic access reviews                   | Accessibility testing on primary interaction modes                            | Model card and basic explanation interface available                                           | Audit logging with periodic review cycles                                                  |
| 4     | Comprehensive bias testing with ongoing monitoring                | Full input validation, drift detection, and anomaly alerts                      | Differential privacy applied with continuous monitoring                   | Multi-modal accessibility tested with diverse user groups                     | Detailed model card, explanation interface, and decision trails                                | Role-based access with automated compliance monitoring                                     |
| 5     | Continuous fairness monitoring with automated retraining triggers | Adversarial robustness testing, failsafe defaults, and tested incident response | Full privacy stack with breach response tested and data deletion verified | Inclusive design validated through ongoing diverse user research and feedback | Complete transparency stack with user comprehension testing and explanation quality monitoring | Full accountability chain with escalation procedures tested and corrective actions tracked |

### Scoring Rules

* Score each principle independently based on the rubric thresholds.
* Use evidence from the evidence register to justify the assigned score.
* A score of 3 represents the minimum acceptable baseline for production deployment.
* Principles scoring below 3 require work items with Critical or High priority.
* Document the rationale for each score in the assessment output.

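The baseline rule can be applied mechanically once each principle has a score. This is a minimal sketch with a hypothetical `principles_below_baseline` helper and made-up scores; it does not assign scores, which still requires judging evidence against the rubric.

```python
BASELINE = 3  # minimum acceptable score for production deployment


def principles_below_baseline(scores):
    """Return the principles whose rubric score is under the baseline, sorted by name."""
    for principle, score in scores.items():
        if score not in range(1, 6):
            raise ValueError(f"{principle}: score must be 1 to 5, got {score!r}")
    return sorted(p for p, s in scores.items() if s < BASELINE)


# Illustrative scores only; real scores come from the rubric and the evidence register.
scores = {
    "Fairness": 4,
    "Reliability and Safety": 2,
    "Privacy and Security": 3,
    "Inclusiveness": 1,
    "Transparency": 3,
    "Accountability": 5,
}
flagged = principles_below_baseline(scores)
print(flagged)  # ['Inclusiveness', 'Reliability and Safety']
```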
## Work Item Generation

Generate work items from the evidence register for entries with Coverage Status of Gap or Partial and for principles scoring below 3 on the rubric.

### Generation Rules

* Create one work item per evidence register entry with Coverage Status of Gap.
* Create one work item per evidence register entry with Coverage Status of Partial when the associated principle scores below 3.
* Include the Evidence ID, Threat ID, Principle, Control Type, and Control Description in the work item body.
* Reference the Tradeoff ID when the work item involves a documented tradeoff.
* Map the FWD score to the priority using the FWD priority table.

### Priority Mapping

| FWD Score Range | Work Item Priority                     |
|-----------------|----------------------------------------|
| 7-9             | Critical: address before deployment    |
| 4-6             | High: address within current iteration |
| 2-3             | Medium: schedule for next iteration    |
| 1               | Low: add to backlog                    |

### Work Item Fields

* Title: `[RAI] {Principle}: {Control Description summary}`
* Priority: mapped from FWD score
* Evidence ID: the associated evidence register entry
* Threat ID: the associated threat from Phase 4
* Principle: the RAI principle
* Control Type: Prevent, Detect, or Respond
* Acceptance Criteria: the condition that moves Coverage Status from Gap or Partial to Full

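The generation rules, priority mapping, and work item fields combine into a simple filter-and-map step. The sketch below assumes each register entry already carries its FWD score; the `Entry` class, the dictionary keys, and the acceptance-criteria wording are hypothetical. The FWD score of 9 for `EV-PRIV-001` is invented for the example, since the FWD assessment table does not list that entry.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    evidence_id: str
    threat_id: str
    principle: str
    control_type: str
    description: str
    coverage: str  # Full, Partial, or Gap
    fwd_score: int  # product from the FWD assessment


def priority(fwd_score):
    """Map an FWD score to the work item priority table."""
    if fwd_score >= 7:
        return "Critical"
    if fwd_score >= 4:
        return "High"
    if fwd_score >= 2:
        return "Medium"
    return "Low"


def generate_work_items(entries, rubric_scores):
    """Create one work item per Gap entry, and per Partial entry whose principle scores below 3."""
    items = []
    for e in entries:
        below_baseline = rubric_scores[e.principle] < 3
        if e.coverage == "Gap" or (e.coverage == "Partial" and below_baseline):
            items.append({
                "title": f"[RAI] {e.principle}: {e.description}",
                "priority": priority(e.fwd_score),
                "evidence_id": e.evidence_id,
                "threat_id": e.threat_id,
                "principle": e.principle,
                "control_type": e.control_type,
                "acceptance_criteria": f"Coverage Status for {e.evidence_id} reaches Full",
            })
    return items


entries = [
    Entry("EV-FAIR-001", "T-RAI-001", "Fairness", "Prevent", "Bias testing", "Full", 6),
    Entry("EV-REL-001", "T-RAI-002", "Reliability and Safety", "Detect", "Drift detection", "Partial", 2),
    Entry("EV-PRIV-001", "T-RAI-003", "Privacy and Security", "Respond", "Breach response", "Gap", 9),
]
rubric = {"Fairness": 4, "Reliability and Safety": 2, "Privacy and Security": 3}
items = generate_work_items(entries, rubric)
for item in items:
    print(item["priority"], item["title"])
```

With these illustrative inputs, the Full entry produces no work item, the Partial entry qualifies because its principle scores 2, and the Gap entry always qualifies. A Tradeoff ID reference is omitted here and would need an extra field on the entry.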
## Artifact Templates

Phase 5 produces three artifacts. Use these templates to structure the output files.

### Control Surface Catalog

The control surface catalog documents all evaluated controls per principle and control type.

```markdown
---
title: Control Surface Catalog
rai-plan: '{plan-id}'
phase: 5
---

# Control Surface Catalog

## {Principle}

### Prevent

* {Control Description} (Coverage: {Full|Partial|Gap})

### Detect

* {Control Description} (Coverage: {Full|Partial|Gap})

### Respond

* {Control Description} (Coverage: {Full|Partial|Gap})

<!-- Repeat for each principle -->
```

### Evidence Register

The evidence register artifact provides the full listing of all evidence entries with supporting details.

```markdown
---
title: Evidence Register
rai-plan: '{plan-id}'
phase: 5
---

# Evidence Register

| Evidence ID | Threat ID   | Cross-Ref ID      | Principle   | Control Type | Control Description | Coverage | Evidence Source | Verification | Notes   |
|-------------|-------------|-------------------|-------------|--------------|---------------------|----------|-----------------|--------------|---------|
| {EV-ID}     | {T-RAI-NNN} | {T-BUCKET-AI-NNN} | {Principle} | {Type}       | {Description}       | {Status} | {Source}        | {Status}     | {Notes} |
```

### RAI Tradeoffs

The tradeoff artifact documents all identified tensions between principles and the decisions made to resolve them.

```markdown
---
title: RAI Tradeoffs
rai-plan: '{plan-id}'
phase: 5
---

# RAI Tradeoffs

## {TO-NNN}: {Principle A} vs. {Principle B}

* Competing Principles: {Principle A}, {Principle B}
* Description: {what creates the tension}
* Decision: {which principle takes priority and conditions}
* Compensating Controls: {mitigations for the deprioritized principle}
* Residual Risk: {remaining exposure}
```