---
description: 'RAI security model analysis for Phase 4: AI STRIDE extensions, dual threat IDs, ML STRIDE matrix, and security model merge protocol'
applyTo: '**/.copilot-tracking/rai-plans/**'
---

# RAI Security Model Analysis

AI-specific security model analysis extensions for Phase 4 of the RAI Planner. This guidance extends the STRIDE methodology with NIST trustworthiness characteristic overlaps, AI element types, trust boundaries, data flow patterns, and a dual threat ID convention. A merge protocol enables interoperation with Security Planner security models when operating in `from-security-plan` mode.

## AI STRIDE Extensions

Standard STRIDE categories gain AI-specific dimensions when applied to AI systems. Each category maps to one or more NIST trustworthiness characteristics that amplify the threat surface beyond traditional software concerns.

| STRIDE Category | NIST Characteristic Overlay | AI-Specific Threat Examples |
|------------------------|------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| Spoofing | Valid and Reliable, Explainable and Interpretable | Adversarial inputs mimicking legitimate data, model impersonation, synthetic identity injection |
| Tampering | Fair with Harmful Bias Managed, Valid and Reliable | Training data poisoning introducing bias, model weight manipulation, feedback loop corruption |
| Repudiation | Accountable and Transparent, Explainable and Interpretable | Unattributable automated decisions, audit log gaps for model outputs, governance bypass |
| Information Disclosure | Privacy-Enhanced | Training data extraction, model inversion attacks, membership inference, embedding leakage |
| Denial of Service | Valid and Reliable | Model resource exhaustion, inference throttling attacks, adversarial input causing degradation |
| Elevation of Privilege | Privacy-Enhanced, Valid and Reliable | Prompt injection bypassing safety filters, jailbreaking, unauthorized model capability access |

## AI Element Types

Eight AI-specific element types define the components subject to RAI threat analysis. Each element type carries primary NIST concerns that guide threat identification.

| Element Type | Description | Primary NIST Concerns |
|----------------------|-----------------------------------------|---------------------------------------------------------------------------------------------------------|
| Training Data Store | Datasets used for model training | Fair with Harmful Bias Managed (bias), Privacy-Enhanced (PII), Accountable and Transparent (provenance) |
| Model Artifact | Trained model files and weights | Valid and Reliable (integrity), Explainable and Interpretable (explainability) |
| Inference Endpoint | API or service serving predictions | Valid and Reliable (availability), Privacy-Enhanced (query privacy) |
| Feature Pipeline | Data transformation for model input | Fair with Harmful Bias Managed (feature bias), Privacy-Enhanced (data flow) |
| Feedback Loop | User feedback incorporated into model | Fair with Harmful Bias Managed (feedback bias), Valid and Reliable (drift) |
| Human Review Queue | Human oversight checkpoints | Accountable and Transparent (review coverage), Explainable and Interpretable (decision documentation) |
| Monitoring Dashboard | Model performance and behavior tracking | Explainable and Interpretable (observability), Valid and Reliable (alerting) |
| Orchestration Layer | Agent or pipeline orchestration | Accountable and Transparent (decision routing), Valid and Reliable (failure handling) |

## AI Trust Boundaries

Five trust boundaries plus one accountability-specific boundary define separation points within AI systems. Threats concentrate at these boundaries where control transfers between domains.

| Trust Boundary | Description | Key RAI Threats |
|---------------------------------------------|------------------------------------------------------------------------|--------------------------------------------------------------------|
| Training Data Boundary | Separation between raw data sources and training pipeline | Data poisoning, bias injection, privacy violations |
| Model Boundary | Separation between model internals and serving infrastructure | Model extraction, weight tampering, IP leakage |
| Inference Boundary | Separation between client requests and model processing | Adversarial inputs, prompt injection, resource exhaustion |
| Feedback Boundary | Separation between user feedback and model updates | Feedback manipulation, drift injection, bias amplification |
| Human Oversight Boundary | Separation between automated decisions and human review | Accountability gaps, automation bias, review bypass |
| Human Review to Automated Decision Boundary | Accountability boundary between human judgment and automated execution | Accountability transfer, decision attribution, override governance |

> [!NOTE]
> The Human Review to Automated Decision Boundary is specifically an accountability boundary. It captures the transfer of responsibility when automated systems act on human review decisions, creating a distinct threat surface for decision attribution and override governance.

## AI Data Flow Patterns

Three data flow patterns characterize how data moves through AI systems. Each pattern identifies RAI-relevant stages and threat concentration points where targeted analysis yields the highest return.

### Training Pipeline Flow

Data source -> Feature extraction -> Training -> Model store -> Validation

RAI-relevant stages:

* Data source ingestion: bias in source data, PII exposure, provenance gaps
* Feature extraction: feature selection bias, proxy variable introduction
* Training: overfitting to biased patterns, memorization of sensitive data
* Model store: weight integrity, access control, version lineage
* Validation: evaluation fairness across demographic groups, holdout contamination

Threat concentration points: data source ingestion (poisoning, bias), training (memorization, bias amplification), validation (evaluation fairness gaps).

### Inference Pipeline Flow

Client request -> Pre-processing -> Model inference -> Post-processing -> Response

RAI-relevant stages:

* Client request: adversarial input detection, prompt injection screening
* Pre-processing: input sanitization, feature normalization integrity
* Model inference: output correctness, confidence calibration, latency stability
* Post-processing: content filtering, output explanation generation
* Response: attribution metadata, audit logging, response integrity

Threat concentration points: client request (adversarial inputs, prompt injection), model inference (output manipulation), post-processing (filter bypass).

### Feedback Loop Flow

User interaction -> Feedback collection -> Aggregation -> Model update trigger -> Retraining

RAI-relevant stages:

* User interaction: feedback authenticity, sampling bias in respondents
* Feedback collection: consent and privacy compliance, feedback representation
* Aggregation: statistical bias in aggregation methods, outlier handling
* Model update trigger: drift detection, update authorization
* Retraining: bias amplification across cycles, catastrophic forgetting

Threat concentration points: feedback collection (manipulation, bias), aggregation (statistical bias), retraining (bias amplification, drift injection).

## Dual Threat ID Convention

RAI security model analysis uses a dual ID system that enables independent tracking within the RAI plan and cross-referencing with Security Planner operational buckets.

### ID Formats

* `T-RAI-{NNN}`: Sequential RAI-specific threat identifier starting at T-RAI-001. Every RAI threat receives this ID.
* `T-{BUCKET}-AI-{NNN}`: Cross-reference ID mapping to Security Planner bucket terminology. Assigned when a threat overlaps with a Security Planner operational bucket.

### Rules

1. All RAI threats receive a `T-RAI-{NNN}` ID in sequential order.
2. When a threat overlaps with a Security Planner bucket, also assign a `T-{BUCKET}-AI-{NNN}` ID.
3. Cross-reference both IDs in threat tables so each threat is traceable across both plans.
4. Bucket names match Security Planner operational buckets: DATA, BUILD, WEBUI, IDENTITY, INFRA.
5. The `T-RAI-{NNN}` sequence is independent of the `T-{BUCKET}-AI-{NNN}` sequence within each bucket.

### Example

A training data poisoning threat might carry:

* RAI ID: `T-RAI-003`
* Security cross-reference: `T-DATA-AI-001`

Both IDs appear in the extended threat table, linking the RAI assessment to the security plan's data bucket analysis.
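
The rules above can be sketched as a small allocator. This is only an illustration: `ThreatIdAllocator` and its `assign` method are hypothetical names, not part of the RAI Planner, and the sketch assumes every sequence starts at 001 (standalone mode).

```python
from collections import defaultdict

# Bucket names from rule 4 of the convention.
BUCKETS = ("DATA", "BUILD", "WEBUI", "IDENTITY", "INFRA")


class ThreatIdAllocator:
    """Hypothetical helper that hands out dual threat IDs."""

    def __init__(self):
        self._rai_next = 1                          # T-RAI sequence (rule 1)
        self._bucket_next = defaultdict(lambda: 1)  # one sequence per bucket (rule 5)

    def assign(self, bucket=None):
        """Return (rai_id, cross_ref_id); cross_ref_id is None without bucket overlap."""
        if bucket is not None and bucket not in BUCKETS:
            raise ValueError(f"unknown bucket: {bucket}")
        rai_id = f"T-RAI-{self._rai_next:03d}"
        self._rai_next += 1
        if bucket is None:
            return rai_id, None
        cross_ref_id = f"T-{bucket}-AI-{self._bucket_next[bucket]:03d}"
        self._bucket_next[bucket] += 1
        return rai_id, cross_ref_id
```

With this sketch, two threats without bucket overlap followed by one DATA threat reproduce the pairing in the example: the third call returns `T-RAI-003` together with `T-DATA-AI-001`.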

## Extended Threat Table Format

The threat table extends the Security Planner format with five additional columns: RAI ID, NIST Characteristic, NIST AI RMF, Suggested Threat Origin, and Concern Level.

```markdown
| Threat ID | RAI ID | STRIDE | NIST Characteristic | NIST AI RMF | Description | AI Element | Trust Boundary | Suggested Threat Origin | Concern Level | Mitigation |
|---------------|-----------|-----------|--------------------------------|-------------|------------------------------------------------------|---------------------|------------------------|-------------------------|---------------|---------------------------------------------------------------|
| T-DATA-AI-001 | T-RAI-003 | Tampering | Fair with Harmful Bias Managed | Map 2.3 | Training data poisoning introducing demographic bias | Training Data Store | Training Data Boundary | Data Pipeline | High Concern | Data validation pipeline, bias detection, provenance tracking |
```

### Column Definitions

* Threat ID: Security Planner cross-reference ID (`T-{BUCKET}-AI-{NNN}`), or blank if no bucket overlap exists.
* RAI ID: Sequential RAI threat identifier (`T-RAI-{NNN}`).
* STRIDE: Applicable STRIDE category.
* NIST Characteristic: Primary NIST trustworthiness characteristic affected (Valid and Reliable, Safe, Secure and Resilient, Accountable and Transparent, Explainable and Interpretable, Privacy-Enhanced, Fair with Harmful Bias Managed).
* NIST AI RMF: Applicable NIST AI RMF subcategory reference.
* Description: Clear description of the threat, attack vector, and affected behavior.
* AI Element: Element type from the AI Element Types table.
* Trust Boundary: Boundary crossed or affected from the AI Trust Boundaries table.
* Suggested Threat Origin: Where the threat originates (Data Pipeline, Model, Interface, Infrastructure, or Cross-cutting).
* Concern Level: Qualitative assessment of threat significance (Low Concern, Moderate Concern, or High Concern). See Concern Level Assessment below for criteria.
* Mitigation: Proposed mitigation strategy with standards references.

### Concern Level Assessment

Suggest a qualitative concern level for each identified threat based on contextual judgment:

| Concern Level | Criteria |
|------------------|-----------------------------------------------------------------------------------------|
| Low Concern | Threat is theoretical or mitigated by existing controls; no immediate action suggested. |
| Moderate Concern | Threat is plausible and partially mitigated; additional controls recommended. |
| High Concern | Threat is likely or unmitigated; priority mitigation suggested. |

The concern level is a suggested assessment for the team's consideration, not a definitive risk rating.

### Threat Origin Grouping

After populating the threat table, present a summary grouped by Suggested Threat Origin. This helps the team identify which system components carry the most threats and prioritize architectural mitigations. Present AI-specific threats (Data Pipeline, Model) first, then Interface threats, then Infrastructure and Cross-cutting threats.
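
One way to produce that grouped summary is sketched below. The input shape (dictionaries with `rai_id` and `origin` keys) and the function name `group_by_origin` are assumptions made for illustration; only the origin names and their presentation order come from this section.

```python
from collections import defaultdict

# Presentation order described above: AI-specific origins first,
# then Interface, then Infrastructure and Cross-cutting.
ORIGIN_ORDER = ["Data Pipeline", "Model", "Interface", "Infrastructure", "Cross-cutting"]


def group_by_origin(threats):
    """Group RAI IDs by Suggested Threat Origin, in presentation order.

    `threats` is a list of dicts with hypothetical keys 'rai_id' and 'origin'.
    Origins with no threats are omitted from the result.
    """
    groups = defaultdict(list)
    for threat in threats:
        origin = threat["origin"]
        if origin not in ORIGIN_ORDER:
            raise ValueError(f"unknown threat origin: {origin}")
        groups[origin].append(threat["rai_id"])
    return [(origin, groups[origin]) for origin in ORIGIN_ORDER if origin in groups]
```

The list lengths in the result give the per-origin threat counts that the summary is meant to surface.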

### Output Detail Level

Adjust threat table column visibility based on `userPreferences.outputDetailLevel`:

| Level | Visible Columns |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| summary | RAI ID, STRIDE, Concern Level, Suggested Threat Origin. |
| standard | All columns (default). |
| comprehensive | All columns plus a "Detailed Rationale" column with per-threat analysis explaining the concern level assignment and mitigation reasoning. |
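
The column selection can be expressed as a lookup, sketched here under the assumption that the preference value arrives as a plain string. `visible_columns` is a hypothetical name; the column names are taken from the extended threat table.

```python
# Column order of the extended threat table.
ALL_COLUMNS = [
    "Threat ID", "RAI ID", "STRIDE", "NIST Characteristic", "NIST AI RMF",
    "Description", "AI Element", "Trust Boundary", "Suggested Threat Origin",
    "Concern Level", "Mitigation",
]
SUMMARY_COLUMNS = ["RAI ID", "STRIDE", "Concern Level", "Suggested Threat Origin"]


def visible_columns(detail_level="standard"):
    """Return the threat table columns to show for an outputDetailLevel value."""
    if detail_level == "summary":
        return list(SUMMARY_COLUMNS)
    if detail_level == "standard":
        return list(ALL_COLUMNS)
    if detail_level == "comprehensive":
        return ALL_COLUMNS + ["Detailed Rationale"]
    raise ValueError(f"unknown output detail level: {detail_level}")
```

Defaulting the parameter to `"standard"` mirrors the table, which marks that level as the default.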

### Audience Adaptation

Adjust ML STRIDE matrix presentation based on `userPreferences.audienceProfile`:

| Profile | Presentation |
|-----------|-------------------------------------------------------------------------------------------------|
| technical | Include the full ML STRIDE matrix. |
| executive | Summarize ML-specific threats in narrative prose; omit the matrix. |
| mixed | Include the matrix with regulatory cross-references and contextual notes for diverse audiences. |

## ML STRIDE Matrix

Extended matrix covering AI system components with NIST trustworthiness characteristic annotations. Each cell contains threat applicability (High/Medium/Low/N/A) and the primary NIST characteristic relevant to that intersection.

> [!NOTE]
> The STRIDE categories in this matrix (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) correspond to the AI-extended definitions in the AI STRIDE Extensions table above. Refer to that table for AI-specific threat examples and NIST characteristic overlays for each category.

| Component | Spoofing | Tampering | Repudiation | Info Disclosure | DoS | EoP |
|------------------|-----------------------------------------|---------------------------------------|----------------------------------------|----------------------------------------|-----------------------------|-----------------------------------------|
| Training Data | Medium / Valid and Reliable | High / Fair with Harmful Bias Managed | Medium / Accountable and Transparent | High / Privacy-Enhanced | Low / Valid and Reliable | Low / Privacy-Enhanced |
| Feature Pipeline | Low / Explainable and Interpretable | High / Fair with Harmful Bias Managed | Medium / Accountable and Transparent | Medium / Privacy-Enhanced | Low / Valid and Reliable | Low / Fair with Harmful Bias Managed |
| Model Training | Medium / Valid and Reliable | High / Fair with Harmful Bias Managed | High / Accountable and Transparent | High / Privacy-Enhanced | Medium / Valid and Reliable | Medium / Valid and Reliable |
| Model Serving | High / Valid and Reliable | Medium / Valid and Reliable | Medium / Explainable and Interpretable | High / Privacy-Enhanced | High / Valid and Reliable | High / Valid and Reliable |
| Inference API | High / Valid and Reliable | High / Valid and Reliable | Medium / Explainable and Interpretable | Medium / Privacy-Enhanced | High / Valid and Reliable | High / Privacy-Enhanced |
| Feedback Loop | Medium / Fair with Harmful Bias Managed | High / Fair with Harmful Bias Managed | High / Accountable and Transparent | Medium / Privacy-Enhanced | Low / Valid and Reliable | Medium / Fair with Harmful Bias Managed |
| Human Review | Low / Accountable and Transparent | Medium / Accountable and Transparent | High / Accountable and Transparent | Low / Privacy-Enhanced | N/A | Medium / Accountable and Transparent |
| Model Monitoring | Low / Explainable and Interpretable | Medium / Valid and Reliable | High / Explainable and Interpretable | Medium / Explainable and Interpretable | Medium / Valid and Reliable | Low / Valid and Reliable |

### Reading the Matrix

Each cell uses the format `Applicability / NIST Characteristic`:

* Applicability indicates how likely the STRIDE category applies to the component (High, Medium, Low, N/A).
* NIST Characteristic identifies which NIST trustworthiness characteristic is most relevant for that specific threat intersection.
* Use this matrix as a starting point for threat identification. Investigate all High-applicability cells first, then Medium, then Low. N/A cells can be skipped unless the system architecture suggests otherwise.
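
The reading order described in the list above can be sketched as a parse-and-sort step. The nested-dictionary input and the names `parse_cell` and `investigation_order` are assumptions for illustration; the cell format and the High, Medium, Low ordering come from this section.

```python
# Lower number means investigate earlier.
PRIORITY = {"High": 0, "Medium": 1, "Low": 2}


def parse_cell(cell):
    """Split 'Applicability / NIST Characteristic' into its two parts."""
    cell = cell.strip()
    if cell == "N/A":
        return "N/A", None
    applicability, characteristic = cell.split(" / ", 1)
    return applicability, characteristic


def investigation_order(matrix):
    """Return (component, category, characteristic) tuples, High cells first.

    `matrix` maps component name -> {STRIDE category -> cell text}.
    N/A cells are skipped; ties keep their original matrix order.
    """
    cells = []
    for component, row in matrix.items():
        for category, text in row.items():
            applicability, characteristic = parse_cell(text)
            if applicability == "N/A":
                continue
            cells.append((PRIORITY[applicability], component, category, characteristic))
    cells.sort(key=lambda cell: cell[0])  # stable sort preserves matrix order within a level
    return [(component, category, characteristic) for _, component, category, characteristic in cells]
```

Skipping N/A cells is the default here; a caller whose architecture makes such a cell relevant would need to handle it separately.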

## Merge Protocol

When a Security Planner assessment already exists (`from-security-plan` entry mode), the merge protocol prevents duplication and ensures consistent cross-referencing between security and RAI security models.

### Steps

1. Read the existing security plan security model from the path in `state.json` `securityPlanRef`.
2. Extract the highest `T-{BUCKET}-AI-{NNN}` ID for each bucket to establish cross-reference continuity.
3. Start new RAI threat IDs at `T-RAI-001` (independent sequence from the security plan).
4. For overlapping threats (threats already identified in the security plan that also have RAI dimensions), cross-reference using dual IDs rather than duplicating the threat entry.
5. Produce an addendum document (`rai-threat-addendum.md`) with a merge header identifying the source security plan.
6. Use the extended threat table format with both ID columns to maintain traceability.
7. Include a cross-reference section listing security `T-{BUCKET}-AI-{NNN}` IDs and their RAI `T-RAI-{NNN}` counterparts.
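
Step 2 can be sketched as a regular-expression scan over the security plan text. The sketch takes the already-loaded text as input and makes no assumption about how `state.json` is structured; the function names are hypothetical.

```python
import re

# Matches cross-reference IDs such as T-DATA-AI-001 for the five known buckets.
CROSS_REF_PATTERN = re.compile(r"\bT-(DATA|BUILD|WEBUI|IDENTITY|INFRA)-AI-(\d+)\b")


def highest_cross_ref_ids(security_plan_text):
    """Return {bucket: highest NNN found}; buckets with no IDs are absent."""
    highest = {}
    for bucket, number in CROSS_REF_PATTERN.findall(security_plan_text):
        highest[bucket] = max(highest.get(bucket, 0), int(number))
    return highest


def next_cross_ref_id(highest, bucket):
    """Return the next unused cross-reference ID for a bucket."""
    return f"T-{bucket}-AI-{highest.get(bucket, 0) + 1:03d}"
```

Continuing each bucket from its highest existing number keeps new cross-reference IDs from colliding with ones already in the security plan, while the `T-RAI-{NNN}` sequence still starts at 001 as step 3 requires.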

### Addendum Header Template

```markdown
## RAI Security Model Addendum

- Source security plan: {path}
- Security plan date: {date}
- Highest existing security threat ID: T-{BUCKET}-{NNN}
- RAI threat ID range: T-RAI-001 through T-RAI-{NNN}
```

### Cross-Reference Section Template

```markdown
## Security Plan Cross-Reference

| Security Threat ID | RAI Threat ID | Description | Overlap Type |
|--------------------|---------------|-------------------------|---------------------------------------------------|
| T-DATA-AI-001 | T-RAI-003 | Training data poisoning | Full overlap, RAI extends with fairness dimension |
```

## AI Threat Concentration by Bucket

Expected threat density per operational bucket when analyzing AI systems. Use these estimates for planning and to validate coverage completeness.

| Bucket | Expected AI Threat Count | Key Concern Areas |
|----------------|--------------------------|------------------------------------------------------------------------------|
| Data | 9 | Training data poisoning, bias injection, privacy violations, data provenance |
| Build | 5 | Model supply chain, training integrity, pipeline security |
| Web/UI | 6 | Adversarial inputs, prompt injection, output manipulation |
| Identity | 3 | Model impersonation, unauthorized access, credential compromise |
| Infrastructure | 2 | Resource exhaustion, compute hijacking |

> [!NOTE]
> Actual threat counts vary based on system architecture and AI component complexity. These estimates provide a baseline for coverage validation. If analysis produces significantly fewer threats in a bucket, revisit the analysis for gaps.
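
A coverage check against this baseline might look like the sketch below. The guidance does not quantify "significantly fewer", so the 50% `threshold` default is purely an assumption, as is the function name `coverage_gaps`.

```python
# Baseline counts from the table above.
EXPECTED_COUNTS = {"Data": 9, "Build": 5, "Web/UI": 6, "Identity": 3, "Infrastructure": 2}


def coverage_gaps(actual_counts, threshold=0.5):
    """Return {bucket: (actual, expected)} for buckets worth revisiting.

    A bucket is flagged when its actual count falls below
    threshold * expected. The 0.5 default is an illustrative assumption.
    """
    gaps = {}
    for bucket, expected in EXPECTED_COUNTS.items():
        actual = actual_counts.get(bucket, 0)
        if actual < expected * threshold:
            gaps[bucket] = (actual, expected)
    return gaps
```

A flagged bucket is a prompt to revisit the analysis, not proof of a gap, since actual counts depend on the system's architecture.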
261
262## Artifact Templates
263
264### RAI Threat Addendum Template
265
266Template for `rai-threat-addendum.md` produced during Phase 4.
267
268```markdown
269---
270title: RAI Security Model Addendum
271description: RAI-specific threat analysis extending security plan security model
272---
273
274## RAI Security Model Addendum
275
276- Source security plan: {path or "standalone"}
277- Security plan date: {date or "N/A"}
278- Highest existing security threat ID: {ID or "N/A"}
279- RAI threat ID range: T-RAI-001 through T-RAI-{NNN}
280
281## Extended Threat Table
282
283| Threat ID | RAI ID | STRIDE | NIST Characteristic | NIST AI RMF | Description | AI Element | Trust Boundary | Suggested Threat Origin | Concern Level | Mitigation |
284|-----------|-----------|--------|---------------------|-------------|-------------|------------|----------------|-------------------------|---------------|------------|
285| | T-RAI-001 | | | | | | | | | |
286
287## Cross-Reference
288
289| Security Threat ID | RAI Threat ID | Description | Overlap Type |
290|--------------------|---------------|-------------|--------------|
291| | | | |
292
293## Threat Concentration Summary
294
295| Bucket | Threat Count | Coverage Status |
296|----------------|--------------|-----------------|
297| Data | | |
298| Build | | |
299| Web/UI | | |
300| Identity | | |
301| Infrastructure | | |
302```
303
304### Control Surface Catalog Template
305
306Template for `control-surface-catalog.md` mapping controls to each identified threat.
307
308```markdown
309---
310title: RAI Control Surface Catalog
311description: Per-threat control surface mappings for RAI threat mitigations
312---
313
314## Control Surface Catalog
315
316### Control Entry Template
317
318For each threat, document the control surface:
319
320| Field | Value |
321|-----------------------|-------------------------------------------|
322| RAI Threat ID | T-RAI-{NNN} |
323| Security Threat ID | T-{BUCKET}-AI-{NNN} or N/A |
324| NIST Characteristic | {characteristic} |
325| Control Category | Preventive, Detective, or Corrective |
326| Control Description | {description} |
327| Implementation Status | Implemented, Partial, Planned, or Missing |
328| Evidence | {reference to evidence or "None"} |
329| Residual Concern | {concern level after control application} |
330
331### Control Surface Table
332
333| RAI ID | Control Category | Control Description | Status | Evidence | Residual Concern |
334|-----------|------------------|---------------------|--------|----------|------------------|
335| T-RAI-001 | | | | | |
336```
337