---
description: 'RAI security model analysis for Phase 4: AI STRIDE extensions, dual threat IDs, ML STRIDE matrix, and security model merge protocol'
applyTo: '**/.copilot-tracking/rai-plans/**'
---

# RAI Security Model Analysis

AI-specific security model analysis extensions for Phase 4 of the RAI Planner. This guidance extends the STRIDE methodology with NIST trustworthiness characteristic overlays, AI element types, trust boundaries, data flow patterns, and a dual threat ID convention. A merge protocol enables interoperation with Security Planner security models when operating in `from-security-plan` mode.

## AI STRIDE Extensions

Standard STRIDE categories gain AI-specific dimensions when applied to AI systems. Each category maps to one or more NIST trustworthiness characteristics that amplify the threat surface beyond traditional software concerns.

| STRIDE Category        | NIST Characteristic Overlay                                | AI-Specific Threat Examples                                                                     |
|------------------------|------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
| Spoofing               | Valid and Reliable, Explainable and Interpretable          | Adversarial inputs mimicking legitimate data, model impersonation, synthetic identity injection |
| Tampering              | Fair with Harmful Bias Managed, Valid and Reliable         | Training data poisoning introducing bias, model weight manipulation, feedback loop corruption   |
| Repudiation            | Accountable and Transparent, Explainable and Interpretable | Unattributable automated decisions, audit log gaps for model outputs, governance bypass         |
| Information Disclosure | Privacy-Enhanced                                           | Training data extraction, model inversion attacks, membership inference, embedding leakage      |
| Denial of Service      | Valid and Reliable                                         | Model resource exhaustion, inference throttling attacks, adversarial input causing degradation  |
| Elevation of Privilege | Privacy-Enhanced, Valid and Reliable                       | Prompt injection bypassing safety filters, jailbreaking, unauthorized model capability access   |

## AI Element Types

Eight AI-specific element types define the components subject to RAI threat analysis. Each element type carries primary NIST concerns that guide threat identification.

| Element Type         | Description                             | Primary NIST Concerns                                                                                   |
|----------------------|-----------------------------------------|---------------------------------------------------------------------------------------------------------|
| Training Data Store  | Datasets used for model training        | Fair with Harmful Bias Managed (bias), Privacy-Enhanced (PII), Accountable and Transparent (provenance) |
| Model Artifact       | Trained model files and weights         | Valid and Reliable (integrity), Explainable and Interpretable (explainability)                          |
| Inference Endpoint   | API or service serving predictions      | Valid and Reliable (availability), Privacy-Enhanced (query privacy)                                     |
| Feature Pipeline     | Data transformation for model input     | Fair with Harmful Bias Managed (feature bias), Privacy-Enhanced (data flow)                             |
| Feedback Loop        | User feedback incorporated into model   | Fair with Harmful Bias Managed (feedback bias), Valid and Reliable (drift)                              |
| Human Review Queue   | Human oversight checkpoints             | Accountable and Transparent (review coverage), Explainable and Interpretable (decision documentation)   |
| Monitoring Dashboard | Model performance and behavior tracking | Explainable and Interpretable (observability), Valid and Reliable (alerting)                            |
| Orchestration Layer  | Agent or pipeline orchestration         | Accountable and Transparent (decision routing), Valid and Reliable (failure handling)                   |

## AI Trust Boundaries

Five trust boundaries plus one accountability-specific boundary define separation points within AI systems. Threats concentrate at these boundaries where control transfers between domains.

| Trust Boundary                              | Description                                                            | Key RAI Threats                                                    |
|---------------------------------------------|------------------------------------------------------------------------|--------------------------------------------------------------------|
| Training Data Boundary                      | Separation between raw data sources and training pipeline              | Data poisoning, bias injection, privacy violations                 |
| Model Boundary                              | Separation between model internals and serving infrastructure          | Model extraction, weight tampering, IP leakage                     |
| Inference Boundary                          | Separation between client requests and model processing                | Adversarial inputs, prompt injection, resource exhaustion          |
| Feedback Boundary                           | Separation between user feedback and model updates                     | Feedback manipulation, drift injection, bias amplification         |
| Human Oversight Boundary                    | Separation between automated decisions and human review                | Accountability gaps, automation bias, review bypass                |
| Human Review to Automated Decision Boundary | Accountability boundary between human judgment and automated execution | Accountability transfer, decision attribution, override governance |

> [!NOTE]
> The Human Review to Automated Decision Boundary is specifically an accountability boundary. It captures the transfer of responsibility when automated systems act on human review decisions, creating a distinct threat surface for decision attribution and override governance.

## AI Data Flow Patterns

Three data flow patterns characterize how data moves through AI systems. Each pattern identifies RAI-relevant stages and threat concentration points where targeted analysis yields the highest return.

### Training Pipeline Flow

Data source -> Feature extraction -> Training -> Model store -> Validation

RAI-relevant stages:

* Data source ingestion: bias in source data, PII exposure, provenance gaps
* Feature extraction: feature selection bias, proxy variable introduction
* Training: overfitting to biased patterns, memorization of sensitive data
* Model store: weight integrity, access control, version lineage
* Validation: evaluation fairness across demographic groups, holdout contamination

Threat concentration points: data source ingestion (poisoning, bias), training (memorization, bias amplification), validation (evaluation fairness gaps).

### Inference Pipeline Flow

Client request -> Pre-processing -> Model inference -> Post-processing -> Response

RAI-relevant stages:

* Client request: adversarial input detection, prompt injection screening
* Pre-processing: input sanitization, feature normalization integrity
* Model inference: output correctness, confidence calibration, latency stability
* Post-processing: content filtering, output explanation generation
* Response: attribution metadata, audit logging, response integrity

Threat concentration points: client request (adversarial inputs, prompt injection), model inference (output manipulation), post-processing (filter bypass).

### Feedback Loop Flow

User interaction -> Feedback collection -> Aggregation -> Model update trigger -> Retraining

RAI-relevant stages:

* User interaction: feedback authenticity, sampling bias in respondents
* Feedback collection: consent and privacy compliance, feedback representation
* Aggregation: statistical bias in aggregation methods, outlier handling
* Model update trigger: drift detection, update authorization
* Retraining: bias amplification across cycles, catastrophic forgetting

Threat concentration points: feedback collection (manipulation, bias), aggregation (statistical bias), retraining (bias amplification, drift injection).

## Dual Threat ID Convention

RAI security model analysis uses a dual ID system that enables independent tracking within the RAI plan and cross-referencing with Security Planner operational buckets.

### ID Formats

* `T-RAI-{NNN}`: Sequential RAI-specific threat identifier starting at `T-RAI-001`. Every RAI threat receives this ID.
* `T-{BUCKET}-AI-{NNN}`: Cross-reference ID mapping to Security Planner bucket terminology. Assigned when a threat overlaps with a Security Planner operational bucket.

### Rules

1. All RAI threats receive a `T-RAI-{NNN}` ID in sequential order.
2. When a threat overlaps with a Security Planner bucket, also assign a `T-{BUCKET}-AI-{NNN}` ID.
3. Cross-reference both IDs in threat tables so each threat is traceable across both plans.
4. Bucket names match Security Planner operational buckets: DATA, BUILD, WEBUI, IDENTITY, INFRA.
5. The `T-RAI-{NNN}` sequence is independent of the `T-{BUCKET}-AI-{NNN}` sequences, and each bucket numbers its `T-{BUCKET}-AI-{NNN}` IDs separately.
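
A minimal Python sketch of these rules follows. It is for illustration only. `ThreatIdAllocator` and `is_valid_threat_id` are hypothetical names. The three-digit zero padding is inferred from IDs such as `T-RAI-001`; the rules above do not state it.

```python
import re
from collections import defaultdict
from typing import Optional

# Operational buckets listed in rule 4.
BUCKETS = ("DATA", "BUILD", "WEBUI", "IDENTITY", "INFRA")

# Assumption: three-digit, zero-padded sequence numbers.
RAI_ID_PATTERN = re.compile(r"T-RAI-\d{3}")
BUCKET_ID_PATTERN = re.compile(r"T-(?:DATA|BUILD|WEBUI|IDENTITY|INFRA)-AI-\d{3}")


def is_valid_threat_id(threat_id: str) -> bool:
    """True if the string exactly matches either dual-ID format."""
    return bool(
        RAI_ID_PATTERN.fullmatch(threat_id) or BUCKET_ID_PATTERN.fullmatch(threat_id)
    )


class ThreatIdAllocator:
    """Hypothetical helper that issues dual threat IDs.

    The RAI counter and each bucket's counter advance independently (rule 5).
    """

    def __init__(self) -> None:
        self._next_rai = 1
        self._next_in_bucket: dict[str, int] = defaultdict(lambda: 1)

    def assign(self, bucket: Optional[str] = None) -> tuple[str, Optional[str]]:
        """Return (RAI ID, cross-reference ID) for the next threat.

        Pass a bucket only when the threat overlaps a Security Planner bucket;
        otherwise the cross-reference ID is None (rules 1 and 2).
        """
        # Validate first so an invalid bucket does not consume an RAI ID.
        if bucket is not None and bucket not in BUCKETS:
            raise ValueError(f"unknown bucket: {bucket}")
        rai_id = f"T-RAI-{self._next_rai:03d}"
        self._next_rai += 1
        if bucket is None:
            return rai_id, None
        number = self._next_in_bucket[bucket]
        self._next_in_bucket[bucket] += 1
        return rai_id, f"T-{bucket}-AI-{number:03d}"


if __name__ == "__main__":
    ids = ThreatIdAllocator()
    print(ids.assign())        # ('T-RAI-001', None)
    print(ids.assign())        # ('T-RAI-002', None)
    print(ids.assign("DATA"))  # ('T-RAI-003', 'T-DATA-AI-001')
```

The third call reproduces the pairing used in the example that follows: `T-RAI-003` with `T-DATA-AI-001`.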

### Example

A training data poisoning threat might carry:

* RAI ID: `T-RAI-003`
* Security cross-reference: `T-DATA-AI-001`

Both IDs appear in the extended threat table, linking the RAI assessment to the security plan's data bucket analysis.

## Extended Threat Table Format

The threat table extends the Security Planner format with five additional columns: RAI ID, NIST Characteristic, NIST AI RMF, Suggested Threat Origin, and Concern Level.

```markdown
| Threat ID     | RAI ID    | STRIDE    | NIST Characteristic            | NIST AI RMF | Description                                          | AI Element          | Trust Boundary         | Suggested Threat Origin | Concern Level | Mitigation                                                    |
|---------------|-----------|-----------|--------------------------------|-------------|------------------------------------------------------|---------------------|------------------------|-------------------------|---------------|---------------------------------------------------------------|
| T-DATA-AI-001 | T-RAI-003 | Tampering | Fair with Harmful Bias Managed | Map 2.3     | Training data poisoning introducing demographic bias | Training Data Store | Training Data Boundary | Data Pipeline           | High Concern  | Data validation pipeline, bias detection, provenance tracking |
```

### Column Definitions

* Threat ID: Security Planner cross-reference ID (`T-{BUCKET}-AI-{NNN}`), or blank if no bucket overlap exists.
* RAI ID: Sequential RAI threat identifier (`T-RAI-{NNN}`).
* STRIDE: Applicable STRIDE category.
* NIST Characteristic: Primary NIST trustworthiness characteristic affected (Valid and Reliable, Safe, Secure and Resilient, Accountable and Transparent, Explainable and Interpretable, Privacy-Enhanced, Fair with Harmful Bias Managed).
* NIST AI RMF: Applicable NIST AI RMF subcategory reference.
* Description: Clear description of the threat, attack vector, and affected behavior.
* AI Element: Element type from the AI Element Types table.
* Trust Boundary: Boundary crossed or affected from the AI Trust Boundaries table.
* Suggested Threat Origin: Where the threat originates (Data Pipeline, Model, Interface, Infrastructure, or Cross-cutting).
* Concern Level: Qualitative assessment of threat significance (Low Concern, Moderate Concern, or High Concern). See Concern Level Assessment below for criteria.
* Mitigation: Proposed mitigation strategy with standards references.
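
To keep rows consistent with these definitions, a threat row can be modeled as a record. The record validates its enumerated fields and renders them in the extended table's column order. The Python sketch below is illustrative: `ThreatRow`, its field names, and `to_markdown` are assumptions. Only the NIST characteristic, threat origin, and concern level values are checked.

```python
from dataclasses import astuple, dataclass

NIST_CHARACTERISTICS = frozenset({
    "Valid and Reliable", "Safe", "Secure and Resilient",
    "Accountable and Transparent", "Explainable and Interpretable",
    "Privacy-Enhanced", "Fair with Harmful Bias Managed",
})
THREAT_ORIGINS = frozenset({"Data Pipeline", "Model", "Interface", "Infrastructure", "Cross-cutting"})
CONCERN_LEVELS = frozenset({"Low Concern", "Moderate Concern", "High Concern"})


@dataclass(frozen=True)
class ThreatRow:
    # Fields follow the column order of the extended threat table.
    threat_id: str            # T-{BUCKET}-AI-{NNN}, or "" when no bucket overlap exists
    rai_id: str               # T-RAI-{NNN}
    stride: str
    nist_characteristic: str
    nist_ai_rmf: str
    description: str
    ai_element: str
    trust_boundary: str
    threat_origin: str
    concern_level: str
    mitigation: str

    def __post_init__(self) -> None:
        # Reject values outside the enumerations given in the column definitions.
        checks = (
            (self.nist_characteristic, NIST_CHARACTERISTICS, "NIST characteristic"),
            (self.threat_origin, THREAT_ORIGINS, "threat origin"),
            (self.concern_level, CONCERN_LEVELS, "concern level"),
        )
        for value, allowed, label in checks:
            if value not in allowed:
                raise ValueError(f"unknown {label}: {value!r}")

    def to_markdown(self) -> str:
        """Render the row as one markdown table line."""
        return "| " + " | ".join(astuple(self)) + " |"


if __name__ == "__main__":
    # The example row from the extended threat table format.
    row = ThreatRow(
        "T-DATA-AI-001", "T-RAI-003", "Tampering", "Fair with Harmful Bias Managed",
        "Map 2.3", "Training data poisoning introducing demographic bias",
        "Training Data Store", "Training Data Boundary", "Data Pipeline",
        "High Concern", "Data validation pipeline, bias detection, provenance tracking",
    )
    print(row.to_markdown())
```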

### Concern Level Assessment

Suggest a qualitative concern level for each identified threat based on contextual judgment:

| Concern Level    | Criteria                                                                                |
|------------------|-----------------------------------------------------------------------------------------|
| Low Concern      | Threat is theoretical or mitigated by existing controls; no immediate action suggested. |
| Moderate Concern | Threat is plausible and partially mitigated; additional controls recommended.           |
| High Concern     | Threat is likely or unmitigated; priority mitigation suggested.                         |

The concern level is a suggested assessment for the team's consideration, not a definitive risk rating.

### Threat Origin Grouping

After populating the threat table, present a summary grouped by Suggested Threat Origin. This helps the team identify which system components carry the most threats and prioritize architectural mitigations. Present AI-specific threats (Data Pipeline, Model) first, then Interface threats, then Infrastructure and Cross-cutting threats.
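
The grouping step amounts to a stable group-by over a fixed origin order. Below is a Python sketch. It assumes threats are available as dicts with hypothetical `rai_id` and `origin` keys. Raising on an unrecognized origin is a choice made in this sketch, not a stated rule.

```python
from collections import defaultdict

# Presentation order from the guidance: AI-specific origins first.
ORIGIN_ORDER = ("Data Pipeline", "Model", "Interface", "Infrastructure", "Cross-cutting")


def group_by_origin(threats: list[dict[str, str]]) -> list[tuple[str, list[str]]]:
    """Return (origin, [RAI IDs]) pairs in presentation order.

    Each threat is assumed to be a dict with "rai_id" and "origin" keys.
    Origins with no threats are left out of the summary.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for threat in threats:
        groups[threat["origin"]].append(threat["rai_id"])
    unknown = sorted(set(groups) - set(ORIGIN_ORDER))
    if unknown:
        raise ValueError(f"unknown threat origins: {unknown}")
    return [(origin, groups[origin]) for origin in ORIGIN_ORDER if origin in groups]


if __name__ == "__main__":
    threats = [
        {"rai_id": "T-RAI-001", "origin": "Interface"},
        {"rai_id": "T-RAI-002", "origin": "Data Pipeline"},
        {"rai_id": "T-RAI-003", "origin": "Data Pipeline"},
    ]
    for origin, ids in group_by_origin(threats):
        print(f"{origin}: {len(ids)} ({', '.join(ids)})")
    # Data Pipeline: 2 (T-RAI-002, T-RAI-003)
    # Interface: 1 (T-RAI-001)
```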

### Output Detail Level

Adjust threat table column visibility based on `userPreferences.outputDetailLevel`:

| Level         | Visible Columns                                                                                                                           |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| summary       | RAI ID, STRIDE, Concern Level, Suggested Threat Origin.                                                                                   |
| standard      | All columns (default).                                                                                                                    |
| comprehensive | All columns plus a "Detailed Rationale" column with per-threat analysis explaining the concern level assignment and mitigation reasoning. |
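
Column selection is a lookup on `outputDetailLevel`. The Python sketch below assumes `userPreferences` has been loaded into a plain dict, and `visible_columns` is a hypothetical name. Two ordering choices are assumptions, since the table does not specify them. The summary columns keep the extended table's order, and "Detailed Rationale" is appended as the last column.

```python
# Column order of the extended threat table.
ALL_COLUMNS = (
    "Threat ID", "RAI ID", "STRIDE", "NIST Characteristic", "NIST AI RMF",
    "Description", "AI Element", "Trust Boundary", "Suggested Threat Origin",
    "Concern Level", "Mitigation",
)
SUMMARY_COLUMNS = frozenset({"RAI ID", "STRIDE", "Concern Level", "Suggested Threat Origin"})


def visible_columns(user_preferences: dict) -> list[str]:
    """Columns to render for userPreferences.outputDetailLevel."""
    level = user_preferences.get("outputDetailLevel", "standard")  # standard is the default
    if level == "summary":
        # Assumption: keep the table's own column order.
        return [column for column in ALL_COLUMNS if column in SUMMARY_COLUMNS]
    if level == "standard":
        return list(ALL_COLUMNS)
    if level == "comprehensive":
        # Assumption: the extra rationale column is appended last.
        return [*ALL_COLUMNS, "Detailed Rationale"]
    raise ValueError(f"unknown outputDetailLevel: {level!r}")
```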

### Audience Adaptation

Adjust ML STRIDE matrix presentation based on `userPreferences.audienceProfile`:

| Profile   | Presentation                                                                                    |
|-----------|-------------------------------------------------------------------------------------------------|
| technical | Include the full ML STRIDE matrix.                                                              |
| executive | Summarize ML-specific threats in narrative prose; omit the matrix.                              |
| mixed     | Include the matrix with regulatory cross-references and contextual notes for diverse audiences. |

## ML STRIDE Matrix

Extended matrix covering AI system components with NIST trustworthiness characteristic annotations. Each cell contains threat applicability (High/Medium/Low/N/A) and the primary NIST characteristic relevant to that intersection.

> [!NOTE]
> The STRIDE categories in this matrix (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) correspond to the AI-extended definitions in the AI STRIDE Extensions table above. Refer to that table for AI-specific threat examples and NIST characteristic overlays for each category.

| Component        | Spoofing                                | Tampering                             | Repudiation                            | Info Disclosure                        | DoS                         | EoP                                     |
|------------------|-----------------------------------------|---------------------------------------|----------------------------------------|----------------------------------------|-----------------------------|-----------------------------------------|
| Training Data    | Medium / Valid and Reliable             | High / Fair with Harmful Bias Managed | Medium / Accountable and Transparent   | High / Privacy-Enhanced                | Low / Valid and Reliable    | Low / Privacy-Enhanced                  |
| Feature Pipeline | Low / Explainable and Interpretable     | High / Fair with Harmful Bias Managed | Medium / Accountable and Transparent   | Medium / Privacy-Enhanced              | Low / Valid and Reliable    | Low / Fair with Harmful Bias Managed    |
| Model Training   | Medium / Valid and Reliable             | High / Fair with Harmful Bias Managed | High / Accountable and Transparent     | High / Privacy-Enhanced                | Medium / Valid and Reliable | Medium / Valid and Reliable             |
| Model Serving    | High / Valid and Reliable               | Medium / Valid and Reliable           | Medium / Explainable and Interpretable | High / Privacy-Enhanced                | High / Valid and Reliable   | High / Valid and Reliable               |
| Inference API    | High / Valid and Reliable               | High / Valid and Reliable             | Medium / Explainable and Interpretable | Medium / Privacy-Enhanced              | High / Valid and Reliable   | High / Privacy-Enhanced                 |
| Feedback Loop    | Medium / Fair with Harmful Bias Managed | High / Fair with Harmful Bias Managed | High / Accountable and Transparent     | Medium / Privacy-Enhanced              | Low / Valid and Reliable    | Medium / Fair with Harmful Bias Managed |
| Human Review     | Low / Accountable and Transparent       | Medium / Accountable and Transparent  | High / Accountable and Transparent     | Low / Privacy-Enhanced                 | N/A                         | Medium / Accountable and Transparent    |
| Model Monitoring | Low / Explainable and Interpretable     | Medium / Valid and Reliable           | High / Explainable and Interpretable   | Medium / Explainable and Interpretable | Medium / Valid and Reliable | Low / Valid and Reliable                |

### Reading the Matrix

Each cell uses the format `Applicability / NIST Characteristic`:

* Applicability indicates how likely it is that the STRIDE category applies to the component (High, Medium, Low, N/A).
* NIST Characteristic identifies which NIST trustworthiness characteristic is most relevant for that specific threat intersection.
* Use this matrix as a starting point for threat identification. Investigate all High-applicability cells first, then Medium, then Low. N/A cells can be skipped unless the system architecture suggests otherwise.
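
The reading order above can be expressed as a stable sort over matrix cells. Below is a Python sketch. It assumes the matrix has been transcribed into a nested dict, with only two rows shown, and that each cell keeps the exact `Applicability / NIST Characteristic` text. `review_order` is a hypothetical name.

```python
# Two rows of the ML STRIDE matrix, keyed component -> STRIDE column -> cell.
MATRIX = {
    "Inference API": {
        "Spoofing": "High / Valid and Reliable",
        "Tampering": "High / Valid and Reliable",
        "Repudiation": "Medium / Explainable and Interpretable",
        "Info Disclosure": "Medium / Privacy-Enhanced",
        "DoS": "High / Valid and Reliable",
        "EoP": "High / Privacy-Enhanced",
    },
    "Human Review": {
        "Spoofing": "Low / Accountable and Transparent",
        "Tampering": "Medium / Accountable and Transparent",
        "Repudiation": "High / Accountable and Transparent",
        "Info Disclosure": "Low / Privacy-Enhanced",
        "DoS": "N/A",
        "EoP": "Medium / Accountable and Transparent",
    },
}

TIERS = ("High", "Medium", "Low", "N/A")


def review_order(matrix: dict[str, dict[str, str]], include_na: bool = False) -> list[tuple]:
    """Flatten the matrix into a High -> Medium -> Low review queue.

    Each entry is (applicability, component, STRIDE column, NIST characteristic).
    N/A cells are skipped unless include_na is set.
    """
    queue = []
    for component, row in matrix.items():
        for category, cell in row.items():
            applicability, _, characteristic = cell.partition(" / ")
            if applicability == "N/A" and not include_na:
                continue
            queue.append((applicability, component, category, characteristic or None))
    # sorted() is stable, so matrix order is preserved within each tier.
    return sorted(queue, key=lambda entry: TIERS.index(entry[0]))
```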

## Merge Protocol

When a Security Planner assessment already exists (`from-security-plan` entry mode), the merge protocol prevents duplication and ensures consistent cross-referencing between security and RAI security models.

### Steps

1. Read the existing security model from the security plan at the path stored in the `securityPlanRef` entry of `state.json`.
2. Extract the highest `T-{BUCKET}-AI-{NNN}` ID for each bucket to establish cross-reference continuity.
3. Start new RAI threat IDs at `T-RAI-001` (independent sequence from the security plan).
4. For overlapping threats (threats already identified in the security plan that also have RAI dimensions), cross-reference using dual IDs rather than duplicating the threat entry.
5. Produce an addendum document (`rai-threat-addendum.md`) with a merge header identifying the source security plan.
6. Use the extended threat table format with both ID columns to maintain traceability.
7. Include a cross-reference section listing security `T-{BUCKET}-AI-{NNN}` IDs and their RAI `T-RAI-{NNN}` counterparts.
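
Steps 1 and 2 can be sketched in Python as follows. The function names are hypothetical. The sketch assumes `securityPlanRef` is a top-level key in `state.json` holding a path relative to that file. `next_bucket_id` reflects one reading of "cross-reference continuity", in which new cross-reference IDs continue after the highest existing one. That reading is not a documented rule.

```python
import json
import re
from pathlib import Path

# Cross-reference IDs for the operational buckets; T-RAI IDs are deliberately not matched.
BUCKET_AI_ID = re.compile(r"\bT-(DATA|BUILD|WEBUI|IDENTITY|INFRA)-AI-(\d{3})\b")


def load_security_model(state_path: Path) -> str:
    """Step 1: read the security model text referenced by state.json.

    Assumption: securityPlanRef is a top-level key; a relative path is resolved
    against the directory containing state.json (an absolute path is used as-is).
    """
    state = json.loads(state_path.read_text(encoding="utf-8"))
    return (state_path.parent / state["securityPlanRef"]).read_text(encoding="utf-8")


def highest_bucket_ids(security_model: str) -> dict[str, str]:
    """Step 2: highest T-{BUCKET}-AI-{NNN} per bucket found in the existing plan."""
    highest: dict[str, int] = {}
    for bucket, number in BUCKET_AI_ID.findall(security_model):
        highest[bucket] = max(highest.get(bucket, 0), int(number))
    return {bucket: f"T-{bucket}-AI-{n:03d}" for bucket, n in sorted(highest.items())}


def next_bucket_id(highest: dict[str, str], bucket: str) -> str:
    """Continue a bucket's sequence after its highest existing ID (assumed reading of step 2)."""
    current = int(highest[bucket].rsplit("-", 1)[1]) if bucket in highest else 0
    return f"T-{bucket}-AI-{current + 1:03d}"
```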

### Addendum Header Template

```markdown
## RAI Security Model Addendum

- Source security plan: {path}
- Security plan date: {date}
- Highest existing security threat ID: T-{BUCKET}-{NNN}
- RAI threat ID range: T-RAI-001 through T-RAI-{NNN}
```

### Cross-Reference Section Template

```markdown
## Security Plan Cross-Reference

|--------------------|---------------|-------------------------|---------------------------------------------------|
```

## AI Threat Concentration by Bucket

Expected threat density per operational bucket when analyzing AI systems. Use these estimates for planning and to validate coverage completeness.

| Bucket         | Expected AI Threat Count | Key Concern Areas                                                            |
|----------------|--------------------------|------------------------------------------------------------------------------|
| Data           | 9                        | Training data poisoning, bias injection, privacy violations, data provenance |
| Build          | 5                        | Model supply chain, training integrity, pipeline security                    |
| Web/UI         | 6                        | Adversarial inputs, prompt injection, output manipulation                    |
| Identity       | 3                        | Model impersonation, unauthorized access, credential compromise              |
| Infrastructure | 2                        | Resource exhaustion, compute hijacking                                       |

> [!NOTE]
> Actual threat counts vary based on system architecture and AI component complexity. These estimates provide a baseline for coverage validation. If analysis produces significantly fewer threats in a bucket, revisit the analysis for gaps.
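
A coverage check against these baselines might look like the Python sketch below. It maps the table's bucket labels (Data, Build, Web/UI, Identity, Infrastructure) to the ID bucket names from the dual threat ID rules. The 0.5 threshold standing in for "significantly fewer" is an assumption. `buckets_to_revisit` is a hypothetical name.

```python
# Baseline counts from the table above, keyed by the bucket names used in threat IDs.
EXPECTED_AI_THREATS = {"DATA": 9, "BUILD": 5, "WEBUI": 6, "IDENTITY": 3, "INFRA": 2}


def buckets_to_revisit(actual_counts: dict[str, int], ratio: float = 0.5) -> list[str]:
    """Buckets whose actual AI threat count is below ratio * expected baseline.

    "Significantly fewer" is not quantified in the guidance; 0.5 is an assumed default.
    Buckets missing from actual_counts are treated as having zero threats.
    """
    return [
        bucket
        for bucket, expected in EXPECTED_AI_THREATS.items()
        if actual_counts.get(bucket, 0) < expected * ratio
    ]


if __name__ == "__main__":
    counts = {"DATA": 8, "BUILD": 2, "WEBUI": 6, "IDENTITY": 3}
    print(buckets_to_revisit(counts))  # ['BUILD', 'INFRA']
```

Because the baselines are estimates, a flagged bucket is a prompt to re-examine the analysis, not proof of a gap.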

## Artifact Templates

### RAI Threat Addendum Template

Template for `rai-threat-addendum.md` produced during Phase 4.

```markdown

---
title: RAI Security Model Addendum
description: RAI-specific threat analysis extending security plan security model
---

## RAI Security Model Addendum

- Source security plan: {path or "standalone"}
- Security plan date: {date or "N/A"}
- Highest existing security threat ID: {ID or "N/A"}
- RAI threat ID range: T-RAI-001 through T-RAI-{NNN}

## Extended Threat Table

| Threat ID | RAI ID    | STRIDE | NIST Characteristic | NIST AI RMF | Description | AI Element | Trust Boundary | Suggested Threat Origin | Concern Level | Mitigation |
|-----------|-----------|--------|---------------------|-------------|-------------|------------|----------------|-------------------------|---------------|------------|
|           | T-RAI-001 |        |                     |             |             |            |                |                         |               |            |

## Cross-Reference

|--------------------|---------------|-------------|--------------|
| | | | |

## Threat Concentration Summary