Plan-Validated Agent Execution

Self-checking AI agents with organization policy enforcement

Architecture Overview — April 2026

Works with Claude Code, GitHub Copilot, and Cursor

The Problem

AI coding assistants operate with unrestricted access to your codebase.

  • No boundaries: can read any file, including .env, credentials, secrets
  • No oversight: can run any command (curl, ssh, rm -rf, python -c)
  • No audit trail: no record of what the agent did or why
  • No policy: the organization has no way to enforce rules on agent behavior

The Solution: Plan → Validate → Execute → Verify

1. AI declares a plan
2. Plan is validated
3. Each tool call checked
4. Outcomes verified
5. Everything logged

A validation server sits between the AI and your code. No client modifications needed.
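
The validate step can be pictured as a pure check on the declared plan before any tool runs. The sketch below is illustrative only: the `Plan` and `PlanStep` shapes, the `dependsOn` field, and the `validatePlan` function are hypothetical names, not the server's actual schema. Only the idea of a step cap (`maxTotalSteps`) comes from this deck.

```typescript
// Hypothetical plan shapes, for illustration only.
interface PlanStep {
  tool: string;        // e.g. "Read", "Edit"
  dependsOn: number[]; // indices of steps that must run first
}

interface Plan {
  steps: PlanStep[];
}

interface PlanCheck {
  ok: boolean;
  errors: string[];
}

// Reject plans that exceed the step cap or depend on steps that do not precede them.
function validatePlan(plan: Plan, maxTotalSteps: number): PlanCheck {
  const errors: string[] = [];
  if (plan.steps.length > maxTotalSteps) {
    errors.push(`plan has ${plan.steps.length} steps, cap is ${maxTotalSteps}`);
  }
  plan.steps.forEach((step, index) => {
    for (const dep of step.dependsOn) {
      // A dependency must point at an earlier step, which also rules out cycles.
      if (!Number.isInteger(dep) || dep < 0 || dep >= index) {
        errors.push(`step ${index} has invalid dependency ${dep}`);
      }
    }
  });
  return { ok: errors.length === 0, errors };
}
```

Under these assumptions, a 100-step plan checked against a cap of 30 comes back with `ok: false` and a single error, before any tool call is attempted.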

Architecture

AI Client

Unchanged: Claude Code, Copilot, or Cursor communicates over standard MCP

Validation Server

  • 15 MCP tools (planning + validated proxies)
  • Every call passes through withValidation
  • Five enforcement layers checked in sequence

Enforcement Layers

1. Plan Step Check
Correct tool? Input constraints? Dependencies?
2. Org Policy Check
Tool allowed? Path allowed? Bash command allowed?
3. Permission Check
Within plan's declared read/write paths?
4. Execute
Direct, capability tool, or container sandbox
5. Record Trace
Hash-chained audit entry with timing
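
One way to picture how the five layers compose is a wrapper that runs the three checks in order, executes only if all pass, and records a trace entry either way. This is a minimal sketch under stated assumptions: the deck names `withValidation` but does not show its signature, so `withValidationSketch`, `Check`, `Verdict`, and `TraceRecord` are hypothetical.

```typescript
// Hypothetical types; the real withValidation signature is not shown in this deck.
type Verdict = { allowed: true } | { allowed: false; reason: string };

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

type Check = (call: ToolCall) => Verdict;

interface TraceRecord {
  tool: string;
  status: "ok" | "blocked" | "error";
  reason?: string;
  durationMs: number;
}

function withValidationSketch<T>(
  checks: Check[],                // layers 1-3, evaluated in order
  execute: (call: ToolCall) => T, // layer 4
  trace: TraceRecord[],           // layer 5 sink
): (call: ToolCall) => T | undefined {
  return (call) => {
    const start = Date.now();
    for (const check of checks) {
      const verdict = check(call);
      if (verdict.allowed === false) {
        // First failing layer stops the call; nothing is executed.
        trace.push({ tool: call.tool, status: "blocked", reason: verdict.reason, durationMs: Date.now() - start });
        return undefined;
      }
    }
    try {
      const result = execute(call);
      trace.push({ tool: call.tool, status: "ok", durationMs: Date.now() - start });
      return result;
    } catch (err) {
      trace.push({ tool: call.tool, status: "error", reason: String(err), durationMs: Date.now() - start });
      throw err;
    }
  };
}
```

The point of the ordering is that a blocked call still produces a trace entry, so denials are as auditable as successes.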

Organization Policy

You define the rules. Not the AI.

Manager says:

"Agents can read and write code in src/ and test/. They cannot access credentials, run curl, or open network connections. Limit to 30 steps."

Policy file:

{
  "paths": {
    "allowedWritePatterns": ["./src/**", "./test/**"],
    "deniedPatterns": ["**/.env", "**/secrets/**"]
  },
  "bash": { "mode": "capabilities-only" },
  "deniedTools": ["WebFetch", "WebSearch"],
  "limitCaps": { "maxTotalSteps": 30 }
}

JSON file, checked into your repo, version-controlled, auditable.
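
How the `paths` section might be evaluated can be sketched with a small glob matcher. The field names `allowedWritePatterns` and `deniedPatterns` come from the policy file above; `globToRegExp`, `checkPath`, the deny-before-allow ordering, and the rule that reads are allowed unless denied are assumptions made for illustration. A real implementation would more likely use an established glob library.

```typescript
import { posix } from "node:path";

// Field names mirror the policy file above; everything else is hypothetical.
interface PathPolicy {
  allowedWritePatterns: string[];
  deniedPatterns: string[];
}

// Translate a small glob subset (`**`, `*`, literals) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const pattern = glob.replace(/^\.\//, "");
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more whole directories
      i += 3;
    } else if (pattern.startsWith("**", i)) {
      source += ".*"; // anything, including slashes
      i += 2;
    } else if (pattern.charAt(i) === "*") {
      source += "[^/]*"; // anything within one path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += pattern.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

function checkPath(
  policy: PathPolicy,
  rawPath: string,
  mode: "read" | "write",
): { allowed: boolean; reason: string } {
  // Normalize first so "src/../.env" cannot slip past a pattern.
  const path = posix.normalize(rawPath);
  const matches = (globs: string[]) => globs.some((g) => globToRegExp(g).test(path));
  if (matches(policy.deniedPatterns)) {
    return { allowed: false, reason: "matches denied pattern" };
  }
  if (mode === "write" && !matches(policy.allowedWritePatterns)) {
    return { allowed: false, reason: "path not in allowed write paths" };
  }
  return { allowed: true, reason: "ok" };
}
```

With the policy above, `checkPath(policy, ".env", "read")` and `checkPath(policy, "/etc/crontab", "write")` are both refused, while a write to `./src/index.ts` passes.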

What Gets Blocked

Agent attempts                 | Result   | Why
validated_read(.env)           | BLOCKED  | Matches denied pattern **/.env
validated_bash("curl ...")     | BLOCKED  | Bash restricted to capabilities-only mode
validated_write(/etc/crontab)  | BLOCKED  | Path not in allowed write paths
Calls Read when plan says Edit | ABORTED  | Tool doesn't match plan step
Plan with 100 steps            | REJECTED | Exceeds policy cap of 30
validated_npm("test")          | ALLOWED  | Structured tool, no shell injection

Bash: Three Tiers of Defense

Tier 1: Capability Tools

validated_npm, validated_git, validated_node, validated_tsc

spawnSync with array args — no shell, no injection
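
Why array arguments rule out shell injection can be shown with Node's `spawnSync` directly. In this sketch, `runWithoutShell` is a hypothetical helper, not one of the deck's capability tools; it launches the current Node binary so the example has no external dependencies.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run a program with an argument array and no shell.
function runWithoutShell(program: string, args: string[]): { status: number | null; stdout: string } {
  const result = spawnSync(program, args, { shell: false, encoding: "utf8" });
  return { status: result.status, stdout: result.stdout };
}

// The semicolon below is never interpreted: no shell exists to split on it,
// so the whole string reaches the child process as one literal argument.
const echoed = runWithoutShell(process.execPath, [
  "-e",
  "console.log(process.argv[1])",
  "test; curl example.com",
]);
```

The child prints its argument back unchanged, where the same string passed through a shell would have run `curl` as a second command.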

Tier 2: Policy-Checked Bash

Command parsing, allow/deny lists, regex patterns, network detection

Catches accidental violations
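
A rough sketch of what policy-checked bash could look like follows. The deck does not show the allow/deny field names, so `BashPolicy`, its fields, and `checkBash` are hypothetical. The parsing is deliberately naive: it ignores quoting and command substitution, which is part of why this tier only catches accidental violations.

```typescript
// Hypothetical policy shape: the deck does not show the bash allow/deny fields.
interface BashPolicy {
  allowedCommands: string[];
  deniedCommands: string[];
  deniedPatterns: RegExp[]; // use patterns without the "g" flag; test() is stateful with it
}

function checkBash(policy: BashPolicy, command: string): { allowed: boolean; reason: string } {
  for (const pattern of policy.deniedPatterns) {
    if (pattern.test(command)) {
      return { allowed: false, reason: `matches denied pattern ${pattern}` };
    }
  }
  // Split on shell separators so every sub-command is checked, not just the first.
  const segments = command
    .split(/&&|\|\||[;|&\n]/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  for (const segment of segments) {
    const first = segment.split(/\s+/)[0] ?? "";
    const program = first.split("/").pop() ?? first; // "/usr/bin/curl" -> "curl"
    if (policy.deniedCommands.includes(program)) {
      return { allowed: false, reason: `command "${program}" is denied` };
    }
    if (!policy.allowedCommands.includes(program)) {
      return { allowed: false, reason: `command "${program}" is not on the allow list` };
    }
  }
  return { allowed: true, reason: "ok" };
}
```

Note what this check cannot see: if `python` is on the allow list, `python -c "import socket"` passes, because the network access lives inside the interpreter's argument rather than in the command name.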

Tier 3: Container Sandbox

Docker with --network=none, read-only filesystem, resource limits

Kernel enforces what string parsing can't

Key insight: python -c "import socket; ..." bypasses string analysis. Only the kernel (container) can stop it.

Container Sandbox

Kernel-level enforcement

  • --network=none: no network interfaces except loopback, so outbound connections fail inside the kernel
  • --read-only: container root filesystem is mounted read-only
  • --memory=512m: process is OOM-killed if the limit is exceeded
  • --pids-limit=100: fork bomb protection
  • Volumes derived from policy paths automatically

Device access (opt-in)

  • --gpus all for ML training
  • --device /dev/video0 for camera
  • Port publishing from policy: -p 8080:8080

{
  "container": {
    "enabled": true,
    "image": "python:3.12-slim",
    "networkMode": "none",
    "readOnly": true,
    "memoryLimit": "512m",
    "deriveVolumesFromPolicy": true,
    "devices": {
      "gpu": true
    }
  }
}
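
How such a config could map onto `docker run` flags is sketched below. The fields `image`, `networkMode`, `readOnly`, `memoryLimit`, and `devices.gpu` follow the JSON above; `pidsLimit`, `volumes`, and the `buildDockerArgs` function are hypothetical additions for illustration. Building an argument array rather than a command string keeps the launch itself free of shell parsing.

```typescript
// Fields follow the JSON above, except pidsLimit and volumes, which are hypothetical.
interface VolumeMount {
  hostPath: string;
  containerPath: string;
  writable: boolean;
}

interface ContainerConfig {
  image: string;
  networkMode: string;
  readOnly: boolean;
  memoryLimit: string;
  pidsLimit?: number;
  devices?: { gpu?: boolean };
  volumes?: VolumeMount[];
}

// Build the argument array for `docker run`; the caller passes it to spawnSync("docker", args).
function buildDockerArgs(config: ContainerConfig, command: string[]): string[] {
  const args = ["run", "--rm", `--network=${config.networkMode}`, `--memory=${config.memoryLimit}`];
  if (config.readOnly) {
    args.push("--read-only");
  }
  if (config.pidsLimit !== undefined) {
    args.push(`--pids-limit=${config.pidsLimit}`);
  }
  if (config.devices?.gpu) {
    args.push("--gpus", "all");
  }
  for (const volume of config.volumes ?? []) {
    // Mounts are read-only unless the policy grants write access to that path.
    const access = volume.writable ? "rw" : "ro";
    args.push("-v", `${volume.hostPath}:${volume.containerPath}:${access}`);
  }
  args.push(config.image, ...command);
  return args;
}
```

In this sketch, a volume derived from an allowed write pattern would be mounted `rw` and everything else `ro`, so the container's view of the repository matches the path policy.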

Tamper-Evident Audit Trail

Every action recorded as a hash-chained entry. Tampering breaks the chain.

Entry 0 (Read): hash a3f2...
Entry 1 (Edit): prev a3f2, hash 7b1e...
Entry 2 (Edit): prev 7b1e, hash c9d4...
Verification: chainValid: true (all hashes recomputed ✓)

Each entry records:

Step index, tool, input parameters, output (truncated), duration, status, SHA-256 hash

Aggregate metrics: total steps, duration, success/fail counts
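
The hash-chaining idea can be sketched in a few lines with Node's built-in `crypto` module. The `ChainEntry` shape, the all-zero genesis value, and the choice to hash a JSON array of fields are assumptions; the deck only states that entries are SHA-256 hash-chained. A real entry would also cover the input, output, and duration fields listed above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, reduced entry shape for illustration.
interface ChainEntry {
  index: number;
  tool: string;
  status: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // assumed placeholder for "no previous entry"

function hashFields(index: number, tool: string, status: string, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify([index, tool, status, prevHash])).digest("hex");
}

// Append an entry whose hash covers its own fields plus the previous entry's hash.
function appendEntry(chain: ChainEntry[], tool: string, status: string): ChainEntry {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const index = chain.length;
  const entry: ChainEntry = { index, tool, status, prevHash, hash: hashFields(index, tool, status, prevHash) };
  chain.push(entry);
  return entry;
}

// Recompute every hash and check each link; any edited entry breaks the chain.
function verifyChain(chain: ChainEntry[]): boolean {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (entry === undefined) return false;
    if (entry.index !== i || entry.prevHash !== expectedPrev) return false;
    if (entry.hash !== hashFields(entry.index, entry.tool, entry.status, entry.prevHash)) return false;
    expectedPrev = entry.hash;
  }
  return true;
}
```

A chain like this detects edits to individual entries, but someone who can rewrite the whole log can also recompute every hash, so the latest hash needs to be stored somewhere the log's writer cannot modify.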

Adoption: One Command

npx mcp-plan-validation init --client claude --policy dev

Client         | Flag             | Files
Claude Code    | --client claude  | .claude/settings.local.json, CLAUDE.md
GitHub Copilot | --client copilot | .vscode/mcp.json, .github/copilot-instructions.md
Cursor         | --client cursor  | .cursor/mcp.json, .cursorrules

--client all configures all three simultaneously. Policy file is shared.

Auto-detects from existing config directories when --client is omitted.

Policy Templates

Template | Bash                  | Network   | Container | Step Limit | Best For
strict   | Capability tools only | Blocked   | No        | 30         | Code review, docs
dev      | Policy-checked        | Dev ports | Optional  | 50         | Daily development
ml       | Policy-checked        | Blocked   | GPU, 16GB | 100        | ML training
ci       | Limited commands      | Blocked   | Read-only | 20         | CI/CD pipelines

Each template is a starting point. Customize by editing .plan-validation-policy.json.

Test Coverage

Unit Tests (111 assertions)

  • Bash command parsing (11 cases)
  • Command allow/deny/regex (14 cases)
  • Path policy with glob matching (8 cases)
  • Tool policy enforcement (6 cases)
  • Postcondition evaluation (26 cases)
  • Path resolution (6 cases)
  • Plan permissions (6 cases)
  • Capability tools + modes (9 cases)
  • Container volume/port/device derivation (25 cases)

Integration Tests (5 scenarios)

  • Happy path: plan → execute → verify
  • Block detection: wrong tool caught
  • Input constraints: mismatched inputs
  • Policy enforcement: curl blocked at runtime
  • Postconditions: verified after completion

Integration tests use the Anthropic Agent SDK to drive Claude Code through the full pipeline.

Known Limitations

Limitation                   | Impact                               | Mitigation
Bash analysis is undecidable | python -c bypasses string checks     | Container sandbox (kernel enforcement)
No semantic predicates       | Can't verify "function exists"       | File-content predicates; AST parsing planned
Container needs Docker       | Not everywhere                       | Capability tools work without Docker
Soft enforcement             | Model could use raw tools if exposed | Configure client to only expose MCP tools
Trace output truncated       | Large outputs cut to 500 chars       | Sufficient for audit; full data in bindings

Summary

Question                     | Answer
Can we restrict agents?      | Yes: org policy controls tools, paths, commands, network, limits
Prevent credential exposure? | Yes: denied patterns block access to sensitive files
Audit trail?                 | Yes: hash-chained trace with integrity verification
Works with existing tools?   | Yes: Claude Code, Copilot, Cursor with one command
Who controls policy?         | Your team: JSON file in the repo, not the AI
What if AI breaks rules?     | Blocked in real time, logged, plan violations abort
How hard to adopt?           | npx mcp-plan-validation init --policy dev