---
title: Context Engineering — Why AI Context Management Matters
description: Understanding LLM recency bias, context windows, and why /clear is an engineering practice, not just a step
author: Microsoft
ms.date: 2026-02-19
ms.topic: concept
keywords:
  - context engineering
  - recency bias
  - context window
  - clear context
  - compact
  - rpi-agent
  - phase skipping
estimated_reading_time: 7
---

You invoke `/rpi` to add a new feature. The agent researches your codebase, builds a plan, implements methodically, and produces clean, well-structured code. You're impressed. Then, in the same conversation, you ask for a second feature: "Now add input validation to the API endpoint."

The agent skips research entirely, skips planning, and jumps straight to writing code. The output compiles. Tests pass. But the validation logic misses three edge cases that a research phase would have uncovered, ignores the validation patterns already established in your codebase, and introduces a naming convention that contradicts every other validator in the project.

It looks right but produces shallow work. The problem isn't the AI's capability. The problem is what the AI can _see_.

## The Root Cause: LLM Recency Bias

Large language models process conversations as sequences of tokens with limited attention. Every message you send and every response you receive becomes part of that sequence, competing for the model's focus.

At the start of a conversation, system prompt instructions occupy roughly 3K tokens. The model follows them closely because they represent most of what it can see. After a full RPI cycle, the conversation has grown to 50K, 100K, or even 200K tokens of implementation output: code, file contents, tool results, and validation logs.

Those 3K tokens of instructions now compete against 50K+ tokens of recent implementation context. The model doesn't forget the instructions. It deprioritizes them because more recent tokens receive disproportionate attention weight.
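
To make the ratio concrete, here is a minimal Python sketch that computes what fraction of the context a fixed 3K-token instruction block occupies as history grows. Token share is only a rough proxy for attention: real models do not weight tokens by simple proportion, so treat the output as an illustration of the trend rather than a measurement.

```python
# Token share is a rough proxy for attention; real models do not
# weight tokens by simple proportion.
SYSTEM_PROMPT_TOKENS = 3_000  # instruction size used in this page's example


def instruction_share(history_tokens: int, prompt_tokens: int = SYSTEM_PROMPT_TOKENS) -> float:
    """Return the fraction of the context occupied by the instructions."""
    if history_tokens < 0 or prompt_tokens <= 0:
        raise ValueError("history must be non-negative and the prompt must be positive")
    return prompt_tokens / (prompt_tokens + history_tokens)


if __name__ == "__main__":
    for history in (0, 7_000, 50_000, 100_000, 200_000):
        share = instruction_share(history)
        print(f"{history:>7,} history tokens -> instructions are {share:.1%} of context")
```

With 50K tokens of history, the instructions drop to under 6% of the context, which is the imbalance that lets recent implementation output dominate.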

The result is predictable. After completing one implementation cycle, the dominant pattern in the conversation is "implement and validate." When you make a new request, the model pattern-matches to that dominant behavior rather than re-reading the phase ordering instructions that tell it to start with research.

> [!WARNING]
> A concrete failure sequence:
>
> 1. First `/rpi` request works correctly, executing all 5 phases in order
> 2. Conversation grows to 50K+ tokens with implementation output, file contents, and tool results
> 3. Second `/rpi` request skips directly to Phase 3 (implementation), producing shallow output that misses edge cases

## What Context Engineering Is

Context engineering is the practice of deliberately managing what information an AI model can see when processing a request. Instead of treating the conversation as a growing log that the model will "figure out," you treat context as a finite resource that requires active management.

Four concepts define the discipline:

* Context window: the total token capacity a model considers when generating a response. Many current models offer windows in the 128K to 200K token range, and some offer more, but performance degrades well before the limit.
* Token budget: how those tokens are distributed among system prompt instructions, conversation history, and tool outputs. A 200K context window doesn't mean 200K tokens of useful capacity. System prompts, conversation scaffolding, and tool metadata all consume tokens before your actual content arrives.
* Conversation length degradation: instruction adherence drops as conversations grow. A 3K system prompt that dominates a 10K conversation (30% of tokens) becomes background noise in a 200K conversation (1.5% of tokens).
* The gap between "using AI tools" and "engineering with AI tools": using AI means typing requests and accepting outputs. Engineering with AI means controlling the inputs, managing the context, and understanding how the model's behavior changes as conversations evolve.
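
The token budget idea can be sketched as a small data structure. The overhead figures below (scaffolding and tool metadata) are hypothetical placeholders chosen for illustration; actual overhead depends on the model, the agent, and the tools enabled in a session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    """Illustrative split of a context window. Overhead values are hypothetical."""

    window: int         # total tokens the model can consider
    system_prompt: int  # agent instructions
    scaffolding: int    # chat formatting and conversation structure
    tool_metadata: int  # tool names, descriptions, and schemas

    def overhead(self) -> int:
        """Tokens consumed before any of your content arrives."""
        return self.system_prompt + self.scaffolding + self.tool_metadata

    def available_for_content(self) -> int:
        """Tokens left for history, file contents, and tool results."""
        return max(self.window - self.overhead(), 0)


budget = TokenBudget(window=200_000, system_prompt=3_000, scaffolding=2_000, tool_metadata=10_000)
print(budget.available_for_content())  # 185000
```

Here, 15K tokens of hypothetical overhead leave 185K tokens for history, file contents, and tool results, and the instruction share keeps shrinking as that space fills.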

## Why /clear Works

`/clear` removes competing signals. The mechanism is straightforward:

* It eliminates the 50K to 200K tokens of accumulated implementation context that cause recency bias.
* It restores the token ratio so that system prompt instructions dominate the model's attention again.
* Each phase gets a clean context where its specific instructions receive full attention weight.
* Artifacts (research documents, plans, implementation logs) carry context through files on disk, not through chat history.

Starting a new chat achieves the same result through a different mechanism: `/clear` keeps you in the same editor window, while a new chat creates a fresh session. Both approaches reset the token ratio, and the outcome is identical: the model sees its instructions clearly because nothing competes for attention.
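
The mechanism can be modeled with a toy chat session. This is a hypothetical sketch, not how Copilot Chat stores context: `ChatSession`, `send`, and `clear` are illustrative names, and `estimate_tokens` counts words as a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


@dataclass
class ChatSession:
    """Toy model of a chat context. Not how Copilot Chat is implemented."""

    system_prompt: str
    history: list[str] = field(default_factory=list)

    def send(self, message: str) -> None:
        """Every message and response becomes part of the context."""
        self.history.append(message)

    def clear(self) -> None:
        """Like /clear: drop conversation history but keep the instructions."""
        self.history.clear()

    def instruction_share(self) -> float:
        """Fraction of estimated context tokens that belong to the instructions."""
        prompt = estimate_tokens(self.system_prompt)
        total = prompt + sum(estimate_tokens(m) for m in self.history)
        return prompt / total if total else 1.0


if __name__ == "__main__":
    session = ChatSession(system_prompt="Start with research, then plan, then implement.")
    session.send("implementation output " * 5_000)  # a long implementation transcript
    print(f"before /clear: {session.instruction_share():.1%}")
    session.clear()
    # Context now returns through an artifact on disk, not through chat history.
    session.send("Contents of the research artifact opened in the editor")
    print(f"after /clear:  {session.instruction_share():.1%}")
```

The instructions never change in this model; only the volume of competing history does, which is why clearing restores their share.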

## Restoring Context After /clear

`/clear` removes chat history, but agents still need the artifacts from prior phases. Those artifacts live in `.copilot-tracking/` (gitignored), not in chat history, so they survive the clear. You need to bring them back into the agent's view.

Two mechanisms work reliably:

* Open the file in the editor before invoking the next agent. Copilot Chat reads files visible in the active editor tab.
* Reference the file path explicitly in your prompt message so the agent knows where to look.

### What to Open at Each Transition

| Transition              | Open or Reference                                                            |
|-------------------------|------------------------------------------------------------------------------|
| Research → Plan         | `.copilot-tracking/research/<topic>-research.md`                             |
| Plan → Implement        | `.copilot-tracking/plans/<topic>-plan.instructions.md`                       |
| Implement → Review      | `.copilot-tracking/changes/<topic>-changes.md` (plan and research also help) |
| Review → Rework/Iterate | `.copilot-tracking/reviews/<topic>-review.md`                                |
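
If you script your prompts or keep a personal checklist, the table maps directly to a lookup. The sketch below is a hypothetical helper, not part of hve-core; it only fills `<topic>` into the path patterns from the table and fails loudly for a transition the table does not cover. The topic `input-validation` is an invented example.

```python
from pathlib import Path

TRACKING_DIR = Path(".copilot-tracking")

# Path patterns from the transition table; {topic} replaces <topic>.
ARTIFACT_PATTERNS = {
    ("research", "plan"): "research/{topic}-research.md",
    ("plan", "implement"): "plans/{topic}-plan.instructions.md",
    ("implement", "review"): "changes/{topic}-changes.md",
    ("review", "rework"): "reviews/{topic}-review.md",
    ("review", "iterate"): "reviews/{topic}-review.md",
}


def artifact_for(from_phase: str, to_phase: str, topic: str) -> Path:
    """Return the artifact to open or reference when moving between phases."""
    key = (from_phase.lower(), to_phase.lower())
    if key not in ARTIFACT_PATTERNS:
        raise ValueError(f"no artifact listed for {from_phase} -> {to_phase}")
    return TRACKING_DIR / ARTIFACT_PATTERNS[key].format(topic=topic)


if __name__ == "__main__":
    path = artifact_for("Plan", "Implement", "input-validation")
    print(f"Open {path} in the editor, or reference it in your prompt.")
```

Raising on an unknown transition, instead of guessing a nearby file, mirrors the advice to bring the exact artifact into view.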

The `/task-*` prompts attempt to auto-discover recent artifacts in `.copilot-tracking/`, but opening the file in the editor is more reliable, especially when multiple artifacts exist for different topics.
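
The ambiguity problem is easy to see in code. This hypothetical sketch is not how the `/task-*` prompts locate files; it lists artifacts by modification time and refuses to guess when more than one candidate exists, which is exactly the situation where opening the file yourself is the safer choice.

```python
from pathlib import Path


def recent_artifacts(folder: Path, pattern: str = "*.md", limit: int = 5) -> list[Path]:
    """List artifacts in one tracking subfolder, newest first.

    Hypothetical helper: it does not reproduce how the /task-* prompts
    discover files.
    """
    if not folder.is_dir():
        return []
    files = [p for p in folder.glob(pattern) if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


def pick_artifact(folder: Path) -> Path:
    """Return the only artifact in the folder, or refuse to guess."""
    candidates = recent_artifacts(folder)
    if len(candidates) != 1:
        names = ", ".join(p.name for p in candidates) or "none"
        raise LookupError(
            f"expected one artifact in {folder}, found: {names}; open the right file explicitly"
        )
    return candidates[0]


if __name__ == "__main__":
    try:
        print(pick_artifact(Path(".copilot-tracking/research")))
    except LookupError as err:
        print(err)
```

A "newest file wins" rule would silently pick the wrong topic whenever you touched another research document more recently; failing instead keeps the choice with you.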

> [!TIP]
> For longer workflows spanning multiple sessions, use the **memory** agent to persist working state (file paths, decisions, progress) and the `/checkpoint` prompt to save and restore session context.

## The /compact Alternative

`/compact` takes a different approach. Instead of removing conversation history entirely, it summarizes the history into a condensed form that preserves key context while reducing the token count.

When to use `/compact`:

* Mid-phase, when a conversation grows long but you need to continue the current task
* When you want to retain awareness of prior decisions without carrying the full token weight
* When handoff buttons between phases embed transition context into the summary prompt

When to use `/clear` instead:

* Between phases, where each phase benefits from clean context
* When switching to a different task entirely
* When agent behavior has visibly degraded

The tradeoff is precision. `/compact` summaries lose detail because the model decides what to keep and what to discard. Critical nuances from earlier in the conversation may not survive the summarization.
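
A toy version of compaction shows where detail goes missing. The real `/compact` relies on the model to decide what to keep; in this hypothetical sketch a fixed first-sentence rule stands in for that judgment, so the specific loss differs, but the effect is the same kind: anything the summarizer drops is gone for the rest of the conversation.

```python
def naive_summary(message: str) -> str:
    """Stand-in summarizer: keep the first sentence and drop the rest."""
    first = message.split(". ")[0].rstrip(".")
    return first + "."


def compact(history: list[str], keep_recent: int = 2) -> list[str]:
    """Toy /compact: summarize older messages, keep recent ones verbatim."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:
        return list(recent)
    summary = "Summary: " + " ".join(naive_summary(m) for m in older)
    return [summary, *recent]


if __name__ == "__main__":
    history = [
        "Use snake_case for validators. Null inputs must return 400, not 500.",
        "Added the email format check.",
        "Tests pass.",
    ]
    for line in compact(history, keep_recent=2):
        print(line)
    # The rule about null inputs did not survive the summary.
```

The naming convention survives, but the null-input rule does not, and nothing in the compacted history signals that it ever existed.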

| Command    | Effect                             | Use When                             |
|------------|------------------------------------|--------------------------------------|
| `/clear`   | Removes all conversation history   | Between phases, switching tasks      |
| `/compact` | Summarizes history, reduces tokens | Mid-phase, conversation growing long |
| New chat   | Fresh conversation, new context    | Starting unrelated work              |

## The rpi-agent Difference

rpi-agent runs all five phases in a single conversation. This design choice prioritizes convenience: one invocation handles everything. It also creates a specific vulnerability to context degradation.

With strict RPI, mandatory `/clear` commands between phases prevent token accumulation. Each phase starts fresh. The research agent never sees implementation tokens. The implementation agent never sees research exploration tokens.

With rpi-agent, tokens accumulate across all phases within one session. The first request works well because the conversation is short and instructions dominate. Subsequent requests in the same session face the full recency bias effect: 50K+ tokens of prior work competing against 3K tokens of phase ordering instructions.

The phase ordering instruction is advisory. It exists as prose in the agent's system prompt, not as a programmatic constraint. When recency bias shifts the model's attention toward recent implementation patterns, the advisory instruction loses its influence.
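
The difference between the two workflows can be simulated with token share as a rough proxy for attention. The per-phase token counts below are hypothetical, and the model ignores the artifact tokens you reopen after `/clear`, so it slightly overstates strict RPI's advantage; the trend is what matters.

```python
PROMPT_TOKENS = 3_000  # phase-ordering instructions in the system prompt


def share_at_phase_starts(phase_outputs: list[int], clear_between: bool) -> list[float]:
    """Instruction share of the context at the start of each phase.

    phase_outputs: hypothetical tokens each phase adds to the chat.
    clear_between=True models strict RPI (/clear before every phase);
    False models rpi-agent, where tokens accumulate in one session.
    Artifact tokens reopened after /clear are ignored for simplicity.
    """
    shares: list[float] = []
    history = 0
    for output in phase_outputs:
        if clear_between:
            history = 0
        shares.append(PROMPT_TOKENS / (PROMPT_TOKENS + history))
        history += output
    return shares


if __name__ == "__main__":
    one_request = [12_000, 6_000, 25_000, 5_000, 2_000]  # hypothetical, five phases
    two_requests = one_request * 2  # a second /rpi request in the same chat
    for label, clear in (("rpi-agent, one session", False), ("strict RPI with /clear", True)):
        shares = share_at_phase_starts(two_requests, clear_between=clear)
        print(f"{label}: second request starts at {shares[5]:.1%} instructions")
```

With these numbers, one request adds 50K tokens, so in a single session the second request's first phase starts with instructions at about 5.7% of the context, while strict RPI starts every phase with only instructions in view.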

> [!TIP]
> Use `/clear` or `/compact` before making a second `/rpi` request in the same conversation.

## Recognizing Context Degradation

Context degradation produces observable symptoms. Catching them early prevents wasted effort.

* The agent skips phases. It jumps from your request directly to writing code, bypassing research and planning entirely.
* The agent ignores explicit instructions from its system prompt. Phase ordering, formatting rules, or convention requirements disappear from the output.
* Output quality drops. Analysis becomes shallow, edge cases go unaddressed, and the agent repeats the same patterns instead of investigating alternatives.
* The agent echoes earlier conversation patterns. Instead of following new instructions for a new task, it reproduces the structure and approach of the previous task.
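
One symptom, skipped phases, can leave a trace on disk. Assuming the agent writes artifacts using the file names in the transition table earlier on this page, this hypothetical check flags a topic that has a changes log but no research or plan artifact. It is a heuristic only: it cannot detect a phase that ran but produced shallow output, and stale files from an earlier run with the same topic name will mask a skip.

```python
from pathlib import Path

TRACKING = Path(".copilot-tracking")


def missing_phase_artifacts(topic: str, root: Path = TRACKING) -> list[str]:
    """Return earlier-phase artifacts missing for a topic that already has changes.

    A changes log without research or plan artifacts is one concrete sign
    that the agent jumped straight to implementation.
    """
    expected = {
        "research": root / "research" / f"{topic}-research.md",
        "plan": root / "plans" / f"{topic}-plan.instructions.md",
    }
    changes = root / "changes" / f"{topic}-changes.md"
    if not changes.exists():
        return []  # nothing implemented yet, so nothing was skipped
    return [phase for phase, path in expected.items() if not path.exists()]


if __name__ == "__main__":
    skipped = missing_phase_artifacts("input-validation")  # hypothetical topic
    if skipped:
        print(f"Possible phase skipping, missing: {', '.join(skipped)}. Clear context and rerun.")
```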

## Common Pitfalls

| Pitfall                                | What Happens                              | Solution                                    |
|----------------------------------------|-------------------------------------------|---------------------------------------------|
| Multiple `/rpi` calls without clearing | Recency bias causes phase skipping        | Use `/clear` before each new `/rpi` request |
| Long accumulated sessions              | Token budget consumed by history          | Use `/compact` or start a new chat          |
| Mixing unrelated tasks                 | Cross-contamination between task contexts | Use `/clear` between different tasks        |
| Ignoring degradation signs             | Progressively worse output quality        | Recognize the signs and clear context       |

## Next Steps

* [Why RPI?](why-rpi.md): the psychology behind phase separation
* [RPI Overview](README.md): complete workflow guide
* [Using Tasks Together](using-together.md): phase transitions and handoffs

---

<!-- markdownlint-disable MD036 -->
_🤖 Crafted with precision by ✨Copilot following brilliant human instruction,
then carefully refined by our team of discerning human reviewers._
<!-- markdownlint-enable MD036 -->
153