---
title: Context Engineering — Why AI Context Management Matters
description: Understanding LLM recency bias, context windows, and why /clear is an engineering practice, not just a step
author: Microsoft
ms.date: 2026-02-19
ms.topic: concept
keywords:
- context engineering
- recency bias
- context window
- clear context
- compact
- rpi-agent
- phase skipping
estimated_reading_time: 7
---

You invoke `/rpi` to add a new feature. The agent researches your codebase, builds a plan, implements methodically, and produces clean, well-structured code. You're impressed. Then, in the same conversation, you ask for a second feature: "Now add input validation to the API endpoint."

The agent skips research entirely, skips planning, and jumps straight to writing code. The output compiles. Tests pass. But the validation logic misses three edge cases that a research phase would have uncovered, ignores the validation patterns already established in your codebase, and introduces a naming convention that contradicts every other validator in the project.

It looks right but produces shallow work. The problem isn't the AI's capability. The problem is what the AI can _see_.

## The Root Cause: LLM Recency Bias

Large language models process conversations as sequences of tokens with limited attention. Every message you send and every response you receive becomes part of that sequence, competing for the model's focus.

At the start of a conversation, system prompt instructions occupy roughly 3K tokens. The model follows them closely because they represent most of what it can see. After a full RPI cycle, the conversation has grown to 50K, 100K, or even 200K tokens of implementation output: code, file contents, tool results, and validation logs.

Those 3K tokens of instructions now compete against 50K+ tokens of recent implementation context. The model doesn't forget the instructions. It deprioritizes them because more recent tokens receive disproportionate attention weight.

The result is predictable. After completing one implementation cycle, the dominant pattern in the conversation is "implement and validate." When you make a new request, the model pattern-matches to that dominant behavior rather than re-reading the phase ordering instructions that tell it to start with research.

> [!WARNING]
> A concrete failure sequence:
>
> 1. First `/rpi` request works correctly, executing all 5 phases in order
> 2. Conversation grows to 50K+ tokens with implementation output, file contents, and tool results
> 3. Second `/rpi` request skips directly to Phase 3 (implementation), producing shallow output that misses edge cases

## What Context Engineering Is

Context engineering is the practice of deliberately managing what information an AI model can see when processing a request. Instead of treating the conversation as a growing log that the model will "figure out," you treat context as a finite resource that requires active management.

Four concepts define the discipline:

* Context window: the total token capacity a model considers when generating a response. Many current models offer 128K to 200K tokens, and some offer more, but performance degrades well before the limit.
* Token budget: how those tokens are distributed among system prompt instructions, conversation history, and tool outputs. A 200K context window doesn't mean 200K tokens of useful capacity. System prompts, conversation scaffolding, and tool metadata all consume tokens before your actual content arrives.
* Conversation length degradation: instruction adherence drops as conversations grow. A 3K system prompt that dominates a 10K conversation (30% of tokens) becomes background noise in a 200K conversation (1.5% of tokens).
* The gap between "using AI tools" and "engineering with AI tools": using AI means typing requests and accepting outputs. Engineering with AI means controlling the inputs, managing the context, and understanding how the model's behavior changes as conversations evolve.

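The percentages in the conversation length bullet are plain arithmetic. The sketch below recomputes them for the conversation sizes this page mentions, assuming a 3K-token system prompt. It measures raw token share only. It is a proxy, not a model of how attention weight is actually distributed.

```python
def instruction_share(system_prompt_tokens: int, total_tokens: int) -> float:
    """Fraction of the context occupied by system prompt instructions."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= system_prompt_tokens <= total_tokens:
        raise ValueError("system prompt must fit inside the total context")
    return system_prompt_tokens / total_tokens


if __name__ == "__main__":
    # Conversation sizes mentioned on this page, with a 3K-token system prompt.
    for total in (10_000, 50_000, 100_000, 200_000):
        share = instruction_share(3_000, total)
        print(f"{total:>7,} tokens: instructions are {share:.1%} of the context")
```

Running it prints shares of 30.0%, 6.0%, 3.0%, and 1.5%. The same 3K tokens of instructions shrink twentyfold in relative terms as the conversation grows.
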
## Why /clear Works

`/clear` removes competing signals. The mechanism is straightforward:

* It eliminates the 50K to 200K tokens of accumulated implementation context that cause recency bias.
* It restores the token ratio so that system prompt instructions dominate the model's attention again.
* Each phase gets a clean context where its specific instructions receive full attention weight.
* Artifacts (research documents, plans, implementation logs) carry context through files on disk, not through chat history.

Starting a new chat achieves the same result through a different mechanism. Both approaches reset the token ratio. `/clear` keeps you in the same editor window. A new chat creates a fresh session. The outcome is identical: the model sees instructions clearly because nothing competes for attention.

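The token-ratio effect can be modeled with a toy conversation object. This is a simplification for illustration only. It counts tokens, treats the system prompt as a fixed-size block, and makes no claim about how Copilot Chat actually stores or drops history. The `Conversation` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    """Toy model of a chat session: fixed instructions plus a growing history."""

    system_prompt_tokens: int
    history: List[int] = field(default_factory=list)  # token count per message

    def add(self, tokens: int) -> None:
        self.history.append(tokens)

    def total_tokens(self) -> int:
        return self.system_prompt_tokens + sum(self.history)

    def instruction_share(self) -> float:
        return self.system_prompt_tokens / self.total_tokens()

    def clear(self) -> None:
        # Models /clear (or a new chat): the history goes, the instructions stay.
        # Artifacts on disk live outside this object, so they are unaffected.
        self.history.clear()


if __name__ == "__main__":
    chat = Conversation(system_prompt_tokens=3_000)
    for tokens in (20_000, 30_000, 47_000):  # illustrative phase outputs
        chat.add(tokens)
    print(f"before /clear: {chat.instruction_share():.1%}")  # 3.0%
    chat.clear()
    print(f"after /clear:  {chat.instruction_share():.1%}")  # 100.0%
```
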
## Restoring Context After /clear

`/clear` removes chat history, but agents still need the artifacts from prior phases. Those artifacts live in `.copilot-tracking/` (gitignored), not in chat history, so they survive the clear. You need to bring them back into the agent's view.

Two mechanisms work reliably:

* Open the file in the editor before invoking the next agent. Copilot Chat reads files visible in the active editor tab.
* Reference the file path explicitly in your prompt message so the agent knows where to look.

### What to Open at Each Transition

| Transition              | Open or Reference                                                                        |
|-------------------------|------------------------------------------------------------------------------------------|
| Research → Plan         | `.copilot-tracking/research/<topic>-research.md`                                         |
| Plan → Implement        | `.copilot-tracking/plans/<topic>-plan.instructions.md`                                   |
| Implement → Review      | `.copilot-tracking/changes/<topic>-changes.md` (the plan and research files also help)   |
| Review → Rework/Iterate | `.copilot-tracking/reviews/<topic>-review.md`                                            |

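If you script your own workflow around these files, the table reduces to a lookup. The `artifact_for` helper below is hypothetical and not part of hve-core. It assumes `<topic>` is a simple, filename-safe name and returns the path you would open or paste into the next prompt.

```python
from pathlib import PurePosixPath

TRACKING_ROOT = PurePosixPath(".copilot-tracking")

# File to open at each transition, following the table above.
# Implement -> Review lists only the changes log; the plan and research also help.
_TRANSITION_ARTIFACTS = {
    ("research", "plan"): "research/{topic}-research.md",
    ("plan", "implement"): "plans/{topic}-plan.instructions.md",
    ("implement", "review"): "changes/{topic}-changes.md",
    ("review", "rework"): "reviews/{topic}-review.md",
    ("review", "iterate"): "reviews/{topic}-review.md",
}


def artifact_for(from_phase: str, to_phase: str, topic: str) -> PurePosixPath:
    """Hypothetical helper: path of the artifact to open before the next agent."""
    if not topic or "/" in topic:
        raise ValueError("topic must be a non-empty name without '/'")
    key = (from_phase.lower(), to_phase.lower())
    if key not in _TRANSITION_ARTIFACTS:
        raise ValueError(f"no artifact listed for {from_phase} -> {to_phase}")
    return TRACKING_ROOT / _TRANSITION_ARTIFACTS[key].format(topic=topic)


if __name__ == "__main__":
    # Paste the printed path into your prompt, or open it in the editor.
    print(artifact_for("Plan", "Implement", "input-validation"))
```

With the hypothetical topic `input-validation`, this prints `.copilot-tracking/plans/input-validation-plan.instructions.md`.
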
The `/task-*` prompts attempt to auto-discover recent artifacts in `.copilot-tracking/`, but opening the file in the editor is more reliable, especially when multiple artifacts exist for different topics.

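This page doesn't describe how the `/task-*` prompts discover artifacts. The sketch below is therefore a hypothetical stand-in, not their implementation. It contrasts a naive "newest file wins" heuristic with an explicit per-topic lookup, which shows how auto-discovery can pick the wrong file when two topics are in flight. Both function names are invented for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def newest_artifact(folder: Path, suffix: str) -> Optional[Path]:
    """Naive discovery: the most recently modified file whose name ends in suffix."""
    candidates = [p for p in folder.glob(f"*{suffix}") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def artifact_for_topic(folder: Path, topic: str, suffix: str) -> Optional[Path]:
    """Explicit lookup: the file for the named topic, or None if it is missing."""
    path = folder / f"{topic}{suffix}"
    return path if path.is_file() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        research = Path(tmp)
        # Two topics in flight; "billing" was touched most recently.
        for name, mtime in (("auth-research.md", 1_000), ("billing-research.md", 2_000)):
            (research / name).write_text("findings\n")
            os.utime(research / name, (mtime, mtime))
        print(newest_artifact(research, "-research.md").name)  # billing-research.md
        print(artifact_for_topic(research, "auth", "-research.md").name)  # auth-research.md
```

If you are working on `auth`, the naive heuristic hands you the `billing` research. Opening or naming the file yourself removes that ambiguity.
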
> [!TIP]
> For longer workflows spanning multiple sessions, use the memory agent to persist working state (file paths, decisions, progress) and the `/checkpoint` prompt to save and restore session context.

## The /compact Alternative

`/compact` takes a different approach. Instead of removing conversation history entirely, it summarizes the history into a condensed form that preserves key context while reducing the token count.

When to use `/compact`:

* Mid-phase, when a conversation grows long but you need to continue the current task
* When you want to retain awareness of prior decisions without carrying the full token weight
* When handoff buttons between phases embed transition context into the summary prompt

When to use `/clear` instead:

* Between phases, where each phase benefits from clean context
* When switching to a different task entirely
* When agent behavior has visibly degraded

The tradeoff is precision. `/compact` summaries lose detail because the model decides what to keep and what to discard. Critical nuances from earlier in the conversation may not survive the summarization.

| Command    | Effect                             | Use When                             |
|------------|------------------------------------|--------------------------------------|
| `/clear`   | Removes all conversation history   | Between phases, switching tasks      |
| `/compact` | Summarizes history, reduces tokens | Mid-phase, conversation growing long |
| New chat   | Fresh conversation, new context    | Starting unrelated work              |

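The guidance above reduces to a small decision rule. The sketch below is one hypothetical encoding of it. The flag names are invented for illustration. The precedence, where any reason to clear wins over compacting, is an assumption based on the advice that phase boundaries, task switches, and visible degradation all call for `/clear`.

```python
from enum import Enum


class ContextAction(Enum):
    CLEAR = "/clear"  # a new chat works equally well here
    COMPACT = "/compact"
    CONTINUE = "continue"


def choose_context_action(
    *,
    between_phases: bool = False,
    switching_task: bool = False,
    degraded: bool = False,
    conversation_long: bool = False,
) -> ContextAction:
    """One hypothetical encoding of the /clear vs /compact guidance."""
    # Any reason to start clean wins: phase boundaries, new tasks, visible degradation.
    if between_phases or switching_task or degraded:
        return ContextAction.CLEAR
    # Mid-phase and growing long: keep a summary of prior decisions instead.
    if conversation_long:
        return ContextAction.COMPACT
    return ContextAction.CONTINUE
```

For example, `choose_context_action(conversation_long=True, degraded=True)` returns `ContextAction.CLEAR`, because visible degradation is listed as a reason to use `/clear` rather than `/compact`.
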
## The rpi-agent Difference

rpi-agent runs all five phases in a single conversation. This design choice prioritizes convenience: one invocation handles everything. It also creates a specific vulnerability to context degradation.

With strict RPI, mandatory `/clear` commands between phases prevent token accumulation. Each phase starts fresh. The research agent never sees implementation tokens. The implementation agent never sees research exploration tokens.

With rpi-agent, tokens accumulate across all phases within one session. The first request works well because the conversation is short and instructions dominate. Subsequent requests in the same session face the full recency bias effect: 50K+ tokens of prior work competing against 3K tokens of phase ordering instructions.

The phase ordering instruction is advisory. It exists as prose in the agent's system prompt, not as a programmatic constraint. When recency bias shifts the model's attention toward recent implementation patterns, the advisory instruction loses its influence.

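For contrast, here is what a programmatic constraint could look like. The `PhaseGate` class below is hypothetical and not part of rpi-agent, which relies on prose instructions. It assumes a host program that records each phase as it finishes and refuses to start a phase whose predecessors are missing. It uses only the four phases named in the transition table on this page.

```python
from typing import List

# The four phases named in this page's transition table; rpi-agent runs five.
PHASE_ORDER = ("research", "plan", "implement", "review")


class PhaseOrderError(RuntimeError):
    """Raised when a phase starts before its prerequisites have finished."""


class PhaseGate:
    """Hypothetical programmatic constraint: order is enforced by code, not prose."""

    def __init__(self) -> None:
        self.completed: List[str] = []

    def run(self, phase: str) -> None:
        if phase not in PHASE_ORDER:
            raise ValueError(f"unknown phase: {phase!r}")
        prerequisites = PHASE_ORDER[: PHASE_ORDER.index(phase)]
        missing = [p for p in prerequisites if p not in self.completed]
        if missing:
            # Unlike a prompt instruction, this check does not depend on attention.
            raise PhaseOrderError(f"cannot run {phase!r} before {missing}")
        if phase not in self.completed:
            self.completed.append(phase)
```

A second request that jumps straight to `run("implement")` raises `PhaseOrderError` no matter how much implementation output sits in the conversation.
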
> [!TIP]
> Use `/clear` or `/compact` before making a second `/rpi` request in the same conversation.

## Recognizing Context Degradation

Context degradation produces observable symptoms. Catching them early prevents wasted effort.

* The agent skips phases. It jumps from your request directly to writing code, bypassing research and planning entirely.
* The agent ignores explicit instructions from its system prompt. Phase ordering, formatting rules, or convention requirements disappear from the output.
* Output quality drops. Analysis becomes shallow, edge cases go unaddressed, and the agent repeats the same patterns instead of investigating alternatives.
* The agent echoes earlier conversation patterns. Instead of following new instructions for a new task, it reproduces the structure and approach of the previous task.

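The first symptom is the easiest to check mechanically. The sketch below assumes you label each agent turn with the phase it belongs to. The labels are hypothetical and the agents do not emit them. Given that labeling, a few lines flag the phases that were skipped before implementation began.

```python
from typing import List

# Hypothetical phase labels you assign to each agent turn yourself.
EXPECTED_ORDER = ("research", "plan", "implement", "review")


def skipped_phases(observed: List[str]) -> List[str]:
    """Phases missing before the first implementation turn, in expected order."""
    if "implement" not in observed:
        return []
    seen_before = set(observed[: observed.index("implement")])
    required = EXPECTED_ORDER[: EXPECTED_ORDER.index("implement")]
    return [phase for phase in required if phase not in seen_before]


if __name__ == "__main__":
    first_request = ["research", "plan", "implement", "review"]
    second_request = ["implement", "review"]  # the failure from the opening example
    print(skipped_phases(first_request))  # []
    print(skipped_phases(second_request))  # ['research', 'plan']
```
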
## Common Pitfalls

| Pitfall                                | What Happens                              | Solution                                    |
|----------------------------------------|-------------------------------------------|---------------------------------------------|
| Multiple `/rpi` calls without clearing | Recency bias causes phase skipping        | Use `/clear` before each new `/rpi` request |
| Long accumulated sessions              | Token budget consumed by history          | Use `/compact` or start a new chat          |
| Mixing unrelated tasks                 | Cross-contamination between task contexts | Use `/clear` between different tasks       |
| Ignoring degradation signs             | Progressively worse output quality        | Recognize the signs and clear context       |

## Next Steps

* [Why RPI?](why-rpi.md): the psychology behind phase separation
* [RPI Overview](README.md): complete workflow guide
* [Using Tasks Together](using-together.md): phase transitions and handoffs

---

<!-- markdownlint-disable MD036 -->
_🤖 Crafted with precision by ✨Copilot following brilliant human instruction,
then carefully refined by our team of discerning human reviewers._
<!-- markdownlint-enable MD036 -->
