Context engineering
Learn how to give an AI system the right context, remove noise, and keep long-running work from drifting.
Context engineering means giving an AI system enough useful information to complete a task without drift. The prompt matters, but the bigger lever is the working memory around it: source material, constraints, examples, tools, history, and decisions.
If the model lacks facts, sees too much noise, or receives conflicting instructions, the answer fails even when the model is strong. Treat context as the work.
Start with the RAM model
Andrej Karpathy's operating system analogy is useful: the model acts like a CPU, and the context window acts like RAM. The context window is the model's working memory for the next step.
That memory has limits. Larger windows help, but they do not remove the need to choose. Benchmarks such as NoLiMa found that model accuracy can drop when long context buries the useful information.
Three problems cause most failures:
- Too little context: the model has to guess.
- Too much context: the useful signal gets diluted.
- Wrong context: the model follows bad or irrelevant information.
Use a simple rule: include information that can change the answer. Leave the rest outside the window.
Build a context packet
Before you ask an agent or model to work, assemble the context as a packet. This makes the task easier to understand and easier to debug.
- Name the task. State the outcome, audience, and definition of done.
- Set the constraints. Include tone, scope, format, compliance rules, and anything the model should avoid.
- Add source material. Give the model the documents, excerpts, data, or examples it needs. Do not dump a whole archive if one section answers the task.
- Show examples. Add 2 to 5 examples of the output you want. Examples beat long explanations for format, judgment, and style.
- Preserve state. Include prior decisions, open questions, assumptions, and anything the model should ignore from earlier work.
- Define tools. Tell the model which tools it can use, when to use them, and what each tool returns.
- Specify the output. Ask for the exact sections, fields, schema, or level of detail you need.
A simple packet can look like this:
<task>
Rewrite this onboarding email for finance buyers.
</task>
<constraints>
- Keep it under 200 words.
- Use a calm, direct tone.
- Do not promise outcomes we cannot prove.
</constraints>
<context>
- Buyer: VP of Finance at a 300-person B2B SaaS company.
- Goal: explain why the new workflow saves review time.
- Source: paste the product notes or link to the relevant file.
</context>
<examples>
Paste 2 or 3 examples that match the quality bar.
</examples>
<output>
Return 1 final email and 3 subject lines.
</output>The format matters less than the boundary. Markdown headings, XML-style tags, or structured JSON can all work if the pattern stays clear.
Put the important parts where the model can see them
Models tend to pay more attention to the beginning and end of a long context. The middle can lose signal.
Place context in this order:
- Task and constraints first. The model should know what success means before it reads supporting material.
- Sources next. Add the documents, excerpts, data, and links that ground the answer.
- Examples after rules. Use examples to show what good looks like.
- Final checks at the end. Restate the critical constraints before the model answers.
For complex work, ask the model to make a short plan, list assumptions, or check the answer against your constraints before it finishes. This helps the model catch missing context, but it adds cost and latency. Use it when the task needs judgment.
Use 4 operating moves
Strong systems combine 4 context moves: write, select, compress, and isolate.
Write: store memory outside the window
Save important context somewhere persistent so the model can retrieve it when needed.
Use external memory for:
- Working notes and scratchpads.
- Decision logs.
- Source files.
- URLs and file paths instead of full copied documents.
- Structured status files, such as JSON for task state.
This keeps institutional knowledge from consuming the whole context window. For long-running agents, the file system can become external memory: persistent, searchable, and large enough to hold what the context window cannot.
Select: retrieve only what matters
Good retrieval is selective. A focused 300-token excerpt can beat a 128,000-token dump.
Use:
- Semantic search to retrieve by meaning.
- Just-in-time retrieval based on the current task.
- Relevance scoring before material enters the context.
- Source filters so one unrelated document cannot steer the answer.
If the model needs 3 emails from a sales opportunity, send those 3 emails. Sending every email from every deal increases the chance that the model borrows a fact from the wrong account.
Compress: summarize without losing decisions
Compression helps when history gets long. Summarize, but preserve the facts that future work depends on.
Keep:
- Decisions.
- Constraints.
- Source links.
- Open questions.
- Known errors.
- Current task state.
Drop:
- Chit-chat.
- Repeated instructions.
- Stale branches of work.
- Material that no longer affects the answer.
As a starting rhythm, summarize chat history every 5 to 10 exchanges, agent work every 3 to 5 steps, or any time context passes about half of your target window. For high-stakes work, review summaries because compression can lose specific events.
Isolate: split work into focused contexts
Long tasks drift when one context has to hold everything. Split work into smaller contexts when the task has clear parts.
Use isolation for:
- Research branches.
- Review passes.
- Data extraction.
- Writing and editing.
- Tool-heavy workflows.
Pass summaries, source pointers, and decisions between contexts. Do not pass the full transcript unless the next step needs it. Smaller contexts make it easier for the model to stay on task and easier for you to find the failure.
Watch for 4 failure modes
Drew Breunig's failure modes are a useful debugging checklist.
| Failure mode | What happens | Fix |
|---|---|---|
| Context poisoning | A false claim or bad observation enters the context and future steps reuse it. | Stop, correct the bad fact, cite the source of truth, and remove or quarantine the poisoned context. |
| Context distraction | The model over-focuses on accumulated history and misses the current task. | Prune old history, summarize decisions, and restate the current task at the top. |
| Context confusion | Irrelevant material influences the answer. | Remove extra documents, reduce tool lists, and retrieve narrower excerpts. |
| Context clash | The model sees conflicting rules, facts, or examples. | Pick one source of truth, resolve the conflict in writing, and delete the losing instruction. |
Most agent failures become easier to fix once you name the context failure. You stop asking, "Why is the model bad?" and start asking, "What did we put in front of it?"
Use grounding, examples, and reasoning with care
Three techniques solve different context problems.
RAG grounds answers in your knowledge
Retrieval augmented generation turns a closed-book task into an open-book task. The basic loop is:
- Chunk documents into useful sections.
- Embed or index those chunks.
- Retrieve the chunks most relevant to the current task.
- Add those chunks to the prompt.
- Generate an answer grounded in the retrieved material.
RAG helps with current information, company-specific knowledge, and source-grounded answers. It fails when retrieval brings back too little, too much, or the wrong thing. Test chunk size, retrieval rules, and relevance thresholds before you trust the system.
Examples teach judgment
Few-shot examples help when the desired output has a format, tone, label, or judgment call.
Use examples for:
- Classification.
- Structured output.
- Editorial tone.
- Edge cases.
- Before-and-after transformations.
Use 3 to 5 strong examples before you add 10 weak ones. Keep the structure consistent so the model can infer the pattern.
Reasoning helps complex tasks
For planning, analysis, and multi-step decisions, ask the model to produce a plan, compare options, or check its answer before finalizing. Reasoning patterns such as least-to-most decomposition or tree-style exploration can help on hard tasks.
Skip them for simple work. They add tokens, cost, and latency, and they can produce confident-looking reasoning that still rests on bad context.
Maintain context during long-running work
Agents accumulate context with every message, tool call, and observation. Without rules, the context gets bloated and brittle.
Use these habits:
- Set trimming rules. Decide when old messages leave the window.
- Summarize on a schedule. Preserve decisions and sources before history becomes too long.
- Keep errors visible. If a tool fails, keep the failure and observation in context long enough for the agent to adapt.
- Limit the tool loadout. Long tool lists create confusion. Give the agent the tools the task needs.
- Use stable structure. Consistent ordering and serialization help production systems reuse prior work and reduce cost.
- Measure quality. Track whether better context improves accuracy, latency, cost, and review time.
Production teams learn that context quality matters more than prompt cleverness. Version it, test it, and watch how it behaves under real tasks.
Run the checklist
Before you start
- What outcome do I need?
- What information can change the answer?
- Which sources are authoritative?
- What should the model ignore?
- What does good output look like?
While the work runs
- Is the model using the right sources?
- Has context grown past the point where summaries would help?
- Did an error enter the context?
- Are tools or examples creating confusion?
- Does the current task still appear near the top?
When the answer fails
- Did the model lack necessary information?
- Did irrelevant information distract it?
- Did two instructions conflict?
- Did a bad fact get preserved?
- Should you prune, compress, retrieve better context, or split the task?
Context engineering is the discipline of making work plausibly solvable. Better models help, but you still need enough signal for the system to do the job.
Last updated at June 3, 2026