Context Engineering: The Skill That Separates Good AI Teams from Great Ones
Context engineering—not prompt engineering—determines AI system quality. A practical guide to CLAUDE.md, instruction budgets, and production patterns.
The difference between an AI system that works and one that fails in production often comes down to something unglamorous: how you organize information for the model to read.
This is context engineering: the deliberate discipline of structuring, prioritizing, and distributing the information a model needs to perform reliably at scale. Where prompt engineering is about wording, context engineering is structural. It means deciding what information to present, and how, when, and where to present it, so that models can apply it across many tasks.
Alex Hinds has spent over a decade scaling AI systems from startups to enterprises, and has seen teams repeatedly stumble here: heavy investment in orchestration and infrastructure, but ad hoc context engineering. The result is predictable—brittle systems that work in demos but fail in production.
Context vs. Prompt Engineering
Prompt engineering, refining how a request to a model is phrased, is tactical. Context engineering is structural: deciding what information matters, how to present it, and where to store it for reliable reuse across tasks.
The distinction reshapes the problem. A prompt engineer tweaks wording. A context engineer asks: Should this information be in the context window at all? Could we reference a file? Is this documentation stale and actively making performance worse?
With Claude's 200,000-token context window, it's tempting to think context is unlimited. It isn't. Your actual usable context—what a model can reliably process and apply—is far smaller. Capacity doesn't translate to practical utility.
This is the first principle: your context window is a budget, not a blank canvas.
The Context Budget: 150-200 Instructions Is Your Real Limit
Model performance tends to degrade when you pack too much into context. As instructions get denser, models follow them less reliably: they hallucinate more, miss nuance, and optimize for surface patterns rather than the actual task.
Think in terms of instructions, not tokens. A well-designed instruction might be 50-200 tokens. For a coding agent or complex reasoning task, you're working with a real budget of 150-200 high-quality instructions before saturating the model's ability to reliably apply them.
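A quick back-of-the-envelope calculation puts the instruction budget in token terms. The sketch below only multiplies the ranges quoted above (150-200 instructions at 50-200 tokens each) and compares the result with the 200,000-token window. The constant and function names are illustrative, not part of any tool.

```python
# Rough token cost of an instruction budget, using the ranges quoted above.
CONTEXT_WINDOW_TOKENS = 200_000      # window size cited in this article
INSTRUCTION_RANGE = (150, 200)       # number of instructions in the budget
TOKENS_PER_INSTRUCTION = (50, 200)   # tokens per well-designed instruction


def budget_token_range(instructions, tokens_each):
    """Return (low, high) token totals for the given (min, max) ranges."""
    low = instructions[0] * tokens_each[0]
    high = instructions[1] * tokens_each[1]
    return low, high


def share_of_window(tokens, window=CONTEXT_WINDOW_TOKENS):
    """Fraction of the context window consumed by `tokens`."""
    return tokens / window


low, high = budget_token_range(INSTRUCTION_RANGE, TOKENS_PER_INSTRUCTION)
print(f"instruction budget: {low:,} to {high:,} tokens")
print(f"share of window: {share_of_window(low):.2%} to {share_of_window(high):.2%}")
```

With these figures the budget comes to 7,500 to 40,000 tokens, roughly 4% to 20% of the window. The numbers are only as good as the ranges they start from, but they show that the instruction budget is spent long before the window is full.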
What belongs in this budget? System architecture, non-standard tooling, project conventions, and things that can't be inferred from code alone. What doesn't: stack traces, entire API documentation, or every edge case you've encountered.
This is where CLAUDE.md becomes essential—ruthlessly minimal documentation that lives in your repo.
CLAUDE.md: Ruthlessly Minimal Configuration
CLAUDE.md is a single markdown file in your project root containing everything a model needs to navigate the codebase—not everything it could theoretically benefit from, but what it actually needs. It's a prose document, typically 200-500 lines, organized by intent.
A good CLAUDE.md includes:
- Two-paragraph project overview and rationale
- Directory structure (prose, not a tree)
- How to run tests, dev server, production builds
- Environment variables and their sources
- Architectural patterns or constraints
- Non-obvious conventions
- What you explicitly don't want changed
The discipline is in ruthless omission. Don't include: auto-generated API docs (link to them instead), stack traces or error logs, stale patterns (outdated documentation actively hurts), or historical one-offs.
When you're tempted to explain something, ask first: should the code be clearer instead? A good CLAUDE.md documents the invisible—context that genuinely can't be inferred. Consider pairing it with ARCHITECTURE.md for design decisions, TESTING.md for test patterns, and other files distributed by need rather than crammed into one bloated document.
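The checklist and the omit list above can be turned into a small lint that runs in CI. This is a minimal sketch under stated assumptions: the section headings are hypothetical names chosen to mirror the checklist, the line limit is the upper end of the 200-500 line range mentioned earlier, and the stack-trace check only recognizes the Python traceback marker.

```python
import re

# Hypothetical heading names; rename them to match your own file.
REQUIRED_SECTIONS = [
    "Overview",
    "Directory structure",
    "Commands",
    "Environment variables",
    "Architecture",
    "Conventions",
    "Do not change",
]
MAX_LINES = 500  # upper end of the 200-500 line range mentioned above


def lint_context_file(text, required=REQUIRED_SECTIONS, max_lines=MAX_LINES):
    """Return a list of human-readable problems found in a CLAUDE.md body."""
    problems = []
    # Collect markdown headings of any level, lowercased for comparison.
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+?)\s*$", text, flags=re.MULTILINE)
    }
    for section in required:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")
    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"too long: {line_count} lines (limit {max_lines})")
    # Pasted stack traces are on the omit list; flag the Python marker.
    if "Traceback (most recent call last)" in text:
        problems.append("contains a pasted stack trace")
    return problems


# A deliberately incomplete sample file to show the report format.
sample = """# Overview
Edge-deployed sync service.

## Commands
Run `pnpm test:integration` before pushing.
"""
for problem in lint_context_file(sample):
    print(problem)
```

A check like this cannot judge whether the content is any good. It only catches missing sections and obvious bloat, which is usually where drift starts.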
Progressive Disclosure: Context by Need
One of the highest-leverage decisions is distributing context according to need rather than stuffing everything into one file.
High-level info in CLAUDE.md. Architecture decisions in ARCHITECTURE.md. API documentation in its own system. Testing patterns in TESTING.md. This keeps files lean, enables on-demand loading, and makes maintenance tractable. When a testing pattern changes, you update one file.
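One way to make loading by need concrete is a small lookup from task type to the files worth reading. The sketch below is an assumption about how an agent harness might do this, not a feature of any particular tool. The task-type names and the mapping are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from task type to the extra files worth loading.
CONTEXT_BY_TASK = {
    "feature": ["ARCHITECTURE.md"],
    "test": ["TESTING.md"],
    "refactor": ["ARCHITECTURE.md", "TESTING.md"],
}
ALWAYS_LOAD = ["CLAUDE.md"]


def files_for_task(task_type):
    """Return the ordered list of context files for a task type."""
    extra = CONTEXT_BY_TASK.get(task_type, [])
    return ALWAYS_LOAD + [name for name in extra if name not in ALWAYS_LOAD]


def load_context(repo_root, task_type):
    """Concatenate the selected files that exist under repo_root."""
    parts = []
    for name in files_for_task(task_type):
        path = Path(repo_root) / name
        if path.is_file():  # skip files the repo does not have
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


print(files_for_task("test"))
```

An unknown task type falls back to CLAUDE.md alone, so the default is the lean one.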
This pays off when AI agents interact with your codebase repeatedly. An agent that checks ARCHITECTURE.md for domain patterns works from a smaller, more focused context than one handed a sprawling document, and that tends to make it more reliable. In a plan-execute-clear loop, where agents work through a codebase iteratively, structured and versioned context becomes part of your deployment pipeline.
Version your context like code. Changes go through PRs. You see in git history when documentation shifted. This is critical for production systems.
What Belongs in Your Context
Project description and rationale. Not a sales pitch, but why the project exists and what problem it solves. "We chose SQLite because we're edge-deployed" shapes better decisions than simply knowing that you use SQLite.
Package manager and build system. Explicitly. pnpm vs. npm, custom build scripts, constraints like "no native modules." Models should never guess.
Non-standard commands. If your test command is pnpm test:integration, document it. This is where CLAUDE.md pays for itself.
Architectural conventions. File structure, test colocation patterns, API route conventions, component composition rules. These are invisible until made explicit.
What you don't want changed. Complex legacy code, intentional patterns that look wrong, areas teams have decided not to refactor. Models optimize for surface improvements. Being explicit prevents failures.
Dependency reasoning. Not a list—context. Why three HTTP clients? Why two database layers? Models that understand reasoning make better decisions.
What doesn't belong: implementation details, full API signatures (link to docs), examples longer than explanations, or anything discoverable by reading the code. Testing patterns that change often are better kept in a dedicated guide, such as one on TDD with AI agents, and the quality of your context can be measured with AI evaluation frameworks.
Anti-Patterns That Kill Production Systems
The bloated auto-generated dump. A script ingests your entire codebase and pastes a summary into CLAUDE.md. Works once. By month three, it's stale and actively misleading. Models trust documentation; wrong documentation causes confident hallucinations.
The everything file. One CLAUDE.md containing architecture, API docs, testing, deployment, and naming philosophy. Impossible to maintain. Updates break assumptions elsewhere. No one knows what's current.
Silence. No CLAUDE.md at all. Fine for small projects. With multiple services, trade-offs, and constraints, models guess, and guesses in production are expensive.
Stale technical decisions. You documented the system three months ago, and major refactors have shipped since. The documentation no longer reflects reality, so you're actively misleading the model about your system.
Context as sales pitch. Polished, aspirational documentation that glosses over complications. A model with an honest picture of constraints can work around them. One working from aspirational docs will hit assumptions that don't hold.
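Staleness, the thread running through several of these anti-patterns, is one of the few that can be checked mechanically. The sketch below compares the date a context file last changed with the date the code it describes last changed. The 90-day threshold is an assumption standing in for "three months", and the example dates are made up. In practice the dates could come from version control, for example from `git log -1 --format=%cI -- <path>`.

```python
from datetime import date

MAX_LAG_DAYS = 90  # assumed threshold, roughly the "three months" above


def staleness_report(doc_updated, code_updated, max_lag_days=MAX_LAG_DAYS):
    """Compare last-change dates of a context file and the code it describes."""
    lag = (code_updated - doc_updated).days
    if lag <= 0:
        return "fresh: docs changed on or after the latest code change"
    if lag <= max_lag_days:
        return f"ok: docs trail code by {lag} days"
    return f"stale: docs trail code by {lag} days (limit {max_lag_days})"


# Illustrative dates only: docs last touched in January, code in June.
print(staleness_report(date(2024, 1, 10), date(2024, 6, 1)))
```

A date comparison is a blunt signal. Code can change without invalidating the docs, so a "stale" result is a prompt to reread the file, not proof that it is wrong.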
Context Engineering in Production Systems
This matters most when building production AI agents: systems that make decisions, interact with infrastructure, or process customer data autonomously. A coding assistant's mistakes are often caught quickly, because the code doesn't compile or a test fails. Production agents can fail silently.
Context becomes something you test, version, and monitor like code. Changes go through PRs. Integration tests verify agents behave as documented. You measure performance shifts when context changes.
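Testing context like code can start with something as small as checking that documented commands still exist. The sketch below assumes a Node project using pnpm, with commands written in backticks in the context file. It reports any documented script that `package.json` no longer defines. The function name, the regular expression, and the partial list of pnpm built-ins are illustrative choices.

```python
import json
import re

# Matches `pnpm <script>` or `pnpm run <script>` written inside backticks.
COMMAND_PATTERN = re.compile(r"`pnpm (?:run )?([\w:.-]+)`")
# pnpm built-in commands that are not package.json scripts (partial list).
BUILTINS = {"install", "add", "remove", "update", "exec", "dlx"}


def missing_scripts(context_text, package_json_text):
    """Return documented pnpm scripts that package.json does not define."""
    scripts = json.loads(package_json_text).get("scripts", {})
    documented = set(COMMAND_PATTERN.findall(context_text))
    return sorted(documented - BUILTINS - set(scripts))


# Hypothetical inputs: the docs mention a script the project has since dropped.
context = "Run `pnpm test:integration` before pushing. Use `pnpm dev` locally."
package_json = '{"scripts": {"dev": "node server.js", "test:unit": "node test.js"}}'
print(missing_scripts(context, package_json))
```

Run as a CI step, a non-empty result fails the build, so a renamed script forces a documentation update in the same PR.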
Without ruthlessly minimal context, you lose visibility into what's affecting agent behavior. With bloated documentation, you can't maintain it. Without versioning, you can't debug changes.
Production requires thinking of context as infrastructure. When you're working with reusable agent patterns across projects, agent skills become a framework for documenting context consistently. Pair this with measured evaluation of context quality to iterate reliably.
Getting Started
Start simple. Create a CLAUDE.md in your project root. Keep it to one page: what the project does, how to run it, non-standard tooling, and architectural constraints.
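The one-page starter can be generated from four short inputs. This is a sketch rather than a prescribed layout: the section titles follow the four items listed above, and the example values are placeholders that reuse details mentioned earlier in this article.

```python
def render_starter(purpose, run_commands, tooling, constraints):
    """Build a one-page CLAUDE.md body from four short inputs."""

    def bullets(items):
        return "\n".join(f"- {item}" for item in items)

    sections = [
        ("What this project does", purpose),
        ("How to run it", bullets(run_commands)),
        ("Non-standard tooling", bullets(tooling)),
        ("Architectural constraints", bullets(constraints)),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections) + "\n"


# Placeholder values; replace them with your project's own.
starter = render_starter(
    purpose="Edge-deployed sync service. SQLite because we're edge-deployed.",
    run_commands=["pnpm dev", "pnpm test:integration"],
    tooling=["pnpm, not npm"],
    constraints=["no native modules"],
)
print(starter)
```

Writing `starter` to `CLAUDE.md` gives you a file short enough to review in a single PR, which is the point of starting here.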
Make it ruthlessly minimal. If a sentence only conveys something that is already obvious from reading the code or official docs, cut it.
Reference it explicitly when working with AI models. Iterate based on gaps you notice. Some things you documented won't matter. Others you'll realize should be clearer.
For multiple projects or codebases touched by multiple agents, layer in ARCHITECTURE.md, TESTING.md, or DEPLOYMENT.md as needed. Start lean.
The constraint is the feature. Limited context forces hard decisions about what's essential. Those decisions are where craft emerges. As Simon Willison has written, context engineering is a discipline unto itself: not a byproduct of using models, but a deliberate practice that separates teams that iterate reliably from those that are always fighting fires.
Reference frameworks like OpenAI's prompt engineering guide for broader context, but remember: the prompt is ephemeral. Your CLAUDE.md is version-controlled infrastructure. Treat it accordingly.
Ruthlessly minimal context. Honest documentation. Versioned like code. That's context engineering.