Mar 2026
A chatbot has one layer: the model. You send it text, it sends text back. That is fine for a support widget. It is not sufficient for an agent running inside a real business operation.
Production agent systems need four layers. Most teams build two. The ones they skip are the ones that determine whether the system works long-term.
The first layer is context: what the agent knows, and how it knows it. This includes retrieval architecture, memory systems, entity graphs, and the data ontology that maps your business objects to something the model can reason about.
Most teams underinvest here and overspend on everything else. A weak context layer means the model is constantly working from incomplete information. The output looks plausible but is unreliable in ways that are hard to catch before they cause problems.
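One way to picture the data ontology described above is as a small entity graph whose objects and relations can be rendered as plain text for the model. This is a minimal sketch. The `Entity` and `EntityGraph` types and the business objects are hypothetical, and a real system would load them from your own data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A business object in a hypothetical ontology."""
    kind: str                      # e.g. "Customer", "Invoice"
    id: str
    attrs: dict = field(default_factory=dict)


@dataclass
class EntityGraph:
    entities: dict = field(default_factory=dict)   # id -> Entity
    edges: list = field(default_factory=list)      # (src_id, relation, dst_id)

    def add(self, entity):
        self.entities[entity.id] = entity

    def relate(self, src, relation, dst):
        self.edges.append((src, relation, dst))

    def describe(self, entity_id):
        """Render one entity and its outgoing relations as lines the model can read."""
        e = self.entities[entity_id]
        attrs = ", ".join(f"{k}={v}" for k, v in sorted(e.attrs.items()))
        lines = [f"{e.kind} {e.id}: {attrs}"]
        for src, rel, dst in self.edges:
            if src == entity_id:
                d = self.entities[dst]
                lines.append(f"  {rel} -> {d.kind} {d.id}")
        return "\n".join(lines)


g = EntityGraph()
g.add(Entity("Customer", "C-1", {"tier": "gold"}))
g.add(Entity("Invoice", "INV-7", {"status": "overdue", "amount": 1200}))
g.relate("C-1", "owes", "INV-7")
print(g.describe("C-1"))
```

The value of the explicit graph is that the model is shown named, typed relations ("owes -> Invoice INV-7") rather than raw rows it has to interpret.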
Good context design means answering two questions precisely: what does this agent need to know to do this task correctly, and what is the most reliable way to surface that information at the moment it is needed?
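A minimal sketch of that discipline, assuming each task declares which facts it needs and how old each one may be. The `TASK_REQUIREMENTS` table, the task name, and the fact names are hypothetical. Facts that are not declared are left out, and missing or stale facts are reported rather than silently handed to the model.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical declaration: the facts a task needs, with a maximum age for each.
TASK_REQUIREMENTS = {
    "approve_refund": {
        "order_status": timedelta(minutes=5),
        "refund_policy": timedelta(days=30),
        "customer_tier": timedelta(days=1),
    },
}


def assemble_context(task, facts, now):
    """Return (context, problems) holding only the facts this task needs.

    `facts` maps a fact name to (value, fetched_at).
    """
    context, problems = {}, []
    for name, max_age in TASK_REQUIREMENTS[task].items():
        if name not in facts:
            problems.append(f"missing: {name}")
            continue
        value, fetched_at = facts[name]
        if now - fetched_at > max_age:
            problems.append(f"stale: {name}")
            continue
        context[name] = value
    return context, problems


now = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
facts = {
    "order_status": ("shipped", now - timedelta(minutes=1)),
    "refund_policy": ("30-day window", now - timedelta(days=60)),
    "internal_notes": ("irrelevant to this task", now),
}
context, problems = assemble_context("approve_refund", facts, now)
```

Here `context` contains only `order_status`. The undeclared `internal_notes` is dropped, and `problems` flags the stale policy and the missing customer tier so the caller can refetch or escalate instead of letting the model guess.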
The second layer is reasoning: how the agent decides. It covers model selection, prompt architecture, structured output, tool definitions, and chain-of-thought design. This is the layer most people picture when they think of agent engineering, and it gets the most attention.
The key discipline here is specificity. General-purpose reasoning prompts produce general-purpose results. Effective production agents are designed around the specific decision being made: what inputs are relevant, what outputs are acceptable, what edge cases need explicit handling.
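Here is a sketch of what designing around one specific decision can look like. It assumes the model is asked to return JSON for a hypothetical refund decision. The allowed actions, field names, and amount rule are illustrative. The point is that acceptable outputs and a known edge case are checked explicitly instead of trusted.

```python
import json
from dataclasses import dataclass

# Hypothetical decision: what to do with a single refund request.
ALLOWED_ACTIONS = {"approve", "deny", "escalate"}


@dataclass
class RefundDecision:
    action: str
    amount: float
    reason: str


def parse_decision(raw, order_total):
    """Validate the model's JSON output against what this decision allows.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    data = json.loads(raw)
    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {action!r}")
    amount = float(data.get("amount", 0))
    reason = str(data.get("reason", "")).strip()
    if not reason:
        raise ValueError("every decision must state a reason")
    # Explicit edge case: an approved refund must be positive and no larger than the order.
    if action == "approve" and not 0 < amount <= order_total:
        raise ValueError(f"amount {amount} outside (0, {order_total}]")
    return RefundDecision(action, amount, reason)
```

A rejected output can be retried with the error message attached, or routed to a human. Either way, it never reaches the action layer unchecked.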
The third layer is action: what the agent does with its output. That means API calls, database writes, workflow triggers, notifications, and human escalation. This layer is where the agent interacts with the world outside the model.
The two things that matter most here are scope and auditability. Agents should have precisely scoped permissions: write access to exactly what they need, nothing more. Every action should be logged in a way that makes it possible to trace what happened, why, and what the state of the system was at that moment.
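The sketch below shows both properties in one place. Actions go through a hypothetical `ScopedExecutor` that checks an explicit allowlist of action and resource pairs. Every attempt, including denied and failed ones, is logged with the agent's stated reason and a snapshot of the state it acted on. The in-memory list stands in for durable, append-only audit storage.

```python
import time


class PermissionDenied(Exception):
    pass


class ScopedExecutor:
    """Run agent actions against an explicit allowlist and record every attempt.

    `allowed` maps an action name to the resources it may touch,
    e.g. {"update_ticket": {"tickets"}}. All names are hypothetical.
    """

    def __init__(self, allowed, handlers):
        self.allowed = allowed
        self.handlers = handlers
        self.audit_log = []  # stand-in for durable, append-only storage

    def run(self, action, resource, args, reason, state_snapshot):
        entry = {
            "ts": time.time(),
            "action": action,
            "resource": resource,
            "args": dict(args),
            "reason": reason,          # why the agent chose this action
            "state": state_snapshot,   # what the agent saw when it decided
        }
        if resource not in self.allowed.get(action, set()):
            entry["outcome"] = "denied"
            self.audit_log.append(entry)
            raise PermissionDenied(f"{action} may not touch {resource}")
        try:
            result = self.handlers[action](**args)
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            self.audit_log.append(entry)
            raise
        entry["outcome"] = "ok"
        self.audit_log.append(entry)
        return result
```

Because denials and failures are logged alongside successes, the trail answers "what happened, why, and what did the agent see" even when the action did not go through.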
Escalation paths belong in this layer. Designing for the moment when the agent should not make the call, and making sure a human receives the right context to make it quickly, is as important as designing the happy path.
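Here is a minimal sketch of an escalation rule, assuming confidence, amount, and missing evidence are the triggers. The thresholds, the `route` function, and the `EscalationPacket` fields are hypothetical. What matters is that the human receives the question, the agent's recommendation, the evidence, and the specific reasons it was escalated.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationPacket:
    """Everything a human reviewer needs to decide quickly, in one place."""
    question: str
    agent_recommendation: str
    confidence: float
    evidence: list = field(default_factory=list)
    reasons: list = field(default_factory=list)


def route(recommendation, confidence, amount, evidence,
          min_confidence=0.8, max_auto_amount=500.0):
    """Return ("auto", recommendation) or ("human", EscalationPacket).

    The default thresholds are illustrative assumptions, not recommendations.
    """
    reasons = []
    if confidence < min_confidence:
        reasons.append(f"confidence {confidence:.2f} below {min_confidence}")
    if amount > max_auto_amount:
        reasons.append(f"amount {amount} above auto limit {max_auto_amount}")
    if not evidence:
        reasons.append("no supporting evidence retrieved")
    if not reasons:
        return "auto", recommendation
    packet = EscalationPacket(
        question=f"Should we {recommendation} this request (amount {amount})?",
        agent_recommendation=recommendation,
        confidence=confidence,
        evidence=list(evidence),
        reasons=reasons,
    )
    return "human", packet
```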
The fourth layer is operations. It is the layer that almost every prototype skips, and the one whose absence eventually breaks almost every production deployment.
Operations covers monitoring, alerting, drift detection, cost tracking, and exception handling. It is the answer to the question: how do you know the system is working right now, and how will you know when it stops?
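This is a rough sketch of how to answer "is it working right now" for two of those concerns. The hypothetical `RunMonitor` below keeps a rolling window of recent runs and reports failing checks for error rate and average cost per run. The thresholds are placeholders, and a real deployment would send these results into its existing alerting system.

```python
from collections import deque


class RunMonitor:
    """Track recent agent runs and report which health checks are failing.

    Window size and thresholds are hypothetical placeholders.
    """

    def __init__(self, window=200, max_error_rate=0.05, max_cost_per_run=0.10):
        self.runs = deque(maxlen=window)  # oldest runs fall off automatically
        self.max_error_rate = max_error_rate
        self.max_cost_per_run = max_cost_per_run

    def record(self, ok, cost_usd):
        self.runs.append((ok, cost_usd))

    def alerts(self):
        if not self.runs:
            return ["no runs recorded: is the agent running at all?"]
        n = len(self.runs)
        error_rate = sum(1 for ok, _ in self.runs if not ok) / n
        avg_cost = sum(cost for _, cost in self.runs) / n
        out = []
        if error_rate > self.max_error_rate:
            out.append(f"error rate {error_rate:.1%} exceeds {self.max_error_rate:.1%}")
        if avg_cost > self.max_cost_per_run:
            out.append(f"average cost ${avg_cost:.3f} exceeds ${self.max_cost_per_run:.3f}")
        return out
```

Treating "no runs at all" as an alert is deliberate. A silent agent is one of the failures that is easiest to miss.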
Agent systems degrade in ways that are different from traditional software. A model upgrade changes behavior without anyone touching the code. Retrieved data goes stale. Edge cases that were rare become common as volume grows. An ops layer catches these before they become visible to users or accumulate into something expensive to fix.
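A model upgrade or a rise in once-rare cases often shows up first as a shift in the mix of decisions the agent makes. One simple check compares the decision distribution from a reviewed baseline period with a recent window using total variation distance. This is sketched here as an assumption, not a prescribed method. The 0.1 threshold is illustrative, and both label lists are assumed to be non-empty.

```python
from collections import Counter


def distribution(labels):
    """Turn a non-empty list of decision labels into label -> frequency."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def check_drift(baseline_labels, recent_labels, threshold=0.1):
    """Return (distance, drifted) comparing recent decisions to a reviewed baseline."""
    d = total_variation(distribution(baseline_labels), distribution(recent_labels))
    return d, d > threshold


baseline = ["approve"] * 80 + ["deny"] * 15 + ["escalate"] * 5
recent = ["approve"] * 60 + ["deny"] * 15 + ["escalate"] * 25
distance, drifted = check_drift(baseline, recent)
```

Here escalations grew from 5% to 25% of decisions, giving a distance of about 0.2. That is worth a human look even though no individual run failed.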
The teams that build all four layers have systems that run reliably for months. The teams that skip the last two have systems that work well at launch and slowly become unreliable in ways nobody can quite explain.
Build all four layers. In that order.