Why your agents need their own protocol to talk to each other
MCP handles what agents can touch. A2A handles how they coordinate. A field guide to LangGraph, CrewAI, AutoGen, and when multi-agent is actually worth the complexity.
Multi-agent systems are seductive. One agent for research, one for writing, one for review, all talking to each other like a tiny company. The pitch sells itself. The reality is that 90% of the teams reaching for multi-agent in 2026 would be better served by one well-designed agent with the right tools.
We say that as a studio that has shipped multi-agent. It works. It also costs you. This post is the honest map of when it's worth it, and how to build it when it is.
The two protocols you actually need
The 2026 stack has settled around two protocols that do different jobs.
MCP (Model Context Protocol) is the agent-to-tool layer. We wrote a separate piece on it. The short version: it's how an agent reaches down into the filesystem, the database, the SaaS APIs.
A2A (Agent2Agent) is the agent-to-agent layer. It's how one agent passes a task to another, gets the result back, and reasons about it. Google open-sourced its A2A spec in 2025 with backing from Atlassian, Salesforce, SAP, MongoDB and others. April 2026 was the inflection point: by then EY, JPMorgan, and Salesforce were running thousands of orchestrated workflows in which agents from different vendors coordinated through A2A.
You can build multi-agent without A2A. People did it for a year. It was painful. Every framework had its own internal coordination format, which meant you couldn't mix a LangGraph supervisor with a CrewAI specialist without writing glue. A2A removed that.
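To make "passes a task to another, gets the result back" concrete, here is a minimal sketch of a task round trip between two agents. Everything in it is an assumption for illustration: `TaskRequest`, `TaskResult`, `summarise_agent`, and the field names are hypothetical and are not the A2A wire format. The only point is the shape: a serialised task goes out, a serialised result with a matching id and an explicit status comes back.

```python
import json
import uuid
from dataclasses import dataclass, asdict, field


@dataclass
class TaskRequest:
    """Hypothetical hand-off envelope; field names are illustrative, not the A2A schema."""
    skill: str       # what the receiving agent is asked to do
    payload: dict    # structured input for that skill
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class TaskResult:
    task_id: str     # echoes the request id so the caller can match results
    status: str      # "completed" or "failed"
    output: dict


def summarise_agent(raw_request: str) -> str:
    """Stand-in for a remote specialist: JSON string in, JSON string out."""
    request = TaskRequest(**json.loads(raw_request))
    if request.skill != "summarise":
        result = TaskResult(request.task_id, "failed", {"error": "unsupported skill"})
    else:
        # Truncation stands in for a real model call.
        text = request.payload["text"]
        result = TaskResult(request.task_id, "completed", {"summary": text[:40]})
    return json.dumps(asdict(result))


def delegate(skill: str, payload: dict) -> TaskResult:
    """Caller side: serialise the task, hand it over, parse the reply."""
    request = TaskRequest(skill=skill, payload=payload)
    reply = json.loads(summarise_agent(json.dumps(asdict(request))))
    return TaskResult(**reply)
```

Because both sides only see JSON, the caller does not need to know which framework the specialist runs on, which is the glue problem a shared protocol is meant to remove.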
When multi-agent earns its complexity
The honest rule we use: pick multi-agent only when the workflow has distinct skill profiles, clear hand-offs, and independent failure modes.
Distinct skill profiles means each agent genuinely needs a different prompt, different tools, or a different model. A "writer" and an "editor" running on the same Claude Sonnet with the same tools are just one agent with two prompts. Don't bother.
Clear hand-offs means the task naturally breaks into stages that complete and pass forward. Research, then draft, then review. Each stage produces a stable artifact. Not a loose blob of "thinking".
Independent failure modes means when one agent fails, the others can usually continue or retry without restarting everything. If a failure anywhere blows up the whole graph, you don't have a multi-agent system. You have a fragile pipeline.
If the workflow doesn't pass all three tests, just use one agent. Save yourself the orchestration debt.
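The three tests can be written down as a checklist. This is a small sketch, not a tool the studio describes using; `WorkflowProfile`, `failed_tests`, and `recommend` are hypothetical names. It is most useful for the list of failed tests, which tells you what to fix or accept before adding a second agent.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkflowProfile:
    """One boolean per test from the rule above (hypothetical structure)."""
    distinct_skill_profiles: bool
    clear_handoffs: bool
    independent_failure_modes: bool


def failed_tests(profile: WorkflowProfile) -> list:
    """Return the names of the tests the workflow does not pass."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


def recommend(profile: WorkflowProfile) -> str:
    """All three tests must pass before multi-agent is worth it."""
    return "single agent" if failed_tests(profile) else "multi-agent"
```

For example, `recommend(WorkflowProfile(True, False, True))` returns `"single agent"`, and `failed_tests` on the same profile returns `["clear_handoffs"]`.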
The framework landscape, briefly
A working studio view of the four framework families worth knowing in 2026.
LangGraph is the most powerful and the most painful. State machines, explicit nodes, checkpoints, recovery, human-in-the-loop. It treats your agent like a real distributed system, because it usually is. We reach for it when the workflow is long-running, needs to be resumed across crashes, or has human approval points that might take days. Cost: a steep learning curve, lots of boilerplate, and you need to actually understand graph state.
Claude Agent SDK is the cleanest if you're committed to Anthropic. Tight loop, simple tool definitions, MCP support, good defaults. We use it for greenfield agents where we want to ship fast and we're not optimising for portability. Cost: provider lock-in.
OpenAI Agents SDK is the same shape but for OpenAI. Functionally close. Pick the SDK that matches the model you actually want to run. Both are improving fast.
CrewAI and AutoGen lean into the "team of agents" metaphor. Roles, personalities, a manager that delegates. Useful when the workflow really is decomposable into roles. Less useful when the model wants to keep "negotiating" rather than doing.
There are more, but those four families cover 95% of what we see in production engagements.
The orchestration pattern that actually works
After enough engagements, a shape keeps repeating. We call it the supervisor with specialists. One agent owns the task and the conversation with the user. It delegates narrow, well-defined sub-tasks to specialist agents and collects the results.
The supervisor's prompt is mostly about planning and orchestration. The specialists' prompts are tight, focused, and tool-rich. The supervisor doesn't try to do everything. It routes.
# Pseudocode shape of a supervisor turn
plan = supervisor.plan(user_request)      # ordered list of typed steps
results = []
for step in plan:
    # Route each step to the matching specialist
    if step.kind == "research":
        results.append(researcher.run(step.query))
    elif step.kind == "transform":
        results.append(transformer.run(step.input))
    elif step.kind == "review":
        # Review the most recent artifact
        results.append(reviewer.run(results[-1]))
final = supervisor.synthesise(results)
Two reasons this works in practice.
First, the supervisor can use a smaller, cheaper model for planning, while specialists use the right model for their job. Big cost win.
Second, when one specialist fails, the supervisor can retry, swap, or escalate without unwinding the whole graph. That's the independent failure mode property.
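The retry, swap, or escalate behaviour can be sketched in a few lines. This is an illustration under assumptions, not a framework API: a specialist is modelled as a plain function from task string to result string, and `run_step`, `EscalationNeeded`, and `make_flaky` are hypothetical names.

```python
from typing import Callable, Optional

# Assumption: a specialist is any callable that takes a task and returns a result.
Specialist = Callable[[str], str]


class EscalationNeeded(Exception):
    """Raised when every specialist failed and someone else must decide."""


def run_step(task: str, primary: Specialist,
             fallback: Optional[Specialist] = None,
             attempts_each: int = 2) -> str:
    """Retry the primary, then swap to the fallback, then escalate."""
    last_error: Optional[Exception] = None
    for specialist in (primary, fallback):
        if specialist is None:
            continue
        for _ in range(attempts_each):
            try:
                return specialist(task)   # success: only this step was touched
            except Exception as exc:      # sketch only; narrow this in real code
                last_error = exc
    raise EscalationNeeded(f"step {task!r} failed") from last_error


def make_flaky(failures: int) -> Specialist:
    """Test double: fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def specialist(task: str) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient failure")
        return f"done: {task}"

    return specialist
```

The failure stays local to one step: earlier results are untouched, and the supervisor only sees either a result or a single `EscalationNeeded` it can act on.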
The trap is letting agents have free-form conversations with each other. They will. They'll be polite, they'll thank each other, they'll spend tokens reaching agreement. Force structured outputs at every hand-off. The model is not your colleague.
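One way to force structured outputs at a hand-off is to refuse anything that does not parse into a fixed shape. The sketch below assumes a reviewer that must reply with JSON containing exactly `approved` and `issues`; `ReviewVerdict` and `parse_verdict` are hypothetical names, and a schema library would do the same job with less code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewVerdict:
    """The only thing the reviewer is allowed to hand back (assumed shape)."""
    approved: bool
    issues: list


def parse_verdict(raw: str) -> ReviewVerdict:
    """Accept a strict JSON verdict; reject free-form chat."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("hand-off was not JSON") from exc
    if not isinstance(data, dict) or set(data) != {"approved", "issues"}:
        raise ValueError("hand-off has unexpected fields")
    if not isinstance(data["approved"], bool):
        raise ValueError("'approved' must be a boolean")
    issues = data["issues"]
    if not isinstance(issues, list) or not all(isinstance(i, str) for i in issues):
        raise ValueError("'issues' must be a list of strings")
    return ReviewVerdict(approved=data["approved"], issues=list(issues))
```

A reply like "Thanks, this looks great!" raises `ValueError` instead of flowing on to the next agent, so the supervisor can retry with a stricter prompt rather than pay for a conversation.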
What we use for what
To make this concrete: how the studio actually picks.
For our Framework Developer Agent, we used multi-agent. One agent enumerates requirements, one generates architecture options, one critiques them, one writes the final spec. Distinct skills, clear hand-offs, independent failures. Multi-agent earned its complexity there.
For most of our automation work, including WhatSender and Dev Scraper, a single agent with the right tool layer is more than enough. The "agent" is one model with a tight loop and MCP-style tool access. Adding a second agent would have added latency and cost without changing the output.
For long-running workflows that need to survive a server restart or wait for human approval, we reach for LangGraph and treat the agent as a state machine. The graph is the contract.
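The idea of treating the agent as a state machine that survives restarts can be shown without any framework. The sketch below is plain Python, not LangGraph's API: the stage names, the JSON checkpoint file, and the functions `load_state`, `advance`, and `demo` are all assumptions for illustration. Each call performs at most one transition and then writes the state to disk, so a crash or a multi-day wait for approval loses nothing.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Resume from the checkpoint if one exists, else start fresh."""
    if path.exists():
        return json.loads(path.read_text())
    return {"stage": "draft", "artifacts": {}}


def advance(path: Path, approved: bool = False) -> dict:
    """Run at most one transition, then checkpoint to disk."""
    state = load_state(path)
    if state["stage"] == "draft":
        state["artifacts"]["draft"] = "v1 text"   # stand-in for a model call
        state["stage"] = "await_approval"
    elif state["stage"] == "await_approval":
        if approved:                              # otherwise keep waiting
            state["stage"] = "publish"
    elif state["stage"] == "publish":
        state["artifacts"]["published"] = True
        state["stage"] = "done"
    path.write_text(json.dumps(state))
    return state


def demo() -> list:
    """Each advance() call could run in a different process or after a restart."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "run.json"
        return [
            advance(checkpoint)["stage"],
            advance(checkpoint)["stage"],                 # no approval yet
            advance(checkpoint, approved=True)["stage"],
            advance(checkpoint)["stage"],
        ]
```

The set of stages and allowed transitions is the contract: anything the workflow can be doing is a named stage on disk, not hidden in a running process.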
The April 2026 inflection
Three things happened almost together. A2A matured into a real spec people implemented, not just announced. MCP servers proliferated to the point that any half-decent SaaS had one. The major frameworks (LangGraph, Claude Agent SDK, OpenAI Agents SDK) added native A2A support.
That combination is why enterprise adoption finally moved. Multi-agent went from "interesting research" to "boring infrastructure" in about a quarter. Boring is good. Boring is when SMBs can start using it without burning a quarter on R&D.
The honest summary
Don't go multi-agent because it sounds modern. Go multi-agent when your workflow genuinely has distinct skills, clear hand-offs, and independent failure modes. Pick the framework that matches your real constraint, not the most-starred one. Force structured I/O at every boundary. Log everything.
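"Log everything" is cheapest when it happens at the same boundaries where structured I/O is enforced. A minimal sketch using Python's standard `logging` module follows; the decorator name `logged_boundary`, the logger name `agents`, and the `research` stand-in are hypothetical, and the log fields are one reasonable choice rather than a standard.

```python
import json
import logging
import time
from functools import wraps

logger = logging.getLogger("agents")


def logged_boundary(agent_name: str):
    """Wrap an agent call so every hand-off emits one JSON log line."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(payload: dict) -> dict:
            started = time.perf_counter()
            status, result = "error", None
            try:
                result = fn(payload)
                status = "ok"
                return result
            finally:
                # Runs on success and on failure, so failed calls are logged too.
                logger.info(json.dumps({
                    "agent": agent_name,
                    "status": status,
                    "input": payload,
                    "output": result,
                    "elapsed_ms": round((time.perf_counter() - started) * 1000, 2),
                }))
        return wrapper
    return decorator


@logged_boundary("researcher")
def research(payload: dict) -> dict:
    """Stand-in specialist; a real one would call a model and tools."""
    return {"notes": payload["query"].upper()}
```

With one line per hand-off, the 11 p.m. question "which agent produced this?" becomes a search over structured logs rather than a replay of the whole run.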
Then you'll have a system you can debug at 11 p.m. on a Tuesday, which is the only test that actually matters.
Most "multi-agent" systems are one agent and a meeting that didn't have to happen. Build the agent first.