One Agent Is Smart. But Even the Smartest Person Can’t Build a Skyscraper Alone.
In Issue 8, we watched AI agents learn to code on their own. They could read your project, write code, run tests, hit errors, fix them, and iterate — all without a human copying and pasting. It felt like the future had arrived.
But there was a ceiling. And anyone who pushed a single agent hard enough slammed into it.
Give an agent a small, well-defined task — “write a function that sorts a list” — and it shines. Give it a large, messy, real-world project — “build me a full web application with authentication, a database, a payment system, tests, and documentation” — and something goes wrong. Not immediately. Not dramatically. But gradually, steadily, inevitably.
The agent starts strong. It makes a plan. It writes clean code. But as the work accumulates — as the terminal fills with thousands of lines of code, error messages, file contents, and conversation history — the agent begins to drift. It forgets its own plan. It revisits problems it already solved. It contradicts decisions it made twenty minutes ago.
It is not getting dumber. Its brain is getting full.
This is the story of what happened next: the leap from one overwhelmed agent to a coordinated team of focused specialists. The leap from solo to swarm.
The Core Insight: “The solution to an overwhelmed mind is not a bigger mind. It is more minds, each focused on less.”
But why exactly does a single agent fall apart? The answer lies in a phenomenon researchers call “lost in the middle” — and it reveals a fundamental limit baked into the very architecture that makes language models work.
The Context Window Ceiling
Between 2023 and 2025, AI companies raced to build bigger context windows — the amount of text a model can “see” at once. Windows grew from 8,000 tokens to 32,000, then 128,000, then 200,000 and beyond. The assumption was simple: more memory means better performance.
It did not work out that way.
In 2023, researchers Nelson Liu and colleagues published a paper called “Lost in the Middle.” They tested how well language models actually used information placed at different positions in their context. The results were striking: models performed well on information near the beginning and the end of their context, but struggled significantly with information in the middle. A U-shaped curve.
This is not a bug in one particular model. It is a pattern across model families. The attention mechanism — the same breakthrough from Issue 7 that made Transformers possible — spreads its focus across all the tokens. As the window grows, that focus thins out, like a flashlight beam widening until it barely illuminates anything.
For coding agents, this creates a vicious cycle. An agent working on a complex project accumulates context relentlessly: file contents, tool outputs, error messages, its own reasoning, conversation history. By the time it is deep into a task, the careful plan it made at the beginning has been pushed into the foggy middle of its context.
Practitioners started calling this context degradation. And making the window bigger did not fix it. It just moved the fog further away.
Think About It: Have you ever re-read a paragraph in a textbook because you forgot what it said? Now imagine re-reading it while someone keeps adding new pages between you and the paragraph. That is what context degradation feels like to an AI agent.
Making one agent smarter was not enough. Making its memory bigger was not enough. So researchers asked a different question entirely: what if instead of one agent with a huge brain, we used a TEAM of agents, each with a fresh, focused brain?
The Idea — What If We Had a TEAM of Agents?
The insight was not new. It was ancient. Every complex human project — a building, a film, a space mission — is built by teams of specialists. An architect does not also do the plumbing. A director does not also edit the film.
The idea of multiple AI agents cooperating is not new either — researchers explored multi-agent protocols as far back as 1980. What changed was the arrival of LLMs powerful enough to serve as general-purpose agents.
By early 2025, engineers began applying this same principle to AI agents. Instead of one agent drowning in a 200,000-token context trying to do everything, what if you had multiple agents, each with a clean, focused context, each handling one part of the job?
The idea took shape across several teams almost simultaneously. Microsoft Research released AutoGen in September 2023, a framework for multi-agent conversations. Joao Moura created CrewAI in late 2023 — a tool that let you define a crew of agents with specific roles. LangChain released LangGraph in early 2024, which models agent workflows as graphs of steps and transitions. And in October 2024, OpenAI released Swarm, a lightweight experimental framework for agent handoffs.
Meanwhile, Addy Osmani — an engineering leader at Google’s Chrome team who had spent over a decade writing about JavaScript design patterns — began documenting practical patterns for teams of Claude Code agents. His work in 2025 drew a direct line between traditional software engineering principles and multi-agent AI.
The Pattern Repeats: “The move from single agents to multi-agent teams follows the oldest pattern in computing: when one thing hits its limit, use many things together. Single CPUs gave way to multi-core. Single servers gave way to distributed systems. Single agents are giving way to swarms.”
Several architectures emerged for organizing agent teams. Each has strengths. Each has trade-offs. And one of them — the strangest-sounding of all — turned out to be surprisingly effective.
Architecture 1 — The Lead Agent and Its Specialist Workers
The first and most common architecture is hierarchical: one lead agent that coordinates, and multiple worker agents that execute.
You give a complex task to the lead agent. The lead does not try to do the work itself. Instead, it makes a plan: “This task has three parts.” It then spawns specialist agents — one for each part — and gives each one a focused brief.
Each specialist works within its own clean context. When each finishes, it sends its output back to the lead, which synthesizes everything into a final result.
This is the pattern behind Claude Code’s Task tool. When Claude Code needs to accomplish something complex, it can spawn subagents — isolated instances that each tackle a piece of the puzzle. The parent agent’s context stays clean. The messy details live in the specialists’ contexts.
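In code, the hierarchical pattern looks roughly like this. This is an illustrative sketch, not Claude Code's actual Task tool: `run_agent` is a hypothetical stand-in for any LLM invocation, and every role and brief below is made up. The point is the shape: the lead plans, each specialist sees only its own brief, and the lead synthesizes.

```python
def run_agent(role: str, task: str) -> str:
    # Stub: in a real system this would call an LLM with a FRESH context
    # containing only `role` and `task`, never the lead's full history.
    return f"{role} completed: {task}"

def lead_agent(goal: str) -> str:
    # 1. The lead plans: split the goal into focused briefs.
    briefs = [
        ("backend specialist", "implement the REST API"),
        ("database specialist", "design the schema"),
        ("frontend specialist", "build the UI"),
    ]
    # 2. Each specialist runs in isolation, one brief each.
    results = [run_agent(role, task) for role, task in briefs]
    # 3. The lead synthesizes the specialists' outputs into one result.
    return f"Deliverable for '{goal}':\n" + "\n".join(results)

print(lead_agent("build a task manager"))
```

Notice what the lead never does: the work itself. Its context holds only the plan and the results, so it stays small no matter how messy each specialist's job gets.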
Think About It: Think about the last group project you worked on. Was it better when one person tried to do everything, or when someone split the work up and each person focused on their part? Why?
The hierarchical model is clean and intuitive. But some teams pushed the idea further. What if the agents were not just separate workers — but completely, radically isolated from each other?
The Constellation Pattern: True Isolation
In early 2025, a distinct pattern emerged from practitioners building with Claude Code. They called it the Constellation Pattern, and it took multi-agent isolation to its logical extreme.
Each agent is a completely separate invocation of the AI model. Not a different persona within one conversation. A separate process, with its own fresh context window, that starts from zero.
The agents communicate through one channel only: files on the filesystem. Agent A writes its output to a markdown file. Agent B, when spawned later, reads that file as input. Agent B has never “been” Agent A. It has no memory of Agent A’s reasoning process.
The orchestrator is deliberately simple. It does not summarize. It does not editorialize. It just spawns agents in sequence, points each one at the right input files, and collects the output files when they finish.
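The whole pattern fits in a few lines. This is a toy sketch: ordinary functions stand in for separately spawned model processes, and the agent names and file names are hypothetical. What it demonstrates is the one rule that matters: the only channel between agents is the filesystem.

```python
from pathlib import Path
import tempfile

def agent_a(outfile: Path) -> None:
    # Agent A writes its conclusions to disk; its reasoning is then gone.
    outfile.write_text("Use token-based auth.\n")

def agent_b(infile: Path, outfile: Path) -> None:
    # Agent B starts from zero: its ONLY input is the file on disk.
    design = infile.read_text()
    outfile.write_text("Review of: " + design)

# The orchestrator just sequences agents and wires up file paths.
workdir = Path(tempfile.mkdtemp())
agent_a(workdir / "design.md")
agent_b(workdir / "design.md", workdir / "review.md")
print((workdir / "review.md").read_text())
```

Agent B never sees Agent A's variables, prompts, or chain of reasoning, only the artifact it left behind. That is the isolation doing the work.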
The name captures it: stars that form a pattern, each one shining independently, none connected by anything but the shared sky.
The trade-off: file-based communication is slower than in-memory messaging, and not every task needs this level of isolation. But for complex, multi-step projects where independent judgment matters, the benefits far outweigh the cost.
Why Isolation Matters: “True isolation is not a limitation. It is a superpower. When each agent thinks independently, you get genuine diversity of analysis — not an echo chamber wearing different hats.”
But WHY does isolation matter so much? The answer has a name: the “Same Brain” problem. And it is the reason asking one AI to play two roles is fundamentally different from having two separate AIs.
The “Same Brain” Problem
Here is a deceptively simple experiment. Give a language model a coding task. Let it write a solution. Then, in the same conversation, ask it to review that solution for bugs.
It will almost always be generous. Not because it is dishonest — but because it already “knows” the intent behind every line. It generated those tokens. When it reviews its own code, it is doing the AI equivalent of proofreading your own essay five minutes after writing it.
Practitioners call this the “Same Brain” problem. The “critic” is not independent — it is the same brain wearing a different hat.
Now spawn a completely separate model instance. Give it the same code with no conversation history. This second instance routinely finds issues the first one missed. Not because it is a better model — it is literally the same model. But its context is clean.
```
# The “Same Brain” problem, demonstrated simply:

# APPROACH 1: Same context (biased review)
agent.write("Build a login function")
agent.review("Review the code you just wrote")
# Result: "Code looks correct. 9/10."

# APPROACH 2: Isolated agents (genuine review)
agent_a.write("Build a login function")
save_to_file(agent_a.output, "login.py")
# agent_a’s context is destroyed
agent_b = fresh_agent()
agent_b.review(read_file("login.py"))
# Result: "Missing input validation.
#          SQL injection risk. 6/10."
```
The Key Insight: “Independence is not about being different models. It is about being different contexts. The same model, invoked separately, gives genuinely independent analysis — because it has no memory of generating the work it is reviewing.”
Isolation gives agents fresh eyes. But when multiple agents need to edit the same codebase at the same time, a new problem appears: how do you keep them from stepping on each other’s work?
Git Worktrees — Parallel Work Without Collisions
Cognitive isolation solves the “Same Brain” problem. But there is a practical problem too: file conflicts.
If two agents try to edit the same file at the same time, chaos ensues. One agent’s changes overwrite the other’s.
Human developers solved this years ago with Git (which we covered in Issue 5). But standard Git still assumes one working directory. You check out one branch at a time.
Enter git worktrees. Released in Git 2.5 in July 2015 — a decade before anyone used them for AI agents — worktrees let you create multiple working directories from the same repository, each checked out to a different branch.
```
# Create parallel workspaces
git worktree add ../agent-a feature-auth
git worktree add ../agent-b feature-database
git worktree add ../agent-c feature-tests

# Each agent works in its own folder
# No collisions. Clean merge when done.
```
Think About It: Git worktrees were built in 2015 — a decade before anyone used them for AI agents. Can you think of other technologies that were invented for one purpose and later turned out to be essential for something nobody predicted?
We have the architecture. We have the isolation. We have the tools. But what does it actually look like when a team of agents builds something real?
A Real Example — Watching a Swarm Build a Web App
A developer wants to build a task management web application. They type a single prompt:
“Build a task management app with user authentication, a REST API, a database, and a React front end. Include tests.”
Step 1: The orchestrator makes a plan and writes each assignment to a separate file.
Step 2: Specialists spin up in parallel. Agent A builds the backend API. Agent B designs the database. Agent C builds the React frontend. Agent D writes the tests. Each works in its own git worktree.
Step 3: Results converge. The orchestrator merges the branches.
Step 4: The Red Team. A final agent reviews everything with fresh eyes.
Step 5: Fix and ship. The cycle may repeat once or twice. Then the application is ready.
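Steps 1 through 4 can be sketched as a simple orchestrator loop. The agent calls here are stubs and every name and brief is hypothetical; the sketch exists to show the structure: plan, fan out in parallel, converge, then hand the merged result to a fresh red-team agent.

```python
from concurrent.futures import ThreadPoolExecutor

# Step 1: the orchestrator's plan, one focused brief per specialist.
ASSIGNMENTS = {
    "agent_a": "backend API",
    "agent_b": "database schema",
    "agent_c": "React frontend",
    "agent_d": "test suite",
}

def specialist(name: str, brief: str) -> str:
    # Stub for a real agent running in its own worktree and fresh context.
    return f"{name} delivered: {brief}"

# Step 2: specialists run in parallel, each seeing only its own brief.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda item: specialist(*item), ASSIGNMENTS.items()))

# Step 3: converge the branches.
merged = "\n".join(results)

# Step 4: a FRESH agent red-teams the merged result.
review = specialist("red_team", "review the merged branches")
print(merged)
print(review)
```

In a real system the parallel calls would be separate model processes working in separate git worktrees; the thread pool here just makes the fan-out/fan-in shape concrete.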
Divided Attention: “A swarm does not just divide labor. It divides attention. Each agent gives its full, undegraded focus to one concern. The result is not just faster — it is often better than what one agent could produce alone.”
If this architecture feels familiar, it should. You have seen it before — over half a century ago, in a quiet lab in New Jersey. The Unix philosophy is back.
The Unix Parallel — Half a Century Later
In 1964, Doug McIlroy — head of the Computing Techniques Research Department at Bell Labs — wrote an internal memo. He wanted a way to connect programs together “like a garden hose — screw in another segment when it becomes necessary to massage data in another way.”
It took nearly a decade, but his vision became the Unix pipe: the | character that lets you chain small programs together. cat file | grep pattern | sort | uniq. Four small tools. None of them knows about the others. Each reads text in and writes text out. Complex behavior emerges from simple, composable parts.
In 1978, McIlroy made it a philosophy: “Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”
Now look at the Constellation Pattern. Each agent does one thing well. Agents work together. They communicate through files — which are text, the universal interface. The insight McIlroy had in 1964 is precisely the insight that makes multi-agent AI work in 2025.
More than half a century. The longest-running good idea in computing.
An Immortal Idea: “The best ideas in computing are not inventions that appear once. They are principles that keep being rediscovered, in new forms, at new scales. Composition through text is one of those immortal ideas.”
Multi-agent systems are the newest layer in a stack 90 years tall. And there is one more secret hiding in this issue. It involves YOU, right now, reading these words.
The Stack Grows Taller — And You Are Inside the Story
Every layer of computing’s history rests on the one below it. Turing’s idea made hardware meaningful. Hardware made programming languages possible. Languages made operating systems like Unix practical. Unix’s philosophy shaped the Internet. The Internet fed data to machine learning. Machine learning became deep learning. Deep learning produced Transformers. Transformers enabled LLMs. LLMs became agents. And now, agents are becoming swarms.
Ninety years of ideas, stacked like layers of a building. Each one was someone’s life work. Each one seemed like the peak of what was possible — until the next layer appeared.
The question is no longer “Can AI write code?” That was answered. The question is no longer “Can AI work autonomously?” That was answered too. The new question is: “How do we organize teams of AI agents to tackle problems that no single mind — human or artificial — can hold all at once?”
And that question? It is wide open. For you, for the engineers building these systems, for everyone who will shape what comes next.
90 Years: “From Turing’s tape to the swarm’s files — the core insight has not changed: the way to tackle problems too big for one mind is to organize many minds, each focused on its part, communicating through simple, universal interfaces.”
Think About It: You have just read the entire arc from a single imaginary machine in 1936 to teams of AI agents in 2026. What do you think the NEXT layer of the stack will be? What problem is too hard for today’s swarms that will need something new?
Next Issue: We zoom all the way out. The full stack. The full story. Every layer, every connection, every person who built a piece of the world you live in. Issue 10: “The Whole Stack — and What Comes Next” →