
Pythagora and Cognition are the two agent-software-factory companies most worth watching right now, and they are betting in nearly opposite directions on orchestration, autonomy, vertical scope, and substrate. Pythagora is a YC-backed multi-agent team of fourteen specialists embedded inside VS Code and Cursor; Cognition's Devin is a single sandboxed autonomous engineer with its own IDE (Windsurf, acquired in July 2025). Same problem, very different bets, and that divergence is what makes them useful reference points for any internal agent build.
If you are building an internal agent platform, the most useful thing you can do most weeks is watch the companies trying to build the same thing for the open market and pay attention to where they disagree. We do this constantly. The point is not to copy. The point is to calibrate.
Here is what we are taking from it.
Pythagora is a YC-backed platform that grew out of the open-source GPT Pilot project. Its public framing is an "all-in-one AI development platform" that lives inside VS Code and Cursor and is powered by a team of fourteen specialized agents: Architect, Tech Lead, Developer, Debugger, and the rest of the cast. The user gives it an idea; Pythagora's agents collaborate on specs, frontend, backend, tests, and deployment. The current production target is React frontends and Node.js backends, with Python on the roadmap. Pricing starts at free, with paid tiers from $49/month and an enterprise option.
Cognition is the maker of Devin, marketed as the first AI software engineer. Devin runs in its own sandboxed environment with the full toolchain a human developer would expect, takes broad goals as input, and decomposes them into thousands of decisions over long-horizon tasks. Cognition raised $400M in September 2025 at a $10.2B valuation and, as SiliconANGLE reported on April 23, 2026, is reportedly in talks to raise hundreds of millions more at roughly $25B. Goldman Sachs has publicly described Devin as employee number one in its hybrid workforce. In July 2025, Cognition acquired Windsurf, the agentic IDE, after Google reverse-acquihired Windsurf's leadership, giving Cognition both the agent (Devin) and the workspace (Windsurf) under one roof.
Different companies. Different bets. Same problem.
| Dimension | Pythagora | Cognition (Devin + Windsurf) |
|---|---|---|
| Orchestration model | Multi-agent team, role-specialized | Single sophisticated agent with long-horizon planning |
| Autonomy level | Iterative, user-in-the-loop at every phase | Task-level autonomy; goal in, working code out |
| Vertical vs horizontal scope | Vertical: full-stack web (React + Node) | Horizontal: any codebase Devin can clone and run |
| Deployment substrate | Embedded in user's IDE (VS Code, Cursor) | Cognition-managed sandbox plus owned IDE (Windsurf) |
| Monetization | Tiered SaaS, individual to enterprise | Enterprise contracts, seat-based, premium positioning |
| Observability story | Per-agent transcripts, role-by-role visibility | Sandbox replays, plan trees, IDE-side review |
Neither column is wrong. They are answers to different questions.
The Pythagora bet rhymes with how we have organized our own crew. Sparkles, Soundwave, Optimus Prime, Salvage, Wheeljack: each of these is a role with a defined remit, not a generalist. When something breaks, the failure is legible: a specific agent, in a specific role, produced a specific artifact. We can look at one transcript instead of unwinding a thousand-step plan.
The architectural lesson we draw is not "fourteen agents is the right number." It is that role specialization makes failure understandable. When an Architect agent disagrees with a Developer agent, that disagreement is a feature: it surfaces ambiguity at the spec layer, before it gets cemented into code.
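To make "legible failure" concrete, here is a minimal sketch, assuming each role emits an artifact tagged with the role that produced it. The `Artifact` type, the role names, and the `first_failure` helper are hypothetical illustrations, not Pythagora's implementation or ours.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Artifact:
    """One output from one role (all field names are illustrative)."""
    role: str      # e.g. "architect", "developer"
    kind: str      # e.g. "spec", "code"
    content: str


def first_failure(
    artifacts: list[Artifact],
    check: Callable[[Artifact], bool],
) -> Optional[Artifact]:
    """Return the earliest artifact that fails the check, or None."""
    for artifact in artifacts:
        if not check(artifact):
            return artifact
    return None


# A hypothetical three-role run in which the developer produced nothing.
pipeline = [
    Artifact("architect", "spec", "POST /login returns a session token"),
    Artifact("developer", "code", ""),
    Artifact("debugger", "report", "no tests ran"),
]

failed = first_failure(pipeline, lambda a: bool(a.content.strip()))
if failed is not None:
    print(f"failure in role={failed.role}, artifact={failed.kind}")
```

Because every artifact carries its role, the question "what broke?" reduces to a scan over a short list rather than a walk through one long plan.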
The risk on this side of the divergence is coordination cost. Multi-agent systems can deadlock, loop, or quietly produce contradictions that no single role notices. Pythagora's vertical scope is partly what keeps that under control: a fixed React-and-Node stack means the agents are arguing inside a known box. Generalize the box too far and the role definitions stop carrying weight.
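One common guard against the looping failure mode is a bounded exchange with repeat detection. The sketch below assumes the shared decision can be reduced to a comparable state string; `run_until_stable` and `flip_flop` are hypothetical names, and a real system would compare richer state.

```python
from typing import Callable


def run_until_stable(
    step: Callable[[str], str],
    state: str,
    max_rounds: int = 10,
) -> tuple[str, str]:
    """Apply `step` until the state stops changing.

    Returns (final_state, outcome), where outcome is "converged",
    "loop" (an earlier state came back), or "budget" (rounds ran out).
    """
    seen = {state}
    for _ in range(max_rounds):
        new_state = step(state)
        if new_state == state:
            return state, "converged"
        if new_state in seen:
            return new_state, "loop"
        seen.add(new_state)
        state = new_state
    return state, "budget"


# Two hypothetical roles that keep undoing each other's decision.
def flip_flop(state: str) -> str:
    return "use REST" if state == "use GraphQL" else "use GraphQL"


print(run_until_stable(flip_flop, "use REST"))
```

A "loop" or "budget" outcome is the signal to escalate to a human instead of letting the roles keep arguing.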
The Cognition bet is that long-horizon, single-agent autonomy plus a great workspace beats a committee. Devin is designed to take a goal, plan thousands of decisions, recover from mistakes, and ship. The Windsurf acquisition is the second half of the same bet: own the surface where humans review, accept, and override that autonomous work.
The architectural lesson here is that autonomy is only as good as the surface that lets a human inspect it. A coding agent that goes dark for an hour and returns with a pull request is only useful if someone can replay what it did, see the plan it followed, and reject the parts that drifted. Cognition is investing heavily in that review surface. We should too.
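A review surface starts with a record that can be replayed. The following is a minimal sketch, assuming every action is logged against the plan item it claims to serve; `RunLog`, `Step`, and the file names are hypothetical and do not describe Cognition's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    index: int
    plan_item: str   # which part of the plan this step claims to serve
    action: str      # what the agent actually did


@dataclass
class RunLog:
    """Append-only record of one agent run (hypothetical structure)."""
    plan: list[str]
    steps: list[Step] = field(default_factory=list)

    def record(self, plan_item: str, action: str) -> None:
        self.steps.append(Step(len(self.steps), plan_item, action))

    def replay(self) -> list[str]:
        """Human-readable trace, in the order the steps happened."""
        return [f"{s.index}: [{s.plan_item}] {s.action}" for s in self.steps]

    def drifted(self) -> list[Step]:
        """Steps whose plan item is not in the plan the reviewer approved."""
        return [s for s in self.steps if s.plan_item not in self.plan]


log = RunLog(plan=["add login endpoint", "write tests"])
log.record("add login endpoint", "edit api/auth.py")
log.record("refactor logging", "rewrite utils/log.py")  # not in the plan
log.record("write tests", "add tests/test_auth.py")

for line in log.replay():
    print(line)
print("drifted:", [s.action for s in log.drifted()])
```

The `drifted` list is what a reviewer would reject first: work the agent did that nobody asked for.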
The risk on this side is the one we have written about before: the failure mode of fully autonomous systems is rarely the model saying something dumb. It is the system doing something irreversible before anyone notices. The DC Appeals Court's April 8 denial of Anthropic's temporary-block request, in the broader Anthropic v. DoD fight, is a useful reminder that governance over what an agent is allowed to do, not just what it is capable of, is becoming a regulatory question, not just an engineering one.
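The split between "capable of" and "allowed to" can be sketched as a small authorization gate. The action names and the reversible/irreversible grouping below are assumptions made for illustration, not a real policy.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical policy: action names and groupings are illustrative only.
REVERSIBLE = {"read_file", "run_tests", "open_pull_request"}
IRREVERSIBLE = {"force_push", "drop_table", "delete_branch"}


def authorize(action: str, approved_by_human: bool = False) -> Decision:
    """Decide whether an agent may perform `action`.

    Unknown actions are denied, so capability alone never grants permission.
    """
    if action in REVERSIBLE:
        return Decision.ALLOW
    if action in IRREVERSIBLE:
        return Decision.ALLOW if approved_by_human else Decision.NEEDS_APPROVAL
    return Decision.DENY


for name in ("run_tests", "drop_table", "send_email"):
    print(name, "->", authorize(name).value)
```

The design choice worth noting is the default: anything not explicitly listed is denied, so adding a new tool to the agent does not silently widen what it may do.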
NVIDIA's April 14 launch of the Ising open quantum AI models, NVQLink, and the NVAQC research center is a quiet signal that the compute under all of this is still moving. Today's agent factories run on conventional GPU clusters with API-mediated model access. That assumption will not hold forever. The substrate is going to keep evolving: toward hybrid classical-quantum stacks, toward more specialized inference accelerators, toward whatever comes after.
Architecturally, that pushes us toward portability. Agent definitions, role contracts, observability surfaces, secrets boundaries: these should not be cemented to today's runtime. Pythagora's IDE-embedded approach inherits the user's environment, which is a kind of portability. Cognition's sandboxed approach is more brittle in this dimension; the sandbox is a liability if the substrate underneath shifts. We are designing for the substrate-shift case explicitly.
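One way to keep role contracts independent of the runtime is to put the substrate behind a single narrow interface. This is a sketch under that assumption; `RoleContract`, `Runtime`, and both runtime classes are hypothetical stand-ins, and the string outputs merely simulate execution.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class RoleContract:
    """Runtime-independent description of a role (hypothetical fields)."""
    name: str
    consumes: str   # artifact kind the role expects
    produces: str   # artifact kind the role emits


class Runtime(Protocol):
    """The only seam that knows which substrate the role runs on."""
    def run(self, contract: RoleContract, payload: str) -> str: ...


class LocalRuntime:
    """Stand-in for running inside the user's own environment."""
    def run(self, contract: RoleContract, payload: str) -> str:
        return f"[local] {contract.name} produced {contract.produces} from {payload}"


class SandboxRuntime:
    """Stand-in for running inside a managed sandbox."""
    def run(self, contract: RoleContract, payload: str) -> str:
        return f"[sandbox] {contract.name} produced {contract.produces} from {payload}"


def execute(runtime: Runtime, contract: RoleContract, payload: str) -> str:
    """Run a role without the caller knowing the substrate."""
    return runtime.run(contract, payload)


architect = RoleContract(name="architect", consumes="idea", produces="spec")
for runtime in (LocalRuntime(), SandboxRuntime()):
    print(execute(runtime, architect, "idea: login page"))
```

If the substrate changes, only a new `Runtime` implementation is needed; the contract and the calling code stay as they are.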
The instinct, watching two well-funded companies bet in opposite directions, is to ask which one is right. That is the wrong question. Both bets can succeed. Pythagora can win the prosumer and small-team market with role-specialized agents inside the IDE. Cognition can win the enterprise market with autonomous engineers and a controlled workspace. Their divergence is not a contradiction; it is two valid points on a frontier that has not been mapped yet.
What we take from it for our own crew:
We are not Pythagora. We are not Cognition. We do not need to be either. The point of watching them is to keep our own architecture honest, to make sure the choices we are making about orchestration, autonomy, and substrate are choices, and not defaults we drifted into.
The frontier is wide. The companies trying to map it are useful precisely because they disagree.