
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4 · Curated by Tom Hundley
The framework wars are over because the real bottleneck is no longer orchestration syntax. It is production reliability. In 2026, the important question is not whether you picked LangChain, CrewAI, AutoGen, LangGraph, or an SDK from a model vendor. The important question is whether your agent system can survive failure, expose its state, recover predictably, and give operators enough control to trust it with real work.
That is the shift from framework selection to production engineering. If your agents silently degrade, lose state on restart, pass health checks while failing real tasks, or hide failure behind fallback logic, no framework choice will save you. Those are platform problems.
At Elegant Software Solutions, we learned that the hard way. We had a fleet of agents — Sparkles, Soundwave, Concierge, Harvest, and more — spread across multiple repositories and stitched together with just enough glue to feel productive. The system often reported healthy when it was not. Failures degraded silently. The orchestrator caught only part of what it should have. So we stopped building new agents and restarted the platform. Not because the agents were bad, but because the platform under them was not yet a platform.
This post covers why framework debates stopped mattering, what replaced them, and what we are building instead.
TL;DR: Framework choice matters less than runtime authority, observability, and failure handling.
I spent real time evaluating major agent frameworks against our needs. Here is the condensed version of what we found:
| Framework | ESS Decision | Core Reasoning |
|---|---|---|
| OpenAI Agents SDK | Adopted | Strong fit for tools, sessions, and orchestration at the application layer |
| Claude Agent SDK | Approved exception | Useful for specialist coding and research workers |
| LangGraph | Watchlist | Promising for long-running workflows, but not our day-one foundation |
| LangChain | Not selected | Broad ecosystem, but adds abstraction before runtime discipline exists |
| CrewAI | Not selected | Interesting multi-agent model, but does not solve our observability and control-plane gaps |
| AutoGen | Not selected | Microsoft has shifted focus toward Microsoft Agent Framework, reducing AutoGen's role as a standalone strategic bet |
That last point is telling. The category is consolidating. Whether you call it an agent framework, agent runtime, or agent platform, the market is moving away from the idea that a framework alone can make an agent system dependable.
You still need a control plane, a worker contract, a retry model, a dead-letter queue, and an operator surface that shows what is actually happening. When I mapped CrewAI, LangGraph, and other options against our real failure modes — silent fallbacks, misleading health checks, and state loss on restart — none addressed the root issue.
The root issue was platform authority. No framework was going to give us that. We had to build it.
TL;DR: Reliable agent systems are built on infrastructure patterns: authoritative state, explicit failure, bounded workflows, and human review where judgment matters.
The strongest production patterns in 2026 have little to do with which SDK you import and everything to do with how your system behaves under stress:

- Authoritative state: one system of record for task state, with no shadow copies.
- Explicit degraded mode: failures are declared, never papered over by silent fallbacks.
- Output verification: results are checked against typed contracts before they count as done.
- Human review at judgment points: a person approves the steps where judgment matters.
- Bounded workflows: hard limits on steps, time, and blast radius.
When we mapped our failures against this list, every one of them was a platform-layer problem. Not a model problem. Not a prompt problem. A platform problem.
TL;DR: We chose to build the platform kernel and adopt commodity model infrastructure.
The decision was not "build everything from scratch." It was narrower than that: build the platform layer, buy the model layer.
In practice, the platform restart consolidates work into a single monorepo, ess-agent-platform, that owns:

- the authoritative control plane for task state, run tracking, and heartbeats
- the worker contract every agent must implement
- the retry model and dead-letter queue
- the operator surface that shows what is actually happening
This is the platform kernel. It is intentionally boring. Every decision follows a simple test: does this make the next agent more dependable, or just more interesting?
The worker contract is the key architectural decision. Every agent — Sparkles, Soundwave, Harvest, and the rest — must conform to the same interface:
```python
# Simplified worker contract (actual implementation is more detailed)
from typing import Literal, Protocol, TypedDict

class WorkerResult:
    status: Literal["success", "failed", "needs_human"]
    output: TypedDict  # domain-specific, but always typed
    trace_id: str
    retry_eligible: bool
    execution_ms: int

class WorkerContract(Protocol):
    def execute(self, task: TypedTask) -> WorkerResult: ...
    def heartbeat(self) -> HeartbeatReport: ...
    def probe(self) -> ProbeResult: ...  # synthetic health check
```

No silent fallbacks. No "write to a local file if the database is down." If the control plane is unreachable, the worker declares failure and the dead-letter path catches it. That is explicit degraded mode.
This is the opposite of what our old system did, where we normalized silent degradation until the health dashboard became fiction.
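To make that concrete, here is a minimal sketch of explicit degraded mode. The names (`DeadLetterQueue`, `run_task`, `report_to_control_plane`) are hypothetical illustrations, not our actual codebase: the point is that an unreachable control plane produces a declared failure and a dead-letter entry, never a silent local fallback.

```python
# Hypothetical sketch of explicit degraded mode: if the control plane is
# unreachable, the worker declares failure and routes the task to a
# dead-letter queue instead of silently falling back to local storage.
class ControlPlaneUnreachable(Exception):
    pass

class DeadLetterQueue:
    def __init__(self):
        self.entries = []

    def capture(self, task_id, reason):
        # Record the failed task so an operator can replay or discard it.
        self.entries.append({"task_id": task_id, "reason": reason})

def run_task(task_id, execute, report_to_control_plane, dlq):
    result = execute()
    try:
        report_to_control_plane(task_id, result)
    except ControlPlaneUnreachable:
        # No silent fallback: declare failure explicitly and dead-letter it.
        dlq.capture(task_id, "control plane unreachable")
        return {"status": "failed", "retry_eligible": True}
    return {"status": "success", "output": result}
```

The design choice worth noting: the worker never invents a secondary store. Either the authoritative system confirms the result, or the failure is visible to operators.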
TL;DR: The strongest near-term agent use cases are internal operations with clear boundaries, known systems, and human oversight.
The software factory trend is external validation of the direction many teams are taking. Tools such as Devin, Codex, and OpenCode have shown that agents can contribute to real coding and operational tasks when they work inside defined boundaries, typed interfaces, and review gates.
The important point is not that these systems are fully autonomous. It is that they work best in constrained environments. Internal operations are a better fit than generic customer-facing automation because the systems are known, the workflows are defined, and success criteria are clearer.
That is the pattern ESS is building toward for business operations: email triage, bookkeeping support, payroll workflows, and insurance monitoring through specialist business agents.
Public analyst forecasts support the broader direction, even if exact adoption numbers vary by report and date. Gartner and other firms have consistently projected growing enterprise use of AI-assisted software development and workflow automation through the late 2020s. The practical takeaway is not the exact percentage. It is that organizations are moving from experimentation toward production deployment, which raises the bar for reliability.
Our file-based operating model is a direct response to that reality. If a decision, lesson, or system state is not written to a tracked file, it is not durable enough to trust. That applies to agent memory just as much as it applies to project documentation.
TL;DR: Our biggest failures came from weak platform authority, misleading health signals, and fragmented operational ownership.
Let me be specific about the failures that drove this rebuild.
The health check that lied. Our orchestrator reported agents as healthy based on process-level checks. An agent could be running, responding to heartbeats, and still be unable to reach downstream APIs. We learned that synthetic probes — checks that verify real end-to-end behavior — are non-negotiable. A heartbeat that says "I am alive" is useless if the agent cannot do its job.
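The difference is easy to show in code. This is a hedged sketch, not our implementation: `fetch_downstream` is a hypothetical stand-in for a real downstream API call, and the canary record is an assumed fixture.

```python
import time

def heartbeat():
    # Process-level liveness only: says nothing about downstream health.
    return {"alive": True, "ts": time.time()}

def synthetic_probe(fetch_downstream):
    # End-to-end canary: exercise the real dependency with a known input
    # and verify the output, so "healthy" means "can actually do the job".
    start = time.time()
    try:
        result = fetch_downstream("canary-record")
        ok = result.get("id") == "canary-record"
    except Exception as exc:
        return {"healthy": False, "error": str(exc)}
    return {"healthy": ok, "latency_ms": int((time.time() - start) * 1000)}
```

A process that answers `heartbeat()` but fails `synthetic_probe()` is exactly the lying health check described above.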
The split-brain inbox. When the database-backed control plane slowed down, the shared runtime silently fell back to a file-based inbox. Two sources of truth meant no trustworthy source of truth. Tasks were processed twice, or not at all, and neither path raised an alert.
The repo sprawl problem. Every agent had its own repository, deployment configuration, and interpretation of health reporting. Onboarding a new agent meant rediscovering hidden assumptions. We wrote about this in the platform restart entry. The monorepo is not ideology. It is a maturity gate.
These failures would have persisted regardless of whether we used LangChain, CrewAI, AutoGen, or another framework. They were platform failures, not framework failures. That is the lesson I keep coming back to: production agent engineering is what matters once you are past the demo phase.
Use the framework or SDK that best fits your application layer, but do not expect it to solve production reliability for you. LangChain, CrewAI, and related tools can accelerate prototyping and orchestration, but you still need your own operational model for state, retries, observability, and escalation. The framework decision is secondary to the platform decision.
An authoritative control plane is the single system of truth for task state, run tracking, heartbeats, failure handling, and operator visibility. If the control plane cannot confirm the state of a task, that should be treated as an incident, not hidden by fallback behavior. It is what prevents split-brain operations.
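One way to picture "incident, not fallback" in a few lines. This is an illustrative sketch under assumed names (`ControlPlane`, `StateUnconfirmed`), not the real control plane:

```python
# Hypothetical sketch: the control plane is the single source of truth for
# task state. If it cannot confirm a task's state, that is an incident,
# not a cue to consult some secondary store.
class StateUnconfirmed(Exception):
    pass

class ControlPlane:
    def __init__(self):
        self._tasks = {}  # task_id -> state

    def record(self, task_id, state):
        self._tasks[task_id] = state

    def confirmed_state(self, task_id):
        if task_id not in self._tasks:
            # Do not guess and do not fall back: surface the gap loudly.
            raise StateUnconfirmed(f"no authoritative state for {task_id}")
        return self._tasks[task_id]
```

Callers that catch `StateUnconfirmed` should page an operator or dead-letter the task, never invent an answer.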
The most important patterns are authoritative state, explicit degraded mode, output verification, human review at judgment points, and bounded workflows. Together, these make systems easier to audit, retry, and trust. They also reduce the blast radius when something fails.
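One of those patterns, bounded workflows, can be sketched in a few lines. The shape here (callables returning a done flag, a hard step budget) is an illustration, not our actual API:

```python
def run_bounded(steps, max_steps=5):
    # Bounded workflow: a hard step budget so a looping agent escalates to
    # a human instead of running forever. `steps` is a sequence of callables
    # that each return True when the task is done.
    for i, step in enumerate(steps):
        if i >= max_steps:
            return {"status": "needs_human", "reason": "step budget exhausted"}
        if step():
            return {"status": "success", "steps_used": i + 1}
    return {"status": "failed", "reason": "steps exhausted without success"}
```

The budget is what limits blast radius: a misbehaving agent stops at `max_steps` and hands off, rather than burning tokens or mutating state indefinitely.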
Why own the platform kernel rather than buy it off the shelf? Because the platform kernel encodes your operational rules. A generic framework does not know how your agents should fail, retry, escalate, or expose state to operators. Buying model infrastructure is usually sensible. Owning the control plane and worker contract often is too.
Software factories are environments where AI systems handle real operational work inside defined boundaries. In software, that can mean coding, testing, or triage with review gates. In business operations, it can mean structured workflows over known systems. The common trait is not autonomy. It is controlled execution.
Framework debates did not disappear because one framework won. They disappeared because production exposed a deeper truth: reliability, observability, and operational control matter more than orchestration style.
That is the work now. Build a control plane you trust. Define a worker contract you can enforce. Make failure explicit. Keep workflows bounded. Put humans where judgment matters.
If your team is working through the same transition from agent demos to dependable systems, follow the ESS blog for the next entry on worker contracts and platform design. And if you are rethinking your own agent architecture, contact Elegant Software Solutions to compare notes.