
🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6 · Curated by Tom Hundley
The short version: the 2026 software factory wave is real, and tools like Devin, Cursor, OpenCode, and Claude Code are changing how developers ship code. But that is not the problem I am solving at Elegant Software Solutions. I am rebuilding our internal agent platform around controlled business automation, authoritative workflow state, and production engineering discipline, because reliable operations beat flashy autonomy when the work touches payroll, inboxes, approvals, and real company systems.
This week I spent more time saying "no" than writing new agent features. That sounds boring until you have lived through split-brain control paths, health checks that say green while workers are reconnecting, and Slack-facing agents that lose context on restart. The industry is leaning hard into AI code generation and software factory patterns. We are deliberately leaning into a smaller, stricter agent platform with one control plane, one monorepo architecture, and a file-based memory system. That is not because I think autonomous coding is fake. It is because I know exactly where our current system is brittle, and I do not want to automate the wrong failure modes.
TL;DR: Software factories are designed to turn prompts into shipped code faster, while our rebuild is designed to make business workflows more dependable under real operational load.
If you look at the market in 2026, the center of gravity is obvious. Devin is associated with autonomous integration work. Cursor IDE has become a default environment for agent-assisted development. Claude Code keeps showing up in serious CI/CD conversations, and OpenCode has strong open-source pull because teams want leverage without total vendor lock-in.
That stack makes sense if your primary bottleneck is writing, reviewing, testing, and landing software changes. In that world, the software factory pattern is: spec enters one side, code and PRs come out the other, with agent loops handling research, implementation, regression checks, and iterative fixes.
The adoption signals around that trend are hard to ignore.
I am intentionally not pretending those tools are hype. I use them. They are useful. They can be absurdly good.
But a software factory is not automatically an operations platform. Generating code is one kind of work. Safely executing a business process (with approvals, durable state, retries, auditability, and explicit degraded mode) is a different kind of work.
Here is the comparison I keep coming back to:
| Pattern | Primary goal | Strengths | Failure mode | Best fit |
|---|---|---|---|---|
| Software factory | Faster software delivery | Rapid implementation, agent-assisted coding, PR generation, test loops | Produces lots of change without enough operational control | Product engineering teams |
| Internal agent platform | Reliable business workflow execution | Durable state, approvals, routing, audit trail, run visibility | Can feel slower and less magical at first | Internal operations and business automation |
| Hybrid model | Code generation plus workflow control | Strong long-term potential | Complexity explodes if contracts are weak | Mature engineering orgs |
The three pillars of a production agent platform are authority, durability, and observability. Code generation alone does not give you those.
If you want the broader context for why this shift is happening, my earlier piece, *Framework Debates Are Over: Production Engineering Won*, covers the industry angle. This article is the more personal answer: why I am not chasing the trend line blindly.
TL;DR: We are not rejecting AI code generation; we are refusing to let code-generation patterns define the core architecture for business automation.
Our source-of-truth roadmap made this painfully clear during the March 2026 baseline review. The current fleet had useful pieces, but it was not a dependable platform. The problems were structural, not cosmetic:

- split-brain control paths, with more than one place claiming authority over the same work
- health checks that reported green while workers were quietly reconnecting
- silent fallbacks to hidden local storage instead of durable, inspectable state
- Slack-facing agents that lost context on restart
That is why the restart plan says monorepo first, kernel first, and no repo split until the system earns it.
This is the architectural posture now:

- one control plane with sole authority over runs, approvals, and routing
- one monorepo, with the platform kernel rebuilt first and no repo split until the system earns it
- file-based memory: durable, tracked files instead of chat-thread continuity
- commodity primitives where they help, custom code only where the platform demands it
That sounds almost aggressively unsexy compared with a software factory demo. Good. Unsexy is underrated in production engineering.
The build-vs-buy decision inside ESS is also explicit. We adopt commodity primitives where they help. The current recommendation is OpenAI Responses API plus the OpenAI Agents SDK as the primary app-layer stack, with Claude Agent SDK used selectively for specialist coding and research workers. We are borrowing ideas from OpenClaw. We are watching LangGraph. We are not standardizing on LangChain or CrewAI right now.
That is not ideology. It is scope control.
If you have read *We Stopped Building Agents and Restarted the Platform*, this is the same story from a different angle. We did not hit pause because agents are useless. We hit pause because adding more agents to a weak runtime just gives you more ways to be confused.
TL;DR: Monorepo architecture is the fastest way for us to reduce ambiguity, standardize contracts, and stop duplicating failure across repos.
I have built systems both ways. I am not religious about monorepos. I am practical about failure domains.
In our case, many repos meant duplicated runtime assumptions, duplicated Slack logic, uneven tests, and too many places for "temporary" workarounds to become permanent architecture. The restart plan calls this out directly: start one new canonical project, rebuild the platform kernel first, and keep it together until the first production migrations are complete.
The working repo pattern looks like this:
```
ess-agent-platform/
  apps/
    operator-console/
    control-plane-api/
    worker-supervisor/
  packages/
    runtime-contracts/
    event-schema/
    observability/
    channel-adapters/
  agents/
    sparkles/
    concierge/
    soundwave/
  docs/
    roadmap/
    decisions/
    journal/
    runbooks/
```

That structure is boring on purpose. I want a new engineer to understand it in five minutes.
A simplified TypeScript contract for the shared worker runtime ends up being more valuable than another clever prompt loop:
```typescript
// Task envelope handed to every worker, regardless of agent.
export interface WorkerTask<TInput> {
  runId: string;     // durable run identifier, stable across retries
  agent: string;     // canonical agent name, not the persona nickname
  taskType: string;
  input: TInput;
  attempt: number;   // retry counter for this run
  createdAt: string; // ISO-8601 timestamp
}

// Result envelope every worker must return.
export interface WorkerResult<TOutput> {
  status: "completed" | "failed" | "deferred";
  output?: TOutput;
  errorCode?: string;
  errorMessage?: string;
  retryable?: boolean; // explicit signal, not agent-specific guesswork
  heartbeatAt: string; // last liveness write, ISO-8601
}
```

The important thing is not the interface itself. The important thing is that every worker speaks the same language about execution, retries, and failure.
That is a direct response to what was broken before. No silent fallback to hidden local storage. No "healthy" state that ignores downstream reconnect flapping. No agent-specific interpretation of what a failed run means.
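To make that concrete, here is a minimal sketch of a wrapper that forces every worker through the same execution contract. The interfaces are repeated so the snippet stands alone; the handler, retry bound, and error code are illustrative assumptions, not real platform code.

```typescript
// Sketch only: one shared execution path for every worker.
interface WorkerTask<TInput> {
  runId: string;
  agent: string;
  taskType: string;
  input: TInput;
  attempt: number;
  createdAt: string;
}

interface WorkerResult<TOutput> {
  status: "completed" | "failed" | "deferred";
  output?: TOutput;
  errorCode?: string;
  errorMessage?: string;
  retryable?: boolean;
  heartbeatAt: string;
}

const MAX_ATTEMPTS = 3; // illustrative retry bound

async function runTask<TIn, TOut>(
  task: WorkerTask<TIn>,
  handler: (input: TIn) => Promise<TOut>
): Promise<WorkerResult<TOut>> {
  const heartbeatAt = new Date().toISOString();
  try {
    const output = await handler(task.input);
    return { status: "completed", output, heartbeatAt };
  } catch (err) {
    // Failure is structured and explicit; no per-agent interpretation of what
    // a failed run means, and no silent fallback path.
    return {
      status: "failed",
      errorCode: "HANDLER_ERROR",
      errorMessage: err instanceof Error ? err.message : String(err),
      retryable: task.attempt < MAX_ATTEMPTS,
      heartbeatAt,
    };
  }
}
```

The payoff is that success, failure, and retryability all live in one envelope the control plane can reason about, instead of in each worker's head.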
For file-based memory, the operating model is equally blunt: if it is not written to a tracked file, it is not durable enough to rely on. I wrote more about that in File-Based Documentation for Agent Platform Memory, because it turns out chat-thread memory is a fantastic way to create fake continuity.
TL;DR: Durable files and clear naming reduce operator confusion, improve handoffs, and make AI-assisted development less dependent on tribal memory.
One of the stranger lessons in this rebuild is that naming matters more when agents are involved, not less.
The rule now is simple:

- persona names (Sparkles, Concierge, Soundwave) live in conversation and presentation
- contracts, storage, events, and workflows get plain, descriptive system names
So yes, I still talk about Sparkles, Concierge, and Soundwave. But when I am defining contracts, storage, events, or workflows, I want names that survive onboarding, troubleshooting, and grep.
That sounds small until you try to debug an incident at 7:10 AM and realize half the system is described in playful nicknames while the other half is described in system terms. Ambiguity compounds fast when AI tools are reading, writing, and summarizing your codebase.
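A toy sketch of that split follows. The worker roles and topic format are invented for illustration; only the persona nicknames come from our actual fleet.

```typescript
// Persona names are presentation-only; everything durable and grep-able
// uses canonical system names. The role names here are invented examples.
const PERSONAS: Record<string, string> = {
  "email-digest-worker": "Sparkles",
  "approval-routing-worker": "Concierge",
  "meeting-notes-worker": "Soundwave",
};

// Events, storage keys, and log lines use the canonical name...
function eventTopic(agent: string, taskType: string): string {
  return `agent.${agent}.${taskType}`;
}

// ...while chat surfaces may still render the friendly persona.
function displayName(agent: string): string {
  return PERSONAS[agent] ?? agent;
}
```

The point is that an incident responder grepping logs at 7:10 AM only ever sees the canonical names.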
There is also a direct relationship between file-based documentation and agent effectiveness. When an LLM-backed worker or coding assistant can inspect stable docs, ADRs, runbooks, and current-state files, it performs better than when it has to reconstruct context from Slack threads and commit archaeology.
Google's Site Reliability Engineering guidance made this principle mainstream years ago, even if it did not phrase it in AI terms: reliable systems depend on explicit operational knowledge, clear ownership, and documented procedures. AI just punishes undocumented environments faster.
A minimal file-based session handoff now looks like this:
```markdown
# Session Handoff

## What changed
- moved heartbeat writes behind shared runtime contract
- removed legacy file-inbox fallback from one worker path

## What is still broken
- operator status view still overstates health during downstream reconnects

## Next recommended step
- add synthetic probe for queue-to-worker round trip
```

That is not glamorous, but it is machine-readable, human-readable, and reviewable in git. That combination matters.
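Because the handoff is just structured text, generating it is trivial to automate. A hypothetical helper, with section names taken from the format above and everything else assumed:

```typescript
// Hypothetical helper that renders a session handoff as markdown.
// Where the file lives (e.g. under docs/journal/) is a convention not shown here.
interface Handoff {
  changed: string[];
  stillBroken: string[];
  nextSteps: string[];
}

function renderHandoff(h: Handoff): string {
  const section = (title: string, items: string[]) =>
    [`## ${title}`, ...items.map((i) => `- ${i}`), ""].join("\n");
  return [
    "# Session Handoff",
    "",
    section("What changed", h.changed),
    section("What is still broken", h.stillBroken),
    section("Next recommended step", h.nextSteps),
  ].join("\n");
}
```

A worker or coding assistant that ends every session by writing this file leaves durable context instead of chat-thread continuity.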
TL;DR: Autonomous code generation can accelerate delivery, but business automation demands stricter controls because the cost of silent failure is usually higher than the cost of slower implementation.
Here is the honest version: a software factory can absolutely outrun our current rebuild on visible output. If your scoreboard is lines changed, tickets closed, or prototype velocity, AI code generation wins all day.
But our target system is not a prototype engine. It is a company-owned internal platform that can safely execute real business work.
That means we care about questions like:

- Who or what had the authority to start this run?
- What exactly did the agent do, and can we audit it after the fact?
- What happens on restart, retry, or partial failure?
- Does the system admit it is degraded instead of reporting green?
Those are not secondary concerns. They are the product.
This is also where the software factory hype can mislead teams. Andrej Karpathy has warned about a future where generated code becomes increasingly opaque to human understanding unless leadership adjusts how it evaluates engineering work. That warning applies even more strongly to business automation. If your AI system is moving money, sending email, or updating records, you need more than generated code that seems plausible. You need operational truth.
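One way to make "operational truth" concrete is to refuse a binary healthy/unhealthy signal. A sketch, where the thresholds are pure assumptions rather than our production values:

```typescript
// Health that cannot silently claim green: "degraded" is a first-class state.
// The 60s heartbeat window and reconnect threshold are illustrative assumptions.
type HealthState = "healthy" | "degraded" | "down";

function assessHealth(msSinceHeartbeat: number, reconnectsLast5Min: number): HealthState {
  if (msSinceHeartbeat > 60_000) return "down";  // a stale heartbeat wins over everything
  if (reconnectsLast5Min > 3) return "degraded"; // downstream flapping is not "healthy"
  return "healthy";
}
```

The design choice is that evidence (heartbeats, reconnect counts) drives the status, so a worker that is busy reconnecting can never present itself as green.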
What broke in our older fleet reinforces that point: split-brain control paths, health checks that stayed green during reconnect flapping, silent fallbacks to hidden local storage, and Slack context lost on restart.
So yes, I am paying a tax right now by rebuilding the kernel instead of chasing maximum autonomous output. I think it is the right tax.
The software factory trend is teaching the industry how to accelerate software delivery. Our rebuild is teaching us how to make an internal agent platform trustworthy enough to run parts of a business. Those are adjacent problems, not identical ones.
**Is this an anti-AI-coding-tools stance?** No. We use modern coding tools where they help, and Cursor IDE and Claude Code are legitimately useful for research, implementation, and iteration. The distinction is that we are not letting those tools define the core runtime model for business automation.
**Why not standardize on a broad orchestration framework?** Because our main problem is not lack of orchestration abstraction. Our main problem is authoritative control, durable workflow state, and observable runtime behavior. A broad framework can help in some contexts, but it can also hide the exact boundaries we need to make explicit.
**How is an agent platform different from an AI coding pipeline?** A production-ready agent platform needs durable runs, heartbeats, retries, auditability, approvals, and explicit degraded mode. An AI coding pipeline is optimized to produce and refine code artifacts. There is overlap in tooling, but the reliability contract is very different.
**Why consolidate into a monorepo now?** Because repo sprawl was making the system harder to reason about and easier to duplicate incorrectly. A monorepo gives us one place for contracts, docs, tests, runtime behavior, and first-wave agents until the platform is stable enough to earn separation.
**What would you tell teams earlier in this journey?** Do not scale agent count before you establish platform authority. More agents on top of weak runtime assumptions do not create leverage; they create a larger blast radius. I would rather have three dependable workers than twelve impressive demos.
What I am building right now is not the flashiest version of an agent future. It is the stricter one. The software factory wave is teaching the industry how fast agents can help ship code. Our rebuild at Elegant Software Solutions is about something narrower and, for internal operations, more important: making agents dependable enough to do real work without lying about their own state.
Tomorrow I will probably spend another session removing one more hidden assumption from the old fleet and replacing it with an explicit contract. That is the work. If you are building something similar, especially if you are stuck between autonomous coding tools and a real agent platform, I would love to hear how you are drawing that line.