
🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6 · Curated by Tom Hundley
The short answer: our file-based operating model fixed the handoff problem better than more chat history, more Slack threads, or more agent cleverness ever did. During the Elegant Software Solutions agent platform rebuild, we adopted one blunt rule: if a decision, lesson, handoff, or current state is not written to a tracked file, it is not real enough to rely on. That single rule changed how we document software architecture decisions, how agents hand work to each other, and how a tired engineer can restart a broken session without guessing.
This week I stopped pretending tribal knowledge was a temporary inconvenience. It was a structural bug. The first version of our crew had useful parts — Sparkles as the Slack control surface, Concierge as a general-purpose helper, Soundwave for communications — but the platform around them was too dependent on memory in chat, local machine context, and whatever I happened to remember that day. That is how you get split-brain behavior, silent degradation, and endless re-litigation.
The rebuild is different. We now treat agent platform documentation as part of runtime reliability, not a side quest. And that matters because industry teams are hitting the same wall. GitHub's 2024 Octoverse report found that 97% of surveyed developers reported using AI coding tools at work or personally, but adoption does not automatically create durable engineering memory. Meanwhile, enterprises like Fujitsu have publicly discussed end-to-end AI software factory approaches, signaling the market is moving from isolated copilots to operational systems. Durable handoffs are now infrastructure.
TL;DR: Chat is good for momentum; files are good for continuity, accountability, and clean AI agent handoffs.
The old pattern at ESS was pretty typical: discuss a direction in Slack, maybe test an idea in a local branch, maybe write a note somewhere, then assume the next engineer — or the next agent run — would "have context." That works right up until it doesn't. And when it fails, it fails in the most expensive way possible: everyone thinks someone else already decided the thing.
Our accepted working model now says durable project memory lives in files under version control. Not in chat. Not in a single engineer's head. Not in repo-local folklore. That sounds obvious, but in practice it forces discipline across the entire agent platform rebuild.
Here is the core distinction we were missing:
| Medium | Good at | Bad at | Operational risk |
|---|---|---|---|
| Chat threads | Fast exploration, brainstorming, quick coordination | Canonical truth, precise history, durable handoffs | Context drift, contradictory interpretations |
| Wiki pages | Broad reference material | Day-to-day execution continuity | Gets stale unless tied to active workflow |
| Source-controlled files | Decisions, current state, handoffs, lessons, architecture history | Casual collaboration speed | Lower speed upfront, much higher long-term clarity |
| Ticket comments | Narrow task updates | Cross-cutting architecture context | Fragmented decision trail |
The practical effect is simple: files lower cognitive reload time. When I come back to a subsystem after two bad nights of sleep and three unrelated incidents, I do not want "roughly what we meant." I want a stable filename that tells me what the current truth is, when it changed, and whether it was an approved decision or just a session breadcrumb.
That is also why this model pairs well with modern agent systems. A worker can summarize a thread, but it cannot infer authority from chaos. If you want reliable AI crew management, you need authoritative artifacts. This is the same reason I keep pointing people to State Management: Why Chatbots Forget (And How to Fix It). Memory is not state. Retrieval is not truth.
A definitive statement I am comfortable making now: the memory system for a production agent platform should be version-controlled files, not conversational residue.
TL;DR: Separate stable truth from temporal truth, and give each class of knowledge exactly one home.
The file-based operating model only works if the structure is boring enough to follow under stress. We now organize the rebuild around a small set of durable document types. A simplified version of the layout looks like this:
```
ess-agent-platform/
  docs/
    roadmap/
      01-VISION.md
      02-CURRENT-STATE.md
      03-TARGET-ARCHITECTURE.md
      06-EXECUTION/
        FILE_BASED_OPERATING_MODEL.md
        FILE_NEW_PROJECT_RESTART_PLAN.md
      08-INVENTORY/
        AGENT_ROSTER.md
      09-REFERENCES/
        BUILD_VS_BUY_EVALUATION.md
        BEST_OF_BREED_TRACKER.md
    adr/
      ADR-001-monorepo-first.md
      ADR-002-business-names-in-code.md
    sessions/
      2026-03-16-control-plane-handoff.md
    journal/
      2026-03-14-07-what-we-are-building-instead.md
```

The important design rule is not the folder names. It is the separation of stable truth and temporal truth.
Stable truth uses stable filenames. These are living documents that answer the questions you ask under stress: what is true right now, when did it change, and is it an approved decision or a working assumption?
For us, 01-VISION.md, 02-CURRENT-STATE.md, and 03-TARGET-ARCHITECTURE.md are canonical examples.
Temporal truth is date-stamped and intentionally historical. These files answer a different kind of question: what happened on a given day, what was tried, and what was left unfinished.
That is where session notes and journal entries live. They are not less important. They are just not the same thing as a living architecture file.
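The stable/temporal split can be read off a filename alone, which makes it cheap to enforce in tooling. A minimal sketch, assuming our date-prefix convention (the `YYYY-MM-DD-` prefix shown in the layout above; adapt it to yours):

```python
import re

# Date-stamped filenames mark temporal truth (breadcrumbs);
# everything else is treated as a stable, living document.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")

def truth_class(path: str) -> str:
    """Classify a docs path as 'stable' or 'temporal' by filename shape."""
    name = path.rsplit("/", 1)[-1]
    return "temporal" if DATE_PREFIX.match(name) else "stable"
```

A pre-commit hook or doc linter can use this to reject, say, a date-stamped file landing in `roadmap/`.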
This structure also helps avoid a common anti-pattern in agent platform documentation: one giant "notes" file that becomes an archaeological dig. If you cannot tell whether a statement is current, historical, proposed, or abandoned, the file is worse than useless.
If you want the adjacent architecture view, Designing Agent Workflows: Architecture for AI Automation covers the workflow side of the same problem.
TL;DR: Use boring names where machines and engineers need precision, and fun names where operators need memorability.
One of my favorite decisions in this rebuild is also one of the least glamorous: business names in code, codenames for humans. We still talk about Sparkles, Concierge, and Soundwave because codenames are memorable. They make the crew easier to discuss. But in file paths, APIs, logs, schemas, and infrastructure, operational clarity wins.
This came directly out of cleanup work. When a system is already too wide, fuzzy naming makes everything worse. You start asking basic identity questions, like which running component a name actually refers to, that the naming should have answered already.
That is exactly the sort of ambiguity that poisons AI agent handoffs.
So the rule now is simple: business names in code, codenames for humans.
Here is a sanitized example of what that looks like in config:
```yaml
agents:
  - id: slack-operator-surface
    codename: Sparkles
    role: operator_interface
  - id: general-purpose-assistant
    codename: Concierge
    role: specialist_worker
  - id: email-communications-agent
    codename: Soundwave
    role: channel_adapter
```

This sounds small until you see the downstream impact. Clear names improve searchability, onboarding, auditability, and prompt grounding. They also help when you compare choices to external frameworks, because our implementation still needs names that match our business domains and runtime contracts.
And this is not just a personal preference. The Stack Overflow Developer Survey has consistently shown documentation quality as a major developer concern, and unclear naming is one of the fastest ways to make documentation expensive to trust. Every senior engineer has paid this tax.
TL;DR: Every meaningful session should leave code, a breadcrumb, and an updated truth source if the session changed reality.
The phrase we use internally is simple: every session must leave breadcrumbs. The point is not bureaucratic paperwork. The point is to make continuation cheap.
After a real work session on the platform, we usually create or update some combination of dated session notes, journal entries, and whichever stable truth files the session changed.
A minimal session handoff template looks like this:
```markdown
# Session Handoff - 2026-03-16

## Goal
Stabilize worker heartbeat semantics for control-plane visibility.

## What changed
- normalized heartbeat payload shape
- removed local fallback path from worker startup
- added degraded-mode event when control plane unavailable

## What is confirmed
- worker reports explicit degraded state
- operator surface now shows missing heartbeat separately from process alive

## What is not done
- synthetic probe still pending
- dead-letter replay UX not implemented

## Next best step
Implement probe job that verifies end-to-end task ingestion and result persistence.
```

That "what is confirmed / what is not done" split matters a lot. One of the biggest lies in fast-moving engineering is the accidental blur between implemented, planned, and imagined. Our current-state review called this out directly: health reporting was too optimistic, and silent degradation had become normalized. So now the docs have to be explicit.
This is also where file-based project management helps AI workers. A coding or research agent can read a stable architecture file, then a dated session note, then produce a constrained next step. Without that file chain, the handoff turns into vibe-based reconstruction.
If you are dealing with agents that "seem healthy" while quietly failing downstream, read Debugging AI Agents: Monitoring and Observability Guide. Documentation and observability are the same reliability problem viewed from different angles.
TL;DR: Re-litigation happens when teams cannot tell what was decided, why it was decided, or whether the decision is still active.
The first platform died a thousand small deaths from re-litigation. Not because debate is bad, but because unresolved history keeps leaking back into the present.
Here is the pattern we kept hitting before the rebuild: a question gets settled in chat, the answer never lands in a tracked file, and weeks later the same debate reopens as if nothing had been decided.
The restart plan now says one monorepo until the system earns the right to split again. That is not a forever religion. It is an explicit response to repo sprawl, duplicated logic, and poor reasoning about the active system. By writing that down in a canonical plan, we remove a lot of fake uncertainty.
The same applies to control-plane authority. We documented that the old runtime could silently fall back from the database-backed control plane to a legacy file inbox, creating split-brain behavior. Once that lesson became a tracked file, it stopped being a half-remembered complaint and became a design constraint.
A practical rule emerged from this: software architecture decisions should become files before they become assumptions.
That is the difference between engineering documentation as compliance theater and engineering documentation as an operating system for the team.
There is also a broader industry reason this matters right now. As more teams move from single-agent prototypes to orchestrated systems, AI crew management becomes a documentation problem as much as a model problem. Enterprises want end-to-end delivery systems, not disconnected demos. If your handoffs are weak, the factory stalls.
File-based project management puts durable engineering context under version control, close to the code, with clear ownership and history. A wiki can still be useful, but in our rebuild it was too easy for broad reference pages to drift away from the active implementation. Files win when you need architecture truth, session continuity, and AI agent handoffs that can be replayed deterministically.
They give agents a constrained context chain: current architecture, latest decisions, recent session notes, and the exact next step. That is much more reliable than asking an agent to infer intent from long chat threads. In practice, it reduces ambiguity about what is approved, what is experimental, and what is already known to be broken.
Start with vision, current state, target architecture, and one file that defines the operating model for decisions and handoffs. Then add ADRs, a dated session-handoff folder, and an operator-facing agent roster. Those six categories cover most of the failure modes that create tribal knowledge and re-litigation.
Because operational memory is not the same as runtime state or semantic retrieval. A control plane tracks runs, events, and heartbeats; a vector store helps retrieve related content. Neither replaces explicit, reviewed, versioned statements about architecture, policy, and what changed yesterday. They complement each other, but files remain the source of authority.
Use ADRs when a decision is ratified and should govern future implementation unless explicitly superseded. Use session notes for temporal breadcrumbs: what happened, what was tried, what is confirmed, and what the next engineer should do. ADRs define policy; session notes preserve momentum.
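The "unless explicitly superseded" part can be made machine-checkable. A hedged sketch, assuming a `Status:` line convention inside each ADR (our convention, not a standard; the helper names are illustrative):

```python
import re

# Each ADR carries a "Status:" line; a superseded ADR names its
# replacement there, e.g. "Status: Superseded by ADR-007".
STATUS_LINE = re.compile(r"^Status:\s*(.+)$", re.MULTILINE)

def adr_status(adr_text: str) -> str:
    """Extract the Status line from an ADR, or 'Unknown' if absent."""
    match = STATUS_LINE.search(adr_text)
    return match.group(1).strip() if match else "Unknown"

def governs_today(adr_text: str) -> bool:
    # Only accepted, non-superseded ADRs constrain new work.
    return adr_status(adr_text).lower() == "accepted"
```

An agent (or a reviewer) can then filter `docs/adr/` down to the decisions that still bind the current implementation.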
What changed for us was not some exotic framework choice. It was admitting that the first platform had a memory problem, and that memory problems become reliability problems. The file-based operating model gave the rebuild a spine: one place for current truth, one place for historical breadcrumbs, and a much cleaner way for Sparkles, Concierge, Soundwave, and the humans around them to keep moving without guessing.
I am still rebuilding this in public, and I am sure we will find rough edges in the model too. But this part already feels durable. Tomorrow I will probably be back in the weeds on control-plane authority, typed worker contracts, or another thing the old fleet made harder than it needed to be. If you are building something similar, I would love to hear how you are handling engineering documentation, software architecture decisions, and agent handoffs as your system grows.