
🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6 · Curated by Tom Hundley
The short answer: our file-based operating model fixed the handoff problem better than more chat history, more Slack threads, or more agent cleverness ever did. During the Elegant Software Solutions agent platform rebuild, we adopted one blunt rule: if a decision, lesson, handoff, or current state is not written to a tracked file, it is not real enough to rely on. That single rule changed how we document software architecture decisions, how agents hand work to each other, and how a tired engineer can restart a broken session without guessing.
This week I stopped pretending tribal knowledge was a temporary inconvenience. It was a structural bug. The first version of our crew had useful parts — Sparkles as the Slack control surface, Concierge as a general-purpose helper, Soundwave for communications — but the platform around them was too dependent on memory in chat, local machine context, and whatever I happened to remember that day. That is how you get split-brain behavior, silent degradation, and endless re-litigation.
The rebuild is different. We now treat agent platform documentation as part of runtime reliability, not a side quest. And that matters because industry teams are hitting the same wall. GitHub's 2024 Octoverse report found that 97% of surveyed developers reported using AI coding tools at work or personally, but adoption does not automatically create durable engineering memory. Meanwhile, enterprises like Fujitsu have publicly discussed end-to-end AI software factory approaches, signaling the market is moving from isolated copilots to operational systems. Durable handoffs are now infrastructure.
TL;DR: Chat is good for momentum; files are good for continuity, accountability, and clean AI agent handoffs.
The old pattern at ESS was pretty typical: discuss a direction in Slack, maybe test an idea in a local branch, maybe write a note somewhere, then assume the next engineer — or the next agent run — would "have context." That works right up until it doesn't. And when it fails, it fails in the most expensive way possible: everyone thinks someone else already decided the thing.
Our accepted working model now says durable project memory lives in files under version control. Not in chat. Not in a single engineer's head. Not in repo-local folklore. That sounds obvious, but in practice it forces discipline across the entire agent platform rebuild.
Here is the core distinction we were missing:
| Medium | Good at | Bad at | Operational risk |
|---|---|---|---|
| Chat threads | Fast exploration, brainstorming, quick coordination | Canonical truth, precise history, durable handoffs | Context drift, contradictory interpretations |
| Wiki pages | Broad reference material | Day-to-day execution continuity | Gets stale unless tied to active workflow |
| Source-controlled files | Decisions, current state, handoffs, lessons, architecture history | Casual collaboration speed | Lower speed upfront, much higher long-term clarity |
| Ticket comments | Narrow task updates | Cross-cutting architecture context | Fragmented decision trail |
The practical effect is simple: files lower cognitive reload time. When I come back to a subsystem after two bad nights of sleep and three unrelated incidents, I do not want "roughly what we meant." I want a stable filename that tells me what the current truth is, when it changed, and whether it was an approved decision or just a session breadcrumb.
That is also why this model pairs well with modern agent systems. A worker can summarize a thread, but it cannot infer authority from chaos. If you want reliable AI crew management, you need authoritative artifacts. This is the same reason I keep pointing people to State Management: Why Chatbots Forget (And How to Fix It). Memory is not state. Retrieval is not truth.
A definitive statement I am comfortable making now: the memory system for a production agent platform should be version-controlled files, not conversational residue.
TL;DR: Separate stable truth from temporal truth, and give each class of knowledge exactly one home.
The file-based operating model only works if the structure is boring enough to follow under stress. We now organize the rebuild around a small set of durable document types. A simplified version of the layout looks like this:
```
ess-agent-platform/
  docs/
    roadmap/
      01-VISION.md
      02-CURRENT-STATE.md
      03-TARGET-ARCHITECTURE.md
      06-EXECUTION/
        FILE_BASED_OPERATING_MODEL.md
        FILE_NEW_PROJECT_RESTART_PLAN.md
      08-INVENTORY/
        AGENT_ROSTER.md
      09-REFERENCES/
        BUILD_VS_BUY_EVALUATION.md
        BEST_OF_BREED_TRACKER.md
    adr/
      ADR-001-monorepo-first.md
      ADR-002-business-names-in-code.md
    sessions/
      2026-03-16-control-plane-handoff.md
    journal/
      2026-03-14-07-what-we-are-building-instead.md
```

The important design rule is not the folder names. It is the separation of stable truth and temporal truth.
Stable truth uses stable filenames. These are living documents that answer the questions you ask under stress: what is true right now, when did it change, and is it an approved decision or a working assumption?
For us, 01-VISION.md, 02-CURRENT-STATE.md, and 03-TARGET-ARCHITECTURE.md are canonical examples.
Temporal truth is date-stamped and intentionally historical. These files answer a different kind of question: what happened on a given day, what was tried, and what was left unfinished.
That is where session notes and journal entries live. They are not less important. They are just not the same thing as a living architecture file.
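The stable/temporal split can be read off a filename alone, which makes it cheap to enforce in tooling. A minimal sketch, assuming our date-prefix convention (the `YYYY-MM-DD-` prefix shown in the layout above; adapt it to yours):

```python
import re

# Date-stamped filenames mark temporal truth (breadcrumbs);
# everything else is treated as a stable, living document.
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}-")

def truth_class(path: str) -> str:
    """Classify a docs path as 'stable' or 'temporal' by filename shape."""
    name = path.rsplit("/", 1)[-1]
    return "temporal" if DATE_PREFIX.match(name) else "stable"
```

A pre-commit hook or doc linter can use this to reject, say, a date-stamped file landing in `roadmap/`.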
This structure also helps avoid a common anti-pattern in agent platform documentation: one giant "notes" file that becomes an archaeological dig. If you cannot tell whether a statement is current, historical, proposed, or abandoned, the file is worse than useless.
If you want the adjacent architecture view, Designing Agent Workflows: Architecture for AI Automation covers the workflow side of the same problem.
TL;DR: Use boring names where machines and engineers need precision, and fun names where operators need memorability.
One of my favorite decisions in this rebuild is also one of the least glamorous: business names in code, codenames for humans. We still talk about Sparkles, Concierge, and Soundwave because codenames are memorable. They make the crew easier to discuss. But in file paths, APIs, logs, schemas, and infrastructure, operational clarity wins.
This came directly out of cleanup work. When a system is already too wide, fuzzy naming makes everything worse. You start asking basic identity questions, like which running component a name actually refers to, that the naming should have answered already.
That is exactly the sort of ambiguity that poisons AI agent handoffs.
So the rule now is simple: business names in code, codenames for humans.
Here is a sanitized example of what that looks like in config:
```yaml
agents:
  - id: slack-operator-surface
    codename: Sparkles
    role: operator_interface
  - id: general-purpose-assistant
    codename: Concierge
    role: specialist_worker
  - id: email-communications-agent
    codename: Soundwave
    role: channel_adapter
```

This sounds small until you see the downstream impact. Clear names improve searchability, onboarding, auditability, and prompt grounding. They also help when you compare choices to external frameworks, because our implementation still needs names that match our business domains and runtime contracts.
And this is not just a personal preference. The Stack Overflow Developer Survey has consistently shown documentation quality as a major developer concern, and unclear naming is one of the fastest ways to make documentation expensive to trust. Every senior engineer has paid this tax.
TL;DR: Every meaningful session should leave code, a breadcrumb, and an updated truth source if the session changed reality.
The phrase we use internally is simple: every session must leave breadcrumbs. The point is not bureaucratic paperwork. The point is to make continuation cheap.
After a real work session on the platform, we usually create or update some combination of dated session notes, journal entries, and whichever stable truth files the session changed.
A minimal session handoff template looks like this:
```markdown
# Session Handoff - 2026-03-16

## Goal
Stabilize worker heartbeat semantics for control-plane visibility.

## What changed
- normalized heartbeat payload shape
- removed local fallback path from worker startup
- added degraded-mode event when control plane unavailable

## What is confirmed
- worker reports explicit degraded state
- operator surface now shows missing heartbeat separately from process alive

## What is not done
- synthetic probe still pending
- dead-letter replay UX not implemented

## Next best step
Implement probe job that verifies end-to-end task ingestion and result persistence.
```

That "what is confirmed / what is not done" split matters a lot. One of the biggest lies in fast-moving engineering is the accidental blur between implemented, planned, and imagined. Our current-state review called this out directly: health reporting was too optimistic, and silent degradation had become normalized. So now the docs have to be explicit.
This is also where file-based project management helps AI workers. A coding or research agent can read a stable architecture file, then a dated session note, then produce a constrained next step. Without that file chain, the handoff turns into vibe-based reconstruction.
If you are dealing with agents that "seem healthy" while quietly failing downstream, read Debugging AI Agents: Monitoring and Observability Guide. Documentation and observability are the same reliability problem viewed from different angles.
TL;DR: Re-litigation happens when teams cannot tell what was decided, why it was decided, or whether the decision is still active.
The first platform died a thousand small deaths from re-litigation. Not because debate is bad, but because unresolved history keeps leaking back into the present.
Here is the pattern we kept hitting before the rebuild: a question gets settled in chat, the answer never lands in a tracked file, and weeks later the same debate reopens as if nothing had been decided.
The restart plan now says one monorepo until the system earns the right to split again. That is not a forever religion. It is an explicit response to repo sprawl, duplicated logic, and poor reasoning about the active system. By writing that down in a canonical plan, we remove a lot of fake uncertainty.
The same applies to control-plane authority. We documented that the old runtime could silently fall back from the database-backed control plane to a legacy file inbox, creating split-brain behavior. Once that lesson became a tracked file, it stopped being a half-remembered complaint and became a design constraint.
A practical rule emerged from this: software architecture decisions should become files before they become assumptions.
That is the difference between engineering documentation as compliance theater and engineering documentation as an operating system for the team.
There is also a broader industry reason this matters right now. As more teams move from single-agent prototypes to orchestrated systems, AI crew management becomes a documentation problem as much as a model problem. Enterprises want end-to-end delivery systems, not disconnected demos. If your handoffs are weak, the factory stalls.
File-based project management puts durable engineering context under version control, close to the code, with clear ownership and history. A wiki can still be useful, but in our rebuild it was too easy for broad reference pages to drift away from the active implementation. Files win when you need architecture truth, session continuity, and AI agent handoffs that can be replayed deterministically.
They give agents a constrained context chain: current architecture, latest decisions, recent session notes, and the exact next step. That is much more reliable than asking an agent to infer intent from long chat threads. In practice, it reduces ambiguity about what is approved, what is experimental, and what is already known to be broken.
Start with vision, current state, target architecture, and one file that defines the operating model for decisions and handoffs. Then add ADRs, a dated session-handoff folder, and an operator-facing agent roster. Those six categories cover most of the failure modes that create tribal knowledge and re-litigation.
Because operational memory is not the same as runtime state or semantic retrieval. A control plane tracks runs, events, and heartbeats; a vector store helps retrieve related content. Neither replaces explicit, reviewed, versioned statements about architecture, policy, and what changed yesterday. They complement each other, but files remain the source of authority.
Use ADRs when a decision is ratified and should govern future implementation unless explicitly superseded. Use session notes for temporal breadcrumbs: what happened, what was tried, what is confirmed, and what the next engineer should do. ADRs define policy; session notes preserve momentum.
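The "unless explicitly superseded" part can be made machine-checkable. A hedged sketch, assuming a `Status:` line convention inside each ADR (our convention, not a standard; the helper names are illustrative):

```python
import re

# Each ADR carries a "Status:" line; a superseded ADR names its
# replacement there, e.g. "Status: Superseded by ADR-007".
STATUS_LINE = re.compile(r"^Status:\s*(.+)$", re.MULTILINE)

def adr_status(adr_text: str) -> str:
    """Extract the Status line from an ADR, or 'Unknown' if absent."""
    match = STATUS_LINE.search(adr_text)
    return match.group(1).strip() if match else "Unknown"

def governs_today(adr_text: str) -> bool:
    # Only accepted, non-superseded ADRs constrain new work.
    return adr_status(adr_text).lower() == "accepted"
```

An agent (or a reviewer) can then filter `docs/adr/` down to the decisions that still bind the current implementation.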
What changed for us was not some exotic framework choice. It was admitting that the first platform had a memory problem, and that memory problems become reliability problems. The file-based operating model gave the rebuild a spine: one place for current truth, one place for historical breadcrumbs, and a much cleaner way for Sparkles, Concierge, Soundwave, and the humans around them to keep moving without guessing.
I am still rebuilding this in public, and I am sure we will find rough edges in the model too. But this part already feels durable. Tomorrow I will probably be back in the weeds on control-plane authority, typed worker contracts, or another thing the old fleet made harder than it needed to be. If you are building something similar, I would love to hear how you are handling engineering documentation, software architecture decisions, and agent handoffs as your system grows.