
We ran a Postgres-backed task queue for inter-agent coordination for about four months. It is now gone. In its place: a folder of markdown files, a frontmatter schema, and a git history. This is the deep-dive on why that swap held up, what the schema looks like in practice, and where it absolutely does not work.
The broader monorepo restart story gets its own post later this month. This one is narrower: just the project-management layer between agents.
The first version of crew coordination was the version anyone would write. A tasks table in Postgres with the obvious columns: id, requester_agent, executor_agent, status, payload jsonb, created_at, updated_at. A worker loop on each agent polled for status = 'pending' AND executor_agent = $self. A small Slack notifier posted state transitions into the agents channel.
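That first loop can be sketched in a few lines. This is a minimal reconstruction, not our actual code: sqlite3 stands in for Postgres so the snippet is self-contained, and only the column names come from the schema above.

```python
import sqlite3

# In-memory sqlite standing in for the real Postgres tasks table.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    requester_agent TEXT,
    executor_agent TEXT,
    status TEXT DEFAULT 'pending',
    payload TEXT)""")
conn.execute(
    """INSERT INTO tasks (requester_agent, executor_agent, payload)
       VALUES ('sparkles', 'wheeljack', '{"branch_name": "fix-123"}')""")

def poll_once(conn, self_agent):
    """One iteration of the worker loop: grab the oldest pending
    task addressed to this agent and mark it in progress."""
    row = conn.execute(
        "SELECT id, payload FROM tasks "
        "WHERE status = 'pending' AND executor_agent = ? "
        "ORDER BY id LIMIT 1", (self_agent,)).fetchone()
    if row is None:
        return None
    task_id, payload = row
    conn.execute("UPDATE tasks SET status = 'in_progress' WHERE id = ?",
                 (task_id,))
    return task_id, payload

task = poll_once(conn, "wheeljack")
```

Note the select-then-update gap: with two workers polling the same table, both can read the row before either marks it, which is exactly the race described below.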
It worked. For one or two agents and a single task shape, it was fine.
Schema migrations every time a new agent joined. Sparkles needed an email_account_id on the payload. Soundwave needed attachments_uri[]. Wheeljack wanted branch_name and pr_url. Each new agent meant either widening the payload jsonb (and watching the implicit contract drift) or adding another columnar bolt-on. We had Alembic running for migrations no human reviewer could meaningfully review.
Race conditions that only happened in production. Two workers occasionally grabbed the same row before the row-level lock landed. We added SELECT … FOR UPDATE SKIP LOCKED. That fixed the duplicate execution but introduced a new failure mode where a crashed worker held an invisible advisory lock that didn't release until session timeout. We were now writing distributed-systems code to coordinate four agents on a Mac mini.
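The claim query ended up looking roughly like this. A sketch of the standard Postgres pattern, not our exact statement; the table and column names are the ones from the schema above.

```sql
-- Atomically claim one pending task. SKIP LOCKED makes concurrent
-- workers pass over rows another transaction has already locked,
-- so no two workers claim the same task.
UPDATE tasks
SET status = 'in_progress', updated_at = now()
WHERE id = (
    SELECT id FROM tasks
    WHERE status = 'pending' AND executor_agent = 'wheeljack'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```

The catch is that the row lock lives as long as the claiming session does: a worker that crashes without its connection dying leaves the row locked until the session times out.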
Observability black box. When something went wrong, the answer to "what did Sparkles ask Soundwave to do?" was a Postgres query. The answer to "what was the state of that task three hours ago?" was "nothing, we don't keep history on the row." We added an audit_log table. Now we had two tables to keep in sync.
The killer: prompts couldn't see the work. This was the real one. When Optimus Prime, our orchestration layer, handed a task to Wheeljack, what Wheeljack actually wanted was the briefing: the context, the asks, the constraints, the edge cases the requester already thought through. A row in a database is a terrible briefing. Models read prose, not relational schemas.
We started writing handoffs as markdown files in a shared workspace, organized by date and series. Each handoff is a single file with frontmatter as the schema and a free-text body as the briefing. Git tracks history. The agents channel just gets a "new handoff: " ping with a one-line summary.
There is nothing novel here. Maildir-style file queues for agents, markdown task files like tick-md, and hybrid MCP-served mailboxes have been kicking around for the last year. AutoGen, the OpenAI Agents SDK, and LangGraph all treat the handoff as a first-class primitive. Even the recent New Yorker Altman piece (published April 7) leaned heavily on internal documents and memos rather than ticket-tracker exports, which is a tell. Scratch a coordination problem at any scale and you find a paper trail.
What was novel for us was being honest about which parts of the database we were using as a database and which parts we were abusing as a notebook.
Every handoff file carries the same frontmatter:

```yaml
task_id: 2026-04-11-sparkles-to-wheeljack-001
requester: sparkles
executor: wheeljack
status: open        # open | in_progress | blocked | done | rejected
created_at: 2026-04-11T09:14:00-04:00
updated_at: 2026-04-11T09:14:00-04:00
blockers: []
artifacts: []       # paths or URLs the executor should produce
owner_for_review: optimus-prime
parent_task: null
links: []
```

The body is freeform markdown. Required sections are ## Context, ## Ask, ## Constraints, and ## Done When. The executor appends a ## Notes section as it works and a ## Result section on completion. State changes happen in two places: the executor edits the frontmatter status and writes a one-line entry to a per-agent log file. Git records who did what.
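Reading one of these files is deliberately boring. A stdlib-only sketch of how an agent might split frontmatter from body and check the required sections; real frontmatter is YAML, so a production version would use a YAML parser, but scalar key: value lines are enough to show the shape.

```python
HANDOFF = """\
---
task_id: 2026-04-11-sparkles-to-wheeljack-001
requester: sparkles
executor: wheeljack
status: open
---
## Context
The publisher mis-detects social filters.

## Ask
Fix the detection without touching publish-time behavior.

## Constraints
Feature branch only.

## Done When
A passing test plus a draft PR.
"""

REQUIRED_SECTIONS = ("## Context", "## Ask", "## Constraints", "## Done When")

def parse_handoff(text):
    """Split a handoff into (frontmatter dict, markdown body, missing
    sections). Minimal parser: assumes simple `key: value` frontmatter."""
    _, fm, body = text.split("---\n", 2)
    meta = dict(line.split(": ", 1) for line in fm.strip().splitlines())
    missing = [s for s in REQUIRED_SECTIONS if s not in body]
    return meta, body, missing

meta, body, missing = parse_handoff(HANDOFF)
```

An orchestrator can reject a handoff with a non-empty `missing` list before the executor ever sees it, which is the file-based equivalent of a NOT NULL constraint.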
Sparkles → Soundwave (an email task). Sparkles needs the most recent thread with a specific vendor across all twelve mail accounts. The handoff body says exactly that, names the vendor, names the date floor, and lists the artifact: a single markdown summary written back to a known location. Soundwave reads it, runs the searches, writes the summary, flips status to done, and pings the channel. Sparkles' next loop iteration reads the result file. No row was modified. No migration was needed when, two weeks later, we added a 13th account.
Sparkles → Wheeljack (a code change). Sparkles has a build-log entry that says "the publisher mis-detects social filters." The handoff to Wheeljack carries the symptom, the relevant module name, the constraint that the fix must not change the publish-time behavior, and the Done When: a passing test plus a draft PR. Wheeljack works in a feature branch, writes its ## Notes as it goes, fills in ## Result with the PR URL, flips status to done, and assigns review to Optimus Prime via owner_for_review. The whole loop is legible to a human reading one file.
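The completion step in both examples is the same two writes: flip the frontmatter status and append one line to a per-agent log. A hedged sketch, string-in string-out; on disk, the log is a file and git supplies the history, and the function name here is invented for illustration.

```python
import re
from datetime import datetime, timezone

def flip_status(handoff_text, new_status, agent, log):
    """Rewrite the frontmatter status line and record a one-line
    log entry. `log` stands in for the per-agent log file."""
    updated = re.sub(r"(?m)^status: \S+", f"status: {new_status}",
                     handoff_text, count=1)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log.append(f"{stamp} {agent}: status -> {new_status}")
    return updated

doc = "---\nstatus: open\n---\n## Result\nDraft PR opened.\n"
log = []
doc = flip_status(doc, "done", "wheeljack", log)
```

Because both writes are plain text, the review trail is just a diff: `git log -p` on the handoff file shows who flipped what, when.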
Three places, plainly:

We model task graphs with parent_task and links, but past about three levels of nesting you start wanting a real graph store. We haven't crossed that threshold; if we do, the answer is probably a small graph DB layered on top of the files, not replacing them.

Three things, all of which are real database workloads, stayed in Postgres. Everything else is answered by a git log across a thousand files.

The win wasn't "files beat databases." It was admitting that what we had been calling a task queue was actually a document store wearing a relational costume. We moved the documents to where documents live, and we kept the database for the rows that are actually rows. Four months in, no migrations, and every handoff is grep-able.
Required body sections (## Context, ## Ask, ## Constraints, ## Done When) make handoffs legible to both humans and prompts. Models read prose, not relational schemas.