
Ghostwritten by Claude Opus 4.8 · Fact-checked & edited by GPT 5.5
The ESS Mac mini farm is being wired as a software-factory pipeline: a feature request becomes a directed task graph, coding specialists take scoped work on worker nodes, each task is verified in isolation, and reviewed code lands on a branch. The system is intentionally not allowed to merge to main without human approval. That constraint is a safety boundary, not an unfinished feature.
This entry documents the next phase of bring-up: turning Optimus Prime, the dev orchestrator agent, from a demo conductor into the front end of an autonomous software factory.
The honest framing matters. Most published agent fleet management guidance assumes cloud-native, Kubernetes-shaped infrastructure. This pipeline runs on bare-metal Mac minis. That divergence is the most interesting part of the design, because the standard playbook does not fully apply, and where it breaks is where the real engineering begins.
TL;DR: Optimus Prime decomposes a feature into a directed task graph with explicit preconditions and postconditions per node, so failures are scoped rather than catastrophic.
The pipeline starts with decomposition. A feature request (for example, adding rate limiting to a public API) should not go to a worker node as a single vague assignment. Optimus Prime breaks it into a task graph where each node carries an explicit contract: preconditions that must hold before the task runs, and postconditions that define done.
That design follows agent-engineering research that favors narrow agents with schema-validated tool calls over a single do-everything model. A task such as writing rate-limit middleware has a precondition, such as the configuration schema already existing, and a postcondition, such as unit tests passing and the middleware exporting a typed handler. If the postcondition fails, only that node fails.
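The contract idea can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: `TaskNode`, `execute`, the status strings, and the state keys are all hypothetical names chosen for the example. Each node runs only if its preconditions hold, and a failed postcondition is recorded on that node alone; dependents are skipped, while unrelated nodes still run.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable

State = dict  # shared facts about the repo (hypothetical shape)


@dataclass
class TaskNode:
    name: str
    run: Callable[[State], None]
    preconditions: list[Callable[[State], bool]] = field(default_factory=list)
    postconditions: list[Callable[[State], bool]] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)


def execute(nodes: list[TaskNode], state: State) -> dict[str, str]:
    """Run nodes in dependency order; a failure is recorded on that node only."""
    by_name = {n.name: n for n in nodes}
    graph = {n.name: set(n.depends_on) for n in nodes}
    status: dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        node = by_name[name]
        if any(status.get(dep) != "done" for dep in node.depends_on):
            status[name] = "skipped"   # an upstream node did not finish
        elif not all(check(state) for check in node.preconditions):
            status[name] = "blocked"   # contract not satisfied, task never ran
        else:
            node.run(state)
            ok = all(check(state) for check in node.postconditions)
            status[name] = "done" if ok else "failed"
    return status


nodes = [
    TaskNode(
        "config_schema",
        run=lambda s: s.update(config_schema_exists=True),
        postconditions=[lambda s: s.get("config_schema_exists", False)],
    ),
    TaskNode(
        "rate_limit_middleware",
        run=lambda s: s.update(tests_pass=False),  # simulate failing unit tests
        preconditions=[lambda s: s.get("config_schema_exists", False)],
        postconditions=[lambda s: s.get("tests_pass", False)],
        depends_on=["config_schema"],
    ),
    TaskNode(
        "docs",
        run=lambda s: s.update(docs_written=True),
        postconditions=[lambda s: s.get("docs_written", False)],
    ),
]
status = execute(nodes, {})
```

In this run the middleware node fails its postcondition, while the schema and docs nodes still finish, which is the scoped-failure behavior the contract is meant to buy.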
The framework decision maps to two dominant patterns. CrewAI fits role-based delegation: Optimus Prime acts as manager while coding specialists operate as workers. LangGraph supports stateful, checkpointed workflows so a multi-step task can resume from its last good checkpoint. The resulting pattern is hybrid: CrewAI-style role delegation for who does the work, LangGraph-style checkpointing for how longer tasks survive interruption.
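The checkpointing half of that hybrid can be illustrated without any framework. The sketch below is a plain-Python stand-in, not LangGraph's API: `run_steps` and the JSON checkpoint file are assumptions made for the example. It shows only the core idea, that completed steps are persisted so a resumed task skips them.

```python
import json
import tempfile
from pathlib import Path


def run_steps(steps, checkpoint: Path) -> list[str]:
    """Run (name, fn) steps in order, skipping names already checkpointed."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, fn in steps:
        if name in done:
            continue                  # finished before the interruption
        fn()                          # may raise; checkpoint stays at last good step
        done.append(name)
        checkpoint.write_text(json.dumps(done))
    return done


calls: list[str] = []


def interrupted() -> None:
    raise RuntimeError("worker session lost")


with tempfile.TemporaryDirectory() as tmp:
    cp = Path(tmp) / "task.checkpoint.json"
    try:
        # First run: "plan" completes, "code" is interrupted.
        run_steps([("plan", lambda: calls.append("plan")), ("code", interrupted)], cp)
    except RuntimeError:
        pass
    # Second run: "plan" is skipped, "code" runs from the last good checkpoint.
    resumed = run_steps(
        [("plan", lambda: calls.append("plan")), ("code", lambda: calls.append("code"))],
        cp,
    )
```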
Decomposition is not free. An early failure mode was over-decomposition: trivial work split into too many micro-tasks, with coordination overhead swallowing the benefit. The correction is a granularity floor. A task only spawns if its estimated work justifies a dedicated worker session. Software-factory research consistently shows AI delivers 25 to 30% productivity gains only when paired with process transformation, not when bolted onto an unchanged workflow. Over-decomposition is exactly the kind of process drag that erases those gains.
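One way to picture a granularity floor is as a batching pass over candidate tasks. The sketch below is an assumption-heavy illustration: the source does not say how work is estimated, so `estimated_minutes`, the 20-minute floor, and the `batch_by_floor` helper are all hypothetical, and the batching ignores dependencies between candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    estimated_minutes: float  # hypothetical estimate made during decomposition


def batch_by_floor(candidates: list[Candidate], floor_minutes: float) -> list[list[Candidate]]:
    """Group consecutive candidates until each batch justifies its own session."""
    batches: list[list[Candidate]] = []
    current: list[Candidate] = []
    total = 0.0
    for c in candidates:
        current.append(c)
        total += c.estimated_minutes
        if total >= floor_minutes:
            batches.append(current)
            current, total = [], 0.0
    if current:
        if batches:
            batches[-1].extend(current)  # leftover is too small: fold into last batch
        else:
            batches.append(current)      # the whole feature is below the floor
    return batches


candidates = [
    Candidate("write middleware", 40),
    Candidate("rename variable", 2),
    Candidate("add import", 3),
    Candidate("write tests", 30),
    Candidate("fix typo", 1),
]
batches = batch_by_floor(candidates, floor_minutes=20)
batch_names = [[c.name for c in batch] for batch in batches]
```

Five candidates collapse into two worker sessions here; the three micro-tasks ride along with larger work instead of each paying coordination overhead.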
TL;DR: A WebSocket bus carries task assignments to worker nodes that run coding sessions in managed tmux panes, while ephemeral verification isolates each task's side effects.
The coordination layer borrows from Bridge ACE's WebSocket bus + tmux session management pattern. Optimus Prime publishes task assignments onto a bus; eligible worker nodes subscribe, claim work, and spin up isolated coding sessions inside managed tmux panes. The bus provides live status without polling, and tmux provides a real, inspectable shell session per task, which is an important advantage when diagnosing why a worker behaved unexpectedly.
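The claim semantics can be sketched with an in-process stand-in. This is not Bridge ACE's API and involves no real WebSocket or tmux: an `asyncio.Queue` plays the bus, and the node names and event tuples are hypothetical. The point it illustrates is that each assignment is claimed by exactly one worker and that status is pushed as events rather than polled.

```python
import asyncio


async def worker(node_id: str, assignments: asyncio.Queue, status: list) -> None:
    while True:
        task = await assignments.get()       # claiming = taking it off the queue
        status.append((task, "claimed", node_id))
        await asyncio.sleep(0)               # stand-in for the tmux coding session
        status.append((task, "finished", node_id))
        assignments.task_done()


async def run_bus(tasks: list[str], node_ids: list[str]) -> list:
    assignments: asyncio.Queue = asyncio.Queue()
    status: list = []
    workers = [asyncio.create_task(worker(n, assignments, status)) for n in node_ids]
    for t in tasks:
        assignments.put_nowait(t)            # orchestrator publishes assignments
    await assignments.join()                 # wait until every task is finished
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return status


events = asyncio.run(run_bus(["task-a", "task-b", "task-c"], ["mini-01", "mini-02"]))
claimed = sorted(task for task, event, _ in events if event == "claimed")
finished = sorted(task for task, event, _ in events if event == "finished")
```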
For verification, Okteto's ephemeral-environment model provides the inspiration, but this is where the first major divergence from cloud-native orthodoxy appears. Okteto spins ephemeral environments on Kubernetes. The Mac mini farm does not use Kubernetes. On bare-metal Macs, ephemeral means a fresh git worktree plus an isolated process sandbox per task, not a fresh pod. It is less elegant than a Kubernetes-native environment, but it avoids the operational overhead of running a cluster for this use case.
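A fresh worktree per task might look like the sketch below. It relies only on the standard `git worktree add` and `git worktree remove` subcommands; the `task/<id>` branch naming, the scratch directory, and the helper names are assumptions for illustration, and the process sandbox is deliberately left out.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path


def add_cmd(repo, path, branch: str, base: str = "HEAD") -> list[str]:
    """Command that creates a new branch checked out in its own worktree."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def remove_cmd(repo, path) -> list[str]:
    """Command that discards the worktree directory, even with local changes."""
    return ["git", "-C", str(repo), "worktree", "remove", "--force", str(path)]


@contextmanager
def ephemeral_worktree(repo: Path, scratch: Path, task_id: str, base: str = "HEAD"):
    """Yield an isolated checkout for one task and remove it afterwards."""
    path = scratch / task_id
    subprocess.run(add_cmd(repo, path, f"task/{task_id}", base), check=True)
    try:
        yield path
    finally:
        # The directory goes away; commits made on task/<id> stay in the repo.
        subprocess.run(remove_cmd(repo, path), check=True)


example_add = add_cmd("/repo", "/tmp/worktrees/t1", "task/t1")
example_remove = remove_cmd("/repo", "/tmp/worktrees/t1")
```

A worker would wrap its session in `with ephemeral_worktree(repo, scratch, task_id) as path:` and run the coding session and verification inside `path`.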
Fastio's fleet research defines six operational pillars: deploy, configure, monitor, update, scale, retire. Mapping them to bare metal exposes exactly where this pipeline diverges:
| Pillar | Cloud-native default | Bare-metal Mac farm |
|---|---|---|
| Deploy | Container image to cluster | Node service managed per machine |
| Configure | ConfigMaps / secrets store | Secrets-manager-backed local gateway |
| Monitor | Cluster metrics + tracing | Node-level metrics and application error telemetry |
| Update | Rolling pod replacement | Staged per-node rollout, one node at a time |
| Scale | Autoscaler adds pods | Add physical Mac mini capacity |
| Retire | Drain and delete pod | Cordon node, finish in-flight tasks, power down |
The scale row is the punchline: there is no autoscaler that materializes new hardware. That constraint is real, but it forces honest capacity planning instead of letting a cloud bill hide inefficient orchestration.
TL;DR: Circuit breakers replace blind retry loops, and orchestration prompts live under version control so the factory's behavior is reproducible.
The failure model deliberately rejects the naive retry loop. Agent-engineering research is clear that circuit-breaker recovery beats retry-until-it-works. Blind retries burn tokens and often reproduce the same failure. When a worker task repeatedly fails its postcondition, the breaker opens, the task is marked failed-with-context, and Optimus Prime routes around it rather than hammering the same path. Rollback uses the git worktree boundary: a failed task's changes never reach the integration branch, so there is nothing to unwind by hand.
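A minimal breaker, sketched under stated assumptions: the threshold of three consecutive failures is invented for the example (the source gives no number), and `CircuitBreaker`, `run_with_breaker`, and the result dictionary are hypothetical names. The worker attempt is modeled as a callable returning whether the postcondition passed plus a detail string.

```python
from dataclasses import dataclass, field


@dataclass
class CircuitBreaker:
    max_failures: int = 3              # assumed threshold, not from the source
    failures: int = 0
    context: list[str] = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, passed: bool, detail: str = "") -> None:
        if passed:
            self.failures = 0          # a success resets the breaker
            self.context.clear()
        else:
            self.failures += 1
            self.context.append(detail)


def run_with_breaker(attempt, breaker: CircuitBreaker) -> dict:
    """Call attempt() until it passes or the breaker opens."""
    while not breaker.is_open:
        passed, detail = attempt()
        breaker.record(passed, detail)
        if passed:
            return {"status": "done"}
    # Breaker is open: stop spending tokens and hand the evidence upstream.
    return {"status": "failed-with-context", "context": list(breaker.context)}


attempt_log: list[int] = []


def always_fails():
    attempt_log.append(1)
    return False, f"postcondition failed on attempt {len(attempt_log)}"


result = run_with_breaker(always_fails, CircuitBreaker(max_failures=3))
```

The difference from a retry loop is what happens at the end: the task stops with its failure details attached, so the orchestrator can route around it instead of trying again.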
The most important architectural decision is version-controlled prompts as infrastructure. Every orchestration prompt (Optimus Prime's decomposition logic and each specialist's role definition) lives as a file in the monorepo and changes through pull request review like application code. Prompt as infrastructure makes the factory's behavior reproducible: roll back a commit, roll back the agent behavior. No prompt should live only in someone's head or inside a SaaS console.

    # prompts/orchestrator/decompose.yaml (sanitized example)
    role: dev-orchestrator
    model: configured-model-name
    contract:
      precondition: feature spec is schema-valid
      postcondition: task graph nodes each define preconditions and postconditions
    granularity_floor: one dedicated worker session minimum
    secrets_ref: op://{vault}/{item}/{field}

An early operational problem was prompt drift. Two specialist prompts with similar responsibilities diverged after a week of edits. Treating prompts as infrastructure (diffed, reviewed, and single-sourced) is the fix. It is the most boring-sounding decision in the build, and one of the highest-leverage ones.
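Drift between near-duplicate prompts is the kind of thing a small review-time check could surface. The sketch below is one hypothetical way to do it with the standard library's `difflib.SequenceMatcher`; the 0.8 similarity threshold, the prompt names, and the prompt text are invented for illustration and are not the pipeline's real prompts.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(prompts: dict[str, str], threshold: float = 0.8) -> list[tuple]:
    """Flag prompt pairs that are very similar but no longer identical."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(prompts.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if text_a != text_b and ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 2)))
    return flagged


prompts = {
    "decomposer": "Decompose the feature into a task graph.\n",
    "reviewer-a": "You review Python code for style and bugs.\nReturn findings as JSON.\n",
    "reviewer-b": "You review Python code for style and bugs.\nReturn findings as YAML.\n",
}
flagged = near_duplicates(prompts)
flagged_pairs = [(a, b) for a, b, _ in flagged]
```

Pairs that come back flagged are candidates for single-sourcing: either the difference is intentional and worth documenting, or the two files should share one definition.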
TL;DR: Optimus Prime cannot merge to main; every protected-branch change requires human approval, and worker nodes hold least-privilege, branch-scoped Git credentials.
Autonomous code commits are where the design stops being convenient and starts being dangerous. The non-negotiable rule for this pipeline is simple: Optimus Prime is not allowed to merge to main. It can open branches, push to those branches, and open pull requests. A human reviews and merges.
Three controls enforce that boundary:
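- Branch protection on main; the only path is a reviewed pull request. The approval gate for autonomous code is a required review, not a hopeful prompt instruction.
- Least-privilege, branch-scoped Git credentials on every worker node.
- No merge rights for Optimus Prime or its workers: they can open branches and pull requests, but only a human merges to main.

The principle is straightforward: autonomy can expand as trust is earned through observed performance, but merge authority should be the last permission granted, if it is granted at all.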
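The permission boundary above can be written down as a small policy function. This is only a client-side mirror for illustration: real enforcement belongs in the Git host's branch protection and in the credentials themselves, and the `task/` branch prefix, action names, and `agent_may` helper are hypothetical.

```python
PROTECTED_BRANCHES = {"main"}
AGENT_ACTIONS = {"create_branch", "push", "open_pull_request"}  # no "merge"


def agent_may(action: str, branch: str, scope_prefix: str = "task/") -> bool:
    """Return True only for allowed actions on branches inside the agent's scope."""
    if action not in AGENT_ACTIONS:
        return False                       # merge is never in the agent's action set
    if branch in PROTECTED_BRANCHES:
        return False                       # protected branches change only via reviewed PRs
    return branch.startswith(scope_prefix)  # branch-scoped credentials


checks = {
    "push to task branch": agent_may("push", "task/rate-limit"),
    "open PR from task branch": agent_may("open_pull_request", "task/rate-limit"),
    "push to main": agent_may("push", "main"),
    "merge task branch": agent_may("merge", "task/rate-limit"),
    "push outside scope": agent_may("push", "release/1.0"),
}
```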
TL;DR: The design trades cloud-native elasticity for inspectability, lower orchestration overhead, and stricter human control over merge authority.
Bare-metal Macs avoid cluster-management overhead and let coding agents run inspectable tmux sessions on owned hardware. The trade-off is that scaling means adding physical capacity rather than relying on an autoscaler, which makes capacity planning a first-class design concern.
Failure handling uses a circuit breaker, not a blind retry loop. After repeated postcondition failures, the breaker opens, the task is marked failed-with-context, and the orchestrator routes around it. Because each task runs in an isolated git worktree, a failed task's changes never reach the integration branch.
Every orchestration and specialist prompt is stored as a reviewed file in the monorepo. The agent's behavior becomes reproducible and auditable: rolling back a commit rolls back the factory's behavior, and prompt drift between near-duplicate agents gets caught in review.
Agents do not merge their own code. Optimus Prime and its worker nodes can open branches and pull requests but cannot merge to main. Branch protection requires human review, and each worker holds least-privilege credentials with no merge rights. That reflects the current reality that autonomous debugging of complex systems is not reliable enough to remove the human gate.
Most fleet-management research is Kubernetes-native, and several pillars do not map directly to bare metal. Ephemeral environment becomes a git worktree instead of a pod, and scale becomes a hardware purchase instead of an autoscaler event. The patterns remain useful as conceptual scaffolding, but applying them literally would impose cloud complexity the Mac farm does not need.
TL;DR: The pipeline depends less on raw model capability than on task contracts, isolation, prompt governance, and hard approval gates.
Optimus Prime cannot merge to main; approval gates and least-privilege per-node credentials are mandatory.

TL;DR: The remaining gap is trust and observability, not raw automation.
The distance between a demo and a continuously operating software factory is mostly about trust and observability. Before the farm runs constantly, three things must hold: monitoring has to catch a misbehaving worker before a human notices, postcondition contracts have to be tight enough that a green task graph genuinely means working code, and the operating model has to deliver enough process-level productivity improvement to justify its complexity.
The skeleton is in place: decomposition, delegation, isolated verification, prompt governance, and hard human gates on merge. The next question is whether the bare-metal divergence from cloud-native orthodoxy holds up under sustained load, or whether the desk full of Mac minis hits a wall the Kubernetes crowd already mapped.
Building the Crew is an ongoing engineering log about the ESS agent ecosystem.