🤖 Ghostwritten by Claude Opus 4.8 · Fact-checked & edited by GPT 5.5

Software Factory Pipeline: Wiring the Mac Mini Farm

The ESS Mac mini farm is being wired as a software-factory pipeline: a feature request becomes a directed task graph, coding specialists take scoped work on worker nodes, each task is verified in isolation, and reviewed code lands on a branch. The system is intentionally not allowed to merge to main without human approval. That constraint is a safety boundary, not an unfinished feature.

This entry documents the next phase of bring-up: turning Optimus Prime, the dev orchestrator agent, from a demo conductor into the front end of an autonomous software factory.

The honest framing matters. Most published agent fleet management guidance assumes cloud-native, Kubernetes-shaped infrastructure. This pipeline runs on bare-metal Mac minis. That divergence is the most interesting part of the design, because the standard playbook does not fully apply — and where it breaks is where the real engineering begins.

How Work Gets Decomposed Into Tasks

TL;DR: Optimus Prime decomposes a feature into a directed task graph with explicit preconditions and postconditions per node, so failures are scoped rather than catastrophic.

The pipeline starts with decomposition. A feature request — for example, adding rate limiting to a public API — should not go to a worker node as a single vague assignment. Optimus Prime breaks it into a task graph where each node carries an explicit contract: preconditions that must hold before the task runs, and postconditions that define done.

That design follows agent-engineering research that favors narrow agents with schema-validated tool calls over a single do-everything model. A task such as writing rate-limit middleware has a precondition, such as the configuration schema already existing, and a postcondition, such as unit tests passing and the middleware exporting a typed handler. If the postcondition fails, only that node fails.
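The contract idea can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: the `Task` dataclass, the `execute` function, the status strings, and the shape of the shared `state` dictionary are all hypothetical names chosen for the example. It shows how a failed postcondition fails only its own node, while dependents are skipped and unrelated nodes still run.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable

State = dict  # hypothetical: shared facts produced by earlier tasks


@dataclass
class Task:
    name: str
    precondition: Callable[[State], bool]   # must hold before the task runs
    run: Callable[[State], None]            # the scoped work itself
    postcondition: Callable[[State], bool]  # defines "done"
    depends_on: tuple[str, ...] = ()


def execute(tasks: list[Task], state: State) -> dict[str, str]:
    """Run tasks in dependency order and return a status per task name."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.depends_on) for t in tasks}
    status: dict[str, str] = {}
    # static_order() yields each node after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        if any(status[dep] != "done" for dep in task.depends_on):
            status[name] = "skipped"   # an upstream node did not finish
        elif not task.precondition(state):
            status[name] = "blocked"   # contract not met, work never starts
        else:
            task.run(state)
            status[name] = "done" if task.postcondition(state) else "failed"
    return status


tasks = [
    Task("config_schema",
         precondition=lambda s: True,
         run=lambda s: s.update(schema=True),
         postcondition=lambda s: s.get("schema") is True),
    Task("middleware",
         precondition=lambda s: s.get("schema") is True,
         run=lambda s: s.update(tests_pass=False),  # simulate failing tests
         postcondition=lambda s: s.get("tests_pass") is True,
         depends_on=("config_schema",)),
    Task("docs",
         precondition=lambda s: True,
         run=lambda s: s.update(docs=True),
         postcondition=lambda s: s.get("docs") is True),
]
result = execute(tasks, {})
```

In this run the middleware node fails its postcondition, but the schema and docs nodes still complete, which is the scoped-failure property the contracts are meant to provide.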

The framework decision maps to two dominant patterns. CrewAI fits role-based delegation: Optimus Prime acts as manager while coding specialists operate as workers. LangGraph supports stateful, checkpointed workflows so a multi-step task can resume from its last good checkpoint. The resulting pattern is hybrid: CrewAI-style role delegation for who does the work, LangGraph-style checkpointing for how longer tasks survive interruption.

Decomposition is not free. An early failure mode was over-decomposition: trivial work split into too many micro-tasks, with coordination overhead swallowing the benefit. The correction is a granularity floor: a task spawns only if its estimated work justifies a dedicated worker session. Software-factory research consistently shows AI delivers 25–30% productivity gains only when paired with process transformation, not when bolted onto an unchanged workflow. Over-decomposition is exactly the kind of process drag that erases those gains.
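One way to picture a granularity floor is as a batching rule. The sketch below is an assumption-heavy illustration: the source does not say how work is estimated, so the per-task minute estimates, the `floor_minutes` parameter, and the `apply_granularity_floor` function are hypothetical. It folds small tasks into a neighbouring group until each group is large enough to justify its own worker session.

```python
def apply_granularity_floor(estimates: dict[str, int],
                            floor_minutes: int) -> list[list[str]]:
    """Group tasks in order so every group meets the floor (assumed unit: minutes)."""
    groups: list[list[str]] = []
    current: list[str] = []
    total = 0
    for name, minutes in estimates.items():
        current.append(name)
        total += minutes
        if total >= floor_minutes:
            groups.append(current)
            current, total = [], 0
    if current:
        # Leftover work below the floor joins the previous group
        # instead of spawning its own session.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups


sessions = apply_granularity_floor(
    {
        "add config key": 5,
        "write middleware": 40,
        "write integration tests": 35,
        "rename variable": 2,
    },
    floor_minutes=30,
)
```

Here four candidate tasks collapse into two worker sessions, because the two trivial items ride along with larger neighbours rather than paying coordination overhead on their own.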

Delegation and Worker Node Orchestration

TL;DR: A WebSocket bus carries task assignments to worker nodes that run coding sessions in managed tmux panes, while ephemeral verification isolates each task's side effects.

The coordination layer borrows Bridge ACE's pattern of a WebSocket bus combined with tmux session management. Optimus Prime publishes task assignments onto the bus; eligible worker nodes subscribe, claim work, and spin up isolated coding sessions inside managed tmux panes. The bus provides live status without polling, and tmux provides a real, inspectable shell session per task, which is an important advantage when diagnosing why a worker behaved unexpectedly.
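The publish-and-claim flow can be illustrated without a network. The sketch below is an in-memory stand-in, not the actual bus: `TaskBus`, its methods, and `tmux_session_command` are hypothetical names, and a real deployment would carry these messages over WebSockets. The only external fact it relies on is that `tmux new-session -d -s <name> -c <dir>` starts a detached, named session in a given directory.

```python
import re
import threading


class TaskBus:
    """In-memory stand-in for the bus: publish tasks, let one worker claim each."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._open: dict[str, dict] = {}
        self._claims: dict[str, str] = {}

    def publish(self, task_id: str, payload: dict) -> None:
        with self._lock:
            self._open[task_id] = payload

    def claim(self, task_id: str, worker: str) -> bool:
        """Return True only for the first worker to claim a published task."""
        with self._lock:
            if task_id not in self._open or task_id in self._claims:
                return False
            self._claims[task_id] = worker
            return True

    def owner(self, task_id: str):
        with self._lock:
            return self._claims.get(task_id)


def tmux_session_command(task_id: str, workdir: str) -> list[str]:
    """Build the command for a detached tmux session dedicated to one task."""
    # Replace characters outside a conservative set so the name is tmux-safe.
    session = "task-" + re.sub(r"[^A-Za-z0-9_-]", "-", task_id)
    return ["tmux", "new-session", "-d", "-s", session, "-c", workdir]


bus = TaskBus()
bus.publish("rate-limit.middleware", {"spec": "write rate-limit middleware"})
first = bus.claim("rate-limit.middleware", worker="mini-01")
second = bus.claim("rate-limit.middleware", worker="mini-02")
command = tmux_session_command("rate-limit.middleware", "/tmp/worktrees/rl")
```

The lock makes the claim atomic within one process; across machines the same guarantee would have to come from whichever component owns the bus state.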

For verification, Okteto's ephemeral-environment model provides the inspiration, but this is where the first major divergence from cloud-native orthodoxy appears. Okteto spins up ephemeral environments on Kubernetes, and the Mac mini farm does not run Kubernetes. On bare-metal Macs, "ephemeral" means a fresh git worktree plus an isolated process sandbox per task, not a fresh pod. It is less elegant than a Kubernetes-native environment, but it avoids the operational overhead of running a cluster for this use case.
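A per-task worktree can be set up and torn down with two git commands. The sketch below only builds those commands, so it can be read without a repository at hand; the `worktree_commands` helper, the `task/<id>` branch naming, and the sibling `worktrees/` directory are assumptions made for illustration. The process sandbox mentioned above is out of scope here.

```python
from pathlib import Path


def worktree_commands(repo: Path, task_id: str, base: str = "main"):
    """Return (create, remove) git commands for one task's isolated checkout."""
    branch = f"task/{task_id}"                    # hypothetical naming scheme
    path = repo.parent / "worktrees" / task_id    # hypothetical layout
    # `git worktree add -b <branch> <path> <base>` creates a new branch
    # from <base> and checks it out into its own directory.
    create = ["git", "-C", str(repo), "worktree", "add",
              "-b", branch, str(path), base]
    # `git worktree remove --force <path>` discards the checkout,
    # including any uncommitted changes left by a failed task.
    remove = ["git", "-C", str(repo), "worktree", "remove",
              "--force", str(path)]
    return create, remove


create_cmd, remove_cmd = worktree_commands(Path("/srv/ess/monorepo"),
                                           "rate-limit-42")
```

A worker could pass each list to `subprocess.run(..., check=True)`. Because the task branch exists only in its own worktree until it is pushed, discarding the directory is the whole cleanup.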

The Six Operational Pillars, Bare-Metal Edition

Fastio's fleet research defines six operational pillars: deploy, configure, monitor, update, scale, retire. Mapping them to bare metal exposes exactly where this pipeline diverges:

Pillar	Cloud-native default	Bare-metal Mac farm
Deploy	Container image to cluster	Node service managed per machine
Configure	ConfigMaps / secrets store	Secrets-manager-backed local gateway
Monitor	Cluster metrics + tracing	Node-level metrics and application error telemetry
Update	Rolling pod replacement	Staged per-node rollout, one node at a time
Scale	Autoscaler adds pods	Add physical Mac mini capacity
Retire	Drain and delete pod	Cordon node, finish in-flight tasks, power down

The scale row is the punchline: there is no autoscaler that materializes new hardware. That constraint is real, but it forces honest capacity planning instead of letting a cloud bill hide inefficient orchestration.
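The retire row of the table can be read as a small state machine. The following is a hedged sketch with hypothetical names (`Node`, `cordon`, `safe_to_power_down`), not the farm's actual node service: a cordoned node refuses new work, finishes what it already holds, and only then becomes safe to power down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cordoned: bool = False
    in_flight: set[str] = field(default_factory=set)

    def accept(self, task_id: str) -> bool:
        """Take a task unless the node is cordoned."""
        if self.cordoned:
            return False
        self.in_flight.add(task_id)
        return True

    def finish(self, task_id: str) -> None:
        self.in_flight.discard(task_id)

    def cordon(self) -> None:
        """Stop accepting new work; in-flight tasks keep running."""
        self.cordoned = True

    @property
    def safe_to_power_down(self) -> bool:
        return self.cordoned and not self.in_flight


node = Node("mini-03")
node.accept("task-a")
node.cordon()
accepted_after_cordon = node.accept("task-b")
before_drain = node.safe_to_power_down
node.finish("task-a")
after_drain = node.safe_to_power_down
```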

Partial Failures, Rollback, and Prompts as Infrastructure

TL;DR: Circuit breakers replace blind retry loops, and orchestration prompts live under version control so the factory's behavior is reproducible.

The failure model deliberately rejects the naive retry loop. Agent-engineering research is clear that circuit-breaker recovery beats retry-until-it-works. Blind retries burn tokens and often reproduce the same failure. When a worker task repeatedly fails its postcondition, the breaker opens, the task is marked failed-with-context, and Optimus Prime routes around it rather than hammering the same path. Rollback uses the git worktree boundary: a failed task's changes never reach the integration branch, so there is nothing to unwind by hand.
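The breaker behaviour described above can be sketched as follows. The threshold of three failures is an assumption, since the text says only "repeatedly", and `CircuitBreaker` and `run_with_breaker` are hypothetical names for illustration. The point is that attempts stop at a fixed bound and the failure context is kept for the orchestrator instead of being lost in a retry loop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CircuitBreaker:
    max_failures: int = 3  # assumed threshold, not taken from the source
    failures: list[str] = field(default_factory=list)

    @property
    def is_open(self) -> bool:
        return len(self.failures) >= self.max_failures

    def record_failure(self, context: str) -> None:
        self.failures.append(context)


def run_with_breaker(attempt: Callable[[], tuple[bool, str]],
                     breaker: CircuitBreaker) -> dict:
    """Call attempt() until it succeeds or the breaker opens."""
    while not breaker.is_open:
        ok, context = attempt()
        if ok:
            return {"status": "done"}
        breaker.record_failure(context)
    # Breaker is open: stop trying and hand the evidence upstream.
    return {"status": "failed-with-context", "context": list(breaker.failures)}


calls: list[int] = []


def always_fails() -> tuple[bool, str]:
    calls.append(1)
    return False, f"attempt {len(calls)}: postcondition not met"


outcome = run_with_breaker(always_fails, CircuitBreaker(max_failures=3))
```

With an attempt that never succeeds, the task is tried exactly three times and then returned as failed-with-context, so the orchestrator can route around it with the failure history attached.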

The most important architectural decision is version-controlled prompts as infrastructure. Every orchestration prompt, from Optimus Prime's decomposition logic to each specialist's role definition, lives as a file in the monorepo and changes through pull request review like application code. Treating prompts as infrastructure makes the factory's behavior reproducible: roll back a commit, and the agent behavior rolls back with it. No prompt should live only in someone's head or inside a SaaS console.

# prompts/orchestrator/decompose.yaml (sanitized example)
role: dev-orchestrator
model: configured-model-name
contract:
  precondition: feature spec is schema-valid
  postcondition: task graph nodes each define preconditions and postconditions
granularity_floor: one dedicated worker session minimum
secrets_ref: op://{vault}/{item}/{field}

An early operational problem was prompt drift. Two specialist prompts with similar responsibilities diverged after a week of edits. Treating prompts as infrastructure — diffed, reviewed, and single-sourced — is the fix. It is the most boring-sounding decision in the build, and one of the highest-leverage ones.
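Once prompts are plain files, drift between two near-duplicate prompts can be surfaced with an ordinary text diff. The sketch below uses Python's standard `difflib`; the `drift_report` helper and the sample prompt text are hypothetical, and in practice the same information appears in the pull request diff.

```python
import difflib


def drift_report(name_a: str, text_a: str,
                 name_b: str, text_b: str) -> list[str]:
    """Return unified-diff lines between two prompts (empty if identical)."""
    return list(difflib.unified_diff(
        text_a.splitlines(),
        text_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",
    ))


reviewer_a = "role: code-reviewer\nrule: run unit tests before approving\n"
reviewer_b = "role: code-reviewer\nrule: run unit and lint checks before approving\n"

report = drift_report("prompts/reviewer_a.yaml", reviewer_a,
                      "prompts/reviewer_b.yaml", reviewer_b)
no_drift = drift_report("a", reviewer_a, "b", reviewer_a)
```

An empty report means the two prompts still agree; any `-` or `+` line is a divergence that someone should either reconcile or justify in review.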

Security: Autonomous Commits Need Hard Gates

TL;DR: Optimus Prime cannot merge to main; every protected-branch change requires human approval, and worker nodes hold least-privilege, branch-scoped Git credentials.

Autonomous code commits are where the design stops being convenient and starts being dangerous. The non-negotiable rule for this pipeline is simple: Optimus Prime is not allowed to merge to main. It can open branches, push to those branches, and open pull requests. A human reviews and merges.

Three controls enforce that boundary:

Approval gates before any protected-branch change. Branch protection rejects direct pushes to main; the only path is a reviewed pull request. The approval gate for autonomous code is a required review, not a hopeful prompt instruction.
Least-privilege Git credentials per worker node. Each worker node carries a narrowly scoped credential brokered through a local secrets gateway: write access to feature branches, no merge rights, and no administrative privileges. A compromised node can create a bad branch, not poison main.
No model inside the final merge trust boundary. Software-factory research is unambiguous that fully autonomous debugging of complex systems is not yet reliable. A model that cannot be trusted to debug every subtle race condition should not be the last gate before production.
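The branch-scoped credential rule can be expressed as a small policy check. This is an illustrative sketch only: the `task/` branch prefix, the `PROTECTED` set, and `is_push_allowed` are assumptions, and the real enforcement belongs on the Git server through branch protection and credential scopes, not in worker-side code that a compromised node could skip.

```python
PROTECTED = {"refs/heads/main"}           # refs no worker may push to
ALLOWED_PREFIXES = ("refs/heads/task/",)  # hypothetical feature-branch scope


def is_push_allowed(ref: str) -> bool:
    """Allow pushes only to refs inside the worker's branch scope."""
    if ref in PROTECTED:
        return False
    # str.startswith accepts a tuple and matches any of its prefixes.
    return ref.startswith(ALLOWED_PREFIXES)


decisions = {
    ref: is_push_allowed(ref)
    for ref in (
        "refs/heads/task/rate-limit-42",
        "refs/heads/main",
        "refs/heads/release/1.0",
    )
}
```

The policy is default-deny: anything outside the allowed prefix is rejected even if it is not explicitly listed as protected, which matches the least-privilege intent of the second control.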

The principle is straightforward: autonomy can expand as trust is earned through observed performance, but merge authority should be the last permission granted — if it is granted at all.

Frequently Asked Questions

TL;DR: The design trades cloud-native elasticity for inspectability, lower orchestration overhead, and stricter human control over merge authority.

Q: Why use bare-metal Mac minis instead of cloud for an agent fleet?

Bare-metal Macs avoid cluster-management overhead and let coding agents run inspectable tmux sessions on owned hardware. The trade-off is that scaling means adding physical capacity rather than relying on an autoscaler, which makes capacity planning a first-class design concern.

Q: How does Optimus Prime handle a task that keeps failing?

It uses a circuit breaker, not a blind retry loop. After repeated postcondition failures, the breaker opens, the task is marked failed-with-context, and the orchestrator routes around it. Because each task runs in an isolated git worktree, a failed task's changes never reach the integration branch.

Q: What does "version-controlled prompts as infrastructure" mean?

Every orchestration and specialist prompt is stored as a reviewed file in the monorepo. The agent's behavior becomes reproducible and auditable: rolling back a commit rolls back the factory's behavior, and prompt drift between near-duplicate agents gets caught in review.

Q: Can the AI agents merge their own code to production?

No. Optimus Prime and its worker nodes can open branches and pull requests but cannot merge to main. Branch protection requires human review, and each worker holds least-privilege credentials with no merge rights. That reflects the current reality that autonomous debugging of complex systems is not reliable enough to remove the human gate.

Q: Why not just follow standard Kubernetes fleet-management patterns?

Most fleet-management research is Kubernetes-native, and several pillars do not map directly to bare metal. An ephemeral environment becomes a git worktree plus a process sandbox instead of a pod, and scaling becomes a hardware purchase instead of an autoscaler event. The patterns remain useful as conceptual scaffolding, but applying them literally would impose cloud complexity the Mac farm does not need.

Key Takeaways

TL;DR: The pipeline depends less on raw model capability than on task contracts, isolation, prompt governance, and hard approval gates.

Decompose features into a task graph with explicit preconditions, postconditions, and a granularity floor; over-decomposition erases productivity gains.
Hybridize CrewAI-style role delegation with LangGraph-style checkpointing for resilient long-running tasks.
Prefer circuit breakers over retry loops, and isolate side effects in git worktrees so rollback stays simple.
Treat prompts as infrastructure: version-controlled, reviewed, and single-sourced.
Do not let an agent merge to main; approval gates and least-privilege per-node credentials are mandatory.

Conclusion: What Has to Be True Before the Farm Runs Constantly

TL;DR: The remaining gap is trust and observability, not raw automation.

The distance between a demo and a continuously operating software factory is mostly about trust and observability. Before the farm runs constantly, three things must hold: monitoring has to catch a misbehaving worker before a human notices, postcondition contracts have to be tight enough that a green task graph genuinely means working code, and the operating model has to deliver enough process-level productivity improvement to justify its complexity.

The skeleton is in place: decomposition, delegation, isolated verification, prompt governance, and hard human gates on merge. The next question is whether the bare-metal divergence from cloud-native orthodoxy holds up under sustained load — or whether the desk full of Mac minis hits a wall the Kubernetes crowd already mapped.

Building the Crew is an ongoing engineering log about the ESS agent ecosystem.