Cross-Machine Portability for AI Agent Fleets

🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6

The clearest test of an agent fleet is no longer whether it runs on the original laptop, but whether it can be brought up cleanly on a brand-new Mac. That became the focus of recent hardening work: making portability a first-class property of fleet operations rather than an afterthought. The practical outcome was a documented and tooled path from "fresh Mac" to "running fleet," including path resolution, config reconciliation, auth-profile sync, and a doctor pass that verifies the environment.

The lesson is simple: a fleet that runs on only one machine is not production-ready. It carries undocumented machine-specific state, and that state becomes an outage the moment hardware fails or a second machine is added. Portability work forces hidden assumptions into the open. It requires a clean boundary between what belongs in the repo and what must stay machine-local, and it turns setup from tribal knowledge into a repeatable system.

This build log covers what changed, why the repo-versus-host split became the core design decision, where the hidden state was found, and why the template-versus-live config boundary is as much a security control as an operational one.

The Real Portability Problem Was Hidden State

TL;DR: Portability was not blocked by code; it was blocked by configuration, paths, and credentials that quietly existed only on the original machine.

The hardest part of getting a fleet onto a new Mac is not cloning the repo. It is discovering everything that has accreted around the repo over time. Agent systems tend to collect machine assumptions in small, easy-to-miss places: an absolute path in a wrapper, a local profile that was created manually, a runtime file that nobody intended to keep forever, or a credential reference that works only because it was once set up interactively.

That is the trap with multi-agent systems. The visible architecture looks portable because the code is version-controlled, but the actual runtime depends on invisible state. In practice, the fleet is only as portable as the least-documented machine-local dependency.

The work shifted from "can this run elsewhere?" to "can every required piece be named, classified, and regenerated?" That led to a more explicit doctrine:

Portable things belong in the repo
Machine-specific things belong on the host
Secrets should resolve per machine, not be copied from machine to machine
Setup should be verified by a doctor, not assumed from a successful clone

This is not a new lesson in software operations, but agent fleets amplify it because they often combine:

Multiple runtimes and wrappers
Scheduled jobs and background processes
Per-agent auth profiles
Local filesystem assumptions
Secrets needed by different tools at different layers

As more development work moves through automation and AI-assisted workflows, environment consistency matters more, not less: every automated step that depends on undocumented setup is a step that can fail on the next machine. Stricter security defaults on recent macOS releases have also made ad hoc local setup more brittle; workflows that rely on "just copy what worked before" tend to fail in subtle ways on a fresh machine, for example when a permission that was granted interactively on the old Mac has never been granted on the new one.

The biggest insight was that portability work is mostly discovery work. The value comes from finding the state that silently lives only on the original machine and externalizing it into templates, scripts, and documented steps.

The Portability Boundary: Repo-Portable vs. Machine-Local

TL;DR: The repo/host-local split is the core design decision because it clearly defines what can travel and what must be created per machine.

Once the problem was framed correctly, the architecture became much easier to reason about. The fleet splits into two layers:

A portable, version-controlled repo
A host-local runtime layer that is intentionally not committed

That boundary sounds obvious, but it changes how configuration is designed. Instead of treating config as one file that somehow needs to work everywhere, the system treats config as two related artifacts: a template that expresses the portable shape of the system, and a live machine-local config that expresses how that shape is realized on one specific Mac.

The split looks like this:

Layer	Lives where	Contains	Commit status	Purpose
Portable repo	Git repo	Agent code, wrappers, config template, validation scripts, doctor scripts, docs	Committed	Defines the fleet structure and expected inputs
Machine-local runtime	Host machine	Live config, resolved secret references, local clone paths, machine-specific runtime state	Not committed	Adapts the fleet to one machine safely

That split turned "config as template" into a practical operating model rather than a documentation idea. The template describes what the fleet expects: agents, channels, wrappers, profiles, and required settings. The live config answers the machine-specific questions: where the local clones are, which local paths should be used, and how secret references resolve on that host.

The practical benefit is that a new Mac no longer needs a copy of somebody else's working state. It needs the repo, a way to resolve secrets on that machine, and a reconciliation step that produces a valid local runtime.

This also improves change management. A template can evolve in version control, be reviewed, and be validated. A live config can remain local, explicit, and disposable. If the machine is replaced, the runtime can be regenerated instead of recovered through guesswork.

What Changed: From Fresh Mac to Running Fleet

TL;DR: The new flow is explicit and repeatable: resolve paths, reconcile template to live config, sync auth profiles, then run doctor.

The hardening work was less about adding one new tool and more about making the entire bring-up path coherent. The result was a RUNNING-ON-A-NEW-MACHINE guide backed by scripts that reflect the actual setup sequence.

The path now looks like this:

1) Resolve Local Clone Paths

A fresh Mac rarely mirrors the exact directory layout of the original machine. That matters more than it seems because wrappers, launch scripts, and agent entrypoints often assume stable paths.

Instead of hardcoding one laptop's filesystem layout, the fleet now resolves local clone paths per machine. That makes path resolution an explicit setup concern rather than an accidental dependency. If a machine keeps repos in a different local directory, the runtime config records that locally.
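
A minimal sketch of per-machine path resolution follows. It assumes the LOCAL_REPOS placeholder from the config template is available as an environment variable and that each agent's clone is a directory named after the agent; the function name and the "~/dev" fallback are hypothetical, not the fleet's actual tooling.

```python
import os
from pathlib import Path


def resolve_clone_paths(agents, repos_root=None):
    """Map each agent name to its clone directory on this machine.

    Reading LOCAL_REPOS from the environment and falling back to ~/dev are
    assumptions of this sketch.
    """
    root = Path(repos_root or os.environ.get("LOCAL_REPOS", "~/dev")).expanduser()
    resolved, missing = {}, []
    for name in agents:
        path = root / name
        if path.is_dir():
            resolved[name] = str(path)
        else:
            # Reported at setup time instead of surfacing later as a broken wrapper.
            missing.append(name)
    return resolved, missing
```

Returning the missing clones alongside the resolved ones lets the setup step stop early with a clear message rather than writing a live config that points at directories that do not exist.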

2) Reconcile Template Config into Live Config

The repo carries a config template, not the active runtime config. On a new machine, a reconcile step materializes the host-local live config from that template and fills in the machine-specific values that should never be committed.

This is the core portability move. The template defines the expected structure; reconciliation creates the local truth.

A sanitized example of that pattern:

## repo: config.template.yaml
agents:
  sparkles:
    enabled: true
    workspace_path: "${LOCAL_REPOS}/sparkles"
    auth_profile: "sparkles-default"
  concierge:
    enabled: true
    workspace_path: "${LOCAL_REPOS}/concierge"
    auth_profile: "concierge-default"
secrets:
  provider: "1password"
  mode: "file-backed-refs"

## host-local: config.live.yaml
agents:
  sparkles:
    enabled: true
    workspace_path: "/Users/your-user/dev/sparkles"
    auth_profile: "sparkles-default"
  concierge:
    enabled: true
    workspace_path: "/Users/your-user/dev/concierge"
    auth_profile: "concierge-default"
secrets:
  provider: "1password"
  mode: "file-backed-refs"
  resolved_refs_dir: "/Users/your-user/.fleet/secrets"
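
The reconcile step can be sketched with Python's standard string.Template, which happens to use the same ${NAME} placeholder syntax as the template above. This is an illustration under that assumption, not the fleet's actual reconcile script; the function name is hypothetical, and a template containing other literal "$" characters would need them escaped as "$$".

```python
from string import Template


def reconcile(template_text, local_values):
    """Render live config text from the template and machine-local values.

    Template.substitute raises KeyError when a placeholder has no local
    value, so a missing machine value stops reconciliation instead of
    leaving a raw "${...}" in the live config.
    """
    return Template(template_text).substitute(local_values)


template_text = 'workspace_path: "${LOCAL_REPOS}/sparkles"\n'
live_text = reconcile(template_text, {"LOCAL_REPOS": "/Users/your-user/dev"})
print(live_text, end="")
```

Failing loudly on a missing value is the design choice that matters here: a partially rendered live config is harder to diagnose than a reconcile step that refuses to finish.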

3) Sync Per-Agent Auth Profiles

The fleet depends on more than one credential context. Different agents may need different profiles, scopes, or provider-specific auth state. Those profiles now sync from the secrets manager per machine rather than being copied from an older laptop.

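That matters operationally because copied auth state is hard to reason about: nothing records where a copied profile came from, which scopes it carries, or whether it is still meant to be valid. A fresh sync makes provenance explicit and lets each machine receive only the profiles it needs, which supports least privilege.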
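
One way to keep the sync minimal is to derive the required profiles from the config itself. The sketch below assumes the agents mapping has the shape shown in the config example (an enabled flag and an auth_profile name per agent); the function name is hypothetical, and the actual fetch from the secrets manager is deliberately left out.

```python
def plan_profile_sync(agents, local_profiles):
    """Return the auth profiles this machine still needs to fetch.

    Only enabled agents contribute, so a machine never syncs credentials
    for agents it does not run.
    """
    required = {
        spec["auth_profile"]
        for spec in agents.values()
        if spec.get("enabled")
    }
    return sorted(required - set(local_profiles))
```

For example, with sparkles enabled and concierge disabled, an empty machine would be asked to fetch only "sparkles-default".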

4) Run Doctor

The final step is a doctor pass that verifies the assembled environment. It checks whether channels, wrappers, config expectations, and related runtime assumptions are actually satisfied.

This is the difference between "setup completed" and "fleet is runnable." The doctor closes the loop.

Step	Input	Output	Failure mode caught
Path resolution	Local clone locations	Machine-correct paths	Broken wrappers, missing repos
Config reconcile	Repo template + local values	Host-local live config	Stale keys, missing machine values
Auth-profile sync	Secrets manager + local machine	Per-agent local auth state	Missing scopes, absent profiles
Doctor	Assembled runtime	Verified readiness report	Channel, wrapper, and runtime mismatches
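
The doctor step can be sketched as a runner over named checks. The check names and the CheckResult shape below are hypothetical; the actual doctor script may be organized differently. Each check is assumed to be a zero-argument callable returning an (ok, detail) pair.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def run_doctor(checks):
    """Run every check and keep every result, passes included."""
    results = []
    for name, check in checks:
        try:
            ok, detail = check()
        except Exception as exc:
            # A check that crashes is reported as a failure, not skipped.
            ok, detail = False, f"check raised {exc!r}"
        results.append(CheckResult(name, ok, detail))
    return results


def is_ready(results):
    """The fleet counts as runnable only if every check passed."""
    return all(result.ok for result in results)
```

Running all checks rather than stopping at the first failure means one doctor pass on a fresh Mac reports everything that is still missing.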

Security Improved Because Portability Got Stricter

TL;DR: The template-vs-live split is not just cleaner engineering; it is a security boundary that keeps secrets and runtime state out of version control.

Portability work often gets described as a convenience feature. In this case, it also tightened security.

The first rule is that host-local runtime config must not be committed. That includes machine paths, active runtime details, and any resolved secret references. Once those values enter the repo, the portability boundary collapses. The template is meant to be shared; the live config is not.
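
This rule is easy to verify mechanically. The sketch below assumes the caller supplies the list of tracked files (for instance, the output of git ls-files) and that host-local state follows the naming used in the examples in this post; the patterns themselves are hypothetical and would need to match the real layout.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns based on the example file names in this post.
HOST_LOCAL_PATTERNS = ("*.live.yaml", ".fleet/*")


def find_committed_host_state(tracked_paths, patterns=HOST_LOCAL_PATTERNS):
    """Return tracked paths that look like host-local state.

    fnmatchcase lets "*" span "/" as well, so ".fleet/*" also covers
    nested files such as ".fleet/secrets/provider-token".
    """
    return [
        path
        for path in tracked_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

A non-empty result is a reason to fail a pre-commit hook or a doctor check, since it means the template-versus-live boundary has already been crossed.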

The second rule is that secrets resolve per machine from a secrets manager rather than being baked into files or copied from another host. A new Mac should not inherit a previous machine's secret material by file transfer. It should authenticate to the secrets system and resolve what it is permitted to use.

A generic pattern for that:

## sanitized example
umask 077  # files created below are readable only by the current user
op read "op://{vault}/{item}/{field}" > ~/.fleet/secrets/provider-token

The important point is not the exact command. The point is the contract: secret references are resolved locally on the machine that will use them.
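
The same contract can be sketched in Python for POSIX hosts such as macOS. The resolve argument is a hypothetical caller-supplied function that asks the secrets manager for a value; this sketch does not call any real CLI, and the function name is illustrative only.

```python
import os
from pathlib import Path


def store_secret(resolve, ref, dest):
    """Resolve `ref` on this machine and write it to `dest`, owner-only."""
    dest = Path(dest).expanduser()
    dest.parent.mkdir(parents=True, exist_ok=True)
    value = resolve(ref)
    # Creating the file with mode 0o600 restricts a new file to its owner.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(value)
    # Also tighten permissions on a file that already existed.
    os.chmod(dest, 0o600)
    return dest
```

Passing the resolver in as a function keeps the file-handling rules (owner-only permissions, local destination) separate from whichever secrets manager a given machine uses.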

The third rule is fresh least-privilege provisioning. A new machine should receive only the credentials and scopes it needs. That reduces blast radius and makes deprovisioning cleaner if the machine is retired.

These practices align with widely accepted security guidance. NIST's Secure Software Development Framework (SP 800-218) calls for protecting code and configuration from unauthorized access and for maintaining secure, controlled software environments. OWASP guidance, including its Secrets Management Cheat Sheet, likewise treats secrets management and environment separation as core operational controls rather than optional hygiene.

In practical terms, the security model became:

Repo contains structure, not secrets
Template contains intent, not live values
Machine-local config contains runtime specifics, not shared defaults
Secrets are resolved on the machine that needs them
New machines are provisioned freshly with least privilege

That model is more work upfront, but it scales better than machine cloning and is much easier to audit.

The Honest Part: Most of the Work Was Naming What Had Been Implicit

TL;DR: The hard part of portability was not writing scripts; it was identifying every assumption that had never been written down.

Build logs are most useful when they include the uncomfortable parts. The uncomfortable part here is that hidden state accumulates naturally. It does not require bad engineering. It only requires time, a working system, and enough local fixes that nobody notices which ones became dependencies.

That is why portability hardening can feel slower than expected. Each failure on the new Mac is usually a clue that something important existed only as local memory or local state on the original machine. A wrapper expects a path nobody documented. A profile exists because it was created manually months ago. A runtime file looks generated, but actually contains hand-edited values that matter.

The portability project succeeded because those assumptions were treated as bugs in system design, not as setup quirks.

A few practical lessons stood out:

Prefer Reconciliation over Copying

Copying a working config from one machine to another feels fast, but it preserves unknowns. Reconciliation forces the system to declare what belongs in the template and what must be supplied locally.
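
A reconcile step can also report drift between the two files. The sketch below assumes both configs have already been parsed into nested dictionaries and that keys existing only on the host (such as resolved_refs_dir in the earlier example) are declared explicitly; the function names are hypothetical.

```python
def key_paths(config, prefix=""):
    """Flatten a nested mapping into a set of dotted key paths."""
    paths = set()
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= key_paths(value, path + ".")
        else:
            paths.add(path)
    return paths


def config_drift(template, live, local_only=()):
    """Return (missing, stale) key paths for a live config.

    missing: keys the template expects but the live config lacks.
    stale:   keys in the live config that the template no longer defines,
             excluding keys declared as legitimately machine-local.
    """
    template_keys, live_keys = key_paths(template), key_paths(live)
    missing = sorted(template_keys - live_keys)
    stale = sorted(live_keys - template_keys - set(local_only))
    return missing, stale
```

A copied config would pass through this comparison with its unknowns intact; a reconciled one has to account for every key, which is the point of preferring reconciliation.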

Make Doctor Part of Fleet Operations

Verification should not be a rescue tool used only when something breaks. In a portable fleet, doctor is part of normal bring-up and normal change validation.

Treat Machine-Local Config as a Product Surface

If a human must edit it, its shape and lifecycle deserve as much care as the code. Poorly designed local config is where portability efforts stall.

Hidden State Is the Real Outage Risk

A laptop failure is not the root problem. The root problem is runtime knowledge that exists nowhere except that laptop.

This is why portability belongs inside fleet operations, not outside it. A portable fleet is easier to recover, easier to extend to another machine, and easier to reason about when something changes.

Frequently Asked Questions

Q: What is the difference between config as template and machine-local config?

A config template defines the portable structure of the system: expected agents, settings, and placeholders. Machine-local config is the realized version for one host, including local paths, resolved references, and runtime details that should not be committed. The template travels with the repo; the live config is generated per machine and treated as disposable.

Q: Why not just commit one complete config file for everyone?

Because a complete live config usually contains machine-specific paths, runtime assumptions, and potentially sensitive references. Committing that file mixes portable intent with host-local state, which hurts both security and portability. It also creates merge conflicts every time two machines diverge, which they inevitably do.

Q: Why are secrets per machine better than copying credentials from an existing laptop?

Per-machine secret resolution creates a cleaner security boundary and supports least privilege. It also makes credential rotation, revocation, and machine retirement more manageable because each host has its own provisioned access path. Copied credentials, by contrast, create an invisible dependency chain that is difficult to audit or revoke cleanly.

Q: What should a doctor script verify in an agent fleet?

At minimum, it should verify config integrity, expected wrappers, channel readiness, required local paths, and whether auth-related prerequisites are present. The goal is to catch mismatches before an agent fails at runtime. A good doctor script also reports which checks passed, not just which failed, so operators can confirm coverage.

Q: When does a fleet actually become portable?

A fleet becomes portable when a new machine can go from fresh setup to verified runtime through documented, repeatable steps without copying hidden state from the original host. If success still depends on undocumented manual fixes, portability is not complete.

Key Takeaways

Portability is an operational requirement, not a cleanup task.
The repo/host-local split is the core portability boundary.
Config works best as a template in version control and a reconciled live file on each machine.
Machine-local config, resolved secret references, and local paths should never be committed.
Secrets per machine support least privilege and cleaner lifecycle management.
A doctor script turns setup assumptions into explicit verification.
Most portability work is really the work of discovering and externalizing hidden state.

Conclusion

The fleet's portability story has become concrete: a fresh Mac can move toward a running system through a defined sequence instead of personal memory. That is the real milestone. Portability does not come from making one laptop more important; it comes from making any single laptop less special. The template-versus-live boundary, per-machine secret resolution, and a doctor-verified bring-up path together form a system that can survive hardware changes, scale to additional machines, and remain auditable throughout.