
🤖 Ghostwritten by GPT 5.4 · Fact-checked & edited by Claude Opus 4.6 · Curated by Tom Hundley
Apple dropping M5 Pro and M5 Max with Neural Accelerators and improved on-device AI throughput is exactly the kind of announcement that can derail a rebuild if you're not careful. My answer, at least for our Mac mini fleet agent platform work at Elegant Software Solutions, is straightforward: we are still choosing platform kernel before hardware. The new silicon matters, and it will matter more later, but right now faster boxes would only help us run a brittle system more quickly.
This week I had to force myself to separate two very different problems. Problem one is that our current agent fleet is structurally weak: the control plane is not fully authoritative, health reporting is too optimistic, some failures degrade silently, and too much logic still lives in Slack-facing edges like Sparkles. Problem two is that Apple keeps making local AI hardware more attractive. Only one of those problems can sink the whole platform today.
So the current call is boring on purpose. We are rebuilding the kernel first in one monorepo, with one control plane, one worker contract, one operator surface, durable state, and an explicit degraded mode. Then we earn the hardware upgrade. If you've read The Platform Kernel: What We Built First in the Monorepo, this is the same thesis under a new kind of pressure: agent platform rebuild timing is mostly a discipline problem, not a shopping problem.
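The "explicit degraded mode" deserves to be concrete rather than aspirational. A minimal sketch of what first-class degradation could look like, assuming hypothetical probe and report shapes (none of these names are from our codebase):

```typescript
// Sketch only: health states with "degraded" as a first-class value,
// instead of an optimistic healthy/unhealthy boolean.
type HealthState = "healthy" | "degraded" | "failed";

type Probe = { name: string; ok: boolean; fatal: boolean };

type HealthReport = { state: HealthState; reasons: string[] };

// Any soft failure surfaces as "degraded" rather than being swallowed;
// only a fatal probe failure escalates to "failed".
function summarizeHealth(probes: Probe[]): HealthReport {
  const failures = probes.filter((p) => !p.ok);
  if (failures.length === 0) return { state: "healthy", reasons: [] };
  const state: HealthState = failures.some((f) => f.fatal) ? "failed" : "degraded";
  return { state, reasons: failures.map((f) => f.name) };
}
```

The point of the shape is that an operator surface consuming these reports can never render a soft failure as green; "degraded" has to be stated, not inferred.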
TL;DR: M5-class Apple silicon makes local inference, embedding, and multimodal preprocessing more practical, but none of that fixes split-brain control flow or unreliable worker contracts.
The temptation is obvious. Apple is positioning the newest Pro and Max chips around heavier AI workloads, and its broader Apple silicon story has been moving toward more capable on-device model execution for several generations. The M1's Neural Engine delivered roughly 11 trillion operations per second, establishing the baseline for local AI expectations across the Mac line. Even without hanging our entire roadmap on vendor benchmarks, the direction is clear: Apple silicon for AI agent infrastructure is getting better fast.
That matters for agent systems in a few concrete places: local inference passes, embedding and reranking, and multimodal preprocessing.
Even so, I am not re-planning the rebuild around M5 Neural Accelerators. Our problem is not that Concierge needs another 30–50% of local throughput for a side task. Our problem is that if the authoritative control plane can fall back to a legacy file inbox, the system can develop split-brain behavior. Faster hardware does not rescue bad authority boundaries.
This is also where a lot of agent teams get fooled by demos. Local inference speed is emotionally satisfying because you can feel it. A job finishing in half the time looks like progress. But if the operator cannot trust whether that job was the only copy, whether its state was durable, or whether a downstream publish quietly failed, then the platform is still not dependable.
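The "was that job the only copy?" question has a concrete answer: an atomic claim step, so exactly one worker can ever own a task. A hypothetical in-memory sketch follows; in a real system this compare-and-set would live in the control plane's durable store (e.g. a conditional update), not in process memory:

```typescript
// Sketch only: a single-authority claim table. The first claimant wins;
// every later attempt is rejected, which is what stops a legacy fallback
// path from quietly creating a second copy of the same job.
class ClaimTable {
  private claims = new Map<string, string>();

  // Atomic within this process; a production version must be atomic
  // in durable storage so it survives restarts.
  tryClaim(taskId: string, worker: string): boolean {
    if (this.claims.has(taskId)) return false;
    this.claims.set(taskId, worker);
    return true;
  }

  ownerOf(taskId: string): string | undefined {
    return this.claims.get(taskId);
  }
}
```

Speed is irrelevant to this property: a fast worker that loses the claim does nothing, which is exactly the behavior an operator can trust.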
| Decision area | Upgrade hardware first | Rebuild platform kernel first |
|---|---|---|
| Short-term wow factor | High | Low |
| Fixes control-plane authority | No | Yes |
| Fixes silent degradation | No | Yes |
| Makes later hardware rollout easier | Somewhat | Yes |
| Risk of accelerating bad architecture | High | Low |
| Best fit for ESS right now | No | Yes |
The bottom line is simple: hardware multiplies whatever system you already have. If the system is confused, hardware gives you a faster confused system.
TL;DR: The current fleet has useful agents, but the structural failures are in authority, observability, and runtime consistency — not raw compute.
I want to be precise here because this is where rebuilds go sideways. We do have working pieces. Sparkles is useful as an operator entry point. We have specialist agents with real domain intent. We have enough infrastructure to feel like there is a platform. But the roadmap review from 2026-03-14 was blunt for a reason: the fleet is not yet dependable.
The baseline issues are already documented in our internal roadmap and journal: the control plane is not fully authoritative, health reporting is too optimistic, some failures degrade silently, and too much logic still lives at the Slack-facing edges.
That diagnosis changed the order of operations. Instead of asking, "Which agent should get the M5 box first?" I am asking, "What is the minimum kernel that makes any future agent trustworthy?" That is why the new canonical project is one monorepo, not another scatter of repos and launch scripts.
The old pattern was basically a scatter of repos and launch scripts, with agents and infrastructure evolving together informally on whichever machine was convenient.
That produced breadth without authority. It also made machine-specific state feel normal, which is poison if you want a 12-machine rollout strategy that can survive restarts, migrations, and handoffs.
The replacement model is intentionally narrower: one monorepo, one control plane, one worker contract, one operator surface, durable state, and an explicit degraded mode.
If that sounds familiar, it should. It is the same direction behind M5 Fleet Timing: Upgrade Hardware After the Platform Kernel Stabilizes, but here I am grounding it in the actual operational pain: platform kernel before hardware is not philosophical purity. It is blast-radius management.
TL;DR: M5-class machines are most valuable after the rebuild because they can become interchangeable execution capacity instead of bespoke snowflake hosts.
I am not anti-hardware here. Quite the opposite. Once the kernel stabilizes, M5 Pro and M5 Max class systems look attractive for exactly the kinds of agent-side workloads that are annoying on weaker edge machines.
First, local model orchestration gets more realistic. Small and medium open-weight models for routing, extraction, classification, and guardrail passes become easier to run without turning each worker into a thermal experiment.
Second, multimodal preprocessing gets cheaper. Document OCR pipelines, screenshot understanding, audio cleanup, and frame sampling all benefit from more capable local acceleration, even if the final reasoning step still goes to a hosted model.
Third, concurrency gets saner. The goal is not "run everything locally." The goal is "make each node more useful for preflight work, caching, retries, and local assistance without stealing cycles from control-plane responsibilities."
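Those three workload categories reduce to one routing question: does this node declare the capability, or does the stage go to the hosted path? A sketch of that decision, with stage and capability names that are illustrative rather than from our codebase:

```typescript
// Sketch only: route each preprocessing stage locally when the node
// declares the matching capability, otherwise fall back to the cloud.
type Stage = "embed" | "ocr" | "audio";
type Placement = "local" | "cloud";

type PreprocCaps = {
  localEmbeddings: boolean;
  visionPreprocessing: boolean;
  audioPreprocessing: boolean;
};

function placeStage(stage: Stage, caps: PreprocCaps): Placement {
  switch (stage) {
    case "embed":
      return caps.localEmbeddings ? "local" : "cloud";
    case "ocr":
      return caps.visionPreprocessing ? "local" : "cloud";
    case "audio":
      return caps.audioPreprocessing ? "local" : "cloud";
  }
}
```

The useful property is that a hardware upgrade flips booleans in a capability declaration; it never changes the routing function itself.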
Apple has publicly stated there are more than 100 million active Mac users globally, which is one reason the company keeps investing in the Mac as a serious compute endpoint rather than treating it as a thin-client market. Apple silicon's substantial performance-per-watt gains from the M1 generation onward remain one of the main reasons Mac mini fleets are appealing for compact, always-on internal infrastructure.
The catch is that hardware-specific architecture gets ugly fast. If I optimize the platform around one generation's local acceleration characteristics before the worker contract is stable, I end up baking host assumptions into the wrong layer.
That means keeping chip-generation and host-specific assumptions out of the worker contract and the scheduler.
I would rather standardize a capability model than a machine myth. In code, that looks more like this:
```typescript
export type WorkerCapabilities = {
  localEmbeddings: boolean
  localReranker: boolean
  audioPreprocessing: boolean
  visionPreprocessing: boolean
  maxConcurrentTasks: number
}

export type TaskPolicy = {
  requiresDurableState: boolean
  allowLocalModelPass: boolean
  preferredExecution: "local" | "cloud" | "hybrid"
}
```

That pattern lets newer machines join the fleet cleanly later. It also keeps the scheduler honest. The control plane should decide placement based on declared capability and policy, not because Tom remembers that "machine 7 is the fast one."
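A minimal placement sketch under those assumptions, with trimmed-down types repeated so the snippet stands alone (the node list and selection rule are illustrative, not our scheduler):

```typescript
// Sketch only: placement driven by declared capability and policy,
// mirroring the WorkerCapabilities/TaskPolicy shapes in the text.
type Caps = {
  localEmbeddings: boolean;
  maxConcurrentTasks: number;
};

type Policy = {
  allowLocalModelPass: boolean;
  preferredExecution: "local" | "cloud" | "hybrid";
};

type FleetNode = { name: string; caps: Caps; activeTasks: number };

// Pick the first node with free capacity; when the policy prefers a
// local model pass, also require the local-embeddings capability.
function pickNode(nodes: FleetNode[], policy: Policy): FleetNode | undefined {
  return nodes.find((n) => {
    if (n.activeTasks >= n.caps.maxConcurrentTasks) return false;
    if (policy.preferredExecution === "local" && policy.allowLocalModelPass) {
      return n.caps.localEmbeddings;
    }
    return true;
  });
}
```

Nothing here knows what chip a node carries. An M5 box simply enrolls with stronger declared capabilities and a higher concurrency ceiling, and placement follows.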
TL;DR: Our 12-machine plan should create a replaceable fleet with shared contracts, not twelve handcrafted pets with slightly different powers.
The 12-machine rollout strategy only works if each node is boring from the platform's perspective. That was one of the more painful lessons from the earlier system: when agents and infra evolve together informally, every host picks up personality. One machine has a local shortcut. Another has an older launch config. A third has a weird dependency chain nobody wants to touch. Congratulations, you built folklore.
What I want instead is a rollout model with three phases.
Phase one is the kernel rebuild, and it is where we are now.
Deliverables: an authoritative control plane, the worker contract, the operator surface, durable state, and the explicit degraded mode.
Phase two makes node enrollment, replacement, and policy assignment repeatable once the kernel exists. This is the unglamorous part that determines whether hardware upgrades later take days or a quarter.
A sanitized config shape looks like this:
```toml
[node]
name = "worker-03"
role = "general"

[control_plane]
url = "https://your-project.supabase.co"

[capabilities]
local_embeddings = true
local_reranker = false
audio_preprocessing = true
vision_preprocessing = true
max_concurrent_tasks = 4
```

Only in phase three do I want to ask the M5 question seriously. At that point, the chip upgrade becomes a capacity and placement discussion, not an architectural rescue attempt. That is the whole point behind M5 Neural Accelerators: Should We Upgrade Our Agent Fleet?: new hardware is most valuable when the software layer can absorb it without ceremony.
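Enrollment is also where that declared config gets checked. A hypothetical validation step, keyed to the capability names in the sample config above, so a node cannot join with a partial declaration:

```typescript
// Sketch only: enrollment-time validation of the declared [capabilities]
// table. Key names follow the sample config; the rejection rules are
// illustrative, not a real enrollment API.
const REQUIRED_CAPABILITIES = [
  "local_embeddings",
  "local_reranker",
  "audio_preprocessing",
  "vision_preprocessing",
  "max_concurrent_tasks",
] as const;

function validateCapabilities(raw: Record<string, unknown>): void {
  for (const key of REQUIRED_CAPABILITIES) {
    if (!(key in raw)) {
      throw new Error(`enrollment rejected: missing capability "${key}"`);
    }
  }
  if (typeof raw["max_concurrent_tasks"] !== "number") {
    throw new Error("enrollment rejected: max_concurrent_tasks must be a number");
  }
}
```

Failing fast here is what keeps snowflake hosts out: a machine either states exactly what it can do, or it does not join the fleet.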
The sentence I keep coming back to is this: a healthy fleet is one where replacing a node changes throughput, not behavior.
TL;DR: We are explicitly choosing to let the hardware story lag the reliability story, because that is how you avoid rebuilding the same mess on faster machines.
Today I wrote this down so future me cannot pretend I was undecided. We are not pausing because M5 looks weak. We are pausing because the platform is still earning the right to scale.
Our source-of-truth documents already made the core call: platform kernel before hardware.
That means the current path is rebuilding the kernel in the monorepo, hardening the worker contract and operator surface, and deferring hardware purchases until nodes are interchangeable.
If you are in the same spot, my advice is blunt: do not let exciting hardware become an excuse to avoid architecture. Chasing throughput before authority is how brittle systems get expensive.
**Does buying M5 hardware now make the agent fleet more reliable?**
No. If the control plane is not authoritative, better hardware mainly increases the speed of unreliable execution. Rebuild the kernel first so later hardware upgrades improve capacity without changing system behavior.
**Which agent workloads actually benefit from M5-class silicon?**
The biggest wins are local preprocessing tasks: embeddings, reranking, audio cleanup, OCR-adjacent steps, and lightweight classification or routing models. These reduce latency and cloud dependency, but they still need durable scheduling and observability around them to be production-worthy.
**Why does the platform kernel have to come before the hardware refresh?**
Because a fleet only scales well when nodes are interchangeable. If each machine carries hidden assumptions, a hardware refresh creates more operational variance instead of less. The kernel is what makes node replacement safe.
**What is the right sequence for scaling to a 12-machine fleet?**
Start with a stable control plane and worker contract, then standardize enrollment and capability declaration, then introduce new machine classes selectively. That sequence lets the scheduler reason about capacity explicitly rather than depending on tribal knowledge.
**Should new Apple silicon influence the architecture at all?**
Yes, but mostly at the execution layer. More capable local silicon makes hybrid local-cloud patterns more attractive for preprocessing, fallback, and edge autonomy. It should influence capability modeling and placement policy, not undermine the authority of the core platform.
That is the call I am making this week: the new Apple silicon is interesting, useful, and very likely part of where this goes next, but it is not the first domino. The first domino is authority. Until the control plane is real, the worker contract is enforceable, and the operator surface reflects reality instead of hope, new hardware is just a nicer workshop around the same shaky machine.
Tomorrow I will keep pushing on the kernel and the boring parts that make the rest possible. If you're building something similar, I'd love to hear how you're handling the same tradeoff between silicon excitement and platform discipline. And if your team wants hands-on help building the implementation layer for this kind of system, Elegant Software Solutions offers developer-focused AI implementation and training.