
TL;DR. For most of our first quarter on a multi-agent platform, every agent shared one Anthropic API key. Then a runaway loop in a small agent ate a meaningful slice of the month's budget in an afternoon, and throttling at the provider level took the entire fleet down with it. We ripped out the shared key, issued one scoped key per agent with a hard monthly ceiling, and routed every secret through a single keyvault primitive. The next runaway took out exactly one agent. This is the build-log entry.
For most of our first quarter running a multi-agent platform, every agent on our crew shared a single Anthropic API key: Sparkles, Soundwave, Optimus Prime, Rewind, all of them. It was fast, clean, and exactly the kind of default that felt fine until it didn't. Then a runaway loop in one small agent ate a meaningful slice of our monthly budget in an afternoon, and we ripped the shared key out for good.
This is the build-log entry for that decision: what was wrong, what per-agent keys with hard ceilings actually got us, and what we still don't have a clean answer for.
It was a Tuesday in early March. One of our smaller agents, a notes-summarizer that nobody had touched in weeks, got into a loop. A malformed tool response came back, the agent retried, the retry produced another malformed response, and the prompt grew on every pass because we were appending the failure to context. By the time a human noticed, the agent had burned through a meaningful slice of the month's Anthropic budget in a single afternoon. Every other agent on the platform (Sparkles, Soundwave, Optimus Prime, Rewind) was sharing the same API key. So when we finally rate-throttled at the provider level to stop the bleed, every agent on the platform stopped working at the same time.
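The shape of that loop is easy to reconstruct. Here is a minimal sketch, with hypothetical names (`summarize_notes`, `call_llm`, `run_tool` are illustrative stand-ins, not our actual code), plus the retry cap the original never had:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolResult:
    ok: bool
    value: str = ""
    error: str = ""

MAX_TOOL_RETRIES = 3  # the guard that did not exist in March

def summarize_notes(notes: str,
                    call_llm: Callable[[list[dict]], str],
                    run_tool: Callable[[str], ToolResult]) -> str:
    messages = [{"role": "user", "content": f"Summarize these notes:\n{notes}"}]
    for _ in range(MAX_TOOL_RETRIES):
        reply = call_llm(messages)   # one paid provider call per pass
        result = run_tool(reply)
        if result.ok:
            return result.value
        # The original bug: every failure was appended to context, so the
        # prompt, and therefore the cost of each subsequent call, grew on
        # every retry, and nothing ever stopped the loop.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool error: {result.error}"})
    raise RuntimeError(f"tool failed {MAX_TOOL_RETRIES} times; giving up")
```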
That was the day we ripped out the shared key.
When you stand up a multi-agent platform on a deadline, the first instinct is to drop one provider key into a vault, reference it from every agent, and move on. We did this. It is fast. It is clean from a config standpoint. And it has a property nobody talks about out loud: every agent on the platform shares a single failure surface.
Concretely, with one shared LLM key:

- One runaway agent can spend the entire fleet's budget.
- One provider-level throttle to stop the bleed stalls every agent at once.
- One leaked credential means revoking and re-issuing for the whole platform.
- Spend arrives as a single undifferentiated number, so there is no way to attribute cost to the agent that incurred it.
It is hard to overstate how casually this risk gets accepted. Until early March, we had also been telling ourselves that the upstream provider was a stable dependency we did not need to model defensively. That assumption got noticeably wobblier on March 3, when the Pentagon publicly designated Anthropic a supply-chain risk (the first time the U.S. government has applied that label to an AI vendor), reportedly tied to a dispute over Claude's use in classified analysis. A week later, on March 9, Anthropic sued the Department of Defense in two federal courts, a development also covered as front-page tech news, and the vendor put out its own statement on where it stood. None of that touches our actual API access. But it does change how you think about concentrated dependencies. If a regulator cannot tolerate a single point of exposure to one vendor, your fleet of agents probably should not either.
The shared key was not the cause of either of those events. It was just the thing that made our exposure to any upstream turbulence sit at maximum.
The fix is unglamorous. It is also one of the highest-leverage things we have done all quarter.
Every agent now holds its own scoped provider key with a hard monthly spend ceiling. Secrets load through a single get_secret(agent_name, "llm_api_key") helper, which resolves to a per-agent path in the secret store. Adding a new agent is one config entry plus one key-issuance step. We do not sprinkle credentials through code.

That is the entirety of the architectural change. No new service, no new framework, no fancy proxy layer. Just per-agent identity, per-agent ceilings, and a discipline about how secrets get loaded.
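The helper itself is tiny. Here is a minimal sketch of the keyvault primitive, assuming a generic key-value secret store; the per-agent path convention is the real one, but the backend lookup is an illustrative stand-in for whatever vault you actually run:

```python
import os

# Per-agent path convention in the secret store.
SECRET_PATH = "agents/{agent}/{name}"

def get_secret(agent_name: str, secret_name: str) -> str:
    """Resolve a secret at the agent's own path; there is no shared fallback."""
    path = SECRET_PATH.format(agent=agent_name, name=secret_name)
    # Illustrative backend: an env-var lookup standing in for the real vault
    # client, e.g. agents/sparkles/llm_api_key -> AGENTS__SPARKLES__LLM_API_KEY.
    value = os.environ.get(path.replace("/", "__").replace("-", "_").upper())
    if value is None:
        raise KeyError(f"no secret at {path}")
    return value

# Usage: every agent resolves its own key, nothing is hardcoded.
#   api_key = get_secret("sparkles", "llm_api_key")
```

Resolving on every call rather than caching is what lets a rotated secret land without a restart, which matters for the rotation story below.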
The morning after we shipped this, a different agent did exactly the same kind of thing the notes-summarizer had done two weeks earlier: bad tool response, retry, growing context, retry. This time the agent's own key hit its ceiling around lunchtime, the provider locked it out, the alert fired, and no other agent noticed. Sparkles kept routing. Soundwave kept ingesting mail. Optimus Prime kept doing its thing. We fixed the misbehaving agent, raised its ceiling slightly, and moved on.
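At the call site, "the alert fired, and no other agent noticed" looks roughly like this. The sketch assumes the official anthropic Python SDK for the error types; alert() is a stand-in for our real paging hook:

```python
import anthropic

def alert(message: str) -> None:
    print(f"[ALERT] {message}")  # stand-in for the real paging hook

def guarded_call(client: anthropic.Anthropic, agent_name: str, **request):
    try:
        return client.messages.create(**request)
    except (anthropic.RateLimitError, anthropic.PermissionDeniedError) as exc:
        # The key that hit its ceiling belongs to exactly one agent, so this
        # lockout, and this page, are scoped to that agent alone.
        alert(f"{agent_name}: provider locked this key out ({exc.status_code})")
        raise  # fail this agent loudly; the rest of the fleet is unaffected
```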
The concrete wins:

- A runaway loop now costs at most one agent's monthly ceiling, not the fleet's budget.
- Failures are scoped: one key locks out, one agent stops, everything else keeps running.
- Spend is attributable per agent, which is exactly the telemetry we now tune ceilings against.
- A compromised key gets revoked in one place, without touching any other agent.
This is a build-log, so honesty about the open problems matters.
We now have more than thirty active provider keys across the crew, between LLM providers, image providers, embedding providers, and the long tail of integration agents. Rotating those keys is now its own ops problem. We do not have a fully automated rotation pipeline yet. Today, rotation is a quarterly ritual handled by a human running an internal script and watching for the agents that fail to pick up the new secret on the next call. That is fine at our current size; it will not be fine at three times this size.
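For concreteness, the outline of that ritual. The helpers here (issue_new_key, write_secret, probe_agent) are hypothetical stand-ins for internal code, not something you can import:

```python
from typing import Callable

AGENTS = ["sparkles", "soundwave", "optimus-prime", "rewind"]  # plus the long tail

def rotate_llm_keys(issue_new_key: Callable[[str], str],
                    write_secret: Callable[[str, str], None],
                    probe_agent: Callable[[str], bool]) -> list[str]:
    """Rotate every agent's key; return the agents that failed to pick it up."""
    stragglers = []
    for agent in AGENTS:
        new_key = issue_new_key(agent)                      # provider-side issuance
        write_secret(f"agents/{agent}/llm_api_key", new_key)
        if not probe_agent(agent):                          # did the next call use it?
            stragglers.append(agent)
    return stragglers  # today, a human watches this list by hand
```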
The other thing we have not solved cleanly is per-agent budget tuning. The right ceiling for an agent is "a little more than its real usage." We are still setting those ceilings by gut, then adjusting after the first month of telemetry. We will probably automate that loop later this year.
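That loop is small enough to sketch now. The headroom multiplier and the floor below are guesses, not tuned values:

```python
def suggest_ceiling(monthly_spend_usd: list[float],
                    headroom: float = 1.25,
                    floor_usd: float = 5.0) -> float:
    """Ceiling = a little more than observed usage, never below a floor."""
    peak = max(monthly_spend_usd, default=0.0)
    return max(floor_usd, round(peak * headroom, 2))

# After the first month of telemetry:
#   suggest_ceiling([12.40, 9.10, 14.75])  -> 18.44
```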
If you are running a multi-agent platform and you still have a single shared LLM key referenced by every agent: that key is your blast radius. It is also your single point of credential leakage, your single rate-limit choke, and your single revocation surface. Per-agent keys with per-key ceilings cost you a one-time refactor and a slightly larger ops burden. They buy you a fleet that fails one agent at a time instead of all at once.
We learned that the expensive way. You do not have to.