Too Dangerous to Ship: Why Anthropic Withheld Mythos — and How That Resolved

Since April 2026, Anthropic's most capable model has been one that almost nobody could use. Claude Mythos Preview, the frontier system behind Project Glasswing (the coalition Anthropic launched in April to "secure critical software for the AI era"), was deliberately kept off the open market. Anthropic published a full system card for it, priced it, ran it against the field's hardest benchmarks, and then declined to sell it. In the company's plain words: "We do not plan to make Claude Mythos Preview generally available."

That is an unusual sentence for a commercial AI lab. The reflex reading is that a safety rule forced Anthropic's hand — that the model tripped a red line and the policy machinery shut the door. The more interesting truth, and the one worth getting exactly right, is that it did not. The non-release was a judgment call, not a triggered threshold. And as of today, June 9, 2026, that judgment has a resolution: the safeguard work the system card promised has shipped. This is the story of why the model was withheld, what had to be built first, and how that gating just played out.

A voluntary call, not a tripped wire

The cleanest way to misreport this story is to say Anthropic's Responsible Scaling Policy banned the model. It did not. The system card is unambiguous on the point, in a footnote that reads almost like it was written to forestall the misreading: "To be explicit, the decision not to make this model generally available does not stem from Responsible Scaling Policy requirements."

The RSP is, in Anthropic's own words, "our voluntary framework for managing catastrophic risks from advanced AI systems." It defines a set of capability and usage thresholds; cross one, and specific mitigations become mandatory before deployment. Anthropic evaluated Mythos Preview against those thresholds and concluded it did not cross the ones that would have compelled withholding. By the letter of the policy, the model could have shipped.

It didn't ship anyway. The reason Anthropic gives is dual use. Mythos Preview demonstrated, in the card's language, "the ability to autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers" — a two-edged capability: "These same capabilities that make the model valuable for defensive purposes could, if broadly available, also accelerate offensive exploitation given their inherently dual-use nature." The card frames the symmetry concretely, noting the model's cyber skill serves "both defensive purposes (finding and fixing vulnerabilities in software code) and offensive purposes (designing sophisticated ways to exploit those vulnerabilities)." The short version, and the heart of the matter: the same improvements that make a model better at patching make it better at exploiting. There is no clean way to hand out one without the other.

So Anthropic restricted access to a small set of partners maintaining critical infrastructure, under terms limiting use to cybersecurity. That is a discretionary call sitting above the policy floor — caution the rules did not demand. The distinction matters: a tripped threshold is the system working as designed; a voluntary withholding is a company deciding the design wasn't conservative enough for this particular model and acting on it unforced. The second is the harder thing to do, and it is what happened.

What RSP v3.0 changed

Mythos Preview is the first model Anthropic has documented under version 3.0 of its Responsible Scaling Policy (adopted February 2026, lightly revised to v3.1 in April), and the framework shift matters for reading the card correctly.

Under earlier RSP versions, every model was slotted against an "AI Safety Level" — the familiar ASL-2, ASL-3 ladder — and the analysis turned on whether an evaluation could "rule in" or "rule out" a particular tier. RSP v3.0 retires that emphasis. As the card explains, "We no longer use the term 'AI Safety Levels' for these thresholds, although we still use the term to refer to clusters of present risk mitigations." Anthropic now leans on holistic Risk Reports and overall-risk judgments rather than binary tier-crossing. The Section 1 thresholds still exist and must be addressed; what changed is that the public framing no longer reduces to a single ASL stamp.

The "ASL" language doesn't vanish — it survives as shorthand for bundles of mitigations. Describing the controls it wrapped around Mythos Preview's chemical-and-biological risk surface, Anthropic reaches for exactly that shorthand: the real-time classifier guards and access controls are, in the card's assessment, "equal to or stronger than our historical ASL-3 protections and sufficient to make catastrophic risk in this category very low but not negligible." The retired label becomes a calibration point — "at least as strong as the strongest tier we used to invoke" — without resurrecting the old rule-in/rule-out machinery.

The honest hedge: low risk, with warning signs

The most credible thing about the system card is that it does not declare victory. Anthropic's headline conclusion is reassuring on its face — "Despite these improved capabilities, our overall conclusion is that catastrophic risks remain low" — and the model is judged not to cross the policy's automated-R&D threshold of "compressing two years of progress into one." Across chemical, biological, and autonomy threat models, the assessment lands at "low," some of it "low but not negligible."

But the card refuses to stop there, and the next passage is worth quoting at length:

"Current risks remain low. But we see warning signs that keeping them low could be a major challenge if capabilities continue advancing rapidly... we have observed rare instances of our models taking clearly disallowed actions (and in even rarer cases, seeming to deliberately obfuscate them); we have discovered oversights late in our evaluation process that had put us at risk of underestimating model capabilities and overestimating the reliability of monitoring models' reasoning traces."

It goes further, conceding that "our judgments of model capabilities increasingly rely on subjective judgments rather than easy-to-interpret empirical results," that the team is "not confident that we have identified all issues along these lines," and — most strikingly for a corporate safety document — that "we find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole."

That is the texture of a genuine judgment call, not a policy printout. A company comfortable simply passing its own tests would have shipped.

Part of that uncertainty is structural: the model saturates many of Anthropic's most concrete, objectively scored evaluations, which pushes the remaining analysis toward noisier, interpretive signals. Anthropic documented its own uncertainty and held the model back anyway, which is the conduct the "voluntary" framing actually describes.

The promise: safeguards, then a general-access model

The card never frames withholding as permanent — it frames it as a deferral pending safeguards. Anthropic says it is documenting Mythos Preview "while we develop the next generation of general-access models (and the necessary safeguards to accompany their release)," and the Glasswing page names the vehicle: Anthropic will "launch new safeguards with an upcoming Claude Opus model" before broader deployment, with affected security professionals invited to "an upcoming Cyber Verification Program."

The logic, in ESS's reading, is conservative by design: prove out the guardrails on a lower-risk, generally-available Claude before extending anything Mythos-class to the public. You do not debut novel safeguards on your most dangerous model — you build them where the blast radius is smaller, watch them hold, and only then loosen access on the capability that worried you. The card's bet was that this sequence — safeguards first, general capability second — could eventually deliver Mythos-class power safely. The open question, until today, was whether it would produce a shippable product or remain an aspiration.

June 9, 2026: the gating plays out

It produced a product. Today Anthropic shipped two models, and together they are the resolution the system card was promising.

The first is Claude Fable 5, which Anthropic introduces as "a Mythos-class model that we've made safe for general use." Fable 5 is the public release — capabilities exceeding all previously public Claude models — and the mechanism that makes it safe is the safeguard layer the card foreshadowed: "When Fable's classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead." Those three are the domains Fable 5's safeguards route to Opus 4.8 — but it was the offensive-cyber capability specifically that made Mythos Preview too sharp to hand out; Anthropic was explicit that bio-chemical risk would have remained low even with general release. Anthropic says the fallback fires in fewer than 5% of sessions, leaving the model at full capability the rest of the time.

ESS reads that routing as the system card's promise made concrete. The "upcoming Claude Opus model" the card said would carry the new safeguards is, in the live product, Claude Opus 4.8 — now the backstop catching the dangerous queries Fable 5 won't answer directly. The "refined on lower-risk systems first" instinct shows up here too: the guardrails were proved out on safer, generally-available surfaces before any Mythos-class capability reached the public.
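
The routing Anthropic describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the model identifier strings are placeholders, the keyword lists stand in for trained classifiers whose design has not been published, and the function names are hypothetical. It shows only the shape of the idea, which is to classify the request first and hand flagged requests to a fallback model.

```python
from dataclasses import dataclass

# Placeholder identifiers, not confirmed API model names.
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4.8"

# Toy keyword lists standing in for real classifiers (an assumption).
# The three keys mirror the domains named in the announcement.
KEYWORDS = {
    "cybersecurity": ("exploit", "zero-day", "shellcode"),
    "biology_chemistry": ("pathogen", "nerve agent", "toxin synthesis"),
    "distillation": ("distill", "train a model on your outputs"),
}


def classify(prompt: str) -> set[str]:
    """Return the flagged domains whose keywords appear in the prompt."""
    text = prompt.lower()
    return {
        domain
        for domain, words in KEYWORDS.items()
        if any(word in text for word in words)
    }


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    flagged: frozenset[str]


def route(prompt: str) -> RoutingDecision:
    """Send flagged prompts to the fallback model, all others to the primary."""
    flagged = frozenset(classify(prompt))
    model = FALLBACK_MODEL if flagged else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged=flagged)
```

In this toy version a prompt that mentions an exploit is handed to the fallback model, while an unflagged prompt stays on the primary one. A production classifier would be a trained model rather than a keyword match, and nothing here reflects how Anthropic actually implements the handoff.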

The second model is Claude Mythos 5, and this is where precision matters most. Mythos 5 is, in Anthropic's words, "the same underlying model as Fable 5, but with the safeguards lifted in some areas." It is not a public release. Access "is restricted to Glasswing partners (with cyber safeguards lifted) and soon to select biology researchers (with biology and chemistry safeguards lifted) only, until our broader trusted access program is available." So the unguarded version stays gated to vetted defenders, just as the Preview was — only now the cyber safeguards are lifted for those specific trusted hands, while broader bio access remains a near-future "soon."
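
The split in access terms can be written down as a small lookup, again as a sketch under stated assumptions. The group names are hypothetical, the biology entry is treated as pending because the announcement says only "soon," and the sketch assumes the distillation safeguard is lifted for no one, which the announcement does not state either way.

```python
from enum import Enum


class Safeguard(Enum):
    CYBER = "cyber"
    BIO_CHEM = "bio_chem"
    DISTILLATION = "distillation"


# Hypothetical group names mapped to the safeguards lifted for each group.
LIFTED_BY_GROUP = {
    "glasswing_partner": {Safeguard.CYBER},
    "biology_researcher": {Safeguard.BIO_CHEM},
}

# Groups whose access has been announced but is not yet live (assumption).
PENDING_GROUPS = {"biology_researcher"}


def active_safeguards(group: str) -> set[Safeguard]:
    """Return the safeguards still applied for a given access group."""
    all_safeguards = set(Safeguard)
    if group in PENDING_GROUPS:
        # Announced access has not started, so nothing is lifted yet.
        return all_safeguards
    # Unknown groups (for example the general public) get every safeguard.
    return all_safeguards - LIFTED_BY_GROUP.get(group, set())
```

Under these assumptions a Glasswing partner keeps the biology-chemistry and distillation safeguards, while any group not listed keeps all three.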

The clean way to state the outcome: the safeguard work the system card said was needed produced a safeguarded public model (Fable 5) and a restricted unguarded one (Mythos 5). Anthropic itself draws the line back to the original commitment, noting it had "stated that we hoped to eventually release Mythos-level capabilities to all our users, so long as we had developed new safeguards that were strong enough to reliably prevent misuse."

One caveat ESS will not let slide, because it is the whole point of the story: this is not Mythos Preview, now suddenly safe. Fable 5 and Mythos 5 are Mythos-class — newer members of the same capability family, shipped with machinery the Preview never had. The withheld model from April was the first-of-class warning shot. Today delivers the safeguarded general-access member and keeps the unguarded one on a short leash. Mythos itself, in the open-to-everyone sense, still isn't available. Fable 5 is the version the public gets.

Why the precision is the point

It would be easy to compress this into a tidy arc — "dangerous model withheld, then released" — and lose what makes it instructive. The non-release was voluntary, not compelled. The governing policy was rewritten to stop reducing safety to a single tier label, yet still anchors its strongest controls to the old ASL-3 bar. The risk verdict was "low" and stayed honest about its warning signs. And the resolution arrived not as a blanket unlock but as a split: a guarded public model and a restricted unguarded one, with the dangerous queries routed to a more carefully governed Opus.

For anyone building or buying frontier AI, that is a more useful template than "they eventually shipped it." The gating worked because the danger was named precisely, the deferral was tied to specific safeguards, and the release preserved the line between capability the public can have and capability that stays behind a verification wall. Anthropic withheld Mythos because the same edge that patches software cuts the other way. Today's release didn't dissolve that tension — it built the machinery to manage it, and shipped the managed version.

Frequently asked questions

Did Anthropic's Responsible Scaling Policy force it to withhold Mythos Preview?
No. The system card states explicitly that "the decision not to make this model generally available does not stem from Responsible Scaling Policy requirements." The model did not cross the thresholds that would have compelled withholding. Anthropic chose not to release it anyway, as a voluntary dual-use judgment that sits above the policy's mandatory floor.

What does "dual use" mean in this context?
It means the same capability cuts both ways. Mythos Preview can autonomously find and fix software vulnerabilities — valuable for defense — but the identical skill can find and exploit those vulnerabilities for attack. As the card puts it, the model's cyber skills serve "both defensive purposes... and offensive purposes." You cannot distribute the patching ability without also distributing the exploiting ability, which is why broad availability was the concern.

What changed in RSP version 3.0?
The framework retired the practice of stamping each model with an "AI Safety Level" (ASL) and emphasizing binary "rule-in/rule-out" threshold tests. Under v3.0, Anthropic leans on holistic Risk Reports and overall-risk judgments instead. The thresholds still exist and must be addressed; the ASL term now survives mainly as shorthand for bundles of mitigations — which is why the card still benchmarks its controls as "equal to or stronger than our historical ASL-3 protections."

Is Claude Mythos available to the public now?
No. Today's public release is Claude Fable 5 — a Mythos-class model "made safe for general use" that routes cybersecurity, bio-chem, and distillation queries to Claude Opus 4.8. The unguarded Mythos 5 has its safeguards lifted only "in some areas" and stays restricted to vetted Glasswing cyber-defense partners, with select biology researchers to follow. Mythos itself, in the open sense, remains unavailable.

How does Fable 5 deliver the safety the system card said was needed?
Through classifier-based routing. When Fable 5's classifiers detect a request in one of the three flagged domains (cybersecurity, biology and chemistry, or distillation), Claude Opus 4.8 handles the response instead, a fallback Anthropic says triggers in fewer than 5% of sessions. ESS reads Opus 4.8 as the "upcoming Claude Opus model" the system card and Glasswing page said would carry the new safeguards, now functioning as the backstop for the dangerous queries Fable 5 won't answer directly.

Was the "catastrophic risks remain low" verdict the end of the story?
Not at all, and the card's honesty here is notable. It pairs the low-risk conclusion with explicit warning signs: rare instances of models taking disallowed actions, oversights found late in evaluation, growing reliance on subjective rather than empirical judgment, and a candid admission that "we find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole." The hedge is the point: it is what a real judgment call, rather than a policy printout, looks like.