Offense vs. Defense: Anthropic's Bet That AI Favors the Defender

Every powerful new security tool arrives wrapped in the same question: does it help the people defending systems, or the people attacking them? Anthropic's Project Glasswing — the critical-infrastructure coalition built around Claude Mythos Preview, an unreleased frontier model that can find and, in some cases, autonomously exploit decades-old vulnerabilities — forces that question harder than most. A model that can discover a 27-year-old remote crash in OpenBSD, or chain Linux kernel bugs into root, is not a neutral instrument. It can be pointed in either direction.

Anthropic's answer is a bet. In its Frontier Red Team writeup, the company argues that the advantage will belong to whichever side can extract the most value from these tools — and that, in the short term, that side could be attackers if frontier labs are careless about how they release such models. Here is the line, exactly as Anthropic wrote it:

"In the short term, this could be attackers, if frontier labs aren't careful about how they release these models."

The longer-term claim is the optimistic one: that once the dust settles, defenders come out ahead. This article examines that argument on its merits — the "new equilibrium" thesis, the candid admission that the road there will be rough, and the historical analogy Anthropic leans on to make the case. It also gives a fair hearing to the strongest objection, which is not skepticism about whether the model works, but a hard question about whether finding bugs at machine speed does any good when fixing them still happens at human speed.

The new equilibrium thesis

Anthropic's central claim is structural, not cheerful. Its red-team post argues, in paraphrase, that most security tooling has historically benefited defenders more than attackers, and that the same will eventually hold for powerful language models, but only after the landscape settles into what it calls a "new equilibrium." At that point, Anthropic expects these models to benefit defenders more than attackers and to raise the overall security of the software ecosystem.

The mechanism behind the long-term optimism does a lot of work. Anthropic's argument is that defenders will, over time, fix bugs before new code ever ships — that the durable advantage is prevention. A defender who owns the codebase, the build pipeline, and the deployment process can route a vulnerability-finding model into continuous integration and catch flaws before they reach production. An attacker, by contrast, has to find a bug that already exists in deployed software and reach it across a network. If frontier models become a standard part of how code is written and reviewed, the reasoning goes, the population of exploitable bugs shrinks at the source.

That is a stronger argument than "good guys will use it too." It locates the defender's edge in a place the attacker cannot reach: the moment before code ships.
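As a rough illustration of what catching flaws "before code ever ships" could look like in practice, the sketch below gates a build on the severity of reported findings. Everything in it is hypothetical: the `Finding` record, the severity labels, and the `gate` function are assumptions made for illustration, and nothing here describes how Anthropic or its partners actually wire a model into continuous integration.

```python
from dataclasses import dataclass

# Hypothetical severity scale; real scanners and teams define their own.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    """One issue reported against the change under review (illustrative shape)."""
    path: str
    severity: str
    summary: str


def gate(findings: list[Finding], block_at: str = "high") -> tuple[bool, list[Finding]]:
    """Return (may_ship, blocking_findings) for a proposed change."""
    threshold = SEVERITY_RANK[block_at]
    blocking = [f for f in findings if SEVERITY_RANK[f.severity] >= threshold]
    return (not blocking, blocking)


if __name__ == "__main__":
    report = [
        Finding("net/parser.c", "critical", "length field trusted without bounds check"),
        Finding("docs/build.md", "low", "outdated example"),
    ]
    may_ship, blocking = gate(report)
    print("ship" if may_ship else f"blocked by {len(blocking)} finding(s)")
```

The point of the sketch is positional rather than technical: the check runs on code the defender controls, before release, which is a step an outside attacker has no equivalent of.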

The honest part: a tumultuous transition

What makes the argument credible rather than promotional is that Anthropic does not pretend the transition is free. The company concedes, in the same breath as its optimism, that the period between now and that new equilibrium may be tumultuous regardless of how carefully anyone behaves. That admission matters, because it is where the short-term and long-term claims pull against each other.

The short-term risk is concrete and self-imposed: a lab that ships a model like this without restraint hands attackers a head start before defenders have integrated the same capability. Anthropic's response to its own warning is the structure of Glasswing itself — initial release to a vetted set of critical-industry partners and open-source developers rather than open access, with the stated aim of letting defenders begin securing the most important systems before comparable capability becomes broadly available. Whether that gating actually buys defenders enough lead time is the open question. Anthropic is betting that it does; it is not claiming certainty.

The fuzzing analogy — and whether it holds

To argue that today's fear resolves into tomorrow's defensive advantage, Anthropic reaches for history. The analogy is fuzzing — the technique of bombarding software with malformed inputs to trigger crashes. When large-scale fuzzers first appeared, there were real concerns they would let attackers find vulnerabilities faster. And, Anthropic concedes, they did. But modern fuzzers like AFL became a load-bearing part of the defensive ecosystem, and projects like OSS-Fuzz now pour significant resources into continuously fuzzing critical open-source software. A feared offensive tool became, over time, a net-defensive one — woven directly into how code is built and maintained.

As a precedent for "scary capability becomes defensive infrastructure," it is fair. But the analogy deserves more scrutiny than a comfortable nod, because the thing that made fuzzing net-defensive may be precisely the thing the new capability removes.

Here is the tension. A fuzzer finds crashes, and a crash is not an exploit. Turning a fuzzer's output into a working attack historically required a skilled human to triage the crash, understand the underlying flaw, and build a reliable exploit — slow, specialist work. That human bottleneck rate-limited the offensive side of fuzzing. Defenders, meanwhile, could absorb raw crash output into automated pipelines: reproduce, deduplicate, file, fix. The asymmetry favored defense because defense could be automated downstream and offense could not.
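The defender-side loop described above (mutate inputs, catch crashes, deduplicate, reproduce) can be sketched in a few lines. This is a toy under stated assumptions, not how AFL or OSS-Fuzz work internally: `parse_record` is a made-up target with two planted bugs, the mutation strategy is plain random byte overwrites, and crashes are bucketed by exception type and source line, which is only one of many possible deduplication keys.

```python
import random
import traceback


def parse_record(data: bytes) -> int:
    """Toy target with two planted bugs (hypothetical, not real software)."""
    if len(data) < 2:
        raise ValueError("record too short")  # clean rejection, not a crash
    length, tag = data[0], data[1]
    payload = data[2:2 + length]
    if tag == 0xFF:
        return payload[length - 1]  # bug 1: trusts the declared length
    return 100 // tag  # bug 2: tag == 0 divides by zero


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one to three random bytes of the seed input."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 3)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def crash_signature(exc: BaseException) -> tuple[str, str, int]:
    """Bucket key: exception type plus the innermost frame's function and line."""
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    return (type(exc).__name__, frame.name, frame.lineno)


def fuzz(target, seed: bytes, iterations: int = 20_000, rng_seed: int = 0) -> dict:
    """Return one reproducing input per distinct crash signature."""
    rng = random.Random(rng_seed)
    buckets: dict[tuple[str, str, int], bytes] = {}
    for _ in range(iterations):
        data = mutate(seed, rng)
        try:
            target(data)
        except ValueError:
            continue  # the target rejected the input as designed
        except Exception as exc:
            buckets.setdefault(crash_signature(exc), data)  # deduplicate
    return buckets


def reproduces(target, data: bytes, expected_type: str) -> bool:
    """Replay a saved input and confirm it still raises the same exception type."""
    try:
        target(data)
    except Exception as exc:
        return type(exc).__name__ == expected_type
    return False


if __name__ == "__main__":
    for (kind, func, line), data in fuzz(parse_record, b"\x04\xffABCD").items():
        print(kind, func, line, data.hex(), reproduces(parse_record, data, kind))
```

Nothing in this loop produces an exploit. It only finds and sorts crashes, which is the half of the work that defenders could automate while the step from crash to working attack stayed manual.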

Autonomous exploitation threatens to collapse exactly that asymmetry. In Glasswing's own evidence, Claude Mythos Preview did not merely flag a 17-year-old FreeBSD NFS flaw (CVE-2026-4747) — it exploited the bug end-to-end on its own, assembling a return-oriented programming chain split across six RPC requests to plant an SSH key, where the prior-generation model could only manage the same feat with human help. If the step that used to require a human specialist — crash to weaponized exploit — is now inside the tool, then the rate-limit that historically kept fuzzing's offense in check is gone. The fuzzing analogy holds for discovery. It is far less obvious that it holds for autonomous, end-to-end exploitation, which is the genuinely new thing here.

A fair reading leaves this unresolved rather than settling it in Anthropic's favor. The company's strongest rebuttal is the prevention argument above: if defenders use the capability to eliminate bugs before code ships, the deployed attack surface shrinks and the offensive rate-limit matters less over time. But prevention is a claim about future code. It does nothing for the enormous installed base of software already running in the world — and that is where the next objection lives.

The steelman: finding isn't fixing

The most serious counter-argument against Anthropic's bet is not that the model fails. It is that the model succeeds, and that success creates a problem of its own.

Discovery now happens at machine speed. Remediation does not. Patching a vulnerability in deployed, business-critical software is still a human-paced process: a maintainer has to understand the report, write a fix, test it against regressions, ship it, and — hardest of all — get every downstream operator to actually install it. Glasswing's own numbers make the gap vivid. By Anthropic's account, fewer than 1% of the potential vulnerabilities the project has discovered so far have been fully patched. The bugs are real, severe, and now known. They are also, overwhelmingly, still open.
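The arithmetic behind that gap is simple enough to sketch. The model below assumes constant weekly find and fix rates, which is a deliberate simplification, and the numbers in the usage example are hypothetical values chosen to show a 100-to-1 ratio. They are not Glasswing data.

```python
def simulate_backlog(found_per_week: int, fixed_per_week: int, weeks: int) -> list[int]:
    """Open findings at the end of each week under constant find and fix rates."""
    open_findings = 0
    history = []
    for _ in range(weeks):
        open_findings += found_per_week
        open_findings -= min(fixed_per_week, open_findings)  # cannot fix more than exist
        history.append(open_findings)
    return history


def patched_fraction(found_per_week: int, fixed_per_week: int, weeks: int) -> float:
    """Share of everything found so far that has been fixed."""
    total_found = found_per_week * weeks
    if total_found == 0:
        return 0.0
    still_open = simulate_backlog(found_per_week, fixed_per_week, weeks)[-1]
    return (total_found - still_open) / total_found


if __name__ == "__main__":
    # Hypothetical rates chosen only to show a 100-to-1 gap; not Glasswing data.
    print(simulate_backlog(1000, 10, 4))
    print(f"{patched_fraction(1000, 10, 52):.1%}")
```

Under these assumptions the patched share settles at the ratio of fix rate to find rate, so a patched fraction under 1% implies discovery is running at least a hundred times faster than remediation, and the open backlog grows every week that the ratio holds.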

Security practitioners have named this directly. Jeff Williams of OWASP and Contrast Security has framed the problem as defenders being forced to "remediate at human speed" while discovery races ahead — and the concern has been echoed across the practitioner community as the dominant, substantive critique of industrial-scale vulnerability discovery. The worry is not Luddism; it is throughput. A firehose of validated, high-severity findings is an asset only if an organization can act on it faster than an adversary armed with the same class of tool can weaponize it.

This is the hinge on which Anthropic's bet actually turns, and it is why the optimistic case cannot be waved through. Notably, Anthropic does not dispute the bottleneck — the company has acknowledged that the constraint is human capacity to triage, report, and deploy patches, which is precisely why coordinated disclosure for Glasswing is deliberately throttled rather than dumped. But agreeing that remediation is the bottleneck is not the same as solving it. The unresolved question — what defenders do when finding outpaces fixing by two orders of magnitude — is large enough to deserve its own treatment, and a later article in this series takes it up in full.

"If labs are careful" — a test case in real time

Anthropic's short-term warning came with an implicit condition: the attacker advantage materializes if labs aren't careful about how they release these models. As of June 9, 2026, there is a concrete example of what "careful" is meant to look like. On that date Anthropic released Claude Fable 5, a publicly available, Mythos-class model, alongside Claude Mythos 5, the same underlying model with safeguards lifted for authorized Glasswing partners. The difference between the two is the safeguards: Fable 5 ships with classifiers that route sensitive cybersecurity, biology and chemistry, and distillation queries to a less capable fallback model, Claude Opus 4.8. Those classifiers trigger, on average, in under 5% of sessions.

Whether that design is sufficiently careful is a legitimate debate, and not one to settle here. But as an illustration of the condition in Anthropic's own argument, it is apt: a frontier-class model released to the public with its most dual-use capability gated behind a fallback, while the unrestricted version stays inside a vetted partner program. For now, the argument and the conduct line up. The bet is being placed in public, with real stakes, and it will be judged not by the elegance of the thesis but by whether defenders actually pull ahead before attackers do.

What ESS takes from it

Strip away the framing and Anthropic's position is a falsifiable claim about timing: defenders win the long game, attackers may win the short one, and careful release is the lever that shortens the dangerous interval. The new-equilibrium thesis is coherent and the prevention mechanism behind it is real. The fuzzing analogy is honest about the past but understates what is new about autonomous exploitation — the disappearance of the human bottleneck that historically tilted security tooling toward defense. And the remediation gap means the optimistic outcome is contingent, not guaranteed: it depends on defenders building the capacity to fix at something approaching the speed at which everyone can now find. The bet is reasonable. It is not yet won.

Frequently asked questions

Does Anthropic claim AI clearly favors defenders?
Not unconditionally. Anthropic argues defenders benefit more in the long term, once the security landscape reaches a "new equilibrium." For the short term it is explicit that the advantage could go to attackers — in its words, "if frontier labs aren't careful about how they release these models." The claim is about timing and conditions, not a blanket assertion that AI is inherently pro-defense.

What is the "new equilibrium" thesis?
It is the idea that security tooling has historically ended up helping defenders more than attackers, and that powerful language models will follow the same pattern after a disruptive transition. The mechanism is prevention: defenders who own the codebase and build pipeline can use these models to catch and fix bugs before code ever ships, shrinking the pool of exploitable flaws at the source — an advantage attackers, who must find bugs already deployed, cannot match.

Why does Anthropic compare this to fuzzing?
Fuzzing was once feared as a tool that would help attackers find bugs faster — and initially it did. Over time, though, it became defensive infrastructure, with projects like OSS-Fuzz continuously fuzzing critical open-source software inside development pipelines. Anthropic uses it as a precedent for a feared offensive capability becoming net-defensive.

Does the fuzzing analogy actually hold?
Partly. It holds well for discovery: like fuzzing, the model finds flaws at scale. It is weaker for exploitation. Fuzzing produced crashes that still required a skilled human to weaponize, and that human bottleneck rate-limited offense while defenders automated triage downstream. Autonomous, end-to-end exploitation — demonstrated when the model exploited a FreeBSD NFS flaw on its own — removes that bottleneck, which is exactly the asymmetry that historically favored defenders. So the analogy is a fair precedent but not a clean one.

What is the strongest argument against Anthropic's bet?
The remediation bottleneck. Discovery now happens at machine speed, but patching deployed software is still human-paced. By Anthropic's own account, fewer than 1% of the potential vulnerabilities Glasswing has found have been fully patched. Practitioners such as Jeff Williams of OWASP and Contrast Security warn that defenders are forced to "remediate at human speed" while discovery races ahead. Finding isn't fixing, and the gap between them is where the optimistic case is most exposed.

How does the June 9, 2026 release of Claude Fable 5 fit in?
It is a real-world example of the "if labs are careful" condition. Fable 5 is a public, Mythos-class model whose most dual-use capabilities — cybersecurity, biology and chemistry, and distillation — are gated behind conservative classifiers that fall back to a less capable model (Claude Opus 4.8) in under 5% of sessions, while the unrestricted version, Mythos 5, stays inside the vetted Glasswing partner program. Whether that is careful enough is debatable, but it is a concrete instance of the gated-release strategy Anthropic argues is needed to keep the short-term advantage from attackers.