When Anthropic previewed Claude Mythos, the unreleased frontier model behind Project Glasswing, the headline numbers were about discovery. An AI agent, prompted once and left alone in an internet-isolated container, found a 27-year-old remote denial-of-service bug in OpenBSD's TCP stack, a 16-year-old flaw in FFmpeg's H.264 decoder, and a 17-year-old remote-code-execution vulnerability in FreeBSD's NFS implementation that it then exploited fully autonomously, planting an SSH key by chaining six network requests. The winning run on the OpenBSD bug reportedly cost under $50 in compute.
That is the part of the story everyone repeats. It is not the part that should keep security leaders awake.
Buried in Anthropic's own preview is a single statistic that reframes the entire enterprise: "fewer than 1% of the potential vulnerabilities we've discovered so far have been fully patched by their maintainers." Finding has become industrial. Fixing has not. And the practitioners who study this for a living (at OWASP, at SANS, at the Cloud Security Alliance, at firms like ProCircular) are converging on the same uncomfortable conclusion: the bottleneck is no longer the bug hunt. It is everything that has to happen after.
For most of software's history, discovering a serious vulnerability in mature, widely deployed code was expensive, specialized work. It required a skilled human, deep familiarity with a codebase, and time. That scarcity shaped the entire security economy: bug bounties, pentest engagements, the long tail of unfound bugs that simply sat there because nobody had looked hard enough.
The Mythos preview attacks that scarcity directly. Anthropic describes a pipeline that runs fully autonomously after a single prompt: Claude Code drives the model, AddressSanitizer acts as a near-perfect crash oracle with what Anthropic says is effectively zero false positives, files are ranked one-to-five for bug-likelihood, and a final agent re-reviews every report before a human ever sees it. On the CyberGym benchmark, the system card scores Mythos at 0.83 versus 0.67 for the prior Claude Opus generation (Anthropic's Glasswing marketing renders the same result as 83.1% versus 66.6%). When Anthropic's contractors manually reviewed 198 reports, 89% matched Claude's severity rating exactly and 98% landed within one level.
The economics invert. Anthropic notes that turning public vulnerability identifiers into functional exploits, historically the hardest, most human-intensive step, now "happens much faster, cheaper, and without intervention," with one complete exploit pipeline finishing in under a day for under $2,000.
Patching obeys none of these new economics. A fix still has to be written by a maintainer who understands the code's invariants, reviewed so it doesn't introduce a regression, released, and then (the part that quietly defeats everyone) actually deployed across every system running the vulnerable version. None of that scales with GPU spend. It scales with human attention, organizational process, and the patience of volunteer maintainers who were already stretched thin. AI made one side of the ledger nearly free while leaving the other side exactly as expensive as it has always been.
What makes the Mythos preview unusual is that the vendor does not pretend otherwise. Anthropic's stated reason for disclosing so little is itself a remediation argument: "Over 99% of the vulnerabilities we've found have not yet been patched, so it would be irresponsible for us to disclose details about them." The company has contracted professional security contractors to manually validate every bug report before sending it, explicitly so it does not "flood maintainers with an unmanageable amount of new work," and it concedes that this human-in-the-loop process is part of why the patched fraction stays under 1%.
In other words, the throttle on Glasswing today is not the model's ability to find bugs. It is the human capacity to triage, validate, responsibly report, and ultimately deploy fixes, a constraint Anthropic accepts on both its own disclosure side and the maintainers' patching side. Its coordinated-disclosure design reflects that: SHA-3 cryptographic commitments to prove possession of unpublished findings, and a 90-plus-45-day window before details go public.
Anthropic's prescription for defenders is equally blunt about where the pressure lands. The preview states that "software users and administrators will need to drive down the time-to-deploy for security updates, including by tightening the patching enforcement window, enabling auto-update wherever possible." That is a vendor of the discovery technology telling its customers that the discovery is not their problem; the clock is.
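The commitment mechanism mentioned above works the same way regardless of who runs it: hash the private report together with a random nonce, publish only the digest, and reveal nonce and report once the disclosure window closes. The following Python is an added illustration written for this article, not Anthropic's or Glasswing's code; the domain-separation label, the 32-byte nonce length, and the sample report text are all assumptions of the example.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Domain-separation label so a commitment to a finding can never be confused
# with a SHA-3 hash of the same bytes computed for some other purpose.
# Assumption of this example: the label is our own choice, not a Glasswing convention.
LABEL = b"vuln-finding-commitment-v1"
NONCE_LEN = 32  # fixed length, so LABEL || nonce || finding parses unambiguously


def commit(finding: bytes, nonce: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (commitment, nonce) where commitment = SHA3-256(LABEL || nonce || finding).

    The random nonce hides the content (a short report could otherwise be
    brute-forced from the digest alone); SHA-3 collision resistance makes the
    commitment binding, so the finder cannot later substitute a different report.
    """
    if nonce is None:
        nonce = secrets.token_bytes(NONCE_LEN)
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be exactly %d bytes" % NONCE_LEN)
    h = hashlib.sha3_256()
    h.update(LABEL)
    h.update(nonce)
    h.update(finding)
    return h.digest(), nonce


def verify(commitment: bytes, nonce: bytes, finding: bytes) -> bool:
    """Recompute the commitment from the revealed (nonce, finding); compare in constant time."""
    try:
        expected, _ = commit(finding, nonce)
    except ValueError:
        return False
    return hmac.compare_digest(expected, commitment)


# Constructed sample report: the text a finder keeps private during the
# disclosure window. Only COMMITMENT.hex() would be published at commit time.
REPORT = b"constructed example report: out-of-bounds read in example-lib parser, PoC attached"
COMMITMENT, NONCE = commit(REPORT)


def demo() -> dict:
    """Walk through publish -> reveal -> verify, including a tampered reveal."""
    tampered = REPORT.replace(b"example-lib", b"other-lib")
    return {
        "published": COMMITMENT.hex(),
        "honest_reveal_ok": verify(COMMITMENT, NONCE, REPORT),
        "tampered_reveal_ok": verify(COMMITMENT, NONCE, tampered),
        "wrong_nonce_ok": verify(COMMITMENT, secrets.token_bytes(NONCE_LEN), REPORT),
    }


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

The verifier learns nothing from the published digest until the reveal, and at reveal time any change to the report text or nonce fails the check, which is exactly what "proof of possession of an unpublished finding" requires.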
Independent security leaders have been making the same point, sometimes more sharply than Anthropic.
The clearest statement of the bottleneck thesis comes from Jim Sherlock, vice president of AI and cybersecurity R&D at ProCircular. Speaking about the Glasswing expansion, he framed it as a confession: "Finding bugs stopped being the hard part," he said, and "this expansion is Anthropic quietly admitting the hard part is now patching and deploying at scale." His warning about what happens to organizations that can't keep up is vivid: "Patch pipelines that are not able to handle the incoming flood of advisories and vulnerabilities will simply turn into a giant backlog full of good intentions." His counsel, per the reporting, is to fix what's already broken before the flood arrives, not during it.
In April 2026, four organizations (the SANS Institute, the Cloud Security Alliance, the OWASP GenAI Security Project, and [un]prompted) issued a joint emergency briefing titled The AI Vulnerability Storm: Building a Mythos-Ready Security Program. Its central warning is precisely the asymmetry at issue: "AI-driven vulnerability discovery tools can now generate working exploits at a rate that outpaces organizational patch cycles." The briefing adds a second-order problem that should worry anyone who treats patching as a leisurely activity: "Every patch also becomes an exploit blueprint, as AI accelerates patch-diffing and reverse engineering of fixes." According to the briefing, the mean time from disclosure to confirmed exploitation has collapsed to under a day in 2026, down from 2.3 years in 2019, a figure to weigh as the briefing's own, not an independently established benchmark.
Jeff Williams, an OWASP founder and CTO of Contrast Security, pushes the analysis somewhere more provocative. He has argued that Mythos threatens not just bug bounties but "the whole idea that security can remain a find-and-fix afterthought," declaring that "the era of the security backlog is coming to a welcome end." His sharpest framing reframes the problem itself: "This is not a prioritization problem. It's an exposure-window problem." Williams's bet is that the durable answer is prevention rather than ever-faster patching: "software factories that can reliably produce secure code and the assurance case to prove it." That sits in productive tension with the patch-faster camp, and it is worth holding both in view: one school says fix the pipeline, the other says stop shipping the bugs in the first place.
The forwardable part of this analysis is not the alarm. It is that the remediation bottleneck is a known engineering and economic problem with known, if unglamorous, levers. None is a silver bullet; together they move the needle.
Software bills of materials (SBOMs). You cannot patch what you cannot find in your own estate. When a flaw lands in a widely-used library, the organizations that respond in hours rather than weeks are the ones that already know, machine-readably, where that component runs. SBOMs turn "are we affected?" from a fire drill into a query.
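What that query looks like in practice, as an added example written for this article rather than code from any of the projects or vendors it discusses: a handful of constructed CycloneDX-style SBOMs (one per deployed service), a constructed advisory naming a package URL prefix and the first fixed version, and a function that answers "which services run an affected version?" The package name, versions and service names are all made up for illustration.

```python
import json
import re
from typing import Dict, List, Tuple

# Constructed CycloneDX-style SBOMs, one per deployed service. Real SBOMs carry
# many more fields; only "components[].purl" is needed for this query.
SBOMS: Dict[str, str] = {
    "billing-api": json.dumps({
        "bomFormat": "CycloneDX",
        "components": [
            {"purl": "pkg:generic/example-lib@2.4.1"},
            {"purl": "pkg:pypi/requests@2.31.0"},
        ],
    }),
    "image-worker": json.dumps({
        "bomFormat": "CycloneDX",
        "components": [
            {"purl": "pkg:generic/example-lib@2.6.0"},
            {"purl": "pkg:generic/ffmpeg@6.1.1"},
        ],
    }),
    "edge-proxy": json.dumps({
        "bomFormat": "CycloneDX",
        "components": [
            {"purl": "pkg:generic/example-lib@2.5.3"},
        ],
    }),
}

# Constructed advisory: every version of example-lib below 2.5.2 is affected.
ADVISORY = {"purl_prefix": "pkg:generic/example-lib", "fixed_in": "2.5.2"}


def parse_version(v: str) -> Tuple[int, ...]:
    """Dotted numeric versions; a pre-release suffix like 'rc1' is ignored."""
    parts = []
    for p in v.split("."):
        m = re.match(r"\d+", p)
        parts.append(int(m.group(0)) if m else 0)
    return tuple(parts)


def split_purl(purl: str) -> Tuple[str, str]:
    """'pkg:type/name@version?qualifiers' -> (prefix, version); qualifiers are dropped."""
    base = purl.split("?", 1)[0]
    if "@" not in base.split("/", 1)[-1]:
        return base, ""
    prefix, version = base.rsplit("@", 1)
    return prefix, version


def find_affected(sboms: Dict[str, str], advisory: dict) -> List[Tuple[str, str]]:
    """Return (service, purl) pairs whose component version is below advisory['fixed_in']."""
    fixed = parse_version(advisory["fixed_in"])
    hits: List[Tuple[str, str]] = []
    for service, raw in sorted(sboms.items()):
        bom = json.loads(raw)
        for comp in bom.get("components", []):
            prefix, version = split_purl(comp.get("purl", ""))
            if prefix != advisory["purl_prefix"]:
                continue
            # A component with no version cannot be shown to be fixed, so flag it too.
            if not version or parse_version(version) < fixed:
                hits.append((service, comp["purl"]))
    return hits


if __name__ == "__main__":
    for service, purl in find_affected(SBOMS, ADVISORY):
        print(service, "runs affected component", purl)
```

With SBOMs already collected per service, "are we affected?" is a lookup over package URLs rather than an inventory exercise; the only judgement call left is the version comparison, which this example keeps deliberately conservative by flagging components whose version is missing.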
Automated patching and PR bots. Dependency-update automation (the Dependabot/Renovate pattern) and emerging AI-generated fix tooling attack the labor cost of remediation directly. If discovery is automated, the remediation side has to automate too: machine-generated, human-reviewed pull requests that arrive ready to merge. The asymmetry only closes if both sides scale.
Funding the maintainers. Many of the most consequential bugs live in open-source projects maintained by a handful of unpaid volunteers. Anthropic paired the Mythos preview with $4 million in donations ($2.5 million to OpenSSF's Alpha-Omega program via the Linux Foundation and $1.5 million to the Apache Software Foundation), an implicit acknowledgment that the patch-side bottleneck is partly a funding problem. Money that pays maintainers to triage and fix is money spent on the actual constraint.
Prioritization by exploitability, not just severity. A backlog of thousands is unworkable; a backlog ranked by real-world exploitability is a plan. Industry signals like exploit-prediction scoring and known-exploited-vulnerability catalogs let teams patch the bugs attackers are actually using first, rather than chasing every high CVSS score. (Williams's caution applies here: prioritization buys time, it doesn't close the exposure window; treat it as triage, not a destination.) Vendors are already publishing the scale of the problem: Cloudflare has described roughly 2,000 AI-surfaced bugs, around 400 of them high or critical, and Mozilla reported 271 in a single Firefox release, vendor-stated figures that illustrate the volume defenders now face.
The disclosure window is the real clock. Every coordinated-disclosure timeline (Anthropic's 90-plus-45 days included) is a countdown to public detail. Because AI compresses the gap between a published fix and a working exploit, the only safe assumption is that the patch and the exploit arrive together. That makes deployment speed, not patch availability, the metric that matters: auto-update by default, tighter enforcement windows, and the ability to roll a fix to production in hours.
The Mythos preview is a genuine inflection in offensive and defensive capability, and Anthropic itself concedes the transition "may be tumultuous regardless," with the short-term advantage potentially favoring attackers if labs are careless about release. But the lasting lesson is narrower and more actionable than the headlines suggest. The hard problem was never going to stay at the front of the pipeline. AI found the bugs. The question every security leader now has to answer is whether their organization can fix them, and whether the maintainers they depend on can, too.
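To make the difference between severity-ranked and exploitability-ranked backlogs concrete, here is an added Python example written for this article (not code from any vendor or catalog it names). It ranks a constructed backlog by known-exploited status first, then an exploit-prediction probability, then CVSS as a tiebreak, and sorts each item into a work tier. The identifiers are placeholders rather than real CVE numbers, and the KEV flags, EPSS-style probabilities, CVSS scores and the 0.10 urgency threshold are all made-up inputs.

```python
from typing import Dict, List

# Constructed backlog. IDs are placeholders, not real CVE numbers; the "kev"
# flag (listed in a known-exploited catalog), "epss" (exploit-prediction
# probability) and "cvss" base score are invented for illustration.
BACKLOG: List[Dict] = [
    {"id": "EXAMPLE-0001", "cvss": 9.8, "epss": 0.02, "kev": False},
    {"id": "EXAMPLE-0002", "cvss": 7.5, "epss": 0.91, "kev": True},
    {"id": "EXAMPLE-0003", "cvss": 5.3, "epss": 0.45, "kev": False},
    {"id": "EXAMPLE-0004", "cvss": 9.1, "epss": 0.01, "kev": False},
    {"id": "EXAMPLE-0005", "cvss": 8.8, "epss": 0.63, "kev": True},
]

EPSS_URGENT = 0.10  # assumed threshold for this example; tune to your own risk appetite


def severity_only(backlog: List[Dict]) -> List[str]:
    """The ordering a CVSS-driven backlog produces: highest base score first."""
    return [v["id"] for v in sorted(backlog, key=lambda v: -v["cvss"])]


def rank(backlog: List[Dict]) -> List[Dict]:
    """Known-exploited first, then exploit probability descending, then CVSS as tiebreak."""
    return sorted(backlog, key=lambda v: (not v["kev"], -v["epss"], -v["cvss"]))


def tier(v: Dict) -> str:
    """Map one finding to a work tier."""
    if v["kev"]:
        return "now"        # confirmed exploited in the wild: exposure window is already open
    if v["epss"] >= EPSS_URGENT:
        return "this-week"  # likely to be exploited soon
    return "backlog"        # high severity alone does not make it urgent


def plan(backlog: List[Dict]) -> Dict[str, List[str]]:
    """Group the ranked backlog into tiers, preserving rank order within each tier."""
    out: Dict[str, List[str]] = {"now": [], "this-week": [], "backlog": []}
    for v in rank(backlog):
        out[tier(v)].append(v["id"])
    return out


if __name__ == "__main__":
    print("severity-only order :", severity_only(BACKLOG))
    print("exploitability order:", [v["id"] for v in rank(BACKLOG)])
    for name, ids in plan(BACKLOG).items():
        print("%-9s %s" % (name, ids))
```

On this constructed input the two orderings disagree at the very top: the 9.8 CVSS item that a severity-only queue would patch first has a 2% exploit-prediction score and no known exploitation, while a 7.5 that is already in the known-exploited catalog jumps to the front. The tiering is triage in exactly the sense the paragraph describes; nothing in the "backlog" tier is safe, it is merely later.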
What does the "fewer than 1% patched" figure actually mean?
It is Anthropic's own statement, in the Mythos preview, that fewer than 1% of the potential vulnerabilities the system has discovered have been fully patched by their maintainers. Anthropic attributes this partly to its deliberately slow, human-validated coordinated-disclosure process (designed not to overwhelm maintainers) and partly to the maintainers' own limited capacity to ship and deploy fixes. It is a snapshot of the remediation gap, not a claim that the bugs are unfixable.
Does Anthropic agree that remediation is the bottleneck?
Yes, in substance. Anthropic says it withholds vulnerability details precisely because over 99% remain unpatched, validates every report by hand to avoid flooding maintainers, and tells defenders they "will need to drive down the time-to-deploy for security updates." That is a discovery vendor pointing at the patching side of the ledger as the binding constraint.
Who are the credible voices warning about this, and do they agree with each other?
Jim Sherlock of ProCircular frames the expansion as Anthropic admitting "the hard part is now patching and deploying at scale." A joint SANS, Cloud Security Alliance, OWASP GenAI Security Project, and [un]prompted briefing warns that AI discovery "outpaces organizational patch cycles." Jeff Williams of OWASP and Contrast Security agrees the backlog model is breaking but reframes it as "an exposure-window problem," arguing the durable fix is preventing bugs upstream rather than patching faster. They agree on the diagnosis; they differ on the cure.
Why does every patch make the problem worse?
The SANS/CSA briefing warns that "every patch also becomes an exploit blueprint": once a fix is public, AI accelerates patch-diffing and reverse-engineering to derive the exploit it closes. Combined with a disclosure-to-exploitation window the briefing measures in under a day, this means defenders should assume a working exploit ships at roughly the same moment as the patch, making deployment speed the decisive variable.
What can a security team actually do about it right now?
Maintain machine-readable SBOMs so you can locate affected components in hours; automate remediation with dependency bots and human-reviewed AI-generated fix PRs; prioritize by real-world exploitability (exploit-prediction and known-exploited catalogs) rather than raw severity; enable auto-update and tighten patch-enforcement windows; and, where you depend on open source, support the maintainers who do the fixing, the constraint Anthropic itself addressed with $4 million in donations to OpenSSF and the Apache Software Foundation.
Is Claude Mythos available to use today?
No. Mythos is an unreleased preview model powering the Glasswing coalition, not a generally available product. Anthropic has said vetted-customer access may broaden over time, but a general or public release is gated on safeguards the company describes as not yet developed across the industry. Treat any claim that it is "weeks away" from public release as unsupported.