
Anthropic did not invent AI-driven vulnerability discovery, and it is not the only lab building it. When the company expanded Project Glasswing on June 2, 2026, it made a forecast about its own competition that is easy to skim past and important to read closely: in its words, "within 6 to 12 months, we expect that many other AI companies will have Mythos-class models." It added a second clause that gives the first its weight: those companies "could release them without safeguards that prevent misuse."
That is not a boast about being first. It is an argument about what being first is for. This piece maps the competitive landscape that forecast describes (OpenAI's cyber push, Google's vulnerability-finding and patching agents, and Anthropic's own positioning) and tests one specific claim: that the risk Anthropic names is real and prospective, not an accusation against the rivals shipping today. Elegant Software Solutions writes here as an analyst, not a participant.
Three labs are publicly building AI that finds or fixes software vulnerabilities, and they are not at the same place or pointed in the same direction.
Anthropic's contribution is Claude Mythos, a withheld frontier model that, inside Glasswing, autonomously hunts and in some cases exploits vulnerabilities in critical software. Anthropic has described partners collectively surfacing more than 10,000 high- and critical-severity flaws in under two months. It is the most aggressive capability in the field and, by design, the most tightly held. OpenAI's contribution is GPT-5.5-Cyber. Google's is a pair of research agents, Big Sleep for discovery and CodeMender for patching, both of which predate the Mythos moment and approach the problem more conservatively. Understanding why matters more than ranking them, because the differences in posture are exactly what Anthropic's forecast is about.
OpenAI rolled out GPT-5.5-Cyber on May 7, 2026, in limited preview to defenders responsible for critical infrastructure. The framing OpenAI chose is the part worth holding onto. Per OpenAI's announcement, the model "is primarily trained to be more permissive on security-related tasks"; it is not pitched as a leap in raw cyber capability over the general GPT-5.5 that shipped two weeks earlier. Help Net Security made the same point: the model is more permissive on security tasks, not more capable.
The gate around it is the more telling design choice. GPT-5.5-Cyber sits behind Trusted Access for Cyber (TAC), which OpenAI describes as "an identity and trust-based access framework for cybersecurity users." The logic is to give vetted, approved defenders "broader access to GPT-5.5's cybersecurity capabilities for defensive tasks while maintaining restrictions on requests that could contribute to real-world harm." Approved users see "lower classifier-based refusals" for authorized workflows (vulnerability identification and triage, malware analysis, binary reverse engineering, detection engineering, and patch validation), while requests that could enable harm stay blocked.
Read carefully, GPT-5.5-Cyber is not a counterexample to Anthropic's caution but the same instinct expressed differently: OpenAI is loosening refusals for verified defenders rather than handing a maximally capable cyber model to the open market, with capability and access governed in tandem. Cybernews and others framed GPT-5.5-Cyber as OpenAI's answer to Anthropic's Mythos, but on the evidence of OpenAI's own description, the answer was a vetting framework as much as a model.
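The two-part logic of that gate (who is asking, and what they are asking for) can be pictured in a few lines of Python. This is only an illustrative sketch and not OpenAI's implementation: the `Request` fields, the `gate` function, and the `Decision` values are all hypothetical names, and the only thing taken from the source is the list of authorized workflows quoted above.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                      # reduced refusals for this request
    STANDARD_POLICY = "standard_policy"  # ordinary refusal thresholds apply
    BLOCK = "block"                      # refused regardless of who asks


# Workflow labels mirror the list in OpenAI's description quoted above.
AUTHORIZED_WORKFLOWS = {
    "vulnerability_triage",
    "malware_analysis",
    "binary_reverse_engineering",
    "detection_engineering",
    "patch_validation",
}


@dataclass(frozen=True)
class Request:
    user_is_vetted: bool     # hypothetical: result of an identity/trust review
    workflow: str            # hypothetical: label assigned to the request
    could_enable_harm: bool  # hypothetical: result of a separate harm check


def gate(request: Request) -> Decision:
    """Combine identity vetting with per-request restrictions."""
    # Assumption: harm restrictions apply to everyone, vetted or not.
    if request.could_enable_harm:
        return Decision.BLOCK
    if request.user_is_vetted and request.workflow in AUTHORIZED_WORKFLOWS:
        return Decision.ALLOW
    return Decision.STANDARD_POLICY
```

The point of the sketch is the ordering: vetting widens what an approved defender can do, but it never overrides the harm check, which is one way to read "capability and access governed in tandem."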
Google's work is older and arrives in two pieces, both still framed as research rather than product.
Big Sleep is Google's AI vulnerability-discovery agent, developed by DeepMind and Project Zero (it grew out of an earlier effort called Project Naptime). Its headline result came in November 2024, when Google reported that Big Sleep found a stack buffer underflow in the SQLite database engine. Google called it "the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software." Notably, the bug was caught in a development branch, flagged before it ever reached a release. A separate, later milestone underlined the defensive payoff: Google reported that Big Sleep helped identify and head off a live SQLite vulnerability (CVE-2025-6965) that threat actors were poised to exploit, letting Google cut it off beforehand. Big Sleep has since been reported finding additional flaws across open-source software.
CodeMender, announced in October 2025, is the other half: an agent that doesn't just find bugs but rewrites code to fix them. Per Google DeepMind, it "leverages the thinking capabilities of recent Gemini Deep Think models" and combines program-analysis tools ("static analysis, dynamic analysis, differential testing, fuzzing and SMT solvers") with a multi-agent design that includes an "LLM-based critique tool" to validate that a change actually fixes the root cause without breaking functionality. Over roughly six months, Google reported, CodeMender "upstreamed 72 security fixes to open source projects, including some as large as 4.5 million lines of code."
The qualifier Google attaches is the one that matters here: "All patches currently undergo human review before submission," inside a broader posture that "agents must have well-defined human controllers, their powers must be carefully limited, and their actions and planning must be observable." Google is gating not at the access layer like OpenAI but at the autonomy layer, keeping a human in the loop on the consequential action (the patch) even as the agent does the heavy lifting.
Put the three side by side and the surprising finding is convergence, not a race to the bottom. Anthropic withholds the most capable cyber model and ships the safeguarded sibling broadly. OpenAI loosens refusals only for identity-vetted defenders. Google keeps its patching agent human-reviewed and its discovery work inside a research program. Each of the three frontier labs publicly working this problem has chosen some brake: access vetting, autonomy limits, or outright non-release.
This is why it would be a category error to read Anthropic's forecast as an attack on OpenAI or Google. Nothing in Anthropic's writing, or in these competitors' own descriptions, says that OpenAI ships recklessly or that Google fixes bugs without review. The opposite is documented. The risk Anthropic names is prospective and field-wide: the worry is not the careful actors at the front of the pack today, but the diffusion of comparable capability over the next 6-to-12 months to entrants who feel no obligation to gate it. A safeguard is only as protective as the least cautious lab that holds the capability. The moment a Mythos-class model exists without a TAC-style gate, a human-review requirement, or a non-release decision behind it, the restraint of the careful labs stops protecting anyone downstream.
That is the precise structure of Anthropic's claim, and it is a claim about the future shape of the market, not a scorecard of present-day competitors.
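The weakest-link structure of that argument can be made concrete with a small piece of arithmetic. The sketch below is an illustration only, under two simplifying assumptions that do not come from Anthropic or any other source: every lab holding the capability gates it with the same probability, and the labs decide independently. The function name and the example numbers are hypothetical.

```python
def prob_all_labs_gate(p_gate: float, n_labs: int) -> float:
    """Probability that every one of n_labs gates its model.

    Assumes each lab gates independently with the same probability p_gate.
    Both assumptions are simplifications made for illustration.
    """
    if not 0.0 <= p_gate <= 1.0:
        raise ValueError("p_gate must be between 0 and 1")
    if n_labs < 0:
        raise ValueError("n_labs must be non-negative")
    return p_gate ** n_labs


if __name__ == "__main__":
    # Illustrative numbers only: three careful labs versus a wider field.
    for n in (3, 10, 20):
        print(n, round(prob_all_labs_gate(0.9, n), 3))
```

Under these assumptions the chance that nobody ships an unsafeguarded model falls quickly as the number of holders grows, even when each individual lab is very likely to behave carefully. That is the sense in which the restraint of today's three labs says little about a field of many.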
The phrase "first mover with restraint" is ESS's framing, not an Anthropic slogan. But it is grounded in documented behavior rather than rhetoric, and the cleanest demonstration of it landed the same day this article was being written.
On June 9, 2026, Anthropic released Claude Fable 5 and Claude Mythos 5, and the split between them is the thesis made literal. Per Anthropic, Fable 5 is "a Mythos-class model that we've made safe for general use," with capabilities that "exceed those of any model we've ever made generally available." It is public, on Pro, Max, Team, Enterprise, and the API. But it carries hard brakes: in high-risk domains (offensive cybersecurity, biology and chemistry, and model distillation) Fable 5's classifiers detect the request and the response is "automatically handled by Claude Opus 4.8 instead," with users told when it happens. Mythos 5 is, in Anthropic's words, "the same underlying model as Fable 5, but with the safeguards lifted in some areas," and it stays gated, deployed initially through Glasswing to vetted cyberdefenders and infrastructure providers, with "the strongest cybersecurity capabilities of any model in the world." (Anthropic notes the naming itself: "Fable is from the Latin fabula, 'that which is told,' akin to the Greek mythos. The safeguards are what distinguish the two models.")
That is "first mover with restraint" enacted, not asserted: ship the Mythos-class capability the public can have with safeguards; keep the unsafeguarded version behind a vetting wall. It is the same instinct visible in the original decision to withhold Mythos Preview from general availability (a choice Anthropic framed as a voluntary dual-use judgment) and in the Glasswing rationale of getting defenders the tool "before models with similar capabilities become broadly available." The behavior is consistent across April's non-release and June's split launch: capability and safeguard travel together, and the most dangerous configuration is the one held back.
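The Fable 5 fallback described above amounts to a routing rule, which a short Python sketch can make concrete. This is not Anthropic's implementation and nothing here reflects a real API: the domain labels, the `route` function, and the `Routed` type are hypothetical, and the model names are plain display strings rather than API identifiers. Only the three high-risk domains and the "users told when it happens" behavior come from the description quoted above.

```python
from dataclasses import dataclass

# The three high-risk domains named in Anthropic's description.
HIGH_RISK_DOMAINS = {
    "offensive_cybersecurity",
    "biology_chemistry",
    "model_distillation",
}


@dataclass(frozen=True)
class Routed:
    model: str           # display name only, not an API identifier
    user_notified: bool  # whether the user is told a fallback occurred


def route(detected_domain: str) -> Routed:
    """Pick the responding model from a (hypothetical) classifier label."""
    if detected_domain in HIGH_RISK_DOMAINS:
        # Fallback path: an older model answers and the user is informed.
        return Routed(model="Claude Opus 4.8", user_notified=True)
    return Routed(model="Claude Fable 5", user_notified=False)
```

Seen this way, the difference between Fable 5 and Mythos 5 is not the underlying model but whether a rule of this kind sits in front of it.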
The forecast is the load-bearing beam under all of it. If Anthropic is right that "many other AI companies will have Mythos-class models" within a year, then the value of its restraint depends entirely on whether the next labs match it. A first mover that gates buys defenders time. A first mover that gates while everyone else ships unsafeguarded buys defenders nothing; it just slows one runner in a race the field is still running.
This is neither a settled win nor an empty gesture, and a fair read keeps three points in view at once.
First, the present-day picture is more reassuring than the headlines suggest: the labs at the frontier of AI-cyber capability are, today, all applying brakes of one kind or another, which cuts against the "uncontrolled arms race" framing the topic invites. Second, that convergence is not durable. Each brake is a choice, not a law: OpenAI could widen TAC, Google could take the human out of CodeMender's loop, and Anthropic's gate on Mythos is self-imposed. Meanwhile the capability is diffusing (Anthropic says within months) to actors who made no such choice. Third, the safeguard Anthropic says still doesn't exist is the quiet tell: even while shipping Fable 5, it describes the goal of cyber safeguards that "we (and, to our knowledge, all other AI developers) have yet to develop." The field is gating on access and autonomy because it cannot yet gate on the model's behavior itself.
ESS takes no side on whether voluntary restraint is enough; that is a policy question we examine elsewhere in this series. The narrower, defensible conclusion: the AI-cyber arms race is real, but in mid-2026 it is being run by participants who are mostly choosing to hold back. Anthropic's forecast is a bet that this politeness has a shelf life. Whether it does is the most important thing to watch over the next 6 to 12 months, and the labs themselves are the ones telling you to watch it.
Is GPT-5.5-Cyber more powerful than the regular GPT-5.5?
Not meaningfully, by OpenAI's own description. OpenAI says GPT-5.5-Cyber "is primarily trained to be more permissive on security-related tasks" rather than to significantly increase cyber capability beyond GPT-5.5. The change is mostly about loosening refusals for vetted defenders, not raising the capability ceiling. Access runs through OpenAI's identity-based Trusted Access for Cyber framework, which keeps restrictions on requests that could enable real-world harm.
What are Big Sleep and CodeMender, and are they new?
They are Google's AI security agents, and they predate the 2026 Mythos moment. Big Sleep (DeepMind/Project Zero) finds vulnerabilities: Google reported in November 2024 that it found a previously unknown SQLite memory-safety bug, "the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software," and later helped head off a live SQLite exploit (CVE-2025-6965). CodeMender, announced October 2025, uses Gemini Deep Think models to fix bugs; Google reported it upstreamed 72 fixes to open-source projects, all human-reviewed before submission. Both remain framed as research.
Does Anthropic accuse OpenAI or Google of being reckless?
No, and it would be inaccurate to imply it does. Anthropic's forecast is that "within 6 to 12 months… many other AI companies will have Mythos-class models" and that those companies "could release them without safeguards that prevent misuse." That is a forward-looking, field-wide warning. The documented behavior of the named competitors actually shows restraint: OpenAI's vetting gate, Google's human-review requirement. The risk Anthropic names is about the future diffusion of capability, not about these two labs today.
What does "first mover with restraint" mean here?
It is ESS's framing for Anthropic's documented posture: demonstrate frontier cyber capability ahead of rivals, but hold back the most dangerous configuration. The clearest example is the June 9, 2026, release: Claude Fable 5 (a Mythos-class model "made safe for general use," with hard blocks in cybersecurity, biology, chemistry, and distillation that fall back to Claude Opus 4.8) launched publicly, while Claude Mythos 5 (the same model "with the safeguards lifted in some areas") stays gated to vetted Glasswing partners.
Why does Anthropic's restraint depend on its competitors?
Because a safeguard only protects downstream users if every holder of the capability applies one. If comparable models proliferate and even one lab releases an unsafeguarded version, attackers gain access regardless of how carefully the others behaved. Anthropic's own argument is that getting defenders the tool early helps "before models with similar capabilities become broadly available," an advantage measured in time, which evaporates once the field catches up without matching restraint.
Is there a safeguard that fully solves the cyber-misuse problem today?
No. Even while releasing the safeguarded Fable 5, Anthropic has described the goal of cyber-capability safeguards that it (and, to its knowledge, all other AI developers) "have yet to develop." Today's brakes are gating mechanisms (vetted access, autonomy limits, non-release), not a fully solved control on what the model will do. That gap is part of why the next 6 to 12 months matter.