Pushmeet Kohli & DeepMind: AI Solves Open Erdős Problems With Lean-Verified Proofs

🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4

On May 21, 2026, a Google DeepMind-affiliated research team posted arXiv preprint 2605.22763, reporting that an AI agent solved 9 of 353 open Erdős problems and proved 44 of 492 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). The paper says every accepted proof was formally verified in Lean, and that the compute cost was on the order of a few hundred dollars per solved problem. If those results hold up under community scrutiny, they mark a meaningful step beyond benchmark math: AI would be contributing new, machine-checked mathematical results on problems that had remained open for decades.

That is the core story. The more careful framing is that this is a preprint, not yet a final journal publication, and its strongest claims should be anchored to the paper itself rather than to press summaries or product branding layered on afterward. Still, the reported combination of novelty, formal verification, and relatively low cost makes this one of the most consequential AI-for-mathematics announcements of 2026.

Who Is Pushmeet Kohli and Why Does He Matter Here?

TL;DR: Pushmeet Kohli is a senior Google DeepMind research leader, and his relevance here is organizational: he is listed as an author on a paper that pushes DeepMind's math work from benchmark performance toward open-ended discovery.

Pushmeet Kohli is a vice president at Google DeepMind and has been publicly associated with the lab's work in AI for science. Before joining DeepMind, he spent years at Microsoft Research working in machine learning and computer vision. In the context of this paper, his significance is less about a single theorem and more about research direction: DeepMind has spent the past several years moving from systems that perform well on structured tasks to systems aimed at scientific and mathematical discovery.

That trajectory matters. DeepMind's earlier math milestones, including high-profile work on Olympiad-style problem solving, showed that AI systems could perform strongly on difficult but closed-ended tasks whose solutions were already known. Open Erdős problems are different. They are not benchmark questions with hidden answer keys. They are research problems that, by definition, did not have known solutions when the system attempted them.

What the Paper Claims — and What Needs Caution

TL;DR: The paper's headline numbers are clear, but readers should distinguish between what the preprint explicitly states and what outside coverage may have inferred.

According to the arXiv preprint, the system achieved the following:

Metric	Reported result
Open Erdős problems solved	9 of 353
Age of the longest-open solved problems	56 years (two problems)
Open OEIS conjectures proved	44 of 492
Verification method	Lean proof assistant
Approximate cost per solved problem	A few hundred dollars

Those are the paper's central claims, and they are the right place to start. They are also unusually strong claims, so precision matters.

A few caveats are important:

The work is currently presented as an arXiv preprint.
The paper can support claims about reported results, authorship, and verification method.
Claims about product names, internal model branding, or the exact underlying model engine should be treated cautiously unless the paper or an official DeepMind source states them directly.

That distinction matters because AI coverage often compresses research systems, product labels, and model families into a single narrative. For executives or technical readers, the safer approach is simple: trust the preprint for the math claims, and treat media-added branding as secondary unless independently confirmed.

Why Lean Verification Is the Critical Detail

TL;DR: Lean is what turns these from plausible AI-generated arguments into formally checked proofs, which is why this result matters beyond headline numbers.

The most important technical detail in the paper is not the raw count of solved problems. It is the use of Lean for formal verification.

Lean is a proof assistant that checks whether each step of a proof follows from prior definitions, lemmas, and axioms. In ordinary mathematical practice, proofs are reviewed by humans, who are highly capable but not infallible. In AI-generated mathematics, that risk of an undetected error becomes more serious: language models can produce arguments that look elegant and convincing while still containing subtle mistakes.

Formal verification changes the standard of acceptance. If a proof is accepted by Lean, then its correctness does not rest on whether the prose sounds right or whether a reviewer missed a hidden gap. It rests on a machine-checkable derivation inside a formal system.
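As a minimal illustration of what "accepted by Lean" means, here is a toy Lean 4 statement. It is not taken from the paper, and the theorem name `add_comm_toy` is made up for this sketch; `Nat.add_comm` is a standard library lemma.

```lean
-- Toy example: Lean accepts this theorem only because the cited lemma
-- proves exactly the stated goal for all natural numbers a and b.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- If the proof term did not establish the goal (for example, using
-- `Nat.add_zero a` instead), Lean would reject the file with a type error.
```

The point is that acceptance is binary and mechanical: the checker does not weigh how convincing the argument sounds, only whether the proof term has the stated type.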

That does not eliminate every possible concern. A formally verified theorem still depends on the correctness of the formalization, the definitions used, and the translation from the original problem into Lean. But it dramatically reduces the main failure mode that has limited trust in AI-generated proofs: persuasive-looking reasoning that is actually wrong.
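The formalization caveat can be seen even in a tiny case. The following Lean 4 sketch (an illustrative example, not drawn from the paper) shows a statement that Lean happily verifies but that may not mean what an informal reader expects, because subtraction on natural numbers truncates at zero.

```lean
-- Over the natural numbers, subtraction truncates at zero, so Lean
-- verifies this statement even though it looks wrong at first glance.
example : (2 : Nat) - 3 = 0 := by decide

-- Stated over the integers, the same expression has the expected value.
example : (2 : Int) - 3 = -1 := by decide
```

Both statements are formally correct. Whether either one matches the intended informal claim is a question Lean cannot answer, which is why the translation of each open problem into Lean still deserves human review.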

In that sense, Lean is not a side detail. It is the credibility mechanism that makes the paper notable.

Why the Cost and Success Rate Both Matter

TL;DR: The reported cost is striking, but the success rate is just as important: this looks like a promising research tool, not a universal theorem machine.

The paper's reported cost of a few hundred dollars per solved problem is one of its most provocative details. If accurate, it suggests that trying many formally checkable research paths may be far cheaper than many observers expected.

But the success rate deserves equal attention. Solving 9 of 353 open Erdős problems (about 2.5%) is impressive because the task is real research. At the same time, the large denominator means the system did not solve roughly 97% of the problems it attempted. The OEIS rate, 44 of 492, works out to about 8.9%. That is a sign of both progress and limitation.
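The success rates implied by the reported counts can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in this article; the dictionary and function names are illustrative.

```python
# Counts as reported in this article (solved, attempted).
REPORTED = {
    "Open Erdős problems": (9, 353),
    "Open OEIS conjectures": (44, 492),
}


def success_rate(solved: int, attempted: int) -> float:
    """Return the success rate as a percentage."""
    return 100.0 * solved / attempted


for name, (solved, attempted) in REPORTED.items():
    rate = success_rate(solved, attempted)
    print(f"{name}: {solved}/{attempted} = {rate:.1f}%")
```

Run as written, this prints roughly 2.5% for the Erdős problems and 8.9% for the OEIS conjectures.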

The right interpretation is not that AI has automated mathematics wholesale. It is that, in at least some domains, an autonomous system can now search for novel proofs, pass successful ones through a formal verifier, and occasionally produce results that humans did not already know.

That combination has broader implications. Any field where discovery is expensive but verification is comparatively cheap could become a strong candidate for similar workflows. Software verification, protocol design, and some areas of formal methods fit that pattern better than open-ended domains where correctness is hard to specify.

What This Means Beyond Mathematics

TL;DR: The broader lesson is about search plus verification: when correctness can be checked mechanically, autonomous AI research becomes much more plausible.

The deepest implication of this result is structural, not mathematical.

Mathematics is a particularly clean test case because proof verification can be formalized. That makes it a useful model for other domains where the hard part is generating a candidate solution and the cheaper part is checking whether that solution satisfies strict constraints.

Examples include:

Software correctness: Tools such as Lean, Coq, Dafny, and TLA+ can help verify whether implementations or specifications satisfy required properties.
Protocol design: Distributed systems and cryptographic protocols can often be model-checked or formally analyzed after candidate designs are proposed.
Program synthesis and optimization: Candidate programs can be generated broadly, then filtered by tests, specifications, or formal constraints.

The common pattern is straightforward: if search is expensive but verification is reliable and cheap, autonomous systems become more useful. This paper suggests that AI may now be entering that regime for at least some classes of mathematical research.
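The search-plus-verification pattern can be sketched in a few lines. This is a generic toy, not the paper's system: the function names are hypothetical, the "search" is plain enumeration, and the "verifier" is a divisibility check standing in for a proof checker such as Lean.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def search_and_verify(
    candidates: Iterable[T],
    verify: Callable[[T], bool],
    budget: int,
) -> Optional[T]:
    """Return the first candidate the verifier accepts, or None.

    At most `budget` candidates are checked, so failing to find an
    answer is an expected outcome rather than an error.
    """
    for attempts, candidate in enumerate(candidates):
        if attempts >= budget:
            break
        if verify(candidate):  # cheap, mechanical check
            return candidate
    return None


def nontrivial_factor(n: int, budget: int = 10_000) -> Optional[int]:
    """Toy stand-in: finding a factor takes search, checking one is trivial."""
    return search_and_verify(range(2, n), lambda d: n % d == 0, budget)


print(nontrivial_factor(8633))  # 8633 = 89 * 97, so this prints 89
print(nontrivial_factor(97))    # 97 is prime, so this prints None
```

The design choice worth noting is that the verifier is the only component that has to be trusted. The candidate generator can be arbitrarily unreliable, as long as every accepted result has passed the check.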

Frequently Asked Questions

Q: What are Erdős problems?

Erdős problems are mathematical questions posed by or associated with Paul Erdős, one of the most prolific mathematicians of the 20th century; many of them remain open. They are especially prominent in combinatorics, graph theory, and number theory. Solving any genuine open Erdős problem is significant because these are research-level questions, not training-set exercises.

Q: Why does Lean verification matter so much here?

Lean matters because it checks proofs mechanically. That sharply reduces the risk of accepting an argument that merely looks correct. For AI-generated mathematics, this is crucial: formal verification provides a much stronger standard than human plausibility alone.

Q: Did the paper prove that AI can replace mathematicians?

No. The reported result shows that AI may be able to contribute original results in some formally structured areas. It does not show broad replacement of mathematicians, and the system reportedly failed on most of the open problems it attempted.

Q: Did the paper identify the exact model behind the agent?

The preprint should be treated as the primary source. If a specific model name or product brand is not stated there, readers should be cautious about repeating it as fact based only on secondary coverage.

Q: Why is the OEIS result important in addition to the Erdős result?

The OEIS result matters because it suggests the system was not limited to one narrow problem collection. Proving 44 of 492 open OEIS conjectures, if validated by the community, would indicate broader usefulness across formally expressible mathematical statements.

Key Takeaways

A May 21, 2026 arXiv preprint reports that an AI agent solved 9 of 353 open Erdős problems and proved 44 of 492 open OEIS conjectures.
The paper says all accepted proofs were formally verified in Lean, which is the main reason the result carries unusual credibility.
Two solved Erdős problems were reportedly 56 years old, underscoring that these were longstanding open questions.
The reported compute cost of a few hundred dollars per solved problem is striking, though it should be read in the context of a modest overall success rate.
The safest interpretation is not "AI has solved mathematics," but "AI may now be able to generate some original, formally verified research results in domains with cheap verification."

Conclusion

If the claims in arXiv:2605.22763 continue to hold up, this paper will stand as one of the clearest examples yet of AI producing new, formally verified mathematical knowledge rather than simply excelling on known benchmarks. Pushmeet Kohli's presence on the author list reflects the broader DeepMind strategy behind that shift: pairing large-scale search with rigorous verification. The larger lesson is likely to extend beyond mathematics. In fields where candidate solutions can be checked more cheaply than they can be discovered, autonomous AI systems are becoming much more credible research tools.