On May 20, an OpenAI reasoning model did something no AI system has done before: it autonomously produced a valid mathematical proof disproving a conjecture that had stood unchallenged since 1946. Nine leading mathematicians, including Fields Medalist Timothy Gowers, verified the result and co-authored a companion paper on arXiv.
The problem is known as the Erdős unit distance conjecture, one of the most famous open questions in discrete geometry. The solution marks the first time AI has solved a prominent open problem central to an entire subfield of mathematics, and the reaction from the math community has been overwhelmingly positive.
Gowers said he would have “recommended acceptance without any hesitation” if the proof had been submitted to the Annals of Mathematics, the field’s most prestigious journal. Terence Tao, another Fields Medalist, called it “perhaps the most unambiguous instance” of AI solving an open mathematical problem.
What Erdős Asked in 1946
Paul Erdős posed a deceptively simple question: given n points on a flat plane, what is the largest number of pairs that can be exactly one unit apart? For 80 years, mathematicians believed that suitably scaled square grids were essentially the best arrangement. No one could prove it definitively, but no one could beat the grid either.
The conjecture held that the number of unit-distance pairs could not exceed n^(1+o(1)), where o(1) stands for a term that shrinks to zero as n grows, meaning the grid was essentially optimal. Generations of mathematicians tried and failed to find a construction that performed meaningfully better.
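To make the counting concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the construction from the proof, and the function names are hypothetical. It works on an integer grid and counts pairs at a chosen squared distance d2, which is equivalent to rescaling the grid so that sqrt(d2) becomes the unit length. Distances that can be written as a sum of two squares in several ways (such as 25 or 65) produce more pairs than the obvious nearest-neighbor spacing.

```python
from itertools import combinations


def square_grid(k):
    """Return the k*k integer points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs of points whose squared distance equals d2.

    Integer arithmetic keeps the comparison exact (no floating-point error).
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


if __name__ == "__main__":
    grid = square_grid(20)  # n = 400 points
    for d2 in (1, 25, 65):
        pairs = count_pairs_at_squared_distance(grid, d2)
        print(f"squared distance {d2}: {pairs} pairs among {len(grid)} points")
```

The brute-force loop checks every pair, so it is only practical for small grids, but it shows why the choice of scale matters when counting unit distances.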
What the Model Actually Did
OpenAI’s reasoning model connected the geometry problem to an entirely different branch of mathematics: algebraic number theory. Rather than working within the traditional geometric toolkit, the model used infinite class field towers and Golod-Shafarevich theory to construct higher-dimensional lattices with special symmetries, then projected them back into two dimensions.
The result was an infinite family of point configurations that produce n^(1+δ) unit-distance pairs for a fixed δ > 0, a polynomial improvement over the grid. Princeton mathematician Will Sawin later refined the exponent to δ = 0.014.
With δ = 0.014, the advantage over a linear count grows by only about one percent each time the point count doubles, which sounds small. The mathematical significance is enormous. Erdős conjectured that virtually no gain was possible. The model proved otherwise with a construction no human had found in eight decades.
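The size of that gain can be checked with a few lines of arithmetic. The sketch below is a simplification under stated assumptions: it compares n^(1+δ) against a plain linear count n, ignoring constant factors and the slowly vanishing o(1) term in the grid bound, and it takes δ = 0.014 from the figure reported above. The name gain_factor is hypothetical.

```python
DELTA = 0.014  # exponent gain as reported in the article


def gain_factor(n, delta=DELTA):
    """Ratio n**(1 + delta) / n, which simplifies to n**delta.

    Assumption: constants and lower-order terms are ignored.
    """
    return n ** delta


# Multiplicative gain each time the number of points doubles.
per_doubling = 2 ** DELTA

if __name__ == "__main__":
    print(f"gain per doubling: {per_doubling:.5f}")
    for n in (10**3, 10**6, 10**12):
        print(f"n = {n:>14,}: n**delta = {gain_factor(n):.3f}")
```

Under these assumptions the gain per doubling is just under one percent, and it compounds slowly: the ratio stays modest even for very large n, which is why the result matters for the exponent rather than for any practical point count.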
Sébastien Bubeck, a researcher at OpenAI, offered a revealing characterization: “The model did not invent something fundamentally new. It just executed like an amazing mathematician.” The AI applied existing tools in an unexpected combination, connecting two fields that no human had thought to bridge in this context.
The January 2025 Debacle
OpenAI has been here before, and it did not go well.
In January 2025, then-VP Kevin Weil posted on X that “GPT-5 found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others.” Thomas Bloom, the mathematician who maintains the official Erdős Problems database, called the claim “a dramatic misrepresentation.” The model had simply surfaced solutions already sitting in published literature and presented them as original discoveries. Weil deleted the post.
The contrast with May 2026 is deliberate. OpenAI published the proof on arXiv alongside a companion remarks paper co-authored by nine external mathematicians: Noga Alon (Princeton), Thomas Bloom, Timothy Gowers, Daniel Litt (University of Toronto), Will Sawin (Princeton), Arul Shankar, Jacob Tsimerman (University of Toronto), Victor Wang, and Melanie Matchett Wood (Harvard).
The inclusion of Bloom is particularly telling. He was the loudest critic of OpenAI’s 2025 claim. His presence on the verification paper is not a simple endorsement; it is a signal that this proof is materially stronger than what came before.
What the Mathematicians Said
The reactions from the verification team go beyond polite acknowledgment.
Daniel Litt called it “the unique interesting result produced autonomously by AI so far.” Jacob Tsimerman pointed to a structural advantage AI holds over human mathematicians: “AIs have an edge. It’s not just that they can try all known methods. They can play for longer.”
Arul Shankar framed the implications broadly, stating the work demonstrates that “AI systems can move beyond assisting mathematicians and begin generating genuinely original ideas.”
Not everyone treated the result as unqualified vindication. Bloom noted that “the human still plays a vital role in discussing, digesting, and improving this proof.” The verification team shortened and generalized the original output, and Sawin improved the exponent. The model’s raw proof was completely valid, but the humans made it better. The final published version is a collaboration, not a solo AI achievement.
Melanie Matchett Wood offered perhaps the most provocative takeaway: “Maybe people should be spending more time playing devil’s advocate” regarding unproven conjectures. If an AI can disprove a conjecture that 80 years of human effort assumed was true, the field may need to reconsider which assumptions deserve that confidence.
What This Means for the AI Industry
The Erdős result is not a product demo. It is a capability proof that lands differently than coding benchmarks or chatbot evaluations.
For OpenAI specifically, it validates the company’s reasoning model investment at a moment when the industry debate has shifted toward whether frontier models are plateauing. Competitors including Anthropic and Google have made similar investments in extended reasoning, but none has publicly reported a result at this level of mathematical significance.
For the broader industry, the proof forces a recalibration of what “AI reasoning” means. Most enterprise AI use cases today involve pattern matching, summarization, and code generation. A model that can autonomously connect two unrelated mathematical fields to solve an 80-year open problem is operating in a qualitatively different space.
Several mathematicians mentioned the risk of “proof indigestion,” where AI generates valid results faster than humans can digest, verify, and integrate them. If that scenario materializes across disciplines (biology, materials science, drug discovery), the bottleneck in scientific research shifts from generating hypotheses to evaluating them.
From an enterprise perspective, this is the clearest signal yet that reasoning models will eventually move beyond software development and customer service into research and development pipelines. The timeline is uncertain, but the direction is not.
The Redemption and What Comes Next
OpenAI needed this. The January 2025 embarrassment damaged the company’s credibility in the scientific community at exactly the wrong moment, as competitors closed the gap on reasoning benchmarks. Sixteen months later, the same company produced a result that a Fields Medalist says he would have recommended for the field’s most prestigious journal.
The lesson is not that AI can replace mathematicians. The lesson is that a general-purpose reasoning model, not a specialized theorem prover, produced something that no specialized system and no human mathematician managed in 80 years. When Gowers says the proof meets Annals of Mathematics standards, that is a specific, testable claim from someone whose judgment the entire field trusts.
The unit distance conjecture was one problem. Erdős posed roughly 1,500 problems over his career, and hundreds remain open. Mathematics is one discipline. If OpenAI’s reasoning models can bridge algebraic number theory and discrete geometry on their own, the question is no longer whether AI can do original science. The question is how fast, and in how many fields at once.
