Sam Altman said in early 2025 that he believed AGI was “coming soon.” Demis Hassabis has spent his entire career building toward it. Yann LeCun thinks we’re nowhere close and that the whole framing is wrong. Meanwhile, OpenAI quietly updated its internal definition of AGI to something that conveniently excludes its own most capable systems from triggering certain agreements with Microsoft. The word gets used constantly, defined rarely, and understood almost never. That’s a problem — because how you define AGI shapes everything: policy, investment, research priorities, and whether the thing we’re building should excite or terrify you.
The Definition Problem: AGI Means Different Things to Different People
There is no universally agreed-upon definition of Artificial General Intelligence. This is not a minor academic quibble — it’s a fundamental problem that makes most AGI discourse nearly useless unless you know which definition the speaker is working from.
Here are the definitions actually in play right now:
- The “human-level” definition: AGI is a system that can perform any intellectual task a human can perform, at roughly human level. This is the most common pop-science version.
- The “economic” definition: OpenAI has at various points defined AGI as a system capable of generating $100 billion in value, or outperforming humans at most economically valuable work. This is a business-friendly frame that keeps the goalposts movable.
- The “generalist autonomy” definition: A system that can learn new tasks from scratch without being specifically trained for them — what Andrej Karpathy might call a genuinely autonomous agent that improves itself through experience.
- The “superintelligence” conflation: Some people use AGI and ASI (Artificial Superintelligence) interchangeably, which is a mistake. AGI is human-level. ASI is beyond human-level. They’re different problems with different timelines and different implications.
- LeCun’s definition (sort of): LeCun has argued publicly and repeatedly — on X, at conferences, in papers — that current deep learning architectures are fundamentally insufficient for AGI, and that what we’re building today isn’t even on the right path. For him, real general intelligence requires common sense, world models, and causal reasoning that transformers don’t have.
The reason this matters practically: when OpenAI says they’ve achieved or nearly achieved AGI, and when LeCun says they’re nowhere close, they’re often not even disagreeing about the same thing. They’re using different maps to describe different territories.
What Today’s Best Systems Can and Can’t Do
As of early 2026, the most capable publicly available models include GPT-4o and the o3/o4-mini family from OpenAI, Claude 3.7 Sonnet from Anthropic, Gemini 2.0 Pro from Google DeepMind, and Grok 3 from xAI. These are genuinely impressive systems. They also have real, well-documented limitations that matter for any honest AGI assessment.
What they’re genuinely good at:
- Reasoning through complex multi-step problems when given good context (o3 in particular shows strong performance on PhD-level science benchmarks)
- Writing, coding, summarization, translation — often at or above average human professional level
- Synthesizing large bodies of text quickly (Claude’s 200K context window, Gemini’s 1M token context)
- Passing bar exams, medical licensing exams, and coding interviews
- Operating as agents that can browse the web, write and execute code, and take actions in digital environments
What they still can’t reliably do:
- Maintain consistent goals and memory across truly long autonomous tasks without drift or failure
- Learn permanently from new experiences without retraining (they don’t update their weights from conversation; see the sketch after this list)
- Reason robustly about physical reality and embodied situations — they don’t have a world model in the way a child does
- Know when they don’t know something — hallucination remains a fundamental, unsolved problem
- Transfer skills learned in one domain to genuinely novel domains the way a human naturally would
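That second limitation is worth making concrete. Here’s a minimal sketch (using the Hugging Face transformers library, with GPT-2 as an arbitrary stand-in and an invented “fact” — both are illustrative assumptions, not anything from this article): a new fact in the prompt shapes the model’s output, but not a single weight changes, so the “learning” evaporates the moment the context does.

```python
# Sketch: at inference time, a "new fact" lives only in the context window.
# GPT-2 is a stand-in here; any frozen causal LM behaves the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference mode: no gradient steps will be taken

# Snapshot one weight matrix before the "conversation".
before = model.transformer.h[0].mlp.c_fc.weight.clone()

prompt = "New fact: the capital of Zubrowka is Lutz. The capital of Zubrowka is"
with torch.no_grad():  # no gradients tracked, so no learning is even possible
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=4)
print(tok.decode(out[0]))

# The prompt influenced the output, but every parameter is bit-for-bit
# identical to what it was before the model "saw" the new fact.
assert torch.equal(before, model.transformer.h[0].mlp.c_fc.weight)
```

In-context learning like this is real and useful, but it isn’t the persistent, cumulative learning the bullet describes: close the session and the fact is gone.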
The honest read is this: current models are extraordinarily capable narrow systems that look general because they were trained on an extraordinarily broad dataset. That’s not the same as being general in the sense the term was originally meant. To understand what these systems are actually doing, it helps to separate genuine capability gains from definitional sleight of hand.
The Benchmark Problem: Why Test Scores Don’t Settle the Debate
Every few months, a new model posts a score that sounds like a threshold has been crossed. o3 scored 87.5% on ARC-AGI, a benchmark specifically designed by François Chollet to resist the kind of pattern-matching that LLMs excel at. That was a legitimate milestone — the benchmark was supposed to be hard for these systems, and it still mostly is. But here’s the issue Chollet himself raised: even that result doesn’t demonstrate the kind of flexible, efficient learning from minimal examples that the benchmark was designed to test for. The model used significantly more compute than a human would to achieve those scores.
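To see concretely what “flexible, efficient learning from minimal examples” means, here’s a toy ARC-style task (invented for illustration; not drawn from the actual ARC-AGI set): a solver gets two input/output grid pairs, must induce the hidden transformation, and then apply it to a fresh grid it has never seen.

```python
# Toy ARC-style task (illustrative; not an actual ARC-AGI item).
# From two worked examples, induce the rule, then apply it to a new grid.
train_pairs = [
    ([[1, 0],
      [2, 0]],
     [[0, 1],
      [0, 2]]),
    ([[3, 0, 0],
      [0, 4, 0]],
     [[0, 0, 3],
      [0, 4, 0]]),
]

def solve(grid):
    """The hidden rule a solver must discover: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# A correct solver's induced rule must explain every training pair...
assert all(solve(x) == y for x, y in train_pairs)

# ...and then generalize to the held-out test input.
print(solve([[5, 6, 0],
             [0, 0, 7]]))  # -> [[0, 6, 5], [7, 0, 0]]
```

A person typically induces a rule like this from the two pairs in seconds. Chollet’s caveat is about efficiency: o3 reached its score, but at far more compute per task than a human spends, which is why the number alone doesn’t settle what the benchmark was built to measure.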
Other benchmarks have fallen faster than expected — MMLU, HumanEval, MATH — and each time the field briefly treats the result as a sign that AGI is approaching, then recalibrates when the limitations become apparent in practice. The benchmarks measure performance on specific problem formats. AGI, whatever it is, would presumably transfer to formats it has never seen.
Peter Diamandis and Salim Ismail, writing and speaking about exponential technology curves, would point out that benchmark improvements are happening at an accelerating rate regardless of definitional debates — which is true and worth noting. The trajectory is real even if the destination is contested.
The AGI Timeline Debate: Who Believes What and Why
Here’s a rough map of serious positions as of early 2026:
| Who | Rough Timeline | Core Reasoning |
|---|---|---|
| Sam Altman (OpenAI) | Within a few years | Scaling continues to yield capabilities; agents will accelerate research itself |
| Demis Hassabis (Google DeepMind) | Within a decade, possibly sooner | Combining deep learning with neuroscience-inspired architectures; AlphaFold as proof of concept for superhuman scientific reasoning |
| Yann LeCun (Meta AI) | Current path doesn’t lead there | Transformers lack world models, causality, persistent memory — new architectures needed entirely |
| Andrej Karpathy (independent) | Genuinely uncertain, but sooner than most people expect | Each capability gap closes faster than predicted |
