What Is AGI? A Scored Checklist to Evaluate Any Model’s Proximity



Sam Altman said in early 2025 that he believed AGI was “coming soon.” Demis Hassabis has spent his entire career building toward it. Yann LeCun thinks we’re nowhere close and that the whole framing is wrong. Meanwhile, OpenAI quietly updated its internal definition of AGI to something that conveniently excludes its own most capable systems from triggering certain agreements with Microsoft. The word gets used constantly, defined rarely, and understood almost never. That’s a problem — because how you define AGI shapes everything: policy, investment, research priorities, and whether the thing we’re building should excite or terrify you.

The Definition Problem: AGI Means Different Things to Different People

There is no universally agreed-upon definition of Artificial General Intelligence. This is not a minor academic quibble — it’s a fundamental problem that makes most AGI discourse nearly useless unless you know which definition the speaker is working from.

Here are the definitions actually in play right now:

  • The “human-level” definition: AGI is a system that can perform any intellectual task a human can perform, at roughly human level. This is the most common pop-science version.
  • The “economic” definition: OpenAI has at various points defined AGI as a system capable of generating $100 billion in profits, or as highly autonomous systems that outperform humans at most economically valuable work. This is a business-friendly frame that keeps the goalposts movable.
  • The “generalist autonomy” definition: A system that can learn new tasks from scratch without being specifically trained for them — what Andrej Karpathy might call a genuinely autonomous agent that improves itself through experience.
  • The “superintelligence” conflation: Some people use AGI and ASI (Artificial Superintelligence) interchangeably, which is a mistake. AGI is human-level. ASI is beyond human-level. They’re different problems with different timelines and different implications.
  • LeCun’s definition (sort of): LeCun has argued publicly and repeatedly — on X, at conferences, in papers — that current deep learning architectures are fundamentally insufficient for AGI, and that what we’re building today isn’t even on the right path. For him, real general intelligence requires common sense, world models, and causal reasoning that transformers don’t have.

The reason this matters practically: when OpenAI says they’ve achieved or nearly achieved AGI, and when LeCun says they’re nowhere close, they’re often not even disagreeing about the same thing. They’re using different maps to describe different territories.

What Today’s Best Systems Can and Can’t Do

As of early 2026, the most capable publicly available models include GPT-4o and the o3/o4-mini family from OpenAI, Claude 3.7 Sonnet from Anthropic, Gemini 2.0 Pro from Google DeepMind, and Grok 3 from xAI. These are genuinely impressive systems. They also have real, well-documented limitations that matter for any honest AGI assessment.

What they’re genuinely good at:

  • Reasoning through complex multi-step problems when given good context (o3 in particular shows strong performance on PhD-level science benchmarks)
  • Writing, coding, summarization, translation — often at or above average human professional level
  • Synthesizing large bodies of text quickly (Claude’s 200K context window, Gemini’s 1M token context)
  • Passing bar exams, medical licensing exams, and coding interviews
  • Operating as agents that can browse the web, write and execute code, and take actions in digital environments

What they still can’t reliably do:

  • Maintain consistent goals and memory across truly long autonomous tasks without drift or failure
  • Learn permanently from new experiences without retraining (they don’t update their weights from conversation)
  • Reason robustly about physical reality and embodied situations — they don’t have a world model in the way a child does
  • Know when they don’t know something — hallucination remains a fundamental, unsolved problem
  • Transfer skills learned in one domain to genuinely novel domains the way a human naturally would

The honest read is this: current models are extraordinarily capable narrow systems that look general because they were trained on an extraordinarily broad dataset. That’s not the same as being general in the way the term was originally meant. To understand what is actually happening now with these systems, it helps to separate genuine capability gains from definitional sleight of hand.

The Benchmark Problem: Why Test Scores Don’t Settle the Debate

Every few months, a new model posts a score that sounds like a threshold has been crossed. o3 scored 87.5% on ARC-AGI, a benchmark specifically designed by François Chollet to resist the kind of pattern-matching that LLMs excel at. That was a legitimate milestone — the benchmark was supposed to be hard for these systems, and it still mostly is. But here’s the issue Chollet himself raised: even that result doesn’t demonstrate the kind of flexible, efficient learning from minimal examples that the benchmark was designed to test for. The model used significantly more compute than a human would to achieve those scores.

Other benchmarks have fallen faster than expected — MMLU, HumanEval, MATH — and each time the field temporarily treats it as a sign of approach toward AGI, then recalibrates when the limitations become apparent in practice. The benchmarks measure performance on specific problem formats. AGI, whatever it is, would presumably transfer to formats it has never seen.

Peter Diamandis and Salim Ismail, writing and speaking about exponential technology curves, would point out that benchmark improvements are happening at an accelerating rate regardless of definitional debates — which is true and worth noting. The trajectory is real even if the destination is contested.

The AGI Timeline Debate: Who Believes What and Why

Here’s a rough map of serious positions as of early 2026:

  • Sam Altman (OpenAI): within a few years. Scaling continues to yield capabilities; agents will accelerate research itself.
  • Demis Hassabis (Google DeepMind): within a decade, possibly sooner. Combining deep learning with neuroscience-inspired architectures; AlphaFold as proof of concept for superhuman scientific reasoning.
  • Yann LeCun (Meta AI): the current path doesn’t lead there. Transformers lack world models, causality, and persistent memory; entirely new architectures are needed.
  • Andrej Karpathy (independent): genuinely uncertain, but sooner than most people expect. Each capability gap closes faster than predicted.

Ty Sutherland

Ty Sutherland is the Chief Editor of AI Rising Trends. Living in what he believes to be the most transformative era in history, Ty is deeply captivated by the boundless potential of emerging technologies like the metaverse and artificial intelligence. He envisions a future where these innovations seamlessly enhance every facet of human existence. With a fervent desire to champion the adoption of AI for humanity's collective betterment, Ty emphasizes the urgency of integrating AI into our professional and personal spheres, cautioning against the risk of obsolescence for those who lag behind. AI Rising Trends stands as a testament to his mission, dedicated to spotlighting the latest in AI advancements and offering guidance on harnessing these tools to elevate one's life.
