Every major AI lab is racing to build systems that can reason, plan, and act autonomously. OpenAI, Google DeepMind, Anthropic — they’re all betting heavily that scaling large language models is the path to artificial general intelligence. And then there’s Yann LeCun, Chief AI Scientist at Meta, Turing Award winner, and arguably the most credentialed voice in the room willing to say, loudly and repeatedly: you’re all going in the wrong direction. That tension — between the scaling optimists and LeCun’s persistent skepticism — is one of the most useful intellectual debates happening in AI right now. Not because LeCun is definitely right. But because engaging seriously with his arguments forces you to think harder about what intelligence actually is, and what we’re actually building.
Who Is Yann LeCun, and Why Should You Care What He Thinks?
LeCun isn’t a contrarian for sport. He’s one of the three people (alongside Geoffrey Hinton and Yoshua Bengio) who won the 2018 Turing Award for foundational work on deep learning — the same family of techniques that powers everything from image recognition to ChatGPT. His work on convolutional neural networks in the 1980s and 90s is the reason your phone can unlock with your face. He’s not a skeptic of AI. He’s a skeptic of a specific approach to AI, which is a very different thing.
Since joining Meta (then Facebook) as VP and Chief AI Scientist in 2013, LeCun has had a platform that most academics can only dream of. He uses it constantly — on X (formerly Twitter), in academic papers, in conference keynotes, and in a growing number of podcast appearances. His positions are specific, technical, and often deliberately provocative. He thinks Geoffrey Hinton and others who warn about near-term existential AI risk are wrong. He thinks GPT-4, Claude, and Gemini are impressive but fundamentally limited. And he has a detailed alternative vision for what real machine intelligence would look like. Whether you agree with him or not, his framework is worth understanding. If you want a broader map of the people shaping these debates, the 25 AI Thinkers and Creators Worth Following in 2026 is a useful companion.
The Core Argument: LLMs Can’t Think, They Predict
LeCun’s central critique of large language models is that they are, at their core, next-token predictors. They learn statistical patterns from text. They don’t build a model of the world. They don’t understand causality. They can’t plan multi-step actions in the physical world. And they hallucinate — not occasionally, but structurally — because generating plausible-sounding text and generating true text are not the same objective.
He’s made this point in various forms across multiple venues. In a widely shared 2022 paper titled “A Path Towards Autonomous Machine Intelligence,” LeCun laid out what he thinks is actually required for human-level AI: a system that can build persistent world models, reason about the future, plan hierarchically, and learn from much less data than current LLMs require. He argues that humans learn to understand the physical world primarily through sensorimotor experience — watching, touching, moving through space — not through reading text. A child who has never read a single word still develops a rich model of gravity, object permanence, and social dynamics. LLMs skip all of that and try to reconstruct world understanding from text alone. LeCun thinks this is a fundamental architectural mistake.
His proposed alternative is something he calls the Joint Embedding Predictive Architecture (JEPA). Rather than predicting exact pixel values or token sequences, JEPA learns to predict abstract representations of future states — building a kind of compressed world model. Meta’s AI research team (FAIR) has been actively working on this, releasing I-JEPA and V-JEPA as research models. These are genuinely interesting research directions, though they remain far from deployed, general-purpose systems. LeCun is honest about this — he tends to describe the path to his vision as a decade or more of hard research, not something that will emerge from the next training run.
The Debates That Made Him Famous (and Controversial)
LeCun doesn’t just publish papers. He argues, publicly and often. A few specific exchanges are worth knowing because they illuminate where the real fault lines are.
LeCun vs. Hinton on AI risk: When Geoffrey Hinton left Google in 2023 and began speaking publicly about existential risks from AI, LeCun pushed back hard. His position, stated across multiple interviews and posts, is roughly: current AI systems are not remotely close to human-level intelligence, and treating them as if they are leads to misallocated concern. He thinks the AI doom narrative is not just premature but actively harmful because it distracts from real, present-day harms — bias, misuse, economic disruption — and gives AI systems credit for capabilities they don’t actually have. Hinton disagrees. Both are serious scientists. The disagreement is real and unresolved.
LeCun vs. the scaling hypothesis: The dominant assumption in most frontier AI labs is that intelligence scales with data, compute, and model size. LeCun’s challenge is that even a perfectly scaled LLM is still doing next-token prediction over text, and text is a lossy, impoverished representation of the world. He’s compared this to trying to understand physics by reading physics textbooks without ever running an experiment. Sam Altman and Demis Hassabis have both, in various ways, expressed more optimism that emergent capabilities will continue to surprise us. LeCun’s counterpoint: emergence is interesting but not sufficient — architecture matters.
Podcast appearances worth finding: LeCun has appeared on Lex Fridman’s podcast multiple times — the January 2022 and March 2024 conversations are particularly substantive, covering both his technical arguments and his broader views on AGI timelines and AI safety. He has also gone deep on the JEPA architecture, and on why he thinks the AI safety discourse is misdirected, in other long-form interviews. These are conversations where his arguments are laid out in detail rather than compressed into tweets, and they’re worth the time if you want to engage with his actual position rather than a summary of it.
Where LeCun Is Probably Right (and Where He Might Be Wrong)
Engaging seriously with LeCun means being honest about both sides of his ledger.
Where he’s likely right: LLMs do hallucinate structurally, and this is a real limitation for high-stakes applications. The best models — GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro — still confabulate facts, fail basic physical reasoning tasks, and struggle with multi-step planning in novel environments. Autonomous agents built on LLMs (like early versions of AutoGPT or Devin) have shown real limitations in reliability over long task horizons. LeCun’s intuition that something important is missing feels grounded in observable behavior, not just theoretical concern.
Where he might be underestimating LLMs: The rate of improvement in LLMs has consistently surprised experts, including skeptics. Reasoning models like OpenAI’s o1 and o3, Google’s Gemini 2.0 Flash Thinking, and Anthropic’s extended thinking in Claude 3.7 Sonnet show that explicit chain-of-thought reasoning at inference time meaningfully improves performance on tasks LeCun would have predicted LLMs couldn’t handle — complex math, multi-step logic, even some physical reasoning problems. Whether this constitutes genuine reasoning or an increasingly convincing form of pattern completion remains an open question — but the trajectory so far should give any confident skeptic pause.
LeCun’s Specific Technical Arguments Against LLMs (And What the Evidence Actually Shows)
LeCun’s critique isn’t vibes-based. He has three concrete technical claims, each pointing to a different structural limitation. Here they are, stated precisely, with the evidence he cites.
Claim 1: LLMs Have No World Model
A world model, in LeCun’s framing, is an internal representation that lets a system predict the consequences of actions before taking them. You don’t have to touch a hot stove to know it will burn you. You simulate the outcome first. LLMs have no such thing. They have token distributions — statistical summaries of what text looks like, not causal maps of how the world works.
His 2022 paper “A Path Towards Autonomous Machine Intelligence” (available on OpenReview) lays this out explicitly. The paper proposes a modular architecture he calls JEPA — Joint Embedding Predictive Architecture — which learns representations in an abstract embedding space rather than predicting raw pixel or token outputs. The key idea: a system that predicts in abstract space can build compressed, causal representations of the world. A system that predicts the next token is optimizing for something else entirely.
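The structural difference between predicting in raw input space and predicting in an abstract embedding space can be sketched in a few lines. This is a toy illustration only, not Meta’s JEPA implementation: the "world" has a few predictable dimensions plus pure noise, and the hand-picked encoder stands in for a learned representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: the first 4 dimensions evolve deterministically (the "structure"),
# the remaining 60 are pure noise redrawn at every step (the "details").
D_STRUCT, D_NOISE = 4, 60

def step(state):
    structured = state[:D_STRUCT] * 0.9 + 0.1    # predictable dynamics
    noise = rng.normal(size=D_NOISE)             # unpredictable detail
    return np.concatenate([structured, noise])

def encode(state):
    # Hand-picked "encoder" that keeps only the predictable structure.
    # In a real JEPA this representation is learned, not chosen by hand.
    return state[:D_STRUCT]

def predict_embedding(z):
    # Predictor operating in the abstract space (here, the true dynamics).
    return z * 0.9 + 0.1

errors_raw, errors_embed = [], []
state = rng.normal(size=D_STRUCT + D_NOISE)
for _ in range(100):
    nxt = step(state)
    # Raw-space prediction must account for every noisy detail, and fails.
    errors_raw.append(np.mean((nxt - state) ** 2))
    # Embedding-space prediction only models what is actually predictable.
    errors_embed.append(np.mean((predict_embedding(encode(state)) - encode(nxt)) ** 2))
    state = nxt

print(f"mean raw-space error:       {np.mean(errors_raw):.4f}")
print(f"mean embedding-space error: {np.mean(errors_embed):.6f}")
```

The embedding-space predictor achieves near-zero error because the representation discards exactly the dimensions that cannot be predicted — which is the intuition behind predicting abstract states rather than raw outputs.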
The practical consequence LeCun draws: LLMs will always struggle with physical intuition, spatial reasoning, and any task where the right answer depends on simulating a process rather than pattern-matching to prior text. This is why GPT-4 can write a recipe for a soufflé but will confidently describe physically impossible outcomes when asked about novel mechanical systems it has no training examples for.
Claim 2: LLMs Cannot Plan
Planning, in a technical sense, means searching over a space of possible action sequences to find one that achieves a goal. It requires being able to evaluate intermediate states — to say “if I do X, then Y becomes possible, and that leads toward Z.” LeCun argues LLMs cannot do this because they have no persistent state between tokens, no ability to simulate forward, and no mechanism to evaluate whether a partial plan is on track.
He demonstrated this point publicly in a 2023 post on X that got significant attention: he argued that a 3-year-old child can stack four blocks reliably, something no LLM-driven robot can do from scratch without extensive additional engineering. The gap isn’t data. It’s that the child has a model of gravity, balance, and object permanence built from physical interaction. The LLM has descriptions of block-stacking.
The evidence in the literature supports a version of this. The 2022 paper “Large Language Models Still Can’t Plan” by Valmeekam, Kambhampati, and colleagues at Arizona State tested GPT-3-class models (and, in follow-up work, GPT-4) on standard planning benchmarks from the automated planning community (Blocksworld, Logistics, etc.). Performance was poor and degraded as plan length increased — exactly what you’d expect from a system doing pattern matching rather than search. Kambhampati, who runs the Yochan lab and has engaged directly with LeCun’s arguments, concluded that LLMs need to be paired with external planners to be reliable on multi-step tasks.
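The "external planner or validator" pairing can be made concrete with a minimal sketch. This is not the benchmark code from the paper — just an illustrative Blocksworld plan checker that simulates each move and verifies preconditions, i.e., the search-and-evaluate machinery an LLM lacks on its own.

```python
# Minimal Blocksworld plan checker: a sketch of the kind of external validator
# that can be paired with an LLM. State: dict mapping block -> what it rests on
# ("table" or another block). Action: ("move", block, destination).

def is_clear(state, item):
    """A block (or the table) is clear if no block rests on top of it."""
    if item == "table":
        return True
    return all(below != item for below in state.values())

def apply_action(state, action):
    """Apply one move, or return None if a precondition is violated."""
    _, block, dest = action
    if block == dest or not is_clear(state, block) or not is_clear(state, dest):
        return None
    new_state = dict(state)
    new_state[block] = dest
    return new_state

def validate_plan(state, plan, goal):
    """Simulate the plan step by step; report the first invalid step, if any."""
    for i, action in enumerate(plan):
        state = apply_action(state, action)
        if state is None:
            return False, f"step {i} violates a precondition: {action}"
    ok = all(state[b] == on for b, on in goal.items())
    return ok, "goal reached" if ok else "plan ends short of the goal"

start = {"A": "table", "B": "A", "C": "table"}   # B sits on A, C on the table
goal = {"C": "B", "B": "A"}                      # want C on B on A
good_plan = [("move", "C", "B")]
bad_plan = [("move", "A", "C")]                  # invalid: A is under B, not clear

print(validate_plan(start, good_plan, goal))
print(validate_plan(start, bad_plan, goal))
```

A validator like this catches precondition violations that a pattern-matching generator will happily emit — which is exactly the division of labor the hybrid approach proposes.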
Claim 3: LLM Reasoning Is Autocomplete, Not Inference
LeCun’s third claim is that what looks like reasoning in LLMs is mostly sophisticated pattern completion. When GPT-4 solves a math problem, it’s not running an algorithm — it’s generating tokens that look like the solution to math problems it has seen. This works often enough to be impressive. It breaks down in ways that actual reasoning wouldn’t.
The clearest public evidence for this: the “reversal curse” paper published by Berglund et al. in 2023 showed that a model trained on “A is B” does not reliably learn “B is A.” A system doing logical inference would handle both directions identically. A system doing token prediction gets tripped up by the direction it saw in training. That’s not a bug in the model — it’s structural evidence of what the model is actually doing.
LeCun’s position: this isn’t fixable by scaling. You can’t reach genuine reasoning by adding more parameters to a next-token predictor. You need a different objective function.
A Practical Stress-Test: Finding LeCun’s Predicted Failure Modes in Your Own LLM Product
If you’re building something with GPT-4o, Claude 3.5, Gemini 1.5, or any other LLM, LeCun’s framework gives you a concrete checklist of where your system is most likely to fail in production. Here’s how to actually run that test.
Test 1: Multi-Step Planning Under Novel Constraints
Give your LLM a task that requires a plan of 6 or more steps where at least one constraint isn’t present in common training examples. Don’t use “plan a trip to Paris” — that’s heavily represented in training data. Use something like: “A warehouse has three robots, two charging stations, and five delivery zones. Robot A is currently charging, Robot B is in Zone 3 with a low battery, and Robot C is idle in Zone 1. A package needs to move from Zone 5 to Zone 2. Generate a step-by-step coordination plan that avoids any robot running out of battery.”
Watch for: confident plans that violate stated constraints, plans that ignore the battery state of Robot B, and plans that change the problem setup mid-response. Per LeCun’s prediction, the model will generate plausible-sounding sequences but won’t reliably track state across steps. Test this with at least 10 variations. In most LLM products without external state tracking, failure rates on constraint-heavy planning tasks above 5 steps are high enough to matter in production.
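A harness for this test can be sketched in a few lines. Everything here is an assumption for illustration: `call_llm` is a placeholder you would replace with your actual model call, and the constraint checkers are crude keyword heuristics standing in for a real plan parser.

```python
import re

def call_llm(prompt):
    """Placeholder for your real model call (OpenAI, Anthropic, etc.).
    Stubbed with a canned response so the harness logic itself is runnable."""
    return ("1. Robot B picks up the package in Zone 5.\n"
            "2. Robot B delivers it to Zone 2.\n"
            "3. Robot A continues charging.")

# Hard constraints from the prompt, encoded as crude checks over the raw text.
# Real checkers would parse the plan into structured steps first.
def no_low_battery_dispatch(plan):
    # Robot B has a low battery; sending it across the warehouse is suspect.
    return not ("Robot B" in plan and "Zone 5" in plan)

def charging_robot_stays_put(plan):
    return not re.search(r"Robot A (picks|moves|delivers)", plan)

CHECKS = {"low-battery Robot B dispatched": no_low_battery_dispatch,
          "charging Robot A dispatched": charging_robot_stays_put}

def stress_test(prompt, n_variations=10):
    failures = []
    for i in range(n_variations):
        plan = call_llm(f"{prompt} (variation {i})")
        failures += [(i, name) for name, ok in CHECKS.items() if not ok(plan)]
    return failures

failures = stress_test("Generate a step-by-step coordination plan ...")
print(f"{len(failures)} constraint violations across 10 runs")
```

With the stubbed response, the low-battery check fires on every variation — the pattern to look for when you wire in a real model is exactly this kind of repeated, confident constraint violation.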
Test 2: Physical Intuition Outside Training Distribution
Describe a physically novel scenario — one involving uncommon materials or geometries — and ask for a prediction. Example prompt: “A hollow aluminum sphere 30cm in diameter with a wall thickness of 2mm is placed on top of a flat rubber surface. A 5kg steel cylinder is placed on top of the sphere. Describe exactly what happens over the next 5 seconds, including any deformation or movement.” There is no single correct answer to memorize here. A system with a world model would reason from material properties. A system doing token prediction will generate something that sounds plausible but may be physically inconsistent.
Look for: internal contradictions within the same response, confident assertions about outcomes that contradict each other if you ask a follow-up, and refusal to express appropriate uncertainty. This is LeCun’s world model gap showing up directly.
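One cheap way to surface these contradictions is a self-consistency check: sample the model several times at nonzero temperature and compare the coarse outcomes. The sketch below uses a stubbed `call_llm` (an assumption, returning canned inconsistent answers) and a deliberately crude keyword classifier; both would need replacing for real use.

```python
PROMPT = ("A hollow aluminum sphere, 30cm diameter, 2mm wall, sits on rubber. "
          "A 5kg steel cylinder is placed on top. What happens over 5 seconds?")

def call_llm(prompt, seed):
    """Placeholder for a sampled (temperature > 0) model call. Stubbed with
    canned, mutually inconsistent answers to show what the check would catch."""
    canned = [
        "The thin aluminum wall dents under the cylinder's weight.",
        "The sphere supports the cylinder with no deformation at all.",
        "The sphere rolls away before the cylinder settles on top.",
    ]
    return canned[seed % len(canned)]

def classify(answer):
    """Map a free-text answer to a coarse outcome label (keyword heuristic)."""
    text = answer.lower()
    if "no deformation" in text:
        return "holds"
    if "dent" in text:
        return "deforms"
    if "rolls" in text:
        return "rolls"
    return "unclear"

verdicts = {classify(call_llm(PROMPT, seed)) for seed in range(6)}
print(f"distinct outcomes across 6 samples: {sorted(verdicts)}")
print("consistent" if len(verdicts) == 1 else "inconsistent: world-model gap flag")
```

A system reasoning from material properties should converge on one outcome across samples; wide disagreement on the same prompt is the gap showing up as measurable behavior.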
Test 3: Causal Direction Reversal
This is the reversal curse test applied to your domain. If your product uses an LLM to answer questions about your own documentation, run this: take 20 factual pairs from your docs (e.g., “Feature X requires Setting Y to be enabled”), feed the model questions in both directions (“Does Feature X require Setting Y?” and “What features require Setting Y to be enabled?”), and compare accuracy. Per Berglund et al., you will likely see asymmetric performance. The direction the fact appears in training data will have better recall. This matters if your product makes claims in any domain where both directions of a factual relationship are operationally important — compliance, medical, legal, technical support.
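The two-direction comparison is straightforward to script. In this sketch, the factual pairs and `ask_llm` are hypothetical placeholders — the stub is hard-wired to answer forward questions at 90% and reversed ones at 40%, mimicking the asymmetry Berglund et al. report, so the scoring logic has something to measure.

```python
# Hypothetical documentation facts: each feature requires one setting.
FACTS = [(f"Feature{i}", f"Setting{i}") for i in range(20)]

def ask_llm(question, i):
    """Placeholder model call, stubbed deterministically: forward recall 90%,
    reversed recall 40%. Replace with your real model and real doc questions."""
    if question.startswith("Does"):          # forward: direction as stated in docs
        return "Yes" if i % 10 != 0 else "Not sure"
    return f"Feature{i}" if i % 10 < 4 else "Not sure"   # reversed direction

forward_hits = sum(
    ask_llm(f"Does {f} require {s}?", i) == "Yes"
    for i, (f, s) in enumerate(FACTS))
reverse_hits = sum(
    ask_llm(f"What feature requires {s} to be enabled?", i) == f
    for i, (f, s) in enumerate(FACTS))

forward_acc = forward_hits / len(FACTS)   # 0.90 with this stub
reverse_acc = reverse_hits / len(FACTS)   # 0.40 with this stub
print(f"forward accuracy: {forward_acc:.0%}, reverse accuracy: {reverse_acc:.0%}")
print(f"asymmetry gap: {forward_acc - reverse_acc:.0%}")
```

If the gap on your real docs is large, the mitigation is usually on the data side: index both phrasings in your retrieval layer rather than hoping the model infers the reverse direction.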
Decision Framework: What to Do With What You Find
| Task Type | LLM Reliability (per LeCun’s evidence) | Better Architecture or Mitigation |
|---|---|---|
| Text summarization, drafting, reformatting | High — this is pattern completion, what LLMs are built for | LLMs are fine here |
| Factual recall from training data | Medium — degrades with recency, specificity, and niche domains | LLM plus retrieval-augmented generation (RAG) |
| Multi-step planning with hard constraints | Low — state drift, constraint violations, compounding errors, no backtracking | Pair the LLM with a symbolic planner (e.g., a PDDL solver) or explicit state tracking in code |
| Novel mathematical or logical reasoning | Low — autocomplete mimics reasoning; fails outside the training distribution | Neuro-symbolic systems or LLM plus an external verifier |
| Physical world predictions and robotics | Very low — no world model, no sensorimotor grounding; outputs pattern-matched plausibility | Physics simulation engine or JEPA-style architecture; the LLM handles only the language interface |
| Long-horizon autonomous agents | Very low — planning without a world model fails past a few decision points | Hybrid systems: LLM for the language interface, separate planning and memory modules |
| Learning from small amounts of novel data | Very low — sample inefficiency is structural, not a tuning problem | Few-shot learning architectures, JEPA, structured world models |
The point isn’t to abandon LLMs. It’s to stop using them where LeCun’s analysis says they’ll structurally fail, and start building the hybrid architectures that compensate for those specific gaps.