Something shifted in late 2022 and has been accelerating ever since. Not gradually, the way technology usually moves, but in lurches. ChatGPT hit 100 million users in two months. GPT-4 scored in the 90th percentile on the bar exam. Google, Microsoft, Meta, Apple, and dozens of startups reorganized entire product roadmaps around a single question: what do we do now that language models actually work? Andreessen Horowitz committed billions. Sam Altman testified before Congress. Geoffrey Hinton quit Google and started warning about the risks. Yann LeCun called the whole thing overhyped. Both of them are probably partially right.
This post is an attempt to cut through the noise — the hype and the dismissal — and explain what is actually happening, why the pace feels different from previous tech cycles, and what it means for people building products, running businesses, or just trying to understand the world they’re living in.
Table of Contents
- What Actually Changed (and When)
- The Capabilities Landscape Right Now
- Why This Pace Feels Different From Previous Tech Cycles
- The Agentic Shift: From Chatbots to Autonomous Systems
- The Disagreements That Actually Matter
- What This Does to Society, Work, and Power
- A Practical Framework for Navigating the Inflection
- FAQ
- What to Watch Next
What Actually Changed (and When)
The Transformer Was the Unlock
The architecture that made modern AI possible — the transformer — was published by Google researchers in 2017 in a paper titled “Attention Is All You Need.” But it took years of scaling, compute investment, and engineering to understand what transformers could actually become. The insight that turned out to matter enormously: these models don’t just memorize; they generalize in ways that earlier neural architectures couldn’t. You could throw a model trained on text at a math problem it had never seen, and it would make reasonable progress. That was new.
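For readers who want a concrete sense of what the architecture actually computes, here is a minimal NumPy sketch of scaled dot-product attention, the core operation from that 2017 paper. It’s a toy illustration, not production code: real transformers add learned query/key/value projections, multiple heads, masking, and positional information.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output row is a
    data-dependent weighted mix of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V

# Toy self-attention over 4 "tokens" with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

The key property: every position attends to every other position in one matrix operation, which is what made the architecture so amenable to scaling on parallel hardware.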
Andrej Karpathy, who ran AI at Tesla and was one of OpenAI’s earliest researchers, has described language models as something closer to a “compressed, lossy simulation of the internet” — a representation of human knowledge and reasoning patterns that can be queried and extended. That framing helps explain why they’re useful in so many domains simultaneously. They’re not narrow tools. They’re general-purpose cognitive substrates, with real limitations, but strikingly broad reach.
The Scaling Hypothesis and Its Consequences
From roughly 2020 onward, OpenAI and others started publishing results suggesting that transformer models follow predictable scaling laws. More compute, more data, more parameters — and performance improves in ways you can forecast. This wasn’t obvious. It meant that companies willing to spend billions on training runs could buy capability improvements with some confidence. OpenAI published GPT-3 in 2020. It was impressive but clunky. GPT-4 arrived in 2023 and passed professional licensing exams across medicine, law, and finance. The delta between those two models, in three years, was not incremental.
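The canonical form of these laws, from Kaplan et al. (2020), models test loss as a power law in parameter count. The sketch below uses that paper’s fitted constants, which are specific to their dataset and training setup, so treat the numbers as illustrative rather than predictive of any current model:

```python
def predicted_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Kaplan et al. (2020) power law: L(N) = (N_c / N)**alpha.

    n_c and alpha are that paper's fitted constants; they depend on
    the dataset and setup, so the absolute values are illustrative.
    """
    return (n_c / n_params) ** alpha

# Each 10x increase in parameters buys a roughly constant fractional drop in loss.
for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.2f} nats/token")
```

A curve like this is why training runs became a capital-allocation decision: you could estimate the return on another order of magnitude of spend before committing to it.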
ChatGPT as a Distribution Event
The specific thing that happened in November 2022 wasn’t a technical breakthrough — it was a distribution event. OpenAI wrapped a capable model in a clean interface with a free tier and let anyone use it. Within weeks, people were discovering that the model could write their cover letters, debug their code, explain legal documents, draft marketing campaigns, and tutor their kids in calculus. The bottleneck had never been purely capability; it had been access. ChatGPT removed the access barrier, and the use cases flooded in faster than anyone — including OpenAI — had anticipated.
The Capabilities Landscape Right Now
What Frontier Models Can Actually Do
As of mid-2025, the frontier models — GPT-4o and the o3 series from OpenAI, Claude 3.5 and 3.7 Sonnet from Anthropic, Gemini 1.5 and 2.0 Pro from Google DeepMind, and Llama 3 variants from Meta — are capable of tasks that would have seemed unrealistic five years ago. They can read and analyze 200,000-token documents (roughly a 500-page book) in a single context window. They can write, run, and debug code across multiple files. They can describe images, transcribe audio, and generate structured data from unstructured inputs. Some models now perform real-time web search, execute code in sandboxes, and interact with external APIs within a single conversation.
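To make the “interact with external APIs” point concrete, here is a sketch of the function-calling pattern most providers now expose. The parameter names follow the OpenAI Python SDK at the time of writing; other vendors use the same shape with different field names, and get_weather is a made-up tool for illustration:

```python
import json
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the env

client = OpenAI()

# Describe a hypothetical tool the model is allowed to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative, not a real service
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Do I need an umbrella in Oslo today?"}],
    tools=tools,
)

# Rather than answering directly, the model can emit a structured tool call;
# your code runs the tool and feeds the result back into the conversation.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```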
OpenAI’s o3 model, released in early 2025, introduced a reasoning mode that allows the model to “think” through problems over extended compute time before answering — analogous in some ways to how a human might work through a hard problem slowly rather than blurting out the first answer. On the ARC-AGI benchmark, which was designed specifically to test tasks that require genuine generalization rather than pattern matching, o3 achieved scores that had previously seemed unreachable by current architectures. That doesn’t mean AGI is here, but it does mean the ceiling has moved.
Where They Still Fail
It’s worth being specific about the failures, because the hype cycle tends to paper over them. Current models hallucinate — they confidently state false information, especially when operating at the edge of their training data. They struggle with tasks that require precise spatial reasoning or reliable multi-step arithmetic without tool use. They have no persistent memory across conversations by default (though this is changing with memory features in products like ChatGPT and Claude). They can be inconsistent: ask the same question twice and get different answers. And they remain brittle in ways that aren’t always predictable — an edge case that a human would navigate trivially can trip a frontier model.
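The inconsistency, at least, has a mundane explanation: by default these models sample each next token from a probability distribution rather than always taking the top choice, and a temperature parameter controls how sharp that distribution is. A toy sketch (real vocabularies have tens of thousands of entries, not three):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample one token id from softmax(logits / temperature).

    Low temperature sharpens the distribution toward the top choice
    (near-deterministic); high temperature flattens it (more varied)."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.5, 0.3]  # toy scores for three candidate tokens
rng = np.random.default_rng(0)
for t in (0.01, 0.7, 1.5):
    picks = [sample_next_token(logits, t, rng) for _ in range(1000)]
    print(f"temperature {t}: top token chosen {picks.count(0) / 10:.0f}% of the time")
```

Run the same prompt twice at a typical temperature like 0.7 and you will get different token sequences; pin it near zero and outputs become mostly, though not perfectly, repeatable.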
Yann LeCun, Meta’s Chief AI Scientist, argues that current large language models are fundamentally limited by their architecture: they lack world models, genuine causal reasoning, and the kind of grounded understanding that comes from sensory experience with the physical world. His proposed alternative, the Joint Embedding Predictive Architecture (JEPA), remains early-stage research. But his critique is worth taking seriously, not because LLMs are useless (they clearly aren’t) but because assuming they’re on a straight-line path to human-level general intelligence may be overconfident.
Multimodality Is Bigger Than It Looks
The shift to multimodal models (systems that process text, images, audio, video, and code simultaneously) is underrated in most coverage. GPT-4o can take a photo of a math problem and solve it. Gemini 1.5 Pro can watch an hour of video and answer questions about specific moments. Google’s NotebookLM can turn a pile of documents into a podcast-style audio discussion between two AI voices. These aren’t demos; they’re production features people are using. The implication is that AI is no longer just a text interface to information; it’s becoming an interface to any modality of information, which dramatically expands the set of tasks it can augment. A sketch of what that looks like in code follows below.
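As a concrete example of the modality shift, here is roughly what the photo-of-a-math-problem case looks like, again following the OpenAI Python SDK’s message shape for image input at the time of writing. The file name is hypothetical, and other providers accept images through similar but differently named fields:

```python
import base64
from openai import OpenAI  # same SDK as above; vision message shapes vary by vendor

client = OpenAI()

# Encode a local photo of, say, a handwritten math problem.
with open("math_problem.jpg", "rb") as f:  # hypothetical file
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve this problem and show your steps."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```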
