At GTC 2026, Jensen Huang stood in front of 3,500 developers representing a combined $40 trillion in market cap and made a claim that should probably get more attention than it did. Someone in the audience told him NVIDIA had just posted what might be the single best earnings print in recorded human history. His response: “It must be only recorded humanity. I’m sure somebody had better returns.” Then he got back to explaining why, structurally, this is not an anomaly. It’s the beginning of a permanent shift in how every company on earth spends money on software and infrastructure.
The short version: every internet service you use has already rebuilt itself around generative AI. Not as a side project. Not as a chatbot bolted onto a homepage. The hyperscalers — Meta, Google, AWS — took their entire capital expenditure budgets and converted them to generative and agentic AI infrastructure. And Huang’s argument is that this was rational, not experimental, because the ROI is proven. Search got better. Shopping got better. Ads got better. Social feeds got better. The companies that did this didn’t do it on a hunch. They did it because it worked.
What follows is an attempt to actually understand what Huang was saying at GTC — and why the implications stretch far beyond NVIDIA’s stock price (which, for what it’s worth, was down 30 cents on the day of the interview, on a stock up roughly 22,000% over the prior decade).
The Proof Is Already in the Products You Use Every Day
When Huang says the entire internet industry could take 100% of its capex and make it AI “because it’s better,” he’s making an empirical claim, not a sales pitch. His exact words: “We’ve proven it to be better.” That word — proven — is doing a lot of work. It means these companies ran the experiments, measured the outcomes, and the numbers came back positive.
Think about what that looks like in practice. Google’s search results now surface AI-generated overviews before organic links. Meta’s content recommendation engine, ad targeting, and even the way Reels are sequenced all run through generative models. AWS has rebuilt the underlying infrastructure of its services to support inference at scale for thousands of enterprise customers. These are not demos. These are production systems handling billions of queries per day.
The reason this matters is that it answers a question that was genuinely open two years ago: does generative AI actually improve internet services in ways users and advertisers pay for? The answer, based on what the companies operating at the largest scale have concluded, is yes. That changes the calculus for everyone downstream. If you’re a mid-size SaaS company watching Meta prove out AI-driven ad ROI, the question isn’t whether to invest in AI infrastructure. It’s how fast you can get there.
Compute Is Now the Input to Revenue — Not Just the Cost of Running It
Huang articulated a chain of causality that’s worth spelling out carefully because it reframes how you should think about IT spending. The chain is: compute → intelligence → digital workforce → revenues. His direct quote: “Every single company will need compute for revenues.”
This is a genuinely different framing from how most companies think about their infrastructure budget. Historically, compute was overhead — a cost center that kept the lights on. You bought servers to run your CRM, your ERP, your internal tools. You minimized that cost because it didn’t directly generate revenue. It just enabled people who generated revenue.
What Huang is describing is a model where compute is the revenue-generating asset. If your customer service is being handled by AI agents, the compute running those agents is directly displacing human labor costs and directly handling customer interactions. If your marketing copy, product descriptions, and ad creative are being generated by models, the compute producing those tokens is directly tied to conversion rates and revenue. The infrastructure is no longer downstream of the business — it’s inside it.
This has a direct implication for how CFOs should think about capex. It’s not a question of how much compute you need to run your operations. It’s a question of how much intelligence you can deploy and whether the revenue it generates justifies the spend. That’s a very different optimization problem.
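That optimization problem can be sketched in a few lines. This is a hypothetical unit-economics model, not anything from Huang's talk: the cost rate, tokens per interaction, and revenue per interaction below are all illustrative assumptions.

```python
# Hypothetical model of compute as a revenue input rather than overhead.
# Every number here is an illustrative assumption.

COST_PER_M_TOKENS = 2.00        # assumed blended inference cost, $ per 1M tokens
TOKENS_PER_INTERACTION = 1_500  # assumed tokens consumed per AI-handled interaction
REVENUE_PER_INTERACTION = 0.25  # assumed revenue attributable per interaction

def marginal_roi(interactions: int) -> float:
    """Revenue minus compute cost for a batch of AI-handled interactions."""
    tokens = interactions * TOKENS_PER_INTERACTION
    compute_cost = tokens / 1_000_000 * COST_PER_M_TOKENS
    revenue = interactions * REVENUE_PER_INTERACTION
    return revenue - compute_cost

# The CFO question becomes: keep deploying intelligence as long as the
# marginal ROI of the next block of compute stays positive.
profit = marginal_roi(1_000_000)
```

Under these made-up numbers, a million AI-handled interactions cost $3,000 in compute and return $250,000 in attributable revenue. The point is not the figures; it is that the decision variable is "how much intelligence to deploy," not "how little compute to buy."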
Every Software Company Is About to Become a Token Business
Here’s the part of Huang’s GTC talk that I think is underappreciated in most coverage: his claim that “the entire software industry will be token driven.”
Tokens, for anyone who needs the quick definition: when a language model processes a request or generates a response, it does so in chunks called tokens. Every word, roughly speaking, is a token or two. Every query your users submit, every document your product summarizes, every workflow your agent executes — that’s all token consumption. And someone pays for those tokens.
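To make the metering concrete, here is a back-of-envelope sketch. Real models use learned BPE tokenizers (the tiktoken library, for example); the "roughly four characters per token" heuristic below is only a crude stand-in for English prose, and the price is an assumed blended rate.

```python
# Crude token estimator using the common "~4 characters per token"
# rule of thumb for English text. Real tokenizers are BPE-based;
# this is only a back-of-envelope approximation.

def estimate_tokens(text: str) -> int:
    # ~4 chars per token is a widely used heuristic for English prose
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, dollars_per_m_tokens: float = 2.0) -> float:
    """Dollar cost to process `text` once, at an assumed blended rate."""
    return estimate_tokens(text) / 1_000_000 * dollars_per_m_tokens

# Every query, summary, and agent step is metered this way:
# tokens in, tokens out, and someone pays for both directions.
prompt_cost = estimate_cost("Summarize this quarterly report for the board.")
```

A single short prompt costs a rounding error; a product running millions of them per day does not, which is why the token meter ends up at the center of the business model.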
Huang’s argument is that every software company ends up on one of two paths: either you run your own models and produce tokens (which requires compute), or you resell tokens from someone else’s infrastructure (which still requires compute on your supplier’s end, and creates a margin dependency for you). Either way, your business model now has tokens in it. Either way, the compute question is unavoidable.
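The margin dependency on the reseller path can be sketched numerically. All three rates below are illustrative assumptions, not quotes from any provider's price list.

```python
# Hedged sketch of the two paths: produce your own tokens on your own
# compute, or resell tokens from an upstream provider. All rates are
# illustrative assumptions.

UPSTREAM_PRICE = 3.00  # assumed $ per 1M tokens charged by your supplier
OWN_INFRA_COST = 1.20  # assumed amortized $ per 1M tokens on your own compute
SELL_PRICE = 5.00      # assumed $ per 1M tokens you charge customers

def gross_margin(sell: float, cost: float) -> float:
    """Gross margin fraction on each million tokens sold."""
    return (sell - cost) / sell

producer_margin = gross_margin(SELL_PRICE, OWN_INFRA_COST)  # ~0.76
reseller_margin = gross_margin(SELL_PRICE, UPSTREAM_PRICE)  # ~0.40

# The reseller's margin moves whenever the upstream price does.
# That repricing risk is the "margin dependency" of reselling tokens.
```

Neither path is free: the producer carries the capex and utilization risk of its own compute, while the reseller carries a margin that an upstream repricing can compress overnight.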
He made this concrete: “You pick your favorite software company and I can show you exactly how they’re going to be token driven.” Think through what that means for the companies most people in enterprise software know. Salesforce is already embedding AI agents into CRM workflows. SAP is running generative AI across ERP. ServiceNow has agentic automation across IT service management. Oracle is integrating AI into its database and cloud infrastructure. These are not small experiments. And in each case, the underlying economics are converging on: how many tokens does your product consume, and what do you charge for the intelligence that generates them?
For anyone building or running a SaaS company right now, this is the strategic question: are you going to be a token producer, a token reseller, or are you going to get squeezed out by a competitor who figured out the token economics before you did?
The Inference Inflection: Why This Gets Bigger, Not Smaller
One of the more counterintuitive things Huang said at GTC was about the trajectory of NVIDIA’s growth: “Our growth is accelerating at a larger scale. That’s surprising for people.” To understand why that’s surprising — and why he thinks it’s correct — you need to understand the distinction between training and inference.
Training is what you do to build a model. You take massive datasets, run compute-intensive optimization processes for weeks or months, and produce a model. This is expensive, it happens once (or a few times), and a relatively small number of organizations do it. OpenAI trains GPT. Google trains Gemini. Anthropic trains Claude. The training cluster is enormous, but the number of training runs is finite.
Inference is everything else. Every time you type a prompt into ChatGPT, that’s inference. Every time a customer service agent handles a ticket, that’s inference. Every time Google surfaces an AI overview, that’s inference. Every token produced by every deployed model, for every user, every second of every day — that’s inference. And inference scales with adoption. As more people use more AI-powered products, inference compute — and models like NVIDIA Nemotron
