# Google I/O 2026 Proved What $190 Billion a Year in AI Capex Actually Buys

Google processed 3.2 quadrillion tokens in April 2026. That is roughly 7x the 480 trillion tokens per month of a year earlier and roughly 330x the 9.7 trillion Sundar Pichai cited at I/O in May 2024. On May 19, Pichai opened [Google I/O 2026](https://blog.google/innovation-and-ai/sundar-pichai-io-2026/) by declaring the company “firmly in our agentic Gemini era,” then spent two hours proving it: a new frontier model priced below every competitor, a background agent that works while your laptop is closed, custom silicon split into two separate chips for the first time, and a cloud business that nearly doubled its backlog to $462 billion in a single quarter.

No other AI company made announcements across all four layers (silicon, models, agents, distribution) in the same week. Here is what each layer means for the industry.

## The Fastest Frontier Model Is Also the Cheapest

[Gemini 3.5 Flash](https://www.latent.space/p/ainews-google-io-2026-gemini-35-flash), generally available immediately, is Google’s new default model across every product surface: Search, the Gemini app, AI Studio, Android Studio, and the Gemini API. The specifications combine speed, context, and cost in a way no competitor currently matches.

Gemini 3.5 Flash outperforms Gemini 3.1 Pro across nearly all benchmarks, scores 76.2% on Terminal-Bench 2.1, rates 1,656 Elo on GDPval-AA, and hits 83.6% on MCP Atlas. Output speed exceeds 280 tokens per second in standard mode. An Antigravity-optimized configuration reaches 867 tokens per second, roughly 12x faster than comparable frontier models.

The model offers a 1 million token context window, a 65,000 token maximum output, four configurable thinking levels (minimal, low, medium, high), and “thought preservation” that maintains reasoning chains across multi-turn conversations.

Pricing: $1.50 per million input tokens, $9.00 per million output tokens, with a 90% discount on cached input. Google framed this as “less than half the price of comparable frontier alternatives.” Those numbers undercut both [GPT-5.5](https://airisingtrends.com/gpt-5-5-openai-agent-model/) and Claude Opus 4.7 on a per-token basis while matching or exceeding most benchmark scores. Google’s own internal usage reflects the economics: the company processes over 3 trillion tokens daily through Flash alone, up from 500 billion in March. Google estimates enterprises shifting 80% of workloads to Flash could save over $1 billion annually.
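To make the pricing concrete, here is a minimal Python sketch of workload cost at the list prices quoted above. The function name `flash_cost` and the `cached_fraction` parameter are illustrative, not part of any Google SDK, and the sketch assumes the 90% discount applies uniformly to every cached input token.

```python
# List prices as quoted in the article (USD per million tokens).
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHE_DISCOUNT = 0.90  # assumption: 90% off every cached input token


def flash_cost(input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate USD cost of a workload.

    cached_fraction is the (hypothetical) share of input tokens
    served from cache, between 0 and 1.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens are billed at (1 - discount) of the input price.
    billable_input = fresh + cached * (1 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_PER_M / 1e6
    output_cost = output_tokens * OUTPUT_PER_M / 1e6
    return input_cost + output_cost


if __name__ == "__main__":
    print(round(flash_cost(10_000_000, 1_000_000), 2))
    print(round(flash_cost(10_000_000, 1_000_000, cached_fraction=0.8), 2))
```

Under those assumptions, a workload of 10 million input tokens and 1 million output tokens would cost about $24 with no caching and about $13.20 with 80% of input served from cache. Output tokens dominate the bill in both cases.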

One caveat from third-party assessment firm Artificial Analysis: Flash’s hallucination rate sits at 61%, a 31-point regression from the prior generation. That matters for enterprise buyers who need factual precision over raw speed.

Gemini 3.5 Pro remains in internal testing. Pichai confirmed it will be publicly available next month.

Google also launched **Gemini Omni**, a multimodal generation model that accepts any combination of text, images, audio, and video as input and produces video output grounded in what Google calls “real-world knowledge.” Unlike its predecessor Veo 3 (text-to-video only), Omni handles cross-modal creation: editing characters within scenes, altering actions with natural language, and preserving consistency across multi-turn sessions. Omni integrates with Google Flow, YouTube Shorts, and the Gemini app, with developer APIs rolling out in the coming weeks.

## Agents That Run While Your Laptop Is Closed

Gemini Spark, the second headline announcement, is Google’s entry into always-on autonomous agents. Spark runs on dedicated Google Cloud virtual machines, which means it can execute long-running tasks, check calendars, draft emails, and monitor information across connected apps even when the user’s device is powered off.

Google’s approach differs from [Anthropic’s dreaming framework](https://airisingtrends.com/anthropic-dreaming-ai-agents/) in one structural way: Spark’s background execution is tied entirely to Google’s infrastructure rather than relying on client-side compute. The tradeoff is vendor lock-in. The advantage is persistent availability with no dependency on the user’s hardware.

Spark enters beta with trusted testers this week, expanding to Google AI Ultra subscribers in the U.S. next week. Google restructured its subscription tiers for the occasion: a new $100/month plan sits below the Ultra tier, which dropped from $250 to $200/month. Third-party app integration via Model Context Protocol (MCP) is expected over the summer. An Android notification layer called “Halo” will surface live task tracking later in 2026.

Beyond Spark, Google announced two other agent products at I/O. “Daily Brief” is an out-of-the-box agent in the Gemini app that synthesizes a user’s inbox, calendar, and tasks into a prioritized morning summary with suggested next steps. “Information Agents” in Search provide persistent background monitoring of web, news, and social signals on topics the user specifies, delivering synthesized updates with actionable links. Both roll out to paid subscribers this summer.

Alongside these consumer agents, Google launched Antigravity 2.0, a standalone desktop application for multi-agent orchestration aimed at developers. The demo was designed to grab headlines: 93 parallel sub-agents built a functioning operating system in 12 hours, consuming 2.6 billion tokens and over 15,000 model requests for under $1,000 in API credits. Whether that translates to real enterprise workflows is a separate question, but the cost and speed numbers are striking.
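A back-of-the-envelope check puts the demo figures in context. The sketch below assumes, hypothetically, that the demo was billed at the Gemini 3.5 Flash list prices ($1.50 per million input tokens, $9.00 per million output tokens, 90% off cached input). The reported figures do not break down the token mix, so the result is a bound under that assumption, not a reconstruction of the actual bill.

```python
# Demo figures as reported: 2.6 billion tokens, over 15,000 requests,
# under $1,000 in API credits.
TOTAL_TOKENS = 2.6e9
REQUESTS = 15_000
BUDGET_USD = 1_000.0

# Assumption: Gemini 3.5 Flash list prices applied (USD per million tokens).
INPUT_PER_M = 1.50
OUTPUT_PER_M = 9.00
CACHED_INPUT_PER_M = INPUT_PER_M * 0.10  # 90% cached-input discount

# Upper bound on the blended price, since the spend was under the budget.
blended_per_m = BUDGET_USD / (TOTAL_TOKENS / 1e6)

# Upper bound on average request size, since there were over 15,000 requests.
tokens_per_request = TOTAL_TOKENS / REQUESTS

# Best case for output: every input token is cached. The largest output
# share x then solves  x * OUTPUT + (1 - x) * CACHED = blended.
max_output_share = (blended_per_m - CACHED_INPUT_PER_M) / (
    OUTPUT_PER_M - CACHED_INPUT_PER_M
)

if __name__ == "__main__":
    print(f"blended price: at most ${blended_per_m:.3f} per million tokens")
    print(f"average request: at most {tokens_per_request:,.0f} tokens")
    print(f"output share: at most {max_output_share:.2%} of all tokens")
```

If the list-price assumption holds, the blended rate of under roughly $0.39 per million tokens sits well below the uncached input price, which would mean the run leaned heavily on cached input and that output made up less than about 3% of the token total.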

## Custom Silicon, Split in Two

Google has designed its own TPUs for years. The [eighth generation](https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/) introduces a fundamental architectural change: two separate chips, each optimized for a different workload.

TPU 8t (training) packs 9,600 chips into a single superpod delivering 121 exaflops of compute and two petabytes of shared memory. That is nearly 3x the raw computing power of the previous generation. Using JAX and Pathways, Google can now distribute training runs across more than one million TPUs globally.
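Dividing the superpod totals by the chip count gives a rough sense of scale per chip. This is a sketch under stated assumptions: decimal prefixes (1 exaflop = 1,000 petaflops, 1 PB = 1,000,000 GB) and an even split across all 9,600 chips. The quoted figures do not state the numeric precision behind the exaflops number, so the result is not comparable across vendors without that detail.

```python
# Superpod figures as quoted in the article.
CHIPS = 9_600
POD_EXAFLOPS = 121
POD_MEMORY_PB = 2

# Assumption: decimal prefixes and an even split across chips.
per_chip_petaflops = POD_EXAFLOPS * 1_000 / CHIPS
per_chip_memory_gb = POD_MEMORY_PB * 1_000_000 / CHIPS

if __name__ == "__main__":
    print(f"compute per chip: about {per_chip_petaflops:.1f} petaflops")
    print(f"shared memory per chip: about {per_chip_memory_gb:.0f} GB")
```

On those assumptions each TPU 8t contributes about 12.6 petaflops and about 208 GB of the pod's shared memory.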

TPU 8i (inference) triples on-chip SRAM to 384 MB and increases high-bandwidth memory to 288 GB, hosting massive key-value caches entirely on silicon. Google claims 80% better performance per dollar for inference and 2x better performance per watt compared to the prior generation.

The bifurcation matters. Training needs raw FLOPs; inference needs memory bandwidth and energy efficiency. By splitting the chip line, Google can optimize each without compromise. NVIDIA’s H200 and B200 still serve both roles. [OpenAI’s $20 billion Cerebras deal](https://airisingtrends.com/openai-cerebras-deal-inference-chips/) signals that competitors are reaching the same conclusion: inference deserves dedicated silicon.

## 900 Million Users and a $462 Billion Cloud Backlog

Distribution is where Google’s position becomes impossible for any competitor to replicate on a comparable timeline.

The Gemini app now reaches 900 million monthly active users, more than double the 400 million of a year ago. Daily requests grew 7x. AI Overviews in Search serve 2.5 billion monthly active users, and the newer AI Mode in Search hit 1 billion monthly users within its first year. Across all products, Google has 13 services with more than one billion users. Five of those exceed three billion.

On the cloud side, [Q1 2026 revenue reached $20 billion](https://www.constellationr.com/insights/news/google-cloud-revenue-growth-hits-63-q1-20-billion) (63% year-over-year growth), outpacing Azure at roughly 30% and AWS at roughly 28%. The cloud backlog nearly doubled quarter over quarter to $462 billion. GenAI-based cloud products grew nearly 800% year over year. Operating margin expanded to 32.9%.

Pichai acknowledged on the earnings call that cloud revenue would have been higher if Google had more compute capacity. Mizuho analyst Lloyd Walmsley raised his Alphabet price target to $460 following the results, [noting](https://www.benzinga.com/Opinion/26/05/52676168/googles-gemini-push-at-i-o-2026-forces-a-new-battle-over-agentic-ai) that “consensus estimates continue to significantly under-model Google Cloud revenue and operating income potential.” A supply-constrained growth story with a nearly half-trillion-dollar backlog is exactly the kind of dynamic that compounds.

## The Only Company That Owns Every Layer

The strategic picture from I/O 2026 is not about any single product. It is about the combination.

Google now controls custom silicon (TPU 8t/8i), frontier models (Gemini 3.5 Flash, Omni, and 3.5 Pro coming next month), autonomous agents (Spark, Antigravity 2.0, Information Agents), and massive consumer distribution (900 million Gemini users, [3 billion Android devices](https://airisingtrends.com/google-gemini-intelligence-android-agent/), 2.5 billion Search users). No other company owns all four layers.

[Anthropic](https://airisingtrends.com/anthropic-900-billion-valuation-openai-revenue/) builds models and agents but relies on AWS and Google Cloud for compute, with no consumer distribution platform. OpenAI has ChatGPT’s user base but depends on a multicloud infrastructure strategy and manufactures no custom silicon. [Meta’s $145 billion GPU pivot](https://airisingtrends.com/meta-ai-layoffs-145-billion-gpu-pivot/) gives it models and distribution through Instagram and WhatsApp but no cloud business and no custom inference chips in production. NVIDIA owns the silicon layer but builds neither models nor consumer products.

Google’s $180 to $190 billion in 2026 capex (a 6x increase from $31 billion in 2022) funds this full-stack position. The 8.5 million developers building on Google models monthly, the 375+ Cloud customers each processing over one trillion tokens in the past year, and the consumer products that double as distribution channels for Gemini all feed into the same flywheel: more usage generates more training data, better models attract more users, and cheaper inference expands the addressable market.

One additional signal from I/O worth noting: OpenAI, Kakao, and ElevenLabs are now adopting SynthID, Google’s watermarking technology for AI-generated content. When your competitors voluntarily adopt your standard, you are no longer just competing on products. You are setting infrastructure.

The risk for Google is execution complexity. Owning every layer means competing on every layer simultaneously. But after I/O 2026, arguing that Google is behind in AI requires ignoring the largest infrastructure build in corporate history, the fastest frontier model per dollar, the most widely distributed AI product suite on earth, and a cloud business whose 63% growth rate exceeds those of its two largest competitors combined.

At $190 billion a year, Google is not placing a bet on AI. It is making AI the entire company.
