Jensen Huang walked onto the GTC 2026 stage in his usual leather jacket and delivered what might be the clearest articulation of NVIDIA’s long-term bet: the physical world is the next GPU. Not data centers. Not software. Actual robots, factories, autonomous vehicles, and industrial systems that perceive, reason, and act in the real world. The company that built the infrastructure for the large language model era is now positioning itself as the foundational layer for what it calls Physical AI — and at GTC this March, it showed up with a lot more than slide decks.
This isn’t just about robots being cool. It’s about a specific thesis: that the bottleneck for the next wave of AI value creation isn’t compute or models — it’s the physical world’s integration with intelligent systems. NVIDIA is placing a very large, very deliberate bet that it can own that integration layer the same way it came to own GPU compute. Whether that plays out is genuinely uncertain. But the pieces it announced at GTC 2026 are worth understanding in detail, because they reveal how serious and how infrastructural this push actually is.
What Physical AI Actually Means (And Why NVIDIA Is Saying It Now)
The term “Physical AI” as NVIDIA uses it refers to AI systems that perceive, process, and act within physical environments — not just generating text or analyzing images, but controlling robots, managing factory floors, coordinating autonomous systems in real time. It’s the difference between a model that can describe how to weld a joint and a model that can guide a robotic arm to do it.
NVIDIA’s angle is infrastructure. It’s not building the robots itself (mostly). It’s building the models, simulation environments, and deployment runtimes that robot makers, industrial companies, and OEMs sit on top of. Two announcements at GTC 2026 represent the clearest expression of this strategy: the Cosmos models and the GR00T open models for humanoid robots.
Cosmos is a suite of world foundation models — models trained to understand and simulate physical environments. Think of it as a way for robotics developers to train and test AI systems in simulated physical scenarios before deploying them in the real world, dramatically compressing development cycles. GR00T (Generalist Robot 00 Technology, if you’re wondering) is an open model family specifically aimed at humanoid robot developers. The “open” part matters here: NVIDIA is making a deliberate Linux-style play, lowering the barrier to entry for robot AI development to pull the ecosystem toward its hardware and simulation stack.
Jensen Huang at GTC explicitly drew that comparison when describing OpenClaw, the open source runtime underlying its new enterprise agent platform: “OpenClaw gave us exactly what we needed at exactly the right time… like Linux, like Kubernetes.” That’s not accidental phrasing. NVIDIA understands what it means to become critical infrastructure.
NemoClaw and the Enterprise Agent Stack
Before getting deeper into robots, it’s worth unpacking one of the most consequential announcements at GTC that isn’t getting enough attention: NemoClaw.
NemoClaw is NVIDIA’s enterprise-grade AI agent platform, built on top of OpenClaw. If OpenClaw is the open runtime — the Linux layer — then NemoClaw is the Red Hat: hardened, enterprise-ready, with security guardrails, privacy enforcement, and policy controls that large companies actually need before deploying agents in production. It integrates with NVIDIA NeMo, the broader AI agent software suite, giving enterprises a coherent stack from model to deployment.
What’s notable is what NemoClaw doesn’t require: NVIDIA GPUs. The platform is hardware agnostic. That’s a significant strategic move. NVIDIA is essentially saying: even if you’re running on AMD, Intel, or cloud-provider silicon, we want NemoClaw to be the agentic infrastructure layer. The GPU lock-in play is real, but NVIDIA is smart enough to know that mandating hardware kills enterprise adoption. You get them on the software, and the hardware follows.
The partner list at GTC makes the enterprise ambition clear: Adobe, Atlassian, Cisco, CrowdStrike, SAP, Salesforce, ServiceNow, and Siemens were all on stage or listed as integration partners. These aren’t hobbyist integrations — these are the companies running the operational backbone of large organizations. When SAP and ServiceNow are building on your agent runtime, you’re not a demo company anymore.
Also announced: the NVIDIA AI-Q Blueprint, an agentic search capability that NVIDIA claims tops the DeepResearch Bench accuracy leaderboards. The Agent Toolkit rounds things out as an open source collection of models and software for enterprise agent builders. Between OpenClaw (the open source runtime for self-evolving agents), NemoClaw, AI-Q, and the Agent Toolkit, NVIDIA is assembling a full-stack answer to the question: “How do enterprises actually deploy agents safely at scale?”
Nemotron 3 Super: The Model That Quietly Outperforms
Buried under the robotics announcements was a model release that deserves a closer look. Nemotron 3 Super, announced March 11, 2026, is a 120 billion total parameter model with only 12 billion active parameters at inference time. It uses a hybrid Mixture-of-Experts architecture — specifically with a novel routing approach called LatentMoE — and was pretrained natively in NVFP4, a low-precision format that significantly reduces memory and compute requirements without the typical accuracy tradeoffs.
The benchmark numbers are worth stating plainly:
| Benchmark | Nemotron 3 Super | GPT-OSS-120B |
|---|---|---|
| SWE-Bench Verified (coding) | 60.47% | 41.90% |
| RULER at 1M tokens (long context) | 91.75% | 22.30% |
| Throughput vs GPT-OSS-120B | 2.2x higher | Baseline |
The long-context number is the one that stands out most. 91.75% on RULER at one million tokens versus 22.30% for GPT-OSS at the same parameter count is not a marginal improvement — that’s a qualitatively different capability class. For applications that need to reason over long documents, codebases, or extended agentic task histories, that gap is operationally significant.
The model is open weights with an open training recipe, which means the research community can build on it, fine-tune it, and audit it. Combined with the 2.2x throughput advantage, this positions Nemotron 3 Super as a serious enterprise deployment option — not just a benchmark trophy. The LatentMoE routing and native NVFP4 pretraining are the technical innovations driving this; both are novel enough that they’ll likely influence how other labs approach efficiency-focused MoE design in the coming months.
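NVIDIA hasn’t published the LatentMoE routing details, but the basic mechanism behind “120B total, 12B active” is standard sparse Mixture-of-Experts routing. Here is a toy NumPy sketch of top-k expert routing; the dimensions, the router, and the 2-of-20 expert count are illustrative stand-ins, not Nemotron’s actual architecture:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k of N experts (toy sketch).

    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) weight matrices, one per expert.
    """
    logits = x @ gate_w                        # router score per expert
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over selected experts only
    # Only k expert matmuls run, so active params << total params.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 20
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

y = moe_forward(x, gate_w, experts, k=2)
total_params = n_experts * d * d
active_params = 2 * d * d                      # only the routed experts are touched
print(y.shape, active_params / total_params)   # 2 of 20 experts: 10% active
```

The point of the sketch: the router selects a small subset of experts per token, so only their weights participate in the forward pass. That is how a 120B-parameter model can run at roughly the inference cost of a 12B dense model.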
The Vera Rubin Platform: Betting on Trillion-Parameter Models
NVIDIA also officially detailed the Vera Rubin platform at GTC 2026, centered on H300
How to Actually Access NVIDIA Cosmos Models Today
Cosmos isn’t vaporware. The models are available right now through NVIDIA’s NGC catalog and Hugging Face, and you don’t need special enterprise access to start experimenting. Here’s exactly how to get in.
Step 1: Choose Your Access Path
There are two realistic entry points depending on what you want to do:
- Hugging Face: NVIDIA hosts Cosmos model weights at `nvidia/Cosmos-1.0-Diffusion-7B-Video2World` and related repos on Hugging Face. You can pull these directly if you accept the license terms. Search “nvidia cosmos” on Hugging Face and you’ll find the full model family.
- NVIDIA NGC Catalog: Go to `catalog.ngc.nvidia.com`, search “Cosmos,” and you’ll find containerized versions ready to run on NVIDIA hardware. This is the cleaner path if you’re deploying on an A100, H100, or even a high-end RTX 4090.
Step 2: Understand What You Actually Need
Cosmos is not a lightweight model. Be honest with yourself about the hardware requirements before you start:
- The 7B diffusion model needs at minimum a single A100 80GB for reasonable inference speed. On an RTX 4090 (24GB), you can run smaller variants with quantization, but generation will be slow.
- The 14B variants require multi-GPU setups or cloud instances. NVIDIA’s own recommendation is H100 SXM for production workloads.
- If you don’t have the hardware, the fastest honest path is a cloud instance — Lambda Labs, CoreWeave, or AWS p4de nodes all work. Budget roughly $3–8 per hour depending on the instance.
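Before renting anything, a weights-only back-of-envelope check helps. This helper is a rough sketch, not an NVIDIA tool, and it ignores activation and KV-cache memory; for video diffusion models activations can dominate, which is why the practical minimums above are higher than the raw weight math suggests:

```python
def fits_in_vram(params_billions, vram_gb, bytes_per_param=2.0, overhead=1.2):
    """Rough weights-only fit check: bf16/fp16 = 2 bytes per parameter,
    plus ~20% headroom. A sketch, not a guarantee."""
    needed_gb = params_billions * bytes_per_param * overhead
    return needed_gb <= vram_gb, round(needed_gb, 1)

# 7B weights in fp16 on a 24 GB RTX 4090: ~16.8 GB, so weights alone fit
print(fits_in_vram(7, 24))                        # (True, 16.8)
# 14B in fp16 on the same card: ~33.6 GB, no fit without quantization
print(fits_in_vram(14, 24))                       # (False, 33.6)
# 14B at 4-bit (0.5 bytes/param): ~8.4 GB
print(fits_in_vram(14, 24, bytes_per_param=0.5))  # (True, 8.4)
```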
Step 3: What You Can Actually Do With It
Cosmos models do one core thing: given a real or simulated scene, they generate physically plausible continuations of that scene. In practice, this means:
- Synthetic training data generation: You feed in a starting frame or prompt describing a physical scenario — a robot arm reaching for an object on a conveyor belt — and Cosmos generates video of that scenario playing out. That video becomes labeled training data for your robot policy without requiring physical trials.
- World state prediction: You can use Cosmos as a forward model — given the current state of an environment, predict what happens next under different actions. This is the core loop for model-based reinforcement learning in robotics.
- Simulation-to-real transfer testing: Generate synthetic scenarios that stress-test edge cases your real-world dataset doesn’t cover.
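Cosmos’s actual inference API isn’t shown here, but the forward-model idea is worth making concrete. In this sketch, `world_model` is a hypothetical stand-in (toy one-dimensional dynamics) for a learned predictor like Cosmos; the imagine-then-act structure is the part that carries over to model-based RL:

```python
def world_model(state, action):
    """Hypothetical stand-in for a learned forward model (e.g. Cosmos):
    given the current state and a candidate action, predict the next state.
    Toy dynamics: the state drifts halfway toward the action value."""
    return state + 0.5 * (action - state)

def plan(state, goal, candidate_actions, horizon=5):
    """Pick the action whose imagined rollout ends closest to the goal.
    Imagining rollouts in the model instead of the real world is the
    core loop of model-based reinforcement learning."""
    def rollout_cost(action):
        s = state
        for _ in range(horizon):
            s = world_model(s, action)   # imagine, don't execute
        return abs(s - goal)
    return min(candidate_actions, key=rollout_cost)

best = plan(state=0.0, goal=1.0, candidate_actions=[-1.0, 0.0, 0.5, 1.0, 2.0])
print(best)  # 1.0: the imagined rollout under action 1.0 converges on the goal
```

Swap the toy dynamics for a learned video world model and the same loop becomes: propose robot actions, predict the resulting scenes, and pick the action whose predicted outcome scores best.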
Step 4: A Minimal Working Starting Point
If you want to run something this week, here’s the shortest path:
- Create an account on Hugging Face if you don’t have one.
- Accept the Cosmos model license at `huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World`.
- Clone the NVIDIA Cosmos GitHub repo at `github.com/NVIDIA/Cosmos`; it includes inference scripts and environment setup instructions.
- Follow the README to set up the Conda environment. The dependency list is specific; don’t skip the CUDA version requirements.
- Run the provided sample inference script with the default prompt first before trying custom scenarios. This confirms your environment is working before you start debugging your own inputs.
NVIDIA’s Cosmos GitHub repo also links to a set of example prompts designed for robotics scenarios. Start there rather than writing your own from scratch — the prompt structure for physically coherent outputs is less forgiving than typical image generation models.
Using Isaac Lab for Robot Simulation: A Real Starting Point
Isaac Lab is NVIDIA’s open-source robot learning framework built on top of Isaac Sim. It’s the practical layer where most robotics developers will actually spend their time — training robot policies in simulation before touching real hardware. It’s genuinely usable today, though with real rough edges worth knowing about upfront.
What Isaac Lab Is and Isn’t
Isaac Lab is not a drag-and-drop robot builder. It’s a Python-based framework for defining robot environments, reward functions, and training loops using reinforcement learning. It integrates with popular RL libraries like RSL-RL and RL Games, and it uses Isaac Sim as the underlying physics engine — which means GPU-accelerated physics simulation, which is the actual point. You can run thousands of parallel simulation instances on a single H100, compressing weeks of training into hours.
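That weeks-to-hours compression is mostly arithmetic on parallelism. The numbers below are assumed for illustration (50 policy steps per second and 60 Hz physics are not measured Isaac Lab figures):

```python
def sim_hours_per_wallclock_hour(num_envs, steps_per_sec, sim_dt=1/60):
    """Experience gathered per real hour: each of num_envs parallel
    environments advances sim_dt simulated seconds per policy step."""
    sim_seconds = num_envs * steps_per_sec * sim_dt * 3600
    return sim_seconds / 3600  # simulated hours per wall-clock hour

# 4,096 parallel envs at an assumed 50 steps/sec with 60 Hz physics:
print(round(sim_hours_per_wallclock_hour(4096, 50)))  # ~3413 sim-hours per hour
```

Even with generous slack in those assumptions, thousands of simulated hours per real hour is what turns RL training from a weeks-long job into an overnight one.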
What it isn’t: a finished product. Documentation has gaps. Some features listed in the README are still being actively developed. Plan for setup friction.
Getting Started With Isaac Lab
- Check prerequisites first. Isaac Lab requires Isaac Sim 4.x, which itself requires an NVIDIA GPU (RTX 3070 minimum, RTX 4090 or A100 recommended for serious training), Ubuntu 20.04 or 22.04, and specific CUDA versions. Windows support exists but the Linux path is significantly smoother.
- Install Isaac Sim. Get it through the NVIDIA Omniverse Launcher or directly via pip using the Isaac Sim Python package. The pip path (`pip install isaacsim`) is newer and generally cleaner for headless training workflows.
- Clone the Isaac Lab repo. It’s at `github.com/isaac-sim/IsaacLab`. Run the install script (`./isaaclab.sh --install`), which sets up the conda environment and links to your Isaac Sim installation.
- Run a reference task first. Isaac Lab ships with pre-built environments for standard tasks — quadruped locomotion, manipulator reach tasks, cartpole balancing. Run one of these before building anything custom. The command looks like `python scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Ant-v0 --headless`. The `--headless` flag skips the GUI and runs significantly faster.
- Understand the task definition structure. Every environment in Isaac Lab is defined by a Python config class specifying the robot URDF or USD asset, the observation space, the action space, and the reward function. The repo’s source directory contains dozens of working examples to dissect. The cartpole and reach tasks are the clearest starting templates.
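Schematically, a task definition ties together the asset, the observation and action spaces, and the reward weights. This plain-Python dataclass shows the shape of such a config; the field names are illustrative, not Isaac Lab’s actual configclass API:

```python
from dataclasses import dataclass, field

@dataclass
class TaskConfig:
    """Schematic of an Isaac Lab-style task definition.
    Field names are illustrative, not the real API."""
    robot_asset: str                      # path to a URDF/USD robot file
    num_envs: int = 4096                  # parallel simulation instances
    observations: list = field(
        default_factory=lambda: ["joint_pos", "joint_vel"])
    actions: str = "joint_position_targets"
    reward_terms: dict = field(default_factory=lambda: {
        "upright_bonus": 1.0,             # weight per reward term
        "action_penalty": -0.01,
    })

cfg = TaskConfig(robot_asset="cartpole.usd")
print(cfg.num_envs, cfg.reward_terms["upright_bonus"])  # 4096 1.0
```

The real framework adds physics, sensor, and curriculum settings on top, but the mental model is the same: one config object fully describes the environment the trainer instantiates thousands of times in parallel.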
What a Real Training Loop Looks Like
Say you want to train a robot arm to pick up an object. In Isaac Lab, you define a task config that specifies the Franka Panda arm asset (included in the library), the object’s initial position distribution, an observation vector that includes joint positions, end-effector pose, and object pose, and a reward function that gives positive signal for end-effector proximity to the object and a bonus for successful grasp. You then run that environment with 4,096 parallel instances on a single A100. A basic reach policy converges in under an hour of wall-clock time. A manipulation policy with grasping typically needs 4–8 hours depending on reward shaping quality.
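That reward structure (dense proximity shaping plus a sparse grasp bonus) is worth writing out. This is a generic sketch, not code from the Isaac Lab library; the exponential shaping and the weights are illustrative choices:

```python
import math

def reach_grasp_reward(ee_pos, obj_pos, grasped,
                       dist_scale=5.0, grasp_bonus=10.0):
    """Dense proximity term plus sparse grasp bonus.
    ee_pos / obj_pos: (x, y, z) tuples; grasped: bool from contact sensing."""
    dist = math.dist(ee_pos, obj_pos)
    proximity = math.exp(-dist_scale * dist)  # 1.0 at contact, decays with distance
    return proximity + (grasp_bonus if grasped else 0.0)

# Far from the object, no grasp: near-zero reward
print(reach_grasp_reward((0, 0, 0.5), (0.4, 0.0, 0.0), False))
# Touching and grasping: proximity 1.0 plus the bonus
print(reach_grasp_reward((0.4, 0, 0), (0.4, 0.0, 0.0), True))  # 11.0
```

The shaping quality mentioned above lives in choices like `dist_scale`: too sharp and the policy gets no gradient signal far from the object, too flat and it loiters near the object without ever earning the grasp bonus.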
The GR00T N1 open model from NVIDIA is designed to slot into this workflow as a pre-trained backbone — rather than training from scratch, you fine-tune GR00T on your specific task using Isaac Lab-generated data. The GR00T model weights and fine-tuning code are available at github.com/NVIDIA/Isaac-GR00T, with documentation covering the data format expected and example fine-tuning scripts.
Honest Assessment: What’s Ready vs. What’s Not
| Capability | Status Today |
|---|---|
