At GTC 2026, Jensen Huang announced a lot of things. Physical AI, humanoid robots, next-generation GPU platforms. The usual Jensen spectacle. But tucked inside the keynote was something that deserves more attention than it’s getting: Nemotron 3 Super, a 120B parameter open-weights model that hits 60.47% on SWE-Bench Verified — compared to GPT-OSS’s 41.90% on the same benchmark. That’s not a marginal improvement. That’s a 44% relative gap on one of the most credible coding benchmarks we have. And it runs on 12B active parameters thanks to a hybrid Mixture-of-Experts architecture, meaning the compute cost to actually use it is dramatically lower than the raw parameter count suggests. If you’re building software, evaluating open models, or thinking about where enterprise AI is headed in 2026, this one’s worth understanding properly.
What Nemotron 3 Super Actually Is
Nemotron 3 Super was announced on March 11, 2026. The headline numbers: 120B total parameters, 12B active at inference time, open weights, open training recipe. That last part matters as much as the benchmarks: NVIDIA isn't just releasing a model, they're releasing the recipe used to build it, which means the research community can learn from it, adapt it, and build on top of it.
The architecture has two genuinely novel elements worth paying attention to:
- LatentMoE routing: A new approach to how the model selects which “experts” to activate for a given input. Traditional MoE routing decides which expert to use based on the token representation directly. LatentMoE operates in a compressed latent space, which NVIDIA claims improves both routing efficiency and the quality of expert specialization. Whether this becomes an industry standard or an NVIDIA-specific innovation remains to be seen, but it’s a real architectural contribution, not a marketing label.
- Native NVFP4 pretraining: Most models are pretrained in higher-precision formats (BF16, FP8) and then quantized down for deployment. Nemotron 3 Super was pretrained natively in NVFP4 — a 4-bit floating point format. Training at low precision without sacrificing quality is a hard problem, and if NVIDIA has actually solved it cleanly here, that has significant implications for training cost and accessibility going forward.
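Public details on LatentMoE are thin, but the core idea of routing in a compressed space can be sketched in a few lines of NumPy. Everything below (the dimensions, `W_down`, `W_route`, softmax gating over the top-k) is an illustrative guess at the mechanism, not NVIDIA's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 512, 64, 16, 2

# Hypothetical router weights. A conventional MoE router would score experts
# with a single (d_model x n_experts) matrix applied to the hidden state h;
# here we first compress h into a small latent space and route there.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_route = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)

def latent_route(h):
    """Pick top-k experts by scoring in the compressed latent space."""
    z = h @ W_down                      # (d_latent,) latent representation
    logits = z @ W_route                # (n_experts,) routing scores
    top = np.argsort(logits)[-top_k:]   # indices of the k best-scoring experts
    w = np.exp(logits[top] - logits[top].max())  # stable softmax over top-k
    return top, w / w.sum()             # expert ids + normalized gate weights

h = rng.standard_normal(d_model)        # one token's hidden state
experts, gates = latent_route(h)
```

The routing matrix shrinks from `d_model x n_experts` to `d_latent x n_experts`, which is where the claimed efficiency would come from; the specialization claim would depend on how the compression is trained.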
Both of these are worth watching not just because of what they do for this model, but because they signal where NVIDIA thinks model architecture is going.
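To make the precision point concrete: a 4-bit e2m1 float can represent only eight magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6), and 4-bit block formats recover dynamic range by attaching a shared scale factor to each small block of values. The sketch below simulates that storage scheme in NumPy; it illustrates the format's granularity only, not NVIDIA's training recipe, and the block size and scaling rule are assumptions:

```python
import numpy as np

# Magnitudes representable by a 4-bit e2m1 float (sign handled separately).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_fp4_quantize(x, block=16):
    """Round each block of values to the nearest e2m1 magnitude after scaling
    the block so its largest magnitude maps to 6.0 (the e2m1 maximum).
    This simulates only the storage format; real low-precision pretraining
    also needs scaled matmuls, careful accumulation, rounding strategies, etc."""
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        scale = np.abs(chunk).max() / E2M1[-1] or 1.0   # avoid divide-by-zero
        idx = np.abs(np.abs(chunk)[:, None] / scale - E2M1).argmin(axis=1)
        out[i:i + block] = np.sign(chunk) * E2M1[idx] * scale
    return out

q = fake_fp4_quantize(np.linspace(-1.0, 1.0, 32))  # 32 values -> two 16-value blocks
```

Running a few numbers through this makes the challenge obvious: within a block, everything snaps to one of eight magnitudes, which is why training natively at this precision without quality loss is a genuinely hard result if it holds up.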
The Benchmark Numbers That Matter
Benchmarks are imperfect. Everyone in the field knows this. Andrej Karpathy has been vocal about the gap between benchmark performance and real-world usefulness, and he’s right to be skeptical. But SWE-Bench Verified is one of the more honest benchmarks we have for coding — it tests a model’s ability to resolve real GitHub issues from real open-source repositories, with verified solutions. It’s hard to game in the way that multiple-choice knowledge benchmarks can be.
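The evaluation loop behind SWE-Bench is simple to state: apply the model's patch to the repository, then check that the issue's failing tests now pass and that the previously passing tests still do. Here is a toy version of that loop; the function names and the one-file "repository" are invented for illustration, not the real harness API:

```python
def evaluate(repo, patch, fail_to_pass, pass_to_pass):
    """repo/patch: dicts of filename -> source. Each test is a callable that
    takes the patched file set and returns True on pass."""
    patched = {**repo, **patch}                            # apply the model's edits
    resolved = all(t(patched) for t in fail_to_pass)       # is the bug actually fixed?
    no_regressions = all(t(patched) for t in pass_to_pass) # did anything break?
    return resolved and no_regressions

# A one-file "repository" with an off-by-one bug, and a candidate patch.
repo = {"calc.py": "def add(a, b):\n    return a + b + 1\n"}
patch = {"calc.py": "def add(a, b):\n    return a + b\n"}

def run_add(files, a, b):
    ns = {}
    exec(files["calc.py"], ns)   # "run" the repository's code
    return ns["add"](a, b)

fail_to_pass = [lambda f: run_add(f, 2, 2) == 4]                 # fails before the patch
pass_to_pass = [lambda f: isinstance(run_add(f, 0, 0), int)]     # passes before and after

assert evaluate(repo, patch, fail_to_pass, pass_to_pass)   # patch resolves the issue
assert not evaluate(repo, {}, fail_to_pass, pass_to_pass)  # unpatched repo still fails
```

What makes this hard to game is the last step: the model doesn't pick from options, it has to produce an edit that real test suites accept.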
| Benchmark | Nemotron 3 Super | GPT-OSS-120B |
|---|---|---|
| SWE-Bench Verified | 60.47% | 41.90% |
| RULER @ 1M tokens | 91.75% | 22.30% |
| Inference throughput | 2.2x higher | Baseline |
| Active parameters | 12B | 120B (dense) |
The RULER number is striking in a different way. RULER tests long-context retrieval and reasoning: whether a model can actually use information buried deep in a long input, not just pretend to. At one million tokens, Nemotron 3 Super scores 91.75%; GPT-OSS manages 22.30%. That’s not a benchmark where you’d expect such a cliff. It suggests something meaningful about how Nemotron handles long contexts, likely a combination of architectural choices and training data, but it also raises a question worth asking: is GPT-OSS simply not designed to operate effectively at 1M-token contexts? Either way, the comparison is operationally relevant: if you need million-token context and you’re evaluating open models, this gap should drive the decision.
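For intuition about what RULER-style probes measure, here is a minimal single-needle retrieval harness. The real suite is much broader (multi-needle retrieval, variable tracking, aggregation), and every name below is illustrative:

```python
import random

def make_probe(n_filler=2000, depth=0.5, seed=0):
    """Return (prompt, expected_answer) for one retrieval probe: a fact (the
    'needle') buried at a chosen depth inside repetitive filler text."""
    rng = random.Random(seed)
    key = rng.randint(10_000, 99_999)
    filler = ["The sky is blue today."] * n_filler
    filler.insert(int(depth * n_filler), f"The magic number is {key}.")
    prompt = " ".join(filler) + "\nQ: What is the magic number?"
    return prompt, str(key)

def score(answer_fn, depths=(0.0, 0.5, 0.99)):
    """Fraction of probes where the model's answer contains the buried fact."""
    probes = [make_probe(depth=d, seed=i) for i, d in enumerate(depths)]
    return sum(expected in answer_fn(p) for p, expected in probes) / len(probes)

# A stand-in "model" that has perfect access to its context.
perfect = lambda prompt: prompt
print(score(perfect))  # 1.0
```

A 91.75% vs 22.30% gap on tests of this shape means one model reliably finds the needle at 1M tokens and the other mostly doesn't.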
The throughput advantage — 2.2x higher than GPT-OSS-120B — follows logically from the MoE architecture. When only 12B parameters are active per forward pass instead of all 120B, you move more tokens per second per unit of compute. For anyone running inference at scale, that’s a cost multiplier that compounds fast.
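The back-of-envelope arithmetic, using the standard ~2 FLOPs-per-active-parameter approximation for decoding, shows why sparsity compounds. Note that the theoretical FLOP gap is far larger than the observed 2.2x, a reminder that real throughput is also governed by memory bandwidth, batching, and kernel efficiency:

```python
# Decode cost per token scales with ACTIVE parameters, not total parameters.
# Figures taken from the comparison table above (GPT-OSS-120B treated as dense).
active_nemotron = 12e9
active_gpt_oss = 120e9

flops_per_token = lambda params: 2 * params          # rough decode approximation
ratio = flops_per_token(active_gpt_oss) / flops_per_token(active_nemotron)
print(ratio)  # 10.0 -> a 10x FLOP gap, versus the observed 2.2x throughput gain
```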
Where This Fits in NVIDIA’s Bigger GTC Story
Nemotron 3 Super didn’t arrive in isolation. Jensen Huang’s GTC keynote was structured around a broader argument: that NVIDIA is building a complete software and infrastructure stack for the agentic AI era, not just selling GPUs. Several pieces of that stack are directly relevant to how Nemotron gets deployed in practice.
NemoClaw is the enterprise layer — built on OpenClaw, with added security, privacy guardrails, and policy enforcement. Notably, it’s hardware agnostic. NVIDIA is explicitly not requiring you to run on NVIDIA GPUs to use their software stack, which is a meaningful posture shift. It integrates with NVIDIA NeMo, their AI agent software suite. Think of NemoClaw as the enterprise wrapper that makes it feasible for a company like Salesforce or SAP — both announced as GTC partners — to deploy agentic systems without their security team having a breakdown.
OpenShell is NVIDIA’s open-source runtime for what they’re calling “self-evolving agents and claws,” with safety and security built in. Jensen drew the Linux and Kubernetes comparison directly: “OpenClaw gave us exactly what we needed at exactly the right time… like Linux, like Kubernetes.” That’s a deliberate framing: positioning OpenClaw/OpenShell as infrastructure the industry builds on top of, the way developers stopped arguing about operating systems and container orchestration and just used the standard thing.
NVIDIA AI-Q Blueprint is their agentic search product, which they claim tops the DeepResearch Bench accuracy leaderboard. Aravind Srinivas at Perplexity has made the case that search plus reasoning is the core use case of the current AI era — NVIDIA seems to agree, and they’re building directly into that space.
The enterprise partner list — Adobe, Atlassian, Cisco, CrowdStrike, SAP, Salesforce, ServiceNow, Siemens — tells you who NVIDIA is selling this stack to. These aren’t startups experimenting with open models. These are companies that need compliance, audit trails, security controls, and SLAs. That’s the market NemoClaw is designed for.
