Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer MoE architecture and supports a 1M-token context window.

Reasoning, Tool Use, Implicit Caching
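index.ts
import { streamText } from 'ai'

// Stream a completion from the model through the gateway.
const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?',
})

// Print the text as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}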

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
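As a rough sketch of what setting a provider preference could look like in code: the helper below builds an options object with an ordered list of provider slugs. It assumes the gateway reads preferences from `providerOptions.gateway.order`, which this page does not confirm. The helper name `preferProviders` and the slugs `provider-slug-a` and `provider-slug-b` are hypothetical placeholders; copy the real slugs from the provider list.

```typescript
// Assumed shape of the routing options; verify it against the gateway docs.
interface GatewayRoutingOptions {
  providerOptions: { gateway: { order: string[] } };
}

// Hypothetical helper: turns an ordered list of provider slugs into options
// that can be spread into a streamText() call.
function preferProviders(slugs: string[]): GatewayRoutingOptions {
  if (slugs.length === 0) {
    throw new Error('Provide at least one provider slug');
  }
  // Copy the array so later changes by the caller do not affect the options.
  return { providerOptions: { gateway: { order: [...slugs] } } };
}

// Placeholder slugs, not real ones.
const routing = preferProviders(['provider-slug-a', 'provider-slug-b']);
```

Under the same assumption, the result would be passed along with the model and prompt, for example `streamText({ model, prompt, ...routing })`.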

| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | ZDR | No Training | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Together AI | 1M | 0.2s | 214tps | $0.60/M | $3.60/M | $0.2/M | | | +1 | | | 06/04/2026 |
| DeepInfra | 262K | 0.5s | 263tps | $0.50/M | $2.50/M | $0.15/M | | | +1 | | | 06/04/2026 |
| Blackbox | 1M | | | $0.37/M | $1.08/M | $0.14/M | | | +1 | | | 06/04/2026 |
| Baseten | 1M | 0.2s | | $0.60/M | $2.40/M | $0.12/M | | | +1 | | | 06/04/2026 |
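To compare providers for a given workload, the per-million-token prices listed above can be turned into a per-request estimate. The sketch below is illustrative only: the names `PRICING`, `estimateCostUsd`, and `cheapestProvider` are made up here, and it assumes that cached input tokens are billed at the cache read rate instead of the input rate, which this page does not spell out.

```typescript
// USD prices per million tokens, copied from the provider listing.
interface ProviderPricing {
  provider: string;
  inputPerM: number;
  outputPerM: number;
  cacheReadPerM: number;
}

const PRICING: ProviderPricing[] = [
  { provider: 'Together AI', inputPerM: 0.6, outputPerM: 3.6, cacheReadPerM: 0.2 },
  { provider: 'DeepInfra', inputPerM: 0.5, outputPerM: 2.5, cacheReadPerM: 0.15 },
  { provider: 'Blackbox', inputPerM: 0.37, outputPerM: 1.08, cacheReadPerM: 0.14 },
  { provider: 'Baseten', inputPerM: 0.6, outputPerM: 2.4, cacheReadPerM: 0.12 },
];

// Estimate the cost of one request in USD.
// Assumption: cached input tokens are charged at the cache read rate only.
function estimateCostUsd(
  p: ProviderPricing,
  inputTokens: number,
  outputTokens: number,
  cachedInputTokens: number = 0,
): number {
  if (cachedInputTokens > inputTokens) {
    throw new Error('cachedInputTokens cannot exceed inputTokens');
  }
  const freshInputTokens = inputTokens - cachedInputTokens;
  const total =
    freshInputTokens * p.inputPerM +
    cachedInputTokens * p.cacheReadPerM +
    outputTokens * p.outputPerM;
  return total / 1_000_000;
}

// Return the provider with the lowest estimated cost for a workload.
function cheapestProvider(inputTokens: number, outputTokens: number): ProviderPricing {
  return PRICING.reduce((best, p) =>
    estimateCostUsd(p, inputTokens, outputTokens) <
    estimateCostUsd(best, inputTokens, outputTokens)
      ? p
      : best,
  );
}
```

Price is only one axis: the listing also shows different context limits (262K for DeepInfra versus 1M elsewhere) and latency and throughput figures, so a long-context or latency-sensitive workload may rule out the cheapest option.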