Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer MoE architecture and supports a 1M-token context window.

Reasoning, Tool Use, Implicit Caching

index.ts

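import { streamText } from 'ai'

// Start a streaming completion; the model is addressed by its gateway slug.
const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives (top-level await assumes an ES module).
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}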


Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
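As a rough illustration of setting a provider preference, the sketch below builds an ordered routing object. It assumes the gateway reads an ordered provider list from `providerOptions.gateway.order`, which is how the AI SDK's gateway provider is documented to work; verify this against the docs before relying on it. The helper `preferProviders` and the slugs `'deepinfra'` and `'baseten'` are hypothetical placeholders, so copy the real slugs from the table.

```typescript
// Hypothetical slugs for illustration; copy the real ones from the Providers table.
type ProviderSlug = string

interface GatewayRouting {
  gateway: { order: ProviderSlug[] }
}

// Build routing options: providers are tried in the order given,
// with duplicate slugs dropped (first occurrence wins).
function preferProviders(...slugs: ProviderSlug[]): GatewayRouting {
  const unique = slugs.filter((slug, i) => slugs.indexOf(slug) === i)
  return { gateway: { order: unique } }
}

const providerOptions = preferProviders('deepinfra', 'baseten', 'deepinfra')

// Assumed usage with the AI SDK (not executed here):
// streamText({
//   model: 'nvidia/nemotron-3-ultra-550b-a55b',
//   prompt: 'Why is the sky blue?',
//   providerOptions,
// })
```

Keeping the preference in a small helper makes it easy to reorder providers by price or latency without touching the call site.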

Provider	Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	ZDR	No Training	Release Date
Together AI		0.2s	214tps	$0.60/M	$3.60/M	Read: $0.2/M, Write: —	—					06/04/2026
DeepInfra	262K	0.5s	263tps	$0.50/M	$2.50/M	Read: $0.15/M, Write: —	—					06/04/2026
Blackbox				$0.37/M	$1.08/M	Read: $0.14/M, Write: —	—					06/04/2026
Baseten		0.2s		$0.60/M	$2.40/M	Read: $0.12/M, Write: —	—					06/04/2026
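Because the prices above are quoted per million tokens, a per-request cost can be estimated with simple arithmetic. The sketch below copies the listed input, output, and cache-read prices. It assumes cached input tokens are billed at the cache-read rate instead of the input rate, which the listing does not confirm. The names `pricing` and `estimateCostUSD` are illustrative only.

```typescript
interface Pricing {
  inputPerM: number     // USD per 1M input tokens
  outputPerM: number    // USD per 1M output tokens
  cacheReadPerM: number // USD per 1M cached input tokens
}

// Prices copied from the provider listing.
const pricing = {
  togetherAI: { inputPerM: 0.6, outputPerM: 3.6, cacheReadPerM: 0.2 },
  deepInfra: { inputPerM: 0.5, outputPerM: 2.5, cacheReadPerM: 0.15 },
  blackbox: { inputPerM: 0.37, outputPerM: 1.08, cacheReadPerM: 0.14 },
  baseten: { inputPerM: 0.6, outputPerM: 2.4, cacheReadPerM: 0.12 },
}

// Assumption: cached input tokens are charged at the cache-read rate
// and the remaining input tokens at the regular input rate.
function estimateCostUSD(
  p: Pricing,
  inputTokens: number,
  outputTokens: number,
  cachedInputTokens: number = 0
): number {
  const freshInputTokens = inputTokens - cachedInputTokens
  const totalMicroUSD =
    freshInputTokens * p.inputPerM +
    cachedInputTokens * p.cacheReadPerM +
    outputTokens * p.outputPerM
  return totalMicroUSD / 1e6
}

// 100,000 input tokens (40,000 of them cached) and 2,000 output tokens on DeepInfra:
// (60,000 * 0.50 + 40,000 * 0.15 + 2,000 * 2.50) / 1,000,000, roughly $0.041.
const exampleCost = estimateCostUSD(pricing.deepInfra, 100_000, 2_000, 40_000)
```

The same function can be run over every entry in `pricing` to compare providers for a given workload, since the cheapest input price does not always give the cheapest total once output tokens dominate.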
