Models.do

Llama-3.3-Nemotron-Super-49B-v1 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements. This allows the model to support a context length of up to 128K tokens and fit efficiently on single high-performance GPUs, such as NVIDIA H200.

Note: you must include detailed thinking on in the system prompt to enable reasoning. Please see Usage Recommendations for more.

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
Nebius	nebiusAiStudio	131K	-	$0.13/M	$0.40/M	42.7 t/s	1714 ms

Provider

Model ID

Context

Max Output

Input Cost

Output Cost

Throughput

Latency

Nebius

nebiusAiStudio

131K

$0.13/M

$0.40/M

42.7 t/s

1714 ms

NVIDIA: Llama 3.3 Nemotron Super 49B v1

131,072 Token Context

Advanced Coding

Agentic Workflows

Available On

Standard Pricing

Do Work. With AI.