
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta that activates 17 billion of its 109 billion total parameters for each token. It accepts native multimodal input (text and image) and produces multilingual text and code output across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout has 16 experts, of which only a subset is active for any given token, and supports a context length of up to 10 million tokens. It was trained on a corpus of roughly 40 trillion tokens.

Built for efficient local or commercial deployment, Llama 4 Scout uses early fusion to integrate text and image inputs. It is instruction-tuned for multilingual chat, captioning, and image understanding. Released under the Llama 4 Community License, it has a training data cutoff of August 2024 and was publicly launched on April 5, 2025.

Available On

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
Lambda	lambda	1049K	1049K	$0.08/M	$0.30/M	96.5 t/s	629 ms
DeepInfra	deepInfra	328K	16K	$0.08/M	$0.30/M	31.0 t/s	480 ms
Kluster	klusterAi	131K	131K	$0.08/M	$0.45/M	78.8 t/s	782 ms
GMICloud	gmiCloud	1049K	-	$0.08/M	$0.50/M	111.3 t/s	572 ms
Parasail	parasail	158K	158K	$0.09/M	$0.48/M	106.2 t/s	444 ms
Cent-ML	centMl	1049K	1049K	$0.10/M	$0.10/M	80.1 t/s	373 ms
Novita	novitaAi	131K	131K	$0.10/M	$0.50/M	70.0 t/s	879 ms
Groq	groq	131K	8K	$0.11/M	$0.34/M	801.8 t/s	353 ms
BaseTen	baseten	1000K	131K	$0.13/M	$0.50/M	124.3 t/s	246 ms
Fireworks	fireworks	1049K	-	$0.15/M	$0.60/M	78.9 t/s	634 ms
Together	together	1049K	-	$0.18/M	$0.59/M	100.2 t/s	544 ms
Google	vertex	1311K	-	$0.25/M	$0.70/M	117.7 t/s	1850 ms
SambaNova	sambaNova	8K	4K	$0.40/M	$0.70/M	694.4 t/s	1897 ms
Cerebras	cerebras	32K	32K	$0.65/M	$0.85/M	3291.0 t/s	420 ms
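
The pricing, throughput, and latency columns above can be combined into a rough per-request estimate. The following is a minimal sketch, not any provider's API: the `Provider` class and the helper names are hypothetical, only a subset of rows is copied from the table, "K" is treated as 1,000 tokens, the context window is assumed to be shared between input and output, and latency is assumed to mean time to first token.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Provider:
    """One row of the provider table (hypothetical structure for illustration)."""
    name: str
    context_k: int       # context window, thousands of tokens
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens
    tokens_per_s: float  # reported throughput
    latency_ms: float    # reported latency (assumed: time to first token)


# A subset of rows copied from the table above.
PROVIDERS = [
    Provider("Lambda", 1049, 0.08, 0.30, 96.5, 629),
    Provider("DeepInfra", 328, 0.08, 0.30, 31.0, 480),
    Provider("Cent-ML", 1049, 0.10, 0.10, 80.1, 373),
    Provider("Groq", 131, 0.11, 0.34, 801.8, 353),
    Provider("Cerebras", 32, 0.65, 0.85, 3291.0, 420),
]


def request_cost(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


def estimated_seconds(p: Provider, output_tokens: int) -> float:
    """Rough wall-clock estimate: latency plus generation time at the reported throughput."""
    return p.latency_ms / 1000 + output_tokens / p.tokens_per_s


def cheapest(providers: Sequence[Provider], input_tokens: int,
             output_tokens: int) -> Optional[Provider]:
    """Cheapest provider whose context window fits the request, or None if none fits."""
    total = input_tokens + output_tokens
    fits = [p for p in providers if total <= p.context_k * 1000]
    if not fits:
        return None
    return min(fits, key=lambda p: request_cost(p, input_tokens, output_tokens))


if __name__ == "__main__":
    best = cheapest(PROVIDERS, input_tokens=10_000, output_tokens=100_000)
    if best is not None:
        cost = request_cost(best, 10_000, 100_000)
        secs = estimated_seconds(best, 100_000)
        print(f"{best.name}: ${cost:.4f}, about {secs:.0f} s")
```

Listed prices and measured throughput change over time, so the figures should be refreshed from the table before the estimate is relied on.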

Meta: Llama 4 Scout

1,048,576 Token Context

Advanced Coding

Agentic Workflows

Vision Capabilities
