
Mistral: Mistral Small 3

Mistral
Input: text
Output: text
Released: Jan 30, 2025
Updated: Mar 28, 2025

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.

The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

28,000 Token Context

Process and analyze large documents and conversations.

Advanced Coding

Improved capabilities in front-end development and full-stack updates.

Agentic Workflows

Autonomously navigate multi-step processes with improved reliability.

Available On

| Provider  | Model ID  | Context | Max Output | Input Cost | Output Cost | Throughput | Latency |
|-----------|-----------|---------|------------|------------|-------------|------------|---------|
| Enfer     | enfer     | 28K     | 14K        | $0.06/M    | $0.12/M     | 30.2 t/s   | 7192 ms |
| NextBit   | nextBit   | 33K     | -          | $0.07/M    | $0.13/M     | 28.4 t/s   | 2230 ms |
| DeepInfra | deepInfra | 33K     | 16K        | $0.07/M    | $0.14/M     | 78.2 t/s   | 269 ms  |
| Mistral   | mistral   | 33K     | -          | $0.10/M    | $0.30/M     | 150.7 t/s  | 257 ms  |
| Ubicloud  | ubicloud  | 33K     | 33K        | $0.30/M    | $0.30/M     | 32.5 t/s   | 1207 ms |
| Together  | together  | 33K     | 2K         | $0.80/M    | $0.80/M     | 80.3 t/s   | 501 ms  |
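The throughput and latency figures can be combined into a rough end-to-end estimate. As a sketch (assuming latency is time to first token and throughput holds steady for the rest of the response; the function name and values are illustrative, taken from the table):

```python
# Rough end-to-end response time: time to first token (latency)
# plus generation time for the remaining tokens at the quoted throughput.
def estimated_response_ms(latency_ms: float, throughput_tps: float, output_tokens: int) -> float:
    return latency_ms + (output_tokens / throughput_tps) * 1000.0

# Example: a 500-token reply using the DeepInfra row (269 ms, 78.2 t/s)
# versus the Mistral row (257 ms, 150.7 t/s).
deepinfra_ms = estimated_response_ms(269, 78.2, 500)
mistral_ms = estimated_response_ms(257, 150.7, 500)
print(f"DeepInfra: {deepinfra_ms:.0f} ms, Mistral: {mistral_ms:.0f} ms")
```

Under this simple model, a provider with low latency but modest throughput can still lose to a higher-throughput provider on longer completions.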
Standard Pricing

Input Tokens
$0.06 per 1M tokens ($0.00000006 per token)

Output Tokens
$0.12 per 1M tokens ($0.00000012 per token)
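The per-million-token rates translate to request cost with simple arithmetic. A minimal sketch, assuming the standard rates above (the function and example token counts are illustrative):

```python
# Standard per-million-token rates from the pricing section above.
INPUT_COST_PER_M = 0.06   # $ per 1M input tokens
OUTPUT_COST_PER_M = 0.12  # $ per 1M output tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one request at the standard rates."""
    return (input_tokens * INPUT_COST_PER_M + output_tokens * OUTPUT_COST_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token completion.
print(f"${request_cost_usd(10_000, 1_000):.5f}")
```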
