Qwen: Qwen3 32B

Qwen3

Input: text

Output: text

Released: Apr 28, 2025•Updated: May 11, 2025

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

40,960 Token Context

Process and analyze large documents and conversations.

Hybrid Reasoning

Choose between rapid responses and extended, step-by-step processing for complex tasks.

Advanced Coding

Improved capabilities in front-end development and full-stack updates.

Agentic Workflows

Autonomously navigate multi-step processes with improved reliability.

Available On

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
DeepInfra	deepInfra	41K	-	$0.10/M	$0.30/M	45.1 t/s	869 ms
Nebius AI Studio	nebiusAiStudio	41K	-	$0.10/M	$0.30/M	40.0 t/s	752 ms
NovitaAI	novitaAi	41K	41K	$0.10/M	$0.45/M	26.3 t/s	1674 ms
Parasail	parasail	41K	41K	$0.10/M	$0.50/M	48.2 t/s	929 ms
GMICloud	gmiCloud	33K	-	$0.30/M	$0.60/M	55.1 t/s	918 ms
Cerebras	cerebras	33K	-	$0.40/M	$0.80/M	2769.4 t/s	589 ms
SambaNova	sambaNova	8K	4K	$0.40/M	$0.80/M	321.1 t/s	1112 ms

Standard Pricing

Input Tokens

$0.0000001

per 1K tokens

Output Tokens

$0.0000003

per 1K tokens

Do Work. With AI.

Join Waitlist Learn more