Do Services-as-Software

Qwen3, the latest generation in the Qwen large language model series, includes both dense and mixture-of-experts (MoE) models and targets reasoning, multilingual use, and agent tasks. It can switch between a thinking mode for complex reasoning and a non-thinking mode for faster general-purpose dialogue.

Compared with earlier models such as QwQ and Qwen2.5, Qwen3 is reported to perform better in mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue. The Qwen3-30B-A3B variant has 30.5 billion parameters (3.3 billion activated), 48 layers, and 128 experts (8 activated per token), and supports contexts of up to 131K tokens with YaRN.
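To put the sparsity figures in perspective, the short sketch below computes what share of the model is active at once, using only the numbers quoted above. The constant and function names are illustrative, not part of any Qwen tooling.

```python
# Figures quoted in the model description (parameters in billions).
TOTAL_PARAMS_B = 30.5
ACTIVE_PARAMS_B = 3.3
TOTAL_EXPERTS = 128
ACTIVE_EXPERTS = 8


def share(part: float, whole: float) -> float:
    """Return part / whole as a plain ratio."""
    return part / whole


active_param_share = share(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
active_expert_share = share(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"Share of parameters active: {active_param_share:.1%}")
print(f"Share of experts active: {active_expert_share:.2%}")
```

The parameter share (about 10.8%) is larger than the expert share (6.25%), most likely because non-expert components such as attention layers and embeddings are used on every forward pass regardless of which experts are selected.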

Available On

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
DeepInfra	deepInfra	41K	41K	$0.08/M	$0.29/M	107.1 t/s	624 ms
InferenceNet	inferenceNet	16K	16K	$0.08/M	$0.29/M	16.0 t/s	999 ms
Parasail	parasail	41K	41K	$0.09/M	$0.50/M	155.0 t/s	386 ms
Nebius	nebiusAiStudio	41K	-	$0.10/M	$0.30/M	111.8 t/s	513 ms
Novita	novitaAi	41K	20K	$0.10/M	$0.45/M	158.9 t/s	685 ms
Fireworks	fireworks	40K	-	$0.15/M	$0.60/M	124.0 t/s	900 ms
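As a rough illustration of how the per-million-token prices translate into request costs, the sketch below estimates the cost of a single request for each provider. The prices are copied from the table above and may change; the `Pricing`, `estimate_cost`, and `cheapest` names are hypothetical helpers, not part of any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# Prices copied from the provider table; treat them as a snapshot.
PRICING = {
    "DeepInfra": Pricing(0.08, 0.29),
    "InferenceNet": Pricing(0.08, 0.29),
    "Parasail": Pricing(0.09, 0.50),
    "Nebius": Pricing(0.10, 0.30),
    "Novita": Pricing(0.10, 0.45),
    "Fireworks": Pricing(0.15, 0.60),
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for the given provider."""
    p = PRICING[provider]
    total = input_tokens * p.input_per_m + output_tokens * p.output_per_m
    return total / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Return the provider with the lowest estimated cost (first one wins ties)."""
    return min(PRICING, key=lambda name: estimate_cost(name, input_tokens, output_tokens))


if __name__ == "__main__":
    # Example workload: 10,000 input tokens and 2,000 output tokens.
    for name in PRICING:
        print(f"{name}: ${estimate_cost(name, 10_000, 2_000):.6f}")
```

Price is only one axis: the table also lists context limits, throughput, and latency, so the cheapest provider for a workload is not necessarily the best fit.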

Qwen: Qwen3 30B A3B

40,960 Token Context

Hybrid Reasoning

Advanced Coding

Agentic Workflows

