Do Services-as-Software

Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling.

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
Novita	novitaAi	128K	20K	$0.04/M	$0.14/M	28.8 t/s	1034 ms

Provider

Model ID

Context

Max Output

Input Cost

Output Cost

Throughput

Latency

Novita

novitaAi

128K

20K

$0.04/M

$0.14/M

28.8 t/s

1034 ms

Qwen: Qwen3 8B

128,000 Token Context

Hybrid Reasoning

Advanced Coding

Agentic Workflows

Available On

Do Work. With AI.