Models.do

Spotlight is a 7‑billion‑parameter vision‑language model derived from Qwen 2.5‑VL and fine‑tuned by Arcee AI for tight image‑text grounding tasks. It offers a 32 k‑token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual‑question‑answering, and diagram‑analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts or UI mock‑ups need to be interpreted on the fly. Early benchmarks show it matching or out‑scoring larger VLMs such as LLaVA‑1.6 13 B on popular VQA and POPE alignment tests.

Provider	Model ID	Context	Max Output	Input Cost	Output Cost	Throughput	Latency
Together	together	131K	66K	$0.18/M	$0.18/M	184.1 t/s	1151 ms
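
As a rough illustration of what the Together row means in practice, the sketch below turns the listed prices, throughput and latency into a per-request estimate. The function names and the example token counts are hypothetical. It also assumes that the listed latency is the delay before the first output token, that throughput refers to output tokens, and that image content is billed as ordinary input tokens; none of these are stated in the listing.

```python
# Figures copied from the Together row of the table above.
INPUT_USD_PER_M = 0.18    # USD per million input tokens
OUTPUT_USD_PER_M = 0.18   # USD per million output tokens
THROUGHPUT_TPS = 184.1    # tokens per second (assumed: output tokens)
LATENCY_MS = 1151         # milliseconds (assumed: time to first token)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated price of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def estimate_seconds(output_tokens: int) -> float:
    """Estimated wall-clock time: initial latency plus generation time."""
    return LATENCY_MS / 1000 + output_tokens / THROUGHPUT_TPS


if __name__ == "__main__":
    # Hypothetical request: a long document plus screenshots, short answer.
    prompt_tokens, answer_tokens = 20_000, 1_000
    print(f"cost: ${estimate_cost_usd(prompt_tokens, answer_tokens):.5f}")
    print(f"time: {estimate_seconds(answer_tokens):.2f} s")
```

Because input and output are priced identically here, the cost depends only on the total token count; the time estimate is dominated by output length once the answer exceeds a few hundred tokens.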

Arcee AI: Spotlight

131,072 Token Context

Advanced Coding

Agentic Workflows

Vision Capabilities
