# Models & Providers
21st Agents supports multiple AI models from different providers. Use Anthropic Claude natively or connect any model through OpenRouter for maximum flexibility.
## Which runtimes are supported natively?
21st Agents ships with two native runtimes that work out of the box:
### Anthropic · Claude Code

Full-featured agentic runtime with tool use, web access, and sandboxed execution. Models: Claude Opus 4.6, Sonnet 4.6, Haiku 4.5.

### OpenAI · Codex (coming soon)

OpenAI's optimized runtime with multiple reasoning levels. Models: GPT-5.2 Codex, GPT-5.2 Codex (High), GPT-5.1 Codex Mini.
## Can I use models beyond Anthropic and OpenAI?
Yes. Through OpenRouter, you can route your agent to hundreds of models from Google, Meta, DeepSeek, Alibaba, Zhipu AI, Mistral, and other providers — all through a single, unified API. OpenRouter handles authentication, billing, and failover across providers so you don't have to manage multiple API keys.
This is especially useful when you want to:
- Use cost-efficient open-weight models like DeepSeek V3.2 or Llama 4 for high-volume, lower-complexity tasks
- Access the largest context windows (Gemini 3.1 Pro at 1M tokens, Llama 4 Scout at 10M tokens)
- Experiment with specialized models like GLM-5 for agentic workloads or Qwen 3.5 for multilingual support across 119 languages
- Set up automatic fallback chains — if one provider is down, OpenRouter routes to an alternative
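Routing through OpenRouter is a single OpenAI-compatible POST. The sketch below shows a fallback chain using OpenRouter's `models` array; the model IDs are the ones listed in the table further down, and the `OPENROUTER_API_KEY` environment variable is an assumption about how you store your key.

```typescript
// Sketch: route a request through OpenRouter with an automatic fallback chain.
// OpenRouter's `models` array lists alternatives to try, in order, if the
// primary model's provider is unavailable.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the request payload: a cost-efficient primary with an open-weight fallback.
function buildOpenRouterRequest(messages: ChatMessage[]) {
  return {
    model: "deepseek/deepseek-v3.2",
    models: ["deepseek/deepseek-v3.2", "meta-llama/llama-4-maverick"],
    messages,
  };
}

// Sending it is one POST to OpenRouter's OpenAI-compatible endpoint.
export async function send(messages: ChatMessage[]) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildOpenRouterRequest(messages)),
  });
  return res.json();
}
```

Because OpenRouter handles auth and billing centrally, swapping the primary model is a one-line change to the payload.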
## How does model switching work?
You can let end users switch between models at runtime without redeploying your agent. Enable “Allow model switching” in the agent config and select which models to expose. This is useful when you want users to choose between faster (cheaper) and more capable (pricier) options.
For example, you might expose Claude Haiku 4.5 for quick questions and Claude Opus 4.6 for deep analysis — or let users pick between DeepSeek V3.2 for cost efficiency and Gemini 3.1 Pro for maximum context.
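A config for that setup might look like the sketch below. The field names (`allowModelSwitching`, `models`, `defaultModel`) are illustrative assumptions, not the actual 21st Agents config schema; the model IDs are the ones from the table below.

```typescript
// Hypothetical agent config sketch -- field names are illustrative,
// not the real 21st Agents schema.
const agentConfig = {
  runtime: "claude-code",
  allowModelSwitching: true, // lets end users switch models at runtime
  models: [
    // A fast/cheap option and a deep-analysis option, as in the example above.
    { id: "anthropic/claude-haiku-4-5-20251001", label: "Quick questions" },
    { id: "anthropic/claude-opus-4-6", label: "Deep analysis" },
  ],
  defaultModel: "anthropic/claude-haiku-4-5-20251001",
};
```

The key point is that only the models you list are exposed to end users, so switching never routes to a model you haven't opted into.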
## Popular models available through 21st Agents
The table below lists popular models you can use. Native runtimes work out of the box; all others are available via OpenRouter integration.
| Provider | Model | Model ID | Context | Best for |
|---|---|---|---|---|
| Anthropic | Claude Opus 4.6 | `anthropic/claude-opus-4-6` | 200K | Top-tier reasoning, complex multi-step tasks, code generation |
| Anthropic | Claude Sonnet 4.6 | `anthropic/claude-sonnet-4-6` | 200K | Best balance of speed, cost, and intelligence |
| Anthropic | Claude Haiku 4.5 | `anthropic/claude-haiku-4-5-20251001` | 200K | Fastest and cheapest, great for simple tasks |
| Google | Gemini 3.1 Pro | `google/gemini-3.1-pro` | 1M | Highest benchmark scores, massive context window |
| Google | Gemini 3.0 Flash | `google/gemini-3.0-flash` | 1M | Fast and affordable with million-token context |
| DeepSeek | DeepSeek V3.2 | `deepseek/deepseek-v3.2` | 128K | Near-frontier performance at 10x lower cost, strong reasoning |
| DeepSeek | DeepSeek R1 | `deepseek/deepseek-r1` | 128K | Advanced chain-of-thought reasoning |
| Meta | Llama 4 Maverick | `meta-llama/llama-4-maverick` | 256K | Open-weight, strong coding and multilingual |
| Meta | Llama 4 Scout | `meta-llama/llama-4-scout` | 10M | Largest context window available, open-weight |
| Alibaba | Qwen 3.5 Max | `qwen/qwen-3.5-max` | 128K | 119 languages, competitive with frontier models |
| Alibaba | Qwen 3.5 Coder | `qwen/qwen-3.5-coder` | 128K | Optimized for code generation and analysis |
| Zhipu AI | GLM-5 | `zhipu/glm-5` | 128K | Top open-weight model, strong agentic and SWE tasks |
| Zhipu AI | GLM-4.7 Flash | `zhipu/glm-4.7-flash` | 128K | Lightweight MoE, efficient for local coding tasks |
| OpenAI | GPT-5.4 | `openai/gpt-5.4` | 128K | Frontier reasoning and analysis |
| OpenAI | GPT-5.2 Codex | `openai/gpt-5.2-codex` | 128K | Optimized for code, multiple reasoning levels |
## How do I choose the right model?
For complex agentic tasks — use Claude Opus 4.6 or GPT-5.4. These models excel at multi-step reasoning, tool use, and long-horizon tasks where accuracy matters more than cost.
For balanced production workloads — Claude Sonnet 4.6 or Gemini 3.1 Pro offer the best price-to-performance ratio. Sonnet is the default for most 21st agents.
For high-volume, cost-sensitive tasks — DeepSeek V3.2 delivers near-frontier performance at roughly 10x lower cost than Claude Opus. Qwen 3.5 and Llama 4 are strong open-weight alternatives.
For massive context windows — Gemini 3.1 Pro (1M tokens) or Llama 4 Scout (10M tokens) can process entire codebases, legal documents, or research corpora in a single call.
For multilingual applications — Qwen 3.5 supports 119 languages with strong performance across all of them. GLM-5 is another strong option for Chinese and multilingual workloads.
For quick, simple tasks — Claude Haiku 4.5 or GLM-4.7 Flash are the fastest and cheapest options while still being highly capable.
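The guidance above can be encoded as a simple lookup for agents that pick a model programmatically. This is a hypothetical helper, not part of the 21st Agents SDK; the model IDs are the OpenRouter-style slugs from the table above.

```typescript
// Hypothetical model-selection helper encoding the guidance above.
type TaskProfile =
  | "complex-agentic"   // multi-step reasoning, tool use, accuracy over cost
  | "balanced"          // production default: best price-to-performance
  | "high-volume"       // cost-sensitive, lower-complexity tasks
  | "massive-context"   // whole codebases or document corpora in one call
  | "multilingual"      // broad language coverage
  | "quick";            // fast, cheap, simple tasks

const MODEL_FOR_TASK: Record<TaskProfile, string> = {
  "complex-agentic": "anthropic/claude-opus-4-6",
  balanced: "anthropic/claude-sonnet-4-6",
  "high-volume": "deepseek/deepseek-v3.2",
  "massive-context": "meta-llama/llama-4-scout", // 10M-token context
  multilingual: "qwen/qwen-3.5-max",
  quick: "anthropic/claude-haiku-4-5-20251001",
};

function pickModel(profile: TaskProfile): string {
  return MODEL_FOR_TASK[profile];
}
```

A lookup like this keeps model choices in one place, so upgrading a tier (say, swapping the balanced default) is a single-line change.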
## How does pricing work with different models?
When using native runtimes (Anthropic Claude Code), pricing is based on your 21st Agents plan. When routing through OpenRouter, you pay OpenRouter's per-token rates for the chosen model — these vary significantly between providers. DeepSeek and Qwen models are among the most affordable, while frontier models like Claude Opus 4.6 and GPT-5.4 cost more per token but deliver higher quality.
You can set a `maxBudgetUsd` cap per agent run to control costs regardless of which model you use.
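As a sketch, the budget cap attaches to an individual run's options. `maxBudgetUsd` is the documented setting; the surrounding option names (`agentId`, `model`) are illustrative assumptions about the run-options shape, not the real SDK.

```typescript
// Hypothetical run options -- maxBudgetUsd is the documented cap;
// the other field names are illustrative.
const runOptions = {
  agentId: "support-bot",
  model: "google/gemini-3.1-pro", // pricier per token, so cap the spend
  maxBudgetUsd: 2.5,              // stop the run once token cost reaches $2.50
};
```

Because the cap is per run rather than per model, it keeps spend predictable even when end users switch to a more expensive model at runtime.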