How to access AI models — buy directly from the source, or route through an aggregator
Direct: You sign up with and pay each AI company separately (Anthropic, OpenAI, Google, and so on). You pay the provider's list price with no intermediary markup, get access to new models as soon as they launch, and deal with the provider's own SLA and support.
Aggregator: A single API or platform that gives you access to models from many providers. You get one bill and one SDK, often with routing, fallback, and cost optimisation built in. Aggregators usually add a small fee on top of the provider cost.
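The fallback behaviour mentioned above can be sketched as follows. This is a minimal illustration, not any aggregator's actual implementation: the provider functions are hypothetical stand-ins that make no network calls, and `complete_with_fallback` is a name invented for this example.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins; a real provider function would call an HTTP API.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "hello",
    [("primary", flaky_provider), ("backup", stable_provider)],
)
print(used, reply)  # backup echo: hello
```

With direct access you would write and maintain this chain yourself; an aggregator that offers fallback handles the equivalent step on its side of the API.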
| Provider | Models | Pricing model | Free tier | Best for |
|---|---|---|---|---|
| Anthropic (anthropic.com/pricing) | Claude 4 family (Opus, Sonnet, Haiku) | Per token (input / output / cache) | No | Long-context reasoning, coding, writing |
| OpenAI (openai.com/api/pricing) | GPT-4o, o3, o4-mini, GPT-4.1 | Per token + batch discounts (50%) | No | Broadest ecosystem, function calling, vision |
| Google (ai.google.dev/pricing) | Gemini 2.0/2.5 Flash, Pro, Ultra | Per token (free tier available) | Yes: Gemini Flash free up to limits | Multimodal, large context, cost efficiency |
| Mistral (mistral.ai) | Mistral Large, Small, Codestral, Embed | Per token | Yes: free tier on la Plateforme | European data residency, code, open weights |
| xAI (x.ai/api) | Grok 3, Grok 3 Mini | Per token | Yes: $25/month free credits | Real-time web access, X/Twitter data |
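To make the per-token pricing model concrete, here is a small cost estimator. It is a sketch under stated assumptions: the prices are placeholders rather than any provider's real rates, and it assumes a batch discount applies uniformly to input, cached input, and output tokens, which may not hold for every provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """USD per one million tokens. The values used below are placeholders, not real prices."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


def request_cost(
    prices: TokenPrices,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    batch_discount: float = 0.0,
) -> float:
    """Cost in USD of one request; batch_discount is a fraction (0.5 means 50% off)."""
    fresh_input = input_tokens - cached_input_tokens  # tokens billed at the full input rate
    total = (
        fresh_input * prices.input_per_m
        + cached_input_tokens * prices.cached_input_per_m
        + output_tokens * prices.output_per_m
    ) / 1_000_000
    return total * (1.0 - batch_discount)


# Placeholder prices chosen only to keep the arithmetic easy to follow.
example = TokenPrices(input_per_m=2.00, output_per_m=8.00, cached_input_per_m=0.50)
standard = request_cost(example, input_tokens=10_000, output_tokens=2_000, cached_input_tokens=6_000)
batched = request_cost(example, 10_000, 2_000, 6_000, batch_discount=0.5)
print(f"standard: ${standard:.4f}, batched: ${batched:.4f}")
```

With these placeholder numbers the request costs $0.027 at standard rates and $0.0135 with a 50% batch discount. Substitute current figures from each provider's pricing page before relying on the result.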
OpenRouter: The largest AI model router. Single API, single bill. It passes through provider pricing and charges a 5.5% fee when you purchase credits, with no per-token markup once credits are loaded. It also supports bring-your-own-key, which avoids the fee entirely.
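The effect of a credit purchase fee is simple arithmetic, sketched below. The sketch assumes the 5.5% fee is added on top of the credits you load (rather than deducted from them) and treats bring-your-own-key as a flat zero-fee case. Both are simplifications, and `total_charge` is a name invented for this example.

```python
CREDIT_FEE_RATE = 0.055  # the 5.5% credit purchase fee quoted above


def total_charge(credits_usd: float, byok: bool = False) -> float:
    """USD charged for `credits_usd` of usage, assuming the fee is added on top."""
    fee = 0.0 if byok else credits_usd * CREDIT_FEE_RATE
    return round(credits_usd + fee, 2)


for usage in (100, 1_000, 10_000):
    print(usage, total_charge(usage), total_charge(usage, byok=True))
```

Under these assumptions, $100 of model usage costs $105.50 through purchased credits and $100 with your own provider key. The fee scales linearly, so at $10,000 of usage it comes to $550.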
Together AI: Runs open-source models on its own GPU infrastructure (H100/H200/B200). There is no middleman: Together AI is the provider, so its prices are the provider prices. Strong for open-weight models like Llama, Qwen, and Mixtral.
Groq: Ultra-fast inference on custom LPU hardware, typically 10–20× faster than GPU-based providers. Smaller model selection, focused on popular open-source models. The lowest-latency option among those listed here.
Fireworks AI: Open-source inference with 200+ models, competitive token pricing, and strong batch processing discounts. Often the cheapest option for high-volume open-weight model workloads.
nexos.ai: Workspace-style platform aimed at teams rather than developers. Pricing is by subscription rather than pay-per-token, and the subscription covers usage up to plan limits. Good for non-technical users who want a unified UI across models.
| If you need… | Use | Why |
|---|---|---|
| Lowest cost on a specific frontier model | Direct | No markup, direct SLA, earliest model access |
| One API for many models, one bill | OpenRouter | 300+ models, 5.5% fee easily offset by convenience |
| Fastest possible inference | Groq | LPU hardware, 10–20× faster than GPU inference |
| Open-source models at scale | Fireworks AI | 200+ models, 50% batch discount, cheap GPU rental |
| Model experimentation across open weights | Together AI | Wide open-model catalog, fine-tuning support |
| Non-technical team, workspace UI | nexos.ai | Subscription, no per-token billing, team management |