Direct Providers vs Aggregators

How to access AI models — buy directly from the source, or route through an aggregator

Direct Go direct to the provider

You sign up and pay each AI company directly — Anthropic, OpenAI, Google etc. You get the lowest possible price, direct access to new models, and full SLA/support from the provider.

✓ Lowest cost — no intermediary markup
✓ Earliest access to new models
✓ Direct support and SLAs
✗ Separate account and billing per provider
✗ Different SDKs and API formats
✗ No automatic failover between providers

Aggregator Route through an aggregator

A single API or platform that gives you access to models from many providers. One bill, one SDK, often with routing, fallback, and cost optimisation built in. Usually adds a small fee on top of provider cost.

✓ One integration, hundreds of models
✓ Single bill across all providers
✓ Automatic failover and load balancing
✗ Small markup on provider prices (typically 5–10%)
✗ Slight additional latency
✗ Model availability depends on aggregator contracts

Direct Providers

Provider	Models	Pricing model	Free tier	Best for
Anthropic anthropic.com/pricing →	Claude 4 family (Opus, Sonnet, Haiku)	Per token (input / output / cache)	No	Long-context reasoning, coding, writing
OpenAI openai.com/api/pricing →	GPT-4o, o3, o4-mini, GPT-4.1	Per token + batch discounts (50%)	No	Broadest ecosystem, function calling, vision
Google ai.google.dev/pricing →	Gemini 2.0/2.5 Flash, Pro, Ultra	Per token (free tier available)	Yes — Gemini Flash free up to limits	Multimodal, large context, cost efficiency
Mistral mistral.ai →	Mistral Large, Small, Codestral, Embed	Per token	Yes — free tier on la Plateforme	European data residency, code, open weights
xAI x.ai/api →	Grok 3, Grok 3 Mini	Per token	Yes — $25/month free credits	Real-time web access, X/Twitter data

Aggregators & Inference Routers

OpenRouter Aggregator

openrouter.ai →

300+

The largest AI model router. Single API, single bill. Passes through provider pricing with a 5.5% credit purchase fee — no per-token markup once credits are loaded. Supports bring-your-own-key for zero fees.

Fee5.5% on credit purchase (5.0% crypto)

Free tierYes — BYOK for 1M req/month free

Volume discounts3–7% off at $1k/$5k/$10k/$20k/mo

Model coverageAll major providers + open source

Together AI Inference

together.ai/pricing →

100+

Runs open-source models on their own GPU infrastructure (H100/H200/B200). No middleman — their prices are the provider prices. Strong for open-weight models like Llama, Qwen, and Mixtral.

FeeNone — direct infrastructure pricing

Free tier$1 free credits on signup

Token range$0.05–$9.00 per 1M tokens

GPU rentalH100 $3.99/hr · H200 $5.49/hr

Groq Inference

groq.com/pricing →

15+

Ultra-fast inference on custom LPU hardware — typically 10–20× faster than GPU-based providers. Smaller model selection focused on popular open-source models. Lowest latency option available.

FeeNone — own hardware

Free tierYes — generous rate limits

Best priceLlama 3.3 70B from $0.59/1M

StandoutFastest inference available

Fireworks AI Inference

fireworks.ai/pricing →

200+

Open-source inference with 200+ models, competitive token pricing, and strong batch processing discounts. Often the cheapest option for high-volume open-weight model workloads.

FeeNone — own infrastructure

Free tier$1 free credits on signup

Batch discount50% off on batch jobs

GPU on-demandA100 from $2.90/hr

nexos.ai Subscription

nexos.ai →

200+

Workspace-style platform aimed at teams rather than developers. Subscription pricing rather than pay-per-token — covers usage up to plan limits. Good for non-technical users who want a unified UI across models.

Fee modelSubscription, not per-token

Free tier7-day trial, no card required

Pro plan€25/user/month (€20 annual)

EnterpriseCustom pricing

Which to choose

If you need…	Use	Why
Lowest cost on a specific frontier model	Direct	No markup, direct SLA, earliest model access
One API for many models, one bill	OpenRouter	300+ models, 5.5% fee easily offset by convenience
Fastest possible inference	Groq	LPU hardware, 10–20× faster than GPU inference
Open-source models at scale	Fireworks AI	200+ models, 50% batch discount, cheap GPU rental
Model experimentation across open weights	Together AI	Wide open-model catalog, fine-tuning support
Non-technical team, workspace UI	nexos.ai	Subscription, no per-token billing, team management

Pricing correct as of June 2026. Always verify on the provider's website before committing to a plan.