Llama 3.3 70B Versatile · Free · 131,072-token context
llama-3.3-70b-versatile · 14,400 req/day · 30 RPM · 12k TPM
Workhorse generalist. The most reliable free Groq model.
Model catalogue
Bring keys for any of these 17 providers (40 free-tier models, plus the paid flagships from OpenAI, Anthropic and xAI) and sarmalink will route, retry and fail over for you. Free keys are always tried first; paid keys are used only when you add them.
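The free-first routing described above can be pictured as a loop over the keys you have supplied. This is an illustrative sketch, not sarmalink's actual code: `ProviderKey`, `TransientError` and `route` are hypothetical names, each provider is modelled as a plain Python callable rather than an HTTP client, and the retry count and exponential backoff values are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


class TransientError(Exception):
    """Hypothetical: a retryable provider failure such as a rate limit or timeout."""


@dataclass
class ProviderKey:
    name: str
    tier: str                   # "free" or "paid"
    api_key: Optional[str]      # None means the user has not added this key
    call: Callable[[str], str]  # hypothetical stand-in for an HTTP call: prompt -> completion


def route(prompt: str, keys: List[ProviderKey], retries: int = 2, backoff: float = 0.5) -> str:
    """Try free keys first, retry transient failures, then fail over to the next key."""
    # Only keys the user actually supplied are eligible. The sort is stable,
    # so free keys keep their given order and paid keys come last.
    candidates = sorted((k for k in keys if k.api_key), key=lambda k: k.tier != "free")
    errors = []
    for key in candidates:
        for attempt in range(retries + 1):
            try:
                return key.call(prompt)
            except TransientError as exc:
                errors.append(f"{key.name}: {exc}")
                if attempt < retries:
                    time.sleep(backoff * 2 ** attempt)  # exponential backoff on the same key
        # Retries exhausted on this key: fall through to the next candidate.
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def groq(prompt: str) -> str:
        raise TransientError("429 rate limited")

    def gemini(prompt: str) -> str:
        return f"gemini answered: {prompt}"

    keys = [
        ProviderKey("openai", "paid", None, lambda p: "never reached"),  # no paid key added
        ProviderKey("groq", "free", "dummy-free-key-1", groq),
        ProviderKey("gemini", "free", "dummy-free-key-2", gemini),
    ]
    print(route("hello", keys, backoff=0.0))  # gemini answered: hello
```

Sorting on `tier != "free"` keeps the free keys in the order given and only reaches a paid key after every free key has exhausted its retries; a key whose `api_key` is `None` is never tried.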
40 free models · 17 providers · 0 markup on tokens
Groq · LPU inference, the easiest free key to start with. · Up to 14,400 requests/day, 30 RPM
llama-3.3-70b-versatile · 14,400 req/day · 30 RPM · 12k TPM
Workhorse generalist. The most reliable free Groq model.
llama-3.1-8b-instant · 14,400 req/day · 30 RPM
Sub-second answers for cheap workflows.
meta-llama/llama-4-scout-17b-16e-instruct · 14,400 req/day · 30 RPM
Vision-capable MoE; image upload is on the roadmap.
qwen/qwen3-32b · 14,400 req/day · 30 RPM
deepseek-r1-distill-llama-70b · 1,000 req/day · 30 RPM
Cheap chain-of-thought without the latency tax.
openai/gpt-oss-120b · 1,000 req/day · 30 RPM
moonshotai/kimi-k2-instruct · 1,000 req/day · 30 RPM
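Groq's limits apply at two timescales at once. At 30 RPM a model could in principle serve 30 × 1,440 = 43,200 requests a day, so the 14,400 req/day cap is the one that binds: at a sustained full rate it is used up after 14,400 / 30 = 480 minutes, or 8 hours. A client-side limiter therefore has to check both windows. The sketch below is hypothetical (`DualLimiter` is not part of sarmalink or any Groq SDK) and assumes sliding 60-second and 24-hour windows; a provider may instead reset its daily counter at a fixed time.

```python
from collections import deque


class DualLimiter:
    """Hypothetical client-side limiter enforcing a per-minute and a per-day request cap."""

    def __init__(self, rpm: int, rpd: int):
        self.rpm, self.rpd = rpm, rpd
        self.minute = deque()  # timestamps (seconds) of accepted requests in the last 60 s
        self.day = deque()     # timestamps of accepted requests in the last 86,400 s

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have slid out of each window.
        while self.minute and now - self.minute[0] >= 60:
            self.minute.popleft()
        while self.day and now - self.day[0] >= 86_400:
            self.day.popleft()
        # Refuse if either cap is already reached; rejected calls are not recorded.
        if len(self.minute) >= self.rpm or len(self.day) >= self.rpd:
            return False
        self.minute.append(now)
        self.day.append(now)
        return True


if __name__ == "__main__":
    limiter = DualLimiter(rpm=30, rpd=14_400)
    # One attempt every 2 seconds is exactly 30 RPM; the daily cap stops it after 8 hours.
    accepted = [t for t in range(0, 86_400, 2) if limiter.try_acquire(float(t))]
    print(len(accepted), accepted[-1] / 3600)  # 14400 7.999444444444444
```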
SambaNova · DeepSeek V3.2 frontier model on RDU hardware. · Frontier model, generous daily quota
DeepSeek-V3.1 · Free tier, generous daily quota
DeepSeek-R1 · Free tier, lower daily quota
Meta-Llama-3.3-70B-Instruct · Free tier
Cerebras · WSE-3 wafer-scale inference at extreme speed. · 1M tokens/day free
qwen-3-coder-480b · 1M tokens/day free · ~2,400 tokens/sec
Wafer-scale speed, fast enough for editor-style autocomplete.
llama3.3-70b · 1M tokens/day free
Google Gemini · Flash, Pro, and Google Search grounding for Live mode. · Flash + Pro, Search grounding
gemini-2.5-flash · 500 req/day · 15 RPM (free)
Best free big-context option, with Google Search grounding.
gemini-2.5-flash-lite · 1,000 req/day · 30 RPM (free)
gemini-2.5-pro · 50 req/day · 5 RPM (free)
OpenRouter · Aggregator with 17+ models behind one key. · 17+ models, free variants
deepseek/deepseek-chat-v3.1:free · 50 req/day · 20 RPM
Frontier open-weight model, completely free on OpenRouter.
deepseek/deepseek-r1:free · 50 req/day · 20 RPM
qwen/qwen3-coder:free · 50 req/day · 20 RPM
qwen/qwen3-235b-a22b:free · 50 req/day · 20 RPM
meta-llama/llama-3.3-70b-instruct:free · 50 req/day · 20 RPM
google/gemini-2.0-flash-exp:free · 50 req/day · 20 RPM
microsoft/mai-ds-r1:free · 50 req/day · 20 RPM
moonshotai/kimi-k2:free · 50 req/day · 20 RPM
z-ai/glm-4.5-air:free · 50 req/day · 20 RPM
nvidia/nemotron-nano-9b-v2:free · 50 req/day · 20 RPM
mistralai/mistral-small-3.2-24b-instruct:free · 50 req/day · 20 RPM
NVIDIA · Llama-Nemotron, Nemotron Mini, plus a mix of frontier open-weight models. · 1,000 free credits at sign-up
nvidia/llama-3.1-nemotron-70b-instruct · 1,000 free credits at sign-up
meta/llama-3.3-70b-instruct · 1,000 free credits at sign-up
qwen/qwen2.5-coder-32b-instruct · 1,000 free credits
DeepSeek · DeepSeek V3.2 + R1 reasoner. Best £/token on the market. · Pay-as-you-go, very cheap
deepseek-chat · Pay-as-you-go, ≈ $0.27/M input tokens
deepseek-reasoner · Pay-as-you-go, ≈ $0.55/M input tokens
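Pay-as-you-go pricing is quoted per million input tokens, so estimating a bill is a single multiplication: 5M input tokens on `deepseek-chat` at ≈ $0.27/M comes to about $1.35. A minimal sketch of that arithmetic follows; `input_cost_usd` is a hypothetical helper, the rates are the approximate figures above and may change, and it covers input tokens only because output pricing is not listed here.

```python
# Approximate input-token prices quoted above, in USD per million tokens.
# Assumption: rates may change, and output tokens are billed separately
# at rates this page does not list, so this estimates input cost only.
INPUT_USD_PER_M = {"deepseek-chat": 0.27, "deepseek-reasoner": 0.55}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input cost = (tokens / 1,000,000) * price per million tokens."""
    return input_tokens / 1_000_000 * INPUT_USD_PER_M[model]


print(f"{input_cost_usd('deepseek-chat', 5_000_000):.2f}")      # 1.35
print(f"{input_cost_usd('deepseek-reasoner', 2_000_000):.2f}")  # 1.10
```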
Alibaba Cloud · Qwen3 Max, Qwen-Coder, Qwen-VL via Alibaba DashScope. · Free tier on DashScope
qwen3-max · 1M tokens free trial
qwen3-coder-plus · 1M tokens free trial
qwen-vl-max · 1M tokens free trial
Moonshot AI · Kimi K2 with 256k context for long-document and agent workloads. · Free trial credits
kimi-k2 · Free trial credits
kimi-k2-thinking · Free trial credits
Zhipu AI · GLM-4.6 + GLM-Z1 reasoner. Strong agentic tool use. · Generous free quota
glm-4-plus · Generous free quota
glm-z1-air · Free quota
Mistral · Mistral Large + Codestral 25.08 for European data residency. · La Plateforme free tier
mistral-small-latest · 1 req/sec · 500k tokens/min (free)
codestral-latest · Free trial credits
OpenAI · GPT-5.1 and GPT-5 mini straight from the source. · Paid, usage-based
gpt-5.1 · Paid, usage-based
OpenAI flagship with adaptive reasoning.
gpt-5-mini · Paid, usage-based
Anthropic · Claude Sonnet, Haiku and Opus via the OpenAI-compatible endpoint. · Paid, usage-based
claude-sonnet-4-5 · Paid, usage-based
The everyday Claude: strong coding and agentic tool use.
claude-haiku-4-5 · Paid, usage-based
claude-opus-4-5 · Paid, usage-based
xAI · Grok 4 and the fast Grok 4 variants. · Paid, usage-based
grok-4 · Paid, usage-based
grok-4-fast-non-reasoning · Paid, usage-based
Together AI · Llama, DeepSeek and Qwen open weights on serverless GPUs. · Free endpoints on select open models
meta-llama/Llama-3.3-70B-Instruct-Turbo · Paid, usage-based (free variant available)
deepseek-ai/DeepSeek-R1 · Paid, usage-based
Qwen/Qwen2.5-Coder-32B-Instruct · Paid, usage-based
Fireworks AI · Low-latency serving for open models: Llama, DeepSeek, Qwen. · Paid, usage-based
accounts/fireworks/models/llama-v3p3-70b-instruct · Paid, usage-based
accounts/fireworks/models/deepseek-r1 · Paid, usage-based
accounts/fireworks/models/deepseek-v3 · Paid, usage-based
Budget serving of open models with trial credits at sign-up. · Free trial credits
meta-llama/llama-3.3-70b-instruct · Trial credits, then usage-based
deepseek/deepseek-r1 · Trial credits, then usage-based
sarmalink is a wrapper, not a wallet. Usage on Groq, Gemini, DeepSeek or any other provider is billed by that provider directly, at the rate you signed up for. We do not skim, mark up, or proxy requests through a paid pool. If you want a paid-tier model, bring a paid-tier key.