Groq

The fastest inference cloud, by an order of magnitude, built on custom LPU silicon.

Contains affiliate link

Groq hosts open models on its custom LPU (Language Processing Unit) hardware, which delivers inference at 1,000+ tokens/sec. It is best suited to latency-sensitive workloads.

Latest update

May 19, 2026

Llama 4 inference record: 2,100 tok/s.

Related tools

One API key for 200+ models, with automatic cost-routing across providers.

The most ergonomic TypeScript SDK for building with LLMs, now with built-in MCP support.

Run any open model with a single API call, now with sub-100ms cold start.

Run Python code in the cloud, including GPU workloads, with a function decorator.