AI chip company and cloud provider offering the world's fastest LLM inference via its proprietary LPU hardware.
What They Do
Groq designs Language Processing Units (LPUs) — custom silicon optimised for autoregressive token generation, achieving 300–800 tokens/second on 70B models. GroqCloud is the hosted inference API offering pay-per-token access to popular open models.
Mission
Democratise access to low-latency, cost-efficient AI compute through purpose-built chip technology.
Available Models
| Model | Family | Context | Input /M | Output /M |
|---|---|---|---|---|
| allam-2-7b | — | — | — | |
| canopylabs/orpheus-arabic-saudi | — | — | — | |
| canopylabs/orpheus-v1-english | — | — | — | |
| groq/compound | — | — | — | |
| groq/compound-mini | — | — | — | |
| llama-3.1-8b-instant | — | — | — | |
| llama-3.3-70b-versatile | — | — | — | |
| meta-llama/llama-4-scout-17b-16e-instruct | — | — | — | |
| meta-llama/llama-prompt-guard-2-22m | — | — | — | |
| meta-llama/llama-prompt-guard-2-86m | — | — | — | |
| openai/gpt-oss-120b | — | — | — | |
| openai/gpt-oss-20b | — | — | — | |
| openai/gpt-oss-safeguard-20b | — | — | — | |
| qwen/qwen3-32b | — | — | — | |
| whisper-large-v3 | — | — | — | |
| whisper-large-v3-turbo | — | — | — |
FAQ
A Language Processing Unit is Groq's custom ASIC designed specifically for inference workloads. Unlike GPUs optimised for parallelism, the LPU uses a deterministic dataflow architecture that eliminates memory stalls during token generation.
Yes. Change the base URL to https://api.groq.com/openai/v1 and existing code works unchanged.