Gene Library Courses Download Pricing Contact Sign in
Groq

Groq

North America Est. 2016 Pay-per-token; free tier with generous daily rate limits.
inference-providerhardwarefast-inference

AI chip company and cloud provider offering the world's fastest LLM inference via its proprietary LPU hardware.

What They Do

Groq designs Language Processing Units (LPUs) — custom silicon optimised for autoregressive token generation, achieving 300–800 tokens/second on 70B models. GroqCloud is the hosted inference API offering pay-per-token access to popular open models.

Mission

Democratise access to low-latency, cost-efficient AI compute through purpose-built chip technology.

Available Models

ModelFamilyContextInput /MOutput /M
allam-2-7b
canopylabs/orpheus-arabic-saudi
canopylabs/orpheus-v1-english
groq/compound
groq/compound-mini
llama-3.1-8b-instant
llama-3.3-70b-versatile
meta-llama/llama-4-scout-17b-16e-instruct
meta-llama/llama-prompt-guard-2-22m
meta-llama/llama-prompt-guard-2-86m
openai/gpt-oss-120b
openai/gpt-oss-20b
openai/gpt-oss-safeguard-20b
qwen/qwen3-32b
whisper-large-v3
whisper-large-v3-turbo

FAQ