Fireworks AI
Fast, cost-efficient inference and fine-tuning platform for open-source models, optimised for production compound-AI systems.
What They Do
Fireworks AI provides serverless inference for 100+ open-weight models with industry-leading latency, achieved through custom CUDA kernels and speculative decoding. The platform specialises in compound AI systems — structured generation, function calling, and JSON mode.
Mission
Enable developers to build production AI applications with fast, reliable, and cost-effective open-model inference.
Available Models
| Model | Family | Context | Input /M | Output /M |
|---|---|---|---|---|
| accounts/fireworks/models/deepseek-v4-pro | — | — | — | |
| accounts/fireworks/models/flux-1-schnell-fp8 | — | — | — | |
| accounts/fireworks/models/glm-5p1 | — | — | — | |
| accounts/fireworks/models/gpt-oss-120b | — | — | — | |
| accounts/fireworks/models/kimi-k2p5 | — | — | — | |
| accounts/fireworks/models/kimi-k2p6 | — | — | — |
FAQ
Lin Qiao (CEO, former Meta/PyTorch engineering director), Dmytro Dzhulgakov, and Praveen Tiwari co-founded Fireworks AI in 2022 with backing from Sequoia and other investors.