Fireworks AI

North America Est. 2022 Pay-per-token; per-GPU-hour for dedicated deployments and fine-tuning.

inference-providerfine-tuningfast-inference

Fast, cost-efficient inference and fine-tuning platform for open-source models, optimised for production compound-AI systems.

What They Do

Fireworks AI provides serverless inference for 100+ open-weight models with industry-leading latency, achieved through custom CUDA kernels and speculative decoding. The platform specialises in compound AI systems — structured generation, function calling, and JSON mode.

Mission

Enable developers to build production AI applications with fast, reliable, and cost-effective open-model inference.

Available Models

Model	Context	Input /M	Output /M
accounts/fireworks/models/deepseek-v4-pro	—	—	—
accounts/fireworks/models/flux-1-schnell-fp8	—	—	—
accounts/fireworks/models/glm-5p1	—	—	—
accounts/fireworks/models/glm-5p2	—	—	—
accounts/fireworks/models/gpt-oss-120b	—	—	—
accounts/fireworks/models/kimi-k2p5	—	—	—
accounts/fireworks/models/kimi-k2p6	—	—	—

FAQ

: Lin Qiao (CEO, former Meta/PyTorch engineering director), Dmytro Dzhulgakov, and Praveen Tiwari co-founded Fireworks AI in 2022 with backing from Sequoia and other investors.