Gene Library Courses Download Pricing Contact Sign in

Fireworks AI

North America Est. 2022 Pay-per-token; per-GPU-hour for dedicated deployments and fine-tuning.
inference-providerfine-tuningfast-inference

Fast, cost-efficient inference and fine-tuning platform for open-source models, optimised for production compound-AI systems.

What They Do

Fireworks AI provides serverless inference for 100+ open-weight models with industry-leading latency, achieved through custom CUDA kernels and speculative decoding. The platform specialises in compound AI systems — structured generation, function calling, and JSON mode.

Mission

Enable developers to build production AI applications with fast, reliable, and cost-effective open-model inference.

Available Models

ModelFamilyContextInput /MOutput /M
accounts/fireworks/models/deepseek-v4-pro
accounts/fireworks/models/flux-1-schnell-fp8
accounts/fireworks/models/glm-5p1
accounts/fireworks/models/gpt-oss-120b
accounts/fireworks/models/kimi-k2p5
accounts/fireworks/models/kimi-k2p6

FAQ