Together AI

North America Est. 2022 Pay-per-token for inference and fine-tuning; hourly for dedicated GPU clusters.

inference-providerfine-tuningopen-sourcecloud

AI-native cloud providing fast inference, fine-tuning, and training for open-source foundation models.

What They Do

Together AI operates a full-stack platform for production AI: serverless inference, dedicated GPU clusters, fine-tuning pipelines, and custom model evaluations. Founded by academic researchers (including FlashAttention co-author Tri Dao).

Mission

Build the AI-native cloud that makes open-source model research and production accessible to everyone.

Available Models

Model	Context	Input /M	Output /M
BAAI/bge-base-en-v1.5	—	—	—
ByteDance-Seed/Seedream-3.0	—	—	—
ByteDance-Seed/Seedream-4.0	—	—	—
ByteDance/Seedance-1.0-lite	—	—	—
ByteDance/Seedance-1.0-pro	—	—	—
ByteDance/Seedance-2.0	—	—	—
ByteDance/Seedream-5.0-lite	—	—	—
HappyHorse/HappyHorse-1.0-T2V	—	—	—
Hcompany/Holo3-35B-A3B	—	—	—
HiDream-ai/HiDream-I1-Dev	—	—	—
HiDream-ai/HiDream-I1-Fast	—	—	—
HiDream-ai/HiDream-I1-Full	—	—	—
LiquidAI/LFM2-24B-A2B	—	—	—
LiquidAI/LFM2.5-8B-A1B	—	—	—
Lykon/DreamShaper	—	—	—
MiniMaxAI/MiniMax-M1-40k	—	—	—
MiniMaxAI/MiniMax-M1-80k	—	—	—
MiniMaxAI/MiniMax-M2	—	—	—
MiniMaxAI/MiniMax-M2.5-FP4	—	—	—
MiniMaxAI/MiniMax-M2.7	—	—	—
MiniMaxAI/MiniMax-M3	—	—	—
NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO	—	—	—
Prism-ML/Ternary-Bonsai-27B	—	—	—
Qwen/QwQ-32B	—	—	—
Qwen/Qwen-Image	—	—	—
Qwen/Qwen-Image-2.0	—	—	—
Qwen/Qwen-Image-2.0-Pro	—	—	—
Qwen/Qwen2-1.5B	—	—	—
Qwen/Qwen2-1.5B-Instruct	—	—	—
Qwen/Qwen2-72B	—	—	—
Qwen/Qwen2-72B-Instruct	—	—	—
Qwen/Qwen2-7B	—	—	—
Qwen/Qwen2-VL-72B-Instruct	—	—	—
Qwen/Qwen2.5-1.5B	—	—	—
Qwen/Qwen2.5-1.5B-Instruct	—	—	—
Qwen/Qwen2.5-14B	—	—	—
Qwen/Qwen2.5-14B-Instruct	—	—	—
Qwen/Qwen2.5-32B	—	—	—
Qwen/Qwen2.5-32B-Instruct	—	—	—
Qwen/Qwen2.5-3B-Instruct	—	—	—
Qwen/Qwen2.5-72B	—	—	—
Qwen/Qwen2.5-72B-Instruct	—	—	—
Qwen/Qwen2.5-72B-Instruct-Turbo	—	—	—
Qwen/Qwen2.5-7B	—	—	—
Qwen/Qwen2.5-7B-Instruct	—	—	—
Qwen/Qwen2.5-7B-Instruct-Turbo	—	—	—
Qwen/Qwen2.5-Coder-32B-Instruct	—	—	—
Qwen/Qwen2.5-VL-72B-Instruct	—	—	—
Qwen/Qwen3-0.6B	—	—	—
Qwen/Qwen3-0.6B-Base	—	—	—
Qwen/Qwen3-1.7B	—	—	—
Qwen/Qwen3-1.7B-Base	—	—	—
Qwen/Qwen3-14B	—	—	—
Qwen/Qwen3-14B-Base	—	—	—
Qwen/Qwen3-235B-A22B-Instruct-2507-FP8	—	—	—
Qwen/Qwen3-235B-A22B-Instruct-2507-tput	—	—	—
Qwen/Qwen3-30B-A3B	—	—	—
Qwen/Qwen3-30B-A3B-Base	—	—	—
Qwen/Qwen3-30B-A3B-Instruct-2507-Lora	—	—	—
Qwen/Qwen3-32B	—	—	—
Qwen/Qwen3-4B-Base	—	—	—
Qwen/Qwen3-4B-Instruct-2507	—	—	—
Qwen/Qwen3-8B	—	—	—
Qwen/Qwen3-8B-Base	—	—	—
Qwen/Qwen3-8B-Lora	—	—	—
Qwen/Qwen3-Coder-30B-A3B-Instruct	—	—	—
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8	—	—	—
Qwen/Qwen3-Coder-Next-FP8	—	—	—
Qwen/Qwen3-Next-80B-A3B-Instruct	—	—	—
Qwen/Qwen3-Next-80B-A3B-Instruct-FP8	—	—	—
Qwen/Qwen3-Next-80B-A3B-Thinking	—	—	—
Qwen/Qwen3-VL-235B-A22B-Instruct-FP8	—	—	—
Qwen/Qwen3-VL-32B-Instruct	—	—	—
Qwen/Qwen3-VL-8B-Instruct	—	—	—
Qwen/Qwen3.5-0.8B-Lora	—	—	—
Qwen/Qwen3.5-122B-A10B-FP8	—	—	—
Qwen/Qwen3.5-122B-A10B-Lora	—	—	—
Qwen/Qwen3.5-27B-Lora	—	—	—
Qwen/Qwen3.5-2B-Lora	—	—	—
Qwen/Qwen3.5-35B-A3B	—	—	—
Qwen/Qwen3.5-35B-A3B-Base-Lora	—	—	—
Qwen/Qwen3.5-35B-A3B-Lora	—	—	—
Qwen/Qwen3.5-397B-A17B	—	—	—
Qwen/Qwen3.5-397B-A17B-Lora	—	—	—
Qwen/Qwen3.5-4B-Lora	—	—	—
Qwen/Qwen3.5-9B	—	—	—
Qwen/Qwen3.5-9B-FP8	—	—	—
Qwen/Qwen3.5-9B-Lora	—	—	—
Qwen/Qwen3.6-27B-Lora	—	—	—
Qwen/Qwen3.6-35B-A3B-FP8	—	—	—
Qwen/Qwen3.6-35B-A3B-Lora	—	—	—
Qwen/Qwen3.6-Plus	—	—	—
Qwen/Qwen3.7-Max	—	—	—
Qwen/Qwen3.7-Plus	—	—	—
RunDiffusion/Juggernaut-pro-flux	—	—	—
Rundiffusion/Juggernaut-Lightning-Flux	—	—	—
Salesforce/Llama-Rank-V1	—	—	—
Wan-AI/Wan2.2-I2V-A14B	—	—	—
Wan-AI/Wan2.2-T2V-A14B	—	—	—
Wan-AI/Wan2.6-image	—	—	—
Wan-AI/wan2.7-i2v	—	—	—
Wan-AI/wan2.7-r2v	—	—	—
Wan-AI/wan2.7-t2v	—	—	—
agentica-org/DeepCoder-14B-Preview	—	—	—
alibaba/happyhorse-1.0-i2v	—	—	—
alibaba/happyhorse-1.0-r2v	—	—	—
alibaba/happyhorse-1.0-t2v	—	—	—
alibaba/happyhorse-1.1-i2v	—	—	—
alibaba/happyhorse-1.1-r2v	—	—	—
alibaba/happyhorse-1.1-t2v	—	—	—
allenai/Molmo-7B-D-0924	—	—	—
arcee-ai/trinity-mini	—	—	—
arize-ai/qwen-2-1.5b-instruct	—	—	—
black-forest-labs/FLUX.1-kontext-max	—	—	—
black-forest-labs/FLUX.1-kontext-pro	—	—	—
black-forest-labs/FLUX.1-schnell	—	—	—
black-forest-labs/FLUX.1.1-pro	—	—	—
black-forest-labs/FLUX.2-dev	—	—	—
black-forest-labs/FLUX.2-flex	—	—	—
black-forest-labs/FLUX.2-max	—	—	—
black-forest-labs/FLUX.2-pro	—	—	—
canopylabs/orpheus-3b-0.1-ft	—	—	—
cartesia/sonic	—	—	—
cartesia/sonic-2	—	—	—
cartesia/sonic-3	—	—	—
cartesia/sonic-3.5	—	—	—
deepcogito/cogito-v1-preview-llama-70B	—	—	—
deepcogito/cogito-v1-preview-llama-70B-Turbo	—	—	—
deepcogito/cogito-v1-preview-llama-8B	—	—	—
deepcogito/cogito-v1-preview-qwen-14B	—	—	—
deepcogito/cogito-v1-preview-qwen-32B	—	—	—
deepcogito/cogito-v2-1-671b	—	—	—
deepgram/aura-2	—	—	—
deepgram/flux	—	—	—
deepgram/nova-3-en	—	—	—
deepgram/nova-3-multi	—	—	—
deepseek-ai/DeepSeek-OCR-2	—	—	—
deepseek-ai/DeepSeek-R1-0528	—	—	—
deepseek-ai/DeepSeek-R1-Distill-Llama-70B	—	—	—
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	—	—	—
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B	—	—	—
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B	—	—	—
deepseek-ai/DeepSeek-V3.1	—	—	—
deepseek-ai/DeepSeek-V4-Pro	—	—	—
deepseek-ai/deepseek-coder-33b-instruct	—	—	—
essentialai/rnj-1-instruct	—	—	—
facebook/cwm	—	—	—
google/flash-image-2.5	—	—	—
google/flash-image-3.1	—	—	—
google/flash-image-3.1-lite	—	—	—
google/gemini-3-pro-image	—	—	—
google/gemma-2-27b-it	—	—	—
google/gemma-2-9b-it	—	—	—
google/gemma-2b-it	—	—	—
google/gemma-3-1b-it	—	—	—
google/gemma-3-1b-pt	—	—	—
google/gemma-3-270m-it	—	—	—
google/gemma-3-270m-it-lora	—	—	—
google/gemma-3-27b-it	—	—	—
google/gemma-3-27b-it-lora	—	—	—
google/gemma-3-27b-pt	—	—	—
google/gemma-3-4b-it	—	—	—
google/gemma-3n-E4B-it	—	—	—
google/gemma-4-12B-it	—	—	—
google/gemma-4-26B-A4B-it	—	—	—
google/gemma-4-31B-it	—	—	—
google/gemma-4-31B-it-lora	—	—	—
google/gemma-4-E2B-it	—	—	—
google/gemma-4-E4B-it	—	—	—
google/imagen-4.0-fast	—	—	—
google/imagen-4.0-preview	—	—	—
google/imagen-4.0-ultra	—	—	—
google/medgemma-27b-text-it	—	—	—
google/veo-2.0	—	—	—
google/veo-3.0	—	—	—
google/veo-3.0-audio	—	—	—
google/veo-3.0-fast	—	—	—
google/veo-3.0-fast-audio	—	—	—
google/veo-3.1	—	—	—
google/veo-3.1-lite	—	—	—
google/veo-3.1-test-debug	—	—	—
hexgrad/Kokoro-82M	—	—	—
ideogram/ideogram-3.0	—	—	—
ideogram/ideogram-4.0	—	—	—
intfloat/multilingual-e5-large-instruct	—	—	—
kwaivgI/kling-1.6-pro	—	—	—
kwaivgI/kling-1.6-standard	—	—	—
kwaivgI/kling-2.0-master	—	—	—
kwaivgI/kling-2.1-master	—	—	—
kwaivgI/kling-2.1-pro	—	—	—
kwaivgI/kling-2.1-standard	—	—	—
meta-llama/Llama-2-7b-chat-hf	—	—	—
meta-llama/Llama-3-8b-chat-hf	—	—	—
meta-llama/Llama-3.1-405B	—	—	—
meta-llama/Llama-3.1-405B-Instruct	—	—	—
meta-llama/Llama-3.2-1B	—	—	—
meta-llama/Llama-3.2-1B-Instruct	—	—	—
meta-llama/Llama-3.2-3B	—	—	—
meta-llama/Llama-3.2-3B-Instruct	—	—	—
meta-llama/Llama-3.3-70B-Instruct	—	—	—
meta-llama/Llama-3.3-70B-Instruct-FP8-Lora	—	—	—
meta-llama/Llama-3.3-70B-Instruct-Turbo	—	—	—
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP4	—	—	—
meta-llama/Llama-4-Scout-17B-16E	—	—	—
meta-llama/Llama-4-Scout-17B-16E-Instruct	—	—	—
meta-llama/Llama-4-Scout-17B-16E-Instruct-FP8-Lora	—	—	—
meta-llama/Llama-Guard-4-12B	—	—	—
meta-llama/Meta-Llama-3-70B-Instruct-Turbo	—	—	—
meta-llama/Meta-Llama-3-8B-Instruct	—	—	—
meta-llama/Meta-Llama-3-8B-Instruct-Lite	—	—	—
meta-llama/Meta-Llama-3.1-70B	—	—	—
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	—	—	—
meta-llama/Meta-Llama-3.1-8B	—	—	—
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	—	—	—
minimax/hailuo-02	—	—	—
minimax/speech-2.6-turbo	—	—	—
minimax/speech-2.8-turbo	—	—	—
minimax/video-01-director	—	—	—
mistralai/Devstral-Small-2505	—	—	—
mistralai/Magistral-Small-2506	—	—	—
mistralai/Ministral-3-14B-Instruct-2512	—	—	—
mistralai/Mistral-7B-Instruct-v0.1	—	—	—
mistralai/Mistral-7B-Instruct-v0.3	—	—	—
mistralai/Mistral-7B-v0.1	—	—	—
mistralai/Mistral-Small-24B-Instruct-2501	—	—	—
mistralai/Mixtral-8x22B-Instruct-v0.1	—	—	—
mistralai/Mixtral-8x7B-Instruct-v0.1	—	—	—
mistralai/Mixtral-8x7B-Instruct-v0.1-FP8-Lora	—	—	—
mistralai/Mixtral-8x7B-v0.1	—	—	—
mixedbread-ai/mxbai-rerank-large-v2	—	—	—
moonshotai/Kimi-K2.5-fp4	—	—	—
moonshotai/Kimi-K2.6	—	—	—
moonshotai/Kimi-K2.7-Code	—	—	—
nim/meta/llama-3.1-70b-instruct	—	—	—
nim/meta/llama-3.1-8b-instruct	—	—	—
nim/meta/llama-3.2-11b-vision-instruct	—	—	—
nim/meta/llama-3.2-90b-vision-instruct	—	—	—
nim/meta/llama-3.3-70b-instruct	—	—	—
nim/mistralai/mixtral-8x22b-instruct-v01	—	—	—
nim/mistralai/mixtral-8x7b-instruct-v01	—	—	—
nim/nv-mistralai/mistral-nemo-12b-instruct	—	—	—
nim/nvidia/llama-3.1-nemotron-70b-instruct	—	—	—
nim/nvidia/llama-3.3-nemotron-super-49b-v1	—	—	—
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF	—	—	—
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16	—	—	—
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16	—	—	—
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8	—	—	—
nvidia/NVIDIA-Nemotron-Nano-9B-v2	—	—	—
nvidia/nemotron-3-asr-streaming-0.6b	—	—	—
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-fp8	—	—	—
nvidia/nemotron-3-ultra-550b-a55b	—	—	—
nvidia/nemotron-3.5-asr-streaming-0.6b	—	—	—
nvidia/parakeet-tdt-0.6b-v3	—	—	—
openai/gpt-image-1.5	—	—	—
openai/gpt-image-2	—	—	—
openai/gpt-oss-120b	—	—	—
openai/gpt-oss-20b	—	—	—
openai/sora-2	—	—	—
openai/sora-2-pro	—	—	—
openai/whisper-large-v3	—	—	—
pearl-ai/gemma-4-31b-it	—	—	—
pixverse/pixverse-v5	—	—	—
pixverse/pixverse-v5.6	—	—	—
pixverse/pixverse-v6	—	—	—
rime-labs/rime-arcana-v2	—	—	—
rime-labs/rime-arcana-v3	—	—	—
rime-labs/rime-arcana-v3-turbo	—	—	—
rime-labs/rime-mist-v2	—	—	—
rime-labs/rime-mist-v3	—	—	—
rime-labs/rime-mist-v3-omni	—	—	—
sarvamai/sarvam-m	—	—	—
stabilityai/stable-diffusion-3-medium	—	—	—
stabilityai/stable-diffusion-xl-base-1.0	—	—	—
thinkingmachines/Inkling	—	—	—
togethercomputer/EssentialAI-RNJ-1-Instruct	—	—	—
togethercomputer/meta-llama-3.1-8B-Instruct-AWQ-INT4	—	—	—
vidu/vidu-2.0	—	—	—
vidu/vidu-q1	—	—	—
vidu/vidu-q3	—	—	—
vidu/vidu-q3-turbo	—	—	—
zai-org/GLM-4.5-Air-FP8	—	—	—
zai-org/GLM-4.5V	—	—	—
zai-org/GLM-4.6	—	—	—
zai-org/GLM-4.7	—	—	—
zai-org/GLM-4.7-FP8	—	—	—
zai-org/GLM-4.7-fp4	—	—	—
zai-org/GLM-5	—	—	—
zai-org/GLM-5-FP4	—	—	—
zai-org/GLM-5.1	—	—	—
zai-org/GLM-5.2	—	—	—
zai-org/GLM-OCR	—	—	—

FAQ

: Yes — fine-tuning via LoRA, QLoRA, and full fine-tuning is available for many open models through a managed pipeline. Training jobs are billed per GPU-hour.