Fireworks AI launches training platform preview, supporting full-parameter training of trillion-parameter models

BlockBeatNews

According to 1M AI News monitoring, AI inference infrastructure company Fireworks AI has released a preview of Fireworks Training, expanding from a pure inference platform into an end-to-end platform for training and deployment. Fireworks AI was founded by Lin Qiao, a former Meta engineer who helped build PyTorch. The company is currently valued at $4.0 billion and processes 150 trillion tokens per day.

The platform offers three tiers:

  1. Training Agent: for product teams without ML infrastructure. Users describe the task and upload data, and the platform handles the entire workflow from training to deployment. Currently supports LoRA only.
  2. Managed Training: for ML engineers. Supports SFT, DPO, and reinforcement learning fine-tuning, including full-parameter training.
  3. Training API: for research teams. Allows custom loss functions and training loops, supporting algorithms such as GRPO and DAPO.
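To illustrate the kind of custom loss a Training-API-level user might plug in, here is a minimal, self-contained sketch of the DPO objective mentioned above. This is a generic illustration of the algorithm, not Fireworks' actual API; the function name and the sample log-probabilities are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))).
    A larger margin between chosen and rejected (relative to the
    frozen reference model) drives the loss toward zero."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical per-sequence log-probabilities: the policy already
# prefers the chosen completion more strongly than the reference does.
loss_with_margin = dpo_loss(-4.0, -6.0, -5.0, -5.5)   # margin = 1.5
loss_no_margin = dpo_loss(-5.0, -5.0, -5.0, -5.0)     # margin = 0
print(loss_with_margin, loss_no_margin)
```

At margin 0 the loss is log 2 ≈ 0.693; any positive preference margin pulls it lower, which is what the optimizer exploits.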

Full-parameter training scales from Qwen3 8B on a single node up to the trillion-parameter Kimi K2.5 on 64 Nvidia B200 GPUs.

Fireworks AI’s production inference customers Cursor, Vercel, and Genspark have completed frontier reinforcement learning training on the platform. Vercel trained an automatic error-correction model for its code generation product v0, reaching a 93% error-free code generation rate; CTO Malte Ubl said Sonnet 3.5 reached only 62% on the same task, and end-to-end latency improved 40x over the closed-source models used previously. Genspark fine-tuned the open-source trillion-parameter model Kimi K2 with reinforcement learning to build a deep-research agent, increasing tool-call volume by 33% while cutting costs by 50%. Cursor ran distributed reinforcement learning training of Composer 2 (currently ranked #1 on CursorBench) across three to four clusters worldwide, with training and production inference sharing the same GPU pool.

Fireworks AI emphasizes numerical consistency between training and inference as its core technical differentiator. MoE (mixture-of-experts) models are more numerically fragile than dense models: small changes in hidden states can flip expert-routing decisions, and the resulting errors amplify as they cascade through layers. Fireworks publishes the training-inference KL divergence for every supported model; all values are below 0.01.
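The fragility described above can be seen in a toy example. Below, a tiny numerical drift in router logits (the specific logit values are invented for illustration) leaves the training-vs-inference KL divergence well under 0.01, yet still flips which expert ranks first:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Router logits over 4 experts as the training stack computes them...
train_logits = [2.10, 2.05, 0.30, -1.00]
# ...and the same logits after a small numerical drift at inference
# (the two leading logits are nearly tied, so a tiny shift reorders them).
infer_logits = [2.05, 2.10, 0.30, -1.00]

p, q = softmax(train_logits), softmax(infer_logits)
kl = kl_divergence(p, q)
top_train = p.index(max(p))
top_infer = q.index(max(q))
print(kl, top_train, top_infer)  # KL is tiny, but the top-1 expert differs
```

This is why a low aggregate KL number is necessary but not sufficient on its own: near-tied routing decisions are exactly where training/inference kernels must agree bit-for-bit in practice.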
