DeepSeek Launches New MODEL1 to Celebrate R1's First Anniversary
On January 21, DeepSeek marked the first anniversary of DeepSeek-R1 by introducing an updated model under the MODEL1 index. The release signals active development of this model line within the company's portfolio and points to ambitious plans for the year ahead.
Revealing New Model Details via FlashMLA Code
According to BlockBeats, the FlashMLA repository on GitHub has been updated with important details about MODEL1. Analysis of the code turned up 28 references to the new model across 114 files, which suggests how deeply it is being integrated into the company's infrastructure. A V32 identifier appears alongside MODEL1, confirming that this is a fundamentally different model from DeepSeek-V3.2.
Technical Innovation and Optimization
The source code reveals significant differences at the technical level. The main improvements concern KV cache management, sparse computation, and FP8-format decoding. These optimizations point to DeepSeek's focus on the model's memory efficiency and performance, both of which are critical for scaling its practical use. The new model is therefore not just an update but a qualitative leap in architecture and functionality.
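To make the KV-cache idea concrete, here is a minimal, illustrative sketch of key/value caching during autoregressive decoding. This is not DeepSeek's FlashMLA code; the class, buffer sizes, and single-head layout are all simplified assumptions chosen only to show why caching saves recomputation: each new token attends over stored keys and values instead of reprocessing the whole prefix.

```python
import numpy as np

class KVCache:
    """Toy single-head KV cache (illustrative only, not FlashMLA)."""

    def __init__(self, max_len: int, d_head: int):
        # Pre-allocated buffers; sizes here are arbitrary for the sketch.
        self.keys = np.zeros((max_len, d_head), dtype=np.float32)
        self.values = np.zeros((max_len, d_head), dtype=np.float32)
        self.length = 0

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Store the new token's key/value once; never recompute the prefix.
        self.keys[self.length] = k
        self.values[self.length] = v
        self.length += 1

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Scaled dot-product attention over only the cached entries.
        k = self.keys[: self.length]
        v = self.values[: self.length]
        scores = k @ q / np.sqrt(q.shape[0])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ v

rng = np.random.default_rng(0)
cache = KVCache(max_len=8, d_head=4)
for _ in range(3):  # decode three tokens
    token = rng.standard_normal(4).astype(np.float32)
    cache.append(token, token)
out = cache.attend(rng.standard_normal(4).astype(np.float32))
print(cache.length, out.shape)  # 3 (4,)
```

Real implementations extend this per attention head and per layer, and optimizations like those hinted at in the repository (sparse attention, FP8 storage) further shrink the cache's memory footprint.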