420 Million! Yuntian Lifei Wins the Zhanjiang AI Inference Thousand-Card Cluster Project
(Source: Yuntian Lifei)
Recently, Yuntian Lifei won the bid for the Zhanjiang City AI Penetration Support New Quality Productivity Infrastructure Construction Project. According to the project plan, the company will build an AI inference computing cluster based on its self-developed domestic AI inference acceleration cards, and promote the adaptation and deployment of domestically produced large models like DeepSeek in relevant application scenarios, providing computing infrastructure support for government and industrial digitalization applications.
Building Inference Computing Infrastructure for Large Model Applications
The AI inference computing cluster constructed in this project will be systematically designed around the requirements of large model inference tasks.
During large model inference, different computational stages have varying system resource needs. The industry commonly adopts a “Prefill–Decode separation” inference architecture, optimizing resource allocation for different stages to improve overall system efficiency.
Under this architecture, the Prefill stage mainly handles long-context understanding and computation, requiring high computing power and bandwidth; the Decode stage continuously generates tokens and is more sensitive to system latency. During the project construction, resource allocation and system optimization will be tailored to the characteristics of each stage.
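To make the division of labor concrete, here is a minimal, purely illustrative sketch of the two stages. The function names and the toy "model" are invented for this example and bear no relation to Yuntian Lifei's actual inference stack:

```python
# Purely illustrative sketch of the "Prefill-Decode separation" idea.
# The toy "model" below is invented for this example only.

def prefill(prompt_tokens):
    """Prefill: process the whole prompt at once (compute/bandwidth-heavy).

    In a real system this is one large batched pass; here we just build
    the per-token key/value state that decode will reuse.
    """
    return [("kv", tok) for tok in prompt_tokens]

def decode(kv_cache, steps):
    """Decode: generate tokens one at a time (latency-sensitive)."""
    out = []
    for _ in range(steps):
        nxt = len(kv_cache)            # stand-in for the model's next-token choice
        out.append(nxt)
        kv_cache.append(("kv", nxt))   # each new token extends the cache
    return out

cache = prefill([10, 11, 12])
print(decode(cache, 3))  # -> [3, 4, 5]
```

The payoff of separating the stages is scheduling freedom: prefill work can be placed on compute- and bandwidth-rich resources, while decode is placed and batched to minimize per-token latency.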
At the same time, as the context length of models increases, a large number of intermediate states need to be stored in KV Cache. In response to this, the system design will optimize the coordination between computation, storage, and network to improve data access efficiency and overall system performance.
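To illustrate why KV Cache pressure grows with context length, the following back-of-envelope estimate computes the cache size for a single long sequence. The model dimensions are hypothetical and not the project's actual configuration:

```python
# Rough KV-cache size estimate for one sequence.
# All model dimensions below are hypothetical examples.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per position
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# e.g. a hypothetical 32-layer model, 8 KV heads, head_dim 128, fp16 (2 bytes):
gb = kv_cache_bytes(32, 8, 128, 128_000) / 1024**3
print(f"{gb:.1f} GiB for a 128k-token context")  # -> 15.6 GiB
```

Because this state must be read on every decode step, coordinating compute, storage, and network around it (as the system design described above does) directly determines data access efficiency.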
Regarding network architecture, the system will adopt a unified high-speed interconnection architecture, building the cluster’s physical layer network with 400G optical networks to achieve high bandwidth and low latency communication between nodes. It will support scaling from dozens of cards in a single node to thousands of cards in a cluster, meeting the needs of AI applications of different scales.
Once the project is fully completed, it will form a computing infrastructure for large model inference tasks, providing stable computing support for related application scenarios.
Continuously Advancing AI Inference Chip and Computing System Technology R&D
According to the project plan, the AI inference computing cluster will be built in three phases, using Yuntian Lifei’s self-developed domestically produced AI inference acceleration cards.
The first phase will deploy Yuntian Lifei's X6000 inference acceleration cards; in later phases, the company's latest-generation chips will be prioritized for deployment.
In AI inference chip R&D, Yuntian Lifei is actively building out technology for the different inference stages. According to the company's strategic plan, it will gradually launch chips optimized for the Prefill stage and low-latency inference chips designed for the Decode stage, further improving overall inference efficiency through system-level collaborative optimization.
Among these, the company's first chip optimized for long-context inference scenarios, DeepVerse100, is expected to complete tape-out within the year, with plans to deploy it in related computing systems.
In its long-term technology roadmap, the company has proposed the "1001 Plan," with the long-term goal of "one billion tokens for one penny." Through collaborative optimization of chip architecture and computing systems, it aims to continuously drive down the cost of large model inference.
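As a quick sanity check on the scale of that target, the arithmetic below (illustrative only; the currency unit follows the stated slogan) converts "one billion tokens for one penny" into a per-million-token price:

```python
# Back-of-envelope arithmetic for the "one billion tokens for one penny" goal.
# Illustrative only; currency unit taken from the stated slogan.

target_tokens = 1_000_000_000   # one billion tokens
target_cost = 0.01              # one penny

cost_per_million = target_cost / target_tokens * 1_000_000
print(f"{cost_per_million:.8f} per million tokens")  # -> 0.00001000
```

In other words, the goal corresponds to a price of one hundred-thousandth of a penny per million tokens, several orders of magnitude below typical current inference pricing.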
In the future, the company will continue to promote R&D related to AI inference chips and advance the widespread application of artificial intelligence technology across more industries.