Dawning Information Debuts Self-Developed Native RDMA High-Speed Network scaleFabric, with Some Metrics Surpassing NVIDIA NDR

(Source: Caixin)

The technical specifications of the newly released scaleFabric 400 series network products fully align with NVIDIA's NDR, and some metrics surpass it.

On March 12, Sugon (603019.SH) announced a major breakthrough in domestically developed high-end native RDMA technology and officially launched scaleFabric, China's first fully self-developed 400G lossless high-speed network. The product is built on a native RDMA architecture and is 100% independently developed, from the underlying 112G SerDes IP and hardware devices up to the management software, filling a gap in China's data-center high-speed networking field. With performance comparable to leading international products, it gives ultra-large AI computing clusters a high-bandwidth, low-latency, truly lossless, and ultra-reliable "computing power artery."

scaleFabric is China's first natively lossless RDMA high-speed network designed for ultra-large AI computing clusters. Everything from the core IP, switching chips, and network cards to the switches, drivers, and management software is independently developed, forming a complete hardware-software technical stack.

In terms of performance, the announced scaleFabric 400 series matches NVIDIA's NDR across the board and exceeds it on several metrics. The scaleFabric 400 network card is based on PCIe 5.0, delivering 400 Gbps of port bandwidth with end-to-end communication latency as low as 0.9 microseconds. The scaleFabric 400 switch supports 800 Gbps per port, a total bidirectional switching capacity of 64 Tbps, and switching latency of roughly 260 nanoseconds, with port configurations of 40×800G or 80×400G. This combination fully meets the extreme bandwidth and latency demands of AI training clusters with thousands of accelerator cards.
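Taken at face value, these figures are internally consistent, and a simple latency-plus-serialization model (transfer time = fixed latency + message size / bandwidth) shows what they imply for message transfer times. The sketch below uses only the numbers quoted in this article; the model and the example message sizes are illustrative assumptions, not Sugon's published benchmarking methodology.

```python
# Back-of-the-envelope checks on the scaleFabric 400 figures quoted above.
# Simple latency + serialization model; an illustrative assumption, not
# Sugon's published methodology.

PORT_BW_BPS = 400e9     # network card port bandwidth: 400 Gbps
E2E_LATENCY_S = 0.9e-6  # end-to-end communication latency: 0.9 microseconds

def transfer_time_us(message_bytes: int) -> float:
    """Estimated one-way transfer time: fixed latency + serialization delay."""
    return (E2E_LATENCY_S + message_bytes * 8 / PORT_BW_BPS) * 1e6

# Switch capacity check: 80 ports x 400 Gbps x 2 directions = 64 Tbps,
# and a 40 x 800G configuration carries the same aggregate as 80 x 400G.
assert 80 * 400e9 * 2 == 64e12
assert 40 * 800e9 == 80 * 400e9

for size in (4 * 1024, 1024 * 1024):  # example message sizes (assumed)
    print(f"{size:>8} B -> {transfer_time_us(size):6.2f} us")
# Small messages are latency-bound (~0.9 us); large ones bandwidth-bound.
```

On this model, a 4 KiB message takes about 0.98 microseconds while a 1 MiB message takes about 21.9 microseconds, which is why both the sub-microsecond latency and the 400 Gbps bandwidth matter for AI training traffic.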

In terms of stability and scalability, the product adopts a credit-based lossless flow control mechanism that eliminates the risk of congestion-induced packet loss at the source. Link failure recovery takes less than 1 millisecond, allowing clusters of nearly 10,000 cards to run stably for more than 10 months. Compared with NVIDIA's NDR, switch port density is 25% higher, the maximum number of queue pairs (QPs) supported per network card is doubled, and the maximum subnet interconnection scale is 2.33 times that of traditional InfiniBand, comfortably supporting deployments of up to 114,000 cards while cutting overall network cost by 30%.
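The article does not describe Sugon's credit mechanism in detail; the toy model below only illustrates the general principle of credit-based flow control as used in lossless fabrics such as InfiniBand: the sender may transmit only while it holds credits granted by the receiver, one per free buffer slot, so the receive buffer can never overflow and no packet is ever dropped for lack of space. All names and sizes here are hypothetical.

```python
from collections import deque

class CreditLink:
    """Toy model of a credit-based lossless link (hypothetical sketch for
    illustration only; not Sugon's implementation).

    The receiver grants one credit per free buffer slot. The sender may
    only transmit while it holds credits, so the receive buffer can never
    overflow and packets are never dropped for lack of space."""

    def __init__(self, rx_buffer_slots: int):
        self.credits = rx_buffer_slots      # initial grant = whole buffer
        self.rx_capacity = rx_buffer_slots
        self.rx_buffer: deque = deque()

    def send(self, packet) -> bool:
        if self.credits == 0:
            return False                    # back-pressure: hold, don't drop
        self.credits -= 1
        self.rx_buffer.append(packet)       # a free slot is guaranteed
        assert len(self.rx_buffer) <= self.rx_capacity
        return True

    def receive(self):
        packet = self.rx_buffer.popleft()   # consume a packet ...
        self.credits += 1                   # ... and return its credit
        return packet

link = CreditLink(rx_buffer_slots=4)
accepted = sum(link.send(i) for i in range(10))
print(f"accepted {accepted} of 10 packets, 0 dropped")  # accepted 4
```

On the scale claim, a classic InfiniBand subnet is limited to 49,151 unicast LIDs, and 2.33 × 49,151 ≈ 114,500, which is consistent with the quoted 114,000-card figure; this baseline is our inference, as the article does not state it.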
