NVIDIA's most powerful AI infrastructure to date: the Vera Rubin platform cuts the cost per token to one-tenth


NVIDIA Unveils Vera Rubin AI Platform at GTC 2026

On March 17, at the GTC 2026 conference held in San Jose, California, NVIDIA announced the Vera Rubin AI platform to advance the development of Agentic AI.

NVIDIA founder and CEO Jensen Huang emphasized that Vera Rubin represents a generational leap, marking the beginning of the company’s largest infrastructure buildout in history, covering the entire AI lifecycle from large-scale pretraining to real-time agent reasoning.

This move marks NVIDIA's official entry into the direct-sale CPU market, putting it in direct competition with Intel and AMD and pitting it against the cloud giants' self-developed Arm-based processors.

According to a blog post cited by IT Home, each Vera CPU features 88 cores and 144 threads, built to significantly improve computational efficiency. The chip uses NVIDIA's custom-designed Arm v9.2-A "Olympus" cores, which deliver a 1.5x generational improvement in instructions per cycle (IPC).
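The claimed 1.5x IPC leap can be put in perspective with a back-of-envelope throughput calculation. In the sketch below, only the 88-core count and the 1.5x IPC factor come from the article; the baseline IPC and clock frequency are illustrative assumptions, not published NVIDIA figures.

```python
# Back-of-envelope: how a 1.5x IPC gain translates to throughput.
# Only the 88-core count and the 1.5x IPC factor come from the article;
# baseline IPC and clock frequency below are illustrative assumptions.

def core_throughput(ipc, freq_ghz):
    """Instructions per second for one core: IPC x clock frequency."""
    return ipc * freq_ghz * 1e9

baseline_ipc = 4.0   # assumed IPC of a prior-generation core
freq_ghz = 3.0       # assumed clock, held constant across generations

prev_gen = core_throughput(baseline_ipc, freq_ghz)
vera = core_throughput(baseline_ipc * 1.5, freq_ghz)  # 1.5x IPC leap

print(f"Per-core speedup at equal clock: {vera / prev_gen:.2f}x")  # 1.50x
print(f"88-core aggregate: {88 * vera / 1e12:.2f} Tinstr/s")
```

At a fixed clock, the IPC gain maps one-to-one onto single-thread performance, which is why IPC rather than raw frequency is the headline figure here.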

Additionally, the architecture introduces a groundbreaking “Spatial Multithreading” technology, physically isolating pipeline components to enable multiple threads to run simultaneously on a single core, eliminating the resource contention typical of traditional multithreading.

At the core computing level, the new NVL72 rack achieves a breakthrough in efficiency. It connects 72 Rubin GPUs and 36 Vera CPUs via NVLink 6.

Compared to the previous Blackwell platform, this system can complete training of large mixture-of-experts (MoE) models with only a quarter of the GPUs, while inference throughput per watt increases up to 10 times, and the cost per token drops to one-tenth.
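The headline cost claim follows from the efficiency figures if energy is treated as the dominant operating cost. The sketch below illustrates that arithmetic; everything except the stated 10x throughput-per-watt gain (electricity price, absolute throughput, power draw) is an illustrative assumption.

```python
# Sketch of the cost-per-token arithmetic, assuming energy dominates
# operating cost. Only the 10x throughput-per-watt gain comes from the
# article; all other figures are illustrative assumptions.

def cost_per_token(tokens_per_sec, power_watts, usd_per_kwh):
    """Energy cost (USD) of producing one token."""
    joules_per_token = power_watts / tokens_per_sec
    kwh_per_token = joules_per_token / 3.6e6  # 1 kWh = 3.6e6 J
    return kwh_per_token * usd_per_kwh

usd_per_kwh = 0.10  # assumed electricity price

blackwell = cost_per_token(tokens_per_sec=1_000, power_watts=1_000,
                           usd_per_kwh=usd_per_kwh)
# 10x inference throughput per watt at the same power budget:
rubin = cost_per_token(tokens_per_sec=10_000, power_watts=1_000,
                       usd_per_kwh=usd_per_kwh)

print(f"Cost ratio (Rubin / Blackwell): {rubin / blackwell:.2f}")  # 0.10
```

Under this energy-dominated model, a 10x gain in tokens per watt is exactly a 10x drop in energy cost per token, consistent with the "one-tenth" figure above.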

Furthermore, the Vera CPU rack, designed for validating AI model outputs, integrates 256 liquid-cooled CPUs, delivering twice the efficiency of traditional CPUs and 50% higher speed.

To meet the low-latency and long-context requirements of agent systems, NVIDIA introduced the Groq3LPX inference acceleration rack. The system packs 256 LPU processors and, combined with Vera Rubin, raises inference throughput per megawatt by up to 35 times.

In terms of data storage, the new BlueField-4STX architecture builds an AI-native storage infrastructure. Using the new DOCA Memos framework, it efficiently handles massive key-value (KV) cache data generated by large language models, significantly reducing energy consumption while increasing inference throughput by up to 5 times, enabling faster multi-turn AI interactions.
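To see why offloading KV cache data pays off in multi-turn interactions, consider a toy work model: without a cache, each turn must reprocess the entire conversation history, while a cache lets each turn process only its new tokens. This is a conceptual sketch only, not NVIDIA's DOCA-based implementation; the turn lengths are assumed values.

```python
# Toy model of why key-value (KV) caching speeds up multi-turn inference.
# Counts token-processing work only; a conceptual sketch, not NVIDIA's
# DOCA-based implementation. Turn lengths below are assumed values.

def work_without_cache(turn_lengths):
    """Each turn re-encodes the full history plus its new tokens."""
    total, history = 0, 0
    for n in turn_lengths:
        history += n
        total += history  # reprocess everything seen so far
    return total

def work_with_cache(turn_lengths):
    """Cached KV pairs let each turn process only its new tokens."""
    return sum(turn_lengths)

turns = [512, 256, 256, 512]  # assumed token counts per conversation turn
print(work_without_cache(turns))  # 3840 tokens processed
print(work_with_cache(turns))     # 1536 tokens processed
```

The gap widens quadratically with conversation length, which is why keeping KV cache data close to the inference pipeline matters so much for long multi-turn sessions.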

NVIDIA GTC 2026 Conference Highlights
