Google Releases Training-Free TurboQuant Compression Algorithm, Claims at Least 6x Reduction in AI Memory Requirements; Memory Stocks Plunge, Analysts Divided
(Background: Google aims to complete its post-quantum cryptography migration by 2029, six years ahead of the government target, prompting the encryption industry to keep pace)
(Additional context: The Wall Street Journal reports that Trump plans to appoint Zuckerberg, Jensen Huang, and Elon Musk to PCAST to build an “American AI National Team”)
Can a single new algorithm send memory chip stocks plummeting across the board? On the 25th, Google Research officially announced the TurboQuant compression algorithm, claiming it can quantize the KV cache of large language models (LLMs) down to just 3 bits with no loss in model accuracy, cutting memory usage by at least 6-fold.
Following the announcement, memory giant Micron fell as much as 6.1% during the session, closing at $382.09, a three-week low. SanDisk fell 3.5%, Seagate declined 2.59%, and Western Digital dropped 1.63%, dragging the memory sector down across the board.
Asian markets also came under pressure today, with Samsung Electronics opening down 3.6% and SK Hynix down 4.5%. Investors’ logic is straightforward: if AI models no longer need as much memory, the pricing power recently conferred by component shortages may be at risk.
The KV cache (Key-Value Cache) is the core mechanism that lets LLMs “remember” what they have already processed: it stores the attention keys and values of previous tokens so the model does not have to recompute them for every new token. As context windows expand, the KV cache has become a major memory bottleneck.
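For readers who want to see the mechanism, here is a minimal sketch of one decoding step with a KV cache (illustrative NumPy code with hypothetical shapes and names, not Google’s implementation):

```python
import numpy as np

def attention_step(q, k_cache, v_cache, k_new, v_new):
    # Append the new token's key/value to the cache instead of
    # recomputing keys/values for the whole sequence.
    k_cache = np.vstack([k_cache, k_new])          # (t+1, d)
    v_cache = np.vstack([v_cache, v_new])          # (t+1, d)
    scores = k_cache @ q / np.sqrt(q.shape[0])     # attention logits, (t+1,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over cached tokens
    return weights @ v_cache, k_cache, v_cache     # attended output, (d,)

d = 64
rng = np.random.default_rng(0)
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for _ in range(5):  # five decoding steps; the cache grows one row per token
    q, k, v = rng.standard_normal((3, d))
    out, k_cache, v_cache = attention_step(q, k_cache, v_cache, k, v)
```

The cache grows by one row per generated token, so its footprint scales linearly with context length, which is exactly the bottleneck TurboQuant attacks.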
TurboQuant targets exactly this pain point. Google states that traditional vector quantization methods introduce roughly 1 to 2 bits of overhead per stored value, whereas TurboQuant eliminates this burden through a two-stage process (a generic sketch of the pattern follows the list):
Stage 1: Uses PolarQuant to rotate data vectors, achieving high-quality compression.
Stage 2: Applies a Quantized Johnson-Lindenstrauss algorithm to eliminate residual errors.
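Since the paper is not yet public (see below), the exact PolarQuant and quantized Johnson-Lindenstrauss constructions are unknown. The general rotate-quantize-then-sketch-the-residual pattern can still be illustrated; everything here (the random rotation, the uniform quantizer, the dimensions) is a hypothetical stand-in, not TurboQuant itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # QR of a Gaussian matrix yields a random orthonormal rotation;
    # a stand-in for PolarQuant's rotation, whose details are not public.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def uniform_quantize(x, bits):
    # Uniform scalar quantization onto 2**bits evenly spaced levels.
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def two_stage_quantize(v, bits=3, jl_dim=16):
    R = random_rotation(v.shape[0])
    rotated = R @ v
    stage1 = uniform_quantize(rotated, bits)   # stage 1: rotate, then quantize
    residual = rotated - stage1                # error left behind by stage 1
    # Stage 2: a small random (Johnson-Lindenstrauss-style) projection of
    # the residual, itself quantized, serves as a cheap correction term.
    P = rng.standard_normal((jl_dim, v.shape[0])) / np.sqrt(jl_dim)
    stage2 = uniform_quantize(P @ residual, bits)
    return stage1, stage2

stage1, stage2 = two_stage_quantize(rng.standard_normal(128))
```

The intuition behind such two-stage schemes is that the rotation spreads each vector’s energy evenly across coordinates so a coarse scalar quantizer loses little, and the compact sketch of the residual corrects most of what remains.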
In benchmark tests on NVIDIA H100 GPUs, 4-bit TurboQuant computed attention scores 8 times faster than unquantized 32-bit keys, while compressing KV cache memory usage by at least 6 times.
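As a rough sanity check on those figures, bits per stored value scale the cache size linearly. The configuration below is illustrative (roughly a 70B-class model with grouped-query attention), not Google’s benchmark setup:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits):
    # Keys + values, every layer, every KV head, every token,
    # at `bits` bits per stored value.
    n_values = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return n_values * bits / 8

# Hypothetical configuration, not Google's benchmark setup.
cfg = dict(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=32_768)
for bits in (32, 16, 3):
    print(f"{bits:>2}-bit cache: {kv_cache_bytes(**cfg, bits=bits) / 2**30:6.2f} GiB")
```

With these numbers, 3-bit storage is about 5.3x smaller than a 16-bit cache and about 10.7x smaller than a 32-bit one; where the “at least 6x” figure lands depends on Google’s baseline and overhead accounting, which the paper should clarify.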
More importantly, the algorithm requires no training or fine-tuning, incurs minimal runtime overhead, and is suited to deployment in inference environments and large-scale vector search systems. The official paper is scheduled to be presented at the ICLR 2026 conference in April.
However, not everyone agrees with the “memory apocalypse” narrative.
Some analysts invoke the Jevons paradox: when technological advances lower the cost of using a resource, overall demand for it can actually rise because the resource becomes more accessible. Supporters of this view argue that if TurboQuant truly lowers the barrier to AI inference, it could accelerate AI model adoption, ultimately driving memory demand higher rather than lower.
Lynx Equity Strategies analysts put it bluntly in a report: “The method detailed by Google is unlikely to reduce demand for memory and flash storage over the next 3 to 5 years because supply remains extremely constrained.” The firm accordingly maintains its $700 price target for Micron.