Author: Xiao Jing
Silicon Valley is buzzing with a new term: Tokenmaxxing (maximizing token usage).
Inside Meta and OpenAI, engineers are competing on AI usage leaderboards. According to foreign media reports, one engineer consumed 210 billion tokens in a single week, roughly the text volume of 33 copies of Wikipedia. Some engineers run up monthly AI bills as high as $150,000.
An Ericsson engineer in Stockholm spends more on Claude than they earn in salary, but the company covers the bill. Token budgets are becoming a new perk for engineers, the way free snacks and free lunches once were.
Shopify CEO Tobi Lütke issued an internal memo as early as April 2025, declaring “AI usage is Shopify’s baseline expectation,” requiring all teams to prove AI cannot complete a task before hiring new personnel, and incorporating AI usage into performance evaluations. Meta later announced that starting in 2026, “AI-driven impact” would be officially included in all employees’ performance reviews.
When token consumption starts showing up in KPIs, it is no longer just a usage statistic; it has become a signal that steers organizational behavior.
Meanwhile, industry signals are equally intense. On March 16, Jensen Huang at NVIDIA GTC called tokens “the cornerstone of the AI era,” stating they will become “the most valuable commodity.” The next day, Alibaba announced the establishment of Alibaba Token Hub Business Group, directly led by CEO Wu Yongming, positioning it as “creating tokens, delivering tokens, applying tokens.”
Image: Jensen Huang’s GTC speech showing a chart of token costs versus revenue, dividing data centers into free, mid-tier, high-tier, and premium layers to allocate computing power, and projecting Vera Rubin chips bringing five times the revenue compared to Grace Blackwell.
A year ago, tokens were just a technical unit of interest to developers. Now they are the language chip companies use to define product value, the reason tech giants reorganize business units, a perk written into engineers’ offers, and a core KPI in their reviews.
However, the Tokenmaxxing leaderboard only tracks consumption, not how many effective tasks those tokens accomplish.
This is precisely the biggest blind spot in today’s token economy.
210 billion tokens sounds like an astonishing number. But understanding its true meaning requires abandoning one assumption: that tokens are standardized.
Image: Tokscale global token consumption leaderboard. Tokscale is an open-source tool that tracks and ranks token usage across tools such as Claude Code, Cursor, OpenCode, and Codex, and lets users submit their data to a global ranking.
Two years ago, the pricing of large models was relatively simple, usually based on input tokens and output tokens. But today, mainstream vendors have clearly layered their pricing systems. The same “token” can have completely different charges depending on call conditions.
For example, Anthropic’s Claude Opus 4.6 charges $5 per million input tokens and $25 per million output tokens. With prompt caching enabled, cache writes cost $6.25 per million tokens for a 5-minute TTL and $10 for a 1-hour TTL, while cache reads cost $0.50 per million tokens. The Batch API halves both input and output prices; restricting inference to the US adds 10% to token prices; and Fast Mode charges six times the standard rate for input and output.
In other words, with the same vendor, the same model, and the same billing unit of a “token,” the price can differ severalfold, or even more than tenfold, depending on caching, batching, regional inference, and speed tier.
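To see how quickly those modifiers compound, here is a minimal cost sketch using the Opus 4.6 list prices quoted above; the request sizes and cache-hit rate are assumptions for illustration, not measurements.

```python
# Illustrative only: effective cost of one Claude Opus 4.6 call under the
# list prices quoted above. Cache writes ($6.25 / $10 per million tokens)
# are left out; the request sizes below are assumptions.

INPUT = 5.00        # USD per million input tokens
OUTPUT = 25.00      # USD per million output tokens
CACHE_READ = 0.50   # USD per million cached input tokens

def request_cost(input_tok, output_tok, cached_tok=0,
                 batch=False, us_only=False, fast=False):
    """Cost in USD for one call; cached_tok input tokens are served from cache."""
    fresh = input_tok - cached_tok
    cost = (fresh * INPUT + cached_tok * CACHE_READ + output_tok * OUTPUT) / 1e6
    if batch:
        cost *= 0.5   # Batch API: input and output halved
    if us_only:
        cost *= 1.1   # US-only inference: +10%
    if fast:
        cost *= 6     # Fast Mode: six times the standard rate
    return cost

# The same 100k-in / 20k-out call, billed four different ways:
print(f"standard:       ${request_cost(100_000, 20_000):.2f}")                              # $1.00
print(f"batch:          ${request_cost(100_000, 20_000, batch=True):.2f}")                  # $0.50
print(f"90% cache hits: ${request_cost(100_000, 20_000, cached_tok=90_000):.2f}")           # $0.59
print(f"fast + US-only: ${request_cost(100_000, 20_000, us_only=True, fast=True):.2f}")     # $6.60
```

Between batch mode and Fast Mode with US-only routing, the same call swings from $0.50 to $6.60, a 13x gap, which is where the more-than-tenfold spread comes from.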
The real cost drivers are no longer just model invocation fees. OpenAI’s current pricing shows that web search costs vary by model: $10 per thousand queries for GPT-4.1, GPT-4o, etc., and $25 per thousand for GPT-5 and reasoning models.
File Search costs $2.50 per thousand queries, plus $0.10 per GB per day for vector storage, with the first GB free. Code containers are now billed separately: $0.03 for a 1 GB container, with higher prices for 4 GB, 16 GB, and 64 GB containers; from March 31, 2026, this switches to session-based billing, metered per container in 20-minute increments.
Beyond models, search, retrieval, storage, and execution environments—once considered “peripheral capabilities”—are now split into independent cost centers.
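For a rough sense of scale, here is a sketch that adds up those per-query and per-GB line items for a hypothetical agent-heavy team over a month; every volume figure below is an assumption, and only the unit prices come from the list above.

```python
# Illustrative monthly bill for a hypothetical agent workload, using the
# OpenAI per-query / per-GB figures quoted above. All volumes are made up.

web_search_per_1k  = 25.00   # GPT-5 / reasoning models, per 1,000 queries
file_search_per_1k = 2.50    # per 1,000 queries
vector_gb_day      = 0.10    # per GB per day (first GB free)
container_session  = 0.03    # 1 GB code container, per session

searches   = 40_000          # web searches per month (assumed)
retrievals = 200_000         # file-search queries per month (assumed)
vector_gb  = 50              # GB of vector storage held all month (assumed)
sessions   = 30_000          # code-container sessions per month (assumed)

bill = (searches / 1_000 * web_search_per_1k
        + retrievals / 1_000 * file_search_per_1k
        + max(vector_gb - 1, 0) * vector_gb_day * 30
        + sessions * container_session)

print(f"non-token monthly cost: ${bill:,.2f}")
# $1,000 web search + $500 file search + $147 storage + $900 containers = $2,547
```

At these made-up volumes, roughly $2,500 a month accrues before a single input or output token is billed.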
Google is following the same trend. The official Vertex AI pricing page shows that from February 11, 2026, Code Execution, Sessions, and Memory Bank in Agent Engine will be billed separately, metered in vCPU-hours and GiB-hours of memory.
Today, a discussion of “large model prices” can no longer focus only on input and output token costs. What has truly changed is the billing logic: vendors now sell a full stack of AI foundational capabilities covering compute, storage, search, tool invocation, and long-running execution.
Image: OpenAI pricing page screenshot showing multi-layered charges beyond tokens (Web Search, File Search, Containers, etc.)
If you only look at the headline prices of model APIs, tokens seem to be approaching bargain levels. Anthropic’s Opus dropped from $15 per million tokens to $5, a two-thirds reduction. DeepSeek V3.2 costs as little as $0.28 per million tokens, and Google Gemini 2.5 Flash Lite around $0.10.
Chinese models are priced even more aggressively. According to OpenRouter data, Chinese models’ token prices are roughly one-sixth to one-tenth of their overseas competitors’. Even after Tencent Cloud’s Hunyuan HY2.0 Instruct ended its public-beta subsidies and raised prices by over 460%, its input price of about $0.62 per million tokens is still lower than Anthropic’s cheapest model, Haiku 4.5, at $1, and less than one-fifth of Sonnet 4.6.
Image: Artificial Analysis maintains a real-time LLM ranking, showing huge price gaps between different models.
But the total cost of AI usage has not decreased accordingly. Three mechanisms are at play.
First, models have become smarter, but at the cost of becoming more “talkative.” Artificial Analysis reports that reasoning models use on average about 5.5 times as many output tokens as non-reasoning models. Both Anthropic and OpenAI bill extended-thinking tokens as output tokens; the deeper the thinking, the longer the bill. Unit prices have fallen, but the total token count needed to complete the same task has grown severalfold.
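A back-of-the-envelope calculation makes the effect concrete; the per-task token count below is an assumption, while the prices and the 5.5x figure are the ones cited in this article.

```python
# Back-of-the-envelope: the unit price fell to a third, but reasoning models
# emit ~5.5x the output tokens. The 40k-token task size is an assumption.

old_price = 15.0      # USD per million tokens (old list price cited above)
new_price = 5.0       # USD per million tokens (new list price)
inflation = 5.5       # reasoning vs. non-reasoning output, per Artificial Analysis

task_tokens = 40_000  # tokens a non-reasoning model spends on one task (assumed)

old_cost = task_tokens * old_price / 1e6               # $0.60 per task
new_cost = task_tokens * inflation * new_price / 1e6   # $1.10 per task

print(f"cheaper tokens, pricier task: ${old_cost:.2f} -> ${new_cost:.2f}")
```

The unit price drops to a third, yet under these assumptions the same task ends up costing nearly twice as much.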
Second, agents turn “one-time consumption” into “continuous consumption.” This is the core driver of Tokenmaxxing. Engineers aren’t manually burning tokens; their AI programming agents run 24/7, automatically splitting tasks, calling tools, and self-iterating. According to Alibaba Cloud, a single agent’s compute power consumption is 100 to 1,000 times that of traditional chatbots. China’s daily token consumption surpassed 30 trillion by mid-2025 and soared to 180 trillion by February 2026.
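The mechanism is structural: an agent resends its growing context on every turn, so input tokens scale roughly with the square of the turn count. The sketch below is illustrative, with assumed sizes rather than measured ones.

```python
# Illustrative sketch of why agent loops multiply token use: each turn resends
# the accumulated conversation plus tool output, so input grows with turn count.
# All sizes are assumptions, not measurements.

system_prompt   = 3_000   # tokens sent on every turn
per_turn_output = 1_500   # model response per turn
per_turn_tool   = 4_000   # tool results appended to context per turn

def agent_tokens(turns):
    total_in = total_out = 0
    context = system_prompt
    for _ in range(turns):
        total_in += context            # the whole context is resent each turn
        total_out += per_turn_output
        context += per_turn_output + per_turn_tool
    return total_in + total_out

print(agent_tokens(1))    # a chatbot-style single exchange: ~4,500 tokens
print(agent_tokens(60))   # a 60-turn agent run: ~10 million tokens, >2,000x more
```

A single chatbot-style exchange costs a few thousand tokens; a 60-turn agent run on the same assumptions costs about 10 million, a gap of more than three orders of magnitude.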
Third, the underlying cost of producing tokens is rising. On March 18, 2026, Alibaba Cloud and Baidu AI Cloud announced price hikes of up to 34% on AI compute and storage products. AWS raised machine-learning capacity costs by about 15% in January, and Google Cloud announced infrastructure fee increases starting in May.
An industry expert commented: “This round of cloud price adjustments is driven mainly by supply and demand and by cost factors. Going forward, prices will largely track cost trends across the supply chain.”
GPUs, parallel storage, high-speed networking, data-center power: all are getting more expensive even as model list prices fall. When Anthropic launched Opus 4.6, it stressed that the price remained unchanged, which means the vendor is absorbing the cost of the added capability.
In other words, the model is the engine, but the fuel and the tolls both keep getting more expensive.
These three mechanisms together have created a widening gap between token list prices and the actual costs of completing tasks.
Back to Tokenmaxxing. The leaderboard records token consumption, not output quality. Burning 33 Wikipedias’ worth of tokens in a week doesn’t mean delivering 33 Wikipedias’ worth of work.
Big companies write token consumption into KPIs or treat it as a perk. But is this a genuine leap in productivity, or just a performance of productivity?
This exposes a fundamental structural flaw in the token economy: the industry has yet to establish an effective metric linking token consumption to task completion. Tokens measure input, not output. Between an agent that spends 1 million tokens to finish a task and one that finishes the same task with 100,000, the leaderboard ranks the former higher.
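The missing metric doesn’t need to be sophisticated. Even something as crude as tasks completed per million tokens would flip the ranking; the sketch below uses invented names and numbers purely for illustration.

```python
# Hypothetical: ranking engineers by tokens burned vs. tasks completed per
# million tokens. The names and numbers are invented for illustration.

usage = [
    # (engineer, tokens used this week, tasks actually shipped)
    ("A", 210_000_000_000, 40),
    ("B",  15_000_000_000, 35),
    ("C",   2_000_000_000, 12),
]

by_tokens = sorted(usage, key=lambda r: r[1], reverse=True)
by_efficiency = sorted(usage, key=lambda r: r[2] / (r[1] / 1e6), reverse=True)

print("leaderboard by consumption:", [r[0] for r in by_tokens])       # A, B, C
print("leaderboard by efficiency: ", [r[0] for r in by_efficiency])   # C, B, A
```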
Shopify CEO Lütke’s memo notes that some colleagues are contributing “10 times the output previously thought impossible,” but he doesn’t specify how that output is measured.
A new kind of professional anxiety has emerged: if you don’t demonstrate AI productivity through high token consumption, you risk being seen as outdated. This mirrors the early 2000s rush to build websites and the 2010s obsession with apps: adopting technology becomes a signal, consumption becomes a proxy metric, and real value measurement is delayed.
But unlike before, this round’s costs are real: $150,000 monthly AI bills, 210 billion tokens burned in a single week, rising compute and storage costs underneath. Tokenmaxxing isn’t free. Once the costs are high enough, the line between “burning tokens” and “creating value with tokens” stops being a philosophical question and becomes a financial one.
Token prices will continue to fall—there’s no doubt about that.
The real question is who can turn tokens into completed tasks most efficiently. For every programmer, every company, and every user, the key number isn’t the price of a million tokens; it’s the value delivered by each completed task.
The gap between these two numbers is the biggest business opportunity—and the deepest cost trap—in the next phase of the “AI era measured by tokens.”