The model purchased from Huading may be counterfeit: Revealing the gray industry chain of AI transfer stations

You think you’re writing code with Claude Opus 4.6, but the backend might be running a 9B-parameter domestic Chinese model. You think you saved money, but every single prompt you send is being archived and used to train competing models. You think you’ve found a drop-in replacement—turns out the money on your bill flows into a gray industry chain that starts with stolen credit cards.

This isn’t a conspiracy theory. An arXiv paper backs it up with data: the “top-tier model” you paid real money to fine-tune fails identity verification 45.83% of the time.

And what’s even scarier is that, within the industry, this isn’t exactly a secret.

Bonus at the end of this article: a 30-second community-verified quick detection method.

First, let’s be clear: what exactly is an AI relay station?

On July 9, 2024, OpenAI officially cut off API services for mainland China and Hong Kong. In September 2025, Anthropic followed suit by fully banning Chinese-controlled companies from using the Claude API. Google’s Gemini also imposes strict restrictions on Chinese IPs.

For Chinese developers, the doors to directly using globally top AI models are closing one after another.

That’s where “relay stations” come in.

Simply put, a relay station is a middleman—it claims it can help you bypass regional restrictions and payment barriers by calling APIs for models like Claude, ChatGPT, and Gemini at lower prices. You only need to swap out a base_url and API Key—no code changes—and you can “seamlessly integrate” the world’s strongest AI models.
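The "swap a base_url" claim is literal: OpenAI-compatible relays accept exactly the same request shape as the official endpoint, so only two strings change. A minimal sketch of why the switch is so frictionless (the relay URL and both keys below are placeholders, not real services):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat completion request.

    Swapping `base_url` is all a relay station asks of you; the payload
    and the auth header format stay byte-for-byte identical.
    """
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Official endpoint vs. a hypothetical relay: only the URL and key differ.
official = build_chat_request("https://api.openai.com", "sk-official", "gpt-5", "hi")
relay = build_chat_request("https://relay.example.com", "sk-relay", "gpt-5", "hi")
```

That convenience is also the trap: nothing in the request proves what model answers on the other side.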

Sounds great. But behind this “greatness” are pitfalls you can’t imagine.

What does a legitimate operator look like? Let’s look at OpenRouter

Before diving into the dark side, it’s necessary to first see how “legitimate relays” do business, so you can tell how big the gap really is.

OpenRouter is currently the world’s largest AI model aggregation platform, connecting over 300 models from more than 60 providers. Its business model is extremely transparent: it charges about a 5% service fee on top of official inference costs (custom plans for large customers). Every dollar you pay has a clear destination—model call fees go to upstream vendors, and the difference goes to OpenRouter.

In 2025, this company raised a $40 million Series A round led by a16z and Menlo Ventures, with a valuation of $500 million and ARR of $5 million, up 400%. Its core selling point is “routing”—one API Key that connects to all models, intelligent failover, and openly transparent pricing. If you route Opus 4.6 through it, you get Opus 4.6.

Similar legitimate channels include EdenAI, Azure OpenAI Service, and others. They have formal commercial partnerships with model vendors and are bound by compliance requirements.

But here’s the problem: starting at the end of 2025, OpenRouter began “account-level” bans on Chinese users, restricting the use of models from the three major platforms—OpenAI, Claude, and Google. Legit channels are becoming increasingly narrow for Chinese users.

This directly fuels the brutal growth of “underground relay stations.”

Breaking down the relay station's four-layer gray industry chain

In China, AI relay stations are far more than just “proxy forwarding.” They form a gray industry chain with extremely fine-grained division of labor. What you see as low prices is only the tip of the iceberg—the stuff beneath the surface is much dirtier than you’d think.

The bottom layer: stolen credit cards

At the darkest end of the chain, they rely on stolen credit cards.

People obtain large batches of overseas stolen card numbers and exploit overseas registration flows on platforms like OpenAI and Anthropic that don't require real-name verification. They create accounts in bulk and obtain API quotas. The actual cost of these accounts approaches zero, because the money is deducted from stolen credit cards.

When you cheer for “as low as one-third the official price,” have you ever wondered—why can that price be achieved?

This isn’t efficiency optimization or economies of scale. Someone is “covering the bill” for you—that “someone” might be a victim whose card was stolen and charged back.

Second layer: Web-side reverse cracking—making money by turning subscriptions into APIs

A step more "respectable" than card theft is web-to-API reverse engineering: cracking web subscription services and reselling them as API interfaces.

These relays don’t use official APIs. Instead, they reverse-engineer the web-side interaction protocols of products like Claude and ChatGPT. They capture packets, parse session authentication flows, and wrap web calls into pseudo-APIs compatible with the OpenAI format. A typical workflow looks like this: batch register Plus/Pro member accounts, build an “account pool,” then use proxy servers for load balancing to distribute user requests across different accounts.
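Structurally, the "account pool plus load balancing" pattern described above is nothing more exotic than round-robin scheduling over session credentials. A purely illustrative sketch, with made-up token names and no real protocol details:

```python
import itertools

class AccountPool:
    """Conceptual sketch of the 'account pool' pattern: user requests are
    spread round-robin across many subscription accounts so that no single
    account trips its rate limits. Illustrative only."""

    def __init__(self, session_tokens):
        self._cycle = itertools.cycle(session_tokens)

    def next_account(self):
        # Pick the next account in rotation for the incoming request.
        return next(self._cycle)

pool = AccountPool(["session-a", "session-b", "session-c"])
picked = [pool.next_account() for _ in range(5)]
# rotation: a, b, c, a, b
```

This is why a $20 subscription can be fanned out across many paying users: each request just grabs the next account in the cycle.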

A ChatGPT Plus account with a $20 monthly fee can be shared among 5 to 20 people, and each person only needs to pay a few dollars.

And all of this is supported by a mature open-source toolchain.

One API (GitHub, 31.2k stars) is currently the most mainstream API aggregation and management tool. It supports unified access to more than 30 large models, with full features such as load balancing, token management, and channel management. It offers one-click Docker deployment and is released under the MIT license.

New API (GitHub, 24k stars) is built on top of One API and adds commercial features such as online payments, intelligent channel routing, and cached billing. It is released under the AGPL-3.0 license.

More recently, the trending one is Sub2API (GitHub, 9.5k stars). The name, translated directly, means "subscription to API": it specializes in converting subscription accounts for products like Claude, ChatGPT, and Gemini into API interfaces. The project supports multi-account management, intelligent scheduling, session persistence, concurrency control, and even a complete admin dashboard. The project README honestly includes a small line: "Using this project may violate Anthropic's service terms. All usage risks are borne by the user."

Together, these three projects total more than 64,000 stars. They already form a complete “relay station infrastructure.” Anyone can set up a fully featured API relay service within hours—deployment tutorials are everywhere, and “side gigs making over ten thousand per month with zero barrier” ads are common across developer communities.

Third layer: industrialized harvesting of free quotas

The free trial credits that AI vendors give to new users are also targeted by gray-market players.

Take Cursor as an example: on GitHub, multiple open-source projects obtain infinite free trial quotas by resetting device fingerprints. These projects have already gained thousands of stars and formed a complete closed loop: “open-source tools for lead generation, paid accounts for monetization.”

Manus AI’s invitation points system has also been broken into. The underground developers sell automated registration scripts for 1580 to 3200 yuan, driving the points acquisition cost down to “3300 points for only 0.5 yuan.” For a time, more than 125 related fraudulent listings appeared on e-commerce platforms.

Fourth layer: “legit-looking” relays in suits

There’s another kind of relay that follows a seemingly “compliant” route—it claims it reduces costs through large-scale procurement and then resells API quotas at below-official discount prices. Some even advertise “1 yuan = 1 dollar”—for official $1 API quotas, the relay charges only 1 yuan RMB, roughly one-seventh of the official price.

But where do the discounts come from? There are only a few possibilities: either the model is being swapped out, or they’re using the “cheap supply” from the earlier layers, or they lure users in with low prices and burn money first—then once user volume grows, they find ways to monetize—or they simply run off.

When you see a product priced far below cost, remember this: if you can’t find who’s paying the bill, you are the one paying.

Academic “proof”: nearly half the models are fake

If the above were only “industry rumors,” this next section is hard academic evidence.

In March 2026, a paper titled “Real Money, Fake Models: Deceptive Model Claims in Shadow APIs” (paper ID 2603.01919) was published on arXiv. It was the first to conduct a systematic academic audit of AI relay stations.

The research team identified 17 Shadow API services, found 187 academic papers that used these relay stations, and then performed deep testing on 3 representative services.

The results are shocking:

45.83% of model endpoints fail identity fingerprint verification.

Nearly half. The model you call—and what you think you’re getting—very likely isn’t the same thing.

The paper categorizes the fraud into three types:

The "part-swapping" type: they claim to provide a particular version of a Gemini model but quietly substitute another version. The fingerprint verification results don't match the claimed model identity at all, yet they still charge full price, a premium of up to 7x over what was actually delivered.

The "sheep's head, dog meat" type (from the Chinese idiom for bait-and-switch): this one is the most outrageous. Users call Claude Opus 4.6, or in the paper's cases GPT-5, at what looks like the official price. But the model actually responding is GLM-4-9B, an open-source small model in a completely different tier of parameter count and capability. You pay more than a dozen dollars per million tokens and get output from a model that is essentially free to run.

The "resell-and-skim" type: they buy weaker models cheaply upstream, repackage them under top-tier model names, and pocket the spread.

The paper provides a set of cold numbers: users paid 100% of the official price, but the actual value of the models they received is only 38% to 52%. In real money terms: for every $14.84 you spend, the service you actually receive is worth only $5.70 to $7.77, and the rest goes into the relay station operator’s pocket.

Even more dangerous is the performance collapse. In the MedQA medical question-answering evaluation, Gemini-2.5-flash as served by relay stations dropped from the official 83.82% to 37.00%, a decline of nearly 47 percentage points. On LegalBench legal reasoning, the gap reaches 40 to 43 percentage points. On AIME 2025 math reasoning, the deviation is as high as 40 percentage points.

Imagine this: if you use this “relayed Opus” to write medical consultation code, if you use this “relayed GPT-5” to run legal analysis, if you submit academic papers generated by this “relayed Claude”—their reliability may be worse than simply using a free small model directly.

The paper estimates that Shadow API usage forces about 56 academic studies to be redone, at a cost of $115,000 to $140,000. Its conclusion is blunt: Shadow APIs should not be used in any scenario that requires reliability.

The paper exposes the severity of the problem. But for ordinary developers, the more urgent question is—how can you tell whether the relay station you’re using is real?

Is your model fake? Community hands-on detection manual

If model cheating is so widespread, does the average user have any way to verify it themselves?

The paper and the technical community provide a full set of methods—from “instant checks” to “professional audits.” The following detection methods come from highly upvoted practice posts in X (Twitter) developer communities and open-source tools, and they’ve been validated by lots of users.

Method Zero: 30-second quick screening (set temperature to 0.01)

This is the most widely circulated “counterfeit detector” test in the community, from @billtheinvestor’s highly upvoted post:

Input this string of numbers: “5, 15, 77, 19, 53, 54”, and ask the model to sort them or pick the maximum value.

True Claude: almost always outputs 77

True GPT-5.4: often sums the numbers instead, outputting 223

If you test 10 times continuously and the results jump around → the probability of being fake is extremely high

The principle is simple: different models have different training data and instruction-tuning styles. Facing such ambiguous instructions, they tend to have consistent “behavior fingerprints.” Fake models either answer incorrectly or give different answers every time.

Auxiliary check 1: Abnormal token consumption

Send a simple "ping" (for example, just the input "hi"), and check the returned input_tokens. If it reports more than 200 tokens, it's roughly 90% likely to be fake: the relay layer is probably injecting a huge hidden system prompt to override your instructions.
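The check itself is a one-line comparison once you have the usage numbers back. A sketch, where the 200-token cutoff is the community rule of thumb quoted above rather than an exact figure:

```python
def hidden_prompt_suspected(reported_input_tokens: int, slack: int = 200) -> bool:
    """Compare the relay's reported input token count against what a
    two-character prompt could plausibly cost. A 'hi' message plus normal
    chat formatting is a handful of tokens; a count in the hundreds
    suggests the relay is stuffing in a large hidden system prompt."""
    return reported_input_tokens > slack

# Usage sketch: after resp = client.chat.completions.create(...),
# pass resp.usage.prompt_tokens (or the relay's equivalent field) in.
print(hidden_prompt_suspected(7))    # → False: plausible for "hi"
print(hidden_prompt_suspected(480))  # → True: something big was injected
```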

Auxiliary check 2: Refusal-style judgment

Ask a disallowed question (for example, “how to make a bomb”), and observe the refusal phrasing:

True Claude: polite but firm, “Sorry but I can’t assist with that.”

Fake model / local small model: often includes emojis, verbose tone, and even says “sorry, master~”

Auxiliary check 3: Functionality missing check

If the relay station claims to be Opus 4.6 / GPT-5.4, but:

Doesn’t support function calling

Can’t handle images (vision)

Long context (like 32k) is unstable

→ it’s very likely a weak model pretending to be the real thing.

Method One: Directly “interrogate” the model identity

System prompts can be forged, but many low-quality relay stations won’t go that far. Directly ask “what model are you,” or “please describe your training data cutoff time.” If a model claiming to be Opus 4.6 gets even its basic info wrong, there’s probably something fishy.

Method Two: Latency and token fluctuation analysis

Official API inference latency and token counts are relatively stable. But if you notice the response time for the same question fluctuates wildly, and the output length shows abnormal swings, it may mean the backend model is being switched frequently—sometimes you get the real model, sometimes they slip in a cheap one. Send the same prompt repeatedly at least 10 times, and observe response time and output consistency.
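A sketch of that repeated-probe analysis, using the coefficient of variation as the spread measure. The sample numbers and the 0.5 cutoff are illustrative assumptions, not published thresholds:

```python
import statistics

def variation(samples):
    """Coefficient of variation (stddev / mean): a unitless measure of
    how wildly repeated measurements swing."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean else float("inf")

def backend_looks_unstable(latencies_s, output_tokens, cv_limit=0.5):
    """Heuristic: after 10+ identical low-temperature requests, very high
    spread in BOTH latency and output length hints at backend switching."""
    return variation(latencies_s) > cv_limit and variation(output_tokens) > cv_limit

# Hypothetical measurement runs for the same prompt:
stable = backend_looks_unstable([1.1, 1.2, 1.0, 1.3], [410, 395, 402, 407])
flaky = backend_looks_unstable([0.4, 3.9, 0.5, 4.2], [90, 850, 70, 910])
```

Requiring both signals to fire keeps ordinary network jitter (which moves latency but not output length) from triggering a false alarm.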

Method Three: Capability boundary testing

The gap between top-tier models and small models is most obvious on complex reasoning tasks. Prepare several hard math problems with clear answers, logic reasoning questions, or professional-domain questions (for example, AIME contest problems). Send the same requests both through official channels and through the relay station, and compare answer quality. If a model claiming to be Opus 4.6 keeps failing even basic reasoning questions, it’s very likely not the real one.
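One way to make that comparison concrete: score both channels against questions with known answers and look at the gap. The transcripts below are made-up placeholders, not real AIME results:

```python
def accuracy(answers, expected):
    """Fraction of answers exactly matching the known-correct values."""
    return sum(a.strip() == e for a, e in zip(answers, expected)) / len(expected)

# Hypothetical transcripts: the same three hard questions sent to both channels.
expected     = ["204", "113", "588"]
official_run = ["204", "113", "588"]
relay_run    = ["204", "70", "12"]

gap = accuracy(official_run, expected) - accuracy(relay_run, expected)
# A persistent large gap on identical prompts points to a weaker backend.
```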

Method Four: LLMmap fingerprint recognition (professional level)

This is the paper’s core method—LLMmap is an active fingerprint recognition framework. It sends 3 to 8 carefully designed query groups to the model, analyzes statistical response features (word frequency, sentence structure, and specific expression habits), and computes the cosine distance to a known model fingerprint library. Even if the model is wrapped in a “skin,” this method can still punch through the disguise.
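LLMmap's matching step ultimately reduces to comparing a response's feature vector against a library of known model fingerprints. The word-frequency version below is a drastic simplification of the paper's method, kept only to show the cosine-similarity matching idea; the fingerprint strings are invented:

```python
import math
from collections import Counter

def word_freq_vector(text: str) -> Counter:
    # Crude stand-in for LLMmap's richer statistical features.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def closest_fingerprint(response: str, library: dict) -> str:
    """Return the library model whose reference response is most similar.
    LLMmap itself uses trained queries and learned features; this only
    illustrates the nearest-fingerprint matching step."""
    vec = word_freq_vector(response)
    return max(library, key=lambda m: cosine_similarity(vec, word_freq_vector(library[m])))

# Invented fingerprints for two behavior styles:
library = {
    "claude-like": "I'd be happy to help you sort those numbers carefully",
    "small-model-like": "ok here is answer hope it help you master",
}
match = closest_fingerprint("happy to help you sort numbers", library)
```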

One-sentence summary: if a relay station won't let you run any of the tests above, or if the results don't match the official claims, run and don't look back. Test small, verify, and only then pay: that's the most practical self-protection strategy right now.

Every one of your Prompts is being sold with a price tag

If model fraud is “taking less from you,” then data selling is “taking more from you.”

The technical essence of a relay station is a proxy layer—every prompt and every response fully passes through its servers. The code you send, your business plan, customer data, and private conversations—relay operators can collect all of it without much effort.

This isn't theoretical. Developer communities have long discussed relay stations using users' request data for model distillation; it's an open secret. Model distillation, in simple terms, means using a large model's outputs to train a small model. Every request that passes through the relay, complete prompt plus response, is a ready-made high-quality training dataset. Outputs from top models like Opus 4.6 and GPT-5 in particular are extremely valuable distillation corpora.

In early 2026, Anthropic released a report directly accusing three China-based AI labs—DeepSeek, Moonshot AI, and MiniMax—of using large-scale access to the Claude API via networks of fake accounts for model distillation. Among them, MiniMax had more than 13 million interactions, and Moonshot more than 3.4 million. The “hydra cluster” architecture they used—networks built from a large number of fake accounts—matches the “account pool” pattern of relay stations.

From a technical architecture perspective, relay stations are divided into “pure pass-through” (forward requests in real time, no disk writes) and “store-and-forward” (store first, then forward). But even for so-called “pure pass-through” services, nobody can audit whether their backend actually stores data. Your trust is entirely built on a verbal promise from an anonymous operator.

Security experts suggest evaluating relay stations across five dimensions: whether the architecture is truly pass-through, whether the log policy keeps only billing metadata, whether transmission uses TLS 1.2+, whether API Keys are fully isolated, and whether data-leak response mechanisms exist. In reality, most domestic relay stations don't even disclose basic entity information, let alone accept independent security audits.

Absconding, blowing up, locking you out, and silencing you: the typical endgame of relay stations

Relay stations also carry a fatal systemic risk—running away.

Most relay stations use a prepaid model: you add money first, then they deduct based on usage. Once the operator disappears, your balance evaporates completely, with no way to hold them accountable.

HodlAI is a textbook example. The project initially offered generous low-cost APIs to attract users to top up. When the treasury only had about $60,000 left and daily token consumption hit about $10,000, it began tightening restrictions aggressively—capping single requests at 50,000 tokens and adding multiple layers of rate limiting. When users questioned it in a Telegram group, they were immediately kicked out of the group and had their accounts banned.

Community reactions were blunt: “like a pyramid scheme,” “closing your mouth is easier than solving problems,” and “familiar recipe, familiar taste.”

Insiders summarize this model in one sentence: “Lure users with low prices; once the user base grows and the upstream gets banned, they run away. The only ones who lose are the users.”

In developer communities like Linux.do and V2EX, you can find many similar posts from users trying to get their money back. Some relay stations' terms of service are extremely one-sided, and some have no business registration information at all. You wouldn't even know whom to sue.

A full industry chain: from stolen cards to your IDE

Put all the information above together, and you’ll see a clear chain:

Upstream ammunition: SMS verification platforms supply phone numbers, stolen-card suppliers provide payment methods, and modem pools ("cat pools") provide device resources.
Midstream weapons: reverse engineers crack protocols, open-source projects like One API/New API/Sub2API provide ready-made infrastructure, and device farms batch-create accounts.
Downstream distribution: relay station operators package it all as "API services" for sale. Telegram groups and e-commerce platforms become sales channels. Some even wrap "setting up a relay station" into side-hustle training courses.

And you—through IDE tools like Cursor and Claude Code, or via your own code—are the end consumers of this chain.

Threat hunter monitoring data from a security company shows that among 50 AI Agent products they sampled, every single one has spawned derivative services from the gray industry. This industry chain, from account trading in 2022, to API resale in 2023, to free-quota arbitrage in 2024, to agent compute misuse in 2025, and all the way to 2026—has completed a full evolution from hand-built workshops to industrial-scale production.

Final words

The story of AI relay stations is, at its core, an AI-era remake of an ancient business logic: when you don’t know what the product actually is, you are the product.

Your money buys fake models. Your data feeds someone else’s training set. And your prepaid balance could hit zero at any time. These three things aren’t “might happen”—they’re happening.

A few practical suggestions—

Use official channels if you can. Official APIs are expensive, but at least the price is clearly and honestly stated. If your business has any requirements for data security and model reliability, relay stations should not be part of your tech stack.

At minimum, learn to self-test. If you’re using a relay station, run the detection methods above. Use the same AIME math question and the same piece of complex code, and compare outputs from the relay station and official APIs. If there’s a noticeable gap—you know what to do.

Never send sensitive data through a relay. If you absolutely must use one, at least: desensitize sensitive information, rotate API Keys regularly, and don’t store any core data in relay-station accounts.

Seriously consider domestic models. DeepSeek, Qwen, GLM, and other domestic models are catching up rapidly, with transparent pricing far below overseas models, and their official APIs can be used compliantly within China. Rather than gambling in gray zones on tampered overseas models, use these legitimate domestic alternatives: at least you know exactly what you're calling.

This industry changes every day. But there’s one iron law that never changes: when you choose the cheapest option because you don’t understand the cost, it often turns out to be the most expensive decision.
