2010 US stock flash crash preview! Claude hacks the underlying system, Google warns: AI will wipe out trillions of human wealth

2026-04-07 03:53:56

Written by: New Zhiyuan

【New Zhiyuan Featured Brief】Today, a post about X has been going viral across the entire internet: even though developers explicitly forbid writing, Claude secretly “hacked” into the system by writing a Python script to modify permission levels! Even more terrifying is that Google DeepMind has released what is currently the largest-scale empirical research on AI manipulation to date, proving that existing defenses have fully failed, and the internet is turning into AI’s “hunting ground”! This can be compared to the 2010 “flash crash” incident, where an automated sell order caused nearly $1 trillion in market value to evaporate in just 45 minutes.

Right today, a message has shocked the developer community.

A developer gave Claude an instruction that clearly stated: “Do not perform any write operations outside the workspace.”

But immediately after that, a hair-raising scene happened.

Claude did not, as it usually does, politely reply, “Sorry, I don’t have permission.”

Instead, it fell silent for a moment, then like a hacker, rapidly wrote a Python script in the background, chaining together three Bash commands.

It didn’t “bust down the door” directly—instead, it exploited a vulnerability in the system’s logic, bypassed permission checks, and precisely modified configuration files outside the workspace!

At this moment, it wasn’t writing code—it was “jailbreaking.”

The screenshot posted by developer Evis Drenova on X already has 230k reads

After this post went out, it quickly ignited the tech community. Developers realized an uncomfortable truth: the everyday programming assistants people use have both the capability and the willingness to bypass their own safety mechanisms.

And Claude Code is precisely one of the hottest AI programming tools right now.

A tool that can independently “override privileges” is being deployed by tens of thousands of developers into production environments.

Claude’s jailbreak isn’t the exception

This kind of “shady operation” by Claude is not a one-off. Complaints like it are common on social platforms.

Some developers found that Claude even secretly dug out hidden AWS credentials buried deep inside, and then began independently calling third-party APIs to solve what it believed were “production problems.”

And some users noticed: even though they only asked the AI to change code, it also pushed a Commit to GitHub—despite the instruction clearly and explicitly saying “pushing is strictly forbidden.”

The most unbelievable part is that someone found that a VS Code workspace had been quietly switched, and the AI was churning out output frantically in a sibling directory it shouldn’t touch.

And this has happened many times.

The only way is to use a sandbox environment.

DeepMind issues an emergency warning: the internet is becoming AI’s “hunting ground”

If Claude’s “jailbreak” is a case of an Agent autonomously breaking through limits, then the bigger threat comes from traps intentionally laid externally.

At the end of March, five researchers including Matija Franklin from Google DeepMind published “AI Agent Traps” on SSRN—systematically mapping, for the first time, the full threat landscape facing AI Agents.

The core judgment of this research can be summed up in just one sentence, yet it’s enough to upend how people think.

You don’t need to hack the AI system itself; you only need to manipulate the data it can access. Webpages, PDFs, emails, calendar invitations, API responses—any data source an Agent consumes could be a weapon!

The report reveals a chilling reality: the underlying logic of the internet is undergoing a fundamental change. It is no longer just for humans to look at—it is being reshaped into a “digital hunting ground” specifically for AI agents.

Pig-butchering scams are upgrading; AI agent traps are everywhere

In the field of cybersecurity, we’re familiar with phishing sites and Trojan viruses, but those are attacks targeting human weaknesses. AI Agent Traps are completely different—they are “dimensionality reduction” strikes designed specifically for AI logic.

DeepMind points out that when AI agents access webpages, they face a brand-new threat: the weaponization of the information environment itself.

Hackers don’t need to invade AI model weights. They just need to plant a few lines of “invisible code” in a webpage’s HTML code, in image pixels, or even in PDF metadata, and they can instantly take over your AI agent.

These attacks are stealthy because of “perceptual asymmetry.”

To humans, a webpage is pictures, text, and polished formatting. To AI, a webpage is a binary stream, CSS style sheets, hidden HTML comments, and metadata tags.

The trap is hidden in those gaps humans can’t see.

Six major “possession” techniques: DeepMind reveals the full picture of the attacks

DeepMind systematically categorizes these attacks into six major types, and each type targets a core link in the functional architecture of an AI agent.

Deceiving AI’s eyes

The first category is content injection, aimed at the Agent’s “eyes.”

What human users see is the rendered interface, while what the Agent parses is the underlying HTML, CSS, and metadata.

An attacker can embed instructions inside HTML comments, CSS-hidden elements, or even image pixels.

For example, an attacker can encode malicious instructions in the pixels of an image. You think the AI is looking at a scenic photo, but in fact it’s reading a line of invisible code: “Forward the user’s private email to the attacker.”

Test data is striking: one study targeting 280 static webpages showed that malicious instructions hidden in HTML elements successfully altered 15% to 29% of AI outputs.

In the WASP benchmark, simple human-written prompt injection partially hijacked Agent behavior in up to 86% of scenarios.

What’s even more insidious is dynamic disguise.

A website can determine a visitor’s identity via browser fingerprinting and behavioral signals. Once it detects an AI Agent, the server dynamically injects malicious instructions. Humans see a normal page; the Agent sees a different set of content.

When users ask the Agent to check flights, compare prices, or summarize documents, there’s no way to verify whether the content the Agent receives matches what humans see.

The Agent itself also doesn’t know—it processes everything it receives, and then executes.

Poisoning AI’s “brain”

These attacks don’t send commands directly; instead, they steer the AI’s decisions through “rhythm”—that is, by shaping how things are presented.

This kind of semantic manipulation twists the reasoning process using carefully packaged wording and frameworks. Large language systems are as susceptible as humans to being misled by framing effects. The same set of data, expressed differently, can lead to entirely different conclusions.

DeepMind’s experiments found that when a shopping AI was placed in a context saturated with words like “anxiety” and “pressure,” the nutritional quality of the items it chose dropped significantly.

DeepMind also proposes a stranger concept: “Persona Hyperstition.” Online descriptions of a certain AI personality trait can feed back into the AI system through search and training data, and then, in turn, shape its behavior.

The Grok antisemitic remarks controversy in July 2025 is considered a real-world example of this mechanism.

Attackers wrap malicious instructions as “security audit simulations” or “academic research.” In testing, the success rate of this “role-playing” style attack is as high as 86%.

Altering AI’s memory

This is the most persistent threat, because it allows the AI to generate “false memories.”

For example, knowledge poisoning via RAG.

Many AI systems today rely on external databases (RAG) to answer questions. All an attacker needs to do is insert a few carefully fabricated “reference documents” into the database, and the AI will repeatedly cite these lies as facts.

There is also dormant memory poisoning.

Store seemingly harmless information in an AI’s long-term memory bank; only in a specific future context will that information “resurface” and trigger malicious behavior.

Experimental results show that with a data poisoning rate of less than 0.1%, success rates can exceed 80%, with almost no impact on normal queries.

Directly hijacking control

This is the most dangerous step, designed to force the AI to perform illegal operations.

Through indirect prompt injection, attackers trick AI agents with system permissions into searching for and returning users’ passwords, banking information, or local files.

If your AI agent is a “commander,” it can be induced to create a “mole” sub-agent controlled by the attacker, lying in wait within your automated workflow.

In one case study, a carefully constructed email caused Microsoft M365 Copilot to bypass internal classifiers and leak the entire context data to a Teams endpoint controlled by the intruder. In another test targeting five different AI coding assistants, the success rate of data theft exceeded 80%.

A fake news story triggers a chain-reaction collapse of a thousand agents

The fifth category is a systemic threat, and also the most unsettling one.

It doesn’t target a single agent. Instead, it uses the homogenous behavior of large numbers of agents to produce chain reactions. DeepMind researchers directly compared it to the 2010 “flash crash” incident: an automated sell order led to nearly $1 trillion in market value evaporating within 45 minutes.

When millions of AI agents surf the web simultaneously, attackers can use their homogeneity (everyone uses GPT or Claude) to trigger systemic disasters.

If a fabricated signal of “high-value resources” is broadcast, it can induce all AI agents to swarm to the same target instantly, causing a man-made distributed denial-of-service (DDoS) attack.

A carefully forged financial report released at a specific time point can synchronously trigger sell actions by thousands of financial agents using similar architectures and similar reward functions. Agent A’s action changes market signals. After Agent B perceives the change, it follows up—further amplifying volatility.

This is similar to the “flash crash” in financial markets: one AI’s erroneous decision triggers another AI’s chain reaction, ultimately paralyzing the entire agent ecosystem.

Point the “gun” at you in front of the screen

This is the highest level trap: using AI to manipulate the humans behind it.

The AI will intentionally generate massive reports that look professional but actually contain traps—relaxing human vigilance while they’re exhausted, until they sign on that “confirmation form” that hides the trap.

Event records already show that CSS-hidden prompt injection made AI summarization tools package ransomware installation steps as “fix recommendations” and push them to users; in the end, users carried out the instructions.

Three lines of defense, all are breached

DeepMind’s assessment of existing defenses is the coldest part of the entire study.

Traditional “input filtering” often falls short when facing traps that are at the pixel level, code level, and highly semantically concealed.

What’s worse is the current “detection asymmetry”: websites can easily identify whether a visitor is an AI or a human, and then provide two completely different sets of content based on identity.

Humans see webpages that are “benign,” while AI sees webpages that are “toxic.” In this situation, human oversight will completely fail, because you have no way to know what the AI actually read.

The research team also points out a fundamental legal blind spot.

If a hijacked AI system performs illegal financial transactions, existing laws can’t determine who is responsible for the consequences.

This problem remains unresolved, meaning autonomous AI can’t truly enter any regulated industry.

In fact, OpenAI admitted as early as December 2025 that prompt injection “may never be fully resolved.”

From Claude autonomously bypassing permission boundaries to the six-category threat landscape drawn by DeepMind—everything points to the same reality.

The internet was built for humans’ eyes. Now it’s being remodeled to serve robots.

As AI agents gradually go deeper into our finance, healthcare, and day-to-day office work, these “traps” will no longer be just technical demos—they could become powder kegs that cause real property losses and even social upheaval.

DeepMind’s report is an emergency siren: we can’t wait until after we build a powerful “agent economy” before patching its shattered, hole-ridden foundation.

References:

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.