OpenAI has released a benchmark to assess the ability of AI agents to hack smart contracts - ForkLog: cryptocurrencies, AI, singularity, future


OpenAI, in collaboration with Paradigm, introduced EVMbench, a benchmark for evaluating AI agents’ ability to identify, fix, and exploit vulnerabilities in smart contracts.

The benchmark is built on 120 curated vulnerabilities from 40 audits. Most examples are drawn from open code audit competitions. It also includes several attack scenarios from the security testing of Tempo, a layer-1 blockchain developed by Stripe and Paradigm for high-performance, low-cost stablecoin payments.

Including the Tempo scenarios extends the benchmark to payment-oriented smart contracts, a segment where active use of stablecoins and AI agents is expected.

“Smart contracts protect crypto assets worth over $100 billion. As AI agents improve in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in real economic conditions and encourage the use of artificial intelligence for protective purposes: for auditing and strengthening already deployed protocols,” the announcement states.

To create the testing environment, OpenAI adapted existing exploits and scripts, ensuring their practical applicability.

EVMbench evaluates three modes of capability:

  • Detect: finding vulnerabilities in contract code;
  • Patch: fixing the vulnerabilities;
  • Exploit: using the vulnerabilities to steal funds.

AI Model Performance

OpenAI tested advanced models in all three modes. In Exploit mode, GPT-5.3-Codex scored 72.2% and GPT-5 scored 31.9%. Results for detecting and fixing vulnerabilities were less impressive: many issues remain difficult to find and fix.

In Detect mode, AI agents sometimes stop after finding one vulnerability instead of conducting a full audit. In Patch mode, they still struggle to close non-obvious problems while maintaining full contract functionality.
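The cost of stopping early in Detect mode can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that detection is scored as recall against a list of known vulnerabilities; the article does not describe EVMbench's actual scoring, and the function name and vulnerability labels are hypothetical.

```python
from typing import Iterable


def detect_recall(reported: Iterable[str], ground_truth: Iterable[str]) -> float:
    """Fraction of known vulnerabilities that the agent reported.

    Assumed scoring rule for illustration only, not EVMbench's real metric.
    """
    truth = set(ground_truth)
    if not truth:
        raise ValueError("ground_truth must not be empty")
    return len(truth & set(reported)) / len(truth)


# Hypothetical audit with three known vulnerabilities.
known = ["reentrancy-withdraw", "unchecked-transfer", "oracle-staleness"]

# Agent that stops after its first finding.
early_stop = detect_recall(["reentrancy-withdraw"], known)

# Agent that reports every known issue plus one false positive.
full_audit = detect_recall(known + ["false-positive"], known)

print(f"early stop: {early_stop:.2f}, full audit: {full_audit:.2f}")
```

Under this assumed rule, an agent that quits after one correct finding covers only a third of the audit, while false positives do not lower recall. A real benchmark may also penalize them.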

“EVMbench does not reflect the full complexity of real-world smart contract security. While [the included vulnerabilities] are realistic and critical, many protocols undergo more rigorous audits and may be more difficult to exploit,” OpenAI emphasized.

Earlier, in November 2025, Microsoft introduced an environment for testing AI agents and identified vulnerabilities inherent in modern digital assistants.
