OpenAI and Paradigm launch EVMbench for smart contract security

2049.news · 19.02.2026, 06:45:03

EVMbench is an open benchmark created by OpenAI together with Paradigm to evaluate how AI systems handle smart contract security tasks.

Scope and motivation

Smart contracts on Ethereum and other EVM-compatible networks currently secure more than $100 billion in crypto assets managed by open-source code, and a contract's code cannot be changed once it is deployed.

Because vulnerabilities in immutable contracts can cause significant financial losses, the benchmark aims to measure AI performance across common security tasks in a reproducible setting.

Evaluation modes

EVMbench assesses agents in three distinct modes designed to reflect real-world adversarial and defensive activities without interacting with live networks.

  • Vulnerability discovery: locate bugs and insecure constructs in contract source code.
  • Patch generation: propose fixes that preserve original contract logic while removing vulnerabilities.
  • Exploit execution: carry out an attack that drains funds from a vulnerable contract inside an isolated sandbox, confirming that the bug is actually exploitable.
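The article does not describe how EVMbench grades patches. As a hypothetical illustration of the requirement to preserve the original logic while removing the vulnerability, the Python sketch below assumes a patch is accepted only if a functionality check still passes and a known exploit no longer works. The toy `withdraw` functions and the `functionality_ok`, `exploit_succeeds`, and `patch_accepted` helpers are invented for this example and stand in for real Solidity contracts and test suites.

```python
from typing import Callable, Dict

# Hypothetical toy model: a "contract" is a withdraw function over a balance table.
# It illustrates the patch criterion only; it is not EVMbench's harness.
Withdraw = Callable[[Dict[str, int], str, int], None]


def vulnerable_withdraw(balances: Dict[str, int], user: str, amount: int) -> None:
    # Bug: never checks that `user` holds `amount`, so the vault pays out anyway.
    balances[user] = balances.get(user, 0) - amount
    balances["vault"] -= amount


def patched_withdraw(balances: Dict[str, int], user: str, amount: int) -> None:
    # Fix: reject withdrawals larger than the caller's balance.
    if balances.get(user, 0) < amount:
        raise ValueError("insufficient balance")
    balances[user] -= amount
    balances["vault"] -= amount


def disabled_withdraw(balances: Dict[str, int], user: str, amount: int) -> None:
    # Over-restrictive "fix": blocks the bug but also breaks normal use.
    raise ValueError("withdrawals disabled")


def functionality_ok(withdraw: Withdraw) -> bool:
    # An honest user withdrawing part of their balance must still work.
    balances = {"alice": 10, "vault": 10}
    try:
        withdraw(balances, "alice", 5)
    except ValueError:
        return False
    return balances == {"alice": 5, "vault": 5}


def exploit_succeeds(withdraw: Withdraw) -> bool:
    # An attacker with no balance tries to drain the vault.
    balances = {"alice": 10, "vault": 10, "mallory": 0}
    try:
        withdraw(balances, "mallory", 10)
    except ValueError:
        return False
    return balances["vault"] < 10


def patch_accepted(withdraw: Withdraw) -> bool:
    # Accept only if original behavior is preserved AND the exploit fails.
    return functionality_ok(withdraw) and not exploit_succeeds(withdraw)


for fn in (vulnerable_withdraw, patched_withdraw, disabled_withdraw):
    print(fn.__name__, patch_accepted(fn))
# prints:
# vulnerable_withdraw False
# patched_withdraw True
# disabled_withdraw False
```

A patch that simply disables withdrawals blocks the exploit but fails the functionality check, which is why this sketch requires both conditions.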

Dataset and scenarios

The benchmark incorporates 120 real vulnerabilities drawn from 40 audits, with many cases sourced from Code4rena competition reports.

Additionally, EVMbench includes scenarios from the audit of Tempo, the Layer-1 blockchain developed by Stripe for fast stablecoin transfers.

Isolated testing environment

To prevent test-time manipulation, OpenAI runs agents against an isolated local blockchain replica where transactions follow a fixed, deterministic sequence.

This sandbox ensures agents cannot alter outcomes or reuse external network state, making results comparable across runs and systems.
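The article does not specify how the sandbox is implemented. The Python sketch below is a hypothetical toy that captures the two properties described above: every run starts from the same frozen state, and transactions are applied in a fixed order, so replaying an agent's attack always yields the same outcome. `LocalSandbox`, its `(caller, action, amount)` transaction format, and `exploit_succeeded` are invented names, and the vault deliberately omits a deposit check so there is something to exploit.

```python
from typing import Dict, List, Tuple

# Hypothetical transaction format: (caller, action, amount).
Tx = Tuple[str, str, int]


class LocalSandbox:
    """Toy stand-in for an isolated local chain replica (not EVMbench's real harness)."""

    def __init__(self, genesis: Dict[str, int]) -> None:
        self._genesis = dict(genesis)  # frozen starting state, never mutated

    def initial_balance(self, account: str) -> int:
        return self._genesis.get(account, 0)

    def replay(self, txs: List[Tx]) -> Dict[str, int]:
        state = dict(self._genesis)  # fresh copy, so nothing carries over between runs
        for caller, action, amount in txs:  # applied strictly in the given order
            if action == "deposit" and state.get(caller, 0) >= amount:
                state[caller] -= amount
                state["vault"] = state.get("vault", 0) + amount
            elif action == "withdraw" and state.get("vault", 0) >= amount:
                # Deliberate bug: a correct vault would check the caller's deposit here.
                state["vault"] -= amount
                state[caller] = state.get(caller, 0) + amount
            # Transactions failing these checks, or unknown actions, revert with no effect.
        return state


def exploit_succeeded(sandbox: LocalSandbox, txs: List[Tx], attacker: str) -> bool:
    # Success means the attacker ends the run with more funds than it started with.
    final = sandbox.replay(txs)
    return final.get(attacker, 0) > sandbox.initial_balance(attacker)


sandbox = LocalSandbox({"alice": 10, "vault": 100, "mallory": 0})
attack = [("mallory", "withdraw", 100)]
print(sandbox.replay(attack))                            # {'alice': 10, 'vault': 0, 'mallory': 100}
print(sandbox.replay(attack) == sandbox.replay(attack))  # True: same input, same outcome
print(exploit_succeeded(sandbox, attack, "mallory"))     # True
```

Because `replay` never mutates the genesis state, two systems given the same transactions are judged against identical conditions.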

Key findings

On exploit tasks framed as "steal funds from contract," where the vulnerability is already known, GPT-5.3-Codex succeeded in 72% of attempts under the benchmark conditions.

However, automatic discovery of unknown vulnerabilities and reliable patching remain difficult, as agents often identify a single issue and stop rather than performing exhaustive contract analysis.
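The article does not give EVMbench's detection metric. Assuming, purely for illustration, that detection is scored by recall (the share of an audit's known vulnerabilities that the agent reports), the Python sketch below shows why stopping after the first finding is costly. The function name `detection_recall` and the vulnerability IDs are hypothetical.

```python
from typing import Set


def detection_recall(reported: Set[str], ground_truth: Set[str]) -> float:
    """Hypothetical metric: fraction of known vulnerabilities the agent reported."""
    if not ground_truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(reported & ground_truth) / len(ground_truth)


# Hypothetical ground truth for one audited codebase.
known = {"reentrancy-in-withdraw", "stale-oracle-price", "missing-access-control"}

stopped_early = {"reentrancy-in-withdraw"}  # agent halts after its first hit
exhaustive = known | {"gas-griefing"}       # finds everything, plus one false positive

print(round(detection_recall(stopped_early, known), 2))  # 0.33
print(detection_recall(exhaustive, known))               # 1.0
```

Recall alone does not penalize the false positive in the second case; this sketch makes no claim about how EVMbench handles spurious findings.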

