AI-generated smart contracts: Can LLMs write secure code?

Dmitry Volkov · 01.01.2026, 15:52:00

Author: Dmitry Volkov | Smart Contract Security Auditor | Partner at CertiK

Last week, a developer asked me to audit a DeFi protocol. The core contracts were written by Claude. Not with Claude's help — by Claude, with minimal human review. The developer was proud of shipping in two weeks instead of two months.

I found 7 critical vulnerabilities. One would have allowed draining the entire protocol. The developer had asked the AI to write secure code. The AI confidently produced code that looked secure but wasn't.

This is becoming my daily reality. AI-generated smart contracts are everywhere. And we're not ready for the consequences.

The appeal is obvious

Let me be fair to the developers. The productivity gains are real.

Writing smart contracts is tedious. Boilerplate code, standard patterns, repetitive structures. An LLM can generate a basic ERC-20 token in seconds. A lending protocol skeleton in minutes. What took weeks of coding now takes hours of prompting.

The code often looks good. Modern LLMs have ingested millions of Solidity examples. They know the patterns, the conventions, the best practices. The generated code compiles, passes basic tests, and follows readable style.

For prototyping and learning, this is genuinely valuable. Explore ideas fast. Understand patterns. Iterate quickly. AI coding assistants are legitimate tools.

The problem starts when prototypes become production.

What AI gets wrong

After auditing dozens of AI-generated contracts, I've catalogued the failure patterns.

Reentrancy remains a killer. LLMs know about reentrancy — they'll even add comments warning about it. But they frequently place state updates after external calls anyway. The code looks like it follows the checks-effects-interactions pattern. It doesn't actually follow it.
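
Here is a minimal sketch of that pattern. The names are illustrative, not code from any real audit:

    // Solidity ^0.8: a withdraw function whose comments claim
    // checks-effects-interactions, but the external call happens before the state update.
    pragma solidity ^0.8.20;

    contract Vault {
        mapping(address => uint256) public balances;

        receive() external payable {
            balances[msg.sender] += msg.value;
        }

        // VULNERABLE: "checks-effects-interactions" in name only.
        function withdraw(uint256 amount) external {
            require(balances[msg.sender] >= amount, "insufficient balance"); // check
            (bool ok, ) = payable(msg.sender).call{value: amount}("");       // interaction first
            require(ok, "transfer failed");
            balances[msg.sender] -= amount;                                  // effect last: reentrancy window
        }

        // FIXED: update state before making the external call.
        function withdrawSafe(uint256 amount) external {
            require(balances[msg.sender] >= amount, "insufficient balance");
            balances[msg.sender] -= amount;
            (bool ok, ) = payable(msg.sender).call{value: amount}("");
            require(ok, "transfer failed");
        }
    }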

Access control is inconsistent. AI might correctly protect one admin function and forget another. Or implement modifiers that look right but have subtle logic errors. The randomness is what makes it dangerous — you can't predict which functions are vulnerable.
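
A typical shape, again with hypothetical names: one admin function is protected, another is forgotten, and the modifier itself has a subtle flaw.

    pragma solidity ^0.8.20;

    contract FeeManager {
        address public owner;
        address public treasury;
        uint256 public feeBps;

        constructor() {
            owner = msg.sender;
        }

        // Subtle logic error: tx.origin instead of msg.sender, so any contract the
        // owner interacts with can reach admin functions on the owner's behalf.
        modifier onlyOwner() {
            require(tx.origin == owner, "not owner");
            _;
        }

        function setFee(uint256 bps) external onlyOwner {   // protected
            feeBps = bps;
        }

        function setTreasury(address t) external {          // forgotten: anyone can call
            treasury = t;
        }
    }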

Integer handling is treacherous. Pre-Solidity 0.8 overflow issues, precision loss in calculations, rounding errors that accumulate — LLMs make these mistakes confidently. They've seen both correct and incorrect patterns in training data; they can't reliably distinguish which is which.
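
The classic version is dividing before multiplying. A toy example:

    pragma solidity ^0.8.20;

    contract RewardMath {
        // Divide-then-multiply truncates toward zero: 1 / 1000 == 0,
        // so small stakers receive nothing.
        function shareWrong(uint256 stake, uint256 totalStake, uint256 pool)
            external pure returns (uint256)
        {
            return stake / totalStake * pool;
        }

        // Multiply first, divide last; truncation stays within one unit.
        // (In 0.8 the multiplication reverts on overflow for extreme values.)
        function shareBetter(uint256 stake, uint256 totalStake, uint256 pool)
            external pure returns (uint256)
        {
            return stake * pool / totalStake;
        }
    }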

Business logic errors are the worst. The AI doesn't understand what your protocol is supposed to do. It generates code that compiles and looks reasonable but implements something subtly different from your intent. These bugs are invisible without deep understanding of the requirements.

The confidence problem

Here's what scares me most: AI-generated code is confident code.

When a junior developer writes insecure code, it often looks insecure. Messy, uncertain, obviously needs review. You know to check it carefully.

AI-generated code looks professional. Clean formatting, proper comments, standard patterns. It projects competence. Reviewers let their guard down. "This looks like it was written by an experienced developer."

But the AI has no actual understanding of security. It's pattern-matching, not reasoning. It doesn't know why certain patterns are dangerous. It can't think through attack vectors. It produces code that superficially resembles secure code without the underlying security.

The result: vulnerabilities hidden behind a veneer of professionalism.

Real examples from my audits

Let me share some sanitized examples from recent work.

A yield aggregator asked Claude to implement flash loan protection. The AI added a modifier that checked if the transaction originated from a contract. Looks reasonable — flash loans come from contracts. But it blocked legitimate smart wallet users and did nothing against flash loans that route through EOAs. The protection was useless but the developer thought the problem was solved.
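
The modifier looked roughly like this (reconstructed and simplified, names changed):

    pragma solidity ^0.8.20;

    contract YieldVault {
        // The generated "flash loan protection": reject any caller that is a contract.
        // It locks out legitimate smart-wallet users and does nothing against flash
        // loans that route through EOAs.
        modifier noContracts() {
            require(msg.sender == tx.origin, "contracts not allowed");
            _;
        }

        function deposit() external payable noContracts {
            // ...
        }
    }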

A token contract needed a pause mechanism. GPT-4 implemented it perfectly for transfers. But the burn function wasn't protected. During the audit, I demonstrated that an attacker could burn tokens even when paused, manipulating supply during emergencies.
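
The shape of the bug, in miniature and with illustrative names:

    pragma solidity ^0.8.20;

    contract PausableToken {
        mapping(address => uint256) public balanceOf;
        uint256 public totalSupply;
        bool public paused;
        address public owner;

        constructor() {
            owner = msg.sender;
        }

        modifier whenNotPaused() {
            require(!paused, "paused");
            _;
        }

        function setPaused(bool p) external {
            require(msg.sender == owner, "not owner");
            paused = p;
        }

        function transfer(address to, uint256 amount) external whenNotPaused returns (bool) {
            balanceOf[msg.sender] -= amount;
            balanceOf[to] += amount;
            return true;
        }

        // Missing whenNotPaused: supply can still be changed mid-emergency.
        function burn(uint256 amount) external {
            balanceOf[msg.sender] -= amount;
            totalSupply -= amount;
        }
    }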

A governance contract required time-locked execution. The AI implemented the timelock correctly, then added a "fast-track for emergencies" function — with no access control. Anyone could bypass the timelock by calling it an emergency.
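
A stripped-down illustration of that structure (not the client's actual code):

    pragma solidity ^0.8.20;

    contract TimelockedGovernor {
        uint256 public constant DELAY = 2 days;
        mapping(bytes32 => uint256) public queuedAt;
        address public admin;

        constructor() {
            admin = msg.sender;
        }

        function queue(bytes32 proposalId) external {
            require(msg.sender == admin, "not admin");
            queuedAt[proposalId] = block.timestamp;
        }

        function execute(bytes32 proposalId) external {
            require(msg.sender == admin, "not admin");
            require(
                queuedAt[proposalId] != 0 && block.timestamp >= queuedAt[proposalId] + DELAY,
                "timelock not elapsed"
            );
            _run(proposalId);
        }

        // "Fast-track for emergencies": no access control, so anyone can skip the delay.
        function executeEmergency(bytes32 proposalId) external {
            _run(proposalId);
        }

        function _run(bytes32 proposalId) internal {
            delete queuedAt[proposalId];
            // ... perform the queued action
        }
    }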

In each case, the developer had explicitly asked for security. The AI had explicitly acknowledged the requirements. The code still failed.

The audit problem

Professional audits are supposed to catch these issues. But AI-generated code is changing the audit landscape too.

Volume is exploding. More developers shipping more code faster. Audit firms are backlogged. The pressure to approve quickly increases.

AI-generated code is harder to audit. It lacks the human reasoning trail. Why was this design chosen? What edge cases were considered? With human-written code, you can often infer intent from structure. AI code is a black box.

Some auditors are using AI to audit AI code. This is terrifying. If the AI couldn't write secure code, why would it reliably identify insecure code? The same pattern-matching limitations apply.

Our firm has had to extend timelines for AI-generated contracts. We assume nothing. Verify everything. It takes longer, not less time, than auditing human-written code.

What actually helps

I'm not saying never use AI for smart contracts. I'm saying use it correctly.

Use AI for scaffolding, not security-critical logic. Generate the boilerplate. Write the core logic yourself. Let AI handle the boring parts while humans handle the dangerous parts.

Review AI code like you'd review untrusted code — because it is untrusted code. Don't let professional appearance lower your standards. Assume every function has bugs until you prove otherwise.

Formal verification becomes more important, not less. Mathematical proofs of correctness don't care whether the code was human or AI-generated. Invest in verification tools and techniques.
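
Even the built-in tooling gets you started. Solidity's SMTChecker will attempt to prove assert-style properties at compile time; here is a toy target, just to show the shape:

    // A toy verification target for Solidity's built-in SMTChecker
    // (e.g. solc --model-checker-engine chc --model-checker-targets assert).
    pragma solidity ^0.8.20;

    contract SupplyInvariant {
        uint256 public minted;
        uint256 public burned;

        function mint(uint256 amount) external {
            minted += amount;
        }

        function burn(uint256 amount) external {
            require(amount <= minted - burned, "exceeds outstanding supply");
            burned += amount;
        }

        // Property to prove: burned never exceeds minted, for any call sequence.
        function checkInvariant() external view {
            assert(burned <= minted);
        }
    }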

Test adversarially. Don't just test happy paths. Actively try to break the code. Assume attackers will find whatever you miss. Fuzz testing, invariant testing, simulation of attack scenarios.
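
For the governance sketch above, an adversarial fuzz test in a Foundry-style harness (assuming forge-std and the hypothetical TimelockedGovernor from earlier) would have flagged the problem immediately, because it fails as soon as an arbitrary caller gets through:

    pragma solidity ^0.8.20;

    import "forge-std/Test.sol";

    // Assumes the TimelockedGovernor sketch above is in scope.
    contract TimelockedGovernorTest is Test {
        TimelockedGovernor governor;

        function setUp() public {
            governor = new TimelockedGovernor();
        }

        // Adversarial property: no one except the admin may use the emergency path.
        // Against the sketch above this test fails, which is the point: it surfaces
        // the missing access control.
        function testFuzz_EmergencyPathRequiresAdmin(address caller) public {
            vm.assume(caller != governor.admin());
            vm.prank(caller);
            vm.expectRevert();
            governor.executeEmergency(bytes32("prop-1"));
        }
    }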

Professional audits remain essential. Not AI audits — human audits by experienced security researchers who understand both the technology and the threat landscape. Budget for this. Don't skip it because "the AI wrote secure code."

Where this is heading

AI coding will improve. Models will get better at security patterns. Fine-tuning on audited codebases will help. Better prompting techniques will emerge.

But the fundamental problem remains: AI generates code without understanding consequences. Until we have AI that can reason about security — not just pattern-match to security-looking code — human oversight stays essential.

My prediction for 2026: at least one major exploit — $50 million+ — will be traced directly to AI-generated code that passed human review. The incident will force the industry to take this seriously.

I hope I'm wrong. But I keep finding critical vulnerabilities in AI-generated contracts, and those contracts keep going to production. The math is not in our favor.

Dmitry Volkov leads smart contract security audits at CertiK. He has identified vulnerabilities in protocols with combined TVL exceeding $2 billion and previously worked on compiler security at Google.

#Crypto

