Formal Verification of EVM Bytecode in Dafny
The Ethereum network provides a decentralised execution environment powered by the Ethereum Virtual Machine (EVM). The EVM executes smart contracts, the programs that encode the business logic of decentralised applications (dApps) running on the network. Smart contracts on Ethereum can be written in high-level languages like Solidity or Vyper, but must be compiled to EVM bytecode to be executed by the EVM.
A single vulnerability in a smart contract can result in huge financial losses. Most security efforts focus on testing and auditing high-level source (Solidity or Vyper) code before deployment. However, because Ethereum executes EVM bytecode, not source code, verification at the bytecode level is essential for several reasons.
EVM-Specific Features
The EVM has execution-level behaviours that are not visible in high-level Solidity or Vyper code, such as:
- Stack overflows/underflows. The EVM is a stack machine with a fixed size stack. Exceeding the maximum stack size (1024) results in an exception reverting execution without effect on the state of the system. Stack underflows may indicate issues in the compiler or bytecode generation, where an operation is executed without the required number of operands on the stack. In contrast, stack overflows are typically the result of executing deeply nested or recursive contract code that pushes too many values without sufficient pops. Both cases must be detected and handled by the EVM to ensure safe and predictable execution.
- Gas consumption. EVM execution is metered in gas units, that track resource usage. A program must declare before execution its maximum gas budget. If a program runs out of gas, it triggers an out-of-gas exception, reverting execution without effect on the state of the system. Analyzing gas usage at the bytecode level helps prevent unexpected failures due to insufficient gas.
Compiler Bugs
Compilers may introduce errors during compilation, leading to incorrect or vulnerable bytecode. Recent examples include:
- 2023: A Vyper compiler reentrancy bug was exploited, resulting in $26M stolen The Vyper Compiler Saga: Unraveling the Reentrancy Bug that Shook DeFi.
- 2022: A Solidity memory optimization issue mistakenly removed operations affecting computation Overly Optimistic Optimizer – Certora Bug Disclosure.
- 2021: A Solidity dynamic array bug caused potential memory corruption Bug Disclosure – Solidity Code Generation Bug Can Cause Memory Corruption.
The bytecode consists of low-level instructions, each representing a specific operation, identified by an opcode. The EVM supports around 150 opcodes, covering a wide range of functionality, including arithmetic, memory and storage access, control flow.
Objective
At the beginning of 2022, the Trustworthy Smart Contracts team at ConsenSys began developing a formal and executable semantics of the Ethereum Virtual Machine (EVM) [1].
Our objective was to provide a rigorous framework for reasoning about EVM bytecode. Achieving this requires two key components: a formal semantics of the EVM and a verification engine capable of expressing and proving properties over that semantics. For this purpose, we chose Dafny, a verification-aware programming language. Our choice of Dafny was motivated by two main factors:
- Several team members already had prior experience with Dafny, and
- Dafny offers a more developer-friendly experience compared to traditional theorem provers; its syntax is readable and approachable for developers, making the semantics easier to understand, maintain, and evolve.
In this post, I’ll introduce the Dafny-based EVM semantics, and explain how it can be used to formally verify properties of EVM bytecode.
Related work
In 2022, the most popular reference specification of the EVM was the Yellow paper, the most popular EVM implementation go-ethereum (geth), and there was a formal specification KEVM using the K-framework. We used all these resources to understand and formally define the semantics of the EVM in Dafny. We also ran the official EVM test suites on the Dafny-EVM to validate that our semantics closely matches the behavior of real-world EVM implementations.
The Dafny-EVM is the first formal specification of the EVM that can be reasoned about and executed.
The Dafny-EVM
We have defined a formal and executable semantics, the Dafny-EVM, which is publicly available in the following repository evm-dafny. The main features of the Dafny-EVM are presented in this section.
EVM states
The EVM is a stack-based machine with volatile memory used during computation, and permanent storage, which persists across transactions.
Permanent storage defines the state of the EVM and is updated by executing transactions. Transactions are sequences of bytecode instructions – opcodes and their arguments – and the EVM is a virtual machine that interprets the bytecode, executing one instruction at a time. If the execution of the instruction succeeds from a given state, the EVM reaches an EXECUTING
state, and if an exception occurs (e.g. stack overflow) it reaches an ERROR
state and the EVM halts.
The execution of EVM bytecode relies not only on opcodes, which define the functional behaviour, but also on a metering mechanism to prevent very long and possibly non-terminating executions. This is enforced by limiting the resource usage (memory, operations) of each program. Resource usage is measured in gas units. Every computation has a non-zero (gas) cost. When a user submits a transaction, they also specify a maximum gas budget. If the execution of the transaction exceeds this budget, it is halted, and the state reverted to the state before executing the transaction, ensuring that the network remains responsive and can execute other transactions.
We have specified the state of the EVM as the datatype State
:
module EvmState {
datatype State =
EXECUTING(evm: T)
| ERROR(error: Error, gas: nat := 0, data: Array<u8> := [])
datatype T = EVM(
fork: Fork, // version of the EVM
context: Context.T, // sender of the transaction etc
precompiled: Precompiled.T, // native functions
world: WorldState.T, // global (permanent) storage
stack: EvmStack, // stack of max size 1024
memory: Memory.T, // volatile memory
transient: TransientStorage.T,// transient memory
code: Code.T, // bytecode to execute
substate: SubState.T, // accounts destructed etc
gas: nat, // available gas units
pc: nat // program counter
)
datatype Error =
REVERTS
| INSUFFICIENT_GAS
| INSUFFICIENT_FUNDS
| INVALID_OPCODE
| STACK_UNDERFLOW
| STACK_OVERFLOW
| MEMORY_OVERFLOW
| BALANCE_OVERFLOW
| RETURNDATA_OVERFLOW
| INVALID_JUMPDEST
| INVALID_PRECONDITION
| CODESIZE_EXCEEDED
| CALLDEPTH_EXCEEDED
| ACCOUNT_COLLISION
| WRITE_PROTECTION_VIOLATED
...
}
The complete definition of the state type along with some member functions is in the state.dfy file within a module EvmState
.
Opcodes semantics
We have written the EVM semantics as pure functions that map a state of the EVM to a new state. The advantage is that it is readable, language-agnostic (we use mathematical functions) and can be executed.
The opcodes semantics are defined in the Bytecode
module in bytecode.dfy.
For example the semantics of the ADD opcode is as follows:
module Bytecode {
...
/**
* Unsigned integer addition with modulo arithmetic.
* @param st A state.
* @returns The state after executing an `ADD` or an `Error` state.
*/
function Add(st: ExecutingState): (st': State)
ensures st'.EXECUTING? || st' == ERROR(STACK_UNDERFLOW)
ensures st'.EXECUTING? <==> st.Operands() >= 2
ensures st'.EXECUTING? ==> st'.Operands() == st.Operands() - 1
{
if st.Operands() >= 2 // Operands is the size of the stack
then
var lhs := st.Peek(0) as int; // top of stack
var rhs := st.Peek(1) as int; // second element
var res := (lhs + rhs) % TWO_256; // computes modulo 2^256
st.Pop(2).Push(res as u256).Next() // st -- Pop.Pop.Push.Next -> st'
else
ERROR(STACK_UNDERFLOW)
}
...
First note that we can only apply this function to an ExecutingState
and as Dafny is statically typed, this prevents us from
accidentally calling the Add
function with an error state.
Second the definition of the semantics of ADD
(lines 14-21) distinguishes two cases:
- success: if the stack has enough operands (
st.Operands >= 2
), the addition takes two arguments at the top of the stack, pops them, adds them and pushes the result on top the stack. It also advances the program counter to the next instruction (Next
). - exception: if
st.Operands < 2
, the instruction cannot be executed and we reach anERROR
state.
The Dafny specifications (lines 10-12) help to identify the main properties of the ADD
opcode:
- line 10: it results in either an executing state or an error state (runtime exception) of type
ERROR(STACK_UNDERFLOW)
. - line 11: there is an exception if and only if the stack has less than 2 elements.
- line 12: if there is no exception, then the stack size decreases by one.
Bytecode execution & Gas accounting
EVM bytecode comes as a sequence of bytes of type u8
, unsigned integers over 8 bits.
To interpret the bytecode we need to decode the sequence of bytes into opcodes (and possibly their arguments).
The EVM
module in evm.dfy provides some helpers to
execute the next instruction of the bytecode.
The execution of an opcode is provided by the ExecuteBytecode
function that matches a u8
(byte) opcode to its semantics and applies the semantics to the state:
/**
* Execute a given bytecode from the executing state. This assumes gas has
* already been deducted. Again, this may or may not result in an executing
* state, depending on whether the necessary conditions for execution were
* met. For example, executing an instruction (e.g. ADD) which requires
* operands on the stack when there are no operands on the stack will result
* in an error state.
*/
function ExecuteBytecode(op: u8, st: ExecutingState): State {
match op
case STOP => Bytecode.Stop(st)
case ADD => Bytecode.Add(st)
case MUL => Bytecode.Mul(st)
case SUB => Bytecode.Sub(st)
case DIV => Bytecode.Div(st)
case SDIV => Bytecode.SDiv(st)
case MOD => Bytecode.Mod(st)
...
// 0xf0
case CREATE => Bytecode.Create(st)
case CALL => Bytecode.Call(st)
case CALLCODE => Bytecode.CallCode(st)
case RETURN => Bytecode.Return(st)
case DELEGATECALL => Bytecode.DelegateCall(st)
case CREATE2 => Bytecode.Create2(st)
case STATICCALL => Bytecode.StaticCall(st)
case REVERT => Bytecode.Revert(st)
case SELFDESTRUCT => Bytecode.SelfDestruct(st)
case _ => ERROR(INVALID_OPCODE)
}
The bytecode to execute is part of the (EXECUTING
) state of the EVM in the code
section.
Given a state st: ExecutingState
, st.evm.code
contains the sequence of bytes that correspond to the bytecode, and st.evm.pc
the current value of the program counter. The Execute
function decodes the next instruction and computes the next state of the EVM:
/**
* Execute the next bytecode as determined by the current machine's state.
* This requires decoding the bytecode at the current PC location.
*/
function Execute(st: ExecutingState): State {
// Read opcode byte from code section. If the read is out-of-bounds, then
// STOP is returned by default.
var opcode := Code.DecodeUint8(st.evm.code, st.evm.pc as nat);
// Check fork supports given bytecode
if st.evm.fork.IsBytecode(opcode)
then
// Deduct gas for the given bytecode.
match DeductGas(opcode, st)
// Not out of gas
case EXECUTING(vm) => ExecuteBytecode(opcode, EXECUTING(vm))
// Out of gas (or invalid opcode)
case s => s
else
ERROR(INVALID_OPCODE)
}
This function performs the following checks:
- the opcode is valid for the given fork (line 10). EVM opcodes may be added or removed and the fork (A fork is a point in time where changes can be applied to the Ethereum infrastructure, including the EVM.) identifies the version that should be used to interpret the opcodes.
- there is enough gas to execute the next instruction (line 13). The
DeductGas
function returns anERROR
state if there is not enough gas to execute the opcode. If there is enough gas the cost of executing the opcode is deducted from the current gas budget and the corresponding new state is returned.
The Dafny-EVM separates the functional semantics of each opcode from the gas accounting, allowing for a clean, modular structure. This separation improves clarity, simplifies reasoning about correctness, and makes it easier to test or verify either aspect independently.
Formal verification with the Dafny-EVM
In this section we show how to use the Dafny-EVM to formally prove some properties of bytecode.
Overflow checker
The EVM computes modulo 2^256 and as a result it does not abort on arithmetic overflows (this is in contrast to other VMs for smart contracts like the MoveVM).
To detect arithmetic overflows the compiler (Solidity or Vyper) has to generate instrumented bytecode. For example detecting an overflow in an addition (opcode ADD
) is done using the following bytecode snippet (assuming the two operands are top of the stack):
-------------------------
Address|Instruction|Stack
-------------------------
0x00: DUP2 [a, b, a, ...]
0x01: ADD [a + b, a, ...]
0x02: LT [a + b < a, ...]
0x03: PUSH1 0x07 [0x07, a + b < a, ...]
0x05: JUMPI [if a + b < a then goto 0x07 else goto next 0x06]
0x06: STOP
0x07: JUMPDEST // Label of valid goto target. Overflow
0x08: PUSH1 0x00 // prepare error code and data to return
0x10: PUSH1 0x00
0x12: REVERT // Overflow: revert
The bytecode above uses the necessary and sufficient condition specified by lemma AddOverflowNSC
:
/** Necessary and sufficient condition for detecting overflow
* in ADD.
*/
lemma AddOverflowNSC(x: u256, y: u256)
ensures x as nat + y as nat > MAX_U256
<==> (x as nat + y as nat) % TWO_256 < x as nat
{
// Thanks Dafny
}
For two 256-bit unsigned integers x, y: u256
, to detect whether x + y
overflows we just need to check whether the result of the addition modulo 2^256 is less than x
(or y
).
We can formally prove that the instrumented code correctly detects overflows, and also that the lemma AddOverflowNSC
is correct (thanks Dafny!).
First we define the bytecode as a sequence of opcodes for readability (the comments specify the content of the stack):
/** Code snippet that detects overflow in addition. */
const OVERFLOW_CHECK := Code.Create(
[
// PC Ins stack [x, y, ...] top --> bottom
// 0x00
DUP2, // [y, x, y, ...]
// 0x01
ADD, // [y + x, y, ...]
// 0x02
LT, // [x + y < y?1:0, ...]
// 0x03
PUSH1, 0x07, // [0x07, x < x + y?1:0, ...]
// 0x05
JUMPI, // [...]
// If stack(1) is 0 no overflow, STOP, otherwise JUMP to 0x07 and revert.
// 0x06
STOP, // Normal end of execution
// 0x07 0x08 0x09 0x10 0x11,
JUMPDEST, PUSH1, 0x00, PUSH1, 0x00,
// 0x12
REVERT // Revert (exception)
]
)
Next we create a Dafny program that starts in an arbitrary EXECUTING
state and executes the code OVERFLOW_CHECK
:
/**
* This is a pattern that is used to detect overflows for ADD.
*
* @param st A state.
* @param x A u256.
* @param y A u256.
* @returns A normal state with `x + y` top of stack if no overflow, a
* revert state otherwise..
* @note The check relies on the property specified by lemma AddOverflowNSC.
* @note The overflow is specified as x + y exceeding MAX_U256.
*/
method OverflowCheck(st: ExecutingState, x: u256, y: u256) returns (st': State)
/** EXECUTING state and initial PC. */
requires /* Pre0 */ st.PC() == 0 && st.Fork() == EvmFork.BERLIN
/** Enough gas. Longest path gas-wise is via JUMPI. */
requires /* Pre1 */ st.Gas() >= 6 * Gas.G_VERYLOW + Gas.G_HIGH + Gas.G_JUMPDEST
/** Initial stack is [x, y]. */
requires /* Pre2 */ st.GetStack() == Stack.Make([x, y])
/** The code is the snippet to detect overflow. */
requires st.evm.code == OVERFLOW_CHECK
/** The contract never runs out of gas thanks to Pre1. */
ensures (st'.ERROR? && st'.error == REVERTS) || st'.RETURNS?
/** Should revert iff overflow. */
ensures (st'.ERROR? && st'.error == REVERTS) <==> x as nat + y as nat > MAX_U256
/** Normal termination iff no overflow. */
ensures st'.RETURNS? <==> x as nat + y as nat <= MAX_U256
{
// Execute 4 steps from st -- DUP2 ADD LT PUSH1 0x07
st' := ExecuteN(st, 4);
assert st'.PC() == 0x05;
// Depending on result of LT comparison overflow or not
if st'.Peek(1) == 0 {
st':= Execute(st');
assert st'.PC() == 0x06;
st' := Execute(st');
assert st'.RETURNS?;
} else {
st':= Execute(st');
assert st'.PC() == 0x07;
st' := ExecuteN(st', 4);
assert st'.ERROR? && st'.error == REVERTS;
}
}
The method OverflowCheck
has a few pre-conditions (requires
lines 30–37) to initialise the EVM in a state has enough gas and operands to execute the ADD
.
Dafny can prove the correctness of the method OverflowCheck
which provides several guarantees.
First, the body of the method OverflowCheck
executes the bytecode as follows:
- executes 4 steps from the initial state
st
(line 46).ExecuteN
appliesExecute
(defined in the previous section) four times. Note that becauseExecute
takes anEXECUTING
state as a parameter, this implies that the first four execution steps do not result in an exception. - after the four steps, we reach program counter
0x05
(verified at line 47). The instruction at0x05
is a conditional jumpJUMPI
the semantics of which is: if the second element of the stack is non-zero, we jump (goto) to the address at the top of the stack; otherwise we continue to the instruction at0x06
. - from
0x05
, there are two execution paths:- the
if-else
statement at line 49/54 verifies that if the second element of the stackst'.Peek(1)
is non-zero, we jump to the instruction at0x07
and otherwise we jump to0x06
. - depending on the path taken we execute one instruction (line 52) or 4 instructions (lines 57) to complete the execution of the bytecode. The assertions at lines 53 and 58 provide some guarantees about the type of the final state.
- the
Second, the post-conditions (ensures
lines 38–43) specifies the correctness of the overflow checker:
- the execution either succeeds with a
RETURNS
(normal state) or an exception of typeREVERTS
(error state). This rules out the out-of-gas-exception and this is why we provide enough gas (line 33) to execute the code. - the execution reverts if and only if there is an overflow.
This simple example demonstrates how the Dafny-EVM can be used to check real-life code albeit short, but code that is used repeatedly to detect exceptions and is critical to the correctness of the contracts. A more detailed presentation of this example is available in this Dafny-EVM github section.
Optimisations
Another usage of the Dafny-EVM is to formally prove equivalence between code snippets.
Proposition 12 in this Thesis proposes to replace any sequence of the form SWAPN POP^(N+1)
by POP^(N+1)
(we use regular-language notation so a^k
stands for a
repeated k
times).
The EVM supports the SWAPN
instruction that swaps element N+1
in the stack with the top of the stack for 1 <= N <= 16
.
We can formally verify that this optimisation is correct and that the cost of the optimised version is strictly less than the non-optimised one.
To do so we create (lines 24 and 40) two virtual machines that start in the same state vm
(line 21).
We then execute the two sequences, optimised (lines 25–33) and non-optimised (lines 41–52) separately and compare the results (lines 53–54) and the gas cost (line 56).
As Dafny can prove that this method Proposition12b
is correct, we get a formal proof that the optimised version leads to the same state and is less expensive gas-wise.
Note that the proof is valid for any N
between 1 <= N <= 16
which covers the range of the SWAPN
instructions available in the EVM.
/**
*
* Proposition 12:
* n + 1 consecutive Pop <_gas SwapN . n + 1 consecutive Pop
* Gas cost for POP is G_BASE and for SWAP G_VERYLOW
*
* @param n As described above.
* @param s An initial stack content.
* @param g An initial value of gas.
*
* @note General case.
*
*/
method Proposition12b(n: nat, s: seq<u256>, g: nat)
requires 1 <= n <= 16
/** Stack must have at least n + 1 elements. */
requires n + 1 <= |s| <= 1024
/** Minimum gas needed. */
requires g >= (n + 1) * Gas.G_BASE + Gas.G_VERYLOW
{
var vm := EVM.Init(gas := g, stk := s, code := []);
// Execute n + 1 POPs in vm1.
var vm1 := vm;
for i := 0 to n + 1
invariant vm1.EXECUTING?
invariant vm1.Gas() == g - (i * Gas.G_BASE)
invariant vm1.Operands() >= n - i
invariant vm1.GetStack() == vm.SlicePeek(i, |s|)
{
vm1 := Execute(POP, vm1);
assert vm1.EXECUTING?;
}
assert vm1.Gas() >= Gas.G_VERYLOW;
assert vm1.Gas() == g - ((n+1) * Gas.G_BASE);
// Stack after n + 1 POPs is suffix of initial stack starting at index n + 1
assert vm1.GetStack() == vm.SlicePeek(n + 1, |s|);
// Execute SWAPn and then n + 1 POPs in vm2.
var vm2 := vm;
vm2 := Swap(vm2, n).UseGas(Gas.G_VERYLOW);
for i := 0 to n + 1
invariant vm2.EXECUTING?
invariant vm2.Gas() == g - (i * Gas.G_BASE) - Gas.G_VERYLOW
invariant vm2.Operands() >= n + 1 - i
invariant vm2.Operands() == vm.Operands() - i == |s| - i
invariant vm2.SlicePeek(n + 1 - i, |s| - i) == vm.SlicePeek(n + 1, |s|)
{
vm2 := Execute(POP, vm2);
assert vm2.EXECUTING?;
}
assert vm2.SlicePeek(0, |s| - n - 1) == vm.SlicePeek(n + 1, |s|);
assert vm1.GetStack() == vm2.GetStack();
assert vm2.Gas() == vm1.Gas() - Gas.G_VERYLOW;
assert vm2.Gas() < vm1.Gas();
}
A more detailed presentation of this example is available in this Dafny-EVM github section.
Verification of arbitrary EVM bytecode
The examples we have given so far are relatively small and we can instrument or reason about the bytecode by manually annotating it with specifications.
But what about real-world smart contracts with thousands of instructions? To be able to formally reason about arbitrary EVM bytecode, we need to build an artefact that encodes the semantics of the bytecode and can be reasoned about. A step into this direction is proposed in [2]: build proof objects that are representations of EVM bytecode using the Dafny-EVM semantics. In a nutshell, we generate instrumented EVM bytecode automatically by disassembling and analysing (resolving dynamic JUMPs) the bytecode. The technique proposed in [2] and implemented in Dafny in this github repository can disassemble real-world contracts (benchmarks of ~3000 lines of codes) and perform some verification. The automatic verification is for now limited to checking the absence of stack underflows/overflows but can certainly be extended to functional properties.
The proof objects can be further manually annotated with functional specifications and reasoned about like any Dafny program. This is certainly doable for relatively small pieces of code but there are still several hurdles to reason about EVM bytecode:
- the bytecode does not (usually) contain information about the high-level source code. For example, an addition in Solidity source code like
a + b
is compiled into and EVM bytecode usingPUSHes, POPs,
andADD
manipulating the stack, but there is no easy way in the bytecode to figure out that saystack[1]
isb
andstack[0]
isa
. This makes it hard to transfer high-level specification (in Solidity or Vyper) to EVM bytecode. - memory locations of elements of data structures like
maps
in high-level code (Solidity or Vyper) are computed in the bytecode using hashes. This makes it very hard to reason about any program that uses memory as the addresses of reads and writes are hard to evaluate. - the EVM has some instructions to dynamically call another contract. In many instances, the called contracts are not known at compile-time, and it is almost impossible to include them in the verification model. To a certain extent, it is possible to reason about external calls in an adversarial environment as we have demonstrated in [3] and [4] using Dafny, but this is done at the source level (Dafny).
References
[1] Franck Cassez, Joanne Fuller, Milad K. Ghale, David J. Pearce, and Horacio M. A. Quiles. 2023. Formal and Executable Semantics of the Ethereum Virtual Machine in Dafny. In Formal Methods: 25th International Symposium, FM 2023, Lübeck, Germany, March 6–10, 2023, Proceedings. Springer-Verlag, Berlin, Heidelberg, 571–583. https://doi.org/10.1007/978-3-031-27481-7_32
[2] Franck Cassez. ByteSpector: A Verifying disassembler for EVM bytecode. OASIcs, Volume 129. Forthcoming, Proceedings of Formal Methods for Blockchains (FMBC’25). https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.FMBC.2025.4
[3] Franck Cassez, Joanne Fuller, and Horacio M. A. Quiles. 2022. Deductive Verification of Smart Contracts with Dafny. In: Groote, J.F., Huisman, M. (eds) Formal Methods for Industrial Critical Systems. FMICS 2022. Lecture Notes in Computer Science, vol 13487. Springer, Cham. https://doi.org/10.1007/978-3-031-15008-1_5
[4] Franck Cassez, Joanne Fuller, and Horacio M. A. Quiles. 2024. Deductive verification of smart contracts with Dafny. Int. Journal Software Tools Technology Transfer 26, 131–145. https://doi.org/10.1007/s10009-024-00738-1