tests(benchmark): add missing scenario and optimization #1768

LouisTsai-Csie · 2025-11-07T15:04:18Z

🗒️ Description

Enhance the code generators behind the BenchmarkTest wrapper:

Add an initial balance setting to JumpLoopGenerator, which the other code generator already supports. This simplifies test_create.
Add a safety check to ExtCallGenerator: when calculating the maximum number of iterations, it should deduct the stack delta introduced by the setup section; otherwise the stack may overflow.
Most of the contract is filled with the repeated pattern; this PR fills the remaining space with the STOP opcode. This helps simplify cases such as test_codecopy and test_return_revert.
ExtCallGenerator uses two accounts: a target contract and a loop contract. The loop contract repeatedly calls into the target contract via STATICCALL. For operations that interact with CALLDATA, the calldata has to reach the target contract, which takes two steps: (1) forward the CALLDATA from the loop contract and (2) set up the memory from the CALLDATA in the target contract.
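
The stack safety check for ExtCallGenerator can be illustrated with a small sketch. This is not the generator's actual code: the function name `max_iterations_for_stack`, its parameters, and the simplification that every repetition of the attack block only pushes a fixed net number of items are assumptions made for illustration. The one fixed fact is the EVM stack limit of 1024 items.

```python
from typing import Optional

EVM_STACK_LIMIT = 1024  # maximum number of items the EVM stack can hold


def max_iterations_for_stack(
    setup_stack_delta: int, block_stack_delta: int
) -> Optional[int]:
    """
    Return how often the attack block can repeat before the stack overflows.

    Hypothetical helper: `setup_stack_delta` is the number of items the setup
    section leaves on the stack, `block_stack_delta` is the net number of
    items one repetition of the attack block adds. Returns None when the
    block does not grow the stack, so the stack imposes no bound.
    """
    if not 0 <= setup_stack_delta <= EVM_STACK_LIMIT:
        raise ValueError("setup stack delta must be within the stack limit")
    if block_stack_delta <= 0:
        return None
    # Deduct what the setup already occupies before dividing the rest.
    available = EVM_STACK_LIMIT - setup_stack_delta
    return available // block_stack_delta


print(max_iterations_for_stack(0, 1))  # no setup items on the stack
print(max_iterations_for_stack(2, 1))  # setup leaves two items behind
```

Without the deduction, the second call would allow two iterations more than the stack can hold, which is the overflow the description refers to.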

Test Case Enhancement:

test_ext_account_query_warm: Add initial balance / initial storage parametrization.
test_blockhash: Add extra cases for fixed and dynamic block indexes.
test_pc_op: Add the missing benchmark test.
test_keccak: Add a case that parametrizes memory size / offset for the gas repricing effort.
test_tload: Optimize the benchmark by switching to ExtCallGenerator.
test_block_full_of_ether_transfers: Add a scenario in which the receiver is not an empty account.

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

LouisTsai-Csie · 2025-11-10T10:20:49Z

tests/benchmark/compute/instruction/test_control_flow.py

    )


+def test_pc_op(


The benchmark for the PC opcode was missing.

LouisTsai-Csie · 2025-11-10T10:21:58Z

tests/benchmark/compute/instruction/test_keccak.py

+@pytest.mark.parametrize("mem_alloc", [b"", b"ff", b"ff" * 32])
+@pytest.mark.parametrize("offset", [0, 31, 1024])
+def test_keccak(


Instead of finding the optimal length, add a new case that parametrizes the initial memory layout and the access offset. Requested by the gas repricing effort.
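
As a rough illustration of the resulting test matrix: stacked `pytest.mark.parametrize` decorators produce one test per combination of values, so the two decorators above yield nine cases. The sketch below only enumerates those combinations with `itertools.product`; it does not reproduce the test body, and how the test uses `mem_alloc` is not shown here. Note that `b"ff"` is the two ASCII bytes 0x66 0x66, not a single 0xFF byte.

```python
from itertools import product

# Values copied from the parametrize decorators in the diff above.
mem_allocs = [b"", b"ff", b"ff" * 32]
offsets = [0, 31, 1024]

# One benchmark case per (offset, mem_alloc) combination.
cases = list(product(offsets, mem_allocs))

for offset, mem_alloc in cases:
    print(f"offset={offset:<5} mem_alloc_len={len(mem_alloc)}")

print(f"total cases: {len(cases)}")
```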

LouisTsai-Csie · 2025-11-10T10:24:57Z

tests/benchmark/compute/scenario/test_transaction_types.py

    ],
 )
+@pytest.mark.parametrize("balance", [0, 1])
 def test_block_full_of_ether_transfers(


Add a new scenario for the ether transfer benchmark: whether the receiver is an empty account.

LouisTsai-Csie · 2025-11-10T10:27:31Z

tests/benchmark/compute/instruction/test_storage.py


    benchmark_test(
-        code_generator=JumpLoopGenerator(
+        code_generator=ExtCallGenerator(


By switching from JumpLoopGenerator to ExtCallGenerator, we can get rid of the POP opcode after each operation, increasing the TLOAD density.


LouisTsai-Csie · 2025-11-10T10:28:43Z

tests/benchmark/compute/instruction/test_block_context.py

    # Create 256 dummy blocks to fill the blockhash window.
    blocks = [Block()] * 256

+    block_number = Op.AND(Op.GAS, 0xFF) if index is None else index


Add extra BLOCKHASH cases to the benchmark, as it was one of the slowest operations.
Cases:

Valid block index

Invalid block index

Dynamic block index
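
To make the three cases concrete, here is a minimal sketch of the two ideas involved. BLOCKHASH only returns a non-zero hash for the 256 most recent blocks, and masking the remaining gas with 0xFF (as in `Op.AND(Op.GAS, 0xFF)`) yields an index between 0 and 255 that changes from call to call. The helper names and the block number 257 used in the example are illustrative assumptions, not values taken from the test.

```python
BLOCKHASH_WINDOW = 256  # BLOCKHASH serves only the 256 most recent blocks


def is_blockhash_available(current_block: int, queried_block: int) -> bool:
    """Return True if BLOCKHASH would find `queried_block` in its window."""
    oldest = max(current_block - BLOCKHASH_WINDOW, 0)
    return oldest <= queried_block < current_block


def dynamic_index(gas_left: int) -> int:
    """Mirror Op.AND(Op.GAS, 0xFF): keep the low byte of the remaining gas."""
    return gas_left & 0xFF


# Illustrative block number only; the test's actual block height is not
# stated in the comment above.
current = 257
print(is_blockhash_available(current, 256))  # valid index
print(is_blockhash_available(current, 0))    # invalid: outside the window
print(dynamic_index(0x1234))                 # dynamic index from gas
```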

LouisTsai-Csie · 2025-11-10T10:30:11Z

tests/benchmark/compute/instruction/test_account_query.py

+    empty_account: bool,
+    initial_balance: bool,
+    initial_storage: bool,


Add new test scenarios for the gas repricing effort:

Accessing an empty / non-empty account

Accessing an account with zero / non-zero balance

Accessing an account with zero / non-zero storage

marioevz

Thanks!

In general, I liked the refactoring and changes. I've left comments on multiple files just to make everything more polished.

marioevz · 2025-11-10T17:23:54Z

packages/testing/src/execution_testing/specs/benchmark.py

        code = setup + Op.JUMPDEST + repeated_code * max_iterations + cleanup
        code += Op.JUMP(len(setup)) if len(setup) > 0 else Op.PUSH0 + Op.JUMP
+        # Pad the code to the maximum code size.
+        code += Op.STOP * (max_code_size - len(code))


We should make this optional via a new flag in the BenchmarkCodeGenerator class:

code_padding_opcode: Op | None = None

So only tests that need to pad the code to the max size do this.
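
A minimal sketch of how such an optional padding flag could behave, written against plain `bytes` rather than the real `Op` and `BenchmarkCodeGenerator` types. The function `pad_code` and its signature are hypothetical; only the opcode values (STOP = 0x00, INVALID = 0xFE, JUMPDEST = 0x5B) are fixed facts.

```python
from typing import Optional

STOP = 0x00
INVALID = 0xFE
JUMPDEST = 0x5B


def pad_code(
    code: bytes, max_code_size: int, padding_opcode: Optional[int] = None
) -> bytes:
    """
    Pad `code` up to `max_code_size` with `padding_opcode`.

    Hypothetical stand-in for a `code_padding_opcode` option: when the
    opcode is None, the code is returned unchanged.
    """
    if len(code) > max_code_size:
        raise ValueError("code already exceeds the maximum code size")
    if padding_opcode is None:
        return code
    return code + bytes([padding_opcode]) * (max_code_size - len(code))


body = bytes([JUMPDEST])
print(pad_code(body, 4).hex())           # unpadded
print(pad_code(body, 4, STOP).hex())     # padded with zero bytes
print(pad_code(body, 4, INVALID).hex())  # padded with non-zero bytes
```

Keeping the default at None means existing tests produce the same bytecode as before, and a test that needs non-zero filler can pass a different opcode.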

marioevz · 2025-11-10T17:31:28Z

tests/benchmark/compute/instruction/test_account_query.py

-        sender=pre.fund_eoa(),
+    benchmark_test(
+        code_generator=JumpLoopGenerator(
+            setup=setup, attack_block=attack_block


Suggested change

-            setup=setup, attack_block=attack_block
+            setup=setup, attack_block=attack_block, code_padding_opcode=Op.STOP

Following previous suggestion.

marioevz · 2025-11-10T17:56:45Z

tests/benchmark/compute/instruction/test_account_query.py

 )
 @pytest.mark.parametrize(
-    "absent_target",
+    "empty_account",


The meaning of "empty_account" seems confusing IMO. From what I'm reading below, it seems like it's rather "empty_code"?

marioevz · 2025-11-10T18:01:25Z

tests/benchmark/compute/instruction/test_account_query.py

+        target_addr = pre.fund_eoa(
+            storage={0: 0x1337} if initial_storage else {0: 0}
+        )


This unconditionally sets the balance to a non-zero value, even when initial_balance==False.

Also, {0: 0}, given the current fund_eoa logic, is still going to try to touch the storage (arguably a bug, but beside the point).

We should instead build a kwargs dict and then call pre.fund_eoa(**kwargs) to make this cleaner.

marioevz · 2025-11-10T18:06:25Z

tests/benchmark/compute/instruction/test_account_query.py

+    if not initial_balance and not initial_storage:
+        target_addr = pre.empty_account()
+    elif initial_balance or initial_storage:


I think we should refactor this logic a bit:

Suggested change

-    if not initial_balance and not initial_storage:
-        target_addr = pre.empty_account()
-    elif initial_balance or initial_storage:
+    if not initial_balance and not initial_storage and empty_code:
+        target_addr = pre.empty_account()
+    else:

And in the else branch we construct the kwargs to either call pre.fund_eoa or pre.deploy_contract.
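
One possible shape for that refactor, sketched without the real `pre` object so it can run on its own. The function returns the name of the helper to call together with the keyword arguments to pass. The keyword names (`amount`, `balance`, `storage`, `code`), the placeholder balance of 1, and the one-byte placeholder code are assumptions for illustration and should be checked against the actual `fund_eoa` and `deploy_contract` signatures.

```python
from typing import Any, Dict, Tuple


def plan_target_account(
    empty_code: bool, initial_balance: bool, initial_storage: bool
) -> Tuple[str, Dict[str, Any]]:
    """Pick the pre-allocation helper and build its kwargs (hypothetical)."""
    if empty_code and not initial_balance and not initial_storage:
        return "empty_account", {}

    kwargs: Dict[str, Any] = {}
    # Only pass storage when it is requested, so nothing touches it otherwise.
    if initial_storage:
        kwargs["storage"] = {0: 0x1337}

    if empty_code:
        # Assumed keyword name for the EOA balance.
        kwargs["amount"] = 1 if initial_balance else 0
        return "fund_eoa", kwargs

    # Assumed keyword names for the contract balance and code.
    kwargs["balance"] = 1 if initial_balance else 0
    kwargs["code"] = b"\x00"  # placeholder code: a single STOP
    return "deploy_contract", kwargs


helper, kwargs = plan_target_account(
    empty_code=True, initial_balance=False, initial_storage=True
)
print(helper, kwargs)
```

In the test itself the returned pair would translate into something like `getattr(pre, helper)(**kwargs)`, which keeps the balance and storage flags independent of each other.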

marioevz · 2025-11-10T18:25:10Z

tests/benchmark/compute/instruction/test_memory.py

    benchmark_test(
        code_generator=ExtCallGenerator(
-            setup=Op.MLOAD(Op.SELFBALANCE) + Op.POP,
+            setup=Op.POP(Op.MLOAD(Op.SELFBALANCE)),


Nice, I'm guessing this helps with the stack push/pop elements calculations.

marioevz · 2025-11-10T18:46:37Z

tests/benchmark/compute/instruction/test_storage.py

-        attack_block = Op.POP(Op.TLOAD(Op.DUP1))
-        code_key_mut = Op.POP + Op.GAS
-        code_val_mut = Op.TSTORE(Op.DUP2, Op.GAS)
+        setup = Op.GAS + Op.TSTORE(Op.DUP2, Op.GAS)


While trying to read this code, I realized that renaming key_mut -> fixed_key and val_mut -> fixed_value, and flipping the logic, would make it easier to read IMO.

marioevz · 2025-11-10T18:53:24Z

tests/benchmark/compute/instruction/test_storage.py


    benchmark_test(
-        code_generator=JumpLoopGenerator(
+        code_generator=ExtCallGenerator(


This might break the cases using CALLVALUE because we are now running this in the context of the subcall, and since we are calling using STATICCALL to call that subcontract, CALLVALUE is always going to be zero.
Perhaps replacing this with CALLDATASIZE, since we are passing along the data from the transaction?

marioevz · 2025-11-10T18:59:40Z

tests/benchmark/compute/instruction/test_system.py

    )
-    executable_code = mem_preparation + opcode(size=return_size)
-    code = executable_code
-    if return_non_zero_data:


This is lost because we are now by default returning code full of Op.STOP, which is all zeros. This can also be addressed by using the code_padding_opcode I mentioned in a previous comment (doing code_padding_opcode=Op.INVALID instead).

Nice catch, I did not notice that STOP is 0x00. This is helpful.

marioevz · 2025-11-10T19:02:02Z

tests/benchmark/compute/scenario/test_transaction_types.py

        yield receiver


-@pytest.fixture


This could still work as a fixture if we parametrize balance=0 on the rest of the test cases.

LouisTsai-Csie · 2025-11-11T06:39:02Z

Sorry for adding complexity to this PR, I’ve added one more commit that optimizes the DUP and CALLDATASIZE operations.

test_dup: The setup stack delta is now handled in ExtCallGenerator, which simplifies the test logic.
test_calldatasize: This test is also simplified using ExtCallGenerator, as the generator now forwards calldata to the target contract.

feat: enhance benchmark code generator and wrapper

b5c2255

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from f9fdb17 to 01e4d6b Compare November 10, 2025 10:02

LouisTsai-Csie self-assigned this Nov 10, 2025

LouisTsai-Csie added E-medium Experience: of moderate difficulty P-medium C-refactor Category: refactor A-test-benchmark Area: Tests Benchmarks—Performance measurement (eg. `tests/benchmark/*`, `p/t/s/e/benchmark/*`) labels Nov 10, 2025

refactor: optimize benchmark cases

1b44dea

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from 01e4d6b to 1b44dea Compare November 10, 2025 10:05


refactor: add more scenario for warm account query

ac154c4

LouisTsai-Csie marked this pull request as ready for review November 10, 2025 10:38

marioevz self-requested a review November 10, 2025 17:03

marioevz reviewed Nov 10, 2025


LouisTsai-Csie added 2 commits November 11, 2025 14:31

refactor: apply suggested changes

ca5d64a

refactor: optimize dup and calldatasize benchmark

2b6823a

LouisTsai-Csie force-pushed the benchmark/add-missing-cases branch from c720a28 to 2b6823a Compare November 11, 2025 06:35
