
perf: top-down benchmark testing for v4 #1742

@jhBharvest

Description


Summary

After the Babylon v4 upgrade, we propose conducting a top-down node benchmark as part of the optimization efforts for v5. The goal is to measure node performance under realistic workloads, identify key bottlenecks, and provide actionable insights for further optimization.

Proposal

We suggest applying a top-down benchmark analysis approach to the Babylon node:

  1. Methodology
    • Perform benchmarking in block sync (catch-up) mode.
    • In this mode, consensus logic is skipped, so the measurements primarily capture execution and storage overhead.
    • This provides a clean view of where CPU cycles and I/O are consumed.
  2. Measurement Tools
    • pprof
      • Captures aggregated CPU and memory profiles.
      • Highlights which functions consume the most resources over time.
    • runtime/trace
      • Captures fine-grained event traces: goroutine scheduling, network blocking, syscall latency, garbage-collection pauses, and channel operations.
      • Shows why overhead occurs (e.g., lock contention, scheduling stalls), complementing the aggregated view from pprof.

    • Collect data over a full catch-up run from historical blocks up to the latest height.

    • Custom Logging

      • For detailed analysis of bottlenecks revealed by pprof or runtime/trace, we can add custom log points in critical functions.
      • Typical insertion points include:
        • Function entry/exit (to measure execution latency)
        • BeginBlocker and EndBlocker (to measure per-block overhead)
        • AnteHandler (to capture transaction preprocessing overhead such as signature checks, fee deduction, and sequence number validation)
        • Transaction handlers like AddFinalitySig (to capture DB read/write counts and duration)
        • VoteExtension processing (to measure overhead from extending votes with additional metadata, often involving serialization and validation)
      • By correlating custom logs with profiling data, we can analyze fine-grained operation costs (e.g., per-signature verification latency, state read amplification).
  3. Scope
    • Start with the fixed v4 binary once the upgrade is complete.
    • Compare results against prior benchmarks (e.g., v2.2.0) to track progress and identify regressions or improvements.
  4. Expected Hotspots
    • From prior v2.2.0 catch-up simulations, AddFinalitySig was observed as a significant contributor to CPU overhead.
    • This path involves repeated state reads.
    • These are expected to remain critical areas for optimization in v5.

Additional Considerations

  • As the blockchain state grows, certain operations may scale proportionally with state size.
  • If execution time for these operations increases linearly with state growth, it could lead to longer block times and delayed finalization.
  • It is therefore critical to:
    • Use custom logging to identify which operations scale with state size.
    • Ensure that block execution time does not grow significantly as blocks accumulate, preventing performance degradation over time.
