This repository contains three C benchmarks for measuring spinlock contention and `pause` instruction behavior on multi-core systems. Each benchmark highlights a different aspect of cache contention and thread synchronization.
These benchmarks are designed and tested for x86/x86_64 processors only.
Requirements:
- CPU: x86 or x86_64 architecture
- OS: Linux or similar Unix-like system
- Compiler: GCC with support for inline assembly and atomic builtins
- Features: Uses the x86-specific PAUSE instruction as a spin-wait hint and RDTSC for cycle-level timing
Note: The code uses x86-specific inline assembly (`pause`, `rdtsc`) and has not been tested on other architectures. Treat it as a reference implementation for x86 systems; porting to another architecture would require replacing these instructions.
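As a rough illustration of the two x86 primitives mentioned above, the following sketch wraps `pause` and `rdtsc` in GCC inline assembly and times a loop of back-to-back `pause` instructions. The helper names (`cpu_relax`, `read_tsc`, `pause_ticks_per_iter`) are hypothetical and are not taken from the benchmark sources. On most modern processors the time-stamp counter ticks at a fixed reference rate, so the result is in TSC ticks rather than core clock cycles.

```c
#include <stdint.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "This sketch uses x86-only instructions (pause, rdtsc)."
#endif

/* Spin-wait hint: tells the core we are busy-waiting. */
static inline void cpu_relax(void)
{
    __asm__ __volatile__("pause" ::: "memory");
}

/* Read the 64-bit time-stamp counter, returned by rdtsc in EDX:EAX. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Average TSC ticks per iteration over `iters` back-to-back pauses.
 * Loop overhead is included, so treat the result as an upper bound. */
static double pause_ticks_per_iter(uint64_t iters)
{
    uint64_t start = read_tsc();
    for (uint64_t i = 0; i < iters; i++)
        cpu_relax();
    uint64_t end = read_tsc();
    return (double)(end - start) / (double)iters;
}
```

Calling `pause_ticks_per_iter(100000000)` from a `main` and printing the result gives a per-system baseline. The value varies considerably between microarchitectures, so it is only comparable across runs on the same machine.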
- `pause_bench.c`: A single-threaded microbenchmark that measures the latency of the `pause` instruction in a tight loop. Useful for establishing a baseline for pause latency on a given system.
- `lock_bench_threaded.c`: A multi-threaded benchmark where all threads contend on a single global spinlock and shared counter. This file demonstrates the effects of lock contention and cache line bouncing as thread count increases.
- `lock_bench_padded.c`: Similar to `lock_bench_threaded.c`, but the spinlock variable is padded and aligned to a cache line to reduce false sharing with adjacent variables. This helps isolate the effect of lock contention from false sharing.
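The padded variant can be pictured roughly as follows. This is a minimal sketch, not the code in `lock_bench_padded.c`: the 64-byte cache-line size, the test-and-test-and-set lock, and all names (`padded_lock`, `spin_lock`, `run_contended`) are assumptions made for illustration. Removing the `pad` member and the `aligned` attributes gives the unpadded layout, in which the lock word and the counter may share a cache line.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size; typical on x86 */
#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* Lock word padded and aligned so it occupies a cache line by itself. */
struct padded_lock {
    int locked;
    char pad[CACHE_LINE - sizeof(int)];
} __attribute__((aligned(CACHE_LINE)));

static struct padded_lock lock;
static uint64_t counter __attribute__((aligned(CACHE_LINE)));

/* Test-and-test-and-set: try an atomic exchange, then spin on plain
 * loads with pause until the lock looks free again. */
static void spin_lock(int *l)
{
    while (__atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE)) {
        while (__atomic_load_n(l, __ATOMIC_RELAXED))
            __asm__ __volatile__("pause" ::: "memory");
    }
}

static void spin_unlock(int *l)
{
    __atomic_store_n(l, 0, __ATOMIC_RELEASE);
}

/* Each worker increments the shared counter `iters` times under the lock. */
static void *worker(void *p)
{
    uint64_t iters = *(uint64_t *)p;
    for (uint64_t i = 0; i < iters; i++) {
        spin_lock(&lock.locked);
        counter++;
        spin_unlock(&lock.locked);
    }
    return NULL;
}

/* Start up to nthreads workers, wait for them, return the final counter. */
static uint64_t run_contended(int nthreads, uint64_t iters)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &iters) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Built with something like `gcc -O2 -pthread`, a call such as `run_contended(4, 50000)` should return `200000` if the lock is correct. Timing the same call with and without the padding is one way to separate false sharing from genuine lock contention.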
Run the following script to compile all benchmarks:

    ./build.sh

Requirements:
- GCC compiler with support for inline assembly
- x86/x86_64 target architecture
- POSIX threads (pthread) support
Each executable will be named after its source file:
- `pause_bench`
- `lock_bench_threaded`
- `lock_bench_padded`
Each benchmark can be run directly from the command line. Most accept optional arguments for thread count and work units. For example:

    ./lock_bench_threaded 16 10000000

Refer to the source code comments for details on usage and parameters.
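For orientation, optional positional arguments of this kind could be handled along the following lines. This is only a sketch under assumptions: the argument order (thread count first, then work units) is inferred from the example command, and the defaults, the struct, and the function names are hypothetical. The source files remain the authority on what each benchmark actually accepts.

```c
#include <errno.h>
#include <stdlib.h>

struct bench_opts {
    long threads;     /* number of worker threads */
    long work_units;  /* total iterations to perform */
};

/* Parse a strictly positive decimal integer.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_positive(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0)
        return -1;
    *out = v;
    return 0;
}

/* Fill `o` from argv, falling back to defaults for missing arguments.
 * Returns 0 on success, -1 if an argument is invalid. */
static int parse_opts(int argc, char **argv, struct bench_opts *o)
{
    o->threads = 4;           /* hypothetical default */
    o->work_units = 1000000;  /* hypothetical default */

    if (argc > 1 && parse_positive(argv[1], &o->threads) != 0)
        return -1;
    if (argc > 2 && parse_positive(argv[2], &o->work_units) != 0)
        return -1;
    return 0;
}
```

Using `strtol` with an end-pointer check rejects inputs such as `16x` or an empty string, which `atoi` would silently accept.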