The zerobuffer library includes comprehensive benchmarks using Google Benchmark to measure:
- Single Frame Latency: Round-trip time for one frame
- Pipelined Latency: Latency with multiple frames in flight
- Wakeup Latency: Time to wake blocked reader
- Maximum Throughput: Frames/second for various sizes
- Burst Throughput: How quickly the buffer can be filled
- Memory Pressure: Performance with small buffers
- Wrap-Around Overhead: Cost of circular buffer wrap
- Metadata Operations: Metadata read/write performance
- Random Access: Performance with real-world variable frame sizes
- Setup/Teardown: Connection establishment cost
- High Sequence Numbers: Performance at scale
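The benchmarks are written with Google Benchmark. As a minimal sketch of their general shape (the zerobuffer read/write calls are omitted, and this `BM_SingleFrameLatency` is a simplified stand-in, not the library's actual implementation):

```cpp
#include <benchmark/benchmark.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for one of the listed benchmarks: times the loop
// body once per iteration and reports bytes processed for throughput.
static void BM_SingleFrameLatency(benchmark::State& state) {
    const size_t frame_size = static_cast<size_t>(state.range(0));
    std::vector<char> frame(frame_size);
    for (auto _ : state) {
        // The real benchmark would write `frame`, then block until the
        // reader echoes it back (zerobuffer calls omitted here).
        benchmark::DoNotOptimize(frame.data());
    }
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) * frame_size);
}
BENCHMARK(BM_SingleFrameLatency)->Range(64, 1 << 20);
BENCHMARK_MAIN();
```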
To run the benchmarks:

```bash
# Run all benchmarks
./run_benchmarks.sh

# Run an individual benchmark
./build/benchmarks/benchmark_latency

# Run with detailed output
./build/benchmarks/benchmark_latency --benchmark_repetitions=10

# Run a specific benchmark
./build/benchmarks/benchmark_latency --benchmark_filter=BM_SingleFrameLatency

# Output formats
./build/benchmarks/benchmark_latency --benchmark_format=json > results.json
./build/benchmarks/benchmark_latency --benchmark_format=csv > results.csv
```

Expected round-trip latency:
- Best Case: ~1-5 microseconds (both processes ready)
- Typical: ~10-50 microseconds (including context switch)
- Wakeup: ~50-200 microseconds (blocked to active)
Expected throughput by frame size:
- Small Frames (64B-1KB): Limited by syscall overhead
- Medium Frames (4KB-64KB): Optimal performance
- Large Frames (1MB+): Limited by memory bandwidth
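For intuition, suppose the fixed per-frame cost (semaphore signaling plus bookkeeping) is about 1 microsecond, a hypothetical round number: a 64B frame then tops out around 64 MB/s even at a million frames per second, while a 64KB frame would reach ~64 GB/s before that fixed cost matters, at which point memory bandwidth takes over as the limit. This is why medium frames hit the sweet spot.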
Several factors affect performance:

- Frame Size
  - Smaller frames = higher overhead percentage
  - Larger frames = better throughput, worse latency
- Buffer Size
  - Larger buffer = better burst handling
  - Smaller buffer = more blocking, lower memory usage
- CPU Affinity (see the pinning sketch after this list)
  - Same core = best latency via L1/L2 cache
  - Different cores = higher latency, better throughput
- System Load
  - Context switches add 1-10 microseconds
  - Memory pressure affects large frames more
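As a minimal sketch of the CPU-affinity point, assuming Linux with glibc (`pin_to_core` is an illustrative helper, not part of zerobuffer):

```cpp
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to one CPU core. Pinning reader and writer to
// the same core favors latency (shared L1/L2 cache); separate cores
// favor throughput.
static bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```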
```cpp
// Small buffer, small frames, same CPU core
BufferConfig config(0, 64 * 1024);  // 64KB buffer
// Use CPU affinity to pin reader/writer to the same core
```

```cpp
// Large buffer, medium frames, different CPU cores
BufferConfig config(0, 100 * 1024 * 1024);  // 100MB buffer
// Use 4KB-64KB frames for optimal performance
```

```cpp
// Buffer size = 2-3x typical burst size
BufferConfig config(0, max_burst_size * 3);
// Tune based on actual usage patterns
```

Sample benchmark output:

```
BM_SingleFrameLatency/1024     2.35 us   [64B frame latency]
BM_Throughput/1024/1048576     1.23 ms   [1KB frames, 1MB buffer]
BM_WrapAroundOverhead/4096     0.15 us   [4KB wrap overhead]
```
- Latency Spikes: Indicate blocking or system issues
- Throughput Plateaus: Show saturation points
- High Variance: Suggests system interference
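Google Benchmark can quantify that variance directly by repeating each case and reporting aggregate statistics:

```bash
# 20 repetitions; print only the mean/median/stddev aggregates
./build/benchmarks/benchmark_latency \
    --benchmark_repetitions=20 \
    --benchmark_report_aggregates_only=true
```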
```cpp
// Reader statistics
uint64_t frames_in  = reader.frames_read();
uint64_t bytes_in   = reader.bytes_read();

// Writer statistics
uint64_t frames_out = writer.frames_written();
uint64_t bytes_out  = writer.bytes_written();
```
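A small, hypothetical helper (assuming only the counter API shown above) can turn two samples of these counters into rates:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: given counter values sampled `secs` seconds apart,
// print the frame and byte rates over that interval.
void report_rates(uint64_t frames0, uint64_t bytes0,
                  uint64_t frames1, uint64_t bytes1, double secs) {
    std::printf("%.0f frames/s, %.2f MB/s\n",
                (frames1 - frames0) / secs,
                (bytes1 - bytes0) / (secs * 1024.0 * 1024.0));
}

// Usage: report_rates(f0, b0, reader.frames_read(), reader.bytes_read(), 1.0);
```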
```bash
# Monitor shared memory usage
ls -la /dev/shm/

# Monitor semaphore usage
ipcs -s

# Monitor CPU usage
htop -p $(pgrep your_app)
```
- High Latency
  - Check CPU scheduling/affinity
  - Verify no memory swapping
  - Look for lock contention
- Low Throughput
  - Increase buffer size
  - Use larger frames
  - Check for reader bottlenecks
- Inconsistent Performance
  - Disable CPU frequency scaling
  - Use real-time scheduling
  - Isolate CPU cores
```bash
# Increase shared memory limits
echo 1073741824 > /proc/sys/kernel/shmmax

# Use huge pages for large buffers
echo 1024 > /proc/sys/vm/nr_hugepages

# Set the CPU performance governor
cpupower frequency-set -g performance
```

In application code:

```cpp
#include <sched.h>
#include <sys/mman.h>

// Use real-time scheduling
struct sched_param param;
param.sched_priority = 90;
sched_setscheduler(0, SCHED_FIFO, &param);

// Lock memory to prevent paging
mlockall(MCL_CURRENT | MCL_FUTURE);
```
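Note: SCHED_FIFO requires root or the CAP_SYS_NICE capability, and mlockall() is limited by RLIMIT_MEMLOCK unless the process has CAP_IPC_LOCK. The /proc and cpupower commands above also require root.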