Add runtime ARM SVE2 detection support #4564

iksaif · 2026-01-07T14:40:00Z

Motivation

Enable deployment of a single zstd binary across heterogeneous ARM fleets with varying CPU capabilities. This is particularly important for cloud deployments where applications run across multiple instance types:

AWS Graviton 2 (Neoverse N1): Baseline ARM64, no SVE/SVE2
AWS Graviton 3 (Neoverse V1): 256-bit SVE, no SVE2
AWS Graviton 4 (Neoverse V2): 128-bit SVE2
GCP C4A (Axion, Neoverse V2): 128-bit SVE2

Currently, leveraging SVE2 optimizations requires compiling with -mcpu=neoverse-v2 or a similar flag (for example -march=armv8.2-a+sve2), which produces binaries that won't run on older processors. This forces users to choose one of the following:

Build multiple binaries for different targets
Build for the lowest common denominator and lose performance
Use separate container images for different instance types

This PR implements runtime CPU feature detection, similar to the existing BMI2 support on x86-64, allowing a single binary compiled for Neoverse N1 baseline (-mcpu=neoverse-n1) to automatically use SVE2 optimizations when available.

Changes

This PR adds runtime ARM SVE2 detection infrastructure:

Core Infrastructure

CPU feature detection (lib/common/cpu.h): Platform-specific detection via getauxval() on Linux/Android
Build macros (lib/common/portability_macros.h): DYNAMIC_SVE2 macro to enable runtime dispatch
Target attributes (lib/common/compiler.h): SVE2_TARGET_ATTRIBUTE for selective function compilation
Context initialization (lib/compress/zstd_compress.c): Detect SVE2 once per compression context
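For reviewers unfamiliar with the auxiliary vector, the Linux/Android detection path can be sketched roughly as follows. This is a minimal illustration, not the code in lib/common/cpu.h: the names example_arm_features, example_detect_arm_features, EXAMPLE_HWCAP_SVE and EXAMPLE_HWCAP2_SVE2 are hypothetical, and the bit positions are assumed from the Linux arm64 uapi hwcap header. On any other platform the sketch reports both features as absent.

```c
#if defined(__aarch64__) && (defined(__linux__) || defined(__ANDROID__))
#  include <sys/auxv.h>              /* getauxval(), AT_HWCAP, AT_HWCAP2 */
#  ifndef AT_HWCAP2
#    define AT_HWCAP2 26             /* value used by the Linux ABI */
#  endif
/* Assumed bit positions, taken from the Linux arm64 uapi <asm/hwcap.h>. */
#  define EXAMPLE_HWCAP_SVE   (1UL << 22)   /* reported in AT_HWCAP  */
#  define EXAMPLE_HWCAP2_SVE2 (1UL << 1)    /* reported in AT_HWCAP2 */
#endif

/* Hypothetical result type: each field is 0 (absent) or 1 (present). */
typedef struct {
    int sve;
    int sve2;
} example_arm_features;

/* Query the kernel-provided hardware capability bits once.
 * Returns all zeros on platforms without a supported detection path. */
static example_arm_features example_detect_arm_features(void)
{
    example_arm_features f = { 0, 0 };
#if defined(__aarch64__) && (defined(__linux__) || defined(__ANDROID__))
    unsigned long const hwcap  = getauxval(AT_HWCAP);
    unsigned long const hwcap2 = getauxval(AT_HWCAP2);
    f.sve  = (hwcap  & EXAMPLE_HWCAP_SVE)   != 0;
    f.sve2 = (hwcap2 & EXAMPLE_HWCAP2_SVE2) != 0;
#endif
    return f;
}
```

Calling example_detect_arm_features() once and caching the result (for instance in the compression context) avoids repeating the query on every call.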

Platform Support

✅ Linux/Android aarch64: Full runtime detection via getauxval()
❌ Apple platforms: Explicitly disabled (Apple Silicon doesn't support SVE/SVE2)
⏸️ Windows on ARM: Placeholder for future support
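The platform matrix above could be expressed at compile time along these lines. This is only a sketch of the gating idea: EXAMPLE_DYNAMIC_SVE2 and example_dynamic_sve2_enabled are hypothetical stand-ins, and the actual DYNAMIC_SVE2 definition in lib/common/portability_macros.h may use different conditions.

```c
/* Hypothetical stand-in for a DYNAMIC_SVE2-style macro.
 * 1 = compile the runtime-dispatched SVE2 paths, 0 = leave them out. */
#ifndef EXAMPLE_DYNAMIC_SVE2
#  if defined(__APPLE__)
     /* Apple platforms: explicitly disabled. */
#    define EXAMPLE_DYNAMIC_SVE2 0
#  elif defined(_WIN32)
     /* Windows on ARM: placeholder, no detection path yet. */
#    define EXAMPLE_DYNAMIC_SVE2 0
#  elif defined(__aarch64__) \
     && (defined(__linux__) || defined(__ANDROID__)) \
     && (defined(__GNUC__) || defined(__clang__))
     /* Linux/Android aarch64 with a compiler that has target attributes. */
#    define EXAMPLE_DYNAMIC_SVE2 1
#  else
#    define EXAMPLE_DYNAMIC_SVE2 0
#  endif
#endif

/* Small helper so the setting can be inspected at run time. */
static int example_dynamic_sve2_enabled(void)
{
    return EXAMPLE_DYNAMIC_SVE2;
}
```

Wrapping the definition in #ifndef lets a build override the default from the command line, which is an assumption of this sketch rather than something the PR states.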

Recommended Flags

Build command used when benchmarking on Graviton 4:

# Compile for Neoverse N1 baseline with runtime SVE2 detection
make CC=gcc-15 CFLAGS="-O3 -mcpu=neoverse-n1"

# Binary automatically uses SVE2 on Graviton 4, falls back to baseline on Graviton 2/3

Overhead

Zero overhead on non-SVE2 systems:

CPU detection happens once per compression context initialization
No runtime checks in hot paths when SVE2 is unavailable
Binary size increase is minimal
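To illustrate why the cost stays low, here is a hedged sketch of the dispatch pattern: the detection result is stored once in the context, and each histogram call takes a single branch outside the inner loop. All names (example_cctx, example_hist, EXAMPLE_SVE2_TARGET) are hypothetical, the target("+sve2") spelling is assumed to be accepted by recent GCC and Clang on aarch64, and the SVE2 variant is a placeholder that reuses the scalar loop rather than real SVE2 intrinsics.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of SVE2_TARGET_ATTRIBUTE: lets one function be
 * compiled for SVE2 while the rest of the file targets the baseline. */
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
#  define EXAMPLE_SVE2_TARGET __attribute__((target("+sve2")))
#else
#  define EXAMPLE_SVE2_TARGET
#endif

/* Hypothetical context: the detection result is cached at init time. */
typedef struct {
    int sve2;
} example_cctx;

/* Baseline byte histogram, safe on every CPU. */
static void example_hist_scalar(unsigned count[256],
                                const unsigned char* src, size_t n)
{
    size_t i;
    memset(count, 0, 256 * sizeof(unsigned));
    for (i = 0; i < n; i++) count[src[i]]++;
}

/* Placeholder for the SVE2 variant. A real implementation would use SVE2
 * instructions here; this sketch only shows where that code would live. */
EXAMPLE_SVE2_TARGET
static void example_hist_sve2(unsigned count[256],
                              const unsigned char* src, size_t n)
{
    example_hist_scalar(count, src, n);
}

/* Store the (already detected) capability once per context. */
static void example_cctx_init(example_cctx* cctx, int cpuHasSve2)
{
    cctx->sve2 = cpuHasSve2;
}

/* One branch per call, outside the per-byte loop. */
static void example_hist(const example_cctx* cctx, unsigned count[256],
                         const unsigned char* src, size_t n)
{
    if (cctx->sve2) {
        example_hist_sve2(count, src, n);
        return;
    }
    example_hist_scalar(count, src, n);
}
```

The SVE2 variant must only be reached when the cached flag is set, since the compiler is free to emit SVE2 instructions anywhere inside a function carrying the target attribute.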

Implement runtime detection of ARM SVE and SVE2 CPU capabilities, similar to the existing BMI2 runtime detection for x86-64.

Changes:
- Add ARM CPU feature detection in lib/common/cpu.h using platform-specific APIs (getauxval on Linux/Android, disabled on macOS/Windows)
- Add DYNAMIC_SVE and DYNAMIC_SVE2 macros in portability_macros.h
- Add SVE2_TARGET_ATTRIBUTE for selective function compilation
- Add sve2 field to compression context (ZSTD_CCtx)
- Update histogram functions to support dynamic SVE2 dispatch
- Explicitly disable SVE/SVE2 on Apple platforms (not supported)

Platform support:
- Linux/Android aarch64: Full runtime detection via getauxval()
- Apple platforms: Disabled (Apple Silicon doesn't support SVE/SVE2)
- Windows on ARM: Placeholder (API not yet available)

Benefits:
- Enables SVE2 optimizations on capable hardware without requiring build-time flags
- Zero overhead on non-SVE2 systems
- Expected 2-3x speedup in histogram counting on SVE2-capable CPUs (AWS Graviton4, Ampere AmpereOne)

Note: Currently only SVE2 optimizations exist. CPUs with SVE but not SVE2 (e.g., Fujitsu A64FX) could benefit from future SVE-only implementations.

meta-cla bot added the CLA Signed label Jan 7, 2026
