Add runtime ARM SVE2 detection support #4564
Open
+183
−8
Motivation
Enable deployment of a single zstd binary across heterogeneous ARM fleets with varying CPU capabilities. This is particularly important for cloud deployments where applications run across multiple instance types:
Currently, to leverage SVE2 optimizations, you must compile with -march=neoverse-v2 or similar flags, which produces binaries that won't run on older processors. This forces users to either:
This PR implements runtime CPU feature detection, similar to the existing BMI2 support on x86-64, allowing a single binary compiled for Neoverse N1 baseline (-mcpu=neoverse-n1) to automatically use SVE2 optimizations when available.
Changes
This PR adds runtime ARM SVE2 detection infrastructure:
Core Infrastructure
lib/common/cpu.h: Platform-specific detection via getauxval() on Linux/Android
lib/common/portability_macros.h: DYNAMIC_SVE2 macro to enable runtime dispatch
lib/common/compiler.h: SVE2_TARGET_ATTRIBUTE for selective function compilation
lib/compress/zstd_compress.c: Detect SVE2 once per compression context
Platform Support
Linux/Android: getauxval()
Recommended Flags
Benchmarking on Graviton 4:
Overhead
Zero overhead on non-SVE2 systems:
Related
This follows the same pattern as the existing x86-64 BMI2 runtime detection, extending it to ARM architectures.
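For readers unfamiliar with the pattern, here is a minimal sketch of detect-once, dispatch-per-call as described under Core Infrastructure. It is not the code in this PR: every name prefixed with example_ is hypothetical, the "SVE2" kernel is a placeholder that forwards to the scalar path, and the fallback value for HWCAP2_SVE2 assumes the Linux arm64 hwcap bit layout. On anything other than Linux/AArch64 the sketch always reports no SVE2.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>              /* getauxval(), AT_HWCAP2 */
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)     /* assumed bit value from the Linux arm64 <asm/hwcap.h> */
#endif
#endif

/* Hypothetical helper: returns 1 if the running CPU reports SVE2, else 0. */
static int example_cpu_supports_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return 0;                      /* other platforms: always use the baseline path */
#endif
}

/* Baseline kernel, compiled with the default (e.g. Neoverse N1) flags. */
static size_t example_count_zeros_generic(const unsigned char* p, size_t n)
{
    size_t i, zeros = 0;
    for (i = 0; i < n; i++) zeros += (p[i] == 0);
    return zeros;
}

/* Placeholder for an SVE2 kernel. In a real build this function alone would
 * be compiled with an SVE2 target attribute and use SVE2 intrinsics; here it
 * simply forwards to the scalar version so the sketch runs anywhere. */
static size_t example_count_zeros_sve2(const unsigned char* p, size_t n)
{
    return example_count_zeros_generic(p, n);
}

/* Hypothetical context: the feature flag is cached once at creation. */
typedef struct {
    int sve2;
} example_ctx;

static void example_ctx_init(example_ctx* ctx)
{
    assert(ctx != NULL);
    ctx->sve2 = example_cpu_supports_sve2();   /* detect once per context */
}

/* Dispatch: one predictable branch on the cached flag per call. */
static size_t example_count_zeros(const example_ctx* ctx,
                                  const unsigned char* p, size_t n)
{
    assert(ctx != NULL);
    if (ctx->sve2) return example_count_zeros_sve2(p, n);
    return example_count_zeros_generic(p, n);
}
```

A caller would run example_ctx_init() once and then call example_count_zeros() as often as needed. Caching the flag in the context keeps the getauxval() lookup off the hot path, so a CPU without SVE2 pays only for the branch on ctx->sve2.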