Add RISC-V Zvbc vector CRC32C acceleration#3332
Conversation
Implement CRC32C using Zvbc vector carry-less multiplication (vclmul/vclmulh RVV intrinsics). Processes 4 lanes of 128-bit folding per iteration (64 bytes), with 4-to-1 lane reduction and Barrett reduction for finalization. - Add rv_crc32c_vclmul() using vclmul/vclmulh intrinsics - Add isZvbc() runtime detection via /proc/cpuinfo - Add WITH_RISCV_ZVBC cmake option - Fix macro guards: support __riscv_zvbc without __riscv_zbc - Zvbc preferred over Zbc in Choose_Extend()
There was a problem hiding this comment.
Pull request overview
This PR adds a new RISC-V CRC32C fast path using the Zvbc vector carry-less multiply instructions (RVV vclmul/vclmulh), along with a CMake option to enable compiling that path and runtime selection logic that prefers Zvbc over the existing Zbc scalar optimization.
Changes:
- Add
rv_crc32c_vclmul(RVV-based CRC32C folding + Barrett reduction) andisZvbc()runtime detection. - Update runtime dispatch (
Choose_Extend/IsFastCrc32Supported) to prefer Zvbc when available, otherwise fall back to Zbc scalar. - Add
WITH_RISCV_ZVBCCMake option and set-marchaccordingly.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/butil/crc32c.cc |
Adds the Zvbc vector CRC32C implementation and runtime selection logic. |
CMakeLists.txt |
Adds build option/flags to enable compiling the Zvbc path on riscv64. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Set up RVV for 64-bit elements: vl = min(VLEN/64, 2) = 2 for VLEN=128 | ||
| size_t vl = __riscv_vsetvl_e64m1(2); |
There was a problem hiding this comment.
Fixed. Added a vl < 2 check after __riscv_vsetvl_e64m1(2) that falls back to the bitwise path if VLEN < 128.
| if(WITH_RISCV_ZVBC) | ||
| set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=rv64gc_zbc_zvbc") | ||
| elseif(WITH_RISCV_ZBC) |
There was a problem hiding this comment.
Fixed. Changed -march=rv64gc_zbc_zvbc to -march=rv64gcv_zbc_zvbc to include the base RVV extension.
| vuint64m1_t lane1 = __riscv_vle64_v_u64m1((const uint64_t*)(p + 0), vl); | ||
| vuint64m1_t lane2 = __riscv_vle64_v_u64m1((const uint64_t*)(p + 16), vl); | ||
| vuint64m1_t lane3 = __riscv_vle64_v_u64m1((const uint64_t*)(p + 32), vl); | ||
| vuint64m1_t lane4 = __riscv_vle64_v_u64m1((const uint64_t*)(p + 48), vl); |
There was a problem hiding this comment.
Fixed. All vector loads now go through memcpy into a uint64_t[2] staging buffer before vle64, avoiding the uint8_t*-to-uint64_t* cast.
| if (strstr(line, "isa") || strstr(line, "hart isa")) { | ||
| char* colon = strchr(line, ':'); | ||
| if (colon) { | ||
| if (strstr(colon, "_zvbc") || strstr(colon, "zvbc")) { | ||
| supported = true; |
There was a problem hiding this comment.
Fixed. Both isZbc() and isZvbc() now only match _zbc and _zvbc (with underscore prefix), removing the bare zbc/zvbc substring checks that could cause false positives.
…flag - Add vl<2 fallback for VLEN<128 (prevents UB from uninitialized elements) - Use memcpy instead of uint8_t*-to-uint64_t* casts (strict-aliasing safe) - Tighten /proc/cpuinfo ISA matching to _zbc/_zvbc only (prevents _zvbc falsely matching zbc) - Add missing 'v' base extension in -march flag (rv64gcv_zbc_zvbc)
Summary
vclmul/vclmulhRVV intrinsics)WITH_RISCV_ZVBCcmake optionChoose_Extend()for higher throughputBackground
This extends the existing Zbc scalar CRC32C optimization (merged in #3312) with vector support. The implementation follows the same 128-bit folding + Barrett reduction approach used in x86 SSE4.2 and ARM PMULL.
Key implementation details
clmul/clmulh(Zbc instructions, available when Zvbc is present)Testing
Verified on QEMU with
-cpu rv64,zvbc=true,v=true,vlen=128:Build