Add PCLMULQDQ-accelerated CRC32C path with large block folding #74

Open
pkdog wants to merge 1 commit into google:main from pkdog:feature/x86-pclmul-optimization

pkdog commented May 2, 2026

Summary

  • Adds a new SSE4.2 + PCLMULQDQ-accelerated CRC32C implementation (ExtendSse42Clmul) that folds stripes together with PCLMULQDQ carry-less multiplication instead of the skip-table approach
  • Adds larger block sizes (16128 B and 4032 B) beyond the AWS checksums baseline (3072 B), cutting folding frequency by about 5.3x (16128 / 3072 ≈ 5.25) and mirroring the block hierarchy of the SSE4.2 skip-table path
  • Increases the prefetch distance from 128 B to 256 B
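The stripe folding described above can be modeled in portable C++. This is a minimal sketch, not the PR's `ExtendSse42Clmul`: it assumes a three-stripe layout like the existing SSE4.2 path (so a 16128 B block would be three 5376 B stripes, which is an assumption), computes each stripe's CRC independently, and merges the partial CRCs by multiplying by x^(8n) mod P(x). `ClMul32` is a software stand-in for PCLMULQDQ and every function name here is hypothetical; a real kernel would typically run the three streams interleaved with the hardware `crc32` instruction and fold in the bit-reflected domain instead of bit-reversing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CRC32C (Castagnoli) generator P(x): full 33-bit normal form, and the
// bit-reflected 32-bit form used by the LSB-first update.
constexpr uint64_t kPolyFull = 0x11EDC6F41ULL;
constexpr uint32_t kPolyReflected = 0x82F63B78u;

// Raw reflected CRC32C update (no pre/post inversion), one bit at a time.
uint32_t CrcRaw(uint32_t crc, const uint8_t* data, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    crc ^= data[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc & 1u) ? (crc >> 1) ^ kPolyReflected : crc >> 1;
  }
  return crc;
}

// Standard CRC32C: initial value and final XOR are both 0xFFFFFFFF.
uint32_t Crc32c(const uint8_t* data, size_t n) {
  return CrcRaw(0xFFFFFFFFu, data, n) ^ 0xFFFFFFFFu;
}

// Software stand-in for PCLMULQDQ: 32x32 -> 64-bit carry-less product.
uint64_t ClMul32(uint32_t a, uint32_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 32; ++i)
    if ((b >> i) & 1u) r ^= static_cast<uint64_t>(a) << i;
  return r;
}

// a(x) * b(x) mod P(x), with polynomials in normal (bit i = x^i) order.
uint32_t MulModP(uint32_t a, uint32_t b) {
  uint64_t p = ClMul32(a, b);
  for (int i = 63; i >= 32; --i)
    if ((p >> i) & 1u) p ^= kPolyFull << (i - 32);
  return static_cast<uint32_t>(p);
}

// x^k mod P(x) by square-and-multiply. Real code would precompute these
// folding constants once per stripe length.
uint32_t XPowModP(uint64_t k) {
  uint32_t result = 1;  // the polynomial 1
  uint32_t base = 2;    // the polynomial x
  for (; k != 0; k >>= 1) {
    if (k & 1u) result = MulModP(result, base);
    base = MulModP(base, base);
  }
  return result;
}

// Bit reversal maps a reflected CRC register to normal polynomial order.
uint32_t Rev32(uint32_t v) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i, v >>= 1) r = (r << 1) | (v & 1u);
  return r;
}

// Advance a raw CRC register over n zero bytes: multiply by x^(8n) mod P(x).
uint32_t ShiftCrc(uint32_t crc, size_t n) {
  return Rev32(MulModP(Rev32(crc), XPowModP(8 * static_cast<uint64_t>(n))));
}

// Hash three equal stripes independently (a real kernel interleaves them
// with the crc32 instruction), then fold the partial CRCs together.
uint32_t ExtendThreeStripes(uint32_t crc, const uint8_t* data, size_t len) {
  assert(len % 3 == 0);
  const size_t s = len / 3;
  const uint32_t a = CrcRaw(crc, data, s);
  const uint32_t b = CrcRaw(0, data + s, s);
  const uint32_t c = CrcRaw(0, data + 2 * s, s);
  return ShiftCrc(a, 2 * s) ^ ShiftCrc(b, s) ^ c;
}
```

For a 16128-byte buffer `buf`, `ExtendThreeStripes(0xFFFFFFFF, buf, 16128) ^ 0xFFFFFFFF` equals `Crc32c(buf, 16128)`. Because each fold is one multiply by a precomputed constant regardless of stripe length, growing the block from 3072 B to 16128 B reduces the number of folds per byte by 16128 / 3072 ≈ 5.25, consistent with the 5.3x figure in the summary.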

Performance (Intel Xeon E5-2678 v3, Release build)

| Buffer size | PCLMUL (this PR) | SSE4.2 skip-table (existing) |
|---|---|---|
| 256 B | 16.0 ns (14.9 GiB/s) | 32.1 ns (7.4 GiB/s) |
| 64 KiB | 2672 ns (22.8 GiB/s) | 2691 ns (22.7 GiB/s) |
| 16 MiB | 715785 ns (21.8 GiB/s) | 724845 ns (21.6 GiB/s) |

The PCLMUL path is equal to or faster than the SSE4.2 skip-table path at every benchmarked size: about 2x faster at 256 B and roughly 1% faster at 64 KiB and 16 MiB.
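The GiB/s figures in the performance table follow from bytes per call divided by time per call, with 1 GiB = 2^30 bytes. A small sketch with hypothetical helper names (not part of `crc32c_bench`) reproduces them and the relative speedups:

```cpp
#include <cassert>
#include <cstdint>

// Throughput in GiB/s (1 GiB = 2^30 bytes) for a call that processes
// `bytes` bytes in `ns` nanoseconds.
double GiBPerSecond(uint64_t bytes, double ns) {
  assert(ns > 0.0);
  constexpr double kBytesPerGiB = 1024.0 * 1024.0 * 1024.0;
  const double bytes_per_second = static_cast<double>(bytes) / (ns * 1e-9);
  return bytes_per_second / kBytesPerGiB;
}

// How many times faster the candidate is than the baseline (>1 means faster).
double Speedup(double baseline_ns, double candidate_ns) {
  return baseline_ns / candidate_ns;
}
```

`GiBPerSecond(256, 16.0)` gives about 14.9 and `Speedup(32.1, 16.0)` about 2.0, while `Speedup(2691, 2672)` and `Speedup(724845, 715785)` come out near 1.007 and 1.013.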

Files changed

  • New: src/crc32c_sse42_clmul.{cc,h} — PCLMUL-accelerated implementation
  • Modified: src/crc32c.cc — dispatch logic (PCLMUL > SSE4.2 > ARM64 > portable)
  • Modified: src/crc32c_sse42_check.h — added CanUseClmul() CPUID detection
  • Modified: CMakeLists.txt — PCLMUL compile detection and OBJECT library
  • Modified: src/crc32c_config.h.in — added HAVE_PCLMUL define
  • Modified: plans/x86-pclmul-optimization.md — updated optimization plan
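The dispatch order listed for `src/crc32c.cc` and the `CanUseClmul()` CPUID check can be sketched as follows. This illustrates the idea under stated assumptions and is not the PR's code: the names `CpuFeatures`, `ChoosePath`, and `DetectX86Features` are hypothetical, the PCLMUL path is assumed to require SSE4.2 as well (since the stripes are still hashed with the `crc32` instruction), and detection uses GCC/Clang's `<cpuid.h>` (`__get_cpuid` with leaf 1, testing `bit_SSE4_2` and `bit_PCLMUL` in ECX). ARM64 detection is left to the caller.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define CRC32C_SKETCH_HAVE_CPUID 1
#endif

// Hypothetical names; the PR's actual dispatch code may differ.
enum class Crc32cPath { kSse42Clmul, kSse42, kArm64, kPortable };

struct CpuFeatures {
  bool sse42;
  bool pclmul;
  bool arm64_crc;  // set by platform-specific code (not shown)
};

// Priority order from the PR description: PCLMUL > SSE4.2 > ARM64 > portable.
Crc32cPath ChoosePath(const CpuFeatures& f) {
  // Assumption: the CLMUL path also needs SSE4.2 for the crc32 instruction.
  if (f.sse42 && f.pclmul) return Crc32cPath::kSse42Clmul;
  if (f.sse42) return Crc32cPath::kSse42;
  if (f.arm64_crc) return Crc32cPath::kArm64;
  return Crc32cPath::kPortable;
}

// Runtime probe of CPUID leaf 1, ECX feature bits; all false off x86.
CpuFeatures DetectX86Features() {
  CpuFeatures f{};
#ifdef CRC32C_SKETCH_HAVE_CPUID
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
    f.sse42 = (ecx & bit_SSE4_2) != 0;
    f.pclmul = (ecx & bit_PCLMUL) != 0;
  }
#endif
  return f;
}
```

On a Haswell CPU such as the Xeon E5-2678 v3 used for the benchmark, which reports both SSE4.2 and PCLMULQDQ, `ChoosePath(DetectX86Features())` would select `kSse42Clmul`.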

Test plan

  • ctest: both test suites pass
  • crc32c_bench: all benchmarks pass; PCLMUL ≥ SSE4.2 at every benchmarked size
  • Algorithm verified against the awslabs/aws-checksums reference implementation
  • Folding constants verified with a GF(2) linear-algebra tool
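The folding constants x^(8n) mod P(x) can be cross-checked with GF(2) linear algebra. This sketch is a hypothetical stand-in for the tool mentioned in the test plan, using the matrix method that older zlib versions used in `crc32_combine()`: build the 32x32 matrix that advances a raw CRC32C register over one zero byte, raise it to the n-th power, and compare the result with feeding n zero bytes through the bitwise update. The exact exponent offsets a given PCLMUL kernel bakes into its constants depend on its reduction step, so this only checks the underlying x^(8n) mod P(x) values.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr uint32_t kPolyReflected = 0x82F63B78u;  // CRC32C, bit-reflected

// 32x32 matrix over GF(2); column j is the image of the basis vector 1 << j.
using Gf2Matrix = std::array<uint32_t, 32>;

// Matrix-vector product: XOR together the columns selected by v's set bits.
uint32_t Apply(const Gf2Matrix& m, uint32_t v) {
  uint32_t r = 0;
  for (int j = 0; v != 0; ++j, v >>= 1)
    if (v & 1u) r ^= m[j];
  return r;
}

// Matrix product a * b (apply b first, then a).
Gf2Matrix Multiply(const Gf2Matrix& a, const Gf2Matrix& b) {
  Gf2Matrix r{};
  for (int j = 0; j < 32; ++j) r[j] = Apply(a, b[j]);
  return r;
}

// Matrix that advances a raw reflected CRC32C register over one zero byte.
Gf2Matrix ZeroByteMatrix() {
  Gf2Matrix m{};
  for (int j = 0; j < 32; ++j) {
    uint32_t c = 1u << j;
    for (int k = 0; k < 8; ++k)
      c = (c & 1u) ? (c >> 1) ^ kPolyReflected : c >> 1;
    m[j] = c;
  }
  return m;
}

// base^n by square-and-multiply.
Gf2Matrix Power(Gf2Matrix base, uint64_t n) {
  Gf2Matrix r{};
  for (int j = 0; j < 32; ++j) r[j] = 1u << j;  // identity
  for (; n != 0; n >>= 1) {
    if (n & 1u) r = Multiply(base, r);
    base = Multiply(base, base);
  }
  return r;
}

// Reference: feed n zero bytes through the bit-at-a-time update.
uint32_t ShiftByZeros(uint32_t crc, size_t n) {
  for (size_t i = 0; i < n * 8; ++i)
    crc = (crc & 1u) ? (crc >> 1) ^ kPolyReflected : crc >> 1;
  return crc;
}

// x^(8n) mod P(x) in reflected form: the image of the polynomial 1
// (bit 31 in reflected order) after n zero bytes.
uint32_t FoldConstant(size_t n) {
  return Apply(Power(ZeroByteMatrix(), n), 0x80000000u);
}
```

As a sanity check, `FoldConstant(4)` is x^32 mod P(x), which in reflected form is the polynomial constant 0x82F63B78 itself.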

Draft PR — pending final review.

google-cla bot commented May 2, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

pkdog force-pushed the feature/x86-pclmul-optimization branch 5 times, most recently from 91960d8 to e29e199 (May 2, 2026 04:30)
pkdog force-pushed the feature/x86-pclmul-optimization branch from e29e199 to 578fe78 (May 2, 2026 04:34)
pkdog marked this pull request as ready for review May 2, 2026 04:36
pkdog marked this pull request as draft May 2, 2026 12:41
pkdog marked this pull request as ready for review May 2, 2026 16:38