x86: use simd::intrinsics for saturating packs #2033
Merged
folkertdev merged 4 commits into rust-lang:main from Feb 19, 2026
Use intrinsics for `sse2`, `sse41`, `avx2`, `avx512bw`.

The majority of implementations make use of `simd_shuffle`, since that form still optimized correctly through the avx512 intrinsics that are built on the lower-target-feature intrinsics. Combined with masked stores, instruction tests would fail, presumably because of casting and clamping that the compiler couldn't see through. This is a known weakness, also seen in other masked stores such as the truncating conversion stores.
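For readers unfamiliar with the pack instructions, here is a rough scalar sketch in plain Rust of why a saturating pack can be written as a clamp, a truncating cast, and a fixed reordering. It is not the code from this PR. The function names are hypothetical, and it models only the documented element layout of `_mm_packs_epi32` and `_mm256_packs_epi32`, not the `simd::intrinsics` calls themselves.

```rust
/// Signed saturation from i32 to i16: clamp to the i16 range, then truncate.
fn saturate_i32_to_i16(x: i32) -> i16 {
    x.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Hypothetical scalar model of the 128-bit pack (`_mm_packs_epi32`):
/// the saturated elements of `a` fill the low half, those of `b` the high half.
fn packs_epi32_model(a: [i32; 4], b: [i32; 4]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..4 {
        out[i] = saturate_i32_to_i16(a[i]);
        out[i + 4] = saturate_i32_to_i16(b[i]);
    }
    out
}

/// Hypothetical scalar model of the 256-bit pack (`_mm256_packs_epi32`).
/// The instruction works per 128-bit lane, so the result is a fixed
/// permutation of "saturate everything in a, then everything in b".
fn packs_epi32_256_model(a: [i32; 8], b: [i32; 8]) -> [i16; 16] {
    // Step 1: saturate and concatenate a followed by b.
    let mut wide = [0i16; 16];
    for i in 0..8 {
        wide[i] = saturate_i32_to_i16(a[i]);
        wide[i + 8] = saturate_i32_to_i16(b[i]);
    }
    // Step 2: reorder with constant indices, which is what a shuffle expresses.
    const IDX: [usize; 16] = [0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7, 12, 13, 14, 15];
    let mut out = [0i16; 16];
    for (o, &i) in out.iter_mut().zip(IDX.iter()) {
        *o = wide[i];
    }
    out
}
```

The constant index table in the 256-bit model is the part that corresponds to a shuffle. Whether LLVM folds a given clamp, cast, and shuffle sequence back into a single pack instruction is exactly what the instruction tests check.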
Collaborator
r? @folkertdev
rustbot has assigned @folkertdev.
folkertdev
approved these changes
Feb 19, 2026
Contributor
folkertdev
left a comment
Neat, this looks good to me
cc @sayantn if you have thoughts, otherwise I'll just merge this tomorrow
Contributor
It lgtm too, just one point - all the intrinsics can now be marked |
Contributor
That's better as a follow-up I think
Use `simd::intrinsics` for `sse2`, `sse41`, `avx2`, `avx512bw`.

All but one of the implementations make use of `simd_shuffle`. Some `avx512` intrinsics call the lower-target-feature intrinsics with additional masking, and it took some trial and error to figure out how to make the optimizer happy. Saturating packing instructions are essentially shuffles, and LLVM can recognize a lot of these patterns by now.

Combined with masked stores, instruction tests routinely failed unless shuffles were used, probably because the compiler cannot see through the clamping and truncating, as in the truncating conversion stores issue. The same strategy could probably be used to get more of the saturating masked truncation instructions to pass.

`_mm_packs_epi32` was the single case that failed to optimize at all unless I wrote it without a shuffle.
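To make the "additional masking capability" concrete, here is a minimal scalar sketch of the semantics of a merge-masked pack such as `_mm_mask_packs_epi32`: compute the plain pack, then keep an element only where the corresponding mask bit is set. The function names are hypothetical and this is an illustration of the semantics, not the implementation in this PR.

```rust
/// Signed saturation from i32 to i16: clamp to the i16 range, then truncate.
fn saturate_i32_to_i16(x: i32) -> i16 {
    x.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Hypothetical scalar model of a merge-masked 128-bit pack.
/// Bit `i` of `k` selects the packed element; a clear bit keeps `src[i]`.
fn mask_packs_epi32_model(src: [i16; 8], k: u8, a: [i32; 4], b: [i32; 4]) -> [i16; 8] {
    let mut out = src;
    for i in 0..8 {
        if (k >> i) & 1 == 1 {
            // Elements 0..4 come from `a`, elements 4..8 from `b`.
            let x = if i < 4 { a[i] } else { b[i - 4] };
            out[i] = saturate_i32_to_i16(x);
        }
    }
    out
}
```

In this model the masking is a separate select step layered on top of the unmasked pack, which matches the description of the `avx512` intrinsics calling the lower-target-feature ones.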