[CUDA] 3/5/6-bit quants for qmm_naive#3352

Merged
zcbenz merged 1 commit into ml-explore:main from zcbenz:qmm-native-2
Apr 1, 2026
@zcbenz zcbenz commented Apr 1, 2026

Follow-up to #3315 that adds support for 3/5/6-bit affine quantization. The tricky part with CuTe is that, by default, tiled copy only supports element bit widths that are powers of 2. To read 3/5/6-bit weights, we have to add custom integer types (uint24_t/uint40_t/uint48_t) so that the tiled copy compiles.
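
For readers unfamiliar with the packing problem: eight 3-bit values occupy exactly 24 bits, eight 5-bit values 40 bits, and eight 6-bit values 48 bits, none of which match a built-in integer width. The following is a minimal host-side C++ sketch of the idea for the 3-bit case only. It is not the code from this PR: `packed24`, `pack_3bit`, and `unpack_3bit` are hypothetical names, and both the grouping of eight values per unit and the little-endian bit order are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 3-byte storage unit (not the PR's uint24_t definition).
// Using a byte array keeps sizeof == 3 and alignment == 1, so consecutive
// units can be laid out back to back without padding.
struct packed24 {
  std::uint8_t bytes[3];
};
static_assert(sizeof(packed24) == 3, "packed24 must occupy exactly 3 bytes");

// Pack eight 3-bit values into 24 bits (assumed order: value i sits at
// bit offset 3 * i, least significant byte first).
inline packed24 pack_3bit(const std::array<std::uint8_t, 8>& values) {
  std::uint32_t word = 0;
  for (int i = 0; i < 8; ++i) {
    word |= static_cast<std::uint32_t>(values[i] & 0x7u) << (3 * i);
  }
  packed24 p{};
  p.bytes[0] = static_cast<std::uint8_t>(word & 0xFFu);
  p.bytes[1] = static_cast<std::uint8_t>((word >> 8) & 0xFFu);
  p.bytes[2] = static_cast<std::uint8_t>((word >> 16) & 0xFFu);
  return p;
}

// Unpack the eight 3-bit values again using the same assumed bit order.
inline std::array<std::uint8_t, 8> unpack_3bit(const packed24& p) {
  const std::uint32_t word = static_cast<std::uint32_t>(p.bytes[0]) |
                             (static_cast<std::uint32_t>(p.bytes[1]) << 8) |
                             (static_cast<std::uint32_t>(p.bytes[2]) << 16);
  std::array<std::uint8_t, 8> values{};
  for (int i = 0; i < 8; ++i) {
    values[i] = static_cast<std::uint8_t>((word >> (3 * i)) & 0x7u);
  }
  return values;
}
```

Packing the values 0 through 7 with `pack_3bit` and passing the result to `unpack_3bit` returns the original eight values. The same pattern would extend to 5-byte and 6-byte units for 5-bit and 6-bit weights, with a 64-bit intermediate word instead of a 32-bit one.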

@angeloskath angeloskath left a comment

Nice!

@zcbenz zcbenz merged commit 2ffafe0 into ml-explore:main Apr 1, 2026
16 checks passed
@zcbenz zcbenz deleted the qmm-native-2 branch April 1, 2026 11:13