[CUDA] 3/5/6-bit quants for qmm_naive by zcbenz · Pull Request #3352 · ml-explore/mlx

zcbenz · 2026-04-01T04:57:42Z

Follow-up to #3315 that adds support for 3/5/6-bit affine quantization. The tricky part with CuTe is that, by default, the tiled copy only supports element widths that are a power of 2 in bits. To read 3/5/6-bit weights, we have to add some custom integer types (uint24_t/uint40_t/uint48_t) so that the tiled copy compiles.
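To illustrate why a 24-bit storage type is a natural fit for 3-bit weights: eight 3-bit values occupy exactly 24 bits, i.e. 3 whole bytes (likewise, eight 5-bit values fill 40 bits and eight 6-bit values fill 48 bits). The host-side C++ sketch below is only an illustration of that idea, not the code in this PR. The `sketch` namespace, the struct layout, the helpers `pack_3bit`/`unpack_3bit`, and the LSB-first bit order are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of MLX or CuTe

// Hypothetical 3-byte storage type: 8 values x 3 bits = 24 bits.
struct uint24_t {
  std::uint8_t bytes[3];
};
static_assert(sizeof(uint24_t) == 3, "must be exactly 3 bytes, no padding");
static_assert(alignof(uint24_t) == 1, "byte-aligned so it can sit at any offset");

// Pack eight 3-bit values; value i lands at bits [3*i, 3*i + 3).
// The LSB-first order is an assumption of this sketch.
inline uint24_t pack_3bit(const std::array<std::uint8_t, 8>& values) {
  std::uint32_t word = 0;
  for (int i = 0; i < 8; ++i) {
    assert(values[i] < 8);  // each value must fit in 3 bits
    word |= static_cast<std::uint32_t>(values[i]) << (3 * i);
  }
  return uint24_t{{static_cast<std::uint8_t>(word & 0xFF),
                   static_cast<std::uint8_t>((word >> 8) & 0xFF),
                   static_cast<std::uint8_t>((word >> 16) & 0xFF)}};
}

// Reassemble the 24-bit word from its bytes and extract the eight values.
inline std::array<std::uint8_t, 8> unpack_3bit(const uint24_t& packed) {
  const std::uint32_t word =
      static_cast<std::uint32_t>(packed.bytes[0]) |
      (static_cast<std::uint32_t>(packed.bytes[1]) << 8) |
      (static_cast<std::uint32_t>(packed.bytes[2]) << 16);
  std::array<std::uint8_t, 8> values{};
  for (int i = 0; i < 8; ++i) {
    values[i] = static_cast<std::uint8_t>((word >> (3 * i)) & 0x7);
  }
  return values;
}

}  // namespace sketch
```

Treating the 3-byte group as one element means a copy moves whole groups of eight weights at a time, and the individual values are extracted only after the load.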

angeloskath

Nice!

[CUDA] 3/5/6-bit quants for qmm_naive

786a208

angeloskath approved these changes Apr 1, 2026

zcbenz merged commit 2ffafe0 into ml-explore:main Apr 1, 2026
16 checks passed

zcbenz deleted the qmm-native-2 branch April 1, 2026 11:13
