Hi,
Thanks for the great work! I was wondering whether there are plans to support lower-bitwidth kernels (e.g., 2-bit with 2:4 sparsity).
For a bit of context, we have been working on a project that compresses the difference between a fine-tuned model and its base model, and it turns out this difference can be compressed more aggressively (see: https://arxiv.org/abs/2312.05215). It would be great if we could leverage Marlin and Sparse Marlin to accelerate inference.
Thanks in advance!
Best regards,
Xiaozhe
cc: @alexm-neuralmagic (since I saw there was a PR for 8-bit support, but it was closed)