Even lower bitwidth kernel #3

Hi,

Thanks for the great work! Just wondering if there are plans to support lower bitwidth kernels (e.g., 2-bit + 2:4 sparsity).

For a bit of context, we have been working on a project that compresses the difference (delta) between a fine-tuned model and its base model. It turned out that this delta can be compressed more aggressively than the full weights (see: https://arxiv.org/abs/2312.05215), so it would be great if we could leverage Marlin and Sparse Marlin to accelerate inference.
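To make the request concrete, here is a minimal pure-Python sketch of what "2-bit + 2:4 sparsity" means for a weight delta. It is an illustration only: the function names are hypothetical, the min/max uniform quantizer is an assumption, the numbers are made up, and it is neither the method from the linked paper nor Marlin's packed kernel format.

```python
def prune_2_of_4(row):
    """Per group of 4 values, keep the 2 with the largest magnitude.

    Returns (positions, values): the kept in-group indices (each 0..3,
    i.e. 2 bits of metadata per kept value) and the kept values in order.
    """
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    positions, values = [], []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        by_magnitude = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)
        keep = sorted(by_magnitude[:2])
        positions.append(keep)
        values.extend(group[j] for j in keep)
    return positions, values


def quantize_2bit(values):
    """Uniform min/max quantization to 4 levels (codes 0..3). Assumed scheme."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3 or 1.0  # avoid a zero scale when all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def reconstruct(positions, codes, scale, lo, n):
    """Rebuild a dense row of length n; pruned entries stay exactly zero."""
    out = [0.0] * n
    it = iter(codes)
    for g, keep in enumerate(positions):
        for j in keep:
            out[4 * g + j] = next(it) * scale + lo
    return out


# Toy weights (made-up numbers): the delta is small compared to the base.
base = [0.50, -0.20, 0.10, 0.90, -0.40, 0.30, 0.00, 0.70]
tuned = [0.53, -0.21, 0.10, 0.84, -0.40, 0.33, 0.04, 0.69]
delta = [t - b for t, b in zip(tuned, base)]

positions, kept = prune_2_of_4(delta)
codes, scale, lo = quantize_2bit(kept)
approx_delta = reconstruct(positions, codes, scale, lo, len(delta))
approx_tuned = [b + d for b, d in zip(base, approx_delta)]
```

In this sketch each group of four delta values is stored as two 2-bit codes plus two 2-bit position indices, and the base weights stay untouched. A kernel for this layout would be what we are asking about.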

Thanks in advance!

Best regards,
Xiaozhe 

cc: @alexm-neuralmagic (since I saw there was a PR for 8-bit, but it was closed)
