Supporting group-wise quantization and sub1 packing

Dear Authors,

Sorry to bother you again.

As far as I understand, the original GPTQ algorithm supports group-wise quantization with various group sizes, such as -1, 128, and 64. From my reading of the code, `batch_GPTQ` itself handles different group sizes, but the [`add_expert`](https://github.com/IST-DASLab/qmoe/blob/9110baa9466f2a7d8590e3c5dc3a5e11f7446604/switch.py#L427) function of the `Sub1CheckpointManager` class and the `make` function of [`Sub1Linear`](https://github.com/IST-DASLab/qmoe/blob/9110baa9466f2a7d8590e3c5dc3a5e11f7446604/sub1.py#L149) appear to support only row-wise quantization, which corresponds to a group size of -1. As a result, only a row-wise `min_max` variable is kept for the subsequent packing step. Please correct me if I have misread this.
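To make the distinction concrete, here is a minimal pure-Python sketch of what I mean. It is not taken from the repository: the function names `quantize_ternary` and `dequantize_ternary`, the code assignment (0 for zero, 1 for the group minimum, 2 for the group maximum), and the nearest-level rounding are all my own assumptions for illustration. The point is only that the ternary codes keep the same 3-symbol alphabet for any group size, while `min_max` grows from one pair per row to one pair per group.

```python
def quantize_ternary(row, groupsize=-1):
    """Quantize one weight row to ternary codes with per-group (min, max).

    Hypothetical helper, not the repository's API.
    groupsize=-1 treats the whole row as a single group (row-wise).
    Codes (an illustrative choice): 0 -> 0.0, 1 -> group min, 2 -> group max.
    """
    if groupsize == -1:
        groupsize = len(row)
    codes, min_max = [], []
    for start in range(0, len(row), groupsize):
        group = row[start:start + groupsize]
        lo, hi = min(group), max(group)
        levels = (0.0, lo, hi)
        min_max.append((lo, hi))
        for w in group:
            # Pick the nearest of the three levels (assumed rounding rule).
            codes.append(min(range(3), key=lambda c: abs(w - levels[c])))
    return codes, min_max


def dequantize_ternary(codes, min_max, groupsize=-1):
    """Map ternary codes back to values using the per-group (min, max)."""
    if groupsize == -1:
        groupsize = len(codes)
    out = []
    for i, c in enumerate(codes):
        lo, hi = min_max[i // groupsize]
        out.append((0.0, lo, hi)[c])
    return out


row = [-1.0, 0.1, 0.9, -0.2, -4.0, 0.3, 3.0, 0.1]
row_codes, row_min_max = quantize_ternary(row)               # one (min, max) pair
grp_codes, grp_min_max = quantize_ternary(row, groupsize=4)  # two (min, max) pairs
```

With `groupsize=4` the example row yields two `(min, max)` pairs instead of one, so a packed format would need to store and index this extra metadata per group.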

Would it be feasible to apply the LZW-based compression to tensors that have been quantized group-wise (for instance, groupsize=128 with ternary weights), and to adapt the sub1 packing process accordingly?

Supporting group-wise quantization and sub1 packing #4
