Feat (gptaq): initial implementation of GPTAQ #1411
Draft
+177
−0
Reason for this PR
Initial implementation of GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
Reference implementation: https://github.com/Intelligent-Computing-Lab-Panda/GPTAQ/tree/main
Here are some 3-bit asymmetric weight-only quantization results for several Qwen3 models (config and reproduction details below):
Notable details
The reference implementation adds a new $$\alpha$$ parameter here to scale its difference matrix $$P$$, but this parameter is not mentioned in the paper. We follow the reference implementation by defaulting self.alpha=0.25; see full discussion here.
Config and reproduction details
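For context on the quantization setting used in these experiments, here is a minimal NumPy sketch of what 3-bit asymmetric weight-only quantization means. It assumes a plain per-row min/max grid, which is an illustrative choice and not necessarily the grid heuristic Brevitas uses for these results; `quantize_asym` is a hypothetical helper, not part of this PR.

```python
import numpy as np


def quantize_asym(w, bits=3):
    """Fake-quantize each row of w on an asymmetric min/max integer grid.

    Hypothetical helper for illustration only; not taken from Brevitas.
    Returns the dequantized weights and the integer codes.
    """
    qmax = 2 ** bits - 1  # 3 bits -> integer levels 0..7
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    # Guard against constant rows, which would otherwise give a zero scale.
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    # The zero point shifts the grid so that w_min maps to integer level 0.
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale, q.astype(np.int64)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16))  # toy weight matrix: 4 output channels
w_hat, q = quantize_asym(w, bits=3)
print("integer levels used:", np.unique(q))
print("max abs error:", np.abs(w - w_hat).max())
```

Because each row gets its own scale and zero point, the eight available levels cover exactly that row's range, including ranges that are not centered on zero. Algorithms such as GPTQ, GPFQ, Qronos, and GPTAQ then differ in how they choose the integer codes on a grid like this, not in the grid itself.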
The above table was collected using layerwise Hadamard rotations for 3-bit asymmetric weight-only quantization. The config is given below:
The data is collected via:
where you can specify algorithms by adding --gptq, --gpfq, --qronos, and --gptaq to the CLI args. For Learned Round, the results were collected with --learned-round=linear_round --learned-round-fast-update. Notably, the scales were not learned alongside the rounding offset, so that all the algorithms in the table use the same grid heuristic for a fair comparison.
Changes Made in this PR
GPTAQ class that can leverage gpfq_mode like GPFQ and Qronos
apply_gptaq function to Huggingface entry point in brevitas_examples
Testing Summary
TODO
Risk Highlight
Checklist
dev branch.