fix(quant): migrate int8/fp8/awq/lepto_fp8 to new Catcher API (regression from #318) #352
Open
hobostay wants to merge 1 commit into
PR Tencent#318 (86479db) refactored the Catcher class to a new API and migrated gptq.py to it, but four sibling algorithms were left calling the removed constructor/attribute:

    # old API (gone)
    layers[0] = Catcher(layers[0], self.inps, cache)
    layer_kwargs = layers[0].layer_kwargs

The current Catcher.__init__ signature is (module, max_seq_length=None), and it exposes .captured_inputs / .captured_kwargs (per-sample lists), not .layer_kwargs. As a result INT8, FP8, AWQ and LeptoFP8 quantization all crashed at the start of calibration:

    TypeError: __init__() takes 2 positional arguments but 4 were given
    (then) AttributeError: ... has no attribute 'layer_kwargs'

Migrate the four callers to the new API the same way gptq.py already does: construct with max_seq_length=self.seq_length, then rebuild the fixed-shape self.inps tensor from captured_inputs and take layer_kwargs from captured_kwargs[0] (matching the previous single-dict usage).

Verified the reconstruction handles variable seq_len and that layer_kwargs stays forward-safe (hidden_states / past_key_values are filtered by Catcher).

Co-Authored-By: Claude <noreply@anthropic.com>
Problem

INT8, FP8, AWQ, and LeptoFP8 quantization all crash at the very start of calibration with `TypeError: __init__() takes 2 positional arguments but 4 were given` (which would be followed by `AttributeError: ... has no attribute 'layer_kwargs'` if the constructor were bypassed). Four core PTQ algorithms are effectively unusable.

Root cause

PR #318 (86479db, "fix(gptq): fix Hessian computation, variable-length sequence support…") refactored the `Catcher` class to a new API and migrated `gptq.py` to it, but four sibling modules were left calling the removed API: `layers[0] = Catcher(layers[0], self.inps, cache)` followed by `layer_kwargs = layers[0].layer_kwargs`.

The current `Catcher` (`angelslim/compressor/quant/modules/catcher.py`) has the `__init__` signature `(module, max_seq_length=None)` and exposes `.captured_inputs` / `.captured_kwargs` (per-sample lists); there is no 3rd positional arg and no `.layer_kwargs` attribute. Only `gptq.py` was updated to the new API in #318.

Fix

Migrate the four callers the same way `gptq.py` already does: construct with `max_seq_length=self.seq_length`, then rebuild the fixed-shape `self.inps` tensor from `captured_inputs` and take `layer_kwargs` from `captured_kwargs[0]` (matching the previous single-shared-dict usage). The existing per-layer forward loops (`self.inps[i:i+bs]` / `self.inps[j:j+1]` with `**layer_kwargs`) are unchanged. The `hunyuan_vl` override in `awq.py` (`layer_kwargs["position_embeddings"] = None`) still works since it operates on the reconstructed dict.

Verification

- `python3 -m py_compile` passes for all four files; no stale `self.inps, cache` / `.layer_kwargs` / `cache["i"]` references remain.
- […] `Catcher`):
  - […] `captured_inputs` (`[1, seq, hidden]`) and `captured_kwargs`;
  - […] `[N, seq_length, hidden]` tensor and handles variable `seq_len` via padding;
  - `layer_kwargs = captured_kwargs[0]` is forward-safe (`hidden_states` / `past_key_values` are filtered by `Catcher._FILTER_KWARGS`), and `layer(hidden_states=..., **layer_kwargs)` succeeds for every sample.
- […] `angelslim/compressor/quant/modules/__init__.py` and imported by `ptq.py`.

Note: I validated the capture/reconstruction logic in isolation (no GPU); a final on-hardware smoke run of each algorithm by a maintainer would be welcome.
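
For reviewers who want to see the shape handling at a glance, the rebuild step described under Fix can be sketched roughly as follows. This is not the code in the diff and not the AngelSlim `Catcher`: `rebuild_inps`, `shared_layer_kwargs`, and `FILTER_KWARGS` are hypothetical names, nested lists stand in for tensors so the sketch has no dependencies, and right-padding with zeros plus truncation of over-long samples are assumptions (the description only states that variable `seq_len` is handled via padding).

```python
# Hypothetical illustration only; names and padding policy are assumptions.
FILTER_KWARGS = {"hidden_states", "past_key_values"}  # keys the PR says Catcher filters


def rebuild_inps(captured_inputs, seq_length, pad_value=0.0):
    """Turn per-sample [1, seq, hidden] captures into one [N, seq_length, hidden] batch.

    Shorter samples are right-padded with pad_value rows (assumed policy);
    longer samples are truncated (also an assumption).
    """
    batch = []
    for sample in captured_inputs:
        rows = sample[0]                # drop the leading batch dimension of size 1
        hidden = len(rows[0])           # assumes every sample has at least one token
        rows = rows[:seq_length]        # truncate over-long samples
        padding = [[pad_value] * hidden for _ in range(seq_length - len(rows))]
        batch.append([list(r) for r in rows] + padding)
    return batch


def shared_layer_kwargs(captured_kwargs):
    """Reuse the first sample's kwargs for every forward call, as the old single dict did.

    The filter is defensive here; per the PR, Catcher already drops these keys.
    """
    return {k: v for k, v in captured_kwargs[0].items() if k not in FILTER_KWARGS}


# Two captured samples with different sequence lengths (hidden size 2).
captured_inputs = [
    [[[1.0, 2.0], [3.0, 4.0]]],  # seq = 2
    [[[5.0, 6.0]]],              # seq = 1
]
captured_kwargs = [
    {"attention_mask": None, "hidden_states": "would collide with the positional input"},
    {"attention_mask": None},
]

inps = rebuild_inps(captured_inputs, seq_length=3)
layer_kwargs = shared_layer_kwargs(captured_kwargs)
```

With a batch of this shape, slices such as `inps[i:i+bs]` keep working in the per-layer loops, and dropping `hidden_states` from the shared dict is what avoids a duplicate-argument error when the layer is called with `hidden_states=...` plus `**layer_kwargs`.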