
fix(quant): migrate int8/fp8/awq/lepto_fp8 to new Catcher API (regression from #318)#352

Open
hobostay wants to merge 1 commit into Tencent:main from hobostay:fix/quant-catcher-api-regression


@hobostay

Contributor

Problem

INT8, FP8, AWQ, and LeptoFP8 quantization all crash at the very start of calibration:

TypeError: Catcher.__init__() takes 2 positional arguments but 4 were given

(followed by AttributeError: ... has no attribute 'layer_kwargs' once the constructor call is bypassed.)

Four core PTQ algorithms are effectively unusable.

Root cause

PR #318 (86479db, "fix(gptq): fix Hessian computation, variable-length sequence support…") refactored the Catcher class to a new API and migrated gptq.py to it, but four sibling modules were left calling the removed API:

# int8.py, fp8.py, awq.py, lepto_fp8.py  (OLD API — no longer exists)
layers[0] = Catcher(layers[0], self.inps, cache)
layer_kwargs = layers[0].layer_kwargs
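A minimal reproduction of the crash, using a hypothetical stand-in class that copies only the constructor signature this PR quotes for the current Catcher, (module, max_seq_length=None). It is not the real angelslim class. Replaying the old three-argument call raises TypeError, and the removed .layer_kwargs attribute raises AttributeError even on a correctly constructed instance. The exact error wording depends on the real signature, so the sketch just prints it.

```python
class Catcher:
    """Hypothetical stand-in: mirrors only the quoted signature, not the real class."""

    def __init__(self, module, max_seq_length=None):
        self.module = module
        self.max_seq_length = max_seq_length
        self.captured_inputs = []  # per-sample inputs (new API)
        self.captured_kwargs = []  # per-sample kwargs (new API)


layer0 = object()           # placeholder for layers[0]
inps, cache = [], {"i": 0}  # placeholders for the old self.inps / cache

# Old call pattern left in int8/fp8/awq/lepto_fp8 before this fix.
try:
    Catcher(layer0, inps, cache)
except TypeError as exc:
    print("constructor:", exc)

# New call pattern (as in gptq.py); the old attribute simply does not exist.
catcher = Catcher(layer0, max_seq_length=2048)
try:
    catcher.layer_kwargs
except AttributeError as exc:
    print("attribute:", exc)
```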

The current Catcher (angelslim/compressor/quant/modules/catcher.py) has signature:

def __init__(self, module, max_seq_length=None): ...

and exposes .captured_inputs / .captured_kwargs (per-sample lists). There is no third positional parameter and no .layer_kwargs attribute. Only gptq.py was updated to the new API in #318:

layers[0] = Catcher(layers[0], max_seq_length=self.seq_length)
...
inps = layers[0].captured_inputs
layer_kwargs_list = layers[0].captured_kwargs
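To make the capture flow concrete, here is a dependency-free sketch of a Catcher-style wrapper plus a gptq.py-style driver. Everything here is hypothetical: SketchCatcher, StopForward, toy_model_forward and capture_layer0_inputs are illustrative names, plain nested lists stand in for tensors, and the early-exit exception is an assumed mechanism. The filtered keys (hidden_states, past_key_values) are the ones this PR says Catcher._FILTER_KWARGS removes.

```python
class StopForward(Exception):
    """Raised by the wrapper so the model stops once layer 0's inputs are seen."""


class SketchCatcher:
    """Hypothetical stand-in for the new Catcher API (not the real class)."""

    _FILTER_KWARGS = ("hidden_states", "past_key_values")

    def __init__(self, module, max_seq_length=None):
        self.module = module
        # Stored only; this sketch does not model what the real class does with it.
        self.max_seq_length = max_seq_length
        self.captured_inputs = []  # one hidden-state input per sample
        self.captured_kwargs = []  # one filtered kwargs dict per sample

    def __call__(self, *args, **kwargs):
        hidden = args[0] if args else kwargs["hidden_states"]
        self.captured_inputs.append(hidden)
        self.captured_kwargs.append(
            {k: v for k, v in kwargs.items() if k not in self._FILTER_KWARGS}
        )
        raise StopForward


def toy_model_forward(layers, hidden, attention_mask):
    """Hypothetical decoder: each layer gets hidden states plus kwargs."""
    for layer in layers:
        hidden = layer(hidden, attention_mask=attention_mask, past_key_values=None)
    return hidden


def capture_layer0_inputs(layers, samples, seq_length):
    """gptq.py-style flow: wrap layers[0], feed each sample, then unwrap."""
    original = layers[0]
    layers[0] = SketchCatcher(original, max_seq_length=seq_length)
    try:
        for hidden, mask in samples:
            try:
                toy_model_forward(layers, hidden, mask)
            except StopForward:
                pass  # expected: capture for this sample is done
        inps = layers[0].captured_inputs
        layer_kwargs_list = layers[0].captured_kwargs
    finally:
        layers[0] = original  # always restore the real layer
    return inps, layer_kwargs_list


def identity(h, **kwargs):
    return h


layers = [identity, identity]
samples = [([[[0.1, 0.2]]], "mask-a"), ([[[0.3, 0.4], [0.5, 0.6]]], "mask-b")]
inps, kwargs_list = capture_layer0_inputs(layers, samples, seq_length=2)
print(len(inps), kwargs_list[0])  # 2 {'attention_mask': 'mask-a'}
```

Restoring layers[0] in a finally block keeps the model usable even if a sample fails for an unrelated reason.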

Fix

Migrate the four callers the same way gptq.py already does: construct with max_seq_length=self.seq_length, then rebuild the fixed-shape self.inps tensor from captured_inputs and take layer_kwargs from captured_kwargs[0] (matching the previous single-shared-dict usage). The existing per-layer forward loops (self.inps[i:i+bs] / self.inps[j:j+1] with **layer_kwargs) are unchanged.
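A sketch of the reconstruction step and the unchanged batched loop, again with nested lists standing in for tensors so it runs without PyTorch. rebuild_inps and run_layer are hypothetical helpers. Right-padding with zeros (and truncating over-long samples) is an assumption; the PR only states that variable seq_len is handled via padding. Taking captured_kwargs[0] reuses one sample's kwargs for every batch, which is the same single-shared-dict behavior the callers had before.

```python
def rebuild_inps(captured_inputs, seq_length, hidden_size, pad_value=0.0):
    """Turn per-sample [1, seq, hidden] inputs into one [N, seq_length, hidden] batch.

    Right-padding with pad_value and truncating long samples are assumptions.
    """
    inps = []
    for sample in captured_inputs:
        rows = [list(row) for row in sample[0][:seq_length]]  # drop batch dim of 1
        rows += [[pad_value] * hidden_size for _ in range(seq_length - len(rows))]
        inps.append(rows)
    return inps


def run_layer(layer, inps, layer_kwargs, bs):
    """Same shape as the unchanged loop: layer(inps[i:i+bs], **layer_kwargs)."""
    outs = []
    for i in range(0, len(inps), bs):
        outs.extend(layer(inps[i:i + bs], **layer_kwargs))
    return outs


def identity_layer(batch, **kwargs):
    return batch


captured_inputs = [[[[1.0, 2.0]]], [[[3.0, 4.0], [5.0, 6.0]]]]  # seq lens 1 and 2
captured_kwargs = [{"attention_mask": "mask-0"}, {"attention_mask": "mask-1"}]

inps = rebuild_inps(captured_inputs, seq_length=2, hidden_size=2)
layer_kwargs = captured_kwargs[0]  # one shared dict, as before the refactor
outs = run_layer(identity_layer, inps, layer_kwargs, bs=1)
print(inps)  # [[[1.0, 2.0], [0.0, 0.0]], [[3.0, 4.0], [5.0, 6.0]]]
```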

awq.py's hunyuan_vl override (layer_kwargs["position_embeddings"] = None) still works, because it mutates the layer_kwargs dict taken from captured_kwargs[0], and that same dict is passed to every layer forward.
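A small illustration of why the override carries through: layer_kwargs is the captured dict itself, not a copy, so setting an entry once affects every subsequent forward call and also the captured entry. The is_hunyuan_vl flag and the placeholder values are hypothetical; awq.py's actual model-type check is not shown in this PR.

```python
# Placeholder values; real entries would be tensors / rotary embeddings.
captured_kwargs = [{"attention_mask": "mask-0", "position_embeddings": ("cos", "sin")}]
layer_kwargs = captured_kwargs[0]  # the same dict object, not a copy

is_hunyuan_vl = True  # hypothetical stand-in for awq.py's model-type check
if is_hunyuan_vl:
    layer_kwargs["position_embeddings"] = None

seen = []


def recording_layer(batch, **kwargs):
    """Records what each forward call receives for position_embeddings."""
    seen.append(kwargs["position_embeddings"])
    return batch


for j in range(3):  # like the per-sample loop over self.inps[j:j+1]
    recording_layer([j], **layer_kwargs)

print(seen)                                        # [None, None, None]
print(captured_kwargs[0]["position_embeddings"])  # None (aliased)
```

If the captured entry must stay untouched, copying first with dict(captured_kwargs[0]) isolates the override.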

Verification

  • python3 -m py_compile passes for all four files, and no stale references to the old Catcher(..., self.inps, cache) call, .layer_kwargs, or cache["i"] remain.
  • End-to-end simulation of capture → reconstruct → layer forward (using the real Catcher):
    • per-sample capture produces captured_inputs ([1, seq, hidden]) and captured_kwargs;
    • reconstruction rebuilds the [N, seq_length, hidden] tensor and handles variable seq_len via padding;
    • layer_kwargs = captured_kwargs[0] is forward-safe (hidden_states / past_key_values are filtered by Catcher._FILTER_KWARGS), and layer(hidden_states=..., **layer_kwargs) succeeds for every sample.
  • All four algorithms are reachable from the PTQ pipeline: they are exported from angelslim/compressor/quant/modules/__init__.py and imported by ptq.py.
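The compile-and-grep check in the first verification bullet could be automated with a standard-library sketch like the one below. check_file is a hypothetical helper, not part of the repository, and the regex patterns are assumptions about what the stale code looks like.

```python
import py_compile
import re
from pathlib import Path

# Patterns for the removed Catcher API (assumed to be enough to catch leftovers).
STALE_PATTERNS = [
    re.compile(r"Catcher\([^)]*self\.inps\s*,\s*cache"),  # old 3-arg constructor
    re.compile(r"\.layer_kwargs\b"),                      # removed attribute
    re.compile(r"cache\[\s*[\"']i[\"']\s*\]"),            # old sample counter
]


def check_file(path):
    """Return a list of problems: compile errors plus stale API references."""
    problems = []
    try:
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError as exc:
        problems.append(f"{path}: does not compile: {exc.msg}")
    text = Path(path).read_text(encoding="utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in STALE_PATTERNS:
            if pattern.search(line):
                problems.append(f"{path}:{lineno}: stale reference: {line.strip()}")
    return problems
```

Calling check_file on each of int8.py, fp8.py, awq.py and lepto_fp8.py and expecting an empty list for all four mirrors the manual check.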

Note: I validated the capture/reconstruction logic in isolation (no GPU); a final on-hardware smoke run of each algorithm by a maintainer would be welcome.

PR Tencent#318 (86479db) refactored the Catcher class to a new API and migrated
gptq.py to it, but four sibling algorithms were left calling the removed
constructor/attribute:

    # old API (gone)
    layers[0] = Catcher(layers[0], self.inps, cache)
    layer_kwargs = layers[0].layer_kwargs

The current Catcher.__init__ signature is (module, max_seq_length=None),
and it exposes .captured_inputs / .captured_kwargs (per-sample lists), not
.layer_kwargs. As a result INT8, FP8, AWQ and LeptoFP8 quantization all
crashed at the start of calibration:

    TypeError: __init__() takes 2 positional arguments but 4 were given
    (then) AttributeError: ... has no attribute 'layer_kwargs'

Migrate the four callers to the new API the same way gptq.py already does:
construct with max_seq_length=self.seq_length, then rebuild the fixed-shape
self.inps tensor from captured_inputs and take layer_kwargs from
captured_kwargs[0] (matching the previous single-dict usage). Verified the
reconstruction handles variable seq_len and that layer_kwargs stays
forward-safe (hidden_states / past_key_values are filtered by Catcher).

Co-Authored-By: Claude <noreply@anthropic.com>