fix(quant): migrate int8/fp8/awq/lepto_fp8 to new Catcher API (regression from #318) by hobostay · Pull Request #352 · Tencent/AngelSlim

hobostay · 2026-06-17T12:57:21Z

Problem

INT8, FP8, AWQ, and LeptoFP8 quantization all crash at the very start of calibration:

TypeError: Catcher.__init__() takes 2 positional arguments but 4 were given

(followed by AttributeError: ... has no attribute 'layer_kwargs' once the constructor call is bypassed.)

Four core PTQ algorithms are effectively unusable.

Root cause

PR #318 (86479db, "fix(gptq): fix Hessian computation, variable-length sequence support…") refactored the Catcher class to a new API and migrated gptq.py to it — but four sibling modules were left calling the removed API:

# int8.py, fp8.py, awq.py, lepto_fp8.py  (OLD API — no longer exists)
layers[0] = Catcher(layers[0], self.inps, cache)
layer_kwargs = layers[0].layer_kwargs

The current Catcher (angelslim/compressor/quant/modules/catcher.py) has signature:

def __init__(self, module, max_seq_length=None): ...

and exposes .captured_inputs / .captured_kwargs (per-sample lists) — there is no 3rd positional arg and no .layer_kwargs attribute. Only gptq.py was updated to the new API in #318:

layers[0] = Catcher(layers[0], max_seq_length=self.seq_length)
...
inps = layers[0].captured_inputs
layer_kwargs_list = layers[0].captured_kwargs
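The mismatch between the two call shapes can be illustrated with a small stand-in. This is only a sketch: `StubCatcher` and `first_layer` are hypothetical names, the class copies nothing from AngelSlim except the constructor signature and the two attribute names quoted above, and plain nested lists stand in for tensors so that it runs without any dependencies. The exact wording of the TypeError depends on the real signature, so the sketch only checks the exception type.

```python
class StubCatcher:
    """Hypothetical stand-in with the signature quoted above; not AngelSlim code."""

    def __init__(self, module, max_seq_length=None):
        self.module = module
        self.max_seq_length = max_seq_length
        self.captured_inputs = []  # one entry per calibration sample
        self.captured_kwargs = []  # one kwargs dict per calibration sample

    def __call__(self, hidden_states, **kwargs):
        # Record the call instead of running the wrapped layer.
        self.captured_inputs.append(hidden_states)
        self.captured_kwargs.append(kwargs)


def first_layer(hidden_states, **kwargs):
    """Placeholder for layers[0]."""
    return hidden_states


# Old call shape: three positional arguments, rejected by the constructor.
try:
    StubCatcher(first_layer, [], {"i": 0})
    old_call_error = None
except TypeError as exc:
    old_call_error = exc

# New call shape: keyword max_seq_length, then read the per-sample lists.
catcher = StubCatcher(first_layer, max_seq_length=8)
catcher([[[0.0] * 4] * 3], attention_mask=None)  # sample with seq=3
catcher([[[0.0] * 4] * 5], attention_mask=None)  # sample with seq=5

# The old attribute is gone, so reading it would raise AttributeError.
has_layer_kwargs = hasattr(catcher, "layer_kwargs")
```

With the stand-in, the old call fails in the constructor and the old attribute lookup would fail next, which is the same two-step failure described under Problem.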

Fix

Migrate the four callers the same way gptq.py already does: construct with max_seq_length=self.seq_length, then rebuild the fixed-shape self.inps tensor from captured_inputs and take layer_kwargs from captured_kwargs[0] (matching the previous single-shared-dict usage). The existing per-layer forward loops (self.inps[i:i+bs] / self.inps[j:j+1] with **layer_kwargs) are unchanged.

awq.py's hunyuan_vl override (layer_kwargs["position_embeddings"] = None) still works since it operates on the reconstructed dict.
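The reconstruction step described above can be sketched roughly as follows. This is not the code in this PR: `rebuild_inps` is a hypothetical helper, nested lists stand in for tensors of shape [1, seq, hidden], and the padding policy (zero rows on the right, truncation when a sample is longer than seq_length) is an assumption made for illustration.

```python
def rebuild_inps(captured_inputs, seq_length):
    """Stack per-sample [1, seq, hidden] inputs into one [N, seq_length, hidden] batch.

    Assumed policy: shorter samples are right-padded with zero rows,
    longer samples are truncated to seq_length.
    """
    hidden = len(captured_inputs[0][0][0])
    batch = []
    for sample in captured_inputs:
        # sample[0] drops the leading batch dimension of size 1.
        rows = [list(row) for row in sample[0][:seq_length]]
        rows += [[0.0] * hidden for _ in range(seq_length - len(rows))]
        batch.append(rows)
    return batch


# Two captured samples with hidden=2 and sequence lengths 1 and 3.
captured_inputs = [
    [[[1.0, 2.0]]],
    [[[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]],
]
captured_kwargs = [{"attention_mask": None}, {"attention_mask": None}]

inps = rebuild_inps(captured_inputs, seq_length=2)
layer_kwargs = captured_kwargs[0]  # single shared dict, as in the old code path
```

Here the first sample is padded to two rows and the second is cut to two rows, so `inps` has a fixed [2, 2, 2] shape that the existing slicing loops can index.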

Verification

python3 -m py_compile passes for all four files; no stale self.inps, cache / .layer_kwargs / cache["i"] references remain.
End-to-end simulation of capture → reconstruct → layer forward (using the real Catcher):
- per-sample capture produces captured_inputs ([1, seq, hidden]) and captured_kwargs;
- reconstruction rebuilds the [N, seq_length, hidden] tensor and handles variable seq_len via padding;
- layer_kwargs = captured_kwargs[0] is forward-safe (hidden_states / past_key_values are filtered by Catcher._FILTER_KWARGS), and layer(hidden_states=..., **layer_kwargs) succeeds for every sample.
The reachability of these four algorithms is confirmed: they are exported from angelslim/compressor/quant/modules/__init__.py and imported by ptq.py.
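Why the filtering matters for the forward-safety check can be shown with a minimal sketch. The names `FILTER_KWARGS`, `filter_kwargs`, and `layer` are hypothetical; the filter set only contains the two keys named above and is not copied from Catcher._FILTER_KWARGS.

```python
# Assumed contents, based only on the two names mentioned above.
FILTER_KWARGS = ("hidden_states", "past_key_values")


def filter_kwargs(kwargs):
    """Drop keys that would collide with, or go stale in, a later forward call."""
    return {k: v for k, v in kwargs.items() if k not in FILTER_KWARGS}


def layer(hidden_states, attention_mask=None, **unused):
    """Placeholder decoder layer that simply echoes its input."""
    return hidden_states


raw = {"hidden_states": "stale", "past_key_values": object(), "attention_mask": None}

# Unfiltered: hidden_states is passed twice, so Python rejects the call.
try:
    layer(hidden_states="x", **raw)
    unfiltered_error = None
except TypeError as exc:
    unfiltered_error = exc

# Filtered: the same call pattern as layer(hidden_states=..., **layer_kwargs) succeeds.
safe = filter_kwargs(raw)
result = layer(hidden_states="x", **safe)
```

Without the filter, a captured `hidden_states` key would collide with the explicit keyword argument in the per-layer loop.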

Note: I validated the capture/reconstruction logic in isolation (no GPU); a final on-hardware smoke run of each algorithm by a maintainer would be welcome.

PR Tencent#318 (86479db) refactored the Catcher class to a new API and migrated gptq.py to it, but four sibling algorithms were left calling the removed constructor/attribute:

# old API (gone)
layers[0] = Catcher(layers[0], self.inps, cache)
layer_kwargs = layers[0].layer_kwargs

The current Catcher.__init__ signature is (module, max_seq_length=None), and it exposes .captured_inputs / .captured_kwargs (per-sample lists), not .layer_kwargs. As a result INT8, FP8, AWQ and LeptoFP8 quantization all crashed at the start of calibration:

TypeError: __init__() takes 2 positional arguments but 4 were given
(then) AttributeError: ... has no attribute 'layer_kwargs'

Migrate the four callers to the new API the same way gptq.py already does: construct with max_seq_length=self.seq_length, then rebuild the fixed-shape self.inps tensor from captured_inputs and take layer_kwargs from captured_kwargs[0] (matching the previous single-dict usage).

Verified the reconstruction handles variable seq_len and that layer_kwargs stays forward-safe (hidden_states / past_key_values are filtered by Catcher).

Co-Authored-By: Claude <noreply@anthropic.com>

hobostay wants to merge 1 commit into Tencent:main from hobostay:fix/quant-catcher-api-regression
