fix(stem): IndexError on models with more than 38 layers (Qwen3-32B/235B, DeepSeek-R1) #351
Open
hobostay wants to merge 1 commit into
generate_exact_k_schedule() read the per-layer keep ratio from a fixed-length
list:
    _DEFAULT_LAYER_KEEP_RATIOS = [1.0, 1.0] + [0.2] * 36  # length 38
    keep_ratio = _DEFAULT_LAYER_KEEP_RATIOS[layer_idx]
layer_idx is assigned to every transformer layer (stem/patch.py sets
self_attn.layer_idx = i for all layers). For models with more than 38 layers
this raised IndexError on the first prefill, before producing any output:
    IndexError: list index out of range
This affects flagship models AngelSlim explicitly supports, e.g. Qwen3-32B
(64 layers) and Qwen3-235B-A22B (94 layers).
Replace the length-38 list with a function of layer_idx. It reproduces the
exact previous values for layers 0..37 (1.0 for the first two layers, 0.2
thereafter) and extends the same rule to deeper layers.
Co-Authored-By: Claude <noreply@anthropic.com>
Collaborator
Thanks for fixing this, please pass the pre-commit checks
Problem
Running Stem sparse-attention prefill on a model with more than 38 transformer layers crashes immediately with IndexError: list index out of range.
This affects flagship models AngelSlim explicitly supports, e.g. Qwen3-32B (64 layers) and Qwen3-235B-A22B (94 layers), as well as DeepSeek-R1/V3 (61 layers).
Root cause
generate_exact_k_schedule() reads the per-layer keep ratio from the fixed-length list _DEFAULT_LAYER_KEEP_RATIOS (length 38). layer_idx is assigned to every transformer layer (stem/patch.py sets self.self_attn.layer_idx = i for all layers), so for any model with more than 38 layers the lookup runs past the end of the list.
Fix
Express the default keep ratio as a function of layer_idx instead of a length-38 list. It reproduces the exact previous values for layers 0..37 (1.0 for the first two layers, 0.2 thereafter) and extends the same rule to deeper layers.
Verification
python3 -m py_compile passes.
_DEFAULT_LAYER_KEEP_RATIOS had no other references (only the definition + this call site).
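The rule described under Fix can be sketched and checked against the old list as follows. This is an illustration under stated assumptions, not the code from the PR: the function name default_keep_ratio is hypothetical, and only the values (1.0 for the first two layers, 0.2 thereafter) come from the description above.

```python
def default_keep_ratio(layer_idx: int) -> float:
    """Hypothetical helper: keep everything in the first two layers,
    keep 20% in every deeper layer, with no upper bound on depth."""
    return 1.0 if layer_idx < 2 else 0.2


# Old fixed-length default, as quoted in the commit message.
old_ratios = [1.0, 1.0] + [0.2] * 36

# The function must reproduce the old values exactly for layers 0..37.
new_ratios = [default_keep_ratio(i) for i in range(len(old_ratios))]
print(new_ratios == old_ratios)  # True

# Layers past the old list length now get the same 0.2 rule.
print(default_keep_ratio(63), default_keep_ratio(93))  # 0.2 0.2
```

A check like this complements py_compile, which only confirms that the file parses and does not exercise the lookup for deep layers.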