feat: add SpeechNet (SilentWear) training test #31
Draft
runwangdl wants to merge 8 commits into
Add SpeechNet EMG silent speech recognition model (14 channels, 700 time samples, 9 classes, ~15K params) to the training test suite.

Changes:
- Add SpeechNet training ONNX artifacts (network.onnx, inputs.npz, outputs.npz, optimizer network) exported from Onnx4Deeploy with static reshape (no dynamic Shape/Flatten ops).
- Fix ConvLayer.computeShapes bias shape: wrap scalar int in tuple to prevent graphsurgeon export crash on Conv layers with bias.
- Register SpeechNet in L2 singlebuffer training test config (l1=128000, l2=2000000).

Untiled test verified: 4/4 loss diff=0.000000, 285M train cycles.
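The bias-shape fix in this commit can be illustrated with a minimal sketch. The helper names and the example shapes are hypothetical and are not Deeploy's actual code; the sketch only shows why a bare int fails where a shape (a sequence of dimensions) is expected, and how wrapping it in a 1-tuple resolves that.

```python
# Hypothetical stand-ins for the bias-shape computation; not Deeploy's code.

def bias_shape_before(input_shapes):
    # Returns a bare int: the leading dimension of the weight shape.
    return input_shapes[1][0]


def bias_shape_after(input_shapes):
    # Wraps the same value in a 1-tuple so it is a valid 1-D shape.
    return (input_shapes[1][0],)


def rank(shape):
    # A consumer that treats a shape as a sequence will call something like len().
    return len(shape)


# Assumed example shapes [activation, weight]; the values are illustrative only.
example_shapes = [(1, 14, 1, 700), (8, 14, 1, 3)]

fixed = bias_shape_after(example_shapes)
print(fixed, rank(fixed))

try:
    rank(bias_shape_before(example_shapes))
except TypeError as err:
    print("bare int is not a shape:", err)
```

Any exporter that iterates over or takes the length of a tensor shape would hit the same `TypeError` on the bare int, which is consistent with the export crash described above.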
Freeze Block0 (first conv layer with large 14×701 activations) to avoid tiling issues with its backward pass. Train Block1-4 + FC (18 trainable params, 4 ConvGrad + 4 BatchNormGrad).

Tiled test verified: 4/4 loss diff < 0.001, 96M train cycles.

Block0 backward tiling hang is tracked separately: the 314 KB activation tensor requires heavy L1 tiling that triggers a simulation hang in the ConvGrad/AveragePoolGrad backward path.
Now that ConvGradX uses the naive kernel, full SpeechNet training (all 5 blocks + FC, 22 trainable params) passes tiled simulation.
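The trainable-parameter counts quoted in these commits (18 with Block0 frozen, 22 with all five blocks trained) are consistent with four parameter tensors per block plus two for the FC layer. The sketch below checks that arithmetic. The per-block layout (conv weight, conv bias, BatchNorm scale, BatchNorm bias) and all tensor names are assumptions made for illustration, not taken from the model definition.

```python
# Assumed layout: each block has conv weight/bias and BatchNorm scale/bias,
# and the FC layer has weight/bias. All names are hypothetical.
PER_BLOCK = ["conv.weight", "conv.bias", "bn.weight", "bn.bias"]


def parameter_names(num_blocks=5):
    names = [f"block{b}.{p}" for b in range(num_blocks) for p in PER_BLOCK]
    names += ["fc.weight", "fc.bias"]
    return names


def trainable(names, frozen_prefixes=()):
    # Keep every parameter whose name does not start with a frozen prefix.
    return [n for n in names if not n.startswith(tuple(frozen_prefixes))]


all_params = parameter_names()
print(len(trainable(all_params)))                                # 22
print(len(trainable(all_params, frozen_prefixes=["block0."])))   # 18
```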
Step-by-step tutorial covering PyTorch model design, Onnx4Deeploy export, untiled/tiled Deeploy deployment, tiling pipeline overview, common pitfalls, and GVSoC trace debugging.
…tile

ConvGradW accumulates the weight gradient across spatial (H/W) tiles via the kernel's mm_add. The dW buffer must be zeroed exactly once per backward pass, before the first spatial tile.

The memset guard used `*tileIdxPtr == 0`, but tileIdxPtr is a per-EXECUTION index (it selects the numTiles prefix-sum range and is incremented after the whole tile loop), so it stays constant across an execution's spatial tiles. The guard was therefore true for every tile of the first execution, re-zeroing grad_weight on each tile and wiping the cross-tile accumulation -- only the last tile's partial dW survived (~1/numTiles of the true gradient).

Add a dedicated per-execution `dwZeroFlagPtr` (reset to 0 on every backward pass, set to 1 after the first tile zeroes grad_weight) and guard the H/W-tiled memset on it instead. grad_weight is now zeroed once and accumulated across all spatial tiles.

Effect: SpeechNet tiled L2 training, which spatially tiles the wide block-0/1 convolutions, went from per-step loss drift up to 2.7e-3 on real EMG data (exceeding the 1e-3 tolerance) to bit-exact (diff = 0.000000) against the PyTorch/ORT reference. ResNet8 and CCT training regressions still pass.
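The effect of the two guards can be reproduced with a small pure-Python model of the tile loop. This is a sketch under simplifying assumptions (a 1-D valid cross-correlation, a single execution, and a hypothetical `tiled_dw` helper); it is not the generated C code, and only the variable roles mirror the commit message.

```python
def reference_dw(x, dy, k):
    # dW[j] = sum_i dy[i] * x[i + j] for a 1-D valid cross-correlation.
    return [sum(dy[i] * x[i + j] for i in range(len(dy))) for j in range(k)]


def tiled_dw(x, dy, k, tile, guard):
    dw = [123.0] * k      # stale contents left over from a previous pass
    tile_idx = 0          # per-execution index: constant inside the tile loop
    dw_zero_flag = 0      # reset at the start of every backward pass
    for start in range(0, len(dy), tile):
        if guard == "tile_idx":
            zero_now = tile_idx == 0        # old guard: true for every tile
        else:
            zero_now = dw_zero_flag == 0    # new guard: true only once
        if zero_now:
            dw = [0.0] * k
            dw_zero_flag = 1
        for j in range(k):
            for i in range(start, min(start + tile, len(dy))):
                dw[j] += dy[i] * x[i + j]   # accumulate this tile's partial dW
    tile_idx += 1         # incremented only after the whole tile loop
    return dw


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dy = [1.0] * 6
expected = reference_dw(x, dy, k=3)
old = tiled_dw(x, dy, k=3, tile=2, guard="tile_idx")
new = tiled_dw(x, dy, k=3, tile=2, guard="dw_zero_flag")
print(expected, old, new)
```

With three spatial tiles, the old guard leaves only the last tile's contribution in `old`, while `new` matches the untiled reference, which is the behaviour the commit message describes.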
Replace the random-normal input tensors in the SpeechNet training test fixture with 4 real surface-EMG windows (14 channels x 700 samples, 1.4 s @ 500 Hz) drawn from the public PulpBio/SilentWear dataset (subject S01, vocalized, session 1), labels [7, 3, 2, 6]. Reference losses are recomputed by ORT for these inputs. With the ConvGradW spatial-tile dW-accumulation fix, the tiled L2 single-buffer training loss now matches the reference bit-exactly (diff = 0.000000 across all 4 steps) on real EMG data, where it previously drifted up to 2.7e-3 and exceeded the 1e-3 tolerance.
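The pass criterion referred to here (per-step loss difference within the 1e-3 tolerance) can be sketched as follows. The function names and the loss values are placeholders chosen for illustration; they are not the recorded SpeechNet losses, and this is not the test suite's actual checker.

```python
TOLERANCE = 1e-3  # tolerance quoted in the commit message


def max_loss_diff(reference, measured):
    # Largest absolute per-step difference between two loss sequences.
    if len(reference) != len(measured):
        raise ValueError("loss lists must have the same number of steps")
    return max(abs(r - m) for r, m in zip(reference, measured))


def passes(reference, measured, tol=TOLERANCE):
    return max_loss_diff(reference, measured) < tol


# Placeholder losses for 4 training steps (made-up numbers, not real results).
ref = [2.20, 2.10, 2.00, 1.90]
drifted = [2.20, 2.10, 2.0027, 1.90]   # one step off by about 2.7e-3

print(passes(ref, ref))       # identical losses pass
print(passes(ref, drifted))   # drift above the tolerance fails

# Window length from the commit message: 700 samples at 500 Hz.
window_seconds = 700 / 500
print(window_seconds)
```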
Summary
- `ConvLayer.computeShapes` bias shape bug: `inputShapes[1][0]` → `(inputShapes[1][0],)`, which prevents a graphsurgeon export crash on Conv layers with bias
- … (`l1=128000`, `l2=2000000`)

Test results
Untiled (verified):
Test plan
🤖 Generated with Claude Code