kernel-bench: seed sm90/sm100 ground truth (bootstrap for sglang#28138) by BBuf · Pull Request #21 · sgl-project/ci-data

BBuf · 2026-06-15T02:37:50Z

What

Seeds the kernel-benchmark regression ground truth under kernel-bench/ so that the per-PR gate in sgl-project/sglang#28138 ([CI] Kernel benchmark regression gate) has a baseline to pull and compare against. This is a one-time bootstrap: the nightly-kernel-bench-gt.yml workflow will take over once #28138 lands on main (the workflow cannot be triggered via workflow_dispatch until it exists on the default branch).

kernel-bench/sm90.json — generated on NVIDIA H100 80GB HBM3 (sm90 / cc 9.0)
kernel-bench/sm100.json — generated on NVIDIA B200 (sm100 / cc 10.0)
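
As a rough illustration of the file layout above, a consumer could derive the ground-truth path from the GPU's compute capability. This is only a sketch: the helper name gt_path_for_capability and its root parameter are hypothetical and are not taken from #28138.

```python
def gt_path_for_capability(major: int, minor: int, root: str = "kernel-bench") -> str:
    """Map a CUDA compute capability to a ground-truth file path.

    Hypothetical helper: only the directory name and the smXY.json naming
    come from this PR; how the real gate selects the file is not shown here.
    """
    return f"{root}/sm{major}{minor}.json"


print(gt_path_for_capability(9, 0))   # H100, cc 9.0
print(gt_path_for_capability(10, 0))  # B200, cc 10.0
```

With the two capabilities listed above, this yields kernel-bench/sm90.json and kernel-bench/sm100.json.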

Provenance

Generated from sglang branch feat/kernel-bench-regression-ci @ cf8dbf44d9 (PR #28138, freshly rebased on main).
Command: python3 -m kernel_bench_regression generate --repeat 5 --commit cf8dbf44d9.
Each SKU was measured on its native hardware (sm90 on a real H100, sm100 on a real B200), so the numbers are faithful baselines for the gate's H100/B200 runners, with no cross-class extrapolation. Both runs used torch 2.11.0+cu130. The sm100 run reused the container's prebuilt sgl_kernel; the sm90 run built sgl_kernel from the PR branch, because the container's preinstalled 0.4.2.post1 predates the new dsv4_fused_q_norm_rope op.

Coverage / known gaps

sm90: 13 cases, 34/37 real measurements. cutlass_mla_decode correctly auto-skipped (sm100-only).
sm100: 14 cases, 30/40 real measurements. cutlass_mla_decode ran.
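
Coverage figures of this kind (cases, and non-null measurements out of the total) can be recomputed from a ground-truth file. The sketch below assumes a layout of {case: {config: latency_us or null}}; that schema, the coverage helper, and the sample values are assumptions for illustration, not the actual format written by the harness.

```python
import json


def coverage(ground_truth: dict) -> tuple:
    """Return (cases, measured, total) for an assumed
    {case: {config: latency_us or None}} mapping."""
    cases = len(ground_truth)
    total = sum(len(configs) for configs in ground_truth.values())
    measured = sum(
        1
        for configs in ground_truth.values()
        for value in configs.values()
        if value is not None  # null entries are failed/skipped measurements
    )
    return cases, measured, total


# Made-up sample data in the assumed schema (not real measurements).
sample = json.loads(
    '{"case_a": {"cfg0": 4.2, "cfg1": 5.0}, "fp8_gemm": {"cfg0": null}}'
)
print(coverage(sample))
```

For a real file, the same function would be applied to the result of json.load on kernel-bench/sm90.json or kernel-bench/sm100.json, provided the layout matches the assumption.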

Two cases recorded null (the harness records null and continues; the gate skips null baseline entries). These are pre-existing issues in #28138's bench wiring / kernels, not in the GT pipeline, and are called out on the PR for follow-up:

fp8_gemm: null on both SKUs. bench_fp8_gemm.py passes a per-token scale of shape (M,)/(N,) into the static per-tensor-quant path, which asserts a scalar {1} scale (per_tensor_quant_fp8.cuh:113) → Tensor match failed. Independent of the kernel build.
per_token_group_quant_8bit: null on sm100 only (works on sm90, ~3–27 us). On B200 the bench utility's profiler-table check trips on a large FillFunctor row and "Command Buffer Full" rows.

These null entries are harmless for the gate (skipped), and can be backfilled by the nightly once the two bench issues are fixed in #28138.
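
The skip-on-null behavior described above can be sketched as follows. This is a minimal illustration, not the gate from #28138: the function name find_regressions, the 10% tolerance, and the {case: {config: latency_us or None}} layout are all assumptions.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> list:
    """Compare current latencies against baseline latencies.

    Assumed layout: {case: {config: latency_us or None}}.
    Entries whose baseline is None are skipped, so a null ground-truth
    value can never fail the comparison.
    """
    regressions = []
    for case, configs in baseline.items():
        for config, base_us in configs.items():
            if base_us is None:
                continue  # null baseline: nothing to compare against
            cur_us = current.get(case, {}).get(config)
            if cur_us is None:
                continue  # no current measurement for this entry
            if cur_us > base_us * (1 + tolerance):
                regressions.append((case, config, base_us, cur_us))
    return regressions


# Made-up numbers for illustration only.
baseline = {"case_a": {"cfg0": 10.0, "cfg1": 20.0}, "fp8_gemm": {"cfg0": None}}
current = {"case_a": {"cfg0": 10.5, "cfg1": 25.0}, "fp8_gemm": {"cfg0": 3.0}}
print(find_regressions(baseline, current))
```

Here only case_a/cfg1 is reported (25.0 us against a 20.0 us baseline), while the fp8_gemm entry is ignored because its baseline is null. Once the nightly backfills a real value, the same entry would start being checked without any change to the comparison logic.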

Companion to sgl-project/sglang#28138.

BBuf added 2 commits June 15, 2026 10:37

kernel-bench: seed sm90 (H100) ground truth for sglang#28138

fbb8e00

kernel-bench: seed sm100 (B200) ground truth for sglang#28138

396ccab

BBuf wants to merge 2 commits into sgl-project:main from BBuf:kernel-bench-gt-bootstrap
