FP8 Megablox for batch split by BirdsOfAFthr · Pull Request #3770 · AI-Hypercomputer/maxtext

BirdsOfAFthr · 2026-04-29T05:00:19Z

Description

(1) This update enables FP8 Megablox quantization support for DeepSeek batch split configurations.

When quantization is active, the following changes apply:

Kernel Quantization: gmm kernels allow FP8 recipes (defined via the MaxText command line) in both forward and backward passes.
gmm forward: the weight is quantized manually to bypass the explicit sharding error; the activation is quantized using qwix.
gmm backward: the gradients are quantized using qwix.
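
For readers unfamiliar with what quantizing an operand by hand involves, here is a minimal sketch of per-tensor absmax scaling into `float8_e4m3fn`, followed by a matmul that rescales the result. It is not the qwix API and not necessarily the recipe this PR uses: `quantize_fp8` and `fp8_matmul` are hypothetical helpers, the per-tensor scaling granularity is an assumption, and a plain dense `jnp.dot` stands in for the gmm kernel.

```python
import jax
import jax.numpy as jnp

# Largest finite value representable in float8_e4m3fn.
E4M3_MAX = float(jnp.finfo(jnp.float8_e4m3fn).max)


def quantize_fp8(x):
  """Hypothetical helper: per-tensor absmax scaling into float8_e4m3fn."""
  amax = jnp.max(jnp.abs(x)).astype(jnp.float32)
  # Guard against an all-zero tensor so the scale is never zero.
  scale = jnp.maximum(amax, 1e-12) / E4M3_MAX
  # Clip so rounding in the division can never push a value out of range.
  scaled = jnp.clip(x.astype(jnp.float32) / scale, -E4M3_MAX, E4M3_MAX)
  return scaled.astype(jnp.float8_e4m3fn), scale


def fp8_matmul(lhs, rhs):
  """Hypothetical helper: quantize both operands, multiply, rescale."""
  lhs_q, lhs_scale = quantize_fp8(lhs)
  rhs_q, rhs_scale = quantize_fp8(rhs)
  # Upcast for the product so the sketch runs on any backend; a real
  # kernel would feed the FP8 operands to the hardware directly.
  out = jnp.dot(lhs_q.astype(jnp.float32), rhs_q.astype(jnp.float32))
  return out * (lhs_scale * rhs_scale)


key_lhs, key_rhs = jax.random.split(jax.random.PRNGKey(0))
lhs = jax.random.normal(key_lhs, (32, 64), dtype=jnp.float32)
rhs = jax.random.normal(key_rhs, (64, 16), dtype=jnp.float32)

exact = jnp.dot(lhs, rhs)
approx = fp8_matmul(lhs, rhs)
rel_err = jnp.linalg.norm(approx - exact) / jnp.linalg.norm(exact)
print(f"relative error: {float(rel_err):.4f}")
```

The same pattern would apply to the backward pass by quantizing the incoming gradient in place of the activation; which FP8 format and scaling granularity each pass should use depends on the recipe selected on the MaxText command line.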

(2) This change also enables merging gating gmm kernels.

In the previous SwiGLU/GLU implementation, the gate projection and the up projection were computed with two sequential gmm_fn calls. Concatenating the two weight matrices and processing them in a single call doubles the contiguous hidden dimension that the kernel sees. This matters most for FP8 runs that use Expert Parallelism (EP) and shard along the contracting dimension. Because this sharding strategy inherently shrinks the local MLP hidden dimension on each device, the matrix multiplications can become small and bottlenecked by memory bandwidth. Merging $W_0$ and $W_1$ gives a 2X increase in that local dimension, which restores arithmetic intensity and hardware utilization.
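
The equivalence between the two-call and merged forms can be checked with a small sketch. This is an illustration under stated assumptions, not the code in this PR: `grouped_matmul` is a dense per-expert einsum standing in for `gmm_fn` (the real kernel operates on ragged token groups), $W_0$ is assumed to be the gate projection and $W_1$ the up projection, and `arithmetic_intensity` is a hypothetical back-of-the-envelope model that counts one read of each operand and one write of the output.

```python
import jax
import jax.numpy as jnp


def grouped_matmul(x, w):
  """Dense stand-in for gmm_fn: one matmul per expert.

  x: [experts, tokens, d_model], w: [experts, d_model, d_ff].
  """
  return jnp.einsum("etd,edf->etf", x, w)


def glu_two_calls(x, w0, w1):
  """Gate and up projections as two sequential grouped matmuls."""
  gate = grouped_matmul(x, w0)
  up = grouped_matmul(x, w1)
  return jax.nn.silu(gate) * up


def glu_merged(x, w0, w1):
  """Concatenate W_0 and W_1, run one grouped matmul, then split."""
  w01 = jnp.concatenate([w0, w1], axis=-1)
  gate, up = jnp.split(grouped_matmul(x, w01), 2, axis=-1)
  return jax.nn.silu(gate) * up


def arithmetic_intensity(m, k, n, bytes_per_elem=1):
  """FLOPs per byte for an [m, k] x [k, n] matmul (simplified model)."""
  flops = 2 * m * k * n
  bytes_moved = (m * k + k * n + m * n) * bytes_per_elem
  return flops / bytes_moved


experts, tokens, d_model, d_ff = 4, 8, 16, 32
kx, k0, k1 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(kx, (experts, tokens, d_model))
w0 = jax.random.normal(k0, (experts, d_model, d_ff)) / jnp.sqrt(d_model)
w1 = jax.random.normal(k1, (experts, d_model, d_ff)) / jnp.sqrt(d_model)

out_two = glu_two_calls(x, w0, w1)
out_merged = glu_merged(x, w0, w1)
print("max abs diff:", float(jnp.max(jnp.abs(out_two - out_merged))))

# Doubling the output dimension n reads the activation once instead of twice.
intensity_single = arithmetic_intensity(m=256, k=512, n=128)
intensity_merged = arithmetic_intensity(m=256, k=512, n=256)
print(intensity_single, intensity_merged)
```

In this simplified model the merged call always has the higher FLOPs-per-byte ratio, because the activation traffic is amortized over twice as many output columns; the actual gain on hardware depends on tiling and on how the weights are sharded.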

Tests

Verification: Validated via end-to-end (e2e) perf and convergence benchmarks.
Coverage: Unit tests will be added in a subsequent update.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

gobbleturk · 2026-04-29T18:10:40Z

Note that more general support is available in PR #3736.

shuningjin

Thanks for the GMM FP8 support with careful manual quantization, and for bringing back the merged gating from PR #3199! I had a comment wrt sharding, plus other minor changes.

shuningjin · 2026-05-01T05:34:07Z

+  Returns:
+    The result of the grouped matrix multiplication.
+  """
+  if config.use_qwix_quantization:


maybe this condition is clearer?

if config.quantization == "fp8_full":

BirdsOfAFthr added the pull ready label Apr 29, 2026

BirdsOfAFthr marked this pull request as ready for review April 29, 2026 05:01

BirdsOfAFthr requested review from NicoGrande, NuojCheng, RissyRan, bvandermoon, gagika, gobbleturk, jesselu-google, jiangjy1982, parambole, richjames0, shralex, shuningjin and suexu1025 as code owners April 29, 2026 05:01

BirdsOfAFthr force-pushed the amandaliang branch from 47d4976 to 3da5049 Compare April 29, 2026 19:07

BirdsOfAFthr requested review from A9isha, SurbhiJainUSC, abhinavclemson, aireenmei, dipannita08, hengtaoguo, igorts-git, jshin1394, khatwanimohit, liudangyi, michelle-yooh and vipannalla as code owners April 29, 2026 19:07

BirdsOfAFthr force-pushed the amandaliang branch from 3da5049 to 5992420 Compare April 29, 2026 19:13

BirdsOfAFthr changed the title ~~Support merging gating gmm kernels~~ FP8 Megablox for batch split Apr 29, 2026

BirdsOfAFthr force-pushed the amandaliang branch from 5992420 to 61a6832 Compare April 30, 2026 21:45

shuningjin reviewed May 1, 2026


Comment thread src/maxtext/kernels/megablox/ops.py Outdated

Comment thread src/maxtext/kernels/megablox/ops.py Outdated

Comment thread src/maxtext/layers/quantizations.py Outdated

Comment thread src/maxtext/models/deepseek_batchsplit.py Outdated

BirdsOfAFthr force-pushed the amandaliang branch from 61a6832 to 49d07fb Compare May 1, 2026 05:20

FP8 Megablox for batch split

5ef365f

BirdsOfAFthr force-pushed the amandaliang branch from 49d07fb to 5ef365f Compare May 1, 2026 05:24

shuningjin reviewed May 1, 2026

BirdsOfAFthr wants to merge 1 commit into main from amandaliang


3 participants
