Supports eagle3 training for Gemma3 27B and Gemma4 26B. by pyc96 · Pull Request #553 · sgl-project/SpecForge

pyc96 · 2026-05-01T21:48:55Z

Motivation

This PR adds eagle3 training support for Gemma3 27B and Gemma4 26B. Other Gemma3/4 models should also work, but they have not been verified.

Modifications

Besides the new models, it also adds the following features:

reuse_target_lm_head: reuse the target model's LM head when the flag is true
use_aux_norm: add extra norm layers before the fc layer
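To make the two flags concrete, here is a minimal pure-Python sketch. It assumes, as in Eagle3 generally, that the draft model concatenates several target hidden states and projects them with an fc layer. The helper names (`rms_norm`, `build_fc_input`, `pick_lm_head`) and the choice to normalize each auxiliary state separately are hypothetical illustrations, not the PR's actual code.

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Root-mean-square normalization without a learned scale (simplified).
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def build_fc_input(aux_hidden_states: Sequence[Sequence[float]],
                   use_aux_norm: bool) -> List[float]:
    # Hypothetical: optionally normalize each auxiliary hidden state, then
    # concatenate them into the vector the fc layer projects to hidden size.
    parts = [rms_norm(h) if use_aux_norm else list(h) for h in aux_hidden_states]
    return [v for part in parts for v in part]


def pick_lm_head(draft_head, target_head, reuse_target_lm_head: bool):
    # Hypothetical: with reuse_target_lm_head the draft shares the target's
    # (frozen) LM head instead of training its own.
    return target_head if reuse_target_lm_head else draft_head
```

Normalizing each state before concatenation keeps a single high-magnitude layer from dominating the fc input; whether the PR normalizes per state or after concatenation is not stated here.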

Gemma4 requires transformers v5+.
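A training script could fail fast on an older transformers install with a guard like the one below. This is a hypothetical sketch (`check_gemma4_support` is not part of SpecForge); it only uses the standard-library `importlib.metadata.version` lookup and compares the major version.

```python
from importlib import metadata
from typing import Optional


def transformers_major_version(version: Optional[str] = None) -> int:
    # Read the installed transformers version unless one is passed explicitly.
    # metadata.version raises PackageNotFoundError if transformers is missing.
    if version is None:
        version = metadata.version("transformers")
    return int(version.split(".")[0])


def check_gemma4_support(version: Optional[str] = None) -> None:
    # Hypothetical guard: Gemma4 needs transformers v5 or newer.
    major = transformers_major_version(version)
    if major < 5:
        raise RuntimeError(f"Gemma4 requires transformers>=5, found {major}.x")
```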

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://sgl-fru7574.slack.com/archives/C09784E3EN6 to discuss your PR.

gemini-code-assist

Code Review

This pull request introduces support for Gemma 3 and Gemma 4 models within the Eagle3 framework, including new configurations, training scripts, and a dedicated gemma-4 chat template. Key architectural improvements include a fast path for models where draft and target vocab sizes match, the ability to reuse and freeze the target model's LM head, and an improved weight initialization strategy for stable training. The training script now supports multiple data paths and directory resolution. Feedback focuses on preventing race conditions in distributed output directory creation, improving error handling for mismatched tool lists, and adhering to PEP-8 import standards.
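The "fast path" for matching vocab sizes can be illustrated with the draft-to-target id mapping. This sketch assumes the EAGLE-3-style offset table convention, where draft id `i` maps to target id `i + d2t[i]`; the function names are hypothetical. When both vocabularies are identical, every offset is zero, so the remapping can be skipped entirely.

```python
from typing import List, Sequence


def build_d2t(selected_target_ids: Sequence[int]) -> List[int]:
    # Offset table: draft id i corresponds to target id i + d2t[i].
    return [tid - i for i, tid in enumerate(selected_target_ids)]


def draft_to_target(draft_ids: Sequence[int], d2t: Sequence[int],
                    same_vocab: bool) -> List[int]:
    # Assumed fast path: identical vocabularies make the mapping the identity,
    # so ids pass straight through with no table lookup.
    if same_vocab:
        return list(draft_ids)
    return [i + d2t[i] for i in draft_ids]
```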

gemini-code-assist · 2026-05-01T21:50:48Z

+        run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        args.output_dir = os.path.join(args.output_dir, run_timestamp)


Generating the run_timestamp independently on each rank can lead to different output directories across processes if they cross a second boundary during initialization. This will break distributed training and checkpoint saving. The timestamp should be generated on rank 0 and broadcasted to all other ranks.

Suggested change

-        run_timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        args.output_dir = os.path.join(args.output_dir, run_timestamp)
+        run_timestamp = [datetime.now().strftime("%Y%m%d_%H%M%S") if dist.get_rank() == 0 else None]
+        dist.broadcast_object_list(run_timestamp, src=0)
+        args.output_dir = os.path.join(args.output_dir, run_timestamp[0])
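The rank-0-then-broadcast pattern from this suggestion can be packaged as a small helper. `resolve_run_dir` is a hypothetical name; the broadcast function is injected so the logic can be exercised without a process group. In a real run you would pass `torch.distributed.broadcast_object_list`, which fills the list in place on every rank.

```python
import os
from datetime import datetime
from typing import Callable, List, Optional


def resolve_run_dir(base_dir: str, rank: int,
                    broadcast_object_list: Callable[..., None]) -> str:
    # Only rank 0 reads the clock; other ranks receive its value, so all
    # processes agree on the directory even if they run this in different seconds.
    holder: List[Optional[str]] = [
        datetime.now().strftime("%Y%m%d_%H%M%S") if rank == 0 else None
    ]
    broadcast_object_list(holder, src=0)  # fills holder[0] in place on every rank
    return os.path.join(base_dir, holder[0])
```

Because the broadcast is a collective, every rank must call the helper, and only after the process group has been initialized, e.g. `resolve_run_dir(args.output_dir, dist.get_rank(), dist.broadcast_object_list)`.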

gemini-code-assist · 2026-05-01T21:50:48Z

+    if tools is None or len(tools) != len(conversations):
+        tools = [[] for _ in range(len(conversations))]


Silently replacing the tools list with empty lists when the length doesn't match conversations can hide data preparation bugs. It is safer to raise a ValueError if an explicitly provided tools list has an incorrect length.

Suggested change

-    if tools is None or len(tools) != len(conversations):
-        tools = [[] for _ in range(len(conversations))]
+    if tools is None:
+        tools = [[] for _ in range(len(conversations))]
+    elif len(tools) != len(conversations):
+        raise ValueError(f"Length of tools ({len(tools)}) does not match length of conversations ({len(conversations)})")
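The suggested check can also be written as a standalone helper, which makes the two cases easy to test in isolation. `normalize_tools` is a hypothetical name; the behavior follows the suggestion exactly: a missing tools list becomes empty lists, while an explicit list of the wrong length raises.

```python
from typing import Any, List, Optional, Sequence


def normalize_tools(tools: Optional[Sequence[Any]],
                    conversations: Sequence[Any]) -> List[Any]:
    # No tools supplied: give every conversation an empty tool list.
    if tools is None:
        return [[] for _ in range(len(conversations))]
    # Tools supplied but misaligned: fail loudly instead of silently dropping them.
    if len(tools) != len(conversations):
        raise ValueError(
            f"Length of tools ({len(tools)}) does not match "
            f"length of conversations ({len(conversations)})"
        )
    return list(tools)
```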

gemini-code-assist · 2026-05-01T21:50:48Z

+            # transformers v5 mutating rope_scaling/rope_parameters and other
+            # fields in model.config during save_pretrained.
+            if getattr(args, "draft_model_config", None):
+                import json


Per PEP-8, imports should be placed at the top of the file. Moving import json to the module level improves readability and follows standard Python practices.

References

Imports should be at the top of the file, after any module comments and docstrings, and before module globals and constants.
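For context, here is one plausible shape of the workaround the commented code hints at, with `json` imported at module level as the review asks. This assumes `args.draft_model_config` is a path to a JSON file and that the fix is to overwrite the `config.json` written by `save_pretrained`; `restore_draft_config` is a hypothetical name, and the PR's actual logic may differ.

```python
import json
import os


def restore_draft_config(save_dir: str, draft_model_config: str) -> None:
    # Hypothetical: transformers v5 may rewrite fields such as rope_scaling
    # during save_pretrained, so copy the user-supplied draft config back
    # over the saved config.json.
    with open(draft_model_config, "r", encoding="utf-8") as f:
        config = json.load(f)
    with open(os.path.join(save_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```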

Gemma3 27B and Gemma4 26B have a vocabulary size of 262144 (2^18). Because this is already a power of two, triton.next_power_of_2 returns 262144 unchanged, which exceeds the previous MAX_FUSED_SIZE limit of 131072 (2^17). As a result, _calculate_settings() raised a RuntimeError before the log-softmax loss kernel could launch, blocking Eagle3 training on these targets.

Bump MAX_FUSED_SIZE to 262208 to fit Gemma3/4 vocab
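The arithmetic behind this bump can be checked with a small pure-Python sketch. `next_power_of_2` below is a local reimplementation (not an import from Triton), and `calculate_block_size` is a simplified, hypothetical stand-in for `_calculate_settings()` that models only the size check.

```python
def next_power_of_2(n: int) -> int:
    # Smallest power of two >= n (n >= 1), matching triton.next_power_of_2.
    return 1 << (n - 1).bit_length()


def calculate_block_size(n_cols: int, max_fused_size: int) -> int:
    # Simplified stand-in for _calculate_settings(): only the size check is modeled.
    block_size = next_power_of_2(n_cols)
    if block_size > max_fused_size:
        raise RuntimeError(
            f"BLOCK_SIZE {block_size} exceeds MAX_FUSED_SIZE {max_fused_size}"
        )
    return block_size


GEMMA_VOCAB = 262144  # == 2**18, already a power of two
```

With the old limit, `calculate_block_size(GEMMA_VOCAB, 131072)` raises; with 262208 it returns 262144.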

pyc96 added 2 commits April 24, 2026 23:40

gemma3 eagle3 training

078d71c

support use_aux_norm

ceefdeb

pyc96 changed the title from "Gemma upstream" to "Supports eagle3 training for Gemma3 27B and Gemma4 26B." on May 1, 2026

gemini-code-assist Bot reviewed May 1, 2026


pyc96 force-pushed the gemma-upstream branch 2 times, most recently from 0074bd7 to c9910b2, on May 1, 2026 23:08

cleanup

a79a5f5

pyc96 force-pushed the gemma-upstream branch from c9910b2 to a79a5f5 on May 1, 2026 23:24

pyc96 marked this pull request as ready for review May 1, 2026 23:27

pyc96 requested review from FlamingoPg, FrankLeeeee, shuaills, sleepcoo and zyksir as code owners May 1, 2026 23:27

tcligg mentioned this pull request May 2, 2026

Bump MAX_FUSED_SIZE to 262208 to fit Gemma3/4 vocab pyc96/SpecForge#1

Merged

Merge pull request #1 from tcligg/gemma-upstream-loss-bump

578deb8

Bump MAX_FUSED_SIZE to 262208 to fit Gemma3/4 vocab


pyc96 wants to merge 5 commits into sgl-project:main from pyc96:gemma-upstream


2 participants

