Repeatkv transform by quic-dhirajku · Pull Request #997 · quic/efficient-transformers

quic-dhirajku · 2026-05-19T09:56:34Z

No description provided.

…VLMs. Based on PR quic#625. Addressed most of the comments made on the previous PR. The repeat check runs on only a subset of models during CI, primarily because of differences in the configs of such models. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

…ng with changes made for the new transforms. TODO: Check for the ONNX directory path name being different. Check if the list of classes for mapping covers all the models that we support. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

…oder Wrappers were added to the string mapping list to enable dummy model export for CI. Changes were made to prevent ReplicateKVTransform from being applied more than once when either the Encoder or Decoder Wrapper has already applied it. Modeling files were updated to access the config in EncoderWrapper as well. Infra was added for CausalLM and VLM checks in the repeatKV CI tests. In the CausalLM script, the APIRunner instantiation was moved so that it picks up the updated input shapes. Similarly, the export call in the VLM script was commented out, since compile already calls export with the updated changes. TODO: Confirm the changes that were made for the DeepSeekV3 model for RepeatKV; they were removed in favor of a generic approach. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
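For reviewers, the "apply at most once" behavior described in this commit message could look roughly like the sketch below. It is not taken from the PR: the class name `ReplicateOnce`, the marker attribute `_kv_replicated`, and the dummy model are all hypothetical. Only the `(model, transformed)` return shape mirrors how `ReplicateKVHeadTransform.apply` is called in the diff.

```python
class ReplicateOnce:
    """Hypothetical guard: applies a transform at most once per model object."""

    # Hypothetical marker attribute; the PR may track this differently.
    _MARKER = "_kv_replicated"

    @classmethod
    def apply(cls, model, transform):
        # If an encoder or decoder wrapper already transformed this model, skip.
        if getattr(model, cls._MARKER, False):
            return model, False
        model = transform(model)
        setattr(model, cls._MARKER, True)
        return model, True


class DummyModel:
    """Stand-in for a real model; only tracks a KV head count."""

    def __init__(self):
        self.kv_heads = 2


def double_kv_heads(model):
    # Toy transform used only to make a repeated application observable.
    model.kv_heads *= 2
    return model


model = DummyModel()
model, first = ReplicateOnce.apply(model, double_kv_heads)
model, second = ReplicateOnce.apply(model, double_kv_heads)
print(first, second, model.kv_heads)
```

The second call is a no-op, so a wrapper that runs after another wrapper cannot replicate the heads twice.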

quic-rishinr · 2026-05-25T08:11:42Z

@ochougul @vbaddi please review the PR

Made changes to allow generic, name-based transformation of heads (num_attention_heads, n_heads, n_head, etc.). Made minor edits and created utils for this task. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

Edited the changes as suggested by quic-mamta. Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>

vbaddi · 2026-05-29T05:26:39Z

nit: should we rename this to num_replicate_kv_heads? @quic-rishinr @ochougul @quic-dhirajku

vbaddi · 2026-05-29T05:27:58Z

+            architectures = getattr(model_config, "architectures", None) or []
+            is_deepseek_v3 = "DeepseekV3ForCausalLM" in architectures
+            if qaic_config:
+                if is_deepseek_v3 and (qaic_config.get("blocking_mode", None) == "h"):


nit: for models with MLA and a single KV head (e.g. DeepseekV3), we do not want to replicate. Is that what is being done here? It is not clear.

vbaddi · 2026-05-29T05:29:59Z

                self.model.config.text_config.use_cache = True
            else:
                self.model.config.use_cache = True
+        # self.model, replicate_kv_transformed = ReplicateKVHeadTransform.apply(self.model, **kwargs)


nit: remove commented code.

vbaddi · 2026-05-29T05:31:26Z

+            if cls._is_mla_attention(attn):
+                # Legacy MLA support: KV compression projection is organized as
+                # [kv_heads, kv_lora_rank + qk_rope_head_dim, hidden_size].
+                mla_orig_kv_heads = 1


nit: remove magic numbers; get them from the constants file.
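One way to read this suggestion, as a sketch only: the literal `1` moves into a named constant and the shape in the code comment is derived from it. The constant name `MLA_ORIG_KV_HEADS`, the helper `mla_kv_proj_shape`, and the numeric values in the usage line are hypothetical and are not taken from the PR or from the existing constants file.

```python
# Hypothetical entry for the constants file; the real name and location may differ.
MLA_ORIG_KV_HEADS = 1  # the diff hard-codes this as `mla_orig_kv_heads = 1`


def mla_kv_proj_shape(kv_lora_rank, qk_rope_head_dim, hidden_size,
                      kv_heads=MLA_ORIG_KV_HEADS):
    """Shape of the KV compression projection, viewed as
    [kv_heads, kv_lora_rank + qk_rope_head_dim, hidden_size]."""
    return (kv_heads, kv_lora_rank + qk_rope_head_dim, hidden_size)


# Illustrative values only.
print(mla_kv_proj_shape(kv_lora_rank=512, qk_rope_head_dim=64, hidden_size=7168))
```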

vbaddi · 2026-05-29T05:32:00Z

+# Generic config key aliases used across model families.
+ATTENTION_HEAD_CONFIG_KEYS = ("num_attention_heads", "n_head", "n_heads", "num_heads")
+KV_HEAD_CONFIG_KEYS = ("num_key_value_heads", "n_kv_heads", "num_kv_heads", "effective_n_kv_heads")
+HIDDEN_SIZE_CONFIG_KEYS = ("hidden_size", "n_embd", "d_model")


Does this cover all the models we support as of today?
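A quick way to probe this question is to resolve each alias tuple against a config and flag the ones that match nothing. The sketch below copies the three tuples from the diff; `resolve_config_key` and the two `SimpleNamespace` configs are hypothetical stand-ins, not the PR's utilities or real model configs.

```python
from types import SimpleNamespace

# Alias tuples as they appear in the diff above.
ATTENTION_HEAD_CONFIG_KEYS = ("num_attention_heads", "n_head", "n_heads", "num_heads")
KV_HEAD_CONFIG_KEYS = ("num_key_value_heads", "n_kv_heads", "num_kv_heads", "effective_n_kv_heads")
HIDDEN_SIZE_CONFIG_KEYS = ("hidden_size", "n_embd", "d_model")


def resolve_config_key(config, aliases):
    """Return (name, value) for the first alias set on `config`, else None.

    Hypothetical helper; the PR's own lookup utility may behave differently.
    """
    for name in aliases:
        value = getattr(config, name, None)
        if value is not None:
            return name, value
    return None


# Stand-in configs using Llama-style and GPT-2-style attribute names.
llama_like = SimpleNamespace(num_attention_heads=32, num_key_value_heads=8, hidden_size=2048)
gpt2_like = SimpleNamespace(n_head=12, n_embd=768)

for label, cfg in (("llama_like", llama_like), ("gpt2_like", gpt2_like)):
    print(label,
          resolve_config_key(cfg, ATTENTION_HEAD_CONFIG_KEYS),
          resolve_config_key(cfg, KV_HEAD_CONFIG_KEYS),
          resolve_config_key(cfg, HIDDEN_SIZE_CONFIG_KEYS))
```

A config with no KV-head key at all, like the GPT-2-style one here, resolves to `None`, so whatever calls the lookup has to decide explicitly whether to skip such a model or fall back to the attention head count.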

vbaddi · 2026-05-29T05:32:41Z

+        "meta-llama/Llama-3.2-1B",
+        # "unsloth/gemma-2b",
+        # "unsloth/gemma-2-2b",
+        # "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ",


Why are these commented out? Any known failures with AWQ, Gemma, or Mistral models?

vbaddi · 2026-05-29T06:52:03Z

@quic-dhirajku also, please add a detailed PR description covering the design, the changes made, and the test plan validated. Thanks.

quic-mamta · 2026-05-29T07:49:58Z

-                        qaic_config["head_block_size"] = qaic_config.get("head_block_size", num_devices)
-                    num_kv_heads_repeat = qaic_config.get("num_kv_heads_repeat", 1)
+            architectures = getattr(model_config, "architectures", None) or []
+            is_deepseek_v3 = "DeepseekV3ForCausalLM" in architectures


Please remove the lines 459-463, not needed.

quic-dhirajku added 3 commits May 19, 2026 13:35

quic-rishinr mentioned this pull request May 22, 2026

Created ReplicateKVHeadTransform to integrate KV-heads replication module within Qefficient library. #625

Closed

quic-rishinr requested review from ochougul and vbaddi and removed request for vbaddi May 25, 2026 08:11

quic-rishinr added the 1.22 (Release 1.22 candidate) label May 25, 2026

Added RepeatKVTransform operations needed for DeepseekV3ForCausalLM.

1272fcb

quic-mamta reviewed May 25, 2026

Comment thread QEfficient/base/modeling_qeff.py

Addressed Internal Code Review comments.

b40a34d

quic-dhirajku marked this pull request as ready for review May 27, 2026 08:03

vbaddi requested changes May 29, 2026

quic-mamta requested changes May 29, 2026

Repeatkv transform #997
quic-dhirajku wants to merge 5 commits into quic:main from quic-dhirajku:repeatkv_transform


4 participants
