[TransferEngine] Make InfiniBand Service Level configurable via MC_IB_SL#2525
[TransferEngine] Make InfiniBand Service Level configurable via MC_IB_SL#2525catyans wants to merge 1 commit into
Conversation
The RDMA endpoint hard-codes `attr.ah_attr.sl = 0`, so Mooncake traffic cannot be steered into a dedicated Virtual Lane for QoS isolation. On a shared NIC (e.g. MoE inference where Expert-Parallel all-to-all and KV-cache transfers contend for the same RDMA NIC), there is no way to separate the two flows at the SL/VL level. Peer libraries already expose this knob: NCCL `NCCL_IB_SL`, UCX `UCX_IB_SL`, NVSHMEM `NVSHMEM_IB_SL`, DeepEP `EP_OVERRIDE_RDMA_SL`. Mooncake was the only one on the KV-transfer path without it. (Note `MC_IB_TC` only sets `ah_attr.grh.traffic_class`, which does not drive VL selection on native InfiniBand.) This mirrors the existing `MC_IB_TC` implementation: - add `ib_service_level = -1` to GlobalConfig (-1 = use default 0); - parse `MC_IB_SL`, validated to the InfiniBand SL range 0-15; - apply it to `attr.ah_attr.sl` only when configured; - log it in dumpGlobalConfig. Fully backward compatible: when MC_IB_SL is unset the SL stays 0, exactly as before. Adds unit tests for the env-var parsing (default/valid/boundary/out-of-range/negative/non-numeric) and documents the knob. Closes kvcache-ai#2515
There was a problem hiding this comment.
Code Review
This pull request introduces support for configuring the InfiniBand Service Level (SL) via the MC_IB_SL environment variable (0-15) to enable QoS isolation on RDMA QPs. This change includes configuration updates, environment variable parsing, documentation, RDMA endpoint integration, and unit tests. The feedback suggests checking if the environment variable is non-empty before parsing to avoid throwing exceptions or logging spurious warnings when it is set to an empty string.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| const char* service_level_env = std::getenv("MC_IB_SL"); | ||
| if (service_level_env) { |
There was a problem hiding this comment.
To prevent throwing a std::invalid_argument exception and logging a spurious warning when MC_IB_SL is set to an empty string (which is common in some deployment environments), check if the environment variable is non-empty before parsing it. This also aligns with the pattern used elsewhere in this file (e.g., for MC_MLX5_QP_UDP_SPORTS and MC_MLX5_QP_LAG_PORT_BALANCE).
| const char* service_level_env = std::getenv("MC_IB_SL"); | |
| if (service_level_env) { | |
| const char* service_level_env = std::getenv("MC_IB_SL"); | |
| if (service_level_env && *service_level_env) { |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
The failing On the Codecov note: the 2 uncovered lines in Could a maintainer re-run these jobs with their permissions? Happy to rebase onto latest |
|
Good catch! It is a solution for congestion control. |
alogfans
left a comment
There was a problem hiding this comment.
LGTM. You can apply to TENT as well.
Description
The RDMA endpoint hard-codes
attr.ah_attr.sl = 0inrdma_endpoint.cpp, so Mooncake traffic cannot be steered into adedicated InfiniBand Virtual Lane for QoS isolation.
This matters on a shared NIC. In MoE inference (e.g. DeepSeek-V3,
Qwen3-235B) with PD disaggregation, Expert-Parallel all-to-all traffic
and KV-cache transfers contend for the same RDMA NIC. SL→VL mapping on
the switch is the standard way to separate them, but Mooncake currently
gives KV traffic no SL knob, so it cannot be moved off the default lane.
Peer libraries on this path already expose the knob:
NCCL_IB_SLUCX_IB_SLNVSHMEM_IB_SLEP_OVERRIDE_RDMA_SLNote: the existing
MC_IB_TConly setsah_attr.grh.traffic_class,which does not drive VL selection on native InfiniBand, so it does not
cover this case.
This change mirrors the existing
MC_IB_TCimplementation:ib_service_level = -1toGlobalConfig(-1= use default 0);MC_IB_SL, validated to the InfiniBand SL range0-15;attr.ah_attr.slonly when configured;dumpGlobalConfig.Fully backward compatible: when
MC_IB_SLis unset, the SL stays0, exactly as today.Closes #2515
Module
mooncake-transfer-engine)Type of Change
How Has This Been Tested?
Added unit tests in
tests/config_test.cppfor theMC_IB_SLenv-varparsing, covering: unset default (
-1), valid override, min/maxboundaries (
0/15), out-of-range (16), negative, and non-numericinput (all reject and preserve the prior value). Mirrors the existing
MC_PKEY_INDEXtest pattern.Checklist
.clang-format)pre-commit run --all-filesand all hooks passAI Assistance Disclosure