Rework Linker dispatching for cross-major nvJitLink/driver skew #1911

Open
cpcloud wants to merge 4 commits into NVIDIA:main from cpcloud:linker-dispatch-rework-712

@cpcloud cpcloud commented Apr 14, 2026

Summary

  • Replaces module-level "decide once" backend selection with per-Linker-instance dispatch at __init__ time
  • Factors decision into pure _choose_backend() helper for GPU-free unit testing
  • Handles nvJitLink/driver major-version mismatches: falls back to driver linker for non-LTO linking, raises RuntimeError for LTO when backends are incompatible
  • Probes driver_version() lazily — environments with nvJitLink but no driver (build containers) still work
  • Caches _probe_nvjitlink(), which warns at most once when nvJitLink is absent

Breaking change: options.link_time_optimization=True with nvJitLink absent now raises RuntimeError instead of silently passing CU_JIT_LTO to the driver (which was not real LTO linking).
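The "cached, warns at most once" behavior described for `_probe_nvjitlink()` can be illustrated with a minimal sketch. This is not the PR's implementation: the function name `probe_optional_module`, its signature, and the warning text are hypothetical, and the sketch only shows one common way to get that behavior, by memoizing the probe so the import attempt and its warning run a single time per module name.

```python
import functools
import importlib
import warnings


@functools.lru_cache(maxsize=None)
def probe_optional_module(name: str):
    """Hypothetical helper: import an optional module once and cache the result.

    Returns the imported module, or None if it cannot be imported.
    Because the result is memoized, the warning is emitted at most once
    per module name, no matter how many callers probe it.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn(
            f"{name} is not available; falling back to the driver linker",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
```

Later calls return the cached result without retrying the import, so a long-running process that constructs many linkers would see the warning only on the first probe.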

Decision matrix

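| driver | nvJitLink | ltoir input | lto/ptx | result |
|--------|-----------|-------------|---------|--------|
| any | None | no | no | driver |
| any | None | yes/lto | | raise |
| M | (M,*) | any | any | nvJitLink |
| D≠N | (N,*) | no | no | driver fallback |
| D≠N | (N,*) | yes/lto | | raise |
| None | available | any | any | nvJitLink |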
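As a rough illustration of the matrix above, the decision could be expressed as a pure function along these lines. This is a sketch under assumptions, not the PR's `_choose_backend()`: the name `choose_backend_sketch`, its parameters, the string return values, and the error messages are all hypothetical, and the two LTO-related columns are collapsed into a single "LTO is needed" condition.

```python
from typing import Optional, Tuple


def choose_backend_sketch(
    driver_major: Optional[int],
    nvjitlink_version: Optional[Tuple[int, int]],
    has_ltoir_input: bool,
    lto_requested: bool,
) -> str:
    """Hypothetical stand-in for the dispatch decision; returns "nvjitlink" or "driver".

    Raises RuntimeError when LTO is needed but no compatible nvJitLink exists.
    """
    # Assumption: either an LTO-IR input or an explicit LTO request needs nvJitLink.
    needs_lto = has_ltoir_input or lto_requested

    if nvjitlink_version is None:
        if needs_lto:
            raise RuntimeError("LTO linking requires nvJitLink, which is not available")
        return "driver"

    if driver_major is None:
        # Driver could not be probed (e.g. a build container): pick nvJitLink optimistically.
        return "nvjitlink"

    if nvjitlink_version[0] == driver_major:
        return "nvjitlink"

    # Cross-major skew: nvJitLink output may not be loadable by this driver.
    if needs_lto:
        raise RuntimeError(
            f"LTO linking requires nvJitLink {driver_major}.x to match the driver; "
            f"found nvJitLink {nvjitlink_version[0]}.{nvjitlink_version[1]}"
        )
    return "driver"
```

Because the function takes plain values and touches neither the driver nor nvJitLink, each row of the matrix can be checked with an ordinary assertion on a machine without a GPU.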

Test plan

  • GPU-free parameterized tests for full decision matrix (test_linker_dispatch.py)
  • Test helpers handle driver-version failure gracefully
  • CI: existing GPU tests pass with per-instance dispatch
  • CI: cross-major behavior verified (requires multiple CTK versions)

Closes #712

🤖 Generated with Claude Code

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added the enhancement, P0, cuda.core, and breaking labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
cpcloud and others added 3 commits April 14, 2026 18:15
Replace the module-level "decide once, use everywhere" nvJitLink-vs-driver
choice with a per-Linker-instance decision that considers the CUDA driver
major version, nvJitLink's availability and major version, the input code
types, and whether link-time optimization is requested.

The dispatch is factored into a pure helper `_choose_backend()` that is
fully unit-testable without a GPU. Its decision matrix:

- no nvJitLink, no LTO  -> driver
- matching majors       -> nvJitLink
- cross-major, no LTO   -> driver (nvJitLink output may not be loadable)
- LTO + no nvJitLink    -> RuntimeError
- LTO + cross-major     -> RuntimeError

This resolves the cross-major-driver scenario described in NVIDIA#712, where
nvJitLink 12.x may produce a CUBIN that a 13.x driver cannot load (or vice
versa). The previous code committed to nvJitLink unconditionally whenever
it was importable.

Tests:

- `tests/test_linker_dispatch.py` parametrizes the entire matrix against
  `_choose_backend()` with mocked versions (no GPU, no driver required).
- `tests/test_linker.py::TestLinkerDispatch` drives the same decision
  through the real `Linker` constructor via monkeypatched version probes.
- `tests/test_optional_dependency_imports.py` is updated to exercise the
  new `_probe_nvjitlink()` helper in place of the removed
  `_decide_nvjitlink_or_driver()`.
- `tests/test_program.py` and `tests/test_linker.py` use a small local
  helper to compute the effective backend for the current environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
driver_version() was called unconditionally during Linker.__init__,
which fails in environments where nvJitLink is installed but the
CUDA driver is absent (e.g., build containers). Linker.__init__ now
catches the exception and sets driver_major=None. When driver_major is
unknown and nvJitLink is available, it optimistically selects the
nvJitLink backend.
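
The catch-and-continue pattern described in this commit might look roughly like the sketch below. The helper name `probe_driver_major` is hypothetical, and the sketch assumes the version probe is a zero-argument callable returning a `(major, minor)` tuple; the real `driver_version()` may differ in both signature and exception type.

```python
from typing import Callable, Optional, Tuple


def probe_driver_major(
    driver_version: Callable[[], Tuple[int, int]],
) -> Optional[int]:
    """Hypothetical helper: return the driver major version, or None if unknown."""
    try:
        major, _minor = driver_version()
    except Exception:
        # No usable driver (e.g. a build container): report "unknown" instead of failing.
        return None
    return major
```

Passing the probe in as a callable keeps the helper testable with a fake that either returns a version or raises, which mirrors how the test helpers in this PR are described as passing `None` when the probe fails.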

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test helpers calling driver_version() at module scope would crash
in no-driver environments before test collection. Mirror the
production lazy-probe pattern: catch exceptions and pass None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the linker-dispatch-rework-712 branch from 9064059 to 0ca3034 Compare April 14, 2026 22:15
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

breaking Breaking changes are introduced cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

Development

Successfully merging this pull request may close these issues.

Re-work on Linker dispatching logic