CI: Detect changed paths and gate cuda.bindings tests#1908
Open
cpcloud wants to merge 7 commits intoNVIDIA:mainfrom
Open
CI: Detect changed paths and gate cuda.bindings tests#1908cpcloud wants to merge 7 commits intoNVIDIA:mainfrom
cpcloud wants to merge 7 commits intoNVIDIA:mainfrom
Conversation
|
Add a detect-changes job to ci.yml that classifies which top-level modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder, cuda_python, cuda_python_test_helpers, shared infra) using dorny/paths-filter. The job emits composed gating outputs that account for the dependency graph (pathfinder -> bindings -> core). Thread a new skip-bindings-test input through the reusable test-wheel workflows so cuda.bindings tests (and their Cython counterparts) are skipped when the detect-changes output for test_bindings is false. For PRs that only touch cuda_core this skips the expensive bindings suite while still running cuda.core and cuda.pathfinder tests against the wheel built in the current CI run. Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST remains the test-time gate. This lets path-filter-based skips reuse the existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a cross-branch artifact fetch. Non-PR events (push to main, tag, schedule, workflow_dispatch) still exercise the full pipeline. Refs NVIDIA#299
Two fixes from code review: 1. Set `base: main` on dorny/paths-filter so backport PRs targeting non-default branches still diff against main for path detection. 2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The path-filter override only skips bindings tests; cuda.core Cython tests should still run on core-only PRs. Cython skip is now driven solely by CTK minor-version mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dorny/paths-filter v3 already diffs against the repo default branch for non-default-branch pushes. Explicit base: main was redundant and would produce wrong baselines for backport PRs targeting release branches (all files seen as changed vs main). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove third-party dorny/paths-filter action dependency. Use git merge-base + git diff --name-only + grep instead. Same behavior, zero supply-chain risk, full control over base ref resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
67f8baf to
d205955
Compare
rwgk
reviewed
Apr 15, 2026
Collaborator
rwgk
left a comment
There was a problem hiding this comment.
Generated with Cursor GPT-5.4 Extra High Fast
Reviewed the final tree for PR 1908 at commit d205955
actionlint passed on the touched workflows, so the notes below are about CI behavior / control flow, not YAML syntax.
1. High: detect-changes can fail without clearly failing the final required status
Evidence
- All three test jobs now depend on
detect-changes:.github/workflows/ci.yml:284.github/workflows/ci.yml:309.github/workflows/ci.yml:334
- The terminal
checksjob does not depend ondetect-changesdirectly and only inspects:doctest-linux-64test-linux-aarch64test-windows
at.github/workflows/ci.yml:365
- The shell logic in
checksonly treatsfailureandcancelledas errors:.github/workflows/ci.yml:394.github/workflows/ci.yml:398
Why this matters
- If
detect-changesfails, the downstream test jobs can becomeskippedbecause a required dependency did not complete successfully. - In that state,
checksmay still exit 0 because it does not treat thoseskippedresults as an error. - Result: a broken path-detection step could suppress or reduce CI coverage without producing a clearly failing final status.
Agent hint
- Re-check GitHub Actions
needssemantics foralways()plus an upstream failed prerequisite leading to downstreamskipped. - The most direct fix is probably one of:
- add
detect-changestochecks.needsand explicitly requireneeds.detect-changes.result == 'success' - or treat unexpected
skippedtest jobs as fatal whendoc_only != true
- add
- Preserve the intended
[doc-only]skip behavior while making accidental skips fail closed.
2. Medium: diff baseline is hardcoded to origin/main
Evidence
detect-changescomputes the merge base againstorigin/mainat.github/workflows/ci.yml:118.- The repo also maintains a release/backport branch in
ci/versions.yml:4:backport_branch: "12.9.x"
Why this matters
- For PR 1908, whose base is
main, this is fine. - If the same
pull-request/*CI path is ever used for PRs targeting12.9.xor another non-mainbranch, changed-path classification will be wrong. - Wrong classification can suppress outputs such as
test_bindings, which means under-testing on backport / release PRs.
Agent hint
- First confirm whether
pull-request/*branches are ever created for non-mainPR bases in this repo. - If yes, derive the true PR base branch before
git merge-baseinstead of assumingmain. - Because this workflow runs on pushes to
pull-request/*,github.base_refmay not be available; one workable approach is to resolve the PR number fromgithub.ref_nameand querygh pr view --json baseRefName. - If non-
mainPR bases are impossible by construction, add a short comment documenting that invariant so future readers do not "fix" it incorrectly. - This area already had churn within the PR history, so the next agent should validate the actual branch-generation semantics before changing behavior.
Bottom line
- I did not find a concrete bug in the
USE_BACKPORT_BINDINGS/SKIP_CUDA_BINDINGS_TESTsplit itself. - The two findings above are the only items that looked important enough to hand back to the PR author.
copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless of whether the upstream PR targets main or a backport branch such as 12.9.x. Diffing against origin/main unconditionally therefore misclassifies changed paths on backport PRs and silently suppresses the cuda.bindings test matrix. Look up the real base ref via nv-gha-runners/get-pr-info (already used elsewhere in this repo) and pass it to git merge-base so the changed- paths classification matches the PR's actual target.
The checks job used if:always() plus shell-level result inspection of doc and the three test jobs, treating skipped as non-fatal to preserve intentional [doc-only] skips. detect-changes is a needs prerequisite of every test job but was absent from checks.needs, so a failure in the gating step silently cascaded into test-job skips and a green final status. Add detect-changes to checks.needs and require its result to be success. The legitimate doc-only skip path is unaffected because that leaves detect-changes itself successful.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
detect-changesjob toci.ymlusing nativegit merge-base+git diff --name-onlyto classify which top-level modules changed in a PRcuda_pathfinder → cuda_bindings → cuda_coreskip-bindings-testinput throughtest-wheel-linux.ymlandtest-wheel-windows.ymlso cuda.bindings tests are skipped when only unrelated modules changedSKIP_CUDA_BINDINGS_TESTenv var into two orthogonal flags:USE_BACKPORT_BINDINGS(artifact fetch for CTK major mismatch) andSKIP_CUDA_BINDINGS_TEST(test-time gate)Closes #299
🤖 Generated with Claude Code