Skip to content

CI: Detect changed paths and gate cuda.bindings tests#1908

Open
cpcloud wants to merge 7 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299
Open

CI: Detect changed paths and gate cuda.bindings tests#1908
cpcloud wants to merge 7 commits intoNVIDIA:mainfrom
cpcloud:ci-detect-changes-299

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Apr 14, 2026

Summary

  • Adds a detect-changes job to ci.yml using native git merge-base + git diff --name-only to classify which top-level modules changed in a PR
  • Composes gating outputs respecting the dependency graph: cuda_pathfinder → cuda_bindings → cuda_core
  • Threads skip-bindings-test input through test-wheel-linux.yml and test-wheel-windows.yml so cuda.bindings tests are skipped when only unrelated modules changed
  • Splits the overloaded SKIP_CUDA_BINDINGS_TEST env var into two orthogonal flags: USE_BACKPORT_BINDINGS (artifact fetch for CTK major mismatch) and SKIP_CUDA_BINDINGS_TEST (test-time gate)
  • Non-PR events (push to main, tag, schedule) unconditionally run everything
  • Build-side gating outputs are emitted but not yet consumed (follow-up PR)

Closes #299

🤖 Generated with Claude Code

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added enhancement Any code-related improvements P0 High priority - Must do! CI/CD CI/CD infrastructure labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
@github-actions
Copy link
Copy Markdown

cpcloud and others added 5 commits April 14, 2026 18:15
Add a detect-changes job to ci.yml that classifies which top-level
modules were touched by a PR (cuda_bindings, cuda_core, cuda_pathfinder,
cuda_python, cuda_python_test_helpers, shared infra) using
dorny/paths-filter. The job emits composed gating outputs that account
for the dependency graph (pathfinder -> bindings -> core).

Thread a new skip-bindings-test input through the reusable test-wheel
workflows so cuda.bindings tests (and their Cython counterparts) are
skipped when the detect-changes output for test_bindings is false. For
PRs that only touch cuda_core this skips the expensive bindings suite
while still running cuda.core and cuda.pathfinder tests against the
wheel built in the current CI run.

Split the existing SKIP_CUDA_BINDINGS_TEST env var in ci/tools/env-vars
into two orthogonal flags: USE_BACKPORT_BINDINGS drives the backport
branch download path (CTK major mismatch), while SKIP_CUDA_BINDINGS_TEST
remains the test-time gate. This lets path-filter-based skips reuse the
existing SKIP_CUDA_BINDINGS_TEST plumbing without triggering a
cross-branch artifact fetch.

Non-PR events (push to main, tag, schedule, workflow_dispatch) still
exercise the full pipeline.

Refs NVIDIA#299
Two fixes from code review:

1. Set `base: main` on dorny/paths-filter so backport PRs targeting
   non-default branches still diff against main for path detection.

2. Decouple SKIP_CYTHON_TEST from SKIP_BINDINGS_TEST_OVERRIDE. The
   path-filter override only skips bindings tests; cuda.core Cython
   tests should still run on core-only PRs. Cython skip is now driven
   solely by CTK minor-version mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dorny/paths-filter v3 already diffs against the repo default branch
for non-default-branch pushes. Explicit base: main was redundant and
would produce wrong baselines for backport PRs targeting release
branches (all files seen as changed vs main).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove third-party dorny/paths-filter action dependency. Use
git merge-base + git diff --name-only + grep instead. Same
behavior, zero supply-chain risk, full control over base ref
resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cpcloud cpcloud force-pushed the ci-detect-changes-299 branch from 67f8baf to d205955 Compare April 14, 2026 22:15
Copy link
Copy Markdown
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated with Cursor GPT-5.4 Extra High Fast


Reviewed the final tree for PR 1908 at commit d205955

actionlint passed on the touched workflows, so the notes below are about CI behavior / control flow, not YAML syntax.

1. High: detect-changes can fail without clearly failing the final required status

Evidence

  • All three test jobs now depend on detect-changes:
    • .github/workflows/ci.yml:284
    • .github/workflows/ci.yml:309
    • .github/workflows/ci.yml:334
  • The terminal checks job does not depend on detect-changes directly and only inspects:
    • doc
    • test-linux-64
    • test-linux-aarch64
    • test-windows
      at .github/workflows/ci.yml:365
  • The shell logic in checks only treats failure and cancelled as errors:
    • .github/workflows/ci.yml:394
    • .github/workflows/ci.yml:398

Why this matters

  • If detect-changes fails, the downstream test jobs can become skipped because a required dependency did not complete successfully.
  • In that state, checks may still exit 0 because it does not treat those skipped results as an error.
  • Result: a broken path-detection step could suppress or reduce CI coverage without producing a clearly failing final status.

Agent hint

  • Re-check GitHub Actions needs semantics for always() plus an upstream failed prerequisite leading to downstream skipped.
  • The most direct fix is probably one of:
    • add detect-changes to checks.needs and explicitly require needs.detect-changes.result == 'success'
    • or treat unexpected skipped test jobs as fatal when doc_only != true
  • Preserve the intended [doc-only] skip behavior while making accidental skips fail closed.

2. Medium: diff baseline is hardcoded to origin/main

Evidence

  • detect-changes computes the merge base against origin/main at .github/workflows/ci.yml:118.
  • The repo also maintains a release/backport branch in ci/versions.yml:4:
    • backport_branch: "12.9.x"

Why this matters

  • For PR 1908, whose base is main, this is fine.
  • If the same pull-request/* CI path is ever used for PRs targeting 12.9.x or another non-main branch, changed-path classification will be wrong.
  • Wrong classification can suppress outputs such as test_bindings, which means under-testing on backport / release PRs.

Agent hint

  • First confirm whether pull-request/* branches are ever created for non-main PR bases in this repo.
  • If yes, derive the true PR base branch before git merge-base instead of assuming main.
  • Because this workflow runs on pushes to pull-request/*, github.base_ref may not be available; one workable approach is to resolve the PR number from github.ref_name and query gh pr view --json baseRefName.
  • If non-main PR bases are impossible by construction, add a short comment documenting that invariant so future readers do not "fix" it incorrectly.
  • This area already had churn within the PR history, so the next agent should validate the actual branch-generation semantics before changing behavior.

Bottom line

  • I did not find a concrete bug in the USE_BACKPORT_BINDINGS / SKIP_CUDA_BINDINGS_TEST split itself.
  • The two findings above are the only items that looked important enough to hand back to the PR author.

cpcloud added 2 commits April 16, 2026 12:49
copy-pr-bot mirrors every PR to a pull-request/<N> branch regardless
of whether the upstream PR targets main or a backport branch such as
12.9.x. Diffing against origin/main unconditionally therefore
misclassifies changed paths on backport PRs and silently suppresses
the cuda.bindings test matrix.

Look up the real base ref via nv-gha-runners/get-pr-info (already used
elsewhere in this repo) and pass it to git merge-base so the changed-
paths classification matches the PR's actual target.
The checks job used if:always() plus shell-level result inspection of
doc and the three test jobs, treating skipped as non-fatal to preserve
intentional [doc-only] skips. detect-changes is a needs prerequisite
of every test job but was absent from checks.needs, so a failure in
the gating step silently cascaded into test-job skips and a green
final status.

Add detect-changes to checks.needs and require its result to be
success. The legitimate doc-only skip path is unaffected because that
leaves detect-changes itself successful.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD CI/CD infrastructure enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: Detect if build/test is needed

2 participants