fix(pipeline): honor discovery exclusions in pkgmap/path-alias/envscan walks#811

Open
DeusData wants to merge 3 commits into main from distill/793-walk-exclusions

DeusData commented Jul 3, 2026

Summary

The auxiliary filesystem walks — the pkgmap manifest scan (cbm_pkgmap_scan_repo), the tsconfig/jsconfig path-alias discovery (find_alias_files), and the env-URL scan — ignored the directory subtrees that discovery had already excluded (gitignore matches + hardcoded skip dirs). On a huge monorepo with a gitignored vendored tree this kept the pkgmap walk busy for ~15 minutes re-traversing directories the index never uses (#792, priority/high).

This distills PR #793 by Patrick (@Sowiedu) — thank you! The mechanism is applied as-is:

  • New shared predicate cbm_pipeline_relpath_is_excluded (static inline, pipeline_internal.h): root-anchored prefix match with a /-boundary check, so vendor_big excludes vendor_big and vendor_big/lib/... but never vendor_bigger or src/vendor_big.
  • Exclusion parameters threaded through cbm_pkgmap_scan_repo / cbm_pkgmap_build_from_repo, find_alias_files / cbm_load_path_aliases_excluded, and cbm_scan_project_env_urls_excluded.
  • NULL-exclusion wrappers preserve the old signatures for existing callers.
  • cbm_pipeline_ctx_t gains a borrowed excluded_dirs/excluded_count pair, populated from the pipeline's discovery results.
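For readers checking the boundary rule, here is a minimal C sketch of a root-anchored, '/'-boundary prefix predicate. It is not the code from pipeline_internal.h: the name relpath_is_excluded, its (rel, excluded, excluded_count) signature, and the choice to ignore empty entries are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for cbm_pipeline_relpath_is_excluded; the real
 * signature in pipeline_internal.h may differ. `rel` is a repo-relative
 * path with '/' separators; `excluded` holds repo-relative directories. */
static inline bool relpath_is_excluded(const char *rel,
                                       const char *const *excluded,
                                       size_t excluded_count) {
    if (rel == NULL || rel[0] == '\0' || excluded == NULL) {
        return false; /* NULL/empty safety: nothing is excluded */
    }
    for (size_t i = 0; i < excluded_count; i++) {
        const char *dir = excluded[i];
        if (dir == NULL || dir[0] == '\0') {
            continue; /* assumption: an empty entry must not exclude everything */
        }
        size_t len = strlen(dir);
        /* Root-anchored: the prefix must start at rel[0], never mid-path. */
        if (strncmp(rel, dir, len) != 0) {
            continue;
        }
        /* '/'-boundary: exact match, or the prefix ends at a separator. */
        if (rel[len] == '\0' || rel[len] == '/') {
            return true;
        }
    }
    return false;
}

/* Usage: the cases named in the description. */
static void relpath_boundary_examples(void) {
    const char *const excluded[] = {"vendor_big"};
    assert(relpath_is_excluded("vendor_big", excluded, 1));          /* exact */
    assert(relpath_is_excluded("vendor_big/lib/a.js", excluded, 1)); /* subtree */
    assert(!relpath_is_excluded("vendor_bigger", excluded, 1));      /* sibling */
    assert(!relpath_is_excluded("src/vendor_big", excluded, 1));     /* anchored */
    assert(!relpath_is_excluded(NULL, excluded, 1));
    assert(!relpath_is_excluded("vendor_big", NULL, 0));
}
```

The '/'-boundary test is what separates vendor_big/lib from vendor_bigger: both share the prefix, but only the first has a separator right after it.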

Added beyond #793

  1. Incremental plumbing (#804). pipeline_incremental.c's extract ctx did not carry excluded_dirs, so on any incremental run with changed files, cbm_parallel_extract → merge_pkg_entries → cbm_pkgmap_scan_repo still walked the repo unexcluded, and the #792 walk survived on real incremental reindexes. The excluded list (obtained via cbm_pipeline_get_excluded and populated by the discovery pass that routed to the incremental path) is now threaded into the incremental ctx and its path-alias load, using the same borrow as the full path.
  2. Regression tests (the PR had none; repo rule is reproduce-first):
    • pipeline_relpath_excluded_boundary — boundary semantics of the shared predicate (exact match, subtree, /-boundary vs. shared-prefix siblings, root anchoring, NULL/empty safety).
    • pkgmap_scan_repo_honors_discovery_exclusions — fixture repo with a control manifest and one inside an excluded subtree; the excluded manifest is not ingested, the control is.
    • envscan_walk_honors_discovery_exclusions — same shape for the env-URL walk.
    • path_alias_loader_honors_discovery_exclusions — same shape for the tsconfig loader.
    • Every test runs an unexcluded control first (both fixtures found) so the exclusion assertion cannot pass vacuously.
  3. Narrative correction. The envscan walker (cbm_scan_project_env_urls) has no production callers today — it is exercised only by tests. Its header comment now says so; the exclusion plumbing is kept for consistency with the pkgmap/path-alias walks.
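The walk-level change can be pictured as pruning at the directory, before descending. The sketch below is hypothetical throughout: it walks an in-memory fixture (struct entry) instead of the filesystem, and scan_repo_excluded / scan_repo only mimic the shape of an exclusion-aware scan plus its NULL-exclusion wrapper. Its usage check follows the test pattern listed above: an unexcluded control run first, then the excluded run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Root-anchored, '/'-boundary exclusion check (condensed). */
static bool is_excluded(const char *rel, const char *const *ex, size_t nex) {
    if (rel == NULL || ex == NULL) return false;
    for (size_t i = 0; i < nex; i++) {
        size_t len = ex[i] ? strlen(ex[i]) : 0;
        if (len > 0 && strncmp(rel, ex[i], len) == 0 &&
            (rel[len] == '\0' || rel[len] == '/')) {
            return true;
        }
    }
    return false;
}

/* Hypothetical in-memory fixture standing in for the real filesystem. */
struct entry {
    const char *name;
    bool is_dir;
    const struct entry *kids;
    size_t nkids;
};

struct scan_stats {
    size_t manifests;    /* package.json files ingested */
    size_t dirs_visited; /* directories actually descended into */
};

static void walk(const struct entry *items, size_t n, const char *prefix,
                 const char *const *ex, size_t nex, struct scan_stats *st) {
    for (size_t i = 0; i < n; i++) {
        char rel[256];
        snprintf(rel, sizeof rel, "%s%s%s", prefix, prefix[0] ? "/" : "",
                 items[i].name);
        if (items[i].is_dir) {
            if (is_excluded(rel, ex, nex)) {
                continue; /* prune: the excluded subtree is never entered */
            }
            st->dirs_visited++;
            walk(items[i].kids, items[i].nkids, rel, ex, nex, st);
        } else if (strcmp(items[i].name, "package.json") == 0) {
            st->manifests++;
        }
    }
}

/* Hypothetical scan with the exclusion list threaded through. */
static struct scan_stats scan_repo_excluded(const struct entry *root, size_t n,
                                            const char *const *ex, size_t nex) {
    struct scan_stats st = {0, 0};
    walk(root, n, "", ex, nex, &st);
    return st;
}

/* NULL-exclusion wrapper preserving the old, exclusion-free signature. */
static struct scan_stats scan_repo(const struct entry *root, size_t n) {
    return scan_repo_excluded(root, n, NULL, 0);
}

/* Fixture: one control manifest at the root, one inside vendor_big/lib. */
static const struct entry vendor_lib[] = {{"package.json", false, NULL, 0}};
static const struct entry vendor_big[] = {{"lib", true, vendor_lib, 1}};
static const struct entry fixture[] = {
    {"package.json", false, NULL, 0},
    {"vendor_big", true, vendor_big, 1},
};

static void control_then_excluded(void) {
    /* Control first: both manifests are found, so the next check is not vacuous. */
    struct scan_stats control = scan_repo(fixture, 2);
    assert(control.manifests == 2);

    const char *const ex[] = {"vendor_big"};
    struct scan_stats pruned = scan_repo_excluded(fixture, 2, ex, 1);
    assert(pruned.manifests == 1);
    assert(pruned.dirs_visited == 0);
}
```

Pruning at the directory is the point: in the excluded run dirs_visited stays at 0, so a large ignored subtree costs one predicate call instead of a full traversal.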

Red-first evidence

With the test additions kept and all src/ changes stashed (git stash push -- src/), the test build fails against the unfixed sources:

tests/test_pipeline.c:4801:17: error: call to undeclared function 'cbm_pipeline_relpath_is_excluded'
tests/test_pipeline.c:4862:44: error: too many arguments to function call, expected 2, have 4   (cbm_pkgmap_scan_repo)
tests/test_pipeline.c:4907:13: error: call to undeclared function 'cbm_scan_project_env_urls_excluded'
tests/test_path_alias.c:374:12: error: call to undeclared function 'cbm_load_path_aliases_excluded'

With the fix restored, the pkgmap test's own log lines show the behavior directly: control run pkgmap.scan_repo manifests=2, excluded run manifests=1.

Verification

  • make -f Makefile.cbm cbm-Werror builds clean (no warnings).
  • make -f Makefile.cbm lint-ci — cppcheck + clang-format + NOLINT check pass.
  • ./build/c/test-runner pipeline path_alias (ASan/UBSan build) — 224 passed, 0 failed.
  • ./build/c/test-runner incremental — 161 passed, 0 failed.
  • Grep-verified both cbm_pipeline_ctx_t construction sites (pipeline.c:1269, pipeline_incremental.c:820) carry .excluded_dirs/.excluded_count.
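The borrowed excluded_dirs/excluded_count pair on the ctx can be sketched with designated initializers at both construction sites. Apart from those two field names, everything here (discovery_result_t, pipeline_ctx_t, make_full_ctx, make_incremental_ctx) is a hypothetical stand-in, not the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical discovery result that owns the excluded-directory list. */
typedef struct {
    const char **excluded;
    size_t excluded_count;
} discovery_result_t;

/* Hypothetical stand-in for cbm_pipeline_ctx_t: only the field names
 * excluded_dirs/excluded_count come from the description. The pair is
 * borrowed: the ctx never frees it and must not outlive the discovery result. */
typedef struct {
    const char *repo_root;
    const char *const *excluded_dirs;
    size_t excluded_count;
} pipeline_ctx_t;

/* Full-pipeline construction site. */
static pipeline_ctx_t make_full_ctx(const char *root, const discovery_result_t *d) {
    return (pipeline_ctx_t){
        .repo_root = root,
        .excluded_dirs = d->excluded,
        .excluded_count = d->excluded_count,
    };
}

/* Incremental construction site, carrying the same borrowed pair. */
static pipeline_ctx_t make_incremental_ctx(const char *root,
                                           const discovery_result_t *d) {
    return (pipeline_ctx_t){
        .repo_root = root,
        .excluded_dirs = d->excluded,
        .excluded_count = d->excluded_count,
    };
}

static void both_sites_share_the_borrow(void) {
    const char *dirs[] = {"vendor_big", "node_modules"};
    discovery_result_t disc = {dirs, 2};
    pipeline_ctx_t full = make_full_ctx("/repo", &disc);
    pipeline_ctx_t incr = make_incremental_ctx("/repo", &disc);
    /* Same pointer, no copy: both contexts borrow the discovery-owned list. */
    assert(full.excluded_dirs == incr.excluded_dirs);
    assert(incr.excluded_count == 2);
}
```

Because omitted members of a designated initializer are zero-initialized, a construction site that forgets the pair silently gets NULL/0, i.e. no exclusions, which is the failure mode the grep check guards against.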

Closes #792. Closes #804.

DeusData and others added 3 commits July 3, 2026 19:57
…n walks

The auxiliary filesystem walks (pkgmap manifest scan, tsconfig/jsconfig
path-alias discovery, env-URL scan) ignored the directory subtrees that
discovery had already excluded (gitignore + skip dirs). On a huge
monorepo with a gitignored vendored tree this kept the pkgmap walk busy
for ~15 minutes re-traversing directories the index never uses (#792).

Distills PR #793 by Patrick (@Sowiedu): a shared root-anchored,
'/'-boundary exclusion predicate (cbm_pipeline_relpath_is_excluded) in
pipeline_internal.h, exclusion parameters threaded through
cbm_pkgmap_scan_repo / cbm_pkgmap_build_from_repo, find_alias_files /
cbm_load_path_aliases_excluded and cbm_scan_project_env_urls_excluded,
NULL-exclusion wrappers preserving the old signatures, and the borrowed
excluded_dirs/excluded_count pair on cbm_pipeline_ctx_t.

Beyond #793:
- Thread the excluded list into pipeline_incremental.c's extract ctx
  and its path-alias load as well, so incremental runs stop walking
  excluded trees too (#804).
- Add regression tests: boundary semantics of the exclusion predicate,
  plus walk-level exclusion tests for pkgmap, path-alias, and envscan
  (each with an unexcluded control run so they cannot pass vacuously).
  All are red against the unfixed sources.
- Correct the envscan narrative: cbm_scan_project_env_urls has no
  production callers today; the exclusion plumbing is kept for
  consistency and exercised by tests.

Closes #792. Closes #804.

Co-authored-by: Patrick <11910229+Sowiedu@users.noreply.github.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
DeusData enabled auto-merge July 3, 2026 22:38