fix(pipeline): honor discovery exclusions in pkgmap/path-alias/envscan walks#811
Open
DeusData wants to merge 3 commits into
Open
fix(pipeline): honor discovery exclusions in pkgmap/path-alias/envscan walks#811DeusData wants to merge 3 commits into
DeusData wants to merge 3 commits into
Conversation
…n walks The auxiliary filesystem walks (pkgmap manifest scan, tsconfig/jsconfig path-alias discovery, env-URL scan) ignored the directory subtrees that discovery had already excluded (gitignore + skip dirs). On a huge monorepo with a gitignored vendored tree this kept the pkgmap walk busy for ~15 minutes re-traversing directories the index never uses (#792). Distills PR #793 by Patrick (@Sowiedu): a shared root-anchored, '/'-boundary exclusion predicate (cbm_pipeline_relpath_is_excluded) in pipeline_internal.h, exclusion parameters threaded through cbm_pkgmap_scan_repo / cbm_pkgmap_build_from_repo, find_alias_files / cbm_load_path_aliases_excluded and cbm_scan_project_env_urls_excluded, NULL-exclusion wrappers preserving the old signatures, and the borrowed excluded_dirs/excluded_count pair on cbm_pipeline_ctx_t. Beyond #793: - Thread the excluded list into pipeline_incremental.c's extract ctx and its path-alias load as well, so incremental runs stop walking excluded trees too (#804). - Add regression tests: boundary semantics of the exclusion predicate, plus walk-level exclusion tests for pkgmap, path-alias, and envscan (each with an unexcluded control run so they cannot pass vacuously). All are red against the unfixed sources. - Correct the envscan narrative: cbm_scan_project_env_urls has no production callers today; the exclusion plumbing is kept for consistency and exercised by tests. Closes #792. Closes #804. Co-authored-by: Patrick <11910229+Sowiedu@users.noreply.github.com> Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(pipeline): honor discovery exclusions in pkgmap/path-alias/envscan walks
Summary
The auxiliary filesystem walks — the pkgmap manifest scan (
cbm_pkgmap_scan_repo), the tsconfig/jsconfig path-alias discovery (find_alias_files), and the env-URL scan — ignored the directory subtrees that discovery had already excluded (gitignore matches + hardcoded skip dirs). On a huge monorepo with a gitignored vendored tree this kept the pkgmap walk busy for ~15 minutes re-traversing directories the index never uses (#792, priority/high).This distills PR #793 by Patrick (@Sowiedu) — thank you! The mechanism is applied as-is:
cbm_pipeline_relpath_is_excluded(static inline,pipeline_internal.h): root-anchored prefix match with a/-boundary check, sovendor_bigexcludesvendor_bigandvendor_big/lib/...but nevervendor_biggerorsrc/vendor_big.cbm_pkgmap_scan_repo/cbm_pkgmap_build_from_repo,find_alias_files/cbm_load_path_aliases_excluded, andcbm_scan_project_env_urls_excluded.cbm_pipeline_ctx_tgains a borrowedexcluded_dirs/excluded_countpair, populated from the pipeline's discovery results.Added beyond #793
pipeline_incremental.c's extract ctx did not carryexcluded_dirs, so on any incremental run with changed files,cbm_parallel_extract → merge_pkg_entries → cbm_pkgmap_scan_repostill walked the repo unexcluded — the pkgmap/envscan repo walks bypass gitignore exclusions - 15+ min apparent hang on repos with large ignored artifact dirs #792 walk survived on real incremental reindexes. The excluded list (viacbm_pipeline_get_excluded, populated by the discovery pass that routed to the incremental path) is now threaded into the incremental ctx and its path-alias load, the same borrow as the full path.pipeline_relpath_excluded_boundary— boundary semantics of the shared predicate (exact match, subtree,/-boundary vs. shared-prefix siblings, root anchoring, NULL/empty safety).pkgmap_scan_repo_honors_discovery_exclusions— fixture repo with a control manifest and one inside an excluded subtree; the excluded manifest is not ingested, the control is.envscan_walk_honors_discovery_exclusions— same shape for the env-URL walk.path_alias_loader_honors_discovery_exclusions— same shape for the tsconfig loader.cbm_scan_project_env_urls) has no production callers today — it is exercised only by tests. Its header comment now says so; the exclusion plumbing is kept for consistency with the pkgmap/path-alias walks.Red-first evidence
With the test additions kept and all
src/changes stashed (git stash push -- src/), the test build fails against the unfixed sources:With the fix restored, the pkgmap test's own log lines show the behavior directly: control run
pkgmap.scan_repo manifests=2, excluded runmanifests=1.Verification
make -f Makefile.cbm cbm—-Werrorclean.make -f Makefile.cbm lint-ci— cppcheck + clang-format + NOLINT check pass../build/c/test-runner pipeline path_alias(ASan/UBSan build) — 224 passed, 0 failed../build/c/test-runner incremental— 161 passed, 0 failed.cbm_pipeline_ctx_tconstruction sites (pipeline.c:1269,pipeline_incremental.c:820) carry.excluded_dirs/.excluded_count.Closes #792. Closes #804.