perf(mcp): prefilter scoped search_code filelist by path_filter #819
Merged
Apply path_filter while writing the scoped filelist instead of only after grep, so large indexed projects do not scan files whose hits the post-grep filter would discard anyway.
The prefilter runs the SAME compiled regex via the same cbm_regexec call against the same root-relative path (Windows separators normalized first) as the post-grep filter in collect_grep_matches, which is kept as belt-and-suspenders. The change is results-preserving by construction.
Guard the edge the prefilter introduces: a path_filter matching zero indexed files now short-circuits with the normal empty result instead of invoking xargs on an empty filelist (GNU execs grep once with no operands, BSD skips).
Tests: positive guard (filter matching the hit's file still returns it, out-of-filter file excluded) and zero-match guard (clean empty response, grep subprocess skipped). Both are green pre-change too, since this is a perf-only change.
Distilled from #756 rebased onto the post-#751 filelist writer.
Co-authored-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Distilled from #756 (author: yutianyu111602-glitch), rebased onto the post-#751
filelist writer (NUL-delimited on Unix / newline on Windows, single fwrite loop).
What
Scoped search_code now applies path_filter while WRITING the indexed
filelist (write_scoped_filelist), instead of only after grep has scanned
every indexed file. On large indexed projects with a narrow path_filter,
grep no longer scans files whose hits the post-grep filter would discard
anyway.
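The write-time prefilter described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's write_scoped_filelist: the function name write_filelist_prefiltered and its parameters are hypothetical, POSIX regcomp/regexec stand in for the cbm_regcomp/cbm_regexec wrappers (whose signatures are not shown here), and only the NUL-delimited Unix record format is covered.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the write-time prefilter: writes only the paths
 * that match `filter` as NUL-delimited records and returns how many records
 * were written. A NULL filter means "no path_filter": every path is kept. */
static size_t write_filelist_prefiltered(FILE *out, const char *const *paths,
                                         size_t n, const regex_t *filter) {
    assert(out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        /* POSIX regexec returns 0 on a match; anything else is skipped. */
        if (filter && regexec(filter, paths[i], 0, NULL, 0) != 0) {
            continue; /* dropped up front, so grep never opens this file */
        }
        fwrite(paths[i], 1, strlen(paths[i]) + 1, out); /* +1 writes the NUL */
        written++;
    }
    return written;
}
```

Returning the record count is a design choice of this sketch: it lets the caller detect an empty filelist without re-reading the file.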
Results-preserving by construction
The prefilter predicate is IDENTICAL to the post-grep filter in
collect_grep_matches: the same compiled path_regex (cbm_regcomp with
CBM_REG_EXTENDED|CBM_REG_NOSUB) run via the same cbm_regexec call against
the same root-relative path string (Windows separators normalized with
cbm_normalize_path_sep first, mirroring the post-filter). It can only
remove filelist entries; nothing new reaches the shell. The post-grep filter
is kept as belt-and-suspenders (per #756).
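One way to picture the predicate-identity argument is a single helper that both filter sites would call, so the two verdicts cannot diverge. The sketch below is an assumption-laden illustration: path_passes_filter is a hypothetical name, POSIX regexec stands in for cbm_regexec, the inline backslash replacement stands in for cbm_normalize_path_sep, and the fixed 4096-byte buffer is a simplification of the sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical shared predicate: normalize Windows separators on a copy of
 * the root-relative path, then run the compiled regex against that copy. */
static bool path_passes_filter(const regex_t *filter, const char *rel_path) {
    assert(filter != NULL && rel_path != NULL);
    char buf[4096];
    size_t len = strlen(rel_path);
    if (len >= sizeof buf) {
        return false; /* sketch only: over-long paths count as non-matching */
    }
    memcpy(buf, rel_path, len + 1); /* copy including the terminating NUL */
    for (char *p = buf; *p; p++) {
        if (*p == '\\') {
            *p = '/'; /* same normalization before both filter stages */
        }
    }
    return regexec(filter, buf, 0, NULL, 0) == 0; /* 0 means "matched" */
}
```

Because the prefilter and the post-filter evaluate the same function on the same string, any path the prefilter drops is one the post-filter would have dropped as well.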
Empty-filelist guard
The prefilter introduces a previously unreachable edge: a path_filter
matching zero indexed files yields a 0-record filelist. xargs behavior on
empty input is platform-dependent (GNU execs grep once with no operands, BSD
skips).
handle_search_code now short-circuits: when the scoped filelist has 0
records, the grep subprocess is skipped and the normal empty result is
returned directly. Verified empirically: on main this edge cannot occur (the
full filelist is always written and the post-filter drops all hits), so this
is a guard for the new edge, not a bug fix on main.
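The short-circuit can be sketched as a guard in front of the subprocess call. Everything named here is hypothetical and only illustrates the control flow: search_result carries just the two counters mentioned in this PR, grep_runner is a callback standing in for "spawn xargs/grep over the filelist", and counting_runner exists purely to make the skipped call observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result shape with only the two counters named in the PR. */
typedef struct {
    int total_grep_matches;
    int total_results;
} search_result;

/* Hypothetical callback standing in for the xargs/grep subprocess. */
typedef search_result (*grep_runner)(void *ctx);

/* Guard: with zero filelist records, return the normal empty result and
 * never invoke the runner, so platform-specific xargs behavior is moot. */
static search_result search_scoped(size_t filelist_records,
                                   grep_runner run_grep, void *ctx) {
    assert(run_grep != NULL);
    if (filelist_records == 0) {
        search_result empty = {0, 0};
        return empty;
    }
    return run_grep(ctx);
}

/* Example runner that counts its invocations through ctx. */
static search_result counting_runner(void *ctx) {
    ++*(int *)ctx;
    search_result r = {1, 1};
    return r;
}
```

Calling search_scoped(0, counting_runner, &calls) leaves calls untouched and yields zeroed counters, which mirrors the "clean empty response, grep subprocess skipped" behavior the zero-match test checks.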
Tests (the original PR had none)
search_code_path_filter_prefilter_keeps_matches: positive invariant guard.
A path_filter matching the file containing the hit still returns that hit
(total_grep_matches == 1); the out-of-filter file is excluded. Green on main
AND with the fix: the predicate-identity / no-over-filtering invariant for a
perf-only change.
search_code_path_filter_matches_nothing: a path_filter matching zero indexed
files returns a clean zero-result response (no error,
total_grep_matches == 0, total_results == 0). Green on main too (grep runs
and the post-filter drops everything there); with the fix the grep
subprocess is skipped entirely (observable: 6ms → 0ms in the suite).
write_scoped_filelist has no existing count log line, and adding logging
just for a test was out of scope.
No linked issue, so no Closes.