perf(mcp): prefilter scoped search_code filelist by path_filter (#819)

Merged
DeusData merged 1 commit into main from distill/756-search-prefilter
Jul 3, 2026
@DeusData DeusData commented Jul 3, 2026

Owner

perf(mcp): prefilter scoped search_code filelist by path_filter

Distilled from #756 (author: yutianyu111602-glitch), rebased onto the post-#751
filelist writer (NUL-delimited on Unix / newline on Windows, single fwrite loop).

What

Scoped search_code now applies path_filter while WRITING the indexed
filelist (write_scoped_filelist), instead of only after grep has scanned
every indexed file. On large indexed projects with a narrow path_filter,
grep no longer scans files whose hits the post-grep filter would discard
anyway.
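
A minimal sketch of the write-time prefilter, not the code from this PR: `write_filtered_filelist` is a hypothetical stand-in for write_scoped_filelist, POSIX `regcomp`/`regexec` stand in for the cbm_* wrappers, and the path array and delimiter parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for write_scoped_filelist.
 * Writes only the paths that match `filter` (all paths when filter is NULL),
 * one record per path, each terminated by `delim` ('\0' for a NUL-delimited
 * list, '\n' for a newline-delimited one).
 * Returns the number of records written, or (size_t)-1 on a write error. */
size_t write_filtered_filelist(FILE *out, const char *const *paths, size_t n,
                               const regex_t *filter, char delim)
{
    assert(out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        /* Prefilter: skip paths whose hits a later filter would drop anyway. */
        if (filter != NULL && regexec(filter, paths[i], 0, NULL, 0) != 0)
            continue;
        size_t len = strlen(paths[i]);
        if (fwrite(paths[i], 1, len, out) != len || fputc(delim, out) == EOF)
            return (size_t)-1;
        written++;
    }
    return written;
}
```

Returning the record count lets the caller detect an empty filelist without reading the file back.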

Results-preserving by construction

The prefilter predicate is IDENTICAL to the post-grep filter in
collect_grep_matches: the same compiled path_regex (cbm_regcomp with
CBM_REG_EXTENDED|CBM_REG_NOSUB) run via the same cbm_regexec call against
the same root-relative path string (Windows separators normalized with
cbm_normalize_path_sep first, mirroring the post-filter). It can only
remove filelist entries; nothing new reaches the shell. The post-grep filter
is kept as belt-and-suspenders (per #756).
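
To illustrate why one shared predicate makes the prefilter results-preserving, here is a hedged sketch. `normalize_sep` and `path_passes_filter` are hypothetical names, POSIX `regexec` stands in for cbm_regexec, and the fixed buffer size and the rejection of over-long paths are assumptions rather than the project's behavior.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for cbm_normalize_path_sep: rewrites '\\' to '/'. */
void normalize_sep(char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\\')
            *s = '/';
    }
}

/* One predicate for both call sites. If the prefilter (filelist writer) and
 * the post-filter (grep match collector) both call this with the same
 * compiled regex and the same root-relative path, the prefilter can only
 * drop files whose hits the post-filter would have dropped as well. */
bool path_passes_filter(const regex_t *filter, const char *rel_path)
{
    assert(filter != NULL && rel_path != NULL);
    char buf[4096];
    if (strlen(rel_path) >= sizeof buf)
        return false; /* assumption: over-long paths are rejected */
    strcpy(buf, rel_path);
    normalize_sep(buf); /* normalize before matching, as the post-filter does */
    return regexec(filter, buf, 0, NULL, 0) == 0;
}
```

Because normalization happens inside the predicate, a Windows-style path such as `src\mcp\search.c` is matched in the same form at both call sites.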

Empty-filelist guard

The prefilter introduces a previously unreachable edge: a path_filter
matching zero indexed files yields a 0-record filelist. xargs behavior on
empty input is platform-dependent (GNU execs grep once with no operands, BSD
skips). handle_search_code now short-circuits: when the scoped filelist has
0 records, the grep subprocess is skipped and the normal empty result is
returned directly. Verified empirically: on main this edge cannot occur (the
full filelist is always written and the post-filter drops all hits), so this
is a guard for the new edge, not a bug fix on main.
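
The guard can be sketched roughly as follows. This is an illustration under assumptions, not the body of handle_search_code: `search_scoped`, `search_result`, `grep_runner`, and `fake_grep` are hypothetical names, and the grep subprocess is modeled as a callback so the skip is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback that models launching the grep subprocess and
 * returning the number of matches it produced. */
typedef int (*grep_runner)(void *ctx);

typedef struct {
    int  total_grep_matches;
    bool grep_invoked;
} search_result;

/* Short-circuit: with 0 filelist records there is nothing to scan, so the
 * subprocess is never started and the normal empty result is returned. */
search_result search_scoped(size_t filelist_records, grep_runner run_grep,
                            void *ctx)
{
    search_result r = {0, false};
    if (filelist_records == 0)
        return r;
    assert(run_grep != NULL);
    r.grep_invoked = true;
    r.total_grep_matches = run_grep(ctx);
    return r;
}

/* Demo stub: counts invocations through ctx and reports one match. */
int fake_grep(void *ctx)
{
    (*(int *)ctx)++;
    return 1;
}
```

Checking the record count before spawning anything keeps the result independent of how the platform's xargs treats empty input.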

Tests (the original PR had none)

  • search_code_path_filter_prefilter_keeps_matches: positive invariant
    guard. A path_filter matching the file that contains the hit still
    returns that hit (total_grep_matches == 1), and the out-of-filter file
    is excluded. Green on main AND with the fix; this is the
    predicate-identity / no-over-filtering invariant for a perf-only change.
  • search_code_path_filter_matches_nothing: a path_filter matching zero
    indexed files returns a clean zero-result response (no error,
    total_grep_matches == 0, total_results == 0). Green on main too
    (grep runs and the post-filter drops everything there); with the fix the
    grep subprocess is skipped entirely (observable: 6ms → 0ms in the suite).
  • A prefilter write-count log assertion was considered and skipped:
    write_scoped_filelist has no existing count log line, and adding logging
    just for a test was out of scope.

No linked issue — no Closes.

Apply path_filter while writing the scoped filelist instead of only after
grep, so large indexed projects do not scan files whose hits the post-grep
filter would discard anyway. The prefilter runs the SAME compiled regex via
the same cbm_regexec call against the same root-relative path (Windows
separators normalized first) as the post-grep filter in collect_grep_matches,
which is kept as belt-and-suspenders — results-preserving by construction.

Guard the edge the prefilter introduces: a path_filter matching zero indexed
files now short-circuits with the normal empty result instead of invoking
xargs on an empty filelist (GNU execs grep once with no operands, BSD skips).

Tests: positive guard (filter matching the hit's file still returns it,
out-of-filter file excluded) and zero-match guard (clean empty response,
grep subprocess skipped). Both green pre-change too — perf-only change.

Distilled from #756 rebased onto the post-#751 filelist writer.

Co-authored-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
@DeusData DeusData enabled auto-merge July 3, 2026 22:38
@DeusData DeusData merged commit 185451a into main Jul 3, 2026
15 checks passed