fix(mcp): prefilter scoped search_code paths #756

Closed
yutianyu111602-glitch wants to merge 1 commit into DeusData:main from yutianyu111602-glitch:codex/search-code-path-filter-dco
Conversation

@yutianyu111602-glitch

Contributor

Summary

  • Apply path_filter while writing the indexed file list used by scoped search_code grep.
  • Keep the existing post-grep path filter as a defensive fallback.
  • Normalize indexed Windows paths before regex matching so filter semantics stay aligned with result paths.
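
As a rough illustration of the three points above, the prefilter could look something like the following. This is only a sketch under stated assumptions: it uses POSIX `regexec` in place of the project's own regex wrapper, the helper names `normalize_path` and `write_filtered_filelist` are hypothetical, and it assumes the normalized form is what gets written to the file list, which may not match the real writer.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `path` into `out`, turning '\\' into '/'.
 * Returns 0 on success, -1 if `out` is too small. */
int normalize_path(const char *path, char *out, size_t out_size) {
    size_t len = strlen(path);
    if (len + 1 > out_size) {
        return -1;
    }
    /* i <= len so the terminating NUL is copied as well. */
    for (size_t i = 0; i <= len; i++) {
        out[i] = (path[i] == '\\') ? '/' : path[i];
    }
    return 0;
}

/* Hypothetical helper: write only the paths that match `filter` to `out`,
 * one per line. A NULL filter keeps every path. Paths too long for the
 * local buffer are skipped in this sketch. Returns the number written. */
size_t write_filtered_filelist(FILE *out, const char *const *paths,
                               size_t n_paths, const regex_t *filter) {
    size_t written = 0;
    char norm[4096];

    for (size_t i = 0; i < n_paths; i++) {
        if (normalize_path(paths[i], norm, sizeof norm) != 0) {
            continue;
        }
        /* Same regex, same normalized string as a post-grep filter would
         * see, so a skipped file could never have produced a kept hit. */
        if (filter != NULL && regexec(filter, norm, 0, NULL, 0) != 0) {
            continue;
        }
        fputs(norm, out);
        fputc('\n', out);
        written++;
    }
    return written;
}
```

Returning the number of entries written lets the caller detect an empty file list before any subprocess is started.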

Verification

  • git diff --check origin/main...HEAD
  • wsl.exe bash -lc 'cd /mnt/c/code/githubstar/codebase-memory-mcp && make -f Makefile.cbm cbm -j2'

Signed-off-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
@DeusData added the bug, ux/behavior, and priority/normal labels Jul 3, 2026
@DeusData

DeusData commented Jul 3, 2026

Owner

Thanks for the scoped search_code path-filter fix. Triage: UX/search correctness bug.

This PR is currently conflicting, so the first step is a rebase on current main. After that, review will check that prefiltering and the existing post-filter agree across POSIX and Windows-normalized paths, and that scoped searches do not accidentally widen access.

@DeusData

DeusData commented Jul 3, 2026

Owner

Thanks — we verified this is correct-by-construction: the prefilter applies the same compiled regex to the same relative path string as the post-grep filter, so it can only remove work, never change results, and nothing new reaches the shell. Two asks: (1) rebase onto current main — the filelist loop moved under 3dd2471 (#751), so your filter-continue now belongs at the top of the fwrite loop; (2) two small tests: a scoped search whose hit survives the prefilter, and a path_filter matching zero indexed files returning a clean zero result (that second one grazes the old 'grep reads stdin on empty input' class and deserves an explicit guard). Then we're happy to merge — this is genuinely useful on large indexed projects. Thanks!


Update: to keep momentum on the bug backlog, we're going to carry the changes above over the line ourselves shortly — a distilled follow-up on current main implementing the notes in this thread, with you credited as Co-authored-by on the commit, and this PR closed referencing it. If you'd prefer to push the update yourself, just reply within the next couple of days and we'll gladly take yours instead. Thanks again for the contribution!

@DeusData

DeusData commented Jul 3, 2026

Owner

Thank you for this — the prefilter idea was correct-by-construction (same compiled regex, same relative path as the post-grep filter, so it can only remove work, never change results), and it's genuinely valuable on large indexed projects. Since the branch predated the #751 filelist rewrite and had no tests, we carried it over the line in a distilled follow-up rebased onto the new writer, with regression tests and a deterministic empty-filelist guard (grep now skipped entirely when the prefilter yields zero entries — measured 6ms→0ms), crediting you as co-author on the commit. Merged as 185451a (PR #819). Closing this one in its favor — thanks again for the contribution!
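
For readers following along, the empty-filelist guard described above might be sketched roughly as below. The names `grep_runner_fn`, `run_scoped_grep`, and `counting_runner` are hypothetical and not taken from the merged commit; the stub runner stands in for the real grep subprocess so the control flow can be seen in isolation.

```c
#include <stddef.h>

/* Hypothetical callback type: runs grep over the file list at
 * `filelist_path` and returns the number of matches found. */
typedef int (*grep_runner_fn)(const char *filelist_path, void *ctx);

/* Hypothetical guard: when the prefilter left zero entries, return the
 * normal empty result without starting any subprocess. Otherwise
 * delegate to the supplied runner. */
int run_scoped_grep(size_t filelist_entries, const char *filelist_path,
                    grep_runner_fn run_grep, void *ctx) {
    if (filelist_entries == 0) {
        return 0; /* deterministic empty result, grep never invoked */
    }
    return run_grep(filelist_path, ctx);
}

/* Illustrative stub: counts invocations through `ctx` instead of
 * spawning grep, and pretends one match was found. */
int counting_runner(const char *filelist_path, void *ctx) {
    (void)filelist_path;
    int *calls = ctx;
    (*calls)++;
    return 1;
}
```

Deciding on the entry count rather than on the subprocess output keeps the result independent of how a given `xargs` treats empty input.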

@DeusData DeusData closed this Jul 3, 2026
SyntaxSawdust pushed a commit to SyntaxSawdust/codebase-memory-mcp that referenced this pull request Jul 3, 2026
Apply path_filter while writing the scoped filelist instead of only after
grep, so large indexed projects do not scan files whose hits the post-grep
filter would discard anyway. The prefilter runs the SAME compiled regex via
the same cbm_regexec call against the same root-relative path (Windows
separators normalized first) as the post-grep filter in collect_grep_matches,
which is kept as belt-and-suspenders — results-preserving by construction.

Guard the edge the prefilter introduces: a path_filter matching zero indexed
files now short-circuits with the normal empty result instead of invoking
xargs on an empty filelist (GNU execs grep once with no operands, BSD skips).

Tests: positive guard (filter matching the hit's file still returns it,
out-of-filter file excluded) and zero-match guard (clean empty response,
grep subprocess skipped). Both green pre-change too — perf-only change.

Distilled from DeusData#756 rebased onto the post-DeusData#751 filelist writer.

Co-authored-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>