fix(mcp): prefilter scoped search_code paths #756
Signed-off-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
Thanks for the scoped This PR is currently conflicting, so the first step is a rebase on current
Thanks, we verified this is correct-by-construction: the prefilter applies the same compiled regex to the same relative path string as the post-grep filter, so it can only remove work, never change results, and nothing new reaches the shell.
Two asks:
(1) Rebase onto current main. The filelist loop moved under 3dd2471 (#751), so your filter-continue now belongs at the top of the fwrite loop.
(2) Add two small tests: a scoped search whose hit survives the prefilter, and a path_filter matching zero indexed files returning a clean zero result (that second one grazes the old 'grep reads stdin on empty input' class and deserves an explicit guard).
Then we're happy to merge; this is genuinely useful on large indexed projects. Thanks!
Update: to keep momentum on the bug backlog, we're going to carry the changes above over the line ourselves shortly, as a distilled follow-up on current main implementing the notes in this thread, with you credited as
Thank you for this. The prefilter idea was correct-by-construction (same compiled regex, same relative path as the post-grep filter, so it can only remove work, never change results), and it's genuinely valuable on large indexed projects.
Since the branch predated the #751 filelist rewrite and had no tests, we carried it over the line in a distilled follow-up rebased onto the new writer, with regression tests and a deterministic empty-filelist guard (grep is now skipped entirely when the prefilter yields zero entries; measured 6ms→0ms), crediting you as co-author on the commit.
Merged as 185451a (PR #819). Closing this one in its favor. Thanks again for the contribution!
Apply path_filter while writing the scoped filelist instead of only after grep, so large indexed projects do not scan files whose hits the post-grep filter would discard anyway.

The prefilter runs the SAME compiled regex via the same cbm_regexec call against the same root-relative path (Windows separators normalized first) as the post-grep filter in collect_grep_matches, which is kept as belt-and-suspenders. The change is results-preserving by construction.

Guard the edge the prefilter introduces: a path_filter matching zero indexed files now short-circuits with the normal empty result instead of invoking xargs on an empty filelist (GNU execs grep once with no operands, BSD skips).

Tests: positive guard (filter matching the hit's file still returns it, out-of-filter file excluded) and zero-match guard (clean empty response, grep subprocess skipped). Both green pre-change too: perf-only change.

Distilled from DeusData#756 rebased onto the post-DeusData#751 filelist writer.

Co-authored-by: yutianyu111602-glitch <yutianyu111602@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
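For readers following the thread, the shape of the change described in the commit message can be sketched roughly as below. This is an illustration only, not the merged code: POSIX regexec stands in for the project's cbm_regexec, and normalize_separators, path_matches, write_scoped_filelist and should_run_grep are hypothetical names chosen for this sketch. It assumes the filelist is written one path per line and that the caller checks the returned count before spawning grep.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy rel_path into buf, turning '\\' into '/'.
 * Paths longer than buf_len - 1 are truncated in this sketch. */
static void normalize_separators(const char *rel_path, char *buf, size_t buf_len) {
    size_t i = 0;
    for (; rel_path[i] != '\0' && i + 1 < buf_len; i++) {
        buf[i] = (rel_path[i] == '\\') ? '/' : rel_path[i];
    }
    buf[i] = '\0';
}

/* Single predicate meant to be shared by the prefilter and the post-grep
 * filter, so both stages agree on every path. A NULL filter matches all.
 * POSIX regexec is an assumed stand-in for cbm_regexec. */
static bool path_matches(const regex_t *filter, const char *rel_path) {
    char norm[4096];
    if (filter == NULL) {
        return true;
    }
    normalize_separators(rel_path, norm, sizeof norm);
    return regexec(filter, norm, 0, NULL, 0) == 0;
}

/* Write only the paths that pass the filter, one per line, and return how
 * many entries were written. The skip happens at the top of the loop. */
size_t write_scoped_filelist(FILE *out, const char *const *paths, size_t n,
                             const regex_t *filter) {
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (!path_matches(filter, paths[i])) {
            continue; /* prefilter: grep never sees this file */
        }
        fprintf(out, "%s\n", paths[i]);
        written++;
    }
    return written;
}

/* Empty-filelist guard: with zero entries the caller returns the normal
 * empty result and does not spawn xargs/grep at all. */
bool should_run_grep(size_t filelist_entries) {
    return filelist_entries > 0;
}
```

Because the post-grep filter would call the same predicate on the same path, dropping a file here can only remove work: any hit in a dropped file would have been discarded later anyway.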
Summary
Apply path_filter while writing the indexed file list used by scoped search_code grep.
Verification
git diff --check origin/main...HEAD
wsl.exe bash -lc 'cd /mnt/c/code/githubstar/codebase-memory-mcp && make -f Makefile.cbm cbm -j2'