docs: query-spec honesty-fix for live search field set (#168 Direction B) #176
…org#168 Direction B)

isamplesorg#168 baseline (isamplesorg#167) showed that swapping doSearch to sample_facets_v2 (Direction A) recovers real recall (pottery Cyprus flips from 0 to 50 results) but exceeds the locked latency thresholds (cold pottery 12s vs ≤5s, multi-term 15s vs ≤6s). Native DuckDB benchmark showed CTE optimization is 8x faster, but in-browser DuckDB-WASM cold-cache HTTP range fetches dominate cost, evaporating the 8x win.

Per the isamplesorg#168 decision rule (latency thresholds drive direction), land Direction B: keep doSearch on samples_map_lite, narrow query-spec.qmd to honestly describe what the live Explorer searches today, and point forward to isamplesorg#169 / SEARCH_INDEX_V1.md as the path that lifts both the recall gap and the latency gap.

Refs isamplesorg#165, isamplesorg#167, isamplesorg#168, isamplesorg#169. Closes isamplesorg#168.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rdhyee left a comment
Reviewed the query-spec honesty fix. The wording change itself looks correct and matches the live label + place_name behavior.
One scope note: query-spec.qmd still says substrates that cannot index the full field set MUST surface the limitation in UI, while this PR explicitly remains doc-only and does not add the inline UI hint. I am treating that as a follow-up rather than a blocker for this doc honesty fix.
…samplesorg#176)

query-spec.qmd §3.2 says: "Substrates that can't index all 15 fields MUST document which subset they cover and surface the limitation in UI." The original PR isamplesorg#176 only updated the doc text and left the UI side undone. Codex review correctly flagged that as half a fix.

Adds:
- A .search-help line under the search bar saying "Searches sample labels and place names only — descriptions are not yet indexed."
- Forward link to isamplesorg#169 (substrate FTS) so users see the limitation is tracked, not abandoned.
- Replaces the placeholder example "pottery Cyprus" (which returns 0 results in the current substrate per isamplesorg#167 baseline) with "basalt California" which actually matches.

Inline styles on the .search-help div to avoid touching styles.css.

Refs isamplesorg#167, isamplesorg#168, isamplesorg#169, isamplesorg#176.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added UI hint (commit `006d335`)

Codex review correctly flagged that the MUST clause in `query-spec.qmd` §3.2 ("surface the limitation in UI") wasn't satisfied by the doc-only fix. Adding the inline hint here so the spec's normative requirement lands in the same PR. `explorer.qmd` changes (+5 lines, inline style only; `styles.css` is not touched):

Diff: +5/-1.
Summary
Direction B of #168, decided mechanically by the locked latency thresholds against measured #167 baseline data. Doc-only PR — no code changes.
The current Interactive Explorer searches `label` + `place_name` against `samples_map_lite.parquet`. `query-spec.qmd:225` previously claimed `label` + `description` + `place_name` — a known mismatch. This PR corrects the spec to match the live behavior and forward-points to #169 / SEARCH_INDEX_V1.md as the substrate work that lifts the gap.
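For readers who want a concrete picture of the narrowed field set, here is a minimal sketch of a WHERE-clause builder restricted to `label` and `place_name`. It is not the Explorer's actual `doSearch` code: the helper names (`buildSearchWhere`, `escapeLikeTerm`), the AND-across-terms semantics, and the use of `ILIKE ... ESCAPE` are all assumptions made for illustration.

```typescript
// Hypothetical sketch, not the live doSearch implementation.
// Assumption: every whitespace-separated term must match at least one
// of the two fields the live substrate covers.
const LIVE_SEARCH_FIELDS = ["label", "place_name"] as const;

// Escape a raw term so it is safe inside a single-quoted LIKE pattern:
// backslashes first, then the LIKE wildcards, then single quotes.
function escapeLikeTerm(term: string): string {
  return term
    .replace(/\\/g, "\\\\")
    .replace(/[%_]/g, (ch) => "\\" + ch)
    .replace(/'/g, "''");
}

// Build a WHERE clause: terms are ANDed, fields are ORed within a term.
function buildSearchWhere(query: string): string {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return "TRUE"; // empty query matches everything
  return terms
    .map((term) => {
      const pattern = `'%${escapeLikeTerm(term)}%'`;
      const perField = LIVE_SEARCH_FIELDS.map(
        (field) => `${field} ILIKE ${pattern} ESCAPE '\\'`
      );
      return `(${perField.join(" OR ")})`;
    })
    .join(" AND ");
}
```

Under these assumptions, adding `description` back would be a one-line change to `LIVE_SEARCH_FIELDS`, but only on a substrate that actually carries that column, which `samples_map_lite.parquet` does not.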
What the data said
I implemented Direction A (swap to `sample_facets_v2.parquet`) on a side branch, ran the perf-smoke against three forms (baseline, naive LEFT JOIN, CTE-then-keyed-join), and posted the comparison on #168 (comment). Headlines:
Recall improvement is real (`pottery Cyprus` flips 0 → 50 results, `Çatalhöyük` flips 0 → 50, `100%` flips 0 → 50), but latency cost exceeds the locked threshold for the bare-text and multi-term cases. Native DuckDB benchmark showed CTE-then-join is 8× faster than naive LEFT JOIN; in-browser, cold-cache HTTP range fetches dominate cost, so the optimization evaporates. Direction A is structurally too slow on the current parquets.
Per the #168 decision rule (latency thresholds drive direction): land Direction B.
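To make "decided mechanically" concrete, here is a small sketch of how such a threshold rule could be encoded. The measured and threshold values are the ones quoted in the commit message (cold pottery 12s vs ≤5s, multi-term 15s vs ≤6s); the type and function names are hypothetical, and the real rule in #168 may include checks not shown here.

```typescript
// Hypothetical encoding of the #168 decision rule, for illustration only.
interface LatencyCheck {
  name: string;
  measuredSeconds: number;  // Direction A measurement from the #167 baseline run
  thresholdSeconds: number; // locked upper bound
}

// Direction A lands only if every check is within its threshold;
// otherwise fall back to Direction B (keep the current substrate, fix the docs).
function chooseDirection(checks: LatencyCheck[]): "A" | "B" {
  const allWithin = checks.every(
    (c) => c.measuredSeconds <= c.thresholdSeconds
  );
  return allWithin ? "A" : "B";
}

// Figures quoted in the commit message for this PR.
const directionAChecks: LatencyCheck[] = [
  { name: "cold pottery", measuredSeconds: 12, thresholdSeconds: 5 },
  { name: "multi-term", measuredSeconds: 15, thresholdSeconds: 6 },
];

const decision = chooseDirection(directionAChecks);
```

With the quoted figures both checks fail, so the sketch selects Direction B regardless of the recall gain.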
What this PR changes
What this PR explicitly does not do
Test plan
Closes #168. Refs #165, #167, #169, PR #173.
🤖 Generated with Claude Code