explorer: search perf-smoke baseline (#167)#173
Conversation
…amplesorg#167) Adds per-search instrumentation: - performance.mark + measure around the DuckDB query - structured console.log on every completion (success or error) with elapsed_ms, results_count, bytes_transfer, bytes_body, and filter composition flags - search measures appended to the existing ?perf=1 panel Cross-origin transferSize is 0 without Timing-Allow-Origin on R2; encodedBodySize is reported as a fallback so we capture something regardless of TAO. Validation of TAO on data.isamples.org tracked in the issue thread. No behavior change to the search itself. Track 1a of isamplesorg#165. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…esorg#167) tests/test_search_perf.py drives the locked canonical query set against the deployed (or local-preview) Explorer, captures the structured isamples.search console events emitted by the new instrumentation, and writes a per-run JSON artifact to tests/search_baseline_<UTC_DATE>.json. Each query opens a fresh browser context to capture true file-cold timings, then runs the same query a second time on the same page for a warm measurement. Filter-composition queries set source-only filters before searching. Slow (~8-15 min). Run as a release-time benchmark, not on every CI invocation. The baseline numbers will be posted as a comment on isamplesorg#167 once the instrumentation hits a deployed preview. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Self-review fixes against the originally-opened PR. Two files affected: explorer.qmd - Capture per-URL byte data (seen_urls list) in the structured search log so analysis can post-hoc filter concurrent fetches that aren't attributable to the search itself, rather than only summed bytes. tests/test_search_perf.py - Replace deprecated dt.datetime.utcnow() with dt.datetime.now(UTC). - Replace fixed 800ms wait_for_timeout after facet changes with a poll on '.facet-count.recomputing' clearing — the cross-filter handler has a 250ms debounce plus an async cluster reload that can exceed 1s cold; fixed waits flake on slow networks. - Add a composed-source-material query that exercises the facetFilterSQL() pid-IN-subquery path, not just sourceFilterSQL(). First material checkbox is used to avoid hard-coding URIs that may rotate between data snapshots. - Tighten the final assertion: refuse to treat a benchmark with any silent failures as valid (was '>= half'). Partial JSON still written for diagnosis. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a16e2f7 to
aec3a2b
Compare
Self-review fixes appliedForce-pushed (rebased onto upstream main + 1 fix-up commit). PR is now 2 commits, scope is purely `explorer.qmd` (+72/-1) and `tests/test_search_perf.py` (+263). Dropped from PR: the unrelated `provenance/scancode-2026-05-04.json` commit that had been sitting unpushed on local main; carried in by accident when this branch was cut from main. Available on local main for a separate provenance PR. Code fixes:
Deferred (filed as #174):
|
…amplesorg#167) The material filter <div id='materialFilterBody'> ships with display:none (explorer.qmd:280); Playwright's click times out because the element is not visible. Add a one-time expansion of the section header before iterating checkboxes. Surfaced by the first perf-smoke run against local preview: 8/9 queries completed cleanly, only composed-source-material failed with the visible-element timeout. The strict assertion added in commit aec3a2b correctly caught the silent failure. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 9 canonical queries completed cleanly after the material-section expand fix (commit 8023c20). Headline: current ILIKE-on-samples_map_lite already misses every v1 contract budget (cold 4-12s vs ≤2s target; warm 1.4-2.6s vs ≤500ms). 'pottery Cyprus' returns 0 results because samples_map_lite has 0 Cyprus matches (Cyprus only appears in description, which lite doesn't carry). 'Çatalhöyük' returns 0 because ILIKE doesn't strip diacritics. All this is direct evidence supporting isamplesorg#168 honesty-fix direction A and substrate-substantially-better-than-ILIKE in isamplesorg#169. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Review findings from this pass:
Verified only with |
…busy flag, context cleanup) Codex review of isamplesorg#173 surfaced three issues, all fixed here. explorer.qmd: - doSearch finally block now appends a row to the ?perf=1 panel per search (panel renders once at boot from existing measure entries; this hooks each subsequent search). The PR claim "search measures are appended to the existing ?perf=1 panel per invocation" is now true. - Source filter and material/context/object_type change handlers wrap their async work in try/finally with body.classList 'explorer-busy' set during the chain. The 300ms post-refreshFacetCounts wait is intentional: refreshFacetCounts schedules a 250ms debounce that THEN sets .recomputing on facet count spans; we hold busy until that fires so external observers see a coherent "work started, work finished" signal instead of racing the debounce. - handleFacetFilterChange is now async and awaits loadViewportSamples in the point-mode branch (previously fire-and-forget). Required so the busy flag honestly covers all triggered work. tests/test_search_perf.py: - _wait_for_facet_settle is now two-phase: wait for body:not(.explorer- busy), then wait for .facet-count:not(.recomputing). Polling either signal alone races the app's two-stage settle. - _measure_one_query wraps its body in try/finally on context.close() so a TimeoutError in _run_search no longer leaks the browser context. Refs isamplesorg#165, isamplesorg#173. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review fixes (commit `76e37b6`)All three findings addressed: Medium #1 (panel updates per search): `doSearch` finally block now appends a row to the `?perf=1` panel per search. The panel renders once at boot from existing measure entries; this hooks each subsequent search so the panel stays current. The PR claim "search measures are appended to the existing ?perf=1 panel per invocation" is now true. Medium #2 (busy-flag race): Source filter and material/context/object_type change handlers wrap their async work in try/finally with `body.classList` 'explorer-busy' set during the chain. The 300 ms post-`refreshFacetCounts` wait is intentional: `refreshFacetCounts` schedules a 250 ms debounce that THEN sets `.recomputing` on facet count spans; we hold busy until that fires so external observers see a coherent "work started → work finished" signal instead of racing the debounce. `handleFacetFilterChange` is now async and awaits `loadViewportSamples` in the point-mode branch (was fire-and-forget) so the busy flag honestly covers all triggered work. `_wait_for_facet_settle` in the test is now two-phase: wait for `body:not(.explorer-busy)`, then wait for `.facet-count:not(.recomputing)`. Polling either signal alone races the app's two-stage settle. Low #3 (context leak on timeout): `_measure_one_query` body wrapped in try/finally on `context.close()`. A TimeoutError in `_run_search` no longer leaks the browser context. Open question on #176: agreed — the spec MUST clause is not closed by doc honesty alone. Filed the UI-hint fix as commit `006d335` on PR #176. Diff: +112/-43. |
Codex round-2 review caught that the binary body.classList 'explorer- busy' flag races when multiple change handlers overlap: e.g. _apply_source_filter clicks 3 source checkboxes back-to-back, each firing an independent async handler, each removing the same class on finally. The first/fastest handler clears busy while later handlers' loadRes() and facet recompute work is still running, so _wait_for_facet_settle() returns early and the search runs against a partially-loaded state. Replace with a depth-counted acquire/release pair: class added on the 0→1 transition, cleared on the 1→0 transition, Math.max-guarded against double-release. Both handlers (source filter, material/context/object_type) now use busyAcquire/busyRelease in their try/finally blocks. Refs isamplesorg#173. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 2 fix (commit `3b97e34`)You're right — the binary class racing on overlapping handlers was the bug behind the bug. Replaced with a depth counter: ```js Class is added on the 0→1 transition and cleared only on the 1→0 transition. `Math.max`-guarded against double-release. Both handlers (source filter, material/context/object_type) now use `busyAcquire`/`busyRelease` in their try/finally blocks. This means `_apply_source_filter` clicking 3 source checkboxes back-to-back now correctly holds the busy flag through the LAST handler's full work, not just the first/fastest. Diff: +22/-4. |
Summary
Track 1a of #165 — establishes a measured browser-side baseline for the current live search before any substrate work begins.
No behavior change to the search itself.
Baseline numbers (committed in this PR)
Local Quarto preview, 2026-05-08. `tests/search_baseline_2026-05-08.json`. All 9 canonical queries completed cleanly.
Headlines (also posted as a comment on #167):
Canonical query set (locked in #167)
`pottery`, `basalt`, `pottery Cyprus`, `xyzzyqqqplugh`, `100%`, `_test`, `Çatalhöyük`, `pottery` × OPENCONTEXT only, `pottery` × OPENCONTEXT × material[0].
Each query opens a fresh browser context for a true file-cold measurement, then re-runs on the same page for a warm measurement.
Open design questions (recorded on #167)
Test plan
Out of scope
Refs
#165, #167, #174 (deferred concerns from review)
🤖 Generated with Claude Code