explorer: search perf-smoke baseline (#167) by rdhyee · Pull Request #173 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-05-08T13:45:50Z

Summary

Track 1a of #165 — establishes a measured browser-side baseline for the current live search before any substrate work begins.

Instruments `doSearch()` with `performance.mark` / `performance.measure` and emits a structured `isamples.search` console event on every completion (success or error). Captures `elapsed_ms`, `results_count`, `bytes_transfer`, `bytes_body`, per-URL `seen_urls`, and filter composition flags.
Search measures appended to the existing `?perf=1` panel.
New `tests/test_search_perf.py` drives the locked canonical query set, captures the events, and writes a JSON artifact to `tests/search_baseline_<UTC_DATE>.json`.

No behavior change to the search itself.

Baseline numbers (committed in this PR)

Local Quarto preview, 2026-05-08. `tests/search_baseline_2026-05-08.json`. All 9 canonical queries completed cleanly.

label	term	results	cold (ms)	warm (ms)
single-common	`pottery`	50	8683	1940
single-rare	`basalt`	50	11824	2400
multi-term	`pottery Cyprus`	0	5071	2560
no-hit	`xyzzyqqqplugh`	0	4870	2554
wildcard-pct	`100%`	0	5178	2574
wildcard-under	`_test`	0	18520	2529
diacritic	`Çatalhöyük`	0	4415	2592
composed-source	`pottery` × OPENCONTEXT	50	4964	1368
composed-source-material	+ material[0]	0	4331	2106

Headlines (also posted as a comment on #167):

Current ILIKE-on-samples_map_lite already misses every v1 contract budget (cold 4-12s vs ≤2s target; warm 1.4-2.6s vs ≤500ms target). The static substrate prototype must substantially beat these.
`pottery Cyprus` returns 0 results because samples_map_lite has 0 Cyprus matches in label/place_name — Cyprus only appears in description. Direct evidence for Explorer FTS Track 1b: Honesty fix for query-spec / live mismatch #168 honesty-fix Direction A.
`Çatalhöyük` returns 0 — current ILIKE doesn't strip diacritics. The v1 tokenizer in Explorer FTS Track 3: Offline index builder + tokenizer regression set #170 fixes this.
Wildcard literals (`100%`, `_test`) return 0 cleanly without errors — `escapeIlikePattern` is correct.
`bytes_transfer = 0` and `seen_urls = []` everywhere — no fetches during the search window. DuckDB-WASM caches parquet metadata at page load and uses HTTP range reads from cache. Implication: "bytes per search" is a substrate-world metric, not a current-world metric. Worth pinning in Explorer FTS Track 2: search_index_v1 contract doc #169 budget framing.

Canonical query set (locked in #167)

`pottery`, `basalt`, `pottery Cyprus`, `xyzzyqqqplugh`, `100%`, `_test`, `Çatalhöyük`, `pottery` × OPENCONTEXT only, `pottery` × OPENCONTEXT × material[0].

Each query opens a fresh browser context for a true file-cold measurement, then re-runs on the same page for a warm measurement.

Open design questions (recorded on #167)

Log channel: defaulted to `console.log(JSON.stringify(...))`. Alternative `CustomEvent` deferred unless non-Playwright observers appear.
Bytes-transferred: defaulted to `performance.getEntriesByType('resource')` summing `transferSize`, falling back to `encodedBodySize`. Cross-origin `transferSize` requires `Timing-Allow-Origin` on R2; the empty bytes in the baseline indicate that R2 may not be setting TAO, but more likely that DuckDB-WASM caches metadata at load time. Worth verifying on a deployed preview.

Test plan

`?perf=1` panel shows search timings per invocation
`tests/test_search_perf.py` runs the canonical set and produces a JSON artifact
Baseline numbers recorded in `tests/search_baseline_2026-05-08.json` and as a comment on Explorer FTS Track 1a: Browser perf-smoke baseline #167
Re-run against a deployed preview once this PR merges (numbers will differ from local-quarto-preview)
Verify whether R2 sets `Timing-Allow-Origin: *` (Explorer FTS Track 1a follow-ups: deferred review concerns #174 deferred)

Out of scope

Honesty fix (Direction A vs B) — that's Explorer FTS Track 1b: Honesty fix for query-spec / live mismatch #168, gated by these baseline numbers.
Substrate design — Explorer FTS Track 2: search_index_v1 contract doc #169.

Refs

#165, #167, #174 (deferred concerns from review)

🤖 Generated with Claude Code

…amplesorg#167) Adds per-search instrumentation: - performance.mark + measure around the DuckDB query - structured console.log on every completion (success or error) with elapsed_ms, results_count, bytes_transfer, bytes_body, and filter composition flags - search measures appended to the existing ?perf=1 panel Cross-origin transferSize is 0 without Timing-Allow-Origin on R2; encodedBodySize is reported as a fallback so we capture something regardless of TAO. Validation of TAO on data.isamples.org tracked in the issue thread. No behavior change to the search itself. Track 1a of isamplesorg#165. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…esorg#167) tests/test_search_perf.py drives the locked canonical query set against the deployed (or local-preview) Explorer, captures the structured isamples.search console events emitted by the new instrumentation, and writes a per-run JSON artifact to tests/search_baseline_<UTC_DATE>.json. Each query opens a fresh browser context to capture true file-cold timings, then runs the same query a second time on the same page for a warm measurement. Filter-composition queries set source-only filters before searching. Slow (~8-15 min). Run as a release-time benchmark, not on every CI invocation. The baseline numbers will be posted as a comment on isamplesorg#167 once the instrumentation hits a deployed preview. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Self-review fixes against the originally-opened PR. Two files affected: explorer.qmd - Capture per-URL byte data (seen_urls list) in the structured search log so analysis can post-hoc filter concurrent fetches that aren't attributable to the search itself, rather than only summed bytes. tests/test_search_perf.py - Replace deprecated dt.datetime.utcnow() with dt.datetime.now(UTC). - Replace fixed 800ms wait_for_timeout after facet changes with a poll on '.facet-count.recomputing' clearing — the cross-filter handler has a 250ms debounce plus an async cluster reload that can exceed 1s cold; fixed waits flake on slow networks. - Add a composed-source-material query that exercises the facetFilterSQL() pid-IN-subquery path, not just sourceFilterSQL(). First material checkbox is used to avoid hard-coding URIs that may rotate between data snapshots. - Tighten the final assertion: refuse to treat a benchmark with any silent failures as valid (was '>= half'). Partial JSON still written for diagnosis. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee · 2026-05-08T14:03:56Z

Self-review fixes applied

Force-pushed (rebased onto upstream main + 1 fix-up commit). PR is now 2 commits, scope is purely `explorer.qmd` (+72/-1) and `tests/test_search_perf.py` (+263).

Dropped from PR: the unrelated `provenance/scancode-2026-05-04.json` commit that had been sitting unpushed on local main; carried in by accident when this branch was cut from main. Available on local main for a separate provenance PR.

Code fixes:

`explorer.qmd`: capture per-URL `seen_urls` list in the structured search log (rather than only summed `bytes_transfer` / `bytes_body`) so analysis can post-hoc filter concurrent fetches not attributable to the search.
`tests/test_search_perf.py`:
- replace deprecated `dt.datetime.utcnow()` with `dt.datetime.now(UTC)` (Python 3.12+ compat)
- replace fixed 800ms `wait_for_timeout` after facet changes with `wait_for_function` polling on `.facet-count.recomputing` clearing — the cross-filter handler debounces 250ms then runs an async cluster reload that can exceed 1s cold; fixed waits flake on slow networks
- add `composed-source-material` query to canonical set; exercises the `facetFilterSQL()` pid-IN-subquery path (uses "first material checkbox" rather than hard-coded URI to survive data-snapshot rotation)
- tighten final assertion: refuse to treat a benchmark with any silent failures as valid (was `>= half`); partial JSON still on disk for diagnosis

Deferred (filed as #174):

unit test for the instrumentation itself (`terms_count`, `elapsed_ms`, `results_count` invariants)
formal definition of "warm" in the v1 contract budget table (Explorer FTS Track 2: search_index_v1 contract doc #169)
defensive wrap of `searchTerms()` once the v1 tokenizer lands (Explorer FTS Track 3: Offline index builder + tokenizer regression set #170)
tighter bytes attribution beyond the search window

…amplesorg#167) The material filter <div id='materialFilterBody'> ships with display:none (explorer.qmd:280); Playwright's click times out because the element is not visible. Add a one-time expansion of the section header before iterating checkboxes. Surfaced by the first perf-smoke run against local preview: 8/9 queries completed cleanly, only composed-source-material failed with the visible-element timeout. The strict assertion added in commit aec3a2b correctly caught the silent failure. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

All 9 canonical queries completed cleanly after the material-section expand fix (commit 8023c20). Headline: current ILIKE-on-samples_map_lite already misses every v1 contract budget (cold 4-12s vs ≤2s target; warm 1.4-2.6s vs ≤500ms). 'pottery Cyprus' returns 0 results because samples_map_lite has 0 Cyprus matches (Cyprus only appears in description, which lite doesn't carry). 'Çatalhöyük' returns 0 because ILIKE doesn't strip diacritics. All this is direct evidence supporting isamplesorg#168 honesty-fix direction A and substrate-substantially-better-than-ILIKE in isamplesorg#169. Refs isamplesorg#165, isamplesorg#167. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee · 2026-05-08T15:20:13Z

Review findings from this pass:

?perf=1 panel will not show searches run after initial render. explorer.qmd only reads existing performance.measure entries while the OJS perfPanel cell renders once after phase1; later doSearch() calls log to console but never update the panel DOM. This undercuts the PR claim that search measures are appended to the panel per invocation. Fix by appending/updating the perf panel from doSearch() or wiring an explicit refresh path.
_wait_for_facet_settle() can return before the async facet/source work has even started. The helper waits for no .recomputing, but the app schedules recompute after a 250ms debounce and source changes call refreshFacetCounts() only after await loadRes(...) completes. The benchmark can start the search while cluster reload/facet recompute is still running, contaminating the composed-filter timings. The harness needs a positive “work started then finished” signal, or an explicit app-side settle marker.
_measure_one_query() leaks browser contexts on failures. Since the outer loop catches exceptions and continues, a timeout leaves a context/page alive and can skew later measurements. Wrap the body in try/finally: context.close().

Verified only with python -m py_compile tests/test_search_perf.py; I did not run the full perf-smoke.

…busy flag, context cleanup) Codex review of isamplesorg#173 surfaced three issues, all fixed here. explorer.qmd: - doSearch finally block now appends a row to the ?perf=1 panel per search (panel renders once at boot from existing measure entries; this hooks each subsequent search). The PR claim "search measures are appended to the existing ?perf=1 panel per invocation" is now true. - Source filter and material/context/object_type change handlers wrap their async work in try/finally with body.classList 'explorer-busy' set during the chain. The 300ms post-refreshFacetCounts wait is intentional: refreshFacetCounts schedules a 250ms debounce that THEN sets .recomputing on facet count spans; we hold busy until that fires so external observers see a coherent "work started, work finished" signal instead of racing the debounce. - handleFacetFilterChange is now async and awaits loadViewportSamples in the point-mode branch (previously fire-and-forget). Required so the busy flag honestly covers all triggered work. tests/test_search_perf.py: - _wait_for_facet_settle is now two-phase: wait for body:not(.explorer- busy), then wait for .facet-count:not(.recomputing). Polling either signal alone races the app's two-stage settle. - _measure_one_query wraps its body in try/finally on context.close() so a TimeoutError in _run_search no longer leaks the browser context. Refs isamplesorg#165, isamplesorg#173. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee · 2026-05-08T15:24:43Z

Codex review fixes (commit `76e37b6`)

All three findings addressed:

Medium #1 (panel updates per search): `doSearch` finally block now appends a row to the `?perf=1` panel per search. The panel renders once at boot from existing measure entries; this hooks each subsequent search so the panel stays current. The PR claim "search measures are appended to the existing ?perf=1 panel per invocation" is now true.

Medium #2 (busy-flag race): Source filter and material/context/object_type change handlers wrap their async work in try/finally with `body.classList` 'explorer-busy' set during the chain. The 300 ms post-`refreshFacetCounts` wait is intentional: `refreshFacetCounts` schedules a 250 ms debounce that THEN sets `.recomputing` on facet count spans; we hold busy until that fires so external observers see a coherent "work started → work finished" signal instead of racing the debounce. `handleFacetFilterChange` is now async and awaits `loadViewportSamples` in the point-mode branch (was fire-and-forget) so the busy flag honestly covers all triggered work.

`_wait_for_facet_settle` in the test is now two-phase: wait for `body:not(.explorer-busy)`, then wait for `.facet-count:not(.recomputing)`. Polling either signal alone races the app's two-stage settle.

Low #3 (context leak on timeout): `_measure_one_query` body wrapped in try/finally on `context.close()`. A TimeoutError in `_run_search` no longer leaks the browser context.

Open question on #176: agreed — the spec MUST clause is not closed by doc honesty alone. Filed the UI-hint fix as commit `006d335` on PR #176.

Diff: +112/-43.

Codex round-2 review caught that the binary body.classList 'explorer- busy' flag races when multiple change handlers overlap: e.g. _apply_source_filter clicks 3 source checkboxes back-to-back, each firing an independent async handler, each removing the same class on finally. The first/fastest handler clears busy while later handlers' loadRes() and facet recompute work is still running, so _wait_for_facet_settle() returns early and the search runs against a partially-loaded state. Replace with a depth-counted acquire/release pair: class added on the 0→1 transition, cleared on the 1→0 transition, Math.max-guarded against double-release. Both handlers (source filter, material/context/object_type) now use busyAcquire/busyRelease in their try/finally blocks. Refs isamplesorg#173. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee · 2026-05-08T15:29:03Z

Round 2 fix (commit `3b97e34`)

You're right — the binary class racing on overlapping handlers was the bug behind the bug. Replaced with a depth counter:

```js
let _busyDepth = 0;
function busyAcquire() {
if (_busyDepth === 0) document.body.classList.add('explorer-busy');
_busyDepth++;
}
function busyRelease() {
_busyDepth = Math.max(0, _busyDepth - 1);
if (_busyDepth === 0) document.body.classList.remove('explorer-busy');
}
```

Class is added on the 0→1 transition and cleared only on the 1→0 transition. `Math.max`-guarded against double-release. Both handlers (source filter, material/context/object_type) now use `busyAcquire`/`busyRelease` in their try/finally blocks.

This means `_apply_source_filter` clicking 3 source checkboxes back-to-back now correctly holds the busy flag through the LAST handler's full work, not just the first/fastest.

Diff: +22/-4.

rdhyee and others added 3 commits May 8, 2026 07:01

rdhyee force-pushed the explorer-search-perf-smoke branch from a16e2f7 to aec3a2b Compare May 8, 2026 14:03

rdhyee mentioned this pull request May 8, 2026

Explorer FTS Track 1a follow-ups: deferred review concerns #174

Open

This was referenced May 8, 2026

docs: explorer state contract (closes #164) #166

Merged

Improve Interactive Explorer full-text search substrate #165

Open

rdhyee mentioned this pull request May 8, 2026

Explorer FTS Track 1a: Browser perf-smoke baseline #167

Open

4 tasks

rdhyee mentioned this pull request May 8, 2026

docs: query-spec honesty-fix for live search field set (#168 Direction B) #176

Merged

1 task

rdhyee merged commit 362698a into isamplesorg:main May 8, 2026
1 check passed

This was referenced May 8, 2026

explorer: interim recall fix — search description (#168 Direction A, knowingly slower) #177

Merged

Interactive Explorer: integrate Hana mockup feedback (Light path on search semantics) #178

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

explorer: search perf-smoke baseline (#167)#173

explorer: search perf-smoke baseline (#167)#173
rdhyee merged 7 commits intoisamplesorg:mainfrom
rdhyee:explorer-search-perf-smoke

rdhyee commented May 8, 2026 •

edited

Loading

Uh oh!

rdhyee commented May 8, 2026

Uh oh!

rdhyee commented May 8, 2026

Uh oh!

rdhyee commented May 8, 2026

Uh oh!

rdhyee commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rdhyee commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Baseline numbers (committed in this PR)

Canonical query set (locked in #167)

Open design questions (recorded on #167)

Test plan

Out of scope

Refs

Uh oh!

rdhyee commented May 8, 2026

Self-review fixes applied

Uh oh!

rdhyee commented May 8, 2026

Uh oh!

rdhyee commented May 8, 2026

Codex review fixes (commit `76e37b6`)

Uh oh!

rdhyee commented May 8, 2026

Round 2 fix (commit `3b97e34`)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rdhyee commented May 8, 2026 •

edited

Loading