Fix flaky TestExpandHoldsNoAccessExclusiveLock#66
Conversation
TestExpandHoldsNoAccessExclusiveLock flagged a sustained ACCESS EXCLUSIVE when the lock was seen on 5 consecutive ~1ms polls (~5ms). The personal-index swap's DROP+RENAME is a legitimate size-INDEPENDENT metadata ACCESS EXCLUSIVE held for a few ms — which a slow/contended CI runner can stretch across those polls, false-failing the test. Measure CONTIGUOUS wall-clock hold and tolerate brief metadata flicks (<400ms); a real data-proportional in-place rebuild (the actual regression) holds it far longer. Also drop the secondary absolute >1s per-insert latency check (CI-noise sensitive) — the bounded per-insert context already detects a genuine block.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8f07a1ef0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| default: | ||
| heldSince = time.Time{} // released (or a transient poll error): reset |
There was a problem hiding this comment.
Check elapsed hold time before resetting
If the observer samples present=true and then is descheduled until after the lock is released, this branch clears heldSince without comparing the elapsed wall time, so a real AccessExclusiveLock held longer than maxContiguousHold can pass whenever no successful poll occurs while the threshold is exceeded. Since this test is specifically meant to catch sustained locks under contended CI scheduling, record a failure on the release/stop path when time.Since(heldSince) exceeds the threshold before resetting it.
Useful? React with 👍 / 👎.
The config-layer GATEWAY_DATA_REGION validator was only exercised via its empty-region path (22% covered), dropping internal/config below the 88% per-package coverage gate. Add a direct branch test (empty/valid pass; uppercase/dot/space/slash/over-length rejected; at-bound allowed) → config coverage 86.8% -> 97.1%.
The #33 operational lock test false-failed in CI (
expand held a sustained AccessExclusiveLock). Root cause: the observer flagged a lock seen on 5 consecutive ~1ms polls (~5ms), but the personal-index swap'sDROP INDEX+RENAMEis a legitimate size-independent metadataACCESS EXCLUSIVEheld for a few ms — which a slow/contended CI runner stretches across those polls.>1sper-insert latency check (CI-noise sensitive); the bounded per-insert context already detects a genuine block.Test-only. Ran 25× with
-racelocally, green; full postgres package green; lint 0.