adapter/sqs: AdminPeekQueue backend (Phase 3)#794
Conversation
Phase 3 of the Admin Queue Peek and Purge design
(docs/design/2026_05_16_proposed_admin_purge_queue.md). Implements
the read-only counterpart to AdminPurgeQueue:
AdminPeekQueue(ctx, principal, name, opts)
([]AdminPeekedMessage, nextCursor string, error)
Peek walks the leader's visibility index at the next read timestamp,
returns up to opts.Limit currently-visible messages (visible_at
less than or equal to now), and bumps NO receive counts. No receipt
handle is minted; no visibility timer is started. Pure read.
New surface:
- AdminPeekedAttribute mirrors the typed SQS MessageAttribute
shape so binary payloads and the DataType discriminator survive
the SPA round-trip (Codex r11 on the design doc).
- AdminPeekedMessage carries body (truncated per BodyMaxBytes),
BodyTruncated, BodyOriginalSize, SentTimestamp, ReceiveCount,
GroupID, DeduplicationID, and the typed attribute map.
- AdminPeekMessageOptions{Limit, Cursor, BodyMaxBytes} with
documented defaults (20, "", 4096) and bounds
([1, 100], 512-byte cursor cap, [256, 262144] body cap).
- peekCursor (versioned JSON, base64url-wrapped) so the SPA can
page across partition boundaries deterministically. Cursor
pins Generation so a purge between pages forces the SPA to
refresh from the front (ErrAdminSQSValidation rather than
silently surfacing rows from a purged generation).
- AdminRole.canRead() - peek requires non-zero role; both
read-only and full principals satisfy the gate. Purge stays
on canWrite() so role-only principals can triage but cannot
drop the queue.
Partitioned FIFO support: peek uses rotated sequential scanning
over partitions, picking the start partition via the same
receiveFanoutCounters the receive path uses for fairness. The
cursor encodes both StartPartition (for the wrap detection) and
the current Partition + LastKey so subsequent pages continue
deterministically.
Caller audit (semantic-change rule): no existing function
signatures changed. canRead is a new method; AdminPeekQueue is a
new entrypoint. The sqsAttributeBaseType{String,Number} constant
extraction in sqs_messages.go fixes a pre-existing goconst
regression that the larger test surface tripped; the validator
switch now reads canonical constants instead of inline literals.
Tests (adapter/sqs_admin_peek_test.go) cover:
- happy path with body / MessageID / SentTimestamp assertions
- peek does NOT bump ApproximateReceiveCount across two calls
- body truncation reports BodyTruncated + BodyOriginalSize
- Limit clamping (0 to default 20, 500 to max 100)
- cursor round-trip across two pages, no overlap, complete
coverage
- stale-generation cursor after purge to ErrAdminSQSValidation
- oversize / bad-base64 / bad-JSON / wrong-version cursors all
map to ErrAdminSQSValidation
- read-only principal allowed; role-less denied
- missing queue to ErrAdminSQSNotFound
- empty / whitespace name to ErrAdminSQSValidation
- DelaySeconds > 0 hides the message until visible_at elapses
- FIFO MessageGroupId / MessageDeduplicationId both projected
- typed MessageAttribute (String + Binary) round-trips
- partitioned FIFO (PartitionCount=4): every sent message
surfaces exactly once across the union of paginated peek
pages, regardless of the rotated start partition
- cursor codec unit tests: round-trip, empty-cursor sentinel,
length cap, generation mismatch
- clampPeekLimit / clampPeekBodyBytes truth tables
- projectPeekedAttributes nil-safety and DataType preservation
- AdminRole.canRead() truth table
Out of scope (deferred to a follow-up):
- Throttle integration (bucketActionAdminPeek + dedicated
per-queue admin-peek bucket per design doc 3.1). Leader-only
+ Limit cap already bound the cost; the dedicated bucket adds
a steady-rate cap that should land alongside the SPA wiring
so the rate-limit metric has a real consumer.
- HTTP handler + bridge - Phase 4.
- principalForReadSensitive live RoleStore re-check - depends
on the wider RoleStore plumbing landing first (design doc
Goal 8, noted as a future direction).
|
@claude review |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Claude finished @bootjp's task in 6m 20s —— View job Code Review
|
There was a problem hiding this comment.
Code Review
This pull request introduces the AdminPeekQueue functionality for SQS, allowing users to inspect messages without affecting their visibility or receive counts. The implementation includes a new canRead authorization gate, support for paginated results via base64url-encoded JSON cursors, and configurable message body truncation. Comprehensive tests have been added to verify the behavior across standard and partitioned FIFO queues, ensuring that the peek operation remains non-destructive and respects security roles. I have no feedback to provide as there were no review comments to assess.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5838bddb7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if cursor.Generation != meta.Generation { | ||
| return nil, errors.Wrap(ErrAdminSQSValidation, | ||
| "admin peek: cursor is from a prior generation; restart from the front") | ||
| } | ||
| return cursor, nil |
There was a problem hiding this comment.
Reject out-of-range cursor partitions
preparePeekCursor validates only Generation, so a client can submit a cursor whose StartPartition is outside [0, meta.PartitionCount). In walkPeek, termination depends on nextPart == cursor.StartPartition; with an out-of-range start partition this condition is never true, so an empty/fully exhausted partitioned queue causes an unbounded loop of ScanAt calls and the request never returns. This is a request-amplification DoS against the admin endpoint and should be rejected as ErrAdminSQSValidation before scanning.
Useful? React with 👍 / 👎.
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination condition is `(cursor.Partition + 1) % meta.PartitionCount == cursor.StartPartition`. With a client-supplied cursor whose StartPartition lies outside [0, PartitionCount, the modulo cycle never matches and the call loops ScanAt over guaranteed-empty partitions forever — a request-amplification DoS against the admin endpoint. preparePeekCursor now bounds-checks both StartPartition and Partition against max(meta.PartitionCount, 1) and rejects out-of-range values with ErrAdminSQSValidation. Non-partitioned queues (PartitionCount <= 1) require both fields to be 0. Caller audit: peek cursors are constructed in exactly two places — preparePeekCursor itself (now validated) and walkPeek's partition-advance (, which produces values in [0, PartitionCount) by construction). No other caller. Generation validation already handles the purge-between-pages case separately. Tests: regression test feeds a wire-level cursor with StartPartition=999 / Partition=999 against a PartitionCount=4 queue and asserts ErrAdminSQSValidation under a 5-second deadline so any future loss of the bounds check terminates rather than hanging CI. Four preparePeekCursor unit tests pin the truth table (partitioned out-of-range / non-partitioned non-zero / in-range accepted). EOF )
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination condition is "(Partition + 1) mod PartitionCount == StartPartition". With a client-supplied cursor whose StartPartition lies outside [0, PartitionCount), the modulo cycle never matches and the call loops ScanAt over guaranteed-empty partitions forever - a request-amplification DoS against the admin endpoint. preparePeekCursor now bounds-checks both StartPartition and Partition against max(meta.PartitionCount, 1) and rejects out-of-range values with ErrAdminSQSValidation. Non-partitioned queues (PartitionCount <= 1) require both fields to be 0. Caller audit: peek cursors are constructed in exactly two places - preparePeekCursor itself (now validated) and walkPeek's partition-advance ((Partition+1) mod PartitionCount), which produces values in [0, PartitionCount) by construction. No other caller. Generation validation already handles the purge-between- pages case separately. Tests: regression test feeds a wire-level cursor with StartPartition=999 / Partition=999 against a PartitionCount=4 queue and asserts ErrAdminSQSValidation under a 5-second deadline so any future loss of the bounds check terminates rather than hanging CI. Four preparePeekCursor unit tests pin the truth table (partitioned out-of-range / non-partitioned non-zero / in-range accepted).
ed24325 to
9bfaacf
Compare
|
@claude review Addressed Codex r1 P1 in 9bfaacf:
|
|
Claude finished @bootjp's task in 1m 39s —— View job Code Review
|
| Sub-case | Cursor input | Meta | Expected |
|---|---|---|---|
StartPartition >= PartitionCount |
SP=99, P=0 | PC=4 | ErrAdminSQSValidation |
Partition >= PartitionCount |
SP=0, P=99 | PC=4 | ErrAdminSQSValidation |
| Non-partitioned: non-zero StartPartition | SP=1, P=0 | PC=0 | ErrAdminSQSValidation |
| Non-partitioned: non-zero Partition | SP=0, P=5 | PC=1 | ErrAdminSQSValidation |
| In-range cursor accepted | SP=3, P=2 | PC=4 | nil (accepted) |
All four rejection paths are structurally distinct — the acceptance case verifies valid cursors aren't blocked.
TestAdminPeekQueue_HostileCursorBoundedRequest (lines 834-862): End-to-end regression with a 5-second context.WithTimeout. The hostile cursor (StartPartition=999, Partition=999) is wire-encoded and passed through the full AdminPeekQueue stack. The deadline is the key safety net: if the bounds check is ever removed, the test times out (context deadline exceeded) rather than returning ErrAdminSQSValidation, making the regression immediately visible in CI rather than hanging the suite.
One minor observation: the hostile cursor uses Generation: 1, but the queue's actual generation may differ (depending on what installPartitionedMetaForTest initializes). If they differ, the generation check fires before the bounds check. The test still passes because ErrAdminSQSValidation is returned either way — it just may not exercise the partition bounds path end-to-end when the generations don't match. This is acceptable because TestPreparePeekCursor_PartitionOutOfRange pins the bounds check in isolation, and the integration test pins the "hostile cursor → 400" property regardless of which guard fires.
preparePeekCursor docstring
The rationale comment (lines 258-266) is exactly the right level of detail — it names the attack vector, explains why the termination condition fails, and cites the PR and issue. Future readers won't be tempted to "simplify" the check.
Verdict
The fix closes the DoS vector correctly. The unit-level truth table + end-to-end deadline regression is the right pairing. No additional changes needed for this fix.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bfaacfa4f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !exists { | ||
| return nil, "", ErrAdminSQSNotFound | ||
| } | ||
| cursor, err = preparePeekCursor(cursor, meta, s.nextReceiveFanoutStart(name, meta.PartitionCount)) |
There was a problem hiding this comment.
Avoid advancing fanout rotation on peek cursor resumes
AdminPeekQueue always calls nextReceiveFanoutStart before preparePeekCursor, even when the request already carries a continuation cursor. Because this counter is shared with ReceiveMessage, every extra peek page advances the receive start partition and can reintroduce fixed-stride aliasing (e.g., 4 partitions and 3 peek pages between receives makes every receive start at the same partition). Under sustained backlog with small MaxNumberOfMessages, this can repeatedly prioritize the same partitions and starve others, so continuation requests should not mutate the shared receive-rotation counter.
Useful? React with 👍 / 👎.

Summary
Phase 3 of
docs/design/2026_05_16_proposed_admin_purge_queue.md: backendAdminPeekQueuemethod on*adapter.SQSServer.opts.Limitcurrently-visible messages (visible_at <= now); pure read, no receive-count bump, no receipt-handle minting, no visibility-timer change.receiveFanoutCountersfor fairness.Generation,StartPartition,Partition, andLastKey; stale-generation cursors are rejected so a purge between pages forces the SPA to refresh from the front.AdminRole.canRead()— peek requires non-zero role; read-only principals can triage, purge stays oncanWrite().goconstregression insqs_messages.go(the validator switch's inline"String"/"Number"literals) fixed by extractingsqsAttributeBaseTypeString/sqsAttributeBaseTypeNumberalongside the existingsqsAttributeBaseTypeBinary.Caller audit (semantic-change rule): no existing function signatures changed.
canReadis a new method;AdminPeekQueueis a new entrypoint. The validator switch's constant extraction is a literal-for-named rename (byte-identical wire behaviour).Behavior change
AdminPeekQueueadmin method. Not yet wired to an HTTP handler (Phase 4) — this PR only adds the adapter surface.AdminPurgeQueue, orAdminDescribeQueue.Risk
Low. Pure-read entry; the only writes I touch are the constant rename in
sqs_messages.go(byte-identical to the prior inline literals). The cursor codec is hard-capped at 512 bytes after encoding; oversize cursors surface asErrAdminSQSValidationfrom the decoder.Self-review (5 passes)
sqs_messages.gois a literal-for-named rename with no behaviour change.isVerifiedSQSLeader. SinglenextTxnReadTSthreaded intoloadQueueMetaAt,ScanAt, and per-rowGetAt, so the page is a consistent MVCC snapshot. The cursor's stored Generation is checked against the live meta on every continuation page — a purge between pages producesErrAdminSQSValidationrather than rows from a purged generation.O(opts.Limit)ScanAtround-trips + per-rowGetAt. Hard caps:Limit <= 100, body <= 256 KiB. Cursor is bounded at 512 bytes. No hot-path allocations.Generation; mismatch surfaces asErrAdminSQSValidation. Partition rotation uses the samereceiveFanoutCountersthe receive path uses, so peek's partition fairness matches the data plane's. Data records are read viasqsMsgDataKeyDispatch, so partitioned FIFO data keys are routed to the same partition the vis-index entry was found under.go test -raceclean.Test plan
go test -race -count=1 ./adapter/...(targeted:TestAdminPeekQueue*,TestPeekCursorCodec*,TestClampPeek*,TestPreparePeekCursor*,TestEncodePeekCursor*,TestProjectPeekedAttributes*,TestAdminRole_CanRead,TestAdminPurge*,TestAdminQueueSummary*,TestSQSServer_PartitionedFIFO*,TestSQSServer_Send*)make lintclean on./adapter/...(0 issues)Out of scope (follow-ups)
bucketActionAdminPeek+resolveActionConfigexplicit case +*adminPeekThrottledError+*PeekThrottledError). Leader-only +Limit <= 100already bound the steady-state cost; the dedicated bucket adds a per-queue cap that should land alongside the SPA wiring so the metric has a real consumer.GET /admin/api/v1/sqs/queues/{name}/messages, bridge inmain_admin.go, integration tests.principalForReadSensitiveliveRoleStorere-check (design doc Goal 8): blocked on the widerRoleStoreplumbing — neitherAdminPurgeQueuenorAdminDeleteQueuedo a live re-check today; peek will inherit that pattern when the broader work lands.