Skip to content

adapter/sqs: AdminPeekQueue backend (Phase 3)#794

Open
bootjp wants to merge 2 commits into
mainfrom
feat/sqs-admin-peek-queue
Open

adapter/sqs: AdminPeekQueue backend (Phase 3)#794
bootjp wants to merge 2 commits into
mainfrom
feat/sqs-admin-peek-queue

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 20, 2026

Summary

Phase 3 of docs/design/2026_05_16_proposed_admin_purge_queue.md: backend AdminPeekQueue method on *adapter.SQSServer.

  • Returns up to opts.Limit currently-visible messages (visible_at <= now); pure read, no receive-count bump, no receipt-handle minting, no visibility-timer change.
  • Partitioned FIFO support via rotated sequential scanning across partitions, reusing the receive path's receiveFanoutCounters for fairness.
  • Versioned JSON + base64url cursor pins Generation, StartPartition, Partition, and LastKey; stale-generation cursors are rejected so a purge between pages forces the SPA to refresh from the front.
  • AdminRole.canRead() — peek requires non-zero role; read-only principals can triage, purge stays on canWrite().
  • Pre-existing goconst regression in sqs_messages.go (the validator switch's inline "String" / "Number" literals) fixed by extracting sqsAttributeBaseTypeString / sqsAttributeBaseTypeNumber alongside the existing sqsAttributeBaseTypeBinary.

Caller audit (semantic-change rule): no existing function signatures changed. canRead is a new method; AdminPeekQueue is a new entrypoint. The validator switch's constant extraction is a literal-for-named rename (byte-identical wire behaviour).

Behavior change

  • New AdminPeekQueue admin method. Not yet wired to an HTTP handler (Phase 4) — this PR only adds the adapter surface.
  • No change to SigV4 paths, AdminPurgeQueue, or AdminDescribeQueue.

Risk

Low. Pure-read entry; the only writes I touch are the constant rename in sqs_messages.go (byte-identical to the prior inline literals). The cursor codec is hard-capped at 512 bytes after encoding; oversize cursors surface as ErrAdminSQSValidation from the decoder.

Self-review (5 passes)

  1. Data loss — Read-only entry. The constant extraction in sqs_messages.go is a literal-for-named rename with no behaviour change.
  2. Concurrency — Leader-only via isVerifiedSQSLeader. Single nextTxnReadTS threaded into loadQueueMetaAt, ScanAt, and per-row GetAt, so the page is a consistent MVCC snapshot. The cursor's stored Generation is checked against the live meta on every continuation page — a purge between pages produces ErrAdminSQSValidation rather than rows from a purged generation.
  3. PerformanceO(opts.Limit) ScanAt round-trips + per-row GetAt. Hard caps: Limit <= 100, body <= 256 KiB. Cursor is bounded at 512 bytes. No hot-path allocations.
  4. Consistency — Cursor encodes Generation; mismatch surfaces as ErrAdminSQSValidation. Partition rotation uses the same receiveFanoutCounters the receive path uses, so peek's partition fairness matches the data plane's. Data records are read via sqsMsgDataKeyDispatch, so partitioned FIFO data keys are routed to the same partition the vis-index entry was found under.
  5. Test coverage — 22 new tests (happy path / no-bump-on-peek / truncation / Limit clamp / cursor round-trip / stale-gen / four malformed-cursor classes / read-only allowed / role-less denied / missing queue / empty name / delayed-message hidden / FIFO attribute projection / typed MessageAttribute (String+Binary) round-trip / partitioned FIFO pagination / cursor codec round-trip+empty+oversize / preparePeekCursor fresh+stale / clampPeekLimit+clampPeekBodyBytes truth tables / projectPeekedAttributes nil-safety / canRead truth table). go test -race clean.

Test plan

  • go test -race -count=1 ./adapter/... (targeted: TestAdminPeekQueue*, TestPeekCursorCodec*, TestClampPeek*, TestPreparePeekCursor*, TestEncodePeekCursor*, TestProjectPeekedAttributes*, TestAdminRole_CanRead, TestAdminPurge*, TestAdminQueueSummary*, TestSQSServer_PartitionedFIFO*, TestSQSServer_Send*)
  • make lint clean on ./adapter/... (0 issues)
  • CI on this PR

Out of scope (follow-ups)

  • Throttle integration: dedicated per-queue admin-peek bucket (bucketActionAdminPeek + resolveActionConfig explicit case + *adminPeekThrottledError + *PeekThrottledError). Leader-only + Limit <= 100 already bound the steady-state cost; the dedicated bucket adds a per-queue cap that should land alongside the SPA wiring so the metric has a real consumer.
  • Phase 4: HTTP handler at GET /admin/api/v1/sqs/queues/{name}/messages, bridge in main_admin.go, integration tests.
  • principalForReadSensitive live RoleStore re-check (design doc Goal 8): blocked on the wider RoleStore plumbing — neither AdminPurgeQueue nor AdminDeleteQueue do a live re-check today; peek will inherit that pattern when the broader work lands.

Phase 3 of the Admin Queue Peek and Purge design
(docs/design/2026_05_16_proposed_admin_purge_queue.md). Implements
the read-only counterpart to AdminPurgeQueue:

  AdminPeekQueue(ctx, principal, name, opts)
    ([]AdminPeekedMessage, nextCursor string, error)

Peek walks the leader's visibility index at the next read timestamp,
returns up to opts.Limit currently-visible messages (visible_at
less than or equal to now), and bumps NO receive counts. No receipt
handle is minted; no visibility timer is started. Pure read.

New surface:
  - AdminPeekedAttribute mirrors the typed SQS MessageAttribute
    shape so binary payloads and the DataType discriminator survive
    the SPA round-trip (Codex r11 on the design doc).
  - AdminPeekedMessage carries body (truncated per BodyMaxBytes),
    BodyTruncated, BodyOriginalSize, SentTimestamp, ReceiveCount,
    GroupID, DeduplicationID, and the typed attribute map.
  - AdminPeekMessageOptions{Limit, Cursor, BodyMaxBytes} with
    documented defaults (20, "", 4096) and bounds
    ([1, 100], 512-byte cursor cap, [256, 262144] body cap).
  - peekCursor (versioned JSON, base64url-wrapped) so the SPA can
    page across partition boundaries deterministically. Cursor
    pins Generation so a purge between pages forces the SPA to
    refresh from the front (ErrAdminSQSValidation rather than
    silently surfacing rows from a purged generation).
  - AdminRole.canRead() - peek requires non-zero role; both
    read-only and full principals satisfy the gate. Purge stays
    on canWrite() so role-only principals can triage but cannot
    drop the queue.

Partitioned FIFO support: peek uses rotated sequential scanning
over partitions, picking the start partition via the same
receiveFanoutCounters the receive path uses for fairness. The
cursor encodes both StartPartition (for the wrap detection) and
the current Partition + LastKey so subsequent pages continue
deterministically.

Caller audit (semantic-change rule): no existing function
signatures changed. canRead is a new method; AdminPeekQueue is a
new entrypoint. The sqsAttributeBaseType{String,Number} constant
extraction in sqs_messages.go fixes a pre-existing goconst
regression that the larger test surface tripped; the validator
switch now reads canonical constants instead of inline literals.

Tests (adapter/sqs_admin_peek_test.go) cover:
  - happy path with body / MessageID / SentTimestamp assertions
  - peek does NOT bump ApproximateReceiveCount across two calls
  - body truncation reports BodyTruncated + BodyOriginalSize
  - Limit clamping (0 to default 20, 500 to max 100)
  - cursor round-trip across two pages, no overlap, complete
    coverage
  - stale-generation cursor after purge to ErrAdminSQSValidation
  - oversize / bad-base64 / bad-JSON / wrong-version cursors all
    map to ErrAdminSQSValidation
  - read-only principal allowed; role-less denied
  - missing queue to ErrAdminSQSNotFound
  - empty / whitespace name to ErrAdminSQSValidation
  - DelaySeconds > 0 hides the message until visible_at elapses
  - FIFO MessageGroupId / MessageDeduplicationId both projected
  - typed MessageAttribute (String + Binary) round-trips
  - partitioned FIFO (PartitionCount=4): every sent message
    surfaces exactly once across the union of paginated peek
    pages, regardless of the rotated start partition
  - cursor codec unit tests: round-trip, empty-cursor sentinel,
    length cap, generation mismatch
  - clampPeekLimit / clampPeekBodyBytes truth tables
  - projectPeekedAttributes nil-safety and DataType preservation
  - AdminRole.canRead() truth table

Out of scope (deferred to a follow-up):
  - Throttle integration (bucketActionAdminPeek + dedicated
    per-queue admin-peek bucket per design doc 3.1). Leader-only
    + Limit cap already bound the cost; the dedicated bucket adds
    a steady-rate cap that should land alongside the SPA wiring
    so the rate-limit metric has a real consumer.
  - HTTP handler + bridge - Phase 4.
  - principalForReadSensitive live RoleStore re-check - depends
    on the wider RoleStore plumbing landing first (design doc
    Goal 8, noted as a future direction).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 20, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 58 minutes and 38 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3818197-7602-4095-a5a0-5087093df3d5

📥 Commits

Reviewing files that changed from the base of the PR and between c9a7870 and 9bfaacf.

📒 Files selected for processing (4)
  • adapter/dynamodb_admin.go
  • adapter/sqs_admin_peek.go
  • adapter/sqs_admin_peek_test.go
  • adapter/sqs_messages.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sqs-admin-peek-queue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @bootjp's task in 6m 20s —— View job


Code Review

  • Gather context and read changed files
  • Review adapter/sqs_admin_peek.go (core implementation)
  • Review adapter/sqs_admin_peek_test.go (test coverage)
  • Review adapter/sqs_messages.go (constant extraction)
  • Review adapter/dynamodb_admin.go (minor additions)
  • Compile findings and post review

Summary

Solid, well-structured addition. The design is clean, the pagination cursor is versioned and generation-checked, and the test suite is thorough. One medium-severity bug (potential infinite loop via forged cursor) requires a fix; several minor points below.


Medium — Infinite loop via forged StartPartition

adapter/sqs_admin_peek.go:262-278 (preparePeekCursor) and adapter/sqs_admin_peek.go:297-321 (walkPeek)

decodePeekCursor validates schema version but does not validate that cursor.StartPartition < meta.PartitionCount. walkPeek's loop termination condition is:

nextPart := (cursor.Partition + 1) % meta.PartitionCount
if nextPart == cursor.StartPartition {
    return rows, nil, nil
}

If an authenticated admin principal submits a cursor where StartPartition >= meta.PartitionCount (e.g. StartPartition = math.MaxUint32), nextPart cycles through [0, PartitionCount) via modulo but never equals StartPartition, so the loop runs indefinitely as long as all partitions are exhausted (empty or all-delayed queue). An empty/all-delayed partitioned FIFO queue is the expected steady-state for a DLQ after a purge, making this reachable in practice.

The fix is a range check in preparePeekCursor on the continuation path:

if cursor == nil {
    ...
}
// Validate partition ranges before trusting client-supplied cursor fields.
if meta.PartitionCount > 1 && cursor.StartPartition >= meta.PartitionCount {
    return nil, errors.Wrap(ErrAdminSQSValidation,
        "admin peek: cursor StartPartition out of range")
}
if meta.PartitionCount > 1 && cursor.Partition >= meta.PartitionCount {
    return nil, errors.Wrap(ErrAdminSQSValidation,
        "admin peek: cursor Partition out of range")
}
if cursor.Generation != meta.Generation {
    ...
}

A corresponding test case for TestAdminPeekQueue_CursorMalformed or a new TestPreparePeekCursor_OutOfRangePartition would lock this down. Fix this →


Minor — truncatePeekBody splits at byte boundary (may yield invalid UTF-8)

adapter/sqs_admin_peek.go:447-453

return string(body[:bodyMaxBytes]), true, originalSize

SQS message bodies are opaque byte sequences, so a UTF-8 multibyte character (e.g. a Japanese queue name in the body) can be split at bodyMaxBytes, yielding a string with invalid UTF-8. The BodyTruncated=true flag tells the SPA to expect this, but it's worth documenting explicitly in the struct comment on AdminPeekedMessage.Body (or in truncatePeekBody's godoc) that the body may not be valid UTF-8 when truncated. The SPA must sanitize before display/JSON re-encode.


Minor — Peek side-effects the receive fanout counter

adapter/sqs_admin_peek.go:236

cursor, err = preparePeekCursor(cursor, meta, s.nextReceiveFanoutStart(name, meta.PartitionCount))

nextReceiveFanoutStart atomically increments the fanout counter even on read-only peek calls. This is documented as intentional ("rotated sequential scanning… reusing the receive path's receiveFanoutCounters for fairness"), but the consequence — a high-frequency admin polling loop (SPA tab refresh) subtly biases the receive path's partition rotation — is not documented in the code or design doc. Low risk with Limit ≤ 100 and leader-only, but worth a comment on AdminPeekQueue so the next person doesn't "fix" it back to a dedicated counter.


Minor — AdminPeekedAttribute(v) type cast is silent if structs diverge

adapter/sqs_admin_peek.go:466

out[name] = AdminPeekedAttribute(v)

This relies on AdminPeekedAttribute and sqsMessageAttributeValue having identical field names and types (Go ignores struct tags for conversion). The comment in projectPeekedAttributes documents this, but if sqsMessageAttributeValue gains a new field (e.g. a future CustomTypeHeader string) the cast will fail to compile — which is actually the right behavior (compiler enforces sync). No action needed, just noting it's by design and the compile error is the guard.


Nit — Missing comment on sqsAttributeBaseTypeString and sqsAttributeBaseTypeNumber

adapter/sqs_messages.go:1724-1725

sqsAttributeBaseTypeBinary has a doc comment explaining why the canonical name matters (it's part of the wire contract). The two new constants sqsAttributeBaseTypeString and sqsAttributeBaseTypeNumber are symmetrical and could benefit from a brief comment for consistency. Small, but aligns with the existing documentation pattern.


What looks good

  • Generation check in preparePeekCursor: Clean sentinel on stale cursors — forces SPA refresh rather than silently serving rows from a purged generation. The TestAdminPeekQueue_StaleGenerationCursor integration test pins the behaviour end-to-end.
  • Consistent MVCC snapshot: Single nextTxnReadTS threaded through loadQueueMetaAt, ScanAt, and every GetAt — the page is snapshot-isolated. The now for vis-bounds is captured once at the top of walkPeek (consistent across partitions), matching the receive path pattern.
  • Tombstone skip is defensive, not wrong: The !ok skip in walkPeekPartition (record absent despite vis-index entry) is a correct defensive guard; DeleteMessage atomically removes both entries so the skip fires only on crash/race edge cases.
  • clampPeekLimit / clampPeekBodyBytes truth-table tests: Covering all clamp branches explicitly rather than relying on integration tests to exercise them.
  • canRead() placement: Added to dynamodb_admin.go next to canWrite() where AdminRole is defined — right location.
  • Constant extraction in sqs_messages.go: Byte-identical rename; the switch case (sqsAttributeBaseTypeString, sqsAttributeBaseTypeNumber) reads correctly.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the AdminPeekQueue functionality for SQS, allowing users to inspect messages without affecting their visibility or receive counts. The implementation includes a new canRead authorization gate, support for paginated results via base64url-encoded JSON cursors, and configurable message body truncation. Comprehensive tests have been added to verify the behavior across standard and partitioned FIFO queues, ensuring that the peek operation remains non-destructive and respects security roles. I have no feedback to provide as there were no review comments to assess.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5838bddb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go
Comment on lines +274 to +278
if cursor.Generation != meta.Generation {
return nil, errors.Wrap(ErrAdminSQSValidation,
"admin peek: cursor is from a prior generation; restart from the front")
}
return cursor, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject out-of-range cursor partitions

preparePeekCursor validates only Generation, so a client can submit a cursor whose StartPartition is outside [0, meta.PartitionCount). In walkPeek, termination depends on nextPart == cursor.StartPartition; with an out-of-range start partition this condition is never true, so an empty/fully exhausted partitioned queue causes an unbounded loop of ScanAt calls and the request never returns. This is a request-amplification DoS against the admin endpoint and should be rejected as ErrAdminSQSValidation before scanning.

Useful? React with 👍 / 👎.

bootjp added a commit that referenced this pull request May 20, 2026
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination
condition is `(cursor.Partition + 1) % meta.PartitionCount ==
cursor.StartPartition`. With a client-supplied cursor whose
StartPartition lies outside [0, PartitionCount, the modulo cycle
never matches and the call loops ScanAt over guaranteed-empty
partitions forever — a request-amplification DoS against the
admin endpoint.

preparePeekCursor now bounds-checks both StartPartition and
Partition against max(meta.PartitionCount, 1) and rejects
out-of-range values with ErrAdminSQSValidation. Non-partitioned
queues (PartitionCount <= 1) require both fields to be 0.

Caller audit: peek cursors are constructed in exactly two places —
preparePeekCursor itself (now validated) and walkPeek's
partition-advance (,
which produces values in [0, PartitionCount) by construction).
No other caller. Generation validation already handles the
purge-between-pages case separately.

Tests: regression test feeds a wire-level cursor with
StartPartition=999 / Partition=999 against a PartitionCount=4
queue and asserts ErrAdminSQSValidation under a 5-second deadline
so any future loss of the bounds check terminates rather than
hanging CI. Four preparePeekCursor unit tests pin the truth table
(partitioned out-of-range / non-partitioned non-zero / in-range
accepted).
EOF
)
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination
condition is "(Partition + 1) mod PartitionCount == StartPartition".
With a client-supplied cursor whose StartPartition lies outside
[0, PartitionCount), the modulo cycle never matches and the call
loops ScanAt over guaranteed-empty partitions forever - a
request-amplification DoS against the admin endpoint.

preparePeekCursor now bounds-checks both StartPartition and
Partition against max(meta.PartitionCount, 1) and rejects
out-of-range values with ErrAdminSQSValidation. Non-partitioned
queues (PartitionCount <= 1) require both fields to be 0.

Caller audit: peek cursors are constructed in exactly two places -
preparePeekCursor itself (now validated) and walkPeek's
partition-advance ((Partition+1) mod PartitionCount), which
produces values in [0, PartitionCount) by construction. No other
caller. Generation validation already handles the purge-between-
pages case separately.

Tests: regression test feeds a wire-level cursor with
StartPartition=999 / Partition=999 against a PartitionCount=4
queue and asserts ErrAdminSQSValidation under a 5-second deadline
so any future loss of the bounds check terminates rather than
hanging CI. Four preparePeekCursor unit tests pin the truth table
(partitioned out-of-range / non-partitioned non-zero / in-range
accepted).
@bootjp bootjp force-pushed the feat/sqs-admin-peek-queue branch from ed24325 to 9bfaacf Compare May 20, 2026 11:43
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 20, 2026

@claude review

Addressed Codex r1 P1 in 9bfaacf:

  • preparePeekCursor now rejects out-of-range StartPartition / Partition with ErrAdminSQSValidation. Non-partitioned queues (PartitionCount <= 1) require both fields to be 0.
  • Closes the unbounded-loop DoS vector: walkPeek's '(Partition+1) mod PartitionCount == StartPartition' termination never fired for an out-of-range StartPartition, looping ScanAt forever.
  • Regression test runs under a 5-second deadline so any future loss of the bounds check terminates rather than hanging CI; four unit tests pin the truth table.
  • Caller audit: walkPeek's partition advance ((p+1) mod PartitionCount) is the only other cursor producer; it yields values in [0, PartitionCount) by construction.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @bootjp's task in 1m 39s —— View job


Code Review

Reviewing the r1 P1 fix (out-of-range cursor partition bounds check) from commit 9bfaacf.

  • Load tools
  • Read adapter/sqs_admin_peek.go — verify bounds checks in preparePeekCursor
  • Read adapter/sqs_admin_peek_test.go — verify regression tests
  • Post review

r1 P1 Fix: Partition bounds check in preparePeekCursor

The fix is correct and complete. Here's a thorough assessment.

What changed

adapter/sqs_admin_peek.go:288-300

maxPartition := meta.PartitionCount
if maxPartition <= 1 {
    // Non-partitioned queues only have partition 0. A non-zero
    // Partition or StartPartition would also fail the rotation
    // termination check; reject explicitly so the error is the
    // documented 400, not a silent O(infty) loop.
    maxPartition = 1
}
if cursor.StartPartition >= maxPartition || cursor.Partition >= maxPartition {
    return nil, errors.Wrapf(ErrAdminSQSValidation,
        "admin peek: cursor partition index out of range (StartPartition=%d, Partition=%d, max=%d)",
        cursor.StartPartition, cursor.Partition, maxPartition)
}

The maxPartition normalization (PartitionCount <= 1 → 1) correctly maps both non-partitioned queues (PartitionCount=0) and single-partition queues (PartitionCount=1) to maxPartition=1, requiring both fields to be 0. Partitioned queues use PartitionCount directly. Both StartPartition and Partition are checked in one condition so neither can independently escape the valid range. The error message embeds the actual field values for operator diagnostics.

The generation check is placed before the bounds check (line 284). A cursor that is both stale-generation and out-of-range will return the generation error message rather than the bounds error — this is fine, both are ErrAdminSQSValidation and both correctly terminate the call.

Test coverage assessment

TestPreparePeekCursor_PartitionOutOfRange (lines 771-826): Four sub-cases that pin the full truth table:

Sub-case Cursor input Meta Expected
StartPartition >= PartitionCount SP=99, P=0 PC=4 ErrAdminSQSValidation
Partition >= PartitionCount SP=0, P=99 PC=4 ErrAdminSQSValidation
Non-partitioned: non-zero StartPartition SP=1, P=0 PC=0 ErrAdminSQSValidation
Non-partitioned: non-zero Partition SP=0, P=5 PC=1 ErrAdminSQSValidation
In-range cursor accepted SP=3, P=2 PC=4 nil (accepted)

All four rejection paths are structurally distinct — the acceptance case verifies valid cursors aren't blocked.

TestAdminPeekQueue_HostileCursorBoundedRequest (lines 834-862): End-to-end regression with a 5-second context.WithTimeout. The hostile cursor (StartPartition=999, Partition=999) is wire-encoded and passed through the full AdminPeekQueue stack. The deadline is the key safety net: if the bounds check is ever removed, the test times out (context deadline exceeded) rather than returning ErrAdminSQSValidation, making the regression immediately visible in CI rather than hanging the suite.

One minor observation: the hostile cursor uses Generation: 1, but the queue's actual generation may differ (depending on what installPartitionedMetaForTest initializes). If they differ, the generation check fires before the bounds check. The test still passes because ErrAdminSQSValidation is returned either way — it just may not exercise the partition bounds path end-to-end when the generations don't match. This is acceptable because TestPreparePeekCursor_PartitionOutOfRange pins the bounds check in isolation, and the integration test pins the "hostile cursor → 400" property regardless of which guard fires.

preparePeekCursor docstring

The rationale comment (lines 258-266) is exactly the right level of detail — it names the attack vector, explains why the termination condition fails, and cites the PR and issue. Future readers won't be tempted to "simplify" the check.

Verdict

The fix closes the DoS vector correctly. The unit-level truth table + end-to-end deadline regression is the right pairing. No additional changes needed for this fix.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bfaacfa4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go
if !exists {
return nil, "", ErrAdminSQSNotFound
}
cursor, err = preparePeekCursor(cursor, meta, s.nextReceiveFanoutStart(name, meta.PartitionCount))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid advancing fanout rotation on peek cursor resumes

AdminPeekQueue always calls nextReceiveFanoutStart before preparePeekCursor, even when the request already carries a continuation cursor. Because this counter is shared with ReceiveMessage, every extra peek page advances the receive start partition and can reintroduce fixed-stride aliasing (e.g., 4 partitions and 3 peek pages between receives makes every receive start at the same partition). Under sustained backlog with small MaxNumberOfMessages, this can repeatedly prioritize the same partitions and starve others, so continuation requests should not mutate the shared receive-rotation counter.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant