feat(encryption): Stage 6D-2 — ErrNodeIDCollision + ErrLocalEpochRollback primitives + probe-node-id CLI by bootjp · Pull Request #788 · bootjp/elastickv

bootjp · 2026-05-18T09:27:03Z

Summary

Stage 6D-2 per the 6D design doc §5 (landed in PR #786). Ships the two bundled 6C-3 startup-guard primitives as pure functions in the encryption package, plus the probe-node-id CLI helper from §5.1 mitigation #2.

Operator-inert until 6D-6 wires the primitives into main.go's startup-guard phase. The probe-node-id CLI is reachable today but only computes locally — no operator action against the cluster.

What this ships

Two new sentinel errors (`internal/encryption/errors.go`)

ErrNodeIDCollision — two distinct full_node_id values hash to the same 16-bit node_id (would reuse GCM nonces under the active DEK). Triage message points at probe-node-id for pre-join verification.
ErrLocalEpochRollback — sidecar local_epoch <= registry.LastSeenLocalEpoch (strict-ahead invariant from §5.2). Triage message points at resync-sidecar or DEK rotation.

Two pure-function guard primitives

CheckNodeIDCollision(fullNodeIDs []uint64) error (node_id_collision.go)
CheckLocalEpochRollback(registry, fullNodeID, dekID, sidecarLocalEpoch) error (local_epoch_rollback.go)

Both use the shipped uint16(full_node_id & 0xFFFF) narrowing (matches applier.go::nodeIDMask). Local-state-only — no RPC, no engine access. Same-value-twice is NOT a collision (duplicate-membership-entry defense). Equality fires ErrLocalEpochRollback per the §5.2 strict-ahead invariant (replay would reuse the same (node_id, local_epoch) prefix).

CLI: `elastickv-admin encryption probe-node-id`

Per §5.1 mitigation #2 — operator runs against a candidate full_node_id BEFORE joining a node to verify the value is free in the cluster's allocated node_id space. Accepts decimal or 0x-prefixed hex input; prints both the 16-hex-digit canonical full_node_id and the derived node_id. No RPC; pure local computation.

Test coverage

19 new tests:

File	Tests
`node_id_collision_test.go`	Empty, SingleNode, NoCollision, Collision, DuplicateNotCollision (same value twice ≠ collision), OrderIndependent, ErrorIncludesBothIDs
`local_epoch_rollback_test.go`	NoRow (freshly-joined-learner skip), SidecarStrictlyAhead, SidecarLessThan, SidecarEqual (strict-ahead replay case), PebbleError (not classified as rollback), NarrowingMatchesShippedConvention
`encryption_test.go` (new tests added)	ProbeNodeID_Hex, ProbeNodeID_Decimal, RequiresFlag, RejectsBadInput, ProbeNodeIDSubcommand (dispatch reaches handler)

Caller audit

No semantic changes to existing functions; only additions:

New sentinels in errors.go — exported package-surface additions, no callers to update.
New pure functions — no callers yet (operator-inert until 6D-6).
New CLI subcommand probe-node-id — additive; no existing subcommand changed.

Five-lens self-review

Data loss — net-positive. Both guards catch nonce-reuse scenarios BEFORE any encrypted write that would corrupt confidentiality.
Concurrency / distributed failures — pure functions over local snapshots; no shared state. Run before any goroutine fan-out.
Performance — O(N) map walk for collision check; single Pebble Get for rollback check. Paid once per process start on happy path.
Data consistency — narrowing matches the shipped nodeIDMask = 0xFFFF convention from applier.go. The NarrowingMatchesShippedConvention test pins this against a future change that would diverge.
Test coverage — 19 new unit tests cover every documented branch + operationally important edges (equality for strict-ahead, narrowing equivalence, error propagation).

Verification

go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ — PASS
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues
go build ./... — PASS

Plan

Per the 6D design doc §7 decomposition, the next sub-PRs are:

6D-3 — Voters ∪ Learners CapabilityFanout helper (used by 6D-6's pre-flight gate and by Stage 6E)
6D-4 — Wire format addition (RotateSubEnableStorageEnvelope = 0x04) + FSM apply path with len(Wrapped) == 0 check + sidecar StorageEnvelopeCutoverIndex write
6D-5 — §6.2 storage-layer toggle (PutAt reads StorageEnvelopeActive)
6D-6 — EnableStorageEnvelope admin RPC + CLI + integration test + main.go wiring of THIS PR's startup-guard primitives

Test plan

go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ — PASS (19 new tests + existing)
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues
go build ./... — PASS
Jepsen — not relevant for an operator-inert PR; the 6D-6 integration test will exercise the wired guards end-to-end

Summary by CodeRabbit

New Features
- Added a CLI subcommand to display a provided full node identifier (decimal or hex) and its derived 16-bit node id.
- Added startup validation checks that detect 16-bit node-id collisions and local-epoch rollback conditions to prevent unsafe encryption states.
Tests
- Added unit and integration tests covering the new CLI parsing, collision detection, and local-epoch rollback behaviors.

…back primitives + probe-node-id CLI Stage 6D-2 per the 6D design doc 5 (PR #786, merged). Ships the two bundled 6C-3 startup-guard primitives as pure functions in the encryption package, plus the probe-node-id CLI helper from 5.1 mitigation #2. Operator-inert until 6D-6 wires the primitives into main.go's startup-guard phase. ## What this ships ### Two new sentinel errors - ErrNodeIDCollision (errors.go): two distinct full_node_id values in the local route-catalog hash to the same 16-bit node_id, which would reuse GCM nonces under the active DEK. Triage message points at the probe-node-id CLI for pre-join verification. - ErrLocalEpochRollback (errors.go): sidecar local_epoch for the active storage DEK is <= the local writer-registry last_seen_local_epoch (strict-ahead invariant from 5.2 of the 6D design doc). Triage message points at resync-sidecar or DEK rotation. ### Two pure-function guard primitives - CheckNodeIDCollision(fullNodeIDs []uint64) error (node_id_collision.go): narrows each full_node_id via uint16(full_node_id & 0xFFFF) - same shipped mask as applier.go's nodeIDMask - and returns ErrNodeIDCollision if any two DISTINCT values share a node_id. Order- independent. Same-value-twice is NOT a collision (duplicate-membership-entry quirk; the route catalog watcher should dedupe upstream but the primitive defends correctness here). Local-state-only; no RPC, no engine access. - CheckLocalEpochRollback(registry, fullNodeID, dekID, sidecarLocalEpoch) error (local_epoch_rollback.go): reads the local Pebble writer-registry row at RegistryKey(dekID, uint16(fullNodeID & 0xFFFF)). Returns nil if no row exists (freshly-joined learner; 4.1 case-1 first-seen creates the row on first encrypted write). Returns ErrLocalEpochRollback when sidecar <= registry (the strict-ahead invariant - equality is NOT safe because the same (node_id, local_epoch) prefix would be replayed). Pebble I/O errors propagate as wrapped errors NOT marked with ErrLocalEpochRollback so the operator triages transport failure separately from a real rollback. ### CLI: probe-node-id - elastickv-admin encryption probe-node-id --full-node-id=<u64> (cmd/elastickv-admin/encryption.go): derives the 16-bit node_id from a candidate full_node_id using the same shipped narrowing convention. Operator runs it against candidates BEFORE joining a new node to verify the value is free in the cluster's allocated node_id space - the recommended mitigation per the 6D design doc 5.1 mitigation menu. Accepts decimal or 0x-prefixed hex input; prints both the 16-hex-digit canonical full_node_id and the derived node_id. ## Test coverage 11 new tests pin the load-bearing properties: node_id_collision_test.go (8 tests): - Empty - skip - SingleNode - skip - NoCollision - nil - Collision - ErrNodeIDCollision - DuplicateNotCollision - same full_node_id twice is NOT a collision - OrderIndependent - either ordering of the colliding pair catches the hit - ErrorIncludesBothIDs - operator-triage shape names both colliding values and the shared node_id local_epoch_rollback_test.go (6 tests): - NoRow - freshly-joined-learner skip - SidecarStrictlyAhead - nil (healthy case) - SidecarLessThan - ErrLocalEpochRollback (classic rollback) - SidecarEqual - ErrLocalEpochRollback (strict-ahead invariant; replay scenario) - PebbleError - underlying err propagated, NOT classified as ErrLocalEpochRollback - NarrowingMatchesShippedConvention - looks up registry row at uint16-narrowed key matching applier.go:230 encryption_test.go (5 new tests for probe-node-id): - Hex - 0x-prefixed input - Decimal - decimal input - RequiresFlag - missing --full-node-id is an error - RejectsBadInput - non-numeric/non-hex input is an error - EncryptionMain_ProbeNodeIDSubcommand - dispatch reaches runEncryptionProbeNodeID (not the unknown-subcommand default branch) ## Caller audit No semantic changes to existing functions; only additions. - New sentinels in errors.go - exported package surface additions, no caller audit needed (no existing callers to update). - New pure functions in node_id_collision.go and local_epoch_rollback.go - no callers yet (operator-inert until 6D-6 wires them). - New CLI subcommand probe-node-id - additive; no existing subcommand changed. ## Five-lens self-review 1. Data loss - net-positive. Both guards catch potential nonce-reuse scenarios BEFORE the first encrypted write that would corrupt confidentiality. 2. Concurrency / distributed failures - primitives are pure functions over local snapshots; no shared state. Run in the startup-before-serving phase before any goroutine fan-out. 3. Performance - O(N) hash map walk for CheckNodeIDCollision (N = cluster member count). Single Pebble Get for CheckLocalEpochRollback. Both paid once per process start on the happy path. 4. Data consistency - matches the shipped 16-bit narrowing convention from applier.go (nodeIDMask = 0xFFFF) and the writer-registry keying. The narrowing-matches test pins this against a future change that would diverge. 5. Test coverage - 11 new unit tests cover every documented branch + the operationally important edge cases (equality for strict-ahead, narrowing equivalence, error propagation). ## Verification - go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ - PASS - golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ - 0 issues - go build ./... - PASS ## Plan Stage 6D-2 ships the primitives + CLI as the bundled 6C-3 piece per the design doc 7 decomposition. Next sub-PRs per 7: - 6D-3: Voters union Learners CapabilityFanout helper - 6D-4: Wire format + FSM apply path - 6D-5: 6.2 storage-layer toggle - 6D-6: EnableStorageEnvelope admin RPC + CLI + integration test + main.go wiring of THIS PR's primitives The two primitives in this PR are operator-inert until 6D-6 wires them into main.go's startup-guard phase. The probe-node-id CLI is reachable today but only computes locally - no operator action required against the cluster.

coderabbitai · 2026-05-18T09:27:11Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 561c10d7-0267-40a7-914a-29a3717440e4

📥 Commits

Reviewing files that changed from the base of the PR and between eab8c49 and 5aaa3e0.

📒 Files selected for processing (5)

cmd/elastickv-admin/encryption.go
cmd/elastickv-admin/encryption_test.go
internal/encryption/errors.go
internal/encryption/local_epoch_rollback.go
internal/encryption/local_epoch_rollback_test.go

🚧 Files skipped from review as they are similar to previous changes (3)

cmd/elastickv-admin/encryption_test.go
internal/encryption/errors.go
cmd/elastickv-admin/encryption.go

📝 Walkthrough

Walkthrough

Adds startup guards and diagnostics for storage-envelope cutover: CheckNodeIDCollision (detects 16-bit node-ID collisions), CheckLocalEpochRollback (enforces local-epoch monotonicity via the writer-registry), exported sentinel errors, and an admin CLI probe-node-id subcommand with tests.

Changes

Storage-envelope startup validation

Layer / File(s)	Summary
Error sentinels `internal/encryption/errors.go`	Defines exported `ErrNodeIDCollision` and `ErrLocalEpochRollback` used by startup guards.
Node-ID collision detection `internal/encryption/node_id_collision.go`, `internal/encryption/node_id_collision_test.go`	`CheckNodeIDCollision` checks that distinct `full_node_id` values do not collide when narrowed to 16-bit `node_id`. Tests cover empty/single inputs, non-collision, collision detection, duplicate handling, order-independence, and error message contents.
Local epoch rollback detection `internal/encryption/local_epoch_rollback.go`, `internal/encryption/local_epoch_rollback_test.go`	`CheckLocalEpochRollback` looks up the writer-registry by `(DEK ID, uint16(full_node_id & 0xFFFF))` and returns `ErrLocalEpochRollback` when sidecar epoch ≤ LastSeenLocalEpoch or when an active envelope lacks a proving registry row. Tests include pre/post-cutover missing-row behavior, epoch comparisons, registry errors propagation, foreign-row handling, and key-narrowing convention checks.
CLI probe tool for node-ID narrowing `cmd/elastickv-admin/encryption.go`, `cmd/elastickv-admin/encryption_test.go`	Adds `probe-node-id --full-node-id` (accepts `0x` hex or decimal), `parseUint64WithRadix` helper, and tests validating canonical output, required flag behavior, malformed input rejection, partial-parse rejection, help flag, and main dispatch routing.

Sequence Diagram

sequenceDiagram
  participant Startup as Startup
  participant CollisionCheck as CheckNodeIDCollision
  participant EpochCheck as CheckLocalEpochRollback
  participant Registry as WriterRegistry
  participant CLI as elastickv-admin(probe-node-id)

  Startup->>CollisionCheck: validate full_node_id list
  CollisionCheck->>CollisionCheck: narrow each full_node_id to uint16
  CollisionCheck-->>Startup: nil or ErrNodeIDCollision

  Startup->>EpochCheck: validate sidecarLocalEpoch for a node
  EpochCheck->>Registry: GetRegistryRow(DEK, uint16(full&0xFFFF))
  Registry-->>EpochCheck: row / missing / error
  EpochCheck-->>Startup: nil or ErrLocalEpochRollback

  CLI->>CLI: parse --full-node-id (hex or decimal) via parseUint64WithRadix
  CLI->>CLI: narrow to uint16 and format output
  CLI-->>Operator: printed full_node_id (hex) and node_id (uint16)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

bootjp/elastickv#759: Implements bootstrap encryption validation that rejects duplicate full_node_id and uint16-derived node-id collisions, aligning with this PR's collision logic.
bootjp/elastickv#707: Design-doc changes matching node-ID collision detection and local-epoch rollback semantics implemented here.
bootjp/elastickv#765: Contains related applier scaffolding that halts on uint16 node-id collisions and local-epoch rollback using the same masked node-id convention.

"I hopped through bytes beneath the moon,
Checked low sixteen bits by noon,
No collisions in my burrowed den,
Epochs march onward—safe again!" 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the three main components of this PR: two new sentinel errors (ErrNodeIDCollision and ErrLocalEpochRollback) and their supporting primitives, plus the new probe-node-id CLI subcommand.
Docstring Coverage	✅ Passed	Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/encryption-stage6d2-startup-guards

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bootjp · 2026-05-18T09:27:27Z

@claude review

Stage 6D-2 ships the two bundled 6C-3 startup-guard primitives per §5 of the 6D design doc (#786, merged):

ErrNodeIDCollision + CheckNodeIDCollision — pure function over local route-catalog snapshot; uses the shipped uint16(full_node_id & 0xFFFF) narrowing (matches applier.go::nodeIDMask). Duplicate-same-value defended explicitly (catalog watcher should dedupe, primitive defends correctness).
ErrLocalEpochRollback + CheckLocalEpochRollback — pure function; reads local Pebble writer-registry. Strict-ahead invariant (equality is rollback). Pebble I/O errors propagate as wrapped errors, NOT classified as ErrLocalEpochRollback (operator triages transport failure separately).
elastickv-admin encryption probe-node-id CLI — per §5.1 mitigation Update dependency go to v1.21.4 #2. Decimal or 0x-hex input.

Operator-inert until 6D-6 wires the primitives into main.go's startup-guard phase.

Please validate:

Narrowing consistency — the two primitives + the CLI must use the SAME 16-bit narrowing convention as the shipped writer-registry keying. Specifically: uint16(full_node_id & 0xFFFF), matching applier.go:230 and encryption_admin.go:64. The NarrowingMatchesShippedConvention test pins this against future divergence.
Strict-ahead invariant for ErrLocalEpochRollback — equality is a rollback case per §5.2 of the design doc (the replay scenario). The SidecarEqual test pins this.
Caller audit — both new sentinels are exported package-surface additions with no existing callers. Both new primitives have no callers yet (operator-inert). The new CLI subcommand is additive; no existing subcommand changed. No semantic changes to existing functions.
Test coverage — 19 new tests across three files; every documented branch + the operationally important edges (equality, narrowing equivalence, error propagation, dispatch).
Operator-inert contract — the wiring into main.go's startup-guard phase is deferred to 6D-6 per the design doc §7. This PR is pure scaffolding ready for that wiring; the probe-node-id CLI is the only operator-visible surface and it's a pure local computation (no RPC).

Verification:

go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ — PASS
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues
go build ./... — PASS

claude · 2026-05-18T09:27:41Z

Claude finished @bootjp's task in 4m 1s —— View job

Stage 6D-2 Code Review

Tasks:

Gather context and read changed files
Validate narrowing consistency (uint16(full_node_id & 0xFFFF))
Validate strict-ahead invariant for ErrLocalEpochRollback
Perform caller audit
Validate test coverage
Validate operator-inert contract
Post final review

Review

1. Narrowing Consistency ✅

All three sites correctly use uint16(x & 0xFFFF):

Site	Implementation
`node_id_collision.go:59`	`uint16(fnid & nodeIDMask)` — uses package-level `nodeIDMask = 0xFFFF` from `applier.go:670`
`local_epoch_rollback.go:65`	`uint16(fullNodeID&nodeIDMask)` — same package-level constant
`encryption.go:94-95`	`const nodeIDMask = 0xFFFF; uint16(full & nodeIDMask)` — local redefinition (correct; CLI is a different package)

All three match applier.go:230 (uint16(p.FullNodeID&nodeIDMask)). Consistent.

Minor comment accuracy note: node_id_collision.go:59's nolint comment says "matches applier.go/encryption_admin.go convention", but encryption_admin.go uses localEpochMask for a different purpose (not node ID narrowing). The reference to encryption_admin.go is a red herring; applier.go alone is the authoritative source. Not a bug, but slightly misleading for future readers.

2. Strict-Ahead Invariant ✅

local_epoch_rollback.go:85:

if sidecarLocalEpoch <= registryRow.LastSeenLocalEpoch {

Correctly fires on < (classic rollback) and == (replay case). The TestCheckLocalEpochRollback_SidecarEqual test pins the equality branch. Correct.

The Pebble I/O error path is also correctly handled — wrapping without marking as ErrLocalEpochRollback, verified by TestCheckLocalEpochRollback_PebbleError.

3. Caller Audit ✅

ErrNodeIDCollision, ErrLocalEpochRollback — new exports; no existing callers.
CheckNodeIDCollision, CheckLocalEpochRollback — new functions; no callers until 6D-6.
probe-node-id subcommand — additive; no existing subcommand changed; encryptionMain's switch, default, and help strings all updated consistently (encryption.go:46, encryption.go:52, encryption.go:58).

4. Test Coverage ✅ (with minor count discrepancy)

18 test functions total (PR description says 19 — minor miscounting):

node_id_collision_test.go: 7 tests
local_epoch_rollback_test.go: 6 tests
encryption_test.go: 5 new tests

All documented branches are covered. TestCheckLocalEpochRollback_NarrowingMatchesShippedConvention correctly pins the load-bearing invariant by seeding with one full_node_id and looking up with a different one that shares the same low 16 bits — verifying the registry key derivation matches the shipped convention.

5. Operator-Inert Contract ✅

No changes to main.go. Both primitives are pure functions over local state, no RPCs. probe-node-id is local computation only. The 6D-6 wiring is deferred as documented. Contract holds.

Issues Found

🔴 `parseUint64WithRadix` silently accepts partial input

encryption.go:107-117 uses fmt.Sscanf for both hex and decimal parsing. fmt.Sscanf does not require consuming the full input string — it returns success after matching the format even if trailing characters remain. This means:

"0x1234ZZZZ" → parses as 0x1234 without error
"1234abc" → parses as 1234 without error

TestRunEncryptionProbeNodeID_RejectsBadInput only tests "notanumber" (which fails because 'n' is not parseable at all), so this edge case is untested.

Fix: use strconv.ParseUint which rejects partial input:

// hex path
v, err := strconv.ParseUint(s[2:], 16, 64)

// decimal path  
v, err := strconv.ParseUint(s, 10, 64)

Fix this →

🟡 `const minMembersForCollision` placement breaks `go doc` attribution

node_id_collision.go:41-47: The 38-line comment block starting at line 7 forms one unbroken block immediately preceding const minMembersForCollision = 2 at line 45. In Go, a comment immediately preceding a declaration is attributed to that declaration by go doc. The comment block will therefore be attributed to the constant, not to CheckNodeIDCollision (which has a blank line before it at line 47). go doc encryption.CheckNodeIDCollision will produce no documentation.

Fix: hoist the constant and its short comment before the function's doc comment:

// minMembersForCollision is the smallest membership-set size at which
// two distinct full_node_id values can exist and a narrowing collision
// is possible. With 0 or 1 member there is nothing to compare.
const minMembersForCollision = 2

// CheckNodeIDCollision is the §9.1 / 6C-3 startup-guard primitive ...
func CheckNodeIDCollision(fullNodeIDs []uint64) error {

Fix this →

Summary

The five validation points all pass. The narrowing is consistent, the strict-ahead invariant is correct, the caller audit is clean, test coverage is comprehensive, and the operator-inert contract holds. The two issues above are the actionable items before merge: the strconv.ParseUint fix closes a silent truncation bug in the CLI input parser, and the constant placement fix restores go doc attribution for CheckNodeIDCollision.

gemini-code-assist

Code Review

This pull request introduces critical safety guards to prevent AES-GCM nonce reuse within the encryption system. Key additions include the probe-node-id administrative subcommand for verifying node ID mappings, and automated startup checks to detect node_id collisions and local_epoch rollbacks. Feedback was provided regarding a formatting discrepancy in the CLI output where the hex padding width incorrectly accounted for the '0x' prefix, which could lead to inconsistent canonical representations.

gemini-code-assist · 2026-05-18T09:30:21Z

+	if _, err := fmt.Fprintf(out, "full_node_id: %#016x (%d)\nnode_id:      %#04x (%d)\n",
+		full, full, narrowed, narrowed); err != nil {


The format string %#016x for full_node_id and %#04x for node_id does not align with the "canonical 16-hex-digit" and "4-hex-digit" forms described in the comments and expected by the tests. In Go, the width in %#0Nx includes the 0x prefix, so %#016x only provides 14 hex digits of padding. For example, TestRunEncryptionProbeNodeID_Decimal expects 0x0000000012345678 (18 characters), but the current code would produce 0x00000012345678 (16 characters). To consistently output the full 16 and 4 hex digits respectively, use 0x%016x and 0x%04x.

Suggested change

if _, err := fmt.Fprintf(out, "full_node_id: %#016x (%d)\nnode_id: %#04x (%d)\n",

full, full, narrowed, narrowed); err != nil {

if _, err := fmt.Fprintf(out, "full_node_id: 0x%016x (%d)\nnode_id: 0x%04x (%d)\n",

full, full, narrowed, narrowed); err != nil {

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a246a56259

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T09:32:56Z

+	if err := fs.Parse(args); err != nil {
+		return errors.Wrap(err, "parse probe-node-id flags")


Treat probe-node-id help as non-error exit

runEncryptionProbeNodeID wraps every fs.Parse error, so elastickv-admin encryption probe-node-id -h/--help returns an error and exits non-zero instead of behaving like other encryption subcommands that treat flag.ErrHelp as a successful usage request. This breaks shell/help workflows that rely on help returning success.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-18T09:32:56Z

+		if _, err := fmt.Sscanf(s[2:], "%x", &v); err != nil {
+			return 0, errors.Wrap(err, "hex parse")


Reject malformed full-node-id strings completely

Using fmt.Sscanf here accepts partial parses (e.g. 0xdeadbeefZZ or 123abc) and silently ignores trailing garbage, so mistyped --full-node-id inputs can be treated as valid and produce the wrong derived node_id. For a safety probe command, accepting truncated IDs can mislead operators into validating the wrong value before join.

Useful? React with 👍 / 👎.

…dium (go-doc attribution) Two findings from claude r1 review on PR #788: ## claude high - parseUint64WithRadix silently accepts partial input cmd/elastickv-admin/encryption.go's parseUint64WithRadix used fmt.Sscanf for both hex and decimal parsing. Sscanf returns success once it matches the format, even if the input has trailing unparsed characters: "0x1234ZZZZ" -> parses as 0x1234 without error "1234abc" -> parses as 1234 without error This was unsafe for an operator-visible identifier where every digit matters - a typo'd full_node_id with a non-hex tail would silently return a different value than the operator meant, leading the operator to join a node under the wrong node_id. Fix: switched to strconv.ParseUint, which requires the ENTIRE input to be valid for the chosen radix. Updated import set and added TestRunEncryptionProbeNodeID_RejectsPartialInput with three regression cases (hex prefix + non-hex tail, decimal prefix + letter tail, hex with embedded space). ## claude medium - minMembersForCollision placement broke go doc internal/encryption/node_id_collision.go had the minMembersForCollision constant declared IMMEDIATELY before CheckNodeIDCollision (no blank line). In Go, a doc comment preceding a declaration is attributed to that declaration, so the 38-line doc comment block was being attributed to the CONSTANT by go doc, not to the function. go doc encryption.CheckNodeIDCollision would have produced no documentation. Fix: hoisted the constant declaration ABOVE the function's doc comment, with its own short comment. Both declarations now have correct go-doc attribution. ## Gemini medium NOT addressed (verified false) Gemini r1 flagged %#016x as producing 14 hex digits (claiming the 0x prefix counted toward the width). Verified directly: Go emits 16 hex digits after the 0x prefix for %#016x (output 0x0000000012345678 = 18 chars total). My tests already assert this exact form and pass. Gemini's claim is incorrect per Go's actual format behavior; no change needed. ## Caller audit (per cron directive) parseUint64WithRadix is package-private; the only caller is runEncryptionProbeNodeID (one call site). The semantic change is "strict parse vs. silent prefix" - the strict parse is the safer posture and no existing test relied on the silent-prefix behavior (the existing RejectsBadInput test used "notanumber" which fails both implementations). minMembersForCollision is a package-private constant only used within node_id_collision.go. Hoisting the declaration is structural; no semantic change. ## Verification - go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ - PASS - golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ - 0 issues - go build ./... - PASS No semantic changes to existing functions outside the parseUint64WithRadix bug-fix.

bootjp · 2026-05-18T09:36:35Z

@claude review

Round-1 (commit 9eecdb55): addressed both claude r1 findings.

claude high — `parseUint64WithRadix` silent-prefix bug

Switched from fmt.Sscanf to strconv.ParseUint. New regression test TestRunEncryptionProbeNodeID_RejectsPartialInput covers three cases (hex prefix + non-hex tail, decimal prefix + letter tail, hex with embedded space) that the previous Sscanf implementation would have silently truncated.

Caller audit: parseUint64WithRadix is package-private with a single caller (runEncryptionProbeNodeID). The semantic change "strict parse vs silent prefix" makes the function safer; no existing test relied on the silent-prefix behavior.

claude medium — go-doc attribution

Hoisted minMembersForCollision constant + its own short comment ABOVE the function's doc comment block. Now go doc encryption.CheckNodeIDCollision produces the intended 38-line function documentation.

Gemini r1 format-width finding — verified false, no change

Verified directly: Go's %#016x produces 16 hex digits AFTER the 0x prefix (output is 0x0000000012345678 = 18 chars total). My tests already assert this exact form and pass. Gemini's claim that the width includes the prefix is incorrect per Go's actual behavior.

Verification

go test -race ./internal/encryption/ ./cmd/elastickv-admin/ — PASS (21 new tests, +3 for the partial-input regression)
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues

claude · 2026-05-18T09:36:50Z

Claude encountered an error after 0s —— View job

I'll analyze this and get back to you.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9eecdb559c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T09:47:25Z

+		// §5.2 skip-condition, this is a legitimate lifecycle
+		// state — the first encrypted write will create the row
+		// via §4.1 case-1 first-seen.
+		return nil


Preserve missing-row signal in CheckLocalEpochRollback

Returning nil when the writer-registry row is absent collapses two different states—"safe strict-ahead" and "no rollback anchor exists"—into the same success result. That makes it impossible for startup wiring to enforce the post-cutover requirement (when StorageEnvelopeActive is true) to refuse a missing (full_node_id, dek_id) row, and can let a node start without a nonce-reuse guardrail. The helper should surface this case (e.g., distinct error/signal) instead of unconditionally succeeding.

Useful? React with 👍 / 👎.

…l split per 5.2) Codex r1 P2 on PR788: CheckLocalEpochRollback collapsed two distinct states - "safe strict-ahead" and "no rollback anchor exists" - into the same nil return. The 6D design doc 5.2 explicitly splits these on storage_envelope_active: - PRE-cutover (storage_envelope_active == false): missing row is the freshly-joined-learner case (allow startup; 4.1 case-1 first-seen creates the row on first encrypted write) - POST-cutover (storage_envelope_active == true): missing row means encrypted writes are happening cluster-wide with no rollback anchor on this node (refuse startup; the guard's raison d'etre) The previous implementation collapsed both to nil, meaning a 6D-6 implementor wiring the guard would have no way to enforce the post-cutover requirement without reading the registry again. Codex's flag is valid. ## Fix: storageEnvelopeActive bool parameter Added a `storageEnvelopeActive` bool to CheckLocalEpochRollback. The missing-row branch now splits internally: - false + no row -> nil (pre-cutover freshly-joined-learner) - true + no row -> ErrLocalEpochRollback wrapped with "storage_envelope_active=true but no registry row" diagnostic This matches 5.2 exactly. The caller (main.go wiring in 6D-6 per the design doc 7) passes sidecar.StorageEnvelopeActive as the bool. ## Caller audit (per cron directive - SEMANTIC CHANGE) CheckLocalEpochRollback is currently called only from tests (operator-inert per 6D-2 scope). grep verified: - internal/encryption/local_epoch_rollback_test.go (6 callers) - No production callers (the main.go wiring is deferred to 6D-6 per the design doc 7) All six test callers updated to pass `false` (pre-cutover) as the new bool, preserving the existing assertion semantics. Added one new test for the post-cutover branch: TestCheckLocalEpochRollback_NoRow_PostCutover - verifies missing-row + active=true returns ErrLocalEpochRollback with the "storage_envelope_active=true" context message. Renamed the existing TestCheckLocalEpochRollback_NoRow to TestCheckLocalEpochRollback_NoRow_PreCutover to make the split explicit at the test level. ## Gemini r1 format-width finding - confirmed false, no change Format width %#016x produces 16 hex digits after the 0x prefix (output 0x0000000012345678 = 18 chars total). Verified via direct Go execution. No change needed. ## Verification - go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ - PASS - golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ - 0 issues - go build ./... - PASS No semantic changes outside CheckLocalEpochRollback's signature.

bootjp · 2026-05-18T11:05:37Z

@claude review

Round-2 (commit eab8c49d): addressed codex r1 P2.

codex P2 — preserve missing-row signal in CheckLocalEpochRollback

The previous implementation collapsed two distinct states into nil:

"safe strict-ahead" (no row + pre-cutover → freshly-joined learner)
"no rollback anchor exists" (no row + post-cutover → cluster writing encrypted data with no monotonicity guard on this node)

The 6D design doc §5.2 explicitly splits these on storage_envelope_active; codex's flag is valid.

Fix

Added storageEnvelopeActive bool parameter to CheckLocalEpochRollback. The missing-row branch now splits:

false + no row → nil (pre-cutover)
true + no row → ErrLocalEpochRollback wrapped with "storage_envelope_active=true but no registry row"

Matches §5.2 exactly. The 6D-6 wiring will pass sidecar.StorageEnvelopeActive.

Caller audit (SEMANTIC CHANGE per cron directive)

CheckLocalEpochRollback is currently called only from tests (operator-inert per 6D-2 scope). grep -rn CheckLocalEpochRollback confirms:

6 test callers — all updated to pass false (preserve existing assertions)
0 production callers (6D-6 will wire it)

New test: TestCheckLocalEpochRollback_NoRow_PostCutover — verifies missing-row + active=true returns ErrLocalEpochRollback with the "storage_envelope_active=true" context. Existing TestCheckLocalEpochRollback_NoRow renamed to _PreCutover for symmetry.

Gemini r1 format-width finding — verified false, no change

%#016x produces 16 hex digits after the 0x prefix (output 0x0000000012345678 = 18 chars total). Verified via direct Go execution.

Verification

go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ — PASS
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues

claude · 2026-05-18T11:05:52Z

Claude finished @bootjp's task in 3m 37s —— View job

Stage 6D-2 Code Review — Round 2

Tasks:

Gather context on Round-2 changes
Read changed files
Validate storageEnvelopeActive parameter split (§5.2 missing-row guard)
Verify Gemini format-width finding (%#016x hex digit count)
Check caller audit completeness
Post final review

Review

1. `storageEnvelopeActive` fix (codex r1 P2) ✅

local_epoch_rollback.go:90-105 correctly splits the missing-row case on the new parameter:

if !ok {
    if storageEnvelopeActive {
        return pkgerrors.Wrapf(ErrLocalEpochRollback,
            "writer-registry has no row for full_node_id=%#x dek_id=%d but storage_envelope_active=true; cannot prove nonce monotonicity for the active DEK",
            fullNodeID, activeStorageDEKID)
    }
    return nil
}

storageEnvelopeActive=false + no row → nil (pre-cutover; §4.1 case-1 first-seen branch will create the row) ✅
storageEnvelopeActive=true + no row → ErrLocalEpochRollback with context string ✅
Error message includes "storage_envelope_active=true" — correctly pinned by TestCheckLocalEpochRollback_NoRow_PostCutover:93 ✅

Caller audit is clean: grep confirms 6 test callers (all updated to pass false) and 0 production callers. Signature change is backward-compatible at the 6D-6 wiring point; 6D-6 will pass sidecar.StorageEnvelopeActive.

Minor: errors.go:263-264 still says the "no registry row" case is an unconditional skip condition for ErrLocalEpochRollback. After this change it's only a skip for pre-cutover. The comprehensive doc in local_epoch_rollback.go:44-75 covers the split correctly; the errors.go skip-condition note is now slightly stale. Not a correctness issue — the primitive is authoritative — but worth a follow-up line edit.

2. Gemini format-width finding — correctly rejected ✅

Gemini claimed %#016x only produces 14 hex digits because "the width in %#0Nx includes the 0x prefix." This is C's behavior, not Go's.

In Go's fmt package, the 0 flag inserts zero padding after the 0x prefix, so the width applies to the hex digit portion only. %#016x for uint64(0x12345678) produces 0x0000000012345678 (18 chars: 0x + 16 hex digits), which is exactly what TestRunEncryptionProbeNodeID_Decimal:504 asserts.

The tests passing is confirmation — the test asserts "full_node_id: 0x0000000012345678" (16 hex digits after 0x) and "node_id: 0x1234" (4 hex digits) — both correct. Gemini's suggested rewrite to 0x%016x would produce identical output, not a correction.

3. Caller audit ✅

CheckLocalEpochRollback signature changed: 6 test callers updated, 0 production callers
CheckNodeIDCollision, both sentinels — unchanged, still 0 callers outside tests
probe-node-id subcommand — unchanged, additive

Remaining open item

🟡 `runEncryptionProbeNodeID` does not handle `flag.ErrHelp` (codex r1 P2 #1, unaddressed)

encryption.go:85-87:

if err := fs.Parse(args); err != nil {
    return errors.Wrap(err, "parse probe-node-id flags")
}

Every other encryption subcommand returns nil for flag.ErrHelp:

runEncryptionStatus (encryption.go:169-172) ✅
runEncryptionRotateDEK (encryption_mutators.go:74) ✅
runEncryptionRegisterWriter (encryption_mutators.go:119) ✅
runEncryptionBootstrap (encryption_mutators.go:261) ✅

probe-node-id alone wraps flag.ErrHelp as an error — elastickv-admin encryption probe-node-id -h exits non-zero, breaking shell scripts that test $? on help invocations.

Fix:

if err := fs.Parse(args); err != nil {
    if errors.Is(err, flag.ErrHelp) {
        return nil
    }
    return errors.Wrap(err, "parse probe-node-id flags")
}

Add a test to match TestRunEncryptionStatus_HelpFlagExitsZero:

func TestRunEncryptionProbeNodeID_HelpFlagExitsZero(t *testing.T) {
    t.Parallel()
    var buf bytes.Buffer
    if err := runEncryptionProbeNodeID([]string{"-h"}, &buf); err != nil {
        t.Errorf("runEncryptionProbeNodeID(-h) returned %v, want nil", err)
    }
}

Fix this →

Summary

The codex P2 "preserve missing-row signal" is correctly implemented — the storageEnvelopeActive split matches §5.2 exactly, all callers are updated, and the new test pins the post-cutover refusal. The Gemini format-width finding is still correctly rejected. The one remaining P2 (help-flag exit code) was not addressed in Round-2; it's a 3-line fix with a corresponding test to match the pattern used in all other subcommands.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

cmd/elastickv-admin/encryption.go (1)
95-97: ⚡ Quick win

Remove the avoidable //nolint:gosec suppression.

Line 96 can avoid the cast entirely while preserving behavior, which keeps lint policy clean.
♻️ Proposed refactor
-	const nodeIDMask = 0xFFFF
-	narrowed := uint16(full & nodeIDMask) //nolint:gosec // masked to 16 bits; matches applier.go convention
+	const nodeIDMask uint64 = 0xFFFF
+	narrowed := full & nodeIDMask
As per coding guidelines, "**/*.go: Use gofmt formatting and pass linters configured in .golangci.yaml ... Avoid //nolint annotations — refactor instead."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-admin/encryption.go` around lines 95 - 97, The code suppresses
gosec for a uint16 cast on narrowing full; instead make the mask have the same
unsigned integer type as full (e.g., change nodeIDMask to uint64(0xFFFF) or the
concrete type of full), remove the explicit uint16 cast and the //nolint:gosec
comment, and use narrowed := full & nodeIDMask so the mask preserves type and
behavior when passed to fmt.Fprintf (see nodeIDMask, narrowed and the
fmt.Fprintf call).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/elastickv-admin/encryption.go`:
- Around line 85-87: The parse block in probe-node-id treats all parse errors as
failures; change the logic around fs.Parse(args) so that if it returns
flag.ErrHelp you return nil (success/normal help exit) instead of wrapping it;
for any other error keep returning errors.Wrap(err, "parse probe-node-id
flags"). Locate the fs.Parse call inside the probe-node-id handling code and add
an explicit check for errors.Is(err, flag.ErrHelp) or err == flag.ErrHelp before
wrapping to implement this behavior.

In `@internal/encryption/errors.go`:
- Around line 260-264: Update the stale comment for ErrLocalEpochRollback to
reflect the new pre-cutover vs post-cutover behavior: explain that missing
registry-row is still a skip condition before the cutover (when
storageEnvelopeActive=false or Active.Storage==0/bootstrap not committed) but
after cutover (storageEnvelopeActive=true) the guard now refuses startup if
there is no (full_node_id, active_storage_dek_id) registry row; mention the
related concepts so readers can find the logic (ErrLocalEpochRollback,
storageEnvelopeActive, Active.Storage, RegisterEncryptionWriter) and clarify the
different handling paths for pre-cutover vs post-cutover.

---

Nitpick comments:
In `@cmd/elastickv-admin/encryption.go`:
- Around line 95-97: The code suppresses gosec for a uint16 cast on narrowing
full; instead make the mask have the same unsigned integer type as full (e.g.,
change nodeIDMask to uint64(0xFFFF) or the concrete type of full), remove the
explicit uint16 cast and the //nolint:gosec comment, and use narrowed := full &
nodeIDMask so the mask preserves type and behavior when passed to fmt.Fprintf
(see nodeIDMask, narrowed and the fmt.Fprintf call).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b13ddfd9-60ac-44ae-9744-12c0a245b80c

📥 Commits

Reviewing files that changed from the base of the PR and between 7a05466 and eab8c49.

📒 Files selected for processing (7)

cmd/elastickv-admin/encryption.go
cmd/elastickv-admin/encryption_test.go
internal/encryption/errors.go
internal/encryption/local_epoch_rollback.go
internal/encryption/local_epoch_rollback_test.go
internal/encryption/node_id_collision.go
internal/encryption/node_id_collision_test.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eab8c49dd4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T11:14:38Z

+	if sidecarLocalEpoch <= registryRow.LastSeenLocalEpoch {
+		return pkgerrors.Wrapf(ErrLocalEpochRollback,


Validate registry row belongs to requested full_node_id

CheckLocalEpochRollback looks up by uint16(fullNodeID&nodeIDMask) and then compares epochs without verifying that registryRow.FullNodeID matches the fullNodeID argument. If a different writer has (or previously had) the same 16-bit node ID, this function can treat that foreign row as this node’s rollback anchor and even return nil when sidecarLocalEpoch is greater than the foreign LastSeenLocalEpoch, bypassing the intended missing-row/post-cutover refusal for the real node. This leaves startup-guard behavior inconsistent and can defer failure to apply-time collision handling instead of failing early.

Useful? React with 👍 / 👎.

…lp handling) Two CodeRabbit minor findings from round-2 review. ## CodeRabbit minor #1 (errors.go:260-264) - stale skip-condition doc The ErrLocalEpochRollback skip-conditions comment in errors.go still listed "no registry row" as an unconditional skip. After the round-2 storageEnvelopeActive split, the missing-row case is no longer unconditional: pre-cutover it's a skip (returns nil), post-cutover it's the refusal that the guard exists to catch. Fix: rewrote the skip-conditions paragraph to enumerate just the two unconditional skips (encryption disabled, no active storage DEK) and cross-reference local_epoch_rollback.go for the §5.2 missing-row split. The comprehensive doc on CheckLocalEpochRollback is the authoritative source; errors.go now points there. ## CodeRabbit minor #2 (encryption.go:85-87) - ErrHelp handling probe-node-id wrapped flag.ErrHelp as an error, breaking the exit-zero-on-help convention used by every other encryption subcommand: - runEncryptionStatus (encryption.go:169-172) - runEncryptionRotateDEK (encryption_mutators.go:74) - runEncryptionRegisterWriter (encryption_mutators.go:119) - runEncryptionBootstrap (encryption_mutators.go:261) Operators running `elastickv-admin encryption probe-node-id -h` would get exit code 2 (the flag package default for help) which breaks shell scripts that test $?. Fix: added the standard `if errors.Is(err, flag.ErrHelp) { return nil }` short-circuit, matching the existing pattern. Added TestRunEncryptionProbeNodeID_HelpFlagExitsZero to pin the behavior. ## Verification - go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ - PASS (24 tests total in these files) - golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ - 0 issues No semantic changes to function signatures or behavior outside of the help-flag exit code.

bootjp · 2026-05-18T11:15:32Z

@claude review

Round-3 (commit 189cd596): closed out two CodeRabbit minors from round-2.

CodeRabbit minor — `ErrLocalEpochRollback` doc stale (errors.go)

After the round-2 storageEnvelopeActive split, the missing-row case is no longer an unconditional skip — pre-cutover it's a skip, post-cutover it's the refusal. Rewrote the skip-conditions comment in errors.go to enumerate only the two unconditional skips and cross-reference local_epoch_rollback.go for the §5.2 missing-row split.

CodeRabbit minor — `probe-node-id -h` exit code

The subcommand wrapped flag.ErrHelp as an error, breaking the exit-zero-on-help convention used by every other encryption subcommand. Added the standard if errors.Is(err, flag.ErrHelp) { return nil } short-circuit + TestRunEncryptionProbeNodeID_HelpFlagExitsZero pinning test.

Both were CodeRabbit minor (below P0/P1 threshold) but tiny fixes that close out the round.

No semantic changes to function signatures outside the help-flag exit code.

claude · 2026-05-18T11:15:46Z

Claude finished @bootjp's task in 3m 13s —— View job

Stage 6D-2 Code Review — Round 3

Tasks:

Gather context on Round-3 changes
Read errors.go — validate stale doc fix
Read encryption.go — validate flag.ErrHelp fix + test
Check outstanding codex P2 (registry row full_node_id validation)
Post final review

Review

1. `ErrLocalEpochRollback` doc fix ✅

errors.go:260-271 now correctly describes the pre/post-cutover split: "The missing-registry-row case is NOT an unconditional skip" with the two branches enumerated. Cross-reference to local_epoch_rollback.go is correct. The prior stale phrasing is gone. Done.

2. `probe-node-id -h` exit code fix ✅

encryption.go:85-96: The flag.ErrHelp short-circuit is correctly placed with an explanatory comment that cites the convention across the other subcommands. TestRunEncryptionProbeNodeID_HelpFlagExitsZero at encryption_test.go:550-557 pins -h → nil. The pattern matches runEncryptionStatus exactly. Done.

Open: Codex round-2 P2 — `registryRow.FullNodeID` mismatch (not addressed in round-3)

local_epoch_rollback.go:107-113: after DecodeRegistryValue the code compares epochs without verifying the decoded registryRow.FullNodeID == fullNodeID.

The code in question:

registryRow, err := DecodeRegistryValue(rawVal)
// ... error check ...
if sidecarLocalEpoch <= registryRow.LastSeenLocalEpoch {
    return pkgerrors.Wrapf(ErrLocalEpochRollback, ...)
}

Why this matters: the registry key is RegistryKey(dekID, uint16(fullNodeID & 0xFFFF)) — keyed by the 16-bit truncation. If a node with a colliding full_node_id was previously a cluster member and has since been removed, its registry row persists under the same key. CheckNodeIDCollision reads the current route-catalog snapshot and would not see the departed node's full_node_id, so the collision check would pass. CheckLocalEpochRollback would then read the stale row from the departed node and compare epochs against it — a comparison that is semantically meaningless for the new node and could either falsely fire ErrLocalEpochRollback (if departed node's epoch > new node's sidecar epoch) or silently pass (if departed node's epoch < new node's sidecar epoch), defeating the missing-row post-cutover refusal (storageEnvelopeActive=true).

Fix — 4 lines after the DecodeRegistryValue call:

registryRow, err := DecodeRegistryValue(rawVal)
if err != nil {
    return pkgerrors.Wrapf(err, ...)
}
// Row belongs to a different writer under the same 16-bit slot (e.g. departed node).
// From this node's perspective the slot is unoccupied; apply the missing-row logic.
if registryRow.FullNodeID != fullNodeID {
    if storageEnvelopeActive {
        return pkgerrors.Wrapf(ErrLocalEpochRollback,
            "writer-registry slot for full_node_id=%#x dek_id=%d is occupied by a different full_node_id=%#x; cannot prove nonce monotonicity",
            fullNodeID, activeStorageDEKID, registryRow.FullNodeID)
    }
    return nil
}
if sidecarLocalEpoch <= registryRow.LastSeenLocalEpoch {

Add a companion test TestCheckLocalEpochRollback_MismatchedFullNodeID with sub-cases for pre/post-cutover to match the existing _NoRow_PreCutover / _NoRow_PostCutover pair.

Runtime risk today: zero — both guards are operator-inert until 6D-6. This can be fixed in this PR or explicitly deferred to 6D-6 with a // NOTE: caller must ensure CheckNodeIDCollision passed first comment. Given the defense-in-depth posture of the entire guard system, fixing it here is preferred.

Fix this →

Open nitpick: `//nolint:gosec` in `encryption.go:106` (CLAUDE.md)

encryption.go:106:

const nodeIDMask = 0xFFFF
narrowed := uint16(full & nodeIDMask) //nolint:gosec

CLAUDE.md requires "Avoid //nolint — refactor instead." For the CLI code, the nolint can be eliminated cleanly because narrowed is only passed to fmt.Fprintf (no uint16-typed callee):

const nodeIDMask uint64 = 0xFFFF
narrowed := full & nodeIDMask  // uint64 bounded to 16 bits; no cast needed

local_epoch_rollback.go:83 is harder: RegistryKey takes uint16, so the narrowing cast to uint16 is required by the callee. If the project wants to eliminate the nolint there too, the cleanest approach is a package-level func narrowNodeID(full uint64) uint16 in registry.go with a single nolint — reducing duplication and centralizing the policy. But that can be a follow-up in any of 6D-3 through 6D-6 when the nolint pattern appears in more places.

For now, encryption.go:106 is the actionable one-line refactor; local_epoch_rollback.go:83 can wait.

Fix this →

Summary

The two round-3 fixes are correct: the errors.go doc now accurately describes the pre/post-cutover missing-row split, and probe-node-id -h exits 0 per the established subcommand convention. One codex P2 from round-2 remains open: CheckLocalEpochRollback does not verify registryRow.FullNodeID == fullNodeID, which can produce a wrong result for a node whose 16-bit slot was previously occupied by a since-departed peer. The fix is 4 lines + 1 test, and since the PR is operator-inert until 6D-6 there is no runtime urgency — but addressing it here keeps the primitive self-contained.

…ullNodeID match) Codex r3 P2 on PR788: CheckLocalEpochRollback looked up the registry row by uint16(fullNodeID&nodeIDMask) and compared epochs without verifying that the decoded RegistryValue's FullNodeID matched the caller's argument. If a different writer previously occupied the same 16-bit slot (historical occupancy that the route-catalog watcher would not surface to ErrNodeIDCollision), this function would treat that foreign row as THIS node's rollback anchor. When sidecarLocalEpoch happens to exceed the foreign LastSeenLocalEpoch, the function would return nil and the missing-row-post-cutover refusal would be bypassed. ## Fix: defense-in-depth FullNodeID equality check After decoding the registry row, compare registryRow.FullNodeID against the caller's fullNodeID. On mismatch, route through the existing 5.2 split via a new shared helper: - missingRegistryRowResult(storageEnvelopeActive, ...) - pre-cutover allow startup, post-cutover refuse with ErrLocalEpochRollback. The mismatch is treated as "no row for THIS node" because the node's rollback anchor genuinely doesn't exist - the foreign writer's anchor doesn't apply to THIS node's monotonicity contract. This is layered defense: ErrNodeIDCollision is the primary guard for active collisions, but a since-removed member can leave its row behind. The FullNodeID check handles that case correctly without depending on ErrNodeIDCollision's route-catalog snapshot. ## Test coverage Two new tests pin the foreign-row scenarios: - TestCheckLocalEpochRollback_ForeignRowPreCutover - foreign row + pre-cutover -> allow startup (next OpRegistration will overwrite via 4.1 case 1). - TestCheckLocalEpochRollback_ForeignRowPostCutover - foreign row + post-cutover -> refuse with ErrLocalEpochRollback AND uses the missing-row context message ("storage_envelope_active=true"). Includes the "sidecar > foreign.LastSeen" sub-case which would have masked the bug without the FullNodeID check. The existing TestCheckLocalEpochRollback_NarrowingMatchesShippedConvention was confusingly using the foreign-row scenario as its fixture. Rewrote it to use the SAME full_node_id for seed and lookup (still confirms the high-bit-aware uint16 narrowing), and moved the cross-full_node_id case to the dedicated foreign-row tests where it belongs. ## Caller audit (per cron directive) CheckLocalEpochRollback's exported signature is unchanged - the semantic change is internal (a new branch that routes through the existing 5.2 split). No production callers (operator-inert per 6D-2). All 6 existing test callers continue to work unchanged. ## Verification - go test -race -timeout=60s ./internal/encryption/ ./cmd/elastickv-admin/ - PASS (29 tests total in these files) - golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ - 0 issues

bootjp · 2026-05-18T11:20:28Z

@claude review

Round-4 (commit 2ca075ac): addressed codex r3 P2.

codex P2 — registry row FullNodeID mismatch

Defense-in-depth: after decoding the registry row, verify registryRow.FullNodeID == fullNodeID. On mismatch (historical occupant of the same 16-bit slot — not catchable by ErrNodeIDCollision because the route-catalog watcher doesn't see since-removed members), route through the same §5.2 split as the no-row case:

pre-cutover → allow startup (4.1 case-1 will overwrite the foreign row on next OpRegistration)
post-cutover → refuse with ErrLocalEpochRollback + missing-row context

Without this check, a sidecar that happens to be ahead of the foreign writer's LastSeenLocalEpoch would return nil and bypass the missing-anchor refusal.

Tests

Two new pinning tests:

ForeignRowPreCutover — foreign row + pre-cutover → nil
ForeignRowPostCutover — foreign row + post-cutover → ErrLocalEpochRollback with "storage_envelope_active=true" message. Includes the sidecar>foreign-LastSeen sub-case which would have masked the bug pre-fix.

The existing narrowing test was inadvertently exercising the foreign-row case as its fixture. Rewrote it to use the same full_node_id for seed + lookup (preserves the high-bit-aware narrowing assertion).

Caller audit

Signature unchanged; internal branch added. No production callers. All 6 existing test callers still work.

Verification

go test -race ./internal/encryption/ ./cmd/elastickv-admin/ — PASS
golangci-lint run ./internal/encryption/ ./cmd/elastickv-admin/ — 0 issues

claude · 2026-05-18T11:20:46Z

Claude finished @bootjp's task in 1m 42s —— View job

Stage 6D-2 Code Review — Round 4

Tasks:

Gather context on Round-4 changes
Read local_epoch_rollback.go — validate FullNodeID mismatch fix
Read local_epoch_rollback_test.go — validate new pinning tests
Verify narrowing test fixture rewrite
Check any remaining open items from prior rounds
Post final review

Review

1. `missingRegistryRowResult` helper ✅

Extracting the shared §5.2 split into missingRegistryRowResult (lines 82-89) is a clean DRY move. The function's doc comment at lines 76-81 correctly identifies both call sites — the !ok (no-row from Pebble) path and the new FullNodeID-mismatch path — so the invariant is documented once and applied consistently in both branches. ✅

2. FullNodeID mismatch guard ✅

local_epoch_rollback.go:128-130:

if registryRow.FullNodeID != fullNodeID {
    return missingRegistryRowResult(storageEnvelopeActive, fullNodeID, activeStorageDEKID)
}

The comment at lines 114-127 correctly explains:

Why the collision guard does not catch this case (route-catalog watcher sees only current members; historical occupants leave a row behind).
Pre-cutover behaviour (allow startup; §4.1 case-1 overwrites the foreign row on the next OpRegistration).
Post-cutover behaviour (refuse; no anchor for THIS node).

Routing through missingRegistryRowResult means the message the operator sees is the same "missing anchor" diagnostic regardless of whether the slot was empty or occupied by a departed peer — which is semantically correct (from this node's perspective, the anchor is absent either way). ✅

3. `ForeignRowPostCutover` test sub-case ✅

local_epoch_rollback_test.go:210-216 adds the exact scenario that was the risk before the fix: sidecar=100 > foreign.LastSeen=5. Pre-fix this would have fallen through to the epoch comparison, returned nil (sidecar > foreign), and the missing anchor would have gone undetected. The test comment at lines 208-215 names the failure mode precisely. ✅

4. Narrowing test fixture rewrite ✅

TestCheckLocalEpochRollback_NarrowingMatchesShippedConvention now seeds and looks up with const fullID uint64 = 0xDEADBEEF_0000AAAA — the same value for both — so it tests the key derivation convention without incidentally exercising the foreign-row path. The comment at lines 233-234 calls out the separation explicitly. ✅

5. Prior-round open items status

Item	Status
`strconv.ParseUint` silent-prefix fix (r1)	✅ Fixed in r1
`go doc` attribution for `CheckNodeIDCollision` (r1)	✅ Fixed in r1
`storageEnvelopeActive` §5.2 split (r2)	✅ Fixed in r2
`errors.go` stale doc (r3)	✅ Fixed in r3
`probe-node-id -h` exit code (r3)	✅ Fixed in r3
`registryRow.FullNodeID` mismatch (r3)	✅ Fixed in r4 (this round)
`//nolint:gosec` in `encryption.go:106` (r3 nitpick)	⚪ Not yet addressed

One open nitpick: `//nolint:gosec` in `encryption.go:106`

CodeRabbit raised this in round-3 (and I noted it) — CLAUDE.md requires "avoid //nolint — refactor instead". The fix remains the same 2-line change:

// Before
const nodeIDMask = 0xFFFF
narrowed := uint16(full & nodeIDMask) //nolint:gosec

// After — mask typed uint64; no cast needed; fmt.Fprintf accepts any integer
const nodeIDMask uint64 = 0xFFFF
narrowed := full & nodeIDMask

The local_epoch_rollback.go:98 and node_id_collision.go:59 nolints are harder — their callees require uint16 — so those are reasonably deferred to the narrowNodeID(uint64) uint16 helper cleanup in 6D-3+. But encryption.go:106 has no uint16-typed callee (narrowed feeds only fmt.Fprintf) and is the clean 2-liner. Either fix it here or explicitly accept a follow-up cleanup PR for all three at once.

Fix this →

Summary

Round-4 is correct. The FullNodeID mismatch guard is sound: missingRegistryRowResult cleanly centralises the §5.2 split, the mismatch check fires before the epoch comparison, and the sub-case that would have masked the anchor absence (sidecar > foreign.LastSeen) is explicitly pinned by the test. The narrowing test rewrite correctly separates "key derivation correctness" from "foreign-row handling". All five prior-round findings are now resolved. The one remaining item is the encryption.go:106 nolint — either close it with a 2-line fix or batch it with the narrowNodeID cleanup deferred to 6D-3+.

…ude r4 deferred item) Use the uint64 mask result directly for %#04x / %d formatting; the uint16 cast was only there for display and pulled in a gosec G115 nolint that did not exist anywhere else in the operator-tool path. The single-source narrow-and-cast convention (applier.go:230, local_epoch_rollback.go:98, node_id_collision.go:59) is unaffected - those sites genuinely need uint16 to build the 6-byte registry key. Display-only sites do not, and now do not pay the nolint cost.

bootjp · 2026-05-19T20:25:19Z

@claude review

Round-5 (commit 5aaa3e0b): closed out the deferred encryption.go:106 //nolint:gosec from round-4.

Change

// Before
const nodeIDMask = 0xFFFF
narrowed := uint16(full & nodeIDMask) //nolint:gosec // masked to 16 bits; matches applier.go convention
fmt.Fprintf(out, "...node_id: %#04x (%d)\n", full, full, narrowed, narrowed)

// After
const nodeIDMask = 0xFFFF
// keep masked as uint64: the value is only formatted for display, ...
masked := full & nodeIDMask
fmt.Fprintf(out, "...node_id: %#04x (%d)\n", full, full, masked, masked)

%#04x / %d format identically on uint64 and uint16 for values in [0, 0xFFFF], so output bytes are unchanged (node_id: 0x1234 and node_id: 0x5678 test fixtures both still match exactly).

Why this site and not the others

The 3 production sites still keep the convention because they need the actual uint16 type:

applier.go:230 — building the 6-byte registry key (dekID || uint16(full_node_id & 0xFFFF))
local_epoch_rollback.go:98 — same registry-key derivation
node_id_collision.go:59 — building the map[uint16]uint64 collision-detection map

encryption.go:106 is the only display-only site. The narrowing exists purely to show the operator the 16-bit slice in human-readable output, and the uint64 mask result formats identically — so the cast (and the nolint:gosec) was load-bearing nowhere.

Verification

go test ./cmd/elastickv-admin/... — all probe-node-id tests pass with unchanged output bytes (TestRunEncryptionProbeNodeID_Hex expects node_id: 0x1234, TestRunEncryptionProbeNodeID_Decimal expects node_id: 0x5678)
golangci-lint run cmd/elastickv-admin/... — 0 issues
No semantic change to caller behavior; runEncryptionProbeNodeID is a leaf and is invoked only via the probe-node-id subcommand dispatcher

claude · 2026-05-19T20:25:32Z

Claude finished @bootjp's task in 2m 8s —— View job

Stage 6D-2 Code Review — Round 5

Tasks:

Gather context on Round-5 changes
Read cmd/elastickv-admin/encryption.go — validate nolint removal + variable rename
Read cmd/elastickv-admin/encryption_test.go — verify test fixtures unchanged
Check all prior-round items resolved
Post final review

Review

1. `//nolint:gosec` removal ✅

encryption.go:105-109:

// Before (Round-4)
const nodeIDMask = 0xFFFF
narrowed := uint16(full & nodeIDMask) //nolint:gosec
fmt.Fprintf(out, "...%#04x (%d)\n", ..., narrowed, narrowed)

// After (Round-5)
const nodeIDMask = 0xFFFF
// keep masked as uint64: the value is only formatted for display...
masked := full & nodeIDMask
fmt.Fprintf(out, "...%#04x (%d)\n", ..., masked, masked)

nodeIDMask is an untyped constant 0xFFFF. In full & nodeIDMask, full is uint64, so the constant is promoted to uint64 and masked is uint64 bounded to [0, 0xFFFF] by construction. No explicit cast, no gosec warning, CLAUDE.md's no-nolint rule is satisfied.

2. Output format invariant ✅

The PR description correctly claims %#04x (%d) produces identical output for uint64 and uint16 at the same value. Go's fmt package formats the numeric value, not the type — uint64(0x1234) and uint16(0x1234) both produce 0x1234 under %#04x. The test fixtures pin this:

TestRunEncryptionProbeNodeID_Hex expects "node_id: 0x1234" ✅
TestRunEncryptionProbeNodeID_Decimal expects "node_id: 0x5678" ✅

No output bytes changed.

3. Variable rename `narrowed` → `masked` ✅

The rename is accurate: the variable is no longer a narrowed uint16 — it's a uint64 masked to 16 bits. The comment on line 106 explains the WHY (the cast was not load-bearing at a display-only site), which meets the "non-obvious constraint" bar from CLAUDE.md.

4. Production-site nolints correctly retained ✅

The PR description is correct that the three remaining sites must keep uint16:

Site	Reason
`local_epoch_rollback.go:98`	`RegistryKey(dekID, uint16)` signature
`node_id_collision.go:59`	`map[uint16]uint64` key type
`applier.go:230`	same registry-key derivation

For these sites, a narrowNodeID(uint64) uint16 helper would consolidate the nolint to one location — but that's a deferred follow-up in 6D-3+, as the prior round established.

5. Prior-round item status

Round	Item	Status
R1	`strconv.ParseUint` silent-prefix	✅
R1	`go doc` attribution for `CheckNodeIDCollision`	✅
R2	`storageEnvelopeActive` §5.2 missing-row split	✅
R3	`errors.go` stale skip-condition doc	✅
R3	`probe-node-id -h` exits 0	✅
R4	`registryRow.FullNodeID` mismatch guard	✅
R4	`encryption.go:106` `//nolint:gosec`	✅ this round

All rounds resolved. The two remaining //nolint:gosec annotations in local_epoch_rollback.go:98 and node_id_collision.go:59 are correctly deferred to the narrowNodeID helper cleanup in 6D-3+.

Summary

Round-5 is correct. The //nolint:gosec is eliminated by keeping masked as uint64 — the only display-only site — with no change to output bytes. All seven prior-round findings are resolved. No new issues. This PR is ready to merge.

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread cmd/elastickv-admin/encryption.go

Comment thread internal/encryption/errors.go Outdated

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

		if _, err := fmt.Fprintf(out, "full_node_id: %#016x (%d)\nnode_id: %#04x (%d)\n",
		full, full, narrowed, narrowed); err != nil {

		if err := fs.Parse(args); err != nil {
		return errors.Wrap(err, "parse probe-node-id flags")

		if _, err := fmt.Sscanf(s[2:], "%x", &v); err != nil {
		return 0, errors.Wrap(err, "hex parse")

		if sidecarLocalEpoch <= registryRow.LastSeenLocalEpoch {
		return pkgerrors.Wrapf(ErrLocalEpochRollback,

Conversation

bootjp commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What this ships

Two new sentinel errors (internal/encryption/errors.go)

Two pure-function guard primitives

CLI: elastickv-admin encryption probe-node-id

Test coverage

Caller audit

Five-lens self-review

Verification

Plan

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Uh oh!

bootjp commented May 18, 2026

Uh oh!

claude Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stage 6D-2 Code Review

Review

1. Narrowing Consistency ✅

2. Strict-Ahead Invariant ✅

3. Caller Audit ✅

4. Test Coverage ✅ (with minor count discrepancy)

5. Operator-Inert Contract ✅

Issues Found

🔴 parseUint64WithRadix silently accepts partial input

🟡 const minMembersForCollision placement breaks go doc attribution

Summary

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 18, 2026

claude high — parseUint64WithRadix silent-prefix bug

claude medium — go-doc attribution

Gemini r1 format-width finding — verified false, no change

Verification

Uh oh!

claude Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 18, 2026

codex P2 — preserve missing-row signal in CheckLocalEpochRollback

Fix

Caller audit (SEMANTIC CHANGE per cron directive)

Gemini r1 format-width finding — verified false, no change

Verification

Uh oh!

bootjp commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Two new sentinel errors (`internal/encryption/errors.go`)

CLI: `elastickv-admin encryption probe-node-id`

coderabbitai Bot commented May 18, 2026 •

edited

Loading

claude Bot commented May 18, 2026 •

edited

Loading

🔴 `parseUint64WithRadix` silently accepts partial input

🟡 `const minMembersForCollision` placement breaks `go doc` attribution

claude high — `parseUint64WithRadix` silent-prefix bug

claude Bot commented May 18, 2026 •

edited

Loading

claude Bot commented May 18, 2026 •

edited

Loading

1. `storageEnvelopeActive` fix (codex r1 P2) ✅

🟡 `runEncryptionProbeNodeID` does not handle `flag.ErrHelp` (codex r1 P2 #1, unaddressed)

CodeRabbit minor — `ErrLocalEpochRollback` doc stale (errors.go)

CodeRabbit minor — `probe-node-id -h` exit code

claude Bot commented May 18, 2026 •

edited

Loading

1. `ErrLocalEpochRollback` doc fix ✅

2. `probe-node-id -h` exit code fix ✅

Open: Codex round-2 P2 — `registryRow.FullNodeID` mismatch (not addressed in round-3)

Open nitpick: `//nolint:gosec` in `encryption.go:106` (CLAUDE.md)

claude Bot commented May 18, 2026 •

edited

Loading

1. `missingRegistryRowResult` helper ✅

3. `ForeignRowPostCutover` test sub-case ✅

One open nitpick: `//nolint:gosec` in `encryption.go:106`

claude Bot commented May 19, 2026 •

edited

Loading