Skip to content

feat(encryption): Stage 6E-1b — EnableRaftEnvelope admin RPC + elastickv-admin CLI#907

Merged
bootjp merged 4 commits into
mainfrom
feat/encryption-6e-1b-admin-rpc-and-cli
Jun 2, 2026
Merged

feat(encryption): Stage 6E-1b — EnableRaftEnvelope admin RPC + elastickv-admin CLI#907
bootjp merged 4 commits into
mainfrom
feat/encryption-6e-1b-admin-rpc-and-cli

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 2, 2026

Summary

Stage 6E-1b — operator-facing trigger for the Phase-2 raft cutover, mirroring the 6D-6 EnableStorageEnvelope path. The FSM apply machinery shipped in #899 (Stage 6E-1a); this PR adds the proto RPC, server method, and elastickv-admin enable-raft-envelope subcommand so operators can actually invoke the cutover.

Operator-inert by design: with only 6E-1 (a + b) deployed, the cutover entry advances the sidecar's RaftEnvelopeCutoverIndex but no Raft proposal is wrapped and no apply is unwrapped — those land atomically in 6E-2.

What changed

  • proto/encryption_admin.proto — new EnableRaftEnvelope RPC + EnableRaftEnvelopeRequest / EnableRaftEnvelopeResponse messages. The response omits the storage variant's cutover_index_unknown defensive field because RaftEnvelopeCutoverIndex != 0 IS the active sentinel — there's no separate bool to defend against, and 6E-1a already fail-closes on raftIdx == 0 before any sidecar mutation.
  • adapter/encryption_admin.goEnableRaftEnvelope server method. Same semaphore / pre-check / fan-out / propose / post-check sequence as EnableStorageEnvelope. Variant-specific differences: target Purpose=PurposeRaft, source slot Active.Raft, active-sentinel field RaftEnvelopeCutoverIndex.
  • cmd/elastickv-admin/enable-raft-envelope subcommand (parser/runner/printer). Output prefixes (enabled / already-active) match the storage subcommand for script compatibility. No warning line — no defensive cutover_index_unknown field.
  • docs/design/2026_05_31_proposed_6e_enable_raft_envelope.mdpartial: milestone 6E-1 (both sub-milestones) is now complete. 6E-2 and 6E-3 remain.

Protoc version skew (documented)

The pinned protoc toolchain (libprotoc 29.3) is not available in this dev environment; locally installed is libprotoc 34.0. Empirical comparison: a no-op regen against the unmodified proto file with both versions produces byte-identical output except the // protoc vX.Y.Z version comment in the file header. The version comment in this PR has been reverted to the pinned 29.3 output, so the diff is purely the new additive content. CI/release should re-run make gen with the pinned 29.3 toolchain to confirm the generated artifacts.

Test plan

  • go test ./adapter/ ./cmd/elastickv-admin/ ./proto/ ./internal/encryption/ (without -race): all pass.
  • Focused -run 'EnableStorageEnvelope|EnableRaftEnvelope' race-tests in adapter: pass (0.571s).
  • go test ./kv/... (FSM apply chain): pass.
  • golangci-lint --config=.golangci.yaml run on all impacted packages: 0 issues.
  • Note: full go test -race ./adapter/... times out at 10 min on this hardware due to existing slow integration tests in adapter; not specific to this PR. The new tests themselves are quick unit tests.

Self-review (per CLAUDE.md)

  1. Data loss — no replication / FSM apply path changes; new server method only proposes through the existing proposeEncryptionEntry seam (same as EnableStorageEnvelope).
  2. ConcurrencyacquireCutoverSemaphore reused verbatim from the storage variant; ctx-aware wait, pre-canceled short-circuit, and serialization properties inherited (those branches are tested by the storage suite, which exercises the same code paths).
  3. Performance — one extra entry in proto type registry; no hot-path impact.
  4. Data consistency — proposal shape (Purpose=PurposeRaft, DEKID=Active.Raft, len(Wrapped)==0) re-validated at apply by 6E-1a's validateEnableRaftEnvelopePayload. Server and applier agree on the constraints.
  5. Test coverage — 5 new server tests (zero proposer ID, oversized epoch, not-bootstrapped, idempotent retry, happy path) + 5 new CLI tests + 1 dispatch test. The cancellation / serialization / fan-out branches reuse code paths already covered by the storage variant's test suite.

Caller audit (semantic-change directive)

No semantic changes to existing functions — purely new additions. The new EnableRaftEnvelope server method has a single transport-layer caller (gRPC dispatch), and the new CLI subcommand has a single dispatch entry in encryptionSubcommands. Both are exercised by the new tests.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added enable-raft-envelope CLI command enabling Raft envelope encryption cutover.
    • New RPC endpoint supports idempotent cutover operations with per-member capability tracking.
  • Documentation

    • Updated implementation status documentation reflecting Phase-1 completion for Raft envelope encryption feature.

…ckv-admin CLI subcommand

Mirror of the 6D-6 EnableStorageEnvelope path for the raft variant:
proto definition, server method, CLI subcommand, tests.

## What lands

- proto/encryption_admin.proto — new `EnableRaftEnvelope` RPC +
  `EnableRaftEnvelopeRequest` / `EnableRaftEnvelopeResponse`
  messages. The response omits the storage variant's
  `cutover_index_unknown` defensive field because
  `RaftEnvelopeCutoverIndex != 0` IS the active sentinel (no
  separate bool to defend against, and 6E-1a already fail-closes
  on `raftIdx == 0` before any sidecar mutation).
- adapter/encryption_admin.go — `EnableRaftEnvelope` server method
  with the same semaphore / pre-check / fan-out / propose /
  post-check sequence as EnableStorageEnvelope. Differences:
  target Purpose=PurposeRaft, source slot Active.Raft,
  active-sentinel field RaftEnvelopeCutoverIndex.
- cmd/elastickv-admin — `enable-raft-envelope` subcommand
  (parser, runner, printer). Output prefixes (`enabled` /
  `already-active`) mirror the storage subcommand for script
  compatibility; no warning line because the variant has no
  defensive cutover_index_unknown field.
- Tests: 5 server tests covering raft-specific branches
  (zero proposer ID, oversized epoch, not-bootstrapped, idempotent
  retry, happy path) + 5 CLI tests + 1 dispatch test.
- docs/design/2026_05_31_proposed_6e_enable_raft_envelope.md →
  partial; 6E-1 (both 6E-1a + 6E-1b sub-milestones) is now
  complete. 6E-2 and 6E-3 remain.

## Protoc version skew (documented)

This PR was generated against locally-installed `libprotoc 34.0`
because the pinned `libprotoc 29.3` is not available in the dev
environment. Empirical comparison (no-op regen on the unmodified
proto file with both versions) showed the only difference is the
`// protoc vX.Y.Z` version comment in the file header — the
generated rawDesc bytes and all code are byte-identical. The
version comment in this PR has been reverted to the pinned 29.3
output so the diff is purely the new additive content. CI/release
should re-run `make gen` with the pinned 29.3 toolchain to
confirm.

## Self-review

1. Data loss — no replication / FSM apply path changes; new
   server method only proposes through the existing
   proposeEncryptionEntry seam.
2. Concurrency — semaphore + acquireCutoverSemaphore reused
   verbatim from EnableStorageEnvelope; ctx-aware wait,
   pre-canceled short-circuit, and serialization properties
   inherited.
3. Performance — one extra map entry in proto type registry;
   no hot-path impact.
4. Data consistency — proposal shape (Purpose=PurposeRaft,
   DEKID=Active.Raft, len(Wrapped)==0) re-validated at apply
   in 6E-1a's `validateEnableRaftEnvelopePayload`; the server
   and applier agree on the constraints.
5. Test coverage — 5 new server tests + 5 new CLI tests + 1
   dispatch test. Operator-inertness guarantee preserved: 6E-1
   is a no-op until 6E-2 ships engine unwrap + coord wrap.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 43 minutes and 23 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5625f661-77c4-4861-b57d-513c918e21a7

📥 Commits

Reviewing files that changed from the base of the PR and between 67e8826 and 7c4403b.

📒 Files selected for processing (4)
  • adapter/encryption_admin.go
  • adapter/encryption_admin_test.go
  • cmd/elastickv-admin/encryption.go
  • docs/design/2026_05_31_partial_6e_enable_raft_envelope.md
📝 Walkthrough

Walkthrough

This PR implements the EnableRaftEnvelope RPC for Stage 6E-1 Raft envelope Phase 2 cutover. It adds protobuf message definitions, server-side orchestration with precheck/propose/postcheck semantics, comprehensive unit and integration tests, and CLI command integration with input validation and response rendering.

Changes

Raft Envelope Cutover RPC Implementation

Layer / File(s) Summary
Protocol contract
proto/encryption_admin.proto
EnableRaftEnvelopeRequest (proposer identity/epoch) and EnableRaftEnvelopeResponse (applied_index, capability_summary, was_already_active) message types and the EnableRaftEnvelope RPC method are added to the EncryptionAdmin service.
Server-side RPC implementation
adapter/encryption_admin.go
The EnableRaftEnvelope handler orchestrates cutover: acquires/releases the cutover semaphore, runs Raft-specific precheck (validates request, reads sidecar, checks bootstrap gate Active.Raft != 0, returns idempotent response if RaftEnvelopeCutoverIndex != 0), executes capability fan-out, proposes the Raft cutover rotation (RotateSubEnableRaftEnvelope with empty Wrapped, DEKID from sidecar active raft), and performs postcheck (re-reads sidecar to distinguish stale-DEKID races from fresh success). Response constructors handle both idempotent-retry and fresh-success paths, projecting capability verdicts on success.
Server tests
adapter/encryption_admin_test.go
Tests validate gRPC boundary checks (rejecting zero proposer node ID and oversized local epoch), bootstrapping gate (rejecting unbootstrapped sidecar with FailedPrecondition), idempotent retry semantics (confirming WasAlreadyActive=true, no fan-out/propose calls), and happy-path end-to-end flow with applying proposer (validating response indices, capability summary, and decoded rotation proposal structure with PurposeRaft and empty wrapped bytes).
CLI command integration
cmd/elastickv-admin/encryption.go, cmd/elastickv-admin/encryption_mutators.go
The enable-raft-envelope subcommand is registered and routes to runEncryptionEnableRaftEnvelope. The handler parses --proposer-node-id (non-zero) and --proposer-local-epoch (16-bit bound), invokes the RPC, and renders the response as already-active or enabled with applied_index and optional capability summary; unlike the storage variant, it omits the cutover-index-unknown warning.
CLI tests
cmd/elastickv-admin/encryption_test.go
The stub mutation server is extended to mock EnableRaftEnvelope. Tests cover happy-path output, idempotent-retry behavior (no capability header, no warning line), CLI-side input validation (zero proposer node ID, epoch out-of-bounds), and subcommand routing verification.
Design documentation
docs/design/2026_05_31_partial_6e_enable_raft_envelope.md
Stage 6E status is updated from proposed to partial; an implementation-status table marks 6E-1a and 6E-1b as shipped and 6E-2/6E-3 as not started; narrative describes operator-facing capabilities (trigger Phase-2 cutover, FSM records cutover index atomically with proposer registration) and notes the flip remains inert until 6E-2 lands.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • bootjp/elastickv#899: Applies the RotateSubEnableRaftEnvelope rotation to set RaftEnvelopeCutoverIndex idempotently, completing the end-to-end Stage 6E-1 cutover when paired with this PR's propose side.
  • bootjp/elastickv#820: Implements the storage-envelope variant (EnableStorageEnvelope) using the same CLI infrastructure and server-side orchestration pattern, serving as a structural reference for the raft variant.
  • bootjp/elastickv#812: Implements EnableStorageEnvelope on the server side with the same precheck/propose/postcheck pattern that this PR mirrors for Raft, establishing the cutover RPC semantics.

Poem

🐰 With raft envelopes now aligned,
The cutover flow comes into mind—
Semaphore guards the leader's dance,
While idempotent retries enhance.
Phase 2 awaits, but first we've won
The proposal path; the work's begun! 🔐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding the EnableRaftEnvelope admin RPC and its elastickv-admin CLI integration for Stage 6E-1b.
Docstring Coverage ✅ Passed Docstring coverage is 96.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/encryption-6e-1b-admin-rpc-and-cli

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Stage 6E-1b of the Phase-2 cutover, introducing the EnableRaftEnvelope admin RPC and its corresponding elastickv-admin enable-raft-envelope CLI subcommand. This allows operators to trigger the transition from cleartext Raft proposals to envelope-encrypted proposals. The changes include the server-side orchestration, request validation, idempotent retry handling, CLI integration, and comprehensive unit tests. A review comment points out that the check for appliedIndex == 0 in freshRaftCutoverResponse is unreachable dead code because the caller already verifies that RaftEnvelopeCutoverIndex is non-zero, suggesting a simplification of the function.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread adapter/encryption_admin.go Outdated
Comment on lines +1197 to +1205
appliedIndex := sc.RaftEnvelopeCutoverIndex
if appliedIndex == 0 {
appliedIndex = proposedIdx
}
return &pb.EnableRaftEnvelopeResponse{
AppliedIndex: appliedIndex,
CapabilitySummary: projectCapabilityVerdicts(fanoutResult.Verdicts),
WasAlreadyActive: false,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In freshRaftCutoverResponse, the check if appliedIndex == 0 is unreachable dead code. This is because freshRaftCutoverResponse is only called from raftCutoverPostcheck after verifying that postSidecar.RaftEnvelopeCutoverIndex != 0 (lines 1151-1154). Since RaftEnvelopeCutoverIndex is the sole sentinel for the active state in the raft variant (unlike the storage variant which has a separate boolean flag), appliedIndex is guaranteed to be non-zero here. We should simplify this function by removing the redundant check.

	return &pb.EnableRaftEnvelopeResponse{
		AppliedIndex:      sc.RaftEnvelopeCutoverIndex,
		CapabilitySummary: projectCapabilityVerdicts(fanoutResult.Verdicts),
		WasAlreadyActive:  false,
	}
References
  1. Remove dead or unreachable code paths, such as asynchronous handlers for tasks that are always performed synchronously, to improve code clarity and maintainability.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 67e8826e3a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if err != nil {
return nil, err
}
proposedIdx, err := s.proposeRaftCutoverEntry(ctx, preSidecar, req)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate raft cutover until wrapping support ships

When this RPC is invoked before the 6E-2 wrap/unwrap/barrier path is active, it persists RaftEnvelopeCutoverIndex=N while all subsequent Raft entries are still proposed in cleartext. The planned apply hook keys off entry.Index > sidecar.RaftEnvelopeCutoverIndex, so after upgrading/restarting with 6E-2 any cleartext entries committed at indexes greater than N will be treated as envelopes and replay/apply can halt. The marker should not be proposed until wrapping is active, or the activation scheme must ensure no cleartext entries can exist above the recorded cutover index.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/elastickv-admin/encryption.go (1)

47-47: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Usage and error text omit the newly registered enable-raft-envelope.

The dispatch table at Line 68 now serves enable-raft-envelope, but the help string (Line 47) and the unknown-subcommand error (Line 53) still enumerate the old set only. Operators running encryption help or hitting a typo won't discover the new subcommand.

📝 Proposed fix
-		_, err := fmt.Fprintln(os.Stdout, "usage: elastickv-admin encryption <subcommand> [flags]\n\nsubcommands:\n  status\n  rotate-dek\n  register-writer\n  bootstrap\n  enable-storage-envelope\n  probe-node-id")
+		_, err := fmt.Fprintln(os.Stdout, "usage: elastickv-admin encryption <subcommand> [flags]\n\nsubcommands:\n  status\n  rotate-dek\n  register-writer\n  bootstrap\n  enable-storage-envelope\n  enable-raft-envelope\n  probe-node-id")
-	return errors.Errorf("encryption: unknown subcommand %q (supported: status, rotate-dek, register-writer, bootstrap, enable-storage-envelope, probe-node-id)", sub)
+	return errors.Errorf("encryption: unknown subcommand %q (supported: status, rotate-dek, register-writer, bootstrap, enable-storage-envelope, enable-raft-envelope, probe-node-id)", sub)

Also applies to: 53-53

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-admin/encryption.go` at line 47, Update the hard-coded
usage/help and the unknown-subcommand error message to include the new
subcommand name "enable-raft-envelope": change the string printed by the
fmt.Fprintln(os.Stdout, ...) usage call and the corresponding unknown-subcommand
error output so they list "enable-raft-envelope" alongside the other subcommands
(the dispatch table that handles subcommands already routes
"enable-raft-envelope"). Ensure both messages match the dispatch table (the
fmt.Fprintln usage string and the unknown-subcommand error string in
encryption.go).
🧹 Nitpick comments (1)
cmd/elastickv-admin/encryption.go (1)

28-33: 💤 Low value

Stale doc comment.

This narrative now describes the current PR's work as future: it attributes the diff to "6D-6b (this PR)" adding enable-storage-envelope and says enable-raft-envelope "lands in Stage 6E". Since this PR (Stage 6E-1b) actually wires enable-raft-envelope, the comment reads as inaccurate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-admin/encryption.go` around lines 28 - 33, The block comment
referencing PR stages is stale; update the narrative to accurately reflect that
this PR wires enable-raft-envelope (not defers it to Stage 6E) and adjust the
history line to indicate which PR added enable-storage-envelope versus
enable-raft-envelope; specifically edit the doc comment that mentions "6D-6b
(this PR) adds `enable-storage-envelope`" and the later "lands in Stage 6E" so
it correctly lists `enable-storage-envelope`, `enable-raft-envelope`, and
previously added symbols (`status`, `rotate-dek`, `register-writer`,
`bootstrap`, `probe-node-id`, `ResyncSidecar`) in their proper order/PRs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/2026_05_31_partial_6e_enable_raft_envelope.md`:
- Around line 11-25: The status table correctly marks 6E-2 and 6E-3 as not
started but later prose incorrectly claims engine unwrap-on-apply, coordinator
wrap-on-propose and the §7.1 quiescence barrier have landed; update the later
narrative to defer those behaviors to milestone 6E-2 by replacing any
present-tense or "landed"/"implemented" wording with phrasing that they are
"planned in 6E-2" or "not yet implemented; planned in 6E-2", and ensure
references to engine unwrap-on-apply, coordinator wrap-on-propose and §7.1
proposal-quiescence barrier explicitly state they are pending work in 6E-2
rather than already shipped.
- Line 16: The table entry for "6E-1b — `EnableRaftEnvelope` admin RPC +
`elastickv-admin enable-raft-envelope` CLI subcommand" currently uses the
placeholder "this PR"; replace that placeholder with the concrete PR identifier
(and commit hash if available) following the same style used in the 6E-1a row so
the row reads with a durable reference (e.g., "PR `#1234` (commit abcdef0)").

---

Outside diff comments:
In `@cmd/elastickv-admin/encryption.go`:
- Line 47: Update the hard-coded usage/help and the unknown-subcommand error
message to include the new subcommand name "enable-raft-envelope": change the
string printed by the fmt.Fprintln(os.Stdout, ...) usage call and the
corresponding unknown-subcommand error output so they list
"enable-raft-envelope" alongside the other subcommands (the dispatch table that
handles subcommands already routes "enable-raft-envelope"). Ensure both messages
match the dispatch table (the fmt.Fprintln usage string and the
unknown-subcommand error string in encryption.go).

---

Nitpick comments:
In `@cmd/elastickv-admin/encryption.go`:
- Around line 28-33: The block comment referencing PR stages is stale; update
the narrative to accurately reflect that this PR wires enable-raft-envelope (not
defers it to Stage 6E) and adjust the history line to indicate which PR added
enable-storage-envelope versus enable-raft-envelope; specifically edit the doc
comment that mentions "6D-6b (this PR) adds `enable-storage-envelope`" and the
later "lands in Stage 6E" so it correctly lists `enable-storage-envelope`,
`enable-raft-envelope`, and previously added symbols (`status`, `rotate-dek`,
`register-writer`, `bootstrap`, `probe-node-id`, `ResyncSidecar`) in their
proper order/PRs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c14a1241-356e-415a-a633-2ae2d2fe47e2

📥 Commits

Reviewing files that changed from the base of the PR and between 3bffd34 and 67e8826.

⛔ Files ignored due to path filters (2)
  • proto/encryption_admin.pb.go is excluded by !**/*.pb.go
  • proto/encryption_admin_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (7)
  • adapter/encryption_admin.go
  • adapter/encryption_admin_test.go
  • cmd/elastickv-admin/encryption.go
  • cmd/elastickv-admin/encryption_mutators.go
  • cmd/elastickv-admin/encryption_test.go
  • docs/design/2026_05_31_partial_6e_enable_raft_envelope.md
  • proto/encryption_admin.proto

Comment thread docs/design/2026_05_31_partial_6e_enable_raft_envelope.md Outdated
Comment thread docs/design/2026_05_31_partial_6e_enable_raft_envelope.md Outdated
… ships

codex P1 (cluster-bricking hazard): accepting cutover proposals
now records RaftEnvelopeCutoverIndex=N while subsequent Raft
entries remain cleartext (6E-2's wrap-on-propose path is not yet
deployed). On a later 6E-2 upgrade, the engine apply-hook
dispatches entry.Index > sidecar.RaftEnvelopeCutoverIndex through
the unwrap path, so all the cleartext entries committed at
indexes > N get treated as wrapped envelopes and halt apply
cluster-wide.

Fail-closed gate: package-level const raftEnvelopeWrapEnabled
starts at false. EnableRaftEnvelope refuses fresh proposals with
FailedPrecondition. The pre-check (leader gate, bootstrap gate,
idempotent-retry short-circuit) still fires for fast feedback;
only the propose path is refused. 6E-2 flips the const atomically
with the wrap/unwrap/barrier wiring.

gemini MEDIUM: removed the dead 'if appliedIndex == 0' defensive
branch in freshRaftCutoverResponse. The raft variant uses the
cutover index itself as the active sentinel, so
raftCutoverPostcheck already refuses earlier on a zero index --
reaching freshRaftCutoverResponse guarantees non-zero. The
storage variant's hand-edit hedge has no analogue here.

Test changes:
- Replaced HappyPath with GatedUntil6E2 -- pins the refusal,
  proposer.calls stays empty, and the sidecar's
  RaftEnvelopeCutoverIndex remains 0.
- Dropped the orphaned assertRaftCutoverProposalShape helper
  (the HappyPath test was its only caller).
- The other 4 server tests (zero proposer / oversized epoch /
  not-bootstrapped / idempotent retry) exercise branches that
  fire BEFORE the gate; their behavior is unchanged.

Caller audit (semantic-change directive): EnableRaftEnvelope's
sole production caller is the auto-generated gRPC dispatcher; the
CLI wraps any error with errors.Wrap(err, "EnableRaftEnvelope"),
which surfaces the gate message cleanly. No other callers exist.

Design doc: added §3.3 documenting the gate, the codex P1
rationale, and the 6E-2 atomic-flip contract.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

Round-1 fixes in 1facaa1:

  • codex P1 (cluster-bricking hazard) — Added a fail-closed gate (package-level const raftEnvelopeWrapEnabled = false in adapter/encryption_admin.go). EnableRaftEnvelope refuses fresh cutover proposals with FailedPrecondition until 6E-2 ships the wrap-on-propose / unwrap-on-apply / §7.1 proposal-quiescence-barrier triple. The pre-check (leader gate, bootstrap gate, idempotent-retry short-circuit) still fires for fast operator feedback; only the propose path is refused. 6E-2 flips the const atomically with the wrap/unwrap/barrier wiring so the open-gate-without-wrap window cannot exist. Replaced the HappyPath test with GatedUntil6E2, which pins (1) FailedPrecondition with a 6E-2 hint in the message, (2) proposer.calls empty, (3) sidecar RaftEnvelopeCutoverIndex remains 0 — i.e., no state mutation can occur before 6E-2 lifts the gate, so a future upgrade is safe.

  • gemini MEDIUM — Removed the dead if appliedIndex == 0 defensive branch in freshRaftCutoverResponse. The raft variant uses the cutover index itself as the active sentinel, so raftCutoverPostcheck refuses earlier on a zero index; reaching freshRaftCutoverResponse guarantees non-zero. The storage variant's hand-edit hedge has no analogue here (the storage variant's appliedIndex == 0 branch covers StorageEnvelopeActive=true ∧ StorageEnvelopeCutoverIndex=0, a two-field-divergence state that cannot exist when active and index are the same field).

  • Caller audit (per the project's semantic-change-requires-grep directive): EnableRaftEnvelope's sole production caller is the auto-generated gRPC dispatcher in proto/encryption_admin_grpc.pb.go; the CLI wraps any error with errors.Wrap(err, "EnableRaftEnvelope"), which surfaces the gate message cleanly. freshRaftCutoverResponse's sole caller is raftCutoverPostcheck, which I already audited as guaranteeing non-zero. No other callers.

  • Design doc — added §3.3 documenting the gate, the codex P1 rationale (the engine apply-hook's entry.Index > cutover semantics and why cleartext entries above N would brick a 6E-2 upgrade), and the 6E-2 atomic-flip contract.

The orphaned assertRaftCutoverProposalShape helper was dropped (its only caller was the old HappyPath test). When 6E-2 lands, a new HappyPath test reintroduces the proposal-shape assertion.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 21s —— View job


Code Review — Round-1 fixes (commit 1facaa14)

Tasks

  • Load review tools
  • Read the changed files in commit 1facaa14
  • Review the raftEnvelopeWrapEnabled gate implementation
  • Review removal of dead appliedIndex == 0 branch
  • Review test changes (GatedUntil6E2 replacing HappyPath)
  • Review design doc §3.3 addition
  • Check coderabbit's remaining open comments (usage string, doc comment)
  • Post findings

Gate implementation (adapter/encryption_admin.go:1086–1094)

The const raftEnvelopeWrapEnabled = false approach is correct:

  • Placement is right. The gate fires after raftCutoverPrecheck (which covers the leader gate, bootstrap gate, and idempotent-retry short-circuit) and before fan-out / propose — so no sidecar mutation or Raft proposal can escape before the gate is lifted. The semaphore is correctly released by defer s.releaseCutoverSemaphore() on all paths including the gate refusal.
  • The idempotent-retry bypass is preserved intentionally. As §3.3 notes, RaftEnvelopeCutoverIndex != 0 is operationally impossible while the gate is closed, but keeping the short-circuit in place means a sidecar that somehow carried a cutover index (e.g. from a development build where someone toggled the const) would still get a clean idempotent response rather than a confusing gate error. Correct call.
  • const rather than var. This prevents runtime mutation — exactly right. A var could be toggled by an errant init or test.
  • Error message. The §7.1 proposal-quiescence-barrier Unicode works fine here since the storage variant already sets that precedent, and gRPC surfaces it in the status message string as-is.

One minor observation: preSidecar returned from raftCutoverPrecheck is only consumed by proposeRaftCutoverEntry(ctx, preSidecar, req), which is beyond the gate. So every gate-refused call incurs one extra sidecar read. This is acceptable for a rare operator RPC and matches the stated "fast feedback" design goal (the bootstrap check needs the sidecar read anyway).

Dead-code removal (freshRaftCutoverResponse)

Removing the if appliedIndex == 0 branch is correct. The call chain guarantees non-zero at that point:

raftCutoverPostcheck → line 1182: if postSidecar.RaftEnvelopeCutoverIndex == 0 { return error }
                     → line 1186: freshRaftCutoverResponse(postSidecar, proposedIdx, fanoutResult)

Reaching freshRaftCutoverResponse therefore implies RaftEnvelopeCutoverIndex != 0. The _ uint64 parameter is the idiomatic Go way to express the intentional non-use, keeping the call site readable and preserving the signature for when 6E-2 may want to restore a defensive fallback. No issue.

Test changes (GatedUntil6E2)

The test correctly pins all four invariants: FailedPrecondition status code, the "6E-2" hint in the error message, empty proposer.calls, and RaftEnvelopeCutoverIndex == 0 in the sidecar. The two-condition check if err == nil || !strings.Contains(err.Error(), "6E-2") handles both the nil-error case and the wrong-error-message case cleanly.

The forward-compatibility comment ("When 6E-2 lands and flips raftEnvelopeWrapEnabled to true, this test becomes the regression pin for the gate-flip and a HappyPath sibling is added") is a useful marker for the 6E-2 author.

Dropping assertRaftCutoverProposalShape is correct — its only caller was the removed HappyPath test.

Design doc §3.3

The section is accurate and well-structured. The "cleartext entry at index > N treated as envelope → halt apply cluster-wide" hazard is explained clearly, and the 6E-2 atomic-flip contract is crisp. No issues.


Remaining open issues (not addressed in Round-1) — minor

1. Usage string missing enable-raft-envelope (cmd/elastickv-admin/encryption.go:47)

The help text printed on -h/--help/help still lists the old six subcommands; the dispatch table at line 68 already routes enable-raft-envelope. An operator who types elastickv-admin encryption help won't see the new subcommand listed. Fix this →

// line 47 — add "  enable-raft-envelope\n" before "  probe-node-id"

2. Unknown-subcommand error missing enable-raft-envelope (cmd/elastickv-admin/encryption.go:53)

Same gap — a typo like enable-raft-envelop shows (supported: … enable-storage-envelope, probe-node-id) without enable-raft-envelope. Fix this →

3. Stale doc comment (cmd/elastickv-admin/encryption.go:30–32)

The block comment still reads:

// 6D-6b (this PR) adds `enable-storage-envelope`; the §7.1 Phase 2
// `enable-raft-envelope` lands in Stage 6E.

enable-raft-envelope has landed in this PR (Stage 6E-1b), so the second sentence is now incorrect. Fix this →

4. "this PR" placeholder in design doc (docs/design/2026_05_31_partial_6e_enable_raft_envelope.md:16)

The table row for 6E-1b still reads | shipped | this PR |. CodeRabbit marked this "✅ Addressed in commit 1facaa1" but the placeholder is still there — it looks like a false positive. The 6E-1a row shows the pattern (#899 (3bffd344)); applying the same to 6E-1b would give #907 (67e8826). Fix this →


Overall

The Round-1 fixes address the two substantive concerns correctly — the cluster-bricking hazard is closed with a well-placed fail-closed gate, and the dead-code defensive branch is cleanly removed. The four remaining items above are cosmetic (help text, error string, doc comment, doc placeholder) and don't affect correctness or safety. They can be bundled into a follow-up fixup commit or landed here before merge.

bootjp added 2 commits June 2, 2026 20:19
… tweaks

- Major (already auto-marked addressed by coderabbit for 1facaa1):
  §0 "Why this slice exists" prose still read as if a single PR
  landed engine unwrap / coord wrap / barrier end-to-end. Clarify
  that the design is sliced across 6E-1a / 6E-1b / 6E-2 / 6E-3,
  with engine unwrap + coord wrap + barrier deferred to 6E-2
  (planned, not shipped).
- Minor: replaced "this PR" placeholder on the 6E-1b status row
  with the durable #907 reference, matching the 6E-1a row style.

No code changes; doc-only.
claude review round-1 flagged three stale references in
cmd/elastickv-admin/encryption.go now that enable-raft-envelope has
landed (6E-1b):

1. The block doc comment for encryptionMain said
   "the §7.1 Phase 2 enable-raft-envelope lands in Stage 6E" --
   no longer accurate. Replaced the shipping-order narrative with
   a concise list noting 6E-1b is gated until 6E-2 ships
   wrap-on-propose.

2. The -h/--help usage string listed six subcommands; added
   enable-raft-envelope between enable-storage-envelope and
   probe-node-id.

3. The unknown-subcommand error's "supported:" list missed
   enable-raft-envelope. Same insertion point.

No code paths change -- just three text edits in the help / error
surfaces. Existing TestEncryptionMain_RejectsUnknownSubcommand
continues to pass (it asserts "unknown subcommand" substring; the
"supported:" list is auxiliary).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

Round-2 follow-up (7c4403b + cafe9bf):

  • claude round-1 cosmetic Configure Renovate #1-3 — fixed in cmd/elastickv-admin/encryption.go: usage string and unknown-subcommand error now both list enable-raft-envelope; the stale block comment saying "enable-raft-envelope lands in Stage 6E" was replaced with a concise shipping-order list that notes 6E-1b is gated until 6E-2 ships wrap-on-propose.
  • coderabbit round-1 Major / Minor — addressed in cafe9bf: §0 prose clarified to defer engine unwrap + coord wrap + barrier to 6E-2 (planned, not shipped); "this PR" placeholder replaced with #907 reference.

All claude round-1 findings now closed:

Tests pass on cmd/elastickv-admin (0.340s); lint clean.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 47s —— View job


Code Review — Round-2 follow-up (commits 7c4403ba + cafe9bf0)

Tasks

  • Load review tools
  • Read cmd/elastickv-admin/encryption.go — verify usage string, error message, block comment fixes
  • Read docs/design/2026_05_31_partial_6e_enable_raft_envelope.md — verify §0 prose + placeholder fixes
  • Post findings

All five open items from Round-1 + coderabbit are cleanly closed:

Cosmetic #1 — Usage string (encryption.go:46)
enable-raft-envelope is now listed between enable-storage-envelope and probe-node-id. Ordering matches the dispatch table.

Cosmetic #2 — Unknown-subcommand error (encryption.go:52)
✅ The supported-list in errors.Errorf now includes enable-raft-envelope in the same position. Error message and dispatch table agree.

Cosmetic #3 — Stale block comment (encryption.go:28–31)
✅ The forward-reference ("enable-raft-envelope lands in Stage 6E") is gone. The rewritten comment reads: enable-raft-envelope (6E-1b; the server method is gated until 6E-2 ships wrap-on-propose) — accurately describes the current state.

Coderabbit Major — §0 prose claiming engine unwrap/coord wrap/barrier had landed (design doc §0)
✅ The sentence now reads: engine unwrap-on-apply + coordinator wrap-on-propose + §7.1 proposal-quiescence barrier … (6E-2, **planned**), and the §6C-4 fail-closed guards (6E-3, planned). No ambiguity about what has shipped vs. what is planned.

Coderabbit Minor + claude #4 — "this PR" placeholder (design doc:16)
| … | shipped | #907 |. Omitting a mid-PR commit hash is appropriate here — the PR number is the durable handle, and a in-progress PR doesn't have a canonical head hash yet. The 6E-1a row's #899 (3bffd344) style can be applied to 6E-1b at merge time if desired, but it's not a blocking item.


Overall

Round-2 is LGTM. All review feedback from Round-1 and coderabbit is addressed. The implementation is substantively sound (gate, dead-code removal, test correctness) and the cosmetics are now tidy. No new findings.

@bootjp bootjp merged commit 1b47165 into main Jun 2, 2026
9 checks passed
@bootjp bootjp deleted the feat/encryption-6e-1b-admin-rpc-and-cli branch June 2, 2026 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant