feat(encryption): Stage 6D-6c-3a - wire capability fan-out closure into EnableStorageEnvelope#830
Conversation
…e wiring Proposes wiring the §4 capability fan-out closure into main.go so the EnableStorageEnvelope cutover RPC can run its pre-flight GetCapability fan-out (today s.capabilityFanout is nil → cutover refuses with the §4 "not wired" FailedPrecondition). Design: build an all-groups admin.RouteSnapshot from each runtime's engine.Configuration(ctx), a DialFunc over a dedicated kv.GRPCConnCache returning pb.EncryptionAdminClient (no-op cleanup, cache owns conns), and assemble adapter.CapabilityFanoutFn; wire via WithEncryptionAdminCapabilityFanout gated on enableMutators. Fail-closed when the membership view can't be fully enumerated. Dedicated conn cache (not the --adminEnabled-gated admin-forward cache) so the fan-out works whenever encryption mutators are enabled. e2e test deferred to 6D-6c-3b. Doc-first per CLAUDE.md; implementation commits follow on the same PR.
…to EnableStorageEnvelope Wires the §4 capability fan-out into main.go so the EnableStorageEnvelope cutover RPC can run its pre-flight GetCapability fan-out (previously s.capabilityFanout was nil → cutover refused with the §4 "not wired" FailedPrecondition). - main_encryption_fanout.go: buildCapabilityFanoutFn assembles the adapter.CapabilityFanoutFn. The closure snapshots all groups' membership via each runtime's engine.Configuration(ctx) (routeGroupFromConfiguration maps raftengine.Server → admin.RouteMember keyed by etcd.DeriveNodeID, split voter/learner on Suffrage with empty == voter), and dials each peer's EncryptionAdmin endpoint over a dedicated kv.GRPCConnCache (no-op cleanup; the cache owns conns). A snapshot-build error fails the closure → cutover refuses (fail-closed). - buildEncryptionCapabilityFanout owns the dedicated conn cache (NOT the --adminEnabled-gated admin-forward cache, so the fan-out works whenever encryption mutators are enabled), drained on ctx cancellation via eg. Returns nil when mutators are disabled. Extracted as a helper to keep startRaftServers under the cyclop budget. - registerEncryptionAdminServer threads the closure through WithEncryptionAdminCapabilityFanout, gated on the same enableMutators boolean as the Proposer/LeaderView. The end-to-end Bootstrap→cutover→Put→read-back test lands in 6D-6c-3b. Tests: routeGroupFromConfiguration voter/learner split (incl. empty suffrage == voter), DeriveNodeID node identity, address mapping, multi-server, empty configuration; builder returns non-nil. Self-review (5 lenses): - Data loss/consistency: read-only control-plane wiring; no writes, no apply-path change; cutover semantics unchanged (6D-6a). - Concurrency: CapabilityFanout already concurrent + timeout-bounded; snapshot builder per-RPC single-shot; conn cache concurrency-safe; cleanup via eg on ctx.Done. - Performance: fan-out runs once per (rare) cutover RPC, off the data path. - Test coverage: mapping helper unit-tested; full fan-out behavior covered by the 6D-6c-3b e2e. Doc commit (6D-6c-3a proposed) landed first; 6D milestone doc updated.
|
Warning Review limit reached
Your plan includes 1 review of capacity. Refill in 51 minutes and 45 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR implements the 6D-6c-3a capability fan-out wiring for EnableStorageEnvelope cutover RPCs. The change adds snapshot-based routing of live Raft membership across runtimes through a dedicated gRPC connection cache, gates the behavior behind the enableMutators condition, and wires the closure into the admin server registration to ensure fail-closed behavior. ChangesEnableStorageEnvelope capability fan-out wiring
Sequence DiagramsequenceDiagram
participant startRaftServers
participant buildEncryptionCapabilityFanout
participant registerEncryptionAdminServer
participant buildCapabilityFanoutFn
participant capabilityRouteSnapshot
participant admin.CapabilityFanout
startRaftServers->>buildEncryptionCapabilityFanout: ctx, eg, runtimes, enableMutators
buildEncryptionCapabilityFanout->>buildEncryptionCapabilityFanout: enableMutators ? buildCapabilityFanoutFn : nil
buildEncryptionCapabilityFanout-->>startRaftServers: CapabilityFanoutFn or nil
loop for each raft group
startRaftServers->>registerEncryptionAdminServer: gs, fullNodeID, sidecarPath, enableMutators, engine, encryptionCapabilityFanout
registerEncryptionAdminServer->>registerEncryptionAdminServer: if enableMutators && capabilityFanout != nil: wire WithEncryptionAdminCapabilityFanout
end
Note over buildCapabilityFanoutFn: When EnableStorageEnvelope RPC invoked:
buildCapabilityFanoutFn->>capabilityRouteSnapshot: snapshot all runtimes
capabilityRouteSnapshot->>capabilityRouteSnapshot: routeSnapshotFromSources: read each runtime's Configuration
capabilityRouteSnapshot-->>buildCapabilityFanoutFn: live membership routes or error
buildCapabilityFanoutFn->>admin.CapabilityFanout: routes, dialFn (with GRPCConnCache), timeout
admin.CapabilityFanout-->>buildCapabilityFanoutFn: capability snapshot or error (fail-closed)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@claude review |
|
Claude finished @bootjp's task in 3m 58s —— View job Code Review — Stage 6D-6c-3a capability fan-out wiring
Overall the wiring is clean and the design rationale is solid. One race condition in the new file needs to be fixed before merge; the rest are lower-severity items. Bug: data race in
|
There was a problem hiding this comment.
Code Review
This pull request implements the capability fan-out closure wiring for the EnableStorageEnvelope cutover RPC, introducing membership snapshotting and gRPC dialing via a dedicated connection cache. Feedback identifies an improvement opportunity to propagate the context within the dialing logic for better timeout management and suggests adding a safety check for runtime specifications to avoid potential nil pointer dereferences.
| dial := func(_ context.Context, address string) (pb.EncryptionAdminClient, func(), error) { | ||
| conn, err := connCache.ConnFor(address) | ||
| if err != nil { | ||
| return nil, nil, errors.Wrapf(err, "capability fan-out: dial %s", address) | ||
| } | ||
| return pb.NewEncryptionAdminClient(conn), func() {}, nil | ||
| } |
There was a problem hiding this comment.
The dial closure ignores the context.Context parameter passed by the fan-out helper. In accordance with project standards, context.Context should be used for managing deadlines and cancellation. The context should be propagated to the connection cache to ensure the fan-out timeout is strictly respected and to avoid goroutine leaks if a dial hangs.
| dial := func(_ context.Context, address string) (pb.EncryptionAdminClient, func(), error) { | |
| conn, err := connCache.ConnFor(address) | |
| if err != nil { | |
| return nil, nil, errors.Wrapf(err, "capability fan-out: dial %s", address) | |
| } | |
| return pb.NewEncryptionAdminClient(conn), func() {}, nil | |
| } | |
| dial := func(ctx context.Context, address string) (pb.EncryptionAdminClient, func(), error) { | |
| conn, err := connCache.ConnFor(ctx, address) | |
| if err != nil { | |
| return nil, nil, errors.Wrapf(err, "capability fan-out: dial %s", address) | |
| } | |
| return pb.NewEncryptionAdminClient(conn), func() {}, nil | |
| } |
References
- When designing interfaces, use context.Context for managing deadlines and cancellation to ensure timeouts are strictly respected.
| } | ||
| cfg, err := rt.engine.Configuration(ctx) | ||
| if err != nil { | ||
| return admin.RouteSnapshot{}, errors.Wrapf(err, "capability fan-out: configuration for group %d", rt.spec.id) |
There was a problem hiding this comment.
Potential nil pointer dereference: rt.spec is accessed here without a prior nil check. While rt and rt.engine are checked at line 89, rt.spec is assumed to be non-nil. Although this might be an internal invariant, adding a check or ensuring rt.spec is always populated during runtime construction would improve robustness.
… gaps, doc lifecycle Addresses the claude review on PR #830: - must-fix (data race): capabilityRouteSnapshot read rt.engine directly, but the fan-out closure runs during live EnableStorageEnvelope RPCs concurrently with a possible Close() that clears the engine field under engineMu.Lock(). Now reads via rt.snapshotEngine() (engineMu.RLock, nil-safe), matching the documented contract for post-startup engine reads. Not a semantic change — same value, race-safe access. - should-fix (test gaps): extracted routeSnapshotFromSources behind a narrow configReader interface so the fail-closed path is unit testable without a full engine. Added: - TestRouteSnapshotFromSources_ConfigurationErrorFailsClosed — a group Configuration error aborts the whole snapshot (cutover refuses rather than proposing against a partial membership view). - TestRouteSnapshotFromSources_MapsAllGroups — multi-group mapping. - TestBuildEncryptionCapabilityFanout_DisabledReturnsNil — mutators off → nil closure. - should-fix (doc lifecycle): git mv the per-PR design doc proposed -> partial with Status: partial + a lifecycle note (the fan-out wiring shipped; 6D-6c-3b e2e remains). Minor non-blocking observations (WaitForReady bounded by the 5s timeout, const placement, gating comment, shutdown error surfacing) were confirmed correct as-is and left unchanged.
|
@claude review |
|
Claude finished @bootjp's task in 2m 38s —— View job Code Review — Stage 6D-6c-3a capability fan-out wiring (Round 2)
Round-1 fixes confirmedAll three must-fix / should-fix items from the first round are addressed in the current code:
Response to gemini-code-assist comments"dial closure ignores context" — Not actionable as written. "potential nil pointer dereference on rt.spec" — Not an issue. Minor observation
Test imports mix stdlib and cockroachdb errors — What's solidThe refactoring since round 1 is clean and well-reasoned:
Approve to merge. No blocking issues in the current state. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/2026_05_18_partial_6d_enable_storage_envelope.md`:
- Line 114: Replace the incorrect cross-reference link text
"2026_05_25_proposed_6d6c3a_capability_fanout_wiring.md" in
docs/design/2026_05_18_partial_6d_enable_storage_envelope.md with the actual
shipped/partial document filename used in this PR (the matching _partial_
document name), ensuring the link points to the current partial document rather
than the proposed dated filename.
In `@docs/design/2026_05_25_partial_6d6c3a_capability_fanout_wiring.md`:
- Around line 36-38: The documentation currently states "SuffrageVoter → Voters,
else Learners" which contradicts implementation: update the phrasing around
`srv.Address`/`srv.Suffrage` mapping to state that `SuffrageLearner` maps to
Learners and all other suffrage values (including empty) are treated as Voters
(i.e., routed into `admin.RouteGroup{GroupID: rt.spec.id}` as voters); mention
the explicit exceptions `SuffrageLearner` and the default "all others → Voters"
behavior to match the code.
In `@main_encryption_fanout.go`:
- Around line 101-103: The loop over runtimes calls rt.snapshotEngine() without
guarding against nil entries, so add an explicit nil check for the loop variable
rt before invoking snapshotEngine() (e.g., if rt == nil { continue }),
preserving the documented “nil runtime is skipped” behavior; update the loop
surrounding runtimes and the call to snapshotEngine() to skip nil rt values and
only call snapshotEngine() when rt is non-nil.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fe24c67a-6af9-4e8e-8e51-7588839f429f
📒 Files selected for processing (7)
docs/design/2026_05_18_partial_6d_enable_storage_envelope.mddocs/design/2026_05_25_partial_6d6c3a_capability_fanout_wiring.mdmain.gomain_encryption_admin.gomain_encryption_admin_test.gomain_encryption_fanout.gomain_encryption_fanout_test.go
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cdf5021734
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if rt == nil || rt.engine == nil { | ||
| continue | ||
| } | ||
| cfg, err := rt.engine.Configuration(ctx) |
There was a problem hiding this comment.
Read engine through snapshot accessor in fan-out path
capabilityRouteSnapshot is a long-lived runtime path (invoked by admin RPCs after startup), but it reads rt.engine directly. raftGroupRuntime documents that post-startup readers must use snapshotEngine() because Close() clears engine under engineMu (multiraft_runtime.go), so this direct read introduces a shutdown-time data race and can produce an inconsistent/partial route snapshot during cutover checks. Please load the engine via snapshotEngine() and operate on that local value instead of touching the field directly.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Already addressed in round-1 (commit 26ecce5): capabilityRouteSnapshot reads each engine via rt.snapshotEngine() (engineMu.RLock, nil-safe) and stashes it in groupConfigSource before calling Configuration — the direct rt.engine field read this P1 flagged is gone. This comment was filed against the pre-fix commit (cdf5021). The current head (664d8f9) has the locked accessor at main_encryption_fanout.go:105, and the round-2 review independently confirmed the race fix.
…rections
claude round-2 approved to merge; this folds in the late-arriving
coderabbit findings and confirms the codex P1.
- coderabbit Minor (nil runtime guard): add an explicit
`if rt == nil { continue }` before rt.snapshotEngine() in
capabilityRouteSnapshot. snapshotEngine is already nil-receiver-safe
(returns nil → skipped), so this is defensive clarity, not a
behavior change.
- coderabbit Minor (doc): correct the 6D-6c-3a suffrage mapping text
to match the implementation — SuffrageLearner → Learners, else →
Voters (empty/unannotated suffrage counts as voter).
- coderabbit Minor (doc): fix the parent 6D doc's reference link to
point at the renamed *_partial_* 6D-6c-3a doc.
codex P1 ("read engine through snapshot accessor") was already
resolved in round-1 (commit 26ecce5): capabilityRouteSnapshot reads
via rt.snapshotEngine() at main_encryption_fanout.go:105 — the P1 was
filed against the pre-fix commit. claude round-2 independently
confirmed the race fix is correct.
|
@claude review |
|
Claude finished @bootjp's task in 2m 27s —— View job Code Review — Stage 6D-6c-3a capability fan-out wiring (Round 3)
Round-1 and Round-2 fixes confirmedAll findings from the previous two rounds are correctly resolved in the current head (
CodeRabbit Round-2 comments confirmed addressed
Gemini concerns (both correctly dismissed, confirmed)
One minor doc inconsistency (non-blocking)Design doc §1 states: "Fan-out timeout: a Subtle correctness check:
|
Summary
Stage 6D-6c-3a wires the §4 capability fan-out closure into
main.goso theEnableStorageEnvelopecutover RPC can run its pre-flightGetCapabilityfan-out. Previouslys.capabilityFanout == nil, so the cutover refused with the §4 "fan-out not wired"FailedPrecondition. The fan-out helper (internal/admin.CapabilityFanout, 6D-3) and the RPC option (adapter.WithEncryptionAdminCapabilityFanout, 6D-6a) already existed and are tested — this slice is pure wiring.Doc-first per CLAUDE.md: the proposed design doc is the first commit.
What landed
main_encryption_fanout.go—buildCapabilityFanoutFnassembles theadapter.CapabilityFanoutFn. The closure snapshots all groups' membership (the §4.1 "voter ∪ learner of every group" contract) via each runtime'sengine.Configuration(ctx);routeGroupFromConfigurationmapsraftengine.Server→admin.RouteMemberkeyed byetcd.DeriveNodeID(srv.ID), split voter/learner onSuffrage(empty == voter). TheDialFuncdials each peer'sEncryptionAdminendpoint over a dedicatedkv.GRPCConnCache(no-op cleanup — the cache owns conns).buildEncryptionCapabilityFanout— owns the dedicated conn cache (not the--adminEnabled-gated admin-forward cache, so the fan-out works whenever encryption mutators are enabled), drained on ctx cancellation viaeg. Returnsnilwhen mutators are disabled. Extracted to keepstartRaftServerswithin the cyclop budget.registerEncryptionAdminServer— threads the closure throughWithEncryptionAdminCapabilityFanout, gated on the sameenableMutatorsboolean as the Proposer/LeaderView.Fail-closed posture
A snapshot-build error (any
engine.Configurationfailure) fails the closure → the cutover refuses rather than proposing against a membership view it couldn't fully enumerate. Unreachable / not-capable members are handled insideCapabilityFanout(OK=false→ refuse; no partial success, §4.3).Scope boundary
The end-to-end test (single-node: Bootstrap → EnableStorageEnvelope → Put → read-back-via-envelope) lands in 6D-6c-3b on top of this wiring.
Self-review (5 lenses)
CapabilityFanoutis already concurrent + timeout-bounded; the snapshot builder runs per-RPC single-shot; the conn cache is concurrency-safe; cleanup goroutine drains it onctx.Done().routeGroupFromConfigurationmapping (voter/learner split incl. empty-suffrage == voter,DeriveNodeIDidentity, address, multi-server, empty config) + builder non-nil. Full fan-out behavior is covered by the 6D-6c-3b e2e.Test plan
go test -race .golangci-lint run .(0 issues)Summary by CodeRabbit
Documentation
New Features
Tests