feat(api): add runtime aggregator role toggle#309
Conversation
🤖 Kimi Code ReviewOverall Assessment: Well-structured PR implementing runtime aggregator toggling with correct atomic semantics and clear documentation. The hot-standby model is appropriately constrained (static gossip subscriptions, mutable aggregation logic). Critical Issues
Code Quality & Correctness
Rust Best Practices
Consensus & Safety
Documentation
Minor Suggestions
Approved with Reservations: Address the authentication concern on the admin endpoint before deploying to production networks. The consensus logic implementation is correct. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code Review
I didn’t see another obvious fork-choice or state-transition correctness regression in the diff itself. I couldn’t run the Rust tests here because the sandbox blocks rustup/toolchain setup ( Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewHere is the full review: PR #309 —
|
Ports leanSpec PR #636: adds `GET /lean/v0/admin/aggregator` and `POST /lean/v0/admin/aggregator` endpoints that let operators rotate aggregation duties across nodes without restarting. The `--is-aggregator` CLI flag now seeds a shared `AggregatorController` (an `Arc<AtomicBool>` in `ethlambda-types`) rather than a plain bool. The blockchain actor reads the flag fresh on every tick and gossip attestation, so runtime toggles take effect from the next tick. Subnet subscriptions remain frozen at startup, matching the spec's explicit scope limitation: standby aggregators must still boot with `--is-aggregator` so gossip subscriptions are in place, then use the admin endpoint to rotate duties (hot-standby model). The `lean_is_aggregator` metric is synced from the actor's read rather than from the HTTP handler so the gauge reflects what the actor actually acted on, not what was requested. POST strictly rejects non-boolean `enabled` (incl. JSON 0/1) to match the spec's semantics.
2bbfe5a to
3a793b8
Compare
🤖 Kimi Code ReviewThe PR implements runtime aggregator toggling correctly with proper documentation and tests. A few minor observations: Code Quality & Safety
Correctness
Style
Documentation
The implementation correctly handles the hot-standby model limitations and the distinction between aggregation logic and gossip mesh subscriptions. The use of Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Claude Code ReviewHere is the code review for PR #309: PR #309 — Runtime Aggregator Role ToggleOverviewPorts leanSpec PR #636. Introduces Memory Ordering —
|
🤖 Codex Code Review
The actor-side wiring itself looks clean: reading the flag per tick/event is straightforward, and the P2P comment about startup-only subnet subscriptions is clear. I couldn’t run the new tests locally because Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryThis PR ports leanSpec PR #636 to add two admin endpoints ( Confidence Score: 5/5Safe to merge; all remaining findings are P2 style or coverage suggestions with no blocking correctness issues. The core logic (AtomicBool toggle, Extension wiring, 503/400 error paths) is sound and backed by 10 admin-route tests plus the pre-existing suite. The only open items are a discrepancy between the PR description's claimed unit tests for AggregatorController and what's actually in the file, plus a style note on error-response content types — neither affects runtime behaviour. crates/common/types/src/aggregator.rs — verify whether the claimed 4 unit tests were intentionally omitted or accidentally left out of the commit.
|
| Filename | Overview |
|---|---|
| crates/common/types/src/aggregator.rs | New AggregatorController type wrapping Arc; correct Relaxed ordering for a periodic flag; claimed unit tests are not present in the file |
| crates/net/rpc/src/admin.rs | GET/POST admin handlers with correct 503/400/200 paths; strict bool-only validation; 10 tests covering all branches; error responses are plain-text rather than JSON |
| crates/blockchain/src/lib.rs | BlockChainServer now holds AggregatorController; flag snapshotted once per tick for consistency within a tick, re-read fresh on each gossip event; metric mirrored actor-side |
| crates/net/rpc/src/lib.rs | AggregatorController threaded in via Extension layer; admin routes registered correctly; existing store-backed handlers unchanged |
| bin/ethlambda/src/main.rs | Controller constructed from CLI flag, cloned into BlockChain::spawn and start_api_server; SwarmConfig intentionally keeps a plain bool with an explanatory comment |
| crates/common/types/src/lib.rs | Exports the new aggregator module; straightforward addition |
| crates/net/p2p/src/lib.rs | Adds INVARIANT doc comment on SwarmConfig.is_aggregator clarifying why the gossip-subscription flag stays a plain bool; no logic changes |
| CLAUDE.md | New Runtime Aggregator Toggle gotcha section added; clearly documents the in-process-only scope and subnet subscription limitation |
Sequence Diagram
sequenceDiagram
participant Operator
participant AdminAPI
participant Controller
participant Actor
Operator->>AdminAPI: POST admin/aggregator
AdminAPI->>Controller: set_enabled(bool)
Controller-->>AdminAPI: previous
AdminAPI-->>Operator: new and previous
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/common/types/src/aggregator.rs
Line: 1-26
Comment:
**Unit tests missing despite PR description claiming 4 passed**
The PR description's test plan explicitly states `cargo test -p ethlambda-types --lib aggregator:: — 4 passed` with a cross-thread visibility test using `Barrier`, but `aggregator.rs` contains no `#[cfg(test)]` block at all. The file ends at line 26 with only the struct and impls. Either the test module was accidentally omitted from the commit, or the description was written ahead of the actual work. Since `AggregatorController` is a thin `Arc<AtomicBool>` wrapper, its behavior is indirectly covered by the admin API tests, but the described standalone visibility tests (cross-thread `Barrier` proof) are absent.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/net/rpc/src/admin.rs
Line: 65-96
Comment:
**Missing `Content-Type` on error responses**
`bad_request` and `service_unavailable` return `(StatusCode, &str).into_response()`, which axum serialises as `text/plain`. Callers relying on `Content-Type: application/json` for all responses (e.g. generic error-parsing middleware) will need to special-case these. Consider returning a small JSON error body for consistency with the success responses:
```rust
fn bad_request(reason: &'static str) -> Response {
(StatusCode::BAD_REQUEST, Json(serde_json::json!({"error": reason}))).into_response()
}
fn service_unavailable(reason: &'static str) -> Response {
(StatusCode::SERVICE_UNAVAILABLE, Json(serde_json::json!({"error": reason}))).into_response()
}
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Merge branch 'main' into runtime-aggrega..." | Re-trigger Greptile
| pub async fn post_aggregator( | ||
| controller: Option<Extension<AggregatorController>>, | ||
| body: Option<Json<Value>>, | ||
| ) -> Response { | ||
| let Some(Extension(controller)) = controller else { | ||
| return service_unavailable("Aggregator controller not available"); | ||
| }; | ||
|
|
||
| // Parsing happens through `Option<Json<Value>>` so we can distinguish | ||
| // "no body / malformed JSON" (None) from "valid JSON with wrong shape". | ||
| let Some(Json(payload)) = body else { | ||
| return bad_request("Invalid or missing JSON body"); | ||
| }; | ||
|
|
||
| let Some(enabled_value) = payload.get("enabled") else { | ||
| return bad_request("Missing 'enabled' field in body"); | ||
| }; | ||
|
|
||
| let Some(enabled) = enabled_value.as_bool() else { | ||
| return bad_request("'enabled' must be a boolean"); | ||
| }; | ||
|
|
||
| let previous = controller.set_enabled(enabled); | ||
| if previous != enabled { | ||
| info!(enabled, previous, "Aggregator role toggled via admin API"); | ||
| } | ||
|
|
||
| json_response(ToggleResponse { | ||
| is_aggregator: enabled, | ||
| previous, | ||
| }) | ||
| } |
There was a problem hiding this comment.
Missing
Content-Type on error responses
bad_request and service_unavailable return (StatusCode, &str).into_response(), which axum serialises as text/plain. Callers relying on Content-Type: application/json for all responses (e.g. generic error-parsing middleware) will need to special-case these. Consider returning a small JSON error body for consistency with the success responses:
fn bad_request(reason: &'static str) -> Response {
(StatusCode::BAD_REQUEST, Json(serde_json::json!({"error": reason}))).into_response()
}
fn service_unavailable(reason: &'static str) -> Response {
(StatusCode::SERVICE_UNAVAILABLE, Json(serde_json::json!({"error": reason}))).into_response()
}Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/net/rpc/src/admin.rs
Line: 65-96
Comment:
**Missing `Content-Type` on error responses**
`bad_request` and `service_unavailable` return `(StatusCode, &str).into_response()`, which axum serialises as `text/plain`. Callers relying on `Content-Type: application/json` for all responses (e.g. generic error-parsing middleware) will need to special-case these. Consider returning a small JSON error body for consistency with the success responses:
```rust
fn bad_request(reason: &'static str) -> Response {
(StatusCode::BAD_REQUEST, Json(serde_json::json!({"error": reason}))).into_response()
}
fn service_unavailable(reason: &'static str) -> Response {
(StatusCode::SERVICE_UNAVAILABLE, Json(serde_json::json!({"error": reason}))).into_response()
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
Ports leanSpec PR #636. Adds two admin endpoints that let operators rotate aggregation duties across nodes at runtime without a restart.
GET /lean/v0/admin/aggregatorreturns{"is_aggregator": bool}POST /lean/v0/admin/aggregatorwith body{"enabled": bool}returns{"is_aggregator": <new>, "previous": <old>}Returns
503when the controller is not wired,400on malformed body or non-booleanenabled(JSON ints0/1are explicitly rejected, matching the spec).Design
The CLI flag
--is-aggregatornow seeds a sharedAggregatorController(a thinArc<AtomicBool>inethlambda-types) instead of a plain bool. The blockchain actor reads the flag fresh on every tick and every gossip attestation, so runtime toggles take effect from the next tick with no protocol-message churn.axum::Extensionis used to thread the controller into admin handlers so existing store-backed routes don't need to change their signatures.The
lean_is_aggregatormetric is updated from the actor's tick read (with a cachedlast_seen_aggregator) rather than from the HTTP handler, so the gauge reflects what the actor actually acted on.Scope limitations (intentional, matching spec)
Operational model: standby aggregators must boot with
--is-aggregator=trueso gossip subscriptions are in place, then use the admin endpoint to rotate duties (hot-standby model). A node booted with--is-aggregator=falseand toggled on later will have no extra subnets to aggregate. This is documented in the CLI flag's docstring and theCLAUDE.mdgotcha section.