security(H16): vault crate primitives for WS3 public-key state machine #79

Open
tolgaergin wants to merge 1 commit into main from security/h8-h16-ws3-public-key-vault-crate

Summary

Workstream 3 PR #3 — the rust-client half of the public-key write hardening. Server-side WS3 (a-package-manager#30) gates POST /api/users/me/public-key on a step-up proof and refuses silent overwrites; this PR teaches the vault crate to:

  1. Classify the local-vs-server key state without mutating
  2. Propagate server errors honestly instead of collapsing them to "no key"
  3. Carry the step-up proof in the X-LPM-Step-Up-Proof header on upload

The CLI command-layer migration (rotate-sharing-key, pending-key promotion, replacing every silent ensure_public_key(...) caller with explicit classification + reauth UX) is PR #4 of this workstream — intentionally split so the crate API change can be reviewed independently.

Wire-level changes

get_my_public_key() — no longer collapses non-2xx to Ok(None). 401/403/5xx now propagate as Err, so a transient outage or expired token can no longer trick callers into the silent-overwrite path. 2xx with publicKey: null is the explicit "no key on server" signal.
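The status handling described above can be sketched as a pure mapping from an already-received response to the lookup result. This is a minimal illustration, not the crate's code: `LookupError` and `interpret_public_key_response` are hypothetical names, and the sketch assumes the `publicKey` field has already been parsed into an `Option<String>`.

```rust
/// Hypothetical error type for illustration; the crate's real error type
/// is not shown in this PR description.
#[derive(Debug, PartialEq)]
pub enum LookupError {
    /// Any non-2xx response, carrying the HTTP status code.
    Http(u16),
}

/// Maps a response to the lookup result.
/// `public_key` is the parsed `publicKey` field (None when null or absent).
pub fn interpret_public_key_response(
    status: u16,
    public_key: Option<String>,
) -> Result<Option<String>, LookupError> {
    // Non-2xx (401, 403, 404, 5xx, ...) is an error, never "no key".
    if !(200..300).contains(&status) {
        return Err(LookupError::Http(status));
    }
    // Only a successful response may report "no key on server" via None.
    Ok(public_key)
}
```

The point of the split is that `Ok(None)` can only come from a successful response, so an outage or an expired token cannot be mistaken for an empty key slot.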

New PublicKeyRegistrationState enum captures the three outcomes the CLI must branch on:

  • Matches(local) — same key on both sides
  • NeedsInitialSet(local) — no key on server; needs vault:public-key:set proof
  • RotationRequired { local, server_public_key_b64 } — different key on server; CLI must refuse silent overwrite and direct user to explicit rotation flow

New classify_public_key_state(registry_url, auth_token) runs the local-keypair load + server lookup + classification. Pure classifier — never writes.
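As a rough sketch of the three-way classification, assuming the local key and the server lookup result are already in hand: the variant names follow the PR text, while the `public_key_b64` field and the `classify_against_server` helper are hypothetical. The real `classify_public_key_state` also performs the keypair load and the server request.

```rust
/// Local key material as seen by the classifier.
/// The field name is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct LocalPublicKeyState {
    pub public_key_b64: String,
}

/// The three outcomes the CLI must branch on.
#[derive(Debug, Clone, PartialEq)]
pub enum PublicKeyRegistrationState {
    Matches(LocalPublicKeyState),
    NeedsInitialSet(LocalPublicKeyState),
    RotationRequired {
        local: LocalPublicKeyState,
        server_public_key_b64: String,
    },
}

/// Pure comparison step: no network access, no writes.
/// `server_key` is None when the server reported `publicKey: null`.
pub fn classify_against_server(
    local: LocalPublicKeyState,
    server_key: Option<String>,
) -> PublicKeyRegistrationState {
    match server_key {
        None => PublicKeyRegistrationState::NeedsInitialSet(local),
        Some(server) if server == local.public_key_b64 => {
            PublicKeyRegistrationState::Matches(local)
        }
        Some(server) => PublicKeyRegistrationState::RotationRequired {
            local,
            server_public_key_b64: server,
        },
    }
}
```

Server errors never reach this function in the sketch; they would be returned as `Err` by the lookup step before classification runs.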

upload_public_key() signature gains step_up_proof: Option<&str>, emits the proof in the X-LPM-Step-Up-Proof header when provided, and parses the WS3 response shape ({ok, status, fingerprintPrefix, previousFingerprintPrefix, invalidatedWrappedKeys, affectedOrgs}). Every field is optional, so pre-WS3 servers ({status: "saved"}) still round-trip.
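The header behaviour (send the proof when present, omit the header entirely when `None`) might look like the sketch below. `step_up_headers` is a hypothetical helper that returns plain name/value pairs; how the crate actually attaches headers to its HTTP client is not shown in this PR text.

```rust
/// Header name as stated in this PR.
pub const CLI_STEP_UP_HEADER_NAME: &str = "X-LPM-Step-Up-Proof";

/// Builds the extra request headers for an upload.
/// Returns an empty list when no proof is supplied, so the request never
/// carries an empty `X-LPM-Step-Up-Proof` header.
pub fn step_up_headers(step_up_proof: Option<&str>) -> Vec<(&'static str, String)> {
    step_up_proof
        .map(|proof| vec![(CLI_STEP_UP_HEADER_NAME, proof.to_string())])
        .unwrap_or_default()
}
```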

CLI_STEP_UP_HEADER_NAME constant mirrors the server-side header name from lib/auth/cli-step-up.js so the two surfaces never drift.

Backwards compatibility

ensure_public_key updated to the new upload_public_key(_, _, _, None) signature. Against the WS3-hardened server, the no-proof upload will surface as step_up_required for the set/rotate cohorts — that's the correct failure mode (silent overwrite was the security bug WS3 closes). PR #4 replaces every ensure_public_key caller with explicit classification + reauth.

Three existing push_org_with_keys tests had to swap their 404 "missing" mocks for the truthful 200 + publicKey: null shape since the old Ok(None) collapse no longer hides the 404.

Test plan

  • cargo fmt --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo nextest run --workspace --exclude lpm-integration-tests: 7362/7362 pass (10 new in crates/lpm-vault/src/sync.rs tests module)

New tests

  • get_my_public_key — non-2xx → Err, happy 2xx → Some, 2xx with null → None
  • upload_public_key — sends X-LPM-Step-Up-Proof when provided, omits header when None (regression guard against spurious empty header), propagates server error envelope on non-2xx with step_up_required code preserved
  • classify_public_key_state — Matches / NeedsInitialSet / RotationRequired branches, plus propagation of server errors (critical regression guard against the old Ok(None) collapse misclassifying outages as NeedsInitialSet)

Deployment compatibility

Branched off main since WS3 server PRs are forward-compatible:

  • Old servers ignore the new proof header
  • New servers refuse the old proof-less calls with a structured error the CLI can surface

Both behaviors are correct for their respective deployments. PR #4 will make the CLI side prompt for and supply the proof.

Remaining WS3 PRs

🤖 Generated with Claude Code

…chine

Workstream 3 PR #3 — the rust-client half of the public-key write
hardening. Server-side WS3 (paired PR #30 in a-package-manager)
gates POST /api/users/me/public-key on a step-up proof and refuses
silent overwrites; this PR teaches the vault crate to (a) classify
the local-vs-server key state without mutating, (b) propagate
server errors honestly instead of collapsing them to "no key", and
(c) carry the step-up proof in the X-LPM-Step-Up-Proof header on
the upload path.

The CLI command-layer migration (rotate-sharing-key, pending-key
promotion, replacing every silent `ensure_public_key(...)` caller
with explicit classification + reauth UX) is PR #4 of this
workstream, intentionally split so the crate API change can be
reviewed independently of the CLI UX work.

Concrete changes in `crates/lpm-vault/src/sync.rs`:

- `get_my_public_key()` no longer collapses non-2xx to `Ok(None)`.
  401, 403, and 5xx now propagate as `Err(...)` carrying the HTTP
  status, so a transient outage or an expired token can no longer
  trick the caller into the silent-overwrite path. 2xx with
  `publicKey: null` (or the field absent) is the explicit "no key
  on server" signal.

- New `LocalPublicKeyState` struct + `PublicKeyRegistrationState`
  enum capture the three outcomes the CLI must branch on:
  `Matches`, `NeedsInitialSet`, `RotationRequired`. The
  `RotationRequired` variant is the one the prior
  `ensure_public_key` path silently rolled through — the new
  classifier surfaces it so the CLI command layer can refuse and
  direct the user to the explicit rotation flow.

- New `classify_public_key_state(registry_url, auth_token)` runs
  the local-keypair load + server lookup + classification. Pure
  classifier — never writes.

- `upload_public_key()` signature gains `step_up_proof:
  Option<&str>`, emits the proof in the `X-LPM-Step-Up-Proof`
  header when provided, and parses the WS3 server response shape
  (`{ok, status, fingerprintPrefix, previousFingerprintPrefix,
  invalidatedWrappedKeys, affectedOrgs}`). The new
  `UploadPublicKeyResponse` struct deserializes every field as
  optional so pre-WS3 servers (returning `{status: "saved"}`) still
  round-trip without error.

- Local keypair load extracted into `load_local_public_key_state()`
  so `ensure_public_key` and `classify_public_key_state` share the
  storage / generation / canonical-Base64-encoding policy.

- `CLI_STEP_UP_HEADER_NAME` constant mirrors the server-side
  header name from `lib/auth/cli-step-up.js`, so the upload site
  and any future caller never drift.

- `ensure_public_key` updated to the new `upload_public_key`
  signature with `proof: None`. Against the WS3-hardened server
  the no-proof upload will surface as `step_up_required` for the
  set/rotate cohorts — that's the correct failure mode (silent
  overwrite was the security bug WS3 closes). Three existing
  push_org_with_keys tests had to swap their 404 "missing key"
  mocks for the truthful 200-with-null-publicKey shape.

Tests (10 new):

- get_my_public_key returns Err on non-2xx (401 cohort)
- get_my_public_key returns Some on happy 2xx with key
- get_my_public_key returns None on 2xx with publicKey: null
- upload_public_key sends X-LPM-Step-Up-Proof when provided AND
  omits the header entirely when None passed (regression guard
  against spurious empty headers)
- upload_public_key propagates server error envelope on non-2xx
  with step_up_required code preserved
- classify_public_key_state returns Matches when keys equal
- classify_public_key_state returns NeedsInitialSet on null
- classify_public_key_state returns RotationRequired on mismatch
  (the critical regression guard against silent overwrite)
- classify_public_key_state propagates server errors instead of
  misclassifying outages as NeedsInitialSet

Workspace gate: 7362/7362 nextest passed (10 new). cargo fmt
clean. cargo clippy --workspace --all-targets clean.

Branched off main since WS3 server PRs are forward-compatible:
old servers ignore the new proof header, new servers refuse the
old proof-less calls with a structured error the CLI can
surface — both behaviors are correct for their respective
deployments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 21, 2026
…/pull

Workstream 3 PR #4 (the final WS3 rust-client slice). Adds the
interactive `lpm env rotate-sharing-key` command, migrates the silent
`ensure_public_key` callers in `share` / `pull --org` to the new
classifier path from PR #3, and wires the Workstream 2 CLI step-up
prompt as the reauth primitive for both flows.

Server-side WS3 gates (a-package-manager #30) require a step-up proof
for public-key writes and refuse silent overwrites; this PR makes the
rust-client side of that contract explicit:

  - "Matches" → continue
  - "NeedsInitialSet" → prompt for step-up (vault:public-key:set),
    upload local key, continue
  - "RotationRequired" → STOP, refuse to silently overwrite, point
    the user at `lpm env rotate-sharing-key`
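The three branches above amount to a small decision table. The sketch below is illustrative only: `KeyState`, `NextStep`, and `next_step` are hypothetical names, and the state enum is reduced to bare variants. The scope string and the command name are taken from the text above.

```rust
/// Simplified stand-in for the classifier result (payloads omitted).
pub enum KeyState {
    Matches,
    NeedsInitialSet,
    RotationRequired,
}

/// What the CLI does next for an org operation.
#[derive(Debug, PartialEq)]
pub enum NextStep {
    Continue,
    PromptStepUpThenUpload { scope: &'static str },
    Refuse { hint: &'static str },
}

/// Classify-then-act: only `NeedsInitialSet` leads to an upload, and only
/// after a step-up prompt; a mismatch is never overwritten silently.
pub fn next_step(state: &KeyState) -> NextStep {
    match state {
        KeyState::Matches => NextStep::Continue,
        KeyState::NeedsInitialSet => NextStep::PromptStepUpThenUpload {
            scope: "vault:public-key:set",
        },
        KeyState::RotationRequired => NextStep::Refuse {
            hint: "run `lpm env rotate-sharing-key`",
        },
    }
}
```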

New surfaces:

* `crates/lpm-vault/src/sync.rs`
  - `CliStepUpPolicy` + `discover_cli_step_up_policy()` — GETs the
    server's step-up policy so the CLI knows whether to prompt for
    password, password+TOTP, or refuse outright.
  - `CliStepUpCredential` + `mint_cli_step_up_proof()` — POSTs the
    credential and returns the proof JWT for the X-LPM-Step-Up-Proof
    header.
  - `PendingPublicKey` + `create_pending_x25519_keypair` /
    `read_pending_x25519_keypair` / `promote_pending_x25519_keypair`
    / `discard_pending_x25519_keypair`. File-backed
    `~/.lpm/.x25519_key.pending` slot kept distinct from the live
    slot so a crash between server-side upload and local promotion
    leaves a recoverable state (the next rotate-sharing-key
    invocation detects the matching pending key and finishes the
    promotion).
  - `should_use_file_backed_x25519_keypair(force_file, live_key_exists)`
    selector — the macOS loader now prefers the file-backed slot
    when a live file is present, so promotion's file write actually
    takes effect on the next read instead of being silently replaced
    by a fresh keychain key. Regression-pinned at
    `x25519_backend_selection_uses_live_file_after_rotation_without_force_env`.
    [Fix authored by GPT after the initial round caught the macOS
    keychain-fallthrough bug.]

* `crates/lpm-vault/src/keychain.rs`
  - `delete_x25519_keypair()` — best-effort macOS keychain clear, used
    by promotion so subsequent reads observe the new live file.

* `crates/lpm-cli/src/step_up.rs` (new module)
  - `request_cli_step_up_proof(registry_url, auth_token, scope)` —
    cliclack-driven prompts (password / password+TOTP), strict
    non-TTY refusal so a CI environment can't blunder into a hung
    prompt or accept hostile piped input.

* `crates/lpm-cli/src/commands/env.rs`
  - New `rotate-sharing-key` dispatcher arm + `env_rotate_sharing_key`
    implementation. Refuses non-TTY (and explicit `--yes`) at the
    door. Crash-recovery branch detects a matching pending key on
    server and finishes promotion without a second rotation. Blast-
    radius warning + typed-ROTATE confirmation before any prompt.
    On success, reports the wrapped-key invalidation counts the
    server returns.
  - `ensure_sharing_key_ready_for_org_op()` classify-then-act helper
    used by both the share and the org-pull paths. Refuses
    `RotationRequired` with a remediation hint that names the rotate
    flow. Prompts step-up + uploads on `NeedsInitialSet`. The prior
    `ensure_public_key()` silent-upload path was the headline H16
    silent-overwrite vector; this is the client side of the WS3 gate.
  - `unknown vars action` help text now lists `rotate-sharing-key`.
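To illustrate why a separate pending slot makes a crash recoverable, here is a minimal file-based sketch of the create, read, promote, and discard steps. It is not the crate's implementation: `KeySlots` and its methods are hypothetical, the live file name `.x25519_key` is an assumption (only the pending name appears above), and file permissions and the keychain interaction are omitted.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical layout: live and pending key files side by side.
pub struct KeySlots {
    dir: PathBuf,
}

impl KeySlots {
    pub fn new(dir: PathBuf) -> Self {
        Self { dir }
    }

    fn live(&self) -> PathBuf {
        self.dir.join(".x25519_key") // assumed live file name
    }

    fn pending(&self) -> PathBuf {
        self.dir.join(".x25519_key.pending")
    }

    /// Writes the new key to the pending slot; the live key is untouched.
    pub fn create_pending(&self, key_bytes: &[u8]) -> io::Result<()> {
        fs::write(self.pending(), key_bytes)
    }

    /// Returns the pending key, or None if no rotation is in flight.
    pub fn read_pending(&self) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.pending()) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    /// Moves pending over live; an explicit error if nothing is pending.
    pub fn promote_pending(&self) -> io::Result<()> {
        if !self.pending().exists() {
            return Err(io::Error::new(
                io::ErrorKind::NotFound,
                "no pending key to promote",
            ));
        }
        fs::rename(self.pending(), self.live())
    }

    /// Drops the pending key; the live key is preserved.
    pub fn discard_pending(&self) -> io::Result<()> {
        match fs::remove_file(self.pending()) {
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            other => other,
        }
    }
}
```

If the process dies after the server accepted the new key but before `promote_pending` ran, the pending file is still on disk, so a later invocation can compare it with the server's key and finish the promotion.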

Tests:

* lpm-vault inline (`crates/lpm-vault/src/sync.rs`) — 10 new across
  step-up clients (discover password/unavailable/non-2xx; mint
  password body shape, totp body shape, error envelope), pending-key
  lifecycle (create→read→promote round-trip, discard preserves live,
  promote-with-no-pending is an explicit error), and the macOS
  backend selector regression test from GPT's fix.

* lpm-workflows (`tests/workflows/tests/env_vault.rs`) — 2 new
  pinning the non-TTY refusal for `rotate-sharing-key` (with and
  without `--yes`). Both run from `cargo test`'s pipe-backed stdin
  so the refusal must fire at TTY-detect time, before any network
  or pending-key side effect.
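A sketch of the refuse-at-the-door check that these tests pin, assuming the decision depends only on whether stdin is a terminal and whether `--yes` was passed. `check_interactive` and `stdin_is_tty` are hypothetical names; `std::io::IsTerminal` is the standard-library trait (Rust 1.70+), and whether the CLI uses it is not stated here.

```rust
use std::io::{self, IsTerminal};

/// Reports whether stdin is attached to a terminal.
pub fn stdin_is_tty() -> bool {
    io::stdin().is_terminal()
}

/// Gate for the rotation command. Must run before any network call or
/// pending-key write so a refusal leaves no side effects.
pub fn check_interactive(stdin_is_tty: bool, yes_flag: bool) -> Result<(), String> {
    if !stdin_is_tty {
        return Err("rotate-sharing-key requires an interactive terminal".to_string());
    }
    if yes_flag {
        // `--yes` cannot stand in for the typed confirmation.
        return Err("rotate-sharing-key does not accept --yes".to_string());
    }
    Ok(())
}
```

Passing the TTY state in as a parameter keeps the gate testable from a pipe-backed test harness, where the real stdin is never a terminal.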

Local gate: `cargo fmt --check` clean, `cargo clippy --workspace
--all-targets -- -D warnings` clean, `cargo nextest run --workspace
--exclude lpm-integration-tests` — 7374/7374 pass (12 new vs PR #79
baseline of 7362).

Branched off `security/h8-h16-ws3-public-key-vault-crate` (PR #79).
Auto-retargets to `main` when #79 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tolgaergin added a commit that referenced this pull request May 21, 2026
…/pull (#82)
