
feat(cli,store,common): Phase 66 Phase 4e — lpm cache prune + known-projects registry#34

Open
tolgaergin wants to merge 1 commit into phase66-4d-migration-and-flip from phase66-4e-cache-prune

Conversation

@tolgaergin
Contributor

Summary

Closes the last Phase-4 prerequisite before the linker default flip (Phase 4f) can ship externally. The v2 virtual store at ~/.lpm/store/v2/links/ grows monotonically across every project on the machine — link entries from deleted/moved projects accumulate forever without prune. Phase 4e gives users the cleanup story.

Stacks on #33 (Phase 4d: silent migration + default flip). Will auto-rebase to main after #30, #31, #32, and #33 land.

Surface

lpm cache prune                            # dry-run, list orphans (uses registry)
lpm cache prune --apply                    # actually remove
lpm cache prune --max-age 30d              # only entries older than this
lpm cache prune --project <path>           # manual repair: ignore registry, walk this project
lpm cache prune --legacy-v1                # also wipe ~/.lpm/store/v1/ (post-migration cleanup)

lpm doctor now surfaces the orphan count and suggests prune as remediation when non-zero.

What's in this PR

lpm_common::known_projects — machine-global registry at ~/.lpm/known-projects.json. Schema-versioned JSON with canonicalized paths (so symlink-cwd quirks don't accumulate aliased entries). Atomic rewrites via <path>.tmp.<pid> → rename. Best-effort load policy (missing/malformed/schema-mismatch → empty registry, never blocks installs).

Install pipeline integration — every successful lpm install registers the project. Errors are logged + dropped at the call site so a flaky write never blocks an install.

lpm_store::v2::Store::iter_object_dirs() — pairs with the existing iter_link_entries for symmetric coverage of links/<*>/ and objects/<sri>/ orphans (preplan §4.4).

Prune algorithm (preplan §4.3 + §4.4) — collect roots from registry (or --project), BFS through LinkMeta.deps[].target_graph_key to mark reachable, apply --max-age filter, mark unreferenced objects via surviving link entries' object_path. Apply mode deletes; default is dry-run.

Doctor integration: V2_STORE_ORPHANS catalog entry, pass when zero orphans, warn-with-remediation ("run: lpm cache prune --apply") otherwise.

End-to-end smoke

$ lpm install                              # registers project in ~/.lpm/known-projects.json
$ lpm cache prune                          # → 0 orphan link entries, 0 orphan objects
$ rm -rf <project>                         # delete project
$ lpm cache prune                          # → 5 orphan links + 5 orphan objects, 9.4 MB eligible
$ lpm cache prune --apply                  # → Pruned 5 link entries + 5 objects (9.4 MB)
$ ls ~/.lpm/store/v2/links/                # empty

Pre-merge gate

cargo clippy --workspace --all-targets -- -D warnings  ✓
cargo fmt --check                                       ✓
cargo nextest run --workspace --exclude lpm-integration-tests
  → 5742 tests pass, 7 skipped                          ✓
cargo test -p lpm-auth (parallel-deterministic)         ✓
./bench/audit-fixtures/run-all.sh
  → 17 PASS / 1 SKIP / 0 mixed (v2 default)            ✓
LPM_STORE_VERSION=v1 ./bench/audit-fixtures/run-all.sh
  → 17 PASS / 1 SKIP / 0 mixed (v1 downgrade)          ✓

Tests added

  • lpm_common::known_projects::tests (10) — schema versioning, malformed-JSON degrade, schema-mismatch degrade, symlink canonicalization, dedup-on-register, last_seen bumps, atomic no-tmp-leak, drop_missing happy path + zero-when-all-alive, sort-on-write.
  • commands::cache_prune::tests (4) — orphan detection happy path, multi-hop BFS through dep edges (parent → child → grandchild), --max-age preserves recent unreachable entries, --project mode skips the registry.

Out of scope (queued)

  • Concurrent install ↔ prune locking via lpm_root.store_lock() (production users running prune from a separate shell during an active install is the lone risk window).
  • --gc-registry mode for collapsing very-stale (>180d) registry entries.
  • Automatic doctor threshold tuning (currently warns on any orphan count > 0; preplan mentions "N GB / X% thresholds TBD").

These don't gate Phase 4f's linker default flip — that's the next milestone, gated on bench data.

Test plan

  • All new unit tests pass (10 + 4)
  • Full nextest workspace run passes (5742 tests)
  • Audit fixtures green under both flag values
  • End-to-end prune smoke (install → register → delete → prune dry-run → prune apply → store empty)
  • Doctor reports orphans and suggests prune
  • CI matrix green on both audit-fixtures (v1) and audit-fixtures (v2) rows

🤖 Generated with Claude Code

…-projects registry

Closes the last Phase-4 prerequisite before the linker default flip
(Phase 4f) can ship externally. Without prune, the v2 virtual store
at `~/.lpm/store/v2/links/` grows monotonically across every project
on the machine — link entries from deleted/moved projects accumulate
forever. Phase 4e gives users the cleanup story.

## What's new

**Surface:** `lpm cache prune [--apply] [--max-age <dur>] [--project <path>] [--legacy-v1]`

- Default: dry-run, list orphans (uses the registry).
- `--apply`: actually remove orphan link entries + objects.
- `--max-age`: restrict pruning to entries whose `last_referenced_at` is
  older than the supplied duration (`30d`, `24h`, etc.). Younger entries
  are preserved on the assumption that the registry might be stale.
- `--project <path>`: manual repair mode. Walk only this project's
  `node_modules/` to collect roots and ignore the registry entirely.
  Intended for use after a machine restore or a corrupted registry write.
- `--legacy-v1`: also wipe `~/.lpm/store/v1/` (post-Phase-4d
  migration cleanup).
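
To make the `--max-age` format concrete, here is a minimal sketch of how a value such as `30d` or `24h` could be turned into a `Duration`. The helper name `parse_max_age` is hypothetical, and only the `d` and `h` suffixes appear in this PR; the `m` and `s` suffixes are an assumption added for illustration, not a statement about what the real parser accepts.

```rust
use std::time::Duration;

/// Hypothetical helper: parse "<digits><unit>" into a Duration.
/// Only `d` and `h` are documented in the PR; `m` and `s` are assumed.
fn parse_max_age(input: &str) -> Result<Duration, String> {
    let input = input.trim();
    // Index of the first non-digit character, i.e. where the unit starts.
    let split = input
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit suffix in {input:?}"))?;
    let (digits, unit) = input.split_at(split);
    // An empty digit string (e.g. "d") fails here.
    let count: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number in {input:?}"))?;
    let secs_per_unit: u64 = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        other => return Err(format!("unknown unit {other:?}")),
    };
    // Guard against overflow on absurdly large inputs.
    count
        .checked_mul(secs_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("duration overflow in {input:?}"))
}
```

With this sketch, `parse_max_age("30d")` yields a 2,592,000-second duration, while inputs without a unit (`"30"`) or without a number (`"d"`) are rejected.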

## Implementation

**`lpm_common::known_projects`** — machine-global registry at
`~/.lpm/known-projects.json`. Schema-versioned JSON
(`{"version": 1, "projects": [{"path", "last_seen"}]}`) with
canonicalized paths so symlink-cwd quirks don't accumulate aliased
entries. Atomic rewrites via `<path>.tmp.<pid>` → rename. Best-
effort load policy: missing-file / malformed-JSON / schema-mismatch
all silently degrade to empty registry rather than failing every
install — the registry is a perf+UX cache, not load-bearing data.
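
The write and load policies described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `lpm_common::known_projects`: the names `atomic_write` and `load_or_default` are hypothetical, JSON parsing is abstracted behind a caller-supplied closure, and durability concerns such as fsync are left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;

/// Hypothetical helper: write to `<path>.tmp.<pid>`, then rename over `path`.
/// Readers see either the old file or the new one, never a partial write.
fn atomic_write(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(format!(".tmp.{}", process::id()));
    let tmp = PathBuf::from(tmp_name);

    fs::write(&tmp, contents)?;
    if let Err(err) = fs::rename(&tmp, path) {
        // Do not leak the temp file if the rename fails.
        let _ = fs::remove_file(&tmp);
        return Err(err);
    }
    Ok(())
}

/// Hypothetical helper: best-effort load. A missing file, an unreadable
/// file, or a parse failure (`parse` returning None) all yield the default.
fn load_or_default<T, F>(path: &Path, parse: F) -> T
where
    T: Default,
    F: Fn(&str) -> Option<T>,
{
    fs::read_to_string(path)
        .ok()
        .and_then(|text| parse(&text))
        .unwrap_or_default()
}
```

The temp file lives next to the destination so that the rename stays on one filesystem, which is what makes the replacement atomic on POSIX systems.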

**Install pipeline integration** — `commands::install::run` calls
`known_projects::register(path, project_dir)` after every successful
install. Errors are logged + dropped at this site so a flaky write
never blocks an install that succeeded.
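
The "log and drop" policy at the call site might look roughly like the sketch below. The wrapper name `register_best_effort` is hypothetical, and the real `known_projects::register` signature is not reproduced here; the registration step is passed in as a closure so the example stays self-contained.

```rust
use std::fmt::Display;

/// Hypothetical wrapper: run a registration step, log any error to stderr,
/// and report success as a bool instead of propagating the failure.
fn register_best_effort<F, E>(register: F) -> bool
where
    F: FnOnce() -> Result<(), E>,
    E: Display,
{
    match register() {
        Ok(()) => true,
        Err(err) => {
            // The install already succeeded, so a registry failure is only a warning.
            eprintln!("warning: could not update known-projects registry: {err}");
            false
        }
    }
}
```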

**`lpm_store::v2::Store::iter_object_dirs()`** — pairs with the
existing `iter_link_entries` to give prune symmetric coverage of
both `links/<*>/` and `objects/<sri>/` orphans (preplan §4.4).
Skips non-directories and dirs without `.integrity` markers
(mid-write tmp dirs, incomplete extracts).
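
The skip rule can be illustrated with a small directory walk. This is a sketch, not the body of `iter_object_dirs()`: the function name `complete_object_dirs` is hypothetical, it returns a sorted `Vec` rather than an iterator, and it only checks that an `.integrity` entry exists without reading it.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: list the subdirectories of `objects_root` that
/// carry an `.integrity` marker, skipping files and unmarked directories.
fn complete_object_dirs(objects_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(objects_root)? {
        let path = entry?.path();
        // Unmarked directories are treated as in-progress or incomplete.
        if path.is_dir() && path.join(".integrity").exists() {
            dirs.push(path);
        }
    }
    // Sort for deterministic output; read_dir order is unspecified.
    dirs.sort();
    Ok(dirs)
}
```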

**Prune algorithm** (preplan §4.3 + §4.4):

1. Collect root projects:
   - Default: `known_projects::drop_missing` removes registry
     entries pointing at deleted paths, then load the surviving set.
   - `--project <path>`: bypass registry, use the supplied path
     verbatim.
2. For each project, walk `<project>/node_modules/<X>` symlinks
   (recursing one level into `@scope/`), canonicalize, and
   collect any whose target lies inside `~/.lpm/store/v2/links/`
   as the BFS seed set.
3. BFS through `LinkMeta.deps[].target_graph_key` (the digest-hex
   form recorded in each sidecar) to mark every reachable link
   entry. Cycles are bounded by the visited-set check.
4. Apply `--max-age` filter: unreachable entries whose
   `last_referenced_at` is younger than the threshold stay live.
5. Object orphan detection: walk every SURVIVING (non-orphan)
   link entry's sidecar, collect `object_path` segments, mark
   anything in `objects/<*>/` not in that set as an orphan.
6. `--apply` mode: delete the orphan link/object dirs and (if
   `--legacy-v1`) wipe `~/.lpm/store/v1/`. Default is dry-run.
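
Step 3 is the core of the algorithm, so here is a minimal sketch of the reachability pass over an in-memory graph. It assumes the sidecars have already been read into a map from each link entry's graph key to the keys of its dependencies; the function name `find_orphans` and that map shape are hypothetical, and the `--max-age` filter, object detection, and deletion steps are left out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical helper: return the keys of link entries that are not
/// reachable from `roots`. `deps` maps a graph key to its dependency keys.
fn find_orphans(deps: &HashMap<String, Vec<String>>, roots: &[String]) -> Vec<String> {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();

    // Seed the frontier with roots that actually exist in the store.
    for root in roots {
        if deps.contains_key(root.as_str()) && visited.insert(root.as_str()) {
            queue.push_back(root.as_str());
        }
    }

    // Breadth-first walk; the visited set bounds cycles.
    while let Some(key) = queue.pop_front() {
        for dep in deps.get(key).into_iter().flatten() {
            if visited.insert(dep.as_str()) {
                queue.push_back(dep.as_str());
            }
        }
    }

    // Every entry never visited is an orphan candidate.
    let mut orphans: Vec<String> = deps
        .keys()
        .filter(|key| !visited.contains(key.as_str()))
        .cloned()
        .collect();
    orphans.sort();
    orphans
}
```

Note that an unreachable entry's own dependency edges are never followed, so an orphan that depends on a live entry does not keep anything alive and is still reported.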

**Path-canonicalization invariant** — the BFS frontier holds
canonical paths (because `add_if_link_descendant` canonicalizes the
project symlink to detect store-descendant relationships). Phase
4e's first algorithm draft mismatched canonical-frontier vs raw
sidecar-iteration paths and surfaced 0 reachable entries on macOS
(the `/private/var/...` realpath shape vs. the `/var/...` symlink
shape). Fix: canonicalize every link_dir at iter-collection time
so the join-by-PathBuf works.
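
The invariant amounts to canonicalizing both sides before comparing them as `PathBuf` values. The sketch below shows the idea with `std::fs::canonicalize`; the helper name `canonical_set` is hypothetical, and silently dropping paths that fail to canonicalize is an assumption made for brevity, not necessarily what the prune code does.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: canonicalize every input path so that set
/// membership compares real locations rather than spellings.
/// Paths that cannot be resolved (e.g. dangling links) are dropped.
fn canonical_set<I, P>(paths: I) -> HashSet<PathBuf>
where
    I: IntoIterator<Item = P>,
    P: AsRef<Path>,
{
    paths
        .into_iter()
        .filter_map(|path| fs::canonicalize(path).ok())
        .collect()
}
```

Two spellings of the same directory, such as a symlinked path and its realpath, collapse to a single entry in the resulting set, which is what lets the frontier and the iterated link directories be joined by equality.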

**Doctor integration** (preplan §4.5) — new `V2_STORE_ORPHANS`
catalog entry. Pass when zero orphans; warn-with-remediation
("run: lpm cache prune --apply") when non-zero. Reuses the
prune algorithm in dry-run mode so doctor and prune always agree
on the orphan count.

**`PruneFlags` shared on the `lpm cache <action>` dispatcher** —
flags are inert for `clean` and `path`; only `prune` reads them.
Keeping the dispatcher signature stable across actions lets the
existing `clean`/`path` test surface stay unchanged (just adds
`PruneFlags::default()` at every call site).
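
A rough sketch of that dispatcher shape follows. The field names are inferred from the CLI flags listed earlier and are assumptions, as are the `CacheAction` enum and the `dispatch` function; the real types may differ.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Assumed shape: one field per `lpm cache prune` flag, all defaulting to "off".
#[derive(Debug, Default, Clone, PartialEq)]
struct PruneFlags {
    apply: bool,
    max_age: Option<Duration>,
    project: Option<PathBuf>,
    legacy_v1: bool,
}

/// Hypothetical action enum for the `lpm cache <action>` dispatcher.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CacheAction {
    Clean,
    Path,
    Prune,
}

/// Every action receives the flags, but only `Prune` reads them.
fn dispatch(action: CacheAction, flags: &PruneFlags) -> String {
    match action {
        CacheAction::Clean => "clean".to_string(),
        CacheAction::Path => "path".to_string(),
        CacheAction::Prune => format!(
            "prune apply={} max_age={:?} project={:?} legacy_v1={}",
            flags.apply, flags.max_age, flags.project, flags.legacy_v1
        ),
    }
}
```

Existing `clean` and `path` call sites would then pass `&PruneFlags::default()` and behave exactly as before.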

## Tests

- `lpm_common::known_projects::tests` (10) — schema versioning,
  malformed-JSON degrade, schema-mismatch degrade, symlink
  canonicalization, dedup-on-register, last_seen bumps, atomic
  no-tmp-leak, drop_missing happy path + zero-when-all-alive,
  sort-on-write.
- `commands::cache_prune::tests` (4) — orphan detection happy path,
  multi-hop BFS through dep edges (parent → child → grandchild),
  `--max-age` preserves recent unreachable entries,
  `--project` mode skips the registry.
- `v2::store::tests::iter_object_dirs_*` covered indirectly via
  the prune-algorithm tests.

## End-to-end smoke

```
$ lpm install                              # registers project in ~/.lpm/known-projects.json
$ lpm cache prune                          # → 0 orphan link entries, 0 orphan objects
$ rm -rf <project>                         # delete project
$ lpm cache prune                          # → 5 orphan links + 5 orphan objects, 9.4 MB
$ lpm cache prune --apply                  # → Pruned 5 link entries + 5 objects (9.4 MB)
$ ls ~/.lpm/store/v2/links/                # empty
```

## Pre-merge gate

- cargo clippy --workspace --all-targets -- -D warnings ✓
- cargo fmt --check ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
  → 5742 tests pass, 7 skipped ✓
- cargo test -p lpm-auth (parallel-deterministic) ✓
- audit-fixtures default (v2) → 17 PASS / 1 SKIP / 0 mixed ✓
- audit-fixtures `LPM_STORE_VERSION=v1` (downgrade) → 17 PASS / 1 SKIP / 0 mixed ✓

## Out of scope (queued for follow-ups)

- **Concurrent install ↔ prune locking.** Preplan §4.3 mentions
  flock; the current implementation is racy if a user runs `lpm
  cache prune --apply` while another process is mid-install.
  Documenting as a follow-up — the audit-fixture suite doesn't
  exercise it; production users running prune from a separate
  shell during an active install is the lone risk window. Phase 4f
  follow-up: wrap the prune apply in `lpm_root.store_lock()` as a
  shared reader (matches the `lpm install` exclusive-writer
  contract).
- **`--gc-registry` mode** for collapsing very-stale (>180d)
  registry entries (preplan §4.3 safety rail "last_seen tracking").
  `register` already touches `last_seen`; the GC walker is a
  ~20-line follow-up.
- **Automatic doctor threshold tuning** — currently warns on ANY
  orphan count > 0. Preplan §4.5 mentions "N GB or X% orphans
  thresholds TBD" — leaving as warn-on-any until usage data
  surfaces a sensible default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>