feat(cli,store,common): Phase 66 Phase 4e — lpm cache prune + known-projects registry#34
Closes the last Phase-4 prerequisite before the linker default flip
(Phase 4f) can ship externally. Without prune, the v2 virtual store
at `~/.lpm/store/v2/links/` grows monotonically across every project
on the machine — link entries from deleted/moved projects accumulate
forever. Phase 4e gives users the cleanup story.
## What's new
**Surface:** `lpm cache prune [--apply] [--max-age <dur>] [--project <path>] [--legacy-v1]`
- Default: dry-run, list orphans (uses the registry).
- `--apply`: actually remove orphan link entries + objects.
- `--max-age`: filter to entries with `last_referenced_at` older than
the supplied duration (`30d`, `24h`, etc.). Younger entries are
preserved on the assumption that the registry might be stale.
- `--project <path>`: manual repair mode. Walk only this project's
`node_modules/` to collect roots; ignore the registry entirely.
Intended for use after a machine restore or a corrupted registry write.
- `--legacy-v1`: also wipe `~/.lpm/store/v1/` (post-Phase-4d
migration cleanup).
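
To make the `--max-age` value format concrete, here is a minimal sketch of parsing strings such as `30d` or `24h`. The function name `parse_max_age` and the accepted suffix set (`s`, `m`, `h`, `d`) are assumptions for illustration; the PR only shows `30d` and `24h` as examples and does not describe the actual parser.

```rust
use std::time::Duration;

/// Parse a duration such as "30d" or "24h".
/// NOTE: the function name and the suffix set are illustrative assumptions,
/// not the parser used by `lpm cache prune`.
fn parse_max_age(input: &str) -> Option<Duration> {
    let input = input.trim();
    // Digits form the count; everything after them is the unit suffix.
    let split = input.find(|c: char| !c.is_ascii_digit())?;
    let (count, unit) = input.split_at(split);
    let count: u64 = count.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 60 * 60,
        "d" => 24 * 60 * 60,
        _ => return None,
    };
    // Reject values that would overflow instead of wrapping silently.
    count.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```

Returning `None` for a missing unit or an unknown suffix keeps a typo from silently turning into a zero-length threshold.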
## Implementation
**`lpm_common::known_projects`** — machine-global registry at
`~/.lpm/known-projects.json`. Schema-versioned JSON
(`{"version": 1, "projects": [{"path", "last_seen"}]}`) with
canonicalized paths so symlink-cwd quirks don't accumulate aliased
entries. Atomic rewrites via `<path>.tmp.<pid>` → rename. Best-
effort load policy: missing-file / malformed-JSON / schema-mismatch
all silently degrade to empty registry rather than failing every
install — the registry is a perf+UX cache, not load-bearing data.
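
The `<path>.tmp.<pid>` → rename pattern could look roughly like the sketch below. The helper names `tmp_path_for` and `write_atomic` are hypothetical, and the real module presumably serializes the registry struct rather than taking raw bytes.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;

/// Build the sibling temp path `<path>.tmp.<pid>` (hypothetical helper).
fn tmp_path_for(path: &Path) -> PathBuf {
    let mut name = path.as_os_str().to_owned();
    name.push(format!(".tmp.{}", process::id()));
    PathBuf::from(name)
}

/// Write to the temp file first, then rename it over the target so readers
/// never observe a half-written registry (hypothetical helper).
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = tmp_path_for(path);
    fs::write(&tmp, contents)?;
    match fs::rename(&tmp, path) {
        Ok(()) => Ok(()),
        Err(err) => {
            // Do not leak the temp file when the rename fails.
            let _ = fs::remove_file(&tmp);
            Err(err)
        }
    }
}
```

Including the pid in the temp name keeps two concurrent writers from clobbering each other's temp file; the last rename wins.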
**Install pipeline integration** — `commands::install::run` calls
`known_projects::register(path, project_dir)` after every successful
install. Errors are logged + dropped at this site so a flaky write
never blocks an install that succeeded.
**`lpm_store::v2::Store::iter_object_dirs()`** — pairs with the
existing `iter_link_entries` to give prune symmetric coverage of
both `links/<*>/` and `objects/<sri>/` orphans (preplan §4.4).
Skips non-directories and dirs without `.integrity` markers
(mid-write tmp dirs, incomplete extracts).
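
As an illustration of the skip rules, the following sketch lists only directories that carry a `.integrity` marker. It is a standalone function with an assumed name (`complete_object_dirs`), not the actual `Store::iter_object_dirs()` method, whose signature is not shown in this PR description.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Return the subdirectories of `objects_root` that contain a `.integrity`
/// marker. Hypothetical stand-in for the store method described above.
fn complete_object_dirs(objects_root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut dirs = Vec::new();
    for entry in fs::read_dir(objects_root)? {
        let path = entry?.path();
        // Skip anything that is not a directory.
        if !path.is_dir() {
            continue;
        }
        // Skip directories without the marker (mid-write or incomplete).
        if !path.join(".integrity").is_file() {
            continue;
        }
        dirs.push(path);
    }
    // Sort for deterministic output; read_dir order is unspecified.
    dirs.sort();
    Ok(dirs)
}
```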
**Prune algorithm** (preplan §4.3 + §4.4):
1. Collect root projects:
- Default: `known_projects::drop_missing` removes registry
entries pointing at deleted paths, then load the surviving set.
- `--project <path>`: bypass registry, use the supplied path
verbatim.
2. For each project, walk `<project>/node_modules/<X>` symlinks
(recursing one level into `@scope/`), canonicalize, and
collect any whose target lies inside `~/.lpm/store/v2/links/`
as the BFS seed set.
3. BFS through `LinkMeta.deps[].target_graph_key` (the digest-hex
form recorded in each sidecar) to mark every reachable link
entry. Cycles are bounded by the visited-set check.
4. Apply `--max-age` filter: unreachable entries whose
`last_referenced_at` is younger than the threshold stay live.
5. Object orphan detection: walk every SURVIVING (non-orphan)
link entry's sidecar, collect `object_path` segments, mark
anything in `objects/<*>/` not in that set as an orphan.
6. `--apply` mode: delete the orphan link/object dirs and (if
`--legacy-v1`) wipe `~/.lpm/store/v1/`. Default is dry-run.
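
Steps 3 to 5 can be sketched over an in-memory model. Everything here is a simplified assumption: `LinkEntry` stands in for the on-disk sidecar, its field names are illustrative, and keys are plain strings rather than digest-hex graph keys.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::time::Duration;

/// In-memory stand-in for a link entry's sidecar (illustrative fields).
struct LinkEntry {
    deps: Vec<String>, // graph keys of direct dependencies
    object: String,    // object directory this entry points at
    age: Duration,     // time elapsed since last_referenced_at
}

/// Steps 3-4: BFS from the seed set, then keep young unreachable entries.
fn live_links(
    links: &HashMap<String, LinkEntry>,
    seeds: &[String],
    max_age: Option<Duration>,
) -> HashSet<String> {
    let mut live: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    for seed in seeds {
        if links.contains_key(seed) && live.insert(seed.clone()) {
            queue.push_back(seed.clone());
        }
    }
    while let Some(key) = queue.pop_front() {
        for dep in &links[&key].deps {
            // `insert` returns false for visited keys, which bounds cycles.
            if links.contains_key(dep) && live.insert(dep.clone()) {
                queue.push_back(dep.clone());
            }
        }
    }
    if let Some(threshold) = max_age {
        for (key, entry) in links {
            // Unreachable but younger than the threshold: stays live.
            if entry.age < threshold {
                live.insert(key.clone());
            }
        }
    }
    live
}

/// Step 5: objects not referenced by any surviving link entry are orphans.
fn orphan_objects(
    links: &HashMap<String, LinkEntry>,
    live: &HashSet<String>,
    all_objects: &[String],
) -> Vec<String> {
    let referenced: HashSet<&str> =
        live.iter().map(|key| links[key].object.as_str()).collect();
    let mut orphans: Vec<String> = all_objects
        .iter()
        .filter(|object| !referenced.contains(object.as_str()))
        .cloned()
        .collect();
    orphans.sort();
    orphans
}
```

Computing object orphans from the surviving set, rather than from the reachable set alone, means an entry preserved by `--max-age` also keeps its object alive.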
**Path-canonicalization invariant** — the BFS frontier holds
canonical paths (because `add_if_link_descendant` canonicalizes the
project symlink to detect store-descendant relationships). Phase
4e's first algorithm draft mismatched canonical-frontier vs raw
sidecar-iteration paths and surfaced 0 reachable entries on macOS
(the `/private/var/...` realpath shape vs. the `/var/...` symlink
shape). Fix: canonicalize every link_dir at iter-collection time
so the join-by-PathBuf works.
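
The mismatch is easy to reproduce with any symlinked directory. The sketch below shows the general idea of canonicalizing both sides before a `PathBuf` set lookup; `canonical_set` and `is_reachable` are hypothetical helpers, not functions from this PR.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalize every path before it becomes a join key (hypothetical helper).
fn canonical_set(paths: &[PathBuf]) -> io::Result<HashSet<PathBuf>> {
    paths.iter().map(|path| fs::canonicalize(path)).collect()
}

/// Look up a link dir in the frontier by its canonical form, so a path that
/// goes through a symlink and its realpath compare equal (hypothetical helper).
fn is_reachable(frontier: &HashSet<PathBuf>, link_dir: &Path) -> io::Result<bool> {
    Ok(frontier.contains(&fs::canonicalize(link_dir)?))
}
```

A raw `frontier.contains(link_dir)` would miss whenever one side went through a symlink such as `/var` on macOS, which matches the zero-reachable symptom described above.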
**Doctor integration** (preplan §4.5) — new `V2_STORE_ORPHANS`
catalog entry. Pass when zero orphans; warn-with-remediation
("run: lpm cache prune --apply") when non-zero. Reuses the
prune algorithm in dry-run mode so doctor and prune always agree
on the orphan count.
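
A minimal sketch of the pass/warn decision, assuming a simple result enum. `CheckResult` and `v2_store_orphans_check` are illustrative names; only the remediation string is taken from the text above.

```rust
/// Hypothetical result type for a doctor catalog entry.
#[derive(Debug, PartialEq)]
enum CheckResult {
    Pass,
    Warn { message: String, remediation: &'static str },
}

/// Pass on zero orphans, warn with the remediation hint otherwise.
/// The counts are expected to come from a dry-run of the prune algorithm.
fn v2_store_orphans_check(orphan_links: usize, orphan_objects: usize) -> CheckResult {
    let total = orphan_links + orphan_objects;
    if total == 0 {
        CheckResult::Pass
    } else {
        CheckResult::Warn {
            message: format!("{} orphan store entries found", total),
            remediation: "run: lpm cache prune --apply",
        }
    }
}
```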
**`PruneFlags` shared on the `lpm cache <action>` dispatcher** —
flags are inert for `clean` and `path`; only `prune` reads them.
Keeping the dispatcher signature stable across actions lets the
existing `clean`/`path` test surface stay unchanged (just adds
`PruneFlags::default()` at every call site).
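
The shape of that arrangement might look like the sketch below. The field names mirror the CLI flags but are assumptions, as are the `CacheAction` enum and the string return value used here to keep the example self-contained.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Flags read only by `prune`; field names are assumed from the CLI surface.
#[derive(Debug, Default, Clone, PartialEq)]
struct PruneFlags {
    apply: bool,
    max_age: Option<Duration>,
    project: Option<PathBuf>,
    legacy_v1: bool,
}

/// Hypothetical action enum for the `lpm cache <action>` dispatcher.
enum CacheAction {
    Clean,
    Path,
    Prune,
}

/// One signature for every action: `clean` and `path` ignore the flags.
fn dispatch(action: CacheAction, flags: &PruneFlags) -> String {
    match action {
        CacheAction::Clean => "clean".to_string(),
        CacheAction::Path => "path".to_string(),
        CacheAction::Prune => {
            let mode = if flags.apply { "apply" } else { "dry-run" };
            format!("prune ({})", mode)
        }
    }
}
```

Because `PruneFlags` derives `Default` with `apply: false`, a call site that passes `PruneFlags::default()` gets the dry-run behavior.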
## Tests
- `lpm_common::known_projects::tests` (10) — schema versioning,
malformed-JSON degrade, schema-mismatch degrade, symlink
canonicalization, dedup-on-register, last_seen bumps, atomic
no-tmp-leak, drop_missing happy path + zero-when-all-alive,
sort-on-write.
- `commands::cache_prune::tests` (4) — orphan detection happy path,
multi-hop BFS through dep edges (parent → child → grandchild),
`--max-age` preserves recent unreachable entries,
`--project` mode skips the registry.
- `v2::store::tests::iter_object_dirs_*` covered indirectly via
the prune-algorithm tests.
## End-to-end smoke
```
$ lpm install # registers project in ~/.lpm/known-projects.json
$ lpm cache prune # → 0 orphan link entries, 0 orphan objects
$ rm -rf <project> # delete project
$ lpm cache prune # → 5 orphan links + 5 orphan objects, 9.4 MB
$ lpm cache prune --apply # → Pruned 5 link entries + 5 objects (9.4 MB)
$ ls ~/.lpm/store/v2/links/ # empty
```
## Pre-merge gate
- cargo clippy --workspace --all-targets -- -D warnings ✓
- cargo fmt --check ✓
- cargo nextest run --workspace --exclude lpm-integration-tests
→ 5742 tests pass, 7 skipped ✓
- cargo test -p lpm-auth (parallel-deterministic) ✓
- audit-fixtures default (v2) → 17 PASS / 1 SKIP / 0 mixed ✓
- audit-fixtures `LPM_STORE_VERSION=v1` (downgrade) → 17 PASS / 1 SKIP / 0 mixed ✓
## Out of scope (queued for follow-ups)
- **Concurrent install ↔ prune locking.** Preplan §4.3 mentions
flock; the current implementation is racy if a user runs `lpm
cache prune --apply` while another process is mid-install.
Documenting as a follow-up — the audit-fixture suite doesn't
exercise it; production users running prune from a separate
shell during an active install is the lone risk window. Phase 4f
follow-up: wrap the prune apply in `lpm_root.store_lock()` as a
shared reader (matches the `lpm install` exclusive-writer
contract).
- **`--gc-registry` mode** for collapsing very-stale (>180d)
registry entries (preplan §4.3 safety rail "last_seen tracking").
`register` already touches `last_seen`; the GC walker is a
~20-line follow-up.
- **Automatic doctor threshold tuning** — currently warns on ANY
orphan count > 0. Preplan §4.5 mentions "N GB or X% orphans
thresholds TBD" — leaving as warn-on-any until usage data
surfaces a sensible default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Closes the last Phase-4 prerequisite before the linker default flip (Phase 4f) can ship externally. The v2 virtual store at `~/.lpm/store/v2/links/` grows monotonically across every project on the machine: link entries from deleted/moved projects accumulate forever without prune. Phase 4e gives users the cleanup story.

Stacks on #33 (Phase 4d: silent migration + default flip). Will auto-rebase to `main` after #30 → #31 → #32 → #33 land.
## Surface
`lpm doctor` now surfaces the orphan count and suggests prune as remediation when non-zero.
## What's in this PR
- **`lpm_common::known_projects`**: machine-global registry at `~/.lpm/known-projects.json`. Schema-versioned JSON with canonicalized paths (so symlink-cwd quirks don't accumulate aliased entries). Atomic rewrites via `<path>.tmp.<pid>` → rename. Best-effort load policy (missing/malformed/schema-mismatch → empty registry, never blocks installs).
- **Install pipeline integration**: every successful `lpm install` registers the project. Errors are logged + dropped at the call site so a flaky write never blocks an install.
- **`lpm_store::v2::Store::iter_object_dirs()`**: pairs with the existing `iter_link_entries` for symmetric coverage of `links/<*>/` and `objects/<sri>/` orphans (preplan §4.4).
- **Prune algorithm** (preplan §4.3 + §4.4): collect roots from registry (or `--project`), BFS through `LinkMeta.deps[].target_graph_key` to mark reachable, apply `--max-age` filter, mark unreferenced objects via surviving link entries' `object_path`. Apply mode deletes; default is dry-run.
- **Doctor integration**: `V2_STORE_ORPHANS` catalog entry, pass when zero orphans, warn-with-remediation ("run: lpm cache prune --apply") otherwise.
## Tests added
- `lpm_common::known_projects::tests` (10): schema versioning, malformed-JSON degrade, schema-mismatch degrade, symlink canonicalization, dedup-on-register, last_seen bumps, atomic no-tmp-leak, drop_missing happy path + zero-when-all-alive, sort-on-write.
- `commands::cache_prune::tests` (4): orphan detection happy path, multi-hop BFS through dep edges (parent → child → grandchild), `--max-age` preserves recent unreachable entries, `--project` mode skips the registry.
## Out of scope (queued)
- `lpm_root.store_lock()` (production users running prune from a separate shell during an active install is the lone risk window).
- `--gc-registry` mode for collapsing very-stale (>180d) registry entries.

These don't gate Phase 4f's linker default flip; that's the next milestone, gated on bench data.
## Test plan
- `audit-fixtures (v1)` and `audit-fixtures (v2)` rows

🤖 Generated with Claude Code