chore(d1): add AnalyticsEventFact retention to cron (prod was near 10 GB cap)#285

Merged: whimet merged 3 commits into main from chore/analytics-fact-retention on Jun 27, 2026
@whimet whimet commented Jun 27, 2026

Why

conf-zenuml-prod D1 hit 9.89 GB / 10 GB (98.9%) — the 10 GB cap is a hard limit, not a billing threshold; writes fail at the wall.

AnalyticsEventFact was ~99.9% of the DB (9.67M rows covering only 49 days). ~88% were dead page_viewed telemetry that stopped being written ~Jun 22. Root cause: nothing pruned this table. The daily cron-aggregate only purged UserBehaviorEvent (now empty), and purgeAnalyticsFactRetention() was wired only to a manual API endpoint whose 90-day default deletes nothing when the oldest row is 49 days old.

What this PR does

cron-aggregate: adds bounded, batched, age-based retention for AnalyticsEventFact (default 45 days, tunable via ANALYTICS_FACT_RETENTION_DAYS), which bounds growth going forward. Deletes are batched via an id subquery (D1's SQLite has no DELETE ... LIMIT) and capped at 40 batches of 50k rows (2M rows) per run, so the cron stays within limits and any backlog drains over a few nights.
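
As a rough illustration of the batching described above, the sketch below deletes through an id subquery with a LIMIT and stops after a fixed number of batches. It is not the code from this PR: the column names `id` and `createdAt` (epoch milliseconds), the helper names, and the local `DatabaseLike` interface (a structural subset of D1's `prepare().bind().run()` API) are assumptions made for the example.

```typescript
// Minimal structural subset of the D1 binding, declared locally so the
// sketch is self-contained. A real D1Database satisfies this shape.
interface StatementLike {
  bind(...values: unknown[]): StatementLike;
  run(): Promise<{ meta: { changes: number } }>;
}
interface DatabaseLike {
  prepare(sql: string): StatementLike;
}

const BATCH_SIZE = 50_000; // rows per DELETE
const MAX_BATCHES = 40;    // per-run cap: at most 2M rows per cron run

// ASSUMPTION: `id` and `createdAt` are hypothetical column names.
// The LIMIT lives in the subquery because DELETE ... LIMIT is unavailable.
const PURGE_SQL = `
  DELETE FROM AnalyticsEventFact
  WHERE id IN (
    SELECT id FROM AnalyticsEventFact
    WHERE createdAt < ?
    LIMIT ?
  )`;

// Parse ANALYTICS_FACT_RETENTION_DAYS, falling back to 45 on missing or invalid input.
function resolveRetentionDays(raw: string | undefined, fallback = 45): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Delete rows older than the retention window in bounded batches.
// Returns the number of rows deleted in this run.
async function purgeOldFacts(
  db: DatabaseLike,
  retentionDays: number,
  now: number = Date.now(),
): Promise<number> {
  const cutoff = now - retentionDays * 86_400_000;
  let deleted = 0;
  for (let i = 0; i < MAX_BATCHES; i++) {
    const { meta } = await db.prepare(PURGE_SQL).bind(cutoff, BATCH_SIZE).run();
    deleted += meta.changes;
    // A short batch means nothing older than the cutoff is left.
    if (meta.changes < BATCH_SIZE) break;
  }
  return deleted;
}
```

In a Worker's scheduled handler this might be called as `await purgeOldFacts(env.DB, resolveRetentionDays(env.ANALYTICS_FACT_RETENTION_DAYS))`, where `env.DB` is a hypothetical binding name. Anything beyond the per-run cap is simply left for the next night's run.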

Backlog already drained (2026-06-27)

The 8.5M dead page_viewed rows were removed out-of-band against prod before this merge:

| | before | after |
|---|---|---|
| page_viewed rows | 8,542,688 | 0 |
| AnalyticsEventFact total | 9,668,759 | 1,126,304 |
| reported D1 size | 9.89 GB (98.9%) | 3.49 GB (34.9%) |

D1 reclaimed the space immediately — no VACUUM/support ticket needed. Storage is now under the 5 GB free tier ($0). The one-off drain script has been removed; the remaining 1.13M rows are page_updated (~34k/day), which this PR's cron retention keeps bounded.

Rollout

Merge → pnpm --filter cron-aggregate deploy:prod. That's it — the nightly cron then keeps AnalyticsEventFact within 45 days.

Ops note: wrangler ... --remote can fail with /memberships Authentication error [code: 10000]; export CLOUDFLARE_ACCOUNT_ID=8d5fc7ce04adc5096f52485cce7d7b3d to bypass.

🤖 Generated with Claude Code

…uml-prod near 10 GB cap)

conf-zenuml-prod D1 is at 9.89 GB / 10 GB (98.9%). AnalyticsEventFact is
~99.9% of it (9.67M rows, 49 days), and nothing pruned it — the daily cron
only purged the now-empty UserBehaviorEvent, and purgeAnalyticsFactRetention
was wired only to a manual API endpoint. ~88% of rows are dead `page_viewed`
telemetry that stopped being written ~Jun 22.

- cron-aggregate: add bounded, batched age-based retention for
  AnalyticsEventFact (default 45d, tunable via ANALYTICS_FACT_RETENTION_DAYS),
  so the table is bounded going forward. Batched via id-subquery (D1 SQLite has
  no DELETE ... LIMIT) and capped per run to stay within cron limits.
- scripts/purge-analytics-fact-backlog.sh: one-time batched drain for the dead
  `page_viewed` backlog (younger than the retention window, so age-based purge
  won't clear it). Supports --dry-run/--yes, prod|stg.

Note: deletes free SQLite pages to the freelist (halts growth, write headroom
returns) but the reported/billed D1 size may not drop until D1 compacts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

whimet commented Jun 27, 2026

Backlog drain executed against prod ✅

Ran scripts/purge-analytics-fact-backlog.sh --env prod --event page_viewed --batch 100000 against conf-zenuml-prod.

| | before | after |
|---|---|---|
| page_viewed rows | 8,542,688 | 0 |
| AnalyticsEventFact total | 9,668,759 | 1,126,304 |
| reported D1 size | 9.89 GB (98.9%) | 3.49 GB (34.9%) |

86 batches, all 8,542,688 rows deleted cleanly (exit 0).

Correction to the caveat above: D1 did reclaim space — the reported size dropped immediately, no Cloudflare support ticket needed. Storage is now under the 5 GB free tier, so storage cost is $0. Will update the script's inline note to match.

Rollout gotcha: wrangler ... --remote kept failing with /memberships Authentication error [code: 10000] (transient OAuth account resolution). Export CLOUDFLARE_ACCOUNT_ID=8d5fc7ce04adc5096f52485cce7d7b3d to bypass it.

The remaining 1.13M rows are page_updated (still ingesting ~34k/day); the cron retention in this PR will keep AnalyticsEventFact bounded to 45 days going forward.
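
For a rough sense of where the table settles, the arithmetic can be written out. This is a back-of-the-envelope sketch that assumes page_updated ingestion stays flat at the ~34k rows/day quoted above; the constant and function names are made up for the example.

```typescript
// ASSUMPTION: ingestion stays at roughly 34k rows/day (figure from this comment).
const ROWS_PER_DAY = 34_000;
const RETENTION_DAYS = 45;

// Rows held once the table spans a full retention window.
const steadyStateRows = ROWS_PER_DAY * RETENTION_DAYS;

// Number of fixed-size batches needed to delete a given row count.
function batchesNeeded(rows: number, batchSize: number): number {
  return Math.ceil(rows / batchSize);
}

// Cross-check of the drain above: 8,542,688 rows in batches of 100,000.
const drainBatches = batchesNeeded(8_542_688, 100_000);

console.log({ steadyStateRows, drainBatches });
```

Under that assumption the table levels off at about 1.53M rows, and the batch count agrees with the 86 batches reported for the drain.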

…og drain

D1 reclaimed space immediately after the page_viewed delete — correct the
earlier 'size may not drop' caveat in the script and cron comments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The backlog has been drained from conf-zenuml-prod (8.5M page_viewed rows;
9.89->3.49 GB). The nightly cron retention added in this PR handles ongoing
bounding, so the one-shot script is no longer needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
whimet changed the title from "chore(d1): AnalyticsEventFact retention + backlog drain (prod near 10 GB cap)" to "chore(d1): add AnalyticsEventFact retention to cron (prod was near 10 GB cap)" on Jun 27, 2026
whimet merged commit 4c39c38 into main on Jun 27, 2026
18 checks passed
whimet deleted the chore/analytics-fact-retention branch on June 27, 2026 10:33