chore(d1): add AnalyticsEventFact retention to cron (prod was near 10 GB cap) #285
…uml-prod near 10 GB cap)

conf-zenuml-prod D1 is at 9.89 GB / 10 GB (98.9%). AnalyticsEventFact is ~99.9% of it (9.67M rows, 49 days), and nothing pruned it: the daily cron only purged the now-empty UserBehaviorEvent, and purgeAnalyticsFactRetention was wired only to a manual API endpoint. ~88% of rows are dead `page_viewed` telemetry that stopped being written ~Jun 22.

- cron-aggregate: add bounded, batched age-based retention for AnalyticsEventFact (default 45d, tunable via ANALYTICS_FACT_RETENTION_DAYS), so the table is bounded going forward. Batched via id-subquery (D1 SQLite has no DELETE ... LIMIT) and capped per run to stay within cron limits.
- scripts/purge-analytics-fact-backlog.sh: one-time batched drain for the dead `page_viewed` backlog (younger than the retention window, so age-based purge won't clear it). Supports --dry-run/--yes, prod|stg.

Note: deletes free SQLite pages to the freelist (halts growth, write headroom returns) but the reported/billed D1 size may not drop until D1 compacts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
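The drain script's shape (count first for `--dry-run`, then delete in batches by event type rather than by age) can be sketched as follows. The real script was shell and has since been removed, so this is only a TypeScript sketch of the same loop, assuming a hypothetical `SqlRunner` executor (not a D1 type) and an assumed `eventName` column. The batch size is a parameter because the script's value isn't stated here.

```typescript
// Hypothetical executor over the database (illustrative, not a D1 type).
interface SqlRunner {
  run(sql: string, params: unknown[]): Promise<number>;   // rows changed
  count(sql: string, params: unknown[]): Promise<number>; // value of COUNT(*)
}

interface DrainResult {
  matched: number; // rows matching the event type before the drain
  deleted: number;
  batches: number;
}

// `eventName` is an assumed column name for the event type.
const COUNT_SQL =
  "SELECT COUNT(*) AS n FROM AnalyticsEventFact WHERE eventName = ?";

// No DELETE ... LIMIT on D1, so bound each batch with an id subquery.
const DELETE_SQL = `
  DELETE FROM AnalyticsEventFact
  WHERE id IN (
    SELECT id FROM AnalyticsEventFact WHERE eventName = ? LIMIT ?
  )`;

// One-shot drain of a dead event type, independent of row age.
// With dryRun set it only reports how many rows would be deleted.
async function drainEventType(
  db: SqlRunner,
  eventName: string,
  batchSize: number,
  dryRun: boolean,
): Promise<DrainResult> {
  const matched = await db.count(COUNT_SQL, [eventName]);
  if (dryRun) {
    return { matched, deleted: 0, batches: 0 };
  }
  let deleted = 0;
  let batches = 0;
  for (;;) {
    const changed = await db.run(DELETE_SQL, [eventName, batchSize]);
    if (changed === 0) break; // nothing left to match
    deleted += changed;
    batches += 1;
    if (changed < batchSize) break; // short batch: this was the last one
  }
  return { matched, deleted, batches };
}
```

Deleting by event type is the point here: the `page_viewed` rows were younger than the retention window, so only a predicate on the event itself could clear them ahead of age-based retention.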
Backlog drain executed against prod ✅

Ran 86 batches, all 8,542,688 rows deleted cleanly (exit 0). Correction to the caveat above: D1 did reclaim space; the reported size dropped immediately, and no Cloudflare support ticket was needed. Storage is now under the 5 GB free tier, so storage cost is $0. Will update the script's inline note to match.

Rollout gotcha: The remaining 1.13M rows are …
…og drain

D1 reclaimed space immediately after the `page_viewed` delete; correct the earlier 'size may not drop' caveat in the script and cron comments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The backlog has been drained from conf-zenuml-prod (8.5M `page_viewed` rows; 9.89 → 3.49 GB). The nightly cron retention added in this PR handles ongoing bounding, so the one-shot script is no longer needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
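The nightly, capped, age-based retention that takes over from the one-shot drain can be sketched in TypeScript as follows. This is a minimal sketch, not the PR's code: `SqlRunner` is a hypothetical executor (over a D1 binding it could wrap `db.prepare(sql).bind(...params).run()` and return `result.meta.changes`), and the `id`/`createdAt` column names, ISO-8601 UTC timestamps, and env-var parsing rules are assumptions. The 45-day default, 50k batch size, and 40-batch cap come from this PR.

```typescript
// Hypothetical executor: runs one statement, resolves to rows changed.
interface SqlRunner {
  run(sql: string, params: unknown[]): Promise<number>;
}

interface RetentionOptions {
  retentionDays: number; // from ANALYTICS_FACT_RETENTION_DAYS, default 45
  batchSize: number;     // rows per DELETE (50k in this PR)
  maxBatches: number;    // DELETEs per cron run (40 in this PR)
  now: Date;
}

interface RetentionResult {
  deleted: number;
  batches: number;
  capped: boolean; // true if the run stopped at maxBatches; rows may remain
}

// D1's SQLite has no DELETE ... LIMIT, so select a bounded set of ids in a
// subquery and delete by id. Column names are assumptions; createdAt is
// assumed to hold ISO-8601 UTC strings, which compare correctly as text.
const DELETE_BATCH_SQL = `
  DELETE FROM AnalyticsEventFact
  WHERE id IN (
    SELECT id FROM AnalyticsEventFact
    WHERE createdAt < ?
    LIMIT ?
  )`;

// Parse the tunable; fall back to 45 days on a missing or invalid value.
function parseRetentionDays(raw: string | undefined, fallback = 45): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Rows created before this timestamp are past retention.
function retentionCutoff(now: Date, retentionDays: number): string {
  const dayMs = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - retentionDays * dayMs).toISOString();
}

async function purgeAnalyticsFacts(
  db: SqlRunner,
  opts: RetentionOptions,
): Promise<RetentionResult> {
  const cutoff = retentionCutoff(opts.now, opts.retentionDays);
  let deleted = 0;
  let batches = 0;
  while (batches < opts.maxBatches) {
    const changed = await db.run(DELETE_BATCH_SQL, [cutoff, opts.batchSize]);
    batches += 1;
    deleted += changed;
    // A short batch means nothing older than the cutoff is left.
    if (changed < opts.batchSize) {
      return { deleted, batches, capped: false };
    }
  }
  // Per-run cap reached; the remainder drains on the next nightly run.
  return { deleted, batches, capped: true };
}
```

Stopping on a short batch avoids further DELETEs once the table is within the window, and the `capped` flag tells the caller when a backlog is spreading across several nights.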
Why
`conf-zenuml-prod` D1 hit 9.89 GB / 10 GB (98.9%). The 10 GB cap is a hard limit, not a billing threshold: writes fail at the wall. `AnalyticsEventFact` was ~99.9% of the DB (9.67M rows, covering only 49 days), and ~88% of those rows were dead `page_viewed` telemetry that stopped being written ~Jun 22.

Root cause: nothing pruned this table. The daily `cron-aggregate` only purged `UserBehaviorEvent` (now empty), and `purgeAnalyticsFactRetention()` was wired only to a manual API endpoint whose 90-day default deletes 0 rows from a table holding only 49 days of data.

What this PR does
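The window arithmetic behind the Why section and the 45-day default is easy to check directly. This TypeScript sketch uses only figures quoted in this PR (49 days of data, the 90-day endpoint default, the 45-day cron default, ~34k `page_updated` rows/day, 40 × 50k rows per run); the helper names are illustrative, and a constant daily write rate is a simplifying assumption that real traffic won't match exactly.

```typescript
// Days of data old enough to delete: rows must be older than the window.
function daysPastWindow(dataSpanDays: number, retentionDays: number): number {
  return Math.max(0, dataSpanDays - retentionDays);
}

// Approximate steady-state row count once the window is full,
// assuming a constant daily write rate.
function steadyStateRows(rowsPerDay: number, retentionDays: number): number {
  return rowsPerDay * retentionDays;
}

// Nightly runs needed to clear `rows` at batchSize * maxBatches per run.
function nightsToDrain(rows: number, batchSize: number, maxBatches: number): number {
  return Math.ceil(rows / (batchSize * maxBatches));
}

console.log(daysPastWindow(49, 90));            // 0: the 90d endpoint default deletes nothing
console.log(daysPastWindow(49, 45));            // 4: the 45d default reaches only the oldest days
console.log(steadyStateRows(34_000, 45));       // 1530000 page_updated rows at ~34k/day
console.log(nightsToDrain(34_000, 50_000, 40)); // 1: a day's writes fit well inside one run
```

The 4-day result is why the `page_viewed` backlog, being younger than the window, needed a separate drain instead of waiting for age-based retention.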
- `cron-aggregate`: adds bounded, batched, age-based retention for `AnalyticsEventFact` (default 45 days, tunable via `ANALYTICS_FACT_RETENTION_DAYS`). This bounds growth going forward. Deletes are batched via an id-subquery (D1's SQLite has no `DELETE ... LIMIT`) and capped at 40 × 50k rows per run, so the cron stays within limits and any backlog drains over a few nights.

Backlog already drained (2026-06-27)
The 8.5M dead `page_viewed` rows were removed out-of-band against prod before this merge:

| | Before | After |
|---|---|---|
| `page_viewed` rows | 8,542,688 | 0 |
| `AnalyticsEventFact` total | 9.67M | 1.13M |

D1 reclaimed the space immediately; no `VACUUM` or support ticket was needed. Storage is now under the 5 GB free tier ($0). The one-off drain script has been removed; the remaining 1.13M rows are `page_updated` (~34k/day), which this PR's cron retention keeps bounded.

Rollout
Merge → `pnpm --filter cron-aggregate deploy:prod`. That's it: the nightly cron then keeps `AnalyticsEventFact` within 45 days.

🤖 Generated with Claude Code