Skip to content

[codex] Implement KPI dataset v2 instrumentation#334

Open
lewtun wants to merge 2 commits into
mainfrom
codex/kpi-v2-instrumentation
Open

[codex] Implement KPI dataset v2 instrumentation#334
lewtun wants to merge 2 commits into
mainfrom
codex/kpi-v2-instrumentation

Conversation

@lewtun

@lewtun lewtun commented Jun 18, 2026

Copy link
Copy Markdown
Member

Summary

Implements the KPI dataset v2 plan for ML Intern analytics.

  • Replaces the KPI builder with v2 outputs under v2/, including session facts plus hourly, daily, and monthly rollups.
  • Requires salted HMAC hashing via KPI_USER_HASH_SALT and avoids writing raw user IDs, session IDs, or Hub artifact repo IDs.
  • Uses usage_metrics as the billing source instead of legacy llm_call.cost_usd semantics.
  • Adds user plan capture, sanitized Hub artifact telemetry, HF job cancellation telemetry, expected HF job artifact handling, and Trackio / upload / repo creation artifact hooks.
  • Adds focused unit coverage for session facts, rollups, usage metrics billing, plans, artifacts, jobs, telemetry, and session uploads.

Validation

  • uv run --no-sync --with ruff ruff format --check .
  • uv run --no-sync --with ruff ruff check .
  • uv run --no-sync --with pytest --with pytest-asyncio pytest tests/unit/test_build_kpis.py tests/unit/test_telemetry_usage.py tests/unit/test_session_uploader.py tests/unit/test_hub_artifacts.py tests/unit/test_dataset_uploads.py

Result: 54 passed, 1 warning.

Co-authored-by: OpenAI Codex <codex@openai.com>
@lewtun lewtun marked this pull request as ready for review June 18, 2026 19:18
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

Claude finished @lewtun's task in 4m 34s —— View job


KPI dataset v2 instrumentation — review

No blocking issues — 1 P1, 3 P2.

The PII story is sound: salt is mandatory in the offline builder (_require_hash_salt, scripts/build_kpis.py:37-41), best-effort/skip in the live telemetry path (_kpi_hash_identifier returns None when unset, agent/core/telemetry.py:156-164), raw repo ids never leave record_hub_artifact, and user_plan is correctly plumbed end-to-end (whoami → normalize_hf_user_plan → session, only emits pro/free/unknown so VALID_PLANS is complete). Billing correctly sources from usage_metrics/hf_billing rather than llm_call.cost_usd (_usage_components, build_kpis.py:256-276). New telemetry functions are well covered in test_telemetry_usage.py. Job-status dedup by job_id (_job_counts, build_kpis.py:346-372) correctly handles a cancel + late completion racing on the same id.


P1 — Every hourly run re-downloads and re-writes the entire month-to-date of session facts

run_for_hour writes session facts three times per tick: the hourly path (build_kpis.py:948), then _write_daily_rollup (:886), then _write_monthly_rollup (:915). The monthly pass calls _facts_for_start_dates over _month_start_datesall days from the 1st through today (build_kpis.py:859-871) — then _write_session_facts re-uploads one v2/session_facts/<day>.jsonl commit per day (build_kpis.py:757-775).

So on day N of the month, each :05 cron tick re-downloads N days of sessions/<date>/ folders and re-commits N day-partition files, even though only the current day's facts changed. Concrete cost: O(days_in_month) Hub downloads and commits per hour, growing through the month (≈720 commits/day from the monthly pass alone near month-end), plus today+yesterday written 3× per run. huggingface_hub.upload_file commits even when content is byte-identical, so this is real commit-history bloat on the target dataset and a rate-limit risk — and if the rollup starts erroring, kpis_scheduler._run_hour only logs a warning (backend/kpis_scheduler.py:92-93), so KPIs would silently stall.

Suggested fix: write session_facts only in the hourly path (current day), and have _write_daily_rollup/_write_monthly_rollup read facts to compute the CSV rollup without re-uploading the per-session JSONL. Fix this →


P2 — Scheduler silently no-ops when KPI_USER_HASH_SALT is unset

If the salt env var is missing in the backend deployment, run_for_hour raises RuntimeError (build_kpis.py:936_require_hash_salt), which _run_hour swallows as a warning (backend/kpis_scheduler.py:92-93). The result is KPIs that quietly never produce output. Consider an explicit guard/log in kpis_scheduler.start() (alongside the token check) so a misconfiguration is visible at startup rather than once-per-hour in the noise.

P2 — No end-to-end test for the orchestration paths

Coverage is solid for the pure functions (_session_fact, _daily_rollup, _monthly_rollup, _job_counts). But run_for_hour/run_for_day and _write_session_facts — where the redundant-write behavior above lives — have no test with a fake HfApi. A small fake-API test asserting how many upload_file commits a single run_for_hour produces would both lock in the intended write pattern and catch the P1 regression class.

P2 — Multi-day sessions can be undercounted in daily/monthly rollups

_daily_rollup only ingests facts from _daily_start_dates(day) = [day-1, day] start folders (build_kpis.py:855-856), then filters by _fact_active_in_day. A session that started ≥2 days before day but stayed active into it is excluded because its start-date folder is never read. Same shape at the monthly boundary (_month_start_dates reads only one prior day). Agent sessions rarely span multiple days so impact is low, but worth a comment documenting the window assumption.


· branch codex/kpi-v2-instrumentation

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

Claude finished @lewtun's task in 1m 15s —— View job


KPI v2 re-review

No blocking issues — prior findings addressed, no new P0.

This is a re-review of the Address KPI v2 PR feedback commit (97ebb5e) against my earlier review. The previously flagged items are resolved:

  • P1 (redundant session_facts writes) — fixed. _write_daily_rollup (build_kpis.py:893) and _write_monthly_rollup (build_kpis.py:921) no longer re-upload per-session JSONL; they only read facts to compute the CSV rollup. run_for_hour now writes session_facts once, filtered to the current day via _fact_start_day (build_kpis.py:953-959). This collapses the O(days_in_month) commits/hour down to one session_facts upload + three rollup CSVs per tick.
  • P2 (scheduler silent no-op on missing salt) — fixed. kpis_scheduler.start() now hard-fails at startup with a logged error when KPI_USER_HASH_SALT is unset (backend/kpis_scheduler.py:115-117), so a misconfig is visible immediately rather than once-per-hour as a warning.
  • P2 (no end-to-end test) — fixed. test_run_for_hour_uploads_session_facts_only_once_for_current_day (tests/unit/test_build_kpis.py:371-411) uses a FakeHfApi and asserts the exact upload list — one v2/session_facts/2026-06-02.jsonl plus the three rollup CSVs. It includes a midnight-spanning session (previous.jsonl, started 2026-06-01T23:55) and confirms it does not trigger an extra write to the prior-day partition, locking in the intended write pattern. test_start_is_no_op_without_hash_salt covers the scheduler guard.
  • P2 (multi-day session window) — documented. _daily_start_dates/_month_start_dates now carry comments stating the [day-1, day] window and the short-lived-session assumption (build_kpis.py:857-860, :872-873).

One observation, not blocking and not a regression: with run_for_hour now writing only the current day's session_facts, a session that started before midnight and stays active past it won't have its prior-day partition refreshed by today's hourly ticks. The CSV rollups remain correct (they recompute from source each run), so only the persisted per-session JSONL for such a session can go slightly stale — an acceptable tradeoff given the documented short-session assumption, and the new test deliberately encodes this behavior.

Note: I couldn't execute the test suite here (sandbox blocked uv run pytest), so the verification above is from reading the code and tests, not a local run. The PR description reports 54 passed.
· branch codex/kpi-v2-instrumentation

@lewtun

lewtun commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

Addressed the review feedback in 97ebb5e:\n\n- Session facts are now written only once from the hourly path for the current start-date partition; daily/monthly rollup helpers only upload their CSVs.\n- Added a startup guard so the KPI scheduler logs and skips immediately when KPI_USER_HASH_SALT is missing.\n- Added an orchestration test with a fake Hub API asserting the exact run_for_hour upload set.\n- Documented the one-day rollup window assumption for normal midnight-spanning sessions.\n\nValidation:\n- Cached Ruff format/check passed because the configured package index was unavailable during this run.\n- Focused tests passed: 62 passed, 1 warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant