Add grants-explorer CLI: natural-language SQL over Finnish grants #16
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea02869049
Pull request overview
Adds a new grants-explorer CLI that (1) downloads Finnish grant-decision exports per Sektoriluokitus from the Tutkihallintoa Power BI report, (2) loads/normalizes the data into an in-memory SQLite database, and (3) answers user questions via an OpenAI agent using a read-only SQL tool.
Changes:
- Introduces Playwright-based multi-sector XLSX download + validated manifest/cache semantics.
- Adds XLSX loading/normalization pipeline (dates/amounts/EU funding), SQLite schema + indexing, and a query_grants agent tool.
- Adds utilities + schemas for sector parsing and Y-tunnus extraction; adds comprehensive unit/integration tests and documentation.
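
To make the Y-tunnus extraction mentioned above concrete, here is a minimal sketch of trailing-only matching. The function name `extractBusinessId`, its signature, and the exact pattern are assumptions for illustration; the PR's business-id.ts may differ, for example by also validating the check digit.

```typescript
// Hypothetical helper, not the PR's actual implementation.
// A Y-tunnus has seven digits, a hyphen, and one check digit (e.g. "1234567-8").
// The pattern only accepts an ID at the very end of the string, and refuses
// to cut an ID out of a longer run of digits.
const TRAILING_BUSINESS_ID = /(?:^|[^0-9])(\d{7}-\d)\s*$/;

function extractBusinessId(recipient: string | null | undefined): string | null {
  if (!recipient) return null;
  const match = TRAILING_BUSINESS_ID.exec(recipient);
  return match?.[1] ?? null;
}
```

Under these assumptions, `extractBusinessId("Esimerkki Oy 1234567-8")` returns `"1234567-8"`, while `extractBusinessId("1234567-8 Esimerkki Oy")` and `extractBusinessId("Esimerkki Oy")` both return `null`.
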
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/cli/grants-explorer/utils/sector-parser.ts | Parses slicer option text into normalized sector codes (incl. sentinel buckets). |
| src/cli/grants-explorer/utils/sector-parser.test.ts | Tests sector option parsing including NBSP and invalid inputs. |
| src/cli/grants-explorer/utils/business-id.ts | Extracts Finnish business ID (Y-tunnus) from recipient strings. |
| src/cli/grants-explorer/utils/business-id.test.ts | Tests trailing-only Y-tunnus extraction and null/invalid cases. |
| src/cli/grants-explorer/types/schemas.ts | Defines CLI args, agent output schema, sector manifest, and GrantRow validation schema. |
| src/cli/grants-explorer/types/schemas.test.ts | Tests CLI arg schema behavior (presence-only --refetch, --dir). |
| src/cli/grants-explorer/tools/sql-tool.ts | Adds read-only SQL tool with basic query validation for the agent. |
| src/cli/grants-explorer/should-refetch.ts | Pure decision helper for refetch vs cached load. |
| src/cli/grants-explorer/should-refetch.test.ts | Tests shouldRefetch decision table. |
| src/cli/grants-explorer/README.md | Documents CLI usage, cache layout, schema, and downloader/loader behavior. |
| src/cli/grants-explorer/main.ts | Wires together download → load → DB insert → combined JSON write → agent Q&A loop. |
| src/cli/grants-explorer/constants.ts | Centralizes agent name/model and cache/default paths. |
| src/cli/grants-explorer/clients/xlsx-loader.ts | Loads XLSX rows, normalizes cells, validates with Zod, and tags rows by sector. |
| src/cli/grants-explorer/clients/xlsx-loader.test.ts | Unit + integration tests for normalizers and loader wiring (incl. business_id extraction). |
| src/cli/grants-explorer/clients/xlsx-downloader.ts | Implements Power BI automation: sector discovery, per-sector export, cache/manifest write, reconciliation. |
| src/cli/grants-explorer/clients/xlsx-downloader.test.ts | Mocks Playwright and tests downloader behavior (manifest, caching, failure handling). |
| src/cli/grants-explorer/clients/manifest.ts | Reads/writes and validates sectors.json manifest. |
| src/cli/grants-explorer/clients/manifest.test.ts | Tests manifest read/write validation failures and round-trip. |
| src/cli/grants-explorer/clients/database.ts | In-memory SQLite schema, indexes, insert/query helpers. |
| src/cli/grants-explorer/clients/database.test.ts | Tests inserts, null handling in aggregates, filters, and constraints/indexed columns. |
| src/cli/grants-explorer/clients/combined-writer.ts | Atomically writes combined grants.json derived artifact. |
| src/cli/grants-explorer/clients/combined-writer.test.ts | Tests round-trip shape, null preservation, and temp-file cleanup semantics. |
| package.json | Adds run:grants-explorer script and adds xlsx dependency. |
| pnpm-lock.yaml | Locks xlsx@0.18.5 and its transitive dependencies. |
| .claude/settings.json | Modifies command allow/deny lists (adds gh pr create to allow). |
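
The table describes sql-tool.ts as a read-only SQL tool with basic query validation. The sketch below shows one way such a check could look. It is an assumption, not the PR's code: the helper name `checkReadOnlyQuery`, the result type, the keyword list, and the `grants` table name in the examples are all hypothetical.

```typescript
// Hypothetical validation helper; the actual checks in sql-tool.ts are not shown in this PR page.
type QueryCheck = { ok: true } | { ok: false; reason: string };

// Keywords that write data, change the schema, or reach outside the database.
// "replace" is only rejected as "REPLACE INTO" so the replace() string function stays usable.
const FORBIDDEN =
  /\b(insert|update|delete|drop|alter|create|attach|detach|pragma|vacuum|reindex|replace\s+into)\b/i;

function checkReadOnlyQuery(sql: string): QueryCheck {
  // Drop one trailing semicolon so "SELECT 1;" is accepted.
  const trimmed = sql.trim().replace(/;\s*$/, "");
  if (trimmed.length === 0) {
    return { ok: false, reason: "empty query" };
  }
  // Any remaining semicolon would mean more than one statement.
  if (trimmed.includes(";")) {
    return { ok: false, reason: "multiple statements are not allowed" };
  }
  if (!/^(select|with)\b/i.test(trimmed)) {
    return { ok: false, reason: "only SELECT or WITH queries are allowed" };
  }
  // WITH can prefix INSERT/UPDATE/DELETE in SQLite, so scan the body as well.
  if (FORBIDDEN.test(trimmed)) {
    return { ok: false, reason: "query contains a write or schema keyword" };
  }
  return { ok: true };
}
```

A keyword scan like this is deliberately conservative: it also rejects harmless queries that contain a forbidden word or a semicolon inside a string literal. It works best as one layer alongside whatever read-only guarantees the database driver provides.
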
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
What
- query_grants SQL tool over an in-memory SQLite DB loaded from the Tutkihallintoa Power BI export.
- tmp/grants-explorer/paatokset/, merged into tmp/grants-explorer/grants.json for downstream tools (jq/duckdb/pandas).
- recipient_business_id column so per-legal-entity aggregation is exact, not substring-based.
- --refetch flag for force-redownload; resume-friendly per-sector caching.
How to test
pnpm typecheck
pnpm lint
pnpm test -- --run src/cli/grants-explorer/
Recommended: pnpm run:grants-explorer --refetch
Recommended: pnpm run:grants-explorer
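
The two recommended runs exercise the forced-download path and the cached path. As a rough sketch of how a pure decision helper such as `shouldRefetch` might choose between them: the input shape below (`CacheState` and its three fields) is an assumption, and the real decision table in should-refetch.ts may use different inputs.

```typescript
// Hypothetical input shape; the fields are assumptions, not the PR's actual types.
interface CacheState {
  refetchFlag: boolean;           // --refetch was passed on the command line
  manifestValid: boolean;         // sectors.json exists and passes validation
  allSectorFilesPresent: boolean; // every sector in the manifest has a cached XLSX
}

// Pure function: no file system or network access, so the whole
// decision table can be unit-tested with plain objects.
function shouldRefetch(state: CacheState): boolean {
  if (state.refetchFlag) return true;            // user asked for a fresh download
  if (!state.manifestValid) return true;         // nothing trustworthy to load from
  if (!state.allSectorFilesPresent) return true; // cache is incomplete
  return false;                                  // cached load is safe
}
```

Keeping the decision separate from the downloader means the test file only needs to enumerate input combinations, with no Playwright mocks involved.
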
Security review
- OPENAI_API_KEY loaded via dotenv/config; never logged.
- tutkihallintoa.fi (Power BI embed) via Playwright and to the OpenAI API. No other endpoints.
- tmp/ only.