
Add grants-explorer CLI: natural-language SQL over Finnish grants#16

Merged: valuecodes merged 5 commits into main from grants-explorer on May 31, 2026


@valuecodes

Owner

What

  • Adds a new CLI: ask natural-language questions about Finnish grant decisions. An OpenAI agent answers via a read-only query_grants SQL tool over an in-memory SQLite DB loaded from the Tutkihallintoa Power BI export.
  • Spans every Sektoriluokitus (institutional sectors S11–S15 + 6-digit sub-codes + sentinel buckets for null/missing). Per-sector xlsx files are scraped via Playwright (keyboard-nav discovery of the slicer listbox), cached under tmp/grants-explorer/paatokset/, merged into tmp/grants-explorer/grants.json for downstream tools (jq/duckdb/pandas).
  • Recipient Y-tunnus is extracted into an indexed recipient_business_id column so per-legal-entity aggregation is exact, not substring-based. A --refetch flag forces a re-download, and per-sector caching lets an interrupted run resume where it left off.
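
As an illustration of the Y-tunnus extraction described above, here is a minimal sketch. It assumes the business ID appears at the very end of the recipient string in the standard form of seven digits, a hyphen, and one check digit. The function name `extractBusinessId` and the sample recipient strings are hypothetical; the PR's actual helper lives in utils/business-id.ts and may differ.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Matches a Y-tunnus ("1234567-8") only when it is the last token in the
// string and is not the tail of a longer digit run.
const TRAILING_BUSINESS_ID = /(?:^|\D)(\d{7}-\d)\s*$/;

function extractBusinessId(recipient: string | null): string | null {
  if (recipient === null) return null;
  const match = TRAILING_BUSINESS_ID.exec(recipient);
  // No checksum validation here; only the shape of the ID is checked.
  return match?.[1] ?? null;
}

// Example calls (recipient strings are made up for illustration).
const trailing = extractBusinessId("Example Oy 1234567-8");
const leading = extractBusinessId("1234567-8 Example Oy");
```

Storing the extracted ID in its own column lets a query group by the ID directly, so two recipients whose names merely share a substring are not merged.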

How to test

  • pnpm typecheck
  • pnpm lint
  • pnpm test -- --run src/cli/grants-explorer/
  • Recommended: pnpm run:grants-explorer --refetch
  • Recommended: pnpm run:grants-explorer

Security review

  • Secrets / env vars: depends on OPENAI_API_KEY loaded via dotenv/config; never logged.
  • Network / API calls: new outbound to tutkihallintoa.fi (Power BI embed) via Playwright and to the OpenAI API. No other endpoints.
  • Data handling / PII: dataset is public grant decisions including recipient names and Y-tunnus (public Finnish business IDs). Artifacts written under tmp/ only.

Copilot AI review requested due to automatic review settings May 31, 2026 08:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea02869049


Comment thread src/cli/grants-explorer/main.ts Outdated
Comment thread .claude/settings.json Outdated

Copilot AI left a comment

Contributor


Pull request overview

Adds a new grants-explorer CLI that (1) downloads Finnish grant-decision exports per Sektoriluokitus from the Tutkihallintoa Power BI report, (2) loads/normalizes the data into an in-memory SQLite database, and (3) answers user questions via an OpenAI agent using a read-only SQL tool.
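
To make the "read-only SQL tool" idea concrete, here is a rough sketch of a query guard. It is an assumption about how such validation could look, not the logic in tools/sql-tool.ts, which this thread does not show. The name `isReadOnlyQuery`, the keyword list, and the `grants` table name in the examples are all hypothetical.

```typescript
// Hypothetical guard: accept a single SELECT or WITH statement and reject
// anything containing a write or schema keyword. This is a conservative
// keyword filter, not a SQL parser, so it also rejects harmless queries
// whose string literals or function calls happen to contain a listed word.
const FORBIDDEN_KEYWORDS =
  /\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum|reindex)\b/i;

function isReadOnlyQuery(sql: string): boolean {
  // Allow one optional trailing semicolon, then reject stacked statements.
  const trimmed = sql.trim().replace(/;\s*$/, "");
  if (trimmed.includes(";")) return false;
  // The statement must begin with SELECT or WITH (a CTE).
  if (!/^(select|with)\b/i.test(trimmed)) return false;
  // A CTE can still precede a write statement, so scan the whole text.
  return !FORBIDDEN_KEYWORDS.test(trimmed);
}

// Example calls ("grants" is an assumed table name).
const allowed = isReadOnlyQuery("SELECT COUNT(*) FROM grants");
const blocked = isReadOnlyQuery("SELECT 1; DROP TABLE grants");
```

A filter like this errs on the side of rejecting queries, which suits a tool whose input is generated by a model rather than written by hand.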

Changes:

  • Introduces Playwright-based multi-sector XLSX download + validated manifest/cache semantics.
  • Adds XLSX loading/normalization pipeline (dates/amounts/EU funding), SQLite schema + indexing, and a query_grants agent tool.
  • Adds utilities + schemas for sector parsing and Y-tunnus extraction; adds comprehensive unit/integration tests and documentation.
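
The cache semantics mentioned in the changes above can be pictured as a small pure decision function. The sketch below is a guess at the general shape only: the `CacheState` fields, the three outcomes, and the name `decideFetch` are hypothetical stand-ins, and the PR's real `shouldRefetch` helper may take different inputs and return a different type.

```typescript
// Hypothetical inputs; the real helper's signature is not shown in this thread.
interface CacheState {
  refetchFlag: boolean; // the user passed --refetch
  manifestValid: boolean; // the sector manifest exists and validates
  missingSectorFiles: number; // sectors listed in the manifest but absent on disk
}

type FetchDecision = "refetch-all" | "fetch-missing" | "use-cache";

function decideFetch(state: CacheState): FetchDecision {
  // An explicit flag always wins over whatever is cached.
  if (state.refetchFlag) return "refetch-all";
  // Without a trustworthy manifest the cache contents cannot be interpreted.
  if (!state.manifestValid) return "refetch-all";
  // A partially filled cache is resumed instead of being discarded.
  if (state.missingSectorFiles > 0) return "fetch-missing";
  return "use-cache";
}

// Example: an interrupted earlier run left three sectors undownloaded.
const resumed = decideFetch({
  refetchFlag: false,
  manifestValid: true,
  missingSectorFiles: 3,
});
```

Keeping the decision free of file-system and network calls is what allows it to be covered by a plain decision-table test.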

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/cli/grants-explorer/utils/sector-parser.ts | Parses slicer option text into normalized sector codes (incl. sentinel buckets). |
| src/cli/grants-explorer/utils/sector-parser.test.ts | Tests sector option parsing including NBSP and invalid inputs. |
| src/cli/grants-explorer/utils/business-id.ts | Extracts Finnish business ID (Y-tunnus) from recipient strings. |
| src/cli/grants-explorer/utils/business-id.test.ts | Tests trailing-only Y-tunnus extraction and null/invalid cases. |
| src/cli/grants-explorer/types/schemas.ts | Defines CLI args, agent output schema, sector manifest, and GrantRow validation schema. |
| src/cli/grants-explorer/types/schemas.test.ts | Tests CLI arg schema behavior (presence-only --refetch, --dir). |
| src/cli/grants-explorer/tools/sql-tool.ts | Adds read-only SQL tool with basic query validation for the agent. |
| src/cli/grants-explorer/should-refetch.ts | Pure decision helper for refetch vs cached load. |
| src/cli/grants-explorer/should-refetch.test.ts | Tests shouldRefetch decision table. |
| src/cli/grants-explorer/README.md | Documents CLI usage, cache layout, schema, and downloader/loader behavior. |
| src/cli/grants-explorer/main.ts | Wires together download → load → DB insert → combined JSON write → agent Q&A loop. |
| src/cli/grants-explorer/constants.ts | Centralizes agent name/model and cache/default paths. |
| src/cli/grants-explorer/clients/xlsx-loader.ts | Loads XLSX rows, normalizes cells, validates with Zod, and tags rows by sector. |
| src/cli/grants-explorer/clients/xlsx-loader.test.ts | Unit + integration tests for normalizers and loader wiring (incl. business_id extraction). |
| src/cli/grants-explorer/clients/xlsx-downloader.ts | Implements Power BI automation: sector discovery, per-sector export, cache/manifest write, reconciliation. |
| src/cli/grants-explorer/clients/xlsx-downloader.test.ts | Mocks Playwright and tests downloader behavior (manifest, caching, failure handling). |
| src/cli/grants-explorer/clients/manifest.ts | Reads/writes and validates sectors.json manifest. |
| src/cli/grants-explorer/clients/manifest.test.ts | Tests manifest read/write validation failures and round-trip. |
| src/cli/grants-explorer/clients/database.ts | In-memory SQLite schema, indexes, insert/query helpers. |
| src/cli/grants-explorer/clients/database.test.ts | Tests inserts, null handling in aggregates, filters, and constraints/indexed columns. |
| src/cli/grants-explorer/clients/combined-writer.ts | Atomically writes combined grants.json derived artifact. |
| src/cli/grants-explorer/clients/combined-writer.test.ts | Tests round-trip shape, null preservation, and temp-file cleanup semantics. |
| package.json | Adds run:grants-explorer script and adds xlsx dependency. |
| pnpm-lock.yaml | Locks xlsx@0.18.5 and its transitive dependencies. |
| .claude/settings.json | Modifies command allow/deny lists (adds gh pr create to allow). |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


Comment thread src/cli/grants-explorer/clients/xlsx-downloader.ts Outdated
Comment thread src/cli/grants-explorer/tools/sql-tool.ts
Comment thread src/cli/grants-explorer/main.ts
Comment thread .claude/settings.json Outdated
@valuecodes valuecodes merged commit 01f8d20 into main May 31, 2026
5 checks passed
@valuecodes valuecodes deleted the grants-explorer branch May 31, 2026 09:21
2 participants