Skip to content

fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump#814

Open
DeusData wants to merge 2 commits into
mainfrom
distill/770-imports-dedup
Open

fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump#814
DeusData wants to merge 2 commits into
mainfrom
distill/770-imports-dedup

Conversation

@DeusData

@DeusData DeusData commented Jul 3, 2026

Copy link
Copy Markdown
Owner

fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump

Distills #770 (author: Alexandros Pappas, @apappas1129 — credited via
Co-authored-by) and completes the edge-uniqueness contract that the PR
changed in only one of its three homes.

What was broken

A single import { A, B } from './lib' produced exactly one IMPORTS edge.
The graph buffer dedups edges on (source_id, target_id, type), and both
named imports resolve to the same (source_file, target_file) pair — so the
second symbol's edge collided with the first and merge-replaced its
properties, silently dropping whichever import lost the write race.

Blast radius is bigger than "who imports X" queries: pass_calls.c,
pass_usages.c, pass_semantic.c and pass_lsp_cross.c all parse exactly
one local_name out of each IMPORTS edge to build their local-name →
module-QN maps, so the dropped symbol's cross-file call resolution broke
too.

The three-site contract

Edge uniqueness is defined in three places that must agree. #770 fixed only
the first; as-is it ships databases that violate their own UNIQUE constraint
(PRAGMA integrity_check: non-unique entry in index sqlite_autoindex_edges_1).

Site Old contract New contract
src/graph_buffer/graph_buffer.c make_edge_key() (dedup hash key; 3 call-sites: insert, unindex, delete-by-type) src:tgt:type src:tgt:type:local_name for IMPORTS only
src/store/store.c edges DDL + cbm_store_insert_edge upsert UNIQUE(source_id,target_id,type), matching ON CONFLICT + local_name_gen VIRTUAL generated column (CASE WHEN type='IMPORTS' THEN coalesce(json_extract(properties,'$.local_name'),'') ELSE '' END), UNIQUE(...,local_name_gen), conflict target widened to match
internal/cbm/sqlite_writer.c (raw dump: DDL text + hand-built sqlite_autoindex_edges_1 comparator + entry builder) 3-column autoindex 4-column DDL/comparator/entries; CBMDumpEdge.local_name extracted via yyjson so entries match json_extract's unescaped values (same precedent as idx_edges_url_path)

Details:

  • IMPORTS-only discriminator. Non-IMPORTS edges keep their existing
    collapse-repeats dedup; their local_name_gen is '' (NOT NULL —
    NULLs never conflict in a UNIQUE index, which would have broken
    non-IMPORTS dedup entirely).
  • Audit nit from fix(graph): key IMPORTS edge dedup on local_name, not just source/target #770 addressed (EDGE_KEY_BUF truncation):
    buffer bumped CBM_SZ_128 → CBM_SZ_256, and a local_name too long for
    the buffer is re-keyed with an FNV-1a-64 hash of the FULL name instead
    of being silently truncated — two long names sharing a prefix can no
    longer collide back into one edge. The hash key is prefixed with byte
    0x01, which cannot occur in a raw JSON slice, so hash keys can never
    collide with verbatim keys.
  • Idempotency preserved: re-inserting the same
    (src, tgt, IMPORTS, local_name) still dedups in the buffer and still
    upserts (json_patch) in the store.

Schema migration

There is no in-place migration mechanism for SQLite table constraints
(ALTER TABLE cannot touch a table-level UNIQUE), and the codebase's
established pattern for incompatible DBs is fail open → rebuild:

  • init_schema now probes for edges.local_name_gen and fails the open on
    pre-existing old-schema DBs (store.schema result=incompatible).
    try_incremental_or_delete_db already treats an unopenable DB as
    incompatible and deletes + rebuilds it on the next full index;
    artifact import refuses and falls back to a reindex.
  • Read-only query opens (cbm_store_open_path_query) skip init_schema
    and keep working on old DBs — only writes force the rebuild.
  • CBM_ARTIFACT_SCHEMA_VERSION 1 → 2 (the header's documented "increment
    when DB schema changes" gate): old binaries refuse artifacts carrying the
    widened schema — their 3-column ON CONFLICT target can no longer prepare
    against it.

Tests (reproduce-first)

All new tests were written first and run against unmodified main
(red evidence verbatim below), then against the intermediate state
(gbuf fix only = #770's scope), then the full fix.

Test (tier) main (unfixed) gbuf fix only (#770 as-is) full fix
gbuf_imports_multi_symbol_dedup (graph_buffer, from #770) FAIL tests/test_graph_buffer.c:192: eid_a == eid_b (both 3) PASS PASS
gbuf_imports_long_local_name_no_collision (graph_buffer, new) FAIL tests/test_graph_buffer.c:244: eid_a == eid_b (both 3) PASS PASS
store_imports_edge_local_name_coexist (store_edges) FAIL tests/test_store_edges.c:104: ida == idb (both 1) same FAIL PASS
sw_imports_local_name_unique (sqlite_writer) FAIL tests/test_sqlite_writer.c:185: "non-unique entry in index sqlite_autoindex_edges_1" != "ok" ¹ same FAIL PASS
pipeline_imports_multi_symbol_edges (pipeline, end-to-end TS fixture) FAIL tests/test_pipeline.c:1739: count == 1, expected 2 == 2 FAIL tests/test_pipeline.c:1727: "non-unique entry in index sqlite_autoindex_edges_1" != "ok" PASS

¹ Run with the .local_name initializers commented out, since the field
does not exist on main (the struct gains it in this change).

The bolded intermediate-state failure is the guard proving the schema half
matters: with only #770's buffer fix, the real pipeline dumps both edges into
a DB whose own UNIQUE(source_id, target_id, type) they violate.

Verification

  • make -f Makefile.cbm cbm-Werror clean
  • make -f Makefile.cbm lint-ci — cppcheck + clang-format + NOLINT check passed
  • Suites green (0 failures): graph_buffer, store_edges, sqlite_writer
    (82), pipeline (210), and the wider affected set store_nodes,
    store_search, store_arch, store_bulk, store_pragmas,
    store_checkpoint, artifact, dump_verify, dump_verify_io, cypher,
    mcp, incremental, integration, edge_imports, edge_structural,
    watcher, cli (924)

Closes #768.

DeusData and others added 2 commits July 3, 2026 20:16
…d dump

Distills PR #770 (graph-buffer dedup key) and completes the edge-uniqueness
contract it left partial: the contract lives in THREE places, and changing
only the buffer ships databases that violate their own UNIQUE constraint
(PRAGMA integrity_check: non-unique entry in sqlite_autoindex_edges_1).

A single 'import { A, B } from ./lib' produced ONE IMPORTS edge: the graph
buffer dedups edges on (source_id, target_id, type) and merge-replaces
properties on collision, so the second symbol silently overwrote the first.
pass_calls.c, pass_usages.c, pass_semantic.c and pass_lsp_cross.c parse one
local_name per IMPORTS edge for cross-file resolution, so the dropped
symbol's calls failed to resolve too — not just "who imports X" queries.

Changes, kept in sync across all three sites:

- graph_buffer.c (from #770): make_edge_key() folds local_name into the
  dedup key for IMPORTS edges only; all three key call-sites updated.
  Hardened beyond #770: EDGE_KEY_BUF bumped to 256 and oversized
  local_names are re-keyed with an FNV-1a hash of the full name instead
  of being silently truncated (two long names sharing a prefix must not
  collide back into one edge).
- store.c: edges gains local_name_gen, a VIRTUAL generated column
  (IMPORTS -> coalesce(json_extract(properties,'$.local_name'),''),
  else '' — NOT NULL because NULLs never conflict in a UNIQUE index);
  uniqueness widened to UNIQUE(source_id, target_id, type,
  local_name_gen) and the insert upsert's conflict target matches.
  init_schema probes pre-#768 DBs (no local_name_gen) and fails the
  open: SQLite cannot ALTER a table constraint in place, and an
  unopenable DB already takes the existing repair path — full index
  deletes + rebuilds, artifact import refuses and falls back to a
  reindex. Read-only query opens skip init_schema and keep working.
- sqlite_writer.c/.h: dump DDL matches the widened schema; the hand-built
  sqlite_autoindex_edges_1 comparator and entry builder include the
  local_name column; CBMDumpEdge carries local_name extracted via real
  JSON parsing (yyjson) so index entries match json_extract's unescaped
  values exactly, per the idx_edges_url_path precedent.
- artifact.h: CBM_ARTIFACT_SCHEMA_VERSION 1 -> 2 so old binaries refuse
  artifacts carrying the widened schema (their 3-column conflict target
  can no longer prepare against it).

Tests (reproduce-first, all red on the unfixed code):
- gbuf tier: multi-symbol import -> 2 IMPORTS edges; long-local_name
  truncation guard.
- store tier: distinct local_name coexists as 2 rows, same local_name
  still upserts, non-IMPORTS dedup unchanged.
- writer tier: dumped DB passes integrity_check with 2 sibling imports
  and exposes matching local_name_gen values.
- end-to-end: TS fixture through the real pipeline -> 2 queryable
  IMPORTS edges AND integrity_check ok; with only the buffer half of
  the fix this test still fails on integrity_check, proving the schema
  half is required.

Closes #768.

Co-authored-by: Alexandros Pappas <11921291+apappas1129@users.noreply.github.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
@DeusData DeusData enabled auto-merge July 3, 2026 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiple named imports from one specifier produce only a single IMPORTS edge — sibling symbols are invisible to the graph

1 participant