fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump#814
Open
DeusData wants to merge 2 commits into
Open
fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump#814DeusData wants to merge 2 commits into
DeusData wants to merge 2 commits into
Conversation
…d dump Distills PR #770 (graph-buffer dedup key) and completes the edge-uniqueness contract it left partial: the contract lives in THREE places, and changing only the buffer ships databases that violate their own UNIQUE constraint (PRAGMA integrity_check: non-unique entry in sqlite_autoindex_edges_1). A single 'import { A, B } from ./lib' produced ONE IMPORTS edge: the graph buffer dedups edges on (source_id, target_id, type) and merge-replaces properties on collision, so the second symbol silently overwrote the first. pass_calls.c, pass_usages.c, pass_semantic.c and pass_lsp_cross.c parse one local_name per IMPORTS edge for cross-file resolution, so the dropped symbol's calls failed to resolve too — not just "who imports X" queries. Changes, kept in sync across all three sites: - graph_buffer.c (from #770): make_edge_key() folds local_name into the dedup key for IMPORTS edges only; all three key call-sites updated. Hardened beyond #770: EDGE_KEY_BUF bumped to 256 and oversized local_names are re-keyed with an FNV-1a hash of the full name instead of being silently truncated (two long names sharing a prefix must not collide back into one edge). - store.c: edges gains local_name_gen, a VIRTUAL generated column (IMPORTS -> coalesce(json_extract(properties,'$.local_name'),''), else '' — NOT NULL because NULLs never conflict in a UNIQUE index); uniqueness widened to UNIQUE(source_id, target_id, type, local_name_gen) and the insert upsert's conflict target matches. init_schema probes pre-#768 DBs (no local_name_gen) and fails the open: SQLite cannot ALTER a table constraint in place, and an unopenable DB already takes the existing repair path — full index deletes + rebuilds, artifact import refuses and falls back to a reindex. Read-only query opens skip init_schema and keep working. - sqlite_writer.c/.h: dump DDL matches the widened schema; the hand-built sqlite_autoindex_edges_1 comparator and entry builder include the local_name column; CBMDumpEdge carries local_name extracted via real JSON parsing (yyjson) so index entries match json_extract's unescaped values exactly, per the idx_edges_url_path precedent. - artifact.h: CBM_ARTIFACT_SCHEMA_VERSION 1 -> 2 so old binaries refuse artifacts carrying the widened schema (their 3-column conflict target can no longer prepare against it). Tests (reproduce-first, all red on the unfixed code): - gbuf tier: multi-symbol import -> 2 IMPORTS edges; long-local_name truncation guard. - store tier: distinct local_name coexists as 2 rows, same local_name still upserts, non-IMPORTS dedup unchanged. - writer tier: dumped DB passes integrity_check with 2 sibling imports and exposes matching local_name_gen values. - end-to-end: TS fixture through the real pipeline -> 2 queryable IMPORTS edges AND integrity_check ok; with only the buffer half of the fix this test still fails on integrity_check, proving the schema half is required. Closes #768. Co-authored-by: Alexandros Pappas <11921291+apappas1129@users.noreply.github.com> Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(graph): IMPORTS edges keyed by local_name across buffer, store and dump
Distills #770 (author: Alexandros Pappas, @apappas1129 — credited via
Co-authored-by) and completes the edge-uniqueness contract that the PRchanged in only one of its three homes.
What was broken
A single
import { A, B } from './lib'produced exactly one IMPORTS edge.The graph buffer dedups edges on
(source_id, target_id, type), and bothnamed imports resolve to the same
(source_file, target_file)pair — so thesecond symbol's edge collided with the first and merge-replaced its
properties, silently dropping whichever import lost the write race.
Blast radius is bigger than "who imports X" queries:
pass_calls.c,pass_usages.c,pass_semantic.candpass_lsp_cross.call parse exactlyone
local_nameout of each IMPORTS edge to build their local-name →module-QN maps, so the dropped symbol's cross-file call resolution broke
too.
The three-site contract
Edge uniqueness is defined in three places that must agree. #770 fixed only
the first; as-is it ships databases that violate their own UNIQUE constraint
(
PRAGMA integrity_check:non-unique entry in index sqlite_autoindex_edges_1).src/graph_buffer/graph_buffer.cmake_edge_key()(dedup hash key; 3 call-sites: insert, unindex, delete-by-type)src:tgt:typesrc:tgt:type:local_namefor IMPORTS onlysrc/store/store.cedges DDL +cbm_store_insert_edgeupsertUNIQUE(source_id,target_id,type), matchingON CONFLICTlocal_name_genVIRTUAL generated column (CASE WHEN type='IMPORTS' THEN coalesce(json_extract(properties,'$.local_name'),'') ELSE '' END),UNIQUE(...,local_name_gen), conflict target widened to matchinternal/cbm/sqlite_writer.c(raw dump: DDL text + hand-builtsqlite_autoindex_edges_1comparator + entry builder)CBMDumpEdge.local_nameextracted via yyjson so entries matchjson_extract's unescaped values (same precedent asidx_edges_url_path)Details:
collapse-repeats dedup; their
local_name_genis''(NOT NULL —NULLs never conflict in a UNIQUE index, which would have broken
non-IMPORTS dedup entirely).
EDGE_KEY_BUFtruncation):buffer bumped
CBM_SZ_128 → CBM_SZ_256, and a local_name too long forthe buffer is re-keyed with an FNV-1a-64 hash of the FULL name instead
of being silently truncated — two long names sharing a prefix can no
longer collide back into one edge. The hash key is prefixed with byte
0x01, which cannot occur in a raw JSON slice, so hash keys can nevercollide with verbatim keys.
(src, tgt, IMPORTS, local_name)still dedups in the buffer and stillupserts (
json_patch) in the store.Schema migration
There is no in-place migration mechanism for SQLite table constraints
(
ALTER TABLEcannot touch a table-level UNIQUE), and the codebase'sestablished pattern for incompatible DBs is fail open → rebuild:
init_schemanow probes foredges.local_name_genand fails the open onpre-existing old-schema DBs (
store.schema result=incompatible).try_incremental_or_delete_dbalready treats an unopenable DB asincompatible and deletes + rebuilds it on the next full index;
artifact import refuses and falls back to a reindex.
cbm_store_open_path_query) skipinit_schemaand keep working on old DBs — only writes force the rebuild.
CBM_ARTIFACT_SCHEMA_VERSION 1 → 2(the header's documented "incrementwhen DB schema changes" gate): old binaries refuse artifacts carrying the
widened schema — their 3-column
ON CONFLICTtarget can no longer prepareagainst it.
Tests (reproduce-first)
All new tests were written first and run against unmodified
main(red evidence verbatim below), then against the intermediate state
(gbuf fix only = #770's scope), then the full fix.
gbuf_imports_multi_symbol_dedup(graph_buffer, from #770)FAIL tests/test_graph_buffer.c:192: eid_a == eid_b (both 3)gbuf_imports_long_local_name_no_collision(graph_buffer, new)FAIL tests/test_graph_buffer.c:244: eid_a == eid_b (both 3)store_imports_edge_local_name_coexist(store_edges)FAIL tests/test_store_edges.c:104: ida == idb (both 1)sw_imports_local_name_unique(sqlite_writer)FAIL tests/test_sqlite_writer.c:185: "non-unique entry in index sqlite_autoindex_edges_1" != "ok"¹pipeline_imports_multi_symbol_edges(pipeline, end-to-end TS fixture)FAIL tests/test_pipeline.c:1739: count == 1, expected 2 == 2FAIL tests/test_pipeline.c:1727: "non-unique entry in index sqlite_autoindex_edges_1" != "ok"¹ Run with the
.local_nameinitializers commented out, since the fielddoes not exist on main (the struct gains it in this change).
The bolded intermediate-state failure is the guard proving the schema half
matters: with only #770's buffer fix, the real pipeline dumps both edges into
a DB whose own
UNIQUE(source_id, target_id, type)they violate.Verification
make -f Makefile.cbm cbm—-Werrorcleanmake -f Makefile.cbm lint-ci— cppcheck + clang-format + NOLINT check passedgraph_buffer,store_edges,sqlite_writer(82),
pipeline(210), and the wider affected setstore_nodes,store_search,store_arch,store_bulk,store_pragmas,store_checkpoint,artifact,dump_verify,dump_verify_io,cypher,mcp,incremental,integration,edge_imports,edge_structural,watcher,cli(924)Closes #768.