
Preserve CTD taxon metadata #413

Open
cbizon wants to merge 2 commits into master from taxon


cbizon commented Jun 17, 2026

Contributor

Summary

  • Enable Babel taxon retrieval during normalization and use the top-level NodeNorm taxa field for node taxon.
  • Remove CTD gene node NCBITaxon properties and emit CTD row-level taxa as edge taxon CURIEs.
  • Preserve CTD row-level taxa during graph merges by adding edge_merging_attributes: [taxon] to CTD-containing graph specs and downstream subgraph merge specs.
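To illustrate why taxon needs to be part of the merge key, here is a minimal sketch of taxon-aware edge merging. It is not the repository's actual merge code: `merge_key`, `merge_edges`, the `merging_attributes` parameter, and the `EX:` identifiers are hypothetical names chosen for this example, and the real merger may key on additional fields.

```python
def _as_tuple(value):
    """Normalize a missing, scalar, or list attribute into a sorted tuple."""
    if value is None:
        return ()
    if isinstance(value, str):
        return (value,)
    return tuple(sorted(value))


def merge_key(edge, merging_attributes=()):
    """Build a hashable key from the core triple plus extra merging attributes."""
    core = (edge["subject"], edge["predicate"], edge["object"])
    extras = tuple((attr, _as_tuple(edge.get(attr))) for attr in merging_attributes)
    return core + extras


def merge_edges(edges, merging_attributes=()):
    """Collapse edges that share a merge key, unioning their publications."""
    merged = {}
    for edge in edges:
        key = merge_key(edge, merging_attributes)
        if key not in merged:
            # Copy the edge so the input records are not mutated.
            merged[key] = dict(edge, publications=list(edge.get("publications", [])))
            continue
        pubs = merged[key]["publications"]
        for pub in edge.get("publications", []):
            if pub not in pubs:
                pubs.append(pub)
    return list(merged.values())


# Made-up rows: the same statement observed in two different taxa.
edges = [
    {"subject": "EX:chem", "predicate": "biolink:affects", "object": "EX:gene",
     "taxon": ["NCBITaxon:9606"], "publications": ["PMID:1"]},
    {"subject": "EX:chem", "predicate": "biolink:affects", "object": "EX:gene",
     "taxon": ["NCBITaxon:10090"], "publications": ["PMID:2"]},
    {"subject": "EX:chem", "predicate": "biolink:affects", "object": "EX:gene",
     "taxon": ["NCBITaxon:9606"], "publications": ["PMID:3"]},
]

plain = merge_edges(edges)                    # taxon ignored: one merged edge
taxon_aware = merge_edges(edges, ("taxon",))  # taxon in the key: one edge per taxon
```

Without taxon in the key, the three rows collapse into a single edge that keeps only the first row's taxon. With `("taxon",)` as a merging attribute, the two human rows merge with each other and the mouse row stays separate, which is the behavior `edge_merging_attributes: [taxon]` is meant to give.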

Validation

  • uv run --with beautifulsoup4 pytest tests/test_graph_spec.py tests/test_merging.py tests/test_normalization.py tests/test_kgx_file_normalizer.py tests/test_ctd_loader.py -> 70 passed.
  • Rebuilt CTD KGX graph with taxon-aware merging: build status stable, QC passed, 26,491 nodes, 186,376 edges.
  • Rebuilt CTD output has 10,659 node taxon properties, 83,806 edge taxon properties, zero NCBITaxon/taxonid node or edge properties, and zero invalid taxon CURIE values.
  • Taxon-aware merge check found zero source merge groups with multiple distinct taxa.
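For reviewers who want to reproduce checks like the last two, here is a rough sketch of how they could be written. It is not the script used for the numbers above. It assumes that a valid taxon value has the form `NCBITaxon:<digits>`, that a merge group is identified by the subject/predicate/object triple, and that records are plain dicts. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: only NCBITaxon CURIEs with a numeric local ID count as valid.
TAXON_CURIE = re.compile(r"NCBITaxon:\d+")


def _taxa(record):
    """Return the record's taxon values as a list, whatever shape they came in."""
    value = record.get("taxon")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def invalid_taxon_values(records):
    """Collect every taxon value that is not a well-formed NCBITaxon CURIE."""
    return [
        taxon
        for record in records
        for taxon in _taxa(record)
        if not TAXON_CURIE.fullmatch(str(taxon))
    ]


def groups_with_multiple_taxa(edges):
    """Map each triple to its taxon sets, keeping triples with more than one."""
    taxa_by_group = defaultdict(set)
    for edge in edges:
        group = (edge["subject"], edge["predicate"], edge["object"])
        taxa_by_group[group].add(tuple(sorted(_taxa(edge))))
    return {group: taxa for group, taxa in taxa_by_group.items() if len(taxa) > 1}
```

Both functions return empty collections when the data is clean, so a result of "zero invalid values" or "zero groups with multiple taxa" corresponds to `invalid_taxon_values(...) == []` and `groups_with_multiple_taxa(...) == {}`.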

Notes

  • A local Neo4j-output build reached KGX build and QC but failed afterward because neo4j-admin is not installed locally; KGX validation completed successfully.

cbizon added 2 commits June 17, 2026 14:46
- Enable NodeNorm taxa retrieval by default so normalized nodes get Babel taxon metadata.
- Read taxa from the top-level NodeNorm response field.
- Keep CTD row taxon as edge context and remove the conflicting NCBITaxon node property.
@cbizon cbizon requested a review from EvanDietzMorris June 17, 2026 20:43
cbizon commented Jun 17, 2026

Contributor Author

Fixes #412

github-actions bot added the "Biological Context QC" label (Require validation of biological context to ensure accuracy and consistency) Jun 17, 2026
