Reclassify loose/historical/lossy relationship types out of Identity in predicate_mapping.csv #25

Description

@nicoloesch

Is there an existing issue for this?

  • I have searched the existing issues

Bug summary

GroundingConstraints trusts PredicateKind.IDENTITY edges by default for hierarchy-anchor pathfinding (find_standard_paths), treating them as safe "same concept" hops. Several relationship_ids in config/predicate_mapping.csv were classified under Identity despite being loose, historical, exception, or class-to-instance relations rather than true 1:1 equivalences. This let find_standard_paths walk through unrelated concepts and surface spuriously perfect (relevance == 1.0) matches with no real relationship to the query.

Root cause

config/predicate_mapping.csv defines, per OMOP relationship_id, a (class, subclass) pair loaded into relationship_mapping and used as the predicate_kind filter. A full audit of all 735 rows (all five classes) found 27 rows misclassified under Identity (or left entirely unclassified) that should not be trusted as identity hops:

  • SNOMED historical-association family: Concept poss_eq from/to (possibly_equivalent_to), Concept was_a from/to (was_a), Concept alt_to from/to (alternative_to). These are SNOMED's own ambiguous/fallback inactivation relations. They are explicitly documented as requiring human judgment to resolve, so they are not safe for automated traversal.
  • Cross-vocabulary class-to-instance mappings: OPCS4 - SNOMED/SNOMED - OPCS4 (whose own name says "equivalent or categorical"), ATC - RxNorm/RxNorm - ATC, ETC - RxNorm, ICD9CM - MedDRA/MedDRA - ICD9CM, ICDO - SNOMED/SNOMED - ICDO. These connect a class/category-level concept to a specific instance (e.g. one row's own example shows a 7-ingredient combination product mapping to the ATC class of just one of its ingredients).
  • Structural/substance-variant relations mislabeled as lifecycle events: Modification of/Has modification, Reformulated in/Reformulation of. These are salt/ester/formulation variants, not deprecated-concept links.
  • OMOP-internal mapping-exception bookkeeping: Excluded in map from/Included in map from/Map excludes child/Map includes child. These describe exceptions to a mapping rule, not equivalence between two concepts.
  • A grouping construct, not an equivalence: Proc Schema to ICDO. Its own example links a cancer-registry anatomic schema to one specific malignancy occurring at that site.
  • Previously unclassified rows: Reformulated in/Reformulation of, Has inherent, Has transformation/Transformation of had blank class/subclass values, were flagged "left for manual review," and were silently dropped from relationship_mapping entirely. They are now classified for the first time, following the precedent of sibling rows already in the file.
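A minimal sketch of how rows with a blank class/subclass can disappear from relationship_mapping without any warning. The column names (`relationship_id`, `class`, `subclass`), the sample rows, and the `load_relationship_mapping` helper are assumptions for illustration; the project's actual loader and CSV header may differ.

```python
import csv
import io

# Hypothetical excerpt: the real header and rows in
# config/predicate_mapping.csv may differ.
SAMPLE_CSV = """relationship_id,class,subclass
Concept poss_eq to,Identity,Exact equivalence
Has inherent,,
Has transformation,,
"""


def load_relationship_mapping(text):
    """Split rows into classified entries and rows with a blank class or subclass."""
    mapping, dropped = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        rel = row["relationship_id"].strip()
        cls = (row["class"] or "").strip()
        sub = (row["subclass"] or "").strip()
        if cls and sub:
            mapping[rel] = (cls, sub)
        else:
            # A loader that simply skips these rows loses them silently;
            # collecting them makes the gap visible in an audit.
            dropped.append(rel)
    return mapping, dropped


relationship_mapping, dropped = load_relationship_mapping(SAMPLE_CSV)
print("classified:", relationship_mapping)
print("blank class/subclass:", dropped)
```

Reporting the dropped rows instead of skipping them is one way to keep "left for manual review" entries from going unnoticed.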

Fix

  • Reclassify the 27 affected rows in config/predicate_mapping.csv to honest, lower-trust classes (Association,Administrative, Association,Mapping, Hierarchy,Taxonomic – up, Composition,Structural, Composition,Logical, Attribute,Context, Attribute,Pharmaceutical).
  • Add one new subclass, Association,Mapping, for "cross-vocabulary or historical mapping without a guaranteed 1:1 correspondence". No existing subclass outside Identity fits this shape.
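The reclassification could be guarded by a small regression check along these lines. This is only a sketch: `NOT_IDENTITY` lists a subset of the relationship_ids named in this issue, the helper name is hypothetical, and the "after" class assigned to each row below is an assumed example, since the issue does not spell out which row receives which class.

```python
# Subset of the relationship_ids this issue says must not be trusted as Identity.
NOT_IDENTITY = {
    "Concept poss_eq to",
    "Concept poss_eq from",
    "Concept was_a to",
    "Concept was_a from",
    "Concept alt_to to",
    "Concept alt_to from",
}


def still_identity(relationship_mapping, suspects=NOT_IDENTITY):
    """Return the suspect relationship_ids whose class is still 'Identity'."""
    return sorted(
        rel
        for rel in suspects
        if relationship_mapping.get(rel, (None, None))[0] == "Identity"
    )


# Before the fix: classified as an exact equivalence.
before = {rel: ("Identity", "Exact equivalence") for rel in NOT_IDENTITY}
# After the fix: an assumed target class, for illustration only.
after = {rel: ("Association", "Mapping") for rel in NOT_IDENTITY}

print(still_identity(before))  # every suspect is flagged
print(still_identity(after))   # nothing is flagged
```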

Code for reproduction

## Evidence

> Query *"Galactosaemia"*, target *Galactosemia* (439788). At a broader hierarchy anchor (level 3), the top-scoring result became **Cystinuria** (4225406), a completely unrelated amino-acid transport disorder, with `relevance == 1.0` and no resolver attribution.
>
> Traced path: `Galactosemia (439788) -[possibly_equivalent_to]-> 40351277 (a retired SNOMED "disjunction" concept literally named as a list of ~6 unrelated diseases) -[possibly_equivalent_to]-> Cystinuria (4225406)`.

`possibly_equivalent_to` is SNOMED's documented many-to-many fallback for retired concepts with no clean 1:1 successor. This is the opposite of "Exact equivalence," which is how it was classified.
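The traced path can be reproduced on a toy graph. The sketch below is not `find_standard_paths`; it is a plain breadth-first search that only follows edges whose class is trusted. The concept IDs come from the trace above, while the edge labels, the `reachable` helper, and the post-fix class `("Association", "Mapping")` are assumptions for illustration.

```python
from collections import deque

# Toy edge list mirroring the traced path (concept IDs taken from the issue).
EDGES = [
    (439788, "Concept poss_eq to", 40351277),   # Galactosemia -> retired disjunction concept
    (40351277, "Concept poss_eq to", 4225406),  # retired disjunction concept -> Cystinuria
]


def reachable(start, edges, mapping, trusted_classes):
    """Breadth-first search that only follows edges whose class is trusted."""
    adjacency = {}
    for src, rel, dst in edges:
        if mapping.get(rel, (None, None))[0] in trusted_classes:
            adjacency.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


before = {"Concept poss_eq to": ("Identity", "Exact equivalence")}
after = {"Concept poss_eq to": ("Association", "Mapping")}  # assumed post-fix class

print(reachable(439788, EDGES, before, {"Identity"}))  # Cystinuria (4225406) is reachable
print(reachable(439788, EDGES, after, {"Identity"}))   # nothing is reachable
```

With the old classification, two trusted hops are enough to connect Galactosemia to Cystinuria; once the relation is no longer in a trusted class, the same search stops at the start node.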

Error messages

Labels

bug
