Is there an existing issue for this?
Bug summary
GroundingConstraints trusts PredicateKind.IDENTITY edges by default for hierarchy-anchor pathfinding (find_standard_paths), treating them as safe "same concept" hops. Several relationship_ids in config/predicate_mapping.csv were classified under Identity despite being loose, historical, exception, or class-to-instance relations rather than true 1:1 equivalences. This let find_standard_paths walk through unrelated concepts and surface spuriously perfect (relevance == 1.0) matches with no real relationship to the query.
Root cause
config/predicate_mapping.csv defines, per OMOP relationship_id, a (class, subclass) pair loaded into relationship_mapping and used as the predicate_kind filter. A full audit of all 735 rows (all five classes) found 27 rows misclassified under Identity (or left entirely unclassified) that should not be trusted as identity hops:
- SNOMED historical-association family:
Concept poss_eq from/to (possibly_equivalent_to), Concept was_a from/to (was_a), Concept alt_to from/to (alternative_to). These are SNOMED's own ambiguous/fallback inactivation relations, explicitly documented as requiring human judgment to resolve, and are not safe for automated traversal.
- Cross-vocabulary class-to-instance mappings:
OPCS4 - SNOMED/SNOMED - OPCS4 (own name says "equivalent or categorical"), ATC - RxNorm/RxNorm - ATC, ETC - RxNorm, ICD9CM - MedDRA/MedDRA - ICD9CM, ICDO - SNOMED/SNOMED - ICDO. These connect a class/category-level concept to a specific instance (e.g. one row's own example shows a 7-ingredient combination product mapping to the ATC class of just one of its ingredients).
- Structural/substance-variant relations mislabeled as lifecycle events:
Modification of/Has modification, Reformulated in/Reformulation of. These are salt/ester/formulation variants, not deprecated-concept links.
- OMOP-internal mapping-exception bookkeeping:
Excluded in map from/Included in map from/Map excludes child/Map includes child. These describe exceptions to a mapping rule, not equivalence between two concepts.
- A grouping construct, not an equivalence:
Proc Schema to ICDO. Its own example links a cancer-registry anatomic schema to one specific malignancy occurring at that site.
- Previously unclassified rows:
Reformulated in/Reformulation of, Has inherent, and Has transformation/Transformation of had blank class/subclass values, were flagged "left for manual review," and were silently dropped from relationship_mapping entirely. They are now classified for the first time, by precedent from sibling rows already in the file.
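The audit described above could be approximated with a small script along these lines. This is a minimal sketch, not the audit that was actually run: the column names (`relationship_id`, `class`, `subclass`), the sample rows, the `audit` helper, and the `SUSPECT_IDS` subset are all assumptions made for illustration, and the real layout of config/predicate_mapping.csv may differ.

```python
import csv
import io

# Invented sample rows for illustration only; column names are assumed.
SAMPLE_CSV = """relationship_id,class,subclass
Concept poss_eq to,Identity,Exact equivalence
Example rel,Composition,Structural
Has inherent,,
"""

# Illustrative subset of the relationship_ids flagged in this issue.
SUSPECT_IDS = {
    "Concept poss_eq to",
    "Concept poss_eq from",
    "Concept was_a to",
    "Concept was_a from",
    "Concept alt_to to",
    "Concept alt_to from",
}


def audit(rows, suspect_ids):
    """Split rows into suspect Identity rows and rows with no classification."""
    misclassified = []
    unclassified = []
    for row in rows:
        rel = row["relationship_id"].strip()
        cls = (row.get("class") or "").strip()
        sub = (row.get("subclass") or "").strip()
        if not cls or not sub:
            # Blank class/subclass: these rows never reach relationship_mapping.
            unclassified.append(rel)
        elif cls == "Identity" and rel in suspect_ids:
            # Trusted as an identity hop despite being a loose relation.
            misclassified.append(rel)
    return misclassified, unclassified


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
misclassified, unclassified = audit(rows, SUSPECT_IDS)
print("misclassified under Identity:", misclassified)
print("unclassified:", unclassified)
```

Reporting blank rows separately from misclassified ones matters here, because the blank rows were dropped silently rather than mis-trusted.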
Fix
- Reclassify the 27 affected rows in config/predicate_mapping.csv to honest, lower-trust classes (Association,Administrative; Association,Mapping; Hierarchy,Taxonomic – up; Composition,Structural; Composition,Logical; Attribute,Context; Attribute,Pharmaceutical).
- Add one new subclass, Association,Mapping, for "cross-vocabulary or historical mapping without a guaranteed 1:1 correspondence". No existing subclass outside Identity fit this shape.
Code for reproduction
## Evidence
> Query *"Galactosaemia"* → target *Galactosemia* (439788). At a broader hierarchy anchor (level 3), the top-scoring result became **Cystinuria** (4225406), a completely unrelated amino-acid transport disorder, with `relevance == 1.0` and no resolver attribution.
>
> Traced path: `Galactosemia (439788) -[possibly_equivalent_to]-> 40351277 (a retired SNOMED "disjunction" concept literally named as a list of ~6 unrelated diseases) -[possibly_equivalent_to]-> Cystinuria (4225406)`.
`possibly_equivalent_to` is SNOMED's documented many-to-many fallback for retired concepts with no clean 1:1 successor. This is the opposite of "Exact equivalence," which is how it was classified.
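The traced path can be replayed on a toy graph to show why the classification matters. This is a hedged sketch, not the project's `find_standard_paths`: the `reachable` helper and the dictionary-based classification are hypothetical stand-ins, and only the concept IDs and the relationship name come from the trace above.

```python
from collections import deque

# Edges from the traced path: (source concept, relationship, target concept).
EDGES = [
    (439788, "possibly_equivalent_to", 40351277),   # Galactosemia -> retired disjunction
    (40351277, "possibly_equivalent_to", 4225406),  # retired disjunction -> Cystinuria
]


def reachable(start, edges, classification, trusted_class="Identity"):
    """Breadth-first search that follows only edges classified as trusted_class."""
    adjacency = {}
    for src, rel, dst in edges:
        if classification.get(rel) == trusted_class:
            adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# With the old classification, Cystinuria is an "identity" hop away.
before = reachable(439788, EDGES, {"possibly_equivalent_to": "Identity"})
# After reclassification to a lower-trust class, the path is no longer followed.
after = reachable(439788, EDGES, {"possibly_equivalent_to": "Association"})
print(before, after)
```

In this toy setup, changing a single classification entry is enough to cut off both hops, which mirrors the intent of the CSV-only fix.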
Error messages