N+1 SQL round-trips in ancestry validation during concept_ground (find_standard_paths / find_standard_concepts)

### Motivation

Expensive round trips to the OMOP CDM DB cause slow retrieval of data, that can be optimised:

- `find_standard_concepts()` (`omop_graph/src/omop_graph/reasoning/grounding.py`) loops over every anchor  concept ID (`for parent in parent_ids: find_standard_paths(kg, candidate, target=parent, ...)`), e.g. 21
  anchors for the Observation domain.
- `find_standard_paths()` (`omop_graph/src/omop_graph/graph/paths.py`), during BFS graph traversal, calls
  `kg.get_potential_ancestor(child_id=subject_node.concept_id, parent_id=target)` once per standard concept
  node it encounters.
- `get_potential_ancestor()` (`omop_graph/src/omop_graph/graph/kg.py`) opens a session and runs a single-pair point lookup (`q_concept_potential_ancestor` in `omop_graph/src/omop_graph/graph/queries.py`)
    - PK-indexed and fast on its own (~0.5ms), but called once per `(candidate, parent_id)` pair.

Volume: N candidates × ~10 standard concepts per BFS × 21 parent_ids ≈ up to **2,100 SQL round trips** per
`concept_ground` call. At ~0.5ms each that's already ~1s in pure SQL time, plus per-round-trip Python/network
overhead, accounting for the bulk of the observed 5-6s.

This was previously masked by a [much larger bottleneck in `omop-alchemy`](https://github.com/AustralianCancerDataNetwork/OMOP_Alchemy/pull/22).

### Pitch

Add a batched ancestor-check method so each candidate makes one round trip instead of up to 21:

```python
# omop_graph/src/omop_graph/graph/kg.py
def get_potential_ancestors_batch(
    self, child_id: int, parent_ids: tuple[int, ...]
) -> set[int]:
    """Return the subset of parent_ids that are ancestors of child_id."""
    ...
```

```python
# omop_graph/src/omop_graph/graph/queries.py
def q_concept_potential_ancestors_batch(
    child_id: int, parent_ids: tuple[int, ...]
) -> Select:
    return select(Concept_Ancestor.ancestor_concept_id).where(
        and_(
            Concept_Ancestor.descendant_concept_id == child_id,
            Concept_Ancestor.ancestor_concept_id.in_(parent_ids),
            Concept_Ancestor.min_levels_of_separation > 0,
        )
    )
```

Update `find_standard_paths()` to accept the full set of `parent_ids` and use the batched result instead of
checking one `target` at a time, and update `find_standard_concepts()` to pass all anchors through in a
single call rather than looping and calling `find_standard_paths()` per anchor.

Expected improvement: ~2,100 round trips → ~N (one per candidate). Embedding tier: ~5-6s → ~0.5-1s.

### Alternatives

- **Pre-materialised descendant sets per anchor:** at grounding startup, load the full descendant
  `concept_id` set for each anchor into memory once; replace the SQL ancestry check with an O(1) set
  membership test. Avoids per-request round trips entirely, at the cost of a one-time startup load
  (~1-2s per anchor group) and higher memory use. Worth it if anchor sets are stable across requests.
- **Push ancestry into the SQL query itself:** already done for the ExactLabel/ExactSynonym resolvers (the
  functional-index fix joins directly against `concept_ancestor` in one query). The FTS resolver could do
  the same, but its candidate set comes from tsvector hits rather than a name-equality filter, so the
  ancestry check has to happen after the fact. This may be  tolerable today only because FTS result counts are small
  (<20).

Recommend the batched-query approach (Pitch) first: smallest change, same code structure, no caching
invalidation concerns.

### Additional context

- Profiling source: 2026-06-15 grounding pipeline investigation. Full `concept_ground` timing before any
  fixes: 
    - ExactLabelResolver+ExactSynonymResolver: 26,142ms
    - FullTextResolver+FullTextSynonymResolver: 211ms
    - EmbeddingResolver: 886ms
    - PartialLabelResolver+PartialSynonymResolver: 3,337ms 
    - Total: 30,591ms


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

N+1 SQL round-trips in ancestry validation during concept_ground (find_standard_paths / find_standard_concepts) #24

Motivation

Pitch

Alternatives

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

N+1 SQL round-trips in ancestry validation during concept_ground (find_standard_paths / find_standard_concepts) #24

Description

Motivation

Pitch

Alternatives

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions