QuestEA: complementary approach with patient-level embeddings instead of item-level matching #135

@thiswillbeyourgithub

Hi Harmony team,

Quick context: I pitched my project to Dr Astrid Chevance, who told me about Harmony. I'm Olivier Cornelis, a French psychiatry resident in Paris and a data scientist (my bio). I've been working solo on a related problem from a different angle and wanted to share the idea. Both projects are also open source, which I'm a big fan of.

Harmony matches items across instruments in a shared semantic space. My approach (QuestEA, for Questionnaire Embedding Analysis) matches individual patients instead: each patient becomes one vector. The math is deliberately simple: weight each question's embedding by that patient's answer to it, then normalize the result. The repository is on GitHub too (QuestEA) and includes the full documentation as well as the variants of the formula that I explored.
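
As a rough illustration of the formula described above (a minimal sketch, not the actual QuestEA code): the function name `patient_embedding`, the use of a plain sum to combine the weighted embeddings, the L2 normalization, and the toy 3-dimensional embeddings are all assumptions made for the example.

```python
import math

def patient_embedding(question_embeddings, answers):
    """Answer-weighted sum of question embeddings, scaled to unit L2 norm.

    Assumptions: embeddings are equal-length lists of floats, answers are
    numeric scores, and "normalizing" means dividing by the L2 norm.
    """
    dim = len(question_embeddings[0])
    weighted = [0.0] * dim
    for emb, answer in zip(question_embeddings, answers):
        for i, value in enumerate(emb):
            weighted[i] += answer * value
    norm = math.sqrt(sum(v * v for v in weighted))
    if norm == 0.0:
        return weighted  # all-zero answers: nothing to normalize
    return [v / norm for v in weighted]

# Toy embeddings for two questions (made-up numbers, not model output).
question_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
answers = [3, 4]  # e.g. Likert scores given by one patient
vector = patient_embedding(question_embeddings, answers)
print(vector)
```

In practice the question embeddings would come from a sentence-embedding model rather than being written by hand.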

The reason I think patient embeddings are useful: QuestEA projects patients into a shared latent space regardless of which subset of instruments each cohort was administered (a similar question about sleep quality has the same coordinates across multiple depression surveys), which makes it possible to pool smaller cohorts (e.g. two cohorts with identical outcomes that filled in different but similar depression questionnaires). Of course this does not have the same a priori validity, but I believe it can still provide insightful results. Ultimately, the general idea also opens the door to integrating other modalities (clinical notes, ICD codes, imaging reports), since everything ends up in the model's latent space. This connects directly to the heterogeneity problem Veal et al. raised in their 2023 Lancet Psychiatry paper on outcome measures in depression trials (doi:10.1016/S2215-0366(23)00438-8): harmonising the questions helps, but the patient-level comparability gap remains until patients are projected into a shared space.
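
To make the pooling idea concrete, here is a small sketch under strong simplifying assumptions: two hypothetical questionnaires (A and B) share a sleep question with identical coordinates, the embeddings are hand-written toy vectors, and patients are compared with cosine similarity. None of the names or numbers come from QuestEA itself.

```python
import math

def patient_embedding(question_embeddings, answers):
    """Answer-weighted sum of question embeddings, scaled to unit L2 norm."""
    dim = len(question_embeddings[0])
    weighted = [0.0] * dim
    for emb, answer in zip(question_embeddings, answers):
        for i, value in enumerate(emb):
            weighted[i] += answer * value
    norm = math.sqrt(sum(v * v for v in weighted))
    return [v / norm for v in weighted] if norm else weighted

def cosine(u, v):
    """Cosine similarity; inputs here are already unit-length or zero."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy embeddings: both questionnaires contain a sleep item
# that lands on the same coordinates, plus one instrument-specific item.
SLEEP, MOOD, APPETITE = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
questionnaire_a = [SLEEP, MOOD]      # administered to cohort A
questionnaire_b = [SLEEP, APPETITE]  # administered to cohort B

patient_a = patient_embedding(questionnaire_a, [3, 1])  # poor sleep
patient_b = patient_embedding(questionnaire_b, [3, 1])  # poor sleep
patient_c = patient_embedding(questionnaire_b, [0, 3])  # appetite only

sim_ab = cosine(patient_a, patient_b)  # high: shared sleep complaint
sim_ac = cosine(patient_a, patient_c)  # zero: no overlapping symptoms
print(round(sim_ab, 3), round(sim_ac, 3))
```

Patients from the two cohorts become directly comparable even though they never answered the same instrument, which is what would allow the cohorts to be pooled.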

QuestEA is still preliminary: it's a solo project and I don't have access to clinical-grade data to evaluate it (I was about to get access through a friendly lab, but the dataset became unaffordable). I tried validating it on open psychometric datasets with intrinsic clustering metrics, but that data is not well suited to the task.

A few things I'd be curious about:

  1. Whether you've toyed with ideas similar to QuestEA
  2. Whether you'd see value in a patient-level embedding module sitting on top of your item matching
  3. Whether you'd be interested in me contributing to Harmony (code or otherwise)
  4. Whether you'd be interested in helping me validate my approach: I mostly need filled-in clinical surveys (even just scans, as I'm working on a universal survey scanner anyway)
  5. Whether you have any feedback on my project

Happy to chat further, and thank you for putting the project out in the open.

Olivier Cornelis
(You can contact me here)
