Skip to content

Add ADNI dataset and Alzheimer's Disease classification pipeline#921

Open
laubryan wants to merge 1 commit into
sunlabuiuc:masterfrom
laubryan:add-adni
Open

Add ADNI dataset and Alzheimer's Disease classification pipeline#921
laubryan wants to merge 1 commit into
sunlabuiuc:masterfrom
laubryan:add-adni

Conversation

@laubryan
Copy link
Copy Markdown

@laubryan laubryan commented Mar 31, 2026

Contributor

Contribution

This PR is a full pipeline implementation of the Alzheimer's Disease classification model described by Liu et al., consisting of dataset, task and model for ADNI MRI images dataset.

Paper

This work is an implementation of the methods and ideas presented in:

Description

This PR adds support for ADNI MRI brain scan images using a PyHealth-compatible dataset and processing via a compatible task and multi-class classification model, based on the paper "On the Design of Convolutional Neural Networks for Automatic Detection of Alzheimer's Disease" by Liu et al.

Changes

  • Implemented PyHealth-compatible ADNIDataset for ADNI MRI images, with YAML configuration file
  • Added NIfTIImageProcessor for NIftI MRI images
  • Added AlzheimersDiseaseClassification task
  • Implemented AlzheimersDiseaseCNN 3D CNN model for classification
  • Added unit tests
  • Added example scripts and notebook
  • Added new RST files for pipeline components
  • Updated related RST documentation indices

Files to Review

Category Files
Dataset pyhealth/datasets/adni.py
pyhealth/datasets/configs/adni.yaml
Processor pyhealth/processors/nifti_image_processor.py
Task pyhealth/tasks/ad_classification.py
Model pyhealth/models/alzheimers_cnn.py
Tests pyhealth/tests/core/test_adni.py
Examples pyhealth/examples/adni_adclassification_cnn.ipynb
pyhealth/examples/adni_ad_classification.py
pyhealth/examples/adni_ad_synthetic_data.py
Documentation pyhealth/docs/api/datasets.rst
pyhealth/docs/api/datasets/pyhealth.datasets.ADNIDataset.rst

pyhealth/docs/api/processors.rst
pyhealth/docs/api/processors/pyhealth.processors.NIftIImageProcessor.rst

pyhealth/docs/api/tasks.rst
pyhealth/docs/api/tasks/pyhealth.tasks.AlzheimersDiseaseClassification.rst

pyhealth/docs/api/models.rst
pyhealth/docs/api/models/pyhealth.models.AlzheimersDiseaseCNN.rst

Notes

@laubryan laubryan force-pushed the add-adni branch 5 times, most recently from c85f2fe to 86e4f90 Compare April 1, 2026 01:26
the ADNI dataset and 3D-CNN model.

- Dataset: ADNIDataset
- Processor: NIftIImageProcessor
- Task: AlzheimersDiseaseClassification
- Model: AlzheimersDiseaseCNN
- Tests:
    - TestADNIDataset
    - TestAlzheimersDiseaseClassification
    - TestAlzheimersDiseaseModel
- Examples:
    - Python scripts
    - Notebook
- RST Documentation
Copy link
Copy Markdown
Collaborator

@EricSchrock EricSchrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had time for a full review yet, but I wanted to get you some initial feedback to work on. I still need to review the examples, processor, task, and tests as well as to read the paper and possibly try running things for myself.

pyhealth.datasets.ADNIDataset
===================================

The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. A Pyhealth dataset consisting of labelled MRI brain scan images. For more information, and to apply for access, visit `https://adni.loni.usc.edu/data-samples/adni-data/`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this won't create clickable links (here and in pyhealth.models.AlzheimersDiseaseCNN.rst). See the existing docs for how to create clickable links in RST files (and don't forget the trailing underscore).

:members:
:undoc-members:
:show-inheritance:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well clean up this trailing whitespace at the end of this file (even though it's just copy/paste from some of the existing docs).

Comment thread docs/api/processors.rst
- ``TimeImageProcessor``: For time-stamped image sequences (e.g., serial X-rays)
- ``TensorProcessor``: For pre-processed tensor data
- ``RawProcessor``: Pass-through processor for raw data
- ``NIftIImageProcessor``: For NIftI MRI images
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably belongs in the "Specialized Processors" section.

@@ -0,0 +1,9 @@
pyhealth.models.AlzheimersDiseaseCNN
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very generic name, which makes me think it might conflict with future PyHealth contributions. Unfortunately, the paper this model comes from doesn't really provide a unique name. Any ideas for better names? My first thought was AdMri3dCNN, but I don't love it (camel case not doing us any favors here with all the acronyms).

Comment thread pyhealth/datasets/adni.py
self.root = Path(root)
self.dev = dev

self.DEV_LIMIT = 1000
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't like having this hardcoded in BaseDataset and here. We should add that to BaseDataset, perhaps as a class variable/attribute, and reference it in this class.

Unless of course you break this class's dependency on dev mode... (per other comments). Then you can just delete self.DEV_LIMIT.

Comment thread pyhealth/datasets/adni.py
config_path (Optional[str]): Path to the configuration file,
defaults to "../configs/adni-metadata-pyhealth.csv"
dataset_name (str): Name identifying this dataset, default "ADNI".
dev: If True, number of loaded images is restricted to 100.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dev: If True, number of loaded images is restricted to 100.
dev: If True, number of loaded images is restricted to 1000.

Comment thread pyhealth/datasets/adni.py
# e.g. ADNI_002_S_0295_MPR__GradWarp__B1_Correction__N3_S13408_I45107.xml
metadata_files = Path(data_path).glob("ADNI*.xml")

# If a dev limit is requested, then sample the list randomly
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add randomness to dev mode? I would expect dev mode to be 100% reproducible for debugging issues.

Comment thread pyhealth/datasets/adni.py
dataset_name (str): Name identifying this dataset, default "ADNI".
dev: If True, number of loaded images is restricted to 100.
"""
self.root = Path(root)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.root should not be initialized here, as that mirrors the initialization of self.root in the parent class. Just pass Path(root) to _validate_data_path() and _write_patient_catalog() and let super().__init__() initialize self.root.

Comment thread pyhealth/datasets/adni.py
dev: If True, number of loaded images is restricted to 100.
"""
self.root = Path(root)
self.dev = dev
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing the dev member variable here in the child class is odd because it will be re-initialized by the call to super().__init__(). I see why you're doing it though. You need that information in _write_patient_catalog.

How long does _write_patient_catalog take to run on the dev and full datasets respectively? If the difference and/or absolute value is small, probably best to just run _write_patient_catalog on the full dataset metadata every time, even in dev mode, removing the need for the if self.dev: statement.

The other option is to pass dev to _write_patient_catalog directly as a function parameter, avoiding the need for self.dev = dev.

from pyhealth.models.base_model import BaseModel


class AlzheimersDiseaseCNN(BaseModel):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you speak to why you didn't choose to inherit from the CNN class?

I'm wondering if that would simplify this class. I'm also wondering if this is an opportunity to further parameterize the CNN class to meet the needs of this new model (and future models?).

@EricSchrock
Copy link
Copy Markdown
Collaborator

Note to self: #1005, #1121, and #1122 are based off the same paper. #1113 is a different paper but similar PR. Maybe pick the best parts from each?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants