Add ADNI dataset and Alzheimer's Disease classification pipeline by laubryan · Pull Request #921 · sunlabuiuc/PyHealth

laubryan · 2026-03-31T03:25:09Z

Contributor

Name: Bryan Lau
NetID (Email): bryan16 (bryan16@illinois.edu)

Contribution

This PR is a full pipeline implementation of the Alzheimer's Disease classification model described by Liu et al., consisting of dataset, task and model for ADNI MRI images dataset.

Paper

This work is an implementation of the methods and ideas presented in:

Liu et al., On the Design of Convolutional Neural Networks for Automatic Detection of Alzheimer's Disease
arXiv: https://arxiv.org/abs/1911.03740

Description

This PR adds support for ADNI MRI brain scan images using a PyHealth-compatible dataset and processing via a compatible task and multi-class classification model, based on the paper "On the Design of Convolutional Neural Networks for Automatic Detection of Alzheimer's Disease" by Liu et al.

Changes

Implemented PyHealth-compatible ADNIDataset for ADNI MRI images, with YAML configuration file
Added NIfTIImageProcessor for NIftI MRI images
Added AlzheimersDiseaseClassification task
Implemented AlzheimersDiseaseCNN 3D CNN model for classification
Added unit tests
Added example scripts and notebook
Added new RST files for pipeline components
Updated related RST documentation indices

Files to Review

Category	Files
Dataset	`pyhealth/datasets/adni.py` `pyhealth/datasets/configs/adni.yaml`
Processor	`pyhealth/processors/nifti_image_processor.py`
Task	`pyhealth/tasks/ad_classification.py`
Model	`pyhealth/models/alzheimers_cnn.py`
Tests	`pyhealth/tests/core/test_adni.py`
Examples	`pyhealth/examples/adni_adclassification_cnn.ipynb` `pyhealth/examples/adni_ad_classification.py` `pyhealth/examples/adni_ad_synthetic_data.py`
Documentation	`pyhealth/docs/api/datasets.rst` `pyhealth/docs/api/datasets/pyhealth.datasets.ADNIDataset.rst` `pyhealth/docs/api/processors.rst` `pyhealth/docs/api/processors/pyhealth.processors.NIftIImageProcessor.rst` `pyhealth/docs/api/tasks.rst` `pyhealth/docs/api/tasks/pyhealth.tasks.AlzheimersDiseaseClassification.rst` `pyhealth/docs/api/models.rst` `pyhealth/docs/api/models/pyhealth.models.AlzheimersDiseaseCNN.rst`

Notes

Access to download ADNI files requires approval through IDA (https://adni.loni.usc.edu/data-samples/adni-data/#AccessData)
Example scripts and unit tests generate and use synthetic data

the ADNI dataset and 3D-CNN model. - Dataset: ADNIDataset - Processor: NIftIImageProcessor - Task: AlzheimersDiseaseClassification - Model: AlzheimersDiseaseCNN - Tests: - TestADNIDataset - TestAlzheimersDiseaseClassification - TestAlzheimersDiseaseModel - Examples: - Python scripts - Notebook - RST Documentation

EricSchrock

I haven't had time for a full review yet, but I wanted to get you some initial feedback to work on. I still need to review the examples, processor, task, and tests as well as to read the paper and possibly try running things for myself.

EricSchrock · 2026-05-08T18:57:04Z

+pyhealth.datasets.ADNIDataset
+===================================
+
+The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. A Pyhealth dataset consisting of labelled MRI brain scan images. For more information, and to apply for access, visit `https://adni.loni.usc.edu/data-samples/adni-data/`.


Pretty sure this won't create clickable links (here and in pyhealth.models.AlzheimersDiseaseCNN.rst). See the existing docs for how to create clickable links in RST files (and don't forget the trailing underscore).

EricSchrock · 2026-05-08T18:59:00Z

+    :members:
+    :undoc-members:
+    :show-inheritance:
+


May as well clean up this trailing whitespace at the end of this file (even though it's just copy/paste from some of the existing docs).

EricSchrock · 2026-05-08T19:14:34Z

 - ``TimeImageProcessor``: For time-stamped image sequences (e.g., serial X-rays)
 - ``TensorProcessor``: For pre-processed tensor data
 - ``RawProcessor``: Pass-through processor for raw data
+- ``NIftIImageProcessor``: For NIftI MRI images


This probably belongs in the "Specialized Processors" section.

EricSchrock · 2026-05-08T19:34:07Z

@@ -0,0 +1,9 @@
+pyhealth.models.AlzheimersDiseaseCNN


This is a very generic name, which makes me think it might conflict with future PyHealth contributions. Unfortunately, the paper this model comes from doesn't really provide a unique name. Any ideas for better names? My first thought was AdMri3dCNN, but I don't love it (camel case not doing us any favors here with all the acronyms).

EricSchrock · 2026-05-08T19:48:08Z

+        self.root = Path(root)
+        self.dev = dev
+
+        self.DEV_LIMIT = 1000


Hmm, I don't like having this hardcoded in BaseDataset and here. We should add that to BaseDataset, perhaps as a class variable/attribute, and reference it in this class.

Unless of course you break this class's dependency on dev mode... (per other comments). Then you can just delete self.DEV_LIMIT.

EricSchrock · 2026-05-08T20:09:33Z

+            config_path (Optional[str]): Path to the configuration file, 
+                defaults to "../configs/adni-metadata-pyhealth.csv"
+            dataset_name (str): Name identifying this dataset, default "ADNI".
+            dev: If True, number of loaded images is restricted to 100.


Suggested change

dev: If True, number of loaded images is restricted to 100.

dev: If True, number of loaded images is restricted to 1000.

EricSchrock · 2026-05-08T20:12:01Z

+        # e.g. ADNI_002_S_0295_MPR__GradWarp__B1_Correction__N3_S13408_I45107.xml
+        metadata_files = Path(data_path).glob("ADNI*.xml")
+
+        # If a dev limit is requested, then sample the list randomly


Why add randomness to dev mode? I would expect dev mode to be 100% reproducible for debugging issues.

EricSchrock · 2026-05-08T20:24:58Z

+            dataset_name (str): Name identifying this dataset, default "ADNI".
+            dev: If True, number of loaded images is restricted to 100.
+        """
+        self.root = Path(root)


self.root should not be initialized here, as that mirrors the initialization of self.root in the parent class. Just pass Path(root) to _validate_data_path() and _write_patient_catalog() and let super().__init__() initialize self.root.

EricSchrock · 2026-05-08T20:27:58Z

+            dev: If True, number of loaded images is restricted to 100.
+        """
+        self.root = Path(root)
+        self.dev = dev


Initializing the dev member variable here in the child class is odd because it will be re-initialized by the call to super().__init__(). I see why you're doing it though. You need that information in _write_patient_catalog.

How long does _write_patient_catalog take to run on the dev and full datasets respectively? If the difference and/or absolute value is small, probably best to just run _write_patient_catalog on the full dataset metadata every time, even in dev mode, removing the need for the if self.dev: statement.

The other option is to pass dev to _write_patient_catalog directly as a function parameter, avoiding the need for self.dev = dev.

EricSchrock · 2026-05-08T20:42:11Z

+from pyhealth.models.base_model import BaseModel
+
+
+class AlzheimersDiseaseCNN(BaseModel):


Could you speak to why you didn't choose to inherit from the CNN class?

I'm wondering if that would simplify this class. I'm also wondering if this is an opportunity to further parameterize the CNN class to meet the needs of this new model (and future models?).

EricSchrock · 2026-05-14T14:46:24Z

Note to self: #1005, #1121, and #1122 are based off the same paper. #1113 is a different paper but similar PR. Maybe pick the best parts from each?

laubryan force-pushed the add-adni branch 5 times, most recently from c85f2fe to 86e4f90 Compare April 1, 2026 01:26

laubryan force-pushed the add-adni branch from 86e4f90 to d4da117 Compare April 5, 2026 01:02

EricSchrock requested changes May 8, 2026

View reviewed changes

EricSchrock mentioned this pull request May 14, 2026

DL4H Alzheimers CNN #1005

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ADNI dataset and Alzheimer's Disease classification pipeline#921

Add ADNI dataset and Alzheimer's Disease classification pipeline#921
laubryan wants to merge 1 commit into
sunlabuiuc:masterfrom
laubryan:add-adni

laubryan commented Mar 31, 2026 •

edited

Loading

Uh oh!

EricSchrock left a comment

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock May 8, 2026

Uh oh!

EricSchrock commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	dev: If True, number of loaded images is restricted to 100.
	dev: If True, number of loaded images is restricted to 1000.

		from pyhealth.models.base_model import BaseModel


		class AlzheimersDiseaseCNN(BaseModel):

Conversation

laubryan commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contributor

Contribution

Paper

Description

Changes

Files to Review

Notes

Uh oh!

EricSchrock left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EricSchrock commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

laubryan commented Mar 31, 2026 •

edited

Loading