SuLab/lotz-ivd

IVD Single-Cell Atlas

Goal: Identify the cell types and cell states present in the human intervertebral disc (IVD) and determine how these change with aging and degeneration.

Approach

This project uses a human-gated agentic pipeline to analyze publicly available single-cell RNA-seq datasets of human IVD tissue. An AI agent (Claude) executes well-defined computational steps, while a human PI reviews results and makes scientific decisions at defined checkpoints.

The pipeline is driven by a loop:

while :; do cat PROMPT.md | claude; done

Each iteration, the agent reads the current state from analysis_plan.md, executes the next task as defined in the module specs, runs validation, and updates the plan. The loop halts at human checkpoints — the agent prepares review materials and stops until the human advances the plan.
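
The halt-at-checkpoint behavior described above can be sketched in a few lines of Python. This is only an illustration: the checkbox layout, the `CHECKPOINT` tag, and the `next_action` helper are assumptions made for the example, not the actual format of analysis_plan.md or part of the pipeline.

```python
import re

# Hypothetical plan format (an assumption, not the real analysis_plan.md):
# one task per line, "- [x]" for done, "- [ ]" for pending, and a
# "CHECKPOINT" tag on tasks that require human sign-off.
TASK_RE = re.compile(r"^- \[(?P<done>[ x])\] (?P<name>.+)$")


def next_action(plan_text):
    """Return ("run", task), ("halt", task), or ("done", None)."""
    for line in plan_text.splitlines():
        m = TASK_RE.match(line.strip())
        if m is None or m.group("done") == "x":
            continue  # skip non-task lines and completed tasks
        name = m.group("name")
        if "CHECKPOINT" in name:
            return ("halt", name)  # stop until a human advances the plan
        return ("run", name)
    return ("done", None)


plan = """\
- [x] 03 Preprocessing
- [ ] CHECKPOINT: review QC thresholds
- [ ] 04 Coarse Annotation
"""
print(next_action(plan))
```

In this toy plan the first pending item is a checkpoint, so the sketch reports a halt instead of moving on to module 04.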

Pipeline Modules

| #  | Module | Description |
|----|--------|-------------|
| 01 | Dataset Discovery | Systematic search for all human IVD scRNA-seq datasets |
| 02 | Metadata Harmonization | Standardize condition labels, demographics, covariates |
| 03 | Preprocessing | Per-dataset QC, normalization, clustering |
| 04 | Coarse Annotation | Coarse cell classification for scANVI integration anchors |
| 05 | Integration | Cross-study tiered scANVI integration |
| 06 | Clustering | Resolution-optimized Leiden clustering of integrated objects |
| 07 | Post-Integration Annotation | De novo cell type annotation from integrated, clustered data |
| 08 | Differential Analysis | Cell composition changes and pseudobulk DE (DESeq2) between conditions |
| 09 | Biological Interpretation | Pathway enrichment, GRNs, pain-associated gene analysis |
| 10 | Trajectory & Dynamics | Pseudotime, RNA velocity for the cell state continuum |
| 11 | Cell-Cell Communication | Ligand-receptor interactions between IVD cell populations |
| 12 | Reporting | Final report, figures, reproducibility documentation |

Key Files

  • PROMPT.md — Fed to the agent on each loop iteration
  • AGENT.md — Execution rules and environment instructions
  • analysis_plan.md — Living document tracking progress, decisions, and revisions
  • specs/ — Module specifications defining inputs, outputs, methods, validation, and checkpoints

Compute Requirements

Most modules run on a standard workstation (32GB RAM). Two modules benefit from HPC:

  • Module 05 (Integration): scVI/scANVI training is significantly faster with GPU. ~200k cells across all studies.
  • Module 09 (SCENIC): Gene regulatory network inference is RAM-intensive (64GB+).

Key Design Decisions

Human-gated, not fully autonomous. The agent executes computational steps but stops at decision points for human review. The automated validation checks are regression safeguards, not proof of correctness.

Per-dataset first, then integrate. Each dataset is preprocessed and annotated independently before cross-study integration. This avoids the known problem where batch correction erases the subtle cell state variation in the chondrocyte/fibroblast continuum.

Tiered integration. Non-resident cells (immune, endothelial) integrate easily and are handled with standard methods. Resident IVD cells require conservative integration or alternative approaches (label transfer, metacells) to preserve the biological continuum.
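
As a rough sketch of the tiering idea, cells could be routed to an integration tier by their coarse label. The label strings, the tier names, and both helper functions are illustrative assumptions. The real categories would come from the coarse annotation module.

```python
# Assumed coarse labels for non-resident cells; the text above names
# immune and endothelial cells as examples of this group.
NON_RESIDENT = {"immune", "endothelial"}


def assign_tier(coarse_label):
    """Non-resident cells get standard integration, everything else conservative."""
    return "standard" if coarse_label.lower() in NON_RESIDENT else "conservative"


def split_by_tier(cells):
    """cells: iterable of (cell_id, coarse_label) pairs."""
    tiers = {"standard": [], "conservative": []}
    for cell_id, label in cells:
        tiers[assign_tier(label)].append(cell_id)
    return tiers


# Toy input with made-up cell IDs
print(split_by_tier([("c1", "immune"), ("c2", "chondrocyte"), ("c3", "endothelial")]))
```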

The plan is revisable. Every human checkpoint includes an evaluation of whether the downstream plan still makes sense given what's been learned. The analysis may loop back to earlier steps with revised parameters.

Pseudobulk DE, not single-cell DE. Differential expression uses pseudobulk aggregation (DESeq2/edgeR) to avoid inflated statistics from treating cells as independent observations.
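
A minimal, dependency-free sketch of the aggregation step: raw counts are summed per (sample, cell type) pair, so each sample contributes one observation per cell type to the downstream test. The `pseudobulk` helper, the sample and cell type labels, and the counts are toy assumptions for illustration. The actual pipeline presumably works on AnnData objects rather than plain dicts.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (sample_id, cell_type).

    cells: iterable of (sample_id, cell_type, {gene: count}) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        for gene, n in counts.items():
            totals[(sample_id, cell_type)][gene] += n
    return {key: dict(genes) for key, genes in totals.items()}


# Toy data: three cells from two hypothetical donors, made-up counts
cells = [
    ("donor1", "NP", {"ACAN": 5, "COL2A1": 2}),
    ("donor1", "NP", {"ACAN": 3}),
    ("donor2", "NP", {"ACAN": 1, "COL2A1": 4}),
]
bulk = pseudobulk(cells)
print(bulk[("donor1", "NP")])  # {'ACAN': 8, 'COL2A1': 2}
```

The resulting sample-level count table is what a tool such as DESeq2 or edgeR would take as input, with donors rather than cells as the unit of replication.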

Directory Structure

ivd-analysis/
├── PROMPT.md               # Agent loop prompt
├── AGENT.md                # Agent execution rules
├── README.md               # This file
├── analysis_plan.md        # Living plan document
├── specs/                  # Module specifications
│   ├── 00_PROJECT.md
│   ├── 01_DATASET_DISCOVERY.md
│   ├── 02_METADATA.md
│   ├── 03_PREPROCESSING.md
│   ├── 04_ANNOTATION.md
│   ├── 05_INTEGRATION.md
│   ├── 06_CLUSTERING.md
│   ├── 07_POST_ANNOTATION.md
│   ├── 08_DIFFERENTIAL.md
│   ├── 09_INTERPRETATION.md
│   ├── 10_TRAJECTORY.md
│   ├── 11_COMMUNICATION.md
│   └── 12_REPORTING.md
├── data/
│   ├── raw/                # Downloaded datasets
│   ├── processed/          # Per-dataset h5ad files
│   └── integrated/         # Cross-study integrated objects
├── metadata/               # Dataset registry, sample metadata
├── results/                # All analysis outputs
├── scripts/                # Compute scripts (run by agent or on HPC)
└── notebooks/              # Jupyter notebooks (visualization, figures, checkpoint review)
    ├── 01_datasets.ipynb       → Table 1
    ├── 02_metadata.ipynb       → Table 1 (cont.)
    ├── 03_qc.ipynb             → Fig S1
    ├── 04_classification.ipynb → Fig S2
    ├── 05_integration.ipynb    → Fig S3
    ├── 06_clustering.ipynb     → Fig S3 (cont.)
    ├── 07_annotation.ipynb     → Fig 1
    ├── 08_differential.ipynb   → Fig 2-3, Table 2
    ├── 09_interpretation.ipynb → Fig 4-5, Fig S4
    ├── 10_trajectory.ipynb     → Fig 6
    └── 11_communication.ipynb  → Fig 7

Scripts vs. Notebooks

Each analysis module (01 to 11 in the listing above) produces two outputs:

  • A script in scripts/ that does the heavy computation (can run headlessly on HPC)
  • A notebook in notebooks/ that loads saved results and produces figures and interpretation

Notebooks are decoupled from scripts: they read saved outputs from data/ and results/ rather than in-memory objects, so a reviewer can run a notebook without re-executing the full compute pipeline. Notebooks also serve as draft manuscript figures: each maps to specific figures and tables in the planned publication (see arrows in directory listing above).
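
The separation can be illustrated with a small sketch in which the "script" writes a result file and the "notebook" reads only that file. The file name `cluster_sizes.json`, the two functions, and the numbers are hypothetical stand-ins. A temporary directory stands in for results/.

```python
import json
import tempfile
from pathlib import Path


def run_script(results_dir):
    """Stand-in for a compute script: persists its result to disk."""
    out = Path(results_dir) / "cluster_sizes.json"  # hypothetical output file
    out.write_text(json.dumps({"cluster_0": 120, "cluster_1": 80}))
    return out


def notebook_load(results_dir):
    """Stand-in for a notebook cell: reads from disk, never from memory."""
    return json.loads((Path(results_dir) / "cluster_sizes.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    run_script(tmp)             # heavy step, could run headlessly on HPC
    sizes = notebook_load(tmp)  # review step, needs only the saved file
    print(sum(sizes.values()))  # 200
```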

Citation

If this analysis contributes to a publication, cite:

  • The original publications for each included dataset
  • The tools used (scanpy, scvi-tools, DESeq2, etc.)
  • This pipeline methodology as appropriate
