Reproducibility code for Evaluating cell type annotations in single-cell omics in the absence of ground truth
Reproducible workflows used to generate the analyses and figures for the manuscript.
This repository is organized around four main reproducibility layers:
- data preparation from raw single-cell objects
- benchmarking pipelines: inter-sample consistency (ISC) and label transfer (supervised classification)
- ISC meta-study
- figure notebooks used for manuscript panels
The workflows cover the following steps:
- Process all core raw datasets into standardized outputs (data_processing/)
- Run the ISC benchmark (ISC_benchmark/)
- Run saturation analysis on ISC metrics (ISC_benchmark_saturation/)
- Run the label-transfer benchmark (supervised classification) (label_transfer_task/)
- Run the meta-study using top ISC metrics (Meta_study/)
- Render manuscript figure notebooks from processed outputs (Figures_notebooks/)
From repository root:
Rscript -e 'renv::restore()'
Download raw .rds files from Zenodo DOI 10.5281/zenodo.18921437 into:
data/raw/
Dataset-level expected filenames and processing details are documented in:
- data_processing/README.md
- data_processing/config/
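Before starting the processing pipeline, it can help to confirm that the raw files actually landed in data/raw/. The following is a minimal sketch, not part of the repository: the function name `count_raw_rds` is hypothetical, and it only counts `.rds` files directly inside the directory without checking their names against the expected filenames.

```shell
# Hypothetical helper (not part of the repository): count the .rds files
# that sit directly inside a directory, defaulting to data/raw.
count_raw_rds() {
  dir="${1:-data/raw}"
  # A missing directory is reported as zero files and a non-zero status.
  [ -d "$dir" ] || { echo 0; return 1; }
  n=$(find "$dir" -maxdepth 1 -type f -name '*.rds' | wc -l)
  # Arithmetic expansion strips any padding that wc adds on some systems.
  echo "$((n))"
}
```

Calling `count_raw_rds data/raw` from the repository root prints the number of raw files found, which can then be compared by hand against the filenames documented in data_processing/README.md.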
cd data_processing
Rscript -e 'targets::tar_make()'
Alternative wrappers (local/HPC) are available in data_processing/scripts/.
Expected output: processed datasets in data/processed/.
cd ISC_benchmark
Rscript -e 'targets::tar_make()'
Details and HPC submission options:
- ISC_benchmark/README.md
- ISC_benchmark/scripts/
- ISC_benchmark/config/
The saturation pipeline evaluates ranking stability by recomputing method rankings on progressively larger subsets of datasets. It measures whether top-performing methods remain robust to dataset selection and predicts ranking convergence as the number of datasets grows.
cd ISC_benchmark_saturation
Rscript -e 'targets::tar_make()'
Main outputs:
- ISC_benchmark_saturation/results/: ranking stability analysis, correlation distributions, and trend curves
Details and optional runner script:
- ISC_benchmark_saturation/README.md
- ISC_benchmark_saturation/scripts/
The label-transfer pipeline benchmarks supervised classifiers on query/reference splits generated from processed datasets.
Splits are prepared inside the targets pipeline (lt_prepared_splits target).
Run the benchmark:
cd label_transfer_task
Rscript -e 'targets::tar_make()'
Optional: run split generation as a standalone pre-step:
cd ..
Rscript label_transfer_task/R/00_prepare_splits.R
Main aggregated outputs:
- label_transfer_task/results/aggregated/label_transfer_metrics_aggregated.csv
- label_transfer_task/results/aggregated/label_transfer_summary_stats.csv
Alternative classifier/HPC runners are in:
- label_transfer_task/R/
- label_transfer_task/scripts/
The meta-study pipeline generates scTypeEval objects from processed datasets and selected annotation columns.
cd Meta_study
Rscript -e 'targets::tar_make()'
Main outputs:
- Meta_study/output/scTypeEval_objs/
Details and optional runner script:
- Meta_study/README.md
- Meta_study/scripts/
Module-level READMEs:
- data_processing/README.md
- ISC_benchmark/README.md
- ISC_benchmark_saturation/README.md
- label_transfer_task/README.md
- Meta_study/README.md
Figure notebooks are in Figures_notebooks/ (for example 1_Figure.Rmd, 3_Figure.Rmd, 4_Figure.Rmd, 5_Figure.Rmd, 6_Figure.Rmd).
Typical usage:
cd Figures_notebooks
Rscript -e 'rmarkdown::render("1_Figure.Rmd")'
Each notebook lists its own required inputs.
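To render every figure notebook in one pass, the single-notebook command above can be wrapped in a loop. This is a sketch under assumptions, not a script shipped with the repository: the function name `render_figures` and the `DRY_RUN` variable are hypothetical, and it assumes every notebook matches the `*_Figure.Rmd` pattern and that its inputs have already been produced by the relevant pipelines.

```shell
# Hypothetical helper (not part of the repository): render every
# *_Figure.Rmd found in a directory, defaulting to Figures_notebooks.
# With DRY_RUN=1 the commands are printed instead of being executed.
render_figures() {
  dir="${1:-Figures_notebooks}"
  for nb in "$dir"/*_Figure.Rmd; do
    # Skip the unexpanded pattern when no notebook matches.
    [ -e "$nb" ] || continue
    cmd="rmarkdown::render(\"$(basename "$nb")\")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "Rscript -e '$cmd'"
    else
      # Render from inside the notebook directory, as in the usage above.
      (cd "$dir" && Rscript -e "$cmd") || return 1
    fi
  done
}
```

Running `DRY_RUN=1 render_figures` from the repository root first is a cheap way to review which notebooks would be rendered before committing to the full run.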
- Pipeline execution is managed with targets for resumable and incremental runs
- Package versions are pinned by renv.lock
- Parameters are controlled in YAML files under each module's config/ directory
1. renv::restore()
2. Download Zenodo raw files to data/raw/
3. Run data_processing/
4. Run ISC_benchmark/
5. Run ISC_benchmark_saturation/
6. Run label_transfer_task/
7. Run Meta_study/
8. Render notebooks in Figures_notebooks/
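The pipeline steps in this order can be chained in a small driver. The sketch below is illustrative only: `MODULES`, `run_all_modules`, and `DRY_RUN` are hypothetical names, and it assumes `renv::restore()` has been run, the raw files are already in data/raw/, and a failing pipeline makes `Rscript` exit with a non-zero status. Notebook rendering is left out.

```shell
# Hypothetical driver (not part of the repository). Modules are listed
# in the run order given above; each is run with targets::tar_make().
MODULES="data_processing ISC_benchmark ISC_benchmark_saturation label_transfer_task Meta_study"

run_all_modules() {
  root="${1:-.}"
  for m in $MODULES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command that would be run for this module.
      echo "(cd $root/$m && Rscript -e 'targets::tar_make()')"
    else
      # Run inside the module directory; stop at the first failure.
      (cd "$root/$m" && Rscript -e 'targets::tar_make()') || {
        echo "Module $m failed; stopping." >&2
        return 1
      }
    fi
  done
}
```

Because targets skips work that is already up to date, rerunning `run_all_modules` after an interruption should pick up where the previous run stopped rather than starting over.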
- If a pipeline cannot find paths, run commands from the module directory (data_processing/, ISC_benchmark/, ISC_benchmark_saturation/, label_transfer_task/, Meta_study/)
- If outputs are partial after interruption, rerun targets::tar_make() in the same module to resume