maarten-devries/scib-benchmark

scib-benchmark

Head-to-head benchmark of scib-metrics (JAX) and scib-rapids (CuPy/RAPIDS), two implementations of the same single-cell integration metrics.

What this measures

Both libraries compute the same 12 single-cell integration benchmarking metrics on identical data. This benchmark measures:

  1. Numerical equivalency: do both implementations produce the same metric values?
  2. Runtime performance: how does the wall-clock time of the RAPIDS implementation compare with that of the JAX implementation when both run on the GPU?

Metrics benchmarked

| Category | Metric | Input |
|---|---|---|
| Silhouette | silhouette_label | embeddings, labels |
| Silhouette | silhouette_batch | embeddings, labels, batch |
| Silhouette | bras (Batch Removal Adapted Silhouette) | embeddings, labels, batch |
| LISI | ilisi_knn (integration LISI) | kNN graph, batch |
| LISI | clisi_knn (cell-type LISI) | kNN graph, labels |
| Batch effect | kbet | kNN graph, batch |
| Batch effect | kbet_per_label | kNN graph, batch, labels |
| Clustering | nmi_ari_cluster_labels_kmeans | embeddings, labels |
| Clustering | nmi_ari_cluster_labels_leiden | kNN graph, labels |
| Integration | isolated_labels | embeddings, labels, batch |
| Integration | graph_connectivity | kNN graph, labels |
| Regression | pcr_comparison | pre/post embeddings, covariate |

Methodology

Data

Real single-cell RNA-seq data from Tabula Sapiens v2 (548k cells, multi-tissue, multi-donor), accessed via a local TileDB-SOMA store. The benchmark subsamples this dataset to several sizes (default: 1,000 and 20,000 cells). Larger sizes (e.g., 400k) are not run by default because scib-metrics becomes prohibitively slow at that scale.

Preprocessing pipeline:

  • Filter genes (min 3 cells)
  • Normalize to 10,000 counts per cell + log1p
  • Select 2,000 highly variable genes
  • PCA (50 components)
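
As a rough illustration of the four steps above, here is a dense NumPy sketch. It is not the code in download_data.py: the function name and parameters are hypothetical, and step 3 uses a plain top-variance ranking as a stand-in for a real highly-variable-gene method.

```python
import numpy as np


def preprocess(counts, min_cells=3, target_sum=1e4, n_top_genes=2000, n_comps=50):
    """Dense NumPy stand-in for the listed steps; returns PCA coordinates."""
    counts = np.asarray(counts, dtype=np.float64)

    # 1. Keep genes detected in at least `min_cells` cells.
    counts = counts[:, (counts > 0).sum(axis=0) >= min_cells]

    # 2. Scale each cell to `target_sum` total counts, then apply log1p.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    logged = np.log1p(counts / totals * target_sum)

    # 3. Stand-in for HVG selection (assumption): keep the highest-variance genes.
    n_top = min(n_top_genes, logged.shape[1])
    top = np.argsort(logged.var(axis=0))[::-1][:n_top]
    logged = logged[:, top]

    # 4. PCA via SVD of the mean-centred matrix.
    centred = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    k = min(n_comps, len(s))
    return u[:, :k] * s[:k]
```

The returned array has one row per cell and at most `n_comps` columns.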

Pre-integration embeddings are simulated by adding batch-correlated shifts to the PCA coordinates.
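
One way to picture a batch-correlated shift is a random offset vector drawn once per batch and added to every cell of that batch. The sketch below assumes exactly that; the function name, the Gaussian offsets, and the `scale` parameter are illustrative choices, not details taken from the repository.

```python
import numpy as np


def simulate_pre_integration(pca, batch, scale=2.0, seed=0):
    """Add one random offset vector per batch to the PCA coordinates."""
    pca = np.asarray(pca, dtype=np.float64)
    batch = np.asarray(batch).ravel()
    rng = np.random.default_rng(seed)

    # Map each cell to an integer batch code.
    batches, codes = np.unique(batch, return_inverse=True)

    # One offset per batch (assumed Gaussian), shared by all cells in that batch.
    offsets = rng.normal(0.0, scale, size=(len(batches), pca.shape[1]))
    return pca + offsets[codes]
```

Cells from the same batch move together, so the shifted embedding carries a batch signal that the original PCA coordinates do not.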

Environment

Each library runs in its own isolated virtual environment created with uv:

  • venv_metrics: scib-metrics + jax[cuda12] (JAX with CUDA GPU support)
  • venv_rapids: scib-rapids + cupy-cuda12x

Both are editable installs from local repos (/home/inference/repos/scib-metrics and /home/inference/repos/scib-rapids).

Important: The JAX benchmark uses jax[cuda12], i.e., JAX with CUDA GPU acceleration — not jax[cpu]. This ensures a fair GPU-vs-GPU comparison.

Execution

Workers run sequentially on a single GPU (CUDA_VISIBLE_DEVICES=0). Each worker:

  1. Loads the same .npy data arrays
  2. Computes k-nearest neighbors (k=90) using PyNNDescent
  3. Runs all 12 metrics, timing each independently
  4. Saves results + timings as JSON
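
Steps 3 and 4 can be sketched with the standard library alone. The helper names and the JSON layout below are assumptions for illustration; the actual structure written by worker.py may differ.

```python
import json
import time
from pathlib import Path


def run_timed(metrics):
    """Run each zero-argument metric callable and time it independently."""
    results, timings = {}, {}
    for name, fn in metrics.items():
        start = time.perf_counter()
        # Converting to float inside the timed region forces the value to be
        # materialized on the host before the clock stops.
        results[name] = float(fn())
        timings[name] = time.perf_counter() - start
    return {"results": results, "timings": timings}


def save_json(payload, path):
    """Write results and timings to a JSON file, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))
```

Both JAX and CuPy can return control to Python before the GPU work has finished, so pulling the result back as a Python float within the timed region is one way to make the measurement cover the whole computation.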

Equivalency criteria

  • Deterministic metrics: must match within 1% relative difference
  • Stochastic metrics (kmeans, leiden clustering): 10% tolerance due to PRNG differences
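
The two tolerances can be expressed as a small check. This is a sketch under assumptions: the choice of denominator for the relative difference and the name-prefix test for stochastic metrics are hypothetical, and compare_results.py may define them differently.

```python
# Assumed naming convention for the stochastic (clustering-based) metrics.
STOCHASTIC_PREFIXES = (
    "nmi_ari_cluster_labels_kmeans",
    "nmi_ari_cluster_labels_leiden",
)


def relative_difference(a, b):
    """|a - b| divided by the larger magnitude; 0.0 when both values are zero."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0.0 else abs(a - b) / denom


def is_equivalent(metric, a, b):
    """Apply the 10% tolerance to stochastic metrics and 1% to all others."""
    tol = 0.10 if metric.startswith(STOCHASTIC_PREFIXES) else 0.01
    return relative_difference(a, b) <= tol
```

With this definition, a pair such as 0.593712 and 0.553312 for a leiden ARI differs by about 6.8% and stays inside the 10% tolerance.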

Usage

# Full run (setup venvs, download data, benchmark, compare)
./run_benchmark.sh

# Skip venv setup (reuse existing)
./run_benchmark.sh --skip-setup

# Skip data download (reuse existing)
./run_benchmark.sh --skip-data

# Custom dataset sizes
./run_benchmark.sh --sizes "1000 20000"

# Custom k-neighbors
./run_benchmark.sh --n-neighbors 90

Requirements

  • NVIDIA GPU with CUDA 12.x
  • uv package manager
  • Local clones of scib-metrics and scib-rapids at /home/inference/repos/
  • Local TileDB-SOMA data store (for download_data.py)

Output

Results are saved to benchmark_results/:

  • metrics_n{size}.json — scib-metrics results per dataset size
  • rapids_n{size}.json — scib-rapids results per dataset size
  • comparison.json — full equivalency + timing comparison
  • logs/ — stdout/stderr from each worker

File overview

| File | Purpose |
|---|---|
| run_benchmark.sh | Main pipeline orchestrator |
| setup_venvs.sh | Creates isolated venvs with uv |
| download_data.py | Fetches and preprocesses real scRNA-seq data from SOMA |
| generate_data.py | Alternative synthetic data generator |
| worker.py | Runs all 12 metrics (auto-detects scib-metrics or scib-rapids) |
| compare_results.py | Compares metric values and timings, prints summary tables |

Results

Results from the latest run on 1k and 20k cell subsets of Tabula Sapiens v2 (single NVIDIA GPU, k=90 neighbors).

Metric values

Raw metric values from each library.

n = 1,000 cells

| Metric | scib-metrics | scib-rapids |
|---|---|---|
| bras | 0.679313 | 0.679147 |
| clisi_knn | 0.992598 | 0.992598 |
| graph_connectivity | 0.956295 | 0.956295 |
| ilisi_knn | 0.208344 | 0.208285 |
| isolated_labels | 0.576349 | 0.576349 |
| kbet | 0.172000 | 0.172000 |
| kbet_per_label | 0.885625 | 0.885427 |
| nmi_ari_cluster_labels_kmeans_ari | 0.444720 | 0.427040 |
| nmi_ari_cluster_labels_kmeans_nmi | 0.790064 | 0.781690 |
| nmi_ari_cluster_labels_leiden_ari | 0.593712 | 0.553312 |
| nmi_ari_cluster_labels_leiden_nmi | 0.785297 | 0.773716 |
| pcr_comparison | 0.873575 | 0.873576 |
| silhouette_batch | 0.804671 | 0.804671 |
| silhouette_label | 0.536008 | 0.536008 |

n = 20,000 cells

| Metric | scib-metrics | scib-rapids |
|---|---|---|
| bras | 0.641492 | 0.641519 |
| clisi_knn | 0.999022 | 0.999023 |
| graph_connectivity | 0.870374 | 0.870374 |
| ilisi_knn | 0.092136 | 0.092127 |
| isolated_labels | 0.552903 | 0.552903 |
| kbet | 0.025750 | 0.025750 |
| kbet_per_label | 0.474108 | 0.468552 |
| nmi_ari_cluster_labels_kmeans_ari | 0.300553 | 0.311180 |
| nmi_ari_cluster_labels_kmeans_nmi | 0.717732 | 0.716465 |
| nmi_ari_cluster_labels_leiden_ari | 0.693071 | 0.684802 |
| nmi_ari_cluster_labels_leiden_nmi | 0.802485 | 0.797029 |
| pcr_comparison | 0.921949 | 0.921950 |
| silhouette_batch | 0.788884 | 0.788884 |
| silhouette_label | 0.511745 | 0.511745 |

Timing

Wall-clock seconds per metric, with speedup = metrics_time / rapids_time.
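
The speedup column is a plain ratio, so values above 1 favor scib-rapids and values below 1 favor scib-metrics. A minimal sketch, using two timing pairs from the 20k run and a hypothetical helper name:

```python
def speedup(metrics_seconds, rapids_seconds):
    """Ratio metrics_time / rapids_time; below 1 means scib-rapids is slower."""
    if rapids_seconds <= 0:
        raise ValueError("rapids_seconds must be positive")
    return metrics_seconds / rapids_seconds


# (scib-metrics seconds, scib-rapids seconds) for two metrics at n = 20,000.
timings_20k = {
    "silhouette_batch": (106.9684, 0.4200),
    "isolated_labels": (0.1285, 4.9100),
}

for name, (metrics_s, rapids_s) in timings_20k.items():
    print(f"{name}: {speedup(metrics_s, rapids_s):.2f}x")
```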

n = 1,000 cells

| Metric | scib-metrics (s) | scib-rapids (s) | Speedup |
|---|---|---|---|
| bras | 20.7397 | 1.0738 | 19.31× |
| clisi_knn | 0.2940 | 0.0023 | 127.83× |
| graph_connectivity | 0.0487 | 0.0477 | 1.02× |
| ilisi_knn | 1.5596 | 0.5072 | 3.07× |
| isolated_labels | 0.0140 | 0.2588 | 0.05× |
| kbet | 1.0525 | 0.9294 | 1.13× |
| kbet_per_label | 18.9501 | 5.4061 | 3.51× |
| nearest_neighbors | 19.5032 | 20.8665 | 0.93× |
| nmi_ari_cluster_labels_kmeans | 3.2053 | 2.8090 | 1.14× |
| nmi_ari_cluster_labels_leiden | 1.1334 | 1.5514 | 0.73× |
| pcr_comparison | 1.2024 | 2.1607 | 0.56× |
| silhouette_batch | 25.6472 | 0.1800 | 142.48× |
| silhouette_label | 3.1603 | 4.0132 | 0.79× |

n = 20,000 cells

| Metric | scib-metrics (s) | scib-rapids (s) | Speedup |
|---|---|---|---|
| bras | 88.6623 | 0.8740 | 101.44× |
| clisi_knn | 0.9002 | 0.0069 | 130.46× |
| graph_connectivity | 0.1646 | 0.1586 | 1.04× |
| ilisi_knn | 1.9619 | 0.0175 | 112.11× |
| isolated_labels | 0.1285 | 4.9100 | 0.03× |
| kbet | 2.2675 | 0.0224 | 101.23× |
| kbet_per_label | 61.8097 | 18.2841 | 3.38× |
| nearest_neighbors | 33.6002 | 32.8474 | 1.02× |
| nmi_ari_cluster_labels_kmeans | 8.9912 | 8.0512 | 1.12× |
| nmi_ari_cluster_labels_leiden | 44.6316 | 4.6684 | 9.56× |
| pcr_comparison | 1.3026 | 0.2186 | 5.96× |
| silhouette_batch | 106.9684 | 0.4200 | 254.69× |
| silhouette_label | 646.0577 | 5.3100 | 121.67× |

Highlights:

  • silhouette_batch, bras, silhouette_label, clisi_knn, ilisi_knn, and kbet are each roughly 100× to 255× faster in scib-rapids at 20k cells.
  • isolated_labels is the one metric where scib-metrics is faster at both sizes: the scib-rapids path is ~18× slower at 1k and ~38× slower at 20k.
  • nearest_neighbors (PyNNDescent) is shared code and runs at parity.

Full per-metric values and raw timings are in benchmark_results/comparison.json.

Notes on numerical differences

Most metrics agree exactly or within float-rounding noise. Three metrics drift:

  • kbet_per_label (~0.02% at 1k, ~1.17% at 20k): kbet_per_label internally calls a diffusion-map kNN step. scib-rapids (src/scib_rapids/utils/_diffusion_nn.py:52-64) passes a deterministic v0 = ones(n)/√n to scipy.sparse.linalg.eigsh and canonicalizes eigenvector signs (it forces the largest-magnitude component of each eigenvector to be positive). scib-metrics (src/scib_metrics/utils/_diffusion_nn.py:68) does neither, so Lanczos can flip eigenvector signs from run to run, which perturbs the diffusion-distance kNN graph that kbet_per_label consumes. scib-rapids also performs the chi-square computation in float32, whereas scib-metrics uses float64, which contributes a small additional drift. Aggregate kbet matches exactly because it does not go through diffusion_nn. Adding v0 and sign canonicalization upstream in scib-metrics should remove the eigenvector-related part of this drift, leaving only the float32 vs float64 difference.
  • nmi_ari_cluster_labels_kmeans_* (up to ~4%): both libraries seed with 0, but scib-metrics uses jax.random.PRNGKey(0) while scib-rapids uses np.random.default_rng(0). Different PRNG streams → different k-means++ initializations → different cluster assignments → different NMI/ARI. The algorithms are equivalent, but the same seed value yields different random streams in the two libraries.
  • nmi_ari_cluster_labels_leiden_* (up to ~7% ARI at 1k, ~1.2% at 20k): scib-rapids runs leiden via cugraph.leiden (Apache-2, GPU), while scib-metrics uses igraph.community_leiden (GPL-2, CPU). The two leiden implementations use different refinement and move strategies and therefore produce different partitions, so NMI/ARI differ modestly. The licensing and performance gains (nmi_ari_cluster_labels_leiden is ~10× faster at 20k in the timing table) make this the right trade-off; the two are not expected to agree bit-for-bit.
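
The sign canonicalization described for kbet_per_label can be sketched in a few lines of NumPy. This illustrates the idea only and is not the code in either repository: the function name is hypothetical, and the deterministic v0 start vector passed to eigsh is not shown.

```python
import numpy as np


def canonicalize_signs(vectors):
    """Flip each column so that its largest-magnitude entry is positive."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # Row index of the largest-magnitude entry in every column.
    idx = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[idx, np.arange(vectors.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero columns unchanged
    return vectors * signs


# An eigenvector and its negation are equally valid solver outputs;
# after canonicalization both map to the same representative.
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
_, eigenvectors = np.linalg.eigh(matrix)
same = np.allclose(canonicalize_signs(eigenvectors), canonicalize_signs(-eigenvectors))
```

Because any downstream distance computed from the eigenvectors sees the same orientation on every run, this removes one source of run-to-run variation in the resulting kNN graph.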
