Skip to content

V2#25

Closed
ch99l wants to merge 57 commits into
develfrom
v2
Closed

V2#25
ch99l wants to merge 57 commits into
develfrom
v2

Conversation

@ch99l
Copy link
Copy Markdown
Collaborator

@ch99l ch99l commented Apr 15, 2026

Major Update of Bambu-Single Cell Pipeline

Summary of Changes Made:

  1. Support for new single cell and spatial 10x kits
  2. Refactor main.nf script (move process-specific code to modules/subworkflows)
  3. Create assets directory (10x_config) to store static configuration files
  4. Update nextflow.config file
  5. Include new processes in pipeline to improve fastq processing (chopper and cutadapt)
  6. Fix Nextflow parallel processing logic
  7. Update README.md file

@ClareRobin
Copy link
Copy Markdown

ClareRobin commented Apr 15, 2026

Code review (From Claude /code-review plugin using our best practices instruction md file)

Found 5 issues:

  1. All containers use :latest tags — reproducibility is not guaranteed (CLAUDE.md says "Always pin to an explicit version — never use latest")

    Affects every module: bambu-pipe-r:latest, bambu-pipe-preprocess:latest, bambu-pipe-alignment:latest. Replace with pinned digest or immutable tag, e.g. via Seqera/Wave for CLI tools and a digest for GHCR images.

    https://github.com/GoekeLab/bambu-singlecell-spatial/blob/4010340bbd821a3037ef86adc04b45e0e12c58a5/modules/bambu.nf#L2-L6

  2. Processes are defined inside subworkflow files rather than imported from separate module files (CLAUDE.md says "Do not define processes inside subworkflows — import from modules only")

    subworkflows/alignment.nf defines MINIMAP_BUILD_INDEX, PAFTOOLS_GFF2BED, and MINIMAP_ALIGNMENT inline. subworkflows/prepare_input_standard.nf defines EXTRACT_10X_BARCODES and EXTRACT_10X_SPATIAL_COORDINATES inline. Each should live in its own modules/<tool>/main.nf.

    https://github.com/GoekeLab/bambu-singlecell-spatial/blob/4010340bbd821a3037ef86adc04b45e0e12c58a5/subworkflows/alignment.nf#L1-L21

  3. No process emits versions.yml (CLAUDE.md says "Every process must emit versions.yml alongside its real outputs, using explicit emit: names")

    None of the new or modified modules (BAMBU, BAMBU_EM, BAMBU_CONSTRUCT_READ_CLASS, BAMBU_PREPARE_ANNOTATION, PREPROCESS_FASTQ, SEURAT_CLUSTERING, MINIMAP_ALIGNMENT, etc.) emit a versions.yml file. This blocks version tracking and aggregation.

    https://github.com/GoekeLab/bambu-singlecell-spatial/blob/4010340bbd821a3037ef86adc04b45e0e12c58a5/modules/bambu.nf#L14-L18

  4. timeline, report, trace, and dag are not enabled in nextflow.config (CLAUDE.md says "nextflow.config must enable timeline, report, trace, dag")

    The new nextflow.config adds params, process, and profiles blocks but omits the four reporting directives entirely.

    https://github.com/GoekeLab/bambu-singlecell-spatial/blob/4010340bbd821a3037ef86adc04b45e0e12c58a5/nextflow.config#L1-L89

  5. NDR receives the literal string "NULL" instead of R's NULL when params.ndr is unset (bug)

    In main.nf, def ndr = params.ndr ?: 'NULL' produces the Groovy string "NULL". When interpolated into the Rscript block as NDR = $ndr, R sees NDR = NULL — which is actually valid R NULL only because the bare word NULL is an R keyword, so this works correctly. However the R Dockerfile installs bambu from the branch reference GoekeLab/bambu@devel_pre_v4 rather than a pinned commit or tag, so the installed version is not reproducible. CLAUDE.md says "Pin... all package versions".

    https://github.com/GoekeLab/bambu-singlecell-spatial/blob/4010340bbd821a3037ef86adc04b45e0e12c58a5/containers/r/Dockerfile#L8-L10

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Copy link
Copy Markdown

@ClareRobin ClareRobin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My (human) review: (summary)

Biggest issue:

  • example data (currently as is) can't easily be run (also all the samplesheets currently in examples are out of date) - it looks like the reads_chr9_1_1000000.fastq.gz files are 10x5v2, to be able to run the pipeline I had to:
  1. make a new samplesheet
  2. fix resource limits in nextflow config for profile local
  3. fix the staging clash
  4. fix sampleData (spatial) parameter in process BAMBU {}

Other (more minor issues):

  • Various READMe comments (inconsistencies between README & actual pipline).

Comment thread .gitignore
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clare (human) comments :)
suggested to have a more comprehensive gitignore e.g.

.nextflow.log*
work/
results/
*.command.*
*.pyc
.DS_Store
.vscode/```

Comment thread README.md Outdated
Comment thread subworkflows/alignment.nf Outdated
Comment thread subworkflows/alignment.nf Outdated
Comment thread nextflow.config
Comment thread README.md
Comment thread main.nf
Comment thread modules/bambu.nf Outdated
Comment thread modules/bambu.nf Outdated
Comment thread nextflow.config
@ClareRobin
Copy link
Copy Markdown

ClareRobin commented Apr 16, 2026

Github Copilot Code Review (using our nextflow best practices github instruction md file)

Findings:

  1. High: barcode extraction is hard-wired to a path that does not exist in the declared spaceranger container, so the pipeline dies before any real sample processing starts. The failure is in subworkflows/prepare_input_standard.nf and subworkflows/prepare_input_standard.nf, with the path coming from nextflow.config. I reproduced this with a stub-run: EXTRACT_10X_BARCODES failed trying to copy 737K-august-2016.txt from /opt/spaceranger-4.0.1/lib/python/cellranger/barcodes, which is not present in that container. As written, this blocks all chemistries.

  2. High: multi-sample non-visium runs will hit a Nextflow input file collision in BAMBU. For non-visium chemistries, subworkflows/prepare_input_standard.nf creates the same placeholder spatial file name per chemistry, then attaches that same path to every sample in subworkflows/prepare_input_standard.nf. Later, main.nf collects all of those paths into one list and passes them into modules/bambu.nf. Nextflow rejects duplicate file names within a single path-list input, so two 10x3v2 samples will fail before BAMBU starts.

  3. Medium: the branch’s own documented and shipped example inputs no longer match the parser contract, so the advertised smoke-test path is broken. The parser now requires a samplesheet with a path column in subworkflows/prepare_input_standard.nf, but the bundled examples still use fastq or bam columns in examples/samplesheet_basic.csv, examples/samplesheet_bam_example.csv, and examples/samplesheet_custom_example.csv. README also tells users to run a non-existent sample sheet in README.md and points to a non-existent template in README.md. I verified that a stub-run against the bundled example data now fails immediately with “Samplesheet is missing a required path column.”

Clare's (human) note on this feedback

For issue 1: there are currently a docker and singularity profiles, this issue isn't a problem with either of those profiles, but if the user runs the pipeline without a docker or singularity profile (i.e. using locally installed software) the spaceranger filepath will fail. If we want to avoid the possible failure & make this unambiguous, we could add a startup validation in main.nf that exits unless docker or singularity is enabled.

Comment thread modules/bambu.nf Outdated
Copy link
Copy Markdown

@hafiz-ismail hafiz-ismail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly general comments

Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread README.md
Comment thread modules/seurat_clustering.nf Outdated
@ch99l ch99l closed this May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants