
Pipeline crash in compute_prob.R when transcriptomic_distance is positive (10x 3' data) #34

@Chris-Kato


Hi,

I encountered a crash in the SCALPEL pipeline at the step:

isoform_quantification:probability_distribution

The error is:

Error in seq.default(0, read_tab$transcriptomic_distance[1], -BINS) :
wrong sign in 'by' argument

This occurs in compute_prob.R at:

part_neg = c(seq(0,read_tab$transcriptomic_distance[1],-BINS), ...)

From debugging, the issue arises because the pipeline assumes that
transcriptomic_distance values are negative (i.e. upstream of the 3' end).

However, in my dataset (10x Chromium 3' snRNA-seq), many distances are positive.

Example from all_unique_reads.txt:

chr21 44335310 44335372 + 478 ...
chr21 44335399 44335489 + 361 ...
chr21 44335490 44335553 + 297 ...

Here the transcript coordinates are:

transcript: 44335251–44335851 (+ strand)

So these reads lie upstream of the transcript's 3' end (44335851), but the computed dist_END is positive.
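To make the sign question concrete, here is a small sketch that recomputes the distances for the three example reads under two candidate conventions. It assumes that the second coordinate of each row is the read end and that the fifth column is the distance; neither is confirmed by the pipeline code, and the variable names are hypothetical. Under convention A the results fall within one base of the reported 478, 361 and 297.

```r
# Coordinates copied from the example rows above.
transcript_end <- 44335851   # 3' end of the + strand transcript
read_end <- c(44335372, 44335489, 44335553)

# Assumed convention A: distance from the read to the 3' end (positive upstream).
dist_positive <- transcript_end - read_end

# Assumed convention B: read position relative to the 3' end (negative upstream).
dist_negative <- read_end - transcript_end

print(dist_positive)
print(dist_negative)
```

If compute_prob.R expects convention B while the upstream step emits convention A, that would explain the crash, but this is only a guess from the outside.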

Because of this, the following call fails:

seq(0, positive_value, -BINS)

which leads to the crash.
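The failure can be reproduced in isolation with base R. The values below (478 standing in for a positive distance, 30 for BINS) are illustrative assumptions, not the pipeline's actual settings.

```r
# Minimal reproduction of the seq() failure.
# BINS = 30 and positive_value = 478 are assumed values for illustration only.
BINS <- 30
positive_value <- 478

# seq() refuses to step downwards (by < 0) towards a larger endpoint.
msg <- tryCatch(
  {
    seq(0, positive_value, -BINS)
    NA_character_              # only reached if seq() does not fail
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With a negative endpoint the same call succeeds and counts down from 0.
ok <- seq(0, -positive_value, -BINS)
print(head(ok))
```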

As a temporary workaround, I inverted the sign of dist_END in compute_prob.R:

reads = reads %>%
  dplyr::filter(gene_name %in% gene.tokeep) %>%
  mutate(dist_END = -as.numeric(dist_END))

After this change, the pipeline proceeds normally.

My questions are:

  1. Should transcriptomic_distance be expected to be negative upstream of the 3' end?
  2. Is the current sign convention in mapping_filtering.R intended?
  3. Should compute_prob.R handle both positive and negative distances more robustly (e.g. using min() instead of [1])?
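As a rough illustration of question 3, bin edges could be built from the observed range of distances rather than from the first element. This is only a sketch: make_bin_edges is a hypothetical helper, not a function in SCALPEL, and I do not know how the rest of compute_prob.R consumes the bins.

```r
# Hypothetical helper: build bin edges covering both negative and positive
# distances, so the result does not depend on the sign of the first value.
make_bin_edges <- function(distances, bin_width) {
  stopifnot(length(distances) > 0, bin_width > 0)
  lo <- min(0, min(distances))   # most negative distance, or 0 if none
  hi <- max(0, max(distances))   # most positive distance, or 0 if none
  part_neg <- seq(0, lo, by = -bin_width)  # steps down from 0, stops before passing lo
  part_pos <- seq(0, hi, by = bin_width)   # steps up from 0, stops before passing hi
  sort(unique(c(part_neg, part_pos)))
}

# Works for all-positive, all-negative and mixed inputs (bin width 30 is assumed).
edges_pos <- make_bin_edges(c(478, 361, 297), 30)
edges_neg <- make_bin_edges(c(-478, -361, -297), 30)
edges_mix <- make_bin_edges(c(-100, 250), 30)
print(range(edges_pos))
print(range(edges_neg))
print(range(edges_mix))
```

Whether positive distances should be binned at all, or instead rejected or sign-flipped upstream, depends on the intended convention, which is what questions 1 and 2 are asking.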

Thank you for developing SCALPEL.

Best regards

Labels

enhancement
