Pipeline crash in compute_prob.R when transcriptomic_distance is positive (10x 3' data) #34

@Chris-Kato

Hi,

I encountered a crash in the SCALPEL pipeline at the step:

isoform_quantification:probability_distribution

The error is:

Error in seq.default(0, read_tab$transcriptomic_distance[1], -BINS) :
wrong sign in 'by' argument

This occurs in compute_prob.R at:

part_neg = c(seq(0,read_tab$transcriptomic_distance[1],-BINS), ...)

From debugging, the issue arises because the pipeline assumes that
transcriptomic_distance values are negative (i.e. upstream of the 3' end).

However, in my dataset (10x Chromium 3' snRNA-seq), many distances are positive.

Example from all_unique_reads.txt:

chr21 44335310 44335372 + 478 ...
chr21 44335399 44335489 + 361 ...
chr21 44335490 44335553 + 297 ...

Here the transcript coordinates are:

transcript: 44335251–44335851 (+ strand)

So the read lies upstream of the 3' end, but the computed dist_END is positive.

Because of this, the following call fails:

seq(0, positive_value, -BINS)

which leads to the crash.
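For reference, the failure can be reproduced outside the pipeline with base R alone. This is a minimal sketch: the bin width of 30 is an assumed placeholder (the real BINS value is set in compute_prob.R), and 478 is taken from the example reads above.

```r
# Assumed placeholder; the actual bin width is defined in compute_prob.R.
BINS <- 30

# Negative endpoint: stepping down from 0 with a negative 'by' works.
neg <- seq(0, -478, -BINS)

# Positive endpoint: the same negative 'by' points away from the target,
# so seq() stops with an error. tryCatch captures the message as a string.
res <- tryCatch(
  seq(0, 478, -BINS),
  error = function(e) conditionMessage(e)
)
print(res)
```

So the first call only succeeds when transcriptomic_distance[1] is zero or negative.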

As a temporary workaround, I inverted the sign of dist_END in compute_prob.R:

reads = reads %>%
  dplyr::filter(gene_name %in% gene.tokeep) %>%
  mutate(dist_END = -as.numeric(dist_END))

After this change, the pipeline proceeds normally.

My questions are:

  1. Should transcriptomic_distance be expected to be negative upstream of the 3' end?
  2. Is the current sign convention in mapping_filtering.R intended?
  3. Should compute_prob.R handle both positive and negative distances more robustly (e.g. using min() instead of [1])?
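To make question 3 concrete, here is a rough sketch of what a sign-tolerant version of the negative bin edges might look like. The helper name neg_bin_edges and the bin width are hypothetical and not part of SCALPEL, and I have not checked how the rest of the part_neg expression uses these values. It only avoids the crash; it does not settle which sign convention is correct.

```r
# Hypothetical helper, not part of SCALPEL: builds bin edges from 0 down to
# the most negative distance, regardless of how the input is ordered.
neg_bin_edges <- function(distances, bins) {
  stopifnot(is.numeric(distances), bins > 0)
  # Clamp at 0 so that all-positive input yields the single edge 0
  # instead of a seq() call with the wrong 'by' sign.
  lower <- min(c(0, distances), na.rm = TRUE)
  seq(0, lower, by = -bins)
}

mixed    <- neg_bin_edges(c(120, -478, -361), bins = 30)
positive <- neg_bin_edges(c(478, 361, 297), bins = 30)
```

With mixed input the edges run from 0 down towards -478, and with all-positive input the function returns just 0, so no error is raised in either case.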

Thank you for developing SCALPEL.

Best regards

Labels: enhancement