paulranum11 / split-seq_demultiplexing

An unofficial demultiplexing strategy for SPLiT-seq RNA-Seq data

License: MIT License

Shell 32.24% Python 67.76%

scrna-seq scrna-seq-analysis split-seq fastq demultiplexing single-cell single-cell-rna-seq single-cell-analysis single-cell-omics single-cell-sequencing

split-seq_demultiplexing's Issues

step 4 issue

Hello,
When I ran splitseqdemultiplex, I found that step 4 takes a long time.
It seems that step 4 has been running for a day:
Beginning STEP4: Extracting UMIs. Current time : 2021-05-03 06:45:57.

Do you think this is fine?

odd results

How to prepare the roundXbarcodes file?

Hi, Paul

Thank you very much for developing this useful tool! I am new to SPLiT-seq data analysis and want to make sure I prepare the roundXbarcodes files correctly from my primer list. Below are the first-round primers that I used in my experiments.

AGGTCAGAGCATTGAAACATCGTTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGATGCCTAATTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGAGTGGTCATTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGATCATTCCNNNNNN
AGGTCAGAGCATTGATTGGCTCNNNNNN
AGGTCAGAGCATTGCAAGGAGCNNNNNN

So should the corresponding file be:

AGGTCAGAGCATTGAAACATCG
AGGTCAGAGCATTGATGCCTAA
AGGTCAGAGCATTGAGTGGTCA
AGGTCAGAGCATTGATCATTCC
AGGTCAGAGCATTGATTGGCTC
AGGTCAGAGCATTGCAAGGAGC

The barcodes in your example file look much shorter than these. Could you please explain the file preparation in a little more detail?

Thanks for your help!

Best,
Monica
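
The transformation described in this question (dropping the priming tail from each primer) can be sketched as below. This is only an illustration of what the poster did by hand, not a statement of the format the tool expects. The tail patterns are taken from the primers quoted above; the function name `primer_to_barcode` and the optional `constant_prefix` argument are hypothetical. The prefix option is included because all six primers share the 14-nt start AGGTCAGAGCATTG and the example file reportedly holds shorter sequences, but whether the tool wants that prefix removed is an assumption that the maintainer would need to confirm.

```python
import re

# Tail patterns seen in the primers quoted above: an oligo-dT run ending in
# "VN", or a random hexamer written as "NNNNNN".
# Caveat: the greedy T-run would also swallow a barcode that itself ends in T.
TAIL = re.compile(r"(?:T{10,}VN|N{6})$")


def primer_to_barcode(primer: str, constant_prefix: str = "") -> str:
    """Strip the priming tail and, optionally, a shared 5' prefix (hypothetical helper)."""
    seq = primer.strip().upper()  # also removes stray tabs or spaces
    seq = TAIL.sub("", seq)
    if constant_prefix and seq.startswith(constant_prefix):
        seq = seq[len(constant_prefix):]
    return seq


primers = [
    "AGGTCAGAGCATTGAAACATCG" + "T" * 15 + "VN",  # oligo-dT style primer
    "AGGTCAGAGCATTGATCATTCCNNNNNN\t",            # random hexamer, trailing tab
]
for p in primers:
    print(primer_to_barcode(p), primer_to_barcode(p, "AGGTCAGAGCATTG"))
```

Stripping whitespace first matters here because a trailing tab in a barcode file is invisible in most editors but would prevent the tail pattern from matching.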

KeyError: '0_34'

Version of Split-seq

Hi there,

I noticed that barcode 1 seems to be missing for a lot of our reads, which reduces the cell counts and the reads per cell. I'm wondering whether this is because we are using v3 of SPLiT-seq, and whether this pipeline supports that version. The position of barcode 1 changed in v3.

Confirm that OdT/ranHex collapse is occurring on correct BC position

If the UMI is bases 1-10 of the barcode read, then according to this schematic, the barcode corresponding to oligo-dT/random hexamers should be the round 1 barcode, but it should be the third barcode sequenced, since sequencing proceeds outside-in (i.e., UMI-BC3-BC2-BC1). If I'm understanding the collapse script (both versions) correctly, it looks like we're collapsing based on the first barcode rather than the third.

Maybe we're misunderstanding something about the amplification or the direction of sequencing, or perhaps the demultiplexing Python script is (correctly) reading the barcodes from right to left? Anyway, can you please confirm or clarify this?

Thanks!
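
The point raised in this issue can be made concrete with a small sketch. It assumes the cell ID is held in sequencing order (BC3, BC2, BC1), so the round 1 barcode is the last element, and that each round 1 well pairs one random-hexamer barcode with one oligo-dT barcode. The mapping `RANHEX_TO_ODT`, its placeholder sequences, and the function `collapse_round1` are all hypothetical; this is not the repository's collapse script.

```python
# Hypothetical pairing of random-hexamer barcodes with the oligo-dT barcode
# from the same round 1 well. The sequences are placeholders, not real barcodes.
RANHEX_TO_ODT = {
    "CCCCCCCC": "AAAAAAAA",
    "GGGGGGGG": "TTTTTTTT",
}


def collapse_round1(barcodes_in_read_order):
    """Collapse a cell ID given as (BC3, BC2, BC1), i.e. in sequencing order.

    Only the round 1 barcode, the LAST element, is remapped. Barcodes from
    rounds 2 and 3 are left untouched even if they match a key by chance.
    """
    bc3, bc2, bc1 = barcodes_in_read_order
    return (bc3, bc2, RANHEX_TO_ODT.get(bc1, bc1))


# The same well reached through either primer type yields the same cell ID.
print(collapse_round1(("ACGTACGT", "TGCATGCA", "CCCCCCCC")))
print(collapse_round1(("ACGTACGT", "TGCATGCA", "AAAAAAAA")))
```

If a script instead remapped the first element of this tuple, it would be collapsing on the round 3 barcode, which is the concern described above.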

Demultiplexing and collapsing takes days to complete for at-scale experiment

Hi-
We finally generated some full-scale SPLiT-seq data, and the runtime for this tool is quite large. We have ~850M reads; the initial demultiplexing step as well as the collapsing ODT/random hexamers each take days to complete.

Are there any performance improvements you might be able to make to get this tool to scale better with data input size? zUMIs by comparison can do most of this in hours or less...

Thanks!
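
One general technique for making barcode matching scale with read count is to precompute every sequence within one mismatch of a known barcode, so that each read costs a single dictionary lookup. The sketch below illustrates that idea only; the thread does not say how this tool currently matches barcodes, and `build_lookup` is a hypothetical helper rather than part of the repository.

```python
def build_lookup(barcodes, alphabet="ACGTN"):
    """Map each barcode, and every sequence one mismatch away, to the barcode.

    Variants that sit one mismatch from two different barcodes are dropped,
    because they cannot be assigned unambiguously.
    """
    known = set(barcodes)
    lookup = {bc: bc for bc in known}
    ambiguous = set()
    for bc in known:
        for i, original in enumerate(bc):
            for base in alphabet:
                if base == original:
                    continue
                variant = bc[:i] + base + bc[i + 1:]
                if variant in known:
                    continue  # an exact barcode always maps to itself
                if lookup.get(variant, bc) != bc:
                    ambiguous.add(variant)
                else:
                    lookup[variant] = bc
    for variant in ambiguous:
        del lookup[variant]
    return lookup


lookup = build_lookup(["AAACATCG", "ATGCCTAA"])
print(lookup.get("AAACATCC"))  # one mismatch from AAACATCG
print(lookup.get("GGGGGGGG"))  # no barcode within one mismatch
```

The table is built once per barcode file, so its cost does not depend on the number of reads.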

What are the recommended steps once demultiplexing is complete?

It would be helpful to describe what steps are recommended once the FASTQs are separated by cell. The SPLiT-seq paper utilizes one of the Drop-seq tools (TagReadWithGeneExon), followed by Starcode to collapse UMIs of aligned reads that were within 1 nt mismatch of another UMI. They then don't describe how they generated their final cell x gene expression matrix, but I assume it's the DigitalExpression tool from the Drop-seq toolkit as well.

I'm trying to figure out how best to connect the dots from the output of your method to those steps, or if some other approach is better.

Any advice would be appreciated!
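
To illustrate the UMI step mentioned above (collapsing UMIs that are within 1 nt mismatch of another UMI), here is a minimal sketch. It uses a simple greedy pass from the most to the least abundant UMI; this is a stand-in for illustration and not Starcode's algorithm. The function names are hypothetical, and in a real pipeline the collapse would presumably be run separately for each cell and gene.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis, max_dist=1):
    """Greedily merge UMIs within max_dist mismatches of a more abundant UMI."""
    counts = Counter(umis)
    centers = {}  # representative UMI -> merged read count
    # Visit UMIs from most to least abundant; ties are broken alphabetically.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for center in centers:
            if len(center) == len(umi) and hamming(center, umi) <= max_dist:
                centers[center] += n
                break
        else:
            centers[umi] = n  # no close representative found; start a new one
    return centers


reads = ["AAAAAAAAAA"] * 3 + ["AAAAAAAAAT", "CCCCCCCCCC"]
print(collapse_umis(reads))
```

The number of keys in the returned dictionary is the collapsed UMI count, which is the value that would go into a cell x gene expression matrix.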

Questions about UMI and Barcode Position

Hi,
Thank you for this good work.

I am testing your latest version, SPLiT-Seq_demultiplexing0.1.1.
As I understand from the SPLiT-seq protocol, Read 1 and Read 2 of the FASTQ files are laid out as follows:

Read 1 (66 nt) = transcript
Read 2 (94 nt) = UMI + BC3 + spacer + BC2 + spacer + BC1
where the UMI starts at nt 1, BC3 starts at nt 11, BC2 starts at nt 48, and BC1 starts at nt 86

However, when I look at the merged FASTQ files produced by SPLiT-Seq_demultiplexing0.1.1 (using the small test FASTQ files distributed with the software), the UMI is always the first 10 nt of Read 1, not the first 10 nt of Read 2!

For example, for this read pair:

Read1:

@SRR6750041.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6

Read2:

@SRR6750041.1 1/2
NNTACTAAAGGCTAACGAGTGGCCGCTGTTTCGCATCGGCGTACGACTATTGAGGAATCCACGTGCTTGAGAGGCCAGAGCATTCGAACGCTTA
+
##AAAEEEEEEEEEEEEEE/6E/EE6/EAE<66666AEAAAE66<<</<<EEEEEEEEAAAAE666666AAE<<EEEEEEEE<AEAEEEEEEE/

SPLiT-Seq_demultiplexing0.1.1 produces this output:

@SRR6750041.1_AGCATTCGAACGCTTAATTGAGGAATCCAGCTAACGAGTGGCC_CTGGANAAGT 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6

Here the last field of the header, _CTGGANAAGT, is the UMI, and it is the first 10 nt of Read 1, not of Read 2!

Can you please tell me whether this is correct, or whether I am the one who is wrong?

I hope I have explained this clearly.
Thank you in advance for your reply.
With best wishes,
Duma
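
The observation in this question can be checked mechanically with the read pair and header quoted above. The sketch below slices Read 2 according to the layout UMI + BC3 + spacer + BC2 + spacer + BC1 and compares the UMI field of the output header with the first 10 nt of each read. The segment lengths (10 nt UMI, 8 nt barcodes, 30 nt spacers) are assumptions chosen so that they add up to the stated 94 nt; the helper names `split_read2` and `header_umi` are hypothetical.

```python
# Assumed segment lengths: 10 + 8 + 30 + 8 + 30 + 8 = 94 nt.
UMI_LEN, BC_LEN, SPACER_LEN = 10, 8, 30
LAYOUT = [("UMI", UMI_LEN), ("BC3", BC_LEN), ("spacer1", SPACER_LEN),
          ("BC2", BC_LEN), ("spacer2", SPACER_LEN), ("BC1", BC_LEN)]


def split_read2(seq):
    """Slice Read 2 into its named segments, left to right."""
    parts, pos = {}, 0
    for name, length in LAYOUT:
        parts[name] = seq[pos:pos + length]
        pos += length
    return parts


def header_umi(header):
    """Return the last '_'-separated field of the read name (before the space)."""
    return header.split()[0].rsplit("_", 1)[1]


read1 = "CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG"
read2 = ("NNTACTAAAGGCTAACGAGTGGCCGCTGTTTCGCATCGGCGTACGACTATTGAGGAATCC"
         "ACGTGCTTGAGAGGCCAGAGCATTCGAACGCTTA")
header = "@SRR6750041.1_AGCATTCGAACGCTTAATTGAGGAATCCAGCTAACGAGTGGCC_CTGGANAAGT 1/1"

parts = split_read2(read2)
print(parts["UMI"], parts["BC3"], parts["BC2"], parts["BC1"])
print(header_umi(header) == read1[:10])    # True: header UMI matches Read 1
print(header_umi(header) == parts["UMI"])  # False: it is not the Read 2 UMI
```

Under these assumed lengths, BC2 and BC1 begin at 0-based offsets 48 and 86 (1-based positions 49 and 87), so the coordinates quoted in the question may mix the two conventions. The three slices (GCTAACGA, ATTGAGGA, AACGCTTA) each occur inside the barcode block of the output header, which suggests the barcodes themselves are being read from Read 2 even though the UMI field is not.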


how to do demultiplexing
