
paulranum11 / split-seq_demultiplexing


An unofficial demultiplexing strategy for SPLiT-seq RNA-Seq data

License: MIT License

Shell 32.24% Python 67.76%
scrna-seq scrna-seq-analysis split-seq fastq demultiplexing single-cell single-cell-rna-seq single-cell-analysis single-cell-omics single-cell-sequencing

split-seq_demultiplexing's People

Contributors

charliewhitmore28, davidsonlabchop, paulranum11

split-seq_demultiplexing's Issues

step 4 issue

Hello,
When I run splitseqdemultiplex, step 4 takes a very long time. It appears to have been running for about a day; the last log line is:
Beginning STEP4: Extracting UMIs. Current time : 2021-05-03 06:45:57.

Is this expected?

How to prepare the roundXbarcodes file?

Hi, Paul

Thank you very much for developing this useful tool! I am new to SPLiT-seq data analysis and want to make sure I prepare the roundXbarcodes files correctly from my primer list. Below are the first-round primers I used in my experiments.

AGGTCAGAGCATTGAAACATCGTTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGATGCCTAATTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGAGTGGTCATTTTTTTTTTTTTTTVN
AGGTCAGAGCATTGATCATTCCNNNNNN
AGGTCAGAGCATTGATTGGCTCNNNNNN
AGGTCAGAGCATTGCAAGGAGCNNNNNN

So the corresponding file should be:

AGGTCAGAGCATTGAAACATCG
AGGTCAGAGCATTGATGCCTAA
AGGTCAGAGCATTGAGTGGTCA
AGGTCAGAGCATTGATCATTCC
AGGTCAGAGCATTGATTGGCTC
AGGTCAGAGCATTGCAAGGAGC

The barcodes in your example file are much shorter than these. Could you please explain the file preparation in a little more detail?

Thanks for your help!

Best,
Monica
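
A minimal sketch of one way to derive candidate barcode sequences from a primer list like the one above. It assumes each primer is a shared 5' linker, then the barcode, then either a poly-T tail ending in VN or a run of six N bases. The 15-nt poly-T length, the helper name `barcodes_from_primers`, and the use of the longest common prefix as the linker are assumptions made for illustration. Whether the roundXbarcodes files expect only this core sequence is something the maintainer would need to confirm.

```python
import os
import re

# Assumed 3' tails: a 15-nt poly-T followed by VN (oligo-dT primers)
# or six N bases (random hexamer primers). Adjust to match your oligos.
TAIL = re.compile(r"(?:T{15}VN|N{6})$")


def barcodes_from_primers(primers):
    """Strip the assumed 3' tail and the shared 5' linker from each primer."""
    trimmed = [TAIL.sub("", p.strip().upper()) for p in primers]
    # Treat the longest prefix common to all trimmed primers as the linker.
    # Caveat: if every barcode starts with the same base, that base is
    # absorbed into the prefix, so check that the results have the expected length.
    linker = os.path.commonprefix(trimmed)
    return linker, [t[len(linker):] for t in trimmed]


# Three primers built to the assumed layout (linker + barcode + tail).
primers = [
    "AGGTCAGAGCATTG" + "AAACATCG" + "T" * 15 + "VN",
    "AGGTCAGAGCATTG" + "ATGCCTAA" + "T" * 15 + "VN",
    "AGGTCAGAGCATTG" + "CAAGGAGC" + "N" * 6,
]
linker, barcodes = barcodes_from_primers(primers)
print(linker)    # AGGTCAGAGCATTG
print(barcodes)  # ['AAACATCG', 'ATGCCTAA', 'CAAGGAGC']
```

With only a few primers the common prefix can be longer than the true linker, so it is safer to run this on the full plate and to check the result against the barcode length used in the example files.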

Version of Split-seq

Hi there,

I noticed an issue where barcode 1 seems to be missing for a lot of the reads, which reduces the cell counts and reads per cell. I'm wondering whether this is because we are using v3 of SPLiT-seq, and whether this pipeline supports that version. The positioning of barcode 1 changed in v3.
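
One way to test the positioning hypothesis is to tally where known round-1 barcodes actually occur in the barcode read. The sketch below is a hypothetical diagnostic, not part of this pipeline; the function name `barcode_offsets` and the toy reads are made up for illustration.

```python
from collections import Counter


def barcode_offsets(reads, barcodes):
    """Count the 0-based offsets at which any known barcode occurs in the reads."""
    offsets = Counter()
    for read in reads:
        for bc in barcodes:
            start = read.find(bc)
            while start != -1:
                offsets[start] += 1
                start = read.find(bc, start + 1)
    return offsets


# Toy reads (placeholders): the barcode sits at offset 4 in two reads and at 2 in one.
reads = ["NNNNAACGTGATNN", "GGGGAACGTGATCC", "GGAACGTGATCCCC"]
offsets = barcode_offsets(reads, ["AACGTGAT"])
print(offsets.most_common())  # [(4, 2), (2, 1)]
```

On real data a dominant offset that differs from the one the pipeline assumes would support the version mismatch explanation. Short barcodes also match by chance, so only a clear peak is informative.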

Confirm that OdT/ranHex collapse is occurring on correct BC position

If the UMI is bases 1-10 of the barcode read, then according to this schematic the barcode that distinguishes oligo-dT from random hexamer primers should be the round 1 barcode, which is the third barcode sequenced, since sequencing proceeds outside-in (i.e., UMI-BC3-BC2-BC1). If I'm understanding the collapse script (both versions) correctly, it collapses based on the first barcode rather than the third.

Maybe we're misunderstanding something about the amplification or direction of sequencing, or perhaps the demultiplexing Python script is (correctly) reading the barcodes from right to left? Either way, could you please confirm or clarify this?

Thanks!
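
The concern above can be stated as a small sketch. Assuming the barcode read is laid out UMI-BC3-BC2-BC1 as described, the round-1 barcode is the last barcode in read order, and that is the field that would be mapped from a random hexamer barcode to its paired oligo-dT barcode. The pairing table and barcode values are placeholders, and `collapse_round1` is a hypothetical name, not a function from the collapse script.

```python
# Placeholder pairing: random hexamer round-1 barcode -> oligo-dT round-1 barcode
# from the same well. Real pairings come from the plate layout.
RANHEX_TO_ODT = {"CCCCCCCC": "AAAAAAAA"}


def collapse_round1(cell_id):
    """cell_id is (bc3, bc2, bc1) in sequencing order; only bc1 is remapped."""
    bc3, bc2, bc1 = cell_id
    return (bc3, bc2, RANHEX_TO_ODT.get(bc1, bc1))


odt_cell = ("GGGGGGGG", "TTTTTTTT", "AAAAAAAA")
hex_cell = ("GGGGGGGG", "TTTTTTTT", "CCCCCCCC")
# Both primer types from the same well resolve to the same cell.
print(collapse_round1(odt_cell) == collapse_round1(hex_cell))  # True

# The same sequence in the first sequenced position (BC3) is left untouched.
print(collapse_round1(("CCCCCCCC", "TTTTTTTT", "AAAAAAAA")))
```

If a script instead remapped the first field in read order, the two cells in the example would stay separate, which is the behavior the question asks about.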

Demultiplexing and collapsing takes days to complete for at-scale experiment

Hi-
We finally generated some full-scale SPLiT-seq data, and the runtime for this tool is quite long. We have ~850M reads; the initial demultiplexing step and the oligo-dT/random hexamer collapsing step each take days to complete.

Are there any performance improvements you might be able to make to get this tool to scale better with data input size? zUMIs by comparison can do most of this in hours or less...

Thanks!
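
The thread does not say where the time goes, so the following is only a sketch of one common way to keep runtime linear in the number of reads: a single streaming pass over the barcode FASTQ with set lookups, rather than one pass per barcode. The function names, the fixed 0-based offsets (10, 48, 86), and the 8-nt barcode length are assumptions, and the sketch matches barcodes exactly with no mismatch tolerance.

```python
import io
from collections import Counter


def fastq_records(handle):
    """Yield (header, sequence, quality) from a FASTQ handle, four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def count_cells(handle, bc1_set, bc2_set, bc3_set, starts=(10, 48, 86), bc_len=8):
    """Single pass over the barcode reads; each barcode is checked with a set lookup."""
    counts = Counter()
    for _, seq, _ in fastq_records(handle):
        bc3, bc2, bc1 = (seq[s:s + bc_len] for s in starts)
        if bc1 in bc1_set and bc2 in bc2_set and bc3 in bc3_set:
            counts[(bc1, bc2, bc3)] += 1
    return counts


# Two synthetic 94-nt reads: one with valid barcodes, one without.
good = "N" * 10 + "GCTAACGA" + "A" * 30 + "ATTGAGGA" + "C" * 30 + "AACGCTTA"
bad = "N" * 94
fq = io.StringIO(f"@r1\n{good}\n+\n{'E' * 94}\n@r2\n{bad}\n+\n{'E' * 94}\n")
counts = count_cells(fq, {"AACGCTTA"}, {"ATTGAGGA"}, {"GCTAACGA"})
print(counts)
```

Because each read is touched once and memory grows only with the number of distinct cells, the same loop can be run on chunks of the input in parallel and the counters summed afterwards.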

What are the recommended steps once demultiplexing is complete?

It would be helpful to describe which steps are recommended once the FASTQs are separated by cell. The SPLiT-seq paper uses one of the Drop-seq tools (TagReadWithGeneExon), followed by Starcode to collapse UMIs of aligned reads that are within 1 nt mismatch of another UMI. The authors don't describe how they generated their final cell x gene expression matrix, but I assume they used the DigitalExpression tool, also from the Drop-seq toolkit.

I'm trying to figure out how best to connect the dots from the output of your method to those steps, or if some other approach is better.

Any advice would be appreciated!
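
To make the UMI step concrete, here is a simplified stand-in for collapsing UMIs that are within 1 nt mismatch of each other. It is a greedy Hamming-distance merge written for illustration, not Starcode's algorithm, and `collapse_umis` is a hypothetical name. In practice such a collapse would presumably be applied separately to the reads of each cell and gene.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umis):
    """Greedily merge each UMI into a more abundant UMI within 1 mismatch."""
    counts = Counter(umis)
    collapsed = {}
    # Visit UMIs from most to least abundant so rare variants fold into common ones.
    for umi, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in collapsed:
            if len(rep) == len(umi) and hamming(rep, umi) <= 1:
                collapsed[rep] += n
                break
        else:
            collapsed[umi] = n
    return collapsed


umis = ["AAAAAAAAAA"] * 3 + ["AAAAAAAAAT"] + ["CCCCCCCCCC"] * 2
result = collapse_umis(umis)
print(result)  # {'AAAAAAAAAA': 4, 'CCCCCCCCCC': 2}
```

The number of keys in the result is the molecule count that would go into one entry of a cell x gene matrix.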

Questions about UMI and Barcode Position

Hi,
Thank you for this good work.

I am testing your latest version, SPLiT-Seq_demultiplexing0.1.1.
As I understand it from the SPLiT-seq protocol, Read 1 and Read 2 of the FASTQ files are laid out as follows:

Read 1 (66 nt) = transcript
Read 2 (94 nt) = UMI + BC3 + spacer + BC2 + spacer + BC1
where the UMI starts at nt 1, BC3 starts at nt 11, BC2 starts at nt 48, and BC1 starts at nt 86.

However, when I look at the merged FASTQ output of SPLiT-Seq_demultiplexing0.1.1 (using the small test FASTQ files that ship with the software), the UMI is always the first 10 nt of Read 1, not the first 10 nt of Read 2.

For example, for this Read 1 and Read 2 pair:

Read1:

@SRR6750041.1 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6

Read2:

@SRR6750041.1 1/2
NNTACTAAAGGCTAACGAGTGGCCGCTGTTTCGCATCGGCGTACGACTATTGAGGAATCCACGTGCTTGAGAGGCCAGAGCATTCGAACGCTTA
+
##AAAEEEEEEEEEEEEEE/6E/EE6/EAE<66666AEAAAE66<<</<<EEEEEEEEAAAAE666666AAE<<EEEEEEEE<AEAEEEEEEE/

SPLiT-Seq_demultiplexing0.1.1 produces the following output:

@SRR6750041.1_AGCATTCGAACGCTTAATTGAGGAATCCAGCTAACGAGTGGCC_CTGGANAAGT 1/1
CTGGANAAGTGAAATAATATAAATTTTTCCACTATTGAATAAAAGCAACTTAAATTTTCTAAGTCG
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEA<AAEEEEE<6

In the header, the trailing _CTGGANAAGT is the UMI, which is the first 10 nt of Read 1, not Read 2.

Could you please tell me whether this is correct, or whether I have misunderstood?

I hope I have explained this clearly.
Thank you in advance for your reply.
With best wishes,
Duma
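
For reference, the layout described in this question can be checked against the quoted Read 2 with a short sketch. It assumes 8-nt barcodes and treats the barcode starts as 0-based offsets 10, 48 and 86, which is the reading that reproduces the barcode fragments visible in the output header above. Both choices are assumptions, and `split_barcode_read` is a hypothetical helper, not part of the tool.

```python
def split_barcode_read(seq, starts=(10, 48, 86), bc_len=8, umi_len=10):
    """Slice a barcode read laid out as UMI + BC3 + spacer + BC2 + spacer + BC1."""
    umi = seq[:umi_len]
    bc3, bc2, bc1 = (seq[s:s + bc_len] for s in starts)
    return {"umi": umi, "bc3": bc3, "bc2": bc2, "bc1": bc1}


# Read 2 of SRR6750041.1 as quoted in the question.
read2 = (
    "NNTACTAAAGGCTAACGAGTGGCCGCTGTTTCGCATCGGCGTACGACTATTGAGGAATCC"
    "ACGTGCTTGAGAGGCCAGAGCATTCGAACGCTTA"
)
fields = split_barcode_read(read2)
print(fields)
# {'umi': 'NNTACTAAAG', 'bc3': 'GCTAACGA', 'bc2': 'ATTGAGGA', 'bc1': 'AACGCTTA'}
```

Under these assumptions the UMI taken from Read 2 would be NNTACTAAAG, whereas the header shown above carries CTGGANAAGT from Read 1, which is the discrepancy the question raises.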
