Comments (13)

paulranum11 avatar paulranum11 commented on June 12, 2024

Hi geneticsmcgill,

SPLiT-Seq_demultiplexing should be robust to differently positioned SPLiT-Seq barcodes. Barcode position is not a fixed parameter: instead of extracting sequences at a fixed position, SPLiT-Seq_demultiplexing searches for sequence matches corresponding to each SPLiT-Seq barcode. Because of this architecture, the barcode sequence and its flanking sequence (but not its position) affect barcode 1 identification. Have you confirmed that the barcodes you are using in read1 match the barcode and flanking sequences in the Round1_barcodes_new5.txt file?
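
To illustrate what position-independent matching means, here is a minimal Python sketch. It is not the code used by SPLiT-Seq_demultiplexing: the function name `find_barcode` and the two barcode-plus-flank sequences are made up for the example.

```python
def find_barcode(read, barcodes):
    """Return (name, start) for the first barcode found anywhere in the read, else None."""
    for name, seq in barcodes.items():
        pos = read.find(seq)  # search the whole read, not a fixed window
        if pos != -1:
            return name, pos
    return None

# Hypothetical barcode-plus-flank sequences, for illustration only.
barcodes = {"bc_A": "AACGTGAT", "bc_B": "CGCTGATC"}

print(find_barcode("NNNNAACGTGATNNNN", barcodes))     # ('bc_A', 4)
print(find_barcode("NNNNNNNNNAACGTGATNN", barcodes))  # ('bc_A', 9)
print(find_barcode("NNNNNNNNNNNN", barcodes))         # None
```

The same barcode is identified at offset 4 and at offset 9, so a shift in position alone would not cause a failure. A mismatch in the barcode or flanking sequence would.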

Another consideration is the use of oligo-dT and random hexamer RT primers. The --collapse setting needs to be set to true or false depending on your configuration and on whether you want to collapse reads obtained from barcodes in the same round 1 well. Unwanted collapsing of barcodes could result in the loss of half of the expected round 1 barcodes.
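
As a rough illustration of why collapsing halves the number of round 1 barcodes, here is a small Python sketch. The well layout, the barcode IDs, and the function `round1_label` are assumptions made for the example and do not reflect how the tool implements --collapse internally.

```python
# Hypothetical layout: each round 1 well holds one oligo-dT barcode and
# one random hexamer barcode. The IDs below are placeholders.
WELL_OF = {
    "dT_1": "well_1", "hex_1": "well_1",
    "dT_2": "well_2", "hex_2": "well_2",
}

def round1_label(barcode_id, collapse):
    """Return the label under which reads from this barcode are grouped."""
    return WELL_OF[barcode_id] if collapse else barcode_id

observed = ["dT_1", "hex_1", "dT_2", "hex_2"]
print(len({round1_label(b, collapse=False) for b in observed}))  # 4
print(len({round1_label(b, collapse=True) for b in observed}))   # 2
```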

I hope this helps. Let me know if you have further questions. If you can provide an example of your read2.fastq file and your expected barcode configuration, I may be able to look further into the issue.

-Paul

from split-seq_demultiplexing.

paulranum11 avatar paulranum11 commented on June 12, 2024

You may find this conversation from a previous issue helpful when checking that the sequences in the Round1_barcodes_new5.txt match the sequences you used in your experiment.

#3

geneticsmcgill avatar geneticsmcgill commented on June 12, 2024

paulranum11 avatar paulranum11 commented on June 12, 2024

Hi geneticsmcgill,

Could you also point me to the SPLiT-Seq V3 documentation that you based your library prep on?

Thanks,
Paul

paulranum11 avatar paulranum11 commented on June 12, 2024

Also, I was unable to see any attached files. If you comment directly from GitHub you should be able to add attachments.

geneticsmcgill avatar geneticsmcgill commented on June 12, 2024

Hi Paul,

Sorry about that! Let me know if this is any better. Appreciate the help. Here are the files:

reads.zip
SPLiTseqV3.0_OligonucleotideSequences (1).xlsx
SPLiT-seq Protocol V3.0 (4).pdf

paulranum11 avatar paulranum11 commented on June 12, 2024

Hi Geneticsmcgill,

I took a look at your reads. The reads in your passing file contain the predicted amplicon structure, with both the heterogeneous barcode-containing positions and the static connecting sequences (see attached image). However, the majority of reads in the failing file bear little resemblance to the predicted SPLiT-Seq amplicon sequence (see attached image). The static intervening sequence between barcodes 2 and 3 is detected in only one of these reads. So, from a bioinformatic perspective, they are correctly failed.
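
A quick way to run this kind of check yourself is to test each read for the static connecting sequences. The Python sketch below is only an illustration: `static_sequences_found` is a made-up helper, and the two sequences are placeholders that you would replace with the static sequences from your own oligonucleotide sheet.

```python
def static_sequences_found(read, static_seqs):
    """Return the names of the static connecting sequences present in the read."""
    return [name for name, seq in static_seqs.items() if seq in read]

# Placeholder sequences, NOT the real SPLiT-Seq linkers.
static_seqs = {"linker_1_2": "GGTTCCAA", "linker_2_3": "CCAAGGTT"}

reads = [
    "ACGTGGTTCCAAACGTCCAAGGTTACGT",  # both static sequences present
    "ACGTACGTACGTACGTACGTACGTACGT",  # neither present
]
for read in reads:
    print(static_sequences_found(read, static_seqs))
```

Reads that return an empty list lack the expected amplicon structure and would be expected to fail demultiplexing.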

There could be several explanations for the presence of these reads in your data including:

  1. Poor sequencing quality.
  2. Nextera XT Adapter ligation of non-split-seq fragments.
  3. Amplification of non-split-seq-barcoded fragments.
  4. Amplification of tagmented products using Nextera XT primers instead of the SPLiT-Seq specific primers which control the start position of barcode sequencing.

It is a very complex workflow so there are many places you could troubleshoot.

This probably isn't what you wanted to hear, but I hope this helps.

-Paul

Screen Shot 2020-11-16 at 4 44 39 PM

paulranum11 avatar paulranum11 commented on June 12, 2024

You also may have a concatamer issue. About 25% of the reads in the 1000.failed.sam.read2.fastq file contain all or part of a repeating sequence, TGATACCACTGCTTCCCATTCACTCTGCGT. The reads below are composed almost exclusively of this repeating sequence.

GCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGATACCACTGCTTCCCATTCACTCTGCGT
GCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGCTACCACTGCTTCCCATTCACTCTGCGT
GCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGATACCACTGCTTCCCATTCACTCTGCGTTGATACCACTGCTTCCCATTCACTCTGCGT
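
If you want to estimate this fraction on your own files, something like the following Python sketch would do it. It is one possible approach, not the method used for the figure above: it assumes an uncompressed FASTQ with four lines per record, and it flags a read when it shares any 15-base stretch with the repeat unit (the 15-base threshold and all function names are arbitrary choices for the example).

```python
import io

REPEAT_UNIT = "TGATACCACTGCTTCCCATTCACTCTGCGT"

def fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def repeat_kmers(unit, k):
    """All k-mers of the tandemly repeated unit, including those spanning a junction."""
    doubled = unit + unit
    return {doubled[i:i + k] for i in range(len(unit))}

def contains_repeat(seq, kmers, k):
    """True if any k-base window of the read matches a k-mer of the repeat."""
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))

def repeat_fraction(handle, unit=REPEAT_UNIT, k=15):
    """Fraction of reads that contain all or part of the repeat unit."""
    kmers = repeat_kmers(unit, k)
    total = flagged = 0
    for seq in fastq_sequences(handle):
        total += 1
        flagged += contains_repeat(seq, kmers, k)
    return flagged / total if total else 0.0

# In-memory demo: one concatamer-like read and one unrelated read.
demo = io.StringIO(
    "@r1\n" + "GCGT" + REPEAT_UNIT * 2 + "\n+\n" + "I" * 64 + "\n"
    "@r2\n" + "ACGT" * 10 + "\n+\n" + "I" * 40 + "\n"
)
print(repeat_fraction(demo))  # 0.5
```

To run it on a real file, pass an open file handle instead of the in-memory demo, for example the handle returned by `open("1000.failed.sam.read2.fastq")`.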

geneticsmcgill avatar geneticsmcgill commented on June 12, 2024

paulranum11 avatar paulranum11 commented on June 12, 2024

Amplification of the tagmented products should be performed with "BC_0118" (AATGATACGGCGACCACCGAGATCTACACTAGATCGCTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG) and one of "BC_0076" through "BC_0083"; see the first tab of the Excel document you sent.

In item 4 of my comment above, I meant that using the primers that come with the Nextera XT kit instead of the SPLiT-Seq primers can cause problems, because the Nextera XT primers randomly set the read2 start position at the position of the Nextera transposase sequence. In contrast, the SPLiT-Seq BC_0118 primer positions the start of the read2 sequence at the beginning of the UMI, so that it is in the correct position to read through all the SPLiT-Seq barcodes.

geneticsmcgill avatar geneticsmcgill commented on June 12, 2024

Thanks, Paul. It seems to be a concatamer from sequencing primers? I ran FastQC on the failed vs. passed reads. Do you have any insight into why that may be the case, given the high proportion of them in read2? I appreciate the help.

fastqc.zip

paulranum11 avatar paulranum11 commented on June 12, 2024

One factor that can contribute to primer concatamer formation is the availability of template sequence. It may be that you have a lower than ideal amount of SPLiT-Seq barcoded template available for amplification at this stage of the library prep. This could result from non-optimal completion of any of the previous steps (bead purification, template switching, low numbers of nuclei...). One reagent that I would check if I were you is the Template Switching Oligo, as it contains RNA bases and degradation of these bases can impede its function. Is it stored in aliquots at -80? If so, take out a new aliquot. If not, you may want to reorder it.

You may be able to get a sense for the extent of your issue by looking at the data that was successfully generated. Do you have a high number of genes and UMIs for the cells that were successfully identified? Or is it very low? High numbers of reads but low UMI counts would support the idea that you don't have much successfully barcoded template and that you need to troubleshoot your library prep.
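
One simple way to look at this is to compare reads and unique UMIs per cell. The Python sketch below assumes you can extract a (cell barcode, UMI) pair for every assigned read; the function `summarize_cells` and the toy records are hypothetical and not part of the software package.

```python
from collections import defaultdict

def summarize_cells(records):
    """records: iterable of (cell_barcode, umi) pairs, one pair per read."""
    reads = defaultdict(int)
    umis = defaultdict(set)
    for cell, umi in records:
        reads[cell] += 1
        umis[cell].add(umi)
    return {
        cell: {
            "reads": reads[cell],
            "umis": len(umis[cell]),
            "reads_per_umi": reads[cell] / len(umis[cell]),
        }
        for cell in reads
    }

# Toy data: cell_A has many reads but a single UMI, cell_B has one read per UMI.
records = [("cell_A", "AAAA")] * 6 + [
    ("cell_B", "AAAA"), ("cell_B", "CCCC"), ("cell_B", "GGGG"),
]
for cell, stats in summarize_cells(records).items():
    print(cell, stats)
```

A cell like cell_A, with many reads per UMI, is the pattern that would point toward limited barcoded template rather than a demultiplexing problem.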

paulranum11 avatar paulranum11 commented on June 12, 2024

If you have found this advice or the software package useful, please consider starring the repository.

Thanks,
Paul
