
kgzaker / smart-seq3


This project forked from sandberg-lab/smart-seq3


Code and analysis pipeline for Smart-seq3 (Hagemann-Jensen et al. 2020).

License: GNU General Public License v3.0

Python 77.88% R 22.12%


Smart-seq3

This repository contains the scripts and pipelines used to process and analyse Smart-seq3 libraries, as described in Hagemann-Jensen et al. 2020. https://doi.org/10.1038/s41587-020-0497-0

Here we provide code for the following steps, each of which is expanded upon in its dedicated sub-folder.

1) Processing of Smart-seq3 data with zUMIs.

We show how zUMIs efficiently processes fastq files into BAM files, simultaneously distinguishing 5' reads from internal reads and error-correcting both cell barcodes and molecular barcodes (UMIs).
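To make the idea of barcode error correction concrete, here is a minimal Python sketch that snaps an observed barcode onto a known whitelist when exactly one whitelist entry lies within a small Hamming distance. This is a generic illustration, not zUMIs's actual algorithm; the function names `hamming` and `correct_barcode`, the example whitelist and the default distance of 1 are hypothetical choices for this example.

```python
from __future__ import annotations

from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, whitelist: set[str], max_dist: int = 1) -> Optional[str]:
    """Return the single whitelist barcode within max_dist of `observed`.

    Exact matches are returned as-is. If no entry, or more than one entry,
    lies within max_dist, the barcode is left uncorrected (None).
    """
    if observed in whitelist:
        return observed
    hits = [
        bc
        for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) <= max_dist
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8 bp whitelist, for illustration only.
whitelist = {"AAAACCCC", "GGGGTTTT"}
print(correct_barcode("AAAACCCA", whitelist))  # one mismatch -> AAAACCCC
print(correct_barcode("TTTTTTTT", whitelist))  # too distant -> None
```

Returning `None` for ambiguous matches is a conservative choice: a barcode equally close to two whitelist entries is left uncorrected rather than guessed.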

First, obtain raw fastq files without demultiplexing, as the data will be processed in a pooled fashion. When running the bcl2fastq conversion, be sure to keep the index read fastq files.

Example for a dual-index, 150 bp paired-end run:

    bcl2fastq --use-bases-mask Y150N,I8,I8,Y150N --no-lane-splitting --create-fastq-for-index-reads -R /mnt/storage1/NextSeqNAS/191011_NB502120_0154_AHVG7JBGXB
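The `--use-bases-mask` string tells bcl2fastq how to split the sequencing cycles into reads. The sketch below, a hypothetical helper named `parse_bases_mask`, expands such a mask into per-segment cycle counts. It assumes the usual convention that `Y` marks read cycles, `I` index cycles and `N` ignored cycles, and that a letter without a number stands for a single cycle; under that reading, `Y150N` means 151 sequenced cycles of which 150 are written out. The `*` wildcard is not handled here.

```python
from __future__ import annotations

import re

# One read segment is a run of letter[count] tokens, e.g. "Y150N" or "I8".
_SEGMENT = re.compile(r"(?:[YIN]\d*)+", re.IGNORECASE)
_TOKEN = re.compile(r"([YIN])(\d*)", re.IGNORECASE)


def parse_bases_mask(mask: str) -> list[tuple[str, int, int]]:
    """Expand a bases mask into (kind, used_cycles, total_cycles) per segment.

    kind is 'Y' for a read, 'I' for an index read (or 'N' if the whole
    segment is ignored). A letter without a count is taken as one cycle.
    Assumes one kind of non-N letter per segment; '*' is not supported.
    """
    segments = []
    for segment in mask.split(","):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"unsupported mask segment: {segment!r}")
        kind, used, total = "N", 0, 0
        for letter, count in _TOKEN.findall(segment):
            cycles = int(count) if count else 1
            total += cycles
            if letter.upper() != "N":
                kind = letter.upper()
                used += cycles
        segments.append((kind, used, total))
    return segments


for kind, used, total in parse_bases_mask("Y150N,I8,I8,Y150N"):
    print(f"{kind}: {used} of {total} cycles kept")
```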

Next, prepare your zUMIs config file in YAML format. In Smart-seq3, the UMI sequence must be correctly extracted from the 5' reads. These always appear in the first Illumina read (Read 1) and are recognized by our unique 11 bp tag sequence. Therefore, set the following options:

file1:
    name: /mnt/storage2/temp_workdir/Undetermined_S0_L003_R1_001.fastq.gz
    base_definition:
      - cDNA(23-150)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG

You can find an example YAML file here.
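The settings above can be read as follows: a Read 1 that begins with the 11 bp tag is a 5' read, with the UMI at bases 12 to 19 and cDNA from base 23 onwards (bases 20 to 22 are skipped). The Python sketch below mimics that split for a single read sequence. It assumes an exact tag match at the very start of the read and 1-based inclusive positions as in `base_definition`; reads without the tag are treated here as internal reads and returned whole as cDNA. zUMIs's own pattern matching may differ, and `parse_read1` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional

TAG = "ATTGCGCAATG"  # 11 bp tag from find_pattern


def parse_read1(
    seq: str,
    tag: str = TAG,
    umi: tuple[int, int] = (12, 19),
    cdna: tuple[int, int] = (23, 150),
) -> tuple[str, Optional[str], str]:
    """Classify a Read 1 sequence and split it per the base_definition above.

    Ranges are 1-based and inclusive, so UMI(12-19) becomes the Python
    slice [11:19]. Reads without the tag at their start are treated as
    internal reads in this sketch: no UMI, whole read kept as cDNA.
    """
    if seq.startswith(tag):
        umi_seq = seq[umi[0] - 1 : umi[1]]
        cdna_seq = seq[cdna[0] - 1 : cdna[1]]
        return "5prime", umi_seq, cdna_seq
    return "internal", None, seq


# Synthetic example: tag + 8 bp UMI + 3 skipped bases + 128 bp of cDNA.
read = TAG + "ACGTACGT" + "GGG" + "T" * 128
kind, umi_seq, cdna_seq = parse_read1(read)
print(kind, umi_seq, len(cdna_seq))  # 5prime ACGTACGT 128
```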

Note that we advise caution when using STAR's 2-pass mapping mode, as we have observed it using some spurious novel splice junctions that may distort molecule reconstructions.

2) Scripts to reconstruct RNA molecules from the zUMIs-prepared BAM files.

Using our Python script stitcher.py, we reconstruct RNA molecules in silico from the read-pair alignments in the zUMIs-generated BAM files. Note that RNA reconstruction requires paired-end sequencing data. This step produces a new BAM file in which each entry is a reconstructed molecule.

https://github.com/AntonJMLarsson/stitcher.py/tree/57330b5af97a338d914b4504121a5d018eb2c3d5
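Conceptually, molecule reconstruction starts by grouping read pairs that share a cell barcode, UMI and gene, then combining their alignments. The following Python sketch shows only that grouping and interval-merging step on plain tuples; it is not stitcher.py's implementation, which works on BAM records and also resolves sequence and splice information. The `AlignedRead` record, its fields and the half-open coordinate convention are hypothetical, chosen for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical, simplified stand-in for one aligned, UMI-tagged read."""
    cell: str  # corrected cell barcode
    umi: str  # corrected UMI
    gene: str  # assigned gene
    blocks: tuple[tuple[int, int], ...]  # aligned reference blocks, half-open


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def stitch(reads: list[AlignedRead]) -> dict[tuple[str, str, str], list[tuple[int, int]]]:
    """Group reads by (cell, UMI, gene) and merge their aligned blocks.

    Gaps left between merged blocks correspond to introns or to regions
    that no read in the group covered.
    """
    groups: dict[tuple[str, str, str], list[tuple[int, int]]] = defaultdict(list)
    for read in reads:
        groups[(read.cell, read.umi, read.gene)].extend(read.blocks)
    return {key: merge_intervals(blocks) for key, blocks in groups.items()}


reads = [
    AlignedRead("C1", "U1", "G1", ((100, 150), (300, 350))),
    AlignedRead("C1", "U1", "G1", ((140, 200),)),
    AlignedRead("C1", "U2", "G1", ((500, 550),)),
]
for key, blocks in stitch(reads).items():
    print(key, blocks)
```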

3) Scripts to assign reconstructed RNA molecules to allelic origins.

We provide a stand-alone R script that assigns molecules to their allele of origin.

https://github.com/sandberg-lab/Smart-seq3/tree/master/allele_level_expression
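One simple way to assign a molecule to an allele is to compare its bases at known heterozygous SNP positions against the two alleles and require that all informative SNPs agree. The Python sketch below does exactly that; it is a hypothetical illustration, not the logic of the R script linked above, and the function name `assign_allele`, the `min_snps` threshold and the `ref`/`alt` labels are assumptions.

```python
from __future__ import annotations


def assign_allele(
    observed: dict[int, str],
    snps: dict[int, tuple[str, str]],
    min_snps: int = 1,
) -> str:
    """Assign a molecule to 'ref', 'alt', 'conflicting' or 'unassigned'.

    observed maps reference positions to the base the molecule shows there;
    snps maps SNP positions to (ref_base, alt_base). Positions the molecule
    does not cover, or bases matching neither allele, are ignored.
    """
    ref_votes = alt_votes = 0
    for pos, (ref, alt) in snps.items():
        base = observed.get(pos)
        if base == ref:
            ref_votes += 1
        elif base == alt:
            alt_votes += 1
    if ref_votes + alt_votes < min_snps:
        return "unassigned"
    if ref_votes and alt_votes:
        return "conflicting"
    return "ref" if ref_votes else "alt"


# Hypothetical SNPs and molecule, for illustration only.
snps = {1000: ("A", "G"), 1050: ("C", "T")}
print(assign_allele({1000: "A", 1050: "C"}, snps))  # ref
```

Flagging molecules whose SNPs disagree as `conflicting`, rather than taking a majority vote, keeps sequencing errors from silently flipping an assignment.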

4) Scripts to assign reconstructed RNA molecules to transcript isoforms.

Using a small set of Python scripts, we assign each RNA molecule to the set of isoforms it is compatible with (which may be a single, unique isoform). The resulting assignments are reported in tab-delimited text files.

https://github.com/sandberg-lab/Smart-seq3/tree/master/ss3iso
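A molecule is compatible with an isoform when its observed structure could have come from that isoform. The Python sketch below uses one simple rule: every aligned block must fall inside an exon of the isoform, and every observed splice junction must be one of the isoform's introns. This rule, the half-open coordinates, the toy isoforms and the function names are assumptions for illustration; the ss3iso scripts may use different or additional criteria.

```python
from __future__ import annotations


def introns(exons: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Introns between consecutive exons, as (exon_end, next_exon_start)."""
    ordered = sorted(exons)
    return {(prev[1], nxt[0]) for prev, nxt in zip(ordered, ordered[1:])}


def is_compatible(
    blocks: list[tuple[int, int]],
    junctions: list[tuple[int, int]],
    exons: list[tuple[int, int]],
) -> bool:
    """True if every block lies inside one exon and every junction is an intron."""
    inside = all(
        any(s <= b_start and b_end <= e for s, e in exons) for b_start, b_end in blocks
    )
    return inside and set(junctions) <= introns(exons)


def compatible_isoforms(
    blocks: list[tuple[int, int]],
    junctions: list[tuple[int, int]],
    isoforms: dict[str, list[tuple[int, int]]],
) -> set[str]:
    """Names of all isoforms the molecule is compatible with."""
    return {name for name, exons in isoforms.items() if is_compatible(blocks, junctions, exons)}


# Hypothetical isoforms: tx2 skips the middle exon of tx1.
isoforms = {
    "tx1": [(100, 200), (300, 400), (500, 600)],
    "tx2": [(100, 200), (500, 600)],
}
print(compatible_isoforms([(150, 200), (300, 350)], [(200, 300)], isoforms))  # {'tx1'}
print(sorted(compatible_isoforms([(120, 180)], [], isoforms)))  # ['tx1', 'tx2']
```

A molecule compatible with exactly one isoform corresponds to a unique assignment; larger sets indicate that the molecule cannot distinguish between those isoforms.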

5) Notebooks.

Here we post R and Python Jupyter notebooks showing the analysis workflows for selected analyses from Hagemann-Jensen et al. 2020.

