umi-tools's Introduction

UMI-tools was published in Genome Research on 18 Jan '17 (open access)

For full documentation see https://umi-tools.readthedocs.io/en/latest/

Tools for dealing with Unique Molecular Identifiers

This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes. Currently there are 6 commands.

The extract and whitelist commands are used to prepare a fastq file containing UMIs, with or without cell barcodes, for alignment.

  • whitelist:
    Builds a whitelist of the 'real' cell barcodes
    This is useful for droplet-based single cell RNA-Seq where the identity of the true cell barcodes is unknown. The whitelist can then be used to filter reads with extract (see below).
  • extract:
    Flexible removal of UMI sequences from fastq reads.
    UMIs are removed and appended to the read name. Any other barcode, for example a library barcode, is left on the read. Can also filter reads by quality or against a whitelist (see above).
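To make the behaviour of extract concrete, here is a minimal Python sketch of moving a UMI from the start of a read into the read name. It is an illustration only, not the UMI-tools implementation: the function name `extract_umi` is hypothetical, the pattern convention (`N` marks a UMI base, `X` marks a base left on the read) and the underscore separator are assumptions modelled loosely on the description above, and cell barcodes, quality filtering and whitelists are not handled.

```python
def extract_umi(name, seq, qual, pattern):
    """Move UMI bases from the 5' end of a read into the read name.

    Hypothetical helper for illustration. In `pattern`, 'N' marks a UMI
    base (removed from the read) and 'X' marks a base that stays on the
    read, e.g. part of a library barcode.
    """
    if len(seq) < len(pattern) or len(seq) != len(qual):
        raise ValueError("read is shorter than the pattern or seq/qual differ in length")

    umi, kept_seq, kept_qual = [], [], []
    for base, q, code in zip(seq, qual, pattern):
        if code == "N":
            umi.append(base)          # UMI base: drop from the read
        elif code == "X":
            kept_seq.append(base)     # other barcode base: keep on the read
            kept_qual.append(q)
        else:
            raise ValueError("unsupported pattern character: " + code)

    # Everything past the pattern is untouched read sequence.
    new_seq = "".join(kept_seq) + seq[len(pattern):]
    new_qual = "".join(kept_qual) + qual[len(pattern):]

    # Append the UMI to the identifier, keeping any trailing comment.
    identifier, _, comment = name.partition(" ")
    new_name = identifier + "_" + "".join(umi)
    if comment:
        new_name += " " + comment
    return new_name, new_seq, new_qual


record = extract_umi("@read1 1:N:0", "ACGTTTGGCC", "IIIIIIIIII", "NNNNXX")
print(record)
```

Carrying the UMI in the read name means it survives alignment, so the downstream commands can recover it from the aligned reads.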

The remaining commands, group, dedup and count/count_tab, are used to identify PCR duplicates using the UMIs and perform different levels of analysis depending on the needs of the user. Several UMI deduplication schemes are available; the recommended method is directional.

  • dedup:
    Groups PCR duplicates and deduplicates reads to yield one read per group
    Use this when you want to remove the PCR duplicates prior to any downstream analysis
  • group:
    Groups PCR duplicates using the same methods available through `dedup`.
    This is useful when you want to manually interrogate the PCR duplicates
  • count:
    Groups and deduplicates PCR duplicates and counts the unique molecules per gene
    Use this when you want to obtain a matrix with unique molecules per gene, per cell, for scRNA-Seq.
  • count_tab:
    As per count except input is a flatfile
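As a rough picture of what a per-gene, per-cell count represents, the sketch below counts distinct UMIs for each (cell, gene) pair. It is a deliberately simplified assumption-laden example: the function `count_molecules` and the `(cell, gene, umi)` record layout are hypothetical, and it treats every distinct UMI as a separate molecule, whereas the count command also groups similar UMIs to correct for sequencing errors.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct UMIs per (cell, gene) pair.

    `records` is an iterable of (cell_barcode, gene, umi) tuples, a
    hypothetical layout chosen for this example. No UMI error
    correction is applied here.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in records:
        umis_seen[(cell, gene)].add(umi)   # repeated UMIs collapse into one
    return {key: len(umis) for key, umis in umis_seen.items()}


records = [
    ("CELL1", "GeneA", "ACGT"),
    ("CELL1", "GeneA", "ACGT"),   # PCR duplicate of the read above
    ("CELL1", "GeneA", "TTGA"),
    ("CELL2", "GeneA", "ACGT"),
    ("CELL2", "GeneB", "CCCC"),
]
counts = count_molecules(records)
print(counts)
```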

See QUICK_START.md for a quick tutorial on the most common usage pattern.

If you want to use UMI-tools in single-cell RNA-Seq data processing, see Single_cell_tutorial.md

Important update: We now recommend the use of alevin for droplet-based scRNA-Seq (e.g. 10X, inDrop etc.). alevin is an accurate, fast and convenient end-to-end tool to go from fastq -> count matrix. It extends the UMI error correction in UMI-tools within a framework that also enables quantification of droplet scRNA-Seq without discarding multi-mapped reads. See the alevin documentation and the alevin pre-print for more information.

The dedup, group, and count / count_tab commands make use of network-based methods to resolve similar UMIs with the same alignment coordinates. For a background regarding these methods see:

Genome Research Publication

Blog post discussing network-based methods.
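For readers who want a feel for the network-based idea before following the links, here is a minimal Python sketch of a directional-style grouping for UMIs that share one alignment position. It is a sketch under stated assumptions rather than the UMI-tools code: the function names are hypothetical, all UMIs are assumed to have the same length, and the edge rule used (connect UMI a to UMI b when they differ by at most one base and count(a) >= 2 * count(b) - 1) is the rule as described in the publication.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts, threshold=1):
    """Group UMIs observed at one position into inferred molecules.

    `umi_counts` maps UMI sequence -> read count. Hypothetical helper
    for illustration; returns a list of groups, each led by its most
    abundant UMI.
    """
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    umis = sorted(umi_counts, key=lambda u: (-umi_counts[u], u))

    # Directed edge a -> b: b is plausibly an error-derived copy of a.
    edges = {u: [] for u in umis}
    for a in umis:
        for b in umis:
            if (a != b and hamming(a, b) <= threshold
                    and umi_counts[a] >= 2 * umi_counts[b] - 1):
                edges[a].append(b)

    # Breadth-first search from each unassigned UMI collects its group.
    seen, groups = set(), []
    for start in umis:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [start], deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    group.append(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


example = {"AAAA": 100, "AAAT": 10, "AATT": 2, "CCCC": 50, "CCCG": 40}
groups = directional_groups(example)
print(groups)
```

In this example the low-count UMIs AAAT and AATT are absorbed into the AAAA group, while CCCC and CCCG stay separate because their counts are too similar for one to be explained as an error of the other.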

Installation

If you're using Conda, you can use:

$ conda install -c bioconda -c conda-forge umi_tools

Or pip:

$ pip install umi_tools

Or if you'd like to work directly from the git repository:

$ git clone https://github.com/CGATOxford/UMI-tools.git

Enter the repository and run:

$ python setup.py install

For more detail see INSTALL.rst

Help

For full documentation see https://umi-tools.readthedocs.io/en/latest/

See QUICK_START.md and Single_cell_tutorial.md for tutorials on the most common usage patterns.

To get help on umi_tools, run:

$ umi_tools --help

To get help on the options for a specific [COMMAND], run:

$ umi_tools [COMMAND] --help

Dependencies

umi_tools depends on python>=3.5, numpy, pandas, scipy, cython, pysam, future, regex and matplotlib.
