Seqkit

Seqkit is a suite of software utilities for manipulating and analyzing common genome sequencing data types (FASTA/FASTQ, SAM/BAM). Seqkit is written in Rust and uses rust-htslib for reading and writing BAM files. It is divided into two utilities, fasta and sam, each of which provides several subcommands. For a complete listing, run the command name without arguments in your shell.

Features

For FASTA/FASTQ files, Seqkit can:

  • Convert between FASTA, FASTQ and raw sequence-per-line formats
  • Extract sample barcodes and UMIs from multiplexed FASTQ sequencing data
  • Demultiplex FASTQ sequencing data
  • Trim FASTQ sequencing data by per-base quality (BASEQ) values
  • Mask low quality bases in FASTQ sequencing data
  • Replace FASTQ read identifiers with compact numeric IDs
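
To make the quality-based features above concrete, here is a minimal Python sketch of masking and tail-trimming by base quality. It is not Seqkit's implementation: the function names, the Phred+33 encoding, and the "trim from the 3' end only" rule are assumptions made for illustration.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer base qualities (assumed encoding)."""
    return [ord(c) - 33 for c in qual]


def mask_low_quality(seq, qual, min_baseq):
    """Replace every base whose quality is below min_baseq with 'N'."""
    return "".join(
        base if q >= min_baseq else "N"
        for base, q in zip(seq, phred33(qual))
    )


def trim_tail(seq, qual, min_baseq):
    """Drop trailing bases until the last remaining base has quality >= min_baseq."""
    quals = phred33(qual)
    end = len(quals)
    while end > 0 and quals[end - 1] < min_baseq:
        end -= 1
    return seq[:end], qual[:end]
```

Masking keeps the read length unchanged, which preserves alignment coordinates, whereas trimming shortens the read and its quality string together.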

For BAM files, Seqkit can:

  • Extract reads from name-sorted or position-sorted BAM files
  • Calculate a histogram of fragment lengths
  • Calculate statistics about unaligned, aligned and duplicate-flagged reads
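
As an illustration of what a fragment length histogram involves, the following Python sketch counts template lengths from SAM-formatted text lines. It is a simplified stand-in, not Seqkit's code: the function name is hypothetical, and it assumes that the TLEN column (the ninth SAM field) is populated and that counting only positive values is enough to count each pair once.

```python
from collections import Counter


def fragment_length_histogram(sam_lines):
    """Count fragment lengths using the TLEN field of SAM text records."""
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        tlen = int(fields[8])  # TLEN is the 9th mandatory SAM field
        # The leftmost mate carries a positive TLEN and the rightmost a
        # negative one, so keeping positive values counts each pair once.
        if tlen > 0:
            hist[tlen] += 1
    return hist
```

Unaligned reads have a TLEN of 0 and are therefore ignored by this sketch.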

Installation

Install Rust (version 1.31 or later). Then run the following command:

cargo install --force --git https://github.com/annalam/seqkit

Examples

Extracting reads

Extract reads from a name-sorted or position-sorted BAM file called tumor.bam. The second parameter specifies the prefix used for the output file names. Paired-end reads are written in gzip-compressed FASTQ format to tumor_1.fq.gz and tumor_2.fq.gz, and orphan reads are written to tumor.fq.gz. The output files are created automatically.

sam to fastq tumor.bam tumor
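
The split into paired and orphan output can be pictured with the Python sketch below. It is an assumption-laden illustration rather than Seqkit's actual logic: the function name and the (name, flag, sequence) tuple layout are hypothetical, all reads are buffered in memory so that sort order does not matter, and reverse-complementing of reverse-strand reads is omitted. Only the SAM FLAG bit values are taken from the SAM specification.

```python
from collections import defaultdict

FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template
NOT_PRIMARY = 0x900    # secondary (0x100) or supplementary (0x800) alignment


def split_pairs_and_orphans(reads):
    """Group (name, flag, seq) records into mate pairs and orphan reads."""
    mates = defaultdict(dict)
    for name, flag, seq in reads:
        if flag & NOT_PRIMARY:
            continue  # keep one record per read
        if flag & FIRST_IN_PAIR:
            mates[name][1] = seq
        elif flag & SECOND_IN_PAIR:
            mates[name][2] = seq
        else:
            mates[name][0] = seq  # read was never part of a pair

    paired, orphans = [], []
    for name, found in mates.items():
        if 1 in found and 2 in found:
            paired.append((name, found[1], found[2]))
        else:
            orphans.extend((name, seq) for seq in found.values())
    return paired, orphans
```

In this picture, the `paired` list corresponds to the two mate files and the `orphans` list to the single orphan file.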

Demultiplexing a pooled sequencing run

Extract UMIs and demultiplex Illumina sequencing data where both the sample barcode and UMI are stored in the adapter:

fasta demultiplex sample_sheet.tsv \
  <(fasta add barcode multiplexed_R1.fq.gz multiplexed_I1.fq.gz) \
  <(fasta add barcode multiplexed_R2.fq.gz multiplexed_I1.fq.gz)
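
Conceptually, demultiplexing pairs each read with its index read, looks the sample barcode up in the sample sheet, and carries the UMI along with the read. The Python sketch below shows that idea only. The layout of the index read (sample barcode first, UMI after it), the `barcode_len` parameter, the `name:UMI` naming scheme, and the `unassigned` bucket are all hypothetical and are not taken from Seqkit.

```python
def demultiplex(reads, index_reads, barcode_to_sample, barcode_len):
    """Assign reads to samples using the barcode found in the matching index read.

    reads and index_reads are parallel sequences of (name, sequence) tuples.
    Assumed layout: the first barcode_len bases of each index read are the
    sample barcode and the remaining bases are the UMI.
    """
    by_sample = {}
    for (name, seq), (_, index_seq) in zip(reads, index_reads):
        barcode, umi = index_seq[:barcode_len], index_seq[barcode_len:]
        sample = barcode_to_sample.get(barcode, "unassigned")
        # Keep the UMI with the read by appending it to the read name.
        by_sample.setdefault(sample, []).append((f"{name}:{umi}", seq))
    return by_sample
```

A real implementation would usually also tolerate sequencing errors in the barcode, for example by allowing one mismatch; this sketch requires an exact match.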

Contributors

annalam, mgvel, nurmians
