
shi7's Introduction

Prerequisites

  1. Python 2.7+
  2. Java

Installation

The CONDA way (personal install), recommended for Linux or Windows WSL

  1. Follow steps 1 and 2 of https://bioconda.github.io/ (including installing Miniconda 3.6 if you don't already have Miniconda)
  2. Do this in a terminal:
conda create -n shi7 -c knights-lab shi7
source activate shi7

Alternative portable/server install:

Grab the latest release package (not the source), extract it, then add the extracted folder to your PATH. You should then be able to execute shi7.py on the command line.

Confused or new to UNIX? Or something not working right? Try following the super-specific installation instructions for the portable install below:

For our purposes, we will install and use SHI7 in an interactive shell on our supercomputer, MSI, started like so: isub -n nodes=1:ppn=16 -m 22GB -w 12:00:00. If you are installing on your own computer, skip this: just open a (bash) terminal and, optionally, cd into the directory where you want to install SHI7. You'll want to update the URLs below to match the latest version on the release page!

  1. Have Python (you will most likely have this already!) and, importantly (at least on macOS), the Java SDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html (If you are installing Java for the first time, you MUST REBOOT for it to be recognized.)

  2. Download and unpack the latest release (if on macOS, replace the word "linux" with "mac" near the end of the wget link!):

wget https://github.com/knights-lab/shi7/releases/download/v0.9.9/shi7_0.9.9_linux_release.zip
unzip shi7_*_release.zip
chmod +x shi7_*_linux_release/*
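  3. Add the SHI7 binaries to your PATH so they can be called from anywhere on the command line. The escaped \$PATH keeps the variable literal in the file, so it is expanded each time a shell starts rather than frozen at its current value. On Linux:
echo "export PATH=\"$PWD/shi7_0.9.9_linux_release:\$PATH\"" >> ~/.bashrc

On Mac:

echo "export PATH=\"$PWD/shi7_0.9.9_mac_release:\$PATH\"" >> ~/.bash_profile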
  4. Reload your terminal environment (source whichever file you edited in the previous step) and test shi7.py:
. ~/.bashrc        # Linux
. ~/.bash_profile  # Mac
shi7.py -h

At this point you should see the help screen printed out and SHI7 should be installed.
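
If the help screen does not appear, it can help to confirm that each prerequisite is visible on your PATH. The sketch below is one way to do that. Note that `check_tool` is a hypothetical helper name, not part of SHI7, and the tool names are taken from the steps above (for a conda install, check for `shi7` rather than `shi7.py`).

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of SHI7): it reports whether
# a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Prerequisites and the SHI7 entry point named in the steps above.
for tool in python java shi7.py; do
    check_tool "$tool" || true   # keep going so every tool is reported
done
```

Any tool reported as missing points to the step that needs revisiting: a missing `shi7.py` usually means the PATH line was not added or the rc file was not reloaded.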

Using SHI7 (simplest method):

Note: if you installed via the CONDA method, omit the ".py" extension at the end of shi7 and shi7_learning

  1. To run interactively on a supercomputer like MSI (skip this step otherwise):
isub -n nodes=1:ppn=16 -m 22GB -w 12:00:00
module load python
  2. Learn the appropriate shi7 parameters from the raw (unzipped) fastq data (replace 'myFastqFolder' with your actual data directory):
shi7_learning.py -i myFastqFolder -o learnt
  3. Run shi7.py with the output of step 2: chmod +x learnt/shi7_cmd.sh && ./learnt/shi7_cmd.sh (or just copy the command that shi7_learning prints out when it finishes and run that)
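
The learning step above expects unzipped fastq files, so gzipped reads need decompressing first. Here is a minimal sketch, assuming the compressed files end in `.fastq.gz` and sit directly in the input folder. `unzip_fastqs` is a hypothetical helper, not part of SHI7, and the folder names are placeholders.

```shell
#!/bin/sh
# unzip_fastqs is a hypothetical helper (not part of SHI7).
# It decompresses every *.fastq.gz in a source folder into a separate
# destination folder, leaving the original .gz files untouched.
unzip_fastqs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for gz in "$src"/*.fastq.gz; do
        [ -e "$gz" ] || continue              # no matches: the glob stays literal
        base=$(basename "$gz" .gz)            # foo.fastq.gz -> foo.fastq
        gzip -dc "$gz" > "$dest/$base"
    done
}

# Example (placeholder folder names):
# unzip_fastqs myZippedFolder myFastqFolder
```

Writing to a separate folder keeps the input directory passed to shi7_learning free of compressed files.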

Other usage examples:

Assuming you have a bunch of fastq files, of forward and reverse reads, split up by sample, that have Nextera adaptors:

shi7.py -i MyFastQFolder -o MyOutputFolder --adaptor Nextera
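
Before running a paired-end command like the one above, it may be worth checking that every forward file has a reverse mate. The sketch below assumes file names ending in `_R1.fastq` and `_R2.fastq`. `list_unpaired` is a hypothetical helper, not part of SHI7.

```shell
#!/bin/sh
# list_unpaired is a hypothetical helper (not part of SHI7).
# It prints each *_R1.fastq file in a folder that has no matching *_R2.fastq.
list_unpaired() {
    dir=$1
    for r1 in "$dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue              # folder has no R1 files
        r2="${r1%_R1.fastq}_R2.fastq"         # expected name of the mate
        [ -e "$r2" ] || echo "$r1"
    done
}

# Example: list_unpaired MyFastQFolder
```

Empty output means every R1 file found has an R2 partner under this naming assumption.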

Assuming you only have R1 reads (no paired end):

shi7.py -i MyFastQFolder -o MyOutputFolder --adaptor Nextera -SE

For HMP V4 16S data, you can set the minimum read length to 285 and the maximum to 300 when stitching, which is the canonical HMP V4 16S primer coverage region. This can be a powerful QC step in and of itself. Note: if using the EMP V4 protocol, omit these length arguments.

If you have shotgun sequences, you might want to try not stitching. We recommend trying stitching first and checking how many reads stitch (see "percent combined" in the shi7.log file). To turn stitching off:

--flash False

Including --drop_r2 True in this case returns only R1 reads.
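
To look up the stitching rate mentioned above, a plain text search of the log is usually enough. This is a sketch only: `show_percent_combined` is a hypothetical helper, not part of SHI7, and the log location (shi7.log inside the output folder) is an assumption.

```shell
#!/bin/sh
# show_percent_combined is a hypothetical helper (not part of SHI7).
# It prints every line of a log file that mentions "percent combined",
# ignoring case, or a short notice if no such line exists.
show_percent_combined() {
    log=$1
    grep -i "percent combined" "$log" || echo "no 'percent combined' line in $log"
}

# Example (assumes shi7.log was written to the output folder):
# show_percent_combined MyOutputFolder/shi7.log
```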

We recommend the following format for sequence file names:

sampleID_other_information_R1.fastq
sampleID_other_information_R2.fastq

Then, stripping each file name at the first underscore (--strip_delim _,1 in the command below) will return processed reads labeled with just the sampleID, simplifying downstream processing. For example, an efficient command for non-stitching shotgun sequences with the sample names in the filenames before the first underscore character:

shi7.py -i MyFastQFolder -o MyOutputFolder --adaptor Nextera --flash False --strip_delim _,1 --drop_r2 True
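
To preview which sample IDs the recommended naming scheme yields, the text before the first underscore can be extracted with shell parameter expansion. This only illustrates the naming convention and is not how SHI7 itself is implemented. `sample_id` is a hypothetical helper.

```shell
#!/bin/sh
# sample_id is a hypothetical helper (not part of SHI7). It keeps only the
# part of a file name that comes before the first underscore.
sample_id() {
    name=$(basename "$1")     # drop any leading directories
    echo "${name%%_*}"        # remove the longest suffix starting at "_"
}

sample_id MyFastQFolder/sampleID_other_information_R1.fastq   # prints: sampleID
```

If two different samples share the same text before the first underscore, their IDs would collide, so it is worth listing the IDs for your files before running the pipeline.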

Cite

To cite SHI7: Please cite the article published here: http://msystems.asm.org/content/3/3/e00202-17

Al-Ghalith, Gabriel A., Benjamin Hillmann, Kaiwei Ang, Robin Shields-Cutler, and Dan Knights. 2018. “SHI7 Is a Self-Learning Pipeline for Multipurpose Short-Read DNA Quality Control.” mSystems 3 (3). American Society for Microbiology Journals: e00202–17.
