rtvt123 / 2013-khmer-counting

This project forked from dib-lab/2013-khmer-counting


Paper code: "These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure"

Home Page: http://arxiv.org/abs/1309.2975


Running the khmer paper script pipeline

Date: June 9, 2014

Here are some brief notes on how to run the pipeline for our 2013 khmer counting paper on an Amazon EC2 rental instance. Using these commands you should be able to completely recapitulate the paper.

The instructions below will reproduce all of the figures in the paper, and will then compile the paper from scratch using the new figures.

Starting up a machine and getting the data needed for reproduction

First, start up an EC2 instance using starcluster:

starcluster start -o -s 1 -i m2.2xlarge -n ami-999d49f0 pipeline

You can also do this via the AWS console; just use ami-999d49f0 and start an instance with 30 GB or more of memory.

Make sure that port 22 (SSH) and port 80 (HTTP) are open in the instance's security group; you'll need the first to log in, and the second to connect to the IPython notebook.

Now, log in:

starcluster sshmaster pipeline

(Or just SSH in however you normally would.)

First, change to /mnt, because the home directory does not have enough space:

cd /mnt

Now, check out the source repository and grab the initial data sets:

git clone https://github.com/ngs-docs/ngs-scripts

git clone https://github.com/ged-lab/2013-khmer-counting.git
cd 2013-khmer-counting

curl -O http://public.ged.msu.edu.s3.amazonaws.com/2013-khmer-counting/2013-khmer-counting-data.tar.gz

tar xzf 2013-khmer-counting-data.tar.gz
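An interrupted download of the data tarball could surface later as confusing pipeline failures. As a precaution, you could list the archive's contents before extracting it. The helper below is a minimal sketch, not part of the repository (`extract_if_valid` is a hypothetical name): it only runs `tar xzf` when `tar tzf` can read the whole gzip stream.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz only if tar can list it end to end, so a
# truncated curl download fails loudly instead of half-extracting.
# ASSUMPTION: extract_if_valid is a hypothetical helper, not from the repo.
extract_if_valid() {
  local archive="$1"
  if tar tzf "$archive" >/dev/null 2>&1; then
    tar xzf "$archive"
  else
    echo "corrupt or incomplete archive: $archive (re-run curl)" >&2
    return 1
  fi
}
```

To use it, run `extract_if_valid 2013-khmer-counting-data.tar.gz` in place of the plain `tar xzf` command; if it reports a corrupt archive, delete the file and repeat the `curl` download.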

Installing necessary software

Before we get started, we need to install all of the necessary software, including khmer:

  • Tallymer
  • Jellyfish
  • DSK
  • KMC
  • BFCount
  • Turtle
  • QUAST
  • FASTX-toolkit
  • seqtk
  • IPython
  • LaTeX
  • Velvet
  • Java
  • screed
  • khmer

To do so, run:

cd /mnt/2013-khmer-counting/pipeline
bash software_install.sh

OK, now all your software is installed, hurrah!
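A failure while installing one tool can easily scroll past unnoticed. A quick sanity check is to confirm that the expected executables are on your PATH. This is a sketch under assumptions: `check_tools` is a hypothetical helper, and it assumes `software_install.sh` leaves these tools on PATH under the names shown (`jellyfish`, `velveth`, `velvetg`, `seqtk`, `java`, `ipython`, `pdflatex`). The other packages in the list above are left out because their executable names and install locations aren't stated here.

```shell
#!/usr/bin/env bash
# Sketch: print each command that cannot be found on PATH and return
# the number of missing commands (0 means everything was found).
# ASSUMPTION: the executable names passed below are illustrative; the
# install script may place some tools elsewhere or name them differently.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Executables for a few of the packages listed above (names assumed).
check_tools jellyfish velveth velvetg seqtk java ipython pdflatex \
  || echo "some tools were not found; check the software_install.sh output" >&2
```

Extend the argument list with whatever executables your install actually provides; a non-zero exit status means at least one was not found.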

Running the pipeline

Now go into the pipeline directory and run the pipeline. This will take a few hours, so you might want to run it inside 'screen' (see "Running long jobs on UNIX"):

cd /mnt/2013-khmer-counting/pipeline
make KHMER=/usr/local/src/khmer

Once it successfully completes, copy the data over to the ../data/ directory:

make copydata

Run the ipython notebook server:

cd ../notebook
ipython notebook --no-browser --ip='*' --port=80 &

Connect to the IPython notebook (it will be running at 'http://<your EC2 hostname>'); if the command above succeeded but you can't connect, you probably forgot to open port 80 in your EC2 security group.
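To tell a server problem apart from a firewall problem, you can check from the EC2 instance itself whether anything answers on port 80. The sketch below (`wait_for_http` is a hypothetical helper, not part of the repository) polls a URL with `curl` a few times. If it succeeds locally but your browser still can't connect, the security group is the likely culprit.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until something answers or the attempts run out.
# ASSUMPTION: wait_for_http is a hypothetical helper. Any HTTP response,
# even an error page, counts as "listening", because curl without
# --fail only returns non-zero for connection/transport errors.
wait_for_http() {
  local url="$1" tries="${2:-10}" i
  for ((i = 1; i <= tries; i++)); do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep 1   # give the server a moment before retrying
    fi
  done
  echo "no response from $url after $tries attempt(s)" >&2
  return 1
}
```

For example, run `wait_for_http http://localhost:80/ 10` on the instance right after starting the notebook server.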

Once connected, open the 'khmer-counting' notebook (it should be the only one in the list). Then go to the 'Cell' menu and select 'Run all'.

Now go back to the command line and execute:

cd ../
make

and voila, 'khmer-counting.pdf' will contain the paper with the figures you just created.
