
dat-ecosystem-archive / anacapa-container


A containerized way to run the Anacapa eDNA processing toolkit on your own machine or server [ DEPRECATED - More info on active projects and modules at https://dat-ecosystem.org/ ]

R 10.76% Shell 69.76% Dockerfile 19.47%

anacapa-container's Introduction

deprecated

More info on active projects and modules at dat-ecosystem.org


Anacapa Container

Instructions For Running The Anacapa Toolkit in a Singularity container using Linux or a Vagrant Virtualbox for Mac/Windows.

Written by Emily Curd and Max Ogden.

Requirements

  • Linux (Recommended) or Windows/Mac via Virtualbox (Slower but works)
  • Around 6GB of disk space

Overview

The following guide shows how to download and run the Anacapa toolkit and process a small test dataset. To ensure reproducibility, we recommend verifying that your results on the test dataset match the included expected output data from our verified runs. Once you complete these steps, you can continue on to load your own data for custom analysis.

Linux Instructions

For Mac/Windows instructions, first see the Vagrant section below.

We recommend using Linux to run Anacapa Container, as running it on Windows or Mac involves virtualization which imposes performance and resource limitations, potentially making analysis slower depending on the size of your data.

1. Install Singularity

If you intend to run this on a shared university cluster (one where you are not a system administrator and cannot use 'sudo' or install new system packages yourself), you will need to ask your system administrators to install Singularity. Please share this admin guide with them: http://singularity.lbl.gov/admin-guide

Otherwise, if you are using your own server, you can follow the admin guide yourself to install Singularity.

This guide was tested with Singularity version 2.5.2.
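Because the guide was only tested with 2.5.2, it can help to check which version you actually have before going further. The following is a minimal sketch, not part of Anacapa; the function name check_singularity_version is hypothetical. It assumes 2.x prints a string such as "2.5.2-dist" and that newer releases print "singularity version X.Y.Z".

```shell
#!/bin/bash
# Hypothetical helper (not part of Anacapa): warn when the installed
# Singularity differs from the version this guide was tested with.
TESTED_VERSION="2.5.2"

check_singularity_version() {
  local raw v
  if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity is not on your PATH" >&2
    return 2
  fi
  raw="$(singularity --version)"
  v="${raw##* }"   # keep the last word, e.g. "singularity version 3.8.7" -> "3.8.7"
  v="${v%%-*}"     # drop a suffix such as "-dist": "2.5.2-dist" -> "2.5.2"
  if [ "$v" = "$TESTED_VERSION" ]; then
    echo "Singularity $v matches the tested version"
  else
    echo "warning: found Singularity $v, guide was tested with $TESTED_VERSION" >&2
    return 1
  fi
}

# Usage:
# check_singularity_version
```

A mismatch is only a warning: other versions may work, but their default bind behavior and command output can differ from what this guide shows.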

2. Download the container and test data

Download the Linux container dataset from Zenodo (Mirror). You can run wget <url of download link> to download it directly from the command line, and then tar xf downloaded-file.tar.gz to extract it.
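The download-then-extract step can be wrapped in a small loop if you have more than one link to fetch. This is a sketch under assumptions: fetch_and_extract is a hypothetical name, the URL below is the same placeholder as above, and stripping a "?download=1"-style query string is only needed if your link has one.

```shell
#!/bin/bash
# Hypothetical helper: download each URL with wget, then extract it with
# tar if the file name ends in .tar.gz or .tgz. Replace the placeholder
# URL(s) with the real download link(s).
fetch_and_extract() {
  local url name
  for url in "$@"; do
    name="${url##*/}"     # last path component of the URL
    name="${name%%\?*}"   # strip a query string such as "?download=1"
    wget -O "$name" "$url" || return 1
    case "$name" in
      *.tar.gz|*.tgz) tar xf "$name" || return 1 ;;
    esac
  done
}

# Usage (placeholder URL, not a real link):
# fetch_and_extract "<url of download link>"
```

It stops at the first failed download or extraction, so a partial archive is not silently used.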

You should now see 3 files.

anacapa-1.5.0.img

This is the Singularity container with all necessary software dependencies (Python, R, Perl, Bash) you will use with Singularity to execute the Anacapa toolkit. This image was created using the Containerfile in this repository.

anacapa_db.tar.gz

This is a copy of the Anacapa toolkit packaged with a full copy of all CRUX primer types, and a small 12S test sequence. The extracted anacapa directory will contain the following:

  • Anacapa_db (the toolkit itself)
  • 12S_test_data
  • Anacapa_test_data_expected_output_after_QC_dada2
  • Anacapa_test_data_expected_output_after_classifier
  • Crux_test_expected_output

crux_db.tar.gz

(Optional) A partial copy of the CRUX repository modified for use in Vagrant and a small 16S test dataset. If developing your own reference libraries, you will need to extract this and then download additional files. Follow the instructions in the CRUX GitHub readme if so.
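To confirm the extraction produced the three files described above, a quick check like the following may help. It is a hypothetical helper (check_download is not part of Anacapa); since crux_db.tar.gz is optional, a warning about that file alone can be ignored unless you plan to build reference libraries.

```shell
#!/bin/bash
# Hypothetical helper: report which of the three files described above
# are missing from a folder (default: the current folder).
check_download() {
  local dir="${1:-.}" f missing=0
  for f in anacapa-1.5.0.img anacapa_db.tar.gz crux_db.tar.gz; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_download /path/to/extracted/folder && echo "all three files present"
```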

3. Test the container

Try this command to enter the container:

singularity shell anacapa-1.5.0.img

You should see something like this:

$ singularity shell anacapa-1.5.0.img
Singularity: Invoking an interactive shell within container...

Singularity anacapa-1.5.0.img:~> 

Any commands you type in the Singularity shell will happen inside the container. Type exit to go back to your normal shell.
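The interactive shell is one way in; singularity exec runs a single command inside the container and returns its exit status, which is handy for scripted checks. The sketch below is hypothetical: it looks for interpreters inside the image, and the executable names (R, Rscript, perl, python) are assumptions; the image may name Python differently (for example python3).

```shell
#!/bin/bash
# Hypothetical non-interactive check: run a short script inside the
# container (via singularity exec) that looks for the interpreters
# listed above. Tool names are assumptions; adjust them as needed.
check_container_tools() {
  local img="$1"
  singularity exec "$img" /bin/bash -c '
    status=0
    for tool in R Rscript perl python; do
      if ! command -v "$tool" >/dev/null 2>&1; then
        echo "missing inside container: $tool"
        status=1
      fi
    done
    exit "$status"'
}

# Usage:
# check_container_tools anacapa-1.5.0.img && echo "all tools found"
```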

Accessing other folders

For Singularity version 2.5.2, only $HOME, /tmp, /proc, /sys, and /dev are automatically shared from the host filesystem into the container. If the scripts inside the container need direct access to any other folder, add that folder with a -B argument to your singularity commands, for example -B /home/anacapa.

To test if it worked, you can go back into the singularity shell:

singularity shell -B /home/anacapa anacapa-1.5.0.img
Singularity anacapa-1.5.0.img:~> ls /home/anacapa
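When several folders need to be shared, one -B argument per folder works. A small sketch of building those arguments from a list, skipping folders that do not exist on the host, might look like this; build_bind_args is a hypothetical name, and the mapfile usage needs bash 4 or later.

```shell
#!/bin/bash
# Hypothetical helper: build one "-B <dir>" pair per host folder so that
# each folder is visible inside the container at the same path.
build_bind_args() {
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "warning: $dir does not exist on the host, skipping" >&2
      continue
    fi
    printf '%s\n' "-B" "$dir"
  done
}

# Usage (placeholder paths):
# mapfile -t BIND < <(build_bind_args /home/anacapa /data/runs)
# singularity shell "${BIND[@]}" anacapa-1.5.0.img
```

Keeping the arguments in an array preserves folder names that contain spaces.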

4. Run the Anacapa QC example

This script runs the Anacapa QC pipeline with the included 12S_test_data. Save this as a new file called run-anacapa-qc.sh, and then edit the variables to point to your extracted anacapa folders and other paths it requires.

#!/bin/bash
# EDIT THESE
BASEDIR="/vagrant" # change to folder you want shared into container (DB, DATA and OUT should be inside it)
CONTAINER="/vagrant/anacapa-container/anacapa-1.5.0.img" # change to full container .img path
DB="/vagrant/Anacapa/Anacapa_db" # change to full path to Anacapa_db
DATA="$DB/12S_test_data" # change to input data folder (default 12S_test_data inside Anacapa_db)
OUT="$BASEDIR/12S_time_test" # change to output data folder

# OPTIONAL
FORWARD="$DB/12S_test_data/forward.txt"
REVERSE="$DB/12S_test_data/reverse.txt"

cd "$BASEDIR" || exit 1

# If you need additional folders shared into the container, add additional -B arguments below
# -l runs the pipeline locally; without it the script runs in HPC mode and expects a username (-u)
time singularity exec -B "$BASEDIR" "$CONTAINER" /bin/bash -c "$DB/anacapa_QC_dada2.sh -i $DATA -o $OUT -d $DB -f $FORWARD -r $REVERSE -e $DB/metabarcode_loci_min_merge_length.txt -a truseq -t MiSeq -l -g"
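Because only BASEDIR (plus the default folders) is shared into the container, a path outside it will not be visible to the pipeline. A hypothetical pre-flight check (not part of Anacapa) could be added to the script before the singularity line. It assumes absolute paths and is deliberately stricter than Singularity itself, since $HOME and /tmp are also shared by default.

```shell
#!/bin/bash
# Hypothetical pre-flight check: confirm that each path exists and lies
# inside BASEDIR, the only folder the script above binds with -B.
# Assumes absolute paths.
check_paths_inside() {
  local base="${1%/}" p status=0
  shift
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "does not exist: $p" >&2
      status=1
    elif [[ "$p" != "$base" && "$p" != "$base"/* ]]; then
      echo "outside $base, add another -B for it: $p" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, just before the "time singularity exec" line in run-anacapa-qc.sh
# (OUT is left out because it may not exist yet):
# check_paths_inside "$BASEDIR" "$DB" "$DATA" "$FORWARD" "$REVERSE" || exit 1
```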

The expected results can be found in anacapa/Anacapa_test_data_expected_output_after_QC_dada2
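As a rough first comparison against the expected output, you can check that the same files were produced. This hypothetical helper compares relative file paths only, not file contents, so matching listings do not prove identical results; you may also need to point it at a matching subfolder of your output.

```shell
#!/bin/bash
# Hypothetical helper: compare the list of file paths (not their contents)
# in your output folder with an expected-output folder. Prints a diff of
# the two listings and returns 0 only if they match.
compare_layout() {
  local got="$1" expected="$2"
  diff <(cd "$got" && find . -type f | LC_ALL=C sort) \
       <(cd "$expected" && find . -type f | LC_ALL=C sort)
}

# Usage (adjust both paths to your layout):
# compare_layout "$OUT" anacapa/Anacapa_test_data_expected_output_after_QC_dada2
```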

Approximate time to run:

real	0m45.906s
user	0m43.568s
sys	0m1.396s

If using Slurm or qsub, an example job file is available in the jobs/ folder of this repository.

5. Run the Anacapa Classifier example

This script runs the Anacapa Classifier pipeline on the output of the QC pipeline. It is similar to the first script. Save it as "run-anacapa-classifier.sh", edit, and run:

#!/bin/bash
# EDIT THESE
BASEDIR="/vagrant" # change to folder you want shared into container
CONTAINER="/vagrant/anacapa-container/anacapa-1.5.0.img" # change to full container .img path
DB="/vagrant/Anacapa/Anacapa_db" # change to full path to Anacapa_db
OUT="$BASEDIR/12S_time_test" # change to output data folder (same OUT as in run-anacapa-qc.sh)

cd "$BASEDIR" || exit 1

# If you need additional folders shared into the container, add additional -B arguments below
time singularity exec -B "$BASEDIR" "$CONTAINER" /bin/bash -c "$DB/anacapa_classifier.sh -o $OUT -d $DB -l"

The expected results can be found in anacapa/Anacapa_test_data_expected_output_after_classifier

Approximate time to run:

real	0m19.467s
user	0m13.384s
sys	0m1.480s

If using Slurm or qsub, an example job file is available in the jobs/ folder of this repository.
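Since the classifier works on the QC output, the two scripts can be chained so the classifier only starts when QC exits successfully. This is a hypothetical wrapper; it only helps if the Anacapa scripts exit non-zero on failure, which this guide does not state.

```shell
#!/bin/bash
# Hypothetical wrapper: run each script in order and stop at the first
# one that exits non-zero.
run_in_order() {
  local script
  for script in "$@"; do
    echo "== running $script"
    if ! bash "$script"; then
      echo "== $script failed; skipping the remaining steps" >&2
      return 1
    fi
  done
  echo "== all steps finished"
}

# Usage, after editing both scripts for your paths:
# run_in_order ./run-anacapa-qc.sh ./run-anacapa-classifier.sh
```

Each script runs in its own bash process, so its cd into BASEDIR does not change your current folder.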

6. (Optional) Run the CRUX example

Note: To run CRUX on Linux, you'll need to download a large amount of external genomic and taxonomy data. Please see the CRUX readme for more info.

Note: For the following example, you may need to change the exact paths to match the paths on your local machine. The paths shown are Vagrant paths, used as an example.

$ singularity exec /vagrant/anacapa-1.5.0.img /bin/bash /vagrant/anacapa/crux_db/crux.sh -n 16S_example -f GTGYCAGCMGCCGCGGTAA -r GGACTACNVGGGTWTCTAAT -s 200 -m 450 -o ~/anacapa/crux_db/16S_example -d ~/anacapa/crux_db/ -l -a 1 -v 0.001 -e 5 -q

The expected results can be found in ~/anacapa/Crux_test_expected_output
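The example command mixes /vagrant and ~ paths. A variable-based version, in the same style as the QC and classifier scripts, may make those easier to adapt. This sketch assumes crux_db lives at $BASEDIR/anacapa/crux_db and binds BASEDIR with -B because /vagrant is not one of the folders shared by default; the flags and primer values are copied unchanged from the example.

```shell
#!/bin/bash
# Hypothetical variable-based version of the CRUX example above.
# Assumes crux_db lives at $BASEDIR/anacapa/crux_db; edit to match your layout.
BASEDIR="/vagrant"                          # folder shared into the container
CONTAINER="$BASEDIR/anacapa-1.5.0.img"      # full path to the container .img
CRUX_DB="$BASEDIR/anacapa/crux_db"          # full path to crux_db
NAME="16S_example"                          # name for this reference library

run_crux_example() {
  # Same flags and primer values as the example command; only the paths
  # now come from the variables above.
  singularity exec -B "$BASEDIR" "$CONTAINER" /bin/bash "$CRUX_DB/crux.sh" \
    -n "$NAME" -f GTGYCAGCMGCCGCGGTAA -r GGACTACNVGGGTWTCTAAT \
    -s 200 -m 450 -o "$CRUX_DB/$NAME" -d "$CRUX_DB/" \
    -l -a 1 -v 0.001 -e 5 -q
}

# Usage:
# run_crux_example
```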

Vagrant

Instructions For Mac/Windows Users

These instructions let you run the Anacapa Toolkit on macOS or Windows. Running Anacapa inside VirtualBox may reduce the speed and performance of the toolkit. We recommend giving the Vagrant VirtualBox machine at least 10 GB of memory; if it has less than 5 GB of free memory, some steps may fail.
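To see how much memory the virtual machine actually has free, a small check can be run inside it after vagrant ssh. This hypothetical sketch reads MemAvailable from /proc/meminfo (reported in kB; the field exists on Linux kernels 3.14 and later) and warns below the 5 GB threshold mentioned above.

```shell
#!/bin/bash
# Hypothetical check to run inside the VM after "vagrant ssh": read
# MemAvailable from /proc/meminfo (in kB) and warn below 5 GB.
# A different meminfo file can be passed as $1 for testing.
check_free_memory() {
  local meminfo="${1:-/proc/meminfo}" min_kb=$((5 * 1024 * 1024)) avail_kb
  avail_kb="$(awk '/^MemAvailable:/ {print $2}' "$meminfo")"
  if [ -z "$avail_kb" ]; then
    echo "could not read MemAvailable from $meminfo" >&2
    return 2
  fi
  if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "only $((avail_kb / 1024)) MB available; some steps may fail" >&2
    return 1
  fi
  echo "$((avail_kb / 1024)) MB available"
}

# Usage:
# check_free_memory
```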

  1. Install the Singularity Vagrant VirtualBox Linux machine

Follow the Vagrant/Singularity installation instructions for your operating system (these pages are also backed up as files in this repository in case the links break).

  2. Start a new instance and log in

After successfully installing the Singularity Vagrant VirtualBox, start a new session by opening a terminal (Mac) or Git Bash (Windows).

As described in the installation instructions above, you should already have created a folder (e.g. singularity-vm) and run the command vagrant init singularityware/singularity-2.4 inside it. These two steps only need to be done once, as setup. However, each time you want to use the Vagrant machine, you need to run the following three commands:

$ cd singularity-vm/
$ vagrant up
$ vagrant ssh

The final command, vagrant ssh, will open a terminal into the virtual machine where you have the singularity command available.

  3. Download the Anacapa Vagrant container

Download the Anacapa Vagrant Container (Mac/Windows) (Mirror). Extract the downloaded .tar.gz file; you should see anacapa-1.5.0.img and a copy of both Anacapa and CRUX configured for use in Vagrant. Move the crux_db folder inside the Anacapa_db folder, then move anacapa-1.5.0.img and Anacapa_db into the singularity-vm folder.
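The two moves described above can be done from the host terminal. This is a hypothetical sketch: the folder names crux_db, Anacapa_db and anacapa-1.5.0.img follow this README, but the extracted archive may use different names (one user reported a folder called anacapa_db_vagrant), so check them before running.

```shell
#!/bin/bash
# Hypothetical helper for the step above. SRC is the folder you extracted
# the Vagrant download into; VM is your singularity-vm folder. Folder and
# file names follow this README; check them against the archive first.
arrange_vagrant_files() {
  local src="$1" vm="$2"
  # 1) put crux_db inside Anacapa_db
  mv "$src/crux_db" "$src/Anacapa_db/" || return 1
  # 2) move the image and Anacapa_db into the synced singularity-vm folder
  mv "$src/anacapa-1.5.0.img" "$src/Anacapa_db" "$vm/"
}

# Usage (example paths):
# arrange_vagrant_files ~/Downloads/anacapa_vagrant ~/singularity-vm
```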

  4. Access the downloaded data from inside Vagrant

Vagrant will automatically sync files in the folder singularity-vm to and from the guest vagrant machine running the container. This means you can edit the files inside singularity-vm with your Mac or Windows text editor and the container will have access to them immediately.

By default, Vagrant shares your project directory to the /vagrant directory in your guest machine.

You should have already placed all of the downloaded folders inside the singularity-vm folder. Double check they are available in the vagrant shell:

vagrant@vagrant:/vagrant$ ls /vagrant

You should see the contents of the singularity-vm folder.

  5. Follow the test instructions

At this point please follow the Linux instructions above, starting at number 3 "Test the container", using your vagrant ssh session to enter the commands.

anacapa-container's People

Contributors

julianpistorius, max-mapper, ninabreznik


anacapa-container's Issues

Readme Vague

Hey Max, great container. We're trying to get this up and running on our new server in the Barber lab. We were a little stumped by the copy-URL instructions for downloading from Zenodo, and ended up copying the link for each of the three datasets. Is there a URL to download all three at once? It was not clear. Thanks

No username given

Hello,

I am trying to run this container in Hoffman2. I've downloaded all the files and the singularity seems to be working fine. The issue is when I try to run the first script, the Anacapa QC step. I've followed the instructions and edited the paths. But when I run the script using sh run-anacapa-qc.sh, I get the following warning and no output:

INFO:    Converting SIF file to temporary sandbox...
/u/local/Modules/4.7.0/gcc-4.8.5/init/bash: line 37: /usr/bin/tclsh: No such file or directory
/u/local/Modules/4.7.0/gcc-4.8.5/init/bash: line 37: /usr/bin/tclsh: No such file or directory

Running in HPC mode
No username given...

INFO:    Cleaning up image...

real    0m16.936s
user    0m14.808s
sys     0m13.375s

I've edited the script to have my Hoffman2 username:

time singularity exec --userns -B $BASEDIR $CONTAINER /bin/bash -c $DB/anacapa_QC_dada2.sh -i $DATA -o $OUT -u rturba -d $DB -f $FORWARD -r $REVERSE -e $DB/metabarcode_loci_min_merge_length.txt -a truseq -t MiSeq -g

But I still get the same warning. I don't know what is happening.

Installing Vagrant on macOS Big Sur

I just wanted to document some issues I ran into when following the installation instructions (on my personal MacBook).

  1. Make sure to run everything as root

After you open the shell, type sudo -i and enter your password.

see https://stackoverflow.com/a/31711279/1183277

  2. If you follow the instructions to install Vagrant at https://singularity.lbl.gov/install-mac

note that

brew cask install virtualbox
brew cask install vagrant
brew cask install vagrant-manager

must now be typed as

brew install --cask virtualbox
brew install --cask vagrant
brew install --cask vagrant-manager

see https://stackoverflow.com/a/66081492

  3. When you run vagrant init singularityware/singularity-2.4, it will fail

You need to go to System Preferences > Security & Privacy. Some Oracle item will be listed there; you need to allow it and restart.

(However, I think you must let it fail first for the option to appear, but I'm not sure.)

see hashicorp/vagrant#12049 (comment)

After troubleshooting the steps above, I managed to install Vagrant. However, see also #6 (correct path to image).

ERROR : Image path doesn't exists

I have followed the instructions to install the Vagrant virtual machine on macOS (Big Sur).

It seems to work. From within Vagrant I get this:

vagrant@vagrant:~$ ls /vagrant
Vagrantfile  anacapa-1.5.0.img  anacapa_db_vagrant

However, loading the image fails

vagrant@vagrant:~$ singularity shell  /usr/local/singularity-vm/anacapa-1.5.0.img 
ERROR  : Image path doesn't exists
ABORT  : Retval = 255

I noted that the instructions say

This guide was tested with Singularity version 2.5.2.

However, when installing Singularity in the VirtualBox following the instructions here https://singularity.lbl.gov/install-mac, Singularity 2.4 is installed, and no more recent version seems to be available.

vagrant@vagrant:~$ singularity --version
2.4-dist

Would you have any pointers?
