


Nullarbor

Pipeline to generate complete public health microbiology reports from sequenced isolates

⚠️ This documents the current Nullarbor 2.x version; previous 1.x is here

Motivation

Public health microbiology labs receive batches of bacterial isolates whenever there is a suspected outbreak. In modernised labs, each of these isolates will be whole genome sequenced, typically on an Illumina or Ion Torrent instrument. Each of these WGS samples needs to be quality checked for coverage, contamination and correct species. Genotyping (eg. MLST) and resistome characterisation are also required. Finally, a phylogenetic tree needs to be generated to show the relationship and genomic distance between the strains. All this information is then combined with epidemiological information (metadata for each sample) to assess the situation and inform further action.

Example reports

Feel free to browse some example reports.

Pipeline

Limitations

Nullarbor currently only supports Illumina paired-end sequencing data; single-end reads, whether from Illumina or Ion Torrent, are not supported. All jobs are run on a single compute node; there is no support yet for distributing the work across a high-performance cluster.

Per isolate

  1. Clean reads
    • remove adaptors, low quality bases and reads (Trimmomatic)
  2. Species identification
    • k-mer analysis against known genome database (Kraken, Kraken2, Centrifuge)
  3. De novo assembly
    • User can select (SKESA, SPAdes, Megahit, shovill, Velvet)
  4. Annotation
    • Add features to the assembly (Prokka)
  5. MLST
    • From assembly w/ automatic scheme detection (mlst + PubMLST)
  6. Resistome
  7. Virulome
  8. Variants
    • From reads aligned to reference (snippy)

Per isolate set

  1. Core genome SNPs
  2. Infer core SNP phylogeny
  3. Pan genome
    • From annotated contigs (Roary)
  4. Report
    • Summary isolate information (HTML + Plotly.JS + DataTables + PhyloCanvas)
    • More detailed per isolate pages (COMING SOON)

Installation

You need to install the software and the databases separately.

Software

Conda

Install Conda or Miniconda:

conda install -c conda-forge -c bioconda -c defaults nullarbor

Homebrew (coming soon)

Install Homebrew (macOS) or LinuxBrew (Linux).

brew install brewsci/bio/nullarbor

Source

This is the hardest way to install Nullarbor.

cd $HOME
git clone https://github.com/tseemann/nullarbor.git

# keep running this command and installing stuff until it says everything is correct
./nullarbor/bin/nullarbor.pl --check

# For Perl modules (eg. YAML::Tiny), use one of the following methods
apt-get install yaml-tiny-perl  # ubuntu/debian
yum install perl-YAML-Tiny      # centos/redhat
cpan YAML::Tiny
cpanm YAML::Tiny

Databases

Kraken

You need to install a Kraken database (~8 GB).

wget https://ccb.jhu.edu/software/kraken/dl/minikraken_20171019_8GB.tgz
tar -C $HOME -zxvf minikraken_20171019_8GB.tgz

Kraken 2

You need to install a Kraken2 database (~8 GB).

wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken2_v2_8GB_201904_UPDATE.tgz
tar -C $HOME -zxvf minikraken2_v2_8GB_201904_UPDATE.tgz

Centrifuge

Install a Centrifuge database (~8 GB):

wget ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed+h+v.tar.gz
mkdir $HOME/centrifuge-db
tar -C $HOME/centrifuge-db -zxvf p_compressed+h+v.tar.gz

Set global database locations

Then add the following to your $HOME/.bashrc so Nullarbor can find the databases:

export KRAKEN_DEFAULT_DB=$HOME/minikraken_20171019_8GB
export KRAKEN2_DEFAULT_DB=$HOME/minikraken2_v2_8GB_201904_UPDATE
export CENTRIFUGE_DEFAULT_DB=$HOME/centrifuge-db/p_compressed+h+v

You should be good to go now. When you first run Nullarbor it will let you know of any missing dependencies or databases.

Usage

Check dependencies

Nullarbor does a self-check of all binaries, Perl modules and databases:

nullarbor.pl --check

Create a 'samples' file (TAB)

This is a file, one line per isolate, with 3 tab-separated columns: ID, R1, R2.

Isolate1	/data/reads/Isolate1_R1.fq.gz	/data/reads/Isolate1_R2.fq.gz
Isolate2	/data/reads/Isolate2_R1.fq      /data/reads/Isolate2_R2.fq
Isolate3	/data/old/s_3_1_sequence.txt	/data/old/s_3_2_sequence.txt
Isolate3b	/data/reads/Isolate3b_R1.fastq	/data/reads/Isolate3b_R2.fastq

Choose a reference genome (FASTA, GENBANK)

This is just a regular FASTA or GENBANK file. Try to choose a reference that is phylogenomically similar to your isolates.
If you use a GENBANK or EMBL file, the annotations will be used by Snippy to annotate SNPs.

Generate the run folder

This command will create a new folder with a Makefile in it:

nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna --input samples.tab --outdir OUTDIR

This will check that everything is okay. One of the last lines it prints is the command you need to run to actually perform the analysis, e.g.

Run the pipeline with: nice make -j 4 -C OUTDIR

So you can just cut and paste that:

nice make -j 4 -C OUTDIR

The -C option just tells make to change into the OUTDIR folder first, so you could do this instead:

cd OUTDIR
make -j 4

View the report

firefox OUTDIR/report/index.html

Here are some example reports.

See some options

Once set up, a Nullarbor folder can be used in a few different ways. See what's available with this command:

make help

Advanced usage

Quick preview mode

You should not do a full run the first time, because the data will probably contain outliers and QC failures. To build a quick "rough" tree:

make preview

This will create a mini-report in the same report/ folder. Use this to identify outliers, then comment them out of (or delete them from) the --input file. Then type the following to regenerate the report for a second round of inspection:

make again
make preview

When you are happy with the result, proceed with the full analysis:

make again
make

Prefilling data

Often you want to perform multiple analyses where some of the isolates have been used in previous Nullarbor runs. It is wasteful to recompute results you already have. The --prefill option allows you to "copy" existing result files into a new Nullarbor folder before commencing the run.

To set it up, add a prefill section to nullarbor.conf as follows:

# nullarbor.conf
prefill:
        contigs.fa: /home/seq/MDU/QC/{ID}/contigs.fa

The {ID} will be replaced with each isolate ID in your --input TAB file, and the contigs.fa will be copied from the source path specified. This prevents Nullarbor from having to re-assemble the reads.
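
For example, with the config above and an isolate named Isolate1, the source path expands to /home/seq/MDU/QC/Isolate1/contigs.fa. If your nullarbor.conf is not the one bundled with the install, a sketch of pointing Nullarbor at it explicitly (placeholder names from the earlier example; you could also set NULLARBOR_CONF):

# hypothetical invocation re-using existing assemblies via prefill
nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna \
  --input samples.tab --conf nullarbor.conf --outdir OUTDIR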

Using different components

Nullarbor 2.x has a plugin system for assembly and tree building. These can be changed using the --assembler and --treebuilder options.

Read trimming is off by default because most reads are now provided pre-trimmed, and re-trimming consumes considerable extra disk space. To trim Illumina adaptors, use the --trim option.
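
A sketch of a run that overrides the defaults, assuming the plugin names match the skesa and iqtree tools in the Dependencies list below (other options as in the earlier example):

# hypothetical example: choose assembler and tree builder, enable adaptor trimming
nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna --input samples.tab --outdir OUTDIR \
  --assembler skesa --treebuilder iqtree --trim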

Removing isolates from an existing run

After examining the report from your initial analysis, it is common to spot some outliers or bad data. In this case you want to remove those isolates from the analysis while minimising the amount of recomputation needed.

Just go to the original --input TAB file and either (1) remove the offending lines, or (2) add a # symbol to "comment out" each line so it will be ignored by Nullarbor.
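
For example, a samples.tab where one outlier (the isolate names and paths are placeholders) has been commented out:

Isolate1	/data/reads/Isolate1_R1.fq.gz	/data/reads/Isolate1_R2.fq.gz
#Isolate2	/data/reads/Isolate2_R1.fq.gz	/data/reads/Isolate2_R2.fq.gz
Isolate3	/data/reads/Isolate3_R1.fq.gz	/data/reads/Isolate3_R2.fq.gz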

Then go back into the Nullarbor folder and type make again; it should build a new report. Assemblies and SNPs won't be redone, but the tree-builder and pan-genome components will need to run again.

Adding isolates to an existing run

As per "Removing isolates" above, you can also add more isolates to your original --input TAB file when you want to expand the analysis. Then just type make again and it should only recalculate what it needs to, saving a lot of computation.

Immediate start

If you don't want to cut and paste the make .... instructions to start the analysis, just add the --run option to your nullarbor.pl command.
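
For example, this sketch (same placeholder options as earlier) creates the run folder and starts the analysis in one step:

nullarbor.pl --name PROJNAME --mlst saureus --ref US300.fna --input samples.tab --outdir OUTDIR --run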

Influential environment variables

  • NULLARBOR_CONF - default --conf, the path to nullarbor.conf
  • NULLARBOR_CPUS - default --cpus
  • NULLARBOR_ASSEMBLER - default --assembler tool
  • NULLARBOR_TREEBUILDER - default --treebuilder tool
  • NULLARBOR_TAXONER - default --taxoner tool
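
A sketch of setting these in $HOME/.bashrc; the values below are illustrative choices, not Nullarbor's defaults:

export NULLARBOR_CONF=$HOME/nullarbor.conf
export NULLARBOR_CPUS=16
export NULLARBOR_ASSEMBLER=skesa
export NULLARBOR_TREEBUILDER=iqtree
export NULLARBOR_TAXONER=kraken2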

Dependencies

Nullarbor has many dependencies, so you are best off using a package manager to install it. Type nullarbor.pl --check to see what you need.

Perl: Bio::Perl Time::Piece List::Util Path::Tiny YAML::Tiny Moo SVG Text::CSV List::MoreUtils IO::File

Tools: seqtk trimmomatic prokka roary mlst abricate seqret skesa megahit spades shovill snippy snp-dists newick-utils iqtree fasttree quicktree kraken kraken2 centrifuge

Databases: minikraken centrifuge-bacvirhum

Note that these are only the immediate dependencies and that the tools listed above will depend on various other tools, Perl modules, and Python modules.

Etymology

The Nullarbor is a huge treeless plain that spans the area between south-west and south-east Australia. It comes from the Latin "nullus" (no) and "arbor" (tree), or "no trees". As this software will generate a tree, there is an element of Australian irony in the name.

Issues

Submit problems to the Issues Page

License

GPL 2.0

Citation

Seemann T, Goncalves da Silva A, Bulach DM, Schultz MB, Kwong JC, Howden BP. Nullarbor Github https://github.com/tseemann/nullarbor

nullarbor's People

Contributors

andersgs, drpowell, tseemann


nullarbor's Issues

--force option sometimes re-performs mapping and SNP calls

I want to run nullarbor on a subset of my samples that have already been run through nullarbor.
For organisational purposes, and so external parties can easily view just that subset, I want to regenerate the nullarbor webpage report; otherwise I would have just run snippy-core.

I created a new project folder, with symlinks to the sample folders in the main nullarbor directory. I created a new samples.tab file with the list of isolates in the subset for analysis and generated a new makefile from this.

However, when running nullarbor from this new project folder with --force, nullarbor repeats the mapping and SNP calling process. Why is this? The reference is unchanged.

skewer taking a long time ...

Is it usual for skewer to take 8.5 minutes to clip reads? (514 seconds for 5,619,342 MiSeq reads using 64 CPUs.) I thought I remembered skewer running much faster than that, e.g. 30 seconds previously.

Installing Nullarbor on Bio-linux 8.0 (Ubuntu 14.10)

I just installed Bio-linux and Nullarbor on a couple of my students' laptops. The nullarbor installation had a few issues, and the issues were the same on each computer (different specs).
Here is what I did to get nullarbor installed on each machine; maybe this can help someone else.

Blast install fails.

brew install blast --without-check
brew install nullarbor

openssl install fails

brew remove curl
brew install curl
brew install nullarbor

librsvg install fails

sudo apt-get install libgtk-3-dev
brew install librsvg
brew install nullarbor

nullarbor install breaks

Dear Group
This issue appears to be very similar to issue #47
I am trying to install nullarbor on an Ubuntu 14.04 machine.
The "brew install nullarbor" command produces compile errors. Examining ~/.cache/Homebrew/Logs/blast/02.make, it appears the Boost headers cannot be found: the gcc command does not seem to list their locations (-I). They are installed on the system (/usr/include/boost) as well as within linuxbrew (brew install boost reports "Warning: boost-1.60.0_1 already installed").
Is there a way to include extra compile paths (i.e. -I) in the brew command?

regards
Simon

make abricate fails

When trying to run a job for the first time, if one only wants abricate results, just running make abricate crashes.

It searches for the clipped reads, and when it doesn't find them, it crashes. For example:

  Makefile:658: recipe for target '2015-22510/R1.fq.gz' failed

nullarbor.pl is not passing --cpus value to Snippy and FastTree

kraken --threads 64 --preload --quick --paired 2012-10754/R1.fq.gz 2012-10754/R2.fq.gz | kraken-report > 2012-10754/kraken.tab
Loading database... complete.
4078498 sequences (1972.19 Mbp) processed in 98.247s (2490.8 Kseq/m, 1204.43 Mbp/m).
  3924156 sequences classified (96.22%)
  154342 sequences unclassified (3.78%)
snippy --force --outdir 2012-10753/2012-10753 --ref ref.fa --R1 2012-10753/R1.fq.gz --R2 2012-10753/R2.fq.gz
[12:50:38] This is snippy 2.6
[12:50:38] Written by Torsten Seemann <[email protected]>
[12:50:38] Obtained from https://github.com/tseemann/snippy
[12:50:38] Detected operating system: linux
[12:50:38] Enabling bundled linux tools.
[12:50:38] Found bwa - /bio/linuxbrew/bin/bwa
[12:50:38] Found samtools - /bio/linuxbrew/bin/samtools
[12:50:38] Found tabix - /bio/linuxbrew/bin/tabix
[12:50:38] Found bgzip - /bio/linuxbrew/bin/bgzip
[12:50:38] Found parallel - /bio/linuxbrew/bin/parallel
[12:50:38] Found freebayes - /bio/linuxbrew/bin/freebayes
[12:50:38] Found freebayes-parallel - /bio/linuxbrew/bin/freebayes-parallel
[12:50:38] Found fasta_generate_regions.py - /bio/linuxbrew/bin/fasta_generate_regions.py
[12:50:38] Found vcffilter - /bio/linuxbrew/bin/vcffilter
[12:50:38] Found vcfstreamsort - /bio/linuxbrew/bin/vcfstreamsort
[12:50:38] Found vcfuniq - /bio/linuxbrew/bin/vcfuniq
[12:50:38] Found vcffirstheader - /bio/linuxbrew/bin/vcffirstheader
[12:50:38] Found vcf-consensus - /bio/linuxbrew/bin/vcf-consensus
[12:50:38] Found snippy-vcf_to_tab - /home/tseemann/git/snippy/bin/snippy-vcf_to_tab
[12:50:38] Found snippy-vcf_report - /home/tseemann/git/snippy/bin/snippy-vcf_report
[12:50:38] Using reference: /home/jkwong1/testing/nullarbor/test/ref.fa
[12:50:38] Will use 8 CPU cores.
[12:50:38] Using read file: /home/jkwong1/testing/nullarbor/test/2012-10753/R1.fq.gz
[12:50:38] Using read file: /home/jkwong1/testing/nullarbor/test/2012-10753/R2.fq.gz
[12:50:38] Creating folder: 2012-10753/2012-10753
...
FastTree -gtr -nt core.aln > tree.newick
FastTree Version 2.1.8 Double precision (No SSE3), OpenMP (8 threads)
Alignment: core.aln

Brew installable

brew install fails due to a dependency of ImageMagick:

Clean Ubuntu 14.04 install.

...
sudo apt-get -y install build-essential curl git m4 ruby texinfo libbz2-dev libcurl4-openssl-dev libexpat-dev libncurses-dev zlib1g-dev python-pip libpng-dev unzip flex bison python-dev libpng-dev pkg-config libcairo2-dev perl-doc expect

sudo perl -MCPAN -e "CPAN::Shell->notest('install', 'Bio::Perl')"
sudo cpan -i Moo
sudo cpan -i Spreadsheet::Read
sudo cpan -i SVG::Graph
...
brew install nullarbor

Installing dependencies for nullarbor: libcroco, librsvg, imagemagick
==> Installing nullarbor dependency: libcroco
==> Downloading http://ftp.gnome.org/pub/GNOME/sources/libcroco/0.6/libcroco-0.6.8.tar.xz
Already downloaded: /home/vagrant/.cache/Homebrew/libcroco-0.6.8.tar.xz
==> ./configure --prefix=/home/vagrant/.linuxbrew/Cellar/libcroco/0.6.8 --disable-Bsymbolic
installed software in a non-standard prefix.

Alternatively, you may set the environment variables CROCO_CFLAGS

Log is at: https://gist.github.com/4c13e3e099957a6c4cc9

nullarbor quits without completing

nullarbor stops without error before finishing.

final output is

[15:05:58] Walltime used: 0.62 minutes
[15:05:58] If you use this result please cite the Prokka paper:
[15:05:58] Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30(14):2068-9.
[15:05:58] Type 'prokka --citation' for more details.
[15:05:58] Share and enjoy!
make: Leaving directory

Roary has not been run and no report has been compiled

Abricate table doesn't handle duplicate genes in report!

It's hashing on 'GENE', but that may not be unique!

eg.

/home/tseemann/tmp/6008.fna     gi|384860682|ref|NC_017341.1|   897649  898380  erm(A)  1-732/732       =============== 0       100.00  99.86
/home/tseemann/tmp/6008.fna     gi|384860682|ref|NC_017341.1|   1733079 1733810 erm(A)  1-732/732       =============== 0       100.00  99.86

snippy error

Hi Torsten,

The latest brew recipe for nullarbor throws an error when running snippy. I believe this was previously flagged and fixed (tseemann/snippy#45)

Changes to the snippy code (tseemann/snippy@a8dc9b2) are not present in the version of snippy (v2.9) packaged with the nullarbor brew recipe.

Changing the code results in an error with freebayes-parallel

### freebayes-parallel reference/ref.txt 4 -p 1 -q 20 -m 60 --min-coverage 10 -V -f reference/ref.fa snps.bam > snps.raw.vcf

parallel: Error: --tollef has been retired.
parallel: Error: Remove --tollef or use --gnu to override --tollef.
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
/home/smrtanalysis/.linuxbrew/bin/freebayes-parallel: line 40: 22492 Exit 255 ( cat $regionsfile | parallel -k -j $ncpus "$command --region {}" )
22493 Done | vcffirstheader
22495 Aborted (core dumped) | vcfstreamsort -w 1000
22497 Aborted (core dumped) | vcfuniq

Add SNP density plot to report

Plot density of (core) SNPs across reference genome to ensure it is uniformly distributed.

Could possibly do a statistical test to check and alert the reader.

error 1

nice make -j 1 -C /media/sf_linuxpasty/data/ahmedtest4 [10:10AM]
make: Entering directory `/media/sf_linuxpasty/data/ahmedtest4'
mkdir -p Isolate1
any2fasta.pl /media/sf_linuxpasty/data/Pm70.fna > ref.fa
samtools faidx ref.fa
fq --quiet --ref ref.fa /media/sf_linuxpasty/data/P1234_1.fastq.gz /media/sf_linuxpasty/data/P1234_2.fastq.gz > Isolate1/yield.dirty.tab
Calculating depth, using size 2295190
trimmomatic PE -threads 3 /media/sf_linuxpasty/data/P1234_1.fastq.gz /media/sf_linuxpasty/data/P1234_2.fastq.gz Isolate1/R1.fq.gz /dev/null Isolate1/R2.fq.gz /dev/null ILLUMINACLIP:/home/manager/.linuxbrew/Cellar/nullarbor/1.01/bin/../conf/trimmomatic.fa:1:30:11 LEADING:10 TRAILING:10 MINLEN:30
TrimmomaticPE: Started with arguments:
-threads 3 /media/sf_linuxpasty/data/P1234_1.fastq.gz /media/sf_linuxpasty/data/P1234_2.fastq.gz Isolate1/R1.fq.gz /dev/null Isolate1/R2.fq.gz /dev/null ILLUMINACLIP:/home/manager/.linuxbrew/Cellar/nullarbor/1.01/bin/../conf/trimmomatic.fa:1:30:11 LEADING:10 TRAILING:10 MINLEN:30
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'

Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'

Using Long Clipping Sequence: 'TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC'

Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'

Using Long Clipping Sequence: 'TTTTTTTTTTCAAGCAGAAGACGGCATACGA'

Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'

Using Long Clipping Sequence: 'AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG'

Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'

Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'

Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'

Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'

Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'

Using Long Clipping Sequence: 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAG'

Skipping duplicate Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'

Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT'

Using Long Clipping Sequence: 'AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG'

Using Long Clipping Sequence: 'CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT'

Skipping duplicate Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'

Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'

Using Long Clipping Sequence: 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'

ILLUMINACLIP: Using 2 prefix pairs, 17 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Quality encoding detected as phred33

Input Read Pairs: 507695 Both Surviving: 507446 (99.95%) Forward Only Surviving: 138 (0.03%)
Reverse Only Surviving: 96 (0.02%) Dropped: 15 (0.00%)

TrimmomaticPE: Completed successfully

fq --quiet --ref ref.fa Isolate1/R1.fq.gz Isolate1/R2.fq.gz > Isolate1/yield.clean.tab

Calculating depth, using size 2295190
kraken --threads 3 --preload --paired Isolate1/R1.fq.gz Isolate1/R2.fq.gz | kraken-report > Isolate1/kraken.tab

Loading database... complete.

507446 sequences (153.26 Mbp) processed in 23.622s (1288.9 Kseq/m, 389.28 Mbp/m).
490768 sequences classified (96.71%)
16678 sequences unclassified (3.29%)
rm -f -r Isolate1/megahit
mkdir -p Isolate1
megahit --min-count 3 --k-list 21,31,41,53,75,97,111,127 -t 3 --memory 0.5 -1 Isolate1/R1.fq.gz -2 Isolate1/R2.fq.gz --out-dir Isolate1/megahit --min-contig-len 500
7.0Gb memory in total.

Using: 3.852Gb.

MEGAHIT v1.0.3

--- [Wed Apr 6 10:15:46 2016] Start assembly. Number of CPU threads 3 ---

--- [Wed Apr 6 10:15:46 2016] k list: 21,31,41,53,75,97,111,127 ---

make: *** [Isolate1/contigs.fa] Error 1

make: Leaving directory `/media/sf_linuxpasty/data/ahmedtest4'

Hidden requirement 'fa'

I can't really figure out how to install fa -- please help!

make
../bin/nullarbor.pl --outdir ./t --ref data/ref.fa --input data/data.tab  --force --mlst saureus --name NullTest
[15:14:21] Hello root
[15:14:21] This is nullarbor.pl 0.5
[15:14:21] Send complaints to Torsten Seemann <[email protected]>
[15:14:21] Found 'kraken' => /opt/kraken/kraken
[15:14:21] Found 'snippy' => /opt/snippy-2.6/bin/snippy
[15:14:21] Found 'mlst' => /usr/local/bin/mlst
[15:14:21] Found 'abricate' => /opt/abricate/bin/abricate
[15:14:21] Found 'megahit' => /usr/local/bin/megahit
[15:14:21] Found 'nw_order' => /usr/local/bin/nw_order
[15:14:21] Found 'nw_display' => /usr/local/bin/nw_display
[15:14:21] Found 'trimal' => /usr/local/bin/trimal
[15:14:21] Found 'FastTree' => /usr/local/bin/FastTree
[15:14:21] Found 'fq' => /opt/build/nullarbor/bin/fq
[15:14:21] Could not find 'fa'. Please install it and ensure it is in the PATH.

Without fa on-hand, I commented it out and tried the test run. I am guessing that this error is also related:

    [16:22:53] Loading pre-masked/aligned sequences...
    [16:22:53] 1/4  genome01 coverage 0/68250 = 0.00%
    [16:22:53] 2/4  genome02 coverage 0/68250 = 0.00%
    [16:22:53] 3/4  genome03 coverage 0/68250 = 0.00%
    [16:22:53] 4/4  genome04 coverage 0/68250 = 0.00%
    [16:22:53] Patching variant sites into whole genome alignment...
    [16:22:53] Constructing alignment object for core.full.aln

    --------------------- WARNING ---------------------
    MSG: Got a sequence without letters. Could not guess alphabet
    ---------------------------------------------------

    --------------------- WARNING ---------------------
    MSG: Got a sequence without letters. Could not guess alphabet
    ---------------------------------------------------

    --------------------- WARNING ---------------------
    MSG: Got a sequence without letters. Could not guess alphabet
    ---------------------------------------------------

    --------------------- WARNING ---------------------
    MSG: Got a sequence without letters. Could not guess alphabet
    ---------------------------------------------------
    [16:22:53] Writing 'fasta' alignment to core.full.aln
    [16:22:53] Writing core SNP table
    [16:22:53] Found 0 core SNPs from 0 variant sites.
    [16:22:53] Saved SNP table: core.tab
    [16:22:53] Constructing alignment object for core.aln
    [16:22:53] Writing 'fasta' alignment to core.aln
    [16:22:53] Done.
    trimal -in core.full.aln -out core.nogaps.aln -nogaps

    WARNING: Removing sequence 'Reference' composed only by gaps
    WARNING: Removing sequence 'genome01' composed only by gaps
    WARNING: Removing sequence 'genome02' composed only by gaps
    WARNING: Removing sequence 'genome03' composed only by gaps
    WARNING: Removing sequence 'genome04' composed only by gaps


    WARNING: Output alignment has not been generated. It is empty.

    mlst --scheme saureus genome01/contigs.fa > genome01/mlst.tab
    mlst --scheme saureus genome02/contigs.fa > genome02/mlst.tab
    mlst --scheme saureus genome03/contigs.fa > genome03/mlst.tab
    mlst --scheme saureus genome04/contigs.fa > genome04/mlst.tab
    (head -n 1 genome01/mlst.tab && tail -q -n +2 genome01/mlst.tab genome02/mlst.tab genome03/mlst.tab genome04/mlst.tab) > mlst.tab
    make: *** No rule to make target `genome01/denovo.tab', needed by `denovo.tab'.  Stop.
    make: Leaving directory `/opt/build/nullarbor/test/t'

can't install blast for nullarbor

linuxmint@linuxmint ~/nullarbor $ brew install nullarbor
==> Installing nullarbor from tseemann/bioinformatics-linux
==> Installing dependencies for tseemann/bioinformatics-linux/nullarbor: blast, bedtools, cd-hit, mcl, mafft, libxml2, gettext, lib
==> Installing tseemann/bioinformatics-linux/nullarbor dependency: blast
==> Downloading ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.31/ncbi-blast-2.2.31+-src.tar.gz
Already downloaded: /home/linuxmint/.cache/Homebrew/blast-2.2.31.tar.gz
==> Patching
patching file c++/include/corelib/ncbimtx.inl
==> ./configure --prefix=/home/linuxmint/.linuxbrew/Cellar/blast/2.2.31_1 --libdir=/home/linuxmint/.linuxbrew/Cellar/blast/2.2.31_1/libexe
==> make
Last 15 lines from /home/linuxmint/.cache/Homebrew/Logs/blast/02.make:
^
compilation terminated.
make[3]: *** [test_boost.o] Error 1
make[3]: Leaving directory `/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/build/corelib'
FAILED: src/corelib/Makefile.test_boost.lib
make[3]: Entering directory `/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/build/corelib'
/bin/rm -f libtest_boost.a .test_boost.dep .libtest_boost.a.stamp
/bin/rm -f /tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/lib/libtest_boost.a /tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/status/.test_boost.dep
/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/lib/libtest_boost-static.a /tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/status/.test_boost-static.dep
make[3]: Leaving directory `/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/build/corelib'
make[2]: *** [all.nonusr] Error 2
make[2]: Leaving directory `/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/build/corelib'
make[1]: *** [all_r.real] Error 5
make[1]: Leaving directory `/tmp/blast20160303-61606-9bhsht/ncbi-blast-2.2.31+-src/c++/ReleaseMT/build'
make: *** [all] Error 2

READ THIS: https://github.com/Linuxbrew/linuxbrew/blob/master/share/doc/homebrew/Troubleshooting.md#troubleshooting
If reporting this issue please do so at (not Homebrew/homebrew):
https://github.com/Homebrew/homebrew-science/issues

assembly error using --accurate results in duplicate contig

When running nullarbor using --accurate, assembly occasionally results in a duplicate contig:
[screenshot: duplicate-contig]

Running spades.py manually with the same settings (--careful --only-assembler --cov-cutoff auto) produces the same duplication in the resulting scaffolds.fasta file, but not in the contigs.fasta file.

This only occurred in 1 out of 22 sequencing QC runs for the Listeria monocytogenes strain EGD-e. No idea why it didn't occur in the others.

Run nullarbor components separately

More of a request rather than an issue:

Will it be possible to run each of the components separately?
Eg. reading a list of samples in samples.tab, suppose all the reads were already clipped, and already had de novo assemblies and MLST, but you wanted to re-analyse a subset of the isolates using a different reference.
Could there be an option to run read metrics, snippy and snippy-core? I know I can run them separately, but would it be possible through the nullarbor command-line?

Thanks.

What does Please set KRAKEN_DEFAULT_DB appropriately mean?

manager@bl8vbox[data] nullarbor.pl --name African --mlst pmultocida_rirdc --ref Pm70.fna --input samples.tab --outdir ahmedtest
[09:24:38] Hello manager
[09:24:38] This is nullarbor.pl 1.01
[09:24:38] Send complaints to Torsten Seemann [email protected]
[09:24:38] Using reference genome: /media/sf_linuxpasty/data/Pm70.fna
[09:24:38] Loaded 1 isolates: Isolate1
[09:24:38] Found 'mlst' => /home/manager/.linuxbrew/bin/mlst
[09:24:38] Found 114 MLST schemes
[09:24:38] Using scheme: pmultocida_rirdc
[09:24:38] Making output folder: /media/sf_linuxpasty/data/ahmedtest
[09:24:38] Found 'convert' => /home/manager/.linuxbrew/bin/convert
[09:24:38] Found 'pandoc' => /usr/bin/pandoc
[09:24:38] Found 'head' => /usr/bin/head
[09:24:38] Found 'cat' => /bin/cat
[09:24:38] Found 'install' => /usr/bin/install
[09:24:38] Found 'env' => /usr/bin/env
[09:24:38] Found 'nl' => /usr/bin/nl
[09:24:38] Found 'date' => /bin/date
[09:24:38] Found 'trimmomatic' => /home/manager/.linuxbrew/bin/trimmomatic
[09:24:38] Found 'prokka' => /home/manager/.linuxbrew/bin/prokka
[09:24:38] Found 'roary' => /usr/local/bin/roary
[09:24:38] Found 'kraken' => /home/manager/.linuxbrew/bin/kraken
[09:24:38] Found 'snippy' => /home/manager/.linuxbrew/bin/snippy
[09:24:38] Found 'mlst' => /home/manager/.linuxbrew/bin/mlst
[09:24:38] Found 'abricate' => /home/manager/.linuxbrew/bin/abricate
[09:24:38] Found 'megahit' => /home/manager/.linuxbrew/bin/megahit
[09:24:38] Found 'spades.py' => /home/manager/.linuxbrew/bin/spades.py
[09:24:38] Found 'nw_order' => /home/manager/.linuxbrew/bin/nw_order
[09:24:38] Found 'nw_display' => /home/manager/.linuxbrew/bin/nw_display
[09:24:38] Found 'FastTree' => /home/manager/.linuxbrew/bin/FastTree
[09:24:38] Found 'fq' => /home/manager/.linuxbrew/bin/fq
[09:24:38] Found 'fa' => /home/manager/.linuxbrew/bin/fa
[09:24:38] Found 'afa-pairwise.pl' => /home/manager/.linuxbrew/bin/afa-pairwise.pl
[09:24:38] Found 'any2fasta.pl' => /home/manager/.linuxbrew/bin/any2fasta.pl
[09:24:38] Found 'roary2svg.pl' => /home/manager/.linuxbrew/bin/roary2svg.pl
[09:24:38] Found Perl module: Data::Dumper
[09:24:38] Found Perl module: Moo
[09:24:38] Found Perl module: Bio::SeqIO
[09:24:38] Found Perl module: File::Copy
[09:24:38] Found Perl module: Time::Piece
[09:24:38] Found Perl module: YAML::Tiny
[09:24:39] Parsed version '1.0' from 'MEGAHIT v1.0.3'
[09:24:39] Parsed version '3.0' from 'snippy 3.0'
[09:24:40] Parsed version '1.12' from 'prokka 1.12-beta'
[09:24:41] Parsed version '3.6' from '3.6.0'
[09:24:42] Parsed version '2.1' from 'mlst 2.1'
[09:24:42] Please set KRAKEN_DEFAULT_DB appropriately.
