funannotate docker

This is a docker image for the funannotate genome annotation pipeline.

docker pull reslp/funannotate:1.7.2
docker pull reslp/funannotate:1.7.4
docker pull reslp/funannotate:1.8.1
docker pull reslp/funannotate:1.8.3
docker pull reslp/funannotate:1.8.3_antismashV6 # this version contains a fix to correctly parse Antismash V6 results
docker pull reslp/funannotate:1.8.7
docker pull reslp/funannotate:1.8.11 # currently broken due to broken augustus in bioconda!
docker pull reslp/funannotate:1.8.13
docker pull reslp/funannotate:experimental # removes phylogenetic reconstruction and heatmaps from funannotate compare (based on 1.7.4)
docker pull reslp/funannotate:git_clone # based on latest commit on build date: June 22, 2020 (based on 1.7.4)

Note: As of funannotate version 1.8.3 the container is based on a conda installation of funannotate. This led to major changes in the Dockerfile. Container versions before 1.8.3 are built from a second Dockerfile: Dockerfile_pre_1.8.3

Table of Contents

Status of Container
Installation
Example command
Singularity
Installed software

Current status of container

Funannotate provides many different functions which depend on many different programs. This list gives an overview of the status of funannotate's basic functionality, using the following symbols:

❌ feature currently not working
✴️ feature and dependencies installed but not yet tested
✅ feature and dependencies installed and tested

funannotate clean
funannotate sort
funannotate mask:

  • tantan ✅
  • repeatmasker ✅
  • repeatmodeler ✅

funannotate train ✴️
funannotate predict:

  • AUGUSTUS ✅
  • Genemark ✅
  • Snap ✅
  • GlimmerHMM ✅
  • BUSCO ✅
  • Evidence Modeler ✅
  • tbl2asn ✅
  • tRNAScan-SE ✅
  • Exonerate ✅
  • minimap ✅
  • CodingQuarry ✴️

funannotate fix ✴️
funannotate update ✴️
funannotate remote
funannotate iprscan
funannotate annotate
funannotate compare ✴️ (works with the experimental image reslp/funannotate:experimental, which contains a stripped-down version of compare without phylogenetic reconstruction)

funannotate util
funannotate setup
funannotate test
funannotate check
funannotate species
funannotate database

Installation

Install the Container

With a working Docker installation simply run:

docker pull reslp/funannotate:latest
docker run --rm -it reslp/funannotate:latest check

to download and run the latest version of the container.

To open a shell inside the container, run:

docker run --rm -it --entrypoint /bin/bash reslp/funannotate:latest

External dependencies:

A few programs are not included in the container. They need to be kept externally due to license incompatibility or large size:

  1. Signal-P 4.1: www.cbs.dtu.dk/services/SignalP/
  2. GeneMark-ES: exon.gatech.edu/GeneMark/
  3. Repeatmasker libraries from RepBase
  4. InterproScan: https://www.ebi.ac.uk/interpro/download/

The way to get these programs into the container is to place them into a folder and then mount this folder to a specific point in the container by adding a certain flag to the docker run command:

-v /local/location_of_programs:/data/external

The docker container is set up so that it searches for specific folders and adds them to the PATH. This way, funannotate running inside the container finds the desired programs. Currently the container adds the following folders to the PATH, so the folder names (including the version numbers) must match:

/data/external/signalp-4.1
/data/external/gm_et_linux_64

In Docker the container expects the GeneMark license key file in /data/. In Singularity it depends on how you run your container; typically the license file needs to be in your home directory.
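
Before mounting, it can help to confirm that the local folder has the layout described above. The following is a minimal sketch, not part of the container: check_external_dir is a hypothetical helper name, and it only tests for the two folder names listed above.

```shell
# Sketch: verify that a local folder contains the sub-folders the container
# adds to the PATH. check_external_dir is a hypothetical helper name.
# The body runs in a subshell so its variables do not leak into the caller.
check_external_dir() (
    dir="$1"
    status=0
    for sub in signalp-4.1 gm_et_linux_64; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            status=1
        fi
    done
    exit "$status"
)

# Example use (not run here):
#   check_external_dir /local/location_of_programs &&
#     docker run --rm -it -v /local/location_of_programs:/data/external reslp/funannotate:latest check
```

The function returns a non-zero status if either folder is missing, so it can gate the docker run call as in the commented example.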

A Note on SignalP:

You need to change the signalp script to point to the correct directory (inside the container) otherwise signalp will fail to run. It should look like this:

###############################################################################
#               GENERAL SETTINGS: CUSTOMIZE TO YOUR SITE
###############################################################################

# full path to the signalp-4.1 directory on your system (mandatory)
BEGIN {
    $ENV{SIGNALP} = '/data/external/signalp-4.1';
}

# determine where to store temporary files (must be writable to all users)
my $outputDir = "/tmp";

# max number of sequences per run (any number can be handled)
my $MAX_ALLOWED_ENTRIES=100000;
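
The edit can also be scripted. This is a sketch under assumptions: patch_signalp is a hypothetical helper, and it assumes the setting appears on a single line of the form `$ENV{SIGNALP} = '...';` as in the excerpt above. Check the result by hand if your copy of the script is laid out differently.

```shell
# Sketch: point the SIGNALP setting of the signalp script at the path
# used inside the container. patch_signalp is a hypothetical helper name.
patch_signalp() {
    script="$1"
    tmp="$(mktemp)" || return 1
    # Replace whatever path is quoted after "$ENV{SIGNALP} = ".
    sed "s|\([$]ENV{SIGNALP} = \)'[^']*';|\1'/data/external/signalp-4.1';|" "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite in place so file permissions are kept
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example use (not run here):
#   patch_signalp /local/location_of_programs/signalp-4.1/signalp
```

Writing through a temporary file avoids sed's in-place option, which behaves differently between GNU and BSD sed.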

Setting up the funannotate database:

Funannotate relies on a large collection of datasets which it uses to add functional annotations, train gene finders etc. This database needs to be created with

funannotate setup

The docker container has the location of the database hardcoded to /data/database/ (by setting the FUNANNOTATE_DB environment variable). When funannotate is run, the local database folder needs to be mounted over this path. This is again done when docker run is invoked:

-v /local/location/of/database:/data/database
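
Put together, creating the database could look like the sketch below. setup_database and the DOCKER variable are hypothetical conveniences, not part of the container; setting DOCKER=echo prints the command instead of running it. The arguments after the folder are passed to funannotate setup unchanged.

```shell
# Sketch: run `funannotate setup` with a local folder mounted at /data/database.
# setup_database is a hypothetical helper; DOCKER=echo gives a dry run.
setup_database() {
    db_dir="$1"   # use an absolute path; docker treats other names as volume names
    shift
    # Create the folder first so docker does not create it for us.
    mkdir -p "$db_dir" || return 1
    "${DOCKER:-docker}" run --rm -it -v "$db_dir:/data/database" \
        reslp/funannotate:latest setup "$@"
}

# Example use (not run here):
#   setup_database "$(pwd)/database" -b nematoda
```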

Where is my data?

Data is stored in /data, which can also be mounted from an external folder like so:

-v $(pwd):/data

Example commands for the funannotate docker container

The commands presented here assume that the current working directory contains the folders database and external:

$ ls
database
external
genome.fas

The external directory contains Signal-P, interproscan and genemark:

$ ls external
signalp-4.1
gm_et_linux_64
interproscan-5.39-77.0

With a directory structure like this it is possible to add all external dependencies and the database with a single mount command to the container.

This command mounts the current directory (including the external dependencies and the database folder) and runs funannotate check:

docker run --rm -it -v $(pwd):/data reslp/funannotate check

These commands perform clean, sort, mask and predict using the container:

docker run --rm -it -v $(pwd):/data reslp/funannotate clean -i /data/genome.fas -o /data/genome_cleaned.fas 

docker run --rm -it -v $(pwd):/data reslp/funannotate sort -i /data/genome_cleaned.fas -o /data/genome_sorted.fas 

docker run --rm -it -v $(pwd):/data reslp/funannotate mask -i /data/genome_sorted.fas -o /data/genome_masked.fas -m repeatmasker --cpus 8

docker run --rm -it -v $(pwd):/data reslp/funannotate predict -i /data/genome_masked.fas -s "sample_species" -o /data/sample_species_preds --cpus 8
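
The four steps can be chained so that a later step only runs if the previous one succeeded. This is a sketch built from the commands above; run_fun, annotate_pipeline and the DOCKER variable are hypothetical names, and the intermediate file names are the ones used in the examples.

```shell
# Sketch: chain clean, sort, mask and predict; stop at the first failure.
# run_fun and annotate_pipeline are hypothetical helper names.
# Set DOCKER=echo for a dry run that only prints the commands.
run_fun() {
    "${DOCKER:-docker}" run --rm -it -v "$(pwd):/data" reslp/funannotate "$@"
}

annotate_pipeline() {
    genome="$1"       # file name inside the current directory
    species="$2"      # also used to name the output folder
    cpus="${3:-8}"
    run_fun clean -i "/data/$genome" -o /data/genome_cleaned.fas &&
    run_fun sort -i /data/genome_cleaned.fas -o /data/genome_sorted.fas &&
    run_fun mask -i /data/genome_sorted.fas -o /data/genome_masked.fas -m repeatmasker --cpus "$cpus" &&
    run_fun predict -i /data/genome_masked.fas -s "$species" -o "/data/${species}_preds" --cpus "$cpus"
}

# Example use (not run here):
#   annotate_pipeline genome.fas sample_species 8
```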

Singularity

The idea is to make this container also work with Singularity. This is important because most big clusters don't allow Docker due to the high user privileges it requires. In such environments Singularity offers an alternative to Docker. Singularity can build containers directly from Dockerhub, which also works with the funannotate container:

singularity pull docker://reslp/funannotate:1.8.1

Singularity, however, does a few things differently from Docker. One important difference is that Singularity images are read-only; only bound user directories are writable. It is therefore even more important than with Docker to use the predefined bind points for the database and external programs.
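
A possible way to use the bind points is sketched below. It assumes that the pull above produced an image file named funannotate_1.8.1.sif, that the database and external folders are in the current directory, and that singularity run starts funannotate as docker run does. run_fun_singularity and the SINGULARITY variable are hypothetical names; none of this has been verified against a specific Singularity version.

```shell
# Sketch: run funannotate from a Singularity image with the database and
# external programs bound to the paths the container expects.
# run_fun_singularity is a hypothetical helper; SINGULARITY=echo gives a dry run.
run_fun_singularity() {
    image="$1"
    shift
    "${SINGULARITY:-singularity}" run \
        -B "$(pwd)/database:/data/database" \
        -B "$(pwd)/external:/data/external" \
        "$image" "$@"
}

# Example use (not run here):
#   run_fun_singularity funannotate_1.8.1.sif check
```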

Installed software

The funannotate container includes (version numbers refer to the latest build tag and the latest version):

funannotate 1.8.11
CodingQuarry 2.0
Trinity 2.8.5
Augustus 3.3.3
BLAT 2.2.31+
FASTA36 36.3.8
diamond 2.0.15
GMAP 2021-08-25
GlimmerHMM-3.0.4
minimap2 2.24-r1122
kallisto 0.46.1
Proteinortho 6.0.33
pslCDnaFilter v. latest
salmon 0.14.1
snap 2006-07-28
stringtie 2.2.1
tRNA-Scan SE 2.0.9 (July 2021)
Infernal 1.1.3
trimmomatic 0.39
tantan 22
trimal 1.4.1
PASA 2.5.2
EvidenceModeler 1.1.1
ete3 3.0.0b35
RECON 1.08
RepeatScout 1.0.6
TRF 409
rmblast 2.9.0+
RepeatMasker 4.0.7
RepeatModeler 2.0.1

Python modules:

python 3.8.12
biopython: 1.77
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.23.1
pandas: 1.4.3
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 1.1.2
scipy: 1.8.0
seaborn: 0.11.2

funannotate-docker's Issues

ERROR in funannotate setup

hello,
I am starting to set up the four external programs for funannotate-docker. I have two directories, /database/ and /external/, in /data/. The external directory contains gm_et_linux_64 (file), signalp-4.1 (directory) and interproscan-5.46-81.0 (directory), and I have changed the signalp script.

First, I enter the container with:

docker run --rm -it -v $PWD/database:/root/database -v $PWD/external:/data/external -v $PWD:/data -w /data --entrypoint /bin/bash reslp/funannotate:latest

Then I set the FUNANNOTATE_DB environment variable:

export FUNANNOTATE_DB=/root/database

When I run funannotate setup -i all -f, I get an error. Eventually I want to use a command like this:

funannotate predict -i mygenome.fa -o output_folder -s "Aspergillus nidulans" --pasa_gff mypasamodels.gff3:8 --other_gff prediction1.gff3:5 prediction2.gff3:1

ERROR:

-------------------------------------------------------
[07:41 PM]: OS: linux2, 112 cores, ~ 1056 GB RAM. Python: 2.7.17
[07:41 PM]: Running 1.7.4
[07:41 PM]: Database location: /root/database
[07:41 PM]: Parsing Augustus pre-trained species and porting to funannotate
Traceback (most recent call last):
  File "/usr/local/bin/funannotate", line 660, in <module>
    main()
  File "/usr/local/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/setupDB.py", line 652, in main
    meropsDB(DatabaseInfo, args.force, args=args)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/setupDB.py", line 132, in meropsDB
    type, name, version, date, records, checksum = info.get('merops')
TypeError: 'NoneType' object is not iterable

setupDB.py does not seem to work, and I don't know how to fix it. Could you help me?

IOError: [Errno 2] No such file or directory: 'p2g_18546/diamond.matches.tab'

Hello,

I am trying to run funannotate (version 1.7.4) through docker on CentOS Linux 7. I get the following error:
"IOError: [Errno 2] No such file or directory: 'p2g_18546/diamond.matches.tab'"

docker run --rm -it -v $(pwd):/data reslp/funannotate:1.7.4 predict -i JU1783-k35_Gapclosed_v1.fa.masked.cleanfun.sorted --species "Rhabditis sp. JU1783" --transcript_evidence Trinity.fasta.clean --rna_bam JU1783-RNA-reads-aln-sortedgenome.bam --protein_evidence metazoa-and-Arh-proteins.fasta --pasa_gff sample_mydb_pasa.sqlite.valid_gmap_alignments.gff3 --organism other --ploidy 2  --busco_db nematoda --strain JU1783 -o JU1783-funannotate1
-------------------------------------------------------
[01:04 PM]: OS: linux2, 24 cores, ~ 132 GB RAM. Python: 2.7.17
[01:04 PM]: Running funannotate v1.7.4
[01:04 PM]: Parsed training data, run ab-initio gene predictors as follows:
  Program        Training-Method
  augustus       pasa
  codingquarry   rna-bam
  glimmerhmm     pasa
  snap           pasa
[01:04 PM]: Loading genome assembly and parsing soft-masked repetitive sequences
[01:04 PM]: Genome loaded: 3,083 scaffolds; 58,291,147 bp; 17.03% repeats masked
[01:04 PM]: Existing transcript alignments found: JU1783-funannotate1/predict_misc/transcript_alignments.gff3
[01:04 PM]: Existing RNA-seq BAM hints found: JU1783-funannotate1/predict_misc/hints.BAM.gff
[01:07 PM]: Mapping 8,255,650 proteins to genome using diamond and exonerate
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/funannotate/aux_scripts/funannotate-p2g.py", line 266, in <module>
    Hits = parseDiamond(BlastResult)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/aux_scripts/funannotate-p2g.py", line 107, in parseDiamond
    with open(blastresult, 'rU') as input:
IOError: [Errno 2] No such file or directory: 'p2g_18546/diamond.matches.tab'
Traceback (most recent call last):
  File "/usr/local/bin/funannotate", line 660, in <module>
    main()
  File "/usr/local/bin/funannotate", line 650, in main
    mod.main(arguments)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/predict.py", line 951, in main
    lib.exonerate2hints(Exonerate, hintsP)
  File "/usr/local/lib/python2.7/dist-packages/funannotate/library.py", line 3315, in exonerate2hints
    with open(file, 'r') as input:
IOError: [Errno 2] No such file or directory: '/data/JU1783-funannotate1/predict_misc/protein_alignments.gff3'

Do you have any suggestions on how to fix this?

I have run the following commands (no errors) before running into the problem:

docker pull reslp/funannotate:1.7.4

docker run --rm -it -v /local/path/to/data/database:/data/database reslp/funannotate:1.7.4 setup -b nematoda

docker run --rm -it -v $(pwd):/data reslp/funannotate:1.7.4 clean -i /data/JU1783-k35_Gapclosed_v1.fa.masked -o /data/JU1783-k35_Gapclosed_v1.fa.masked.cleanfun

docker run --rm -it -v $(pwd):/data reslp/funannotate:1.7.4 sort -i /data/JU1783-k35_Gapclosed_v1.fa.masked.cleanfun -o /data/JU1783-k35_Gapclosed_v1.fa.masked.cleanfun.sorted

Any suggestion welcome!
Thanks,
Sophie

RepeatMasker Missing for docker

funannotate-docker mask -i DNA_clean_sort.fa -m repeatmasker -s Viridiplantae -l ../../Database/Libraries/RMRBSeqs.embl
#ERROR
[Oct 09 01:03 PM]: OS: Debian GNU/Linux 10, 80 cores, ~ 264 GB RAM. Python: 3.8.12
[Oct 09 01:03 PM]: Running funanotate v1.8.16
[Oct 09 01:03 PM]: Missing Dependencies: RepeatMasker. Please install missing dependencies and re-run script

How can I use RepeatMasker with the docker container?

error: ete3 not installed

I used Singularity to install funannotate with:
singularity pull docker://reslp/funannotate:1.8.1

When I ran funannotate check --show-versions, it showed:
Traceback (most recent call last):
  File "/usr/local/bin/ete3", line 6, in <module>
    from ete3.tools.ete import main
  File "/usr/local/lib/python3.7/dist-packages/ete3/tools/ete.py", line 55, in <module>
    from . import (ete_split, ete_expand, ete_annotate, ete_ncbiquery, ete_view,
  File "/usr/local/lib/python3.7/dist-packages/ete3/tools/ete_view.py", line 48, in <module>
    from .. import (Tree, PhyloTree, TextFace, RectFace, faces, TreeStyle, CircleFace, AttrFace,
ImportError: cannot import name 'TextFace' from 'ete3' (/usr/local/lib/python3.7/dist-packages/ete3/__init__.py)

Even after I installed ete3 with pip and set the PATH environment variable, the error still occurred.
