beav - a bacterial genome and mobile element annotation pipeline


beav: Bacteria/Element Annotation reVamped

beav is a command line tool that streamlines bacterial genome and mobile genetic element annotation. It combines multiple annotation tools, automating the process of running, parsing, and combining the results into a single easy-to-read output. Annotated features include secretion systems, anti-phage defense systems, integrative & conjugative/mobilizable elements, integrons, prophage regions, amino acid biosynthesis pathways, small carbon metabolite catabolism pathways, and biosynthetic gene clusters. Type VI secretion system (T6SS) vgrG operons are automatically identified. Plasmid origin of transfer (oriT) elements are also characterized.

The beav pipeline also includes several tools and databases that enhance the annotation of plant-associated microbes, including phytopathogens and symbionts. Custom Bakta databases provide correct gene names and annotations for phytopathogen virulence genes, effectors, and genes important for mutualist symbiosis. Other tools annotate promoter elements such as the pip box, tts box, nod box, tra box, and vir box.

An optional Agrobacterium-specific pipeline identifies the presence of Ti and Ri plasmids and classifies them under the Weisberg et al. 2020 scheme. It also annotates Ti/Ri plasmid elements including T-DNA borders, overdrive, virbox, trabox, and other binding sites, and determines the biovar and genomospecies of the input strain. Virulence and T-DNA genes, including opine synthase and transport/catabolism loci, are also correctly named and annotated.


beav generates Circos plots annotating important features of the genome and, if the Agrobacterium-specific analysis is run, of the pTi/pRi plasmid. The Circos script can also be run separately.

C58 Genome Circos

Example Circos plot of whole genome annotations automatically generated by beav.

C58 pTi Plasmid Circos

Example Circos plot visualizing oncogenic Ti/Ri plasmids generated by the optional Agrobacterium-specific pipeline.

Installation

The beav pipeline requires a number of programs and databases to be installed. We therefore strongly recommend using conda to install beav and all of its dependencies.

From conda (Recommended)

We recommend installing beav with either conda using the libmamba solver or mamba, as this greatly reduces the time needed to solve the environment.

Instructions for conda:

conda create -n beav
conda install -n beav beav

Alternative instructions using mamba:

conda create -n beav
mamba install -n beav beav

The conda environment can then be activated using:

conda activate beav

Alternative: From source

Clone the beav GitHub repository.

git clone https://github.com/weisberglab/beav.git

If installing from source, DBSCAN-SWA, TIGER2, and GapMind (PaperBLAST) must be installed in the software folder within the beav folder. The BEAV_DIR environment variable must then be set to point to the beav directory.

Prerequisites:

Program                 Install location
Bakta                   PATH
IntegronFinder          PATH
MacSyFinder             PATH
DefenseFinder           PATH
TIGER2                  $BEAV_DIR/software
GapMind (PaperBLAST)    $BEAV_DIR/software
DBSCAN-SWA              $BEAV_DIR/software
antiSMASH               PATH
EMBOSS                  PATH
HMMER                   PATH
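The table above can be checked with a short shell sketch before a first run. The function names `check_tools` and `check_software_dir` are hypothetical helpers, not part of beav, and the executable names in the commented example are assumptions about how each program is usually installed; adjust them to match your system.

```shell
# Report which of the given executables are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check that BEAV_DIR is set and contains a software folder.
check_software_dir() {
    if [ -z "${BEAV_DIR:-}" ]; then
        echo "BEAV_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$BEAV_DIR/software" ]; then
        echo "no software folder in $BEAV_DIR" >&2
        return 1
    fi
    return 0
}

# Example (executable names are assumptions, not taken from the beav docs):
# check_tools bakta integron_finder macsyfinder antismash hmmsearch
# check_software_dir
```

Both helpers only report problems; they do not install or modify anything.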

Databases for each of these programs can then be installed manually. Alternatively, the following can be used to install them automatically.

Install all databases

conda activate beav 
beav_db

Usage

NOTE: There is currently a bug in the latest DefenseFinder models that causes an error when MacSyFinder is run. We recommend running beav with --skip_defensefinder until the MacSyFinder bug fix is released on Bioconda. Alternatively, copying the patched file into the MacSyFinder Python library folder of your conda environment will fix the issue.

Patching instructions

To do so, find the Python version of your conda environment:

python --version

Then download the patched registries.py file:

wget https://raw.githubusercontent.com/gem-pasteur/macsyfinder/27ee21ceb8e7100d9183b084356f791487aca4ad/macsypy/registries.py

Then copy it to the correct folder in your conda env, changing the python version as necessary:

cp registries.py $CONDA_PREFIX/lib/python3.9/site-packages/macsypy/
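Rather than editing the Python version in the path by hand, the target folder can be derived from the output of `python --version`. The following is a minimal sketch; `macsypy_dir` is a hypothetical helper name, and it assumes the conda environment uses the `lib/pythonX.Y/site-packages` layout shown in the command above.

```shell
# Print the macsypy folder for a conda prefix and a `python --version` string.
macsypy_dir() {
    prefix=$1
    version=${2#Python }        # "Python 3.10.12" -> "3.10.12"
    major=${version%%.*}        # "3"
    rest=${version#*.}          # "10.12"
    minor=${rest%%.*}           # "10"
    printf '%s/lib/python%s.%s/site-packages/macsypy\n' "$prefix" "$major" "$minor"
}

# Example, run inside the activated beav environment:
# cp registries.py "$(macsypy_dir "$CONDA_PREFIX" "$(python --version)")/"
```

The helper only builds the path; it does not check that the folder exists, so it is worth listing the folder once before copying.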

usage: beav [--input INPUT] [--output OUTPUT_DIRECTORY] [--strain STRAIN] [--bakta_arguments BAKTA_ARGUMENTS] [--tiger_arguments TIGER_ARGUMENTS] [--agrobacterium AGROBACTERIUM] [--skip_macsyfinder] [--skip_integronfinder] [--skip_defensefinder] [--skip_tiger] [--skip_gapmind] [--skip_dbscan-swa] [--skip_antismash] [--help] [--threads THREADS]
    BEAV- Bacterial Element Annotation reVamped
    Input/Output: 
        --input, -i STRAIN.fna
                Input file in fasta nucleotide format (Required)
        --output DIRECTORY
                Output directory (default: current working directory)
        --strain STRAIN
                Strain name (default: input file prefix)
        --bakta_arguments ARGUMENTS
                Additional arguments and database options specific to Bakta 
        --antismash_arguments ARGUMENTS
                Additional arguments and database options specific to antiSMASH (Default: \"$antismash_args\") 
        --tiger_blast_database DBPATH
                Path to a reference genome blast database for TIGER2 ICE analysis (Required unless --skip_tiger is used)
        --run_operon_email EMAIL
                Annotate predicted operons using the Operon-mapper webserver. Must input an email address for the job
    Options:
        --agrobacterium
                Agrobacterium specific tools that identify biovar/species group, Ti/Ri plasmid, T-DNA borders, virboxes and traboxes
        --skip_macsyfinder
                Skip detection and annotation of secretion systems
        --skip_integronfinder
                Skip detection and annotation of integrons 
        --skip_defensefinder
                Skip detection and annotation of anti-phage defense systems 
        --skip_tiger
                Skip detection and annotation of integrative conjugative elements (ICEs)
        --skip_gapmind
                Skip detection of amino acid biosynthesis and carbon metabolism pathways
        --skip_dbscan-swa
                Skip detection and annotation of prophage
        --skip_antismash
                Skip detection and annotation of biosynthetic gene clusters
        --continue
                Continue running BEAV from any point in the pipeline. Rerun programs that gave an error or were skipped. 
    General:
        --help, -h
                Show BEAV help message
        --threads, -t
                Number of CPU threads

Options

--antismash_arguments

Additional arguments can be passed to antiSMASH using the --antismash_arguments option. This allows full use of antiSMASH and of additional databases.

--tiger_blast_database

Required if running TIGER. Users must provide a path to a blast database of reference genomes using the --tiger_blast_database option.

--bakta_arguments

Additional arguments can be passed to bakta using the --bakta_arguments option.

--agrobacterium

The --agrobacterium option activates an additional pipeline that provides Agrobacterium-specific annotation.

--skip_PROGRAM

The skip options allow specified programs to be skipped if their annotation is not needed or if the required programs are not installed.
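When the set of programs to skip varies between runs, the flags can be assembled from a list of program names. The sketch below uses a hypothetical helper, `skip_flags`, which is not part of beav; the accepted names are taken from the --skip_* options in the usage text above.

```shell
# Turn program names into beav --skip_* flags, rejecting unknown names.
skip_flags() {
    flags=""
    for prog in "$@"; do
        case $prog in
            macsyfinder|integronfinder|defensefinder|tiger|gapmind|dbscan-swa|antismash)
                flags="$flags --skip_$prog" ;;
            *)
                echo "unknown program: $prog" >&2
                return 1 ;;
        esac
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Example (the command substitution is left unquoted so the flags split into words):
# beav --input /path/to/file/test.fna --threads 8 $(skip_flags tiger defensefinder)
```

Rejecting unknown names catches a misspelled program early instead of passing an unrecognized flag to beav.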

--continue

The continue option checks the output of an existing beav run and reruns programs that failed or were skipped. This option also allows the pipeline to be used with existing Bakta runs.

Examples

Minimal run

beav --input /path/to/file/test.fna --threads 8 --skip_tiger

Standard run

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna

Standard run with operon annotation (remote)

beav --input /path/to/file/test.fna --threads 8 --tiger_blast_database /path/to/databases/blast/refseq_genomic.fna --run_operon_email user@example.com

Complex run

beav --input /path/to/file/test.fna --threads 8 --bakta_arguments '--db /path/to/alternative-data-bases/bakta-1.7/' --tiger_blast_database /path/to/databases/blast/allagro.fna --agrobacterium --skip_integronfinder
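To annotate several genomes with the same settings, the minimal run above can be wrapped in a loop. This is a sketch only: `run_beav_batch` and the `BEAV_CMD` variable are hypothetical names introduced here, and the output folder is created in advance because it is an assumption that beav may not create it itself.

```shell
# Run beav on every .fna file in a directory, one output folder per strain.
# BEAV_CMD defaults to "beav"; set BEAV_CMD="echo beav" to preview the commands.
run_beav_batch() {
    genome_dir=$1
    out_root=$2
    threads=${3:-8}
    for fasta in "$genome_dir"/*.fna; do
        [ -e "$fasta" ] || continue              # no .fna files matched
        strain=$(basename "$fasta" .fna)
        mkdir -p "$out_root/$strain"             # assumption: create it up front
        # BEAV_CMD is intentionally unquoted so "echo beav" splits into two words.
        ${BEAV_CMD:-beav} --input "$fasta" --output "$out_root/$strain" \
            --strain "$strain" --threads "$threads" --skip_tiger
    done
}

# Example:
# BEAV_CMD="echo beav" run_beav_batch /path/to/genomes /path/to/results 8
```

Previewing with `BEAV_CMD="echo beav"` prints one command per genome without running the pipeline, although the output folders are still created.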

Standalone Circos plot generation

To generate Circos plots from your GenBank file independent of the beav pipeline, make sure the beav conda environment is activated:

conda activate beav 

The beav databases, models, scripts, and other forked tools are stored in $BEAV_DIR on your system.

To visualize only beav-specific features on a Circos plot, run the beav_circos.py script as follows:

python3 beav_circos.py --input $GBK

Here, --input takes an annotated GenBank file as input.

To also visualize the oncogenic plasmid features, run the following:

python3 beav_circos.py --input $GBK --contig $CONTIG --plasmid $pTi

Here, --contig takes the list of contigs that you want to visualize, and --plasmid takes the name of the plasmid, which is used as its label.

Citation

Beav can be cited as:

Jung J.M., Rahman A., Schiffer A.M., and Weisberg A.J., Beav: a bacterial genome and mobile element annotation pipeline. (2024) bioRxiv 2024.01.25.577299; doi: https://doi.org/10.1101/2024.01.25.577299

