BiG-MEx: a tool for mining Biosynthetic Gene Cluster (BGC) domains and classes in metagenomic data. It consists of the following modules:
- run_bgc_dom_annot: fast identification of BGC protein domains.
- run_bgc_dom_div: BGC domain-based diversity analysis.
- run_bgc_class_pred: BGC class abundance predictions.
Pereira-Flores, E., Buttigieg, P. L., Medema, M. H., Meinicke, P., Glöckner, F. O. and Fernandez-Guerra, A. (2018+). Mining metagenomes for natural product biosynthetic gene clusters: unlocking new potential with ultrafast techniques. Under review.
BiG-MEx consists of five docker images:
- epereira/bgc_dom_annot
- epereira/bgc_dom_amp_div
- epereira/bgc_dom_meta_div
- epereira/bgc_dom_merge_div
- epereira/bgc_class_pred
Before running BiG-MEx it is necessary to install docker.
Then just clone the GitHub repository:
git clone git@github.com:pereiramemo/BiG-MEx.git
All five images are on Docker Hub. They will be downloaded automatically the first time you run the scripts.
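If you would rather fetch the images ahead of time (for example, before working offline), the following is a minimal sketch. It only prints one `docker pull` command per image name listed above, so nothing is downloaded until you choose to run the output. The variable `BIGMEX_IMAGES` and the function `print_pull_cmds` are hypothetical names introduced here, and the sketch assumes that the default image tag is the one the run_bgc_*.bash scripts expect.

```shell
# Image names copied from the list in this README.
# BIGMEX_IMAGES and print_pull_cmds are illustrative names, not part of BiG-MEx.
BIGMEX_IMAGES="epereira/bgc_dom_annot
epereira/bgc_dom_amp_div
epereira/bgc_dom_meta_div
epereira/bgc_dom_merge_div
epereira/bgc_class_pred"

# Print one "docker pull" command per image (dry run: nothing is executed).
print_pull_cmds() {
  for img in $BIGMEX_IMAGES; do
    printf 'docker pull %s\n' "$img"
  done
}

print_pull_cmds
```

To actually download the images, pipe the output to a shell, e.g. `print_pull_cmds | sh`, using `sudo sh` if your user is not in the docker group.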
The run_bgc_*.bash scripts run the docker images, which include all the code, dependencies and data used in the analysis. Because the scripts call docker, they have to be executed with sudo if your user is not in the docker group (Linux or Mac OS).
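A quick way to find out whether sudo is needed is to inspect the current user's groups. The sketch below is an assumption-laden helper, not part of BiG-MEx: `in_docker_group` and `docker_prefix` are hypothetical names, and the check only looks at group membership as described above (it does not test whether the docker daemon is actually reachable).

```shell
# Return 0 (true) if the space-separated group list in $1 contains "docker".
# Hypothetical helper, introduced only for this example.
in_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the command prefix to use: nothing when the user is root or is in
# the docker group, "sudo" otherwise.
docker_prefix() {
  if [ "$(id -u)" -eq 0 ] || in_docker_group "$(id -nG)"; then
    printf ''
  else
    printf 'sudo'
  fi
}

echo "prefix: '$(docker_prefix)'"
```

The result can then be placed in front of any of the scripts, e.g. `$(docker_prefix) ./run_bgc_dom_annot.bash . . --help`.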
This first module runs UProC using a BGC domain profile database. It takes unassembled metagenomic data as input and outputs a BGC domain abundance profile table.
See help:
./run_bgc_dom_annot.bash . . --help
The bgc_dom_div module has three modes: amplicon (amp), metagenome (meta), and merge. The first two modes analyze the BGC domain diversity in amplicon and metagenomic samples, respectively. The diversity analysis consists of estimating the operational domain unit (ODU) diversity, blasting the domain sequences against a reference database, and placing the domain sequences onto reference trees. The merge mode integrates the amplicon or metagenome diversity results of different samples to provide a comparative analysis.
See help:
./run_bgc_dom_div.bash amp . . --help
./run_bgc_dom_div.bash meta . . . --help
./run_bgc_dom_div.bash merge . . --help
This module is based on the bgcpred R package, which includes a library of BGC class abundance models. Based on the domain profile generated by bgc_dom_annot, this module computes the BGC class abundance profile.
See help:
./run_bgc_class_pred.bash . . --help