
zaidissa / metaron


Metagenomic opeRon Prediction pipeline. MetaRon presents the first pipeline for the prediction of metagenomic operons without any functional or experimental data.

License: Other

Python 100.00%
metagenomics metagenomic-operons metagenomic-pipeline metagenomic-data-processing

metaron's Introduction

Introduction

MetaRon (Metagenomic opeRon prediction pipeline) is a computational workflow for predicting operons from metagenomic data. The pipeline predicts metagenomic operons without any functional or experimental data. It can process metagenomic data starting from filtered raw reads, which includes assembly into scaffolds via IDBA, data manipulation, gene prediction via Prodigal and, lastly, operon prediction based on gene co-directionality, intergenic distance (IGD) and promoters.

Metagenomic operon prediction redefines the operonic clusters by identifying promoters in co-directional genes with an intergenic distance threshold of <= 600 bp.
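The co-directionality and intergenic distance criteria can be illustrated with a minimal sketch. This is not MetaRon's own code: the `Gene` tuple, the `cluster_genes` function and the way IGD is computed (next start minus previous end minus one) are assumptions made for illustration, and the promoter-based refinement step is left out entirely.

```python
from typing import List, NamedTuple

MAX_IGD = 600  # intergenic distance threshold in bp, taken from the text above


class Gene(NamedTuple):
    """Hypothetical gene record used only for this sketch."""
    name: str
    start: int   # 1-based start coordinate on the scaffold
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def cluster_genes(genes: List[Gene], max_igd: int = MAX_IGD) -> List[List[Gene]]:
    """Group adjacent co-directional genes whose gap is <= max_igd."""
    clusters: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters:
            prev = clusters[-1][-1]
            igd = gene.start - prev.end - 1  # negative when the genes overlap
            if gene.strand == prev.strand and igd <= max_igd:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    # Only clusters with at least two genes are kept as operon candidates.
    return [c for c in clusters if len(c) >= 2]
```

In the pipeline described above, candidate clusters like these would then be refined using predicted promoters; that step is not shown here.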

Installation

Prerequisites

MetaRon requires:

* Python (2.7)
* IDBA (iterative De Bruijn Graph De Novo Assembler) [conda install -c bioconda idba]
* Prodigal [conda install -c bioconda prodigal]
* BDGP: Neural Network Promoter Prediction 2.2
* antiSMASH: antibiotics & Secondary Metabolite Analysis Shell (Optional: required for downstream analysis only.)
* BOWTIE (Optional: only required for downstream analysis)

If you already have an Anaconda environment set up, you can quickly install the prerequisites using any one of the commands in each section:

  1. IDBA

    conda install -c bioconda idba

    conda install -c bioconda/label/cf201901 idba

  2. Prodigal

    conda install -c bioconda prodigal

    conda install -c bioconda/label/cf201901 prodigal

  3. antiSMASH

    conda install -c bioconda antismash

    conda install -c bioconda/label/cf201901 antismash

  4. BOWTIE

    conda install -c bioconda bowtie

    conda install -c bioconda/label/cf201901 bowtie

Install MetaRon

You can install MetaRon either from PyPI using pip or from the source. Please make sure you have already installed the prerequisites listed above before running MetaRon.

Install from PyPI:

pip install metaron

Install from the source:

tar -zxvf metaron-1.0.tar.gz
cd metaron-1.0
python setup.py install

How to use MetaRon

Once you have installed MetaRon, you can type:

metaron --help

to find the available commands and required parameters to run MetaRon.

-h, --help
Show this help message and exit

-n, --sample
Sample name without any dot/underscore/dash

-p, --process
1. ago: assembly, gene prediction and operon prediction. 2. op: operon prediction only.

If 'ago', please provide the following parameters:

--sample, --process, --read_type, --read_length, --paired_1, --paired_2, --output

If 'op', please provide the following parameters:

--sample, --process, --igp, --isc, --tool, --output

-rt, --read_type
Enter read type. 'merge' if the reads are paired-end in two files. 'paired' if the reads are paired-end in one file.

-rl, --read_length
Enter 'l' if the read length is longer than 128 bases and 'r' if the read length is shorter than 128 bases.

-pe1, --paired_1
Enter paired read file 1

-pe2, --paired_2
Enter paired read file 2

-pm, --paired_merged
Enter the paired end read file if both paired-end reads are in one file

-i, --igp
Select the gene prediction .tab file generated via MetaGeneMark or Prodigal

-j, --isc
Select the file containing all scaftigs

-t, --tool
Enter 1 for MetaGeneMark, 2 for Prodigal

-o, --output
Enter output destination folder

=======================================================NOTE=======================================================

1- If the selected --process is 'op', then please refer to the provided scaftig and gene prediction file format

2- Add NNPP2.2 path to the config.txt file

====================================================================================================================
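As a rough illustration of the parameter rules above, the following sketch checks which required options are missing for a chosen process and whether a sample name is acceptable. It is not part of MetaRon: `REQUIRED`, `missing_params` and `valid_sample_name` are hypothetical names, and the required lists simply copy the README text (a run that uses a single merged read file would presumably pass --paired_merged instead of --paired_1 and --paired_2).

```python
import re
from typing import Dict, List, Optional

# Required long options per process, copied from the README text above.
REQUIRED = {
    "ago": ["sample", "process", "read_type", "read_length",
            "paired_1", "paired_2", "output"],
    "op": ["sample", "process", "igp", "isc", "tool", "output"],
}


def missing_params(args: Dict[str, Optional[str]]) -> List[str]:
    """Return the required option names that are absent or empty."""
    process = args.get("process")
    if process not in REQUIRED:
        raise ValueError("process must be 'ago' or 'op'")
    return [name for name in REQUIRED[process] if not args.get(name)]


def valid_sample_name(name: str) -> bool:
    """True if the name is non-empty and has no dot, underscore or dash."""
    return bool(name) and re.search(r"[._-]", name) is None
```

For example, `missing_params({"process": "op", "sample": "ERR022075"})` would list the remaining options that still need to be supplied.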

Make predictions

Metagenomic operon prediction can be performed by providing filtered raw reads under the process "ago", i.e. assembly, gene prediction and operon identification.

## test_sample: ERR022075.1.fastq & ERR022075.2.fastq

metaron --sample ERR022075 --process ago --read_type merge OR paired --read_length r OR l --paired_1 ~/path/to/ERR022075.1.fastq --paired_2 ~/path/to/ERR022075.2.fastq --output ~/path/to/output/directory/

If metagenomic scaffolds and gene predictions are already available, the user can predict operons under the process "op".

## test_assembly: ERR022075_scaf.fa 
## test_gene_prediction: ERR022075

metaron --sample ERR022075 --process op --igp ERR022075 --isc ERR022075_scaf.fa --tool 1 OR 2 --output ~/path/to/output/directory/

This will save the metagenomic operon predictions to Operon_File.tab. The prediction file reports the operonic information based on the parameters described above.

Proposed downstream analysis

  1. Secondary Metabolites

    a. Secondary Metabolites identified from operonic sequences using antiSMASH

    b. Differential abundance of Secondary Metabolites (Condition-1 / Disease vs Condition-2 / Control)

  2. Operonic pathways

    a. Mapping raw metagenomic reads to operonic sequences using BOWTIE

    b. Submitting the mapped reads to Functional Mapping and Analysis Pipeline (FMAP)

    c. Identifying differential abundance of pathways between disease and control or environment-1 and environment-2

Support

If you have questions or find any bugs in the program, please write to us at

syedshujaat[at]comsats.edu.pk syedzaidi[at]arizona.edu

metaron's People

Contributors

asntech, zaidissa


metaron's Issues

Operon prediction fails at Gene data extraction step

I am looking to run MetaRon on my previously annotated genome assemblies. I was able to get it to start running; however, it fails with the test data at the data extraction step:

metaron -p op -n testrun -o ./ -i ERR022075 -j ERR022075_scaffold.fa -t 2
All parameters checked
('Sample name: ', 'testrun')
('OUTPUT DIRECTORY: ', './MetaRon_testrun')
config_file_check start
/homedirectory/.local/lib/python2.7/site-packages/metaron-2.0-py2.7.egg/EGG-INFO/scripts
config_file_check completed

********************************** METAGENOMIC OPERON PREDICTION **********************************

Formatting assembly file
Gene data extraction
PRODIGAL
Extracting sequence information...
Number of entries for each gene are not equal, please correct the data and run again

When I add a few print statements, it appears that the regular expression ("tmp1 = re.compile('(?P<file_name......") in the data_extraction function is not matching the prodigal output table, and therefore several fields are not being populated.

I've attached a zip file of the input files and the resulting output from this run.
metaronexample.tar.gz

Thanks!

install issue

Getting the following error after 'pip install metaron':

Failed to build metaron
ERROR: Could not build wheels for metaron, which is required to install pyproject.toml-based projects

Installed prerequisites IDBA and Prodigal successfully.

contaminating python2 code in python3 build?

Congratulations on your recent publication in BMC Genomics. I would like to use MetaRon, but there appears to be conflicting information between the MetaRon paper and this repository. The paper states that MetaRon requires Python 3, but the installation prerequisites in this repository state Python 2.7 is needed. Further, your most recent commit is named "python3", so I went ahead with a Python 3 installation.

After resolving an indentation issue...

$ source /programs/miniconda3/bin/activate metaron-2.0
$ metaron --help
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 234
    gene_name_scaf, strand, gene_st, gene_end, file_name, scaftig_name, gene_name, scaf_name4dict = data_extraction(gene_file, gene_pred_tool)
    ^
IndentationError: unexpected indent

... the --help option now works.

$ source /programs/miniconda3/bin/activate metaron-2.0
$ metaron --help
usage: metaron [-h] [-n SAMPLE] [-p PROCESS] [-rt READ_TYPE] [-rl READ_LENGTH]
               [-pe1 PAIRED_1] [-pe2 PAIRED_2] [-pm PAIRED_MERGED] [-i IGP]
               [-j ISC] [-t TOOL] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -n SAMPLE, --sample SAMPLE
                        Sample name without any dot, underscore or dash
  -p PROCESS, --process PROCESS
                        1. ago: assembly gene prediction and operon prediciton
                        2. op: operon prediction only. If 'ago', please
                        provide the following parameters:
                        -n,-rl,-rt,[-pe1,pe2|-pm],
  -rt READ_TYPE, --read_type READ_TYPE
                        Enter read type. 'merge' if the reads are paired-end
                        in two file. 'paired' if the reads are paired-end in
                        one file.
  -rl READ_LENGTH, --read_length READ_LENGTH
                        Enter 'l'if read length is longer than 128 bases and
                        'r' if read length is shorter than 128 bases
  -pe1 PAIRED_1, --paired_1 PAIRED_1
                        Enter enter paired read file 1
  -pe2 PAIRED_2, --paired_2 PAIRED_2
                        Enter enter paired read file 2
  -pm PAIRED_MERGED, --paired_merged PAIRED_MERGED
                        Enter the paired end read file if both pairedend reads
                        are in one file
  -i IGP, --igp IGP     Select the gene prediction .tab file generated via
                        MetageneMark or Prodigal
  -j ISC, --isc ISC     Select the file containing all scaftigs
  -t TOOL, --tool TOOL  Enter 1 for MetaGeneMark, 2 for Prodigal
  -o OUTPUT, --output OUTPUT
                        Enter output destination folder

But running both the example data and my own data through the op process lead to additional errors.

Error 1

$ metaron --sample ERR022075 --process op --igp /home/metaron/data/ERR022075 --isc /home/metaron/data/ERR022075_scaffold.fa --tool 2 --output ./out/
All parameters checked
Sample name:     ERR022075
OUTPUT DIRECTORY:     ./out/MetaRon_ERR022075
config_file_check start
/programs/miniconda3/envs/metaron-2.0/bin
config_file_check completed
********************************** METAGENOMIC OPERON PREDICTION **********************************
Formatting assembly file
Traceback (most recent call last):
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 1659, in <module>
    main()
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 233, in main
    gff2tab(sample_name)
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 444, in gff2tab
    writer.writerows(zoo)
TypeError: a bytes-like object is required, not 'str'

It seems like this error with writer.writerows() is documented in StackOverflow: TypeError: a bytes-like object is required, not 'str' in python and CSV
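A common cause of this error, assuming the output file is opened in binary mode (the usual Python 2 idiom for the csv module), is that Python 3's csv writer expects a text-mode file. The sketch below shows the Python 3 pattern; `write_tab` and the example rows are made up for illustration and do not reproduce the `gff2tab` function at line 444, which has not been inspected here.

```python
import csv


def write_tab(path, rows):
    """Write rows as tab-separated text using the Python 3 csv idiom."""
    # Text mode ("w") is required in Python 3; newline="" lets the csv
    # module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerows(rows)


def read_tab(path):
    """Read the tab-separated file back into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```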

Note that my config.txt file reads:

NNPP2.2_path=/home/metaron/NNPP2.2/bin/fa2TDNNpred-PRO.linux

which points to my local installation of NNPP2.2. I've confirmed that the standalone NNPP2.2 program works fine.
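For reference, a config line of the form shown above can be read with a few lines of Python. This is only a sketch under the assumption that config.txt holds plain key=value lines; `read_config` is a hypothetical helper, the skipping of blank lines and '#' comments is an added convenience, and none of it reflects how MetaRon itself parses the file.

```python
def read_config(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```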

In the absence of a config file, I see a second error that also looks to be caused by python3 trying to compile python2 code:

Error 2

Traceback (most recent call last):
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 1659, in <module>
    main()
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 180, in main
    config_file_check()
  File "/programs/miniconda3/envs/metaron-2.0/bin/metaron", line 1615, in config_file_check
    NNPP2_path = raw_input('Enter path for NNPP2.2 directory test2')
NameError: name 'raw_input' is not defined

Again, StackOverflow: How do I use raw_input in Python 3
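In Python 3, `raw_input` was renamed to `input`. A small compatibility shim like the one below lets the same prompt run under both interpreters. It is a sketch, not a patch to the metaron script: `read_line`, `ask_nnpp_path` and the `reader` parameter (added so the function can be exercised without a terminal) are hypothetical names.

```python
try:
    read_line = raw_input  # Python 2 built-in
except NameError:
    read_line = input      # Python 3: input() returns the typed text as str


def ask_nnpp_path(prompt="Enter path for NNPP2.2 directory: ", reader=None):
    """Prompt for the NNPP2.2 path and return it without surrounding whitespace."""
    reader = reader or read_line
    return reader(prompt).strip()
```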

So, should metaron-2.0 be compiled and run in a python2 environment, despite the documentation in the paper and elsewhere that metaron is written in python3?
