-------------
-------------
DivA 1.0
M. Lisandra Zepeda Mendoza & Rute R. da Fonseca
-------------
-------------


-------------
DESCRIPTION
-------------

DivA is a set of Python scripts designed to detect non-homologous and very Divergent regions in protein sequence Alignments. DivA was tested with Python 2.7.

DivA makes no assumptions about evolutionary models, which makes it well suited to detecting incorrectly annotated segments within individual gene sequences. It is a binary decision-making method that applies a sliding-window approach to estimate four divergence-based parameters and defines their outlier values according to automatically determined thresholds, which can optionally be modified. DivA then classifies a window of a sequence in an alignment as very divergent (potentially non-homologous) if it presents a combination of outlier values for the four parameters. The windows classified as very divergent can optionally be masked in the alignment. This allows DivA to discard a minimal amount of sequence information compared with other currently available methods, which remove entire sequences or blocks of a multiple sequence alignment. One important application of DivA is the detection of incorrectly annotated sequences produced by automatic gene annotation, which can have confounding effects in comparative genomics and phylogenomics analyses.
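
The sliding-window and outlier-threshold idea described above can be sketched roughly as follows. This is only an illustration and not DivA's actual code: the function names are hypothetical, a single placeholder score per window stands in for the four parameters (whose definitions are not given in this README), and the sketch assumes that windows slide one column at a time, that higher scores mean more divergence, and that gap characters are left untouched when masking.

```python
from statistics import mean, pstdev


def windows(length, size=12):
    """Yield (start, end) for every full window of `size` columns.

    Assumption: the window slides one column at a time.
    """
    for start in range(0, length - size + 1):
        yield start, start + size


def gap_fraction(segment):
    """Fraction of gap characters ('-') in one window of one sequence."""
    return segment.count("-") / len(segment)


def outlier_threshold(values, n_sd):
    """Mean plus n_sd standard deviations of the observed values."""
    return mean(values) + n_sd * pstdev(values)


def flag_outliers(scores, n_sd):
    """Return the windows whose placeholder score exceeds the threshold.

    `scores` maps (start, end) windows to a hypothetical divergence score.
    """
    cutoff = outlier_threshold(list(scores.values()), n_sd)
    return [window for window, score in scores.items() if score > cutoff]


def mask_windows(sequence, flagged):
    """Replace residues inside flagged windows with X, keeping gaps as they are."""
    chars = list(sequence)
    for start, end in flagged:
        for i in range(start, end):
            if chars[i] != "-":
                chars[i] = "X"
    return "".join(chars)
```

In this sketch a window would first be skipped if `gap_fraction` exceeds the maximum gap content, then scored, and finally masked only if its score is an outlier. DivA itself requires a combination of outlier values across its four parameters rather than a single score.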


-------------
INSTALLATION
-------------

DivA is a Python script that does not need to be compiled. It was developed in Python 2.7.3 and uses the following modules, which should already be installed on the user's system:

- numpy
- AlignIO from the Bio module (Biopython)
- re
- os
- sys
- argparse

Make sure the bin directory is in your PATH, and place blosum62.txt there as well; alternatively, place blosum62.txt, or another distance matrix of your choice, in the directory where DivA is going to be used.
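
Before running DivA, it may help to confirm that the modules listed above can be imported by the interpreter you intend to use. The following is a minimal sketch; the function name `missing_modules` is hypothetical and not part of DivA. It uses only a plain import attempt, so it should behave the same on Python 2.7 and Python 3.

```python
# Module names taken from the list in this README.
REQUIRED_MODULES = ["numpy", "Bio", "re", "os", "sys", "argparse"]


def missing_modules(names):
    """Return the names of the modules that fail to import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All modules listed in the README were found.")
```

Run the script with the same Python executable that will run DivA, since each interpreter has its own set of installed modules.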


------
USAGE
------

usage: DivA.py [-h] [--mask] [--printAllwindows] [-w W] [-g G] [-p P] [-zp ZP]
               [-d D] [-zd ZD] [-o O] [-m M]
               alnNamesFile

Identify very divergent potentially non-homologous windows in a protein
multiple sequence alignment.

positional arguments:
  alnNamesFile       A txt file with the file name(s) of the MSA(s) on which
                     to perform the method

optional arguments:
  -h, --help         show this help message and exit
  --mask             Flag for the output of an alignment with the wrong
                     windows masked with XXs [default not set]
  --printAllwindows  Flag for the output of a file with the parameter values
                     and start and end positions of all the windows in the
                     MSA(s) [default not set]
  -w W               The size of the sliding window [default 12]
  -g G               Maximum gap content in a window to be considered [default
                     0.6]
  -p P               The number of standard deviations from the mean of the
                     alpha parameter to use as threshold [default 1]
  -zp ZP             The number of standard deviations from the mean of the
                     Zalpha parameter to use as threshold [default 2]
  -d D               The number of standard deviations from the mean of the
                     beta parameter to use as threshold [default 2]
  -zd ZD             The number of standard deviations from the mean of the
                     Zbeta parameter to use as threshold [default 2]
  -o O               Output basename prefix [default "out"]
  -m M               The amino acid distance matrix [default "blosum62.txt"]
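
For readers who want to wrap or script the command line, the interface above can be approximated with `argparse`. This is a reconstruction from the help text only, not a copy of DivA's source: the argument types (`int` for `-w`, `float` for the thresholds) are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="DivA.py",
        description="Identify very divergent potentially non-homologous "
                    "windows in a protein multiple sequence alignment.",
    )
    parser.add_argument("alnNamesFile",
                        help="txt file with the file name(s) of the MSA(s)")
    parser.add_argument("--mask", action="store_true",
                        help="output an alignment with the wrong windows masked")
    parser.add_argument("--printAllwindows", action="store_true",
                        help="output parameter values for all windows")
    # Types below are assumed; the help text only gives the defaults.
    parser.add_argument("-w", type=int, default=12, help="sliding window size")
    parser.add_argument("-g", type=float, default=0.6,
                        help="maximum gap content in a window")
    parser.add_argument("-p", type=float, default=1, help="SDs for alpha")
    parser.add_argument("-zp", type=float, default=2, help="SDs for Zalpha")
    parser.add_argument("-d", type=float, default=2, help="SDs for beta")
    parser.add_argument("-zd", type=float, default=2, help="SDs for Zbeta")
    parser.add_argument("-o", default="out", help="output basename prefix")
    parser.add_argument("-m", default="blosum62.txt",
                        help="amino acid distance matrix")
    return parser
```

Calling `build_parser().parse_args(["ListOfAlignments.txt"])` yields the defaults listed in the help text, which is a convenient way to check a set of options before launching a long run.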





#Example:


 1. Create a file with the names/paths of the alignments to be analyzed. The final thresholds will be calculated using all those alignments.

 2. Run DivA:

python DivA.py ListOfAlignments.txt # Basic default DivA run

python DivA.py -h # Will display the help

python DivA.py ListOfAlignments.txt -o DivaOutput --mask --printAllwindows # The outputs will have the prefix "DivaOutput" and alignments with the wrong windows masked will be generated, as well as an extra output file containing all the windows with the four parameter values and start and end positions.

python DivA.py ListOfAlignments.txt -o DivaOutput -p 2 # The number of standard deviations from the mean of the alpha parameter is changed to 2 and the outputs will have the prefix "DivaOutput"
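
Step 1 above (creating the file with the alignment names) can be automated with a few lines of Python. This is a sketch under assumptions: the README does not specify the layout of the list file, so one path per line is assumed here, and the function name `write_alignment_list` and the `*.aln` pattern are hypothetical.

```python
import glob
import os


def write_alignment_list(directory, list_path, pattern="*.aln"):
    """Write the sorted alignment paths found in `directory` to `list_path`.

    Assumption: the list file holds one alignment path per line.
    Returns the list of paths that were written.
    """
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

For example, `write_alignment_list("Test", "ListOfAlignments.txt")` would collect every `.aln` file in the Test directory into a list file that can then be passed to DivA.py.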


#Example files in the 'Test' directory
The Test.aln file corresponds to ortholog alignment 14518.fasta from Jarvis et al.


-----
CITE
-----

Zepeda Mendoza ML, Nygaard S, and da Fonseca R (2014)  "DivA: detection of non-homologous and very Divergent regions in protein sequence Alignments"

--------
CONTACT
--------

For any enquiries, please send correspondence to [email protected]


