NLR-Parser README

NLR-Parser is a tool to rapidly annotate the NLR complement from sequenced plant genomes.

The NLR-Parser refines the output of MAST and reliably annotates disease resistance genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins.

Prerequisites

MEME suite version 4.9.1

The MEME suite is available at http://meme-suite.org/index.html

Please note that the latest version of MEME is not compatible with the NLR-Parser. Use MEME 4.9.1.

Don't worry about setting up the Apache webserver. You just need MAST, so the quick install is sufficient.

JRE 1.6

Make sure you have the Java Runtime Environment (JRE) 1.6 or higher installed. Download it from http://java.com

NLR motif definitions

Download the meme.xml that contains the definitions from here. The motifs were published by Jupe et al. (2012). The downloaded meme.xml is an input argument for MAST.

6Frame translator

If you intend to screen nucleotide sequences for NLRs, it makes sense to translate each sequence in all six reading frames. To ensure the full functionality of the NLR-Parser, please make sure that the identifiers of the six amino acid sequences differ only by a suffix and end with:

  • _frame+0
  • _frame+1
  • _frame+2
  • _frame-0
  • _frame-1
  • _frame-2

For this you can use the TranslateSequence.jar, which is part of this software.
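The naming convention above can be illustrated with a short Java sketch. This is not the code of TranslateSequence.jar; the class name SixFrameSketch, the example identifier contig1 and the choice to number the minus-strand frames by their offset on the reverse complement are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; NOT the implementation of TranslateSequence.jar.
class SixFrameSketch {
    private static final String BASES = "TCAG";
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Reverse complement; any non-ACGT character becomes N.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    // Translate starting at the given offset; codons with unknown bases become X.
    static String translate(String dna, int offset) {
        String s = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = offset; i + 3 <= s.length(); i += 3) {
            int a = BASES.indexOf(s.charAt(i));
            int b = BASES.indexOf(s.charAt(i + 1));
            int c = BASES.indexOf(s.charAt(i + 2));
            boolean unknown = a < 0 || b < 0 || c < 0;
            protein.append(unknown ? 'X' : AMINO_ACIDS.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    // Six translations whose identifiers differ only by the frame suffix.
    // Assumption: minus frames are offsets on the reverse complement.
    static Map<String, String> sixFrames(String id, String dna) {
        Map<String, String> frames = new LinkedHashMap<String, String>();
        String rc = reverseComplement(dna);
        for (int offset = 0; offset < 3; offset++) {
            frames.put(id + "_frame+" + offset, translate(dna, offset));
        }
        for (int offset = 0; offset < 3; offset++) {
            frames.put(id + "_frame-" + offset, translate(rc, offset));
        }
        return frames;
    }

    public static void main(String[] args) {
        // Print the six frames as FASTA records.
        Map<String, String> frames = sixFrames("contig1", "ATGGCCATTGTAATGGGCCGC");
        for (Map.Entry<String, String> e : frames.entrySet()) {
            System.out.println(">" + e.getKey());
            System.out.println(e.getValue());
        }
    }
}
```

The important point is the identifiers, not the translation itself: all six records share the prefix before _frame, which is what allows their results to be combined later.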

Installation

Just download NLR-Parser.jar from the latest release. Run it from the command line.

java -jar NLR-Parser.jar -i <mast.xml> -o <output.mast.txt> [-s <splitpattern>] [-p <pvalue>] [-b <blastfile>] [-gh] [-a <sequence>]

If you want to build it from source, you will need the Apache Commons CLI library.

Input parameters

parameter  argument  description
-i         STR       The location of the XML output of MAST.
-o         STR       Location and name of the output file that will be generated by the NLR-Parser. Note that an existing file will be overwritten.
-s         STR       The split pattern used to combine 6-frame-translated nucleotide sequences into one output. Default: "_frame"
-p         float     P-value threshold. Motifs with a p-value above this threshold are ignored by the NLR-Parser. Default: 1E-5
-a         STR       Location of an optional amino acid sequence file. This file should be the same as the one subjected to MAST. Providing this file allows extraction of the NB-ARC domain of the NLR, e.g. for phylogenetic studies. The file has to be in FASTA format.
-g                   Output GFF format instead of TSV.
-h                   Print help.

-s splitpattern

If a nucleotide sequence has to be annotated, it should be translated into its six reading frames. The NLR-Parser can assume that the names of the six amino acid sequences consist of a shared sequence name followed by the split pattern and a frame suffix (for example, a name ending in _frame+0). In that case it reports the combined result on one line, with the shared sequence name in the first column. This makes sense if you annotate genomic sequence and introns cause a "frameshift". It is highly unlikely that a sequence will have motifs on a forward strand and on the reverse strand at the same time.

This is of course a pitfall if your sequence of interest contains two NLRs on different strands. In those cases, please use the workaround -s $$, assuming that none of your identifiers contains "$$".
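The effect of the split pattern can be pictured with a small Java sketch. It is an assumption about the general idea (group identifiers by the text before the split pattern), not the NLR-Parser's actual code; the class SplitPatternSketch and the identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT the NLR-Parser implementation.
class SplitPatternSketch {
    // Everything before the first occurrence of the split pattern is the group key.
    // If the pattern does not occur, the identifier stays its own group.
    static String groupKey(String sequenceId, String splitPattern) {
        int idx = sequenceId.indexOf(splitPattern);
        return idx < 0 ? sequenceId : sequenceId.substring(0, idx);
    }

    // Collect identifiers that share a group key, preserving input order.
    static Map<String, List<String>> group(List<String> ids, String splitPattern) {
        Map<String, List<String>> groups = new LinkedHashMap<String, List<String>>();
        for (String id : ids) {
            String key = groupKey(id, splitPattern);
            if (!groups.containsKey(key)) {
                groups.put(key, new ArrayList<String>());
            }
            groups.get(key).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("contig1_frame+0", "contig1_frame-2");
        // Default pattern: both frames fall into one group named contig1.
        System.out.println(group(ids, "_frame").keySet());
        // Pattern that never occurs: each frame is reported on its own.
        System.out.println(group(ids, "$$").keySet());
    }
}
```

With a pattern that occurs in no identifier, nothing is merged, which is why -s $$ keeps NLRs on opposite strands apart.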

-g

Generate a gff file rather than a tsv table with the NLR-Parser results. This option is under development. Feel free to try and send us comments.

-a aminoacidfile.fasta

One column of the NLR-Parser output is the amino acid sequence of the NB-ARC domain. This is usually the most conserved part of the NLR and can be used for phylogenetic studies. If you do not provide the complete amino acid sequences of the genes, this column is empty.

-p pvalue

This is the threshold of the p-values of the individual motifs. Motifs with a p-value above this threshold are ignored by the NLR-Parser. The default is 1E-5.
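As a minimal sketch of what such a threshold means, the following Java snippet keeps only motif hits whose p-value does not exceed the cutoff. The MotifHit class, the motif names and the example p-values are hypothetical and only serve the illustration; the NLR-Parser's internal data structures may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NOT the NLR-Parser implementation.
class PValueFilterSketch {
    // Hypothetical container for one motif hit reported by MAST.
    static class MotifHit {
        final String motif;
        final double pValue;

        MotifHit(String motif, double pValue) {
            this.motif = motif;
            this.pValue = pValue;
        }
    }

    // Hits with a p-value above the threshold are dropped.
    static List<MotifHit> filter(List<MotifHit> hits, double threshold) {
        List<MotifHit> kept = new ArrayList<MotifHit>();
        for (MotifHit hit : hits) {
            if (hit.pValue <= threshold) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<MotifHit> hits = new ArrayList<MotifHit>();
        hits.add(new MotifHit("motif_a", 1e-7)); // made-up example values
        hits.add(new MotifHit("motif_b", 1e-3));
        for (MotifHit hit : filter(hits, 1e-5)) {
            System.out.println(hit.motif + "\t" + hit.pValue);
        }
    }
}
```

A stricter (smaller) threshold discards more weak hits, while a looser one keeps more hits.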

Tips

  • MAST has an e-value threshold. Sequences with an e-value above that threshold are not displayed. The e-value depends on the number of input sequences. If you run MAST on a really large file, add the parameter -ev 10000000 to your call.
  • If you want to annotate large files like genomes, it makes sense to split them into overlapping fragments.
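Splitting a long sequence into overlapping fragments could be sketched as follows. The fragment size and overlap are hypothetical parameters chosen for the example, not values recommended by the NLR-Parser; in practice the overlap should probably be long enough that a gene is not cut at every fragment border.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; sizes are placeholders, not recommendations.
class FragmentSketch {
    // Returns half-open [start, end) coordinates of fragments that cover
    // a sequence of the given length, each sharing `overlap` positions
    // with its predecessor.
    static List<int[]> fragments(int length, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > overlap >= 0");
        }
        List<int[]> result = new ArrayList<int[]>();
        int step = size - overlap;
        for (int start = 0; start < length; start += step) {
            int end = Math.min(start + size, length);
            result.add(new int[] {start, end});
            if (end == length) {
                break; // the last fragment reached the end of the sequence
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String genome = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG";
        for (int[] f : fragments(genome.length(), 20, 5)) {
            // Fragment identifiers carry their coordinates so hits can be mapped back.
            System.out.println(">genome_" + f[0] + "_" + f[1]);
            System.out.println(genome.substring(f[0], f[1]));
        }
    }
}
```

Keeping the coordinates in the fragment identifier makes it possible to place annotations back on the original sequence afterwards.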

Citation

Contact

If there are any issues with the tool or if you would like to collaborate with us, please don't hesitate to contact us.
