Giter Site home page Giter Site logo

fraggenescan's Introduction

Note to the users: the latest release is significantly improved in terms of performance. For large files of assembly contigs, the running time could be reduced from days to a few mins. (Read more about this in the "releases" file included in the package).

Description
============
FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes. 

FragGeneScan was first released through omics website (http://omics.informatics.indiana.edu/FragGeneScan/) in March 2010, where you can find its old releases. 

FragGeneScan migrated to SourceForge in October, 2013. (https://sourceforge.net/projects/fraggenescan/)

FragGeneScan migrated to Github for easier maintenance in March, 2017. (https://github.com/COL-IU/FragGeneScan.git)

Installation
=============
To install FragGeneScan, please follow the steps below:

1. Clone the repository:
	git clone https://github.com/COL-IU/FragGeneScan.git

2. Make sure that you also have a C compiler such as "gcc" and perl interpreter.

3. Run "makefile" to compile and build excutable "FragGeneScan"
	make clean
	make fgs


Running the program
====================
1.  To run FragGeneScan, 

./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] -thread=[num_thread]

[seq_file_name]: sequence file name including the full path
[output_file_name]: output file name including the full path
[whole_genome]: 1 if the sequence file has complete genomic sequences
		0 if the sequence file has short sequence reads
[train_file_name]: file name that contains model parameters; this file should be in the "train" directory. 
		   Note that four files containing model parameters already exist in the "train" directory. 
		   [complete] for complete genomic sequences or short sequence reads without sequencing error
		   [sanger_5] for Sanger sequencing reads with about 0.5% error rate
		   [sanger_10] for Sanger sequencing reads with about 1% error rate
		   [454_5] for 454 pyrosequencing reads with about 0.5% error rate
		   [454_10] for 454 pyrosequencing reads with about 1% error rate
		   [454_30] for 454 pyrosequencing reads with about 3% error rate
		   [illumina_5] for Illumina sequencing reads with about 0.5% error rate
		   [illumina_10] for Illumina sequencing reads with about 1% error rate
[num_thread]: number of thread used in FragGeneScan. Default 1.

2. To test FragGeneScan with a complete genomic sequence,

./run_FragGeneScan.pl -genome=./example/NC_000913.fna -out=./example/NC_000913-fgs  -complete=1  -train=complete

[NC_000913.fna]: this sequence file has the complete genomic sequence of E.coli
(NCBI gene predictions for this genome are available under the same folder example/)


3. To test FragGeneScan with sequencing reads,

./run_FragGeneScan.pl -genome=./example/NC_000913-454.fna -out=./example/NC_000913-454-fgs  -complete=0  -train=454_10

[NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using Metasim

For illumina reads, please use illumina_5 or illumina_10 as the train model.

4. To test FragGeneScan with assembly contigs,
./run_FragGeneScan.pl -genome=./example/contigs.fna -out=./example/contigs-fgs  -complete=1  -train=complete

Note: -complete=1 & -train=complete are used as the parameters.

Output
============
Upon completion, FragGeneScan generates four files. 

1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score).  For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome
108     440     -       3       1.378688
337     2799    +       1       1.303498
2801    3733    +       2       1.317386
3734    5020    +       2       1.293573
5234    5530    +       2       1.354725
5683    6459    -       1       1.290816
6529    7959    -       1       1.326412
8238    9191    +       3       1.286832
9306    9893    +       3       1.317067


2. The second file is  '[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG
TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT
TAAAGTATTTAGTGACCTAAGTCAA
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC
TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT
TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT
GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG


3. The third file is '[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH
VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG
RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE
LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG
DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV
ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF
LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP
VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA
AGVFADLLRTLSWKLGV
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801
end=3733 strand=+
VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC
SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI
AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD
TAGARVLEN

4. [output_file_name].gff gene prediction results in gff format.

Citation
=========
If you use FragGeneScan, please cite: 
Mina Rho, Haixu Tang, and Yuzhen Ye. FragGeneScan: Predicting Genes in Short and Error-prone Reads. Nucl. Acids Res., 2010 doi: 10.1093/nar/gkq747 

License
============
Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang.
You may redistribute this software under the terms of the GNU General Public License.

fraggenescan's People

Contributors

wazimismail avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.