gaius-augustus / augustus

Genome annotation with AUGUSTUS

Home Page: http://bioinf.uni-greifswald.de/webaugustus/

Makefile 0.73% HTML 0.49% C 0.82% C++ 79.73% Shell 0.09% TeX 0.08% Perl 12.99% Roff 0.79% Python 4.24% Dockerfile 0.06%
genome annotation gene prediction discovery

augustus's Introduction

Gene Prediction with AUGUSTUS

INTRODUCTION
INSTALLATION
RUNNING AUGUSTUS
WEB-SERVER
COMPARATIVE GENE PREDICTION
AUTHORS AND CONTACT
REFERENCES
LICENSES

INTRODUCTION

AUGUSTUS is a program to find genes and their structures in one or more genomes. More ...

INSTALLATION

Windows

Windows users can use the Windows Subsystem for Linux (WSL) to install AUGUSTUS exactly as described below for Linux. How to set up the WSL for AUGUSTUS is described here.

Ubuntu 18.04, Debian 9 or later

Until Ubuntu 21.04 and Debian 11, the packaged version supports only single-genome prediction; since then it also supports comparative gene prediction.

sudo apt install augustus augustus-data augustus-doc

Docker

Create a Docker image from the Dockerfile using:

git clone https://github.com/Gaius-Augustus/Augustus.git
cd Augustus
docker build -t augustus .

Singularity

Create a Singularity Image File from the Singularity Definition File using:

git clone https://github.com/Gaius-Augustus/Augustus.git
cd Augustus
singularity build augustus.sif Singularity.def

Building AUGUSTUS from source

See INSTALL.md for details.

Download the source code from GitHub and compile it:

git clone https://github.com/Gaius-Augustus/Augustus.git
cd Augustus
make augustus

After compilation has finished, the command bin/augustus should be executable and print a usage message.

To build the auxiliary utilities, use:

make auxprogs

Install locally

As a normal user, add the directory of the executables to the PATH environment variable, for example:

export PATH=~/augustus/bin:~/augustus/scripts:$PATH

Install globally

You can install AUGUSTUS globally, if you have root privileges, for example:

sudo make install

Alternatively, you can execute commands similar to those in the "install" section of the top-level Makefile to customize the global installation.

Optional: set environment variable AUGUSTUS_CONFIG_PATH

If the environment variable AUGUSTUS_CONFIG_PATH is set, augustus and etraining will look there for the config directory that contains the configuration and parameter files, e.g. '~/augustus/config'. You may want to add this line to a startup script (like ~/.bashrc).

export AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/

If this environment variable is not set, then the programs will look in the path ../config relative to the directory in which the executable lies. As a third alternative, you can specify this directory on the command line when you run augustus: --AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/
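The three ways of locating the config directory can be pictured as a single lookup. The following is a minimal sketch, not the AUGUSTUS implementation: the function name `resolve_config_dir` is hypothetical, and the assumption that the command-line option wins over the environment variable is mine, since the text above does not say how the two rank against each other.

```python
from pathlib import Path


def resolve_config_dir(cli_value, environ, executable):
    """Return the config directory that a lookup in this order would select.

    Assumed precedence (not confirmed by the README):
    1. --AUGUSTUS_CONFIG_PATH given on the command line
    2. the AUGUSTUS_CONFIG_PATH environment variable
    3. ../config relative to the directory that contains the executable
    """
    if cli_value:
        return Path(cli_value).expanduser()
    env_value = environ.get("AUGUSTUS_CONFIG_PATH")
    if env_value:
        return Path(env_value).expanduser()
    # bin/augustus -> bin -> parent of bin -> config
    return Path(executable).parent.parent / "config"
```

Called as `resolve_config_dir(None, os.environ, "/opt/augustus/bin/augustus")` with the variable unset, the sketch falls through to `/opt/augustus/config` (the install prefix here is only an example).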

WEB-SERVER

AUGUSTUS can also be run through a web-interface at http://bioinf.uni-greifswald.de/augustus/ and a web service at http://bioinf.uni-greifswald.de/webaugustus/.

REFERENCES AND DOCUMENTATION

Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008). Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 24(5), pages 637–644, doi: 10.1093/bioinformatics/btn013

For further references see docs/REFERENCES.md

3 book chapters with command line walkthroughs

LICENSES

All source code, i.e.

  • the AUGUSTUS source code (src/*.cc, include/*.hh)
  • the scripts (scripts/*)
  • the auxiliary programs (auxprogs/)
  • the tree-parser (src/scanner, src/parser)
  • the unit tests (src/unittests)

is under the Artistic License.

augustus's People

Contributors

anica94, banan314, conchoecia, dhonsel, diekhans, douglasgscofield, gullumluvl, hmehlan, ifiddes, ingobulla, katharinahoff, kdm9, khajidu, larsgab, lizzyge, mariostanke, mglgnn, nathanweeks, onlinearts, paulmenzel, prehensilecode, satta, sestaton, sherbold, shuang-broad, smoe, starsareintherose, steinedieb, tonatiuhpenacenteno

augustus's Issues

Parallelism for AUGUSTUS

Find a suitable approach to introduce parallelism into AUGUSTUS to enable better scaling on multicore CPUs

Augustus Makefile after git pull update

In the current version, a "make" will fail after a source code update:

make
mkdir -p bin
mkdir: cannot create directory ‘bin’: File exists
make: *** [all] Error 1

Maybe we should add a "rm -r bin" in case bin exists?

cannot find -lcolamd

We get the error below when trying to compile Augustus 3.3.1 on CentOS 6.9 with g++ (GCC) 7.2.1 20170829 (Red Hat 7.2.1-1) from Software Collections; lpsolve and lpsolve-devel x86_64 5.5.0.15 are installed from the system repositories. Any ideas? Is it related to suitesparse? I didn't see it mentioned in the installation instructions. We have suitesparse x86_64 3.4.0-9.el6 installed, but there is no suitesparse-devel package. Could that be it? Thanks

[root@pac Augustus-3.3.1-tag1]# make
mkdir -p bin
cd src && make
make[1]: Entering directory '/usr/local/Augustus-3.3.1-tag1/src'
g++  -Wall -Wno-sign-compare -Wno-strict-overflow -pedantic -g -ggdb -O3  -DZIPINPUT -std=c++11 -DCOMPGENEPRED -DSQLITE  -o augustus augustus.cc genbank.o properties.o pp_profile.o pp_hitseq.o pp_scoring.o statemodel.o namgene.o types.o gene.o evaluation.o motif.o geneticcode.o hints.o extrinsicinfo.o projectio.o intronmodel.o exonmodel.o igenicmodel.o utrmodel.o merkmal.o vitmatrix.o lldouble.o mea.o graph.o meaPath.o exoncand.o randseqaccess.o fasta.o ncmodel.o parser/parse.o scanner/lex.o genomicMSA.o geneMSA.o contTimeMC.o compgenepred.o phylotree.o orthograph.o orthoexon.o alignment.o speciesgraph.o codonMSA.o train_logReg_param.o sqliteDB.o dummy.o -I../include -I/usr/include/lpsolve -lboost_iostreams -lgsl -lgslcblas  -llpsolve55 -lcolamd -ldl  -lsqlite3
/opt/rh/devtoolset-7/root/usr/libexec/gcc/x86_64-redhat-linux/7/ld: cannot find -lcolamd
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:74: augustus] Error 1
make[1]: Leaving directory '/usr/local/Augustus-3.3.1-tag1/src'
make: *** [Makefile:8: all] Error 2
[root@pac Augustus-3.3.1-tag1]#

segmentation fault with large sequences

Hi,
I installed Augustus after a lot of pain, and now it works fine with small sequences, but with larger sequences it gives a "segmentation fault (core dumped)" error.

For example, I tested with the "Saccharina japonica cultivar Ja" genome from NCBI. If I extract the first 400 lines of the first sequence, "JXRI01000001.1", it runs well, but if I use 500 lines or more it gives me this error.

I need to run it on many sequences (mostly larger than 100 MB), and the web interface does not accept large files.

Help please
Thanks in advance

GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence

When I run this command

etraining --species=generic --stopCodonExcludedFromCDS=true genes.raw.gb 2>train1.err

I get these error messages:
Encountered error after reading 5941 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 5942 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 6918 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 6935 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 990 annotations.
GBProcessor::getGeneList(): Stop codon out of sequence bounds. Ignoring sequence.
Encountered error after reading 991 annotations.
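To narrow down which records in genes.raw.gb trigger the message, the annotation counts can be pulled out of train1.err. This is a small sketch, not part of AUGUSTUS: the helper name `annotation_counts` is hypothetical, and whether the reported number refers to the record before or after the failing one is not stated in the log, so treat the values as approximate positions.

```python
import re

# Matches the count in lines such as
# "Encountered error after reading 5941 annotations."
ANNOTATION_RE = re.compile(r"Encountered error after reading (\d+) annotations\.")


def annotation_counts(stderr_text):
    """Return the sorted, de-duplicated annotation counts reported in the log."""
    return sorted({int(m.group(1)) for m in ANNOTATION_RE.finditer(stderr_text)})
```

Reading the file with `annotation_counts(open("train1.err").read())` gives the counts in numeric order, which is easier to follow than the order in which they appear in the log.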

segfault in augustus CGP 3.3.1

Joel Armstrong sent this example via email on July 12, 2018:

augustus --dbhints=1 --allow_hinted_splicesites=atac
--extrinsicCfgFile=augustus-cgp-3.3.1-segfault/tmpXeNqPA.tmp
--species=human --treefile=augustus-cgp-3.3.1-segfault/tmp5wgnqb.tmp
--alnfile=augustus-cgp-3.3.1-segfault/tmpBWvb4O.tmp
--dbaccess=augustus-cgp-3.3.1-segfault/tmp9Ptygk.tmp
--speciesfilenames=augustus-cgp-3.3.1-segfault/yarrow-vm.31530.5456091091.tmp
--softmasking=1 --exoncands=1 --alternatives-from-evidence=0
--/CompPred/logreg=on --printOEs=1
--/CompPred/outdir=augustus-cgp-3.3.1-segfault

Found 3633 ortho exons
building Graph for monDom5
adding sampled states and additional exon candidates

augustus: ERROR
in SpeciesGraph::getPredTypes()
Segmentation fault (core dumped)

Debug output:

Breakpoint 1, SpeciesGraph::getPredType (this=0x5555570c10c0, type=intron_type, begin=1213, end=1510) at speciesgraph.cc:551
551 NodeType SpeciesGraph::getPredType(StateType type, int begin, int end){
(gdb) bt
#0 SpeciesGraph::getPredType (this=0x5555570c10c0, type=intron_type, begin=1213, end=1510) at speciesgraph.cc:551
#1 0x00005555557419bd in SpeciesGraph::addLeftSS (this=0x5555570c10c0, exon=0x55557415b4c0,
neutralLines=std::vector of length 15, capacity 16 = {...}, auxiliaryNodes=std::unordered_map with 0 elements) at speciesgraph.cc:447
#2 0x00005555557403b0 in SpeciesGraph::buildGraph (this=0x5555570c10c0, meanIntrLen=6910.7404295710676) at speciesgraph.cc:91
#3 0x0000555555716941 in CompGenePred::start (this=0x7fffffffd940) at compgenepred.cc:520
#4 0x0000555555567906 in main (argc=15, argv=0x7fffffffe198) at augustus.cc:129
(gdb) p type
$1 = intron_type
(gdb) up
#1 0x00005555557419bd in SpeciesGraph::addLeftSS (this=0x5555570c10c0, exon=0x55557415b4c0,
neutralLines=std::vector of length 15, capacity 16 = {...}, auxiliaryNodes=std::unordered_map with 0 elements) at speciesgraph.cc:447
447 NodeType ntype = getPredType(((State*)exon->item)->type, exon->begin, exon->end);
(gdb) l
442
443 Node* SpeciesGraph::addLeftSS(Status exon, vector< vector<Node> >&neutralLines, unordered_map<int32_t,Node*> &auxiliaryNodes){
444
445 if(!exon)
446 return NULL;
447 NodeType ntype = getPredType(((State*)exon->item)->type, exon->begin, exon->end);
448 int begin = exon->begin;
449 return addAuxilaryNode(ntype,begin,neutralLines, auxiliaryNodes);
450 }
451
(gdb) p exon
$2 = (Status *) 0x55557415b4c0
(gdb) p exon->item
$3 = (const void *) 0x555577c6b1e0
(gdb) p *(State *)exon->item
$5 = {begin = 1213, end = 1510, next = 0x555577c6b2d0, type = intron_type, prob = {static dbl_inf = inf, static max_val = 3.2733906078961419e+150,
static min_val = 3.0549363634996047e-151, static base = 1.0715086071862673e+301, static baseinv = 9.3326361850321888e-302,
static logbase = 693.14718055994535, static max_exponent = 2147483647, static min_exponent = -2147483648, static temperature = 0,
static rest = {4.2535295865117308e+37, 1.8092513943330656e+75, 7.6957043352332967e+112, 3.2733906078961419e+150, 1.3923463798895859e+188,
5.9223865215328557e+225, 2.5191046292098296e+263}, static output_precision = 3, value = 0, exponent = 0}, hasScore = true, apostprob = 1,
sampleCount = 11, evidence = 0x0, truncated = 0 '\000', framemod = 0}

Source code documentation

The source code is currently only partially documented. Consistent source documentation should be created, e.g. based on Doxygen.

joingenes removes "gene" entries from CGP predictions

Currently following the AUGUSTUS-CGP tutorial will lead to running the joingenes binary. This creates a file of all predictions joined into a single file, but only has transcript / CDS entries. Running gtf2gff with options -gff3 will result in a file containing mRNA, CDS, and exons, but not gene entries (although mRNA entries will have a parent, said parent by ID will not be found).

Not sure if a bug or part of the pipeline, but there should be a way to add it back in I would imagine.
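One way to add the gene entries back after the fact is to derive them from the mRNA lines. The sketch below is a hypothetical post-processing workaround, not a feature of joingenes or gtf2gff: it assumes the GFF3 file has nine tab-separated columns and that each mRNA line carries a `Parent=` attribute naming its gene, as described above. The function name `missing_gene_lines` is my own.

```python
def missing_gene_lines(gff3_lines):
    """Build one 'gene' line for every mRNA Parent ID that has no gene feature.

    The gene span is the smallest interval covering all mRNAs of that parent.
    """
    present = set()  # IDs of gene features already in the file
    spans = {}       # parent ID -> [seqid, source, start, end, strand]
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if cols[2] == "gene" and "ID" in attrs:
            present.add(attrs["ID"])
        elif cols[2] == "mRNA" and "Parent" in attrs:
            start, end = int(cols[3]), int(cols[4])
            span = spans.setdefault(
                attrs["Parent"], [cols[0], cols[1], start, end, cols[6]]
            )
            span[2] = min(span[2], start)
            span[3] = max(span[3], end)
    return [
        "\t".join(
            [seqid, source, "gene", str(start), str(end), ".", strand, ".", "ID=" + gene_id]
        )
        for gene_id, (seqid, source, start, end, strand) in spans.items()
        if gene_id not in present
    ]
```

The returned lines can be appended to the file and the result sorted by position; the sketch deliberately leaves existing lines untouched.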

update bam2wig README and Makefile

The instructions for compiling bam2wig (https://github.com/Gaius-Augustus/Augustus/blob/master/auxprogs/bam2wig/README.txt) are outdated and should be updated, as described here, samtools/tabix#24
Basically, the repo for tabix is not maintained any longer, and installation of htslib will provide the installation of tabix.

TABIX=$(TOOLDIR)/tabix/
could be updated to TABIX=$(TOOLDIR)/htslib/ or just removed, since it is already defined with HTSLIB.

Docker installation not running smoothly on Ubuntu 16.04.5 LTS

I tried to install Augustus on Ubuntu 16.04.5 LTS with the docker file and it failed as follows (I can install & compile "manually", but it might be of interest for other users to fix this):

katharina@greifserv3:~/git/Augustus$ sudo docker build -t augustus .
Sending build context to Docker daemon 1.079GB
Step 1/40 : FROM ubuntu:18.04
18.04: Pulling from library/ubuntu
32802c0cfa4d: Pull complete
da1315cffa03: Pull complete
fa83472a3562: Pull complete
f85999a86bef: Pull complete
Digest: sha256:6d0e0c26489e33f5a6f0020edface2727db9489744ecc9b4f50c7fa671f23c49
Status: Downloaded newer image for ubuntu:18.04
---> 93fd78260bd1
Step 2/40 : RUN apt-get update
---> Running in 8606dd15e1a9
Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Temporary failure resolving 'archive.ubuntu.com'
Reading package lists...
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/bionic-security/InRelease Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Removing intermediate container 8606dd15e1a9
---> 36567f148125
Step 3/40 : RUN apt-get install -y build-essential wget git autoconf
---> Running in 99e8b3976c14
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package build-essential
E: Unable to locate package wget
E: Unable to locate package git
E: Unable to locate package autoconf
The command '/bin/sh -c apt-get install -y build-essential wget git autoconf' returned a non-zero code: 100

Commit 202f900 changes CGP training behavior

The change to making the --printSampled flag default to false breaks the way in which CAT automates training CGP models. Particularly, the step after running the training step involves concatenating the $refGenome.sampled_GFs.gff, exonCands.$refGenome.gff3, and orthoExons.$refGenome.gff3 files together for a subset of the input MAF chunks, then running augustus with that combined file passed to --trainFeatureFile.

Valgrind for system tests

We should adopt valgrind for all system tests, i.e., tests within the Makefile that execute a binary of Augustus (augustus, etraining, ...).

This should be a very simple change to the Makefile and could be an efficient way to detect memory leaks.
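As an illustration of what such a wrapped test could look like, here is a minimal sketch written as a Python helper rather than a Makefile rule. The helper names are hypothetical; the two valgrind options used (`--leak-check=full` and `--error-exitcode`) are standard Memcheck options, and the arguments passed to the binary are placeholders.

```python
import subprocess


def valgrind_command(binary, args):
    """Prefix a test command with valgrind so that detected errors change the exit status."""
    return [
        "valgrind",
        "--leak-check=full",   # report each leaked block with a stack trace
        "--error-exitcode=1",  # exit with 1 when valgrind reports errors
        binary,
        *args,
    ]


def run_under_valgrind(binary, args):
    """Return True if the program exited with 0 and valgrind reported no errors."""
    result = subprocess.run(
        valgrind_command(binary, args), capture_output=True, text=True
    )
    return result.returncode == 0
```

A call such as `run_under_valgrind("bin/augustus", ["--species=human", "example.fa"])` assumes valgrind is installed and that `example.fa` is a stand-in for a real test input.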

Issues with parameter training for CGP (in CAT)

Hey Mario,

This is related to issue #45. A user of CAT performed the automated training of CGP and got 0 genes from the prediction step. I got ahold of his data, and found that I was able to get gene predictions if I instead used the default logistic regression parameters that Stefanie provided. However, I don't really have any insight into what is wrong with the trained parameter set. I have copied this parameter set below. Do you have any ideas on why the training failed in this case?

/CompPred/exon_score0   -65.5168
/CompPred/exon_score1   2.67387
/CompPred/exon_score2   4.58782
/CompPred/exon_score3   1.24357
/CompPred/exon_score4   187.464
/CompPred/exon_score5   -140.957
/CompPred/exon_score6   8.14064
/CompPred/exon_score7   -35.6999
/CompPred/exon_score8   2.68413
/CompPred/exon_score9   -0.081773
/CompPred/exon_score10  -6.79179
/CompPred/exon_score11  0.140883
/CompPred/exon_score12  -6.76836
/CompPred/exon_score13  -24.9682
/CompPred/exon_score14  11.7074
/CompPred/intron_score0 -6.27222
/CompPred/intron_score1 -0.113227
/CompPred/intron_score2 -0.725973
/CompPred/intron_score3 0.072284

For comparison, these are the default (annotated) parameters that Stefanie put in CAT:

# scores from logistic regression

# exon scores

/CompPred/exon_score0	-3.5769278	# intercept
/CompPred/exon_score1	-2.4597280	# for not having omega
/CompPred/exon_score2	0.5009572 	# for not beeing an OE 
/CompPred/exon_score3	0.3228047	# log length
/CompPred/exon_score4	4.7887919	# posterior probability
/CompPred/exon_score5	0.4727347	# average base probability
/CompPred/exon_score12	-1.9212818	# for not beeing sampled


# ortho exon scores

/CompPred/exon_score6	0.3175849	# posterior mean omega (0 if no omega was calculated)
/CompPred/exon_score7	-2.5411494	# variance of omega (0 if no omega was calculated)
/CompPred/exon_score8	-5.4283278	# conservation
/CompPred/exon_score9	-0.0028231	# containment
/CompPred/exon_score10	4.4241493 	# diversity
/CompPred/exon_score11	0		# number of species involved in this ortho exon 
/CompPred/exon_score13	-2.9764014	# conservation * diversity
/CompPred/exon_score14	-1.4184841	# omega * diversity
/CompPred/exon_score15	5.0224288	# number of species involved in this ortho exon divided by clade size

# intron scores

/CompPred/intron_score0	-4.693283	# intercept
/CompPred/intron_score1	 5.772046	# posterior probability
/CompPred/intron_score2	 4.170951	# average base probability
/CompPred/intron_score3	-0.261357	# log length

##############

# scores for single species mode MEA (trained without ortho exons)

lg_exon_score0		-9.25162	# intercept
lg_exon_score1		 4.85176	# posterior probability
lg_exon_score2		 6.69845	# average base probability
lg_exon_score3		 0.22179	# log length

@francicco
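To make the two listings easier to compare, they can be parsed and ranked by how far each trained value lies from its default. This is only a comparison aid under simple assumptions (one `name value` pair per line, `#` starting a comment); the function names are hypothetical and nothing here reflects how AUGUSTUS itself evaluates the scores.

```python
def parse_scores(text):
    """Map parameter name -> float, ignoring comments and blank lines."""
    scores = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing annotation
        parts = line.split()
        if len(parts) < 2:
            continue
        scores[parts[0]] = float(parts[1])
    return scores


def compare_scores(trained, default):
    """Return (name, trained, default) rows for shared parameters,
    ordered by largest absolute difference first."""
    shared = trained.keys() & default.keys()
    return sorted(
        ((name, trained[name], default[name]) for name in shared),
        key=lambda row: abs(row[1] - row[2]),
        reverse=True,
    )
```

With the two listings above as input, `sorted(default.keys() - trained.keys())` also shows which parameters appear only in the default set; the trained listing stops at exon_score14, while the default set additionally defines exon_score15.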

Species list

On my compute cluster, as a user without root privileges, I have access to AUGUSTUS 3.3, installed some time in June 2018. I do not have access to a newer or older version.

I noticed a post by Katharina Hoff at #12, in Aug 2018, who says that "species list info was updated in the code"

So obviously I do not have the updated AUGUSTUS code that accurately reflects the species available, with their training sets.

With that as context, I have a few questions. Hopefully an expert user or author(s) can respond asap. Thanks!

A. What is the actual list of species with training sets for AUGUSTUS 3.3?

B. Is version 3.3 the latest AUGUSTUS version? If not, what is the latest version?

C. In this newest version, can you please share the species with training sets?

Add namespaces to C++ code

The code is currently without namespaces. This must be changed, in order to prevent possible conflicts with other libraries.

Couldn't generate XXX_exon_probs.pbl in the directory "config/species/XXX/"

In the BRAKER pipeline, I got the following in augustus.hints.stderr:

path_to_augustus/config/../bin/augustus: ERROR
ExonModel: Couldn't open file /path_to_augustus/config/species/XXX/XXX_exon_probs.pbl

I searched for XXX_exon_probs.pbl in the directory, but I couldn't find it.
However, other files XXX_metapars.cfg, XXX_metapars.cgp.cfg, XXX_metapars.utr.cfg, XXX_parameters.cfg and XXX_weightmatrix.txt exist in the directory.
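A quick way to see which parameter files a species directory lacks is to check for each expected name. This sketch uses only the file names mentioned in this report; a complete species directory may need further files that are not listed here, and the function name is hypothetical.

```python
from pathlib import Path

# Suffixes taken from the report above; the list is not claimed to be complete.
EXPECTED_SUFFIXES = [
    "_exon_probs.pbl",
    "_metapars.cfg",
    "_metapars.cgp.cfg",
    "_metapars.utr.cfg",
    "_parameters.cfg",
    "_weightmatrix.txt",
]


def missing_species_files(config_dir, species):
    """Return the expected file names that do not exist in config/species/<species>/."""
    species_dir = Path(config_dir) / "species" / species
    return [
        species + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (species_dir / (species + suffix)).is_file()
    ]
```

For the situation described here, `missing_species_files("/path_to_augustus/config", "XXX")` would list `XXX_exon_probs.pbl`.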

WebAUGUSTUS not accepting correctly formatted hints file

A user informed us that WebAUGUSTUS currently rejects his hints file with source=M hints. The file looks like this:

CTG93    EVMpout    start    124803    124805    0    -    0    source=M
CTG93    EVMpout    CDS    124698    124805    0    -    0    source=M
CTG93    EVMpout    intron    124654    124697    0    -    0    source=M
CTG93    EVMpout    CDS    124534    124653    0    -    0    source=M
CTG93    EVMpout    intron    124474    124533    0    -    0    source=M
CTG93    EVMpout    CDS    124412    124473    0    -    0    source=M
CTG93    EVMpout    intron    124369    124411    0    -    0    source=M
CTG93    EVMpout    CDS    122295    124368    0    -    1    source=M
CTG93    EVMpout    stop    122292    122294    0    -    0    source=M
CTG12    EVMpout    genicpart    690262    690753    0    -    .    source=M
CTG12    EVMpout    start    690751    690753    0    -    0    source=M
CTG12    EVMpout    CDS    690372    690753    0    -    0    source=M
CTG12    EVMpout    intron    690330    690371    0    -    0    source=M
CTG12    EVMpout    CDS    690265    690329    0    -    2    source=M
CTG12    EVMpout    stop    690262    690264    0    -    0    source=M

We need to check whether this is a format validation bug.
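For checking a hints file locally before upload, a line-level check along the following lines may help separate file problems from validator problems. It is a sketch under stated assumptions, not the WebAUGUSTUS validator: it expects nine tab-separated GFF columns and accepts either `source=` or `src=` as the attribute key (the second spelling is my assumption). The listing above shows runs of spaces, which may simply be tabs lost in rendering.

```python
def check_hint_line(line):
    """Return a list of problems found in one hint line (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(cols)]
    problems = []
    if not (cols[3].isdigit() and cols[4].isdigit()):
        problems.append("start and end must be non-negative integers")
    elif int(cols[3]) > int(cols[4]):
        problems.append("start is greater than end")
    if cols[6] not in {"+", "-", "."}:
        problems.append("strand must be +, - or .")
    keys = {f.split("=", 1)[0].strip() for f in cols[8].split(";") if "=" in f}
    if not keys & {"source", "src"}:
        problems.append("attribute column lacks a source key")
    return problems
```

If every line of the user's file passes a check like this when the columns are real tabs, that would point toward the web form's validation rather than the file.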

How do I add a training set

I need E.Coli which is in the README list but not in the results from my local command
How can I get the same list as that posted on Github?
Thanks

augustus --species=help
usage:
augustus [parameters] --species=SPECIES queryfilename

where SPECIES is one of the following identifiers

identifier species
human Homo sapiens
fly Drosophila melanogaster
arabidopsis Arabidopsis thaliana
brugia Brugia malayi
aedes Aedes aegypti
tribolium Tribolium castaneum
schistosoma Schistosoma mansoni
tetrahymena Tetrahymena thermophila
galdieria Galdieria sulphuraria
maize Zea mays
toxoplasma Toxoplasma gondii
caenorhabditis Caenorhabditis elegans
(elegans) Caenorhabditis elegans
aspergillus_fumigatus Aspergillus fumigatus
aspergillus_nidulans Aspergillus nidulans
(anidulans) Aspergillus nidulans
aspergillus_oryzae Aspergillus oryzae
aspergillus_terreus Aspergillus terreus
botrytis_cinerea Botrytis cinerea
candida_albicans Candida albicans
candida_guilliermondii Candida guilliermondii
candida_tropicalis Candida tropicalis
chaetomium_globosum Chaetomium globosum
coccidioides_immitis Coccidioides immitis
coprinus Coprinus cinereus
coprinus_cinereus Coprinus cinereus
cryptococcus_neoformans_gattii Cryptococcus neoformans gattii
cryptococcus_neoformans_neoformans_B Cryptococcus neoformans neoformans
cryptococcus_neoformans_neoformans_JEC21 Cryptococcus neoformans neoformans
(cryptococcus) Cryptococcus neoformans
debaryomyces_hansenii Debaryomyces hansenii
encephalitozoon_cuniculi_GB Encephalitozoon cuniculi
eremothecium_gossypii Eremothecium gossypii
fusarium_graminearum Fusarium graminearum
(fusarium) Fusarium graminearium
histoplasma_capsulatum Histoplasma capsulatum
(histoplasma) Histoplasma capsulatum
kluyveromyces_lactis Kluyveromyces lactis
laccaria_bicolor Laccaria bicolor
lodderomyces_elongisporus Lodderomyces elongisporus
magnaporthe_grisea Magnaporthe grisea
neurospora_crassa Neurospora crassa
(neurospora) Neurospora crassa
phanerochaete_chrysosporium Phanerochaete chrysosporium
(pchrysosporium) Phanerochaete chrysosporium
pichia_stipitis Pichia stipitis
rhizopus_oryzae Rhizopus oryzae
saccharomyces_cerevisiae_S288C Saccharomyces cerevisiae
saccharomyces_cerevisiae_rm11-1a_1 Saccharomyces cerevisiae
(saccharomyces) Saccharomyces cerevisiae
schizosaccharomyces_pombe Schizosaccharomyces pombe
ustilago_maydis Ustilago maydis
(ustilago) Ustilago maydis
yarrowia_lipolytica Yarrowia lipolytica
nasonia Nasonia vitripennis
tomato Solanum lycopersicum
chlamydomonas Chlamydomonas reinhardtii
amphimedon Amphimedon queenslandica
pea_aphid Acyrthosiphon pisum
mnemiopsis_leidyi Mnemiopsis leidyi
nematostella_vectensis Nematostella vectensis
ciona Ciona intestinalis
strongylocentrotus_purpuratus Strongylocentrotus purpuratus

augustus: ERROR

Hi,
when I created gene.gb.train and gene.gb.test and used augustus to test the model, I got this error:
augustus: ERROR
Tried to sample from empty list.
I have no idea what's wrong. Can anyone help me?

Cleanup license information

Parts of the project have license headers, others do not. This should be harmonized.

Moreover, a LICENSE file should be added on the top level of the repository.

Augustus CGP test

Hi,

I'm running CAT and I'm having an error when the pipeline runs AugustusCgp. One of the developers said that "this could be because something is wrong with the version of Augustus" I'm running. Is there a sample test I can use to check if Augustus works fine?

Thanks a lot
Best,
F

Make AUGUSTUS core platform independent

Currently, AUGUSTUS is built for Linux systems only. This should be changed to allow platform independence, so that a Windows version of AUGUSTUS can be created.

Refactor code to separate concerns

Currently, there is only one big source folder. The code for algorithms, data formats, and user interfaces should be separated into different modules.

Remodel folder structure

The main changes are separate folders for the different "modules" that structure the source code, a new folder for tests, and a lib folder. This is not a simple copy and paste, as the integrity of the build must be maintained at all times.

Augustus CGP crash "upper case dna"

I have a subset of alignment partitions which consistently fail when running Augustus CGP with the error message below:

augustus: ERROR
Internal Error: upper case dna

Attached is a log file for one of the partitions.

Any help or further details on this error would be appreciated.

Thanks,
cgp-HLmyoMyo4_00000060.log

memory issue with v3.3

I realize v3.3 is now three versions old, but thought I would put this on the record anyway.

I was invoking our HPC's installed AUGUSTUS 3.3 from a BUSCO config file. Kept getting core dumps and quite bad (incomplete) BUSCO results. As an alternate I tried using the downloadable BUSCO VM and it worked perfectly; the AUGUSTUS version in that is 3.2.2. I reported all this to our HPC team. Investigation of the faulty process showed this:


Mar 26 13:03:33 c33-01 kernel: memory: usage 41943040kB, limit 41943040kB, failcnt 9376
Mar 26 13:03:33 c33-01 kernel: memory+swap: usage 41943040kB, limit 41943040kB, failcnt 0
Mar 26 13:03:33 c33-01 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Mar 26 13:03:33 c33-01 kernel: Memory cgroup stats for /slurm/uid_1415749/job_347628/step_batch: cache:0KB rss:41943040KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:436608KB active_anon:4150
6392KB inactive_file:0KB active_file:0KB unevictable:0KB
Mar 26 13:03:33 c33-01 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Mar 26 13:03:33 c33-01 kernel: [27782] 1415749 27782    28366      452       9        0             0 slurm_script
Mar 26 13:03:33 c33-01 kernel: [27800] 1415749 27800    73914     3631      60        0             0 python
Mar 26 13:03:33 c33-01 kernel: [28266] 1415749 28266 10509235 10483962   20511        0             0 augustus
Mar 26 13:03:33 c33-01 kernel: Memory cgroup out of memory: Kill process 28266 (augustus) score 1001 or sacrifice child
Mar 26 13:03:33 c33-01 kernel: Killed process 28266 (augustus) total-vm:42036940kB, anon-rss:41933020kB, file-rss:2828kB, shmem-rss:0kB

Our IT maven said the v3.3 problem was a 'memory leak'
"It turned out augustus 3.3 does not work well, memory usage continues increase till run out of memory"

So he built me a singularity package of BUSCO (using AUGUSTUS 3.2.2) and it works as it should. No core dumps, good BUSCO results.

I haven't yet tried the newer 3.3.1/3.3.2 versions of AUGUSTUS for comparison.
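The numbers in the kernel log can be cross-checked with a little arithmetic. The sketch below assumes a 4 kB page size (typical for x86-64 Linux, but not stated in the log); all other values are copied from the log lines above.

```python
PAGE_KB = 4  # assumed page size in kB; not stated in the log


def kb_to_gib(kb):
    """Convert kilobytes (units of 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


limit_kb = 41943040     # cgroup limit from the "memory: usage ... limit" line
rss_pages = 10483962    # rss column of the augustus row, counted in pages
anon_rss_kb = 41933020  # anon-rss from the "Killed process" line
file_rss_kb = 2828      # file-rss from the "Killed process" line

# Resident memory of the augustus process in kB
rss_kb = rss_pages * PAGE_KB
```

Here `kb_to_gib(limit_kb)` is exactly 40 GiB, and `rss_kb` equals `anon_rss_kb + file_rss_kb`, so the augustus process alone accounts for nearly all of the job's memory limit. That is consistent with the report of steadily growing memory use, although the log by itself cannot distinguish a leak from a genuinely large working set.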

Setup Travis build

Set up Travis as the CI system to ensure the build is always working. Later, this should be extended to also run tests, or be replaced by another CI system that performs this task.

gtf2gff.pl: transcript jg1.t1 has conflicting gene parents: and jg1

Dear AUGUSTUS,

As you know, AUGUSTUS is used in BRAKER. Several others and I are having issues with the gtf2gff.pl script (Gaius-Augustus/BRAKER#9). BRAKER calls it as follows:

cat /home/pjt6/Desktop/newton_final/prot_no_utr/augustus.hints.gtf | perl -ne 'if(m/\tAUGUSTUS\t/) {print $_;}' | perl /home/pjt6/programs/Augustus/scripts/gtf2gff.pl --gff3 --out=/home/pjt6/Desktop/newton_final/prot_no_utr/augustus.hints.gff3

ERROR:
transcript jg1.t1 has conflicting gene parents: and jg1. Remember: In GTF txids need to be overall unique. at /home/pjt6/programs/Augustus/scripts/gtf2gff.pl line 120, line 218372.

It would be amazing if this could be solved. Will you please help?

cheers,

Pete
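The message "conflicting gene parents: and jg1" reads as if one of the two parents is an empty string, which would happen if some line names transcript jg1.t1 without a gene ID. That reading is an assumption, not something confirmed in this report. A small diagnostic sketch (hypothetical helper, independent of gtf2gff.pl) that lists transcripts seen with more than one gene ID, counting a missing `gene_id` as empty:

```python
import re
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]*)"')
GENE_RE = re.compile(r'gene_id "([^"]*)"')


def conflicting_parents(gtf_lines):
    """Map transcript ID -> sorted gene IDs, for transcripts seen with more than one gene ID.

    A line that names a transcript but has no gene_id contributes the empty string.
    """
    genes_by_tx = defaultdict(set)
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        tx = TX_RE.search(cols[8])
        if tx is None:
            continue
        gene = GENE_RE.search(cols[8])
        genes_by_tx[tx.group(1)].add(gene.group(1) if gene else "")
    return {tx: sorted(genes) for tx, genes in genes_by_tx.items() if len(genes) > 1}
```

Running it over augustus.hints.gtf would show whether jg1.t1 really appears both with and without a gene ID, and on which kind of feature line.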

FeatureCollection::esource: invalid source key: E

Hi,

I tried to run Augustus, but got this notification
FeatureCollection::esource: invalid source key: E

./bin/augustus: ERROR
FeatureCollection::esource: invalid source key: E

The script I used is
./bin/augustus --genemodel=partial --protein=on --introns=on --hintsfile=hints1.gff --species=bombus_impatiens1 *.contigs.fasta > augout0305.txt

Interestingly, when I delete the --hintsfile parameter, Augustus runs. But I want to include the intron hints. How can I do this? Thank you
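One plausible reading of "invalid source key: E" is that hints1.gff uses a source key that the extrinsic configuration in use does not declare; this is an assumption, and the report does not confirm it. To see which source keys a hints file actually contains, a sketch like the following can be used (hypothetical helper; it assumes nine tab-separated columns and a `source=` or `src=` attribute):

```python
from collections import Counter


def source_keys(hint_lines):
    """Count the values of the source (or src) attribute across hint lines."""
    counts = Counter()
    for line in hint_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for field in cols[8].split(";"):
            key, _, value = field.partition("=")
            if key.strip() in {"source", "src"}:
                counts[value.strip()] += 1
    return counts
```

The resulting keys can then be compared by eye with the sources listed in the file passed via --extrinsicCfgFile, or in the default extrinsic configuration if none is given.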

Loading example hint file in CGP mode with MySQL support fails

Loading example hint file in CGP mode with MySQL support mentioned in README-cgp.md fails:

load2db --species=hg19 --dbaccess=vertebrates,localhost,cgp,db_passwd examples/cgp/human.hints.gff

Trying to connect to database vertebrates on server localhost as user cgp using password db_passwd ...
Looks like human.hints.gff is in gff format.
terminate called after throwing an instance of 'mysqlpp::BadQuery'
what(): Out of range value for column 'start' at row 1
Aborted (core dumped)

Create README.md from README.TXT

The README.md is better formatted and gives a nice front page on GitHub. As part of this, the contents may also be revised.

For example, I just compiled augustus from scratch in a clean Ubuntu VM. Finding all the dependencies to install with apt-get in the readme was a bit annoying.

I suggest to extend the readme with an apt-get statement that installs dependencies. This is also a precursor for an automated deployment of augustus, as well as for CI with Travis.

Please use this issue for discussing all further changes that should be made, we can close this issue once we are satisfied with the new README.

augustus-3.3.2: /bin/sh: line 0: cd: ../bin: No such file or directory

Hi,
it seems the Makefile has some wrong expectations of directory structure.

...
cd homGeneMapping; make clean;
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping'
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
(cd src; make clean; cd ../bin; rm -f homGeneMapping; cd ../../../bin; rm -f homGeneMapping)
make[3]: Entering directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping/src'
rm -f homGeneMapping gene.o genome.o
make[3]: Leaving directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping/src'
/bin/sh: line 0: cd: ../bin: No such file or directory
/bin/sh: line 0: cd: ../../../bin: No such file or directory
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping'
cd joingenes; make clean;
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/joingenes'
...

Luckily, it is re-entered later:

cd homGeneMapping; make;
make[2]: Entering directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping'
make[2]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
(cd src; make)
make[3]: Entering directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping/src'
g++ -c -Wall -Wno-sign-compare -ansi -pedantic -std=c++0x -pthread -O2 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize  -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize  -Wl,-O1 -Wl,--as-needed -o gene.o gene.cc -I../include
g++ -c -Wall -Wno-sign-compare -ansi -pedantic -std=c++0x -pthread -O2 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize  -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize  -Wl,-O1 -Wl,--as-needed -o genome.o genome.cc -I../include
g++ -Wall -Wno-sign-compare -ansi -pedantic -std=c++0x -pthread -O2 -O2 -pipe -mpclmul -mpopcnt -march=native -ftree-vectorize   -o homGeneMapping main.cc gene.o genome.o -I../include  
cp homGeneMapping ../../../bin/homGeneMapping
make[3]: Leaving directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping/src'
make[2]: Leaving directory '/scratch/var/tmp/portage/sci-biology/augustus-3.3.2/work/Augustus-3.3.2/auxprogs/homGeneMapping'
cd joingenes; make;

Use cp -p instead of cp homGeneMapping ../../../bin/homGeneMapping or, even better, use g++ -o ../../../bin/homGeneMapping directly, as is done elsewhere throughout the project.

Create automated tests

Create automated tests for Augustus and execute them as part of the continuous integration

proteinprofile broken on Apple Clang

When compiling on macOS 10.14.4, proteinprofile is broken:

 ❮ AUGUSTUS_CONFIG_PATH=$(pwd)/config ./bin/augustus --species=human --proteinprofile=examples/profile/HsDHC.prfl examples/example.fa
# This output was generated with AUGUSTUS (version 3.3.2).
# AUGUSTUS is a gene prediction tool written by M. Stanke ([email protected]),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initialising the parameters using config directory /Volumes/Data/Projects/augustus-3.3.2/config/ ...

./bin/augustus: ERROR
	PP::Profile: Error parsing pattern file"examples/profile/HsDHC.prfl", line 9.

If I set CC=gcc-9 (using Homebrew gcc9) in src/Makefile, the problem is resolved.

❯ AUGUSTUS_CONFIG_PATH=$(pwd)/config ./bin/augustus --species=human --proteinprofile=examples/profile/HsDHC.prfl examples/example.fa
# This output was generated with AUGUSTUS (version 3.3.2).
# AUGUSTUS is a gene prediction tool written by M. Stanke ([email protected]),
# O. Keller, S. König, L. Gerischer, L. Romoth and Katharina Hoff.
# Please cite: Mario Stanke, Mark Diekhans, Robert Baertsch, David Haussler (2008),
# Using native and syntenically mapped cDNA alignments to improve de novo gene finding
# Bioinformatics 24: 637-644, doi 10.1093/bioinformatics/btn013
# No extrinsic information on sequences given.
# Initialising the parameters using config directory /Volumes/Data/Projects/augustus-3.3.2/config/ ...
Warning: Block HsDHC_AG is not significant enough, removed from profile.
Warning: Block HsDHC_AO is not significant enough, removed from profile.
Warning: Block HsDHC_BJ is not significant enough, removed from profile.
# Using protein profile DHC
# --[583..1610]--> HsDHC_F (24) <--[62..102]--> HsDHC_H (65) <--[38..58]--> HsDHC_J (46) <--[29..54]--> HsDHC_L (15) <--[43..76]--> HsDHC_N (20) <--[28..48]--> HsDHC_P (42) <--[4..29]--> HsDHC_Q (125) <--[13..34]--> HsDHC_R (16) <--[2..8]--> HsDHC_S (61) <--[23..47]--> HsDHC_T (17) <--[3..4]--> HsDHC_U (16) <--[29..42]--> HsDHC_W (39) <--[8..29]--> HsDHC_X (19) <--[0..2]--> HsDHC_Y (11) <--[10..66]--> HsDHC_Z (70) <--[74..141]--> HsDHC_AA (11) <--[70..156]--> HsDHC_AC (20) <--[43..124]--> HsDHC_AE (11) <--[2..3]--> HsDHC_AF (17) <--[88..119]--> HsDHC_AJ (20) <--[15..46]--> HsDHC_AK (23) <--[59..203]--> HsDHC_AL (21) <--[1..2]--> HsDHC_AM (31) <--[35..44]--> HsDHC_AP (26) <--[15..24]--> HsDHC_AQ (23) <--[13..17]--> HsDHC_AS (24) <--[152..202]--> HsDHC_AV (25) <--[6..22]--> HsDHC_AW (8) <--[201..260]--> HsDHC_AZ (19) <--[4..9]--> HsDHC_BA (17) <--[26..208]--> HsDHC_BC (12) <--[1..5]--> HsDHC_BD (134) <--[2..3]--> HsDHC_BE (13) <--[130..240]--> HsDHC_BG (23) <--[19..38]--> HsDHC_BI (20) <--[29..45]--> HsDHC_BK (23) <--[8..35]--> HsDHC_BL (9) <--[3..8]--> HsDHC_BM (20) <--[20..35]--> HsDHC_BO (44) <--[5..16]--> HsDHC_BP (31) <--[34..51]--> HsDHC_BR (16) <--[62..98]--> HsDHC_BS (33) <--[53..102]--> HsDHC_BV (21) <--[31..78]--> HsDHC_BX (12) <--[28..47]--> HsDHC_BZ (11) <--[16..30]--> HsDHC_CB (10) <----
# human version. Using default transition matrix.
# Looks like examples/example.fa is in fasta format.
# We have hints for 0 sequences and for 0 of the sequences in the input set.
#
# ----- prediction on sequence number 1 (length = 9453, name = HS04636) -----
#
# Predicted genes for sequence number 1 on both strands

Note that this also means that the Homebrew formula is broken (https://github.com/Homebrew/homebrew-core/blob/master/Formula/augustus.rb).

bam2hints does not indicate whether a hint is on the forward strand or the reverse strand

When bam2hints is used to produce a hint file from a TopHat or HISAT BAM file, the result does not tell whether a hint is on the forward or the reverse strand. Would that be a problem for the final AUGUSTUS run?
"
HiC_scaffold_25 b2h intron 2629 10234 0 . . pri=4;src=E
HiC_scaffold_25 b2h intron 5437 5663 0 . . mult=2;pri=4;src=E
HiC_scaffold_25 b2h intron 10721 11018 0 . . mult=2;pri=4;src=E
HiC_scaffold_25 b2h intron 11131 12090 0 . . pri=4;src=E
HiC_scaffold_25 b2h intron 12235 26531 0 . . pri=4;src=E
HiC_scaffold_25 b2h intron 26355 26531 0 . . mult=2;pri=4;src=E
HiC_scaffold_25 b2h intron 26719 32677 0 . . mult=3;pri=4;src=E
HiC_scaffold_25 b2h intron 32822 36077 0 . . mult=2;pri=4;src=E
HiC_scaffold_25 b2h intron 36287 37007 0 . . mult=3;pri=4;src=E
HiC_scaffold_25 b2h intron 37282 38301 0 . . pri=4;src=E

"
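To see how many hints in a file actually lack a strand, one could tally the strand column (the 7th GFF field) per feature type. The following is only a minimal sketch; the function name `strand_counts` is made up for illustration, and it assumes whitespace-separated hint lines like the ones quoted above.

```python
from collections import Counter


def strand_counts(gff_lines):
    """Count (feature, strand) pairs in GFF-style hint lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A usable hint line needs at least the first 8 GFF columns.
        if len(fields) < 8:
            continue
        feature, strand = fields[2], fields[6]
        counts[(feature, strand)] += 1
    return counts


# Two of the hint lines quoted in this report.
example = [
    "HiC_scaffold_25 b2h intron 2629 10234 0 . . pri=4;src=E",
    "HiC_scaffold_25 b2h intron 5437 5663 0 . . mult=2;pri=4;src=E",
]

for (feature, strand), n in sorted(strand_counts(example).items()):
    print(feature, strand, n)
```

For a real hint file, the lines can be passed in directly from an open file handle, e.g. `strand_counts(open("hints.gff"))`, where `hints.gff` is a placeholder name.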

1group1gene setting

Hello

When a source parameter is defined, e.g.

[SOURCE-PARAMETERS]
GLD 1group1gene
JR 1group1gene
XNT individual_liability
HU 1group1gene
PASA 1group1gene
NANO 1group1gene

If a source (let's say PASA) is listed in [SOURCE-PARAMETERS] but NOT in [SOURCES],
e.g.

[SOURCES]
M JR HU F XNT RM GLD RCOV E NANO

then ALL the SOURCE-PARAMETERS below the undefined one (PASA in this scenario) are ignored. Augustus doesn't croak, it just carries on happily ignoring these sources:

# Setting 1group1gene for GLD.
# Setting 1group1gene for JR.
# Setting individual_liability for XNT.
# Setting 1group1gene for HU.

The correct output should have been:

# Setting 1group1gene for GLD.
# Setting 1group1gene for JR.
# Setting individual_liability for XNT.
# Setting 1group1gene for HU.
# Setting 1group1gene for NANO.

Which is what you'd get if you delete the PASA line.

This is true for the latest AUGUSTUS version from git.
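Until the parser reports this itself, a small pre-flight check on the extrinsic config could catch the mismatch. The sketch below is not part of AUGUSTUS; the function name `undeclared_sources` is hypothetical, and it assumes the simple layout shown above (section headers in square brackets, `#` starting a comment, the source key as the first token of each [SOURCE-PARAMETERS] line).

```python
def undeclared_sources(cfg_text):
    """Return keys used in [SOURCE-PARAMETERS] that are missing from [SOURCES]."""
    section = None
    declared = set()
    used = []
    for raw in cfg_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Track which section we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "SOURCES":
            declared.update(line.split())
        elif section == "SOURCE-PARAMETERS":
            used.append(line.split()[0])
    return [key for key in used if key not in declared]


# The configuration from this report.
cfg = """
[SOURCES]
M JR HU F XNT RM GLD RCOV E NANO

[SOURCE-PARAMETERS]
GLD 1group1gene
JR 1group1gene
XNT individual_liability
HU 1group1gene
PASA 1group1gene
NANO 1group1gene
"""

print(undeclared_sources(cfg))
```

Any key printed here is one that AUGUSTUS, according to the behaviour described above, would silently stop at.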

SIGSEGV during Augustus PB step on cat test data

Hi folks,

Apologies if I've missed something dumb during installation here, but I'm a bit stuck. I'm running cat on a slurm system, but using a single machine just to try and minimise complications. I seem to be running into issues with my Augustus installation. It seems to get through the Augustus CGP step fine, but then fails with a SIGSEGV on the Augustus PB step. I originally posted this issue there (ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit#118), but @ifiddes-10x suggested posting here.

augustus --softmasking=1 --allow_hinted_splicesites=atac --alternatives-from-evidence=1 --UTR=1 --hintsfile=/nesi/nobackup/uoo02424/bin/temp_test_work_dir/toil-870fd876-08a7-4827-89c8-fd670c9d0bd2-a301ba27-672f-4856-ba5a-b6ec53983f3a/tmpsSFYBW/98d53963-ea74-4416-a1f3-538e1faac276/tmphdATE8.tmp --extrinsicCfgFile=/nesi/nobackup/uoo02424/bin/temp_test_work_dir/toil-870fd876-08a7-4827-89c8-fd670c9d0bd2-a301ba27-672f-4856-ba5a-b6ec53983f3a/tmpsSFYBW/98d53963-ea74-4416-a1f3-538e1faac276/tmpvtGP8k.tmp --species=human --/augustus/verbosity=0 --predictionStart=-0 --predictionEnd=-0 /scale_wlg_nobackup/filesets/nobackup/uoo02424/bin/temp_test_work_dir/toil-870fd876-08a7-4827-89c8-fd670c9d0bd2-a301ba27-672f-4856-ba5a-b6ec53983f3a/tmpsSFYBW/98d53963-ea74-4416-a1f3-538e1faac276/wbn033.40237.3604811865.tmp

Running the command outside of cat also gives an immediate Segmentation Fault. From previous issues I thought filename length could be an issue, so I copied out the offending files and renamed them (also attached here with the modified names), but still ran into the same issue running it outside of cat:

augustus --softmasking=1 --allow_hinted_splicesites=atac --alternatives-from-evidence=1 --UTR=1 --hintsfile=test/hintsfile.tmp --extrinsicCfgFile=test/extrinsicCfgFile.tmp --species=human --/augustus/verbosity=0 --predictionStart=-0 --predictionEnd=-0 test/input.tmp

extrinsicCfgFile.tmp.txt
hintsfile.tmp.txt
input.tmp.txt

@ifiddes-10x had no trouble running augustus with these files so I strongly suspect this might have something to do with our cluster architecture (RedHat)/dependencies but I'm not savvy enough to track it down. I followed the instructions for compiling Augustus CGP in this repository (commit b8ce1b0), and I'm using modules for Samtools (1.8) and bedtools (2.26.0). I've tried using both the boost/zlib/samtools modules and installing both of these from scratch, but am still getting the same outcome. Any help would be really appreciated. I did a gdb traceback and got the following in case it helps track down the issue:

#0  0x00002aaaaace1410 in boost::iostreams::detail::gzip_header::reset() ()
   from /lib64/libboost_iostreams.so.1.53.0
#1  0x000000000041e384 in gzip_header (this=0x7fffffff7c20)
    at /nesi/nobackup/uoo02424/bin/boost_install_1_67_0/include/boost/iostreams/filter/gzip.hpp:327
#2  boost::iostreams::basic_gzip_decompressor<std::allocator<char> >::basic_gzip_decompressor (this=0x7fffffff7c10, window_bits=<optimized out>,
    buffer_size=<optimized out>)
    at /nesi/nobackup/uoo02424/bin/boost_install_1_67_0/include/boost/iostreams/filter/gzip.hpp:736
#3  0x000000000041b9c9 in GBSplitter::GBSplitter (this=0x7fffffff8290,
    fname=...) at genbank.cc:542
#4  0x000000000041c2a2 in GBProcessor::GBProcessor (this=0x7fffffff8290,
    filename=...) at genbank.cc:27
#5  0x000000000040a69b in main (argc=12, argv=0x7fffffff8e88)
    at augustus.cc:141
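One detail visible in the trace: frame #0 resolves into /lib64/libboost_iostreams.so.1.53.0, while frames #1 and #2 refer to headers under boost_install_1_67_0. Whether this difference is related to the crash is not established in this report. As a quick check, the Boost versions mentioned in a backtrace can be collected with a short script. This is only a sketch; the function name `boost_versions` is hypothetical, and the second pattern matches the `boost_install_X_Y_Z` directory naming used on this particular system.

```python
import re


def boost_versions(backtrace):
    """Collect (major, minor) Boost versions mentioned in a backtrace."""
    versions = set()
    # Shared-library names such as libboost_iostreams.so.1.53.0
    for m in re.finditer(r"libboost_\w+\.so\.(\d+)\.(\d+)", backtrace):
        versions.add((int(m.group(1)), int(m.group(2))))
    # Install directories such as boost_install_1_67_0 (site-specific naming)
    for m in re.finditer(r"boost_install_(\d+)_(\d+)_\d+", backtrace):
        versions.add((int(m.group(1)), int(m.group(2))))
    return sorted(versions)


# Excerpt of the backtrace from this report.
trace = """
#0  0x00002aaaaace1410 in boost::iostreams::detail::gzip_header::reset() ()
   from /lib64/libboost_iostreams.so.1.53.0
#1  0x000000000041e384 in gzip_header (this=0x7fffffff7c20)
    at /nesi/nobackup/uoo02424/bin/boost_install_1_67_0/include/boost/iostreams/filter/gzip.hpp:327
"""

found = boost_versions(trace)
print(found)
if len(found) > 1:
    print("More than one Boost version appears in the backtrace.")
```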

AUGUSTUS does not follow hints

Here is the output from AUGUSTUS (notice the bold line, the intron at 1250082-1250142):

64_tag AUGUSTUS gene 1248552 1250684 0.95 - . g55
64_tag AUGUSTUS transcript 1248552 1250684 0.95 - . g55.t1
64_tag AUGUSTUS stop_codon 1248552 1248554 . - 0 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1248731 1248843 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1249034 1249124 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1249249 1249653 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1249826 1250040 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1250082 1250142 0.95 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1250175 1250250 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS intron 1250406 1250491 1 - . transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1248552 1248730 1 - 2 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1248844 1249033 1 - 0 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1249125 1249248 1 - 1 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1249654 1249825 1 - 2 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1250041 1250081 0.95 - 1 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1250143 1250174 1 - 0 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1250251 1250405 1 - 2 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS CDS 1250492 1250684 1 - 0 transcript_id "g55.t1"; gene_id "g55";
64_tag AUGUSTUS start_codon 1250682 1250684 . - 0 transcript_id "g55.t1"; gene_id "g55";

Here are the hints it used (for CDSpart and exonpart hints I cut off 15 bases from both ends):

64_tag b2h intron 1249826 1250040 . . . mult=8519;pri=4;src=E
64_tag b2h intron 1249826 1250040 . . . mult=9944;pri=4;src=E
64_tag GeneWise intron 1249826 1250040 . - . grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag xnt2h intron 1249826 1250040 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag assembler-sample_mydb_pasa exonpart 1250056 1250159 . - . grp=align_938114;pri=4;src=W
64_tag Cufflinks exonpart 1250056 1250159 . - . grp=CUFF.208163.1;pri=4;src=W
64_tag GeneWise CDSpart 1250056 1250159 . - 0 grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag xnt2h CDSpart 1250056 1250159 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag SNAP CDSpart 1250065 1250162 . + . grp=64_tag-snap.184;pri=3;src=AB
64_tag GeneMark.hmm CDSpart 1250086 1250159 . - 0 grp=15413_g;pri=3;src=AB
64_tag SNAP CDSpart 1250086 1250159 . - . grp=64_tag-snap.183;pri=3;src=AB

64_tag b2h intron 1250175 1250250 . . . mult=10868;pri=4;src=E
64_tag b2h intron 1250175 1250250 . . . mult=11309;pri=4;src=E
64_tag GeneWise intron 1250175 1250250 . - . grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag xnt2h intron 1250175 1250250 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag SNAP stop 1250245 1250409 . + . grp=64_tag-snap.184;pri=3;src=AB
64_tag assembler-sample_mydb_pasa exonpart 1250266 1250390 . - . grp=align_938114;pri=4;src=W
64_tag Cufflinks exonpart 1250266 1250390 . - . grp=CUFF.208163.1;pri=4;src=W
64_tag GeneMark.hmm CDSpart 1250266 1250390 . - 2 grp=15413_g;pri=3;src=AB
64_tag GeneWise CDSpart 1250266 1250390 . - 2 grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag SNAP CDSpart 1250266 1250390 . - . grp=64_tag-snap.183;pri=3;src=AB
64_tag xnt2h CDSpart 1250266 1250390 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag b2h intron 1250338 1263759 . . . pri=4;src=E
64_tag b2h intron 1250405 1250491 . . . mult=11;pri=4;src=E
64_tag b2h intron 1250406 1250491 . . . mult=13970;pri=4;src=E
64_tag b2h intron 1250406 1250491 . . . mult=14529;pri=4;src=E
64_tag GeneWise intron 1250406 1250491 . - . grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag xnt2h intron 1250406 1250491 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag SNAP start 1250492 1250684 . - . grp=64_tag-snap.183;pri=3;src=AB
64_tag assembler-sample_mydb_pasa exonpart 1250507 1250684 . - . grp=align_938114;pri=4;src=W
64_tag Cufflinks exonpart 1250507 1250684 . - . grp=CUFF.208163.1;pri=4;src=W
64_tag GeneMark.hmm CDSpart 1250507 1250669 . - 0 grp=15413_g;pri=3;src=AB
64_tag GeneWise CDSpart 1250507 1250669 . - 0 grp=64_tag:1250684:1248552-F-ENSDARP00000037871.5;pri=5;src=WISE
64_tag xnt2h CDSpart 1250507 1250669 . - . grp=ENSGMOP00000009388.1;pri=4;src=XNT
64_tag GeneMark.hmm start 1250682 1250684 . - 0 grp=15413_g;pri=3;src=AB
64_tag b2h intron 1250694 1251791 . . . mult=7;pri=4;src=E
64_tag b2h intron 1250694 1251791 . . . mult=8;pri=4;src=E
64_tag b2h intron 1250700 1251791 . . . mult=10158;pri=4;src=E
64_tag b2h intron 1250700 1251791 . . . mult=10495;pri=4;src=E
64_tag b2h intron 1250700 1251819 . . . mult=5;pri=4;src=E
64_tag b2h intron 1250700 1251819 . . . mult=5;pri=4;src=E


As you can see, the intron in bold should not exist, yet AUGUSTUS insists on predicting it. Why is that?
I have tried many times with different hint parameters; the intron is always there.
This is the only such case; the other predictions follow the hints well.
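To find such cases systematically, the predicted introns can be compared against the intron hints. The following is a minimal sketch with hypothetical function names; it matches introns by sequence name, start and end only, and ignores strand, since the b2h hints above carry no strand.

```python
def intron_set(gff_lines, source=None):
    """Return the set of (seqname, start, end) for intron lines."""
    introns = set()
    for line in gff_lines:
        fields = line.split()
        # Need at least the first 8 GFF columns, and only intron features.
        if len(fields) < 8 or fields[2] != "intron":
            continue
        # Optionally restrict to one source (2nd GFF column).
        if source is not None and fields[1] != source:
            continue
        introns.add((fields[0], int(fields[3]), int(fields[4])))
    return introns


def unsupported_introns(prediction_lines, hint_lines):
    """Predicted introns that have no intron hint with identical coordinates."""
    predicted = intron_set(prediction_lines, source="AUGUSTUS")
    hinted = intron_set(hint_lines)
    return sorted(predicted - hinted)


# A few lines taken from the prediction and the hints quoted in this report.
predictions = [
    '64_tag AUGUSTUS intron 1249826 1250040 1 - . transcript_id "g55.t1"; gene_id "g55";',
    '64_tag AUGUSTUS intron 1250082 1250142 0.95 - . transcript_id "g55.t1"; gene_id "g55";',
    '64_tag AUGUSTUS intron 1250175 1250250 1 - . transcript_id "g55.t1"; gene_id "g55";',
]
hints = [
    "64_tag b2h intron 1249826 1250040 . . . mult=8519;pri=4;src=E",
    "64_tag b2h intron 1250175 1250250 . . . mult=10868;pri=4;src=E",
    "64_tag GeneWise CDSpart 1250056 1250159 . - 0 grp=example;pri=5;src=WISE",
]

for seq, start, end in unsupported_introns(predictions, hints):
    print(seq, start, end)
```

Note that the hints quoted above only begin at position 1249826, so predicted introns upstream of that would be reported as unsupported simply because the excerpt is partial.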
