hillerlab / cesar2.0

License: MIT License

Makefile 0.71% C 94.33% C++ 0.52% Shell 0.22% Perl 0.93% HTML 0.08% AngelScript 2.76% TSQL 0.02% PostScript 0.02% Python 0.01% Gnuplot 0.23% Roff 0.13% M4 0.05% ActionScript 0.01% CAP CDS 0.01%

cesar2.0's People

Contributors

kirilenkobm, michaelhiller, pschwede

cesar2.0's Issues

Versioned package release request

I'm working to make Cesar2.0 available on Bioconda.

Please refer to: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository#creating-a-release

I need your help creating a versioned release to use for the Bioconda recipe. Once this is added to Bioconda, it'll also be made available as a Docker container from Biocontainers, and as a Singularity image from the Galaxy Project. The Bioconda bot will also recognize future releases and automatically update the recipe.

Please let me know

Thanks
Jay

warning message

Hi,

I have two questions.

First, I ran CESAR2.0 on the CESARTest data you provide (the 'General workflow of annotating genes in several genomes' part), but I get repeated warnings:
WARNING! src/Model.c:254 multi_exon(): Couldn't find declared stop codon in reference 0. (counting from zero)
WARNING! src/Model.c:226 multi_exon(): Couldn't find declared start codon in reference 0. (counting from zero)
WARNING! src/Model.c:226 multi_exon(): Couldn't find declared start codon in reference 0. (counting from zero)
WARNING! src/Model.c:254 multi_exon(): Couldn't find declared stop codon in reference 0. (counting from zero)

I wonder what this warning means.

Second, the number of genes (ENST) differs between the genePred results you provided (Results/*.gp) and the ones I produced in my test.
In the case of mm10,
Results/mm10.gp ENST count : 18601, geneAnnotation/mm10.gp ENST count : 18600
In the case of falChe1,
Results/falChe1.gp ENST count : 17590, geneAnnotation/falChe1.gp ENST count : 17588
In the case of galGal4,
Results/galGal4.gp ENST count : 17556, geneAnnotation/galGal4.gp ENST count : 17549

I wonder why this difference occurs.

Thanks,
Mindy
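
One way to narrow down which transcripts account for such a difference is to compare the sets of transcript IDs in the two files. The following is a minimal sketch, assuming the transcript ID is the first whitespace-separated column of each genePred line; the function names are made up for illustration and are not part of CESAR2.0.

```shell
# Print the sorted, de-duplicated transcript IDs (column 1) of a genePred file.
gp_ids() {
  awk '{ print $1 }' "$1" | sort -u
}

# Print IDs that occur in the first file but not in the second.
gp_only_in_first() {
  a=$(mktemp) && b=$(mktemp) || return 1
  gp_ids "$1" > "$a"
  gp_ids "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}

# Example (paths taken from the report above):
#   gp_ids Results/mm10.gp | wc -l
#   gp_only_in_first Results/mm10.gp geneAnnotation/mm10.gp
```

Running `gp_only_in_first` in both directions lists the transcripts unique to each file, which is usually more informative than the raw counts.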

"error while loading shared libraries: libmysqlclient.so.18" when I try to run the miniExample

Hi,

I am trying to see if cesar2 works correctly on my computer, and encountered the following error with the mini example. I was wondering if you could help? Thank you so much!

Running "annotateGenesViaCESAR.pl POLR3K hg38_oryAfe1.bb twoGenes.gp.forCESAR hg38 oryAfe1 CESARoutput 2bitDir $profilePath -maxMemory 1" gave me the following error:

Processing gene 'POLR3K'
twoBitToFa: error while loading shared libraries: libmysqlclient.so.18: cannot open shared object file: No such file or directory
substr outside of string at /home/sidi/programs/CESAR2.0/tools/annotateGenesViaCESAR.pl line 169.
Use of uninitialized value $seq in uc at /home/sidi/programs/CESAR2.0/tools/annotateGenesViaCESAR.pl line 750.
mafSpeciesSubset: error while loading shared libraries: libmysqlclient.so.18: cannot open shared object file: No such file or directory
Error running '/bin/bash -c 'set -o pipefail; mafExtract -region=chr16:53475-53586 hg38_oryAfe1.bb stdout|mafSpeciesSubset stdin NULL /dev/shm/exon.maf.Rwsm2jQ1w -speciesList=oryAfe1,hg38''

I already installed libmysqlclient by:
sudo apt-get install libmysqlclient-dev

libmysqlclient.so is in the following path, and "/usr/lib/x86_64-linux-gnu" is in my $LD_LIBRARY_PATH. I am on Ubuntu 20.04 LTS system.
/usr/lib/x86_64-linux-gnu/libmysqlclient.so

(I'm not sure if this is related, but I did not compile kent from source and I assume the command above is using the pre-compiled version in tools?)

Thank you!!
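
To see which shared libraries a pre-compiled binary cannot resolve, the output of `ldd` can be filtered for `not found` entries. This is only a sketch; `ldd_missing` is a hypothetical helper name, and the example assumes `twoBitToFa` is on the `PATH`.

```shell
# Read `ldd` output on stdin and print the names of unresolved libraries.
ldd_missing() {
  awk '/not found/ { print $1 }'
}

# Example:
#   ldd "$(command -v twoBitToFa)" | ldd_missing
```

The error names `libmysqlclient.so.18` specifically, so an unversioned `libmysqlclient.so` in the library path is not necessarily enough: the dynamic loader looks for the exact library name recorded in the binary.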

WARNING:Couldn't find declared stop codon in reference

Hi, when I run step 3 for a reference and a query species, I get the following warning for almost every multi-exon gene:
"WARNING! src/Model.c:254 multi_exon(): Couldn't find declared stop codon in reference 0. (counting from zero)"
Does this warning mean that the reference or the query lacks a stop codon?
Thanks

One error occurred when running make in the ~/Software/CESAR2.0/kent/src directory.

One error occurred when running make in the ~/Software/CESAR2.0/kent/src directory:

cd axtChain && echo axtChain && make
axtChain
make[2]: Entering directory '/home/chenmx/Software/CESAR2.0/kent/src/hg/mouseStuff/axtChain'
/home/chenmx/miniconda/bin/x86_64-conda-linux-gnu-cc -O -g -o ../../../../bin//axtChain axtChain.o ../../../lib/x86_64/jkhgap.a ../../../lib/x86_64/jkweb.a -lpthread /usr/lib/x86_64-linux-gnu/libssl.a -lcrypto ../../../htslib/libhts.a -lm -lz
/home/chenmx/miniconda/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../../../lib/x86_64/jkweb.a(htmshell.o):/home/chenmx/Software/CESAR2.0/kent/src/lib/../inc/htmshell.h:163: multiple definition of `htmlRecover'; ../../../lib/x86_64/jkweb.a(cheapcgi.o):/home/chenmx/Software/CESAR2.0/kent/src/lib/../inc/htmshell.h:163: first defined here
/home/chenmx/miniconda/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: ../../../lib/x86_64/jkweb.a(portimpl.o):/home/chenmx/Software/CESAR2.0/kent/src/lib/../inc/htmshell.h:163: multiple definition of `htmlRecover'; ../../../lib/x86_64/jkweb.a(cheapcgi.o):/home/chenmx/Software/CESAR2.0/kent/src/lib/../inc/htmshell.h:163: first defined here
/home/chenmx/miniconda/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lcrypto: No such file or directory
/home/chenmx/miniconda/bin/../lib/gcc/x86_64-conda-linux-gnu/11.2.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lz: No such file or directory
collect2: error: ld returned 1 exit status
make[2]: *** [../../../inc/userApp.mk:31: ../../../../bin//axtChain] Error 1
make[2]: Leaving directory '/home/chenmx/Software/CESAR2.0/kent/src/hg/mouseStuff/axtChain'
make[1]: *** [makefile:54: axtChain.all] Error 2
make[1]: Leaving directory '/home/chenmx/Software/CESAR2.0/kent/src/hg/mouseStuff'
make: *** [makefile:22: userApps] Error 2

perl system call using bash specific feature

It seems that line 192 of annotateGenesViaCESAR.pl uses 'set -o pipefail', which is a bash-specific option and will not work in sh or dash (the default /bin/sh on Ubuntu). I fixed that line by changing it to:
my $mafExtractCall = "/bin/bash -c 'set -o pipefail; mafExtract -region=$chrRef:$refStart-$refStop $mafIndex stdout|mafSpeciesSubset stdin speciesList=$speciesList,$reference species.lst=NULL $mafFile'";
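
The effect of `pipefail` can be seen with a pipeline whose first command fails. The sketch below is independent of CESAR2.0; the function names are illustrative only, and both functions call `bash` explicitly so the result does not depend on which shell `/bin/sh` points to.

```shell
# Run a pipeline whose first command fails and print the exit status.
# Without pipefail, bash reports the status of the last command only.
status_without_pipefail() {
  rc=0
  bash -c 'false | true' || rc=$?
  echo "$rc"
}

# With pipefail set inside the explicit bash call, the failure is reported.
status_with_pipefail() {
  rc=0
  bash -c 'set -o pipefail; false | true' || rc=$?
  echo "$rc"
}
```

`status_without_pipefail` prints 0 and `status_with_pipefail` prints 1, which is why a failing `mafExtract` would otherwise go unnoticed when its output is piped into `mafSpeciesSubset`.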

-speciesList is not a valid option Error

Hello,

I am having some trouble on Ubuntu 19.10.

I tried to run "Mini example of annotating 2 genes in one query genome", but ran into OpenSSL issues. (The same problem occurred on Ubuntu 17.10.)

The issue seems to be from mafSpeciesSubset,

$ ~/Bioinformatics/tools/CESARTest/CESAR2.0/tools/mafSpeciesSubset

/home/kim/Bioinformatics/tools/CESARTest/CESAR2.0/tools/mafSpeciesSubset: /lib/x86_64-linux-gnu/libssl.so.10: version `libssl.so.10' not found (required by /home/kim/Bioinformatics/tools/CESARTest/CESAR2.0/tools/mafSpeciesSubset)
/home/kim/Bioinformatics/tools/CESARTest/CESAR2.0/tools/mafSpeciesSubset: /lib/x86_64-linux-gnu/libcrypto.so.10: version `libcrypto.so.10' not found (required by /home/kim/Bioinformatics/tools/CESARTest/CESAR2.0/tools/mafSpeciesSubset)

So I downloaded the latest UCSC tools and replaced the bundled ones.
(Downloaded from http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/)

The OpenSSL error seems to be resolved, but a new issue occurred:

kim@kim-Ubuntu:~/Bioinformatics/tools/CESAR2.0/extra/miniExample$ annotateGenesViaCESAR.pl SNRNP25 hg38_oryAfe1.bb twoGenes.gp.forCESAR hg38 oryAfe1 CESARoutput 2bitDir $profilePath -maxMemory 1

Processing gene 'SNRNP25'
-specieslist is not a valid option
Error running '/bin/bash -c 'set -o pipefail; mafExtract -region=chr16:53989-54058 hg38_oryAfe1.bb stdout|echo stdin|mafSpeciesSubset stdin specieslist=oryAfe1,hg38 species.lst=NULL /dev/shm/exon.maf.C5j9jZxF0''

How do I solve this problem?

Thanks,
Kim

Missing stop codon in reference (similar to closed Issue #6)

Hi Michael,

When running ./jobList I get the repetitive error of WARNING! src/Model.c:254 multi_exon(): Couldn't find declared stop codon in reference 0. (counting from zero)

I realize this was addressed in the closed issue #6 but I don't understand where to add the missing stop codon in the reference, or more specifically I'm not sure which file to add the stop codon to?

Thank you for your time and help!

Sara

Hi,

I just tested the mini example of cesar2 following the recommended steps and got the errors reported below. Is this a problem with the gb file? Thanks for your attention.

Incorrent format
'POLR3K chr16 - 46406 53628 47429 53586 47429-47557,51557-51645,53475-53586' should have minimum 10 fields, found only 8
Can't locate Scalar/Util/Numeric.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/csy/Downloads/CESAR2.0/tools/annotateGenesViaCESAR.pl line 8.
BEGIN failed--compilation aborted at /home/csy/Downloads/CESAR2.0/tools/annotateGenesViaCESAR.pl line 8.
./jobListGenePred: line 1: /home/csy/projects/cesarTest/geneAnnotation/echo: No such file or directory
sed: -e expression #1, char 7: unterminated `s' command
sed: -e expression #1, char 4: unterminated `s' command
wrongs # args
/home/csy/Downloads/CESAR2.0/tools/bed2GenePred.pl - Convert CESAR2.0 annotated exons (as elements in a bed file) to a genePred file
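
The first message complains about the number of fields on an input line. A quick way to find such lines is to count whitespace-separated fields per line. This is a sketch only: `short_lines` is a hypothetical helper, and which input file the tool was reading is not stated in the report, so the file name in the example is a placeholder.

```shell
# Read lines on stdin and print "line_number: field_count" for every
# non-empty line that has fewer than $1 whitespace-separated fields.
short_lines() {
  awk -v min="$1" 'NF > 0 && NF < min { print NR ": " NF }'
}

# Example (input.gp is a placeholder file name):
#   short_lines 10 < input.gp
```

Separately, the `Can't locate Scalar/Util/Numeric.pm` message means that the Perl module Scalar::Util::Numeric (distributed on CPAN) is not installed for the Perl interpreter in use.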

CESAR on plant genomes

Hello,

I am wondering if, as is, CESAR will work as well on plant genomes.
More specifically, I am thinking of using it to predict coding regions of a new grass species using a closely related reference grass (brachypodium, barley, ...) whose annotation is known. How much does relatedness really matter? E.g. is it better to use a very close, good reference or rather a more distant, well-annotated species?

Also, does CESAR's output constitute a final gene annotation, or is it meant to be used as additional evidence for creating final gene models together with other sets (ab initio prediction, EST alignment, ...)?
Thanks,
Dario

formatGenePred.pl produces zero sized files

I ran formatGenePred.pl ${inputGenes} ${inputGenes}.forCESAR ${inputGenes}.discardedTranscripts after doing the necessary pre-processing steps. All relevant shell variables appear to be set, but the output files are zero-sized. I double-checked that the genePred file is valid with genePredCheck and there were no failures. No errors are printed to stdout, so I have no inkling as to what is going wrong.

Two errors occur when I run "make" in the directory "CESAR2.0/kent/src"

The two errors are shown below:
#######
hmac.c:11:10: fatal error: openssl/hmac.h: No such file or directory
   11 | #include "openssl/hmac.h"
      |          ^~~~~~~~~~~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: hmac.o] Error 1
make[1]: Leaving directory '/root/software/CESAR2.0/kent/src/lib'
make: *** [makefile:9: topLibs] Error 2
#######
What should I do?
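
A `No such file or directory` error for an `#include` usually means the development headers of that library are not installed where the compiler looks. The sketch below checks a list of candidate include directories for a header; `find_header` is a hypothetical helper, and the directories in the example are common defaults, not necessarily the ones the compiler on this system searches.

```shell
# Print the first directory among the remaining arguments that contains the
# header named in $1; return non-zero if none of them does.
find_header() {
  header=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

# Example:
#   find_header openssl/hmac.h /usr/include /usr/local/include \
#     || echo "openssl/hmac.h not found"
```

On Debian and Ubuntu the OpenSSL headers are shipped in the `libssl-dev` package; other distributions use different package names.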

An error occurred when I ran the miniExample

Hi, professor.
An error occurred when I ran the command from the miniExample directory:

annotateGenesViaCESAR.pl POLR3K hg38_oryAfe1.bb twoGenes.gp.forCESAR hg38 oryAfe1 CESARoutput 2bitDir $profilePath -maxMemory 1
Processing gene 'POLR3K'
-speciesList is not a valid option
Error running '/bin/bash -c 'set -o pipefail; mafExtract -region=chr16:53475-53586 hg38_oryAfe1.bb stdout|mafSpeciesSubset stdin NULL /dev/shm/exon.maf.qaUzT18Za -speciesList=oryAfe1,hg38''

twoBitToFa error

Hi,

I am trying to run "annotateGenesViaCESAR.pl ....", but I get the following error:

Argument "" isn't numeric in numeric gt (>) at /data/zhangbo/software/CESAR2.0-master/tools/annotateGenesViaCESAR.pl line 329.
Argument "" isn't numeric in numeric gt (>) at /data/zhangbo/software/CESAR2.0-master/tools/annotateGenesViaCESAR.pl line 329.
Argument "" isn't numeric in numeric gt (>) at /data/zhangbo/software/CESAR2.0-master/tools/annotateGenesViaCESAR.pl line 329.
Argument "" isn't numeric in numeric gt (>) at /data/zhangbo/software/CESAR2.0-master/tools/annotateGenesViaCESAR.pl line 329.
Argument "" isn't numeric in numeric gt (>) at /data/zhangbo/software/CESAR2.0-master/tools/annotateGenesViaCESAR.pl line 329.
file /dev/shm/species.bed.0aHTuQ doesn't appear to be in bed format. At least 4 fields required, got 3
Error running twoBitToFa call for species "XXX"

Thanks for the reply!

Genome alignment

Hi,
I have a question about the genome alignment input. I see that in the workflow CESAR2 needs a genome alignment (reference genome and query genome) in MAF format. Which software can we use for this task? Do you have any suggestions?
Thanks in advance
Max

mafIndex: Anc00refChr0 not found

Hello,

I am trying to index my MAF file (output of CACTUS whole-genome alignment HAL converted to MAF using HAL Tools) but get the error Anc00refChr0 is not found in chromosome sizes file \ command exited with 255: bedToBigBed temppLlCQT 2bitdir galloanserae.bb -type=bed4+1 when I run tools/mafIndex galloanserae.maf galloanserae.bb -chromSizes=2bitdir

I've tried pointing the -chromSizes option at multiple directories and chrom.sizes files in case I wasn't understanding that option, but receive the same error. If there is any feedback or advice for how I can move forward, it would be greatly appreciated!

Thank you for your time,

Sara
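
When a sequence name is reported as missing from the chromosome sizes file, it can help to list the source names that actually occur in the MAF and compare them by eye with the names in the sizes file. The sketch below relies only on the MAF format itself, where each `s` line carries the source name in its second field and the source length in its sixth. `maf_sources` is a hypothetical helper, and how `mafIndex` maps these names onto the chromosome sizes file is not stated in the report.

```shell
# Read a MAF file on stdin and print each distinct source name from the
# "s" lines, followed by a tab and the source length from the same line.
maf_sources() {
  awk '$1 == "s" { print $2 "\t" $6 }' | sort -u
}

# Example (file name taken from the report above):
#   maf_sources < galloanserae.maf | head
```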

Annotated genes have structural errors

Hi Mr Hiller,
I have finished the pipeline. In the final result I found a number of genes with an in-frame stop codon or a missing start codon. Is that expected, so that I only need to filter them out, or is something wrong in my process?
I used the cattle genome as the reference. The gb file was generated from the filtered NCBI gff. The genome alignment was done with lastz according to the UCSC 'whole genome alignment how to' tutorial.
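
If filtering turns out to be the way to go, in-frame stop codons can be counted directly on the coding sequences. The following is a minimal sketch, not part of CESAR2.0. It assumes one coding sequence per input line, a reading frame that starts at the first base, and the standard stop codons TAA, TAG and TGA. The final codon is skipped because a stop is expected there.

```shell
# Read one coding sequence per line on stdin and print, for each line,
# the number of stop codons found in frame before the final codon.
inframe_stops() {
  awk '{
    seq = toupper($0)
    n = 0
    # i + 5 <= length(seq) leaves out the last complete codon.
    for (i = 1; i + 5 <= length(seq); i += 3) {
      codon = substr(seq, i, 3)
      if (codon == "TAA" || codon == "TAG" || codon == "TGA") n++
    }
    print n
  }'
}

# Example:
#   echo ATGTAAGGGTGA | inframe_stops
```

A transcript whose count is greater than zero contains at least one premature stop under these assumptions.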

cesar install concerns

Hi Team,

Before building the Bioconda package, I need to install it manually and check that it works. I followed the steps below:

wget https://github.com/hillerlab/CESAR2.0/archive/refs/tags/1.0.tar.gz
cd CESAR2.0-1.0
make 
fatal: not a git repository (or any parent up to mount point /) 
<< throwing this error >>>

I then tried again after changing to kent/src:

cd kent/src
make
cc -O -g -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_GNU_SOURCE -DMACHTYPE_x86_64   -Wall -Wformat -Wimplicit -Wreturn-type -Wuninitialized -I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I../htslib   -o pngwrite.o -c pngwrite.c
pngwrite.c:7:10: fatal error: png.h: No such file or directory
    7 | #include "png.h"   // MUST come before common.h, due to setjmp checking  in pngconf.h
      |          ^~~~~~~
compilation terminated.
make[1]: *** [../inc/common.mk:420: pngwrite.o] Error 1

Please advise on the following:

  1. What prerequisite software is needed to run this software?
  2. How can I test it once it is installed successfully, e.g. with ./cesar --help?

I need to build all of this in the Bioconda recipe to make it work.

Thanks
Jay
