
eggnogdb / eggnog-mapper


Fast genome-wide functional annotation through orthology assignment

Home Page: http://eggnog-mapper.embl.de

License: GNU Affero General Public License v3.0

Python 100.00%
annotations orthology-assignments functional-annotation genomics

eggnog-mapper's Introduction


Overview

EggNOG-mapper is a tool for fast functional annotation of novel sequences. It uses precomputed orthologous groups and phylogenies from the eggNOG database (http://eggnog5.embl.de) to transfer functional information from fine-grained orthologs only.

Common uses of eggNOG-mapper include the annotation of novel genomes, transcriptomes or even metagenomic gene catalogs.

The use of orthology predictions for functional annotation permits a higher precision than traditional homology searches (i.e. BLAST searches), as it avoids transferring annotations from close paralogs (duplicate genes with a higher chance of being involved in functional divergence).

Benchmarks comparing different eggNOG-mapper options against BLAST and InterProScan can be found here.

EggNOG-mapper is also available as a public online resource: http://eggnog-mapper.embl.de

Documentation

https://github.com/jhcepas/eggnog-mapper/wiki

Citation

If you use this software, please cite:

[1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain 
    prediction at the metagenomic scale. Carlos P. Cantalapiedra, 
    Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021.
    Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293

[2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated
    orthology resource based on 5090 organisms and 2502 viruses. Jaime
    Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia
    K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars
    J Jensen, Christian von Mering, Peer Bork. Nucleic Acids Res. 2019 Jan 8;
    47(Database issue): D309–D314. doi: 10.1093/nar/gky1085 

Please also cite the underlying algorithm used for the search step of eggNOG-mapper, and Prodigal if it was used for gene prediction:

[HMMER] Accelerated Profile HMM Searches. 
        Eddy SR. 2011. PLoS Comput. Biol. 7:e1002195.

[DIAMOND] Sensitive protein alignments at tree-of-life scale using DIAMOND.
          Buchfink B, Reuter K, Drost HG. 2021.
          Nature Methods 18, 366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x

[MMSEQS2] MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets.
          Steinegger M & Söding J. 2017. Nat. Biotech. 35, 1026–1028. https://doi.org/10.1038/nbt.3988

[PRODIGAL] Prodigal: prokaryotic gene recognition and translation initiation site identification.
           Hyatt et al. 2010. BMC Bioinformatics 11, 119. https://doi.org/10.1186/1471-2105-11-119.

eggnog-mapper's People

Contributors

anahrnndz, beatrizserrano, caleb-easterly, cantalapiedra, douglasgscofield, jhcepas, jj-umn, keegan-evans, luispedro, lukasjelonek, mgalardini, nextgenusfs, olgabot, seretol, silask, urkary, varir


eggnog-mapper's Issues

Select best OG from output table

Dear All,

First, congrats on the emapper tool. It is fast, delivers comprehensive results (as far as I checked), and the website looks like it has great potential.

I ran emapper on about 26,000 rice proteins with standard settings and got about 90% assigned. How should I proceed from the output? Specifically, I want to attach human-readable terms to my proteins.

There are six OGs for each of the identified proteins. I understand that the download section has mapping files for each of the @xyzNOGs, but which of those is the best OG? There must be some kind of score or hierarchy to choose from. Or maybe via the seed ortholog?

Regards,

Stefan

error when running emapper

Hi there, I tried emapper on a test file:

$ cat test.fa

2512392153
MVLMTVALSLAAKTIAVAVIASQVAQAIAQLTAAKTSCNY

But I got this error. Could you advise?

$ ./runEggnog.sh test.fa

emapper-0.12.6

./emapper.py -i test.fa --output test -d bact

Sequence mapping starts now!
Traceback (most recent call last):
  File "path/eggnog-mapper/emapper.py", line 897, in <module>
    main(args)
  File "path/db/eggnog-mapper/emapper.py", line 220, in main
    dump_hmm_matches(args.input, hmm_hits_file, dbpath, port, scantype, idmap, args)
  File "path/db/eggnog-mapper/emapper.py", line 383, in dump_hmm_matches
    print >>OUT, '\t'.join([name] + ['-'] * (len(hits_header) - 1))
TypeError: sequence item 0: expected string, float found

It seems running fine for other sequences, such as the following:

seqs
MSSTPVFDLQPLGVKPFESEREFWDSSSTLKSLYKYAQACSASPHAVLVCSLARVCALVPSTIVIPPFIGGSPATLNFAAITIGEPGGGKGLAMSVAKELIQFPDTVWTGQTASGEAIPRAYVHYEVEDNTGKADDGEGEKRERNRTRLKYSKANAFFDIPEVREFAANTSRVGSTLVPVLVKLIDGTTKLGCTTKCESNTLQVPEYGYRAAIVVGVQPANIRDITSHDGTGLPQRFLWTDVFDLDAPEVADLPERTAPSEWLPFKADAFNQYACSMRDLNSLLENGNELHRYELNYPKGVPESVKQNARKKLRREDDAKHGHSQLLVLKTAAIIALFLQEDKSNLLNVTAKNIQQAEWLVRQSLETVKNGLDEYTQTKNAETYRQEYDTKGDEEKEPLIQNVERSIIRKLEKSNGQGISNRDINRMLNVEKRPAISDALAALIKSKTIQEKDGKYYLTK
seq2
MKKPPFNTVSHEGGWDNSSNYTPLPCDYNFAKMSTEQQLDCFADGFARAIYEPFNLAVGSDVERRCRMFALAALSSVTEGCEKHKISAKGSDVDALAGIIILAMACPNAVVTIDFSGKIQNGVIRLNTPEHLSRFGEIECLPMSRAWFLENNPRVHAAKIREHLLEVAPLHKESSLTEMLEECDKDCWRNLMKPPSMEVTK

Thank you very much!
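
The traceback above suggests that `name` reached `'\t'.join(...)` as a float, possibly because the all-digit identifier 2512392153 was parsed as a number somewhere upstream (this is an assumption, not confirmed by the report). A minimal sketch of a defensive join follows; the function name `format_no_hit_row` and the header list are hypothetical, and keeping identifiers as strings at parse time would be the cleaner fix.

```python
def format_no_hit_row(name, hits_header):
    """Build a tab-separated placeholder row for a query without hits.

    `name` may arrive as a non-string (e.g. a float parsed from an
    all-digit identifier), so every field is coerced to str before joining.
    """
    fields = [name] + ['-'] * (len(hits_header) - 1)
    return '\t'.join(str(field) for field in fields)


# Hypothetical header, for illustration only.
header = ['query_name', 'hit', 'evalue', 'sum_score']
print(format_no_hit_row(2512392153.0, header))
```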

add cache system

Store hashes of computed sequences together with their search results.
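
One way such a cache could look is sketched below, assuming a JSON file on disk and SHA-256 as the hash. The class name `SearchCache`, the file name, and the stored result layout are all hypothetical; nothing here reflects how eggnog-mapper implements caching.

```python
import hashlib
import json
import os
import tempfile


class SearchCache:
    """Tiny on-disk cache keyed by the SHA-256 of a protein sequence."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path) as handle:
                self.entries = json.load(handle)

    @staticmethod
    def key(sequence):
        # Normalise case and whitespace so identical sequences share a key.
        cleaned = ''.join(sequence.split()).upper()
        return hashlib.sha256(cleaned.encode('ascii')).hexdigest()

    def get(self, sequence):
        return self.entries.get(self.key(sequence))

    def put(self, sequence, result):
        self.entries[self.key(sequence)] = result

    def save(self):
        with open(self.path, 'w') as handle:
            json.dump(self.entries, handle)


cache_file = os.path.join(tempfile.mkdtemp(), 'emapper_cache.json')
cache = SearchCache(cache_file)
if cache.get('MVLMTVALSLAAK') is None:
    # Placeholder for a real search call; the result layout is hypothetical.
    cache.put('MVLMTVALSLAAK', {'hit': 'example_hit', 'evalue': 1e-10})
cache.save()
```

Keying on the sequence rather than on the FASTA header means renamed but identical proteins would still hit the cache.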

Mammalian specific example - database not found

Hi,
I am trying to run this example from the documentation:

python emapper.py -i test/p53.fa --output p53_maNOG -d maNOG

And get this output:

[False, False, False, False, False]
Database maNOG not present. Use download_eggnog_database.py to fetch it
Traceback (most recent call last):
  File "emapper.py", line 1082, in <module>
    main(args)
  File "emapper.py", line 218, in main
    host, port, dbpath, scantype, idmap = setup_hmm_search(args)
  File "emapper.py", line 64, in setup_hmm_search
    raise ValueError('Database not found')
ValueError: Database not found

Error loading data

Since yesterday afternoon I get the message "Error loading data" when trying to upload fasta files. Are there any issues with the eggnog-mapper at the moment?

Efficiency considerations for OG_fasta

Hi,

OG_fasta.tar.gz contains tens of thousands of small FASTA files, which is very hard to decompress. In fact I had to skip this simply because it has been hours and only a fraction of the files were decompressed.

Looking at the code and trying to figure out how it is used, it became clear to me that the current solution is to improve the random access efficiency during the processing of the hits file (i.e., https://github.com/jhcepas/eggnog-mapper/blob/master/emapper.py#L559).

I assume only a fraction of these FASTA files are in fact used to process hits for an average run. If I am correct, then I think it may be worthwhile to try to avoid the tremendous I/O overhead and file system stress by describing the contents of these FASTA files in an HDF5 file, and by creating necessary reads on-the-fly.

My 2 cents.

Best,

'NoneType' object is not iterable

Dear jhcepas, I get the tree files using -d bact and that works, but when I use the viruses db the annotation step crashes:

python emapper.py -d viruses -i gingi.faa --output eggnogBact --override --cpu 16
Sequence mapping starts now!
26 173.259658337 0.15 q/s
51 173.265169144 0.29 q/s
76 173.268593311 0.44 q/s
101 173.271979332 0.58 q/s
126 173.274962902 0.73 q/s
151 173.279873848 0.87 q/s
176 173.283407927 1.02 q/s
201 173.28731966 1.16 q/s
226 173.290519714 1.30 q/s
processed queries:238 total_time:173.292301893 rate:1.37 q/s
Functional annotation starts now!
Traceback (most recent call last):
  File "/home/ecastron/programs/eggnog-mapper/emapper.py", line 420, in <module>
    main(args)
  File "/home/ecastron/programs/eggnog-mapper/emapper.py", line 246, in main
    level, nm, desc, cats = annota.get_og_annotations(hitname)
  File "/home/ecastron/programs/eggnog-mapper/eggnogmapper/annota.py", line 31, in get_og_annotations
    level, nm, desc, cat = db.fetchone()
TypeError: 'NoneType' object is not iterable

I don't know if it is a bug or if I did something wrong.
Regards
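
The error is consistent with `fetchone()` returning `None` when the query matches no row, which cannot be unpacked into four names. A minimal sketch of a guarded lookup with the standard `sqlite3` module follows; the table name, column names, and sample row are hypothetical and do not reflect the real eggNOG annotation schema.

```python
import sqlite3


def get_og_annotations(conn, og_name):
    """Return (level, nm, desc, cats) for an OG, or None if it is absent."""
    cursor = conn.execute(
        'SELECT level, nm, description, cats FROM og WHERE og = ?',
        (og_name,))
    row = cursor.fetchone()  # None when no row matches
    if row is None:
        return None
    return tuple(row)


# In-memory table with a hypothetical schema, for illustration only.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE og (og TEXT, level TEXT, nm INTEGER, description TEXT, cats TEXT)')
conn.execute("INSERT INTO og VALUES ('OG1', 'bactNOG', 12, 'example', 'S')")
```

The caller can then skip or log hits whose OG has no annotation row instead of crashing.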

Statistics of annotation file off by 1

Hello and kudos to @jhcepas for your work.

After running eggnog-mapper I noticed that the statistics reported at the end of the *.annotations file are +1 compared to what is read in the *.seed_orthologs file.

Based on the loop and the increment used in the annotate_hits_file function, I guess the variable qn should be reported instead of qn + 1 or qn+1 in the following lines:

https://github.com/jhcepas/eggnog-mapper/blob/c436da5779b333531038eacfaa0a0d4255696544/emapper.py#L717-L727

I suppose the annotate_hits_file_sequential function (which is not used at the moment, is it?) has the same issue (https://github.com/jhcepas/eggnog-mapper/blob/c436da5779b333531038eacfaa0a0d4255696544/emapper.py#L842-L848), as does annotate_hmm_matches (https://github.com/jhcepas/eggnog-mapper/blob/c436da5779b333531038eacfaa0a0d4255696544/emapper.py#L461-L467).

annotate_hits_file function IndexError

Hi,
This option would be nice for my transcriptome annotation. When I ran the example, it crashed with an error:
python emapper.py -i test/p53.fa -d euk -o p53

# emapper-0.12.6
# ./emapper.py -i test/p53.fa -d euk -o p53
Sequence mapping starts now!
Processed queries:1 total_time:31.3815100193 rate:0.03 q/s
Hit refinement starts now
Processed queries:1 total_time:11.6989130974 rate:0.09 q/s
Reading HMM matches
Functional annotation of refined hits starts now
Traceback (most recent call last):
File "./emapper.py", line 897, in <module>
main(args)
File "./emapper.py", line 259, in main
annotate_hits_file(seed_orthologs_file, annot_file, hmm_hits_file, args)
File "./emapper.py", line 625, in annotate_hits_file
match_levels = set([nog.split("@")[1] for nog in match_nogs])
IndexError: list index out of range

p53.emapper.seed_orthologs:
#query_name best_hit_eggNOG_ortholog best_hit_evalue best_hit_score
9606.p53 9598.ENSPTRP00000014836 2.5e-277 910.6

p53.emapper.hmm_hits:
# #query_name hit evalue sum_score query_length hmmfrom hmmto seqfrom seqto query_coverage
9606.p53 euNOG.ENOG410IITK.meta_raw 7.7e-217 723.3 392 1 373 1 392 0.997448979592

python version: Python 2.7.12

Hope for your help.
Regards.
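
In this report the hit name in the hmm_hits file (`euNOG.ENOG410IITK.meta_raw`) contains no "@", so `nog.split("@")[1]` has no second element. A minimal sketch of a tolerant version follows; the function name `extract_levels` and the `name@level` example are assumptions made for illustration.

```python
def extract_levels(match_nogs):
    """Collect the taxonomic level that follows '@' in names like 'ENOG41@euNOG'.

    Names without an '@' separator are skipped instead of raising IndexError.
    """
    levels = set()
    for nog in match_nogs:
        name, sep, level = nog.partition('@')
        if sep and level:
            levels.add(level)
    return levels


print(extract_levels(['ENOG410IITK@euNOG', 'euNOG.ENOG410IITK.meta_raw']))
```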

not a gzipped file

Hi @jhcepas, the new version gives an error with a gzipped file. What can I do?
The command and output are:

emapper.py -i testCOG0515.fa -o test -d bact --cpu 16 --override
Sequence mapping starts now!
Processed queries:5 total_time:21.0365080833 rate:0.24 q/s
Hit refinement starts now
Traceback (most recent call last):
  File "/home/ecastron/programs/eggnog-mapper/emapper.py", line 684, in <module>
    main(args)
  File "/home/ecastron/programs/eggnog-mapper/emapper.py", line 198, in main
    refine_matches(refine_file, hits_file, args)
  File "/home/ecastron/programs/eggnog-mapper/emapper.py", line 355, in refine_matches
    for line in gopen(OGLEVELS_FILE)])
  File "/home/ecastron/programs/python/lib/python2.7/gzip.py", line 446, in readline
    c = self.read(readsize)
  File "/home/ecastron/programs/python/lib/python2.7/gzip.py", line 252, in read
    self._read(readsize)
  File "/home/ecastron/programs/python/lib/python2.7/gzip.py", line 287, in _read
    self._read_gzip_header()
  File "/home/ecastron/programs/python/lib/python2.7/gzip.py", line 181, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

cheers
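
The `gzip` module raises this error when a file does not begin with the gzip magic bytes `1f 8b`, for example when a data file was stored uncompressed or a download was incomplete. A minimal Python 3 sketch (the report itself uses Python 2.7) that checks the magic bytes before choosing how to open a file follows; the helper name `open_maybe_gzipped` and the file names are hypothetical.

```python
import gzip
import os
import tempfile


def open_maybe_gzipped(path):
    """Open `path` as text, using gzip only if it starts with the gzip magic bytes."""
    with open(path, 'rb') as handle:
        magic = handle.read(2)
    if magic == b'\x1f\x8b':
        return gzip.open(path, 'rt')
    return open(path, 'r')


# Write one plain and one gzipped copy of the same toy content.
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, 'levels.tsv')
gz_path = os.path.join(workdir, 'levels.tsv.gz')
with open(plain_path, 'w') as out:
    out.write('bactNOG\t1\n')
with gzip.open(gz_path, 'wt') as out:
    out.write('bactNOG\t1\n')
```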

Update diamond DB

Hello,
I'm running diamond 0.38 and have a problem with the default eggNOG diamond db:

Error: Database was built with a different version of diamond as is incompatible.
Do you have a link to the same database as a FASTA file so I can rebuild the db?

Thanks,
david

implement sequence thresholds post orthology prediction

That would allow applying common BLAST filters on top of orthology restrictions. Currently, only taxonomic restrictions can be applied.

The implementation would consist of phmmer searches always occurring at the NOG level, so that all orthologs are evaluated with an alignment.

Potential filters are:

  • sequence identity
  • query coverage
  • branch distance (average from phylogenies)
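
The first two filters can be sketched as follows, assuming a gapped pairwise alignment is already available as two equal-length strings. The function names and the default thresholds are hypothetical, and identity is computed over aligned (non-gap) columns, which is only one of several definitions in use.

```python
def identity_and_coverage(aligned_query, aligned_target, query_length):
    """Compute percent identity and query coverage from a gapped pairwise alignment.

    Both aligned strings must have equal length and use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError('aligned sequences must have the same length')
    matches = 0
    aligned_columns = 0
    query_residues = 0
    for q, t in zip(aligned_query, aligned_target):
        if q != '-':
            query_residues += 1
        if q != '-' and t != '-':
            aligned_columns += 1
            if q == t:
                matches += 1
    identity = 100.0 * matches / aligned_columns if aligned_columns else 0.0
    coverage = 100.0 * query_residues / query_length if query_length else 0.0
    return identity, coverage


def passes_filters(identity, coverage, min_identity=40.0, min_coverage=70.0):
    """Apply hypothetical identity and coverage thresholds to one ortholog."""
    return identity >= min_identity and coverage >= min_coverage
```

Branch distance would need the phylogenies themselves and is left out of this sketch.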

Diamond vs Hmmer - very different results

Hi,

I have observed a very strange behaviour when running diamond compared to hmmer mode - attached results for an input file with 2000 genes. Briefly, diamond produces very few hits and they all fall into a narrow category of hits. That became very obvious when I ran diamond mode on 50k genes and 99% matched a set of ~300 seed_eggNOG_ortholog (not expected; not shown here).

Is that a software version problem, or am I misinterpreting the results? Database files were downloaded on 2017.03.29.

Do you need additional information to help with this question?

Best,

Damian

Emapper.diamond.txt
Emapper.hmmer.txt

ValueError in refinement step

I got this error in the refinement step:

94500 2044.13207722 46.23 q/s (refinement)
Traceback (most recent call last):
  File "emapper.py", line 1086, in <module>
    main(args)
  File "emapper.py", line 260, in main
    annotate_hits_file(seed_orthologs_file, annot_file, hmm_hits_file, args)
  File "emapper.py", line 679, in annotate_hits_file
    for result in pool.imap(annotate_hit_line, iter_hit_lines(seed_orthologs_file, args)):
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 668, in next
    raise value
ValueError: invalid literal for float(): 8.9e-73# Sat Jun 24 10:21:42 2017
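
The value that failed to parse looks like a numeric field fused with a "#" comment line, which could happen if a record was written without its trailing newline (an assumption; the report does not say how the file was produced). A minimal sketch of a tolerant parser for the four-column seed_orthologs layout shown elsewhere on this page follows; the function name is hypothetical, and the approach would misfire on query names that contain "#".

```python
def parse_seed_ortholog_line(line):
    """Parse 'query<TAB>hit<TAB>evalue<TAB>score', tolerating a fused '#' comment.

    Returns None for comment lines, blank lines, and truncated records.
    """
    data = line.split('#', 1)[0].strip()
    if not data:
        return None
    fields = data.split('\t')
    if len(fields) < 4:
        return None  # truncated record, e.g. an interrupted write
    query, hit = fields[0], fields[1]
    return query, hit, float(fields[2]), float(fields[3])
```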

Cannot run search with HMMER against "bact" (nor "--usemem")

Hi,
This tool seems really useful, but I'm having issues testing it with the "bact" database.
I tried running:

Nevertheless, I think it has to do with a bug in:
if not args.mode == 'diamond' and not pexists(EGGNOG_DMND_DB):

I believe "not" should be removed. Could you confirm this?

[fail]python emapper.py -i ../file.fa --output ../file.fa.bact --cpu 5 -d bact
DIAMOND database data/eggnog_proteins.dmnd not present. Use download_eggnog_database.py to fetch it
[fail]python emapper.py -i ../file.fa --output ../file.fa.bact --cpu 5 -d ./data/hmmdb_levels/bact_50/bact_50.hmm
DIAMOND database data/eggnog_proteins.dmnd not present. Use download_eggnog_database.py to fetch it

[succeeded after removing "not"]
python emapper.py -i ../file.fa --output ../file.fa.bact --cpu 5 -d ./data/hmmdb_levels/bact_50/bact_50.hmm

Furthermore, if I use the "--usemem" option, it gets stuck repeating the "Waiting..." line:
[fail] python emapper.py -i -i ../file.fa --output ../file.fa.bact --cpu 5 --usemem -d ./data/hmmdb_levels/bact_50/bact_50.hmm

emapper-unknown

./emapper.py -i -i ../file.fa --output ../file.fa.bact --cpu 5 --usemem -d ./data/hmmdb_levels/bact_50/bact_50.hmm

Loading server at localhost, port 53000-53001
Waiting for server to become ready... localhost 53000
Waiting for server to become ready... localhost 53000
...
I don't know whether, in this case, it might be something I have configured on my computer.

Many thanks
Xavi
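
The difference between the quoted condition and the proposed one can be sketched with two small predicates. The function names are hypothetical, and `db_exists` stands in for the result of `pexists(EGGNOG_DMND_DB)`.

```python
def original_check(mode, db_exists):
    # Condition as quoted in the report: fires for every mode except diamond.
    return (not mode == 'diamond') and not db_exists


def proposed_check(mode, db_exists):
    # Proposed condition: complain only when diamond mode needs the missing file.
    return mode == 'diamond' and not db_exists


for mode in ('hmmer', 'diamond'):
    print(mode, original_check(mode, False), proposed_check(mode, False))
```

With the quoted condition, an HMMER run without the DIAMOND database is rejected, which matches the behaviour described in the report.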

Setting up large analyses link broken

Hi Jamie,

I noticed on the wiki that the "setting up large analyses" link is broken, or maybe it's just under construction. Are there plans to add this page? I would be very interested! :)

C

What are the origins of the Gene Names?

Hi @jhcepas, great job on eggnog and the new mapper utility. I'm interested in using the results from eggnog-mapper for some automated functional annotation as part of my genome annotation tool. I'm curious where you are pulling the Gene Names from in the eggnog-mapper annotations output, are they all UniProtKB gene names? I'm working on trying to curate a mapping file of gene names to product definitions that will pass GenBank annotation rules. The eggnog descriptions are useful, but most would not pass GenBank submission requirements. Basically I'd just like to know where you are getting the names from so I can cross reference to make sure I'm recovering the proper product definition. I'd like to avoid doing another sequence alignment based check if I can.
Thanks,
Jon

AttributeError: 'Namespace' object has no attribute 'db'

This happens when I try to download the databases (all):
python download_eggnog_data.py -y all

Traceback (most recent call last):
  File "download_eggnog_data.py", line 87, in <module>
    if args.db != 'none':

It was solved by changing "args.db" to "args.dbs" in line 87, and now it works.

cheers
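
This kind of AttributeError arises when the attribute read from the parsed namespace does not match the name the argument was registered under. A minimal `argparse` sketch follows; the parser layout is a hypothetical reconstruction and not the actual code of download_eggnog_data.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical reconstruction: positional list stored under `dbs`.
    parser.add_argument('dbs', nargs='+')
    parser.add_argument('-y', action='store_true', dest='allyes')
    return parser


args = build_parser().parse_args(['-y', 'all'])
print(args.dbs)              # the attribute that exists
print(hasattr(args, 'db'))   # reading args.db would raise AttributeError
```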

global name 'errno' is not defined

recently installed

~/eggnog-mapper-0.11.0/emapper.py -i testCOG0515.fa -o output_bact -d bact --cpu 16

Traceback (most recent call last):
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/emapper.py", line 895, in <module>
    main(args)
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/emapper.py", line 197, in main
    print '# ', get_version()
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/eggnogmapper/common.py", line 114, in get_version
    if e.errno == errno.ENOENT:
NameError: global name 'errno' is not defined

Same with version 0.12.1.
Regards
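
The NameError means the module compares against `errno.ENOENT` without importing `errno`. A minimal Python 3 sketch of the pattern with the import in place follows; the body of `get_version` and the default git command are hypothetical, since the report only shows the failing line.

```python
import errno
import subprocess


def get_version(command=('git', 'describe', '--tags')):
    """Return the command's output, or 'unknown' if the executable is missing."""
    try:
        output = subprocess.check_output(list(command), stderr=subprocess.DEVNULL)
    except OSError as e:
        if e.errno == errno.ENOENT:  # requires `import errno` at module level
            return 'unknown'
        raise
    except subprocess.CalledProcessError:
        # The command exists but failed, e.g. not inside a git checkout.
        return 'unknown'
    return output.decode().strip()
```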

Optimized bacterial database error

Hi, while running a set of proteins in default mode against the optimized bacteria (hmm) database, I get an error:
...
Reading idmap /home/david/work/sources/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm.idmap
159207 names loaded
Sequence mapping starts now!
Processed queries:1927 total_time:942.721111059 rate:2.04 q/s
refined hits not available for custom hmm databases.
Reading HMM matches
Functional annotation of refined hits starts now
error

It seems it is trying to refine hits, but those are not available for custom databases? The database I used is the optimized bacteria one downloaded with the download script. The annotations file does not display any annotation.
...

UnboundLocalError: local variable 'annot_levels' referenced before assignment

  File "[...]/eggnog-mapper-0.12.7/emapper.py", line 638, in annotate_hits_file
    all_orthologies = annota.get_member_orthologs(best_hit_name, target_levels=annot_levels)
UnboundLocalError: local variable 'annot_levels' referenced before assignment

When the tax_scope option is set to auto (the default) and the best hit NOG gives match_levels equal to the general taxonomic level ['NOG'], that level is not found in the list TAXONOMIC_RESOLUTION, which leaves annot_levels undefined.
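
A minimal sketch of how the name could always be bound, assuming that a fallback to the matched levels themselves is acceptable behaviour. The list contents and the function name `choose_annot_levels` are hypothetical stand-ins, since the real TAXONOMIC_RESOLUTION list and selection logic are not shown in the report.

```python
# Hypothetical stand-in; the real TAXONOMIC_RESOLUTION list is not shown in the report.
TAXONOMIC_RESOLUTION = ['apiNOG', 'virNOG', 'bactNOG', 'euNOG', 'arNOG']


def choose_annot_levels(match_levels, tax_scope='auto'):
    """Pick annotation levels, falling back to the matched levels themselves."""
    if tax_scope != 'auto':
        return {tax_scope}
    annot_levels = set(match_levels)  # default, so the name is always bound
    for level in TAXONOMIC_RESOLUTION:
        if level in match_levels:
            annot_levels = {level}
            break
    return annot_levels


print(choose_annot_levels(['NOG']))
```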

arguments commented out of annotate.py

#parser.add_argument('--go', dest='go', action='store_true')    
#parser.add_argument('--kegg', dest='kegg', action='store_true')
#parser.add_argument('--desc', dest='desc', action='store_true')
#parser.add_argument('--smart', dest='smart', action='store_true')

These arguments no longer work in annotate.py.
I was trying to run the test case as per your instructions:
"python annotate.py testCOG0515.hits.tsv --go --kegg --smart --desc > testCOG0515.annotations.tsv"
but it fails.

Issue with missing seed_orthologs_file

Hi,

I ran emapper like this:

emapper.py -i aa_sequences.fa --output aa_sequences_eggnog -d bact --no_refine

And it crashed like this:

#  emapper-0.12.6
# ./emapper.py  -i aa_sequences.fa --output aa_sequences_eggnog -d bact --no_refine
Loading server at localhost, port 51500-51501
26 7.76936841011 3.35 q/s
(...)
32201 8095.21844888 3.98 q/s
32226 8099.19744229 3.98 q/s
32251 8104.59060812 3.98 q/s
(...)
Waiting for server to become ready... localhost 51500
Reading idmap /automounts/workspace/workspace/meren/EGGNOG/eggnog-mapper/data/hmmdb_levels/bact_50/bact_50.hmm.idmap
159207 names loaded
Sequence mapping starts now!
 Processed queries:32265 total_time:8106.77720094 rate:3.98 q/s
Reading HMM matches
Functional annotation of refined hits starts now
Traceback (most recent call last):
  File "/workspace/meren/EGGNOG/eggnog-mapper/emapper.py", line 897, in <module>
    main(args)
  File "/workspace/meren/EGGNOG/eggnog-mapper/emapper.py", line 259, in main
    annotate_hits_file(seed_orthologs_file, annot_file, hmm_hits_file, args)
  File "/workspace/meren/EGGNOG/eggnog-mapper/emapper.py", line 600, in annotate_hits_file
    for line in open(seed_orthologs_file):
IOError: [Errno 2] No such file or directory: 'aa_sequences_eggnog.emapper.seed_orthologs'

Best,

SyntaxError: invalid syntax

Dear jhcepas,

I downloaded the GitHub copy, and after setting up the euk database and associated files following the basic instructions, I tried running a test and got the error below. I'm not sure why it's throwing an error.

python2.7 ./emapper.py -d euk -i test/polb.fa -out polb_bact_euk
Traceback (most recent call last):
  File "./emapper.py", line 16, in <module>
    from eggnogmapper import search
  File "/home/ubuntu/software/eggnog-mapper/eggnogmapper/search.py", line 20, in <module>
    from . import annota
  File "/home/ubuntu/software/eggnog-mapper/eggnogmapper/annota.py", line 195
    if g and g.split('|')[2] not in excluded_gos])
                                         ^
SyntaxError: invalid syntax

bioconda integration

Is there any interest in creating a bioconda package and biocontainer for eggnog?
I could help in this process if you like.

hmm scan bug?

Hi,
I have just installed the latest version.
I get an error while trying to run the example (./emapper.py -i test/polb.fa --output polb_viruses -d viruses):
"Error: Unrecognized format, trying to open hmm file /eggnog-mapper-0.99.2/data/hmmdb_levels/viruses_hmm/viruses_hmm.all_hmm for reading."
I have tried to run hmmscan by myself with the downloaded files and it works.
I can't find where this error is produced; the message doesn't come from emapper.py...

Do you have any clue?

Best,

Yann

Not getting outputs from server mode?

I run the following command in one terminal:
python emapper.py -d bact --cpu 10 --servermode

Then I run a python script that calls subprocess and the command:
python emapper.py -d bact:localhost:51500 -i file -o outfile

I have 330 files to process so my script just subs in the file and outfile name for each file in the directory:
(for file in directory: do command) I know the command works as I tried it manually before running it in the program.

I get the typical Sequence mapping starts now!

And then, when the script has finished the process for all the files, there is no output to be seen. When I ran just the diamond or just the hmmer search from my command line, I was getting output; the process was just far too slow.

Any thoughts why I'm getting no results?
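
When output silently goes missing in a loop like this, capturing each client's return code and stderr usually narrows the cause down. A minimal sketch follows; it reuses the client command quoted above, while the helper names, the file-extension filter, and the use of `sys.executable` in place of `python` are assumptions.

```python
import os
import subprocess
import sys


def build_command(fasta_path, out_prefix, server='bact:localhost:51500'):
    """Assemble the client command quoted in the report for one input file."""
    return [sys.executable, 'emapper.py', '-d', server,
            '-i', fasta_path, '-o', out_prefix]


def run_all(directory, runner=subprocess.run):
    """Run one client per FASTA file and collect the files whose run failed."""
    failed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(('.fa', '.faa', '.fasta')):
            continue
        path = os.path.join(directory, name)
        prefix = os.path.splitext(path)[0]
        result = runner(build_command(path, prefix),
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode != 0:
            # Keep stderr so the reason for the failure is not lost.
            failed.append((name, result.stderr.decode(errors='replace')))
    return failed
```

Calling `run_all('path/to/fasta_dir')` returns a list of `(file name, stderr text)` pairs for the runs that exited with a non-zero status.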

Version info "unknown" and not printed consistently in output files

Hello,

I noticed the version info is printed as unknown when using release 0.12.7 or a git cloned version with removed .git folder.

It seems related to a non-functioning import from eggnogmapper/common.py. Removing/commenting line 1 (from __future__ import absolute_import) or modifying line 15 to from .version import __VERSION__ or from eggnogmapper.version import __VERSION__ fixes the problem.

Furthermore, the header data at the start of the output files differs for the *.annotations file in that it is missing the emapper version and an empty comment line before the comment header.

To fix this, I suppose annotate_hits_file (and annotate_hits_file_sequential) should use the get_call_info function.

Regards

add all DB options

I'm working with a metagenome sample and it would be nice to have this option.
Currently it is only possible to run -db bact|euk|arch separately.

regards

Could not run emapper

I downloaded eggnog-mapper from GitHub and downloaded the bacteria database with the following command line:

python download_eggnog_data.py bact

I then ran emapper using the test data:

python emapper.py -i test/polb.fa --output polb_back -d bact

It gives me the following error message:
Error: bad file format in HMM file data/hmmdb_levels/bact_50/bact_50.hmm

When I list the files under data/hmmdb_levels/bact_50, it contains
"bact_50.hmm.h3f" "bact_50.hmm.h3i" "bact_50.hmm.h3m" "bact_50.hmm.h3p" "bact_50.hmm.idmap" "bact_50.info" "bact_50.pkl"

Does anyone know if it is a bug in emapper? Or if I didn't download the data correctly?! Any help is appreciated. Thanks!!

Job failed error?

Hello,

I am trying to do a rough annotation of a phage genome, but the job keeps failing with an error.
I have pasted below what I believe is the error report, found in MM_fZ9og3.1.err.txt:

Traceback (most recent call last):
  File "/home/huerta/emapper_production/emapper.py", line 1082, in <module>
    main(args)
  File "/home/huerta/emapper_production/emapper.py", line 221, in main
    dump_hmm_matches(args.input, hmm_hits_file, dbpath, port, scantype, idmap, args)
  File "/home/huerta/emapper_production/emapper.py", line 382, in dump_hmm_matches
    base_tempdir=args.temp_dir)):
  File "/home/huerta/emapper_production/eggnogmapper/search.py", line 284, in hmmscan
    "Inconsistent qlen when parsing hmmscan output")
ValueError: Inconsistent qlen when parsing hmmscan output
exitcode: 1

Can anyone help me fix this issue? I don't know why I'm getting this error.
Thank you so much!

Make a pkl

Dear jhcepas, I want to include my own database and I only lack a pkl file. How can I make it?

regards

Error running basic example

I am trying to run:
python emapper.py -i test/polb.fa --output polb_bact -d bact

after having downloaded using:
python download_eggnog_data.py -y euk bact arch viruses maNOG

This is what data/hmmdb_levels looks like:
arch_1 bact_50 euk_500 maNOG_hmm viruses_hmm

I get the following error:
[False, False, False, False, False]
Database bact not present. Use download_eggnog_database.py to fetch it
Traceback (most recent call last):
  File "emapper.py", line 1082, in <module>
    main(args)
  File "emapper.py", line 218, in main
    host, port, dbpath, scantype, idmap = setup_hmm_search(args)
  File "emapper.py", line 64, in setup_hmm_search
    raise ValueError('Database not found')
ValueError: Database not found

Thanks!

ERROR 404 when downloading bact database

Dear jhcepas, the new version of your mapper is amazing! I could download all the other databases (arch, euk, viruses), but bact gives me this error (complete output):

Download 1 HMM database(s): bact? [y,n] y
Downloading bact HMM database " at /home/ecastron/programs/eggnogmapper/data/hmmdb_levels/bact_hmm ...
mkdir -p /home/ecastron/programs/eggnog-mapper/data/hmmdb_levels; cd /home/ecastron/programs/eggnog-mapper/data/hmmdb_levels; wget -N -nH --user-agent=Mozilla/5.0 --relative -r --no-parent --reject "index.html*" --cut-dirs=4 -e robots=off http://beta-eggnogdb.embl.de/download/eggnog_4.5/eggnog-mapper-data/hmmdb_levels/euk_50/
--2016-07-05 15:10:48-- http://beta-eggnogdb.embl.de/download/eggnog_4.5/eggnog-mapper-data/hmmdb_levels/euk_50/
Resolving beta-eggnogdb.embl.de... 194.94.44.96
Connecting to beta-eggnogdb.embl.de|194.94.44.96|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-07-05 15:10:49 ERROR 404: Not Found.

Checking the code, the script searches for euk_50; should it be bact_50?

regards

Emapper online - NOG name instead of top hit? KO instead of map?

Hi Jamie,

I was wondering if there is a mapping file of eggNOG where I could relate the top hit back to the actual NOG name + number (rather than the accession number for that particular protein).

Furthermore, is there a mapping file to go from the eggNOG number to the KEGG orthology rather than the map number?

Are these options implemented in the stand-alone version?

Thank you very much for your help :)
Courtney

emapper asks for DIAMOND db even when it is not necessary

I run this:

$ download_eggnog_data.py euk bact arch viruses

And downloaded everything except 'OG fasta' and 'diamond database'. However, when I run emapper, this is what I get:

$ emapper.py -i aa_sequences.fa --output emapper_output --usemem -d bact -m hmmer
DIAMOND database data/eggnog_proteins.dmnd not present. Use download_eggnog_database.py to fetch it

Best,

Inconsistent qlen when parsing hmmscan output

Hi jhcepas, I have a set of E. coli proteins (and some pseudogenes), and I'm using version 0.12.6 of this mapper, but no output is written. Can you help me?

./emapper.py -i selected.faa -o output_bact -d bact --cpu 16

Traceback (most recent call last):
  File "/home/ecastron/programs/eggnog-mapper-0.12.6/emapper.py", line 897, in <module>
    main(args)
  File "/home/ecastron/programs/eggnog-mapper-0.12.6/emapper.py", line 220, in main
    dump_hmm_matches(args.input, hmm_hits_file, dbpath, port, scantype, idmap, args)
  File "/home/ecastron/programs/eggnog-mapper-0.12.6/emapper.py", line 376, in dump_hmm_matches
    cpus=args.cpu)):
  File "/home/ecastron/programs/eggnog-mapper-0.12.6/eggnogmapper/search.py", line 282, in hmmscan
    "Inconsistent qlen when parsing hmmscan output")
ValueError: Inconsistent qlen when parsing hmmscan output

regards
selected.faa.zip

Make VirusDB

Dear:
I'm working on a virusDB to submit to your repository, but server.py also needs a pkl file. How can I make it?

regards

Error running diamond

after loading git module:

~/programs/eggnog-mapper-0.11.0/emapper.py -i testCOG0515.fa -o output_bact -d bact --cpu 16 -m diamond

./emapper.py -i testCOG0515.fa -o output_bact -d bact --cpu 16 -m diamond --override

/home/ecastron/programs/eggnog-mapper-0.11.0/bin/diamond blastp -d /home/ecastron/programs/eggnog-mapper-0.11.0/data/eggnog_proteins.dmnd -q /home/ecastron/programs/eggnog-mapper-0.11.0/test/testCOG0515.fa --more-sensitive --threads 16 -e 0.001000 -o /home/ecastron/programs/eggnog-mapper-0.11.0/emappertmp_dmdn_v3OGhI/13983698aeb04e8d843512cdad7e2d9f --top 3
/home/ecastron/programs/eggnog-mapper-0.11.0/bin/diamond blastp -d /home/ecastron/programs/eggnog-mapper-0.11.0/data/eggnog_proteins.dmnd -q /home/ecastron/programs/eggnog-mapper-0.11.0/test/testCOG0515.fa --more-sensitive --threads 16 -e 0.001000 -o /home/ecastron/programs/eggnog-mapper-0.11.0/emappertmp_dmdn_v3OGhI/13983698aeb04e8d843512cdad7e2d9f --top 3
Traceback (most recent call last):
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/emapper.py", line 895, in <module>
    main(args)
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/emapper.py", line 214, in main
    dump_diamond_matches(args.input, seed_orthologs_file, args)
  File "/home/ecastron/programs/eggnog-mapper-0.11.0/emapper.py", line 331, in dump_diamond_matches
    raise ValueError('Error running diamond')
ValueError: Error running diamond

regards

Speed of annotation - hmmer vs. diamond

I am comparing hmm and diamond mode in emapper-0.12.7 on our local cluster and, contrary to what I expected, I found that hmm was faster than diamond. The former mode took ~2.5h, the latter ~6.5h. Could you advise on the settings?

The test setup contains 10k microbial genes (eventually I want to annotate >1M sequences). The following commands were submitted to a node with 28 cores and 120 gb memory:

emapper.py -m hmmer -i 1k.fa --translate --output eggnog.hmmer.1k -d bact --usemem --cpu 28

emapper.py -m diamond -i 1k.fa --translate --output eggnog.diamond.nr.1k --usemem --cpu 28 --scratch_dir /scratch/$PBS_JOBID/ --temp_dir /scratch/$PBS_JOBID/

Note:

  1. /scratch/$PBS_JOBID/ - designated physical space for temporary files on the node
  2. There is also /tmp/$PBS_JOBID/ - designated memory space for temporary files on the node, but when I tried using that for --temp_dir and --scratch_dir it returned 'ValueError: Error running diamond'.

Thanks for help!
