Comments (3)
Hi Bastien,
It appears that particular sequence does not have a taxonomy ID associated with its GI number in the taxonomy data that Kraken downloads. As it is now, the lack of a taxonomy ID for that sequence is causing exact matches with the sequence to not be classified by Kraken. Now that I think about it, I probably should have Kraken associate these types of sequences with taxid 1 (root), not taxid 0 (unclassified). I will probably do that for the next release.
This particular sequence does seem to have a taxonomy ID (645657) when looking at the actual .gbk file, but kraken-build doesn't download these in order to save download time and storage space. I'm not sure why NCBI hasn't associated this sequence's GI number with that taxonomy number in the gi_taxid_nucl.dmp file that Kraken uses. I'll have to look into that a bit more to see if there's a better way to associate sequences with taxonomy information.
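For anyone who wants to check whether a given GI has an entry in the mapping file, here is a minimal sketch. It assumes gi_taxid_nucl.dmp holds one whitespace-separated `GI taxid` pair per line. The function name `find_taxids` and the GI numbers in the sample are made up for illustration, and none of this is Kraken's own code.

```python
import io

def find_taxids(handle, wanted_gis):
    """Stream a GI-to-taxid mapping and report the taxid for each wanted GI.

    Assumes each line holds two whitespace-separated integers: GI, then taxid.
    GIs that never appear in the file keep the value None.
    """
    found = {gi: None for gi in wanted_gis}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi = int(fields[0])
        if gi in found:
            found[gi] = int(fields[1])
    return found

# Hypothetical sample data: these GI numbers are invented for the example.
sample = io.StringIO("1001\t645657\n1002\t562\n")
result = find_taxids(sample, [1001, 9999])
print(result)
```

With a real file you would pass `open("gi_taxid_nucl.dmp")` instead of the in-memory sample. Streaming the file rather than loading it whole matters here because the full mapping is very large.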
Derrick
from kraken.
Ah, things get clearer. Indeed, the GI in that FASTA file has no taxonomy node ID associated with it. Funny thing, the other .fna file in the same directory does have a GI with an association. Probably a bug in the NCBI db or FASTA building process. Would you report it (you may have a bit more weight than I do)?
Regarding Kraken association: I think the classes important for a user could be
a) sequence associated to a classified organism
b) sequence associated to an unclassified organism
c) sequence not associated
In a sense the output is a bit misleading as Kraken currently has only categories a) and c). Having this b) category would help to distinguish cases like the above "your sequence is known, but I have no way to tell you from which organism." Keeping that in taxid 0 would be the right way to do it, because having them in taxid 1 normally means "sheesh, present in all kingdoms of life".
I suppose I'd keep track of sequences which have absolutely no hit with taxid -1 (or however you want to implement this) and output that as "sequence not associated".
B.
I've notified NCBI about this particular issue - we'll see how things go there.
As for the proper classification of these kinds of sequences, I'll have to give it a bit more thought if the taxid 1 solution isn't appropriate. Perhaps a different classification code in the first column of the output (e.g., adding O in addition to C/U) while maintaining the taxid 0 behavior would be best. There's also a taxid 12908 ("unclassified sequences"), but I don't know that I feel comfortable hard-coding a non-0/1 ID into the main Kraken code, especially when that particular ID might be changed by a future NCBI decision.
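To make the three-way idea concrete, here is a rough sketch of how a status code could be chosen. The `O` code is only the suggestion floated above, not something Kraken implements, and the function name `status_code` and its inputs are hypothetical.

```python
def status_code(has_db_hit, taxid):
    """Pick a first-column code for one sequence.

    has_db_hit: True if the sequence matched something in the database.
    taxid: taxonomy ID attached to the match, with 0 meaning no taxonomy
           information is available for the matched reference.
    """
    if not has_db_hit:
        return "U"  # nothing in the database matches the sequence
    if taxid == 0:
        return "O"  # hypothetical code: known sequence, no organism attached
    return "C"      # matched and assigned to a taxon

codes = [status_code(True, 645657), status_code(True, 0), status_code(False, 0)]
print(codes)
```

This keeps the taxid 0 behavior as it is and moves the extra distinction into the status column, so nothing beyond 0 needs to be hard-coded.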