nbisweden / contigtax
Taxonomic classification of metagenomic contigs
License: MIT License
Entries that have multiple definitions at a certain rank cause contigtax assign to fail with the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
For instance, this can be the result of a taxonomic id having:
rank superkingdom phylum order genus species class family class
182248 2759 7711 9443 9499 30589 40674 378855 1338369
This will be fixed in the next release by only selecting unique columns when setting up the lineage dataframe.
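The failure and the proposed fix can be reproduced in isolation. The following is a minimal sketch, not the contigtax code itself: it assumes the values in the example above line up positionally with the rank labels, and it assumes that keeping the first occurrence of a duplicated rank is acceptable. The variable names are illustrative only.

```python
import pandas as pd

# Rank labels and values copied from the example above; "class" appears twice.
# Assumption: values align with the labels in the order shown.
ranks = ["superkingdom", "phylum", "order", "genus", "species", "class", "family", "class"]
values = [2759, 7711, 9443, 9499, 30589, 40674, 378855, 1338369]
lineage = pd.DataFrame([values], index=[182248], columns=ranks)

# With a duplicated column label, a single-cell lookup returns a Series.
dup = lineage.loc[182248, "class"]

# Using that Series in an `if` raises the ValueError reported above.
try:
    if dup < 0:
        pass
    raised = False
except ValueError:
    raised = True

# Sketch of the fix: keep only the first occurrence of each column label.
unique = lineage.loc[:, ~lineage.columns.duplicated()]
single = unique.loc[182248, "class"]  # now a scalar, safe to compare
is_negative = bool(single < 0)
```

Whether the first or the last duplicate should be kept is a design choice that this sketch does not settle.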
After the reformatting (or during, if working with the uniref databases) it should be possible to extract only proteins matching a certain taxon, say only Bacteria. This could be done on the command line with, for instance, --taxlimit superkingdom:Bacteria or --taxlimit superkingdom Bacteria Archaea
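One way the two suggested spellings could be accepted is sketched below. This is a hypothetical illustration: the --taxlimit option is only a proposal in this request, and the parser name and the parse_taxlimit helper are made up for the example.

```python
import argparse


def parse_taxlimit(tokens):
    """Return (rank, names) from ['rank:Name'] or ['rank', 'Name1', ...]."""
    if len(tokens) == 1 and ":" in tokens[0]:
        rank, name = tokens[0].split(":", 1)
        return rank, [name]
    if len(tokens) < 2:
        raise ValueError("expected RANK:NAME or RANK NAME [NAME ...]")
    return tokens[0], list(tokens[1:])


# Hypothetical parser; not the real contigtax command line.
parser = argparse.ArgumentParser(prog="taxlimit-sketch")
parser.add_argument("--taxlimit", nargs="+", metavar="TAXLIMIT",
                    help="RANK:NAME or RANK NAME [NAME ...]")

args = parser.parse_args(["--taxlimit", "superkingdom", "Bacteria", "Archaea"])
rank, names = parse_taxlimit(args.taxlimit)
print(rank, names)  # superkingdom ['Bacteria', 'Archaea']
```

The resulting rank and names could then be used to filter protein records by lineage during reformatting.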
My student is following the instructions given in the README and got stuck on step 5:
$ contigtax search -p 4 53.fa uniref100/diamond.dmnd assembly.tsv.gz
ERROR: This diamond version requires you to supply a taxonmap file with --taxonmap at this stage
I managed to figure out she needed to add --taxonmap uniref100/prot.accession2taxid.gz, but it's probably good to be explicit about this!
If a user has already performed the diamond search step, it should be possible to supply a file mapping protein IDs to taxonomy IDs from which to create the lineage dataframe.
Thanks for the awesome software! If I understand the code correctly, Diamond is executed using largely default parameters. I'd suggest adding --range-culling --top 10 -F 15 (source), but this will likely require rewrites of other areas of contigtax. These parameters will perform local Diamond alignment, retaining the top hit (within 10%) in each area of the query contig. We'd then have to stop filtering by bitscore in contigtax, and should rely on Diamond's evalue parameter instead of filtering on e-value afterwards. Just wanted to get the discussion started - there are probably a bunch of other design decisions that I'm missing.
Making lineages: 7%|██▍ | 20724/277641 [00:14<02:51, 1497.39 taxids/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/vsingh/.conda/envs/tango/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 490, in process_lineages
x = add_names(x, taxid, ncbi_taxa)
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 114, in add_names
if t < 0:
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/pandas/core/generic.py", line 1576, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/vsingh/.conda/envs/tango/bin/tango", line 12, in <module>
sys.exit(main())
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/main.py", line 283, in main
args.func(args)
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/main.py", line 77, in assign_taxonomy
args.rank_thresholds, args.taxdir, args.sqlitedb, args.chunksize, args.cpus)
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 776, in parse_hits
lineage_df, name_dict = make_lineage_df(taxids, taxdir, sqlitedb, reportranks, cpus)
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 561, in make_lineage_df
unit=" taxids", ncols=100))
File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tqdm/std.py", line 1093, in __iter__
for obj in iterable:
File "/home/vsingh/.conda/envs/tango/lib/python3.5/multiprocessing/pool.py", line 731, in next
raise value
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I just installed the program through conda on a compute server. When downloading database files, the command fails with the following:
contigtax download taxonomy
Downloading NCBI taxdump.tar.gz
0 bytes [00:00, ? bytes/s]
Traceback (most recent call last):
File "/home/m.sevi/miniconda3/envs/contitax_env/bin/contigtax", line 10, in <module>
sys.exit(main())
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/main.py", line 387, in main
args.func(args)
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/main.py", line 24, in download
prepare.download_ncbi_taxonomy(args.taxdir, args.force)
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/prepare.py", line 200, in download_ncbi_taxonomy
urllib.request.urlretrieve(url, local, reporthook=reporthook)
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/urllib/request.py", line 274, in urlretrieve
reporthook(blocknum, bs, size)
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/prepare.py", line 37, in update_to
t.update((b - last_b[0]) * bsize)
File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/tqdm-4.7.2-py3.6.egg/tqdm/_tqdm.py", line 689, in update
ZeroDivisionError: float division by zero
Similar behaviour is observed with contigtax download uniref100
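The traceback shows the error being raised inside the progress bar's update call, reached from the urlretrieve report hook while the bar still reads 0 bytes. One defensive pattern, assuming the error is triggered by a zero-byte progress increment (which the traceback suggests but does not confirm), is to forward only positive increments. The make_reporthook helper below is hypothetical and is not part of contigtax.

```python
def make_reporthook(update):
    """Build a urllib reporthook that forwards only positive byte increments.

    `update` is any callable taking a byte count, e.g. a progress bar's
    update method.
    """
    last_block = [0]

    def hook(blocknum, blocksize, totalsize):
        # Bytes transferred since the previous callback.
        delta = (blocknum - last_block[0]) * blocksize
        last_block[0] = blocknum
        if delta > 0:  # skip the initial call, where nothing has been read yet
            update(delta)

    return hook


# Simulate the callbacks urlretrieve would make, recording each increment.
seen = []
hook = make_reporthook(seen.append)
hook(0, 8192, -1)  # initial call before any data is read
hook(1, 8192, -1)
hook(3, 8192, -1)
print(seen)  # [8192, 16384]
```

In real use the hook would be passed as reporthook=hook to urllib.request.urlretrieve, with the progress bar's update method as the callable. Upgrading tqdm may also be worth trying, since the traceback points at an old 4.7.2 egg.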