
contigtax's People

Contributors

johnne

contigtax's Issues

Assignment fails due to duplicate taxonomic rank entries

Entries that have multiple definitions at a certain rank cause contigtax assign to fail with the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

For instance, this can be the result of a taxonomic id (here 182248) having two entries at the class rank:

rank   superkingdom phylum order genus species  class  family    class
182248         2759   7711  9443  9499   30589  40674  378855  1338369

This will be fixed in the next release by only selecting unique columns when setting up the lineage dataframe.
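The failure can be reproduced with pandas alone. The sketch below builds a one-row lineage table from the ranks and taxids shown above, shows that indexing the duplicated class column returns a Series rather than a scalar (so a check like if t < 0 raises the ValueError), and then keeps only unique columns with Index.duplicated(). Keeping the first occurrence of each rank is an assumption, since the issue does not say which duplicate the fix retains; the variable names are illustrative, not contigtax's own.

```python
import pandas as pd

# Lineage of taxid 182248 as printed in the issue; "class" appears twice.
ranks = ["superkingdom", "phylum", "order", "genus", "species",
         "class", "family", "class"]
taxids = [2759, 7711, 9443, 9499, 30589, 40674, 378855, 1338369]
lineage = pd.DataFrame([taxids], index=[182248], columns=ranks)

row = lineage.loc[182248]
t = row["class"]  # a 2-element Series, not a scalar, because the label repeats
try:
    if t < 0:  # same pattern as the failing check in add_names
        pass
except ValueError as err:
    print("fails:", err)

# Keep only the first column for each rank label (assumed policy).
dedup = lineage.loc[:, ~lineage.columns.duplicated()]
print(dedup.loc[182248, "class"])  # 40674
```

With unique columns, each rank lookup returns a single taxid again, so scalar comparisons behave as intended.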

Unclear instructions

My student was following the instructions in the README and got stuck on step 5:

$ contigtax search -p 4 53.fa uniref100/diamond.dmnd assembly.tsv.gz
ERROR: This diamond version requires you to supply a taxonmap file with --taxonmap at this stage

I figured out that she needed to add --taxonmap uniref100/prot.accession2taxid.gz, but it would be good to state this explicitly in the instructions!

Feature request: Utilize Diamond's contig features

Thanks for the awesome software! If I understand the code correctly, Diamond is run with largely default parameters. I'd suggest adding --range-culling --top 10 -F 15 (source), but this will likely require rewriting other parts of contigtax. With these parameters Diamond culls hits per region of the query, retaining hits within 10% of the best score in each area of the query contig, and -F 15 enables frameshift-tolerant alignment. contigtax would then no longer need to filter hits by bitscore itself, and should rely on Diamond's --evalue parameter instead of filtering on e-value afterwards. I just wanted to get the discussion started; there are probably other design decisions I'm missing.
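To make the proposed behaviour concrete, here is a simplified model of range culling, not DIAMOND's implementation: a hit is kept if its bitscore is within top percent of the best hit that covers at least half of its query range. The 50% cover threshold, the Hit class and the range_cull function are assumptions for illustration. The last lines contrast this with a single cutoff over the whole contig (within 10% of the overall best hit), an assumed stand-in for the bitscore filtering mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    qstart: int      # start of the aligned region on the query contig (inclusive)
    qend: int        # end of the aligned region (inclusive)
    bitscore: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of query positions shared by two hits."""
    return max(0, min(a.qend, b.qend) - max(a.qstart, b.qstart) + 1)


def range_cull(hits, top=10.0, cover=0.5):
    """Keep hits scoring within `top` percent of the best hit covering their range."""
    kept = []
    for h in hits:
        length = h.qend - h.qstart + 1
        # Hits (including h itself) that cover at least `cover` of h's query range.
        rivals = [o for o in hits if overlap(h, o) >= cover * length]
        best = max(o.bitscore for o in rivals)
        if h.bitscore >= best * (1 - top / 100):
            kept.append(h)
    return kept


hits = [
    Hit("A", 1, 300, 200.0),
    Hit("B", 10, 290, 185.0),
    Hit("C", 5, 300, 150.0),
    Hit("D", 1000, 1300, 90.0),  # a different region of the same contig
]

local = [h.subject for h in range_cull(hits)]
best_overall = max(h.bitscore for h in hits)
global_cut = [h.subject for h in hits if h.bitscore >= best_overall * 0.9]
print(local)       # ['A', 'B', 'D']
print(global_cut)  # ['A', 'B']
```

Hit D, in a separate region of the contig, survives range culling but is dropped by the contig-wide cutoff, which is why contigtax would have to stop filtering on bitscore across the whole contig if Diamond did the culling.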

Crash while making lineages

Making lineages: 7%|██▍ | 20724/277641 [00:14<02:51, 1497.39 taxids/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 490, in process_lineages
    x = add_names(x, taxid, ncbi_taxa)
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 114, in add_names
    if t < 0:
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/pandas/core/generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/vsingh/.conda/envs/tango/bin/tango", line 12, in <module>
    sys.exit(main())
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/main.py", line 283, in main
    args.func(args)
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/main.py", line 77, in assign_taxonomy
    args.rank_thresholds, args.taxdir, args.sqlitedb, args.chunksize, args.cpus)
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 776, in parse_hits
    lineage_df, name_dict = make_lineage_df(taxids, taxdir, sqlitedb, reportranks, cpus)
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tango/assign.py", line 561, in make_lineage_df
    unit=" taxids", ncols=100))
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/site-packages/tqdm/std.py", line 1093, in __iter__
    for obj in iterable:
  File "/home/vsingh/.conda/envs/tango/lib/python3.5/multiprocessing/pool.py", line 731, in next
    raise value
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

contigtax download fails

I just installed the program through conda on a compute server. When downloading the database files, the command fails with the following error:

contigtax download taxonomy
Downloading NCBI taxdump.tar.gz
0 bytes [00:00, ? bytes/s]
Traceback (most recent call last):
  File "/home/m.sevi/miniconda3/envs/contitax_env/bin/contigtax", line 10, in <module>
    sys.exit(main())
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/main.py", line 387, in main
    args.func(args)
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/main.py", line 24, in download
    prepare.download_ncbi_taxonomy(args.taxdir, args.force)
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/prepare.py", line 200, in download_ncbi_taxonomy
    urllib.request.urlretrieve(url, local, reporthook=reporthook)
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/urllib/request.py", line 274, in urlretrieve
    reporthook(blocknum, bs, size)
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/contigtax/prepare.py", line 37, in update_to
    t.update((b - last_b[0]) * bsize)
  File "/home/m.sevi/miniconda3/envs/contitax_env/lib/python3.6/site-packages/tqdm-4.7.2-py3.6.egg/tqdm/_tqdm.py", line 689, in update
ZeroDivisionError: float division by zero

Similar behaviour is observed with contigtax download uniref100.
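The traceback shows urllib.request.urlretrieve calling contigtax's progress hook, with the ZeroDivisionError raised inside tqdm's update() from an old tqdm 4.7.2 egg. As a hypothetical stdlib-only workaround (not contigtax's code), a reporthook can skip tqdm and guard against an unknown total size: urlretrieve passes (blocks transferred so far, block size, total size), and the total may be -1 when the server does not report it. The make_reporthook name is an illustrative assumption.

```python
import sys


def make_reporthook(out=sys.stderr):
    """Hypothetical urlretrieve reporthook that never divides by an unknown size."""

    def reporthook(blocknum, blocksize, totalsize):
        # urlretrieve calls this once when the connection opens (blocknum=0)
        # and again after every block; totalsize is -1 if it is unknown.
        received = blocknum * blocksize
        if totalsize > 0:
            # The last block may overshoot the real size, so cap at 100%.
            pct = min(100.0, 100.0 * received / totalsize)
            msg = f"{received} / {totalsize} bytes ({pct:.1f}%)"
        else:
            # Unknown or zero total: report bytes only, no division.
            msg = f"{received} bytes"
        out.write("\r" + msg)
        out.flush()
        return msg  # returned only so the hook is easy to test

    return reporthook
```

It would be passed as reporthook=make_reporthook() in the same urlretrieve(url, local, ...) call shown in the traceback.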