
q2-deblur's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

q2-deblur's People

Contributors

chriskeefe, david-rod, ebolyen, eldeveloper, emford, gregcaporaso, hagenjp, jairideout, lizgehret, mortonjt, nbokulich, oddant1, q2d2, thermokarst, wasade

q2-deblur's Issues

support `SampleData[JoinedSequencesWithQuality]` as input

I have some code that addresses this here. While the code technically works (it doesn't error), the resulting table has very low counts relative to the same data unjoined, so I need to explore this a bit more before it's ready. Update: the low sequence counts were due to an error on my part (I set the trim length too long, which resulted in dropping a lot of the joined sequences).

support viewing of `DeblurStats` type with `qiime metadata tabulate`

Proposed Behavior
This involves adding a transformer from DeblurStatsDirFmt to qiime2.Metadata. Once that is added, we can remove the deblur visualize-stats visualizer in favor of metadata tabulate. The moving pictures tutorial will also need to be updated at that time to reflect this change.

References
These PRs address the same thing for q2-quality-filter, so may be useful as a reference:

  1. qiime2/q2-quality-filter#42
  2. qiime2/docs#255

Expose `reference-non-hit.biom`

Bug Description
This file contains the reads which failed to recruit to the positive filter.

Expected Behavior
The documentation surrounding this file should be appropriate w.r.t. the likelihood of the file containing sequences that are, for instance, not actually 16S.

CI Failing to collect tests?

Bug Description
CI may fail to catch failing tests, likely because the test runner isn't collecting some or all tests.

Steps to reproduce the behavior

  1. Build a test that should fail (see example below for the case that raised this issue)
  2. Open a CR
  3. Check CI results for test failure

Expected behavior
Failing tests should fail CI

References
Affected PR

Test if `reference-hit.seqs.fa` is empty instead of `all.seqs.fa`

Bug Description
The problem is that it is possible for a run to complete such that there are deblurred reads without any recruiting to the reference. This results in an empty DNAIterator being returned, which the plugin system interprets as a malformed FASTA file.

References

  1. Context is here.
  2. Specifically, these lines should be revised to test against reference-hit.seqs.fa instead of all.seqs.fa.
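
The proposed check could look roughly like the following. This is a minimal sketch only: the helper names `fasta_has_records` and `check_reference_hits` are hypothetical and are not part of q2-deblur, and the only assumption taken from the issue is that the deblur output directory contains `reference-hit.seqs.fa` alongside `all.seqs.fa`.

```python
import os
import tempfile


def fasta_has_records(path):
    """Return True if the file at `path` contains at least one FASTA header."""
    with open(path) as fh:
        return any(line.startswith(">") for line in fh)


def check_reference_hits(output_dir):
    # Hypothetical check: inspect reference-hit.seqs.fa rather than
    # all.seqs.fa, so a run with deblurred reads but no reference hits
    # raises a clear error instead of handing back an empty iterator.
    hits_fp = os.path.join(output_dir, "reference-hit.seqs.fa")
    if not fasta_has_records(hits_fp):
        raise ValueError("No sequences recruited to the positive reference.")
    return hits_fp


# Demonstration: deblurred reads exist, but none hit the reference.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "all.seqs.fa"), "w") as fh:
        fh.write(">seq1\nACGT\n")
    open(os.path.join(tmp, "reference-hit.seqs.fa"), "w").close()
    try:
        check_reference_hits(tmp)
        empty_detected = False
    except ValueError:
        empty_detected = True
```

Testing `all.seqs.fa` in the same scenario would pass, which is why the empty result only surfaces later as a malformed-FASTA error.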

Visualizer looks fairly empty

Bug Description
I just tried running qiime deblur visualize-stats on deblur-stats.qza from the Moving Pictures Tutorial, and I got an empty table.

Screenshots
empty table

Questions
Is this a peculiarity of the specific .qza in the Moving Pictures Tutorial, or a bug in visualize-stats?

Comments

  1. In the first scenario, I think we should include a .qza that does produce a meaningful visualization in one of the tutorials somewhere, and in the second, I think we should fix the visualizer.
  2. (Of course this may be simply user error on my end.)

References

  1. deblur-stats.qza
  2. empty table

version should be 0.0.7.dev0

The core plugins' version numbers are all in sync, which would mean that this initial version should be 0.0.7.dev0. This isn't required for plugins, but is recommended as a best practice.

--p-sample-stats fails to correctly parse sequence identifiers with a semicolon

Bug Description
Sequence identifiers that contain semicolons on input to q2-deblur will fail to parse correctly when aggregating per-sample statistics. The issue was first reported by AhHua on the QIIME 2 Forum. Specifically, --p-sample-stats obtains the size= value added to the sequence identifiers by vsearch during dereplication by splitting the sequence identifiers on a semicolon. In AhHua's case, the sequence records already had some information in the comment section of the sequence identifiers split by a semicolon, leading the stats collector to attempt to operate on a different value.

Steps to reproduce the behavior

  1. Create an input artifact with sequence identifiers that include a semicolon, and a non-integer value (e.g., >some_identifier some_comment;foo=0.123).
  2. Run q2-deblur with --p-sample-stats

Expected behavior
A TypeError will occur when collecting the sample stats when attempting to cast (in the above example) 0.123 to an integer.

Screenshots
See this post for an example of incompatible sequence identifiers, and this post for an example of the traceback.

Comment
A workaround for this bug is to not use --p-sample-stats.
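
One possible direction for a fix is to look for the `size=` annotation explicitly instead of relying on the position of the first semicolon. The sketch below is illustrative only: `naive_size` is a hypothetical stand-in for a split-based parser (it is not the plugin's actual code), and `robust_size` is a hypothetical alternative. Note that in plain Python, `int("0.123")` raises a `ValueError`; the exception type seen inside the plugin may differ.

```python
import re

# Matches a vsearch-style abundance annotation such as ";size=12" that is
# followed by another ";" or by the end of the identifier.
_SIZE_RE = re.compile(r";size=(\d+)(?=;|$)")


def naive_size(seq_id):
    # Hypothetical fragile approach: assume the field after the first
    # semicolon is the size annotation.
    return int(seq_id.split(";")[1].split("=")[1])


def robust_size(seq_id):
    # Use the last size= annotation, on the assumption that dereplication
    # appends it after any semicolons already present in the identifier.
    matches = _SIZE_RE.findall(seq_id)
    if not matches:
        raise ValueError("No size= annotation found in %r" % seq_id)
    return int(matches[-1])


example_id = "some_identifier some_comment;foo=0.123;size=5"
size = robust_size(example_id)
```

With the example identifier, `naive_size` tries to cast `0.123` to an integer and fails, while `robust_size` skips the unrelated `foo=0.123` field.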

Optionally suppress logfile?

Improvement Description
Currently, launching q2-deblur creates a deblur.log file wherever the command was run, presumably as a step baked into deblur itself. It would be nice if this file were optional, or if its location were controllable, e.g. via a Q2 artifact or by piping the output somewhere (directly into provenance?), or if it were simply capturable output, a la the q2-dada2 information.

Underscores in sample IDs break the pipeline

Bug Description
Underscores in sample IDs are not supported in deblur. This breaks in several ways: the reference database check is unable to find any hits when underscores are present in IDs, and IDs containing underscores appear to be truncated.

Steps to reproduce the behavior

  1. Run denoise-16s with samples with underscores in the IDs.

Expected behavior
Deblur should work as advertised.

Screenshots
A user reported that mock community samples were producing the following results:

c51fa0f00a4373b1b435ce8f06f977c3ea767884

Note sample HMP_mock_2 has no reads hitting the reference. The user had previously used the same mock community and had had success, so this reference miss was surprising.

I reran the same samples through denoise-16s using underscore-less sample IDs:

screen shot 2018-07-12 at 8 14 59 am

The sample in question now has the expected amount of reads hitting the reference.

Computation Environment

  • OS: macOS High Sierra
  • QIIME 2 Release: 2018.6

Questions

  1. Perhaps the way to solve this in this plugin is to test sample IDs for underscores and error out when observed. Thoughts?
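
The validation suggested in the question above could be sketched as follows. The function names are hypothetical and not part of q2-deblur; the sketch only assumes that sample IDs are available as an iterable of strings before deblur is invoked.

```python
def find_ids_with_underscores(sample_ids):
    """Return the sorted sample IDs that contain an underscore."""
    return sorted(sid for sid in sample_ids if "_" in sid)


def validate_sample_ids(sample_ids):
    # Fail early with a clear message instead of letting the run finish
    # with zero reference hits or truncated IDs.
    bad = find_ids_with_underscores(sample_ids)
    if bad:
        raise ValueError(
            "Sample IDs containing underscores are not supported: %s"
            % ", ".join(bad))


try:
    validate_sample_ids(["HMP_mock_2", "sampleA"])
    rejected = False
except ValueError as err:
    rejected = True
    message = str(err)
```

Erroring out up front trades convenience for safety: users must rename samples, but they no longer get silently misleading statistics.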

References

  1. Reference DB issue (forum)
  2. Truncated ID issue (forum)

Can't install deblur in Qiime 2: PackageNotFoundError: Package not found: '' Dependencies missing in current osx-64 channels:

PackageNotFoundError: Package not found: '' Dependencies missing in current osx-64 channels:

  • q2-deblur -> q2-types ==2017.2.0
  • q2-deblur -> q2cli ==2017.2.0
  • q2-deblur -> qiime2 ==2017.2.0

Close matches found; did you mean one of these?

qiime2: qiime, r-qiimer

You can search for packages on anaconda.org with

anaconda search -t conda qiime2

(and similarly for the other packages)

Within the Qiime 2 environment I am running:

(qiime2-2.0.6) Bobby-Mac-Pro:~ twchicken$ conda install -c biocore q2-deblur

new release 1.0.4

I fixed Deblur in a way that all denoised sequences are ensured to be upper case.

As far as I understand, q2 only enforces that all sequences in a biom table are upper case, so I figure this plugin is doing the conversion to upper case somewhere, maybe here?

rep_sequences = DNAIterator(

Anyway, with the new release this is fixed in the underlying Deblur program itself. From looking at the conda recipe, the new release should automatically be used when built. But you might want to document this change somewhere?
@gregcaporaso @wasade

Update to latest scipy

Bug Description
Currently pinned to an old scipy:

    # scipy isn't directly used in this plugin, but setting a version pin here
    # because deblur doesn't currently work with modern scipy versions.
    - scipy <1.1.0

Questions

  1. Why is this pin needed? Is it related to skbio?

Comments

  1. This isn't super critical (yet!), but it'll matter more the older this issue gets.

Documentation

Questions
Hello,

Where exactly is the documentation? I didn't find anything on the website you indicated.
Thank you

positive/negative filter parameters don't work

They are appending --pos-ref-db or --neg-ref-db to the deblur command that is being built, but the actual options in deblur are --pos-ref-db-fp --pos-ref-fp or --neg-ref-db-fp --neg-ref-fp.
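
A rough sketch of building the command with the option names quoted above. The function `build_deblur_cmd` is hypothetical, and the choice of `--pos-ref-fp` and `--neg-ref-fp` for plain reference files (as opposed to the `-db-fp` variants) is an assumption that should be checked against `deblur workflow --help`.

```python
def build_deblur_cmd(seqs_fp, output_dir, pos_ref_fp=None, neg_ref_fp=None):
    """Assemble a `deblur workflow` argument list (illustrative only)."""
    cmd = ["deblur", "workflow",
           "--seqs-fp", seqs_fp,
           "--output-dir", output_dir]
    # Assumption: reference files map to the *-ref-fp options named in the
    # issue, not to the unsupported --pos-ref-db / --neg-ref-db spellings.
    if pos_ref_fp is not None:
        cmd += ["--pos-ref-fp", pos_ref_fp]
    if neg_ref_fp is not None:
        cmd += ["--neg-ref-fp", neg_ref_fp]
    return cmd


cmd = build_deblur_cmd("seqs/", "out/", pos_ref_fp="pos.fna")
```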

deblur returned non-zero exit status 1

I am running q2-deblur by following the qiime2 moving pictures tutorial except I have replaced the dada2 denoise command with deblur denoise

qiime deblur denoise  --i-demultiplexed-seqs demux.qza --o-representative-sequences rep-seqs --o-table table

This returns the following error:

subprocess.CalledProcessError: Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-cdd9bdgw/08c4e925-3699-4fde-808a-bfb9025d0c25/data', '--output-dir', '/tmp/tmp93ongdi2', '--mean-error', '0.005', '--error-dist', '1,0.06,0.02,0.02,0.01,0.005,0.005,0.005,0.001,0.001,0.001,0.0005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '0', '--min-size', '2', '-w']' returned non-zero exit status 1

I installed q2-deblur directly from the GitHub repository using pip. Any thoughts on what may be causing this error?

make two filepath parameters to denoise into artifacts

For the time being, this is going to require that we transition the one existing method into four methods because we don't have optional artifact inputs. This is something that we're addressing shortly, so will be able to transition this back to one method at that time. In the meantime, we'd rather have four methods than accept file paths as input, as the latter won't work well in non-command-line interfaces (e.g., in QIIME Studio, users would have to type a filepath into a text field).

I'll take this one.

Remove Flask license from project files

Previously the cookiecutter template would use a block of code in setup.py to scrape the version from the __init__.py file, and that little block was attributed to Flask. This is no longer the case as of versioneer upgrades.

A ValueError is missing string formatting

Bug Description
The ValueError message is missing its string formatting argument, so the trim length is never interpolated. The message currently contains a literal %d where the trim length value should appear.

Steps to reproduce the behavior
Run a dataset in which none of the samples have any sequences which make it past the positive filter.

Expected behavior
The message should contain the trim length.

Computation Environment

  1. Vanilla qiime2-2018.8 environment

Comments
An example of the traceback output being produced is below:

Traceback (most recent call last):
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-356>", line 2, in denoise_16S
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 96, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 170, in _denoise_helper
    raise ValueError("No sequences passed the filter. It is possible "
ValueError: No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.

Plugin error from deblur:

  No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.

See above for debug info.
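
The fix amounts to supplying the value for the `%d` placeholder. A minimal sketch, using a hypothetical helper `no_sequences_message` rather than the plugin's actual code:

```python
def no_sequences_message(trim_length):
    # Interpolate trim_length into the message; the reported output shows a
    # literal "%d" because this formatting step was missing.
    return ("No sequences passed the filter. It is possible the "
            "trim_length (%d) may exceed the longest sequence, that all "
            "of the sequences are artifacts like PhiX or adapter, or that "
            "the positive reference used is not representative of the "
            "data being denoised." % trim_length)


msg = no_sequences_message(150)
```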

support SampleData[Sequences] as input

Current Behavior
As of qiime2-2017.9 there's a new semantic type SampleData[Sequences] and a new file format (QIIME1DemuxFormat) associated with that type. This new data type represents the QIIME 1 "demux" format, where sequences have already been demultiplexed and quality-filtered.

Proposed Behavior
By supporting this data type, q2-deblur will be able to support denoising existing QIIME 1 data, or data that's still being produced in this format (some sequencing centers do this for their clients). Currently it is only possible to dereplicate, cluster de novo, or cluster closed-reference with q2-vsearch.

References

  1. QIIME 1 "demux" format
  2. Came up on the forum here.

ValueError: Decoded Phred score is out of range [0, 62]

Environment: QIIME 2 2023.5 (Docker image)
Data: PacBio Sequel II
My commands are as follows:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ../input-path-list.tsv --output-path ccs-data-demux.qza --input-format SingleEndFastqManifestPhred33V2
qiime cutadapt trim-single --i-demultiplexed-sequences ../ccs-data-demux.qza --p-cores 4 --p-adapter RGYTACCTTGTTACGACTT --p-front AGRGTTYGATYMTGGCTCAG --o-trimmed-sequences ccs-data-cutadapt.qza --output-dir result
qiime deblur denoise-16S --i-demultiplexed-seqs ccs-data-cutadapt.qza --o-table deblur.table.qza --o-representative-sequences deblur.representative-sequences.qza --o-stats deblur.stat.qza --p-trim-length 10 --verbose

When I run these commands, I get the following error:

  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 130, in trim_seqs
    for label, seq in input_seqs:
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 99, in sequence_generator
    for record in skbio.read(input_fp, format=format, **kw):
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 351, in _fastq_to_generator
    phred_scores, seq_header = _parse_quality_scores(fh, len(seq),
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 522, in _parse_quality_scores
    _decode_qual_to_phred(chunk, variant=variant,
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/_base.py", line 34, in _decode_qual_to_phred
    raise ValueError("Decoded Phred score is out of range [%d, %d]."

The error is ValueError: Decoded Phred score is out of range [0, 62], but the FASTQ data imported without any errors.
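
One way to narrow this down is to inspect the range of quality scores in the input files. The sketch below is a diagnostic aid under stated assumptions: it expects Phred+33 encoding and strict four-line FASTQ records (no wrapped lines), and the helper name `phred33_range` is hypothetical. Whether this dataset actually contains scores above 62 is not confirmed in the report; it is only a plausible cause, since some long-read platforms emit higher scores.

```python
def phred33_range(fastq_lines):
    """Return (min, max) Phred+33 scores over all quality lines.

    Assumes four-line FASTQ records, so the quality string is every
    fourth line. Returns (None, None) if no quality characters are seen.
    """
    lo = hi = None
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            continue
        for ch in line.rstrip("\r\n"):
            score = ord(ch) - 33
            lo = score if lo is None else min(lo, score)
            hi = score if hi is None else max(hi, score)
    return lo, hi


record = ["@read1\n", "ACGT\n", "+\n", "I~5!\n"]
score_range = phred33_range(record)
```

For real data the same function can be fed an open file handle, e.g. `phred33_range(open("reads.fastq"))`, where the filename is a placeholder.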
