
Official repository for the QIIME 2 framework.

Home Page: https://qiime2.org

License: BSD 3-Clause "New" or "Revised" License


qiime2's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

qiime2's People

Contributors

andrewsanchez, antgonza, asoback, benkaehler, chriskeefe, colinvwood, david-rod, ebolyen, emollier, gregcaporaso, hunter-cameron, jairideout, jakereps, keegan-evans, kestrelgorlick, lizgehret, maxvonhippel, misialq, mortonjt, nbokulich, oddant1, patthehat033, q2d2, sann5, thermokarst, turanoo, wasade


qiime2's Issues

artifact file extension?

What should the recommended file extension be for serialized artifacts?

@gregcaporaso @ebolyen and I were thinking of .qtf for QIIME Tar Format, since the format is a tar file. .qtf is only used by a couple of unrelated software packages and doesn't seem to be a popular extension.

The only thing I'm not sold on is having part of the underlying file format in the name (Tar) in case that changes in the future. It would also be nice to have "artifact" somewhere in the name since it is a format for serializing QIIME artifacts.

remove interface module

All interfaces, including the q2d3 prototype and cli, will live in separate repositories. These will then serve as examples for other interface developers and keep the distinction between functionality and interface clear.

ArtifactDataReader should track open files

ArtifactDataReader should track files that are opened for reading with get_file and close all tracked filehandles when Artifact is done using the data reader. This is a safety net for plugin developers in case they forget to close the filehandles they request. It also makes plugin code simpler. This is a similar strategy to what ArtifactDataWriter does.
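A minimal sketch of the tracking idea, for illustration only. TrackingDataReader and close_all are hypothetical names, not the real ArtifactDataReader API; only get_file comes from the issue text.

```python
import os
import tempfile


class TrackingDataReader:
    """Hypothetical stand-in for ArtifactDataReader; not the real class."""

    def __init__(self, data_dir):
        self._data_dir = data_dir
        self._tracked = []  # every filehandle handed out by get_file

    def get_file(self, relative_path, mode="r"):
        # Open the file and remember the handle so it can be closed later.
        fh = open(os.path.join(self._data_dir, relative_path), mode)
        self._tracked.append(fh)
        return fh

    def close_all(self):
        # Closing an already-closed file is a no-op, so this is safe even
        # if the plugin closed some of its handles itself.
        for fh in self._tracked:
            fh.close()
        self._tracked.clear()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "seqs.txt"), "w") as out:
        out.write("ACGT\n")

    reader = TrackingDataReader(tmp)
    handle = reader.get_file("seqs.txt")
    first_line = handle.readline()
    reader.close_all()  # the "plugin" never called handle.close()
```

In this sketch the caller that owns the reader (Artifact, in the issue's terms) would call close_all once the plugin code returns.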

should compressed artifacts be supported?

Should there be an API for creating a compressed tar file with Artifact.save?

See tarfile.open for supported compression schemes in Python. Artifact.load currently accepts compressed or uncompressed tar files, so supporting this should be trivial.
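As a rough illustration of why this looks cheap to support, the sketch below writes the same content as a plain and a gzip-compressed tar and reads both back with the same call. save_archive and load_archive are hypothetical helpers, not Artifact.save or Artifact.load; only the standard-library tarfile calls are real.

```python
import io
import os
import tarfile
import tempfile


def save_archive(path, members, compression=None):
    """Write a {name: bytes} mapping to a tar file.

    compression is None (plain tar) or one of "gz", "bz2", "xz".
    """
    mode = "w" if compression is None else "w:" + compression
    with tarfile.open(path, mode) as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def load_archive(path):
    # The default read mode detects the compression scheme transparently.
    with tarfile.open(path) as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


with tempfile.TemporaryDirectory() as tmp:
    payload = {"data/table.txt": b"sample\tcount\n" * 100}
    plain = os.path.join(tmp, "plain.tar")
    gzipped = os.path.join(tmp, "compressed.tar.gz")
    save_archive(plain, payload)
    save_archive(gzipped, payload, compression="gz")
    roundtrip_plain = load_archive(plain)
    roundtrip_gz = load_archive(gzipped)
```

Making compression an optional argument that defaults to None keeps uncompressed tar as the default while leaving the choice to users and interface developers.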

@ebolyen pointed out that a tar file is likely to be understood by software into the foreseeable future because it is a simple format, but compression is more of an unknown. IMO future-proof data archiving is outside the responsibility of QIIME 2, so we may want to give users/devs control over whether to compress artifacts. I could see different systems built around QIIME 2 having different needs w.r.t. compression.

Artifact.data should be lazy

Currently when instantiating an Artifact from a tar file, the artifact's data is loaded into memory and stored at the data property. The data property should be lazy such that the data is only loaded when .data is first accessed.

Question: should this always load a new instance of data, or cache the result? This depends on how Artifact is expected to be interacted with, which is unclear to me right now.
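One possible shape for a lazy property, assuming the caching answer to the question above. LazyArtifact and its loader argument are hypothetical and only illustrate the pattern; they are not the real Artifact class.

```python
class LazyArtifact:
    """Hypothetical stand-in for Artifact; not the real class."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that reads the data
        self._loaded = False
        self._data = None

    @property
    def data(self):
        # Load on first access, then reuse the cached object.
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data


load_calls = []


def expensive_load():
    load_calls.append(1)
    return [1, 2, 3]


artifact = LazyArtifact(expensive_load)
calls_before_access = len(load_calls)  # nothing has been loaded yet
first = artifact.data
second = artifact.data  # served from the cache, no second load
```

With caching, every access returns the same object, so a caller that mutates it affects later callers. Returning a fresh instance on each access avoids that at the cost of repeated loads.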

ArtifactDataWriter should only save tracked files

ArtifactDataWriter._save_ currently adds all files in its temporary directory to the tar file. It should only add files that were created with create_file in case other files are added to the temp dir through some other mechanism.
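A small sketch of saving only tracked files. TrackingDataWriter and its save method are hypothetical and stand in for ArtifactDataWriter; only create_file is named in the issue.

```python
import os
import tarfile
import tempfile


class TrackingDataWriter:
    """Hypothetical stand-in for ArtifactDataWriter; not the real class."""

    def __init__(self, temp_dir):
        self._temp_dir = temp_dir
        self._tracked = []  # relative paths created through create_file

    def create_file(self, relative_path):
        self._tracked.append(relative_path)
        return open(os.path.join(self._temp_dir, relative_path), "w")

    def save(self, tar_path):
        # Add only the tracked files, not everything found in the temp dir.
        with tarfile.open(tar_path, "w") as tar:
            for relative_path in self._tracked:
                tar.add(os.path.join(self._temp_dir, relative_path),
                        arcname=relative_path)


with tempfile.TemporaryDirectory() as work, \
        tempfile.TemporaryDirectory() as out:
    writer = TrackingDataWriter(work)
    with writer.create_file("table.txt") as fh:
        fh.write("feature\tcount\n")
    # A stray file that did not go through create_file.
    with open(os.path.join(work, "stray.log"), "w") as fh:
        fh.write("not part of the artifact\n")

    archive = os.path.join(out, "artifact.tar")
    writer.save(archive)
    with tarfile.open(archive) as tar:
        saved_names = tar.getnames()
```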

create a mechanism for importing data into artifacts

I have an old gist that illustrates how to do this - that code may or may not still work, but the idea is the same.

As of now, on import provenance should be None, but we likely want to provide some provenance information for these, such as when the file was uploaded, the source file path, etc.

stub execution context

This will first be necessary for the cli, and then for other interfaces (it wasn't necessary for q2d3 because users initiate execution themselves).

>>> e = LocalSubprocessExecution()
>>> e(workflow_instance, params to workflow_instance.to_script) # this would call subprocess

saving a file launches q2d3 server

This is really weird: if I have a q2d3 server running and save a file in my text editor (vim), a new index page is launched in my browser. I'm not editing or saving files in the server's working directory. Here are the Jupyter notebook log entries that get added when I save a file:

[I 14:52:32.222 NotebookApp] The port 4445 is already in use, trying another random port.
[C 14:52:32.222 NotebookApp] ERROR: the notebook server could not be started because no available port could be found.

@ebolyen any ideas?

investigate snakemake as pipeline backend

Improvement Description
It has a simple, declarative file format, supports cluster execution, and connects to a variety of resources out of the box (including Amazon S3, Google Storage, Dropbox, FTP, SFTP). The snakemake devs are working on adding cloud support. It's written in Python 3 and was initially developed for bioinformatics applications, so there are many real-world examples and publications using it. The snakemake devs are bioconda devs, so it's available via bioconda. This looks like a very, very sane bioinformatics library that could be used outright, or extended, to provide true DAG workflow/pipeline specification and execution in QIIME 2. This is definitely worth investigating further!

Comments
@bhillmann suggested snakemake as a workflow/pipeline management engine. @ebolyen and I briefly looked into it and are amazed. A few of the highlights (thanks @ebolyen!):

Should `.async` type-check before spawning?

Comments
Type-checking before spawning would make life a little easier for users of the "Artifact" API in a Jupyter notebook, etc., as they wouldn't need to call future.result() over and over to see if it "worked". It wouldn't prevent all types of errors, though, as the wrapped function may still raise. Additionally, this will make it harder for interface developers, who would need to catch errors and represent them from two different places.
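To make the trade-off concrete, here is a sketch using the standard library's concurrent.futures. check_types, submit_checked and rarefy are hypothetical names, and a thread pool is used only to keep the example self-contained; none of this is the framework's actual `.async` implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def check_types(expected, kwargs):
    # Raise immediately, in the caller's thread, on a type mismatch.
    for name, value in kwargs.items():
        if not isinstance(value, expected[name]):
            raise TypeError("%s must be %s, got %s" % (
                name, expected[name].__name__, type(value).__name__))


def submit_checked(executor, func, expected, **kwargs):
    check_types(expected, kwargs)           # fails before any work is spawned
    return executor.submit(func, **kwargs)  # may still fail inside the future


def rarefy(depth):
    if depth <= 0:
        raise ValueError("depth must be positive")
    return depth * 2


with ThreadPoolExecutor(max_workers=1) as pool:
    try:
        submit_checked(pool, rarefy, {"depth": int}, depth="100")
        early_error = None
    except TypeError as exc:
        early_error = str(exc)  # surfaced at call time, no future involved

    ok_result = submit_checked(pool, rarefy, {"depth": int}, depth=100).result()
    # A runtime failure still only shows up on the future itself.
    late_error = submit_checked(pool, rarefy, {"depth": int}, depth=0).exception()
```

The two error paths are visible here: the TypeError is raised where the call is made, while the ValueError has to be retrieved from the future. That split is the burden on interface developers mentioned above.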

Visualizers fail on OSX through async calls, due to matplotlib main thread limitations

Testing out the summarize visualizer from the feature-table plugin through qiime-studio revealed an issue where matplotlib does not work outside of the main thread on OSX. The Visualizer fails with the following traceback:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
Traceback (most recent call last):
  File "/Users/Develop/Developer/work/qiime-studio/qiime_studio/api/jobs.py", line 111, in callback
    results = future.result()
  File "/Users/Develop/anaconda/envs/qiime_studio/lib/python3.5/concurrent/futures/_base.py", line 398, in result
    return self.__get_result()
  File "/Users/Develop/anaconda/envs/qiime_studio/lib/python3.5/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Found it was due to matplotlib through this SO exchange: http://stackoverflow.com/a/16303620
Sounds like this might need another workaround.

make types hashable

This will be useful when we need to do type mapping, for example in the cli when we map QIIME types to click types.
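A minimal sketch of what hashable types enable. SemanticType is a hypothetical class, not one of QIIME 2's real type objects, and plain strings stand in for click parameter types so the example has no third-party dependency.

```python
class SemanticType:
    """Hypothetical minimal type object; not QIIME 2's real type classes."""

    def __init__(self, name, fields=()):
        self.name = name
        self.fields = tuple(fields)  # a tuple so the state itself is hashable

    def __eq__(self, other):
        if not isinstance(other, SemanticType):
            return NotImplemented
        return (self.name, self.fields) == (other.name, other.fields)

    def __hash__(self):
        # Must agree with __eq__: equal objects hash equally.
        return hash((self.name, self.fields))


# Strings stand in for click parameter types (an assumption for illustration).
CLI_TYPES = {
    SemanticType("Int"): "click.INT",
    SemanticType("Str"): "click.STRING",
}

looked_up = CLI_TYPES[SemanticType("Int")]  # a fresh but equal instance
```

Defining __eq__ without __hash__ makes instances unhashable in Python 3, so the two methods need to be written together.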

create a Job class

Job objects should be returned from SystemContext.__call__ and should contain (at least) the complete job markdown and the workflow uuid that gets stored as part of that job's provenance. That workflow uuid can then be used, for example, in naming the markdown files that q2d3 creates. This would link the Artifact objects created when the job is run to the actual code that was run, if the interface wants to track that (note that the created artifact's provenance already has a reference to the workflow template that generated it).

create a mechanism for inspecting Artifacts

Likely as a method of Artifact, or some way to view properties of it from interfaces. We need to think about what this would be exactly, but we need some way to view provenance, view summaries of the underlying data, etc.

Property predicates should be "symbols" so that we can attach docs

Improvement Description
Currently, our design has them as strings. This is easy, but because semantic properties require community consensus it would be nice if they were "registered" in some way with some documentation. That way developers and users can better interpret what a given type with a semantic property actually means.
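One way "registered" predicates could look, sketched under the assumption of a simple in-process registry. Predicate, its doc attribute and lookup are hypothetical names chosen for illustration, as is the "rarefied" example.

```python
class Predicate:
    """Hypothetical registered property predicate that carries its docs."""

    _registry = {}

    def __init__(self, name, doc):
        if name in Predicate._registry:
            raise ValueError("predicate %r is already registered" % name)
        self.name = name
        self.doc = doc
        Predicate._registry[name] = self

    @classmethod
    def lookup(cls, name):
        return cls._registry[name]


rarefied = Predicate(
    "rarefied",
    "Every sample in the table has been subsampled to the same depth.")

try:
    Predicate("rarefied", "A conflicting definition.")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

found_doc = Predicate.lookup("rarefied").doc
```

Compared with bare strings, a symbol like this gives one place to hang documentation and rejects accidental redefinition.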

centralize test install instructions

Installation instructions are now duplicated across different repositories, including q2cli and q2d3. We should put these in one place - the QIIME 2 GitHub wiki is a likely spot for this.

Job object is not accessible through the use of SubprocessExecutor

.sdk.execution.SubprocessExecutor creates a .sdk.job.Job and writes the code to a file to be run, but doesn't allow access to the Job object itself. This was noticed while trying to use Job.uuid as the identifier for tracking the processes currently being run by qiime-studio, and realizing that only the Future object was being returned.

Found while working on PR#32 in qiime-studio

support user-configured temporary directory

Similar to QIIME 1's temp_dir config option, QIIME 2 should support a config file/directory where users can specify a temporary directory. ArtifactDataWriter should respect this temp dir in its call to tempfile.TemporaryDirectory.
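tempfile.TemporaryDirectory already accepts a dir argument, so respecting a configured location could be as small as the sketch below. make_temp_dir and configured_dir are hypothetical names, and how the value would be read from a config file is left out.

```python
import os
import tempfile


def make_temp_dir(configured_dir=None):
    # dir=None falls back to tempfile's platform default location.
    return tempfile.TemporaryDirectory(dir=configured_dir)


with tempfile.TemporaryDirectory() as user_choice:
    with make_temp_dir(user_choice) as scratch:
        # The scratch directory lives inside the user's chosen location.
        same_parent = os.path.samefile(os.path.dirname(scratch), user_choice)
        existed = os.path.isdir(scratch)
    cleaned_up = not os.path.exists(scratch)
```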

better handling of unloadable artifacts

It is primarily up to interfaces to determine how they handle discovering artifacts that they cannot load (either because the plugin that defines their type is not installed, or the .qtf file is corrupt), but QIIME 2 should likely raise a custom error type to make this easier for interfaces to detect. The corresponding error should differentiate corrupt artifacts from unimportable artifacts. If Artifacts encode information about the plugin that defines them in their metadata, an unimportable Artifact could tell the user what plugin needs to be imported for it to be used.

Also, providing a non-qtf file should give a nice error message. The current message is something like: tarfile.ReadError: '/Users/jairideout/dev/qiime2/q2-ninja-ops/seqs.fna' is not a readable tar archive file.
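A sketch of how custom error types might separate the two cases. All class and function names here are hypothetical, and the plugin name is passed in directly because how it would be read from artifact metadata is not settled above.

```python
import os
import tarfile
import tempfile


class ArtifactLoadError(Exception):
    """Hypothetical base class for artifact loading failures."""


class NotAnArtifactError(ArtifactLoadError):
    """The file is not a readable tar archive (wrong format or corrupt)."""


class MissingPluginError(ArtifactLoadError):
    """The archive is readable but its defining plugin is not installed."""


def check_loadable(path, plugin_name, installed_plugins):
    if not tarfile.is_tarfile(path):
        raise NotAnArtifactError(
            "%r is not a readable artifact (expected a tar archive)." % path)
    if plugin_name not in installed_plugins:
        raise MissingPluginError(
            "%r requires the %r plugin, which is not installed."
            % (path, plugin_name))


with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "seqs.fna")
    with open(fasta, "w") as fh:
        fh.write(">seq1\nACGT\n")

    archive = os.path.join(tmp, "table.qtf")
    with tarfile.open(archive, "w") as tar:
        tar.add(fasta, arcname="seqs.fna")

    errors = []
    for path in (fasta, archive):
        try:
            check_loadable(path, "q2-diversity", installed_plugins=set())
        except ArtifactLoadError as exc:
            errors.append(type(exc).__name__)
```

Because both errors share a base class, an interface can catch ArtifactLoadError broadly or handle each case separately.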

unit test expansion

As part of #42, basic unit tests were added for pieces of the core framework. Expand on this to cover boundary cases and errors.

recommend a naming convention for plugins

@ebolyen and I think it'd make sense to recommend some sort of naming convention analogous to scikit-. We could do q2- for package names, and q2 for module names, so the diversity plugin would be called q2-diversity, and the module would be imported as q2diversity (like scikit-bio and import skbio).

arbitrary order of inputs/parameters

Noticed this while playing around with q2cli today. When calling --help on a method or visualizer, the order of inputs and parameters changes across runs, which is going to be confusing to users. This doesn't happen with methods defined in Markdown files because order is preserved in the Markdown file, and Method.from_markdown uses OrderedDict internally to preserve that order.

We have a couple of options:

  1. Display all options in alphabetical order in q2cli and qiime-studio, regardless of underlying function signature. Technically no change is necessary to the framework if we go this route.
  2. Display options in the order they are defined in the function signature. Note that this won't always group input artifacts followed by parameters, as plugin developers can define their function signature in whatever order they wish. The plugin developer understands the ordering of parameters best, so perhaps it's best if we respect their defined ordering in (all?) interfaces. If we go this route, we'll need to update the framework to preserve order, and q2cli probably won't need updating because it's just looping over the signature (unsure about qiime-studio).

@gregcaporaso @ebolyen @jakereps what do you think? I'm leaning towards option 2.
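For reference, both orderings can be derived from the function itself, since inspect.signature reports parameters in definition order. summarize and option_names below are hypothetical and only illustrate the two options.

```python
import inspect


def summarize(table, sample_metadata, depth=1000, min_frequency=0):
    """Hypothetical plugin function; the parameter names are made up."""


def option_names(func, order="signature"):
    # inspect.signature preserves the order in which parameters are defined.
    names = list(inspect.signature(func).parameters)
    return sorted(names) if order == "alphabetical" else names


as_defined = option_names(summarize)                  # option 2
alphabetical = option_names(summarize, "alphabetical")  # option 1
```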
