q2-kmerizer

A QIIME 2 plugin for generating and working with kmers from biological sequence information.

Note: this plugin is in pre-release and under active development. The code should not be considered stable or suitable for publication-ready analyses.

Installation instructions

Install development version of q2-kmerizer "from scratch"

If you do not already have a QIIME 2 environment installed, you can follow these instructions to install a development version of q2-kmerizer.

Miniconda provides the conda environment and package manager, and is currently the only supported way to install QIIME 2. Follow the instructions for downloading and installing Miniconda.

After installing Miniconda and opening a new terminal, make sure you're running the latest version of conda:

conda update conda

Next, clone the repository and move into the top-level q2-kmerizer directory. NOTE: make sure your current working directory is a location where you want to install this plugin!

git clone https://github.com/bokulich-lab/q2-kmerizer.git
cd q2-kmerizer

Then, run:

conda env create -n q2-kmerizer-dev --file ./environments/q2-kmerizer-qiime2-amplicon-2024.10.yml

After this completes, activate the new environment you created by running:

conda activate q2-kmerizer-dev

Finally, run:

make install

Examples

As an example, we will use data from Sampson et al., 2016, a study testing whether the fecal microbiome contributes to the development of Parkinson’s Disease (PD).

First we will download the test data:

wget https://data.qiime2.org/2024.5/tutorials/pd-mice/sample_metadata.tsv
wget https://docs.qiime2.org/2024.5/data/tutorials/pd-mice/dada2_table.qza
wget https://docs.qiime2.org/2024.5/data/tutorials/pd-mice/dada2_rep_set.qza

We can count kmer frequencies per sample with this command:

qiime kmerizer seqs-to-kmers \
    --i-sequences dada2_rep_set.qza \
    --i-table dada2_table.qza \
    --o-kmer-table kmer_table.qza \
    --p-max-features 5000
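
To make the idea of per-sample kmer counting concrete, here is a minimal Python sketch. It is not the plugin's implementation: the function names (count_kmers, kmer_table), the toy sequences, and the kmer length of 3 are assumptions chosen for illustration, and no feature selection (such as a maximum number of features) is applied. Each representative sequence is split into overlapping kmers, and each kmer count is weighted by how often that sequence occurs in a sample.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_table(sequences, table, k):
    """Build per-sample kmer counts.

    sequences: dict mapping feature ID -> sequence string
    table: dict mapping sample ID -> {feature ID: frequency}
    """
    per_feature = {fid: count_kmers(seq, k) for fid, seq in sequences.items()}
    result = {}
    for sample_id, frequencies in table.items():
        sample_counts = Counter()
        for fid, frequency in frequencies.items():
            # Weight each kmer by the frequency of its parent sequence.
            for kmer, n in per_feature[fid].items():
                sample_counts[kmer] += n * frequency
        result[sample_id] = sample_counts
    return result


# Toy data (hypothetical, for illustration only).
sequences = {"asv1": "ACGTA", "asv2": "CGTAC"}
table = {"sampleA": {"asv1": 2, "asv2": 1}, "sampleB": {"asv2": 3}}
kmers = kmer_table(sequences, table, k=3)
print(dict(kmers["sampleA"]))
```

In this toy example, sampleA contains two copies of asv1 and one of asv2, so the kmers shared by both sequences (CGT and GTA) receive a count of 3.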

Or run this pipeline to count kmer frequencies, calculate diversity metrics, and create an interactive scatterplot with the results:

qiime kmerizer core-metrics \
    --i-sequences dada2_rep_set.qza \
    --i-table dada2_table.qza \
    --p-sampling-depth 1000 \
    --m-metadata-file sample_metadata.tsv \
    --p-color-by donor \
    --p-max-features 5000 \
    --output-dir core-metrics/
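
The following Python sketch illustrates, on toy data, the kind of calculation a diversity analysis of kmer counts involves. It rests on assumptions not stated above: that the sampling depth denotes an even subsampling depth (as in other QIIME 2 diversity pipelines), and that Bray-Curtis dissimilarity is a representative metric. The function names and sample values are hypothetical, and the sketch says nothing about where in the actual pipeline subsampling happens.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample counts to exactly `depth` observations, without replacement.

    Returns None when the sample holds fewer than `depth` observations.
    """
    pool = [kmer for kmer, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    return Counter(random.Random(seed).sample(pool, depth))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count dictionaries."""
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total


# Toy kmer counts (hypothetical, for illustration only).
sample_a = {"ACG": 6, "CGT": 4}
sample_b = {"CGT": 5, "TAC": 5}
sample_c = {"ACG": 3}

depth = 8
rarefied = {name: rarefy(counts, depth)
            for name, counts in [("a", sample_a), ("b", sample_b), ("c", sample_c)]}
kept = {name: counts for name, counts in rarefied.items() if counts is not None}
print(sorted(kept))  # sample "c" is below the depth and is left out
print(bray_curtis(kept["a"], kept["b"]))
```

The practical point is that a sampling depth larger than a sample's total count excludes that sample, so the depth should be chosen with the per-sample totals in mind.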

Both of these actions output a frequency table that contains kmer counts per sample. This table can be passed to any QIIME 2 action that accepts a frequency table, except for actions that also require inputs matching the features in the table (e.g., a taxonomy).

For example, we can run a pipeline that trains a Random Forest classifier and tests it on a hold-out subset of the dataset. Note that this analysis is purely demonstrative: the sample size in this test dataset is much smaller than a robust supervised learning analysis would require, and replicates should be handled properly to avoid data leakage.

qiime sample-classifier classify-samples \
    --i-table kmer_table.qza \
    --m-metadata-file sample_metadata.tsv \
    --m-metadata-column donor \
    --output-dir sample-classifier/
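
The note on replicate handling can be illustrated with a short Python sketch. It is independent of the command above and does not describe what that command does internally: the function name, the sample IDs, and the mouse grouping are hypothetical. The idea is that replicate samples from the same source are kept together on one side of the train/test split, so that the classifier is never tested on a near-copy of a training sample.

```python
import random


def grouped_holdout_split(sample_groups, test_fraction=0.2, seed=0):
    """Split samples so that all samples from one group land on the same side.

    sample_groups: dict mapping sample ID -> group ID (e.g. the subject
    that a set of replicate samples came from).
    """
    groups = sorted(set(sample_groups.values()))
    random.Random(seed).shuffle(groups)
    # Hold out whole groups, never individual samples.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = sorted(s for s, g in sample_groups.items() if g not in test_groups)
    test = sorted(s for s, g in sample_groups.items() if g in test_groups)
    return train, test


# Toy grouping (hypothetical): two replicate samples per mouse.
sample_groups = {
    "s1": "mouse1", "s2": "mouse1",
    "s3": "mouse2", "s4": "mouse2",
    "s5": "mouse3", "s6": "mouse3",
    "s7": "mouse4", "s8": "mouse4",
    "s9": "mouse5", "s10": "mouse5",
}
train, test = grouped_holdout_split(sample_groups, test_fraction=0.2)
print("train:", train)
print("test:", test)
```

With five groups and a test fraction of 0.2, one whole mouse (both of its replicates) is held out, and no mouse contributes samples to both sets.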

About

The q2-kmerizer Python package was created from a template. To learn more about q2-kmerizer, refer to the project website. To learn how to use QIIME 2, refer to the QIIME 2 User Documentation. To learn QIIME 2 plugin development, refer to Developing with QIIME 2.

q2-kmerizer is a QIIME 2 plugin. For questions, comments, or feature requests about this plugin, please post in the Community Plugins category on the QIIME 2 Forum. The issue tracker on the GitHub repository is intended for use by the plugin developers and maintainers, not as a help forum.
