BEExact: a metataxonomic database tool for high-resolution inference of bee-associated microbial communities

Update (04-April-2021)

BEExact v2021.0.2 is here! Major improvements have been made to the database following the first set of revisions. Most notably, it is no longer restricted to Apis mellifera and has been expanded for use with all bee species (Hymenoptera:Apoidea:Anthophila). To do so, we not only added a long list of host-associated 16S rRNA gene sequences annotated at their lowest common rank (LCR) based on authoritative type strains, but also developed a novel approach for assigning phylogenetically consistent placeholder names to uncultivated microbial dark matter. In total, 618 placeholder labels were generated, which should greatly enhance the ability to analyze associated microbial community structure and draw meaningful conclusions from routine 16S rRNA gene sequencing endeavours.

For more details, see the original article in mSystems: https://msystems.asm.org/content/6/2/e00082-21

Description

BEExact is a comprehensive, non-redundant reference database that has been thoroughly curated for 16S rRNA gene-based sequencing of bee-associated microbial communities.

The database will be updated frequently to incorporate annotations and reference sequences for novel bee host-associated taxa. All suggestions for improvement are welcome; see the contact information below. If there is enough interest, I will write up a wiki tutorial for microbiota analysis using exact amplicon sequence variants (ASVs) as opposed to traditional OTU-based methods. As a quick note, ASVs offer several advantages, specifically their precision in characterizing microbial communities and their consistency for cross-study comparison. See the latest DADA2 pipeline for more details.

Also recommended is an excellent article that simplifies the workflow for valid statistical analysis of compositional datasets: Microbiome Datasets Are Compositional: And This Is Not Optional

Benchmark

Across 32 independent studies encompassing 50 bee species, BEExact enabled classification of ~80-90% of ASVs at the species level, whereas the leading existing database classified no more than ~30% at the same level. We noted that microbial communities from eusocial bee species generally exhibited higher classification rates, likely because their microbiota has been more intensively characterized than that of many solitary bee species.

Available files for download:

Download links for BEExact files

  1. The complete full length 16S rRNA gene sequences + taxonomy
    BEEx-FL-refs

  2. Pre-trained/formatted region-specific training sets for specific classifiers
    IDTAXA    Full-length   |   V3V4   |   V4   |   V4V5   |   V5V6
    DADA2    Full-length   |   V3V4   |   V4   |   V4V5   |   V5V6
    QIIME2    Full-length   |   V3V4   |   V4   |   V4V5   |   V5V6
    SINTAX    Full-length   |   V3V4   |   V4   |   V4V5   |   V5V6

More pre-trained/formatted region-specific classifiers are coming shortly.

The above 16S variable region-specific files were trimmed using the following primer sets:
  V3V4   Bakt_341F CCTACGGGNGGCWGCAG --- Bakt_805R GACTACHVGGGTATCTAATCC
  V4    515F(Parada) GTGYCAGCMGCCGCGGTAA --- 806R(Apprill) GGACTACNVGGGTWTCTAAT
  V4V5   515F(Parada) GTGYCAGCMGCCGCGGTAA --- 926R(Quince) CCGYCAATTYMTTTRAGTTT
  V5V6   799F-mod3 CMGGATTAGATACCCKGG --- 1115R(Kembel) AGGGTTGCGCTCGTTG
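The primers listed above contain IUPAC degenerate bases (N, W, H, V, Y, M, K, R), so a plain text search for the primer string will not find its binding site in a reference sequence. The following is a minimal sketch of one way to check a primer against a sequence by hand. The helper names `iupac_to_regex` and `revcomp` are hypothetical and are not part of BEExact or QIIME2; the primer sequences are the Bakt_341F and Bakt_805R entries from the list.

```shell
# iupac_to_regex and revcomp are hypothetical helper names for illustration;
# they are not part of BEExact or QIIME2.

# Expand IUPAC degenerate bases into regex character classes.
# Every replacement contains only A/C/G/T and brackets, so the chained
# substitutions cannot re-expand text produced by an earlier one.
iupac_to_regex() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Reverse-complement a primer, mapping degenerate codes to their complements.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ for (i = length($0); i > 0; i--) printf "%s", substr($0, i, 1); print "" }' \
    | tr 'ACGTRYSWKMBDHVN' 'TGCAYRSWMKVHDBN'
}

# Forward primer (Bakt_341F) as a regex
iupac_to_regex CCTACGGGNGGCWGCAG
# Reverse primer (Bakt_805R) as it appears on the forward strand
iupac_to_regex "$(revcomp GACTACHVGGGTATCTAATCC)"
```

The reverse primer is reverse-complemented first because it anneals to the opposite strand. The resulting patterns can be passed to `grep -E`, with the caveat that this only works on FASTA files where each sequence sits on a single line.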

Creating your own region-specific training set

Other variable region-specific training sets can be generated using the full-length BEExact database (BEEx-FL-refs).

An example using QIIME2 tools for making a V3-V4 specific training set:

Step 1: Import sequence and taxonomy files as .qza

  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path BEEx-FL-refs_sequences.fa \
    --output-path BEEx-FL-refs_sequences.qza

  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path BEEx-FL-refs_taxonomy.txt \
    --output-path BEEx-FL-refs_taxonomy.qza
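Training a classifier requires that every reference sequence has a taxonomy entry with the same ID. As a hedged sketch, the check below compares the IDs in a FASTA file against the first column of a headerless, tab-separated taxonomy file before import. The function name `check_ref_ids` is hypothetical, and the sketch assumes that the sequence ID is the text between `>` and the first whitespace in each FASTA header.

```shell
# check_ref_ids is a hypothetical helper, not part of BEExact or QIIME2.
# Usage: check_ref_ids SEQUENCES.fa TAXONOMY.txt
# Prints IDs found in only one of the two files:
#   column 1 = only in the FASTA, column 2 (tab-indented) = only in the taxonomy.
# No output means the two ID sets are identical.
check_ref_ids() {
  ids_fa=$(mktemp)
  ids_tax=$(mktemp)
  # Assumption: the ID is the header text up to the first whitespace.
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | LC_ALL=C sort -u > "$ids_fa"
  # Assumption: headerless TSV with the ID in the first column.
  cut -f1 "$2" | LC_ALL=C sort -u > "$ids_tax"
  LC_ALL=C comm -3 "$ids_fa" "$ids_tax"
  rm -f "$ids_fa" "$ids_tax"
}

# Example call with the file names used in Step 1 (run where those files exist):
#   check_ref_ids BEEx-FL-refs_sequences.fa BEEx-FL-refs_taxonomy.txt
```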

Step 2: Trim to the specific region of interest (V3-V4 in this case)

  qiime feature-classifier extract-reads \
    --i-sequences BEEx-FL-refs_sequences.qza \
    --p-f-primer ACTCCTACGGGAGGCAGCAG \
    --p-r-primer GGACTACHVGGGTWTCTAAT \
    --p-min-length 100 \
    --p-max-length 400 \
    --o-reads BEEx-V3V4-refs_sequences.qza
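After trimming, it can be useful to confirm how many reference sequences survived and that their lengths fall within the `--p-min-length` and `--p-max-length` bounds. The sketch below summarizes sequence lengths in a FASTA file; `fasta_length_summary` is a hypothetical helper name. It assumes the trimmed artifact has first been exported with `qiime tools export`, which writes the sequences to `dna-sequences.fasta` in the chosen output directory (the directory name `exported` is an arbitrary choice).

```shell
# fasta_length_summary is a hypothetical helper, not part of BEExact or QIIME2.
# Prints the number of sequences plus min, max and mean length.
# Handles sequences wrapped across several lines.
fasta_length_summary() {
  awk '
    function tally(l) {
      n++; sum += l
      if (n == 1 || l < min) min = l
      if (l > max) max = l
    }
    /^>/ { if (seen) tally(len); seen = 1; len = 0; next }   # new record starts
         { len += length($0) }                               # accumulate bases
    END  {
      if (seen) tally(len)                                   # flush last record
      if (n) printf "n=%d min=%d max=%d mean=%.1f\n", n, min, max, sum / n
    }
  ' "$1"
}

# Example (run where QIIME2 and the Step 2 output are available):
#   qiime tools export \
#     --input-path BEEx-V3V4-refs_sequences.qza \
#     --output-path exported
#   fasta_length_summary exported/dna-sequences.fasta
```

A sequence count far below that of the full-length database would suggest that the primers failed to match many references.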

Step 3: Train the classifier

  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads BEEx-V3V4-refs_sequences.qza \
    --i-reference-taxonomy BEEx-FL-refs_taxonomy.qza \
    --o-classifier QIIME2_BxV3V4TS.qza

Step 4: Classify reads with the q2-feature-classifier

    qiime feature-classifier classify-sklearn \
      --i-classifier QIIME2_BxV3V4TS.qza \
      --i-reads ASVs_query_sequences.qza \
      --p-confidence 0.5 \
      --o-classification QIIME2_BxV3V4TS_ASVs_out.qza

Step 5: Visualize files

    qiime metadata tabulate \
      --m-input-file QIIME2_BxV3V4TS_ASVs_out.qza \
      --o-visualization QIIME2_BxV3V4TS_ASVs_out.qzv

To view the results in a browser, drag and drop "QIIME2_BxV3V4TS_ASVs_out.qzv" onto https://view.qiime2.org
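To get a quick number comparable to the species-level classification rates mentioned in the Benchmark section, the classification can also be exported as a table and summarized on the command line. This is a rough sketch under stated assumptions, not the method used in the article: `species_rate` is a hypothetical helper, the exported `taxonomy.tsv` is assumed to have one header line with the taxonomy string in the second column, and an assignment is counted as species-level when its semicolon-separated taxonomy has at least seven ranks. Adjust the rank count if the taxonomy strings are structured differently or end with a trailing semicolon.

```shell
# species_rate is a hypothetical helper, not part of BEExact or QIIME2.
# Usage: species_rate taxonomy.tsv [RANKS]
# Counts rows whose taxonomy string (column 2) has at least RANKS
# semicolon-separated fields (default 7, assumed to mean species level).
species_rate() {
  awk -F '\t' -v ranks="${2:-7}" '
    NR == 1 { next }                                   # skip the header line
    { total++; if (split($2, parts, ";") >= ranks) deep++ }
    END { if (total) printf "%d/%d (%.1f%%)\n", deep, total, 100 * deep / total }
  ' "$1"
}

# Example (run where QIIME2 and the Step 4 output are available):
#   qiime tools export \
#     --input-path QIIME2_BxV3V4TS_ASVs_out.qza \
#     --output-path exported
#   species_rate exported/taxonomy.tsv
```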

Reference details

If you find the database helpful, please cite the following:

Daisley B.A. and G. Reid (2021). BEExact: a metataxonomic database tool for high-resolution inference of bee-associated microbial communities. mSystems 6(2):e00082-21

https://doi.org/10.1128/mSystems.00082-21

Contact information

All feedback is welcome. If you have any questions, please feel free to contact me. Sharing of information is also encouraged, especially for novel bee-associated species that have recently been discovered but not yet incorporated into the BEExact database. Teamwork makes the dream work.

Email: [email protected]

Twitter: @bdaisley
