blast-ws's People

Contributors

thomasstjerne

blast-ws's Issues

Potential further reference databases to include (irrespective of the kind of preparation they need)

SILVA
"SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya)."
Note: This was formerly the one big rDNA reference database. It is to be redesigned in the coming years. GBIF S is in contact with the developers.

NAMERS
"NAMERS is a data portal of high quality DNA reference sequences generated for use with environmental DNA technologies. It’s current taxonomic focus is freshwater fish of British Columbia, Canada"
Note: This database is based on genome skimming and contains sequences for most mitochondrial marker regions of the targeted species (Canada, freshwater). The scope is meant to expand.

MIDORI2
Publication: Leray et al. 2022
"MIDORI2 is a reference database of DNA and amino acid sequences used for taxonomic assignments of Eukaryota mitochondrial DNA sequences. Currently, the databases are available for download in seven formats. Since version GB237, MIDORI 2 includes not only Metazoan but also all Eukaryota sequences. Since version GB242, MIDORI 2 provides two types of databases, 1) with and 2) without binomial species description, such as "cf.," "aff.," and "sp." Since version GB243, we also provide amino acid sequence databases."
Notes: MIDORI2 is becoming more widely used. It covers CO1, CytB, etc., and is based on GenBank.
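The two database types quoted above (with and without "cf.", "aff." and "sp." labels) can be illustrated with a small filter. This is only a sketch of the idea, not MIDORI2's own procedure: the function name `has_binomial_name` and the regular expression are assumptions made for illustration.

```python
import re

# Qualifiers named in the MIDORI2 description. The pattern is an assumption:
# it matches "cf.", "aff." or "sp." as a separate token in a species label.
QUALIFIER = re.compile(r"\b(cf|aff|sp)\.(\s|$)")


def has_binomial_name(species_label: str) -> bool:
    """Return True if the label looks like a plain binomial (genus + epithet)
    without any of the qualifiers above."""
    if QUALIFIER.search(species_label):
        return False
    parts = species_label.split()
    # Hypothetical minimal check: capitalised genus, lower-case epithet.
    return len(parts) >= 2 and parts[0][0].isupper() and parts[1].islower()


labels = ["Gadus morhua", "Gadus cf. morhua", "Gadus sp.", "Gadus aff. morhua"]
binomial_only = [label for label in labels if has_binomial_name(label)]
```

Here `binomial_only` keeps only the label without a qualifier; real species labels in GenBank-derived data are more varied, so a production filter would need more cases.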

CALeDNA databases
"These databases were made using the CRUX Pipeline, part of the Anacapa Toolkit (Curd et al., 2019 in MEE). We update these databases annually. If you are using a different primer or locus, we encourage you to make your own CRUX database. Let us know if you want additional reference libraries or if you need help making your own. "
Notes: Currently includes: 16S: min size 60, max size 400 | 18S: min size 80, max size 550 | PITS: min size 100, max size 800 | CO1: min size 100, max size 700 | FITS: min size 80, max size 700 | trnL: min size 33, max size 225 | Vertebrate 12S: min size 40, max size 150.
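If the size limits above were to be applied when preparing these databases for BLAST, a per-locus length filter could look roughly like this. It is a sketch under two assumptions: that the sizes are sequence lengths in bases and that both bounds are inclusive. The names `LENGTH_LIMITS`, `within_limits` and `filter_by_length` are hypothetical.

```python
# Per-locus (min size, max size) as listed in the notes above.
# Assumption: the values are sequence lengths in bases, bounds inclusive.
LENGTH_LIMITS = {
    "16S": (60, 400),
    "18S": (80, 550),
    "PITS": (100, 800),
    "CO1": (100, 700),
    "FITS": (80, 700),
    "trnL": (33, 225),
    "Vertebrate 12S": (40, 150),
}


def within_limits(locus: str, sequence: str) -> bool:
    """Return True if the sequence length falls inside the limits for the locus."""
    min_size, max_size = LENGTH_LIMITS[locus]
    return min_size <= len(sequence) <= max_size


def filter_by_length(locus: str, records: dict) -> dict:
    """Keep only the records (id -> sequence) whose length is within the limits."""
    return {seq_id: seq for seq_id, seq in records.items()
            if within_limits(locus, seq)}


# Toy records with hypothetical identifiers.
records = {"too_short": "A" * 20, "ok": "A" * 100, "too_long": "A" * 300}
kept = filter_by_length("trnL", records)
```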

Mare-MAGE
"The Mare-MAGE database contains quality-checked sequences of the mitochondrial 12S ribosomal RNA and Cytochrome c Oxidase I gene. All sequences were obtained from the National Center for Biotechnology Information- GenBank (NCBI-GenBank), the European Nucleotide Archive (ENA), AquaGene Database and BOLD database, and have undergone intensive processing. They were checked for false annotations and non-target anomalies, according to the Integrated Taxonomic Information System (ITIS) and FishBase. The dataset is compiled in ARB-Home, FASTA and Qiime2 formats, and is publicly available from the Mare-MAGE database website (http://mare-mage.weebly.com/)."

COInr
COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline
"The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."

Methods for constructing reference databases

A list of tools that could be considered if GBIF decides to produce its own reference databases.

RESCRIPt
Reproducible sequence taxonomy reference database management

rCRUX
A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R.
Notes: Apparently the best-performing algorithm at present. Based on in silico PCR followed by similarity searches. Part of the Anacapa Toolkit. Used by CALeDNA to build reference databases.
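To make the "in silico PCR" step concrete, here is a deliberately simplified sketch. It is not rCRUX's implementation: it uses exact string matching only (no mismatches, no degenerate IUPAC bases), and the function names and toy sequences are made up for illustration. The follow-up similarity search is not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the region between the two primer sites (primers excluded),
    or None if either primer site is not found by exact matching."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    inner_start = start + len(forward_primer)
    # The reverse primer anneals to the opposite strand, so on the template
    # strand we look for its reverse complement downstream of the forward site.
    end = template.find(reverse_complement(reverse_primer), inner_start)
    if end == -1:
        return None
    return template[inner_start:end]


# Toy example with made-up primers and sequences.
forward, reverse = "ACGTAC", "TTGACA"
full_record = "TT" + forward + "GGGGCCCC" + reverse_complement(reverse) + "AA"
partial_record = "GGGGCCCC" + reverse_complement(reverse) + "AA"  # no forward site

amplicon = in_silico_pcr(full_record, forward, reverse)
missed = in_silico_pcr(partial_record, forward, reverse)
```

The second record contains the same marker region but lacks the forward primer site, so primer matching alone returns nothing for it. Sequences like this are what a subsequent similarity search against the recovered amplicons is meant to pick up.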

crabs
A software program to generate curated reference databases for metabarcoding sequencing data

METACURATOR
A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers

ECOPCR
Notes: Originally part of the OBITools tool set. I am unsure about recent developments, but it had the problem of not catching sequences that lack the primer region (other approaches share this problem, but they follow up with similarity searches).

DB4Q2
A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data
Notes: A workflow for QIIME2.

MARES
A replicable pipeline and curated reference database for marine eukaryote metabarcoding

refdb
Management of DNA reference libraries for barcoding and metabarcoding studies with the R package refdb
Notes: Possibly something that could be used to curate reference databases produced with any tool?

mkcoinr
COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline
"The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."
