thomasstjerne / blast-ws

License: MIT License

JavaScript 100.00%

blast-ws's Issues

Potential further Reference databases to include (irrespective of what kind of preparation they need)

SILVA
"SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya)."
Note: Earlier, this was the one big rDNA reference database. It is to be redesigned in the coming years. GBIF S is in contact with the developers.

NAMERS
"NAMERS is a data portal of high quality DNA reference sequences generated for use with environmental DNA technologies. It’s current taxonomic focus is freshwater fish of British Columbia, Canada"
Note: This database is based on genome skimming and contains sequences for most mitochondrial marker regions of the targeted species (Canada, freshwater). The scope is meant to expand.

MIDORI2
Publication: Leray et al. 2022
"MIDORI2 is a reference database of DNA and amino acid sequences used for taxonomic assignments of Eukaryota mitochondrial DNA sequences. Currently, the databases are available for download in seven formats. Since version GB237, MIDORI 2 includes not only Metazoan but also all Eukaryota sequences. Since version GB242, MIDORI 2 provides two types of databases, 1) with and 2) without binomial species description, such as "cf.," "aff.," and "sp." Since version GB243, we also provide amino acid sequence databases."
Notes: MIDORI2 is becoming more widely used. It covers CO1, CytB, etc., based on GenBank.
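
To illustrate the "with" versus "without binomial species description" split mentioned in the MIDORI2 text, here is a minimal Python sketch. It is not MIDORI2's own filtering logic: the regular expression, the `is_binomial` helper and the example labels are all assumptions made for illustration.

```python
import re

# Qualifiers named in the MIDORI2 description ("cf.", "aff.", "sp.").
# The exact pattern is an assumption, not MIDORI2's actual rule.
OPEN_NOMENCLATURE = re.compile(r"\b(cf|aff|sp)\.", re.IGNORECASE)


def is_binomial(species_label: str) -> bool:
    """Return True if the label looks like a plain 'Genus species' binomial."""
    if OPEN_NOMENCLATURE.search(species_label):
        return False
    parts = species_label.split()
    return len(parts) == 2 and parts[0][:1].isupper() and parts[1].islower()


# Hypothetical labels, made up for this example.
labels = [
    "Gadus morhua",
    "Gadus sp. BOLD123",
    "Salmo cf. trutta",
    "Perca aff. fluviatilis",
]
kept = [label for label in labels if is_binomial(label)]
print(kept)
```

Only the first label passes the filter. A real pipeline would probably also need to handle subspecies, hybrids and other naming edge cases.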

CALeDNA databases
"These databases were made using the CRUX Pipeline, part of the Anacapa Toolkit (Curd et al., 2019 in MEE). We update these databases annually. If you are using a different primer or locus, we encourage you to make your own CRUX database. Let us know if you want additional reference libraries or if you need help making your own. "
Notes: Currently includes: 16S: min size 60, max size 400 | 18S: min size 80, max size 550 | PITS: min size 100, max size 800 | CO1: min size 100, max size 700 | FITS: min size 80, max size 700 | trnL: min size 33, max size 225 | Vertebrate 12S: min size 40, max size 150.
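
The size windows in the note above could be applied as a simple length filter. The sketch below copies the listed values into a Python dictionary. The `within_window` helper is hypothetical, and treating the limits as inclusive and measured in base pairs is an assumption.

```python
# Size windows copied from the note above, keyed by marker name as listed.
# Assumption: limits are in base pairs and inclusive.
LENGTH_WINDOWS = {
    "16S": (60, 400),
    "18S": (80, 550),
    "PITS": (100, 800),
    "CO1": (100, 700),
    "FITS": (80, 700),
    "trnL": (33, 225),
    "Vertebrate 12S": (40, 150),
}


def within_window(marker: str, sequence: str) -> bool:
    """Return True if the sequence length falls inside the marker's size window."""
    low, high = LENGTH_WINDOWS[marker]
    return low <= len(sequence) <= high


# Usage: a 33 bp trnL sequence is kept, a 32 bp one is not.
print(within_window("trnL", "A" * 33))
print(within_window("trnL", "A" * 32))
```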

Mare-MAGE
"The Mare-MAGE database contains quality-checked sequences of the mitochondrial 12S ribosomal RNA and Cytochrome c Oxidase I gene. All sequences were obtained from the National Center for Biotechnology Information- GenBank (NBCI-GenBank), the European Nucleotide Archive (ENA), AquaGene Database and BOLD database, and have undergone intensive processing. They were checked for false annotations and non-target anomalies, according to the Integrated Taxonomic Information System (ITIS) and FishBase. The dataset is compiled in ARB-Home, FASTA and Qiime2 formats, and is publicly available from the Mare-MAGE database website (http://mare-mage.weebly.com/)."

COInr – COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline
"The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."

Methods for constructing reference databases

A list of tools that could be considered if GBIF decides to produce its own reference databases.

RESCRIPt – Reproducible sequence taxonomy reference database management

rCRUX – A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R.
Notes: Apparently the best-performing algorithm at present. Based on in silico PCR followed by similarity searches. Part of the Anacapa Toolkit. Used by CALeDNA to build reference databases.
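
For readers unfamiliar with in silico PCR, the following Python sketch shows the general idea: locate the forward primer and the reverse complement of the reverse primer in a template, allowing a few mismatches, and return the region in between. It is not rCRUX's actual implementation. All function names, the primers and the template are hypothetical, and the sketch assumes uppercase A/C/G/T only, with no IUPAC ambiguity codes and no indels.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_with_mismatches(template: str, primer: str, max_mismatches: int, start: int = 0) -> int:
    """Return the first index >= start where primer matches within the mismatch limit, else -1."""
    for i in range(start, len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches <= max_mismatches:
            return i
    return -1


def in_silico_pcr(template: str, forward: str, reverse: str, max_mismatches: int = 1):
    """Return the region between the two primer sites (primers trimmed), or None."""
    fwd_at = find_with_mismatches(template, forward, max_mismatches)
    if fwd_at == -1:
        return None
    insert_start = fwd_at + len(forward)
    # The reverse primer anneals to the opposite strand, so search for its reverse complement.
    rev_site = reverse_complement(reverse)
    rev_at = find_with_mismatches(template, rev_site, max_mismatches, insert_start)
    if rev_at == -1:
        return None
    return template[insert_start:rev_at]


# Hypothetical primers and template, made up for this example.
forward, reverse = "ACGTAC", "CCGGAT"
template = "TT" + forward + "GATTACA" + reverse_complement(reverse) + "AA"
print(in_silico_pcr(template, forward, reverse))
print(in_silico_pcr("GATTACA", forward, reverse))
```

The second call returns `None` because the template contains the target region but no primer sites. That is the kind of sequence a follow-up similarity search is meant to recover.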

crabs – A software program to generate curated reference databases for metabarcoding sequencing data

METACURATOR – A hidden Markov model-based toolkit for extracting and curating sequences from taxonomically-informative genetic markers

ECOPCR
Notes: Originally part of the OBITools tool set. I am unsure about recent developments, but it had the problem of missing sequences that lack the primer regions (as do other approaches, although those follow up with similarity searches).

DB4Q2 – A detailed workflow to develop QIIME2‑formatted reference databases for taxonomic analysis of DNA metabarcoding data
Notes: A workflow for QIIME2.

MARES – a replicable pipeline and curated reference database for marine eukaryote metabarcoding

refdb – Management of DNA reference libraries for barcoding and metabarcoding studies with the R package refdb
Notes: Maybe something that could be used to curate reference databases produced with any tool?

mkcoinr – COInr and mkCOInr: Building and customizing a nonredundant barcoding reference database from BOLD and NCBI using a semi-automated pipeline
"The mkcoinr tool is a series of Perl scripts designed to download sequences from BOLD and NCBI, to build the COInr database and to customize it according to the users’ needs. It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for minimum taxonomic resolution, add new custom sequences, and format the database for blast, vtam, qiime and rdp classifier."
