Run with third-party taxonomy db (mapseq issue, open)

jfmrod avatar jfmrod commented on September 2, 2024
Run with third-party taxonomy db

from mapseq.

Comments (6)

jfmrod commented on September 2, 2024

Hi! To use a custom database, you need a FASTA file with the sequences (SILVA already provides one) and a taxonomy file with two tab-separated columns: one with the IDs of the FASTA sequences and one with the taxonomic label for each sequence. The taxonomic annotations should be normalized, so that every sequence has the same number of ranks.

That will get you a result. The problem is that SILVA sequences still contain a lot of misannotations that will throw off mapseq, so to get optimal results one would need to clean the SILVA sequences and annotations a bit.

Some collaborators have recently made such a set for SILVA, which we were planning to include in the next release. If you are interested, I can ask them for the dataset and try to push it out faster.
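
For anyone wanting to try the two-column format described above, here is a minimal sketch in Python. It rests on several assumptions that are not confirmed in this thread: that ranks are separated by ";" (as in the SILVA headers), that 7 ranks is the target depth, and that padding short lineages with a placeholder label is acceptable to mapseq. The names normalize_lineage, write_taxonomy, N_RANKS and PLACEHOLDER are made up for illustration. Note that padding or truncating by position does not guarantee that the same column holds the same rank for every sequence.

```python
N_RANKS = 7                   # assumption: target number of ranks
PLACEHOLDER = "Unclassified"  # assumption: hypothetical padding label


def normalize_lineage(lineage, n_ranks=N_RANKS, placeholder=PLACEHOLDER):
    """Pad or truncate a ';'-separated lineage to exactly n_ranks labels."""
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    ranks = ranks[:n_ranks]                          # drop extra levels
    ranks += [placeholder] * (n_ranks - len(ranks))  # pad missing levels
    return ";".join(ranks)


def write_taxonomy(records, handle):
    """Write (sequence_id, lineage) pairs as two tab-separated columns."""
    for seq_id, lineage in records:
        handle.write(f"{seq_id}\t{normalize_lineage(lineage)}\n")
```

Calling write_taxonomy with an open text file and an iterable of (ID, lineage) pairs would produce one line per sequence, with the ID in the first column and the normalized lineage in the second.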

colinbrislawn commented on September 2, 2024

I'm interested in running MAPseq with Silva 138 pre-clustered at 99% identity (SILVA_138_SSURef_NR99_tax_silva.fasta.gz from here).

Here's what the Silva files look like:

gzip -dc SILVA_138_SSURef_NR99_tax_silva.fasta.gz | head -n 2
>AY846380.1.2583 Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Chlorophyceae;Monoraphidium minutum
AACCUG...

gzip -dc tax_ncbi_ssu_ref_nr99_138.txt.gz | head -n 4
root;   1       no rank
root;Viruses;   10239   superkingdom
root;Viruses;Caudovirales;      28883   order
root;Viruses;Caudovirales;Ackermannviridae;     2169529 family

gzip -dc taxmap_ncbi_ssu_ref_nr99_138.txt.gz | head -n 3
primaryAccession        start   stop    Unclassified;   submitted_name
BD359736        3       2150    root;cellular organisms;Eukaryota;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium <genus>;Plasmodium (Plasmodium);Plasmodium malariae;                                                                                                                                        Plasmodium malariae
AB000278        1       1410    root;cellular organisms;Bacteria <prokaryotes>;Proteobacteria;Gammaproteobacteria;Vibrionales;Vibrionaceae;Photobacterium;Photobacterium iliopiscarium;                                                                                                                                               Photobacterium iliopiscarium

There's a QIIME 2 plugin for normalizing taxonomy levels, which could be helpful here.
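
As a rough sketch of a first step, the FASTA headers shown above already carry both the sequence ID and a ";"-separated lineage, so the two columns can be pulled out with a few lines of Python. This assumes every header has the same "ID, space, lineage" layout as the example line; parse_header and iter_headers are hypothetical helper names, not part of mapseq or QIIME 2.

```python
import gzip


def parse_header(line):
    """Split a SILVA FASTA header line into (sequence_id, lineage)."""
    text = line.rstrip("\n").lstrip(">")
    # The ID ends at the first space; the rest of the line is the lineage.
    seq_id, _, lineage = text.partition(" ")
    return seq_id, lineage


def iter_headers(path):
    """Yield (sequence_id, lineage) for every record in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_header(line)
```

The example header above yields a lineage with only six levels, which illustrates why the rank counts would still need to be normalized afterwards.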

evilvenom commented on September 2, 2024

@colinbrislawn @alexaibio
Hello. Did either of you figure out how to use custom databases? If so, it would be really helpful to hear how. Thanks in advance.

colinbrislawn commented on September 2, 2024

I have not figured out how to use custom databases, but I also have not worked on this since posting. I would be interested in updates, though.

jfmrod commented on September 2, 2024

You can find examples of the taxonomy files (NCBI and our OTUs) included with mapseq: the NCBI taxonomy is mapref-2.2b.fna.ncbitax and the OTU "taxonomy" is mapref-2.2b.fna.otutax. You will want to copy the parameters from this line of the NCBI taxonomy file:
#cutoff: 0.00:0.08 0.70:0.35 0.70:0.35 0.70:0.35 0.80:0.25 0.92:0.08 0.95:0.05

These are needed to exclude hits based on identity cutoffs, and should also work for the SILVA set if you use 7 taxonomic levels.
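
The cutoff line above contains seven colon-separated pairs, which matches the 7 taxonomic levels mentioned. Assuming (this is an inference, not something stated here) that there is one pair per level, a small Python check like the following could catch a mismatch between the cutoff line and the depth of a custom taxonomy before running mapseq. The meaning of the two numbers in each pair is not spelled out in this thread, so the sketch only counts them; parse_cutoffs and depth_matches are hypothetical names.

```python
CUTOFF_LINE = "#cutoff: 0.00:0.08 0.70:0.35 0.70:0.35 0.70:0.35 0.80:0.25 0.92:0.08 0.95:0.05"


def parse_cutoffs(line):
    """Parse a '#cutoff:' line into a list of (float, float) pairs."""
    prefix = "#cutoff:"
    if not line.startswith(prefix):
        raise ValueError("not a cutoff line")
    pairs = []
    for token in line[len(prefix):].split():
        first, second = token.split(":")
        pairs.append((float(first), float(second)))
    return pairs


def depth_matches(cutoff_line, lineage, sep=";"):
    """Return True if the lineage has as many ranks as there are cutoff pairs.

    Assumption: one cutoff pair per taxonomic level.
    """
    n_levels = len([r for r in lineage.split(sep) if r.strip()])
    return len(parse_cutoffs(cutoff_line)) == n_levels
```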

evilvenom commented on September 2, 2024

@jfmrod Thanks a lot for your response.
I should be able to use the Greengenes database as well in that case, right? It also has a FASTA file, with the taxonomy defined in a separate taxonomy file.

Also, following up on something I saw in some previous issue threads: if we use this, how do I use the output with Krona? Was the Krona output flag added? I don't see it in the help message. We do have the -otucounts and -otutables options, but when I import the generated -otutable in Krona, it says "|Unclassified| has no OTU code".

I would be really grateful if you could help me with this issue. The question is whether the problem lies with mapseq or with Krona.

Thanks again!
PS: MapSeq version: v2.0.1alpha
