
running a gubaphage query

sourmash-bio commented on May 30, 2024
running a gubaphage query

from wort.

Comments (5)

rchikhi commented on May 30, 2024

Great! Oh, for some reason I didn't get an email notification for this reply. Thanks for the detailed instructions; here are the updated sigs: Gubaphage_genomes.dna.sig.zip

bluegenes commented on May 30, 2024

Hi @rchikhi,

Happy to run some DNA queries first (we don't have protein running quite yet). However, this search requires scaled signatures, and yours seem to be Mash-compatible signatures (probably due to your reported Python 3.6 installation issue, sourmash-bio/sourmash#1561).

Could you try installing sourmash in an isolated environment, and then recalculating sigs, please?

installation: conda create -n sourmash4 -c conda-forge -c bioconda sourmash

activate conda environment: conda activate sourmash4

signature generation:

sourmash sketch dna -p k=21,k=31,k=51,scaled=1000,abund Gubaphage_genomes.fa -o Gubaphage_genomes.dna.sig
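To illustrate why the search needs scaled rather than Mash-compatible signatures, here is a toy sketch of the two sampling styles. It is not sourmash's implementation: the hash function (SHA-256 truncated to 64 bits) is a stand-in, reverse complements are ignored, and the helper names `num_sketch` and `scaled_sketch` are hypothetical.

```python
import hashlib

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, an assumption)."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def num_sketch(seq, k, num):
    """Mash-style sketch: keep only the `num` smallest distinct hashes."""
    return set(sorted({hash_kmer(km) for km in kmers(seq, k)})[:num])

def scaled_sketch(seq, k, scaled):
    """Scaled sketch: keep every hash below 2**64 / scaled."""
    max_hash = 2**64 // scaled
    return {h for h in (hash_kmer(km) for km in kmers(seq, k)) if h < max_hash}

if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTAACCGGTTACGATCGATCGGATCC" * 3
    fragment = genome[:40]
    # A scaled sketch of a fragment is always a subset of the scaled
    # sketch of the full sequence, which is what makes containment work.
    print(scaled_sketch(fragment, 5, 2) <= scaled_sketch(genome, 5, 2))
```

Because the scaled cutoff is applied to each hash independently, the sketch grows with the sequence and subsets stay subsets. A fixed-size sketch gives no such guarantee, so it cannot be used to estimate how much of a small query sits inside a large metagenome.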

bluegenes commented on May 30, 2024

Hi @rchikhi! I ran the query at k=21.

See https://github.com/bluegenes/2021-gubaphage-magsearch for code & a notebook where I did some light processing and filtering of the results. Click on the binder if you'd like to run the notebook interactively.

In that notebook, I selected metagenome results that had at least 30% containment (output file: gubaphage.sra-search.k21-c30.csv) or at least 10% containment (output file: gubaphage.sra-search.k21-c10.csv) of the gubaphage query. These CSVs can be found in the output.gubaphage-magsearch/processed_results folder, which you can see on the associated OSF repo. They can also be regenerated via the binder.

The processed results look like this:

search_genome,metagenome,containment
Gubaphage_genomes.fa,ERR2683220,0.506
Gubaphage_genomes.fa,ERR2683247,0.472
Gubaphage_genomes.fa,ERR2607412,0.439
Gubaphage_genomes.fa,ERR2592250,0.439
Gubaphage_genomes.fa,ERR2683203,0.432
Gubaphage_genomes.fa,ERR2683256,0.426
Gubaphage_genomes.fa,ERR2683152,0.419

...where containment is the fraction of your gubaphage query covered by the metagenome.
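For anyone who wants to apply a different cutoff to these CSVs, a minimal sketch using only the standard library might look like the following. This is not the notebook's code; the function name `filter_by_containment` is hypothetical, the sample rows are the seven shown above, and the 0.45 cutoff is just an illustrative value.

```python
import csv
import io

# The seven example rows shown above, inlined so the sketch is self-contained.
SAMPLE = """search_genome,metagenome,containment
Gubaphage_genomes.fa,ERR2683220,0.506
Gubaphage_genomes.fa,ERR2683247,0.472
Gubaphage_genomes.fa,ERR2607412,0.439
Gubaphage_genomes.fa,ERR2592250,0.439
Gubaphage_genomes.fa,ERR2683203,0.432
Gubaphage_genomes.fa,ERR2683256,0.426
Gubaphage_genomes.fa,ERR2683152,0.419
"""

def filter_by_containment(csv_text, threshold):
    """Return metagenome accessions whose containment is >= threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["metagenome"]
        for row in reader
        if float(row["containment"]) >= threshold
    ]

if __name__ == "__main__":
    # All seven rows pass the 30% cutoff; a stricter cutoff keeps fewer.
    print(filter_by_containment(SAMPLE, 0.30))
    print(filter_by_containment(SAMPLE, 0.45))
```

To run it on one of the real output files, read the file's text and pass it in place of `SAMPLE`.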

What other information would be helpful, or what questions can I answer about this run? I can also search at k=31 and k=51 if you'd like - I'm not sure what (if any) additional metagenomes would be recovered.

cheers,

Tessa

rchikhi commented on May 30, 2024

Thanks much Tessa! I'll have a look at the results and let you know if more info is needed on our side.

rchikhi commented on May 30, 2024

Hi Tessa, quick question: the query was a pangenome, i.e. a collection of all known genomes for that clade. I suspect there's also some redundancy, i.e. some genomes inside this pangenome are very similar. Would it be correct to say that a containment score of 0.5 means that essentially 50% of the data inside that query pangenome has a hit? (With the truth being somewhere between "50% of the entries inside that FASTA file have a full hit and 50% have no hit" and "100% of the entries match over half their length".)
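A toy illustration of the two extremes, using plain Python sets of made-up k-mer labels rather than real sourmash sketches. It assumes the whole FASTA file was sketched as one merged signature of distinct k-mers (the single `Gubaphage_genomes.fa` entry in the results suggests this, but it is an assumption), and all names here are hypothetical.

```python
def containment(query, metagenome):
    """Fraction of distinct query k-mers that also occur in the metagenome."""
    return len(query & metagenome) / len(query)

# Two toy genomes with 100 distinct, non-overlapping k-mer labels each.
genome_a = {f"a{i}" for i in range(100)}
genome_b = {f"b{i}" for i in range(100)}
pangenome = genome_a | genome_b

# Extreme 1: genome A is fully present, genome B is absent.
full_hit_on_half = set(genome_a)

# Extreme 2: half of each genome is present.
half_hit_on_all = {f"a{i}" for i in range(50)} | {f"b{i}" for i in range(50)}

# Redundancy: a duplicate copy of genome A adds no new distinct k-mers.
redundant_pangenome = genome_a | set(genome_a) | genome_b

if __name__ == "__main__":
    print(containment(pangenome, full_hit_on_half))
    print(containment(pangenome, half_hit_on_all))
    print(redundant_pangenome == pangenome)
```

Under that assumption, both extremes produce the same score, so a containment of 0.5 on its own cannot tell them apart. Identical sequence shared between genomes is counted once, so exact redundancy should not inflate the denominator.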
