Comments (5)
Great! Oh, for some reason I didn't get an email notification for this reply. Thanks for the detailed instructions; here are the updated sigs: Gubaphage_genomes.dna.sig.zip
from wort.
Hi @rchikhi,
Happy to run some DNA queries first (we don't have protein running quite yet). However, this search requires scaled signatures, and yours seem to be Mash-compatible signatures (probably due to your reported Python 3.6 installation issue, sourmash-bio/sourmash#1561).
Could you try installing sourmash in an isolated environment, and then recalculating sigs, please?
installation: conda create -n sourmash4 -c conda-forge -c bioconda sourmash
activate conda environment: conda activate sourmash4
signature generation:
sourmash sketch dna -p k=21,k=31,k=51,scaled=1000,abund Gubaphage_genomes.fa -o Gubaphage_genomes.dna.sig
Hi @rchikhi! I ran the query at k=21.
See https://github.com/bluegenes/2021-gubaphage-magsearch for code & a notebook where I did some light processing and filtering of the results. Click on the binder if you'd like to run the notebook interactively.
In that notebook, I selected metagenome results that had at least 30% containment (output file: gubaphage.sra-search.k21-c30.csv) or at least 10% containment (output file: gubaphage.sra-search.k21-c10.csv) of the gubaphage query. These csvs can be found in the output.gubaphage-magsearch/processed_results folder, which you can see on the associated OSF repo. They can also be regenerated via the binder.
The processed results look like this:
search_genome,metagenome,containment
Gubaphage_genomes.fa,ERR2683220,0.506
Gubaphage_genomes.fa,ERR2683247,0.472
Gubaphage_genomes.fa,ERR2607412,0.439
Gubaphage_genomes.fa,ERR2592250,0.439
Gubaphage_genomes.fa,ERR2683203,0.432
Gubaphage_genomes.fa,ERR2683256,0.426
Gubaphage_genomes.fa,ERR2683152,0.419
...where containment is the fraction of your gubaphage query covered by the metagenome.
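For anyone who wants to apply a different cutoff to the processed results without opening the notebook, here is a minimal sketch of the thresholding step. It is not the notebook's code: `filter_by_containment` is a hypothetical helper, and the only assumption about the file format is the three-column header shown above. The sample rows are copied from the listing above.

```python
import csv
import io

# A few rows copied from the processed results shown above.
SAMPLE = """search_genome,metagenome,containment
Gubaphage_genomes.fa,ERR2683220,0.506
Gubaphage_genomes.fa,ERR2683247,0.472
Gubaphage_genomes.fa,ERR2607412,0.439
"""


def filter_by_containment(handle, threshold):
    """Return rows whose containment is at least `threshold`.

    Hypothetical helper: `handle` is any open text file (or file-like
    object) with the columns search_genome, metagenome, containment.
    """
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row["containment"]) >= threshold]


# Same 30% cutoff as the c30 output file; all three sample rows pass it.
hits_c30 = filter_by_containment(io.StringIO(SAMPLE), 0.30)
for row in hits_c30:
    print(row["metagenome"], row["containment"])
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`, where `path` points at one of the processed csvs.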
What other information would be helpful, or what questions can I answer about this run? I can also search at k=31 and k=51 if you'd like - I'm not sure what (if any) additional metagenomes would be recovered.
cheers,
Tessa
Thanks much Tessa! I'll have a look at the results and let you know if more info is needed on our side.
Hi Tessa, quick question: the query was a pangenome, i.e. a collection of all known genomes for that clade. I suspect there's also some redundancy, i.e. some genomes inside this pangenome are very similar. Would it be correct to say that a containment score of 0.5 means that essentially 50% of the data inside that query pangenome has a hit? (With the truth being somewhere between "50% of the entries inside that FASTA file have a full hit and 50% have no hit" and "100% of the entries match over half their length".)
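One way to reason about the two extremes in this question is a toy sketch with made-up integer "hashes". It assumes containment is computed over the set of distinct query hashes, ignoring abundance; whether that matches exactly how the search in this thread scores a multi-genome query is an assumption, not something confirmed here. The `containment` function and all the data below are hypothetical.

```python
def containment(query_hashes, metagenome_hashes):
    """Fraction of distinct query hashes also present in the metagenome."""
    query = set(query_hashes)
    if not query:
        return 0.0
    return len(query & set(metagenome_hashes)) / len(query)


# Toy "pangenome": four genomes with 10 distinct made-up hashes each.
genomes = [set(range(i * 10, i * 10 + 10)) for i in range(4)]
pangenome = set().union(*genomes)  # 40 distinct hashes in total

# Scenario A: two genomes fully present, the other two absent.
metagenome_a = genomes[0] | genomes[1]

# Scenario B: half of every genome present.
metagenome_b = set()
for genome in genomes:
    metagenome_b |= set(sorted(genome)[:5])

print(containment(pangenome, metagenome_a))  # 0.5
print(containment(pangenome, metagenome_b))  # 0.5

# Redundancy: adding an exact copy of genome 0 does not change the
# set of distinct hashes, so the score is unchanged under this model.
redundant_pangenome = pangenome | set(genomes[0])
print(containment(redundant_pangenome, metagenome_a))  # 0.5
```

Under these assumptions both scenarios give the same score, so a single containment value for the whole pangenome cannot tell them apart; sketching and searching each genome separately would be one way to distinguish them.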