Comments (9)
You can try, but tbh it likely would be faster to just add vamb to the pipeline 🤣🤣🤣
from mag.
Having looked at Vamb: since it (ideally) requires concatenating all assemblies and renaming contigs according to a complicated scheme, I think it's going to play havoc with any system that compares bins using contig names (DAS_Tool and Tiara)... 😅
Uuugghhhhh
I guess we will need to make a metadata file to track them or something, and convert the headers back?
"Concatenate the FASTA files together while making sure all contig headers stay unique"
If that's all it's doing, then thinking about it, it might be a reasonable thing to do upstream immediately after assembly anyway...
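For illustration, here is a minimal sketch of that concatenation step: merge per-sample assemblies into one FASTA while keeping every contig header unique by prefixing it with its sample name. This is hypothetical helper code, not mag's or Vamb's actual implementation; the `"S1"`-style sample names and the `C` joiner are illustrative.

```python
from pathlib import Path

def concatenate_assemblies(fastas, out_path):
    """Concatenate per-sample FASTA files into one file.

    `fastas` maps a sample name to its assembly FASTA path. Each
    contig header is prefixed with the sample name so that headers
    stay unique across samples (e.g. ">contig_1" from sample "S1"
    becomes ">S1Ccontig_1").
    """
    with open(out_path, "w") as out:
        for sample, fasta in fastas.items():
            for line in Path(fasta).read_text().splitlines():
                if line.startswith(">"):
                    # Prefix the original header with the sample name.
                    out.write(f">{sample}C{line[1:]}\n")
                else:
                    out.write(line + "\n")
```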
Furthermore, if you want to use binsplitting (and you should!), your contig headers must be of the format {Samplename}{Separator}{X}, such that the part of the string before the first occurrence of {Separator} gives a name of the sample it originated from. For example, you could call contig number 115 from sample number 9 "S9C115", where "S9" would be {Samplename}, "C" is {Separator} and "115" is {X}.
So it's a little more complicated! I'm not sure renaming all the contigs up front is the best solution disk-space-wise, as we'd just be creating a copy of all the assemblies with different headers for a tool that we (potentially) might not even choose to run...
Not to mention mapping the reads to the concatenated assembly, and then parsing that separately through the depths workflow 🫢
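To make the quoted requirement concrete, a renaming step along those lines might look like the sketch below, which also writes the kind of new-to-old header mapping TSV suggested earlier for converting headers back. Everything here is hypothetical illustration under the Vamb docs' `{Samplename}{Separator}{X}` scheme; the function names and running contig index are assumptions, not mag or Vamb code.

```python
def rename_contigs(records, sample, separator="C"):
    """Yield (new_header, old_header, seq) tuples with Vamb-style
    names such as "S9C1": sample name, separator, running index."""
    for i, (old_header, seq) in enumerate(records, start=1):
        yield f"{sample}{separator}{i}", old_header, seq

def write_renamed(records, sample, fasta_out, map_out):
    """Write the renamed FASTA plus a new->old header mapping TSV,
    so the original headers can be restored downstream."""
    with open(fasta_out, "w") as fa, open(map_out, "w") as tsv:
        for new, old, seq in rename_contigs(records, sample):
            fa.write(f">{new}\n{seq}\n")
            tsv.write(f"{new}\t{old}\n")
```

With the mapping file in hand, restoring the original names for bin-comparison tools is a simple dictionary lookup per header.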
Ugh ok.
It's weird, though, as earlier the documentation implies you don't have to do all of that?
I don't have a good suggestion then 😅, sounds like it'll all be painful one way or another...
As a stopgap measure, I've written mgenottate to do just that: genome QC, dereplication, and taxonomic annotation
https://github.com/maxibor/mgenottate
Ah, that's cool! In the end I just ended up forking mag, deleting the first part of the main workflow and dropping bins in via directory input: https://github.com/prototaxites/mag/tree/bin_entry
Also, I have a separate pipeline for metagenome gene annotation that is just a couple of characters different in name from yours: https://github.com/prototaxites/mgannotate 😅