Comments (9)
You can try, but tbh it likely would be faster to just add vamb to the pipeline 🤣🤣🤣
from mag.
Having looked at Vamb: since it (ideally) requires concatenating all assemblies and renaming contigs according to a complicated scheme, I think it's going to play havoc with any system that compares bins using contig names (DAS_Tool and Tiara)... 😅
Uuugghhhhh
I guess we will need to make a metadata file to track them or something, and convert the headers back?
"Concatenate the FASTA files together while making sure all contig headers stay unique"
If that's all it's doing, then thinking about it, it might be a reasonable thing to do upstream immediately after assembly anyway...
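For illustration, here is a minimal sketch of that concatenation step: merge per-sample assemblies into one FASTA while keeping every contig header unique by prefixing it with its sample name. This is hypothetical helper code, not mag's or Vamb's actual implementation; the `"S1"`-style sample names and the `C` joiner are illustrative.

```python
from pathlib import Path

def concatenate_assemblies(fastas, out_path):
    """Concatenate per-sample FASTA files into one file.

    `fastas` maps a sample name to its assembly FASTA path. Each
    contig header is prefixed with the sample name so that headers
    stay unique across samples (e.g. ">contig_1" from sample "S1"
    becomes ">S1Ccontig_1").
    """
    with open(out_path, "w") as out:
        for sample, fasta in fastas.items():
            for line in Path(fasta).read_text().splitlines():
                if line.startswith(">"):
                    # Prefix the original header with the sample name.
                    out.write(f">{sample}C{line[1:]}\n")
                else:
                    out.write(line + "\n")
```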
Furthermore, if you want to use binsplitting (and you should!), your contig headers must be of the format {Samplename}{Separator}{X}, such that the part of the string before the first occurrence of {Separator} gives a name of the sample it originated from. For example, you could call contig number 115 from sample number 9 "S9C115", where "S9" would be {Samplename}, "C" is {Separator} and "115" is {X}.
So it's a little more complicated! I'm not sure renaming all the contigs up front is the best solution disk-space-wise, as we'd just be creating a copy of all the assemblies with different headers for a tool that we (potentially) might not even choose to run...
Not to mention mapping the reads to the concatenated assembly, and then parsing that separately through the depths workflow 🫢
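To make the quoted requirement concrete, a renaming step along those lines might look like the sketch below, which also writes the kind of new-to-old header mapping TSV suggested earlier for converting headers back. Everything here is hypothetical illustration under the Vamb docs' `{Samplename}{Separator}{X}` scheme; the function names and running contig index are assumptions, not mag or Vamb code.

```python
def rename_contigs(records, sample, separator="C"):
    """Yield (new_header, old_header, seq) tuples with Vamb-style
    names such as "S9C1": sample name, separator, running index."""
    for i, (old_header, seq) in enumerate(records, start=1):
        yield f"{sample}{separator}{i}", old_header, seq

def write_renamed(records, sample, fasta_out, map_out):
    """Write the renamed FASTA plus a new->old header mapping TSV,
    so the original headers can be restored downstream."""
    with open(fasta_out, "w") as fa, open(map_out, "w") as tsv:
        for new, old, seq in rename_contigs(records, sample):
            fa.write(f">{new}\n{seq}\n")
            tsv.write(f"{new}\t{old}\n")
```

With the mapping file in hand, restoring the original names for bin-comparison tools is a simple dictionary lookup per header.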
Ugh ok.
It's weird, though, as earlier the documentation implies you don't have to do all of that?
I don't have a good suggestion then 😅, sounds like it'll all be painful one way or another...
As a stopgap measure, I've written mgenottate to do just that: genome QC, dereplication, and taxonomic annotation
https://github.com/maxibor/mgenottate
Ah, that's cool! In the end I just ended up forking mag, deleting the first part of the main workflow and dropping bins in via directory input: https://github.com/prototaxites/mag/tree/bin_entry
Also, I have a separate pipeline for metagenome gene annotation that is just a couple of characters different in name from yours: https://github.com/prototaxites/mgannotate 😅