Enter at postbinning stage (mag, OPEN)

prototaxites commented on July 4, 2024
Enter at postbinning stage

from mag.

Comments (9)

jfy133 commented on July 4, 2024

You can try, but tbh it likely would be faster to just add vamb to the pipeline 🤣🤣🤣

prototaxites commented on July 4, 2024

Having looked at Vamb: since it (ideally) requires concatenating all assemblies and renaming contigs following a complicated scheme, I think it's going to play havoc with any system that compares bins using contig names (DAS_Tool and Tiara)... 😅

jfy133 commented on July 4, 2024

Uuugghhhhh

jfy133 commented on July 4, 2024

I guess we will need to make a metadata file or something to track them and convert the headers back?

jfy133 commented on July 4, 2024

"Concatenate the FASTA files together while making sure all contig headers stay unique"

If that's all it's doing, might be a reasonable thing to do upstream immediately after assembly anyway thinking about it...
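For illustration, here is a minimal sketch of such an upstream step: it concatenates several assemblies, gives every contig a unique header, and keeps a lookup table so the original headers can be recovered later. The function name `concatenate_with_unique_headers`, the `contig_N` naming, and the in-memory dict input are all hypothetical choices made for this example; none of them come from mag or Vamb.

```python
def concatenate_with_unique_headers(assemblies):
    """Concatenate FASTA texts and rename contigs so that headers stay unique.

    assemblies: dict mapping a sample name to the FASTA text of its assembly
                (hypothetical input shape, chosen to keep the sketch small).
    Returns the combined FASTA text and a dict mapping each new contig ID
    to a (sample name, original header) tuple.
    """
    renamed_lines = []
    header_map = {}
    counter = 0
    for sample, fasta_text in assemblies.items():
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # New header: a running counter guarantees uniqueness
                counter += 1
                new_id = f"contig_{counter}"
                header_map[new_id] = (sample, line[1:])
                renamed_lines.append(f">{new_id}")
            elif line.strip():
                # Sequence lines are copied through unchanged
                renamed_lines.append(line)
    return "\n".join(renamed_lines) + "\n", header_map


assemblies = {
    "sampleA": ">k141_1 flag=1\nACGT\n>k141_2\nGG\n",
    "sampleB": ">k141_1\nTTTT\n",
}
combined, header_map = concatenate_with_unique_headers(assemblies)
print(combined)
print(header_map["contig_3"])
```

The lookup table is the "metadata file" idea in code form: written out as a TSV, it would let downstream tools that compare bins by contig name translate the headers back.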

prototaxites commented on July 4, 2024

Furthermore, if you want to use binsplitting (and you should!), your contig headers must be of the format {Samplename}{Separator}{X}, such that the part of the string before the first occurrence of {Separator} gives a name of the sample it originated from. For example, you could call contig number 115 from sample number 9 "S9C115", where "S9" would be {Samplename}, "C" is {Separator} and "115" is {X}.

So it's a little more complicated! I'm not sure if renaming all the contigs initially is the best solution disk-space wise - as we just create a copy of all assemblies with different headers for a tool that we (potentially) might not choose to run...

Not to mention mapping the reads to the concatenated assembly, and then parsing that separately through the depths workflow 🫢
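To make the quoted naming rule concrete, here is a small sketch of building and parsing headers of the form {Samplename}{Separator}{X}. The helper names `make_header` and `split_header` are hypothetical, and the check that the sample name does not contain the separator is an assumption that follows from the "first occurrence" wording rather than something stated in the quote.

```python
def make_header(sample, index, separator="C"):
    """Build a header such as 'S9C115' from sample 'S9' and contig index 115."""
    if separator in sample:
        # The sample name is read as everything before the FIRST separator,
        # so a separator inside the sample name would be misparsed.
        raise ValueError(f"sample name {sample!r} contains separator {separator!r}")
    return f"{sample}{separator}{index}"


def split_header(header, separator="C"):
    """Return (sample name, remainder), splitting at the first separator."""
    sample, found, rest = header.partition(separator)
    if not found:
        raise ValueError(f"no separator {separator!r} in header {header!r}")
    return sample, rest


header = make_header("S9", 115)
print(header)
print(split_header(header))
```

A single-letter separator like "C" is easy to collide with real sample names, which is part of why the renaming is fiddly; a longer or rarer separator string would reduce that risk.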

jfy133 commented on July 4, 2024

Ugh ok.

It's weird though as earlier the documentation implies you don't have to do all of that?

I don't have a good suggestion then 😅, sounds like it'll all be painful one way or another...

maxibor commented on July 4, 2024

As a stopgap measure, I've written mgenottate to do just that: genome QC, dereplication, and taxonomic annotation
https://github.com/maxibor/mgenottate

prototaxites commented on July 4, 2024

Ah, that's cool! In the end I just ended up forking mag, deleting the first part of the main workflow and dropping bins in via directory input: https://github.com/prototaxites/mag/tree/bin_entry

Also, I have a separate pipeline for metagenome gene annotation that is just a couple of characters different in name from yours: https://github.com/prototaxites/mgannotate 😅
