Error Correction about genomegraphs.jl CLOSED

biojulia commented on June 5, 2024
Error Correction

from genomegraphs.jl.

Comments (4)

ardakdemir commented on June 5, 2024

@benjward during error correction it would be useful to have the coverage of each node (the number of occurrences of each kmer). With this information we can simplify the graph, for example by removing the low-coverage path in a bubble.
Do you think we should change the SequenceGraphNode type to also store coverage information?
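
As a rough illustration of what "coverage of each node" means here, the sketch below counts kmer occurrences across a set of reads. It is written in Python rather than Julia, `kmer_coverage` is a hypothetical helper and not part of genomegraphs.jl, and it assumes reads are plain strings and ignores reverse complements.

```python
from collections import Counter


def kmer_coverage(reads, k):
    """Count how many times each kmer occurs across all reads.

    Assumption: reads are plain strings and strands are not merged
    (no canonical kmers), which a real assembler would likely handle.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


reads = ["ACGTAC", "CGTACG", "ACGTTC"]
coverage = kmer_coverage(reads, 3)
```

In this toy input the kmers GTT and TTC appear only once, which is the kind of low-coverage signal that could mark an error branch.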

ardakdemir commented on June 5, 2024

@benjward I also forgot to ask your opinion about edge multiplicities.
We can extract the number of occurrences of each edge during kmer extraction from the reads.

Would this information about the reads be useful for the subsequent functionality that we will implement?
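
A minimal sketch of how edge multiplicities could be collected in the same pass over the reads, assuming an edge is a pair of consecutive kmers that overlap by k-1 bases. `edge_multiplicities` is a hypothetical name, and the Python here is only for illustration, not the package's Julia code.

```python
from collections import Counter


def edge_multiplicities(reads, k):
    """Count how often each (kmer, next kmer) pair is observed in the reads.

    Assumption: an edge joins two kmers that are adjacent in a read,
    i.e. it corresponds to one (k+1)-mer.
    """
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k):
            source = read[i:i + k]
            target = read[i + 1:i + k + 1]
            edges[(source, target)] += 1
    return edges


edge_counts = edge_multiplicities(["ACGTAC", "CGTACG", "ACGTTC"], 3)
```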

ardakdemir commented on June 5, 2024

Dead-end trimming is implemented in the function 'delete_tips'.
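
For readers unfamiliar with the idea, here is a generic sketch of dead-end trimming. It is not the package's 'delete_tips' code, which is not shown in this thread. It assumes the graph is a dict mapping every node to the set of its successors, and `delete_short_tips` and `max_len` are hypothetical names. It ignores coverage, so the result can depend on the order in which dead ends are visited.

```python
def delete_short_tips(succ, max_len):
    """Return a copy of the graph with short dead-end chains removed.

    succ maps every node to the set of its successors (every node must
    appear as a key). A tip is an unbranched chain that ends in a node
    with no successors and hangs off a node with several successors.
    """
    succ = {node: set(outs) for node, outs in succ.items()}
    pred = {node: set() for node in succ}
    for node, outs in succ.items():
        for target in outs:
            pred[target].add(node)

    dead_ends = [node for node in succ if not succ[node]]
    for end in dead_ends:
        chain = [end]
        node = end
        # Walk backwards along the unbranched chain leading to the dead end.
        while len(pred[node]) == 1 and len(chain) <= max_len:
            (parent,) = pred[node]
            if len(succ[parent]) > 1:
                # Reached a branch point: the chain is a tip, so cut it off.
                succ[parent].discard(node)
                for member in chain:
                    del succ[member]
                    del pred[member]
                break
            chain.append(parent)
            node = parent
    return succ


graph = {"A": {"B"}, "B": {"C", "X"}, "C": {"D"}, "D": set(), "X": set()}
trimmed = delete_short_tips(graph, max_len=1)
```

Here the one-node tip X is removed, while the longer path through C and D is kept because it exceeds `max_len`.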

Next step is to pop bubbles in the graph before forming the final contigs.
Popping bubbles can be implemented by removing the low-coverage branch (contig) in each bubble after building unitigs from kmers. However, the resulting graph may contain new contigs, so contig building must be repeated after the error branches are removed.

To remove the bubbles, my plan is to use the 'build_unitigs_from_kmerlist' function to detect unitigs that start and end at the same nodes, then remove the kmers on the low-coverage contig and repeat the unitig building from the resulting kmerlist.
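
The plan above can be sketched roughly as follows. This is an illustration in Python, not the actual 'build_unitigs_from_kmerlist' code. It assumes each unitig is a dict with hypothetical fields "start" and "end" (the flanking nodes, or None when there is none), "kmers", and a mean "coverage".

```python
from collections import defaultdict


def low_coverage_bubble_kmers(unitigs):
    """Collect the kmers of every bubble branch except the best-covered one.

    Unitigs that share both flanking nodes are treated as the branches
    of one bubble. Unitigs with a missing flank are never grouped.
    """
    groups = defaultdict(list)
    for unitig in unitigs:
        if unitig["start"] is None or unitig["end"] is None:
            continue
        groups[(unitig["start"], unitig["end"])].append(unitig)

    to_remove = set()
    for branches in groups.values():
        if len(branches) < 2:
            continue  # a single path between two nodes is not a bubble
        best = max(branches, key=lambda u: u["coverage"])
        for unitig in branches:
            if unitig is not best:
                to_remove.update(unitig["kmers"])
    return to_remove


unitigs = [
    {"start": "ACG", "end": "TTC", "kmers": ["CGA", "GAT", "ATT"], "coverage": 12.0},
    {"start": "ACG", "end": "TTC", "kmers": ["CGC", "GCT", "CTT"], "coverage": 1.5},
    {"start": "TTC", "end": None, "kmers": ["TCA"], "coverage": 11.0},
]
doomed = low_coverage_bubble_kmers(unitigs)
kmerlist = ["ACG", "CGA", "GAT", "ATT", "CGC", "GCT", "CTT", "TTC", "TCA"]
kept = [kmer for kmer in kmerlist if kmer not in doomed]
```

The filtered `kept` list would then be fed back into unitig construction, since removing a branch can merge the unitigs on either side of the bubble.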

ardakdemir commented on June 5, 2024

I tried to formulate bubble popping as finding and removing unitigs that start and end at the same nodes (kmers). To that end I have updated the 'build_unitigs_from_kmerlist' function so that it deletes unitigs that have low coverage and share their start and end nodes with another unitig. The updated function is available on the gsoc/error-correction2 branch under the name 'build_unitigs_from_kmerlist2'.

However, constructing the unitigs and then removing them may not find all contigs after bubble popping, especially in the case of nested bubbles. I am therefore planning to switch to another design in which we delete the bubbles using the Tour Bus algorithm (similar to the approach taken by Velvet).

Another approach (taken by Arapan) is to repeat the path collapsing and bubble removal steps until no further changes are made to the graph.
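
The "repeat until nothing changes" idea can be sketched as a simple fixed-point loop. The code below is a generic illustration and does not reproduce Arapan's implementation. `simplify_until_stable`, `drop_leaf_tips` and `max_rounds` are hypothetical names, and the toy pass stands in for real path collapsing or bubble removal steps.

```python
def simplify_until_stable(graph, passes, max_rounds=100):
    """Apply each simplification pass in turn until a full round changes nothing.

    Each pass takes a graph and returns a new graph. Returns the final
    graph and the number of rounds that were run.
    """
    for round_number in range(1, max_rounds + 1):
        changed = False
        for simplify in passes:
            new_graph = simplify(graph)
            if new_graph != graph:
                changed = True
                graph = new_graph
        if not changed:
            return graph, round_number
    return graph, max_rounds


def drop_leaf_tips(succ):
    """Toy pass: remove dead-end nodes that hang off a branching parent."""
    doomed = {
        child
        for parent, children in succ.items() if len(children) > 1
        for child in children if not succ[child]
    }
    return {
        node: {c for c in children if c not in doomed}
        for node, children in succ.items() if node not in doomed
    }


nested = {
    "A": {"B"}, "B": {"C", "X"}, "C": {"D"}, "D": set(),
    "X": {"Y", "Z"}, "Y": set(), "Z": set(),
}
stable, rounds = simplify_until_stable(nested, [drop_leaf_tips])
```

In this nested example a single pass only removes Y and Z. X becomes removable in the second round, and the third round confirms that nothing else changes, which is why one pass over the graph is not enough.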
