Comments (4)
@benjward during error correction it would be useful to have the coverage of each node (the number of occurrences of each kmer). With this information we can simplify the graph, for example by removing the low-coverage path in a bubble.
Do you think we should change the SequenceGraphNode type to also include information related to coverage?
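To make the coverage idea concrete, here is a minimal sketch of counting kmer occurrences across reads. It is written in Python for brevity rather than Julia, the name `kmer_coverage` is hypothetical and not part of the package, and it assumes plain string reads with no canonical (reverse-complement) kmer handling.

```python
from collections import Counter

def kmer_coverage(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

# Toy reads; the third one carries a likely error (GTT instead of GTA).
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
coverage = kmer_coverage(reads, 3)
```

A node type that carries coverage could store the count from such a table next to the sequence. Kmers seen only once (here `GTT` and `TTC`) are the natural candidates for removal.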
from genomegraphs.jl.
@benjward I also forgot to ask your opinion about storing edge multiplicities.
We can extract the number of occurrences of each edge while extracting kmers from the reads.
Would this information about the reads be useful for the functionality that we will implement later?
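As a rough illustration of what I mean, the sketch below counts each pair of consecutive, overlapping kmers in the reads, which is the same as counting (k+1)-mers. It is Python rather than Julia, `edge_multiplicities` is a hypothetical name, and strand handling is ignored.

```python
from collections import Counter

def edge_multiplicities(reads, k):
    """Count each edge (kmer -> next overlapping kmer) seen in the reads."""
    edges = Counter()
    for read in reads:
        # Every window of length k+1 yields one edge between its
        # k-length prefix and its k-length suffix.
        for i in range(len(read) - k):
            source = read[i:i + k]
            target = read[i + 1:i + k + 1]
            edges[(source, target)] += 1
    return edges

reads = ["ACGTAC", "CGTACG", "ACGTTC"]
edges = edge_multiplicities(reads, 3)
```

With these toy reads the edge `CGT -> GTA` is seen twice while `CGT -> GTT` is seen once, so the multiplicities point at the weaker branch directly.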
Dead-end trimming is implemented in the function delete_tips.
The next step is to pop bubbles in the graph before forming the final contigs.
Bubble popping can be implemented by removing the low-coverage branch (contig) of each bubble after building unitigs from kmers. However, removing a branch can allow neighbouring contigs to merge into new ones, so contig building must be repeated after the error branches are removed.
To remove the bubbles, my plan is to use the 'build_unitigs_from_kmerlist' function to detect unitigs that start and end at the same nodes, then remove the kmers on the low-coverage contig and repeat the unitig building from the resulting kmerlist.
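The plan above can be sketched roughly as follows. This is Python rather than Julia and not the package's code. It assumes each unitig is given as a hypothetical `(source, kmers, sink)` tuple, where `source` and `sink` are the flanking junction nodes and `kmers` is a non-empty list of interior kmers, and it uses mean kmer coverage to rank the branches. The kmer strings in the example are arbitrary labels.

```python
from collections import defaultdict

def mean_coverage(kmers, coverage):
    """Average coverage over the kmers of one branch (assumed non-empty)."""
    return sum(coverage[kmer] for kmer in kmers) / len(kmers)

def kmers_to_pop(unitigs, coverage):
    """Return the kmers on every non-best branch of each bubble."""
    # Unitigs sharing the same flanking nodes are branches of one bubble.
    branches_by_ends = defaultdict(list)
    for source, kmers, sink in unitigs:
        branches_by_ends[(source, sink)].append(kmers)

    doomed = set()
    for branches in branches_by_ends.values():
        if len(branches) < 2:
            continue  # a single path between two nodes is not a bubble
        best = max(branches, key=lambda b: mean_coverage(b, coverage))
        for branch in branches:
            if branch is not best:
                doomed.update(branch)
    return doomed

unitigs = [
    ("ACG", ["CGT", "GTA"], "TAC"),  # well-covered branch
    ("ACG", ["CGA", "GAA"], "TAC"),  # low-coverage branch of the same bubble
    ("TAC", ["ACC"], "CCG"),         # ordinary unitig, not part of a bubble
]
coverage = {"CGT": 10, "GTA": 12, "CGA": 1, "GAA": 2, "ACC": 9}
doomed = kmers_to_pop(unitigs, coverage)
kmerlist = [kmer for kmer in coverage if kmer not in doomed]
```

The filtered `kmerlist` would then be passed back to unitig building, which is the step that has to be repeated.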
I tried to formulate the bubble popping problem as finding and removing unitigs that start and end at the same nodes (kmers). To that end I have updated the 'build_unitigs_from_kmerlist' function so that it deletes unitigs that have low coverage and share their start and end nodes with another unitig. The updated function is available on the gsoc/error-correction2 branch under the name 'build_unitigs_from_kmerlist2'.
However, constructing the unitigs and then removing them may not find all the contigs after bubble popping, especially in the case of nested bubbles. I am therefore planning to switch to another design in which we delete the bubbles using the Tour Bus algorithm (similar to the approach taken by Velvet).
Another approach (taken by Arapan) is to repeat the path collapsing and bubble removal steps until no more changes are made to the graph.
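The "repeat until nothing changes" idea is just a fixed-point loop. Below is a toy sketch in Python, not Arapan's actual procedure and not code from this package. The graph is assumed to be a dict from node to a set of successors, each pass is a hypothetical function that mutates the graph and returns whether it changed anything, and a simple dead-end trimming pass stands in for the real collapsing and bubble removal steps.

```python
def trim_low_coverage_dead_ends(graph, coverage, min_coverage=3):
    """Remove one layer of low-coverage nodes that have no successors."""
    doomed = [node for node, successors in graph.items()
              if not successors and coverage[node] < min_coverage]
    for node in doomed:
        del graph[node]
    for successors in graph.values():
        successors.difference_update(doomed)
    return bool(doomed)

def simplify_until_stable(graph, passes, max_rounds=100):
    """Apply every pass in turn until a full round changes nothing."""
    for round_number in range(1, max_rounds + 1):
        changed = False
        for simplify in passes:
            changed = simplify(graph) or changed
        if not changed:
            return round_number
    raise RuntimeError("graph did not stabilise within max_rounds")

# Main path A -> B -> C plus a two-node erroneous tip A -> X -> Y.
graph = {"A": {"B", "X"}, "B": {"C"}, "C": set(), "X": {"Y"}, "Y": set()}
coverage = {"A": 10, "B": 10, "C": 10, "X": 1, "Y": 1}
rounds = simplify_until_stable(
    graph, [lambda g: trim_low_coverage_dead_ends(g, coverage)]
)
```

Here a single pass only removes `Y`. `X` becomes a dead end afterwards and is removed in the second round, and the third round confirms that the graph is stable. Nested bubbles would need the same kind of repetition.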