biojulia / genomegraphs.jl

A modern genomics framework for julia

Home Page: https://biojulia.dev

License: Other

Julia 100.00%
bioinformatics biology bio julia bioinformatics-analysis genomics genome-sequencing genome-assembly genome-graph genomes

genomegraphs.jl's Introduction

GenomeGraphs

Description

GenomeGraphs provides a representation of sequence graphs. Such graphs represent genome assemblies and population graphs of genotypes/haplotypes and variation.

Installation

You can install GenomeGraphs from the Julia REPL. Press ] to enter pkg mode, then enter the following:

pkg> add GenomeGraphs

If you are interested in the cutting edge of the development, please check out the master branch to try new features before release.

Testing

GenomeGraphs is tested on Linux, OS X, and Windows, for the stable, LTS, and nightly builds of julia.

Contributing

We appreciate contributions from users including reporting bugs, fixing issues, improving performance and adding new features.

Take a look at the contributing files for detailed contributor and maintainer guidelines, and the code of conduct.

Financial contributions

We also welcome financial contributions in full transparency on our open collective. Anyone can file an expense. If the expense makes sense for the development of the community, it will be "merged" in the ledger of our open collective by the core contributors and the person who filed the expense will be reimbursed.

Backers & Sponsors

Thank you to all our backers and sponsors!

Love our work and community? Become a backer.

Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.

Questions?

If you have a question about contributing or using BioJulia software, come on over and chat to us on Gitter, or you can try the Bio category of the Julia discourse site.

genomegraphs.jl's People

Contributors

ardakdemir, github-actions[bot]

genomegraphs.jl's Issues

Typo in Links.jl

I think the is_forward_link function has a typo.

is_forward_link(l::SequenceGraphLink, n::NodeID) = source(l) == -n

'-n' should be replaced with 'n'.

I thought maybe there is a special reason behind this but could not figure it out.

Possible Solution / Implementation

I will push a new branch along with some other minor updates to the package.

Maybe a typo in SequenceGraph.jl

I am not sure whether this is a design decision, but I think there is a typo in the add_node! function in SequenceGraph.jl:

function add_node!(sg::SequenceGraph, n::SequenceGraphNode)
    newlen = length(push!(nodes(sg)))
    resize!(links(sg), newlen)
    return newlen
end

I think the push function should also include n as follows:

    newlen = length(push!(nodes(sg),n))

I am pushing an updated issue to the gsoc branch
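
To illustrate the proposed fix in isolation, here is a minimal, self-contained sketch. ToyNode, ToyLink and ToyGraph are hypothetical stand-ins and not the package's real types. The sketch also pushes an empty link list instead of calling resize! as the original does, which is an assumption made only to keep the toy example free of undefined entries.

```julia
# Hypothetical toy types, for illustration only.
struct ToyNode
    sequence::String
end

struct ToyLink
    source::Int
    destination::Int
end

struct ToyGraph
    nodes::Vector{ToyNode}
    links::Vector{Vector{ToyLink}}
end

ToyGraph() = ToyGraph(ToyNode[], Vector{ToyLink}[])

# Append node `n`, add an empty link list for it, and return its new ID.
function add_node!(g::ToyGraph, n::ToyNode)
    newlen = length(push!(g.nodes, n))  # the node itself is passed to push!
    push!(g.links, ToyLink[])           # keep links the same length as nodes
    return newlen
end

g = ToyGraph()
id1 = add_node!(g, ToyNode("ACGT"))
id2 = add_node!(g, ToyNode("GTTA"))
```

After the two calls, the node and link vectors have the same length and the returned IDs are the positions of the new nodes.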

Node merging and Edge Collapsing

We have to implement functions for merging nodes and collapsing edges on a simple path on a DeBruijnGraph type graph.

To do this, the following steps are important:

  • We have to decide on the design concepts for how to represent the merged kmers.
  • We have to decide how to update the edges accordingly.
  • We have to find an efficient way of finding the simple paths, i.e. an efficient way of traversing the graph.
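
As a rough sketch of the steps above (not the package's implementation), the following toy code collapses maximal simple paths of a de Bruijn graph into merged sequences. It assumes kmers are plain strings, ignores reverse complements, and uses a quadratic edge search for brevity. The names build_edges and simple_paths are hypothetical. Isolated cycles, where no kmer qualifies as a path start, are not reported by this sketch.

```julia
# Successors and predecessors of each kmer: b follows a when the
# (k-1)-suffix of a equals the (k-1)-prefix of b.
function build_edges(kmers::Vector{String})
    succ = Dict(k => String[] for k in kmers)
    pred = Dict(k => String[] for k in kmers)
    for a in kmers, b in kmers
        if a[2:end] == b[1:end-1]
            push!(succ[a], b)
            push!(pred[b], a)
        end
    end
    return succ, pred
end

# Collapse each maximal simple path into one merged sequence.
function simple_paths(kmers::Vector{String})
    succ, pred = build_edges(kmers)
    # A kmer starts a path unless its only predecessor has it as its only successor.
    is_start(k) = !(length(pred[k]) == 1 && length(succ[pred[k][1]]) == 1)
    paths = String[]
    for k in kmers
        is_start(k) || continue
        seq = k
        cur = k
        # Extend while the edge cur -> next is the only way out of cur and into next.
        while length(succ[cur]) == 1 && length(pred[succ[cur][1]]) == 1
            cur = succ[cur][1]
            seq *= cur[end]  # merged kmers overlap by k-1, so append one base
        end
        push!(paths, seq)
    end
    return paths
end

paths = simple_paths(["ACG", "CGT", "GTA", "GTC"])
```

In this example ACG and CGT lie on one simple path and merge into ACGT, while the two branches after CGT stay as separate nodes.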

Checking the last updates

We have to make sure that the query functions implemented are working properly. In particular, we have to check the compatibility of the functions with the different primitive types, DNA sequences and kmers.

More testing is necessary for the following functions:

  • is_a_path in DeBruijnGraph.jl
  • is_a_node in Nodes.jl
  • is_suffix in Nodes.jl

Kmer count comparisons

We need to be able to tell if a genome assembly has correctly incorporated all of the motifs sampled into the graph. We do this through a comparison of kmer spectra.
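
A minimal sketch of such a comparison, assuming sequences are plain strings and kmers are counted on the forward strand only. The helper names count_kmers, spectrum and missing_kmers are hypothetical and not part of the package.

```julia
# Count every k-long substring across a set of sequences.
function count_kmers(seqs::Vector{String}, k::Int)
    counts = Dict{String,Int}()
    for s in seqs, i in 1:(length(s) - k + 1)
        kmer = s[i:i+k-1]
        counts[kmer] = get(counts, kmer, 0) + 1
    end
    return counts
end

# Kmer spectrum: multiplicity => number of distinct kmers with that multiplicity.
function spectrum(counts::Dict{String,Int})
    spec = Dict{Int,Int}()
    for c in values(counts)
        spec[c] = get(spec, c, 0) + 1
    end
    return spec
end

# Kmers seen in the reads but absent from the assembly.
missing_kmers(reads, assembly) = sort([k for k in keys(reads) if !haskey(assembly, k)])

reads = ["ACGTA", "CGTAC"]
assembly = ["ACGTA"]
read_counts = count_kmers(reads, 3)
assembly_counts = count_kmers(assembly, 3)
absent = missing_kmers(read_counts, assembly_counts)
```

Here the kmer TAC occurs in the reads but not in the assembly, so it is reported as missing.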

Generating Contigs using UG and Reads

We would like to generate contigs using the unitigs on the dbg and the reads given as the initial input for the dbg construction. This step of the project is more challenging than the previous two steps: DBG construction and DBG-to-UG.

We should decide on the core functionalities required during the contig generation and start implementing them. Right now, the dbg constructor takes as input a set of kmers in their canonical form.

First, we need to generalize the dbg constructor by adding a preprocessing step that represents the raw reads as a set of canonical kmers. The preprocessing step should enumerate all unique canonical kmers from a set of reads.
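
The preprocessing step could look roughly like the sketch below. It assumes reads are plain strings over A, C, G and T, and takes the canonical form of a kmer to be the lexicographically smaller of the kmer and its reverse complement. The package's own kmer types may order kmers differently, and all function names here are hypothetical.

```julia
# Complement of a single base (A, C, G, T only).
function complement_base(c::Char)
    c == 'A' && return 'T'
    c == 'C' && return 'G'
    c == 'G' && return 'C'
    c == 'T' && return 'A'
    error("unexpected base: $c")
end

reverse_complement(s::AbstractString) = join(complement_base(c) for c in reverse(s))

# Assumed definition: the smaller of the kmer and its reverse complement.
canonical(kmer::AbstractString) = min(kmer, reverse_complement(kmer))

# Enumerate all unique canonical kmers from a set of reads, sorted.
function canonical_kmers(reads::Vector{String}, k::Int)
    seen = Set{String}()
    for r in reads, i in 1:(length(r) - k + 1)
        push!(seen, canonical(r[i:i+k-1]))
    end
    return sort!(collect(seen))
end

kmers = canonical_kmers(["ACGT", "CGTT"], 3)
```

In this example ACG and CGT are reverse complements of each other, so they collapse to the single canonical kmer ACG.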

Error when including BioSequenceGraphs.jl

@re_str not defined error

I have cloned the repository to my local machine.
When including the package with include("BioSequenceGraphs.jl"), I get the following error:

LoadError: LoadError: UndefVarError: @re_str not defined

For now, I have just removed the line

include("GFA1/GFA1.jl")

from BioSequenceGraphs.jl to be able to run the package smoothly.

Short Read Mapper

We need a process to map short reads to the GenomeGraph.

This should be fairly simple to achieve.
Paired reads are basically very accurate, and so it should be possible to map them reasonably well using an index of unique kmers in the graph.
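
A very rough sketch of the idea, assuming node sequences and reads are plain strings, exact kmer matches only, and no reverse-strand handling. The names unique_kmer_index and map_read are hypothetical. Each kmer that occurs exactly once in the graph votes for its node, and the read is assigned to the node with the most votes.

```julia
# Index kmers that occur exactly once across all node sequences:
# kmer => (node ID, offset within the node).
function unique_kmer_index(nodes::Vector{String}, k::Int)
    index = Dict{String,Tuple{Int,Int}}()
    repeated = Set{String}()
    for (id, seq) in enumerate(nodes)
        for i in 1:(length(seq) - k + 1)
            kmer = seq[i:i+k-1]
            if kmer in repeated
                continue
            elseif haskey(index, kmer)
                delete!(index, kmer)   # second occurrence: no longer unique
                push!(repeated, kmer)
            else
                index[kmer] = (id, i)
            end
        end
    end
    return index
end

# Return the node that receives the most unique-kmer hits, or nothing.
function map_read(index::Dict{String,Tuple{Int,Int}}, read::String, k::Int)
    votes = Dict{Int,Int}()
    for i in 1:(length(read) - k + 1)
        hit = get(index, read[i:i+k-1], nothing)
        hit === nothing && continue
        votes[hit[1]] = get(votes, hit[1], 0) + 1
    end
    best_node, best_votes = nothing, 0
    for (node, n) in votes
        if n > best_votes
            best_node, best_votes = node, n
        end
    end
    return best_node
end

nodes = ["ACGTACGT", "TTGCAATG"]
index = unique_kmer_index(nodes, 4)
hit = map_read(index, "GCAAT", 4)
miss = map_read(index, "GGGGG", 4)
```

The kmer ACGT occurs twice in the first node, so it is left out of the index. A real mapper would also need to handle both strands, sequencing errors and read pairs.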

Basic De-Bruijn Graph type

We need to implement a basic de Bruijn graph type, along with constructors and basic manipulation methods.

Error Correction

I think it would be nice to include some error correction functionality before generating contigs. This will both enable us to work with real (error-containing) data and serve researchers who only want to do error correction. Below I list some of the error correction functions I am planning to implement to simplify the de Bruijn graph:

  • Trimming dead-end tips: We remove all tips with no outgoing edge. A tip is an edge from a node with multiple outgoing edges to a destination node that has no outgoing edges. These nodes are treated as errors that occur at the end of a read.

  • Popping bubbles: A bubble is formed by two paths that diverge from a single node and then merge into another node. In such a case one of the paths is removed from the graph. Usually the removed path has low coverage (depth) and is treated as an error that occurred in the middle of a read.

  • Removing chimeric edges: These are edges that cross between two simple paths. Such edges usually have low coverage and are removed from the graph.
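
As an illustration of the first item only, here is a minimal sketch of tip trimming on a toy graph stored as a dictionary from node ID to successor IDs. The representation and the name trim_tips! are hypothetical. The sketch makes one pass, removes only single-node tips, and (as an assumption of this sketch) leaves a fork untouched when all of its branches are dead ends. Coverage is not considered.

```julia
# Remove dead-end successors of branching nodes; return the removed node IDs.
function trim_tips!(succ::Dict{Int,Vector{Int}})
    is_dead_end(v) = isempty(get(succ, v, Int[]))
    tips = Set{Int}()
    for (u, targets) in succ
        length(targets) > 1 || continue          # only nodes with several outgoing edges
        dead = filter(is_dead_end, targets)
        # Assumption: keep the fork as is when every branch is a dead end.
        length(dead) < length(targets) && union!(tips, dead)
    end
    for t in tips
        delete!(succ, t)                         # drop the tip nodes
    end
    for targets in values(succ)
        filter!(v -> !(v in tips), targets)      # drop edges pointing at removed tips
    end
    return sort!(collect(tips))
end

# 1 -> 2 -> 3 -> 5 -> 6 is the main path; 2 -> 4 is a dead-end tip.
graph = Dict{Int,Vector{Int}}(1 => [2], 2 => [3, 4], 3 => [5], 4 => Int[], 5 => [6], 6 => Int[])
removed = trim_tips!(graph)
```

Node 6 also has no outgoing edges, but its predecessor does not branch, so it is kept as the end of the main path.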

Graph construction is currently subject to the circle problem

The method of initial graph construction from kmers that is currently used is much more efficient now.

However, it still suffers from the circle problem: kmers that have themselves as a de Bruijn neighbour, forwards and backwards, are not included in the unitigs.

This is a fairly well-known problem and simple to solve.

But since it's very rare in real data, right now the algorithm just warns you. We should fix it later.

Adversarial testing datasets: PhiX

We should add some adversarial testing datasets to BioSequenceGraphs.

They may take up some space and so we'll have to think about how to do this with bigger datasets.

But for now, I propose adding a PhiX dataset. We can use the PhiX reference genome sequence and use Pseudoseq.jl to generate paired-end reads. We will need to decide on a read length and average insert size. Once we have the read files, we can include the reference and the reads, and use that data to test how our graph functions are working.
