pangenome's Introduction

Introduction

This is a graph-based method for finding the frequency of conserved regions across species. The method has several steps:

  1. Build a de Bruijn graph (dBG).
  2. Build the reduced dBG (rdBG) by removing nodes whose indegree and outdegree are both <= 1 in the dBG.
  3. Convert each sequence to a compressed path using the rdBG from step 2.
  4. Remove weak edges in the rdBG and index the connected components in the rdBG.
  5. Label each sequence again.
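
Steps 1 to 3 can be illustrated with a toy sketch. This is not the code in this repository: the function names `build_dbg`, `reduced_nodes`, and `compressed_path` are hypothetical, strands and reverse complements are ignored, and step 2 is read as "drop nodes whose indegree and outdegree are both <= 1", which is an assumption.

```python
from collections import defaultdict


def build_dbg(sequences, k):
    """Toy dBG: nodes are k-mers, edges join k-mers that are adjacent in a sequence."""
    succ = defaultdict(set)  # node -> set of successor nodes
    pred = defaultdict(set)  # node -> set of predecessor nodes
    for seq in sequences:
        for i in range(len(seq) - k):
            a, b = seq[i:i + k], seq[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    nodes = set(succ) | set(pred)
    return nodes, succ, pred


def reduced_nodes(nodes, succ, pred):
    """Assumed reading of step 2: keep nodes with indegree > 1 or outdegree > 1."""
    return {
        n for n in nodes
        if len(pred.get(n, ())) > 1 or len(succ.get(n, ())) > 1
    }


def compressed_path(seq, k, kept):
    """Step 3: the kept nodes a sequence passes through, in order."""
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [kmer for kmer in kmers if kmer in kept]


if __name__ == "__main__":
    seqs = ["AACGTT", "GACGTA"]  # made-up sequences for illustration
    nodes, succ, pred = build_dbg(seqs, 3)
    kept = reduced_nodes(nodes, succ, pred)
    print(sorted(kept))                      # ['ACG', 'CGT']
    print(compressed_path(seqs[0], 3, kept))  # ['ACG', 'CGT']
```

In this toy input both sequences share the branching nodes ACG and CGT, so they compress to the same path. Steps 4 and 5 are left out because the text does not define what makes an edge weak.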

Requirement

Make sure that you have the following installed:

  1. Python (3.7 or greater) with the numpy (1.17 or greater), scipy (1.4 or greater), and numba (0.48 or greater) packages. We strongly recommend using Anaconda, which has all the required packages installed.

  2. MCL, an implementation of the Markov Cluster algorithm.
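
A quick way to check these requirements before running the tool is sketched below. The helper names `version_tuple` and `check_requirements` are hypothetical and not part of this repository, and the script assumes that the MCL executable is named `mcl` and is on the PATH.

```python
import importlib
import shutil
import sys

# Minimum versions copied from the requirement list above.
MINIMUMS = {"numpy": (1, 17), "scipy": (1, 4), "numba": (0, 48)}


def version_tuple(text):
    """Return the leading (major, minor) numbers of a version string."""
    out = []
    for piece in text.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        out.append(int(digits or 0))
    return tuple(out)


def check_requirements():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or greater is required")
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(name + " is not installed")
            continue
        found = version_tuple(getattr(module, "__version__", "0"))
        if found < minimum:
            problems.append("%s %s or greater is required" % (name, ".".join(map(str, minimum))))
    # Assumption: the MCL program is installed under the name "mcl".
    if shutil.which("mcl") is None:
        problems.append("mcl executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```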

Download

$ git clone https://github.com/Rinoahu/pangenome

Usage

$ python pangenome/kmer_pypy.py -m -i input.fasta -k 27 > result.tab


-i: genome sequences in fasta format.

-k: the k-mer size. Currently, the k-mer size is limited to 27; we will remove this limitation in the future.

Result

The result is a tab-separated file.

The 1st column is the sequence identifier.

Columns 2-4 are the start, end, and strand of the conserved region.

The 5th column is the index of the conserved region.

For example:
Chr1       0       3250    +       340
Chr1       3250    6851    +       41
Chr1       6851    7420    +       18344
Chr1       7420    7661    +       25920
Chr1       7661    7811    +       36243
Chr1       7811    8015    +       15344
Chr1       8015    8071    +       16029
Chr1       8071    8105    +       35682
Chr1       8105    9779    +       49500
Chr1       9779    9806    +       7184
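
A result file in this format can be read with a few lines of Python. The sketch below is illustrative only: `Region`, `parse_result`, and `index_frequency` are hypothetical names, and it assumes that sequence identifiers contain no whitespace. It counts in how many distinct sequences each conserved region index appears.

```python
from collections import Counter, namedtuple

# One row of the result file: identifier, start, end, strand, region index.
Region = namedtuple("Region", "seq_id start end strand index")


def parse_result(lines):
    """Parse result rows; blank or malformed lines are skipped."""
    regions = []
    for line in lines:
        fields = line.split()  # handles tabs as well as runs of spaces
        if len(fields) != 5:
            continue
        seq_id, start, end, strand, index = fields
        regions.append(Region(seq_id, int(start), int(end), strand, int(index)))
    return regions


def index_frequency(regions):
    """Count the number of distinct sequences in which each region index occurs."""
    seen = {(r.seq_id, r.index) for r in regions}
    return Counter(index for _, index in seen)


if __name__ == "__main__":
    rows = [
        "Chr1\t0\t3250\t+\t340",
        "Chr1\t3250\t6851\t+\t41",
        "Chr2\t10\t3260\t-\t340",  # made-up row, added only for illustration
    ]
    regions = parse_result(rows)
    print(index_frequency(regions)[340])  # 2
```

For a real run, pass an open file instead of the list, for example `parse_result(open("result.tab"))`.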

Citation

To cite our work, please refer to:

xxx
