Giter Site home page Giter Site logo

pasha's Introduction

PASHA: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets

PASHA (A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets) is a tool for finding approximations for a small universal hitting set (a small set of k-mers that hits every sequence of length L). A detailed description of the functionality of PASHA, along with results in set size, running time, memory usage and CPU utilization are provided in:

Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).

PASHA is a tool for research and still under development. It is not presented as accurate or free of any errors, and is provided "AS IS", without warranty of any kind.

Installation and setup

Installing the package

Clone the repository to your local machine, and compile via g++ using the commands:

cd src
g++ -O3 -o pasha rand.cpp -fopenmp

The -O3 or -Ofast flag is necessary for efficient parallelization and optimization. The -fopenmp flag is needed to process parallelization via OpenMP.

Example commands

To compute the hitting set for a specified k-mer and sequence length, use the command:

./pasha <kmerlength> <seqlength> <numThreads> <decycPath> <hitPath>

usr@usr:src usr$ ./pasha 8 20 24 decyc8.txt hit.txt

Decycling set size: 8230
Alpha: 2
Delta: 0.0769231
Epsilon: 0.0961538
Max hitting number: 3.71838e+07
Length of longest remaining path: 12
7.72136 seconds.
Hitting set size: 5287

License

PASHA is freely available under the MIT License.

References

If you find PASHA useful, please cite the following papers:

  • Ekim, B., Berger, B., Orenstein, Y. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets. bioRxiv (2020).

  • Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Compact universal k-mer sets. 16th International Workshop on Algorithms in Bioinformatics (2016).

  • Orenstein, Y., Pellow, D., Marçais, G., Shamir, R., Kingsford, C. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing. PLoS Computational Biology (2017).

Contributors

Development of PASHA is led by Barış Ekim, collaboratively in the labs of Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT), and Yaron Orenstein at the Department of Electrical and Computer Engineering at Ben-Gurion University of the Negev (BGU).

Contact

All of the software executables, manuals, and quick-start scripts, in addition to the calculated universal hitting sets are provided here. Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu.

pasha's People

Contributors

ekimb avatar zeynephrc avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.