Home Page: https://doi.org/10.1111/2041-210X.14168

License: MIT License



CATE (CUDA Accelerated Testing of Evolution)

A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large-scale genomic data.


Description

CATE is a CUDA-based solution that rapidly processes large-scale VCF files to conduct six different tests on evolution.

🔵 This page provides only a brief overview of CATE's usability.
🟢 Please refer to CATE's wiki for a more detailed description of its functionality and usage.


News

🔴 CATE is currently undergoing a major update with the integration of Apollo.

Apollo is our high-performance viral epidemic simulation platform, powered by CATE's architecture.

Apollo is already available in CATE through the --simulator or -sim command. Documentation, including the Wiki for Apollo, and a preprint describing the simulation tool, its capabilities, and how to use it are in preparation.


Prerequisites

  1. CUDA-capable hardware
  2. Linux- or UNIX-based kernel
  3. NVIDIA's CUDA toolkit (nvcc compiler)
  4. C++ compiler (gcc compiler)
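
Before compiling, it can help to confirm that the two compilers listed above are reachable on the PATH. The following is a minimal sketch in Python and is not part of CATE; the helper name find_compilers is hypothetical. It only checks that nvcc and gcc can be found, not that the GPU itself is CUDA capable.

```python
import shutil


def find_compilers(names=("nvcc", "gcc")):
    """Map each compiler name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Report which of the required compilers were found (hypothetical helper, not CATE code).
    for name, path in find_compilers().items():
        print(f"{name}: {path if path else 'NOT FOUND'}")
```

If either entry is reported as not found, install the missing toolchain (or load the relevant module on a cluster) before moving on to the installation steps.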

How to INSTALL

CATE can be run from an on-device executable or through Google Colab.

For the Google Colab notebook, please follow the link to CATE on Colab.

Otherwise, to install CATE on-device, you may have to compile the code with the nvcc compiler. If so, execute the following in the terminal:

Download the repository:

git clone "https://github.com/theLongLab/CATE/"
cd CATE/

Load CUDA 11.3.0 or higher (the command below applies to systems that use environment modules):

module load cuda/11.3.0

Finally, compile the project:

nvcc -std=c++17 *.cu *.cpp -o "CATE"

How to RUN

CATE is a command-line-based software. Its available functions include six different tests on evolution and a series of tools for editing and processing FASTA and VCF files.

The six tests on evolution are:

  1. Tajima’s D
  2. Fu and Li's D, D*, F, and F*
  3. Fay and Wu’s H and E
  4. McDonald–Kreitman test
  5. Fixation Index
  6. Extended Haplotype Homozygosity
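
To give a sense of what the first of these statistics measures, here is a small CPU-only sketch of Tajima's D using the standard formulation from Tajima (1989). It is an illustration only and does not reproduce CATE's CUDA implementation; the function names summarize and tajimas_d are hypothetical, and the input is assumed to be a list of equal-length haplotype strings with no missing data.

```python
from itertools import combinations
from math import sqrt


def summarize(haplotypes):
    """Return (n, S, pi): sample size, segregating sites, mean pairwise differences."""
    n = len(haplotypes)
    # A site is segregating if more than one allele is observed in its column.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return n, seg_sites, total_diffs / len(pairs)


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and pairwise diversity pi."""
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    if seg_sites == 0:
        return float("nan")  # D is undefined without any segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two diversity estimators, scaled by its standard deviation.
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


if __name__ == "__main__":
    sample = ["AAGT", "AAGT", "ACGT", "ACGA", "TCGA"]  # toy data, not from CATE
    n, s, pi = summarize(sample)
    print(f"n={n} S={s} pi={pi:.3f} D={tajimas_d(n, s, pi):.3f}")
```

A negative D indicates fewer pairwise differences than the number of segregating sites would predict (an excess of rare variants), while a positive D indicates the opposite.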

CATE comes equipped with Apollo, our viral simulator that spans network-level to individual-virion resolution, complete with within-host dynamics. Apollo provides its main simulation function and five additional utility tools:

  1. Apollo simulator
  2. Haplotype retriever
  3. Pedigree retriever
  4. Segregating sites matcher
  5. Base substitution model to JSON
  6. Recombination hotspots to JSON

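Currently, the program's executable is called:

Test_Main

If you compiled the project yourself with the nvcc command above, the executable takes the name passed to -o (CATE), so substitute that name in the examples that follow.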

To run the software, you need a JSON-style parameters file. An example, parameters.json, is provided in the repository.

The parameters file is used to specify all input and output locations as well as the gene list file locations. Each function's execution can be customized individually using the parameters file.

The typical syntax for program execution is as follows, using either the long or the short form of the function flag (the example further below runs the Tajima's D function):

program_executable --function parameter_file

program_executable -f parameter_file

Example:

./Test_Main -t parameters.json
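
When CATE is called from a larger pipeline, the same invocation can be wrapped in a script. The sketch below is not part of CATE; it assumes only the syntax shown above (executable, function flag, parameter file), and the helper names build_command and run_cate are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(executable, function_flag, parameter_file=None):
    """Assemble the argument list: executable, function flag, then the parameter file."""
    command = [str(executable), function_flag]
    if parameter_file is not None:
        command.append(str(parameter_file))
    return command


def run_cate(executable, function_flag, parameter_file):
    """Run one function; raises CalledProcessError if the process exits non-zero."""
    # Fail early with a clear message instead of launching the program without its input.
    if not Path(parameter_file).is_file():
        raise FileNotFoundError(f"parameter file not found: {parameter_file}")
    command = build_command(executable, function_flag, parameter_file)
    return subprocess.run(command, check=True)
```

With these helpers, the example above corresponds to run_cate("./Test_Main", "-t", "parameters.json").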

The HELP menu lists all available functions and how each one is executed. To access it, pass -h as the function, as shown below:

./Test_Main -h


How to Cite

CATE has been published in the journal Methods in Ecology and Evolution (MEE). If you find this framework or the software useful in your analyses, please cite the published article in MEE, "CATE: A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data".

To cite CATE's code, please use the DOI of the Zenodo release.

The details of the citation are listed below:

Perera, D., Reisenhofer, E., Hussein, S., Higgins, E., Huber, C. D., & Long, Q. (2023). CATE: A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data. Methods in Ecology and Evolution, 00, 1–15. https://doi.org/10.1111/2041-210X.14168.


MIT License

Copyright (c) 2022 The Long Lab

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

