ggCaller: a bacterial gene caller for pangenome graphs

ggCaller traverses Bifrost graphs constructed from bacterial genomes to identify putative gene sequences, known as open reading frames (ORFs).

ggCaller incorporates Balrog to filter ORFs to improve specificity of calls and Panaroo for pangenome analysis and quality control.

Documentation

Guides for installation, usage and a tutorial can be found here.

Installation

ggCaller is available on Linux. If you are running Windows 10/11, Linux can be installed via the Windows Subsystem for Linux (WSL).

A macOS version is planned for a future release.

Installation via conda/mamba

Install through bioconda:

conda install ggcaller

If conda is not installed, first install miniconda, then add the correct channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
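Before running the commands above, it can help to confirm that conda is actually on your PATH. The snippet below is a minimal sketch of such a check; `have_tool` is a hypothetical helper name introduced here for illustration and is not part of ggCaller or conda.

```shell
# Hypothetical helper: succeeds if the named program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool conda; then
    echo "conda found; the bioconda install should work once the channels are configured"
else
    echo "conda not found; install miniconda first"
fi
```

The same helper can be reused after installation, for example `have_tool ggcaller`, to confirm that the executable ended up on your PATH.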

Installing with Docker

First, install Docker for your OS. If running with WSL2, you should still download Docker Desktop for Windows.

Then pull the latest image:

docker pull samhorsfield96/ggcaller:latest

To run ggCaller from the test directory:

cd test && docker run --rm -it -v $(pwd):/workdir -v $(pwd):/data samhorsfield96/ggcaller:latest ggcaller --balrog-db /app/ggc_db --refs /workdir/pneumo_CL_group2_docker.txt --out /workdir/ggc_out
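The one-line command above is dense, so here is the same invocation split into parts. This is only a sketch: the function name `ggcaller_docker` and the `DRY_RUN` and `host_dir` variables are illustrative and do not come from ggCaller. It assumes you are already in the test directory. With `DRY_RUN` set, the command is printed rather than executed, so it can be inspected without Docker.

```shell
image="samhorsfield96/ggcaller:latest"
host_dir="$(pwd)"   # host directory that the container will see

ggcaller_docker() {
    # If DRY_RUN is set and non-empty, print the command instead of running it.
    # --rm removes the container on exit; -it attaches an interactive terminal.
    # Both -v flags mount the same host directory, as /workdir and as /data.
    ${DRY_RUN:+echo} docker run --rm -it \
        -v "${host_dir}:/workdir" \
        -v "${host_dir}:/data" \
        "$image" ggcaller "$@"
}

# Dry run of the example above: paths are given as seen inside the container.
DRY_RUN=1 ggcaller_docker --balrog-db /app/ggc_db \
    --refs /workdir/pneumo_CL_group2_docker.txt \
    --out /workdir/ggc_out
```

Because the host directory is mounted at /workdir, the output written to /workdir/ggc_out inside the container should appear as ggc_out in the directory the command was run from.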

Installation from source

Required packages and versions can be found in environment_linux.yml and environment_macOS.yml depending on your operating system. In addition, a C++17 compiler (e.g. gcc >=7.3) is required.

For example, using conda (creates the ggc_env environment):

conda env create -f environment_linux.yml
conda activate ggc_env
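The C++17 compiler requirement mentioned above (gcc >=7.3) can be checked before building. The snippet below is a rough sketch, not part of the ggCaller build. `version_ge` is a hypothetical helper that relies on `sort -V` from GNU coreutils, and the check assumes the compiler in use is invoked as `gcc`.

```shell
# Hypothetical helper: true if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="7.3"
if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion gives major.minor.patch on newer gcc; -dumpversion covers older ones.
    found="$(gcc -dumpfullversion -dumpversion 2>/dev/null)"
    if version_ge "$found" "$required"; then
        echo "gcc $found satisfies >= $required"
    else
        echo "gcc $found is older than $required (or its version could not be read)"
    fi
else
    echo "gcc not found on PATH"
fi
```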

Once all required packages are installed, install ggCaller using:

git clone --recursive https://github.com/samhorsfield96/ggCaller
cd ggCaller
python setup.py install

Citation

Please cite the ggCaller pre-print:

Horsfield, S.T., Croucher, N.J., Lees, J.A. "Accurate and fast graph-based pangenome annotation and clustering with ggCaller" bioRxiv 2023.01.24.524926 (2023). doi: https://doi.org/10.1101/2023.01.24.524926

If you use this code, please also cite the dependencies:

DBG building and querying

FM-index generation and querying

Gene scoring and overlap penalisation

Pairwise gene comparisons

Gene annotation

  • DIAMOND: Buchfink B., Reuter K., Drost H.G. "Sensitive protein alignments at tree-of-life scale using DIAMOND", Nature Methods 18:366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x
  • HMMER3: Eddy S.R. "A New Generation of Homology Search Tools Based on Probabilistic Inference." Genome Inform., 23:205-211 (2009).

Alignment, phylogenetic analysis and variant calling

  • MAFFT: Katoh, K., Misawa, K., Kuma, K. & Miyata, T. "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform." Nucleic Acids Research. 30 (14), 3059–3066 (2002). https://doi.org/10.1093/nar/gkf436
  • SNP-sites: Page, A.J., Taylor, B., Delaney, A.J., Soares, J., Seemann, T., Keane, J.A. & Harris, S.R. "SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments." Microbial Genomics. 2 (4), e000056 (2016). https://doi.org/10.1099/mgen.0.000056
  • RapidNJ: Simonsen, M., Pedersen, C. "Rapid computation of distance estimators from nucleotide and amino acid alignments" Proceedings of the ACM Symposium on Applied Computing (2011) https://doi.org/10.1145/1982185.1982208

Clustering and pangenome analysis
