riverbacterialassembly's Introduction

Bacterial Genome Assembly

This project uses Snakemake to assemble and annotate a bacterial genome from nanopore reads.

Description

This workflow uses the following tools to assemble and annotate a bacterial genome:

  • Filtlong (v0.2.1) for read filtering.
  • Flye (v2.9.3) for genome assembly.
  • raven (v1.8.3) for genome assembly.
  • medaka (v1.11.3) for polishing the assemblies.
  • seqkit (v2.7) to get statistics of the assemblies.
  • bakta (v1.9.2) for genome annotation.
  • checkm2 (v1.0.1) to estimate the quality (completeness and contamination) of the genome.

At the end of the run, the assemblies folder contains two assemblies of your genome, one from flye and one from raven. The best assembly is chosen from the checkm2 results by comparing the completeness of the two genomes with this formula: perc_diff = (max(completeness) - min(completeness)) * 100 / max(completeness). If the difference in completeness is less than 5%, the assembly with the lowest number of contigs is chosen. If the difference is greater than 5%, the assembly with the highest completeness is chosen.
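The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the workflow's actual code: the `AssemblyQC` class and the function names are hypothetical, and two edge cases the text leaves open are resolved by assumption (a difference of exactly 5% falls into the "highest completeness" branch, and two assemblies at 0% completeness are treated as equal).

```python
from dataclasses import dataclass


@dataclass
class AssemblyQC:
    """Hypothetical container for the values the rule needs."""
    name: str            # assembler name, e.g. "flye" or "raven"
    completeness: float  # checkm2 completeness, in percent
    contigs: int         # number of contigs in the assembly


def perc_diff(a: float, b: float) -> float:
    """Completeness difference as a percentage of the larger value."""
    hi, lo = max(a, b), min(a, b)
    if hi == 0:
        return 0.0  # assumption: avoid dividing by zero when both are 0%
    return (hi - lo) * 100 / hi


def choose_assembly(x: AssemblyQC, y: AssemblyQC,
                    threshold: float = 5.0) -> AssemblyQC:
    """Apply the rule described above to two assemblies."""
    if perc_diff(x.completeness, y.completeness) < threshold:
        # Completeness is similar: prefer the less fragmented assembly.
        return min((x, y), key=lambda a: a.contigs)
    # Completeness differs clearly (assumption: this includes exactly 5%).
    return max((x, y), key=lambda a: a.completeness)


if __name__ == "__main__":
    flye = AssemblyQC("flye", completeness=99.2, contigs=3)
    raven = AssemblyQC("raven", completeness=98.9, contigs=1)
    print(choose_assembly(flye, raven).name)
```

With the made-up values in the example, the completeness difference is about 0.3%, so the assembly with fewer contigs (raven) is selected.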

Setup

  1. Clone this repository to your local machine.
  2. Navigate to the project directory.
  3. Install miniconda.
  4. Create a new conda environment to install the snakemake dependencies:
conda create -n snakemake_env -c conda-forge -c bioconda snakemake mamba
  5. Download the required databases:
  • bakta
  • checkm2

Configuration

Open the config/config.yaml file and modify the parameters according to your needs.

  • work_dir: The location of this folder.
  • results_dir: The directory where the results will be saved.
  • env_dir: The directory containing the conda environment files used to create new environments.
  • reads_dir: The location where the raw reads are stored in .fastq.gz format.
  • threads: The number of threads to use for the assembly and annotation steps.
  • reads_size: The minimum read size used for filtering.
  • bakta_db: The location of the bakta database.
  • checkm2_db: The location of the checkm2 database.
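Before launching a run, it can help to confirm that every parameter listed above is set. The following is a minimal sketch, assuming the configuration has already been loaded into a Python dictionary (for example with a YAML parser); `REQUIRED_KEYS` and `missing_keys` are hypothetical names and are not part of the workflow.

```python
# Parameter names taken from the list above.
REQUIRED_KEYS = (
    "work_dir", "results_dir", "env_dir", "reads_dir",
    "threads", "reads_size", "bakta_db", "checkm2_db",
)


def missing_keys(config: dict) -> list:
    """Return the required parameters that are absent or left empty."""
    return [
        key for key in REQUIRED_KEYS
        if key not in config or config[key] in ("", None)
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "work_dir": "/path/to/this/folder",
        "results_dir": "/path/to/results",
        "env_dir": "/path/to/envs",
        "reads_dir": "/path/to/reads",
        "threads": 8,
        "reads_size": 1000,
        "bakta_db": "",  # left empty on purpose
    }
    print(missing_keys(example))
```

In this example the function reports `bakta_db` (empty) and `checkm2_db` (absent), in the order the parameters are listed.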

If you use slurm, open the profile/config.yaml file and modify or add the following parameters as needed:

  • qos
  • partition
  • account

Running the Workflow

To run the Snakemake workflow using slurm, execute the following command in your terminal:

conda activate snakemake_env
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --rerun-incomplete --profile profile/ 

Without slurm, you can run the workflow using the following command:

conda activate snakemake_env
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --rerun-incomplete
