serka-m / mmlong2-lite

Lightweight bioinformatics pipeline for microbial genome recovery

License: GNU General Public License v3.0

Languages: Python 82.08%, Shell 17.92%
Topics: bioinformatics, long-read-sequencing, metagenome-assembled-genomes, metagenomics, microbial-genomics, microbiome, nanopore, pacbio, singularity-container, snakemake-workflow

Lightweight workflow for microbial genome recovery using either Nanopore or PacBio HiFi reads.
mmlong2-lite is the microbial genome production part of the mmlong2 pipeline.

Core workflow features:

  • Snakemake workflow running dependencies from a Singularity container for enhanced reproducibility
  • Bioinformatics tool and parameter optimizations for high-complexity metagenomic samples
  • Circular microbial genome extraction as separate genome bins
  • Eukaryotic contig removal for reduced microbial genome contamination
  • Differential coverage support for improved microbial genome recovery
  • Iterative ensemble binning strategy for improved microbial genome recovery

[Figure: overview of the mmlong2-lite workflow with Nanopore reads]

Installation (Conda):

To create a local Conda environment for running the mmlong2-lite workflow, copy and paste the following commands:

conda create --prefix mmlong2-lite -c conda-forge -c bioconda snakemake=7.26.0 singularity=3.8.6 zenodo_get=1.3.4 pv=1.6.6 pigz=2.6 tar=1.34 -y
conda activate ./mmlong2-lite || source activate ./mmlong2-lite && zenodo_get -r 8013498 -o mmlong2-lite/bin 
pv mmlong2-lite/bin/sing-mmlong2-lite*.tar.gz | pigz -dc - | tar xf - -C mmlong2-lite/bin/. && chmod +x mmlong2-lite/bin/mmlong2-lite
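
After the commands above finish, a quick sanity check can confirm that the launcher script was unpacked and marked executable. The helper below is a minimal sketch, not part of mmlong2-lite: the function name `check_mmlong2_lite` is hypothetical, and it assumes the environment prefix is `./mmlong2-lite` as in the install commands.

```shell
# Hypothetical helper: verify that the mmlong2-lite launcher exists and is executable.
# The first argument is the Conda prefix; it defaults to ./mmlong2-lite (the prefix used above).
check_mmlong2_lite() {
    launcher="${1:-./mmlong2-lite}/bin/mmlong2-lite"
    if [ -x "$launcher" ]; then
        echo "found: $launcher"
    else
        echo "missing or not executable: $launcher" >&2
        return 1
    fi
}
```

If the check passes, `mmlong2-lite --version` or `mmlong2-lite --help` (both listed under "Full usage") should respond once the environment is activated.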

Full usage:

MAIN INPUTS:
-np     --nanopore_reads        Path to Nanopore reads (default: none)
-pb     --pacbio_reads          Path to PacBio HiFi reads (default: none)
-o      --output_dir            Output directory name (default: mmlong2)
-p      --processes             Number of processes/multi-threading (default: 3)
-cov    --coverage              CSV dataframe for differential coverage binning (e.g. NP/PB/IL,/path/to/reads.fastq)
-run    --run_until             Run pipeline until a specified stage completes (e.g. assembly polishing)

ADDITIONAL INPUTS:
-tmp    --temporary_dir         Directory for temporary files (default: none)
-med    --medaka_model          Medaka polishing model (default: r1041_e82_400bps_sup_v4.2.0)
-sem    --semibin_model         Binning model for SemiBin (default: global)
-fmo    --flye_min_ovlp         Minimum overlap between reads used by Flye assembler (default: auto)
-fmc    --flye_min_cov          Minimum initial contig coverage used by Flye assembler (default: 3)
-mlc    --min_len_contig        Minimum assembly contig length (default: 3000)
-mlb    --min_len_bin           Minimum genomic bin size (default: 250000)
-x      --extra_inputs          Extra inputs for Snakemake workflow (default: none)

MISCELLANEOUS INPUTS:
-h      --help                  Print help information
-v      --version               Print workflow version number
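
As an illustration of how these options might combine, the sketch below writes a coverage sheet for `--coverage` and then starts a Nanopore run. Several details are assumptions rather than documented behavior: the sheet is taken to hold one `<type>,<path>` row per additional read set with no header (following the `NP/PB/IL,/path/to/reads.fastq` hint above), `IL` is assumed to denote Illumina short reads, and all paths and the output name `mmlong2_run1` are placeholders.

```shell
# Write a coverage sheet: one "<type>,<path>" row per additional read set (assumed layout).
# NP = Nanopore, IL = assumed to mean Illumina; the paths are placeholders.
cat > coverage.csv <<'EOF'
NP,/path/to/nanopore_sample2.fastq
IL,/path/to/illumina_sample2.fastq
EOF

# Start the workflow only if the launcher is on PATH (i.e. the Conda environment is active).
if command -v mmlong2-lite >/dev/null 2>&1; then
    mmlong2-lite -np /path/to/nanopore_sample1.fastq -o mmlong2_run1 -p 16 -cov coverage.csv
fi
```

For PacBio HiFi data, `-pb` would replace `-np`; the remaining flags are the ones listed in the usage text.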

Overview of result files:

  • <output_name>_assembly.fasta - assembled and polished metagenome
  • <output_name>_bins.tsv - dataframe for automated binning results
  • dependencies.csv - list of dependencies used and their versions
  • bins - directory for metagenome assembled genomes
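
A finished run can be inspected with a few standard shell tools. The function below is a rough sketch under stated assumptions: `summarize_results` is a hypothetical name, the directory holding the result files is passed in explicitly because its location inside the output directory is not described here, and the file names follow the list above.

```shell
# Hypothetical helper: summarize the result files listed above.
#   $1 = directory holding the result files (location is an assumption; pass it explicitly)
#   $2 = the <output_name> prefix used in the file names
summarize_results() {
    results_dir="$1"
    name="$2"
    # Count the entries in the bins directory (one entry per recovered genome bin is assumed).
    n_bins=$(ls -A "$results_dir/bins" | wc -l | tr -d ' ')
    echo "bins: $n_bins"
    # Preview the first rows of the binning results table.
    head -n 5 "$results_dir/${name}_bins.tsv"
}
```

Called as `summarize_results /path/to/results mmlong2`, it prints the bin count followed by the first lines of the binning table.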
