This project forked from iame-researchcenter/clermontyping

Forked repo of Clermont PCR method In-Silico

License: GNU General Public License v3.0

License: GPL v3

ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping

Introduction

The genus Escherichia is composed of Escherichia albertii, E. fergusonii, five cryptic Escherichia clades and E. coli sensu stricto. Furthermore, the E. coli species can be divided into seven main phylogroups termed A, B1, B2, C, D, E and F. As specific lifestyles and/or hosts can be attributed to these species/phylogroups, their identification is meaningful for epidemiological studies. Classical phenotypic tests fail to identify non-sensu stricto E. coli as well as phylogroups. Clermont and colleagues have developed PCR assays that allow the identification of most of these species/phylogroups, the triplex/quadruplex PCR for E. coli phylogroup determination being the most popular. With the growing availability of whole genome sequences, we have developed the ClermonTyping method and its associated web-interface, the ClermonTyper, that allows a given strain sequence to be assigned to E. albertii, E. fergusonii, Escherichia clades I–V, E. coli sensu stricto as well as to the seven main E. coli phylogroups. The ClermonTyping is based on the concept of in vitro PCR assays and maintains the principles of ease of use and speed that prevailed during the development of the in vitro assays. This in silico approach shows 99.4 % concordance with the in vitro PCR assays and 98.8 % with the Mash genome-clustering tool. The very few discrepancies result from various errors occurring mainly from horizontal gene transfers or SNPs in the primers. We propose the ClermonTyper as a freely available resource to the scientific community at:

http://clermontyping.iame-research.center/.

Dependencies Installation

Dependencies

Installation

...

Command line options

Main script usage

% clermonTyping.sh
Script usage :
        -h                              : print this message and exit
        -v                              : print the version and exit
        --fasta                         : fasta contigs file(s). If multiple files, they must be separated by an arobase (@) value
        --name                          : name for this analysis (optional)
        --threshold                     : option for ClermontTyping, do not use contigs under this size (optional)
        --minimal                       : output a minimal set of files (optional)
        --fastafile                     : file with path of fasta contig file.  One file by line (optional)
        --summary                       : file with path of *_phylogroups.txt. One file by line (optional)

This script runs the whole pipeline (blast, mash and the Python script) and produces either the full output (an HTML file by default) or only the *_phylogroups.txt files (--minimal).

To analyse several fasta files at once, separate them with an @ sign (absolute paths are required):

% clermonTyping.sh --fasta my_ecoli1.fasta@my_ecoli2.fasta@my_ecoli3.fasta

or use a file with the option --fastafile:

% clermonTyping.sh --fastafile fileWithFasta.txt

Example of fileWithFasta.txt:

my_ecoli1.fasta
my_ecoli2.fasta
my_ecoli3.fasta
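
Since the --fasta option expects absolute paths joined by an @ sign, the argument can be assembled programmatically. The following is a minimal Python sketch; the helper name build_fasta_argument is hypothetical and is not part of ClermonTyping.

```python
import os


def build_fasta_argument(paths):
    """Join fasta paths with '@' after making each one absolute.

    Hypothetical helper, not shipped with ClermonTyping.
    """
    absolute_paths = []
    for path in paths:
        # '@' is the separator, so a path containing it would be split wrongly.
        if "@" in path:
            raise ValueError(f"path contains the '@' separator: {path}")
        absolute_paths.append(os.path.abspath(path))
    return "@".join(absolute_paths)


if __name__ == "__main__":
    argument = build_fasta_argument(["my_ecoli1.fasta", "my_ecoli2.fasta"])
    print("clermonTyping.sh --fasta " + argument)
```

Relative names are resolved against the current working directory, so the script should be run from the folder that contains the fasta files.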

Clermont Typing without mash and R

If you do not want to use the mash analysis and/or R, you can launch any part of the pipeline independently.

blastn launch

To launch blast, first locate the primers.fasta file in the data folder of clermonTyping's installation directory. It contains the essential primers for PCR amplification. The blastn output must be written in XML format so that the clermontyping script can read it.

% makeblastdb -in my_fasta.fasta -input_type fasta -out my_fasta -dbtype nucl
% blastn -query ./data/primers.fasta -perc_identity 90 -task blastn -outfmt 5 -db my_fasta -out my_fasta.xml
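
If the two commands above have to be repeated for many genomes, they can be wrapped in a small script. This is a sketch only: the function names blast_commands and run_blast are hypothetical, the flags are copied from the commands shown above, and makeblastdb and blastn are assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def blast_commands(fasta, primers="./data/primers.fasta", perc_identity=90):
    """Return the makeblastdb and blastn argument lists for one fasta file."""
    db_name = Path(fasta).stem  # my_fasta.fasta -> my_fasta
    makeblastdb = [
        "makeblastdb", "-in", str(fasta), "-input_type", "fasta",
        "-out", db_name, "-dbtype", "nucl",
    ]
    blastn = [
        "blastn", "-query", str(primers),
        "-perc_identity", str(perc_identity),
        "-task", "blastn", "-outfmt", "5",
        "-db", db_name, "-out", db_name + ".xml",
    ]
    return makeblastdb, blastn


def run_blast(fasta, **kwargs):
    """Run both commands in order; check=True stops at the first failure."""
    for command in blast_commands(fasta, **kwargs):
        subprocess.run(command, check=True)
```

Calling run_blast("my_fasta.fasta") from the installation directory should leave my_fasta.xml in the current folder, ready for the next step.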

Clermontyping launch

The Python script only accepts blastn output in XML format (option -outfmt 5).

% bin/clermont.py -x my_fasta.xml

Several optional arguments are available to filter the output:

-m/--mismatch <integer> : The maximum number of mismatches allowed in a hit. Default = 2.
-l/--length <integer> : The length of the crucial hybridization fragment (seed). Default = 5.
-s/--min_size <integer> : Minimum size for a hit to be counted. This avoids finding primers in small contigs.
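
To give an idea of what filtering on mismatches and contig size means in practice, here is a simplified Python sketch that reads BLAST XML and keeps only some hits. It is not the logic of clermont.py: the function filter_hits, the hand-written SAMPLE_XML fragment, the min_size default of 0 and the way mismatches are counted (primer length minus identical positions) are all assumptions made for illustration, and the seed check of -l/--length is not modelled.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that mimics the layout of "blastn -outfmt 5" output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>primer_example</Iteration_query-def>
      <Iteration_query-len>20</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_def>contig_1</Hit_def>
          <Hit_len>5000</Hit_len>
          <Hit_hsps><Hsp><Hsp_identity>19</Hsp_identity></Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>contig_2</Hit_def>
          <Hit_len>150</Hit_len>
          <Hit_hsps><Hsp><Hsp_identity>20</Hsp_identity></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def filter_hits(xml_text, max_mismatch=2, min_size=0):
    """Return (primer, contig, mismatches) tuples that pass both filters."""
    kept = []
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        primer = iteration.findtext("Iteration_query-def")
        primer_len = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            contig = hit.findtext("Hit_def")
            # Skip contigs shorter than the requested minimum size.
            if int(hit.findtext("Hit_len")) < min_size:
                continue
            for hsp in hit.iter("Hsp"):
                # Assumption: unaligned primer bases count as mismatches.
                mismatches = primer_len - int(hsp.findtext("Hsp_identity"))
                if mismatches <= max_mismatch:
                    kept.append((primer, contig, mismatches))
    return kept


if __name__ == "__main__":
    for row in filter_hits(SAMPLE_XML, max_mismatch=2, min_size=1000):
        print(row)
```

With min_size=1000 the 150 bp contig is ignored even though its primer match is perfect, which is the kind of spurious hit the -s option is meant to discard.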

Output Files

The default analysis name is analysis_date, and all results are stored in the corresponding folder.

  • analysis.html : final output of the main script pipeline. Gives information about phylogroups obtained with two different methods (mash and clermontyping).
  • analysis_phylogroups.txt : final output of clermontyping
  • analysis.R : intermediate file for producing the html output. You can run this Rscript alone.
  • strain.xml : intermediate file. Goes with the "db" folder. Output of blastn.
  • strain_mash_screen.tab : intermediate file. Output of mash.

HTML output

This is a table in which each row corresponds to one of the fasta files you analyzed.

Phylogroup output

analysis_phylogroups.txt is a TSV file with one line per fasta file analyzed with the blastn + clermont method. Example:

ROAR344_fergusonii.fasta	['trpA', 'trpBA', 'aesI']	['-', '-', '-', '-']	[]	Fergusonii
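
Because some columns are written as Python-style lists, a line of this file can be read back with the standard library. The sketch below is an illustration only: the function parse_phylogroup_line is hypothetical, and it deliberately does not name the columns, since their meaning is not described here.

```python
import ast

EXAMPLE_LINE = (
    "ROAR344_fergusonii.fasta\t['trpA', 'trpBA', 'aesI']\t"
    "['-', '-', '-', '-']\t[]\tFergusonii"
)


def parse_phylogroup_line(line):
    """Split one TSV line and turn list-like fields into real Python lists."""
    parsed = []
    for field in line.rstrip("\n").split("\t"):
        if field.startswith("["):
            # literal_eval only evaluates literals, so it is safe on file content.
            parsed.append(ast.literal_eval(field))
        else:
            parsed.append(field)
    return parsed


if __name__ == "__main__":
    fields = parse_phylogroup_line(EXAMPLE_LINE)
    print(fields[0], fields[-1])
```

In the example line, the first field is the fasta file name and the last one is the assigned group.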

Citing

Please cite:

Beghain, J., Bridier-Nahmias, A., Le Nagard, H., Denamur, E. & Clermont, O. ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping. Microbial Genomics (2018). doi:10.1099/mgen.0.000192

Clermont, O., Dixit, O.V.A., Vangchhia, B., Condamine, B., Dion, S., Bridier-Nahmias, A., Denamur, E. & Gordon, D. Characterization and rapid identification of phylogroup G in Escherichia coli, a lineage with high virulence and antibiotic resistance potential. Environ Microbiol (2019). doi:10.1111/1462-2920.14713
