imanyass / shigapass

An in silico tool to predict Shigella serotypes

License: GNU General Public License v3.0

Shell 100.00%
genotyping shigella genomics whole-genome-sequencing insilico-prediction

shigapass's Introduction

ShigaPass

ShigaPass is a new in silico tool that predicts Shigella serotypes and differentiates between Shigella, EIEC (enteroinvasive E. coli), and non-Shigella/EIEC using assembled whole genomes.

Dependencies

ShigaPass is a command-line tool written in Bash (version 4.4.20) and requires BLAST+ (version 2.12.0) to run.
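
Before running the tool, it may help to confirm that the dependencies are reachable. The sketch below is not part of ShigaPass: the helper name missing_tools is hypothetical, and checking for blastn specifically is an assumption (the help text only names makeblastdb).

```shell
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

echo "Bash version: ${BASH_VERSION:-unknown}"

# blastn is assumed here; makeblastdb is the utility named in the help text.
missing=$(missing_tools blastn makeblastdb | tr '\n' ' ')
if [ -n "$missing" ]; then
    echo "Missing from PATH: $missing"
else
    blastn -version | head -n 1    # first line reports the BLAST+ version
fi
```

If anything is reported as missing, install BLAST+ and re-run the check before continuing.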

Installation

1. Clone this repository with the following command line:

git clone https://github.com/imanyass/ShigaPass.git

2. Make ShigaPass.sh executable (run this from the directory that contains the script):

chmod +x ShigaPass.sh

3. Run ShigaPass using the following command-line pattern:

./ShigaPass.sh [options]

Usage

Run ShigaPass without any options to display the following help message:

###### This tool is used to predict Shigella serotypes  #####
        Usage : ShigaPass.sh [options]
   
        options :
        -l	List of input file(s) (FASTA) with their path(s) (mandatory)
        -o	Output directory (mandatory)
        -p	Path to databases directory (mandatory)
        -t	Number of threads (optional, default: 2)
        -u	Call the makeblastdb utility for databases initialisation (optional, but required when running the script for the first time)
        -k	Do not remove subdirectories (optional)
       	-v	Display the version and exit
        -h	Display this help and exit
        Example: ShigaPass.sh -l list_of_fasta.txt -o ShigaPass_Results -p ShigaPass/ShigaPass_DataBases -t 4 -u -k
        Please note that the -u option should be used when running the script for the first time and after databases updates
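
The -u option is the documented way to initialise the databases. For readers who want to see roughly what such an initialisation involves, here is a minimal sketch of a manual equivalent. It assumes every database is a nucleotide FASTA file ending in .fasta and that default makeblastdb settings are sufficient; the options ShigaPass itself passes are not shown in this README and may differ. The function name index_databases and the DRY_RUN variable are hypothetical.

```shell
#!/usr/bin/env bash
# Build a BLAST nucleotide index for every *.fasta file under a directory.
index_databases() {
    # $1: path to the databases directory (the value you would give to -p)
    find "$1" -type f -name '*.fasta' | sort | while IFS= read -r fasta; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print the command that would be run.
            echo "makeblastdb -in $fasta -dbtype nucl"
        else
            makeblastdb -in "$fasta" -dbtype nucl
        fi
    done
}

# Example (prints the commands without running them):
# DRY_RUN=1 index_databases ShigaPass/ShigaPass_DataBases
```

In normal use, prefer -u so that the indexes are built the way the script expects.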

Example

  • The FASTA sequence files are available in the directory Example/Input

    • Unzip the sequences (using gunzip) before running ShigaPass
  • All output files are available in the directory Example/ShigaPass_Results
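
The unzip step can be sketched as a small loop. This assumes the example sequences carry a .gz suffix; the function name unzip_inputs is hypothetical. It uses gzip -dc (equivalent to gunzip -c) so that the compressed originals are kept.

```shell
#!/usr/bin/env bash
# Decompress every .gz file in a directory, keeping the compressed originals.
unzip_inputs() {
    # $1: directory containing gzip-compressed FASTA files
    for gz in "$1"/*.gz; do
        [ -e "$gz" ] || continue        # nothing to do if no .gz files exist
        gzip -dc "$gz" > "${gz%.gz}"    # write the file name minus its .gz suffix
    done
}

# Example:
# unzip_inputs Example/Input
```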

Running ShigaPass

Create a list file containing the paths to the FASTA files, then run ShigaPass:

ShigaPass.sh -l ShigaPass_test.txt -o ShigaPass_Results -p ShigaPass_DataBases -u -k
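
One way to build the list file is sketched below. It assumes one path per line and a .fasta file extension, neither of which is stated explicitly above; the function name make_fasta_list is hypothetical.

```shell
#!/usr/bin/env bash
# Write the absolute paths of all *.fasta files in a directory to a list file.
make_fasta_list() {
    # $1: directory holding the assemblies; $2: list file to create
    find "$(cd "$1" && pwd)" -maxdepth 1 -type f -name '*.fasta' | sort > "$2"
}

# Example, followed by the run command shown above:
# make_fasta_list Example/Input ShigaPass_test.txt
# ShigaPass.sh -l ShigaPass_test.txt -o ShigaPass_Results -p ShigaPass_DataBases -u -k
```

Absolute paths are used so that the list stays valid regardless of the directory ShigaPass is launched from.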

Here is an example of a ShigaPass summary file:

| Name | rfb | rfb_hits,(%) | MLST | fliC | CRISPR | ipaH | Predicted_Serotype | Predicted_FlexSerotype | Comments |
| ERR5888634 | C2 | 79,(48.2%) | ST145 | ShH57(ShH3cplx) | A-var2 | ipaH+ | SB2 | | |
| ERR5952732 | B1-5 | 139,(93.3%) | ST245 | ShH2(ShH2cplx) | A-var3,x,16 | ipaH+ | SF1-5 | 1b | |
| ERR5976293 | D | 202,(70.6%) | ST152 | ShH25(ShH1cplx) | A-var0,27 | ipaH+ | SS | | |
| ERR5982186 | A2 | 100,(61.7%) | ST147 | none | A-var1,12,3,5,11-var1 | ipaH+ | SD2 | | |

"none" means that no allele/profile was detected (in the ERR5982186 example, no fliC allele was detected).

SB: S. boydii; SD: S. dysenteriae; SF: S. flexneri; SS: S. sonnei
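
To pull a single column out of the summary, something like the following sketch could be used. It assumes the file is semicolon-delimited with the header names shown in the example above; the function name summary_column is hypothetical.

```shell
#!/usr/bin/env bash
# Print "Name<TAB>value" for one named column of a semicolon-delimited summary.
summary_column() {
    # $1: summary file; $2: header of the wanted column
    awk -F';' -v col="$2" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { print $1 "\t" $c }
    ' "$1"
}

# Example:
# summary_column ShigaPass_Results/ShigaPass_summary.csv Predicted_Serotype
```

Looking the column up by name rather than by position keeps the helper working if columns are added or reordered.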

Output Files

  • In the output directory, two files will be written:
    1. ShigaPass_summary.csv: a semicolon-delimited file with one row per genome, including the sample name; the rfb type; the number of rfb hits and the percentage of rfb coverage; the MLST profile; the fliC type; the CRISPR spacers; the presence of ipaH; the predicted serotype and S. flexneri subserotype; and comments giving the number of rfb types when more than one is detected
    2. ShigaPass_Flex_summary.csv: a semicolon-delimited file detailing the phage- and plasmid-encoded O-antigen modification (POAC) genes detected in the genomes predicted as S. flexneri
  • If the -k option is used, a directory is created for each assembled genome and contains the following files:
| Extension | Description |
| blastout.txt | BLAST results in tabular format |
| allrecords.txt | BLAST hits that passed the selected thresholds |
| records.txt | The best BLAST hit that passed the selected thresholds |
| hits.txt | Name and number of hits that passed the selected thresholds (only for k-mer databases: rfb, ipaH and POAC genes) |
| hitscoverage.txt | In addition to the name and number of hits listed in hits.txt, gives the total number of hits for the identified gene (3rd column) and the percentage of hits detected (number of hits detected / total number of hits) (4th column) |
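
Because the summary files are semicolon-delimited and some cells contain commas, replacing semicolons with commas would split those cells. A tab-delimited copy avoids this. The sketch below assumes no cell contains a tab or a semicolon; the function name summary_to_tsv is hypothetical.

```shell
#!/usr/bin/env bash
# Convert a semicolon-delimited summary to tab-delimited text on stdout.
summary_to_tsv() {
    # $1: semicolon-delimited summary file
    # Assigning $1 = $1 makes awk rebuild each row with the new separator.
    awk 'BEGIN { FS = ";"; OFS = "\t" } { $1 = $1; print }' "$1"
}

# Example:
# summary_to_tsv ShigaPass_Results/ShigaPass_summary.csv > ShigaPass_summary.tsv
```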

Notes

The FASTA sequences were assembled using SPAdes version 3.15 (Bankevich et al. Journal of Computational Biology, 2012) with the following options: -k 21,33,55,77 --only-assembler --careful --cov-cutoff auto
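
As an illustration, the options above could be combined into a paired-end SPAdes call as in the sketch below. The read file names, the output directory, the function name run_spades and the DRY_RUN variable are hypothetical; only the assembly options come from the note above.

```shell
#!/usr/bin/env bash
# Assemble one paired-end sample with the SPAdes options listed above.
run_spades() {
    # $1, $2: paired-end read files; $3: output directory (hypothetical names)
    set -- spades.py -1 "$1" -2 "$2" -o "$3" \
        -k 21,33,55,77 --only-assembler --careful --cov-cutoff auto
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}

# Print the command for a hypothetical sample without running SPAdes:
DRY_RUN=1 run_spades sample_1.fastq.gz sample_2.fastq.gz sample_spades
```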

You can download the short reads using the following commands:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR588/004/ERR5888634/ERR5888634_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR588/004/ERR5888634/ERR5888634_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR595/002/ERR5952732/ERR5952732_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR595/002/ERR5952732/ERR5952732_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR597/003/ERR5976293/ERR5976293_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR597/003/ERR5976293/ERR5976293_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR598/006/ERR5982186/ERR5982186_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR598/006/ERR5982186/ERR5982186_2.fastq.gz
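
The eight URLs above follow a regular pattern, so they can also be generated in a loop. The sketch below reproduces the pattern visible in those URLs for 10-character run accessions; treat it as an assumption for accessions of any other length. The function name ena_fastq_url is hypothetical.

```shell
#!/usr/bin/env bash
# Build the FASTQ URL for a 10-character run accession and a read mate.
ena_fastq_url() {
    # $1: run accession (e.g. ERR5888634); $2: read mate (1 or 2)
    prefix=$(printf '%s' "$1" | cut -c1-6)    # first six characters, e.g. ERR588
    last=$(printf '%s' "$1" | cut -c10)       # final digit, e.g. 4
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s_%s.fastq.gz\n' \
        "$prefix" "$last" "$1" "$1" "$2"
}

# Print the URLs for the four example genomes.
for acc in ERR5888634 ERR5952732 ERR5976293 ERR5982186; do
    for mate in 1 2; do
        ena_fastq_url "$acc" "$mate"
    done
done
```

Piping the printed URLs to `xargs -n 1 wget` would download them one by one.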

All reads were filtered with FqCleanER version 3.0 (https://gitlab.pasteur.fr/GIPhy/fqCleanER) using the options -q 15 -l 50.

shigapass's Issues

instructions for indexing database files?

Hi, I was hoping to test out ShigaPass and when I went to run the script, it threw an error that no index files were found.

### ipaH checkpoint ###
BLAST Database error: No alias or index file found for nucleotide database [SCRIPT/ShigaPass_DataBases//IPAH/ipaH_150-mers.fasta] in search path [/home/curtis_kapsak/github/ShigaPass::]

Could you provide instructions on how to index the databases? I assume I can use the makeblastdb command, but wasn't sure what options should be used, if any.

It would be helpful to have these instructions on the main README.md.

Thank you

feature request - tab delimited output

Hi, sorry for flooding your inbox, but I wanted to raise another request.

Is it possible to add the ability to output the summary file as a tab-delimited/TSV file?

Currently, with v1.5.0 of ShigaPass, the summary output file is semicolon (;) delimited, which requires some extra steps to open in typical spreadsheet viewers like MS Excel.

I tried converting semicolons to commas, but realized that some of the cells contain commas, so when opened in Excel it added columns where it should not have.

Another idea is to output as JSON format, but that would likely require more work to implement.

Thank you
