

Enterococcal secondary metabolome analysis

This repository accompanies the biosynthetic gene cluster (BGC) analysis of Enterococcal species associated with HCT patients.

License

Copyright (C) 2017, Robin Shields-Cutler and Gabe Al-Ghalith

These programs are free software: you can redistribute them and/or modify
them under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This content is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Methods relevant to the analysis

These analyses were carried out, and have only been tested, on Mac OS X 10.11.6, except where noted below.

Genome files were downloaded via FTP from the NCBI Reference Sequence Database (RefSeq). The strain mapping data file in the data directory of this repo lists the RefSeq ID and annotated strain name for each assembly. Note that not all assemblies are "complete"-level genomes.

A local Linux server installation of antiSMASH v3.0 was used to predict biosynthetic gene clusters (BGCs) [1], using the following basic command-line structure:

run_antismash.py [genome_fna] \
    --outputfolder [results_dir] \
    --inclusive --clusterblast --asf --disable-BioSQL \
    --disable-svg --disable-embl --disable-write_metabolicmodel \
    --disable-xls --disable-html --disable-BiosynML
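A small Python driver that runs the command above once per genome might look like the sketch below. It is not the authors' pipeline: the `*.fna` glob, the one-folder-per-genome output layout, and the helper names `build_antismash_command` and `run_all` are assumptions. The flags are copied unchanged from the command above, and `run_antismash.py` is assumed to be on the PATH of the Linux server.

```python
import subprocess
from pathlib import Path

# Flags copied from the antiSMASH v3.0 command line shown above.
ANTISMASH_FLAGS = [
    "--inclusive", "--clusterblast", "--asf", "--disable-BioSQL",
    "--disable-svg", "--disable-embl", "--disable-write_metabolicmodel",
    "--disable-xls", "--disable-html", "--disable-BiosynML",
]


def build_antismash_command(genome_fna, results_root):
    """Return the argument list for one genome (hypothetical helper)."""
    genome = Path(genome_fna)
    # Assumed layout: one output folder per genome, named after the file stem.
    outdir = Path(results_root) / genome.stem
    return ["run_antismash.py", str(genome), "--outputfolder", str(outdir), *ANTISMASH_FLAGS]


def run_all(genome_dir, results_root, dry_run=True):
    """Build (and, unless dry_run, execute) one antiSMASH job per .fna file."""
    commands = [
        build_antismash_command(fna, results_root)
        for fna in sorted(Path(genome_dir).glob("*.fna"))
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # only show what would be run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing genome
    return commands
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with unusual file names, and `dry_run=True` lets you inspect the generated commands before starting long antiSMASH jobs.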

The extracted amino acid coding sequences were concatenated into a single file and compared "all-vs-all" using the BLAST command-line tools installed through Anaconda. We built a custom BLAST protein database from these amino acid sequences and queried it with the same set of sequences, using an e-value cutoff of 7 × 10^-10.

Custom C software was then used to evaluate the identity and compositional similarity between every pair of BGCs. This produced an all-vs-all square matrix in which each row and column is a single BGC, and each value is the identity score scaled by the amount of homologous gene overlap. A perfect self-self match therefore scores 100, while very dissimilar pathways would score 0.
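The all-vs-all search itself can be reproduced with standard BLAST+ commands, for example `makeblastdb -in all_bgc_proteins.faa -dbtype prot -out bgc_db` followed by `blastp -query all_bgc_proteins.faa -db bgc_db -evalue 7e-10 -outfmt 6 -out all_vs_all.tsv` (file names are placeholders). The authors' C scoring program is not described in detail, so the Python sketch below is only one plausible reading of "identity scaled by homologous gene overlap", not their method. It assumes protein IDs of the hypothetical form `<bgc_id>__<gene_id>` and that the gene count of each BGC is known.

```python
from collections import defaultdict

EVALUE_CUTOFF = 7e-10  # e-value cutoff stated in the methods text


def parse_blast_tabular(lines, evalue_cutoff=EVALUE_CUTOFF):
    """Parse BLAST+ tabular output (-outfmt 6) into {(query, subject): best percent identity}.

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, evalue = float(fields[2]), float(fields[10])
        if evalue > evalue_cutoff:
            continue  # drop hits weaker than the cutoff
        best[(query, subject)] = max(best.get((query, subject), 0.0), pident)
    return best


def bgc_of(protein_id):
    """Hypothetical ID convention: '<bgc_id>__<gene_id>'."""
    return protein_id.split("__", 1)[0]


def similarity_matrix(best_hits, genes_per_bgc):
    """Hypothetical overlap-scaled identity score (not the authors' C code).

    For BGCs A and B: sum each A gene's best identity to any gene in B, divide
    by the number of genes in A, then average with the B->A value. A cluster
    compared with itself scores 100 (every gene hits itself at 100% identity);
    clusters sharing no homologous genes score 0.
    """
    gene_best = defaultdict(float)  # (query gene, subject BGC) -> best identity
    for (query, subject), pident in best_hits.items():
        key = (query, bgc_of(subject))
        gene_best[key] = max(gene_best[key], pident)

    summed = defaultdict(float)  # (BGC A, BGC B) -> summed best identities of A's genes
    for (query, subject_bgc), pident in gene_best.items():
        summed[(bgc_of(query), subject_bgc)] += pident

    names = sorted(genes_per_bgc)
    matrix = [
        [
            (summed[(a, b)] / genes_per_bgc[a] + summed[(b, a)] / genes_per_bgc[b]) / 2
            for b in names
        ]
        for a in names
    ]
    return names, matrix
```

Averaging the two directions keeps the matrix symmetric even when the clusters differ in size, and dividing by the full gene count (not just the matched genes) is what scales identity down when only part of a pathway is shared.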

The matrix was then converted to long format in R, de-replicated, annotated using the strain map described above, and filtered at a fixed similarity threshold with a custom Python script. The resulting table was used to build the networks in Cytoscape v3.4.0.
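The reshaping, de-replication, annotation, and threshold steps can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' R/Python scripts: the function names, the `strain_of` dictionary (standing in for the strain map), and the column names are assumptions, and the threshold is left as a parameter because the value used is not stated here.

```python
import csv
import io


def matrix_to_edges(names, matrix, strain_of, threshold):
    """Turn a square BGC similarity matrix into a de-replicated, annotated edge list.

    names     -- BGC identifiers in row/column order
    matrix    -- list of rows; matrix[i][j] is the 0-100 score of names[i] vs names[j]
    strain_of -- hypothetical dict mapping BGC id -> strain name (from the strain map)
    threshold -- keep only pairs scoring at or above this value
    """
    edges = []
    for i, a in enumerate(names):
        # Visiting only j > i drops self-self pairs and the duplicate B-A copy of each A-B pair.
        for j in range(i + 1, len(names)):
            b = names[j]
            score = max(matrix[i][j], matrix[j][i])  # tolerate a slightly asymmetric matrix
            if score >= threshold:
                edges.append((a, strain_of.get(a, "unknown"), b, strain_of.get(b, "unknown"), score))
    return edges


def write_edge_table(edges, handle):
    """Write a tab-separated edge table that Cytoscape can import as a network file."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "source_strain", "target", "target_strain", "score"])
    writer.writerows(edges)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from this study.
    demo_names = ["c1", "c2", "c3"]
    demo_matrix = [[100, 60, 10], [60, 100, 0], [10, 0, 100]]
    out = io.StringIO()
    write_edge_table(matrix_to_edges(demo_names, demo_matrix, {"c1": "strain A"}, threshold=50), out)
    print(out.getvalue(), end="")
```

In Cytoscape, such a file can be loaded as a network from a table, mapping `source` and `target` to the node columns and keeping `score` and the strain columns as edge attributes for styling.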


References:
1 Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA, Müller R, Wohlleben W, Breitling R. (2015). antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic acids research, 43(W1), W237-W243.
