Awesome of awesome chemoinformatics

This repository collects educational courses and related resources, tools, and databases for cheminformatics and computational drug discovery. The list was compiled from the sources listed below.

Sources:

https://github.com/Bin-Chen-Lab/Awesome_BigData_AI_DrugDiscovery

https://github.com/LeeJunHyun/The-Databases-for-Drug-Discovery

https://github.com/lmmentel/awesome-python-chemistry

https://github.com/hsiaoyi0504/awesome-cheminformatics

http://polysearch.cs.ualberta.ca/otherdatabases

https://www.click2drug.org/ - the biggest list of tools and resources for drug discovery

Educational materials

Courses

Learncheminformatics.com - "Cheminformatics: Navigating the world of chemical data" course at Indiana University.
Python for chemoinformatics
TeachOpenCADD - A teaching platform for computer-aided drug design (CADD) using open source packages and data.
Cheminformatics OLCC - Cheminformatics course of the Collaborative Intercollegiate Online Chemistry Course (OLCC) at the University of Arkansas at Little Rock, by Robert Belford.
BigChem - All lectures of BigChem (A Horizon 2020 MSC ITN EID project, which provides innovative education in large chemical data analysis.)
Molecular modeling course - by Dr. Jay Ponder, a professor from WashU St.Louis.
Simulation in Chemistry and Biochemistry - by Dr. Jay Ponder, a professor from WashU St.Louis.

Books

Computational Approaches in Cheminformatics and Bioinformatics - Includes insights from public (NIH), academic, and industrial sources.
Chemoinformatics for Drug Discovery - Materials about how to use Chemoinformatics strategies to improve drug discovery results.
Molecular Descriptors for Chemoinformatics - More than 3300 descriptors and related terms for chemoinformatic analysis of chemical compound properties.

Blogs

Open Source Molecular Modeling - Updateable catalog of open source molecular modeling software.
PubChem Blog - News, updates and tutorials about PubChem.
The ChEMBL-og blog - Stories and news from Computational Chemical Biology Group at EMBL-EBI.
ChEMBL blog - ChEMBL on GitHub.
SteinBlog - Blog of Christoph Steinbeck, who is the head of cheminformatics and metabolism at the EMBL-EBI.
Practical Cheminformatics - Blog with in-depth examples of practical application of cheminformatics.
Noel O'Blog - Blog of Noel O'Boyle, who is a Senior Software Engineer at NextMove Software.
chem-bla-ics - Blog of Egon Willighagen, who is an assistant professor at Maastricht University.
steeveslab-blog - Some examples using RDKit.
Macs in Chemistry - A resource for chemists using Apple Macintosh computers.
DrugDiscovery.NET - Blog of Andreas Bender, who is a Reader for Molecular Informatics at University of Cambridge.
Is life worth living? - Some examples for cheminformatics libraries.
Cheminformatics 2.0 - Blog of Alex M. Clark, a research scientist at Collaborative Drug Discovery.
Depth-First - Blog of Richard L. Apodaca, a chemist living in La Jolla, California.

Instruments

General Chemistry

Packages and tools for general chemistry.

batchcalculator - A GUI app based on wxPython for calculating the correct amount of reactants (batch) for a particular composition given by the molar ratio of its components.
cctbx - The Computational Crystallography Toolbox.
chemlib - A robust and easy-to-use package that solves a variety of chemistry problems.
chempy - ChemPy is a package useful for chemistry (mainly physical/inorganic/analytical chemistry).
GoodVibes - A Python program to compute quasi-harmonic thermochemical data from Gaussian frequency calculations.
ionize - Calculates the properties of individual ionic species in aqueous solution, as well as aqueous solutions containing arbitrary sets of ions.
mendeleev - A package that provides a python API for accessing various properties of elements from the periodic table of elements.
Open Babel - A chemical toolbox designed to speak the many languages of chemical data.
periodictable - This package provides a periodic table of the elements with support for mass, density and xray/neutron scattering information.
propka - Predicts the pKa values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure.
pybel - Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python.
pycroscopy - Scientific analysis of nanoscale materials imaging data.
pyEQL - A set of tools for conventional calculations involving solutions (mixtures) and electrolytes.
pyiron - An integrated development environment (IDE) for computational materials science.
pymatgen - Python Materials Genomics is a robust, open-source library for materials analysis.
symfit - a curve-fitting library ideally suited to chemistry problems, including fitting experimental kinetics data.
symmetry - Symmetry is a library for materials symmetry analysis.
stk - A library for building, manipulating, analyzing and automatic design of molecules, including a genetic algorithm.

Simulations

Packages for atomistic simulations and computational chemistry.

amp - Is an open-source package designed to easily bring machine-learning to atomistic calculations.
Atomic Simulation Environment (ASE) - Is a set of tools and modules for setting up, manipulating, running, visualizing and analyzing atomistic simulations.
basis_set_exchange - A library containing basis sets for use in quantum chemistry calculations. In addition, this library has functionality for manipulation of basis set data.
CACTVS - Cactvs is a universal, scriptable cheminformatics toolkit, with a large collection of modules for property computation, chemistry data file I/O and other tasks.
ccdc - An API for the Cambridge Structural Database System.
cclib - A library for parsing output files from various quantum chemical programs.
cinfony - A common API to several cheminformatics toolkits (Open Babel, RDKit, the CDK, Indigo, JChem, OPSIN and cheminformatics webservices).
chainer-chemistry - A Library for Deep Learning in Biology and Chemistry.
chemlab - Is a library that can help the user with chemistry-relevant calculations.
chemml - A machine learning and informatics program suite for the analysis, mining, and modeling of chemical and materials data.
deepchem - Deep-learning models for Drug Discovery and Quantum Chemistry.
emmet - A package to 'build' collections of materials properties from the output of computational materials calculations.
fromage - The "FRamewOrk for Molecular AGgregate Excitations" enables localised QM/QM' excited state calculations in a solid state environment.
GPAW - Is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE).
horton - Helpful Open-source Research TOol for N-fermion system, a quantum-chemistry program that can perform computations involving model Hamiltonians.
Indigo - Universal cheminformatics libraries, utilities and database search tools.
MAML - Aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
mathchem - Is a free open source package for calculating topological indices and other invariants of molecular graphs.
MDAnalysis - Is an object-oriented library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats.
MDTraj - Package for manipulating molecular dynamics trajectories with support for multiple formats.
MMTK - The Molecular Modeling Toolkit is an Open Source program library for molecular simulation applications.
MolMod - A library with many components that are useful to write molecular modeling programs.
OPEM - Open source PEM (Proton Exchange Membrane) fuel cell simulation tool.
pGrAdd - A library for estimating thermochemical properties of molecules and adsorbates using group additivity.
phonopy - An open source package for phonon calculations at harmonic and quasi-harmonic levels.
PLAMS - Python Library for Automating Molecular Simulation: input preparation, job execution, file management, output processing and building data workflows.
pMuTT - A library for ab-initio thermodynamic and kinetic parameter estimation.
PorePy - A Simulation Tool for Fractured and Deformable Porous Media.
ProDy - An open source package for protein structural dynamics analysis with a flexible and responsive API.
Psi4 - A hybrid Python/C++ open-source package for quantum chemistry.
pyEMMA - Library for the estimation, validation and analysis of Markov models of molecular kinetics and other kinetic and thermodynamic models from molecular dynamics data.
pygauss - An interactive tool for supporting the life cycle of a computational molecular chemistry investigation.
PyQuante - Is an open-source suite of programs for developing quantum chemistry methods.
pysic - A calculator incorporating various empirical pair and many-body potentials.
Pyscf - A quantum chemistry package written in Python.
pyvib2 - A program for analyzing vibrational motion and vibrational spectra.
RDKit - Open-Source Cheminformatics Software.
ReNView - A program to visualize reaction networks.
stk - A library for building, manipulating, analyzing and automatic design of molecules.
QUIP - A collection of software tools to carry out molecular dynamics simulations.
tsase - The library which depends on ASE to tackle transition state calculations.

Molecular Visualization

Packages for viewing molecular structures.

ase-gui - The graphical user-interface allows users to visualize, manipulate, and render molecular systems and atoms objects.
chemview - An interactive molecular viewer designed for the IPython notebook.
imolecule - An embeddable webGL molecule viewer and file format converter.
nglview - A Jupyter widget to interactively view molecular structures and trajectories.
PyMOL - A user-sponsored molecular visualization system on an open-source foundation, maintained and distributed by Schrödinger.
pymoldyn - A viewer for atomic clusters, crystalline and amorphous materials in a unit cell corresponding to one of the seven 3D Bravais lattices.
sumo - A toolkit for plotting and analysis of ab initio solid-state calculation data.
surfinpy - A library for the analysis, plotting and visualisation of ab initio surface calculation data.

Database Wrappers

Packages providing a Python layer for accessing chemical databases.

ChemSpiPy - ChemSpider wrapper, that allows chemical searches, chemical file downloads, depiction and retrieval of chemical properties.
CIRpy - An interface for the Chemical Identifier Resolver (CIR) by the CADD Group at the NCI/NIH.
pubchempy - PubChemPy provides a way to interact with PubChem in Python.
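
The wrappers above ultimately talk to web services over HTTP. As a rough illustration only (this is not how pubchempy or CIRpy are implemented), the sketch below builds request URLs for PubChem's PUG REST service and for the Chemical Identifier Resolver using just the standard library. The function names `pubchem_property_url` and `cir_url` are hypothetical helpers introduced for this example; no request is actually sent.

```python
from urllib.parse import quote

# Base endpoints of the two public services.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
CIR = "https://cactus.nci.nih.gov/chemical/structure"


def pubchem_property_url(name, properties):
    """Build a PUG REST URL that looks up compound properties by name.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    props = ",".join(properties)
    # safe="" also escapes "/" so a name cannot alter the URL path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{props}/JSON"


def cir_url(identifier, representation="smiles"):
    """Build a CIR URL that converts an identifier to another representation."""
    return f"{CIR}/{quote(identifier, safe='')}/{representation}"


if __name__ == "__main__":
    print(pubchem_property_url("aspirin", ["MolecularFormula", "MolecularWeight"]))
    print(cir_url("acetic acid"))
```

In practice a wrapper package is usually the better choice, since it also handles request throttling, errors, and parsing of the response.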

Databases

ZINC database http://zinc.docking.org/ and http://zinc15.docking.org/
- molecule dataset
- 250,000 drug like commercially available molecules
- 35 million commercially-available compounds
- maximum atom number 38
Connectivity Map https://clue.io/cmap
- A Next Generation Connectivity Map: L1000 Platform And The First 1,000,000 Profiles [Subramanian A, et al.]
PubChem https://pubchem.ncbi.nlm.nih.gov/
Protein Data Bank https://www.rcsb.org/
- Information about the 3D structures of proteins, nucleic acids, and complex assemblies.
GEO (Gene Expression Omnibus) https://www.ncbi.nlm.nih.gov/geo/
- international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community.
PharmGKB PHARMACOGENOMICS. KNOWLEDGE BASE https://www.pharmgkb.org/
- knowledge about the impact of genetic variation on drug response
- relationship between genetic variations and how our body responds to medications.
- Drugs, Pathways, Dosing Guidelines, Drug Labels
STITCH http://stitch.embl.de/
- Chemical-Protein Interaction Networks
- ORGANISMS 2031, CHEMICALS 0.5 mio, PROTEINS 9.6 mio, INTERACTIONS 1.6 bn
- 68,000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes.
RDKit https://www.rdkit.org/
- Rdkit: Open-source cheminformatics
- SMILES -> chemical structure graph tool
Decagon (Multimodal graph of polypharmacy) http://snap.stanford.edu/decagon/
- Protein-protein interaction network
- Drug-target protein associations culled from several curated databases
- Polypharmacy side effects in the form of (drug A, side effect type, drug B) triples
- Side effects of individual drugs in the form of (drug A, side effect type) tuples
- Side effect categories
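
A minimal sketch of how the two side-effect record shapes described above might be held in memory, assuming plain Python tuples. The drug and side-effect names are hypothetical placeholders, not entries from the Decagon files, and the helper functions are illustrative only.

```python
from collections import defaultdict

# Hypothetical placeholder records in the two shapes described above.
combo_side_effects = [
    ("drug_A", "side_effect_1", "drug_B"),
    ("drug_A", "side_effect_2", "drug_C"),
    ("drug_B", "side_effect_1", "drug_C"),
]
mono_side_effects = [
    ("drug_A", "side_effect_3"),
    ("drug_B", "side_effect_3"),
]


def index_by_side_effect(triples):
    """Map each side effect type to the set of unordered drug pairs showing it."""
    index = defaultdict(set)
    for drug_a, effect, drug_b in triples:
        # frozenset makes (A, B) and (B, A) the same pair.
        index[effect].add(frozenset((drug_a, drug_b)))
    return index


def side_effects_for_pair(triples, drug_a, drug_b):
    """Return the sorted side effect types recorded for one drug pair."""
    pair = frozenset((drug_a, drug_b))
    return sorted(e for a, e, b in triples if frozenset((a, b)) == pair)


if __name__ == "__main__":
    index = index_by_side_effect(combo_side_effects)
    print(len(index["side_effect_1"]))
    print(side_effects_for_pair(combo_side_effects, "drug_B", "drug_A"))
```

Treating each drug pair as unordered is a design choice of this sketch; whether the order of drug A and drug B matters in the actual data should be checked against the dataset documentation.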
DeepChem
- DeepChem aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.
- https://deepchem.io/
- https://github.com/deepchem/deepchem
MoleculeNet https://arxiv.org/abs/1703.00564
- MoleculeNet is a benchmark for molecular machine learning aimed at drug discovery. Its datasets span four categories (quantum mechanics, physical chemistry, biophysics, and physiology) and are released through DeepChem.
DrugBank https://www.drugbank.com/
- bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
- version 5.1.1, released 2018-07-03, contains 11,877 drug entries including 2,474 approved small molecule drugs, 1,180 approved biotech (protein/peptide) drugs, 129 nutraceuticals and over 5,748 experimental drugs. Additionally, 5,131 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 200 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data.
- https://www.drugbank.ca/
STRING https://string-db.org/
- database of putatively associating genes from multiple pieces of evidence like biological experiments, text-mined literature information, computational prediction, etc.
- ex) protein-protein interaction network topology
DTIP [Kyle Yingkai Gao et al., 2018]
- IBM research dataset from BindingDB
- paper : Interpretable Drug Target Prediction Using Deep Neural Representation
- 39,747 positive examples and 31,218 negative examples
- https://github.com/IBM/InterpretableDTIP
BindingDB https://www.bindingdb.org/bind/index.jsp
- Public, web-accessible database
- binding affinities, focusing chiefly on the interactions of small molecules (drugs/drug candidates) and proteins (targets/target candidates)
SIDER http://sideeffects.embl.de/
- drug side effect
- 996 drugs and 4192 side effects
ChEMBL (StARlite) https://www.ebi.ac.uk/chembl/
- Chemical European Molecular Biology Laboratory
- chemical database of bioactive molecules with drug-like properties.
- 1.8M compounds, 1.1M assays, 69k documents, 12k targets, 11k drugs, 1.7k cells
- https://www.ebi.ac.uk/chembl/beta/
- https://chembl.gitbook.io/chembl-interface-documentation/downloads
TTD database http://db.idrblab.net/ttd/
- therapeutic target database
- database to provide information about the known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets. Also included in this database are links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and drug structure, therapeutic class, clinical development status. All information provided is fully referenced.
PDBBind dataset http://www.pdbbind.org.cn/
- binding affinities for the protein-ligand complexes in the Protein Data Bank (PDB).
- version 2017 released by Jan 1st, 2017. This release provides binding data of a total of 17,900 biomolecular complexes, including protein-ligand (14,761), nucleic acid-ligand (121), protein-nucleic acid (837), and protein-protein complexes (2,181), which is currently the largest collection of this kind.
Tox21 Data Challenge 2014 https://tripod.nih.gov/tox21/challenge/data.jsp
- for predicting compounds' interference in biochemical pathways using only chemical structure data (SMILES).
GDB Databases http://gdb.unibe.ch/downloads/
- GDB-11
  - small organic molecules up to 11 atoms of C, N, O and F following simple chemical stability and synthetic feasibility rules.
- GDB-13
  - small organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules.
- GDB-17
  - 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens.
  - Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
QM (Quantum machine) dataset http://quantum-machine.org/datasets/
dSPP: Database of structural propensities of proteins https://peptone.io/dspp
- This repository comprises residual propensities of individual residues in proteins to populate helical, extended or disordered structural states. The data are derived from experimental NMR assignments of unrelated proteins in solution state near physiological conditions. The residue-specific propensity scores are normalized in a range -1.0 to 1.0 and prepared for machine learning in Plain-text, Numerical Python, Keras and Tensorflow formats.
HMDD http://www.cuilab.cn/hmdd
- the Human microRNA Disease Database
- database that curated experiment-supported evidence for human microRNA (miRNA) and disease associations.
- HMDD v3.0, released June 28 2018, 32281 miRNA-disease association entries which include 1102 miRNA genes, 850 diseases from 17412 papers.
DrugTargetCommons https://drugtargetcommons.fimm.fi/
- Drug Target Commons (DTC) is a crowd-sourcing platform to improve the consensus and use of drug-target interactions.
IDG Pharos
- compound and target data resources on public domain
- Ligand, disease, target
- https://druggablegenome.net/
- https://pharos.nih.gov/idg/index
DDIExtraction2013 [BioNLP Challenge] https://www.cs.york.ac.uk/semeval-2013/task9/
- Extraction of Drug-Drug Interactions from BioMedical Texts
- Task 1: Recognition and classification of drug names.
- Task 2: Extraction of drug-drug interactions.
Biocreative PPI [BioNLP Kaggle] http://biocreative.sourceforge.net/index.html
- BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)
- text mining and information extraction systems applied to the biological domain.
- Gene mention tagging [GM]
- Gene normalization [GN]
- Extraction of protein-protein interactions from text
Polysearch2 http://polysearch.cs.ualberta.ca/
- online text-mining system for identifying relationships between human diseases, genes, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies.
SuperTarget [BioNLP] http://insilico.charite.de/supertarget/index.php?site=about
- database developed primarily to collect information about drug-target relations. It consists mainly of three types of entities: DRUGS, PROTEINS, SIDE-EFFECTS.
- database that contains a core dataset of about 7300 drug-target relations of which 4900 interactions have been subjected to a more extensive manual annotation effort. SuperTarget provides tools for 2D drug screening and sequence comparison of the targets. The database contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs
- data from DrugBank, BindingDB and SuperCyp
ConsensusPathDB http://cpdb.molgen.mpg.de/
- [2018.10.04] unique physical entities: 170,276, unique interactions: 603,543, gene regulations: 17,410, protein interactions: 397,088, genetic interactions: 1,738, biochemical reactions: 23,482, drug-target interactions: 163,825, pathways: 5,359
- Data originate from currently 32 public resources for interactions and interactions that we have curated from the literature.
ChemDataExtractor http://chemdataextractor.org/
- ChemDataExtractor is a python toolkit for automatically extracting chemical information from scientific documents.
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
