molxspec's Introduction

molxspec

Machine learning models, trained on GNPS data, that convert molecules to ESI mass spectra (and maybe back again in a future version). The following models are currently available:

mol2spec

model | description | hidden unit dim | num layers
----- | ----------- | --------------- | ----------
mlp | MLP with residual blocks trained on 1024-bit Morgan fingerprints | 1024 | 6
gcn | Simple GCN with DeepChem-like node features | 1024 | 3
egnn | Equivariant GNN trained on (RDKit-optimized) 3D structures | 1024 | 2
bert | MLP trained on representations from the (smaller) ChemBERTa SMILES model | 1024 | 6

Installation

You need to have torch and torch_geometric installed. They are not listed as dependencies because installing torch_geometric depends heavily on your CUDA and torch setup. To install torch_geometric from scratch, follow its documentation; for example, you can install it with pip using the prebuilt wheels:

pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.9.0+cpu.html
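The wheel index in the command above is specific to torch 1.9.0 on CPU. As a rough sketch, assuming the index URLs keep the torch-<version>+<cpu or CUDA tag>.html pattern seen in that command, the URL for another setup could be built as follows. The helper name wheel_index_url is hypothetical and not part of molxspec or torch_geometric; check the torch_geometric documentation for the combinations that are actually published.

```python
def wheel_index_url(torch_version: str, compute_tag: str = "cpu") -> str:
    """Build a wheel index URL following the pattern used in the pip command above.

    torch_version: e.g. "1.9.0" (a local suffix such as "+cu111" is dropped).
    compute_tag:   "cpu" or a CUDA tag such as "cu111" (assumed naming).
    """
    base_version = torch_version.split("+")[0]
    return f"https://data.pyg.org/whl/torch-{base_version}+{compute_tag}.html"


# Reproduces the URL from the install command above.
print(wheel_index_url("1.9.0"))
```

The resulting URL is what goes after the -f flag of the pip command.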

Once torch and torch_geometric are installed, you can install molxspec:

pip install https://github.com/dimenwarper/molxspec/releases/download/v.0.1.0/molxspec-0.1.0-py3-none-any.whl

Usage

You can predict spectra from the command line:

mol2spec --model [mlp | gcn | egnn | bert] input_smiles.txt output.txt

Here, input_smiles.txt is a file containing one SMILES string per line. For the egnn model, the 3D structure of each molecule is computed and optimized automatically using RDKit. The first run downloads the pretrained models automatically, which can take some time, but this happens only once.

You can also predict spectra programmatically:

from molxspec import mol2spec
dict_of_smiles_and_spectra = mol2spec.predict(list_of_smiles)
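To reuse the same input file as the command line in the programmatic call, the list of SMILES can be read from disk first. This is a minimal sketch using only the standard library; read_smiles is a hypothetical helper, not part of molxspec, and the call to mol2spec.predict is left as a comment because it needs molxspec and its pretrained models.

```python
from pathlib import Path
from typing import List


def read_smiles(path: str) -> List[str]:
    """Return the non-empty lines of a SMILES file, one molecule per line."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


# Usage (requires molxspec and its pretrained models to be available):
# from molxspec import mol2spec
# list_of_smiles = read_smiles("input_smiles.txt")
# dict_of_smiles_and_spectra = mol2spec.predict(list_of_smiles)
```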

molxspec's People

Contributors

dimenwarper

molxspec's Issues

Problem encountered when running training_setup.py

When I run training_setup.py, it fails at the line checkpoint: Optional[Any] = None with AttributeError: module 'torch' has no attribute 'model'.
Have you encountered this problem before? If so, do you know how to solve it?

Fix duplicate adduct types

These are the adduct types being considered by the models:

ADDUCTS = ['[M+H]+', '[M+Na]+', 'M+H', 'M-H', '[M-H2O+H]+', '[M-H]-', '[M+NH4]+', 'M+NH4', 'M+Na']

As you can see, some adducts are repeated once the brackets are removed, because the same adduct appears in different notations. We should merge these adduct types, since the bracketed versions have fewer samples in the GNPS data and may be artificially biasing the results. This of course means retraining the models.
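One possible way to merge the notations is an explicit alias table that maps each unbracketed form to its bracketed counterpart. This is only a sketch: ADDUCT_ALIASES, canonical_adduct, and merged_adducts are hypothetical names, and the mapping assumes that the unbracketed forms denote the singly charged ions (for example, that M+H means [M+H]+).

```python
ADDUCTS = ['[M+H]+', '[M+Na]+', 'M+H', 'M-H', '[M-H2O+H]+', '[M-H]-', '[M+NH4]+', 'M+NH4', 'M+Na']

# Assumed equivalences between unbracketed and bracketed notations.
ADDUCT_ALIASES = {
    'M+H': '[M+H]+',
    'M-H': '[M-H]-',
    'M+NH4': '[M+NH4]+',
    'M+Na': '[M+Na]+',
}


def canonical_adduct(adduct: str) -> str:
    """Map an adduct label to its bracketed form; unknown labels pass through."""
    adduct = adduct.strip()
    return ADDUCT_ALIASES.get(adduct, adduct)


def merged_adducts(adducts):
    """Return the canonical adduct labels, de-duplicated, in first-seen order."""
    merged = []
    for adduct in adducts:
        canonical = canonical_adduct(adduct)
        if canonical not in merged:
            merged.append(canonical)
    return merged


MERGED_ADDUCTS = merged_adducts(ADDUCTS)
print(MERGED_ADDUCTS)
```

Applying canonical_adduct to the adduct field of each training record before encoding would collapse the nine labels above into five classes.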
