Giter Site home page Giter Site logo

coarsegrainingvae's Introduction

Coarse-Graining Variational Autoencoders(CGVAE)

This is a repo accompanying our ICML 2022 paper "Generative Coarse-graining of Molecualr Conformations" (arxiv link). We propose a geometric generative model for backmmaping fine-grained coordinates from coarse-grained coordinates. It is essentially performing the geometric super-resolution task for molecular geometries.

Highlights:

  • ✔️ Our model works for arbitrary coarse-graining mapping protocols
  • ✔️ Our model can backmap from very representations
  • ✔️ Our model incoporates necessary geometric constraints
  • ✔️ Our model is generative

Install packages

I highly recommend creating a dedicated conda environment via:

conda create -n cgvae python=3.8

Download and install

git clone https://github.com/wwang2/CoarseGrainingVAE.git
cd CoarseGrainingVAE
conda activate cgvae
pip install -r requirements.txt # I have tested this, it should work 
pip install -e . # -e is useful if you want to edit the source code

Download data

Alanine Dipeptide trajectories

trajectories will be automatically downloaded when running the script. The dataset is provided via mdshare

Chignolin trajectories

Before downloading, make sure you have at least 1.3G diskspace.

cd CoarseGrainingVAE
mkdir data
wget http://pub.htmd.org/chignolin_trajectories.tar.gz -P data
cd data 
tar -xf chignolin_trajectories.tar.gz

Run experiments

Navigate to scripts directory. I assume you have access to a GPU.

To run experiment for alanine dipeptide trajecotries (approx. run time 2h for each fold):

python run_ala.py -logdir ./dipep_exp -dataset dipeptide -device 0 -n_cgs 3 -batch_size 32 -nsamples 20 -ndata 20000 -nepochs 600 -nevals 5 -atom_cutoff 8.5 -cg_cutoff 9.5 -nsplits 5 -beta 0.05 -activation swish -dec_nconv 5 -enc_nconv 4 -lr 0.00008 -n_basis 600 -n_rbf 8 --graph_eval -gamma 25.0 -eta 0.0 -kappa 0.0 -patience 15 -cg_method cgae -edgeorder 2

to run experiment for chignolin trajectoriesi (approx. run time 15h for each fold):

python run_ala.py -logdir ./chig_exp  -dataset chignolin -device 0 -n_cgs 6 -batch_size 2 -nsamples 35 -ndata 5000 -nepochs 100 -atom_cutoff 12.0 -cg_cutoff 25.0 -nsplits 5 -beta 0.05 -gamma 50.0 -eta 0.0 -kappa 0.0 -activation swish -dec_nconv 9 -enc_nconv 2 -lr 0.0001 -n_basis 600 -n_rbf 10 -cg_method cgae --graph_eval -n_ensemble 8 -factor 0.3 -patience 14

A set of coarse-graining mapping is first determined before the model training. The available mapping choices include the following (specified with -cg_method):

-cg_method cgae # coarse-graining autoencoders 
-cg_method newman # girvan-newman alrogithm 
-cg_method minimal # coarse-graining by keeping heavy atoms only 
-cg_method alpha # coarse-graining based on alpha carbon
-cg_method random # coarse-graining randomly (yes, you will find it also works!)
-cg_method backbonepartition # randomly generate parition based on backbone topology so that contiguity is ensured 

Feel free to change the desired CG resolution via -n_cg, it should work for a minimal of 3-bead coarse-graining representations. (My dream is to be able to backmap any molecular geometries from representations of only 3-beads, 3 beads are all you need?)

Evaluation stats is generated with 5-fold cross-validation. Feel free to change hyperparameters, it should not be very sensitive to hyperparameter choice. Generated samples are also dumped for visualization.

The model work for larger systems and proteins, I have tested it on a covid spike protein before, but the training might take a while.

Future plans

There are many things I want to do with this tool for CG modeling, but I might not have the time to do all of them. Here are a few things:

  • provide flexible user input options for handling different coordinates format and coarse-graining mapping
  • create a web-service for generating backmapping models
  • better code structures

This repo is a reference implementation, and it might need some more work to incorporate this into a pipeline. I will try to maintain the project as it goes. It is very likely that this project will be better packaged in another project for community use, depending how this field develops. If you want to contribute, please get in touch!

Citation info

@inproceedings{
  wang2022generativecg,
  title={Generative Coarse-Graining of Molecular Conformations},
  author={Wang, Wujie and Xu, Minkai and Cai, Chen and Miller, Benjamin Kurt and Smidt, Tess and Wang, Yusu and Tang, Jian and G{\'o}mez-Bombarelli, Rafael},
  booktitle={International Conference on Machine Learning},
  year={2022}
}

coarsegrainingvae's People

Contributors

wwang2 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

coarsegrainingvae's Issues

Questions about running errors

Hi, nice work and thanks for sharing the code. I encountered some problems when running experiment for chignolin trajectories. I followed the instructions in README.md. The error is

2022-11-14 22:01:37,477 - .prody - INFO - ProDy is configured: verbosity='none'
2022-11-14 22:01:37,477 - .prody - INFO - ProDy is configured: auto_secondary=False
Sampling Task
Traceback (most recent call last):
File "run_ala.py", line 484, in
run_cv(params)
File "run_ala.py", line 131, in run_cv
traj = load_protein_traj(dataset_label)
File "CoarseGrainingVAE/scripts/../CoarseGrainingVAE/datasets.py", line 185, in load_protein_traj
traj = md.join(trajs)
File "envs/cgvae/lib/python3.8/site-packages/mdtraj/core/trajectory.py", line 541, in join
return functools.reduce(
TypeError: reduce() of empty sequence with no initial value

Could you please have a look at it? Thanks very much.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.