msmbuilder / vde

Variational Autoencoder for Dimensionality Reduction of Time-Series

License: MIT License

Python 2.89% Jupyter Notebook 97.11%
variational-autoencoder molecular-dynamics computational-chemistry computational-biology time-series deep-learning pytorch python

vde's Introduction

MSMBuilder

MSMBuilder is a Python package that implements a series of statistical models for high-dimensional time series. It is particularly focused on the analysis of atomistic simulations of biomolecular dynamics. For example, MSMBuilder has been used to model protein folding and conformational change from molecular dynamics (MD) simulations. MSMBuilder is available under the LGPL (v2.1 or later).

Capabilities include:

  • Feature extraction into dihedrals, contact maps, and more
  • Geometric clustering with a variety of algorithms.
  • Dimensionality reduction using time-structure independent component analysis (tICA) and principal component analysis (PCA).
  • Markov state model (MSM) construction
  • Rate-matrix MSM construction
  • Hidden Markov model (HMM) construction
  • Timescale and transition path analysis.

Check out the documentation at msmbuilder.org and join the mailing list. For a broader overview of MSMBuilder, take a look at our slide deck.

Installation

The preferred installation mechanism for msmbuilder is with conda:

$ conda install -c omnia msmbuilder

If you don't have conda, or are new to scientific Python, we recommend that you download the Anaconda scientific Python distribution.

Workflow

An example workflow might be as follows:

  1. Set up a system for molecular dynamics, and run one or more simulations for as long as you can on as many CPUs or GPUs as you have access to. There are a lot of great software packages for running MD, e.g. OpenMM, Gromacs, Amber, CHARMM, and many others. MSMBuilder is not one of them.

  2. Transform your MD coordinates into an appropriate set of features.

  3. Perform some sort of dimensionality reduction with tICA or PCA. Reduce your data into discrete states by using clustering.

  4. Fit an MSM, rate matrix MSM, or HMM. Perform model selection using cross-validation with the generalized matrix Rayleigh quotient.
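
To make step 4 concrete, here is a minimal, dependency-free sketch of the simplest way to estimate an MSM transition matrix from one discretized trajectory: count transitions at a fixed lag time and normalize each row. The function name `fit_msm` and the toy state labels are assumptions for illustration. This is not MSMBuilder's estimator, which offers more options than this row-normalized count.

```python
from collections import Counter


def fit_msm(labels, n_states, lag_time=1):
    """Estimate a transition matrix by counting and row-normalizing.

    labels   : sequence of integer state labels (e.g. cluster assignments)
    n_states : number of discrete states
    lag_time : number of frames between the start and end of a transition
    """
    # Pair each frame with the frame lag_time steps later and count the pairs.
    counts = Counter(zip(labels[:-lag_time], labels[lag_time:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # No transitions observed out of state i: leave the row empty.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Toy discretized trajectory with two states (illustrative data only).
labels = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
transition_matrix = fit_msm(labels, n_states=2)
for row in transition_matrix:
    print(row)
```

Each row of the result is the estimated probability of moving from that state to every other state after `lag_time` frames, so every row with at least one observed transition sums to one.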

vde's People

Contributors

brookehus, cxhernandez, ilya-muromets, rmcgibbo, vicvaleeva

vde's Issues

printing a VDE instance gives error

Just playing around with the Muller potential example, I get the following Traceback:

print(mdl)


---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-16-9a3b537f0ea8> in <module>()
----> 1 print(mdl)

~/miniconda3/envs/dl/lib/python3.6/site-packages/sklearn/base.py in __repr__(self)
    287         class_name = self.__class__.__name__
    288         return '%s(%s)' % (class_name, _pprint(self.get_params(deep=False),
--> 289                                                offset=len(class_name),),)
    290 
    291     def __getstate__(self):

~/miniconda3/envs/dl/lib/python3.6/site-packages/sklearn/base.py in _pprint(params, offset, printer)
    153         else:
    154             # use repr of the rest
--> 155             this_repr = '%s=%s' % (k, printer(v))
    156         if len(this_repr) > 500:
    157             this_repr = this_repr[:300] + '...' + this_repr[-100:]

... last 2 frames repeated, from the frame below ...

~/miniconda3/envs/dl/lib/python3.6/site-packages/sklearn/base.py in __repr__(self)
    287         class_name = self.__class__.__name__
    288         return '%s(%s)' % (class_name, _pprint(self.get_params(deep=False),
--> 289                                                offset=len(class_name),),)
    290 
    291     def __getstate__(self):

RecursionError: maximum recursion depth exceeded while calling a Python object

I think this can be fixed by implementing a __repr__ method for the VDE class.
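
A minimal sketch of what such an override could look like, assuming the recursion comes from the inherited scikit-learn __repr__ shown in the traceback. The class `ReprSketch` and the parameter names `input_size` and `lag_time` are hypothetical stand-ins, not the actual VDE signature. Only `encoder_size` is a name that appears in these issues.

```python
class ReprSketch:
    """Hypothetical stand-in for an estimator with a few hyperparameters."""

    # Names of the plain hyperparameters to show; nothing here is a nested
    # estimator or module, so building the string cannot recurse.
    _repr_params = ("input_size", "lag_time", "encoder_size")

    def __init__(self, input_size, lag_time=1, encoder_size=1):
        self.input_size = input_size
        self.lag_time = lag_time
        self.encoder_size = encoder_size

    def __repr__(self):
        # Format only the listed attributes instead of delegating to an
        # inherited __repr__ that walks every parameter.
        args = ", ".join(
            "%s=%r" % (name, getattr(self, name)) for name in self._repr_params
        )
        return "%s(%s)" % (type(self).__name__, args)


mdl_sketch = ReprSketch(input_size=2)
print(mdl_sketch)
```

The design choice is to list the printable hyperparameters explicitly, so the representation never touches attributes whose own repr might call back into the estimator.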

settings for 2nd VDE coordinate

I wanted to look at the 2nd VDE coordinate, so I set encoder_size=2, but the resulting VDE coordinates look more like noise than coordinates; even the 1st VDE coordinate is gone.
Is there a way/setting to get 2 (or more) VDE coordinates?

Default learning rate and other hyper params.

Based upon some testing, I am starting to think that the default learning rate of 1e-4 is probably too low for our applications and that it might be better to bump it up to 5e-3 or even 1e-2. This is mostly based on empirical observations that the higher learning rates tend to produce "similar" looking models even with differing architectures, batch sizes, and numbers of epochs. It also helps that we use the Adam optimizer, which can attenuate the rate as training goes forward.
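
As a rough illustration of how much the learning rate matters over a fixed training budget, the sketch below runs a hand-written Adam update on a toy one-dimensional quadratic with the three rates mentioned above. The function `adam_minimize`, the loss f(x) = x^2, and the step budget are assumptions chosen for illustration. The toy problem says nothing about which default is right for VDE models.

```python
import math


def adam_minimize(lr, steps=100, x0=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize f(x) = x**2 with Adam and return the final loss."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * x                              # derivative of x**2
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam parameter update
    return x * x


for lr in (1e-4, 5e-3, 1e-2):
    print("lr=%g  final loss=%.4f" % (lr, adam_minimize(lr)))
```

Because each Adam step moves the parameter by at most roughly the learning rate, a rate of 1e-4 barely leaves the starting point in 100 steps, while the larger rates cover most of the distance to the minimum.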

recursive error

Installed on Python 3.
The example notebook worked until mdl.eval(), where I got:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxx/lib/python3.5/site-packages/sklearn/base.py", line 290, in __repr__
    offset=len(class_name),),)
...
RecursionError: maximum recursion depth exceeded
Should I reinstall in Python 2.7, or is the issue elsewhere?

Lambda class definition

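import torch
import torch.nn as nn
from torch.autograd import Variable

class Lambda(nn.Module):
    # Header and signature reconstructed from the fragment; defaults may differ.
    def __init__(self, i, o, scale):
        super(Lambda, self).__init__()
        self.scale = scale
        self.z_mean = nn.Linear(i, o)
        self.z_log_var = nn.Linear(i, o)

    def forward(self, x):
        self.mu = self.z_mean(x)
        self.log_v = self.z_log_var(x)
        # Sample scaled Gaussian noise with the same shape and type as log_v.
        eps = self.scale * Variable(torch.randn(*self.log_v.size())
                                    ).type_as(self.log_v)
        return self.mu + torch.exp(self.log_v / 2.) * eps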

In the VDE Lambda class definition (above):
z_mean = nn.Linear(i, o)
z_log_var = nn.Linear(i, o)

From the variable names:
z_mean is the mean of the input,
z_log_var is the log variance of the input.

I am trying to understand why they are defined with the same nn function.
Thank you very much!
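
For readers with the same question, here is a dependency-free sketch of the pattern the fragment appears to follow. The two attributes are built by the same constructor but are separate layers with independent weights. In a variational autoencoder, such a pair conventionally parameterizes the mean and log-variance of the latent distribution (not of the input), and a sample is drawn as mean + exp(log_var / 2) * noise. `Linear` and `LambdaSketch` are hypothetical stand-ins written with the standard library, not the repository's classes or torch.nn.Linear.

```python
import math
import random


class Linear:
    """Tiny stand-in for a dense layer computing y = W x + b."""

    def __init__(self, n_in, n_out, rng):
        bound = 1.0 / math.sqrt(n_in)
        self.weight = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                       for _ in range(n_out)]
        self.bias = [rng.uniform(-bound, bound) for _ in range(n_out)]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class LambdaSketch:
    """Sampling layer: two independent heads plus the reparameterization step."""

    def __init__(self, n_in, n_out, scale=1.0, seed=0):
        self.rng = random.Random(seed)
        self.scale = scale
        # Same constructor and shapes, but each call draws its own weights.
        self.z_mean = Linear(n_in, n_out, self.rng)
        self.z_log_var = Linear(n_in, n_out, self.rng)

    def forward(self, x):
        mu = self.z_mean(x)          # latent mean
        log_v = self.z_log_var(x)    # latent log-variance
        eps = [self.scale * self.rng.gauss(0.0, 1.0) for _ in log_v]
        # exp(log_v / 2) is the standard deviation.
        return [m + math.exp(lv / 2.0) * e for m, lv, e in zip(mu, log_v, eps)]


layer = LambdaSketch(n_in=4, n_out=2)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))
```

Setting `scale=0.0` removes the noise term, so the layer returns the mean head's output unchanged. That is a convenient way to check that the two heads play different roles.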

typo

The "PCA" subplot in figure 2 of Muller-Brown.ipynb plots tICA instead of PCA.

Support for multivariate datasets

Hi Carlos,

I'm trying to visualize the performance of this model using a multivariate time series data set. More specifically I want to see clustering behaviour of my time series. It is of shape (1000, 9601, 6) (time series, time steps, sensor readings).

In your example, you grab the trajectories of the Brownian motion and get a projection value. So your trajectory data has a shape of say (10000, 2). In my case, I have six sensor readings for each time series with shape (9601, 6).

When you do the scatter plot you unpack these two columns and plot them with their corresponding energy value. By running your code with my data I get six possible values to unpack.

I'm still not clear as to what your dimension reduction is doing though, are you just computing the projection value in a reduced space but plotting the trajectories in the original space?

I appreciate any insight!
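
One possible reading of the question, sketched with toy numbers: a reducer maps every frame (here, six sensor readings) to a single coordinate, and a scatter plot then needs a choice of two original columns to draw, with the reduced coordinate used only as the color. `project_frame` and its weights are hypothetical stand-ins for a trained model, not the VDE API.

```python
def project_frame(frame, weights):
    """Hypothetical reducer: one frame of n readings -> one coordinate."""
    return sum(w * f for w, f in zip(weights, frame))


# One toy trajectory: 5 time steps, 6 sensor readings per step.
trajectory = [[float(t + s) for s in range(6)] for t in range(5)]
weights = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]  # arbitrary illustrative weights

# Reduced representation: one value per time step, shape (5,).
latent = [project_frame(frame, weights) for frame in trajectory]

# A 2-D scatter plot can only show two of the six original columns;
# the reduced coordinate would be passed as the color of each point.
xs = [frame[0] for frame in trajectory]
ys = [frame[1] for frame in trajectory]
print(list(zip(xs, ys, latent)))
```

With a dataset of shape (1000, 9601, 6), the same idea would apply to each of the 1000 series in turn, giving one reduced value per time step per series.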

Osprey Compatibility

Have a version of this working offline. Would be nice to add it to this repo.
