msmbuilder / msmexplorer

Data visualizations for biomolecular dynamics

Home Page: http://msmbuilder.org/msmexplorer/

License: MIT License

Python 80.65% Batchfile 0.11% Shell 1.70% Jupyter Notebook 11.51% TeX 6.04%
python data-visualization biomolecular-dynamics msmbuilder markov-model plotting

msmexplorer's Introduction

MSMBuilder

MSMBuilder is a python package which implements a series of statistical models for high-dimensional time-series. It is particularly focused on the analysis of atomistic simulations of biomolecular dynamics. For example, MSMBuilder has been used to model protein folding and conformational change from molecular dynamics (MD) simulations. MSMBuilder is available under the LGPL (v2.1 or later).

Capabilities include:

  • Feature extraction into dihedrals, contact maps, and more
  • Geometric clustering with a variety of algorithms.
  • Dimensionality reduction using time-structure independent component analysis (tICA) and principal component analysis (PCA).
  • Markov state model (MSM) construction
  • Rate-matrix MSM construction
  • Hidden Markov model (HMM) construction
  • Timescale and transition path analysis.
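To make the MSM construction and timescale items concrete, here is a minimal NumPy sketch. It is not MSMBuilder's implementation: the function names are hypothetical, the estimator is plain row normalization of transition counts (no reversibility constraint or ergodic trimming), and it assumes every state is visited at least once before the final lag frames.

```python
import numpy as np


def count_transitions(dtraj, n_states, lag=1):
    """Count transitions i -> j between frames separated by `lag`."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize the counts so each row is a probability distribution."""
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(tmat, lag=1):
    """Implied timescales t_i = -lag / ln|lambda_i|, skipping the stationary eigenvalue."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(tmat)))[::-1]
    return -lag / np.log(eigvals[1:])


# Toy discrete trajectory over two states (illustrative data only)
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
counts = count_transitions(dtraj, n_states=2)
tmat = transition_matrix(counts)
timescales = implied_timescales(tmat)
```

The timescales come out in units of frames; multiplying by the time per frame gives physical units.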

Check out the documentation at msmbuilder.org and join the mailing list. For a broader overview of MSMBuilder, take a look at our slide deck.

Installation

The preferred installation mechanism for msmbuilder is with conda:

$ conda install -c omnia msmbuilder

If you don't have conda, or are new to scientific python, we recommend that you download the Anaconda scientific python distribution.

Workflow

An example workflow might be as follows:

  1. Set up a system for molecular dynamics, and run one or more simulations for as long as you can on as many CPUs or GPUs as you have access to. There are a lot of great software packages for running MD, e.g. OpenMM, Gromacs, Amber, CHARMM, and many others. MSMBuilder is not one of them.

  2. Transform your MD coordinates into an appropriate set of features.

  3. Perform some sort of dimensionality reduction with tICA or PCA. Reduce your data into discrete states by using clustering.

  4. Fit an MSM, rate matrix MSM, or HMM. Perform model selection using cross-validation with the generalized matrix Rayleigh quotient.

msmexplorer's People

Contributors

cxhernandez, jeiros, mpharrigan, msultan, smsaladi

msmexplorer's Issues

Argument for time units in visualizations

There are a few mentions of keeping track of the "frame -> time" conversions in the msmbuilder code, but I didn't see any issue about it there or here. MSME seems like a decent place to do this conversion for visualization purposes, maybe through some optional argument?
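A rough sketch of what such a conversion could look like, in plain Python. The helper name `frames_to_time`, its arguments, and the unit table are all hypothetical; nothing here exists in msmexplorer.

```python
# Hypothetical helper: convert frame indices to physical time for axis labels.
# The unit table is an assumption and only covers a few common MD units.
_UNIT_IN_PS = {"fs": 1e-3, "ps": 1.0, "ns": 1e3, "us": 1e6}


def frames_to_time(frames, timestep, timestep_unit="ps", out_unit="ns"):
    """Return the time of each frame, given the time between saved frames."""
    factor = timestep * _UNIT_IN_PS[timestep_unit] / _UNIT_IN_PS[out_unit]
    return [frame * factor for frame in frames]


# Frames saved every 200 ps, reported in ns
times_ns = frames_to_time(range(0, 500, 100), timestep=200, timestep_unit="ps")
```

A plotting function could accept `timestep` and `out_unit` as optional keyword arguments and fall back to raw frame indices when they are not given.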

Adding 2D tIC trajectory visualization

screen shot 2017-08-31 at 1 02 01 pm

Taken from here, but probably in many papers. Should have the 2D free energy underneath? Pretty useful for exploratory analysis, especially when you look at a big grid of them. P.S. I'm aware it's super easy and it's just a scatter plot with a cmap ;)

Implement Joy Plots

They're trendy and a good option for discrete time-series analysis of distributions. I imagine this would be nice to look at features evolving along a transition pathway.

Also fairly simple to implement:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

# Create the data
rs = np.random.RandomState(42)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m

# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=12, size=.5, palette=pal)

# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)

# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color, 
            ha="left", va="center", transform=ax.transAxes)

g.map(label, "x")

# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)

# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)

image

Release v0.1

Would be nice to finally get this out into the wild and release a conda build 😄

Add free energy plots using GMM models

Hi,

I've found this paper, which uses cross-validation and Gaussian mixture models (GMMs) to estimate free energy landscapes. The authors also compare GMMs with other ways of estimating the densities, and apparently GMMs are best at estimating the free energy values in regions where data is sparse.

They've developed their code in MATLAB, but I'm pretty sure it can be replicated using scikit-learn's mixture package and CV utilities.

If I find time I can try to replicate that, I think it'd be a cool addition to msmexplorer.

Let me know if you have some thoughts on this :)

add plot_pipeline

Would be nice to have a utility that plots a user's workflow given a Pipeline object.

Could use something like daft for this.

MSMExplorer Installation: conflict error 'numba'

$ conda install -c omnia msmexplorer
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:

-msmexplorer
-numba
Use "conda info <package>" to see the dependencies for each package.

Use of undirected graph in plot_msm_network

Is there a reason why an undirected graph is built from the transition matrix in the plot_msm_network function?

I've checked and the edges that are built when using an undirected graph do not match the entries of the transition matrix:

import networkx as nx
import numpy as np
tmat = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.3, 0.6, 0.1],
        [0.0, 0.3, 0.7]
    ]
)
graph_di = nx.DiGraph(tmat)
graph_un = nx.Graph(tmat)
tot_un = 0
for v in graph_un.edge[0].values():
    tot_un += v['weight']
tot_di = 0
for v in graph_di.edge[0].values():
    tot_di += v['weight']
assert tot_di == tmat[0, :].sum()
assert tot_un == tmat[0, :].sum()
Traceback (most recent call last):
  File "/Users/je714/test_issue_tmat.py", line 19, in <module>
    assert tot_un == tmat[0, :].sum()
AssertionError
print(graph_di.edge)
{0: {0: {'weight': 0.8}, 1: {'weight': 0.1}, 2: {'weight': 0.1}},
 1: {0: {'weight': 0.3}, 1: {'weight': 0.6}, 2: {'weight': 0.1}},
 2: {1: {'weight': 0.3}, 2: {'weight': 0.7}}}
print(graph_un.edge)
{0: {0: {'weight': 0.8}, 1: {'weight': 0.3}, 2: {'weight': 0.1}},
 1: {0: {'weight': 0.3}, 1: {'weight': 0.6}, 2: {'weight': 0.3}},
 2: {0: {'weight': 0.1}, 1: {'weight': 0.3}, 2: {'weight': 0.7}}}

Also, as a side question: is there a way to hide the edges below a particular weight? When there are too many connections, the resulting plot is really crowded and is a bit confusing to look at.
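The sketch below uses only NumPy (no networkx) to illustrate both points with the same `tmat`. The helper names are hypothetical. `undirected_overwrite` is an assumption about why the undirected weights differ: it keeps one weight per unordered pair and lets later matrix entries overwrite earlier ones, which happens to reproduce the printed `graph_un` weights; it is not taken from the networkx source.

```python
import numpy as np

tmat = np.array([[0.8, 0.1, 0.1],
                 [0.3, 0.6, 0.1],
                 [0.0, 0.3, 0.7]])


def directed_edges(tmat, min_weight=0.0):
    """List (source, target, weight) for nonzero entries with weight >= min_weight."""
    edges = []
    for i, j in zip(*np.nonzero(tmat)):
        weight = float(tmat[i, j])
        if weight >= min_weight:
            edges.append((int(i), int(j), weight))
    return edges


def undirected_overwrite(tmat):
    """Keep one weight per unordered pair; later entries overwrite earlier ones."""
    weights = {}
    for i, j in zip(*np.nonzero(tmat)):
        weights[frozenset((int(i), int(j)))] = float(tmat[i, j])
    return weights


all_edges = directed_edges(tmat)
row0 = sum(w for i, _, w in all_edges if i == 0)   # matches tmat[0].sum()
strong = directed_edges(tmat, min_weight=0.3)      # drops the weak 0.1 edges
und = undirected_overwrite(tmat)                   # pair {0, 1} ends up with weight 0.3
```

The filtered triples could then be handed to a directed graph, for example with `nx.DiGraph().add_weighted_edges_from(strong)`, so that only edges above the chosen weight are drawn.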

No module named 'corner'

Upon trying to import msmexplorer, I get the following error.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-161-36c499e6e3bd> in <module>()
----> 1 import msmexplorer as msme
      2 import numpy as np
      3 txx = np.concatenate(tica_trajs)
      4 _ = msme.plot_histogram(txx)

~/anaconda3/lib/python3.6/site-packages/msmexplorer/__init__.py in <module>()
----> 1 from .plots import *
      2 
      3 from .version import version as _version
      4 __version__ = _version

~/anaconda3/lib/python3.6/site-packages/msmexplorer/plots/__init__.py in <module>()
      2 from .msm import *
      3 from .tpt import *
----> 4 from .projection import *
      5 from .cluster import *
      6 from .misc import *

~/anaconda3/lib/python3.6/site-packages/msmexplorer/plots/projection.py in <module>()
      3 from matplotlib import pyplot as pp
      4 
----> 5 from corner import corner
      6 from seaborn.distributions import (_scipy_univariate_kde, _scipy_bivariate_kde)
      7 

ModuleNotFoundError: No module named 'corner'

[JOSS REVIEW]

The following tasks should be completed for JOSS acceptance:

Installation:

  • Support conda install with python 3.6.
  • Specify version requirements (if applicable) for dependencies. The Fs-peptide notebook gets angry at pandas 0.16 (a few years behind the times) for the tICA cell. Alternatively, suggest that users update to most recent version for these dependencies if they are using an existing python 3.4 build.

Example usage:

  • Describe a possible use case for the chord diagram. It seems a bit disconnected from the rest of the tools, so adding an example (perhaps to some visualization of Fs-peptide structure) could really tie things together.

Community guidelines:

  • Add a sentence to the "Development" section of your README directing potential contributors to the link you have already included.
  • Include a preferred method of contact for users seeking support.

Version:

  • Cut v1.0.0

Path vertices problem in Chord Diagram

It seems to me that the destination vertex for the chord plot is incorrect. My plot here has a line connecting labels "1" and "2" when I was hoping for "1" and "5".

import numpy as np
import msmexplorer as msme

d=np.zeros([10,10])
d[1,5]=1.0
labels=range(10)
msme.plot_chord(d, labels=labels)

screen shot 2016-09-08 at 2 58 15 pm

plot_msm_network and plot_tpaths problems when msm is not 100% ergodic

I'm trying to replicate the last figure of the Fs-Peptide notebook with an msm that is not 100% ergodic (for instance, my msm recovers an ergodic subspace of 259 microstates when my clustering was done with 500 states).

First question

This code:

pos = dict(zip(range(clusterer.n_clusters), clusterer.cluster_centers_))
_ = msme.plot_msm_network(msm, pos=pos,
                          with_labels=False)

fails with the following (long) Traceback but it does show (all?) the microstates:

download

I've also tried building an 'alternative' msm_pos dictionary using the msm.mapping_ dictionary, to see if that fixed the problem:

msm_pos = {}
for c_id, msm_id in msm.mapping_.items():
    msm_pos[msm_id] = clusterer.cluster_centers_[c_id]

but this also fails with a similar Traceback, yielding the ValueError: 'vertices' must be a 2D list or array with shape Nx2:

_ = msme.plot_msm_network(msm, pos=msm_pos,
                          with_labels=False)

download 1

Second Question

For the plot_tpath function, the sources and sinks that are provided as arguments, are they numbered with the ids from the clusterer object or the internal numbering of the msm object?
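For the first question, one thing that might be worth checking is a sketch like the following. It builds positions keyed by the MSM's internal state ids and keeps only two coordinates per state. The arrays and the `mapping` dictionary are hypothetical stand-ins for `clusterer.cluster_centers_` and `msm.mapping_`, and the idea that non-2D cluster centers trigger the "shape Nx2" error is a guess based on the message alone.

```python
import numpy as np

# Hypothetical stand-ins: 4 cluster centers in a 3-component tICA space,
# and a mapping from cluster id to MSM-internal id in which cluster 1 was trimmed.
cluster_centers = np.array([[0.0, 0.0, 1.0],
                            [1.0, 0.5, 0.2],
                            [2.0, 1.5, 0.1],
                            [3.0, 2.5, 0.9]])
mapping = {0: 0, 2: 1, 3: 2}


def msm_positions(cluster_centers, mapping, dims=(0, 1)):
    """Return {msm_id: 2D position}, using only states retained by the MSM."""
    pos = {}
    for cluster_id, msm_id in mapping.items():
        pos[msm_id] = cluster_centers[cluster_id, list(dims)]
    return pos


pos = msm_positions(cluster_centers, mapping)
```

Every value in `pos` is then a length-2 array, and the keys run over the retained states only.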

Eventual merging into MSMBuilder

The plan is to eventually incorporate this repo either as a plugin or as a part of MSMBuilder. The MarkovStateModel class would contain most of the plotting functionality: msm.plot_X(), and ideally this could work with forthcoming project templates.

`plot_trace` gets mad if only `ax` is provided

If you supply ax but not side_ax, it gives a confusing error message.

I would advocate for a convention where each function takes a parameter named ax, and if it is not None, then the function won't do any funny business with subplots.

Non-power users' use case: Don't mess with axes objects; msmexplorer makes everything look good by default.

Power users: maybe you want to reshape a plot, overlay something else on top, have subplots arranged in a different way, etc. Pass an axes to the ax parameter and msmexplorer will step off a bit.

Wrong color in most dense region of free energy plot

Hi,

I'm running into a weird issue with plot_free_energy and I can't think of why it's happening.

I load some trajectories using a dataset from msmbuilder, featurize and scale them. The tICA time series look normal:

pipeline = Pipeline([
    ('feat', AtomPairsFeaturizer(atom_pairs)),
    ('scale', RobustScaler()),
    ('tica', tICA(n_components=3, lag_time=100)),
    ('cluster', MiniBatchKMeans(n_clusters=500)),
    ('msm', MarkovStateModel(lag_time=100, n_timescales=5))
])
scaled_data = pipeline.named_steps['scale'].fit_transform(pipeline.named_steps['feat'].fit_transform(xyz))
tica_trajs = pipeline.named_steps['tica'].fit_transform(scaled_data)

ax, side_ax = msme.plot_trace(np.concatenate(tica_trajs)[:, 0],
                              label='tIC1', xlabel='Timestep', color='rawdenim')
_ = msme.plot_trace(np.concatenate(tica_trajs)[:, 1],  label='tIC2',
                    xlabel='Timestep', color='cochineal', ax=ax,
                    side_ax=side_ax)

tica_trace
But the free energy plot colors the most dense region with the same colour as where there are no data points:

msme.plot_free_energy(np.concatenate(tica_trajs),
                      obs=(0, 1), pi=None,
                      shade=True,
                      n_levels=5,
                      cmap='Spectral',
                      clabel=True,
                      clabel_kwargs={'fmt': '%.1f'},
                      xlabel='tIC1', ylabel='tIC2'
)

free_energy_plot

I've tried using other color maps and changing the n_levels argument, but I still get the white region in the center. Have you seen this before, or do you have any idea why this might be happening?

AttributeError: module 'msmexplorer' has no attribute 'plot_implied_timescales'

Hi,
I am trying to plot implied timescales using msme.plot_implied_timescales, but I am getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-68-73601e52d38e> in <module>()
      1 colors = ['pomegranate', 'beryl', 'tarragon', 'rawdenim', 'carbon']
----> 2 msme.plot_implied_timescales(msm_objs, color_palette=colors, 
      3                              xlabel='Lag time (frames)',
      4                              ylabel='Implied Timescales ($ns$)')

AttributeError: module 'msmexplorer' has no attribute 'plot_implied_timescales'

Even the following example gives the same error. I think I have the updated package, but is there any other attribute for this?

Interpretation of plot_pop_resids

I'm building an MSM on the internal dynamics of a ligand, which I think should be well sampled within microseconds of simulation. I can see 'clean' jumps in my tIC time evolution, but the pop_resids plot looks very different from the one in your documentation.

download

What kind of information can I extract out of msme.plot_pop_resids? I've never seen this plot in a publication.

plot_free_energy_2d.py fails with "ValueError: a and p must have same size"

Greetings:
When I run plot_free_energy_2d.py, or any script based on it, I get the error below.
I am using Python:

Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 22:59:30)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
msmbuilder 3.8.0
numpy 1.12.1

matplotlib.__version__
'2.0.2'

python plot_free_energy_2d.py
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/seaborn/apionly.py:6: UserWarning: As seaborn no longer sets a default style on import, the seaborn.apionly module is deprecated. It will be removed in a future version.
warnings.warn(msg, UserWarning)
MSM contains 2 strongly connected components above weight=0.03. Component 1 selected, with population 98.921375%
Traceback (most recent call last):
  File "plot_free_energy_2d.py", line 41, in <module>
    cbar_kwargs={'format': '%.1f', 'label': 'Free energy (kcal/mol)'}
  File "/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/msmexplorer/utils.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/msmexplorer/plots/projection.py", line 122, in plot_free_energy
    idx = random_state.choice(range(data.shape[0]), size=n_samples, p=pi)
  File "mtrand.pyx", line 1126, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17641)
ValueError: a and p must have same size
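The traceback shows `pi` being passed as `p` to `choice` over `range(data.shape[0])`, so the two need the same length. Since the log reports that only one strongly connected component was kept, one possible cause is that some frames belong to trimmed states. The NumPy sketch below shows one way to build per-frame weights that line up with the data. All names and arrays are hypothetical stand-ins (for example, `mapping` plays the role of a cluster-id to MSM-id dictionary), and this is not msmexplorer's own code.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins for projected data, per-frame cluster assignments,
# the retained-state mapping, and the MSM state populations.
data = rng.randn(10, 2)
assignments = np.array([0, 1, 2, 2, 1, 0, 3, 3, 1, 0])
mapping = {0: 0, 1: 1, 2: 2}          # cluster 3 was trimmed from the model
populations = np.array([0.5, 0.3, 0.2])


def frame_weights(assignments, mapping, populations):
    """Return a mask of frames in retained states and normalized per-frame weights."""
    keep = np.array([int(a) in mapping for a in assignments])
    msm_ids = np.array([mapping[int(a)] for a in assignments[keep]])
    weights = populations[msm_ids]
    return keep, weights / weights.sum()


keep, pi = frame_weights(assignments, mapping, populations)
kept_data = data[keep]

# The weights now have one entry per remaining frame, so sampling works.
idx = rng.choice(len(kept_data), size=5, p=pi)
```

Passing `kept_data` together with `pi` keeps the two arrays the same length; passing the full `data` with this `pi` would reproduce the ValueError above.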
