msmbuilder / msmexplorer
Data visualizations for biomolecular dynamics
Home Page: http://msmbuilder.org/msmexplorer/
License: MIT License
Can you guys take a look at the upstream PyQT4 dependency? This seems to be missing for py3.x builds on omnia:
https://travis-ci.org/omnia-md/conda-recipes/jobs/172303670#L685-L717
I'm going to have to disable py3.x builds if we can't find a solution or else our PRs are going to perpetually fail.
Hi,
I am trying to plot implied timescales using msme.plot_implied_timescales, but I am getting the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-68-73601e52d38e> in <module>()
1 colors = ['pomegranate', 'beryl', 'tarragon', 'rawdenim', 'carbon']
----> 2 msme.plot_implied_timescales(msm_objs, color_palette=colors,
3 xlabel='Lag time (frames)',
4 ylabel='Implied Timescales ($ns$)')
AttributeError: module 'msmexplorer' has no attribute 'plot_implied_timescales'
Even the following example gives the same error. I think I have the updated package, but is there any other attribute for this?
Right now plot_X(data, color='beryl') works, but plot_X(data, 'beryl') would throw an error.
I cannot find any documentation that explains the numbers shown on the histogram chart when I use plot_histogram. The example on the website shows tIC1=-0.33 and two numbers as superscript and subscript. What do these mean?
They're trendy and a good option for discrete time-series analysis of distributions. I imagine this would be nice to look at features evolving along a transition pathway.
Also fairly simple to implement:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(42)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=12, size=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
and increase coverage in general
Would be nice to finally get this out into the wild and release a conda build.
The plan is to eventually incorporate this repo either as a plugin or as a part of MSMBuilder. The MarkovStateModel class would contain most of the plotting functionality: msm.plot_X(), and ideally this could work with forthcoming project templates.
plot_free_energy should take the MSM object rather than pi. This way ergodic trimming issues can be handled under the hood.
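A minimal sketch of what "under the hood" could look like, assuming the MSM exposes a mapping_ dict (cluster label -> internal state index) and a populations_ array, as MSMBuilder's MarkovStateModel does. The helper name frame_weights is hypothetical and not part of msmexplorer. Frames assigned to trimmed states get weight zero, so the weight vector always has the same length as the data:

```python
import numpy as np


def frame_weights(labels, mapping, populations):
    """Return one normalized sampling weight per frame.

    labels      : cluster label of every frame (before ergodic trimming)
    mapping     : dict, cluster label -> internal MSM state index
    populations : equilibrium population of every internal MSM state
    """
    labels = np.asarray(labels)
    populations = np.asarray(populations, dtype=float)

    # Internal state index per frame; -1 marks frames in trimmed states.
    idx = np.array([mapping.get(int(lab), -1) for lab in labels], dtype=int)
    kept = idx >= 0

    # Spread each state's population evenly over the frames assigned to it.
    counts = np.bincount(idx[kept], minlength=populations.shape[0])
    weights = np.zeros(labels.shape[0])
    weights[kept] = populations[idx[kept]] / counts[idx[kept]]

    total = weights.sum()
    if total == 0:
        raise ValueError("no frame belongs to a state retained by the MSM")
    return weights / total
```

A vector built this way could be passed as the pi argument without the caller having to think about which states were trimmed.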
Hi,
I've found this paper, which uses cross-validation and Gaussian mixture models (GMMs) to estimate free energy landscapes. They also compare GMMs to other ways of estimating the densities, and apparently GMMs are best for estimating free energy values in regions where data is sparse.
They developed their code in MATLAB, but I'm pretty sure it can be replicated using scikit-learn's mixture package and CV utilities.
If I find time I can try to replicate that; I think it'd be a cool addition to msmexplorer.
Let me know if you have some thoughts on this :)
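A rough sketch of how this might look with scikit-learn, not a reproduction of the paper's method: the number of mixture components is picked by cross-validated log-likelihood, and the free energy is taken as -kT ln p up to a constant. The function name gmm_free_energy, the candidate component counts, and the kT value (about 0.596 kcal/mol at 300 K) are assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

KT_KCAL_MOL = 0.596  # approximate k_B * T at 300 K, in kcal/mol (assumed)


def gmm_free_energy(data, points, n_components=(1, 2, 3, 4), cv=3, seed=0):
    """Fit a GMM chosen by cross-validation and evaluate F at `points`.

    data   : array of shape (n_frames, n_dims), e.g. concatenated tICs
    points : array of shape (n_points, n_dims) where F is evaluated
    Returns (free energy shifted so its minimum is 0, chosen n_components).
    """
    # GaussianMixture.score is the mean log-likelihood, which GridSearchCV
    # uses as the model-selection criterion when no scoring is given.
    search = GridSearchCV(
        GaussianMixture(random_state=seed),
        {"n_components": list(n_components)},
        cv=cv,
    )
    search.fit(data)

    log_density = search.best_estimator_.score_samples(points)
    free_energy = -KT_KCAL_MOL * log_density
    return free_energy - free_energy.min(), search.best_params_["n_components"]
```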
cluster_centers_ from plot
There are a few mentions about keeping track of the "frame -> time" conversions in the msmbuilder code, but I didn't see any issue about it there or here. Seems like MSME is a decent place to do this conversion for visualization purposes, maybe through some optional argument?
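As a sketch of what such an optional conversion might do (the function name, argument names, and unit table below are hypothetical, not an existing msmexplorer API), the x axis could be computed from the frame index, the saving interval, and any stride used when loading:

```python
import numpy as np

# Conversion factors to picoseconds (hypothetical unit table).
_PS_PER_UNIT = {"fs": 1e-3, "ps": 1.0, "ns": 1e3, "us": 1e6}


def frames_to_time(n_frames, timestep, timestep_unit="ps", out_unit="ns", stride=1):
    """Return the time of each frame, expressed in `out_unit`.

    n_frames : number of frames in the (possibly strided) trajectory
    timestep : time between saved frames, in `timestep_unit`
    stride   : stride applied when the trajectory was loaded
    """
    picoseconds = np.arange(n_frames) * stride * timestep * _PS_PER_UNIT[timestep_unit]
    return picoseconds / _PS_PER_UNIT[out_unit]
```

For example, frames_to_time(5, 200) gives the times in ns of five frames saved every 200 ps.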
probably useful to have a function that just saves a figure at 300 dpi given some dimensions
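Something along these lines might be enough; a minimal sketch using only standard matplotlib calls. The helper name save_figure is hypothetical, and the Agg backend is selected here only so the snippet runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, only so this sketch runs anywhere
import matplotlib.pyplot as plt


def save_figure(fig, path, width, height, dpi=300):
    """Resize `fig` to width x height (inches) and write it to `path`."""
    fig.set_size_inches(width, height)
    fig.savefig(path, dpi=dpi)
    return path
```

Usage would be something like save_figure(ax.figure, 'free_energy.png', 3.5, 3.0) after any of the plotting calls.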
The plots look better if you import seaborn proper
Hi,
I'm running into a weird issue with plot_free_energy and I can't think of why it's happening.
I load some trajectories using a dataset from msmbuilder, featurize and scale them. The tICA time series look normal:
pipeline = Pipeline([
    ('feat', AtomPairsFeaturizer(atom_pairs)),
    ('scale', RobustScaler()),
    ('tica', tICA(n_components=3, lag_time=100)),
    ('cluster', MiniBatchKMeans(n_clusters=500)),
    ('msm', MarkovStateModel(lag_time=100, n_timescales=5))
])
scaled_data = pipeline.named_steps['scale'].fit_transform(pipeline.named_steps['feat'].fit_transform(xyz))
tica_trajs = pipeline.named_steps['tica'].fit_transform(scaled_data)
ax, side_ax = msme.plot_trace(np.concatenate(tica_trajs)[:, 0],
                              label='tIC1', xlabel='Timestep', color='rawdenim')
_ = msme.plot_trace(np.concatenate(tica_trajs)[:, 1], label='tIC2',
                    xlabel='Timestep', color='cochineal', ax=ax,
                    side_ax=side_ax)
But the free energy plot colors the most dense region with the same colour as where there are no data points:
msme.plot_free_energy(np.concatenate(tica_trajs),
                      obs=(0, 1), pi=None,
                      shade=True,
                      n_levels=5,
                      cmap='Spectral',
                      clabel=True,
                      clabel_kwargs={'fmt': '%.1f'},
                      xlabel='tIC1', ylabel='tIC2'
                      )
I've tried using other color maps and changing the n_levels argument, but I still get the white region in the center. Have you seen this before / have any idea of why this might be happening?
This is a pretty key plot to showcase.
bump @Eigenstate
Would it be more useful to have a CLI or a local HTTP server for data exploration and plot generation? I imagine people would want both, but which is a higher priority?
some of these network plots are way nicer than networkx's: https://graph-tool.skewed.de/static/doc/draw.html
If you supply ax but not side_ax, it gives a confusing error message.
I would advocate for a convention where each function takes a parameter named ax, and if it is not None, then the function won't do any funny business with subplots.
Non-power users' use case: Don't mess with axes objects; msmexplorer makes everything look good by default
Power-users: maybe you want to reshape a plot, overlay something else on top, have subplots arranged in a different way, etc. Pass an axes to the ax parameter and msmexplorer will step off a bit
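To make the proposed convention concrete, here is a minimal sketch. plot_something is a hypothetical stand-in for any msmexplorer plotting function, not an existing one, and the Agg backend is only there so the snippet runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, only so this sketch runs anywhere
import matplotlib.pyplot as plt


def plot_something(data, ax=None, color="C0"):
    """Draw `data` on `ax`; create a new figure only when no axes is given."""
    if ax is None:
        # Default path: the library picks the layout for the user.
        _, ax = plt.subplots()
    # Power-user path: draw on the supplied axes and touch nothing else.
    ax.plot(data, color=color)
    return ax
```

With this shape, a caller who builds their own plt.subplots(1, 2) grid can hand one of the two axes to the function and the grid stays as they arranged it.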
I don't think it's working as intended
I'm trying to replicate the last figure of the Fs-Peptide notebook with an msm that is not 100% ergodic (for instance, my msm recovers an ergodic subspace of 259 microstates when my clustering was done with 500 states).
First question
This code:
pos = dict(zip(range(clusterer.n_clusters), clusterer.cluster_centers_))
_ = msme.plot_msm_network(msm, pos=pos,
                          with_labels=False)
fails with the following (long) Traceback but it does show (all?) the microstates:
I've also tried building an 'alternative' msm_pos dictionary using the msm.mapping_ dictionary, to see if that fixed the problem:
msm_pos = {}
for c_id, msm_id in msm.mapping_.items():
    msm_pos[msm_id] = clusterer.cluster_centers_[c_id]
but this also fails with a similar Traceback, yielding the ValueError: 'vertices' must be a 2D list or array with shape Nx2:
_ = msme.plot_msm_network(msm, pos=msm_pos,
                          with_labels=False)
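The error message suggests that the drawing code expects exactly two coordinates per node. One possible cause, which is an assumption and not something confirmed in this report, is that cluster_centers_ has more than two columns (for example, three tICs). A minimal sketch that keeps only two dimensions and keys the positions by the MSM's internal indices; the helper name msm_positions is hypothetical:

```python
import numpy as np


def msm_positions(cluster_centers, mapping, dims=(0, 1)):
    """Build a {msm_state: (x, y)} dict from cluster centers.

    cluster_centers : array of shape (n_clusters, n_dims)
    mapping         : dict, cluster label -> internal MSM state index
    dims            : the two columns to use as plot coordinates
    """
    centers = np.asarray(cluster_centers)
    return {msm_id: centers[c_id, list(dims)] for c_id, msm_id in mapping.items()}
```

The result of msm_positions(clusterer.cluster_centers_, msm.mapping_) would then contain one 2D position for each state that survived ergodic trimming.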
Second Question
For the plot_tpath function, the sources and sinks that are provided as arguments, are they numbered with the ids from the clusterer object or the internal numbering of the msm object?
I'd be interested in having an implied timescales plot API. Do you have in mind adding it? I could give it a go.
Add a section to showcase papers that use this software.
Already have one, just need to figure out how to incorporate it.
I'm building an MSM on the internal dynamics of a ligand, which I think should be well sampled within microseconds of simulation. I can see 'clean' jumps in my tIC time evolution, but the pop_resids plot looks very different from the one in your documentation.
What kind of information can I extract out of msme.plot_pop_resids? I've never seen this plot in a publication.
Is there a reason why an undirected graph is built from the transition matrix in the plot_msm_network function?
I've checked and the edges that are built when using an undirected graph do not match the entries of the transition matrix:
import networkx as nx
import numpy as np
tmat = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.3, 0.6, 0.1],
        [0.0, 0.3, 0.7]
    ]
)
graph_di = nx.DiGraph(tmat)
graph_un = nx.Graph(tmat)
tot_un = 0
for v in graph_un.edge[0].values():
    tot_un += v['weight']
tot_di = 0
for v in graph_di.edge[0].values():
    tot_di += v['weight']
assert tot_di == tmat[0, :].sum()
assert tot_un == tmat[0, :].sum()
Traceback (most recent call last):
File "/Users/je714/test_issue_tmat.py", line 19, in <module>
assert tot_un == tmat[0, :].sum()
AssertionError
print(graph_di.edge)
{0: {0: {'weight': 0.8}, 1: {'weight': 0.1}, 2: {'weight': 0.1}},
1: {0: {'weight': 0.3}, 1: {'weight': 0.6}, 2: {'weight': 0.1}},
2: {1: {'weight': 0.3}, 2: {'weight': 0.7}}}
print(graph_un.edge)
{0: {0: {'weight': 0.8}, 1: {'weight': 0.3}, 2: {'weight': 0.1}},
1: {0: {'weight': 0.3}, 1: {'weight': 0.6}, 2: {'weight': 0.3}},
2: {0: {'weight': 0.1}, 1: {'weight': 0.3}, 2: {'weight': 0.7}}}
Also, as a side question: is there a way to hide the edges below a particular weight? When there are too many connections, the resulting plot is really crowded and is a bit confusing to look at.
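For the side question, one workaround that does not depend on any msmexplorer option is to zero out small entries of the transition matrix before building the graph. This is only a sketch: threshold_edges is a hypothetical helper, and the pruned matrix is meant for display only, since its rows no longer sum to one.

```python
import numpy as np


def threshold_edges(tmat, min_weight):
    """Return a copy of `tmat` with entries below `min_weight` set to 0."""
    tmat = np.asarray(tmat, dtype=float)
    return np.where(tmat >= min_weight, tmat, 0.0)


tmat = np.array(
    [
        [0.8, 0.1, 0.1],
        [0.3, 0.6, 0.1],
        [0.0, 0.3, 0.7],
    ]
)
pruned = threshold_edges(tmat, 0.2)
```

Here pruned keeps the 0.3 transitions and the self-transitions but drops every 0.1 entry, so a graph built from it has fewer edges to draw.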
Someone on mdtraj linked to pdf http://content.schrodinger.com/Resources/SID/3p6h_desmond_npt_10ns.pdf . There are some nice plots in there that we might want to consider bringing in. I like the dihedral ones in particular.
Would be nice to have a stable citation
Greetings:
When I run plot_free_energy_2d.py, or any script based on it, I get the error below.
I am using python:
Python 3.6.2 | packaged by conda-forge | (default, Jul 23 2017, 22:59:30)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
msmbuilder 3.8.0
numpy 1.12.1
matplotlib 2.0.2
python plot_free_energy_2d.py
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/seaborn/apionly.py:6: UserWarning: As seaborn no longer sets a default style on import, the seaborn.apionly module is deprecated. It will be removed in a future version.
warnings.warn(msg, UserWarning)
MSM contains 2 strongly connected components above weight=0.03. Component 1 selected, with population 98.921375%
Traceback (most recent call last):
File "plot_free_energy_2d.py", line 41, in <module>
cbar_kwargs={'format': '%.1f', 'label': 'Free energy (kcal/mol)'}
File "/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/msmexplorer/utils.py", line 101, in wrapper
return func(*args, **kwargs)
File "/home-3/[email protected]/miniconda2/envs/3/lib/python3.6/site-packages/msmexplorer/plots/projection.py", line 122, in plot_free_energy
idx = random_state.choice(range(data.shape[0]), size=n_samples, p=pi)
File "mtrand.pyx", line 1126, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17641)
ValueError: a and p must have same size
cc @msultan
Docs could use some love
Would be nice to have a utility that plots a user's workflow given a Pipeline object.
Could use something like daft for this.
Taken from here, but probably in many papers. Should have the 2D free energy underneath? Pretty useful for exploratory analysis, especially when you look at a big grid of them. P.S. I'm aware it's super easy and it's just a scatter plot with a cmap ;)
Upon trying to import msmexplorer, I get the following error.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-161-36c499e6e3bd> in <module>()
----> 1 import msmexplorer as msme
2 import numpy as np
3 txx = np.concatenate(tica_trajs)
4 _ = msme.plot_histogram(txx)
~/anaconda3/lib/python3.6/site-packages/msmexplorer/__init__.py in <module>()
----> 1 from .plots import *
2
3 from .version import version as _version
4 __version__ = _version
~/anaconda3/lib/python3.6/site-packages/msmexplorer/plots/__init__.py in <module>()
2 from .msm import *
3 from .tpt import *
----> 4 from .projection import *
5 from .cluster import *
6 from .misc import *
~/anaconda3/lib/python3.6/site-packages/msmexplorer/plots/projection.py in <module>()
3 from matplotlib import pyplot as pp
4
----> 5 from corner import corner
6 from seaborn.distributions import (_scipy_univariate_kde, _scipy_bivariate_kde)
7
ModuleNotFoundError: No module named 'corner'
woo!
$ conda install -c omnia msmexplorer
Solving environment: failed
UnsatisfiableError: The following specifications were found to be in conflict:
  - msmexplorer
  - numba
Use "conda info <package>" to see the dependencies for each package.
ipymol has an exposed API now, so it's easier to change settings, move camera, and get certain measurements:
https://github.com/cxhernandez/ipymol/blob/master/examples/Example1.ipynb
Then plots could be made with almost no effort on the user's part.
e.g. https://github.com/msmexplorer/msmexplorer/pull/11
This could be broken up into a utilities module that exports to different file types?
The following tasks should be completed for JOSS acceptance:
Installation:
Example usage:
Community guidelines:
Version: