Giter Site home page Giter Site logo

mdtraj / mdtraj Goto Github PK

View Code? Open in Web Editor NEW
551.0 551.0 269.0 34.65 MB

An open library for the analysis of molecular dynamics trajectories

Home Page: http://mdtraj.org

License: GNU Lesser General Public License v2.1

Python 43.25% C 15.47% C++ 7.36% Arc 26.54% Cython 7.37%
dihedral-angles mdtraj molecular-dynamics molecular-dynamics-trajectories pdb pdb-files python rmsd

mdtraj's Introduction

Note on current project status

TLDR: MDTraj is currently undergoing a transition to new maintainers (the Folding@home consortium). Please bear with us as we slowly respond to issues and clean up.

Hello!

For anybody wondering, the MDTraj repo is currently undergoing a transition to new maintainers; The Folding@home consortium (@FoldingAtHome) will now be taking over support, maintainence and management of MDTraj with @xuhuihuang leading repo management, and @mattwthompson, @apayne97, and myself (@sukritsingh) helping to manage and maintain this repo more directly.

Existing supporters, contributors, etc. are all welcome to contribute as much as they wish. The intention behind this transition is to ensure MDTraj receives continued support.

As we are slowly transitioning to this new support, please bear with us as we slowly respond to issues, clean up any PRs, etc.

I have opened a Discussions page on github for folks asking for help with code issues/not getting things working. Our hope is that the issues page will be for discrete bugs, feature requests, related discussions etc., but this is very flexible!

Best,

Sukrit Singh (@sukritsingh)

July 6th, 2023


MDTraj: an open-source library for analysis of molecular dynamics trajectories

Build Status PyPI Version Anaconda-Server Version Anaconda-Server Downloads DOI for Citing MDTraj

Read, write and analyze MD trajectories with only a few lines of Python code.

With MDTraj, you can

  • Read and write from every MD format imaginable (pdb, xtc, trr, dcd, binpos, netcdf, mdcrd, prmtop, gsd, ...)
  • Run blazingly fast RMSD calculations (4x the speed of the original Theobald QCP).
  • Use tons of analysis functions like bonds/angles/dihedrals, hydrogen bonding identification, secondary structure assignment, NMR observables.
  • Use a lightweight API, with a focus on speed and vectorized operations.

For details, see the website at mdtraj.org. To get involved, take a look at the github issue tracker and/or the gitter room.

Citation

MDTraj is research software. If you make use of MDTraj in scientific publications, please cite it. The BibTeX reference is

@article{McGibbon2015MDTraj,
    author = {McGibbon, Robert T. and Beauchamp, Kyle A. and Harrigan, Matthew P. and Klein, Christoph and Swails, Jason M. and Hern{\'a}ndez, Carlos X.  and Schwantes, Christian R. and Wang, Lee-Ping and Lane, Thomas J. and Pande, Vijay S.},
    title = {MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories},
    journal = {Biophysical Journal},
    volume = {109},
    number = {8},
    pages = {1528 -- 1532},
    year = {2015},
    doi = {10.1016/j.bpj.2015.08.015}
}

License

GNU LGPL version 2.1, or at your option a later version of the license. Various sub-portions of this library may be independently distributed under different licenses. See those files for their specific terms.

mdtraj's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mdtraj's Issues

little RMSD bug

On rmsd.pyx:51, xyz1 should not have to be the same n_frames and xyz2.

Move HDF Trajectory IO outside of trajectory.py

Feel free to shoot this one down, but I think it would be nice if trajectory.py reader more like a header file and less like an implementation file.

I think one way to do this would be to move the guts out of the HDF5 IO into a separate hdf5.py file.

The advantage of this is that trajectory.py is really really short and just points readers to the filetype-specific readers.

parse the topology from msmbuilder hdf5 files automatically

for greater backwards compatibility, it would be best to be able to parse the pdb-like entries in the MSMBuilder style hdf5 files and get a topology object.

this is one thing holding back incorporation of this module into msmbuilder, IMO.

BUG: Slicing a trajectory with box vectors with a single integer index errors

In [13]: t = trajectory.load('fbab7225-1925-4f48-abb1-5afa9d20eb67.lh5')

In [14]: t[0]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-d0a24a012877> in <module>()
----> 1 t[0]

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/mdtraj-0.1-py2.7-macosx-10.5-x86_64.egg/mdtraj/trajectory.pyc in __getitem__(self, key)
    645     def __getitem__(self, key):
    646         "Get a slice of this trajectory"
--> 647         return self.slice(key)
    648
    649     def slice(self, key, copy=True):

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/mdtraj-0.1-py2.7-macosx-10.5-x86_64.egg/mdtraj/trajectory.pyc in slice(self, key, copy)
    681             box = None
    682
--> 683         newtraj = self.__class__(xyz, topology, time, box)
    684         return newtraj
    685

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/mdtraj-0.1-py2.7-macosx-10.5-x86_64.egg/mdtraj/trajectory.pyc in __init__(self, xyz, topology, time, box)
    692         self._box = None
    693         if box is not None:
--> 694             self.box = box
    695
    696         # time will take the default 1..N

/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/mdtraj-0.1-py2.7-macosx-10.5-x86_64.egg/mdtraj/trajectory.pyc in box(self, value)
    560             if not value.shape == (self.n_frames, 3, 3):
    561                 raise ValueError("Wrong shape. Got %s, should be %s" % (value.shape,
--> 562                     (self.n_frames, 3, 3)))
    563             box = value
    564

ValueError: Wrong shape. Got (3, 3), should be (1, 3, 3)

Port RMSD from MSMB?

  1. Use Cython for wrapping.
  2. Implement reference implementation in Numpy
  3. Support general structural alignment / rotation / selection

Is this beyond the scope of MDTraj? My thinking was that RMSD calculation belongs in MDTraj, but clustering algorithms will remain in MSMB.

Round out the RMSD API

A few design decisions are left before the RMSD and alignment code should be considered stable.

  • API of the Transform operator, inverse() method.
  • Exactly how the RMSDTrajectory works
  • Test coverage

binpos not in main lib?

@rmcgibbo maybe something didn't get checked in (accidentally)? Or is it just not done yet?

TJ@dn51u4eb ~/Programs/mdtraj
$ sudo python setup.py install
dyld: DYLD_ environment variables being ignored because main executable (/usr/bin/sudo) is setuid or setgid
running install
running bdist_egg
running egg_info
writing MDTraj.egg-info/PKG-INFO
writing top-level names to MDTraj.egg-info/top_level.txt
writing dependency_links to MDTraj.egg-info/dependency_links.txt
reading manifest file 'MDTraj.egg-info/SOURCES.txt'
writing manifest file 'MDTraj.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.5-x86_64/egg
running install_lib
running build_py
running build_ext
skipping 'MDTraj/xtc/xtc.c' Cython extension (up-to-date)
skipping 'MDTraj/dcd/dcd.c' Cython extension (up-to-date)
cythoning MDTraj/binpos/binpos.pyx to MDTraj/binpos/binpos.c
error: /Users/TJ/Programs/mdtraj/MDTraj/binpos/binpos.pyx: No such file or directory

Numpy array access to topology items

I find it annoying that I have to use a generator to parse things the OpenMM topology.

When we start using mdtraj for real calculations, I worry that this will be even more of a pain.

In particular, I think being able to use fancy indexing, np.where(), and other features are probably useful. What do you think? For example, I'm thinking about how to calculate dihedral angles etc.

Create feature vectorizer class / API.

The idea is that we want a robust framework for featurizing trajectories. Key abilities:

  1. Mix and match functions from geometry
  2. Possible combination of multiple vectorizers
  3. Possibly memory aware--split trajectories into chunks to avoid trying to featurize everything at once...

Error Saving Topology in hdf5.py

I'm still trying to debug this, but I think there might be a size limit on the amount of data that can be stored in a node attribute. This might require a change to the HDF5 protocol, to put the topology into a table instead.

This same code works fine in the test suite, but this system is much larger than the one in the test suite.

Traceback (most recent call last):
  File "/home/rmcgibbo/opt/python/2.7.4/bin/accelerator", line 5, in <module>
    pkg_resources.run_script('msmaccelerator==0.3', 'accelerator')
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/EGG-INFO/scripts/accelerator", line 22, in <module>
    main()
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/EGG-INFO/scripts/accelerator", line 19, in main
    app.start()
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/msmaccelerator/core/app.py", line 132, in start
    return self.subapp.start()
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/msmaccelerator/simulate/simulation.py", line 79, in start
    super(Simulator, self).start()
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/msmaccelerator/core/device.py", line 79, in start
    self.on_startup_message(msg)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/msmaccelerator/simulate/simulation.py", line 87, in on_startup_message
    return getattr(self, msg.header.msg_type)(msg.header, msg.content)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/msmaccelerator-0.3-py2.7.egg/msmaccelerator/simulate/simulation.py", line 138, in simulate
    simulation.step(self.number_of_steps)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 127, in step
    reporter.report(self, state)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/reporters/hdf5reporter.py", line 174, in report
    self._initialize(simulation)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/reporters/hdf5reporter.py", line 141, in _initialize
    self._traj_file.topology = simulation.topology
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 253, in topology
    self._handle.root._v_attrs.topology = json.dumps(topology_dict)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/tables/attributeset.py", line 444, in __setattr__
    self._g__setattr(name, value)
  File "/home/rmcgibbo/opt/python/2.7.4/lib/python2.7/site-packages/tables/attributeset.py", line 386, in _g__setattr
    self._g_setAttr(self._v_node, name, stvalue)
  File "hdf5Extension.pyx", line 429, in tables.hdf5Extension.AttributeSet._g_setAttr (tables/hdf5Extension.c:4064)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5A.c", line 927, in H5Awrite
    not an attribute

End of HDF5 error back trace

Can't set attribute 'topology' in node:
 / (RootGroup) ''.
Closing remaining open files: /home/rmcgibbo/projects/msmaccelerator2.villin/project/trajs/376cd500-7be1-4fda-9a05-6e8ae1f2a70b.h5... done

Catch Exception in DCD Reader

I need to add more error checking to the DCD reader. This may have been introduced when I added the n_frame option in read().

In [10]: x, lengths, angles = reader.read(40000)
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-10-c52b259ebf12> in <module>()
----> 1 x, lengths, angles = reader.read(40000)

/home/kyleb/opt/lib/python2.7/site-packages/mdtraj-0.1-py2.7-linux-x86_64.egg/mdtraj/dcd.so in mdtraj.dcd.DCDReader.read (MDTraj/dcd/dcd.c:2569)()

IOError: [Errno Error: %s] -1

Switch RMSD code to external dependency on @ihaque's newest IRMSD code

The latest version of @ihaque's RMSD code is on simtk as a separate installable project. Instead of copying the code in msmbuilder2, we should just use that code. I haven't looked too closely at the C, but the python side is much cleaner and more userfriendly that what we're using now. Also, it has support now for axis or atom major order, which means that for very large datasets, we can possibly avoid a copy operation at the cost of some speed (might be worth it).

https://github.com/simtk/irmsd

How to set _time after changing _xyz

Maybe we need a way to set the value of _xyz so that the _time gets set automatically? Not sure. The idea is that now it's easier for _xyz and _time to get out of sync.

Load / Save Box Size

Some of the trajectory formats store the box size with box vectors, and others store the box lengths and the three angles. Currently, we've pretty much punted and aren't doing the conversion.

We should have both represenations available via @property on the trajectory object, and then do the load/save in whatever representation is required.

get the following error when installing under ubuntu

from mdtraj import dcd
main:1: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility
main:1: RuntimeWarning: numpy.flatiter size changed, may indicate binary incompatibility

Any thoughts?

cannot load h5 using io.loadh

I would like to use the io.loadh functionality but I am having some troubles. Any reason why this is happening ?

In [17]: f = tables.open_file('test1.ring')

In [18]: f
Out[18]: 
File(filename=test1.ring, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False))
/ (RootGroup) ''
/xyza (Array(7952,)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
/rings (Group) ''
/rings/qres (Array(1,)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
/rings/ringI_267 (Array(500, 360)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
/rings/ringR_267 (Array(500, 360)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

but then I try

In [19]: io.loadh( 'test1.ring' )
Exception AttributeError: "'DeferredTable' object has no attribute '_own_fid'" in  ignored
---------------------------------------------------------------------------
NoSuchNodeError                           Traceback (most recent call last)
<ipython-input-19-80c137a1e905> in <module>()
----> 1 io.loadh( 'test1.ring' )

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mdtraj-0.0.0-py2.7-macosx-10.6-intel.egg/mdtraj/io.pyc in loadh(file, name, deferred)
    200         return result
    201 
--> 202     return DeferredTable(handle, own_fid)
    203 
    204 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/mdtraj-0.0.0-py2.7-macosx-10.6-intel.egg/mdtraj/io.pyc in __init__(self, handle, own_fid)
    206     def __init__(self, handle, own_fid):
    207         self._handle = handle
--> 208         self._node_names = [e.name for e in handle.iterNodes(where='/')]
    209         self._loaded = {}
    210         self._own_fid = own_fid

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.pyc in __getattr__(self, name)
    768             self._g_add_children_names()
    769             return myDict[name]
--> 770         return self._f_get_child(name)
    771 
    772     def __setattr__(self, name, value):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.pyc in _f_get_child(self, childname)
    644         self._g_check_open()
    645 
--> 646         self._g_check_has_child(childname)
    647 
    648         childPath = join_path(self._v_pathname, childname)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.pyc in _g_check_has_child(self, name)
    377             raise NoSuchNodeError(
    378                 "group ``%s`` does not have a child named ``%s``"
--> 379                 % (self._v_pathname, name))
    380         return node_type
    381 

NoSuchNodeError: group ``/rings`` does not have a child named ``name``

In [20]: 

Formalized HDF5 Trajectory ABI

Idea

I want to formalize the format for storing HDF5 Trajectories on disk, and provide some kind of a "contract" that our code can say that it implements. It makes sense for the writers of trajectories to try to stick to this contract as closely as possible, and for the readers of trajectories to be as flexible as possible in reading trajectories.

Why HDF5?

Design Goals

In storing MD trajectory data for the for purposes including building MSMs, there are a few design goals. (1) The trajectories should be small and space efficient on disk. (2) The trajectories should be fast to write and fast to read. (3) The data format should support flexible read options. For instance, random access to different frames in the trajectory should be possible. It should be possible to easily query the dimensions of the trajectory (n_frames, n_atoms, etc) without loading the file into memory. It should be possible to load only every n-th frame, or to directly only a subset of the atoms with limited memory overhead. (5) The trajectory format should be easily extensible in a backward compatible manner. For instance, it should be possible to add new arrays/fields like the potential energy or the topology without breaking backwards compatibility.

Other Formats

Currently, MDTraj is able to read and write trajectories in dcd, xtc, trr, binpos, and amber netcdf formats, in addition to hdf5. This presents an opportunity to compare these formats and see how they fit our design goals. The most space efficient is xtc, because it uses 16bit fixed precision encoding. For some reason, the xtc read times are quite slow though. dcd is fast to read and write, but relatively inflexible. netcdf is fast and flexible. binpos is garbage -- it's neither fast, small, nor flexible.

What's lacking?

Of the formats we currently have, AMBER NetCDF is the best, in that it it satisfies all of the design goals except for the first. But the trajectories are twice as big on disk as XTC, which is really quite unfortunate. For dealing with FAH data, size matters. So let's define a HDF5 standard that has the benefits of AMBER NetCDF and the benefits of XTC mixed together. We'll use an extensible data format (HDF5), we'll provide options for lossy and lossless compression, and we'll store the topology inside the trajectory.

MDTraj HDF5 Format

This specification is heavily influenced by the AMBER NetCDF standard. Significant portions of the text are copied verbatim.

Encoding

  • Files will be encoded in HDF5, a data model, library, and file format for storing and managing data produced at NCSA.
  • Arrays may use be encoded with zlib compression.
  • Libraries implementing this standard may, at their desecration, round the data to an appropriate number of significant digits, which can significantly enhance zlib compression ratios.

Recommended file extensions

  • The .h5 extension will be preferred, although not required.

Global Attributes

  • Conventions (required) Contents of this attribute are a comma or space delimited list of tokens representing all of the conventions to which the file conforms. Creators shall include the string Pande as one of the tokens in this list. In the usual case, where the file conforms only to this convention, the value of the attribute will simply be ``Pande''. Readers may fail if this attribute is not present or none of the tokens in the list are Pande. Optionally, if the reader does not expect HDF5 files other than those conforming to the Pande convention, it may emit a warning and attempt to read the file even when the Conventions attribute is missing.
  • ConventionVersion (required) Contents are a string representation of the version number of this convention. Future revisions of this convention having the same version number may include definitions of additional variables, dimensions or attributes, but are guaranteed to have no incompatible changes to variables, dimensions or attributes specified in previous revisions. Creators shall set this attribute to "1.0". If this attribute is present and has a value other than "1.0", readers may fail or may emit a warning and continue. It is expected that the version of this convention will change rarely, if ever.
  • application (optional) If the creator is part of a suite of programs or modules, this attribute shall be set to the name of the suite.
  • program (required) Creators shall set this attribute to the name of the creating program or module.
  • programVersion (required) Creators shall set this attribute to the preferred textual formatting of the current version number of the creating program or module.
  • title (optional) Creators may set use this attribute to represent a user-defined title for the data represented in the file. Absence of a title may be indicated by omitting the attribute or by including it with an empty string value.
  • topology (optional, but highly recommended) For protein systems, creators shall describe the topology of the system in ASCII encoded JSON. The format for the topology definition is described in the topology subsection of this document.
  • randomState (optional), ASCII-encoded string Creators may optionally describe the
    state of their internal random number generators at the start of their simulation.
    The semantics of this string are specific to the MD code and are not specified by
    this standard.
  • forcefield (optional), ASCII-encoded string For data from a molecular dynamics simulation,
    creators may optionally describe the Hamiltonian used. This should be a short, human readable
    string, like "AMBER99sbildn".
  • reference (optional), ASCII-encoded string Creators may optionally specify published a reference
    that documents the program or parameters used to generate the data. The reference should be listed
    in a simple, human readable format. Multiple references may be listed simply by separating the references
    with a human readable delimiter within the string, like a newline.

Arrays

  • coordinates (required) shape=(n_frames, n_atoms, 3), type=Float32, units="nanometers". This variable shall contain the Cartesian coordinates of the specified particle for the specified.
  • time (optional), shape=(n_frames), dtype=Float32, units="picoseconds" When coordinates on the frame dimension have a temporal sequence (e.g. they form a molecular dynamics trajectory), creators shall define this dimension and write a float for each frame coordinate representing the simulated time value in picoseconds associated with the frame. Time zero is arbitrary, but typically will correspond to the start of the simulation. When the file stores a collection of conformations having no temporal sequence, creators shall omit this variable.
  • cell_lengths (optional), shape=(n_frames, 3, 3), dtype=Float32, units="nanometers" When the data in the coordinates variable come from a simulation with periodic boundaries, creators shall include this variable. his variable shall represent the lengths (a,b,c) of the unit cell for each frame. The edge with length a lies along the x axis; the edge with length b lies in the x-y plane. The origin (point of invariance under scaling) of the unit cell is defined as (0,0,0). If the simulation has one or two dimensional periodicity, then the length(s) corresponding to spatial dimensions in which there is no periodicity shall be set to zero.
  • cell_angles shape=(n_frames, 3, 3), dtype=Float32, units="degrees" Creators shall include this variable if and only if they include the cell_lengths variable. This variable shall represent the angles (\alpha, \beta, \gamma) defining the unit cell for each frame. \alpha defines the angle between the b and c vectors, \beta defines the angle between the a and c vectors and \gamma defines the angle between the a and b vectors. Angles that are undefined due to less than three dimensional periodicity shall be set to zero.
  • velocities (optional), shape=(n_frames, n_atoms, 3), type=Float32, units="nanometers/picosecond"
    When the velocities variable is present, it shall represent the cartesian components of
    the velocity for the specified particle and frame. It is recognized that due to the
    nature of commonly used integrators in molecular dynamics, it may not be possible
    for the creator to write a set of velocities corresponding to exactly the same point
    in time as defined by the time variable and represented in the coordinates variable.
    In such cases, the creator shall write a set of velocities from the nearest point in
    time to that represented by the specified frame.
  • kineticEnergy (optional), shape=(n_frames), type=Float32, units="kJ/mol"
    Creators may optionally specify the kinetic energy of the system at each frame.
  • potentialEnergy (optional), shape=(n_frames), type=Float32, units="kJ/mol"
    Creators may optionally specify the potential energy of the system at each frame.
  • temperature (optional), shape=(n_frames), type=Float32, units="Kelvin"
    Creators may optionally specify the temperature of the system at each frame.
  • lambda (optional), shape=(n_frames), type=Floa32 units="" For describing an alchemical free energy
    simulation, a creator may optionally notate each frame in the simulation with a value of lambda.
  • constraints (optional), shape=(n_constraints, 3), type=CompoundType(int, int, float) units=[None, None, "nanometers"]
    Creators may optionally describe any constraints applied to the bond lengths. constraints shall be a compound-type table (referred to a table as opposed to an array in the pytables documentation), such that the first two entries are the indices of the two atoms involved in the constant, and the final entry is the distance those atoms are constrained to.

Array Metadata

  • For arrays that contain naturally unitted numbers (which is all of them), creators shall explicitly declare their units. The unit system of length=nanometers, time=picoseconds, mass=daltons, temperature=Kelvin, energy=kJ/mol, force=kJ/mol/nm shall be used everywhere. For angles, degrees shall be used. The units shall be set as an "attribute", on the array, under the key "units", within the parlance of HDF5. It shall be a string.
  • For arrays that contain numbers which have been rounded to a certain number of significant digits, creators shall declare the number of significant digits by setting the "least_significant_digit" attribue, which should be a positive integer.

Topology

Rational

It is our experience that not having the topology stored in the same file as the the trajectory's coordinate data is a pain. It's just really inconvenient. And generally, the trajectories are long enough that it doesn't take up much incremental storage space to store the topology in there too. The topology is not that complicated.

Format

The topology will be stored in JSON. The JSON will then be serialized as a string and stored in the HDF5 file with an ASCII encoding.

The topology stores a hierarchical description of the chains, residues, and atoms in the system. Each chain is associated with an index and a list of residues. Each residue is associated with a name, an index, and a list of atoms. Each atom is associated with a name, an element, and an index. All of the indicies should be zero-based.

The name of a residue is not strictly proscribed, but should generally be one of the three letter codes for amino acids, or the commonly used extensions thereof, e.g. "HOH" for water. The element of an atom shall be one of the one or two letter element abbreviations from the periodic table. The name of an atom shall indicate some information about the type of the atom beyond just its element, such as 'CA' for the alpha carbom, 'HG' for a gamma hydrogen, etc. This format does not specify exactly what atom names are allowed -- creators should follow the conventions from the forcefield they are using.

In addition to the chains, the topology shall also contain a list of the bonds. The bonds shall be a list of length-2 lists of integers, where the integers refer to the index of the two atoms that are bonded.

Example

The following shows the topology of alanine dipeptide in this format. Since it's JSON, the whitespace is optional and just for readability.

{'bonds': [[4, 1],
           [4, 5],
           [1, 0],
           [1, 2],
           [1, 3],
           [4, 6],
           [14, 8],
           [14, 15],
           [8, 10],
           [8, 9],
           [8, 6],
           [10, 11],
           [10, 12],
           [10, 13],
           [7, 6],
           [14, 16],
           [18, 19],
           [18, 20],
           [18, 21],
           [18, 16],
           [17, 16]],
 'chains': [{'index': 0,
             'residues': [{'atoms': [{'element': 'H',
                                      'index': 0,
                                      'name': 'H1'},
                                     {'element': 'C',
                                      'index': 1,
                                      'name': 'CH3'},
                                     {'element': 'H',
                                      'index': 2,
                                      'name': 'H2'},
                                     {'element': 'H',
                                      'index': 3,
                                      'name': 'H3'},
                                     {'element': 'C',
                                      'index': 4,
                                      'name': 'C'},
                                     {'element': 'O',
                                      'index': 5,
                                      'name': 'O'}],
                           'index': 0,
                           'name': 'ACE'},
                          {'atoms': [{'element': 'N',
                                      'index': 6,
                                      'name': 'N'},
                                     {'element': 'H',
                                      'index': 7,
                                      'name': 'H'},
                                     {'element': 'C',
                                      'index': 8,
                                      'name': 'CA'},
                                     {'element': 'H',
                                      'index': 9,
                                      'name': 'HA'},
                                     {'element': 'C',
                                      'index': 10,
                                      'name': 'CB'},
                                     {'element': 'H',
                                      'index': 11,
                                      'name': 'HB1'},
                                     {'element': 'H',
                                      'index': 12,
                                      'name': 'HB2'},
                                     {'element': 'H',
                                      'index': 13,
                                      'name': 'HB3'},
                                     {'element': 'C',
                                      'index': 14,
                                      'name': 'C'},
                                     {'element': 'O',
                                      'index': 15,
                                      'name': 'O'}],
                           'index': 1,
                           'name': 'ALA'},
                          {'atoms': [{'element': 'N',
                                      'index': 16,
                                      'name': 'N'},
                                     {'element': 'H',
                                      'index': 17,
                                      'name': 'H'},
                                     {'element': 'C',
                                      'index': 18,
                                      'name': 'C'},
                                     {'element': 'H',
                                      'index': 19,
                                      'name': 'H1'},
                                     {'element': 'H',
                                      'index': 20,
                                      'name': 'H2'},
                                     {'element': 'H',
                                      'index': 21,
                                      'name': 'H3'}],
                           'index': 2,
                           'name': 'NME'}]}]}

Failure with tables version 3.0.0 (just released)

======================================================================
ERROR: MDTraj.test.test_hdf5.test_topology
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/rmcgibbo/local/mdtraj/MDTraj/test/test_hdf5.py", line 106, in test_topology
    f.topology = top
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 73, in wrapper
    return f(*args, **kwargs)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 258, in topology
    self._handle.createArray(where='/', name='topology', object=[str(json.dumps(topology_dict))])
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
TypeError: create_array() got an unexpected keyword argument 'object'

======================================================================
ERROR: Failure: TypeError (create_array() got an unexpected keyword argument 'object')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/nose/loader.py", line 253, in generate
    for test in g():
  File "/home/rmcgibbo/local/mdtraj/MDTraj/test/test_io.py", line 125, in test_groups
    f.createArray(where='/mygroup', name='myarray', object=x)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
TypeError: create_array() got an unexpected keyword argument 'object'

======================================================================
ERROR: Failure: TypeError (create_array() got an unexpected keyword argument 'object')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/nose/loader.py", line 253, in generate
    for test in g():
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/numpy/testing/decorators.py", line 153, in skipper_gen
    for x in f(*args, **kwargs):
  File "/home/rmcgibbo/local/mdtraj/MDTraj/test/test_reporter.py", line 67, in test_reporter
    simulation.step(100)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/simtk/openmm/app/simulation.py", line 127, in step
    reporter.report(self, state)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/reporters/hdf5reporter.py", line 172, in report
    self._initialize(simulation)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/reporters/hdf5reporter.py", line 126, in _initialize
    self._traj_file.topology = simulation.topology
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 73, in wrapper
    return f(*args, **kwargs)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 258, in topology
    self._handle.createArray(where='/', name='topology', object=[str(json.dumps(topology_dict))])
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
TypeError: create_array() got an unexpected keyword argument 'object'

======================================================================
ERROR: Failure: TypeError (create_array() got an unexpected keyword argument 'object')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/nose/loader.py", line 253, in generate
    for test in g():
  File "/home/rmcgibbo/local/mdtraj/MDTraj/test/test_trajectory.py", line 78, in test_box_load_save
    t.save(temp_fn)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/trajectory.py", line 1042, in save
    return saver(filename)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/trajectory.py", line 1057, in save_hdf
    f.topology = self.topology
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 73, in wrapper
    return f(*args, **kwargs)
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/hdf5.py", line 258, in topology
    self._handle.createArray(where='/', name='topology', object=[str(json.dumps(topology_dict))])
  File "/home/rmcgibbo/envs/latest/lib/python2.7/site-packages/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
TypeError: create_array() got an unexpected keyword argument 'object'

Add a function in utils for managing delayed imports and printing a nice error message if they're not found

This functionality would be used for importing scipy, tables, netCDF, networkx, which are only required for specific modules and might not be installed on everyone's machine. We want to only import them when they're required (i.e. the class/function that actually uses them is called), and catch ImportErrors, giving a nice message about where the module can be downloaded from and how to install it.

How to remove waters / atoms

In the old MSMB trajectory, we had a function restrict_atom_indices. Do we plan on implementing similar features in MDTraj for performing operations on sets of atoms?

openmm_boxes at individual frames

I wonder if the traj.openmm_boxes() function should take a frame index as an argument?

Is there ever a case where OpenMM interacts with an entire trajectory at once? I don't think there is.

My thinking is that converting the entire trajectory to an OpenMM list could be a memory-killer for some applications.

Fix chi dihedral parsing

The current code does not take into account the several possible atom names for the chi torsion angle. We should fix this somehow.

Is eq() working?

This seems wrong to me. I get non-agreement even when I set decimal=1. This may have to do with testing a scalar value.

  File "/home/kyleb/src/kyleabeauchamp/mdtraj/MDTraj/test/test_geometry.py", line 80, in test_dihedral
    eq(float(phi[0,0]), phi0, decimal=1)
  File "/home/kyleb/opt/lib/python2.7/site-packages/mdtraj-0.2-py2.7-linux-x86_64.egg/mdtraj/testing/testing.py", line 71, in eq
    eq_(o1, o2)
AssertionError: -0.602303798297833 != -0.6023054454145343

----------------------------------------------------------------------
Ran 195 tests in 1.223s

FAILED (failures=1)

The topologies are not the same

import mdtraj.trajectory
r1 = mdtraj.trajectory.load("./traj.lh5")
r2 = mdtraj.trajectory.load("./traj.lh5")
r1 + r2

ValueError Traceback (most recent call last)
in ()
----> 1 r1 + r2

/home/kyleb/opt/lib/python2.7/site-packages/mdtraj-0.1-py2.7-linux-x86_64.egg/mdtraj/trajectory.pyc in add(self, other)
578 def add(self, other):
579 "Concatenate two trajectories"
--> 580 return self.join(other)
581
582 def join(self, other, check_topology=True):

/home/kyleb/opt/lib/python2.7/site-packages/mdtraj-0.1-py2.7-linux-x86_64.egg/mdtraj/trajectory.pyc in join(self, other, check_topology)
605 if not np.all(mdtraj.topology.to_bytearray(self.topology) ==
606 mdtraj.topology.to_bytearray(other.topology)):
--> 607 raise ValueError('The topologies are not the same')
608
609 xyz = np.concatenate((self.xyz, other.xyz))

ValueError: The topologies are not the same

Top level docstring

When we import mdtraj, it would be nice to be able to see the available modules. The goal is to avoid having to memorize the class structure before opening a python session.

This is low-priority and mostly a reminder for later.

Match fields in our HDF5 format with the AMBER NetCDF format

@kyleabeauchamp: The AMBER NetCDF format is really intelligent. It has all the info you want in there, and it's very explicit about the format. It has places for a version tag, on the format, and for labeling what units things are in. These are all things that our HDF5 format lacks.

We should make a new version of our HDF5 format that formalizes it, and makes it match the AMBER NetCDF format as much as possible.

We should also do some performance benchmarks to check if we really need our own proprietary format.

what should default behavior for overwriting a file be?

Right now, when using saveh in io, the method will die if trying to overwrite an (almost) identical file, but it fails at the very end (after a potentially expensive calculation). Should it either fail at the beginning, or overwrite the already-saved file?

Wish List

  • trajectory indexing (i.e. traj[5])
  • PBC vectors
  • Minimal logic on the PBC vectors.
    • Make sure they're converted between the format required for each library, written to PDB files in a way that OpenMM or something would understand.
  • DCD & XTC units same -- nm to be consistent with old msmbuilder?

Fixed Width vs. Float16

Hi,

I wanted to have a tiny discussion about the pros / cons of fixed width versus Float16. I originally used fixed width because it was used in Gromacs and because early PyTables didn't have Float16Atom(). However, now I'd say we are free to choose between the two.

As I see it, the pros of Float16 are that we offload all complexity to a simple choice of precision.

Can you just remind me why Float16 might be a bad idea? I'm having a tough time thinking about "what it means" in terms of complications regarding MD trajectories...

Serializing the topology as JSON inside the HDF5 or NetCDF Trajectories

Currently, the topology is being pickled and then the bytes are being saved in the HDF5 trajectory format.

I think this is a bad solution, because it's not very transparent. Pickle is a python-specific protocol, and being able to unpickle the object depends on the mdtraj.topology.Topology object existing and being in some way unchanged from the one in the file, I think.

Instead, I think we should serialize the topology in a way that's transparent, such that if we changed the way we were storing the topology inside our apps, we could write a new reader. The structure that makes the most sense I think is JSON, like:

{
  chain : {
    index : 0
    residue : {
      index : 0
      name : ALA,
      atom : {
        index : 0,
        name : N,
        element : N,
      },
      atom : {
        index : 1
        name : CA,
        element : C
     }
  },
  bonds : [[0,1]],
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.