chemfiles / chemfiles
Library for reading and writing chemistry files
Home Page: http://chemfiles.org
License: BSD 3-Clause "New" or "Revised" License
MSYS is a Unix-like environment on Windows. MSYS2 is installed on AppVeyor, and should be tested.
I recently cloned the GitHub repository to ~/chemfiles and followed the installation instructions, using these options for the cmake command:
cmake -DCMAKE_BUILD_TYPE=debug -DCHFL_BUILD_TESTS=ON -DCHFL_ENABLE_NETCDF=ON ..
But for some reason neither of these files was present in the ~/chemfiles/include/chemfiles folder, nor in the installed folder /usr/local/include/chemfiles, so when I tried to compile my code (with the flag -I/usr/local/include), the following error appeared:
/usr/local/include/chemfiles.hpp:11:32: fatal error: chemfiles/config.hpp: No such file or directory
The error was fixed by copying these two files to /usr/local/include/chemfiles.
File Type | text/xml |
---|---|
Topological information | Yes |
Positions | Yes |
Velocities | Yes |
Reference: http://www.xml-cml.org
Using a selection like water or name Na could be nice. This requires a predefined group of atoms named "water".
I do not know how to do this; it may require detecting/classifying molecules in the system using some kind of graph algorithm.
Functions to get the distance/angle/dihedral between particles could be nice to have. Something like
double Frame::distance(size_t i, size_t j);
double Frame::angle(size_t i, size_t j, size_t k);
double Frame::dihedral(size_t i, size_t j, size_t k, size_t m);
int chfl_frame_distance(const CHFL_FRAME* frame, size_t i, size_t j, double* r);
int chfl_frame_angle(const CHFL_FRAME* frame, size_t i, size_t j, size_t k, double* theta);
int chfl_frame_dihedral(const CHFL_FRAME* frame, size_t i, size_t j, size_t k, size_t m, double* phi);
The other solution is to have a function to wrap a vector in the unit cell, and then let users write these functions themselves.
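A minimal sketch of what such helpers could look like, assuming an orthorhombic cell for brevity (a triclinic cell needs the full cell matrix). `OrthorhombicCell`, `pbc_distance` and `pbc_angle` are hypothetical names, not Chemfiles API. The `wrap()` method plays the role of the "wrap a vector in the unit cell" function, and the other functions are built on top of it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector3D = std::array<double, 3>;

// Orthorhombic cell only: box lengths along x, y and z
struct OrthorhombicCell {
    Vector3D lengths;

    // Minimum image convention: shift each component by a multiple of the
    // box length so that it ends up in [-L/2, L/2]
    Vector3D wrap(Vector3D v) const {
        for (std::size_t d = 0; d < 3; d++) {
            v[d] -= std::round(v[d] / lengths[d]) * lengths[d];
        }
        return v;
    }
};

double dot(const Vector3D& a, const Vector3D& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

double norm(const Vector3D& v) { return std::sqrt(dot(v, v)); }

// Wrapped vector going from atom i to atom j
Vector3D wrapped_vector(const OrthorhombicCell& cell, const std::vector<Vector3D>& positions,
                        std::size_t i, std::size_t j) {
    assert(i < positions.size() && j < positions.size());
    Vector3D r;
    for (std::size_t d = 0; d < 3; d++) {
        r[d] = positions[j][d] - positions[i][d];
    }
    return cell.wrap(r);
}

double pbc_distance(const OrthorhombicCell& cell, const std::vector<Vector3D>& positions,
                    std::size_t i, std::size_t j) {
    return norm(wrapped_vector(cell, positions, i, j));
}

// Angle i-j-k in radians, with atom j at the vertex
double pbc_angle(const OrthorhombicCell& cell, const std::vector<Vector3D>& positions,
                 std::size_t i, std::size_t j, std::size_t k) {
    Vector3D rji = wrapped_vector(cell, positions, j, i);
    Vector3D rjk = wrapped_vector(cell, positions, j, k);
    double cosine = dot(rji, rjk) / (norm(rji) * norm(rjk));
    // Clamp to protect std::acos against rounding slightly outside [-1, 1]
    return std::acos(std::max(-1.0, std::min(1.0, cosine)));
}
```

For example, in a 10 Å cubic box, atoms at x = 0.5 and x = 9.5 are 1 Å apart through the periodic boundary, not 9 Å apart. A dihedral function would follow the same pattern, using three wrapped bond vectors.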
This is weird. On OS X, when compiling the code in release mode with this exact set of flags, the xyz
test runs into a Bus error:
CMAKE_OSX_DEPLOYMENT_TARGET=10.9
CMAKE_C_FLAGS="-mmacosx-version-min=10.9"
CHFL_ENABLE_NETCDF=ON
The full set of commands to run to reproduce in a fresh build is:
cmake -DCMAKE_OSX_DEPLOYMENT_TARGET=10.9 -DCMAKE_C_FLAGS="-mmacosx-version-min=10.9" -DCHFL_BUILD_TESTS=ON -DCHFL_ENABLE_NETCDF=ON ../..
make
./tests/xyz
The Valgrind report is here, and an LLDB session looks like this:
(lldb) target create "tests/xyz"
Current executable set to 'tests/xyz' (x86_64).
(lldb) run
Process 24586 launched: 'tests/xyz' (x86_64)
Process 24586 stopped
* thread #1: tid = 0xcbcd, 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000b3130)
frame #0: 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const
xyz`boost::system::system_category()::system_category_const:
-> 0x1000b3130 <+0>: lock
0x1000b3131 <+1>: repne
0x1000b3132 <+2>: orb (%rax), %al
0x1000b3134 <+4>: addl %eax, (%rax)
(lldb) bt
* thread #1: tid = 0xcbcd, 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000b3130)
* frame #0: 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const
(lldb)
Since I do not get any backtrace in LLDB, and Valgrind reports conditional jumps depending on uninitialized values, this might be due to stack corruption.
This was spotted when investigating the need for export MACOSX_DEPLOYMENT_TARGET=""
in conda builds for conda-forge/staged-recipes#1571.
In PDB ATOM records, elements are always in upper case. I hadn't noticed this before, since biological molecules are almost exclusively made of CHONP atoms.
Atom::Atom(std::string name, std::string type):
    name_(std::move(name)), type_(std::move(type)) {
    auto periodic = PERIODIC_INFORMATION.find(type_);
    if (periodic != PERIODIC_INFORMATION.end()) {
        mass_ = periodic->second.mass;
    }
}
Maybe the strings in PERIODIC_INFORMATION should be lower case, and std::tolower()
be applied to the query string before any search.
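A sketch of the proposed case-insensitive lookup. `PERIODIC_MASSES` and `element_mass` are hypothetical stand-ins for `PERIODIC_INFORMATION` and the lookup in the `Atom` constructor, and the table is abridged to three elements for illustration. Note the cast to `unsigned char`: passing a negative `char` to `std::tolower` is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Lower-case a copy of the string, byte by byte
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical, abridged table: keys are stored in lower case (masses in g/mol)
const std::unordered_map<std::string, double> PERIODIC_MASSES = {
    {"h", 1.008}, {"n", 14.007}, {"na", 22.990},
};

// Mass for an element symbol written in any case ("NA", "Na", "na"),
// or 0 when the element is unknown
double element_mass(const std::string& type) {
    auto it = PERIODIC_MASSES.find(to_lower(type));
    return it != PERIODIC_MASSES.end() ? it->second : 0.0;
}
```

Only the lookup key is normalized, so the atom type itself can keep the case found in the file.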
Please confirm and I'll apply this minor change myself.
Cheers!
Getting the number of steps in a trajectory can be done at no cost for some formats (NetCDF), but might be very costly for others (XYZ). For a big XYZ trajectory, just getting the number of steps takes 5 s.
We always load the number of steps when opening a trajectory, which imposes a big cost just to open a file.
We should load this number of steps lazily (only when requested) and not systematically. To still be able to tell when a trajectory is fully read, we can add bool Format::is_done() and use it to implement bool Trajectory::is_done() and chfl_trajectory_is_done.
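A minimal sketch of the lazy approach, using a hypothetical `LazyTrajectory` class rather than the real `Trajectory`. The step count is computed through a callback the first time `nsteps()` is called and then cached in a `std::optional`, while `is_done()` delegates to a format-level check (for example "end of file reached") and therefore never triggers the count.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch: counting the steps may need a full scan of the file,
// so it is only done when nsteps() is first called, then cached.
class LazyTrajectory {
public:
    LazyTrajectory(std::function<std::size_t()> count_steps, std::function<bool()> format_is_done)
        : count_steps_(std::move(count_steps)), format_is_done_(std::move(format_is_done)) {}

    std::size_t nsteps() {
        if (!nsteps_) {
            nsteps_ = count_steps_();
        }
        return *nsteps_;
    }

    // Delegates to the format, so checking whether the trajectory is fully
    // read never requires counting the steps
    bool is_done() const { return format_is_done_(); }

private:
    std::function<std::size_t()> count_steps_;
    std::function<bool()> format_is_done_;
    std::optional<std::size_t> nsteps_;
};
```

Opening a file then costs nothing extra, and only callers that actually need the total pay for the scan, once.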
The C FFI should #define constants for error codes, and use them instead of relying on arbitrary values.
This would allow writing:
if (chfl_trajectory_open(...) != CHFL_SUCCESS)
Allow selecting atoms inside a sphere/cube/..., or within a given distance of another atom.
What is needed:
I do not know if this is worth it, as the BSD license does not have many advantages compared to the MPL. Opening this to discuss: if you have an opinion or a requirement concerning Chemfiles licensing, please tell us!
It would be nice to set the path to molfile plugins at runtime, and not only at compile time or through the CHRP_MOLFILES environment variable.
On Linux and OS X, we can get the path to the shared library libchemharp.so
with the following:
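// dladdr() is an extension available on Linux (glibc) and OS X
// (older glibc versions need linking with -ldl)
#include <dlfcn.h>
#include <iostream>

void foo() {
    Dl_info info;
    // Ask the dynamic linker which shared object contains the address of foo
    if (dladdr(reinterpret_cast<void*>(&foo), &info) != 0) {
        // dli_fname is the path of the library (or executable) containing foo
        std::cout << "foo is at " << info.dli_fname << std::endl;
    }
}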
The path to the molfile plugins can then be derived from that.
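A sketch of deriving the plugin directory from the library path returned in `Dl_info::dli_fname`. The installation layout (a `molfiles` directory next to the shared library) is an assumption for illustration, as are the helper names `molfiles_directory` and `find_molfiles`. The `CHRP_MOLFILES` environment variable keeps precedence when it is set.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

// Assumed layout (hypothetical): the plugins are installed in a "molfiles"
// directory next to the shared library, for example
// /usr/local/lib/libchemharp.so -> /usr/local/lib/molfiles
fs::path molfiles_directory(const fs::path& library_path) {
    return library_path.parent_path() / "molfiles";
}

// The CHRP_MOLFILES environment variable still takes precedence when set;
// library_path would come from Dl_info::dli_fname
fs::path find_molfiles(const fs::path& library_path) {
    if (const char* from_env = std::getenv("CHRP_MOLFILES")) {
        return fs::path(from_env);
    }
    return molfiles_directory(library_path);
}
```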
Another option would be to use the static version of plugins.
The FFI functions chfl_frame_positions and chfl_frame_velocities should return a pointer to Frame::positions and Frame::velocities. This would remove the need to copy the data for big files, and the need to do
chfl_frame_positions(frame, positions, atoms);     // copy out of the frame
positions[2][2] = 4.0;                             // z coordinate of the third atom
chfl_frame_set_positions(frame, positions, atoms); // copy back into the frame
Array3D, instead of std::vector<Vector3D>
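A sketch of the zero-copy accessor, using a hypothetical `Frame` struct and a hypothetical `frame_positions` function rather than the real C API. It hands out a `double (*)[3]` pointing into the frame's own storage, so a write through that pointer modifies the frame directly. This relies on `std::array<double, 3>` being laid out as three contiguous doubles, which the `static_assert` checks for size and which holds on common compilers. The pointer is invalidated whenever the vector is resized.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vector3D = std::array<double, 3>;

// Hypothetical frame storing positions contiguously
struct Frame {
    std::vector<Vector3D> positions;
};

// C-style accessor returning a view on the frame's data instead of a copy
int frame_positions(Frame* frame, double (**data)[3], std::size_t* natoms) {
    static_assert(sizeof(Vector3D) == 3 * sizeof(double), "Vector3D must not be padded");
    *data = reinterpret_cast<double (*)[3]>(frame->positions.data());
    *natoms = frame->positions.size();
    return 0;
}
```

With such an accessor, the copy/modify/copy-back sequence above reduces to a single `positions[2][2] = 4.0;`.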
In the same spirit as the guess_bonds function, we could try to guess the residues in a topology. The simplest version could assume residues == molecules, and just follow the bond graph. A more elaborate version could try to match sub-graphs to known residues like amino acids.
I think the current way of allowing .nc trajectories without box information is not working as expected.
When NCFormat::read_cell() calls dimension, if the trajectory has no box information, an error is thrown by nc::check (called by dimension) before dimension can return, say, size = 0.
Maybe the call to dimension in NCFormat::read_cell() could be enclosed in a try {...} catch {...}?
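A sketch of the suggested try/catch, with a stand-in `FakeNcFile` so that it runs without NetCDF. `FakeNcFile`, `MissingDimension` and `optional_dimension` are hypothetical, and `cell_spatial` is used as an example dimension name. Catching a dedicated exception type, instead of a bare `catch (...)`, keeps genuine I/O errors visible. An alternative that avoids exceptions entirely would be to test for the dimension first with the netCDF C function `nc_inq_dimid`, which returns `NC_EBADDIM` when the name is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Dedicated error type for "this dimension does not exist", so that other
// NetCDF errors are not silently swallowed by the catch below
struct MissingDimension : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for the NetCDF file wrapper, so the sketch runs without NetCDF
struct FakeNcFile {
    std::map<std::string, std::size_t> dimensions;

    std::size_t dimension(const std::string& name) const {
        auto it = dimensions.find(name);
        if (it == dimensions.end()) {
            throw MissingDimension("missing dimension " + name);
        }
        return it->second;
    }
};

// Size of a dimension, or 0 when the file does not define it (for example
// a trajectory without box information)
std::size_t optional_dimension(const FakeNcFile& file, const std::string& name) {
    try {
        return file.dimension(name);
    } catch (const MissingDimension&) {
        return 0;
    }
}
```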
With the new Residue class, additional selectors should be added:
resname would match the residue name;
resid would match the residue id.
I still need to think about how this can interact with multiple selections.
This uses more memory, but has multiple advantages:
It is reserved by POSIX, but it makes the types look nicer.
The Windows version of Chemharp is starting to work with both MSVC and MinGW, but it is not yet tested on CI services.
The first steps are here, but the build fails when bootstrapping Boost.
I'm wondering if there's room for an AminoAcid class. Objects of this class would be linked to each Frame object and 'contain' many objects of the Atom class. That is, a more 'OpenBabel' way of treating biological molecules, and a more intuitive one.
I'm currently developing C++ software to use on biological molecules. Chemfiles is the only library that supports the major molecular dynamics trajectory formats and the PDB format at the same time. But the lack of a Residue concept is critical. On the other hand, OpenBabel can't read NetCDF, and my software became considerably slower when I switched from chemfiles.
Since this is available, let's use it! It can be used to build the OS X conda recipe too.
File Type | text |
---|---|
Topological information | Yes |
Positions | Yes |
Velocities | ??? |
Reference: ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_A4.pdf
This prevents easy static linking of Chemharp, and the current version of netcdf-cxx does not use C++11 features (move semantics in particular). I should call the C library directly for what is needed, wrapping a few types in C++.
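A sketch of the kind of thin C++ wrapper this could mean: a move-only owner of an integer C handle (a NetCDF file id is an `int`) that closes it exactly once. `CHandle` is a hypothetical name. `fake_close` is a stand-in for a close function with the `int (*)(int)` shape of `nc_close`, so the sketch runs without NetCDF.

```cpp
#include <cassert>
#include <utility>

// Stand-in for a C close function such as nc_close(int); it only counts
// how many handles were closed
int closed_handles = 0;
int fake_close(int) {
    closed_handles++;
    return 0;
}

// Move-only owner of an integer C handle, closing it exactly once
class CHandle {
public:
    using CloseFn = int (*)(int);
    static constexpr int INVALID = -1;

    CHandle(int id, CloseFn close) : id_(id), close_(close) {}
    ~CHandle() { reset(); }

    CHandle(const CHandle&) = delete;
    CHandle& operator=(const CHandle&) = delete;

    // Moving transfers ownership and leaves the source empty
    CHandle(CHandle&& other) noexcept
        : id_(std::exchange(other.id_, INVALID)), close_(other.close_) {}
    CHandle& operator=(CHandle&& other) noexcept {
        if (this != &other) {
            reset();
            id_ = std::exchange(other.id_, INVALID);
            close_ = other.close_;
        }
        return *this;
    }

    int get() const { return id_; }

private:
    void reset() {
        if (id_ != INVALID) {
            close_(id_);  // a real wrapper would check the returned status
            id_ = INVALID;
        }
    }

    int id_;
    CloseFn close_;
};
```

Because copies are deleted and moves empty the source, a file object can be returned from functions or stored in containers without double-closing the handle.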
What should be done before the public release of this code:
Santa Claus wish list (or, what will happen after the public release but before 1.0)
It does not make sense to have a logging framework in a library, and it could be removed. It is mainly used by the C API to log exceptions at the interface, but these logs are not strictly needed.
Another option would be to have it silenced by default.
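A sketch of the "silenced by default" option: instead of a logging framework, the library exposes a single warning hook that does nothing until the user installs a callback. `Warnings`, `set_callback` and `send` are hypothetical names, not existing Chemfiles API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical warning hook: a no-op by default, so the library prints
// nothing unless the user installs a callback
class Warnings {
public:
    using Callback = std::function<void(const std::string&)>;

    static void set_callback(Callback callback) { callback_() = std::move(callback); }

    static void send(const std::string& message) {
        if (callback_()) {
            callback_()(message);
        }
    }

private:
    // Function-local static avoids static initialization order issues
    static Callback& callback_() {
        static Callback callback;
        return callback;
    }
};
```

The C API could then forward exception messages to `Warnings::send`, and users who want logs route them to their own logger.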
File Type | binary |
---|---|
Topological information | ??? |
Positions | Yes |
Velocities | Yes |
Reference: http://www.gromacs.org/Developer_Zone/Programming_Guide/XTC_Library
A conda recipe was introduced in 9c3ccfe, but I should check that the package is actually relocatable. In particular, I may need to set the CHRP_MOLFILES
environment variable. See http://conda.pydata.org/docs/building/recipe.html for the documentation on recipes.
It is not really useful, and does not pull its weight. I only use it for tests, where I could just close the file.
Add basic developer documentation before the 1.0 release.
Built from the Doxygen documentation and some hand-written text.
Boost libraries should be removed, as there is no strong need for them, and this would make it easier to embed Chemharp in another application.
Places where Boost is used, and how to remove it:
=> see the C++1y standard with header
any type => roll my own version, or inherit ostream in private to get access to the operator<<
The tests/data git submodule is not really ergonomic, as submodules are not automatically updated when using git pull.
The test data files are in a separate repository (https://github.com/chemfiles/tests-data) because they are not needed when building the code, and the size of the repository is non-negligible.
Just checking in a tar.gz archive and unpacking it with CMake would not work here, because all the tar.gz files would still be in .git/objects. I think the best way is to download the test files at compile time, using CMake's ExternalProject.
To validate the BinaryFile interface.
File Type | binary |
---|---|
Topological information | Yes |
Positions | Yes |
Velocities | Yes |
Reference: http://ambermd.org/netcdf/nctraj.pdf
This is roughly the same as standard NetCDF, but using doubles most of the time instead of floats.
The PDB CONECT record is only needed for bonds that are not in the standard connectivity table. Chemfiles should use this table to be able to have all bonds in the system.
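A sketch of using a standard connectivity table. The table here is a hypothetical one-entry excerpt (water, `HOH`, with atoms `O`, `H1`, `H2`); a real table would cover all standard residues. `STANDARD_BONDS` and `residue_bonds` are hypothetical names. Bonds inside a residue are found by matching atom names, and atoms missing from the residue are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using AtomNamePair = std::pair<std::string, std::string>;

// Hypothetical, tiny excerpt of a standard connectivity table, keyed by
// residue name
const std::map<std::string, std::vector<AtomNamePair>> STANDARD_BONDS = {
    {"HOH", {{"O", "H1"}, {"O", "H2"}}},
};

// Bonds (as pairs of atom indices) inside one residue, found by looking up
// atom names in the table
std::vector<std::pair<std::size_t, std::size_t>> residue_bonds(
    const std::string& resname, const std::vector<std::string>& atom_names) {
    std::vector<std::pair<std::size_t, std::size_t>> bonds;
    auto table = STANDARD_BONDS.find(resname);
    if (table == STANDARD_BONDS.end()) {
        return bonds;  // unknown residue: rely on CONECT records only
    }

    auto index_of = [&](const std::string& name) -> std::size_t {
        for (std::size_t i = 0; i < atom_names.size(); i++) {
            if (atom_names[i] == name) return i;
        }
        return atom_names.size();  // not found
    };
    for (const auto& pair : table->second) {
        auto i = index_of(pair.first);
        auto j = index_of(pair.second);
        if (i < atom_names.size() && j < atom_names.size()) {
            bonds.emplace_back(i, j);
        }
    }
    return bonds;
}
```

Bonds between residues (for example peptide bonds) would need an additional rule on top of this per-residue lookup.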
Hello @Luthaf, I'm currently going through my homebrew-juliadeps tap, updating versions of software where appropriate, and wanted to check with you to see if upgrading to v0.4.0 of chemharp would be appropriate for the julia code that uses it. Is that something I should do now, or should I wait until changes have been made on the julia side?
File Type | text |
---|---|
Topological information | Yes |
Positions | ??? |
Velocities | ??? |
PDBx/mmCIF is the chosen replacement for PDB files starting in 2016. A C++ parser exists here: http://mmcif.wwpdb.org/
Using
typedef enum {
CHFL_SUCCESS,
CHFL_ERROR,
...
} chfl_status;
and updating the functions to return chfl_status
instead of int.
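A sketch of how C API functions could produce such a status while keeping C++ exceptions from crossing the C boundary. Only `CHFL_SUCCESS` and `CHFL_ERROR` come from the enum above. `guarded` and `example_parse_natoms` are hypothetical helpers written for illustration.

```cpp
#include <cassert>
#include <string>

// Status codes as proposed above (CHFL_ERROR as a single catch-all value)
typedef enum {
    CHFL_SUCCESS = 0,
    CHFL_ERROR = 1,
} chfl_status;

// Run some C++ code and translate any exception into a status code, so that
// no exception ever crosses the C boundary
template <typename Function>
chfl_status guarded(Function&& function) {
    try {
        function();
        return CHFL_SUCCESS;
    } catch (...) {
        return CHFL_ERROR;
    }
}

// Hypothetical C API function, only here to exercise guarded()
extern "C" chfl_status example_parse_natoms(const char* text, unsigned long* natoms) {
    if (text == nullptr || natoms == nullptr) {
        return CHFL_ERROR;
    }
    return guarded([&] { *natoms = std::stoul(text); });
}
```

Callers then compare against a named constant, as in `if (example_parse_natoms(text, &n) != CHFL_SUCCESS)`, and bindings can map each status value to their own error type.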
Using chemfiles from other projects should be easy, and documented!
I am not sure whether the current PBC code can handle very tilted cells (with angles below 60° or above 120°). We should at least add a test for that, and maybe fix the algorithm.
In case the algorithm does not handle it, a new cell type might be needed for these cells.
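A sketch of such a test. It compares the common approach (convert to fractional coordinates and subtract the nearest integer along each cell vector) against a brute-force search over the 26 neighboring images of the result. `TriclinicCell`, `wrap_fractional` and `wrap_brute_force` are hypothetical names, and this is not a description of the current Chemfiles algorithm. In the example cell, a and b are only about 18° apart. Rounding then returns a vector several times longer than the true minimum image. The brute-force search over ±1 images is itself only a reference for moderately tilted test cases, not a general solution.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vector3D = std::array<double, 3>;

// a + factor * b
Vector3D add(Vector3D a, const Vector3D& b, double factor) {
    for (int d = 0; d < 3; d++) a[d] += factor * b[d];
    return a;
}
double dot(const Vector3D& a, const Vector3D& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
Vector3D cross(const Vector3D& a, const Vector3D& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
double norm(const Vector3D& v) { return std::sqrt(dot(v, v)); }

// Cell given by its three (possibly very tilted) vectors
struct TriclinicCell {
    Vector3D a, b, c;
};

// Fractional coordinates via Cramer's rule, then subtract the nearest integer
Vector3D wrap_fractional(const TriclinicCell& cell, const Vector3D& r) {
    double volume = dot(cell.a, cross(cell.b, cell.c));
    double fa = dot(r, cross(cell.b, cell.c)) / volume;
    double fb = dot(cell.a, cross(r, cell.c)) / volume;
    double fc = dot(cell.a, cross(cell.b, r)) / volume;
    Vector3D wrapped = add(r, cell.a, -std::round(fa));
    wrapped = add(wrapped, cell.b, -std::round(fb));
    return add(wrapped, cell.c, -std::round(fc));
}

// Test reference: also try the 26 neighboring images, keep the shortest
Vector3D wrap_brute_force(const TriclinicCell& cell, const Vector3D& r) {
    Vector3D start = wrap_fractional(cell, r);
    Vector3D best = start;
    for (int i = -1; i <= 1; i++) {
        for (int j = -1; j <= 1; j++) {
            for (int k = -1; k <= 1; k++) {
                Vector3D candidate = add(add(add(start, cell.a, i), cell.b, j), cell.c, k);
                if (norm(candidate) < norm(best)) best = candidate;
            }
        }
    }
    return best;
}
```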
The code for the C API should only use types with a known size, such as int8_t or uint64_t. Otherwise the bindings will rely on assumptions about integer sizes that may not hold, and this will create nasty bugs.
Hi there,
On the main chemfiles webpage, in the Multiple languages section,
some links are missing or invalid.
See for instance Fortran:
http://github.com/chemfiles/chemfiles.f90
instead of
http://github.com/chemfiles/chemfiles.f03
See also links related to C++ and C.
MinGW is a port of the GNU compiler and tools to Windows. It is NOT a Unix system.
It is available on AppVeyor, and should be tested.
Support for VMD molfile plugins will bring support for 15 new formats to Chemharp for free. The first steps are in the molfile branch.
The DCD reader fails on 32-bit Windows, and maybe on 64-bit Windows too.
This may be because of a 32/64-bit integer size difference.
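The cause is only a hypothesis here. One common source of such bugs is reading fixed-size binary fields into types whose size varies between platforms: `long` is 32 bits on Windows but 64 bits on Linux and OS X, and `size_t` is 32 bits on 32-bit systems. Below is a sketch of a platform-independent read of a 4-byte little-endian integer (such as a Fortran record marker), assembled byte by byte into `std::int32_t`. `read_int32_le` is a hypothetical helper, and the little-endian file layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read a 4-byte little-endian signed integer at `offset`, independently of
// the size of long or size_t and of the host byte order
std::int32_t read_int32_le(const std::vector<unsigned char>& buffer, std::size_t offset) {
    if (offset + 4 > buffer.size()) {
        throw std::out_of_range("not enough bytes");
    }
    std::uint32_t value = static_cast<std::uint32_t>(buffer[offset])
        | static_cast<std::uint32_t>(buffer[offset + 1]) << 8
        | static_cast<std::uint32_t>(buffer[offset + 2]) << 16
        | static_cast<std::uint32_t>(buffer[offset + 3]) << 24;
    return static_cast<std::int32_t>(value);
}
```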
Users should be able to query a Topology
for selections, in VMD style. The API may look like this:
//! Get the atoms matching the `selection` string
std::vector<bool> Topology::select(const std::string& selection) const;
//! Get the atoms matching each string of the tuple of `selections`
std::vector<std::vector<bool>> Topology::select(const std::vector<std::string>& selections) const;
// C API version
int chrp_topology_select(const CHRP_TOPOLOGY* topology, const char* selection, bool* res, size_t natoms);
int chrp_topology_select_tuple(const CHRP_TOPOLOGY* topology, const char** selections, size_t nselect, bool** res, size_t natoms);
Selections are expressed as strings, and return the list of atoms corresponding to the selection.
Tuple selections allow selecting pairs, triplets, ... of atoms matching a given pair, triplet, ... of selection strings.
// topology = ["H", "O", "H", "H", "O", "H"]
auto s = topology.select("name O");
// s == {false, true, false, false, true, false}
auto pairs = topology.select(std::vector<std::string>{"name H", "name O"});
// pairs == {{true, false}, {false, true}, {true, false}, {true, false}, {false, true}, {true, false}}
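A minimal sketch of the semantics above, supporting only the `name X` selector on a plain list of atom names. `select_atoms` and `select_atoms_tuple` are hypothetical free functions, not the proposed `Topology` methods, and a real implementation would need a proper parser for boolean operators and other selectors. The tuple version returns one row per atom and one column per selection string, matching the example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Only "name X" is supported in this sketch
std::vector<bool> select_atoms(const std::vector<std::string>& names, const std::string& selection) {
    std::istringstream stream(selection);
    std::string keyword, value;
    if (!(stream >> keyword >> value) || keyword != "name") {
        throw std::invalid_argument("unsupported selection: " + selection);
    }
    std::vector<bool> result;
    result.reserve(names.size());
    for (const auto& name : names) {
        result.push_back(name == value);
    }
    return result;
}

// One row per atom, one column per selection string
std::vector<std::vector<bool>> select_atoms_tuple(const std::vector<std::string>& names,
                                                  const std::vector<std::string>& selections) {
    std::vector<std::vector<bool>> result(names.size(), std::vector<bool>(selections.size()));
    for (std::size_t s = 0; s < selections.size(); s++) {
        auto matches = select_atoms(names, selections[s]);
        for (std::size_t i = 0; i < names.size(); i++) {
            result[i][s] = matches[i];
        }
    }
    return result;
}
```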
The only usage documentation for now is the API documentation, which is a bit rough for a first contact. The tutorials could use the examples and explain them. Other ideas are welcome!
It would also be nice to have the same tutorial with code examples in multiple languages.
This is really easy to do (see 7237af1 for an overview of the work needed), but we need to have one or more test files added to tests/data.
Atom concept to hold atom labels. Main TODOs:
Add a label_ variable and read it from columns 13-16 of the PDB file.
Rename the name_ variable to element_ and read it from columns 77-78 of the PDB file.
Replace name_ with element_ in the rest of the project.
File Type | binary |
---|---|
Topological information | ??? |
Positions | Yes |
Velocities | Yes |
Reference: http://dx.doi.org/10.1002/jcc.23495
File Type | text |
---|---|
Topological information | Yes |
Positions | Yes |
Velocities | Maybe |
Reference for the format: http://scripts.iucr.org/cgi-bin/paper?S010876739101067X
A library for guessing symmetry is available here: https://github.com/atztogo/spglib