
Library for reading and writing chemistry files

Home Page: http://chemfiles.org

License: BSD 3-Clause "New" or "Revised" License


chemfiles's Introduction

Chemfiles: a library for reading and writing chemistry files


Chemfiles is a high-quality library for reading and writing trajectory files created by computational chemistry simulation programs. To help you access the information in these files (atomic positions, velocities, names, topology, etc.), Chemfiles provides a simple and unified interface to a variety of file formats.

  • unified: the same code will work with all supported formats;
  • simple: the interface is easy to use and extensively documented.

You can use Chemfiles to conduct post-processing analysis and extract physical information about the systems you're simulating, to convert files from one format to another, to write trajectories with your own simulation software, and to do anything else that requires reading or writing the file formats used in computational chemistry.

Chemfiles is used in several scientific software packages:

  • cfiles provides ready-to-use analysis algorithms for simulation trajectories as a command line tool;
  • lemon is a framework for rapidly mining structural information from the Protein Data Bank;
  • lumol is a prototype of a universal, extensible molecular simulation engine, supporting both molecular dynamics and Metropolis Monte Carlo simulations;
  • ANA detects cavities and calculates their volume and flexibility in macromolecular structures and molecular dynamics trajectories.

This repository contains the core of the chemfiles library — written in C++11, with a C99 interface. You can also use chemfiles from other languages: Python 2&3, Fortran, Rust, and Julia.


Is chemfiles for you?

You might want to use chemfiles if any of these points appeals to you:

  • you don't want to spend time writing and debugging a file parser;
  • you use binary formats because they are faster and take up less disk space;
  • you write analysis algorithms and want to read more than one trajectory format;
  • you write simulation software and want to use more than one format for input or output.

Other libraries do roughly the same job as chemfiles; have a look at them if chemfiles is not for you. Here we also explain why we could not use them instead of creating a new library.

  • OpenBabel is a C++ library providing conversions between more than 110 formats. It is more complex than chemfiles, and distributed under the GPL license.
  • VMD molfile plugins are a collection of plugins written in C and C++ used by VMD to read/write trajectory files. They do not support a variable number of atoms in a trajectory.
  • MDTraj, MDAnalysis, and cclib are Python libraries providing analysis and read capabilities for trajectories. Unfortunately, they are only usable from Python.

Chemfiles Features

  • Reads both text (XYZ, PDB, ...) and binary (NetCDF, TNG, ...) file formats;
  • Transparently reads and writes compressed files (.gz, .xz and .bz2);
  • Filters atoms with a rich selection language, including constraints on multiple atoms;
  • Supports non-constant numbers of atoms in trajectories;
  • Easy-to-use programming interface in Python, C++, C, Fortran 95, Julia and Rust;
  • Cross-platform and usable from Linux, OS X and Windows;
  • Open source and freely available (3-clause BSD license).

Contact / Contribute / Cite

Chemfiles is free and open source. Your contributions are always welcome!

If you have questions or suggestions, or need help, please open an issue or join us on our Gitter chat room.

If you are using Chemfiles in a published scientific study, please cite us using the following DOI: https://doi.org/10.5281/zenodo.3653157.

Getting Started

Here, we'll help you get started with the C++ and C interface. If you want to use Chemfiles with another language, please refer to the corresponding documentation.

Installing Compiled Packages

We provide compiled packages of the latest Chemfiles release for Linux distributions. You can use your package manager to download them here.

We also provide conda packages in the conda-forge community channel for Linux and OS X. This package provides the C++, C and Python interfaces. Install the conda package by running:

conda install -c conda-forge chemfiles

Find more information about pre-compiled packages in the documentation.

Building from Source

You will need cmake and a C++11 compiler.

git clone https://github.com/chemfiles/chemfiles
cd chemfiles
mkdir build
cd build
cmake ..
make
make install

Usage Examples

This is what the interface looks like in C++:

#include <iostream>
#include "chemfiles.hpp"

int main() {
    chemfiles::Trajectory trajectory("filename.xyz");

    auto frame = trajectory.read();
    std::cout << "There are " << frame.size() << " atoms in the frame" << std::endl;

    auto positions = frame.positions();
    // Do awesome science with the positions here !
}

License

Guillaume Fraux created and maintains Chemfiles, which is distributed under the 3-clause BSD license. By contributing to Chemfiles, you agree to distribute your contributions under the same license.

Chemfiles depends on multiple external libraries, which are distributed under their respective licenses. All external library licenses should be compatible with chemfiles's 3-clause BSD license. One notable exception, depending on your use case, is Gemmi, which is distributed under the Mozilla Public License version 2. You can use the CHFL_DISABLE_GEMMI=ON CMake flag to remove this dependency.

The AUTHORS file lists all contributors to Chemfiles. Many thanks to all of them!

chemfiles's People

Contributors

ezavod, frodofine, fxcoudert, jmintser, luthaf, maxlevesque, mdimura, pelsa, pgbarletta, sguionni, shoubhikraj


chemfiles's Issues

Runtime initialization of Molfile path

It would be nice to be able to set the path to the molfile plugins at runtime, and not only at compile time or through the CHRP_MOLFILES environment variable.

On Linux and OS X, we can get the path to the shared library libchemharp.so by using the following:

#include <dlfcn.h>
#include <iostream>

void foo() {
    Dl_info info;
    if (dladdr((void*)foo, &info)){
        std::cout << "foo is at " << info.dli_fname << std::endl;
    }
}

Then the path to molfile plugin can be detected from that.

Another option would be to use the static version of plugins.
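As a rough sketch of the detection step, the helper below derives a plugin directory from a library path such as the dli_fname value printed above. The function name guess_plugin_dir and the "molfiles" sub-directory are assumptions made for illustration, not names taken from the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given the path of the loaded shared library (for
// example the `dli_fname` value reported by dladdr), return the directory
// where the molfile plugins are assumed to live.
std::string guess_plugin_dir(const std::string& library_path) {
    assert(!library_path.empty());
    auto slash = library_path.find_last_of('/');
    // No directory component: fall back to the current directory.
    std::string dir = (slash == std::string::npos) ? "." : library_path.substr(0, slash);
    // "molfiles" is an assumed sub-directory name, not a documented one.
    return dir + "/molfiles";
}
```

With this layout, a library at /usr/lib/libchemharp.so would look for plugins in /usr/lib/molfiles.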

PBC with tilted cells

I am not sure whether the current PBC code can handle very tilted cells (with angles less than 60° or more than 120°). We should at least add a test for that, and maybe fix the algorithm.

If the algorithm does not handle it, a new cell type might be needed for these cells.
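One possible shape for such a test, sketched in 2D for brevity: compare a rounding-based wrap against a brute-force search over neighbouring images. The rounding-based wrap is an assumption about how a simple implementation might work, not a description of the current code, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// 2D cell with a = (ax, 0) and b = (bx, by); a tilted cell has bx != 0.
struct Cell2D { double ax, bx, by; };

double norm(Vec2 v) { return std::sqrt(v.x * v.x + v.y * v.y); }

// Naive wrapping: round the fractional coordinates to the nearest integer.
Vec2 wrap_naive(const Cell2D& cell, Vec2 d) {
    assert(cell.ax > 0 && cell.by > 0);
    double sb = d.y / cell.by;
    double sa = (d.x - sb * cell.bx) / cell.ax;
    sa -= std::round(sa);
    sb -= std::round(sb);
    return {sa * cell.ax + sb * cell.bx, sb * cell.by};
}

// Reference: try neighbouring images of the wrapped vector, keep the shortest.
// The -2..2 range is assumed to be wide enough for this sketch.
Vec2 wrap_reference(const Cell2D& cell, Vec2 d) {
    Vec2 start = wrap_naive(cell, d);
    Vec2 best = start;
    for (int i = -2; i <= 2; i++) {
        for (int j = -2; j <= 2; j++) {
            Vec2 image = {start.x + i * cell.ax + j * cell.bx, start.y + j * cell.by};
            if (norm(image) < norm(best)) { best = image; }
        }
    }
    return best;
}
```

For a cell with a 30° angle, the two functions disagree for some vectors, while they agree for a rectangular cell. A test along these lines would show whether a given wrapping algorithm needs fixing.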

Remove the logging framework?

It does not make sense to have a logging framework in a library, and it could be removed. It is mainly used by the C API to log exceptions at the interface, but these logs are not strictly needed.

Another option would be to have it silenced by default.
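A minimal sketch of the "silenced by default" option, assuming messages are routed through a replaceable callback that does nothing until the user installs one. The names warning_callback, set_warning_callback and warning are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch: warnings go through a replaceable callback that does
// nothing unless the user installs one.
using warning_callback = std::function<void(const std::string&)>;

static warning_callback& current_callback() {
    static warning_callback callback = [](const std::string&) {};
    return callback;
}

void set_warning_callback(warning_callback callback) {
    assert(callback != nullptr);  // an empty std::function would throw when called
    current_callback() = std::move(callback);
}

void warning(const std::string& message) {
    current_callback()(message);
}
```

Library code would call warning(...) unconditionally. Nothing is printed unless the application opts in with set_warning_callback.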

Relicense the code under BSD?

I do not know if this is worth it: the BSD license does not have that many advantages compared to the MPL. I am opening this for discussion; if you have an opinion or a requirement concerning Chemfiles licensing, please tell us!

Support for biological molecules

I'm wondering if there's some space for an AminoAcid class. Objects of this class would be linked to each Frame object and 'contain' many objects of the Atom class. That is, a more 'OpenBabel' way of treating biological molecules and a more intuitive one.

I'm currently developing C++ software to use on biological molecules. Chemfiles is the only library that supports the major molecular dynamics trajectory formats and the PDB format at the same time, but the lack of a Residue concept is critical. On the other hand, OpenBabel can't read NetCDF, and my software became considerably slower when I switched away from chemfiles.

Updated homebrew bottle

Hello @Luthaf, I'm currently going through my homebrew-juliadeps tap, updating versions of software where appropriate, and wanted to check with you to see if upgrading to v0.4.0 of chemharp would be appropriate for the julia code that uses it. Is that something I should do now, or should I wait until changes have been made on the julia side?

case insensitive search for element(type) matching in PDB records

In PDB ATOM records, elements are always in upper case. I hadn't noticed this before, since biological molecules are almost exclusively formed by CHONP atoms.

Atom::Atom(std::string name, std::string type):
    name_(std::move(name)), type_(std::move(type)) {
    auto periodic = PERIODIC_INFORMATION.find(type_);
    if (periodic != PERIODIC_INFORMATION.end()) {
        mass_ = periodic->second.mass;
    }
}

Maybe the strings in PERIODIC_INFORMATION should be lower case, with std::tolower() applied to the query string before any search.

Please confirm and I'll apply this minor change myself.
Cheers!
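The proposal above could look roughly like the following sketch. MASSES is a tiny illustrative stand-in for PERIODIC_INFORMATION, and to_lower and mass_for are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Tiny illustrative table with lower-case keys; the real PERIODIC_INFORMATION
// table is much larger and may be laid out differently.
static const std::map<std::string, double> MASSES = {
    {"h", 1.008}, {"c", 12.011}, {"cl", 35.45},
};

std::string to_lower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Returns the mass for `type` regardless of case, or 0 for unknown types.
double mass_for(const std::string& type) {
    assert(!type.empty());
    auto it = MASSES.find(to_lower(type));
    return it != MASSES.end() ? it->second : 0.0;
}
```

With this, "CL" (as written in a PDB ATOM record) and "Cl" resolve to the same entry.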

Add support for VMD Molfile plugin

Supporting VMD molfile plugins would bring Chemharp 15 new formats for free. The first steps are in the molfile branch.

Write tutorials in the documentation

The only usage documentation for now is the API doc, which is a bit rough for a first contact. The tutorials could take the existing examples and explain them. Other ideas are welcome!

Also, it would be nice to have the same tutorial with code examples in multiple languages.

Spatial zones for selections

Allow selecting atoms inside a sphere/cube/..., or within a given distance of another atom.

What is needed:

  • Decide on syntax;
  • Decide on implementation strategy inside the current implementation;
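Independently of the syntax chosen, the evaluation of a spherical zone could be as simple as the sketch below. The Vector3D alias and select_in_sphere are hypothetical names, and periodic boundary conditions are deliberately ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vector3D = std::array<double, 3>;

// Indices of the atoms lying inside a sphere. Periodic boundary conditions
// are ignored in this sketch.
std::vector<std::size_t> select_in_sphere(const std::vector<Vector3D>& positions,
                                          const Vector3D& center, double radius) {
    assert(radius >= 0);
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < positions.size(); i++) {
        double r2 = 0;
        for (std::size_t k = 0; k < 3; k++) {
            double delta = positions[i][k] - center[k];
            r2 += delta * delta;
        }
        // Compare squared distances to avoid a square root per atom.
        if (r2 <= radius * radius) { selected.push_back(i); }
    }
    return selected;
}
```

Selecting atoms "within a given distance of another atom" is the same operation with that atom's position used as the center.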

Provide a view in the positions/velocities

The FFI functions chfl_frame_positions and chfl_frame_velocities should return a pointer to Frame::positions and Frame::velocities. This would avoid copying the data for big files, and remove the need to do

chfl_frame_positions(frame, positions, atoms);
positions[2][1] = 4.0;
chfl_frame_set_positions(frame, positions, atoms);

  • Use contiguous data in Array3D, instead of std::vector<Vector3D>;
  • Provide this data in the FFI.
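A sketch of what such a view could look like, assuming each position is stored as three contiguous doubles. Frame, Vector3D and frame_positions_view are simplified, hypothetical stand-ins, not the actual chemfiles types or functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real types: each position is three
// contiguous doubles, and the vector stores the positions back to back.
using Vector3D = std::array<double, 3>;
struct Frame { std::vector<Vector3D> positions; };

// Hypothetical FFI-style accessor: hands out a pointer into the frame's own
// storage instead of copying. The pointer stays valid until the frame is
// resized or destroyed.
int frame_positions_view(Frame* frame, double (**data)[3], std::size_t* natoms) {
    assert(frame != nullptr && data != nullptr && natoms != nullptr);
    static_assert(sizeof(Vector3D) == 3 * sizeof(double), "no padding expected");
    *data = reinterpret_cast<double (*)[3]>(frame->positions.data());
    *natoms = frame->positions.size();
    return 0;
}
```

Writing through the returned pointer modifies the frame directly, so no set_positions call is needed afterwards. The price is that callers must not keep the pointer across operations that resize the frame.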

Lazy loading of Trajectory::nsteps

Getting the number of steps in a trajectory can be done at no cost for some formats (NetCDF), but might be very costly for others (XYZ). In a big XYZ trajectory, just getting the number of steps can take 5s.

We always load the number of steps when opening a trajectory, which imposes a big cost just to open a file.

We should load this number of steps lazily (only when requested) and not systematically. To be still able to provide an indication of when a trajectory is fully read, we can add bool Format::is_done() and use it to implement bool Trajectory::is_done() and chfl_trajectory_is_done.
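The lazy part could be sketched as follows. LazyTrajectory is a hypothetical class, and the count_steps callable stands in for the (possibly slow) scan a format would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical sketch: the step count is computed on first request and
// cached; `count_steps` stands in for a (possibly slow) file scan.
class LazyTrajectory {
public:
    explicit LazyTrajectory(std::function<std::size_t()> count_steps)
        : count_steps_(std::move(count_steps)) {
        assert(count_steps_ != nullptr);
    }

    // Opening costs nothing; the scan only runs here, and only once.
    std::size_t nsteps() {
        if (!cached_) {
            nsteps_ = count_steps_();
            cached_ = true;
        }
        return nsteps_;
    }

private:
    std::function<std::size_t()> count_steps_;
    std::size_t nsteps_ = 0;
    bool cached_ = false;
};
```

Code that only reads frames sequentially never pays for the scan, while code that asks for the count pays once.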

Add selections in topology

Users should be able to query a Topology with selections, in VMD style. The API might look like this:

//! Get the atoms matching the selection in select
std::vector<bool> Topology::select(const std::string& selection) const;
//! Get the atoms matching the tuple selection in select
std::vector<std::vector<bool>> Topology::select(const std::vector<std::string>& selections) const;

// C API version
int chrp_topology_select(const CHRP_TOPOLOGY* topology, const char* selection, bool* res, size_t natoms);
int chrp_topology_select_tuple(const CHRP_TOPOLOGY* topology, const char** selections, size_t nselect, bool** res, size_t natoms);

Selections are expressed as strings, and return the list of atoms corresponding to the selection.

Tuple selections make it possible to select pairs, triplets, ... of atoms matching a given pair, triplet, ... of selection strings.

Examples

topology = ["H", "O", "H", "H", "O", "H"]

auto s = topology.select("name O");
s == [false, true, false, false, true, false];

auto s = topology.select(std::vector{"name H", "name O"}); 
s == [[true, false], [false, true], [true, false], [true, false], [false, true], [true, false]];
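To make the first example concrete, here is a minimal sketch that only understands selections of the form "name <value>". The free function select_by_name is hypothetical and takes the atom names directly instead of a Topology.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal sketch: only understands selections of the form "name <value>".
// A real selection language needs a proper parser.
std::vector<bool> select_by_name(const std::vector<std::string>& names,
                                 const std::string& selection) {
    const std::string prefix = "name ";
    // Anything else is unsupported here.
    assert(selection.compare(0, prefix.size(), prefix) == 0);
    const std::string wanted = selection.substr(prefix.size());

    std::vector<bool> matches;
    matches.reserve(names.size());
    for (const auto& name : names) {
        matches.push_back(name == wanted);
    }
    return matches;
}
```

Calling it with the six-atom topology above and "name O" yields one boolean per atom, true at the two oxygen positions.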

TODO list

  • Selection DSL specification;
  • Parsing selection string to AST;
  • Evaluating the AST;
  • Testing;

Test chemfiles on MinGW

MinGW is a port of the GNU compiler and tools to Windows. It is NOT a Unix system.

It is available on Appveyor, and should be tested.

Test chemfiles on MSYS

MSYS is a Unix system on Windows. Msys2 is installed on Appveyor, and should be tested.

Public release planning

What should be done before the public release of this code:

  • Write interface
    • Text format: XYZ
    • Binary format: NetCDF
  • Bindings
    • C read-only binding
    • Fortran read-only binding
    • Python read-only binding
  • Formats
    • PDB
  • Documentation
    • User manual

Santa Claus wish list (or, what will happen after the public release but before 1.0)

  • Read-Write bindings
  • Julia binding, either from C or C++
  • Developer documentation

Allowing Netcdf Amber trajectories without box information (Unit Cell)

I think the current way to allow .nc trajectories without box info is not working as expected.

When NCFormat::read_cell() calls dimension and the trajectory has no box information, nc::check (called by dimension) throws an error before dimension can return, say, size=0.

Maybe the call to dimension in NCFormat::read_cell() could be enclosed in a try{...} catch{}?
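The suggested try/catch could be sketched like this. FakeFile, dimension and has_cell are simplified, hypothetical stand-ins for the NetCDF wrapper, and "cell_spatial" is assumed to be the dimension that carries the box information.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a NetCDF file: dimension name -> size.
using FakeFile = std::map<std::string, std::size_t>;

// Mimics the behaviour described above: throws when the dimension is missing.
std::size_t dimension(const FakeFile& file, const std::string& name) {
    assert(!name.empty());
    auto it = file.find(name);
    if (it == file.end()) {
        throw std::runtime_error("missing dimension: " + name);
    }
    return it->second;
}

// Returns true if the file carries unit cell information.
bool has_cell(const FakeFile& file) {
    try {
        return dimension(file, "cell_spatial") > 0;
    } catch (const std::runtime_error&) {
        // No box information: report "no cell" instead of failing.
        return false;
    }
}
```

A caller such as read_cell could then fall back to an infinite cell when has_cell returns false. An alternative design is a non-throwing lookup, which avoids using exceptions for an expected case.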

Use constants for error handling in C FFI

The C FFI should #define constants for error codes, and use them instead of relying on arbitrary values.

This would allow us to:

  • check directly for error if (chfl_trajectory_open(...) != CHFL_SUCCESS);
  • add the failure causes in documentation;
  • propagate errors to the bindings. In the Rust one, we could remove a lot of Result using this;
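A sketch of the idea, using an enum rather than #define for brevity. Only CHFL_SUCCESS comes from the text above; the other constant names, all numeric values, and the open_for_reading function are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdio>

// CHFL_SUCCESS comes from the text above; the other names and all numeric
// values are assumptions made for this sketch.
typedef enum {
    CHFL_SUCCESS = 0,
    CHFL_FILE_ERROR = 1,
    CHFL_GENERIC_ERROR = 2,
} chfl_status;

// Hypothetical function following the convention: the status is the return
// value, results go through output parameters. On success the caller owns
// the FILE and must close it.
chfl_status open_for_reading(const char* path, std::FILE** file) {
    assert(file != nullptr);
    if (path == nullptr) {
        return CHFL_GENERIC_ERROR;
    }
    *file = std::fopen(path, "r");
    return (*file != nullptr) ? CHFL_SUCCESS : CHFL_FILE_ERROR;
}
```

Callers can then compare against named constants, and bindings can map each constant to an error type in their own language.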

PDBx/mmCIF

File type:                text
Topological information:  yes
Positions:                ???
Velocities:               ???

PDBx/mmCIF is the chosen replacement for PDB files starting in 2016. A C++ parser exists here: http://mmcif.wwpdb.org/

Remove dependency on netcdfcxx

This prevents easy static linkage of chemharp, and the current version of netcdfcxx does not use C++11 features (move semantics in particular). I should call the C library directly for what is needed, wrapping a few types in C++.

Heuristic to guess residues in a topology

In the same spirit as the guess_bonds function, we could try to guess the residues in a topology. The simplest version could assume residue == molecule, and just follow the bond graph. A more elaborate version could try to match sub-graphs to known residues like amino acids.
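The simplest version amounts to labelling the connected components of the bond graph. Here is a sketch under that assumption; guess_residues and the Bond alias are hypothetical names, and bonds are given as pairs of atom indices.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Bond = std::pair<std::size_t, std::size_t>;

// Assigns one residue id per connected component of the bond graph (the
// "residue == molecule" assumption). Ids follow the order of first atoms.
std::vector<std::size_t> guess_residues(std::size_t natoms, const std::vector<Bond>& bonds) {
    std::vector<std::vector<std::size_t>> neighbors(natoms);
    for (const auto& bond : bonds) {
        assert(bond.first < natoms && bond.second < natoms);
        neighbors[bond.first].push_back(bond.second);
        neighbors[bond.second].push_back(bond.first);
    }

    const std::size_t unassigned = natoms;  // larger than any valid id
    std::vector<std::size_t> residue(natoms, unassigned);
    std::size_t next_id = 0;
    for (std::size_t start = 0; start < natoms; start++) {
        if (residue[start] != unassigned) { continue; }
        // Depth-first walk over everything reachable from `start`.
        std::vector<std::size_t> stack = {start};
        residue[start] = next_id;
        while (!stack.empty()) {
            std::size_t atom = stack.back();
            stack.pop_back();
            for (std::size_t other : neighbors[atom]) {
                if (residue[other] == unassigned) {
                    residue[other] = next_id;
                    stack.push_back(other);
                }
            }
        }
        next_id++;
    }
    return residue;
}
```

Two water molecules followed by a lone ion would get ids 0, 0, 0, 1, 1, 1, 2. A bonded protein chain would end up as one single residue under this heuristic, which is where the sub-graph matching variant would be needed.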

Fix the size of data types at the C interface

The code for the C API should only use types with a known size, such as int8_t and uint64_t. Otherwise the bindings will rely on assumptions about integer sizes that may not hold, which will create nasty bugs.

Remove dependency on Boost libs

Boost libraries should be removed, as there is no strong need for them, and this would make it easier to embed chemharp in another application.

Places where Boost is used, and how to remove it:

  • filesystem iteration in tests => see the C++1y standard with header;
  • any type => roll my own version, or inherit ostream in private to get access to the operator<<;

PDB format

File type:                text
Topological information:  yes
Positions:                yes
Velocities:               ???

Reference: ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_A4.pdf

fresh install does not include config.hpp and exports.hpp

I recently cloned from github repo to ~/chemfiles and followed installation instructions, using these options for the cmake command:

 cmake -DCMAKE_BUILD_TYPE=debug -DCHFL_BUILD_TESTS=ON -DCHFL_ENABLE_NETCDF=ON ..

But for some reason both these files weren't present in the ~/chemfiles/include/chemfiles folder, nor in the installed folder /usr/local/include/chemfiles, so when I tried to compile my code (including the flag -I/usr/local/include), the following error appeared:

/usr/local/include/chemfiles.hpp:11:32: fatal error: chemfiles/config.hpp: No such file or directory

The error was fixed by copying these 2 files to /usr/local/include/chemfiles.

Add functions for working with PBC

Functions to get the distance/angle/dihedral between particles could be nice to have. Something like

double Frame::distance(size_t i, size_t j);
double Frame::angle(size_t i, size_t j, size_t k);
double Frame::dihedral(size_t i, size_t j, size_t k, size_t m);

int chfl_frame_distance(const CHFL_FRAME* frame, size_t i, size_t j, double* r);
int chfl_frame_angle(const CHFL_FRAME* frame, size_t i, size_t j, size_t k, double* theta);
int chfl_frame_dihedral(const CHFL_FRAME* frame, size_t i, size_t j, size_t k, size_t m, double* phi);

The other solution is to have a function that wraps a vector in the unit cell, and then let users write these functions themselves.
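For an orthorhombic cell, the wrapping and the distance function fit in a few lines. This is a sketch with hypothetical names (a free function distance over a Vector3D alias). Triclinic cells need a different wrapping and are not handled.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vector3D = std::array<double, 3>;

// Minimum-image distance in an orthorhombic cell with side lengths `lengths`.
// Triclinic cells need a different wrapping and are not handled here.
double distance(const Vector3D& a, const Vector3D& b, const Vector3D& lengths) {
    double r2 = 0;
    for (std::size_t k = 0; k < 3; k++) {
        assert(lengths[k] > 0);
        double delta = a[k] - b[k];
        // Bring the component into [-L/2, L/2].
        delta -= lengths[k] * std::round(delta / lengths[k]);
        r2 += delta * delta;
    }
    return std::sqrt(r2);
}
```

In a cubic box of side 10, atoms at x = 0.5 and x = 9.5 are 1 apart through the periodic boundary rather than 9 apart.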

Support for residues in selections

With the new Residue class, additional selectors should be added:

  • resname would match the residue name;
  • resid would match the residue id;

I still need to think about how this can interact with multiple selections.

Replace the tests/data submodule by an archive

The tests/data git submodule is not really ergonomic, as submodules are not automatically updated when using git pull.

The tests data files are in a separated repository (https://github.com/chemfiles/tests-data) because they are not needed when building the code, and the size of the repository is non negligible.

Just checking in a tar.gz archive and unpacking it with cmake would not work here, because all the tar.gz files would still be in .git/objects. I think the better way is to download the test files at compile time, using cmake ExternalProject.

Developer documentation

Add a basic developer documentation before the 1.0 release.

Built with doxygen doc and some hand-written text.

Add support for atom labels in PDB files

Modify Atom concept to hold atom labels.

Main TODOs:

  • Create label_ variable and read it from columns 13-16 of the PDB format file.
  • Rename name_ variable to element_ and read it from columns 77-78 in the PDB format file (columns 79-80 hold the charge).
    Then, for consistency, continue replacing name_ with element_ in the rest of the project.
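A sketch of the column-based extraction, with hypothetical names (AtomFields, parse_atom_fields). Note that the wwPDB format description places the element symbol in columns 77-78 and the charge in columns 79-80, so this sketch reads the element from 77-78. Columns are 1-based in the specification, hence the offsets of 12 and 76 below.

```cpp
#include <cassert>
#include <string>

// Removes leading and trailing spaces.
std::string trim(const std::string& text) {
    auto first = text.find_first_not_of(' ');
    if (first == std::string::npos) { return ""; }
    auto last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}

// Hypothetical result type: `label` and `element` mirror the proposed members.
struct AtomFields { std::string label; std::string element; };

// Reads the atom name (columns 13-16) and the element symbol (columns 77-78)
// from an ATOM/HETATM record. Columns are 1-based in the PDB specification.
AtomFields parse_atom_fields(const std::string& line) {
    assert(line.size() >= 78);
    return {trim(line.substr(12, 4)), trim(line.substr(76, 2))};
}
```

Real files sometimes omit the trailing columns, so production code would need a fallback (for example guessing the element from the atom name) instead of the assertion used here.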

Predefined atom groups: water, protein, ...

Using a selection like water or name Na would be nice. This requires predefining a group of atoms as "water".

I do not know how to do this, it may require to detect/classify molecules in the system using some kind of graph algorithms.

BUS_ERROR on OS X with OSX_DEPLOYMENT_TARGET=10.9 and NetCDF

This is weird. On OS X, when compiling the code in release mode with this exact set of flags, the xyz test runs into a bus error:

  • CMAKE_OSX_DEPLOYMENT_TARGET=10.9
  • CMAKE_C_FLAGS="-mmacosx-version-min=10.9"
  • CHFL_ENABLE_NETCDF=ON

The full set of commands to run to reproduce in a fresh build is:

cmake -DCMAKE_OSX_DEPLOYMENT_TARGET=10.9 -DCMAKE_C_FLAGS="-mmacosx-version-min=10.9" -DCHFL_BUILD_TESTS=ON -DCHFL_ENABLE_NETCDF=ON ../.. 
make
./tests/xyz

The Valgrind report is here, and an LLDB session looks like this:

(lldb) target create "tests/xyz"
Current executable set to 'tests/xyz' (x86_64).
(lldb) run
Process 24586 launched: 'tests/xyz' (x86_64)
Process 24586 stopped
* thread #1: tid = 0xcbcd, 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000b3130)
    frame #0: 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const
xyz`boost::system::system_category()::system_category_const:
->  0x1000b3130 <+0>: lock
    0x1000b3131 <+1>: repne
    0x1000b3132 <+2>: orb    (%rax), %al
    0x1000b3134 <+4>: addl   %eax, (%rax)
(lldb) bt
* thread #1: tid = 0xcbcd, 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1000b3130)
  * frame #0: 0x00000001000b3130 xyz`boost::system::system_category()::system_category_const
(lldb)

Since I do not get any backtrace in LLDB, and valgrind reports jumps on uninitialized values, this might be due to a stack corruption.

This was spotted when investigating the need for export MACOSX_DEPLOYMENT_TARGET="" in conda builds for conda-forge/staged-recipes#1571.
