Comments (10)
@dgasmith @wadejong @matt-chan @cryos Default units and conversion factors cannot work, some of which is explained in earlier comments. I'll try to summarize the problem:
Different programs work in different units internally, and they usually already have conversion factors to transform results to other units before printing. These aspects of existing software will not change. If you settle on standard units and conversion factors, one of the following two things is going to happen, and neither is great:
- Such programs use the conversion factors of the JSON spec to write results in an agreed unit, which may be inconsistent with the usual output of that program.
- Such programs may ignore the standard conversion factors and become inconsistent with the spec.
A cleaner solution would be to let every program write results to a JSON file in its internal units and state what those units mean. The receiver of the JSON data is then free to handle the units however they like. If conversion is needed, the most reasonable choice would be to take the conversion factors from the NIST website (which are refined occasionally as more literature becomes available). The disadvantage is that the spec becomes more complicated.
P.S. Most QC programs work in atomic units, which may not cause too much trouble. As soon as you want to exchange data with MM programs, all sorts of units are being used.
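A minimal sketch of the self-describing approach suggested above, assuming a hypothetical layout in which every quantity is stored as a `{"val": ..., "units": ...}` pair and the receiver supplies its own conversion table. The field names, the `RECEIVER_FACTORS` table and the helper functions are illustrative and not part of any agreed spec.

```python
import json

# Hypothetical record written by a program in its internal units.
record = json.loads("""
{
  "geometry": {"val": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], "units": "angstrom"},
  "return_value": {"val": -5.433191881443323, "units": "hartree"}
}
""")

# The receiver picks its own factors (here: to nm and eV). The hartree value
# is an assumption for illustration; take it from NIST in practice.
RECEIVER_FACTORS = {
    ("angstrom", "nanometer"): 0.1,
    ("hartree", "electron_volt"): 27.211386,
}

def scale(value, factor):
    """Multiply a scalar or an arbitrarily nested list by factor."""
    if isinstance(value, list):
        return [scale(v, factor) for v in value]
    return value * factor

def convert(quantity, target):
    """Return a new {'val', 'units'} pair expressed in the target unit."""
    if quantity["units"] == target:
        return dict(quantity)
    factor = RECEIVER_FACTORS[(quantity["units"], target)]
    return {"val": scale(quantity["val"], factor), "units": target}

geometry_nm = convert(record["geometry"], "nanometer")
energy_ev = convert(record["return_value"], "electron_volt")
```

With this layout the writer never has to adopt anyone else's factors; the cost is that every reader needs a table like `RECEIVER_FACTORS`.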
from qcschema.
Hey guys, big fan of your aspirations here, I wish people would put as much thought into their output formats as they do the rest of the program! I actually wrote the jsonextended package to help me parse and manipulate the data I'm working with from Gaussian, CRYSTAL, LAMMPS, etc., in the same kind of format you envisage. In particular, I thought you might be interested in how I am handling unit standardisation, with a "combine-apply-split" methodology that uses the pint package. Here's a quick demo:
- Read in your front page example output:
import json
from jsonextended import edict
# json.load expects an open file object, not a path
with open('test.json') as f:
    test = json.load(f)
edict.pprint(test,depth=1)
driver: energy
error:
method: {...}
molecule: {...}
provenance: {...}
raw_output: Output storing was not requested.
return_value: {...}
success: True
variables: {...}
- Combine all ('val','units') leaf nodes into pint.Quantity objects:
from jsonextended import units as eunits
withunits = eunits.combine_quantities(test,'units','val')
edict.pprint(withunits,depth=2)
driver: energy
error:
method:
  basis: sto-3g
  expression: SCF
molecule:
  atoms: [He, He]
  geometry: [[0 0 0] [0 0 1]] Å
provenance:
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output: Output storing was not requested.
return_value: -5.433191881443323 E_h
success: True
variables:
  NUCLEAR REPULSION ENERGY: 2.11670883436 E_h
  ONE-ELECTRON ENERGY: -11.67399006298957 E_h
  SCF DIPOLE X: 0.0 E_h
  SCF DIPOLE Y: 0.0 E_h
  SCF DIPOLE Z: 0.0 E_h
  SCF N ITERS: 2.0
  SCF TOTAL ENERGY: -5.433191881443323 E_h
  SCF TWO-ELECTRON ENERGY: 4.124089347186247 E_h
- Apply a unit schema to the data, to convert specified fields to the required units.
newunits = eunits.apply_unitschema(
    withunits,
    {'geometry': 'nm', 'return_value': 'kcal', 'variables': {'SCF*': 'eV'}},
    use_wildcards=True)
edict.pprint(newunits,depth=2)
driver: energy
error:
method:
  basis: sto-3g
  expression: SCF
molecule:
  atoms: [He, He]
  geometry: [[ 0. 0. 0. ] [ 0. 0. 0.1]] nm
provenance:
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output: Output storing was not requested.
return_value: -5.661406639574504e-21 kcal
success: True
variables:
  NUCLEAR REPULSION ENERGY: 2.11670883436 E_h
  ONE-ELECTRON ENERGY: -11.67399006298957 E_h
  SCF DIPOLE X: 0.0 eV
  SCF DIPOLE Y: 0.0 eV
  SCF DIPOLE Z: 0.0 eV
  SCF N ITERS: 2.0 eV
  SCF TOTAL ENERGY: -147.84466590569593 eV
  SCF TWO-ELECTRON ENERGY: 112.22217528934715 eV
- Split the pint.Quantity objects back into their ('val','units') pairs:
removeunits = eunits.split_quantities(newunits,'units','val')
edict.pprint(removeunits,depth=3)
driver: energy
error:
method:
  basis: sto-3g
  expression: SCF
molecule:
  atoms: [He, He]
  geometry:
    units: nanometer
    val: [[ 0. 0. 0. ] [ 0. 0. 0.1]]
provenance:
  creator: QM Program
  routine: program.run_json
  version: 1.1rc1
raw_output: Output storing was not requested.
return_value:
  units: kilocalorie
  val: -5.661406639574504e-21
success: True
variables:
  NUCLEAR REPULSION ENERGY:
    units: hartree
    val: 2.11670883436
  ONE-ELECTRON ENERGY:
    units: hartree
    val: -11.67399006298957
  SCF DIPOLE X:
    units: electron_volt
    val: 0.0
  SCF DIPOLE Y:
    units: electron_volt
    val: 0.0
  SCF DIPOLE Z:
    units: electron_volt
    val: 0.0
  SCF N ITERS:
    units: electron_volt
    val: 2.0
  SCF TOTAL ENERGY:
    units: electron_volt
    val: -147.84466590569593
  SCF TWO-ELECTRON ENERGY:
    units: electron_volt
    val: 112.22217528934715
Ta,
Chris
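For readers without jsonextended or pint installed, the combine and split steps can be approximated with the standard library alone. This is a rough sketch of the idea, not jsonextended's implementation; the `Quantity` tuple and both helper functions are hypothetical names introduced here, and no unit conversion is attempted.

```python
from collections import namedtuple

# Hypothetical stand-in for pint.Quantity: a bare (value, unit-name) pair.
Quantity = namedtuple("Quantity", ["val", "units"])

def combine(node):
    """Replace every {'val', 'units'} leaf dict with a Quantity."""
    if isinstance(node, dict):
        if set(node) == {"val", "units"}:
            return Quantity(node["val"], node["units"])
        return {key: combine(child) for key, child in node.items()}
    return node

def split(node):
    """Inverse of combine: turn each Quantity back into a leaf dict."""
    if isinstance(node, Quantity):
        return {"val": node.val, "units": node.units}
    if isinstance(node, dict):
        return {key: split(child) for key, child in node.items()}
    return node

data = {
    "driver": "energy",
    "return_value": {"val": -5.433191881443323, "units": "hartree"},
    "variables": {"SCF N ITERS": 2.0},
}
combined = combine(data)
restored = split(combined)  # same nested dict as `data`
```

The "apply" step would sit between the two calls and operate on the `Quantity` objects, which is where a unit library such as pint does the real work.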
from qcschema.
jsonextended and pint are very impressive, but I guess that, for the sake of defining a JSON schema, they may add too much complexity? It would be nice, though, to design the schema such that it plays nicely with these packages.
jsonextended and pint do not seem to solve the original problem mentioned by @loriab, namely that different QC codes have different definitions of unit conversion factors, e.g. they use (slightly) different numbers to convert from Bohr to Angstrom. Is there a way to get around this?
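To give a sense of scale for that discrepancy, the sketch below converts the same distance with two Bohr-to-Angstrom factors. The two values are quoted as the CODATA 2010 and CODATA 2018 recommended Bohr radii; they are assumptions here and should be checked against the NIST tables before being relied on.

```python
# Bohr radius in Angstrom from two CODATA releases (assumed values; verify
# against the NIST tables).
BOHR_TO_ANGSTROM_2010 = 0.52917721092
BOHR_TO_ANGSTROM_2018 = 0.529177210903

length_bohr = 2.0  # arbitrary example distance

a = length_bohr * BOHR_TO_ANGSTROM_2010
b = length_bohr * BOHR_TO_ANGSTROM_2018
relative_difference = abs(a - b) / b

# Writing with one factor and reading back with the other does not
# return the original number exactly.
round_trip = a / BOHR_TO_ANGSTROM_2018
drift = abs(round_trip - length_bohr)

print(f"relative difference: {relative_difference:.1e}")
print(f"round-trip drift:    {drift:.1e} bohr")
```

Small as it is, the mismatch means that exact equality checks between two programs' numbers will fail unless both agree on the factor.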
from qcschema.
@tovrstra Agreed, I think we can recommend tools. However, the spec itself is tool-independent.
Using slightly different conversion factors is tricky. We could take the following steps:
- Request that all input/output values to QM programs be in Hartree
- MolSSI could build a repository that holds the up-to-date values for everyone to use.
from qcschema.
@dgasmith So you suggest to drop any support for different units and require all numbers to use atomic units?
from qcschema.
This is a very tricky problem, with many different codes using different conversion factors and units in their output. In a JSON context, one possible approach would be to have an extra field that specifies the conversion factor for each quantity (length, energy, etc.) used by the program of interest to some specific convention, e.g. atomic units. This would allow a.u. input to be converted internally by any code, using their native conventions, as usual. It would also provide a mechanism for converting output received to a 'standard' form (a.u. in the example I provided).
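A rough sketch of this idea, assuming a hypothetical `conversion_to_au` field in which the writing program records the factors it actually uses. The field names, the layout and the numbers are illustrative only.

```python
import json

# Hypothetical output: values in the program's native units (Angstrom, eV)
# plus the factors that program uses to reach atomic units.
output = json.loads("""
{
  "conversion_to_au": {"length": 1.8897261246, "energy": 0.0367493221},
  "quantities": {
    "bond_length": {"val": 1.0, "kind": "length"},
    "total_energy": {"val": -147.8446659, "kind": "energy"}
  }
}
""")

def to_atomic_units(doc):
    """Convert every quantity using the factors shipped in the document."""
    factors = doc["conversion_to_au"]
    return {
        name: q["val"] * factors[q["kind"]]
        for name, q in doc["quantities"].items()
    }

in_au = to_atomic_units(output)
```

Because the factors travel with the data, the receiver reproduces the writer's own atomic-unit values rather than a slightly different set obtained from its own constants.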
from qcschema.
Instead of accepting a variety of units, it would be nice to work with one set. That way, a simple project implementing the spec wouldn't be required to include code to convert from a plethora of possible units.
As others have suggested, we would need an agreed standard (MolSSI or IUPAC) for conversion.
We could include test cases which would help codes that don't natively work with those units to minimize bugs. (Even if we decide to accept multiple unit systems in the spec, it'd still be a good idea to have the tests)
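Such test cases could be as simple as a table of reference conversions checked with a tolerance. The sketch below assumes a hypothetical `REFERENCE_CASES` table and a `convert` callable supplied by the code under test; the reference numbers are placeholders for whatever the agreed standard ends up providing.

```python
import math

# (value, from_unit, to_unit, expected): placeholder reference data.
REFERENCE_CASES = [
    (1.0, "bohr", "angstrom", 0.529177210903),
    (1.0, "hartree", "electron_volt", 27.211386245988),
]

def check_conversions(convert, rel_tol=1e-9):
    """Return the reference cases that `convert` fails to reproduce."""
    failures = []
    for value, src, dst, expected in REFERENCE_CASES:
        got = convert(value, src, dst)
        if not math.isclose(got, expected, rel_tol=rel_tol):
            failures.append((src, dst, got, expected))
    return failures

# Example implementation under test, using slightly different factors
# (also assumed values, chosen only to show a mismatch).
def example_convert(value, src, dst):
    table = {
        ("bohr", "angstrom"): 0.52917721092,
        ("hartree", "electron_volt"): 27.21138505,
    }
    return value * table[(src, dst)]

failures = check_conversions(example_convert)
for src, dst, got, expected in failures:
    print(f"{src} -> {dst}: got {got!r}, expected {expected!r}")
```

The tolerance is the real design decision: too tight and every code with older constants fails, too loose and genuine bugs slip through.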
from qcschema.
Agreed, strongly recommend one variety of units. Support others, but have a recommended set of units for the format. Agreed conversion factors to apply would then be available.
from qcschema.