H5Z-SPERR

This is an HDF5 Plugin for the SPERR compressor. It is registered with the HDF Group with a plugin ID of 32028.

Build and Install

H5Z-SPERR depends on both HDF5 and SPERR. Once both are installed, H5Z-SPERR can be configured and built using CMake:

export HDF5_ROOT=/path/to/your/preferred/HDF5/installation
mkdir build-H5-SPERR && cd build-H5-SPERR
cmake -DSPERR_INSTALL_DIR=/path/to/SPERR/installation     \
      -DCMAKE_INSTALL_PREFIX=/path/to/install/this/plugin \
      /path/to/H5Z-SPERR/source/code
make install

The plugin library file libh5z-sperr.so will be placed in the directory /path/to/install/this/plugin.

Use As a Dynamically Loaded Plugin

With HDF5's dynamically loaded plugin mechanism, one may use H5Z-SPERR by simply setting the environment variable HDF5_PLUGIN_PATH to the directory containing the plugin library file:

export HDF5_PLUGIN_PATH=/path/to/install/this/plugin

The user program does not need to link to this plugin or SPERR; it only needs to specify the plugin ID of 32028.

Find cd_values[]

To apply SPERR compression using the HDF5 plugin, one needs to specify 1) what compression mode and 2) what compression quality to use. Supported compression modes and qualities are summarized below:

Mode No.   Mode Meaning                                Meaningful Quality Range
1          Fixed bit-per-pixel (BPP)                   0.0 < quality < 64.0
2          Fixed peak signal-to-noise ratio (PSNR)     0.0 < quality
3          Fixed point-wise error (PWE)                0.0 < quality

In addition, the rank order needs to be swapped sometimes to achieve the best compression. For example, if a 2D slice of dimensions (64, 128) has the dim=128 rank being the fastest varying rank, then this slice needs a "rank order swap" to achieve the best compression. In general, given dimensions of (NX, NY, NZ), we want the X rank to be varying the fastest, and the Z rank to be varying the slowest, before the data is passed to the compressor. "Rank order swap" helps to achieve it.
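
As an illustration only (not the plugin's code), the sketch below shows why a C array of shape (64, 128) already has the layout of an X-fastest slice with NX = 128 and NY = 64. It assumes that a rank order swap amounts to presenting the dimensions in reversed order rather than moving data; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (i, j) in a C (row-major) array of shape
 * (n0, n1): the LAST index, j, varies fastest in memory. */
static size_t offset_row_major(size_t i, size_t j, size_t n0, size_t n1)
{
  assert(i < n0 && j < n1);
  return i * n1 + j;
}

/* Linear offset of element (x, y) in a slice of dimensions (nx, ny)
 * where x varies fastest, i.e. the layout described for (NX, NY, NZ). */
static size_t offset_x_fastest(size_t x, size_t y, size_t nx, size_t ny)
{
  assert(x < nx && y < ny);
  return y * nx + x;
}
```

For any valid (i, j), offset_row_major(i, j, 64, 128) equals offset_x_fastest(j, i, 128, 64), so the same buffer can be described as (NX, NY) = (128, 64) without copying.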

The HDF5 library takes in these compression parameters as one or more 32-bit unsigned int values, which are named cd_values[] in most HDF5 routines. In the case of H5Z-SPERR, exactly one unsigned int is used to carry this information.

Find cd_values[] Using the Programming Interface

Using the HDF5 programming interface, the cd_values[] carrying the compression parameters are passed to HDF5 routines such as H5Pset_filter(). To find the correct cd_values[], a user needs to include the header h5z-sperr.h from this repository and call the function unsigned int H5Z_SPERR_make_cd_values(int mode, double quality, int swap), which encodes the mode, the quality, and the rank order swap flag into a single value. For example:

int mode = 3;              /* Fixed PWE compression */
double quality = 1e-6;     /* PWE tolerance = 1e-6 */
int swap = 1;              /* Enable rank order swap */
unsigned int cd_values = H5Z_SPERR_make_cd_values(mode, quality, swap);   /* Generate cd_values */
H5Pset_filter(prop, 32028, H5Z_FLAG_MANDATORY, 1, &cd_values);            /* Specify SPERR compression in HDF5 */

See a complete example here.

Find cd_values[] Using the CLI Tool generate_cd_values

After building H5Z-SPERR, a command line tool named generate_cd_values becomes available. It encodes 1) the SPERR compression mode, 2) the quality, and 3) whether to perform a rank order swap into a single unsigned int. The produced value can then be used with other command line tools such as h5repack. In the following example, generate_cd_values reports that 268651725u encodes fixed-rate compression with a bitrate of 3.3 bits per value, without a rank order swap.

$ ./bin/generate_cd_values 1 3.3
For fixed-rate compression with a bitrate of 3, without swapping rank orders,
H5Z-SPERR cd_values = 268651725u (Filter ID = 32028).
Please use this value as a single 32-bit unsigned integer in your applications.

Note: an integer produced by generate_cd_values can be decoded by another command line tool, decode_cd_values, to show the coded compression parameters.

Use in NetCDF-4 APIs

H5Z-SPERR also facilitates the application of SPERR compression on NetCDF-4 files; one simply needs to define the filter on a variable:

nc_def_var_filter(ncid, varid, 32028, 1, &cd_values);

See a complete example here.

h5z-sperr's Issues

Consider using more compact storage of compression mode and quality

Idea from Peter:
"You may even consider encoding all of this information in a single 32-bit unsigned integer. Maybe use a few bits (say, 4, for future proofing) for compression mode and remaining bits for a fixed-point number. For the ranges above, 12 bits signed integer and 16 bits of fraction would work. It's true that no one could manually "decode" such a representation in their head, but I feel equally confident that no one could interpret a binary double-precision value broken into two 32-bit integers either. ;-)"

"Both BPP and PSNR have reasonable ranges that can be encoded in 32-bit fixed point. PWE > 0, so you may consider a fixed-point representation of -1074 <= log2(PWE) < 1024, rounded up (so you don't violate the tolerance)."

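A minimal sketch of the layout Peter describes, assuming the top 4 bits hold the mode and the low 28 bits hold a signed fixed-point quality with 16 fraction bits. The function names are hypothetical, and the sketch leaves out the rank order swap flag. With mode 1 and quality 3.3 this layout happens to yield 268651725, the value printed by generate_cd_values above, but the authoritative bit layout should be taken from h5z-sperr.h.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS  16   /* fraction bits of the fixed-point quality */
#define VALUE_BITS 28   /* 12 signed integer bits + 16 fraction bits */
#define VALUE_MASK ((UINT32_C(1) << VALUE_BITS) - 1)

/* Pack a mode (0..15) and a quality into one 32-bit value (hypothetical helper). */
static uint32_t pack_mode_quality(unsigned mode, double quality)
{
  assert(mode < 16u);
  assert(quality > -2048.0 && quality < 2048.0);
  double scaled = quality * (double)(1 << FRAC_BITS);
  /* Round to the nearest multiple of 1/65536. */
  int32_t fixed = (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
  /* The rounded value must still fit the signed 28-bit field. */
  assert(fixed >= -(INT32_C(1) << (VALUE_BITS - 1)));
  assert(fixed < (INT32_C(1) << (VALUE_BITS - 1)));
  return ((uint32_t)mode << VALUE_BITS) | ((uint32_t)fixed & VALUE_MASK);
}

/* Recover the mode and quality from a packed value (hypothetical helper). */
static void unpack_mode_quality(uint32_t packed, unsigned *mode, double *quality)
{
  uint32_t raw = packed & VALUE_MASK;
  int32_t fixed = (int32_t)raw;
  /* Sign-extend the 28-bit field. */
  if (raw & (UINT32_C(1) << (VALUE_BITS - 1)))
    fixed -= INT32_C(1) << VALUE_BITS;
  *mode = (unsigned)(packed >> VALUE_BITS);
  *quality = (double)fixed / (double)(1 << FRAC_BITS);
}
```

The round trip is exact only up to the fixed-point resolution of 1/65536, so a quality of 3.3 comes back as roughly 3.300003.
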
Another benefit of using 32 bits to store such information is that it's more resilient to byte swaps than using two 32-bit ints to store a 64-bit double. Consider the following example (again, from Peter):
"Suppose I do a memcpy() of a double to an unsigned int cd_vals array. And for simplicity, let's say our double has 64-bit binary representation 0x0123456789abcdef. On a little-endian machine, such a memcpy will result in the following byte sequence:

{0xef, 0xcd, 0xab, 0x89, 0x67, 0x45, 0x23, 0x01}

Because this is copied to an array of unsigned 32-bit ints, those two ints will be interpreted as

{0x89abcdef, 0x01234567}

Now suppose we're reading those two cd_vals on a big-endian machine. As I understand, HDF5 will byte swap them for you as 32-bit integers, resulting in this byte-swapped sequence:

{0x89, 0xab, 0xcd, 0xef, 0x01, 0x23, 0x45, 0x67}

Those are interpreted as these two 32-bit integers (i.e., same as on the little-endian machine).

{0x89abcdef, 0x01234567}

We then copy those 8 bytes to a 64-bit double using memcpy and obtain as 64-bit binary representation

0x89abcdef01234567

This is not the same as 0x0123456789abcdef, because the 8 bytes were swapped not as a 64-bit type but rather as two 32-bit types. I mean, HDF5 had no way of knowing that a pair of 32-bit integers were used to store a 64-bit quantity, so how could it correctly byte swap them? How is this dealt with properly? Or is it?"
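
The scenario in the quote can be reproduced without two machines by modelling each memcpy() with shifts. This is a sketch under that assumption; the helper names are hypothetical, and join_portable shows one endian-independent convention (words[0] is always the low half), not necessarily what the plugin does.

```c
#include <assert.h>
#include <stdint.h>

/* Model memcpy() of a 64-bit pattern into two 32-bit words on a
 * little-endian host: the low-order half lands in words[0]. */
static void split_le(uint64_t bits, uint32_t words[2])
{
  words[0] = (uint32_t)(bits & 0xFFFFFFFFu);
  words[1] = (uint32_t)(bits >> 32);
}

/* Model memcpy() of two 32-bit words back into 64 bits on a
 * big-endian host: words[0] supplies the high-order bytes. */
static uint64_t join_be(const uint32_t words[2])
{
  return ((uint64_t)words[0] << 32) | words[1];
}

/* Endian-independent alternative: always treat words[0] as the low half. */
static uint64_t join_portable(const uint32_t words[2])
{
  return ((uint64_t)words[1] << 32) | words[0];
}

/* Replays the numbers from the example above. */
static void check_example(void)
{
  uint32_t words[2];
  split_le(UINT64_C(0x0123456789abcdef), words);
  assert(words[0] == 0x89abcdefu && words[1] == 0x01234567u);
  assert(join_be(words) == UINT64_C(0x89abcdef01234567));       /* halves swapped */
  assert(join_portable(words) == UINT64_C(0x0123456789abcdef)); /* original bits */
}
```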

Non-4-byte unsigned int data types

The current metadata packing mechanism relies on unsigned int being 4 bytes. However, that might not be the case, as Peter puts it:
"I'm also moderately concerned about the use of type punning. Part of this is HDF's fault for using unsigned integers (which are not safe for this purpose; only unsigned char is), which need not be 32 bits wide (e.g., the LP32 and (S)ILP64 data models use 16 or 64 bits)."

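One possible guard, sketched under the assumption that the packed parameters are held in a uint32_t before being handed to HDF5 as unsigned int. The helper names are hypothetical. The compile-time check rejects data models such as LP32, and the mask discards any bits above 32 on models where unsigned int is wider.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Refuse to build where unsigned int cannot hold a 32-bit value. */
static_assert(sizeof(unsigned int) * CHAR_BIT >= 32,
              "cd_values packing needs at least 32 bits in unsigned int");

/* Store a packed 32-bit value into a cd_values[] slot (hypothetical helper). */
static unsigned int u32_to_cd_value(uint32_t packed)
{
  return (unsigned int)packed;
}

/* Read a cd_values[] slot back as exactly 32 bits (hypothetical helper). */
static uint32_t cd_value_to_u32(unsigned int cd_value)
{
  return (uint32_t)(cd_value & 0xFFFFFFFFu);
}
```
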
Provide helper functions to set/get metadata

From Peter:
"To simplify usage and handle potential future changes to how you encode metadata through cd_vals, I would suggest providing helper functions for getting/setting compression mode and parameters."
