
hdf5-rust's Introduction

hdf5-rust

HDF5 for Rust.


The hdf5 crate (previously known as hdf5-rs) provides thread-safe Rust bindings and high-level wrappers for the HDF5 library API. Some of the features include:

  • Thread-safety with non-threadsafe libhdf5 builds guaranteed via reentrant mutexes.
  • Native representation of most HDF5 types, including variable-length strings and arrays.
  • Derive-macro for automatic mapping of user structs and enums to HDF5 types.
  • Multi-dimensional array reading/writing interface via ndarray.

Direct low-level bindings are also available, provided by the hdf5-sys crate.
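
For illustration, here is a minimal sketch of calling into the raw C API through hdf5-sys, assuming the binding is exposed as hdf5_sys::h5::H5get_libversion (the crate mirrors the C function names):

use std::os::raw::c_uint;

use hdf5_sys::h5::H5get_libversion;

fn main() {
    let (mut major, mut minor, mut release): (c_uint, c_uint, c_uint) = (0, 0, 0);
    // raw FFI call into libhdf5, unsafe like any other direct C binding
    unsafe {
        H5get_libversion(&mut major, &mut minor, &mut release);
    }
    println!("linked against HDF5 {}.{}.{}", major, minor, release);
}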

Requires HDF5 library of version 1.8.4 or later.

Example

#[cfg(feature = "blosc")]
use hdf5::filters::blosc_set_nthreads;
use hdf5::{File, H5Type, Result};
use ndarray::{arr2, s};

#[derive(H5Type, Clone, PartialEq, Debug)] // register with HDF5
#[repr(u8)]
pub enum Color {
    R = 1,
    G = 2,
    B = 3,
}

#[derive(H5Type, Clone, PartialEq, Debug)] // register with HDF5
#[repr(C)]
pub struct Pixel {
    xy: (i64, i64),
    color: Color,
}

impl Pixel {
    pub fn new(x: i64, y: i64, color: Color) -> Self {
        Self { xy: (x, y), color }
    }
}

fn write_hdf5() -> Result<()> {
    use Color::*;
    let file = File::create("pixels.h5")?; // open for writing
    let group = file.create_group("dir")?; // create a group
    #[cfg(feature = "blosc")]
    blosc_set_nthreads(2); // set number of blosc threads
    let builder = group.new_dataset_builder();
    #[cfg(feature = "blosc")]
    let builder = builder.blosc_zstd(9, true); // zstd + shuffle
    let ds = builder
        .with_data(&arr2(&[
            // write a 2-D array of data
            [Pixel::new(1, 2, R), Pixel::new(2, 3, B)],
            [Pixel::new(3, 4, G), Pixel::new(4, 5, R)],
            [Pixel::new(5, 6, B), Pixel::new(6, 7, G)],
        ]))
        // finalize and write the dataset
        .create("pixels")?;
    // create an attr with fixed shape but don't write the data
    let attr = ds.new_attr::<Color>().shape([3]).create("colors")?;
    // write the attr data
    attr.write(&[R, G, B])?;
    Ok(())
}

fn read_hdf5() -> Result<()> {
    use Color::*;
    let file = File::open("pixels.h5")?; // open for reading
    let ds = file.dataset("dir/pixels")?; // open the dataset
    assert_eq!(
        // read a slice of the 2-D dataset and verify it
        ds.read_slice::<Pixel, _, _>(s![1.., ..])?,
        arr2(&[
            [Pixel::new(3, 4, G), Pixel::new(4, 5, R)],
            [Pixel::new(5, 6, B), Pixel::new(6, 7, G)],
        ])
    );
    let attr = ds.attr("colors")?; // open the attribute
    assert_eq!(attr.read_1d::<Color>()?.as_slice().unwrap(), &[R, G, B]);
    Ok(())
}

fn main() -> Result<()> {
    write_hdf5()?;
    read_hdf5()?;
    Ok(())
}

Compatibility

Platforms

hdf5 crate is known to run on these platforms: Linux, macOS, Windows (tested on: Ubuntu 16.04, 18.04, and 20.04; Windows Server 2019 with both MSVC and GNU toolchains; macOS Catalina).

Rust

hdf5 crate is tested continuously for all three official release channels, and requires a reasonably recent Rust compiler (e.g. of version 1.51 or newer).

HDF5

Required HDF5 version is 1.8.4 or newer. The library doesn't have to be built with threadsafe option enabled in order to make the user code threadsafe.
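
As an illustration of that guarantee, here is a minimal sketch that shares a dataset handle across threads; it assumes object handles are Send + Sync (which is what the thread-safety guarantee implies) and reuses the pixels.h5 file from the example above:

use std::sync::Arc;
use std::thread;

use hdf5::{File, Result};

fn read_from_threads() -> Result<()> {
    let file = File::open("pixels.h5")?;
    let ds = Arc::new(file.dataset("dir/pixels")?);
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let ds = Arc::clone(&ds);
            // each thread queries metadata concurrently; the crate serializes
            // the underlying libhdf5 calls behind a reentrant mutex
            thread::spawn(move || ds.shape())
        })
        .collect();
    for h in handles {
        println!("shape = {:?}", h.join().unwrap());
    }
    Ok(())
}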

Various HDF5 installation options are supported and tested: via package managers like homebrew and apt; system-wide installations on Windows; conda installations from both the official channels and conda-forge. On Linux and macOS, both OpenMPI and MPICH parallel builds are supported and tested.

The HDF5 C library can also be built from source and linked in statically by enabling hdf5-sys/static feature (CMake required).

Building

HDF5 version

Build scripts for both hdf5-sys and hdf5 crates check the actual version of the HDF5 library that they are being linked against, and some functionality may be conditionally enabled or disabled at compile time. While this allows supporting multiple versions of HDF5 in a single codebase, this is something the library user should be aware of in case they choose to use the low level FFI bindings.

Environment variables

If HDF5_DIR is set, the build script will look there (and nowhere else) for HDF5 headers and binaries (i.e., it will look for headers under $HDF5_DIR/include).

If HDF5_VERSION is set, the build script will check that the library version matches the specified version string; in some cases it may also be used by the build script to help locating the library (e.g. when both 1.8 and 1.10 are installed via Homebrew on macOS).

conda

It is possible to link against hdf5 conda package; a few notes and tips:

  • Point HDF5_DIR to conda environment root.
  • The build script knows about conda environment layout specifics and will adjust paths accordingly (e.g. Library subfolder in Windows environments).
  • On Windows, environment's bin folder must be in PATH (or the environment can be activated prior to running cargo).
  • On Linux / macOS, it is recommended to set rpath, e.g. by setting RUSTFLAGS="-C link-args=-Wl,-rpath,$HDF5_DIR/lib".
  • For old versions of HDF5 conda packages on macOS, it may also be necessary to set DYLD_FALLBACK_LIBRARY_PATH="$HDF5_DIR/lib".

Linux

The build script will attempt to use pkg-config first, which will likely work out without further tweaking for the more recent versions of HDF5. The build script will then also look in some standard locations where HDF5 can be found after being apt-installed on Ubuntu.

macOS

On macOS, the build script will attempt to locate HDF5 via Homebrew if it's available. If both 1.8 and 1.10 are installed and available, the default (1.10) will be used unless HDF5_VERSION is set.

Windows

hdf5 crate fully supports MSVC toolchain, which allows using the official releases of HDF5 and is generally the recommended way to go. That being said, previous experiments have shown that all tests pass on the gnu target as well, one just needs to be careful with building the HDF5 binary itself and configuring the build environment.

A few things to note when building on Windows:

  • hdf5.dll should be available in the search path at build time and runtime (both gnu and msvc). This normally requires adding the bin folder of HDF5 installation to PATH. If using an official HDF5 release (msvc only), this will typically be done automatically by the installer.
  • msvc: installed Visual Studio version should match the HDF5 binary (2013 or 2015). Note that it is not necessary to run vcvars scripts; Rust build system will take care of that.
  • When building for either target, make sure that there are no conflicts in the search path (e.g., some binaries from MinGW toolchain may shadow MSVS executables or vice versa).
  • The recommended platform for gnu target is TDM distribution of MinGW-GCC as it contains bintools for both 32-bit and 64-bit.
  • The recommended setup for msvc target is VS2015 x64 since that matches CI build configuration, however VS2013 and x86 should work equally well.

License

hdf5 crate is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details.

hdf5-rust's People

Contributors

ajtribick, aldanor, balintbalazs, berrysoft, hugwijst, jamesmc86, kalcutter, kkirstein, magnusumet, mulimoen, phil-opp, pmarks, qkoziol, rex4539, rikorose, rytheo, soph-dec, superfluffy, watsaig

hdf5-rust's Issues

How to write to a dataset without holding the whole dataset in memory?

Hello! Is there any way to write to a dataset and not hold the whole dataset in memory? I'm sure I'm missing something simple here.

The line in question is in the counter function, write_slice. Should I be manually calling the flush method at some point?

fn create_loom(
    file_prefix: String,
    row_names: Vec<String>,
    col_names: Vec<String>,
) -> hdf5::Dataset {
    let _ = hdf5::silence_errors();
    let file = hdf5::File::open(format!("{}_counts.loom", file_prefix), "w").unwrap();
    let global_attrs = file.create_group("attrs").unwrap();
    let loom_version = global_attrs
        .new_dataset::<hdf5::types::VarLenUnicode>()
        .create("LOOM_SPEC_VERSION", 1)
        .unwrap();
    let row_attrs = file.create_group("row_attrs").unwrap();
    let col_attrs = file.create_group("col_attrs").unwrap();
    let r_names = row_attrs
        .new_dataset::<hdf5::types::VarLenUnicode>()
        .create("row_names", row_names.len())
        .unwrap();
    let c_names = col_attrs
        .new_dataset::<hdf5::types::VarLenUnicode>()
        .create("col_names", col_names.len())
        .unwrap();
    let row_num = row_names.len();
    let col_num = col_names.len();
    unsafe {
        loom_version
            .write(&[hdf5::types::VarLenUnicode::from_str_unchecked("3.0.0")])
            .unwrap();
        r_names
            .write(
                &row_names
                    .into_iter()
                    .map(|r| hdf5::types::VarLenUnicode::from_str_unchecked(r))
                    .collect::<Vec<_>>(),
            )
            .unwrap();
        c_names
            .write(
                &col_names
                    .into_iter()
                    .map(|c| hdf5::types::VarLenUnicode::from_str_unchecked(c))
                    .collect::<Vec<_>>(),
            )
            .unwrap();
    }
    let matrix = file
        .new_dataset::<u32>()
        .create("matrix", (row_num, col_num))
        .unwrap();
    matrix
}

fn counter(
    cells: &HashMap<String, HashMap<String, Lapper<u32>>>,
    regions: &HashMap<String, Lapper<u32>>,
    cell_ids: &Vec<String>,
    output_prefix: String,
) {
    let mut chrs = regions.keys().map(|k| k.clone()).collect::<Vec<String>>();
    chrs.sort_unstable();
    let row_names = create_region_names(regions);
    let col_num = cell_ids.len();

    let matrix = create_loom(output_prefix, row_names, cell_ids.clone());
    let mut i = 0;
    for chr in chrs.iter() {
        let region_ivs = &regions.get(chr).unwrap().intervals;
        for iv in region_ivs.iter() {
            let mut counts = vec![0; col_num];
            for (j, cell_id) in cell_ids.iter().enumerate() {
                let cell = cells.get(cell_id).unwrap();
                if let Some(lapper) = cell.get(chr) {
                    counts[j] = lapper.count(iv.start, iv.stop) as u32;
                }
            }
            matrix
                .write_slice(Array1::from_vec(counts).view(), s![i, ..])
                .unwrap();
            i += 1;
        }
    }
    //matrix.write(data.view()).unwrap();
}

Find default HDF5 location in Fedora Linux

The build script for hdf5-sys does not find HDF5 in Fedora Linux, neither via pkg-config nor via default location lookup. I'll open a PR to find the default HDF5 location.

Attribute support?

I would really like to use this crate but all my usage of HDF5 requires pulling metadata out of attributes. I'm willing to contribute (to) attribute support if you can offer me some direction.

The hdf5 and hdf5-sys names on crates.io

Dear user of / contributor to hdf5-rs: if you support the idea that this crate is renamed to hdf5 on crates.io (and the library is renamed to hdf5 as well), please leave a shout here (your voice matters!):

Longer story: when this crate was created, names "hdf5" and "hdf5-sys" (and "hdf5" library name) had already been reserved on crates.io, hence we currently have the following collection of crates:

  • hdf5-rs
  • libhdf5-sys
  • hdf5-derive
  • hdf5-types

Also, since hdf5_rs makes for an unwieldy import name, the library is currently named h5, like h5::File. This makes it somewhat confusing and is arguably non-idiomatic, compared to other popular Rust crates (e.g. take "curl": the github repo is "curl-rust", the crates are "curl-sys" and "curl", the library is "curl").

As the author (@IvanUkhov) of hdf5 and hdf5-sys crates notes, he has no future plans in supporting or developing those packages -- which may lead to further confusion for new users. However, he kindly agrees to transfer the ownership of those names to us, conditional on the fact that this is "what the community prefers" (which is the part where you, the community, come in):

However, I do want to make sure that it is indeed what the community prefers, and I wasn’t able to infer this from the download counters of hdf5 and hdf5-rs.

If we rename the crates, we'll end up with the following:

  • hdf5
  • hdf5-sys
  • hdf5-derive
  • hdf5-types

with the library name being "hdf5", importable like hdf5::File.

libhdf5-sys crate would then be deprecated and eventually removed from the crates.io index.

h5lock is still private

I tried to make use of h5lock today since #60 is closed, but I'm still getting the same error:

error[E0603]: module `sync` is private
  --> <::hdf5::macros::h5lock macros>:7:53
   |
7  |         [allow (unused_unsafe)] unsafe { $ crate :: sync :: sync (|| $ expr) }
   |                                                     ^^^^ this module is private
   |
note: the module `sync` is defined here
  --> /home/user/.cargo/git/checkouts/hdf5-rust-c1072d7a2617f7d8/e7a28a7/src/lib.rs:79:1
   |
79 | mod sync;
   | ^^^^^^^^^

error: aborting due to previous error

I deleted the whole target directory to ensure this is not some remnant from the past. Rust recompiled everything, but this error is still there.

Make File cloneable

Can we make File cloneable? As far as I can see, we could just derive Clone as Handle already implements Clone. Or is there anything preventing this?

My use-case would be flushing the file from different threads (or tasks). This should be no problem, right?

Awesome work @aldanor btw!

0.5.0 release

So I've been thinking, although the crate is clearly in pre-alpha stage now, it has evolved enough for other folks to be able to contribute, plus it reached a somewhat usable state, with derive proc-macros done and basic dataset reading/loading working.

As such, I'm proposing to merge the current dev branch into master soon and release a 0.3.0a (where "a" underlines the crate's alpha state); motivation:

  • People who visit the github page wouldn't be thrown off by the "last updated: 2.5 years ago" status (there have been over 350 new commits in the dev branches). Most of the traffic is from google, so it would help.
  • Would hopefully also attract potential contributors (with master branch showing most recent progress).
  • The current version would be cargo-installable without touching git
  • hdf5-derive and hdf5-types crate names would be reserved on crates.io; currently they're not, so anyone can take them.
  • Update the docs (they are in messy state right now and most are missing, but at least the API would be sort-of-browsable).

Things to do:

  • Merge in the dataset slice read/write API (although it needs cleanup, more tests, refactoring and non-standard layout support - we'll refactor it later; the API isn't likely to change). (#20)
  • Merge of the latest work into master; clean up the branches.
  • Update the README with a simplest example involving derive macro and dataset reading / writing.
  • Update the changelog (going to be lengthy).
  • Fix compiletest on Travis so we can finally see some green builds (also they're currently failing on nightly due to rust-lang/rust#57488; also there's an upstream bug in compiletest-rs with the latest nightly on OS X).
  • Fix the issue with hbool_t (#28); may require rewriting the current build system. (fixed in #29).
  • Make high-level objects cloneable, separate .copy() (#23) -- this is pretty simple to implement.
  • Finish the work on FCPL / FAPL, including various file driver backends and writing tests. (#31)
  • Resolve the story with potential crate renaming.

Compiling for WebAssembly

Compiling hdf5-sys and hdf5-types for WebAssembly seems to fail.
While cargo build is successful, running wasm-pack build ends up in a bunch of similar errors about libc:

error[E0425]: cannot find function `malloc` in module `libc`
  --> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/hdf5-types-0.5.1/src/array.rs:71:29
   |
71 |             let dst = libc::malloc(len * mem::size_of::<T>());
   |                             ^^^^^^ not found in `libc`

It might be related to rust-lang/libc#858, but that issue seems to be resolved.

It should be possible to reproduce the issue by adding hdf5 dependency to Cargo.toml

[dependencies]
hdf5 = "0.5.1"

followed by running wasm-pack build.

Reading slices fails with negative indices

Say you have a simple one-dimensional dataset with shape [64]. If you try to read the last entry with dataset.read_slice_1d(s![-1]) hdf5 will error out (of course, the same happens with any dimensional dataset). A current workaround is to set let shape = dataset.shape(); and then execute dataset.read_slice_1d(s![shape[0]-1]).

I will come up with a minimum working example, but just wanted to note this here already, in case somebody encounters the same issue.

Fixed length strings with run-time known length

A fairly common pattern is to create a fixed-length string dataset, with a string length that is selected at runtime. For example, one might select a length that is the maximum length of a list of strings.

The current strongly-typed dataset API seems to make a pattern like this very difficult: you have to instantiate a big set of FixedLengthAscii<> types at compile time, and select one of them at runtime.

Seems like it might be possible to have a 'dynamic' version of DatasetBuilder that lets the user pass in a TypeDescriptor directly? If anyone has a suggestion for how to approach this it would be appreciated. I'll be attempting to address this in the next few months.
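
For reference, hdf5-types can already describe such a string type with a length chosen at runtime; a minimal sketch follows (the missing piece discussed above is a dataset builder that accepts the descriptor directly):

use hdf5::types::TypeDescriptor;

fn main() {
    // pick the fixed string length at runtime, e.g. the longest entry in a list
    let names = ["alpha", "beta", "gamma12345"];
    let max_len = names.iter().map(|s| s.len()).max().unwrap_or(1);
    let td = TypeDescriptor::FixedAscii(max_len);
    println!("runtime-built type descriptor: {:?}", td);
}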

Example for writing and reading user block

I would like to store some extra information about the data stored in my hdf5 file in the user block part of the file. I am not sure how to go about doing that. Can somebody maybe provide an example?

H5ls analogous behavior ?

Hi @aldanor ,

Thanks for the awesome library.
I am using the library for a small project here. The use case I am looking for is analogous to h5ls, i.e. I want to know the group names present in the file before extracting the dataset. Is it possible to share a small example which queries the file for group names, selects a group, and loads the data from that group into a u32 vector? I tried using fapl to extract the properties of the file, but the program gives a compilation error.

While poking around I also found that, with the example in the readme, the hdf5::File::create command somehow doesn't compile, but it works when I use hdf5::File::open. Also, the open command always expects 2 parameters even if I just want to open the file in read-only mode. I was looking at the code here but couldn't understand which code path is actually followed when opening a file.

Thanks again.

What needs to be done?

What is left on the todo list to get this usable? What can developers with different backgrounds do to help?

silence_errors() not working

silence_errors() doesn't seem to work on Linux or Mac. I'm using HDF5 1.8.16 on Mac and HDF5 1.8.20 on Linux. At first I wasn't aware of it & made my own very simple version, but I couldn't get it to work either. I haven't been able to find any mentions of this issue in the HDF5 forums. Is there any guidance on using it properly?

Support vcpkg on windows

I tried to use this crate on windows in combination with vcpkg.

What I did was:

CMD> vcpkg install hdf5:x64-windows

But the build.rs file does not pickup this installation. Maybe we can re-use the pkg-config detection for this type of installation of hdf5?

Any ideas?

probable memory leak in Dataset::chunk_info (possibly in HDF5)

Hi, I've been working on a project that goes through all chunks of all datasets in a file and then drops all references to the original hdf5::File. It seems that this causes a memory leak, but I'm not 100% sure. The structure I've built up is just a couple of MB or less for a 3.7 GB input file (or 8 MB serialized to JSON), but after having built the index I am using about 700 MB of RAM (vs just a few MB if I only open and deserialize the same index). I have no idea if the leak is in hdf5-rust or, perhaps more probably, in native HDF5 itself. Relevant code: https://github.com/gauteh/dars/blob/hdfidx/hidefix/src/idx/dataset.rs#L86

HDF5 pb (1.8.15-patch1) - H5Pcreate(): not a property list class

Hi,

I am trying to use the library (cargo version) but fail with the HDF5 library version mentioned in the title (under Arch Linux).

The code I use is:

extern crate hdf5_rs;

fn main() {
    let mut matfile = hdf5_rs::File::open("/path/to/h5ex_t_int.h5", "r").unwrap();
    matfile.close();
}

The error I get is:

$ cargo run
     Running `target/debug/loadmat`
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 140264758147072:
  #000: H5P.c line 299 in H5Pcreate(): not a property list class
    major: Invalid arguments to routine
    minor: Inappropriate type
thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: "H5Pcreate(): not a property list class"', src/libcore/result.rs:688
Process didn't exit successfully: `target/debug/loadmat` (exit code: 101)

BTW, cargo indicates v 0.1.0 while the git repo indicates v 1.5.0; is this normal, or the potential source of my problem?

Thanks for any tips you could provide

Compound type support

Hello @aldanor, currently compound types are not supported (H5T_COMPOUND).

Do you currently have any plans for supporting them?

Update the docs

  • Remove docs.rs link from README; docs.rs can't build this crate
  • For some reason, gh-pages don't get pushed automatically; figure out why and update them

hbool_t on Windows and accessing HDF5 config header

So, we currently have hbool_t mapped as c_uint which turns out to be wrong because on Linux / OSX it's an alias of C bool which is 1-byte. As a result, some struct layouts get completely jumbled (which is pretty hard to check other than observing stuff failing in obscure ways).

Once I changed it to 1 byte, it worked out perfectly on Unix, but some Windows tests started failing on AppVeyor (thank god we have some kind of an actual test suite...). It took me a while to narrow down the issue (I had to install Windows with MSVS in a virtual machine...), and it turns out to be this:

#ifdef H5_HAVE_STDBOOL_H
  #include <stdbool.h>
#else /* H5_HAVE_STDBOOL_H */
  #ifndef __cplusplus
    #if defined(H5_SIZEOF_BOOL) && (H5_SIZEOF_BOOL != 0)
      #define bool    _Bool
    #else
      #define bool    unsigned int
    #endif
    #define true    1
    #define false   0
  #endif /* __cplusplus */
#endif /* H5_HAVE_STDBOOL_H */
typedef bool hbool_t;

Now, the way it works with HDF5 builds, all those H5_* constants are auto-generated by the CMake process at build time and dumped into a special header, H5pubconf.h, which is distributed along with the binaries and the source headers. It appears that on whichever machine they were building this, the stdbool.h header was not accessible (which is somewhat weird because it has been available in MSVS since v2013). Apparently the CMake process has been fixed upstream about two weeks ago, so it will now assume C99 and thus use the proper bool type – however, we can't make use of that since we have to support all the older versions of the library.

We could just say, of course,

#[cfg(target_env = "msvc")]
pub type hbool_t = c_uint;
#[cfg(not(target_env = "msvc"))]
pub type hbool_t = u8;

... but this is generally wrong, dodgy and fragile. Because, for instance, 3 years ago it looked like this:

typedef unsigned int hbool_t;

Unfortunately, all those H5_* constants are not linked into the library itself, so we can't access them the same way we extract the library version.

... which leads to another point -- having those constants available as compile-time things, or at least some of them, like H5_HAVE_PARALLEL, H5_THREADSAFE, H5_HAVE_DIRECT would allow us to provide proper conditionally-compiled bindings for those cases (MPI support, thread-safe build and direct driver support, respectively).

Summarizing:

  • Pros:
    • we gain access to auto-generated information about how the library was built
    • we can provide conditional bindings for MPI stuff, direct driver support, thread-safe builds etc
    • types like hbool_t will be properly bound
  • Cons:
    • we have to assume include/H5pubconf.h always exists, we won't be able to build the crate without it (unless we make some assumptions which I'd rather not make at build time). This seems reasonable because the library is always distributed with the headers. Which means we'll have to search for the header kind of the same way we search for the library files; which might mean hard-coding library directory structure layout on windows and linux/osx.
    • we will either have to compile a C program into a crate (e.g. using cc-rs; this implies access to a C compiler), or parse the header ourselves manually.

Options available:

  1. Compile a simple C program that dumps all this stuff (or provides it as a crate) via cc-rs. Now this assumes the existence of a C compiler on the current platform; I'm not sure, but that sounds like an excessively harsh requirement.
  2. Parse this C header manually with regex (which shouldn't be too bad in theory, since it's just a bunch of #defines and not much else). We could probably even auto-generate a Rust module out of it with all those constants as pub const variables. We would obviously only be able to parse definitions not containing any expressions (that is, literals only).

Reading / writing

This may be a stupid question, but I see the Dataset.read method being used in your example code in issues like #9 and I can't actually find documentation, or source code, for it. :)

I guess examples of use would be very helpful, as in #9, but I thought at least I could find the code to help me understand things! I have lots of questions, about what kinds of structured-types can be stored in datasets, whether datasets can be iterated to avoid loading them all into memory, etc. etc.; happy to answer these questions myself if I know where to look.

Thanks for what appears to be a very complete and powerful library!

Error using HDF5 1.10.2

On macOS:

HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 0:
  #000: H5P.c line 253 in H5Pcreate(): not a property list class
    major: Invalid arguments to routine
    minor: Inappropriate type
thread 'main' panicked at 'Failed to open hdf5 file: "Invalid property list id: -1"', libcore/result.rs:1009:5

Example for Writing to VarLenArray

It would be very helpful to have an example for this. I can't seem to figure out how to write a 1D array of Vec to a dataset. Would be happy to write the example if someone could tell me how to do the writing.
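
Not a definitive answer, but a minimal sketch of one possible approach, assuming VarLenArray from hdf5::types implements H5Type and can therefore serve as the dataset element type (file and dataset names are arbitrary):

use hdf5::types::VarLenArray;
use hdf5::{File, Result};

fn write_ragged() -> Result<()> {
    let file = File::create("ragged.h5")?;
    // each dataset element is a variable-length array of i32
    let rows: Vec<VarLenArray<i32>> = vec![
        VarLenArray::from_slice(&[1, 2, 3]),
        VarLenArray::from_slice(&[4, 5]),
    ];
    file.new_dataset_builder().with_data(&rows).create("rows")?;
    Ok(())
}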

Make it possible to open files through buffer images

Since you have the file interface already implemented, it's very easy to add reading files from a buffer. For some applications, it accelerates the I/O process a lot.

All you have to do is, instead of opening a file like this:

  fileHandle = H5Fopen(filename, H5F_ACC_RDONLY, H5P_DEFAULT);

Allow it with this (in C++ terms):

fileHandle = H5LTopen_file_image((void *)&FileImage.front(), FileImage.size(), H5LT_FILE_IMAGE_OPEN_RW);

In my C++ code I have an abstraction layer that does this. Everything else is the same. Of course, this assumes the file will not be modified, because modifying an image requires implementing a C interface that would realloc and do other things. I haven't done that part before myself because I didn't need it. But for now, it can be ignored, because most people who need to access files from memory only need reading.

test_flush may panic (very rarely so)

thread 'location::tests::test_flush' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os(2) }', /Users/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-mac/build/src/libcore/result.rs:729

Happened only once, couldn't reproduce.

Mapping Rust tuples to HDF5 compound types

While working on a test suite for dataset reading/writing, I've encountered a curious problem.

Facts:

  • We'd like Rust's tuple types to be mapped to HDF5 types automatically (by deriving H5Type).
  • We don't have any control over Rust tuple layout, it will be optimized by the compiler

Here's an example (playground): field offsets for tuple (i8, u64, f32) are (8, 0, 12) – Rust reorders the fields so the largest one comes first, and is rightfully free to do so.
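
To make the reordering concrete, here is a small self-contained check, independent of the crate, that prints the field offsets of such a tuple (the exact values are not guaranteed by the language, which is precisely the point):

fn main() {
    let t: (i8, u64, f32) = (1, 2, 3.0);
    let base = &t as *const _ as usize;
    // print each field's offset from the start of the tuple
    println!(".0 (i8)  is at offset {}", &t.0 as *const i8 as usize - base);
    println!(".1 (u64) is at offset {}", &t.1 as *const u64 as usize - base);
    println!(".2 (f32) is at offset {}", &t.2 as *const f32 as usize - base);
}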

Now, we want to bind this to an HDF5 compound datatype, with fields named "0", "1" and "2". In HDF5, however, the offsets must be strictly increasing; providing a decreasing offset will be treated as datatype shrinking and will most likely yield an error.

So, we have a few options:

  • Compound datatype with fields ["1", "0", "2"] and offsets [0, 8, 12].

    This can be mapped directly to the Rust tuple and won't require any conversion (no-op). However, the fields are ordered in a weird way. This ordering also depends on the internals of the Rust compiler (although this part isn't likely to change).

  • Compound datatype with fields ["0", "1", "2"] and offsets [0, 8, 16].

    The fields now have the same order as in Rust, however the memory layout is different (and the data element has a bigger size). However, due to incompatible memory layout, this will require a soft conversion each time the dataset is being read/written (an extra step and an extra memory allocation). This is pretty weird and confusing as well, e.g. you want to create a new dataset with this tuple type, and suddenly you're being asked to enable soft conversion. It would also be hard for the crate user to predict whether a given tuple type would require enabling soft conversion for reading/writing – and this would require knowledge of compiler internals.

  • Compound datatype with fields ["0", "1", "2"] and offsets [0, 8, 12].

    This doesn't require soft conversion, the fields in HDF5 are named 0-1-2, but "0" is now field 1, and "1" is now field 0, which is confusing.

  • ? (any other options I've missed?)

For reference: tuple type binding implementation if it's of any help (it's pretty hairy macro stuff).

support hdf5 v1.12

Building against hdf5 1.12 fails because Version::parse regards H5_VERSION 1.12.0 as invalid. Are there any breaking changes that prevent this crate from being used with 1.12?

Provide examples

What's the current state of this? Out of all the hdf5 bindings out there, yours seems to be the most actively developed.

Are you planning to provide some example on how to interact with hdf5-rs, or do you think it is not ready for consumption yet?

Make mod globals public?

When using the hdf5-sys crate directly it would be convenient to have access to the safe-initialised globals from the hdf5 crate instead of having to reinvent that functionality. Currently we are patching the hdf5 crate for pub mod globals. Or is there a better way?

Building on nightly-x86_64-pc-windows-gnu?

Hi, I need to build on Windows 10 with the nightly-x86_64-pc-windows-gnu toolchain.

All libs are installed via MSYS2's pacman, including hdf5. HDF5 headers are in /mingw64/include; libs are in /mingw64/lib.

When I try to build my project on Windows I get the following error. The Linux build is fine.

I've had a brief look at the source code and there seems to be a lot of best-effort guessing going on there. Is there any way to make the script's work easier by somehow telling it where the libs actually are?

thread 'main' panicked at 'Unable to locate HDF5 root directory and/or headers.', C:\Users\XXXXXXXXXX\.cargo\registry\src\github.com-1ecc6299db9ec823\hdf5-sys-0.5.2\build.rs:528:13
stack backtrace:
   0: backtrace::backtrace::dbghelp::trace
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.44\src\backtrace/dbghelp.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.44\src\backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src\libstd\sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src\libstd\sys_common/backtrace.rs:59
   4: core::fmt::write
             at src\libcore\fmt/mod.rs:1063
   5: std::io::Write::write_fmt
             at src\libstd\io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src\libstd\sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src\libstd\sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src\libstd/panicking.rs:204
   9: std::panicking::default_hook
             at src\libstd/panicking.rs:224
  10: std::panicking::rust_panic_with_hook
             at src\libstd/panicking.rs:470
  11: std::panicking::begin_panic
             at /rustc/c20d7eecbc0928b57da8fe30b2ef8528e2bdd5be\src\libstd/panicking.rs:397
  12: build_script_build::LibrarySearcher::try_locate_hdf5_library
             at .\build.rs:528
  13: build_script_build::main
             at .\build.rs:606
  14: std::rt::lang_start::{{closure}}
             at /rustc/c20d7eecbc0928b57da8fe30b2ef8528e2bdd5be\src\libstd/rt.rs:67
  15: std::rt::lang_start_internal::{{closure}}
             at src\libstd/rt.rs:52
  16: std::panicking::try::do_call
             at src\libstd/panicking.rs:303
  17: __rust_maybe_catch_panic
             at src\libpanic_unwind/lib.rs:86
  18: std::panicking::try
             at src\libstd/panicking.rs:281
  19: std::panic::catch_unwind
             at src\libstd/panic.rs:394
  20: std::rt::lang_start_internal
             at src\libstd/rt.rs:51
  21: std::rt::lang_start
             at /rustc/c20d7eecbc0928b57da8fe30b2ef8528e2bdd5be\src\libstd/rt.rs:67
  22: main
  23: _tmainCRTStartup
  24: mainCRTStartup
  25: unit_addrs_search
  26: unit_addrs_search
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

How to append values to a dataset?

I use HDF5 to persist my large time-series data. I want to write my data asynchronously while receiving it. Could you tell me how to create a dataset whose maximum length is unlimited, and how to append data to the end of this dataset?

Fixing the slicing API

@pmarks - I've tried including the slicing API in the README but couldn't quite make it work, actually.

We have a (2, 2) two-dimensional dataset, and I'm trying to extract a one-dimensional [.., 1] slice of it (second column), which would be a (2,) one-dimensional array.

(I've intentionally tried to make it work without reading the source first, trying to place myself in new user's shoes.)

  1. First I've tried pixels.read_slice_1d::<Pixel>(s![.., 1]), which failed to compile:

    no method named `read_slice_1d` found for type `h5::hl::dataset::Dataset`
    

    Slicing API should really be accessible directly from datasets; but that we can fix pretty easily.

  2. Ok, next try: pixels.as_reader().read_slice_1d::<Pixel>(s![.., 1]), also fails to compile:

    wrong number of type arguments: expected 2, found 1
    

    Now this one's not very nice, having to provide a _ placeholder for a second type is counter-intuitive and may lead to problems with type inference.

  3. Next one... pixels.as_reader().read_slice_1d::<Pixel, _>(s![.., 1])... it now compiles but fails at runtime:

    Error: ndim mismatch: expected 1, got 2
    

    This one's actually strange. Looking up docstring for read_slice_1d(), we see "Reads the given slice of the dataset into a 1-dimensional array. The dataset must be 1-dimensional". Now, why does the dataset have to be 1-dimensional? I'm reading a column of a matrix here, what's wrong with that?..

  4. Let's try generic version: pixels.as_reader().read_slice::<Pixel, _>(s![.., 1])... back to:

    wrong number of type arguments: expected 3, found 2
    

    More placeholders...

  5. Fixed: pixels.as_reader().read_slice::<Pixel, _, _>(s![.., 1]).. compiles but:

    Error: ndim mismatch: expected 1, got 2
    

At this point I've surrendered :)

Serde macro namespace clash for Windows builds

I am using both serde and serde_derive in my own crate which depends on hdf5. After adding the two dependencies to my crate, the following happens:

error[E0252]: the name `Deserialize` is defined multiple times
   --> lib\hdf5-rust\hdf5-sys\build.rs:345:9
    |
344 |     use serde::{Deserialize, Deserializer};
    |                 ----------- previous import of the macro `Deserialize` here
345 |     use serde_derive::Deserialize;
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^ `Deserialize` reimported here
    |
    = note: `Deserialize` must be defined only once in the macro namespace of this module

This prevents me from successfully building the library, but the fix is rather easy:

use serde::{Deserialize, Deserializer};
use serde_derive::{Deserialize as SDDeserialize}; 
// [...]
#[derive(Clone, SDDeserialize)]
struct App {

I found this issue which seems related.

ndarray 0.13 support

Hi,

Thanks for creating this crate, it is fantastic!

Could the dependency on ndarray be relaxed so that both ndarray 0.12 and 0.13 can be supported by this crate? Not a big issue, but it took me some time to figure out that I had mixed ndarray crate versions in my application.

Regards,
Windel

Deserialize dataset containing struct of Vec?

Hi! Thanks for your work on this nice crate for using HDF5 in Rust!

I want to read some HDF5 files in which a dataset contains this kind of data:

struct LidarData {
    stamp: i64,
    number_of_layers: i32,
    layers: Vec<LidarLayer>
}

struct LidarLayer {
    number_of_scans_in_layer: i32,
    scans: Vec<LidarScan>
}

struct LidarScan {
    angle: f32,
    distance: f32,
}

So the memory has this layout:
LidarMemoryLayout

Is there a nice way to deserialize it with serde, or do I have to read the whole dataset as a Vec<u8> and then manually parse the binary blob (as I would have done in C++)?

I am quite new to Rust so any pointers would be much appreciated.

Thanks in advance.

h5lock is private

Like we discussed, I wrote my own attribute reader because it's still not supported. Everything went OK, until I tried to use h5lock to make my low-level calls sync with the library. That didn't work from the outside, and I got the error:

error[E0603]: module `sync` is private
 --> <::hdf5::macros::h5lock macros>:5:53
  |
5 | ] # [ allow ( unused_unsafe ) ] unsafe { $ crate :: sync :: sync ( || $ expr )
  |                                                     ^^^^

Can you please make sync public?

Rust (ndarray) -> HDF5 -> Python (NumPy)?

Suppose I have some ndarray matrices/vectors which I store in an HDF5 database using the hdf5 crate.

Am I correct in understanding that if I open this database using Python's h5py, I can load the data into NumPy matrices, because HDF5's data representation is independent of ndarray and NumPy?
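
For what it's worth, the files written by this crate are plain HDF5 files, so h5py can read them with no knowledge of ndarray. A minimal sketch on the Rust side (file and dataset names are arbitrary):

use hdf5::{File, Result};
use ndarray::arr2;

fn main() -> Result<()> {
    let file = File::create("matrix.h5")?;
    // a plain 2x3 float64 dataset; h5py will read it back as a regular NumPy array
    file.new_dataset_builder()
        .with_data(&arr2(&[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
        .create("m")?;
    Ok(())
}

On the Python side, h5py.File("matrix.h5", "r")["m"][...] would then yield the corresponding float64 NumPy array.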
