
Comments (12)

steven-varga commented on June 18, 2024

It depends on the size of the matrix, and on convenience. Some of us prototype on a statistical platform such as Julia, MATLAB, or R, then save/export the data to HDF5. It is convenient to load that file from C++ regardless of its size, and then proceed to a fast implementation using C++ with some linear algebra library. This is one use case for H5CPP.

Alternatively, you have datasets of 10 GB and up, and you need efficient, scalable IO. In that case IO performance can matter a great deal.

Overall you can think of this question as walking along a Pareto front of implementation, maintenance, IO, and runtime cost, which can only be answered (by a constrained mathematical optimisation program) once you have the values ready.

from h5cpp.

xdotli commented on June 18, 2024

It turned out that the "unable to open dataset" error was due to a Python program I had forgotten to shut down: it was still reading the file, which somehow made the dataset unavailable. As soon as I quit the Python script, h5cpp could open the file successfully.

Thank you @steven-varga for guiding me through reinstalling the libraries from source and for outlining the workflow around high-performance data I/O and computing. Your library is awesome!


steven-varga commented on June 18, 2024

Hmm... I don't have macOS to work with; we could have a chat about this, if there is any interest? It should not be a biggie, as the LLVM toolchain works on macOS.


xdotli commented on June 18, 2024

@steven-varga Hi! And thank you for your reply.

I can now successfully compile the program with the following verbose command:
g++ -std=c++17 -o test cca.cpp -I /usr/local/include/ -L /usr/local/lib/ -lhdf5 -lhdf5_hl

But when I run the executable it gives an "unable to open dataset" error:

HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
  #000: H5D.c line 285 in H5Dopen2(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #001: H5VLcallback.c line 1910 in H5VL_dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1877 in H5VL__dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_dataset.c line 123 in H5VL__native_dataset_open(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #004: H5Dint.c line 1483 in H5D__open_name(): not found
    major: Dataset
    minor: Object not found
  #005: H5Gloc.c line 442 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #006: H5Gtraverse.c line 837 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #007: H5Gtraverse.c line 613 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #008: H5Gloc.c line 399 in H5G__loc_find_cb(): object 'create then write' doesn't exist
    major: Symbol table
    minor: Object not found
libc++abi: terminating with uncaught exception of type h5::error::io::dataset::open: /usr/local/include/h5cpp/H5Dopen.hpp line#  30 : opening dataset failed...
[1]    71327 abort      ./test

Do you happen to know any common causes of this error? I have been at this for too long just trying to read an HDF5 file. For now I have converted the original file to several JSON files and read those into my program instead. However, the JSON files turn out to be much larger than the HDF5 file; do you think that will cause performance issues?


steven-varga commented on June 18, 2024

Can you list the version of the file? Would it be possible to try it with libhdf5 v1.10.6? BTW: no need for hdf5_hl. Can you share the file?

I am not sure what you are trying to do; JSON is not for HPC and has different properties and use cases. In fact the acronym gives it away: JavaScript Object Notation. HDF5, on the other hand, is like an ext4 filesystem with a convenient API and, most importantly, MPI-IO capability.


xdotli commented on June 18, 2024

Sure. I'm using HDF5 1.12.1. The code is below:

#include <iostream>
#include <dlib/matrix.h>
#include <dlib/statistics/cca.h>
#include <h5cpp/all>

using namespace std;
using namespace dlib;
template <class T>
using Matrix = dlib::matrix<T>;

int main()
{
  // "create then write" is the dataset (link) name the error trace reports as missing
  Matrix<short> M = h5::read<Matrix<short>>("1000hpa.h5", "create then write");
  return 0;
}

Regarding the choice of JSON: well, I have worked with JavaScript and Python the most, and I'm simply trying to read a 726*14729 matrix into my program, so I thought dumping the HDF5 data into JSON and reading the JSON into my program might be possible.

By the way, I'm using nlohmann/json, where the whole library is a single header file. Should I delete the JSON object once the data are stored in matrices?


xdotli commented on June 18, 2024

I don't think I'm expressing my concern clearly enough. The computation-heavy part of my program is the calculation on the matrices, so I wonder: can I assume the performance of the preceding I/O part is not as relevant?


xdotli commented on June 18, 2024

Thank you so much for your answer. The matrix is 726x14965, and my output is supposed to be six 726x726 matrices.

If this prototyping-to-production workflow is so prevalent, then it's imperative for me to work out a way to make this library work on my computer: I'm a new big-data research assistant at school, and while the other team members do prototyping in R/Python, I have to deliver parallel C++ code that implements their algorithms.

Anyway, I used brew install hdf5 as my last attempt to make h5cpp work for me, but the library still gave me the "unable to open dataset" error. Before that I had errors like Undefined symbols for architecture x86_64: "***". Would you say I have a better chance of making all of this work on a Linux server? That way I wouldn't be trying to install all kinds of libraries everywhere. Thank you very much!


steven-varga commented on June 18, 2024

It works on POSIX with C++17, and as I mentioned before I am open to a conference call (I don't have a Mac). Avoid the OS package manager: this is HPC, where Spack is more likely to be used. Instead, install the components from source. Here is a laundry list:

  • HDF5 1.10.6 (no C++ or high-level library support)
  • OpenMPI 4.0.7
  • OrangeFS for the parallel filesystem

I am working on a reference platform: a rental cluster on AWS EC2 with the proper settings and a convenient VS Code front end, but it will take a few more weeks to bring it online.

Let me know about the call.


xdotli commented on June 18, 2024

Before submitting this issue I actually noticed issue #42, where you are testing h5cpp's compatibility with different compilers. I'd love to go through the laundry list and install these components, but I have a deadline about 12 hours from now. I'll work on this for my next assignment and report back here.

Thanks again for your help! I will try setting up a server environment to do the job as well. I'm familiar with vim, so I think I'll test the h5cpp library there before doing any further setup.


xdotli commented on June 18, 2024

@steven-varga Sorry to bother you again! I tried installing HDF5 1.10.6, but I don't know which folder to put it in. Should I dump the headers into /usr/local/include?


steven-varga commented on June 18, 2024

H5CPP doesn't care where you install HDF5. As for the H5CPP headers: copy them to /usr/local/include/h5cpp, then in your makefiles use gcc -I/usr/local/include. It is customary to install local packages under the user-local prefix: ./configure --prefix=/usr/local. Below are the default settings (after configure):

Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: no
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
             Metadata trace file: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no

And there is no need to link against libhdf5_hl.so; instead use the templated h5::append operator on h5::pt_t<T> for the packet table, which is much faster.

