
h5bench's Introduction

hpc-io

h5bench's People

Contributors

brtnfld, github-actions[bot], hammad45, houjun, jeanbez, jjravi, kaushikvelusamy, kencasimiro, mierl, qkoziol, runzhouhan, sbyna, wkliao


h5bench's Issues

Assertion Failure when Running Metadata Stress with MPI

Hello,

I encountered an issue while running Metadata Stress from h5bench-1.4. After installation, I executed ./h5bench_hdf5_iotest hdf5_iotest.ini and received the following output:

Config loaded from 'hdf5_iotest.ini':
	steps=20, arrays=500, rows=100, columns=200, scaling=weak
	proc-grid=1x1, slowest-dimension=step, rank=4
	layout=contiguous, mpi-io=independent

Wall clock [s]:		1.95
File size [B]:		1600002048
---------------------------------------------
Measurement:		_MIN (over MPI ranks)
			^MAX (over MPI ranks)
---------------------------------------------
Write phase [s]:	_1.51
			^1.51
Create time [s]:	_0.00
			^0.00
Write time [s]:		_1.50
			^1.50
Write rate [MiB/s]:	_1019.80
			^1019.80
Read phase [s]:		_0.31
			^0.31
Read time [s]:		_0.30
			^0.30
Read rate [MiB/s]:	_5133.93
			^5133.93

I attempted to use MPI to execute this benchmark with the command mpirun -n 4 ./h5bench_hdf5_iotest hdf5_iotest.ini, but I encountered an error:

h5bench_hdf5_iotest: /home/zhb/h5bench-1.4/metadata_stress/configuration.c:156: validate: Assertion `pconfig->proc_rows * pconfig->proc_cols == (unsigned)size' failed.
[ubuntu:257356] *** Process received signal ***
[ubuntu:257356] Signal: Aborted (6)
[ubuntu:257356] Signal code:  (-6)
[ubuntu:257356] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ac3162125d0]
[ubuntu:257356] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2ac316455207]
[ubuntu:257356] [ 2] /lib64/libc.so.6(abort+0x148)[0x2ac3164568f8]
[ubuntu:257356] [ 3] /lib64/libc.so.6(+0x2f026)[0x2ac31644e026]
[ubuntu:257356] [ 4] /lib64/libc.so.6(+0x2f0d2)[0x2ac31644e0d2]
[ubuntu:257356] [ 5] ./h5bench_hdf5_iotest[0x404f6a]
[ubuntu:257356] [ 6] ./h5bench_hdf5_iotest[0x40244d]
[ubuntu:257356] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ac3164413d5]
[ubuntu:257356] [ 8] ./h5bench_hdf5_iotest[0x402259]
[ubuntu:257356] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ubuntu exited on signal 6 (Aborted).
--------------------------------------------------------------------------

It appears that there may be a configuration error. However, my configuration follows the guidelines provided in the documentation:

[DEFAULT]
version = 0
steps = 20
arrays = 500
rows = 100
columns = 200
process-rows = 1
process-columns = 1
scaling = weak
dataset-rank = 4
slowest-dimension = step
layout = contiguous
mpi-io = independent
hdf5-file = hdf5_iotest.h5
csv-file = hdf5_iotest.csv

Could you please assist me in resolving this issue?
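
Note: the failing assertion in configuration.c requires process-rows * process-columns to equal the number of MPI ranks, so the 1x1 process grid above cannot validate under mpirun -n 4. An illustrative adjustment to hdf5_iotest.ini for a 4-rank run (any factorization whose product equals the rank count should also pass the check):

process-rows = 2
process-columns = 2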

Does not build due to undeclared H5ES symbols

I tried to build h5bench using HDF5 1.12.0 (and develop) but the build always fails with errors like this:

error: 'H5ES_NONE' undeclared (first use in this function)

(The same errors occur for other symbols, such as H5ESclose and H5Gclose_async.)

I am not specifying any CMake arguments, so I thought that any async code would be disabled. Am I doing anything wrong?
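
For context, the H5ES event-set API (H5ES_NONE, H5ESclose, and the *_async calls) only exists in HDF5 1.13 and newer, so compiling such code against 1.12.0 fails regardless of CMake options. A minimal sketch of the kind of version guard that avoids the undeclared symbols (illustrative only, not h5bench's actual source):

#include <hdf5.h>
#include <stddef.h>

/* Illustrative only: guard event-set usage so the file still compiles
 * against HDF5 releases that predate the H5ES API (added in 1.13). */
static void finish_pending_operations(void)
{
#if H5_VERSION_GE(1, 13, 0)
    hid_t   es_id           = H5EScreate(); /* event set for async operations */
    size_t  num_in_progress = 0;
    hbool_t op_failed       = 0;

    /* ... *_async() calls would be queued against es_id here ... */

    H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);
    H5ESclose(es_id);
#else
    /* HDF5 < 1.13 has no event sets; everything runs synchronously. */
#endif
}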

Breakdown reported times

  • Add observed I/O time
  • Add observed I/O rate
  • Rename observed time to benchmark walltime/runtime
  • Rename observed rate to benchmark rate
  • Rename raw time to raw I/O time
  • Rename raw rate to raw I/O rate
  • Group metadata and raw I/O together, and improve the overall grouping
  • Ensure both CSV and output have these changes

Performance report / CSV file is hard to find for some kernels

During testing I ran into another problem: some of the benchmarks, such as the AMReX, MACSio, and Exerciser workloads, do not report overall performance (e.g., I/O bandwidth). Passing "csv-file" does not generate the corresponding .csv file. How can I get a metric for the overall performance?

Reported by Zhiyue Li

Installation issue on MacOS

I installed hdf5-mpi via "brew install hdf5-mpi" on macOS, and "cmake .." finds it correctly:

-- Found HDF5: -- Found HDF5: /usr/local/Cellar/hdf5-mpi/1.14.1/lib/libhdf5.dylib;/usr/local/opt/libaec/lib/libsz.dylib;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libz.tbd;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libdl.tbd;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libm.tbd (found version "1.14.1-2")  
-- Using HDF5 version: 1.14.1-2
-- Looking for H5_HAVE_SUBFILING_VFD
-- Looking for H5_HAVE_SUBFILING_VFD - not found

But when I run make, the 'hdf5.h' header cannot be found:

./h5bench/commons/h5bench_util.c:17:10: fatal error: 'hdf5.h' file not found
#include <hdf5.h>
     ^~~~~~~~
1 error generated.

I then tried export CPATH="/usr/local/Cellar/hdf5-mpi/1.14.1/", and some new errors showed up:

./h5bench/h5bench_patterns/h5bench_write.c:340:9: error: implicit declaration of function 'H5Pset_dxpl_mpio' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        H5Pset_dxpl_mpio(*plist_id_out, H5FD_MPIO_COLLECTIVE);
        ^
./h5bench/h5bench_patterns/h5bench_write.c:931:9: error: implicit declaration of function 'H5Pset_all_coll_metadata_ops' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        H5Pset_all_coll_metadata_ops(fapl, 1);
        ^
./h5bench/h5bench_patterns/h5bench_write.c:932:9: error: implicit declaration of function 'H5Pset_coll_metadata_write' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        H5Pset_coll_metadata_write(fapl, 1);
        ^
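
For what it's worth, implicit-declaration errors for H5Pset_dxpl_mpio and the other collective-metadata calls usually mean the compiler is picking up a serial hdf5.h (one configured without H5_HAVE_PARALLEL) rather than the hdf5-mpi headers. A shell sketch of one way to point the build at the parallel keg and the MPI wrapper (paths follow the brew location above and may differ on your system):

export HDF5_ROOT=/usr/local/Cellar/hdf5-mpi/1.14.1
CC=mpicc CXX=mpicxx cmake ..
make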

exerciser bug with `indepio` option

For exerciser runs, "indepio": "True" does not work. "metacoll" works, though, so the other boolean options would be expected to behave the same way.

ERROR - unrecognized parameter: True.  Exitting.

I only used 2 and 3 dims.

python wrapper h5bench won't run

Bug Report

The Python h5bench wrapper's check_parallel function compares the command line against the shell environment variable and, if they do not match, exits with the error print('You should not call MPI directly when running h5bench.'). In this case, command[0]=/bin/sh and shell=/bin/sh.
To Reproduce

How are you building/running h5bench?

Slurm batch job on a Cray (with no srun command).

What is the input configuration file you use?

Doesn't get that far!


Software Environment

  • version of h5bench: 1.1
  • installed h5bench using: from source
  • operating system: Linux ln04 4.12.14-197.78_9.1.64-cray_shasta_c #1 SMP Wed Aug 25 14:56:40 UTC 2021 (4e2a900) x86_64 x86_64 x86_64 GNU/Linux -
  • machine: Supercomputer Archer2 www.archer2.ac.uk


Filter option not working as expected

The h5bench --filter option appears to skip the benchmarks passed to it rather than executing only those:

-f FILTER, --filter FILTER     Execute only filtered benchmarks

[documentation] Confusion on the usage of "MODE", "ASYNC_MODE" and "mode" in docs

1. Missing "MODE" in the example configuration file
Using the configuration.json file provided in h5bench.readthedocs.io returns the following error:

2022-08-02 11:47:13,410 h5bench - INFO - Starting h5bench Suite
2022-08-02 11:47:13,410 h5bench - WARNING - Base directory already exists: output
2022-08-02 11:47:13,410 h5bench - INFO - Lustre support not detected
2022-08-02 11:47:13,410 h5bench - INFO - h5bench [write] - Starting
2022-08-02 11:47:13,410 h5bench - INFO - h5bench [write] - DIR: output/c9a31822/
2022-08-02 11:47:13,411 h5bench - INFO - Parallel setup: mpirun -np 8
2022-08-02 11:47:13,411 h5bench - ERROR - Unable to run the benchmark: 'MODE'

The configuration.json file is also available in the GitHub repository.

Looking at its first write benchmark:

{
    "benchmark": "write",
    "file": "test.h5",
    "configuration": {
        "MEM_PATTERN": "CONTIG",
        "FILE_PATTERN": "CONTIG",
        "NUM_PARTICLES": "16 M",
        "TIMESTEPS": "5",
        "DELAYED_CLOSE_TIMESTEPS": "2",
        "COLLECTIVE_DATA": "NO",
        "COLLECTIVE_METADATA": "NO",
        "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s", 
        "NUM_DIMS": "1",
        "DIM_1": "16777216",
        "DIM_2": "1",
        "DIM_3": "1",
        "ASYNC_MODE": "NON",
        "CSV_FILE": "output.csv"
    }
}

ASYNC_MODE is provided while MODE is missing.
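
For reference, the write configuration shown later on this page (in the dataset-close issue) does include a "MODE" entry, and the error message above suggests the wrapper looks up exactly that key; an illustrative addition to the "configuration" block would be:

"MODE": "SYNC"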

2. Missing "mode" in the example configuration file (AMRex)
Using the configuration.json file provided in h5bench.readthedocs.io returns the following error:

2022-08-02 12:04:36,003 h5bench - INFO - Starting h5bench Suite
2022-08-02 12:04:36,003 h5bench - WARNING - Base directory already exists: output
2022-08-02 12:04:36,003 h5bench - INFO - Lustre support not detected
2022-08-02 12:04:36,004 h5bench - INFO - h5bench [amrex] - Starting
2022-08-02 12:04:36,004 h5bench - INFO - h5bench [amrex] - DIR: output/a8b627f0/
2022-08-02 12:04:36,004 h5bench - INFO - Parallel setup: mpirun -np 8
2022-08-02 12:04:36,004 h5bench - ERROR - Unable to run the benchmark: 'mode'

The configuration.json file:

{
    "benchmark": "amrex",
    "file": "amrex.h5",
    "configuration": {
        "ncells": "64",
        "max_grid_size": "8",
        "nlevs": "1",
        "ncomp": "6",
        "nppc": "2",
        "nplotfile": "2",
        "nparticlefile": "2",
        "sleeptime": "2",
        "restart_check": "1",
        "hdf5compression": "ZFP_ACCURACY#0.001"
    }
}

mode is missing.

Also, not sure if it's important but there's a mismatch between mode and MODE (lowercase vs uppercase).

Software Environment

  • version of h5bench: 1.3.0, (current master branch)
  • installed h5bench using: from source
  • operating system: 5.10.133-1-MANJARO
  • machine: PC
  • version of HDF5: 1.13.0
  • version of VOL-ASYNC: N/A
  • name and version of MPI: MPICH 4.0.2

issues with mpicc and then CMake cannot find hdf5

CMake Build issues

This is what I have tried so far:

I have tried setting CMAKE_C_COMPILER=mpicc and, similarly, CMAKE_CXX_COMPILER=mpicxx with varying levels of success. I have managed to reach the make step on my Mac laptop and on a Linux cluster, but in both cases it does not find HDF5. Setting HDF5_ROOT gets partial success. What is the correct way to specify the MPI compilers and the HDF5 installation?

Thanks,

...
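
One combination that typically works with CMake's FindHDF5 is to select the MPI wrappers as compilers and point HDF5_ROOT at the parallel HDF5 install; a sketch (paths are placeholders, and HDF5_ROOT can equivalently be set as an environment variable):

cmake .. \
    -DCMAKE_C_COMPILER=mpicc \
    -DCMAKE_CXX_COMPILER=mpicxx \
    -DHDF5_ROOT=/path/to/parallel-hdf5
make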


Allow fine-tuning of the number of ranks per benchmark

Currently, the number-of-ranks parameter is applied globally to all benchmarks in the JSON file. We should allow users to fine-tune it by providing a per-benchmark option that overrides this parameter. This would add flexibility and allow the same JSON file to be reused across experiments of different sizes.
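
A hypothetical shape for such an override, using a per-benchmark "ranks" key that would fall back to the global mpi setting when omitted (this key does not exist today and is shown only to illustrate the request):

{
    "benchmark": "write",
    "file": "test.h5",
    "ranks": "16",
    "configuration": {
        "NUM_PARTICLES": "16 M",
        "TIMESTEPS": "5"
    }
}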

Add quick-start in documentation

Add a quick-start section in the documentation with a simple example of how to compile, run, get the results, and understand them.

Make install more portable

The install step threw some errors related to the installation location.

This is what I have tried so far:

cori build $ make
Scanning dependencies of target h5bench_util
[ 8%] Building C object CMakeFiles/h5bench_util.dir/commons/h5bench_util.c.o
[ 16%] Linking C static library libh5bench_util.a
[ 16%] Built target h5bench_util
Scanning dependencies of target h5bench_read
[ 25%] Building C object CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o
[ 33%] Linking C executable h5bench_read
[ 33%] Built target h5bench_read
Scanning dependencies of target h5bench_append
[ 41%] Building C object CMakeFiles/h5bench_append.dir/h5bench_patterns/h5bench_append.c.o
[ 50%] Linking C executable h5bench_append
[ 50%] Built target h5bench_append
Scanning dependencies of target h5bench_overwrite
[ 58%] Building C object CMakeFiles/h5bench_overwrite.dir/h5bench_patterns/h5bench_overwrite.c.o
[ 66%] Linking C executable h5bench_overwrite
[ 66%] Built target h5bench_overwrite
Scanning dependencies of target h5bench_write_unlimited
[ 75%] Building C object CMakeFiles/h5bench_write_unlimited.dir/h5bench_patterns/h5bench_write_unlimited.c.o
[ 83%] Linking C executable h5bench_write_unlimited
[ 83%] Built target h5bench_write_unlimited
Scanning dependencies of target h5bench_write
[ 91%] Building C object CMakeFiles/h5bench_write.dir/h5bench_patterns/h5bench_write.c.o
[100%] Linking C executable h5bench_write
[100%] Built target h5bench_write
cori build $ make install
[ 16%] Built target h5bench_util
[ 33%] Built target h5bench_read
[ 50%] Built target h5bench_append
[ 66%] Built target h5bench_overwrite
[ 83%] Built target h5bench_write_unlimited
[100%] Built target h5bench_write
Install the project...
-- Install configuration: "Debug"
-- Installing: /usr/local/bin/h5bench
CMake Error at cmake_install.cmake:41 (file):
file INSTALL cannot copy file
"/global/homes/d/dbin/work/h5bench/src/h5bench.py" to
"/usr/local/bin/h5bench": Permission denied.

Wrong unit in reported compute time

Bug Report

The unit of the reported total emulated compute time does not match the value given in the configuration file. For instance, when setting 1 s per timestep, it reports 4 ms for a run of 5 timesteps.

To Reproduce

=======================================
Benchmark configuration: 
File: ../h5bench_patterns/sample_config/sample_write_cc1d.cfg
Number of particles per rank: 16777216
Number of time steps: 5
Emulated compute time per timestep: 1
Async mode = 0 (0: ASYNC_NON; 1: ASYNC_EXP; 2: ASYNC_IMP)
Collective metadata operations: NO.
Collective buffering for data operations: NO.
Number of dimensions: 1
    Dim_1: 16777216
=======================================
Start benchmark: h5bench_write, Number of particles per rank: 16 M
Total number of particles: 496M
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 31 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 31 clients per server
Collective write: disabled.
Opened HDF5 file... 
Writing Timestep_0 ... 
Computing... 
Writing Timestep_1 ... 
Computing... 
Writing Timestep_2 ... 
Computing... 
Writing Timestep_3 ... 
Computing... 
Writing Timestep_4 ... 

 Performance measured with 31 ranks, 
==================  Performance results  =================
Total emulated compute time 4 ms
Total write size = 77 GB
Raw write time = 121.619 sec
Metadata time = 0.324 sec
H5Fcreate() takes 397257.000 sec
H5Fflush() takes 9.000 sec
H5Fclose() takes 1421.000 sec
Observed completion time = 126.427 sec
Sync Raw write rate = 0.633 GB/sec 
Sync Observed write rate = 0.629 GB/sec
===========================================================

Expected Behavior

The reported emulated compute time should match the unit provided in the configuration file; in this case, seconds.

Total emulated compute time 4 s

Software Environment

  • version of h5bench: 1.0
  • installed h5bench using: from source
  • machine: Cori
  • version of HDF5: develop (1.13.0)
  • name and version of MPI: Cray MPI

Exerciser configuration fails

Bug Report

Some configurations of exerciser cause the benchmark to fail.

To Reproduce

For instance:

        {
            "benchmark": "exerciser",
            "configuration": {
                "numdims": "4",
                "minels": "4 4 4 4",
                "dimranks": "8 4 4 2",
                "nsizes": "4",
                "bufmult": "4 4 4 4"
            }
        },

If we reduce bufmult to 2 2 2 2, it seems to run to completion. However, with that particular configuration we get the following error:

HDF5-DIAG: Error detected in HDF5 (1.13.2) MPI-process 0:
  #000: H5D.c line 1227 in H5Dwrite(): can't synchronously write data
    major: Dataset
    minor: Write failed
  #001: H5D.c line 1174 in H5D__write_api_common(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5VLcallback.c line 2181 in H5VL_dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
  #003: H5VLcallback.c line 2148 in H5VL__dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
  #004: H5VLnative_dataset.c line 345 in H5VL__native_dataset_write(): can't write data
    major: Dataset
    minor: Write failed
  #005: H5Dio.c line 381 in H5D__write(): src and dest dataspaces have different number of elements selected
    major: Invalid arguments to routine
    minor: Bad value

Also, using values >5 for that variable causes a segfault. I could not find any restrictions or constraints on those values in the documentation, so I assume both settings are valid.

Wrong time units reported in the output

Bug Report

The create, flush, and close times reported by the benchmark appear to be in ms even though the output labels them as seconds.

To Reproduce

=======================================
Benchmark configuration: 
File: ../h5bench_patterns/sample_config/sample_write_cc1d.cfg
Number of particles per rank: 16777216
Number of time steps: 5
Emulated compute time per timestep: 1
Async mode = 0 (0: ASYNC_NON; 1: ASYNC_EXP; 2: ASYNC_IMP)
Collective metadata operations: NO.
Collective buffering for data operations: NO.
Number of dimensions: 1
    Dim_1: 16777216
=======================================
Start benchmark: h5bench_write, Number of particles per rank: 16 M
Total number of particles: 496M
Collective write: disabled.
Opened HDF5 file... 
Writing Timestep_0 ... 
Computing... 
Writing Timestep_1 ... 
Computing... 
Writing Timestep_2 ... 
Computing... 
Writing Timestep_3 ... 
Computing... 
Writing Timestep_4 ... 

 Performance measured with 31 ranks, 
==================  Performance results  =================
Total emulated compute time 4 ms
Total write size = 77 GB
Raw write time = 70.011 sec
Metadata time = 0.005 sec
H5Fcreate() takes 189478.000 sec
H5Fflush() takes 355869.000 sec
H5Fclose() takes 30880.000 sec
Observed completion time = 74.593 sec
Sync Raw write rate = 1.100 GB/sec 
Sync Observed write rate = 1.091 GB/sec
===========================================================

Expected Behavior

The reported create, flush, and close times should match the stated time unit.

Software Environment

  • version of h5bench: 1.0
  • installed h5bench using: from source
  • machine: Cori
  • version of HDF5: development (1.13.0)
  • name and version of MPI: Cray MPI

benchmarks won't complete as file not found.

The benchmarks start, but each one exits with "file not found" without saying which file. I used the configuration.json file that comes with the 1.1 release. For write, perhaps I need to provide the file? But what file, and where? For the write benchmark, surely it should be creating the file itself!

This is the error I am facing:

 ls
configuration-h5bench.log  configuration.json  configuration.json~  full-teste	job.sub  job.sub~  slurm-1038041.out  test.py
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run> cd full-teste/
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run/full-teste> ls
09dd69c2  1cbd24b9  1f4c48f4  249077b6	59d50b3c  61bee9fa  9411e16c  98ad11ae	b149e8ef  e373db41  f942bfa8
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run/full-teste> more 09dd69c2/stderr 
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
srun: error: nid004833: tasks 0-3: Exited with exit code 2
srun: launch/slurm: _step_signal: Terminating StepId=1038041.

Thanks for any help.
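
Note that the execve error above means srun could not locate the h5bench_write executable itself rather than an input file; in a batch job this typically happens when the directory holding the h5bench binaries is not on PATH. A sketch of the kind of line to add to the job script before invoking h5bench (the path is a placeholder):

export PATH=/path/to/h5bench/install/bin:$PATH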

h5bench_write does not close the dataset correctly

Bug Report

The write benchmark does not close the dataset correctly using configuration "MEM_PATTERN": "CONTIG" and "FILE_PATTERN": "INTERLEAVED".

$ cat full-test/a7311c05/stderr
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 1:
  #000: H5D.c line 504 in H5Dclose_async(): not a dataset ID
    major: Invalid arguments to routine
    minor: Inappropriate type
...

To Reproduce

How are you building/running h5bench?

h5bench --debug configuration.json

What is the input configuration file you use?

{
    "mpi": {
        "command": "mpirun",
        "ranks": "2"
    },
    "vol": {
    },
    "file-system": {
    },
    "directory": "full-test",
    "benchmarks": [
        {
            "benchmark": "write",
            "file": "test.h5",
            "configuration": {
                "MEM_PATTERN": "CONTIG",
                "FILE_PATTERN": "INTERLEAVED",
                "NUM_PARTICLES": "1 M",
                "TIMESTEPS": "5",
                "DELAYED_CLOSE_TIMESTEPS": "2",
                "COLLECTIVE_DATA": "NO",
                "COLLECTIVE_METADATA": "YES",
                "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s", 
                "NUM_DIMS": "1",
                "DIM_1": "16777216",
                "DIM_2": "1",
                "DIM_3": "1",
                "CSV_FILE": "output.csv",
                "MODE": "SYNC"
            }
        }
    ]
}

Expected Behavior

No error should be reported in stderr.

Software Environment

  • version of h5bench: main branch
  • installed h5bench using: from source
  • operating system: Red Hat Enterprise Linux 8.5 (Ootpa)
  • machine: private machine
  • version of HDF5: 1.13.0
  • version of VOL-ASYNC: not installed
  • name and version of MPI: MPICH 3.4.2

feature request: make test

I would like to make a feature request. Usually, a software library provides a set of basic test programs that are run via "make test". This helps users and developers rule out bugs before installation and production runs.
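
For reference, CMake-based projects usually expose this through CTest, after which "make test" (or "ctest") runs the registered cases; a minimal sketch of what that could look like in CMakeLists.txt (the test name and arguments are illustrative):

enable_testing()
add_test(NAME h5bench_write_smoke
         COMMAND h5bench_write sample_write_cc1d.cfg test.h5)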

Update the output with baseline results

Update the output with the results from the baseline benchmarks. These should include the observed rate and time. Point users to where they can get full details about the execution.

h5bench_write issue: H5Fclose_async() & H5VLfile_close()

Hi there,

I'm using H5Bench master branch to test vol-provenance connector.

When testing h5bench_write with the following command:
./h5bench_write write_cc1d.cfg test.h5

It prints out following error messages:

HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
  #000: ../../hdf5/src/H5VLcallback.c line 4184 in H5VLfile_close(): unable to close file
    major: Virtual Object Layer
    minor: Unable to close file
  #001: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #002: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
    major: File accessibility
    minor: Unable to decrement reference count
  #003: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
    major: File accessibility
    minor: Unable to close file
------- PROVENANCE VOL WRAP CTX Free
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
  #000: ../../hdf5/src/H5F.c line 1111 in H5Fclose_async(): decrementing file ID failed
    major: File accessibility
    minor: Unable to close file
  #001: ../../hdf5/src/H5Iint.c line 1201 in H5I_dec_app_ref_async(): can't asynchronously decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: ../../hdf5/src/H5Iint.c line 1118 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: ../../hdf5/src/H5Fint.c line 249 in H5F__close_cb(): unable to close file
    major: File accessibility
    minor: Unable to close file
  #004: ../../hdf5/src/H5VLcallback.c line 4147 in H5VL_file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #005: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #006: ../../hdf5/src/H5VLcallback.c line 4184 in H5VLfile_close(): unable to close file
    major: Virtual Object Layer
    minor: Unable to close file
  #007: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #008: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
    major: File accessibility
    minor: Unable to decrement reference count
  #009: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
    major: File accessibility
    minor: Unable to close file

==================  Performance results  =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Raw write time = 1.037 sec
Metadata time = 10.315 ms
H5Fcreate() takes 182.754 ms
H5Fflush() takes 5352.875 ms
H5Fclose() takes 0.490 ms
Observed completion time = 10.599 sec
Sync Raw write rate = 2469.526 MB/sec 
Sync Observed write rate = 387.908 MB/sec
===========================================================

It seems there are some issues with 'close' APIs, but the program is able to exit normally.

I'm using the HDF5 develop branch and the vol-provenance develop branch, both cloned from the hpc-io Git repositories.

My environment variables are:

export HDF5_VOL_CONNECTOR="provenance under_vol=0;under_info={};path=/home/runzhou/Downloads/myhdfstuff/vol-provenance/my_trace.log;level=2;format="

export HDF5_PLUGIN_PATH=/home/runzhou/Downloads/myhdfstuff/vol-provenance

export LD_LIBRARY_PATH=/home/runzhou/Downloads/myhdfstuff/vol-provenance:/home/runzhou/Downloads/myhdfstuff/hdf5-develop/build/hdf5/lib:$LD_LIBRARY_PATH

Moreover, I also see similar errors without redirecting I/O through the vol-provenance connector; the same H5Fclose_async() issue appears:

HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
  #000: ../../hdf5/src/H5F.c line 1111 in H5Fclose_async(): decrementing file ID failed
    major: File accessibility
    minor: Unable to close file
  #001: ../../hdf5/src/H5Iint.c line 1201 in H5I_dec_app_ref_async(): can't asynchronously decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: ../../hdf5/src/H5Iint.c line 1118 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: ../../hdf5/src/H5Fint.c line 249 in H5F__close_cb(): unable to close file
    major: File accessibility
    minor: Unable to close file
  #004: ../../hdf5/src/H5VLcallback.c line 4147 in H5VL_file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #005: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #006: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
    major: File accessibility
    minor: Unable to decrement reference count
  #007: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
    major: File accessibility
    minor: Unable to close file
 Performance measured with 1 ranks, 
==================  Performance results  =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Raw write time = 1.009 sec
Metadata time = 4.757 ms
H5Fcreate() takes 175.785 ms
H5Fflush() takes 5237.142 ms
H5Fclose() takes 0.307 ms
Observed completion time = 10.443 sec
Sync Raw write rate = 2536.316 MB/sec 
Sync Observed write rate = 397.357 MB/sec

I would very much appreciate any suggestions on this problem!

Is sequential HDF5 supported?

Issue Description
I'd like to know if h5bench supports building against a sequential (serial-only) HDF5.

This is what I have tried so far:
My HDF5 installation at HDF5_PATH was built without MPI support (it provides only h5cc).

cd build

cmake .. \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR \
    -DCMAKE_C_FLAGS="-I/$HDF5_PATH/include \
    -L/$HDF5_PATH/lib"
make 
make install

This is the error I am facing during the build:

-- h5bench baseline: ON
-- h5bench METADATA: OFF
-- h5bench EXERCISER: OFF
-- h5bench AMREX: OFF
-- h5bench OPENPMD: OFF
-- h5bench E3SM: OFF
-- h5bench MACSIO: OFF
-- Found HDF5: hdf5-shared (found version "1.14.0")  
-- Using HDF5 version: 1.14.0
-- Looking for H5_HAVE_SUBFILING_VFD
-- Looking for H5_HAVE_SUBFILING_VFD - not found
--    Detected HDF5 subfiling support: 
-- HDF5 VOL ASYNC: OFF
-- Found Python3: /share/apps/python/miniconda3.7/bin/python3.7 (found version "3.7.7") found components: Interpreter 
-- Configuring done
-- Generating done
-- Build files have been written to: /MY_PATH/scripts/vlen_workflow/h5bench/build
[  8%] Building C object CMakeFiles/h5bench_util.dir/commons/h5bench_util.c.o
[ 16%] Linking C static library libh5bench_util.a
[ 16%] Built target h5bench_util
[ 25%] Building C object CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘set_dspace_plist’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:83:9: warning: implicit declaration of function ‘H5Pset_dxpl_mpio’; did you mean ‘H5Pset_fapl_stdio’? [-Wimplicit-function-declaration]
   83 |         H5Pset_dxpl_mpio(*plist_id_out, H5FD_MPIO_COLLECTIVE);
      |         ^~~~~~~~~~~~~~~~
      |         H5Pset_fapl_stdio
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘read_h5_data’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:97:5: warning: implicit declaration of function ‘H5Pset_all_coll_metadata_ops’ [-Wimplicit-function-declaration]
   97 |     H5Pset_all_coll_metadata_ops(dapl, true);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘set_pl’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:9: warning: implicit declaration of function ‘H5Pset_fapl_mpio’; did you mean ‘H5Pset_fapl_stdio’? [-Wimplicit-function-declaration]
  397 |         H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
      |         ^~~~~~~~~~~~~~~~
      |         H5Pset_fapl_stdio
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:33: error: ‘MPI_COMM_WORLD’ undeclared (first use in this function)
  397 |         H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
      |                                 ^~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:33: note: each undeclared identifier is reported only once for each function it appears in
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:49: error: ‘MPI_INFO_NULL’ undeclared (first use in this function)
  397 |         H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
      |                                                 ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:401:5: warning: implicit declaration of function ‘H5Pset_coll_metadata_write’ [-Wimplicit-function-declaration]
  401 |     H5Pset_coll_metadata_write(*fapl, true);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘main’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:415:5: warning: implicit declaration of function ‘MPI_Init_thread’ [-Wimplicit-function-declaration]
  415 |     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_thread_lvl_provided);
      |     ^~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:415:35: error: ‘MPI_THREAD_MULTIPLE’ undeclared (first use in this function)
  415 |     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_thread_lvl_provided);
      |                                   ^~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:417:5: warning: implicit declaration of function ‘MPI_Comm_rank’ [-Wimplicit-function-declaration]
  417 |     MPI_Comm_rank(MPI_COMM_WORLD, &MY_RANK);
      |     ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:417:19: error: ‘MPI_COMM_WORLD’ undeclared (first use in this function)
  417 |     MPI_Comm_rank(MPI_COMM_WORLD, &MY_RANK);
      |                   ^~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:418:5: warning: implicit declaration of function ‘MPI_Comm_size’ [-Wimplicit-function-declaration]
  418 |     MPI_Comm_size(MPI_COMM_WORLD, &NUM_RANKS);
      |     ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:513:5: error: unknown type name ‘MPI_Info’
  513 |     MPI_Info info = MPI_INFO_NULL;
      |     ^~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:513:21: error: ‘MPI_INFO_NULL’ undeclared (first use in this function)
  513 |     MPI_Info info = MPI_INFO_NULL;
      |                     ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:519:5: warning: implicit declaration of function ‘MPI_Barrier’ [-Wimplicit-function-declaration]
  519 |     MPI_Barrier(MPI_COMM_WORLD);
      |     ^~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:5: warning: implicit declaration of function ‘MPI_Allreduce’ [-Wimplicit-function-declaration]
  521 |     MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
      |     ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:56: error: ‘MPI_LONG_LONG’ undeclared (first use in this function)
  521 |     MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
      |                                                        ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:71: error: ‘MPI_SUM’ undeclared (first use in this function)
  521 |     MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
      |                                                                       ^~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:522:5: warning: implicit declaration of function ‘MPI_Scan’ [-Wimplicit-function-declaration]
  522 |     MPI_Scan(&NUM_PARTICLES, &FILE_OFFSET, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
      |     ^~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:634:5: warning: implicit declaration of function ‘MPI_Finalize’ [-Wimplicit-function-declaration]
  634 |     MPI_Finalize();
      |     ^~~~~~~~~~~~
make[2]: *** [CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o] Error 1
make[1]: *** [CMakeFiles/h5bench_read.dir/all] Error 2
make: *** [all] Error 2
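
The undeclared MPI symbols above come from the benchmark sources themselves (h5bench_read.c calls MPI_Init_thread, MPI_Comm_rank, and so on), so at least the baseline patterns appear to require an MPI compiler wrapper and a parallel HDF5 build. A sketch of the usual configure step under that assumption (paths are placeholders):

CC=mpicc cmake .. \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR \
    -DHDF5_ROOT=$PARALLEL_HDF5_PATH
make
make install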

Software Environment

  • version of h5bench: 1.4
  • installed h5bench using: from source
  • operating system: CentOS7
  • machine: supercomputer
  • version of HDF5: 1.14.0
