hpc-io / h5bench
A benchmark suite for measuring HDF5 performance.
Home Page: https://h5bench.readthedocs.io
License: Other
Bug Report
The create, flush, and close stats reported by the benchmark seem to be reported in ms instead of sec, as described by the time unit in the output.
To Reproduce
=======================================
Benchmark configuration:
File: ../h5bench_patterns/sample_config/sample_write_cc1d.cfg
Number of particles per rank: 16777216
Number of time steps: 5
Emulated compute time per timestep: 1
Async mode = 0 (0: ASYNC_NON; 1: ASYNC_EXP; 2: ASYNC_IMP)
Collective metadata operations: NO.
Collective buffering for data operations: NO.
Number of dimensions: 1
Dim_1: 16777216
=======================================
Start benchmark: h5bench_write, Number of particles per rank: 16 M
Total number of particles: 496M
Collective write: disabled.
Opened HDF5 file...
Writing Timestep_0 ...
Computing...
Writing Timestep_1 ...
Computing...
Writing Timestep_2 ...
Computing...
Writing Timestep_3 ...
Computing...
Writing Timestep_4 ...
Performance measured with 31 ranks,
================== Performance results =================
Total emulated compute time 4 ms
Total write size = 77 GB
Raw write time = 70.011 sec
Metadata time = 0.005 sec
H5Fcreate() takes 189478.000 sec
H5Fflush() takes 355869.000 sec
H5Fclose() takes 30880.000 sec
Observed completion time = 74.593 sec
Sync Raw write rate = 1.100 GB/sec
Sync Observed write rate = 1.091 GB/sec
===========================================================
Expected Behavior
The reported times for create, flush, and close stats should match the time unit.
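A minimal sketch of the kind of fix this implies, assuming the raw timer values are in microseconds (the magnitudes in the log above suggest a raw counter being printed under a "sec" label); the names below are placeholders, not the actual h5bench identifiers:

#include <stdio.h>

/* Convert a microsecond timer value before printing it with a seconds label,
 * or change the label to match the unit actually used. */
static void report_time(const char *op, unsigned long long elapsed_us)
{
    printf("%s takes %.3f sec\n", op, (double)elapsed_us / 1e6);
}

int main(void)
{
    report_time("H5Fcreate()", 189478ULL); /* value taken from the log above */
    return 0;
}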
Software Environment
Update documentation with subfiling options and allow h5bench to accept tunable variables for this option.
Bug Report
id_2 is written out as H5T_NATIVE_FLOAT but read as H5T_NATIVE_INT.
h5bench/h5bench_patterns/h5bench_write.c, line 482 (commit fb0c43a)
h5bench/h5bench_patterns/h5bench_read.c, line 105 (commit fb0c43a)
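A minimal self-contained sketch of the expected consistency (this is not the h5bench code itself; it only illustrates that the memory datatype passed to H5Dwrite and H5Dread for the same dataset should match, treating id_2 as float on both paths):

#include "hdf5.h"

int main(void)
{
    hsize_t dims[1] = {8};
    float   buf[8]  = {0};
    float   out[8];

    hid_t file  = H5Fcreate("id2_type_demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "id_2", H5T_NATIVE_FLOAT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Write id_2 as float... */
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    /* ...and read it back with the same memory type, not H5T_NATIVE_INT. */
    H5Dread(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, out);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}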
During the test, I ran into another problem: some of the benchmarks do not report overall performance (e.g., I/O bandwidth), such as the AMReX, MACSio, and Exerciser workloads. Passing the "csv-file" option does not generate the corresponding .csv file. How can I get a metric for the overall performance?
Reported by Zhiyue Li
For exerciser runs, "keepfile": "True"
is not working.
ERROR - unrecognized parameter: True. Exitting.
I only used 2 and 3 dims.
Options are from https://h5bench.readthedocs.io/en/latest/exerciser.html
Changing the option to "" instead of true or True (which the documentation suggests) seems to make it work:
"indepio": ""
The h5bench option --filter seems to skip the provided benchmarks rather than executing them:
-f FILTER, --filter FILTER Execute only filtered benchmarks
I would like to make a feature request.
Usually, a software library provides a set of basic test programs.
The tests are run with the command "make test". This helps users and developers rule out bugs before installation and production runs.
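A minimal sketch of what such a hook could look like with CTest (the target name is illustrative and the sample config path is the one mentioned elsewhere in this report; this is not taken from the actual h5bench build files):

# In CMakeLists.txt
enable_testing()
add_test(NAME h5bench_write_smoke
         COMMAND h5bench_write ${CMAKE_SOURCE_DIR}/h5bench_patterns/sample_config/sample_write_cc1d.cfg smoke.h5)
# After building: run "ctest" (or "make test")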
CMake Build issues
This is what I have tried so far:
I have tried setting CMAKE_C_COMPILER=mpicc and similarly CMAKE_CXX_COMPILER=mpicxx with varying levels of success. I have managed to get to the make step on my Mac laptop and on a Linux cluster, but in both instances it doesn't find HDF5. Setting HDF5_ROOT gets partial success. What is the correct way to set the MPI compilers and the HDF5 installation?
Thanks,
...
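For reference, one combination that has worked on similar setups (the paths are placeholders for your own MPI wrappers and parallel HDF5 install):

export HDF5_ROOT=/path/to/parallel-hdf5          # prefix containing include/ and lib/
cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
         -DCMAKE_INSTALL_PREFIX=$HOME/h5bench-install
make
make install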
Software Environment
Additional Information
Benchmarks won't run as file not found.
The benchmarks start but each one exits with file not found, and it doesn't tell me which file. I used the configuration.json file which comes with the 1.1 release. For write, perhaps I need to provide the file, but what and where? For the write, surely it should be writing the file!
This is the error I am facing during installation:
ls
configuration-h5bench.log configuration.json configuration.json~ full-teste job.sub job.sub~ slurm-1038041.out test.py
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run> cd full-teste/
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run/full-teste> ls
09dd69c2 1cbd24b9 1f4c48f4 249077b6 59d50b3c 61bee9fa 9411e16c 98ad11ae b149e8ef e373db41 f942bfa8
cmaynard@ln04:/work/n02/n02/cmaynard/H5bench/h5bench-1.1/bench_run/full-teste> more 09dd69c2/stderr
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
slurmstepd: error: execve(): h5bench_write: No such file or directory
srun: error: nid004833: tasks 0-3: Exited with exit code 2
srun: launch/slurm: _step_signal: Terminating StepId=1038041.
Thanks for any help.
Software Environment
Additional Information
Bug Report
In h5bench_write.c, whether to close datasets and groups synchronously or asynchronously depends on whether the variable cnt_time_step_delay is equal to or greater than 0. However, when I ran the Write Benchmark in sync mode with DELAYED_CLOSE_TIMESTEPS set to 2, the program would close them through the function ts_delayed_close(MEM_MONITOR, &meta_time1, dset_cnt);, which returns 0 if has_vol_async == 0. Therefore, datasets and groups are potentially not getting closed at the end of the benchmark. I think the program should disable this path or force cnt_time_step_delay to 0 in sync mode to avoid this issue.
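A minimal sketch of the proposed guard (the identifiers follow the issue text but are placeholders, not a verified patch against h5bench_write.c):

/* In SYNC mode, never defer closes: every timestep's datasets and groups are
 * closed immediately instead of being left to ts_delayed_close(), which is a
 * no-op when the async VOL is not in use. */
enum io_mode { MODE_SYNC, MODE_ASYNC };

static int effective_close_delay(enum io_mode mode, int configured_delay)
{
    return (mode == MODE_SYNC) ? 0 : configured_delay;
}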
To Reproduce
How are you building/running h5bench?
// Build
mkdir build
cd build
~/.local/bin/cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_INSTALL_PREFIX=/users/PAS2406/henryzhou1201/h5bench_project ..
make
make install
// Run
export PATH=/users/PAS2406/henryzhou1201/h5bench_project/hdf5/installer/bin:$PATH
export LD_LIBRARY_PATH=/users/PAS2406/henryzhou1201/h5bench_project/hdf5/installer/lib:$LD_LIBRARY_PATH
export HDF5_HOME=/users/PAS2406/henryzhou1201/h5bench_project/hdf5/installer
export HDF5_PLUGIN_PATH=/users/PAS2406/henryzhou1201/h5bench_project/plugins/SZ3/installer/lib64
module load gcc-compatibility/11.2.0
bin/h5bench --debug henry_configs/sz3/sync-write-1d-contig-contig-same-shuffle-debug.json
What is the input configuration file you use?
# provide the configuration file options you used
{
    "mpi": {
        "command": "srun",
        "ranks": "1"
    },
    "vol": {
    },
    "file-system": {
    },
    "directory": "storage/sz3",
    "benchmarks": [
        {
            "benchmark": "write",
            "file": "sync-write-1d-contig-contig-same-shuffle-debug.h5",
            "configuration": {
                "MEM_PATTERN": "CONTIG",
                "FILE_PATTERN": "CONTIG",
                "TIMESTEPS": "5",
                "DELAYED_CLOSE_TIMESTEPS": "2",
                "COLLECTIVE_DATA": "YES",
                "COLLECTIVE_METADATA": "YES",
                "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
                "NUM_DIMS": "1",
                "DIM_1": "4194304",
                "DIM_2": "1",
                "DIM_3": "1",
                "CSV_FILE": "output.csv",
                "MODE": "SYNC",
                "COMPRESS": "YES",
                "COMPRESS_FILTER": "SZ3",
                "CHUNK_DIM_1": "4194304",
                "CHUNK_DIM_2": "1",
                "CHUNK_DIM_3": "1"
            }
        }
    ]
}
Expected Behavior
When users run the benchmark in sync mode, the program should close the datasets and groups normally, no matter what the value of DELAYED_CLOSE_TIMESTEPS is.
Software Environment
Additional information
N/A
I tried to build h5bench using HDF5 1.12.0 (and develop) but the build always fails with errors like this:
error: 'H5ES_NONE' undeclared (first use in this function)
(But also symbols like H5ESclose and H5Gclose_async.)
I am not specifying any CMake arguments, so I thought that any async code would be disabled. Am I doing anything wrong?
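If it helps, a sketch of the kind of version guard that would let the async code paths compile out on HDF5 1.12 (H5_VERSION_GE is a real HDF5 macro; the branch contents here are only placeholders):

#include "hdf5.h"

#if H5_VERSION_GE(1, 13, 0)
    /* event-set (H5ES*) and *_async close path */
#else
    /* plain synchronous H5Dclose()/H5Gclose() path */
#endif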
Bug Report
The Python h5bench launcher, in the function check_parallel, compares the command line to the shell environment variable; if they don't match, it exits with the error print('You should not call MPI directly when running h5bench.'). In my case, command[0]=/bin/sh and shell=/bin/sh.
To Reproduce
How are you building/running h5bench?
slurm batch job on a cray, (with no srun command).
What is the input configuration file you use?
Doesn't get that far!
Expected Behavior
Software Environment
Additional information
Bug Report
In h5bench/docs/source/vpic.rst, the Supported Read Patterns section of the read/write page is displayed incorrectly; there is probably a missing line break in the RST source code.
To Reproduce
H5bench documentation page
Hi there,
I'm using the h5bench master branch to test the vol-provenance connector.
When I was testing h5bench_write with the following command:
./h5bench_write write_cc1d.cfg test.h5
it printed out the following error messages:
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
#000: ../../hdf5/src/H5VLcallback.c line 4184 in H5VLfile_close(): unable to close file
major: Virtual Object Layer
minor: Unable to close file
#001: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#002: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
major: File accessibility
minor: Unable to decrement reference count
#003: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
major: File accessibility
minor: Unable to close file
------- PROVENANCE VOL WRAP CTX Free
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
#000: ../../hdf5/src/H5F.c line 1111 in H5Fclose_async(): decrementing file ID failed
major: File accessibility
minor: Unable to close file
#001: ../../hdf5/src/H5Iint.c line 1201 in H5I_dec_app_ref_async(): can't asynchronously decrement ID ref count
major: Object ID
minor: Unable to decrement reference count
#002: ../../hdf5/src/H5Iint.c line 1118 in H5I__dec_app_ref(): can't decrement ID ref count
major: Object ID
minor: Unable to decrement reference count
#003: ../../hdf5/src/H5Fint.c line 249 in H5F__close_cb(): unable to close file
major: File accessibility
minor: Unable to close file
#004: ../../hdf5/src/H5VLcallback.c line 4147 in H5VL_file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#005: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#006: ../../hdf5/src/H5VLcallback.c line 4184 in H5VLfile_close(): unable to close file
major: Virtual Object Layer
minor: Unable to close file
#007: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#008: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
major: File accessibility
minor: Unable to decrement reference count
#009: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
major: File accessibility
minor: Unable to close file
================== Performance results =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Raw write time = 1.037 sec
Metadata time = 10.315 ms
H5Fcreate() takes 182.754 ms
H5Fflush() takes 5352.875 ms
H5Fclose() takes 0.490 ms
Observed completion time = 10.599 sec
Sync Raw write rate = 2469.526 MB/sec
Sync Observed write rate = 387.908 MB/sec
===========================================================
It seems there are some issues with the 'close' APIs, but the program is able to exit normally.
I'm using the HDF5 develop branch and the vol-provenance develop branch, both cloned from the hpc-io git repo.
My environment variables are:
export HDF5_VOL_CONNECTOR="provenance under_vol=0;under_info={};path=/home/runzhou/Downloads/myhdfstuff/vol-provenance/my_trace.log;level=2;format="
export HDF5_PLUGIN_PATH=/home/runzhou/Downloads/myhdfstuff/vol-provenance
export LD_LIBRARY_PATH=/home/runzhou/Downloads/myhdfstuff/vol-provenance:/home/runzhou/Downloads/myhdfstuff/hdf5-develop/build/hdf5/lib:$LD_LIBRARY_PATH
Moreover, I also found similar errors without redirecting I/Os to the vol-provenance connector. It has the same issue with H5Fclose_async():
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
#000: ../../hdf5/src/H5F.c line 1111 in H5Fclose_async(): decrementing file ID failed
major: File accessibility
minor: Unable to close file
#001: ../../hdf5/src/H5Iint.c line 1201 in H5I_dec_app_ref_async(): can't asynchronously decrement ID ref count
major: Object ID
minor: Unable to decrement reference count
#002: ../../hdf5/src/H5Iint.c line 1118 in H5I__dec_app_ref(): can't decrement ID ref count
major: Object ID
minor: Unable to decrement reference count
#003: ../../hdf5/src/H5Fint.c line 249 in H5F__close_cb(): unable to close file
major: File accessibility
minor: Unable to close file
#004: ../../hdf5/src/H5VLcallback.c line 4147 in H5VL_file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#005: ../../hdf5/src/H5VLcallback.c line 4116 in H5VL__file_close(): file close failed
major: Virtual Object Layer
minor: Unable to close file
#006: ../../hdf5/src/H5VLnative_file.c line 778 in H5VL__native_file_close(): can't close file
major: File accessibility
minor: Unable to decrement reference count
#007: ../../hdf5/src/H5Fint.c line 2340 in H5F__close(): can't close file, there are objects still open
major: File accessibility
minor: Unable to close file
Performance measured with 1 ranks,
================== Performance results =================
Total emulated compute time 4000 ms
Total write size = 2560 MB
Raw write time = 1.009 sec
Metadata time = 4.757 ms
H5Fcreate() takes 175.785 ms
H5Fflush() takes 5237.142 ms
H5Fclose() takes 0.307 ms
Observed completion time = 10.443 sec
Sync Raw write rate = 2536.316 MB/sec
Sync Observed write rate = 397.357 MB/sec
I would very much appreciate any suggestions on this problem!
Issue Description
I'd like to know if h5bench supports building with the sequential HDF5.
This is what I have tried so far:
My HDF5_PATH is built with only h5cc.
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=$INSTALL_DIR \
-DCMAKE_C_FLAGS="-I/$HDF5_PATH/include \
-L/$HDF5_PATH/lib"
make
make install
This is the error I am facing during installation:
-- h5bench baseline: ON
-- h5bench METADATA: OFF
-- h5bench EXERCISER: OFF
-- h5bench AMREX: OFF
-- h5bench OPENPMD: OFF
-- h5bench E3SM: OFF
-- h5bench MACSIO: OFF
-- Found HDF5: hdf5-shared (found version "1.14.0")
-- Using HDF5 version: 1.14.0
-- Looking for H5_HAVE_SUBFILING_VFD
-- Looking for H5_HAVE_SUBFILING_VFD - not found
-- Detected HDF5 subfiling support:
-- HDF5 VOL ASYNC: OFF
-- Found Python3: /share/apps/python/miniconda3.7/bin/python3.7 (found version "3.7.7") found components: Interpreter
-- Configuring done
-- Generating done
-- Build files have been written to: /MY_PATH/scripts/vlen_workflow/h5bench/build
[ 8%] Building C object CMakeFiles/h5bench_util.dir/commons/h5bench_util.c.o
[ 16%] Linking C static library libh5bench_util.a
[ 16%] Built target h5bench_util
[ 25%] Building C object CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘set_dspace_plist’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:83:9: warning: implicit declaration of function ‘H5Pset_dxpl_mpio’; did you mean ‘H5Pset_fapl_stdio’? [-Wimplicit-function-declaration]
83 | H5Pset_dxpl_mpio(*plist_id_out, H5FD_MPIO_COLLECTIVE);
| ^~~~~~~~~~~~~~~~
| H5Pset_fapl_stdio
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘read_h5_data’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:97:5: warning: implicit declaration of function ‘H5Pset_all_coll_metadata_ops’ [-Wimplicit-function-declaration]
97 | H5Pset_all_coll_metadata_ops(dapl, true);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘set_pl’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:9: warning: implicit declaration of function ‘H5Pset_fapl_mpio’; did you mean ‘H5Pset_fapl_stdio’? [-Wimplicit-function-declaration]
397 | H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
| ^~~~~~~~~~~~~~~~
| H5Pset_fapl_stdio
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:33: error: ‘MPI_COMM_WORLD’ undeclared (first use in this function)
397 | H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
| ^~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:33: note: each undeclared identifier is reported only once for each function it appears in
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:397:49: error: ‘MPI_INFO_NULL’ undeclared (first use in this function)
397 | H5Pset_fapl_mpio(*fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:401:5: warning: implicit declaration of function ‘H5Pset_coll_metadata_write’ [-Wimplicit-function-declaration]
401 | H5Pset_coll_metadata_write(*fapl, true);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c: In function ‘main’:
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:415:5: warning: implicit declaration of function ‘MPI_Init_thread’ [-Wimplicit-function-declaration]
415 | MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_thread_lvl_provided);
| ^~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:415:35: error: ‘MPI_THREAD_MULTIPLE’ undeclared (first use in this function)
415 | MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_thread_lvl_provided);
| ^~~~~~~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:417:5: warning: implicit declaration of function ‘MPI_Comm_rank’ [-Wimplicit-function-declaration]
417 | MPI_Comm_rank(MPI_COMM_WORLD, &MY_RANK);
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:417:19: error: ‘MPI_COMM_WORLD’ undeclared (first use in this function)
417 | MPI_Comm_rank(MPI_COMM_WORLD, &MY_RANK);
| ^~~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:418:5: warning: implicit declaration of function ‘MPI_Comm_size’ [-Wimplicit-function-declaration]
418 | MPI_Comm_size(MPI_COMM_WORLD, &NUM_RANKS);
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:513:5: error: unknown type name ‘MPI_Info’
513 | MPI_Info info = MPI_INFO_NULL;
| ^~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:513:21: error: ‘MPI_INFO_NULL’ undeclared (first use in this function)
513 | MPI_Info info = MPI_INFO_NULL;
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:519:5: warning: implicit declaration of function ‘MPI_Barrier’ [-Wimplicit-function-declaration]
519 | MPI_Barrier(MPI_COMM_WORLD);
| ^~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:5: warning: implicit declaration of function ‘MPI_Allreduce’ [-Wimplicit-function-declaration]
521 | MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:56: error: ‘MPI_LONG_LONG’ undeclared (first use in this function)
521 | MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
| ^~~~~~~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:521:71: error: ‘MPI_SUM’ undeclared (first use in this function)
521 | MPI_Allreduce(&NUM_PARTICLES, &TOTAL_PARTICLES, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
| ^~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:522:5: warning: implicit declaration of function ‘MPI_Scan’ [-Wimplicit-function-declaration]
522 | MPI_Scan(&NUM_PARTICLES, &FILE_OFFSET, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
| ^~~~~~~~
/MY_PATH/scripts/vlen_workflow/h5bench/h5bench_patterns/h5bench_read.c:634:5: warning: implicit declaration of function ‘MPI_Finalize’ [-Wimplicit-function-declaration]
634 | MPI_Finalize();
| ^~~~~~~~~~~~
make[2]: *** [CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o] Error 1
make[1]: *** [CMakeFiles/h5bench_read.dir/all] Error 2
make: *** [all] Error 2
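If it helps, the failing symbols (H5Pset_dxpl_mpio, MPI_COMM_WORLD, ...) are only available when HDF5 itself was built with MPI support, so one quick check on the install at HDF5_PATH is (the "Parallel HDF5" line comes from libhdf5.settings):

$HDF5_PATH/bin/h5cc -showconfig | grep -i parallel
# A parallel build reports: Parallel HDF5: yes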
Software Environment
Bug Report
Currently, the metadata timing report may include dataset operations' time, which is due to pending dataset operations holding the HDF5 global mutex and blocking any H5ESwait.
To Reproduce
How are you building/running h5bench?
Normal build with async I/O enabled.
What is the input configuration file you use?
h5bench write benchmarks
Expected Behavior
One workaround is to reorder the H5ESwait calls and wait on the dataset operations first. This way we will have an accurate recording of the dataset timing; the metadata timing could be less than actual since it may be absorbed into the dataset timing, but because the metadata time is usually small, this could be a worthwhile trade-off.
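A sketch of the reordered waits described above (the event-set ids, timer helper, and counters are placeholders, not the actual h5bench variables):

hid_t    dset_es_id = H5I_INVALID_HID;  /* event set used for dataset operations (placeholder) */
hid_t    meta_es_id = H5I_INVALID_HID;  /* event set used for metadata operations (placeholder) */
size_t   n_pending;
hbool_t  err_occurred;
uint64_t t0, t1, t2;

t0 = timer_usec();                                                   /* placeholder timer */
H5ESwait(dset_es_id, H5ES_WAIT_FOREVER, &n_pending, &err_occurred);  /* dataset ops first  */
t1 = timer_usec();
H5ESwait(meta_es_id, H5ES_WAIT_FOREVER, &n_pending, &err_occurred);  /* then metadata ops  */
t2 = timer_usec();
/* dataset time is roughly t1 - t0; metadata time is roughly t2 - t1 */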
Software Environment
Additional information
N/A
The install step threw some errors related to the installation location.
This is what I have tried so far:
cori build $ make
Scanning dependencies of target h5bench_util
[ 8%] Building C object CMakeFiles/h5bench_util.dir/commons/h5bench_util.c.o
[ 16%] Linking C static library libh5bench_util.a
[ 16%] Built target h5bench_util
Scanning dependencies of target h5bench_read
[ 25%] Building C object CMakeFiles/h5bench_read.dir/h5bench_patterns/h5bench_read.c.o
[ 33%] Linking C executable h5bench_read
[ 33%] Built target h5bench_read
Scanning dependencies of target h5bench_append
[ 41%] Building C object CMakeFiles/h5bench_append.dir/h5bench_patterns/h5bench_append.c.o
[ 50%] Linking C executable h5bench_append
[ 50%] Built target h5bench_append
Scanning dependencies of target h5bench_overwrite
[ 58%] Building C object CMakeFiles/h5bench_overwrite.dir/h5bench_patterns/h5bench_overwrite.c.o
[ 66%] Linking C executable h5bench_overwrite
[ 66%] Built target h5bench_overwrite
Scanning dependencies of target h5bench_write_unlimited
[ 75%] Building C object CMakeFiles/h5bench_write_unlimited.dir/h5bench_patterns/h5bench_write_unlimited.c.o
[ 83%] Linking C executable h5bench_write_unlimited
[ 83%] Built target h5bench_write_unlimited
Scanning dependencies of target h5bench_write
[ 91%] Building C object CMakeFiles/h5bench_write.dir/h5bench_patterns/h5bench_write.c.o
[100%] Linking C executable h5bench_write
[100%] Built target h5bench_write
cori build $ make install
[ 16%] Built target h5bench_util
[ 33%] Built target h5bench_read
[ 50%] Built target h5bench_append
[ 66%] Built target h5bench_overwrite
[ 83%] Built target h5bench_write_unlimited
[100%] Built target h5bench_write
Install the project...
-- Install configuration: "Debug"
-- Installing: /usr/local/bin/h5bench
CMake Error at cmake_install.cmake:41 (file):
file INSTALL cannot copy file
"/global/homes/d/dbin/work/h5bench/src/h5bench.py" to
"/usr/local/bin/h5bench": Permission denied.
Add a quick-start section in the documentation with a simple example of how to compile, run, get the results, and understand them.
I installed hdf5-mpi via "brew install hdf5-mpi" on macOS, and cmake .. can find it correctly.
-- Found HDF5: -- Found HDF5: /usr/local/Cellar/hdf5-mpi/1.14.1/lib/libhdf5.dylib;/usr/local/opt/libaec/lib/libsz.dylib;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libz.tbd;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libdl.tbd;/Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk/usr/lib/libm.tbd (found version "1.14.1-2")
-- Using HDF5 version: 1.14.1-2
-- Looking for H5_HAVE_SUBFILING_VFD
-- Looking for H5_HAVE_SUBFILING_VFD - not found
But when I tried to make it, the 'hdf5.h' file could not be found.
./h5bench/commons/h5bench_util.c:17:10: fatal error: 'hdf5.h' file not found
#include <hdf5.h>
^~~~~~~~
1 error generated.
I then tried export CPATH="/usr/local/Cellar/hdf5-mpi/1.14.1/", and some new errors showed up:
./h5bench/h5bench_patterns/h5bench_write.c:340:9: error: implicit declaration of function 'H5Pset_dxpl_mpio' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
H5Pset_dxpl_mpio(*plist_id_out, H5FD_MPIO_COLLECTIVE);
^
./h5bench/h5bench_patterns/h5bench_write.c:931:9: error: implicit declaration of function 'H5Pset_all_coll_metadata_ops' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
H5Pset_all_coll_metadata_ops(fapl, 1);
^
./h5bench/h5bench_patterns/h5bench_write.c:932:9: error: implicit declaration of function 'H5Pset_coll_metadata_write' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
H5Pset_coll_metadata_write(fapl, 1);
^
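One setup worth trying on Homebrew installs is to point CMake at the parallel HDF5 prefix and build with the MPI compiler wrapper, so that hdf5.h is found and H5Pset_dxpl_mpio and friends are declared (paths are illustrative, adjust to your hdf5-mpi version):

export HDF5_ROOT=/usr/local/Cellar/hdf5-mpi/1.14.1
cmake .. -DCMAKE_C_COMPILER=mpicc
make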
Include a flexible tool to parse the results from all the benchmarks.
Bug Report
Confirm that the exerciser does not run with 4 dims yet and update the documentation accordingly, possibly adding a sanity check in the code.
Update the documentation with subfiling options regarding the already merged PR #64
h5bench_read does not have support for collective operations like h5bench_write does.
Bug Report
Some configurations of exerciser cause the benchmark to fail.
To Reproduce
For instance:
{
"benchmark": "exerciser",
"configuration": {
"numdims": "4",
"minels": "4 4 4 4",
"dimranks": "8 4 4 2",
"nsizes": "4",
"bufmult": "4 4 4 4"
}
},
If we reduce bufmult to 2 2 2 2 it seems to run to completion. However, with that particular configuration we get the following error:
HDF5-DIAG: Error detected in HDF5 (1.13.2) MPI-process 0:
#000: H5D.c line 1227 in H5Dwrite(): can't synchronously write data
major: Dataset
minor: Write failed
#001: H5D.c line 1174 in H5D__write_api_common(): can't write data
major: Dataset
minor: Write failed
#002: H5VLcallback.c line 2181 in H5VL_dataset_write(): dataset write failed
major: Virtual Object Layer
minor: Write failed
#003: H5VLcallback.c line 2148 in H5VL__dataset_write(): dataset write failed
major: Virtual Object Layer
minor: Write failed
#004: H5VLnative_dataset.c line 345 in H5VL__native_dataset_write(): can't write data
major: Dataset
minor: Write failed
#005: H5Dio.c line 381 in H5D__write(): src and dest dataspaces have different number of elements selected
major: Invalid arguments to routine
minor: Bad value
Also, using values >5 for that variable will cause a segfault. I could not find in the documentation any restrictions or constraints when setting those values, so I assume both settings are valid.
Bug Report
The write benchmark does not close the dataset correctly when using the configuration "MEM_PATTERN": "CONTIG" and "FILE_PATTERN": "INTERLEAVED".
$ cat full-test/a7311c05/stderr
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 1:
#000: H5D.c line 504 in H5Dclose_async(): not a dataset ID
major: Invalid arguments to routine
minor: Inappropriate type
...
To Reproduce
How are you building/running h5bench?
h5bench --debug configuration.json
What is the input configuration file you use?
{
"mpi": {
"command": "mpirun",
"ranks": "2"
},
"vol": {
},
"file-system": {
},
"directory": "full-test",
"benchmarks": [
{
"benchmark": "write",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "INTERLEAVED",
"NUM_PARTICLES": "1 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "YES",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"CSV_FILE": "output.csv",
"MODE": "SYNC"
}
}
]
}
Expected Behavior
No error should be reported in stderr.
Software Environment
Bug Report
The reported total emulated compute time unit does not match the value specified in the configuration file. For instance, when setting 1 s, it reports 4 ms for a total of 5 timesteps.
To Reproduce
=======================================
Benchmark configuration:
File: ../h5bench_patterns/sample_config/sample_write_cc1d.cfg
Number of particles per rank: 16777216
Number of time steps: 5
Emulated compute time per timestep: 1
Async mode = 0 (0: ASYNC_NON; 1: ASYNC_EXP; 2: ASYNC_IMP)
Collective metadata operations: NO.
Collective buffering for data operations: NO.
Number of dimensions: 1
Dim_1: 16777216
=======================================
Start benchmark: h5bench_write, Number of particles per rank: 16 M
Total number of particles: 496M
==PDC_CLIENT: PDC_DEBUG set to 0!
==PDC_CLIENT[0]: Found 1 PDC Metadata servers, running with 31 PDC clients
==PDC_CLIENT: using ofi+tcp
==PDC_CLIENT[0]: Client lookup all servers at start time!
==PDC_CLIENT[0]: using [./pdc_tmp] as tmp dir, 31 clients per server
Collective write: disabled.
Opened HDF5 file...
Writing Timestep_0 ...
Computing...
Writing Timestep_1 ...
Computing...
Writing Timestep_2 ...
Computing...
Writing Timestep_3 ...
Computing...
Writing Timestep_4 ...
Performance measured with 31 ranks,
================== Performance results =================
Total emulated compute time 4 ms
Total write size = 77 GB
Raw write time = 121.619 sec
Metadata time = 0.324 sec
H5Fcreate() takes 397257.000 sec
H5Fflush() takes 9.000 sec
H5Fclose() takes 1421.000 sec
Observed completion time = 126.427 sec
Sync Raw write rate = 0.633 GB/sec
Sync Observed write rate = 0.629 GB/sec
===========================================================
Expected Behavior
The reported emulated compute time should match the unit provided in the configuration file; in this case, seconds.
Total emulated compute time 4 s
Software Environment
For exerciser runs, "usechunked": “True"
is not working
ERROR - unrecognized parameter: True. Exitting.
I only used 2 and 3 dims.
Options are from https://h5bench.readthedocs.io/en/latest/exerciser.html
Update the output with the results from the baseline benchmarks. These should include the observed rate and time. Point users to where they can get full details about the execution.
1. Missing "MODE" in the example configuration file
Using the configuration.json file provided in h5bench.readthedocs.io returns the following error:
2022-08-02 11:47:13,410 h5bench - INFO - Starting h5bench Suite
2022-08-02 11:47:13,410 h5bench - WARNING - Base directory already exists: output
2022-08-02 11:47:13,410 h5bench - INFO - Lustre support not detected
2022-08-02 11:47:13,410 h5bench - INFO - h5bench [write] - Starting
2022-08-02 11:47:13,410 h5bench - INFO - h5bench [write] - DIR: output/c9a31822/
2022-08-02 11:47:13,411 h5bench - INFO - Parallel setup: mpirun -np 8
2022-08-02 11:47:13,411 h5bench - ERROR - Unable to run the benchmark: 'MODE'
The configuration.json file is also available in the GitHub repo here. Looking at its first write benchmark:
{
"benchmark": "write",
"file": "test.h5",
"configuration": {
"MEM_PATTERN": "CONTIG",
"FILE_PATTERN": "CONTIG",
"NUM_PARTICLES": "16 M",
"TIMESTEPS": "5",
"DELAYED_CLOSE_TIMESTEPS": "2",
"COLLECTIVE_DATA": "NO",
"COLLECTIVE_METADATA": "NO",
"EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
"NUM_DIMS": "1",
"DIM_1": "16777216",
"DIM_2": "1",
"DIM_3": "1",
"ASYNC_MODE": "NON",
"CSV_FILE": "output.csv"
}
}
ASYNC_MODE is provided while MODE is missing.
2. Missing "mode" in the example configuration file (AMRex)
Using the configuration.json file provided in h5bench.readthedocs.io returns the following error:
2022-08-02 12:04:36,003 h5bench - INFO - Starting h5bench Suite
2022-08-02 12:04:36,003 h5bench - WARNING - Base directory already exists: output
2022-08-02 12:04:36,003 h5bench - INFO - Lustre support not detected
2022-08-02 12:04:36,004 h5bench - INFO - h5bench [amrex] - Starting
2022-08-02 12:04:36,004 h5bench - INFO - h5bench [amrex] - DIR: output/a8b627f0/
2022-08-02 12:04:36,004 h5bench - INFO - Parallel setup: mpirun -np 8
2022-08-02 12:04:36,004 h5bench - ERROR - Unable to run the benchmark: 'mode'
The configuration.json file:
{
"benchmark": "amrex",
"file": "amrex.h5",
"configuration": {
"ncells": "64",
"max_grid_size": "8",
"nlevs": "1",
"ncomp": "6",
"nppc": "2",
"nplotfile": "2",
"nparticlefile": "2",
"sleeptime": "2",
"restart_check": "1",
"hdf5compression": "ZFP_ACCURACY#0.001"
}
}
mode is missing.
Also, not sure if it's important, but there's a mismatch between mode and MODE (lowercase vs. uppercase).
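Adding a "MODE" key to the write entry, as other configurations in this report use (e.g. "MODE": "SYNC"), should address the 'MODE' error; a sketch of the adjusted entry, with ASYNC_MODE dropped in favor of MODE per the deprecation note elsewhere in this report:

{
    "benchmark": "write",
    "file": "test.h5",
    "configuration": {
        "MEM_PATTERN": "CONTIG",
        "FILE_PATTERN": "CONTIG",
        "NUM_PARTICLES": "16 M",
        "TIMESTEPS": "5",
        "DELAYED_CLOSE_TIMESTEPS": "2",
        "COLLECTIVE_DATA": "NO",
        "COLLECTIVE_METADATA": "NO",
        "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
        "NUM_DIMS": "1",
        "DIM_1": "16777216",
        "DIM_2": "1",
        "DIM_3": "1",
        "MODE": "SYNC",
        "CSV_FILE": "output.csv"
    }
}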
Software Environment
For exerciser runs, "addattr": "True"
is not working
ERROR - unrecognized parameter: True. Exitting.
Tested using only 2 and 3 dims.
Options are from https://h5bench.readthedocs.io/en/latest/exerciser.html
For exerciser runs, "derivedtype": "True"
is not working.
ERROR - unrecognized parameter: True. Exitting.
This was tested using only 2 and 3 dims.
Bug Report
OK, after some debugging I found the issue with h5bench_e3sm. It is the way the path was constructed for the e3sm dataset; once I fixed that, I could run it. It might be useful to mention in the documentation how to copy the dataset correctly.
However, for the metadata benchmark, I think the executable it is running is incorrect: it is using h5bench_exerciser instead of h5bench_hdf5_iotest.
2022-10-04 09:45:49,783 h5bench - INFO - h5bench [write] - Complete
2022-10-04 09:45:49,783 h5bench - INFO - h5bench [metadata] - Starting
2022-10-04 09:45:49,784 h5bench - INFO - h5bench [metadata] - DIR: /p/gpfs1/iopp/temp/h5bench/90bd7c67/
2022-10-04 09:45:49,785 h5bench - INFO - Parallel setup: jsrun -r 1 -a 4 -c 4
2022-10-04 09:45:49,787 h5bench - INFO - jsrun -r 1 -a 4 -c 4 h5bench_exerciser /p/gpfs1/iopp/temp/h5bench/90bd7c67/hdf5_iotest.ini
2022-10-04 09:45:50,130 h5bench - ERROR - Return: 255 (check /p/gpfs1/iopp/temp/h5bench/90bd7c67/stderr for detailed log)
2022-10-04 09:45:50,130 h5bench - INFO - Runtime: 0.3438661 seconds (elapsed time, includes allocation wait time)
2022-10-04 09:45:50,130 h5bench - INFO - h5bench [metadata] - Complete
2022-10-04 09:45:50,130 h5bench - INFO - Finishing h5bench Suite
Also, make sure the ASYNC documentation is reported correctly for this benchmark:
I think setting the "ASYNC_MODE" parameter has been deprecated, but the documentation is not updated.
Can you try setting the "MODE" parameter to "SYNC" instead of setting ASYNC_MODE?
Hello,
I encountered an issue while running Metadata Stress from h5bench-1.4. After installation, I executed ./h5bench_hdf5_iotest hdf5_iotest.ini and received the following output:
Config loaded from 'hdf5_iotest.ini':
steps=20, arrays=500, rows=100, columns=200, scaling=weak
proc-grid=1x1, slowest-dimension=step, rank=4
layout=contiguous, mpi-io=independent
Wall clock [s]: 1.95
File size [B]: 1600002048
---------------------------------------------
Measurement: _MIN (over MPI ranks)
^MAX (over MPI ranks)
---------------------------------------------
Write phase [s]: _1.51
^1.51
Create time [s]: _0.00
^0.00
Write time [s]: _1.50
^1.50
Write rate [MiB/s]: _1019.80
^1019.80
Read phase [s]: _0.31
^0.31
Read time [s]: _0.30
^0.30
Read rate [MiB/s]: _5133.93
^5133.93
I attempted to use MPI to execute this benchmark with the command mpirun -n 4 ./h5bench_hdf5_iotest hdf5_iotest.ini, but I encountered an error:
h5bench_hdf5_iotest: /home/zhb/h5bench-1.4/metadata_stress/configuration.c:156: validate: Assertion `pconfig->proc_rows * pconfig->proc_cols == (unsigned)size' failed.
[ubuntu:257356] *** Process received signal ***
[ubuntu:257356] Signal: Aborted (6)
[ubuntu:257356] Signal code: (-6)
[ubuntu:257356] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ac3162125d0]
[ubuntu:257356] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2ac316455207]
[ubuntu:257356] [ 2] /lib64/libc.so.6(abort+0x148)[0x2ac3164568f8]
[ubuntu:257356] [ 3] /lib64/libc.so.6(+0x2f026)[0x2ac31644e026]
[ubuntu:257356] [ 4] /lib64/libc.so.6(+0x2f0d2)[0x2ac31644e0d2]
[ubuntu:257356] [ 5] ./h5bench_hdf5_iotest[0x404f6a]
[ubuntu:257356] [ 6] ./h5bench_hdf5_iotest[0x40244d]
[ubuntu:257356] [ 7] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2ac3164413d5]
[ubuntu:257356] [ 8] ./h5bench_hdf5_iotest[0x402259]
[ubuntu:257356] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ubuntu exited on signal 6 (Aborted).
--------------------------------------------------------------------------
It appears that there may be a configuration error. However, my configuration follows the guidelines provided in the documentation:
[DEFAULT]
version = 0
steps = 20
arrays = 500
rows = 100
columns = 200
process-rows = 1
process-columns = 1
scaling = weak
dataset-rank = 4
slowest-dimension = step
layout = contiguous
mpi-io = independent
hdf5-file = hdf5_iotest.h5
csv-file = hdf5_iotest.csv
Could you please assist me in resolving this issue?
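Based on the assertion message, the process grid in the configuration must multiply to the number of MPI ranks (proc_rows * proc_cols == size). For mpirun -n 4, one way to keep everything else the same is:

process-rows = 2
process-columns = 2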
For exerciser runs, "indepio": "True"
is not working.
metacoll
is working, though. So, other binary options are supposed to work similarly.
ERROR - unrecognized parameter: True. Exitting.
I only used 2 and 3 dims.
Add VOL async as a variant in the Spack installation.
Some sample JSON files have --np 4, and some have -np (e.g., sync-metadata.json). For mpirun options, -np seems to be the right one: https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php
Today the number of ranks parameter is applied globally to all benchmarks in the JSON file. We should allow users to fine-tune it by providing a per-benchmark option to override this parameter. This will add flexibility and allow the JSON to be re-used between distinct experiment sizes.
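A hypothetical sketch of what such an override could look like (the per-benchmark "ranks" key does not exist today; it is shown only to illustrate the request):

{
    "mpi": {
        "command": "mpirun",
        "ranks": "8"
    },
    "benchmarks": [
        {
            "benchmark": "write",
            "file": "test.h5",
            "ranks": "32",
            "configuration": { }
        }
    ]
}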