
unibas-dmi-hpc / sph-exa


SPH-EXA is a C++20 simulation code for performing hydrodynamics simulations (with gravity and other physics), parallelized with MPI, OpenMP, CUDA, and HIP.

Home Page: https://hpc.dmi.unibas.ch/en/research/pasc-sph-exa2/

License: MIT License

Languages: Python 3.13%, C++ 76.63%, Shell 0.17%, Cuda 17.72%, CMake 2.35%
Topics: computational-fluid-dynamics, exascale, n-body, sph, sph-simulations

sph-exa's People

Contributors

acavelan, biddisco, cflorina, cypox, dimbert, finkandreas, gabuzi, j-piccinali, jaescartin1, jfavre, jgphpc, jpcoles, lexasov, lks1248, michalgrabarczyk, nestaaaa, nknk567, osmanseckinsimsek, rmcabezon, sekelle, sguera


sph-exa's Issues

Reduce code redundancy between CPU and CUDA implementation

Currently kernels are implemented twice, meaning that if we modify, e.g., momentumAndEnergyIAD.hpp, then we also need to modify cuda/cudaMomentumAndEnergyIAD.cu.

However, the code does the same thing for every particle.

For every computeXXX function in sph-exa, we should have a:

namespace kernel
{
    inline void computeXXX(int pi, int *clist, ...);
}

function that takes the particle index as a parameter and only does the computation for that one particle. This function should only accept simple variables and raw pointers (by copy), and no references.

Basically, this function should be usable by OpenMP, OpenACC, and CUDA alike.

The workflow is something like this:

computeDensity(taskList)
-> calls computeDensity(task)
-> calls inline computeDensity(particleArray)
-> calls inline kernel::computeDensity(int pi, int *clist, ...)

computeDensity(task) will handle data movement for OpenMP / OpenACC offloading / CUDA
computeDensity(particleArray) will handle omp / acc directives / CUDA kernel launch

kernel::computeDensity(int pi, int *clist) is identical for all models. Data movement and CUDA kernel launch are handled separately in computeDensity(task) and computeDensity(particleArray).

The easiest way to do this is probably by starting from the existing CUDA code, which is the most constrained.

The challenge is to compile the CUDA parts independently with nvcc. I am thinking of using a simple #include to import the kernel::computeXXX function. The code structure should look like this:

include/sph/
    density.hpp: contains computeDensity(taskList) as well as CPU implementations of computeDensity(task) and computeDensity(particleArray)
    cuda/
        density.cu: contains CUDA implementations of computeDensity(task) and computeDensity(particleArray)
    kernel/
        density.hpp: contains kernel::computeDensity(int pi, int *clist, ...)

kernel/density.hpp is included both in sph/density.hpp and sph/cuda/density.cu.

Of course, we want the same pattern for all computeXXX functions, not just density.
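
A minimal sketch of what the shared per-particle kernel could look like; the names, parameters, and the CUDA_DEVICE_FUN macro are illustrative assumptions, not the final interface:

// include/sph/kernel/density.hpp -- shared between CPU and CUDA translation units
#if defined(__CUDACC__)
#define CUDA_DEVICE_FUN __host__ __device__
#else
#define CUDA_DEVICE_FUN
#endif

namespace sphexa
{
namespace kernel
{

// Per-particle density: only plain values and raw pointers, so the same body
// can be called from an OpenMP loop, an OpenACC region, or a CUDA kernel.
template<typename T>
CUDA_DEVICE_FUN inline void computeDensity(int pi, const int* clist, const int* neighbors,
                                           const int* neighborsCount, int ngmax,
                                           const T* m, T* ro)
{
    int i   = clist[pi];
    T   sum = m[i]; // self contribution (placeholder weighting)
    for (int j = 0; j < neighborsCount[pi]; ++j)
    {
        int nj = neighbors[pi * ngmax + j];
        sum += m[nj]; // the real kernel would evaluate the SPH smoothing function here
    }
    ro[i] = sum;
}

} // namespace kernel
} // namespace sphexa

The CPU computeDensity(particleArray) would then call kernel::computeDensity(pi, ...) inside a #pragma omp parallel for loop, while cuda/density.cu launches one thread per pi; both include the same kernel/density.hpp.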

pinned_allocator was not declared in this scope

If I modify Task.hpp by adding:

#ifdef USE_CUDA
#include "pinned_allocator.h"
#endif

then the mpi+omp target no longer compiles:

include/Task.hpp:31:22: error: 'pinned_allocator' was not declared in this scope
     std::vector<int, pinned_allocator<int>> neighborsCount;
                      ^~~~~~~~~~~~~~~~
include/Task.hpp:31:22: note: suggested alternative: 'aligned_alloc'
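
One possible workaround is to make the allocator type itself conditional, so that non-CUDA builds fall back to the default allocator. A sketch, assuming pinned_allocator lives in the global namespace as the current Task.hpp usage suggests:

// Task.hpp (sketch)
#include <vector>

#ifdef USE_CUDA
#include "pinned_allocator.h"
template<class T>
using TaskAllocator = pinned_allocator<T>;
#else
#include <memory>
template<class T>
using TaskAllocator = std::allocator<T>;
#endif

// the container members then compile in both configurations:
std::vector<int, TaskAllocator<int>> neighborsCount;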

observables compile H5Part unconditionally?

I cannot build sphexa with the flag SPH_EXA_WITH_H5PART:BOOL=OFF because the build fails in any case with errors such as:

/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:101:9: error: there are no arguments to 'H5PartReadFileAttrib' that depend on a template parameter, so a declaration of 'H5PartReadFileAttrib' must be available [-fpermissive]
101 | H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
| ^~~~~~~~~~~~~~~~~~~~
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:102:9: error: there are no arguments to 'H5PartCloseFile' that depend on a template parameter, so a declaration of 'H5PartCloseFile' must be available [-fpermissive]
102 | H5PartCloseFile(h5_file);
| ^~~~~~~~~~~~~~~
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp: In instantiation of 'std::unique_ptr<sphexa::IObservables > sphexa::observablesFactory(const string&, std::ofstream&) [with Dataset = sphexa::ParticlesData<double, long unsigned int, cstone::CpuTag>; std::string = std::__cxx11::basic_string; std::ofstream = std::basic_ofstream]':
/users/jfavre/Projects/SPH-EXA/main/src/sphexa/sphexa.cpp:153:109: required from here
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:100:78: error: 'H5PartOpenFile' was not declared in this scope
100 | h5_file = H5PartOpenFile(testCase.c_str(), H5PART_READ);
| ^
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:101:71: error: 'H5PartReadFileAttrib' was not declared in this scope
101 | H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
| ^
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:102:24: error: 'H5PartCloseFile' was not declared in this scope
102 | H5PartCloseFile(h5_file);

Is there redundancy between SPH_EXA_WITH_H5PART and SPH_EXA_HAVE_H5PART?
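
A possible direction is to guard the H5Part calls in factory.hpp with the corresponding preprocessor macro; whether that macro should be SPH_EXA_WITH_H5PART or SPH_EXA_HAVE_H5PART is exactly the redundancy question above, so the name below is an assumption:

// observables/factory.hpp (sketch of the guarded fragment; h5_file, testCase,
// khGrowthRate and attrValue are the variables appearing in the errors above,
// and the else branch requires <stdexcept>)
#ifdef SPH_EXA_HAVE_H5PART
    h5_file = H5PartOpenFile(testCase.c_str(), H5PART_READ);
    H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
    H5PartCloseFile(h5_file);
#else
    throw std::runtime_error("this observable requires H5Part, but SPH-EXA was built without it");
#endif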

NVCC does not define USE_MPI when compiling with omp+mpi+cuda

Branch
Occurs in branch gpu-hackathlon

Explanation
The CUDA compiler does not see the USE_MPI flag when compiling ".cu" files. This raises a serious issue when using cuda+mpi: the ParticlesData struct compiled with mpic++ is different from the one compiled with nvcc. This is reported neither as an error nor as a warning during compilation, but it became problematic when I added the DeviceParticleData.

How to reproduce?

  1. Add:
    printf("[DEBUG] -- sizeof: %ld\n", sizeof(Dataset));
    To file: density.hpp
    At line: 109 (before cuda::computeDensity<T>(l, taskList, d); )

  2. Add:
    printf("[DEBUG] -- sizeof: %ld\n", sizeof(ParticlesData));
    To file: cudaDensity.cu
    At line: 53 (before const int maz = d.bbox.PBCz ? 2 : 0; )

  3. While we should get similar results (the two structures should be the same), we get different sizes, because the variables declared in ParticlesData.cpp under the USE_MPI definition are not included in the struct seen by the CUDA file.

How to reproduce 2?

Just add:
#ifdef USE_MPI
#warning "BROOKS WAS HERE"
#endif
To one of the *.cu files and compile with "mpi+omp+cuda".
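
A simplified illustration of why the sizes differ (this is not the actual ParticlesData definition): any member guarded by USE_MPI exists in the mpic++ translation unit but not in the nvcc one, so the two sides disagree on the struct layout.

#ifdef USE_MPI
#include <mpi.h>
#endif

struct ParticlesDataSketch
{
    double ttot, etot; // always present
#ifdef USE_MPI
    MPI_Comm comm;     // present when compiled with mpic++ -DUSE_MPI ...
    int      rank;     // ... but missing in the nvcc translation unit
#endif
};
// sizeof(ParticlesDataSketch) then differs between density.hpp (host) and
// cudaDensity.cu (device), which silently corrupts any object passed across
// that boundary. The fix is to make sure nvcc receives the same -DUSE_MPI
// definition as mpic++.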

MPI IO missing particles with more than 1 MPI rank

In MPIFileutils.hpp, the code makes a wrong assumption about the number of particles per rank.

const size_t split = d.n / d.nrank;
const size_t remaining = d.n - d.nrank * split;

const MPI_Offset col = d.n * sizeof(double);
MPI_Offset offset = d.rank * split * sizeof(double);
if (d.rank == 0) offset += remaining * sizeof(double);
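
The offsets above are only correct when every rank holds exactly d.n / d.nrank particles, with the remainder on rank 0. A sketch of a more robust alternative, deriving each rank's file offset from its actual local particle count via a prefix sum (the local-count member name is an assumption):

// derive the write offset from the real per-rank count instead of an even split
size_t localCount  = d.count; // actual number of particles on this rank (name assumed)
size_t firstGlobal = 0;
// MPI_UNSIGNED_LONG is assumed to match size_t (LP64)
MPI_Exscan(&localCount, &firstGlobal, 1, MPI_UNSIGNED_LONG, MPI_SUM, MPI_COMM_WORLD);
if (d.rank == 0) firstGlobal = 0; // MPI_Exscan leaves rank 0's output undefined

MPI_Offset offset = firstGlobal * sizeof(double);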

-wextra not caught as error

If the user passes the argument -wextra instead of --wextra (note that two dashes are required), sphexa accepts the argument and silently ignores it. It should fail immediately at startup and complain that -wextra is not a known option.
If this error is not caught at the beginning, the user might run a full simulation expecting additional outputs that will never be produced.
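
A minimal sketch of the kind of strict validation meant here; this is not the existing command-line parser in sphexa, and the function and option names are illustrative:

#include <cstdio>
#include <cstdlib>
#include <set>
#include <string>

// abort immediately if an option is not recognized, instead of silently ignoring it
void validateArgs(int argc, char** argv, const std::set<std::string>& knownOptions)
{
    for (int i = 1; i < argc; ++i)
    {
        std::string arg(argv[i]);
        if (arg.rfind("-", 0) == 0 && knownOptions.count(arg) == 0)
        {
            std::fprintf(stderr, "Error: unknown option '%s'\n", arg.c_str());
            std::exit(EXIT_FAILURE);
        }
    }
}

// usage: validateArgs(argc, argv, {"-s", "-n", "--wextra"});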

Continuous Integration / Testing

Add several tests (Travis? Jenkins?)

Very Small (one iteration, 20x20x20 cube)
mpirun -np 1 bin/mpi+omp.app -s 0 -n 20

Small (100 iterations, 20x20x20 cube)
mpirun -np 4 bin/mpi+omp.app -s 100 -n 20

Medium (10 iterations, 100x100x100 cube)
mpirun -np 4 bin/mpi+omp.app -s 10 -n 100

Large (10 iterations, 300x300x300 cube)
mpirun -np 16 bin/mpi+omp.app -s 10 -n 300

We should look at the total number of neighbors, the average number of neighbors per particle, the total internal energy, and the total energy. GPU runs are expected to produce slightly different values; in the long run, they might have slightly different neighbor counts as well.

I think this is a good start. If possible, we should repeat the tests with:
GCC, Clang, Cray CCE, Intel, and PGI

And with the following models:
mpi+omp
mpi+omp+target
mpi+omp+acc
mpi+omp+cuda

The test suite should count as a success if the mpi+omp and mpi+omp+cuda models pass. We already know that the target and acc models are not well supported by all compilers on Daint: acc should work with PGI, target should work with Cray CCE (perhaps Clang as well?), and GCC will simply run it on the CPU...

Bonus: it would be very interesting to get a table with the runtimes and the pass/fail status of the Large scenario for each compiler and model.

mpi+omp+target fails to link the lookup table during compilation

3 warnings generated.
@E@nvlink error : Undefined reference to '_ZN6sphexa13lookup_tablesL20wharmonicLookupTableE' in '/tmp/cooltmp-a1af33/tmp_cce_omp_offload_linker__sqpatch.o__sec.cubin'
@E@nvlink error : Undefined reference to '_ZN6sphexa13lookup_tablesL30wharmonicDerivativeLookupTableE' in '/tmp/cooltmp-a1af33/tmp_cce_omp_offload_linker__sqpatch.o__sec.cubin'
clang-9: error: linker command failed with exit code 255 (use -v to see invocation)
make: *** [Makefile:77: mpi+omp+target] Error 255

Proposed solution: make the lookup table a local variable instead of a global variable. We would need to pass it to the SPH kernels somehow.
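
A rough sketch of that direction, with hypothetical names: build the table once on the host and hand it to the kernels as a plain pointer, so the OpenMP target / OpenACC data clauses can map it like any other array.

#include <vector>

// host side: build the table once instead of keeping a namespace-scope global
template<typename T>
std::vector<T> buildWharmonicLookupTable(int size)
{
    std::vector<T> lut(size);
    // fill with the precomputed wharmonic values (omitted here)
    return lut;
}

// kernel side: the table arrives as a raw pointer, mappable with
// "omp target map(to: lut[0:lutSize])" or "acc copyin(lut[0:lutSize])"
template<typename T>
inline T wharmonicFromTable(T normalizedDist, const T* lut, int lutSize)
{
    int idx = static_cast<int>(normalizedDist * (lutSize - 1)); // simplified indexing
    if (idx < 0) idx = 0;
    if (idx > lutSize - 1) idx = lutSize - 1;
    return lut[idx];
}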

viz wrapper breaks compilation with Ascent and Catalyst in-situ wrappers

In file included from /users/jfavre/Projects/SPH-EXA/src/sedov/sedov.cpp:19:
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::init_ascent(DataType&, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:29:34: error: 'domain' was not declared in this scope
29 | AscentAdaptor::Initialize(d, domain.startIndex());
| ^~~~~~
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::execute(DataType&, long int, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:41:31: error: 'domain' was not declared in this scope
41 | AscentAdaptor::Execute(d, domain.startIndex(), domain.endIndex());

In file included from /users/jfavre/Projects/SPH-EXA/src/sedov/sedov.cpp:19:
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::execute(DataType&, long int, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:38:33: error: 'domain' was not declared in this scope
38 | CatalystAdaptor::Execute(d, domain.startIndex(), domain.endIndex());
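
One possible fix, sketched here with assumed signatures (only the adaptor calls appear in the errors above): either forward the start/end indices that the wrappers already receive, or pass the domain object itself instead of referring to a name that is not in scope. A sketch of the latter, assuming the adaptor headers are included as in the existing insitu_viz.h:

// src/insitu_viz.h (sketch)
namespace viz
{

template<class DataType, class DomainType>
void init_ascent(DataType& d, const DomainType& domain)
{
    AscentAdaptor::Initialize(d, domain.startIndex());
}

template<class DataType, class DomainType>
void execute(DataType& d, const DomainType& domain)
{
    CatalystAdaptor::Execute(d, domain.startIndex(), domain.endIndex());
}

} // namespace viz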

[SUGGESTION] Rename Conserved

Throughout the code, a "conserved" field is used to mean variables (or arrays) that preserve their value between iterations. However, a conserved field in physics has a very specific meaning (e.g., a conserved vector field is a vector field that is the gradient of a scalar potential function). To avoid confusion about the nature of these variables, another name should be used, for example "preserved" or "persistent".

Update Workflow in Evrard, SqPatch, and Windblob tests for PR #37

Merging of PR #37 is blocked because it will break the Evrard test and GPU support.

The first step would be to check / update / cleanup the workflow and command-line parameters to support volume elements in all tests (src/sqpatch, src/evrard, src/windblob).

The sqpatch test has been updated to use volume elements; however, it has not been tested.

Are all the steps in the new workflow correct? => Remove 'hacky**' function calls and variables.

cannot compile sphexa with the in-situ Catalyst library on eiger

prgenv:
module load cpeCray CMake cray-hdf5-parallel ParaView
cd
mkdir buildCatalystEiger
cd buildCatalystEiger
cmake \
    -DCMAKE_CXX_COMPILER=CC \
    -DINSITU=Catalyst \
    -DBUILD_ANALYTICAL:BOOL=OFF \
    -DBUILD_TESTING:BOOL=OFF \
    -DSPH_EXA_WITH_H5PART:BOOL=OFF \
    ..
make sphexa

In file included from /users/jfavre/Projects/SPH-EXA/domain/include/cstone/domain/domain.hpp:41:
/users/jfavre/Projects/SPH-EXA/domain/include/cstone/focus/octree_focus_mpi.hpp:264:9: error: no member named 'exclusive_scan' in namespace 'std'; did you mean 'stl::exclusive_scan'?
std::exclusive_scan(leafCounts_.begin() + firstIdx, leafCounts_.begin() + lastIdx + 1,
^~~~~~~~~~~~~~~~~~~
stl::exclusive_scan
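
The compiler on eiger apparently lacks std::exclusive_scan (C++17, <numeric>), and the diagnostic suggests the code base already carries an stl::exclusive_scan fallback. A minimal serial fallback of that kind looks roughly like this; the actual implementation in cstone may differ:

namespace stl
{

// serial drop-in with the same semantics as std::exclusive_scan:
// output element i receives init plus the sum of the first i input elements
template<class InputIt, class OutputIt, class T>
OutputIt exclusive_scan(InputIt first, InputIt last, OutputIt d_first, T init)
{
    T sum = init;
    for (; first != last; ++first, ++d_first)
    {
        *d_first = sum;
        sum      = sum + *first;
    }
    return d_first;
}

} // namespace stl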

CTest setup for sphexa is useless

For example, tests such as this one should be launched with mpiexec when they require more than one rank:

3: Test command: /home/biddisco/build/sphexa/domain/test/integration_mpi/globaloctree
3: Test timeout computed to be: 30
3: [==========] Running 1 test from 1 test suite.
3: [----------] Global test environment set-up.
3: [----------] 1 test from GlobalTree
3: [ RUN      ] GlobalTree.basicRegularTree32
3: unknown file: Failure
3: C++ exception with description "this test needs 2 ranks
3: " thrown in the test body.

My local build gives

44% tests passed, 9 tests failed out of 16

Total Test time (real) =  62.08 sec

The following tests FAILED:
          3 - GlobalTreeTests (Failed)
          4 - GlobalDomainExchange (Failed)
          7 - GlobalHaloExchange (Failed)
         10 - GlobalFocusExchange (Failed)
         12 - GlobalKeyExchange (Failed)
         13 - FocusTransfer (Failed)
         14 - GlobalDomain2Ranks (Failed)
         15 - ComponentUnits (Timeout)
         16 - ComponentUnitsOmp (Failed)
Errors while running CTest

This should be cleaned up.

Fix imHereBecauseOfCrayCompiler bug

This does not look too good:

grep imHereBecauseOfCrayCompilerO2Bug -r include/*

include/sph/density.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/momentumAndEnergyIAD.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/findNeighbors.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/IAD.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
