
unibas-dmi-hpc / sph-exa


SPH-EXA is a C++20 simulation code for performing hydrodynamics simulations (with gravity and other physics), parallelized with MPI, OpenMP, CUDA, and HIP.

Home Page: https://hpc.dmi.unibas.ch/en/research/pasc-sph-exa2/

License: MIT License

Languages: Python 3.13%, C++ 76.63%, Shell 0.17%, Cuda 17.72%, CMake 2.35%
Topics: computational-fluid-dynamics, exascale, n-body, sph, sph-simulations

sph-exa's People

Contributors

acavelan, biddisco, cflorina, cypox, dimbert, finkandreas, gabuzi, j-piccinali, jaescartin1, jfavre, jgphpc, jpcoles, lexasov, lks1248, michalgrabarczyk, nestaaaa, nknk567, osmanseckinsimsek, rmcabezon, sekelle, sguera


sph-exa's Issues

Reduce code redundancy between CPU and CUDA implementation

Currently kernels are implemented twice, meaning that if we modify, e.g., momentumAndEnergyIAD.hpp, then we also need to modify cuda/cudaMomentumAndEnergyIAD.cu.

However, the code does the same thing for every particle.

For every computeXXX function in sph-exa, we should have a:

namespace kernel
{
    inline void computeXXX(int pi, int *clist, ...);
}

function that takes the particle index as a parameter and only does the computation for that one particle. This function should only accept simple variables and raw pointers (by copy), and no references.

Basically, this function should be usable by OpenMP, OpenACC, and CUDA alike.

The workflow is something like this:

computeDensity(taskList)
-> calls computeDensity(task)
-> calls inline computeDensity(particleArray)
-> calls inline kernel::computeDensity(int pi, int *clist, ...)

computeDensity(task) will handle data movement for OpenMP / OpenACC offloading / CUDA
computeDensity(particleArray) will handle omp / acc directives / CUDA kernel launch

kernel::computeDensity(int pi, int *clist) is identical for all models. Data movement and CUDA kernel launch are handled separately in computeDensity(task) and computeDensity(particleArray).

The easiest way to do this is probably by starting from the existing CUDA code, which is the most constrained.

The challenge is to compile the CUDA parts independently with nvcc. I am thinking of using a simple #include to import the kernel::computeXXX function. The code structure should look like this:

include/sph/
    density.hpp: contains computeDensity(taskList) as well as CPU implementations of computeDensity(task) and computeDensity(particleArray)
    cuda/
        density.cu: contains CUDA implementations of computeDensity(task) and computeDensity(particleArray)
    kernel/
        density.hpp: contains kernel::computeDensity(int pi, int *clist, ...)

kernel/density.hpp is included both in sph/density.hpp and sph/cuda/density.cu.

Of course, we want the same pattern for all computeXXX functions, not just density.
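
A minimal sketch of what the shared per-particle kernel could look like; the names, parameters, and the CUDA_DEVICE_FUN macro are illustrative assumptions, not the final interface:

// include/sph/kernel/density.hpp -- shared between CPU and CUDA translation units
#if defined(__CUDACC__)
#define CUDA_DEVICE_FUN __host__ __device__
#else
#define CUDA_DEVICE_FUN
#endif

namespace sphexa
{
namespace kernel
{

// Per-particle density: only plain values and raw pointers, so the same body
// can be called from an OpenMP loop, an OpenACC region, or a CUDA kernel.
template<typename T>
CUDA_DEVICE_FUN inline void computeDensity(int pi, const int* clist, const int* neighbors,
                                           const int* neighborsCount, int ngmax,
                                           const T* m, T* ro)
{
    int i   = clist[pi];
    T   sum = m[i]; // self contribution (placeholder weighting)
    for (int j = 0; j < neighborsCount[pi]; ++j)
    {
        int nj = neighbors[pi * ngmax + j];
        sum += m[nj]; // the real kernel would evaluate the SPH smoothing function here
    }
    ro[i] = sum;
}

} // namespace kernel
} // namespace sphexa

The CPU computeDensity(particleArray) would then call kernel::computeDensity(pi, ...) inside a #pragma omp parallel for loop, while cuda/density.cu launches one thread per pi; both include the same kernel/density.hpp.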

pinned_allocator was not declared in this scope

If I modify Task.hpp by adding:

#ifdef USE_CUDA
#include "pinned_allocator.h"
#endif

then the mpi+omp target no longer compiles:

include/Task.hpp:31:22: error: 'pinned_allocator' was not declared in this scope
     std::vector<int, pinned_allocator<int>> neighborsCount;
                      ^~~~~~~~~~~~~~~~
include/Task.hpp:31:22: note: suggested alternative: 'aligned_alloc'
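
One possible workaround is to make the allocator type itself conditional, so that non-CUDA builds fall back to the default allocator. A sketch, assuming pinned_allocator lives in the global namespace as the current Task.hpp usage suggests:

// Task.hpp (sketch)
#include <vector>

#ifdef USE_CUDA
#include "pinned_allocator.h"
template<class T>
using TaskAllocator = pinned_allocator<T>;
#else
#include <memory>
template<class T>
using TaskAllocator = std::allocator<T>;
#endif

// the container members then compile in both configurations:
std::vector<int, TaskAllocator<int>> neighborsCount;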

observables compile H5Part unconditionally?

I cannot build sphexa with the flag SPH_EXA_WITH_H5PART:BOOL=OFF because the build fails in any case with errors such as:

/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:101:9: error: there are no arguments to 'H5PartReadFileAttrib' that depend on a template parameter, so a declaration of 'H5PartReadFileAttrib' must be available [-fpermissive]
101 | H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
| ^~~~~~~~~~~~~~~~~~~~
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:102:9: error: there are no arguments to 'H5PartCloseFile' that depend on a template parameter, so a declaration of 'H5PartCloseFile' must be available [-fpermissive]
102 | H5PartCloseFile(h5_file);
| ^~~~~~~~~~~~~~~
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp: In instantiation of 'std::unique_ptr<sphexa::IObservables > sphexa::observablesFactory(const string&, std::ofstream&) [with Dataset = sphexa::ParticlesData<double, long unsigned int, cstone::CpuTag>; std::string = std::__cxx11::basic_string; std::ofstream = std::basic_ofstream]':
/users/jfavre/Projects/SPH-EXA/main/src/sphexa/sphexa.cpp:153:109: required from here
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:100:78: error: 'H5PartOpenFile' was not declared in this scope
100 | h5_file = H5PartOpenFile(testCase.c_str(), H5PART_READ);
| ^
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:101:71: error: 'H5PartReadFileAttrib' was not declared in this scope
101 | H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
| ^
/users/jfavre/Projects/SPH-EXA/main/src/observables/factory.hpp:102:24: error: 'H5PartCloseFile' was not declared in this scope
102 | H5PartCloseFile(h5_file);

Is there redundancy between SPH_EXA_WITH_H5PART and SPH_EXA_HAVE_H5PART?
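
A possible direction is to guard the H5Part calls in factory.hpp with the corresponding preprocessor macro; whether that macro should be SPH_EXA_WITH_H5PART or SPH_EXA_HAVE_H5PART is exactly the redundancy question above, so the name below is an assumption:

// observables/factory.hpp (sketch of the guarded fragment; h5_file, testCase,
// khGrowthRate and attrValue are the variables appearing in the errors above,
// and the else branch requires <stdexcept>)
#ifdef SPH_EXA_HAVE_H5PART
    h5_file = H5PartOpenFile(testCase.c_str(), H5PART_READ);
    H5PartReadFileAttrib(h5_file, khGrowthRate.c_str(), &attrValue);
    H5PartCloseFile(h5_file);
#else
    throw std::runtime_error("this observable requires H5Part, but SPH-EXA was built without it");
#endif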

NVCC does not define USE_MPI when compiling with omp+mpi+cuda

Branch
Occurs in branch gpu-hackathlon

Explanation
The CUDA compiler does not see the USE_MPI flag when compiling ".cu" files. This raises a serious issue when using cuda+mpi: the ParticlesData struct compiled with mpic++ is different from the one compiled with nvcc. This is reported neither as an error nor as a warning during compilation, but it became problematic when I added the DeviceParticleData.

How to reproduce?

  1. Add:
    printf("[DEBUG] -- sizeof: %ld\n", sizeof(Dataset));
    To file: density.hpp
    At line: 109 (before cuda::computeDensity<T>(l, taskList, d); )

  2. Add:
    printf("[DEBUG] -- sizeof: %ld\n", sizeof(ParticlesData));
    To file: cudaDensity.cu
    At line: 53 (before const int maz = d.bbox.PBCz ? 2 : 0; )

  3. While we should get similar results (the two structures should be the same), we get different sizes, because the variables declared in ParticlesData.cpp under the USE_MPI definition are not included in the struct seen by the CUDA file.

How to reproduce 2?

Just add:
#ifdef USE_MPI
#warning "BROOKS WAS HERE"
#endif
To one of the *.cu files and compile with "mpi+omp+cuda".
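
A simplified illustration of why the sizes differ (this is not the actual ParticlesData definition): any member guarded by USE_MPI exists in the mpic++ translation unit but not in the nvcc one, so the two sides disagree on the struct layout.

#ifdef USE_MPI
#include <mpi.h>
#endif

struct ParticlesDataSketch
{
    double ttot, etot; // always present
#ifdef USE_MPI
    MPI_Comm comm;     // present when compiled with mpic++ -DUSE_MPI ...
    int      rank;     // ... but missing in the nvcc translation unit
#endif
};
// sizeof(ParticlesDataSketch) then differs between density.hpp (host) and
// cudaDensity.cu (device), which silently corrupts any object passed across
// that boundary. The fix is to make sure nvcc receives the same -DUSE_MPI
// definition as mpic++.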

MPI IO missing particles with more than 1 MPI rank

In MPIFileutils.hpp, the code makes a wrong assumption about the number of particles per rank.

const size_t split = d.n / d.nrank;
const size_t remaining = d.n - d.nrank * split;

const MPI_Offset col = d.n * sizeof(double);
MPI_Offset offset = d.rank * split * sizeof(double);
if (d.rank == 0) offset += remaining * sizeof(double);
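
The offsets above are only correct when every rank holds exactly d.n / d.nrank particles, with the remainder on rank 0. A sketch of a more robust alternative, deriving each rank's file offset from its actual local particle count via a prefix sum (the local-count member name is an assumption):

// derive the write offset from the real per-rank count instead of an even split
size_t localCount  = d.count; // actual number of particles on this rank (name assumed)
size_t firstGlobal = 0;
// MPI_UNSIGNED_LONG is assumed to match size_t (LP64)
MPI_Exscan(&localCount, &firstGlobal, 1, MPI_UNSIGNED_LONG, MPI_SUM, MPI_COMM_WORLD);
if (d.rank == 0) firstGlobal = 0; // MPI_Exscan leaves rank 0's output undefined

MPI_Offset offset = firstGlobal * sizeof(double);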

-wextra not caught as error

If the user passes the argument -wextra instead of --wextra (note that two dashes are required), sphexa accepts the argument and silently ignores it. It should fail immediately at startup and complain that -wextra is not a known option.
If this error is not caught at the beginning, the user might run a full simulation expecting additional outputs that will never be produced.
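
A minimal sketch of the kind of strict validation meant here; this is not the existing command-line parser in sphexa, and the function and option names are illustrative:

#include <cstdio>
#include <cstdlib>
#include <set>
#include <string>

// abort immediately if an option is not recognized, instead of silently ignoring it
void validateArgs(int argc, char** argv, const std::set<std::string>& knownOptions)
{
    for (int i = 1; i < argc; ++i)
    {
        std::string arg(argv[i]);
        if (arg.rfind("-", 0) == 0 && knownOptions.count(arg) == 0)
        {
            std::fprintf(stderr, "Error: unknown option '%s'\n", arg.c_str());
            std::exit(EXIT_FAILURE);
        }
    }
}

// usage: validateArgs(argc, argv, {"-s", "-n", "--wextra"});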

Continuous Integration / Testing

Add several tests (Travis? Jenkins?)

Very Small (one iteration, 20x20x20 cube)
mpirun -np 1 bin/mpi+omp.app -s 0 -n 20

Small (100 iterations, 20x20x20 cube)
mpirun -np 4 bin/mpi+omp.app -s 100 -n 20

Medium (10 iterations, 100x100x100 cube)
mpirun -np 4 bin/mpi+omp.app -s 10 -n 100

Large (10 iterations, 300x300x300 cube)
mpirun -np 16 bin/mpi+omp.app -s 10 -n 300

We should look at the total number of neighbors, the average number of neighbors per particle, the total internal energy, and the total energy. GPU runs are expected to produce slightly different values; in the long run, they might have slightly different neighbor counts as well.

I think this is a good start. If possible, we should repeat the tests with:
GCC, Clang, Cray CCE, Intel, and PGI

And with the following models:
mpi+omp
mpi+omp+target
mpi+omp+acc
mpi+omp+cuda

The test suite should count as a success if the mpi+omp and mpi+omp+cuda models pass. We already know that the target and acc models are not well supported by all compilers on Daint: acc should work with PGI, target should work with Cray CCE (perhaps Clang as well?), and GCC will simply run it on the CPU...

Bonus: it would be very interesting to get a table with the runtimes and the pass/fail status of the Large scenario for each compiler and model.

mpi+omp+target fails to link the lookup table during compilation

3 warnings generated.
@E@nvlink error : Undefined reference to '_ZN6sphexa13lookup_tablesL20wharmonicLookupTableE' in '/tmp/cooltmp-a1af33/tmp_cce_omp_offload_linker__sqpatch.o__sec.cubin'
@E@nvlink error : Undefined reference to '_ZN6sphexa13lookup_tablesL30wharmonicDerivativeLookupTableE' in '/tmp/cooltmp-a1af33/tmp_cce_omp_offload_linker__sqpatch.o__sec.cubin'
clang-9: error: linker command failed with exit code 255 (use -v to see invocation)
make: *** [Makefile:77: mpi+omp+target] Error 255

Proposed solution: make the lookup table a local variable instead of a global variable. We would need to pass it to the SPH kernels somehow.
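
A rough sketch of that direction, with hypothetical names: build the table once on the host and hand it to the kernels as a plain pointer, so the OpenMP target / OpenACC data clauses can map it like any other array.

#include <vector>

// host side: build the table once instead of keeping a namespace-scope global
template<typename T>
std::vector<T> buildWharmonicLookupTable(int size)
{
    std::vector<T> lut(size);
    // fill with the precomputed wharmonic values (omitted here)
    return lut;
}

// kernel side: the table arrives as a raw pointer, mappable with
// "omp target map(to: lut[0:lutSize])" or "acc copyin(lut[0:lutSize])"
template<typename T>
inline T wharmonicFromTable(T normalizedDist, const T* lut, int lutSize)
{
    int idx = static_cast<int>(normalizedDist * (lutSize - 1)); // simplified indexing
    if (idx < 0) idx = 0;
    if (idx > lutSize - 1) idx = lutSize - 1;
    return lut[idx];
}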

viz wrapper breaks compilation with Ascent and Catalyst in-situ wrappers

In file included from /users/jfavre/Projects/SPH-EXA/src/sedov/sedov.cpp:19:
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::init_ascent(DataType&, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:29:34: error: 'domain' was not declared in this scope
29 | AscentAdaptor::Initialize(d, domain.startIndex());
| ^~~~~~
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::execute(DataType&, long int, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:41:31: error: 'domain' was not declared in this scope
41 | AscentAdaptor::Execute(d, domain.startIndex(), domain.endIndex());

In file included from /users/jfavre/Projects/SPH-EXA/src/sedov/sedov.cpp:19:
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h: In function 'void viz::execute(DataType&, long int, long int)':
/users/jfavre/Projects/SPH-EXA/src/insitu_viz.h:38:33: error: 'domain' was not declared in this scope
38 | CatalystAdaptor::Execute(d, domain.startIndex(), domain.endIndex());
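
One possible fix, sketched here with assumed signatures (only the adaptor calls appear in the errors above): either forward the start/end indices that the wrappers already receive, or pass the domain object itself instead of referring to a name that is not in scope. A sketch of the latter, assuming the adaptor headers are included as in the existing insitu_viz.h:

// src/insitu_viz.h (sketch)
namespace viz
{

template<class DataType, class DomainType>
void init_ascent(DataType& d, const DomainType& domain)
{
    AscentAdaptor::Initialize(d, domain.startIndex());
}

template<class DataType, class DomainType>
void execute(DataType& d, const DomainType& domain)
{
    CatalystAdaptor::Execute(d, domain.startIndex(), domain.endIndex());
}

} // namespace viz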

[SUGGESTION] Rename Conserved

Throughout the code, a "conserved" field is used to mean variables (or arrays) that preserve their value between iterations. However, a conserved field in physics has a very specific meaning (e.g., a conserved vector field is a vector field that is the gradient of a scalar potential function). To avoid confusion about the nature of these variables, another name should be used, for example "preserved" or "persistent".

Update Workflow in Evrard, SqPatch, and Windblob tests for PR #37

Merging of PR #37 is blocked because it will break the Evrard test and GPU support.

The first step would be to check / update / cleanup the workflow and command-line parameters to support volume elements in all tests (src/sqpatch, src/evrard, src/windblob).

The sqpatch test has been updated to use volume elements; however, it has not been tested.

Are all the steps in the new workflow correct? => Remove 'hacky**' function calls and variables.

cannot compile sphexa with the in-situ Catalyst library on eiger

prgenv:
module load cpeCray CMake cray-hdf5-parallel ParaView
cd
mkdir buildCatalystEiger
cd buildCatalystEiger
cmake \
    -DCMAKE_CXX_COMPILER=CC \
    -DINSITU=Catalyst \
    -DBUILD_ANALYTICAL:BOOL=OFF \
    -DBUILD_TESTING:BOOL=OFF \
    -DSPH_EXA_WITH_H5PART:BOOL=OFF \
    ..
make sphexa

In file included from /users/jfavre/Projects/SPH-EXA/domain/include/cstone/domain/domain.hpp:41:
/users/jfavre/Projects/SPH-EXA/domain/include/cstone/focus/octree_focus_mpi.hpp:264:9: error: no member named 'exclusive_scan' in namespace 'std'; did you mean 'stl::exclusive_scan'?
std::exclusive_scan(leafCounts_.begin() + firstIdx, leafCounts_.begin() + lastIdx + 1,
^~~~~~~~~~~~~~~~~~~
stl::exclusive_scan
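
The compiler on eiger apparently lacks std::exclusive_scan (C++17, <numeric>), and the diagnostic suggests the code base already carries an stl::exclusive_scan fallback. A minimal serial fallback of that kind looks roughly like this; the actual implementation in cstone may differ:

namespace stl
{

// serial drop-in with the same semantics as std::exclusive_scan:
// output element i receives init plus the sum of the first i input elements
template<class InputIt, class OutputIt, class T>
OutputIt exclusive_scan(InputIt first, InputIt last, OutputIt d_first, T init)
{
    T sum = init;
    for (; first != last; ++first, ++d_first)
    {
        *d_first = sum;
        sum      = sum + *first;
    }
    return d_first;
}

} // namespace stl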

CTest setup for sphexa is useless

For example, tests such as this one should be launched with mpiexec when they require more than one rank:

3: Test command: /home/biddisco/build/sphexa/domain/test/integration_mpi/globaloctree
3: Test timeout computed to be: 30
3: [==========] Running 1 test from 1 test suite.
3: [----------] Global test environment set-up.
3: [----------] 1 test from GlobalTree
3: [ RUN      ] GlobalTree.basicRegularTree32
3: unknown file: Failure
3: C++ exception with description "this test needs 2 ranks
3: " thrown in the test body.

My local build gives

44% tests passed, 9 tests failed out of 16

Total Test time (real) =  62.08 sec

The following tests FAILED:
          3 - GlobalTreeTests (Failed)
          4 - GlobalDomainExchange (Failed)
          7 - GlobalHaloExchange (Failed)
         10 - GlobalFocusExchange (Failed)
         12 - GlobalKeyExchange (Failed)
         13 - FocusTransfer (Failed)
         14 - GlobalDomain2Ranks (Failed)
         15 - ComponentUnits (Timeout)
         16 - ComponentUnitsOmp (Failed)
Errors while running CTest

This should be cleaned up.

Fix imHereBecauseOfCrayCompiler bug

This does not look too good:

grep imHereBecauseOfCrayCompilerO2Bug -r include/*

include/sph/density.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/momentumAndEnergyIAD.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/findNeighbors.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
include/sph/IAD.hpp:    std::vector<T> imHereBecauseOfCrayCompilerO2Bug(4, 10);
