stellar-group / octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

Home Page: http://octotiger.stellar-group.org/

License: Boost Software License 1.0

CMake 6.59% Shell 0.76% C++ 88.46% Makefile 0.03% Python 0.08% Cuda 0.01% Batchfile 0.18% Dockerfile 0.40% Groovy 3.50%
astrophysics hpx cuda-kernels cuda stellar-mergers kokkos simd sycl

octotiger's Introduction

Octo-Tiger

From https://doi.org/10.1145/3204919.3204938:

Octo-Tiger is an astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees. It was implemented using high-level C++ libraries, specifically HPX and Vc, which allows its use on different hardware platforms.

Build Status [master]

Jenkins - All CPU / GPU node-level tests for the 8 major build configurations:

CPU/GPU Tests with Kokkos, CUDA, HIP, SYCL

Jenkins - Special machine tests:

KNL Kokkos HPX Backend / SIMD tests
Development environment tests

Quick Reference

IRC Channel #ste||ar on libera.chat

  • Where to file issues:

Octo-Tiger Issue Tracker

  • Documentation:

The documentation of the master branch.

Citing

In publications, please cite Octo-Tiger using:

  • Dominic C. Marcello, Sagiv Shiber, Orsola De Marco, Juhan Frank, Geoffrey C. Clayton, Patrick M. Motl, Patrick Diehl, Hartmut Kaiser, "Octo-Tiger: A New, 3D Hydrodynamic Code for Stellar Mergers that uses HPX Parallelisation", accepted for publication in the Monthly Notices of the Royal Astronomical Society, 2021

For more publications, please review Octo-Tiger's documentation.

Funding

Allocations

  • Porting Octo-Tiger, an astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees, Testbed, Ookami, Stony Brook University
  • Porting Octo-Tiger, an astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees, Testbed, Fugaku, RIKEN Center for Computational Science

License

Distributed under the Boost Software License, Version 1.0. (See http://www.boost.org/LICENSE_1_0.txt)

octotiger's People

Contributors

biddisco, brycelelbach, circle-bot, diehlpk, dmarce1, g-071, gentryx, hkaiser, jiakunyan, junghans, khuck, msimberg, parsa, shibersag, sithhell, srinivasyadav18


octotiger's Issues

More documentation for the hotspots would be really helpful

I'm working on the optimization of some of the hotspots. For this it would be very helpful if more documentation were available to make it easier for me to understand the implemented algorithm.

Based on the current level of optimization, the following functions are most interesting:

  • in taylor.hpp: taylor<5, simd_vector>::set_basis
  • in grid_fmm.cpp:
    • grid::compute_interactions
    • grid::compute_boundary_interactions_multipole_multipole
    • grid::compute_boundary_interactions_monopole_monopole
    • grid::compute_boundary_interactions_multipole_monopole
    • grid::compute_boundary_interactions_monopole_multipole
      (ordered by their share of runtime)

What would be helpful:

  • Line comments would be very helpful, especially before loops and non-obvious conditionals. Brief explanations of the most important variables would also help.
  • It would be useful to know the mathematical formula the Taylor expansion is applied to.

I selected those hotspots based on experiments of the moving star scenario on my Skylake machine and results shown in the Optimization progress wiki page.

datapar_execution + zip iterator: lambda arguments aren't references

I wanted to write a datapar_execution algorithm that modifies the input tuple. But it seems that the argument isn't of a reference type with datapar_execution. The following example works with "seq" execution policy, but produces incorrect results with "datapar_execution" policy.

#include <hpx/include/datapar.hpp>
#include <hpx/include/parallel_for_each.hpp>
#include <hpx/hpx_init.hpp>
#include <hpx/include/iostreams.hpp>
#include <boost/range/functions.hpp>
#include <vector>

int hpx_main(boost::program_options::variables_map& vm) {
  std::vector<double> large(64);

  auto zip_it_begin = hpx::util::make_zip_iterator(boost::begin(large));
  auto zip_it_end = hpx::util::make_zip_iterator(boost::end(large));

  hpx::parallel::for_each(hpx::parallel::seq, zip_it_begin, zip_it_end,
  // hpx::parallel::for_each(hpx::parallel::datapar_execution, zip_it_begin, zip_it_end,
      [](auto t) -> void {
        hpx::util::get<0>(t) = 10.0;
      });

  for (double& v : large) {
    hpx::cout << v << " ";
  }
  hpx::cout << std::endl << hpx::flush;
  return hpx::finalize(); // Handles HPX shutdown
}

int main(int argc, char** argv) {
  boost::program_options::options_description desc_commandline(
      "Usage: " HPX_APPLICATION_STRING " [options]");

  // Initialize and run HPX
  return hpx::init(desc_commandline, argc, argv);
}

Expected result:

pfandedd@dpfanderLSU ~ $ ./debug/wrong_types 
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 

Actual result:

pfandedd@dpfanderLSU ~ $ ./debug/wrong_types 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

Octotiger Crashes on Cori with slurm problems

When trying to run on cori KNL, I get this crash:

Reading symbols from /project/projectdirs/xpress/hpx-lsu-cori-II/knl-build/octotiger-RelWithDebInfo/octotiger...done.
(gdb) run
Starting program: /global/project/projectdirs/xpress/hpx-lsu-cori-II/knl-build/octotiger-RelWithDebInfo/octotiger -Disableoutput -Problem=moving_star -Max_level=6 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGFPE, Arithmetic exception.
0x00002aaaada0c4e5 in hpx::util::batch_environments::slurm_environment::retrieve_number_of_threads (
    this=0x7fffffffee43)
    at /project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/src/util/batch_environments/slurm_environment.cpp:353
353	                    num_threads_ = num_pus / num_tasks_;
(gdb) print num_pus
$1 = 272
(gdb) print num_tasks_
$2 = 5332314968118409533

I got an allocation like this:
salloc -N 1 -p debug -C knl,quad,flat -t 30:00 -A xpress

and I ran with these options:

/project/projectdirs/xpress/hpx-lsu-cori-II/knl-build/octotiger-RelWithDebInfo/octotiger -Disableoutput -Problem=moving_star -Max_level=6 -Xscale=32 -Odt=0.5 -Stopstep=0 --hpx:threads=68

The slurm environment variables are:

khuck@nid09195:/project/projectdirs/xpress/hpx-lsu-cori-II> env | grep SLURM
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_NODELIST=nid09195
SLURM_JOB_NAME=sh
SLURMD_NODENAME=nid09195
SLURM_TOPOLOGY_ADDR=s32.s16.nid09195
SLURM_PRIO_PROCESS=0
SLURM_SRUN_COMM_PORT=61678
SLURM_PTY_WIN_ROW=32
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node
SLURM_CPU_BIND_VERBOSE=quiet
SLURM_SPANK_SHIFTER_GID=33734
SLURM_CPU_BIND_LIST=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
SLURM_NNODES=1
SLURM_STEP_NUM_NODES=1
SLURM_JOBID=3394590
SLURM_LAUNCH_NODE_IPADDR=10.128.1.214
SLURM_STEP_ID=0
SLURM_STEP_LAUNCHER_PORT=61678
SLURM_TASKS_PER_NODE=272
SLURM_JOB_ID=3394590
SLURM_JOB_USER=khuck
SLURM_STEPID=0
SLURM_SRUN_COMM_HOST=10.128.1.214
SLURM_CPU_BIND_TYPE=mask_cpu:
SLURM_PTY_WIN_COL=110
SLURM_UMASK=0022
SLURM_JOB_UID=33734
SLURM_NODEID=0
SLURM_STEP_RESV_PORTS=63577
SLURM_SUBMIT_DIR=/global/project/projectdirs/xpress/hpx-lsu-cori-II/scripts
SLURM_TASK_PID=124255
SLURM_CPUS_ON_NODE=272
SLURM_PROCID=0
SLURM_JOB_NODELIST=nid09195
SLURM_PTY_PORT=55387
SLURM_LOCALID=0
SLURM_JOB_CPUS_PER_NODE=272
SLURM_CLUSTER_NAME=cori
SLURM_GTIDS=0
SLURM_SUBMIT_HOST=mom5
SLURM_JOB_PARTITION=knl_reboot
SLURM_STEP_NUM_TASKS=1
SLURM_JOB_NUM_NODES=1
SLURM_STEP_TASKS_PER_NODE=1
SLURM_STEP_NODELIST=nid09195
SLURM_CPU_BIND=quiet,mask_cpu:0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
SLURM_MEM_PER_NODE=92160
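The SIGFPE at slurm_environment.cpp:353 suggests num_threads_ is computed from garbage: SLURM_TASKS_PER_NODE is "272" here, while num_tasks_ holds 5332314968118409533, which looks like uninitialized memory. A defensive sketch of that computation (hypothetical names, not HPX's actual code) might look like:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical defensive version of retrieve_number_of_threads: parse
// SLURM_TASKS_PER_NODE (formats like "272" or "16(x2)") and reject values
// that would make the division meaningless.
unsigned retrieve_num_threads(unsigned num_pus, char const* tasks_env) {
    unsigned long num_tasks = 1;
    if (tasks_env != nullptr) {
        try {
            num_tasks = std::stoul(tasks_env);  // stops parsing at '(' in "16(x2)"
        } catch (...) {
            num_tasks = 1;  // unparsable: fall back to one task per node
        }
    }
    if (num_tasks == 0 || num_tasks > num_pus)
        num_tasks = 1;  // guard against zero / uninitialized garbage
    return num_pus / num_tasks;
}
```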

Problem with taylor::set_basis

There is definitely an issue with set_basis. The moving star dissipates under the new set_basis, but behaves much more reasonably under the old one. I am trying to figure out what the problem is.

After regridding, the memory usage increases

When running the v1309 scenario for more than 15 time steps, regridding happens and the amount of memory grows after some regrids until it no longer fits in memory. So a refinement level that initially fits in memory cannot be run for longer time spans.

slurm-12631276.out:src/central_freelist.cc:333] tcmalloc: allocation failed 8192

Octotiger gets stuck on hpx::finalize when only using one thread

  • Octotiger currently gets stuck on hpx::finalize when using the current master branch of octotiger together with an HPX commit from January 22 (29da281b87cfa1171b42f5e2f22caa29b3c4be2d).
  • This problem is not present when using hpx 1.0.0 with other versions of octotiger (branch kernel_refactoring).
  • This problem is already present when switching away from hpx 1.0.0 to hpx commit c9f2c09d9378d920f7d68ce857ca3729014fc601, which is quite a bit older than the other commit I tested.

Used configuration: gcc 5.4.0, boost 1.63,
Vc: Commit 157026d3adf3494922269d11a2e4dede09bf867c (current commit of branch pfandedd_inlining_AVX512)

Used scenario:
/octotiger -t1 -Problem=moving_star -Max_level=3 -Odt=0.3 -Stoptime=0.2 -Xscale=20.0 -Omega=0.1 -Stopstep=9 -Disableoutput
This problem does not occur when using -t2 (or more).

Note: Due to #60 I currently cannot test it with a more recent version of HPX.

How can I fix this?

Octotiger does not scale to 2 knl nodes on Cori

When running Octotiger on Cori knl,quad,cache, the moving_star problem with -Xscale 32.0 -max_level=5 takes about 4 seconds per timestep on one node, but about 40 seconds per timestep on 2 nodes.

This is using the commands:
srun -n 2 -c 68 -u ./octotiger -t 68 -Problem=moving_star -Xscale=32.0 -Max_level=5 -Disableoutput
and
srun -n 1 -c 68 -u ./octotiger -t 68 -Problem=moving_star -Xscale=32.0 -Max_level=5 -Disableoutput

This is with tcmalloc.

Please add me to STEllAR-GROUP/Vc

Hi,

could someone give me write-access to the Vc fork?

I want to have a branch there, so that my code is in the project and we don't have yet another fork.

Write global configuration class

There's a mix of global variables (problem function) and static members of node_server (gravity_on flag). All of this state should be abstracted into a single global configuration class.

Optimize Taylor Expansion

See: Hartmut's explanation of the expansion.

From a first analysis, the function taylor<>::set_basis is clearly
dominating the execution time (~25-30% of the overall execution time is
spent in this function). It essentially consists of 3 loop-groups structured
like:

taylor<5, T> A;
A() = ...
for (int a = 0; a != NDIM; ++a) {
    A(a) = ...;
    for (int b = a; b != NDIM; ++b) {
        A(a, b) = ...;
        for (int c = b; c != NDIM; ++c) {
            A(a, b, c) = ...;
            for (int d = c; d != NDIM; ++d) {
                A(a, b, c, d) = ...;
            }
        }
    }
}

(NDIM == 3)

If we manually unroll the loops like this:

A(0) = ...
A(1) = ...
A(2) = ...

A(0, 0) = ...
...
A(2, 2) = ...

A(0, 0, 0) = ...
...
A(2, 2, 2) = ...

A(0, 0, 0, 0) = ...
...
A(2, 2, 2, 2) = ...

We gain about 15% runtime (of set_basis) per loop group!

Apparently we can't ignore the loop overhead in this context. Note that the
A(...) expressions essentially calculate an index into a 1D array inside
the taylor<> class. In the end, the unrolled expressions access the elements
of that array in sequential order, which also improves cache locality.
That means we will have to come up with a better way to access those
array elements, i.e. something not relying on four-dimensional iteration.
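The sequential-access idea can be sketched with a minimal, hypothetical container (not Octo-Tiger's actual taylor<>): a rank-0-to-4 symmetric expansion over NDIM == 3 has 1 + 3 + 6 + 10 + 15 = 35 unique coefficients, and touching them through the flat backing array visits memory strictly in order:

```cpp
#include <array>
#include <cstddef>

constexpr int NDIM = 3;

// Hypothetical stand-in for taylor<5, T>: the 35 unique coefficients of the
// symmetric tensors of rank 0..4 stored in one flat array.
struct taylor5 {
    std::array<double, 35> coeff{};
};

// Instead of 4 nested loops each computing a symmetric-tensor index,
// walk the flat storage once, in sequential (cache-friendly) order.
double fill_and_sum(taylor5& t) {
    double sum = 0.0;
    for (std::size_t i = 0; i < t.coeff.size(); ++i) {
        t.coeff[i] = static_cast<double>(i);  // stands in for the real formula
        sum += t.coeff[i];
    }
    return sum;
}
```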

Octotiger does not compile with gcc 6.3.1

In file included from /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/immintrin.h:41:0,
from /home/diehlpk/git/octotiger/src/simd.hpp:13,
from /home/diehlpk/git/octotiger/src/grid.hpp:13,
from /home/diehlpk/git/octotiger/src/grid_fmm.cpp:7:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/avxintrin.h: In constructor ‘simd_vector::simd_vector(double)’:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/avxintrin.h:1216:1: error: inlining failed in call to always_inline ‘__m256d _mm256_set_pd(double, double, double, double)’: target specific option mismatch
_mm256_set_pd (double __A, double __B, double __C, double __D)
^~~~~~~~~~~~~
In file included from /home/diehlpk/git/octotiger/src/grid.hpp:13:0,
from /home/diehlpk/git/octotiger/src/grid_fmm.cpp:7:
/home/diehlpk/git/octotiger/src/simd.hpp:23:40: note: called from here
#define _mmx_set_pd(d) _mm256_set_pd((d),(d),(d),(d))
~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
/home/diehlpk/git/octotiger/src/simd.hpp:65:10: note: in expansion of macro ‘_mmx_set_pd’
v[i] =_mmx_set_pd(d);

octotiger crashes on my desktop

Some recent update leads to crashes of octotiger on my desktop machine. This might be especially interesting for those who recently pushed to hpx or octotiger.

I ran octotiger with:

./bin/octotiger -Disableoutput -Problem=moving_star -Max_level=4 -Stopstep=0 --hpx:threads=4

The output was:

Running
# direct = 743
-----------------------------------------------
checking for refinement
regridding
rebalancing 9 nodes
forming tree connections
regrid done
-----------------------------------------------
---------------Created Level 1---------------

-----------------------------------------------
checking for refinement
regridding
rebalancing 73 nodes
forming tree connections
regrid done
-----------------------------------------------
---------------Created Level 2---------------

-----------------------------------------------
checking for refinement
regridding
rebalancing 585 nodes
forming tree connections
regrid done
-----------------------------------------------
---------------Created Level 3---------------

-----------------------------------------------
checking for refinement
regridding
rebalancing 4681 nodes
forming tree connections
regrid done
-----------------------------------------------
---------------Created Level 4---------------

-----------------------------------------------
checking for refinement
regridding
rebalancing 4681 nodes
forming tree connections
regrid done
-----------------------------------------------
---------------Regridded Level 4---------------

Forming tree connections------------
...done
1.000000e+00 1.000000e+00
Starting...
-----------------------------------------------
checking for refinement
regridding
rebalancing 4681 nodes
forming tree connections
regrid done
-----------------------------------------------
OMEGA = -1.000000e+00, output_dt = 1.000000e-02
0 8.276112e-04 8.276112e-04 6.550337e+02 -8.276112e-04 0.000000e+00 0.000000e+00 -1.000000e+00 0.000000e+00
L1, L2
             rho 1.170957e-04 8.868352e-04
            egas 1.248001e-04 8.377598e-04
              sx 1.096599e-03 2.938433e-03
              sy 1.072766e-03 2.877629e-03
             tau 7.090303e-05 5.147036e-04
    primary_core 1.170957e-04 8.868352e-04
Outputing...

{os-thread}: worker-thread#0
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=OFF
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=ON
  HPX_HAVE_PARCELPORT_MPI=OFF
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/home/pfandedd/git/hpx/build_without_datapar
  HPX_PREFIX=/home/pfandedd/hpx_install
{version}: V1.0.0-trunk (AGAS: V3.0), Git: 562694af89
{boost}: V1.62.0
{build-type}: release
{date}: Jan 10 2017 10:38:52
{platform}: linux
{compiler}: GNU C++ version 6.2.0 20160901
{stdlib}: GNU libstdc++ version 20160901
{what}: Resource temporarily unavailable: HPX(unhandled_exception)

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<hpx::exception> >'
  what():  Resource temporarily unavailable: HPX(unhandled_exception)

massive memory leak in octotiger

Current master is unusable due to massive memory requirements, possibly due to (a) memory leak(s). However, I don't know how to meaningfully debug this, as I can't get either a valgrind or an address sanitizer build to work (see STEllAR-GROUP/hpx#2746).

Any ideas what we could do?

Octotiger crashes on cori KNL

When I run on cori @ NERSC, I get this transient error in gdb (which I had to build myself):

...
-----------------------------------------------
---------------Created Level 2---------------

-----------------------------------------------
checking for refinement

Thread 8 "octotiger" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaab87d9700 (LWP 170780)]
node_server::check_for_refinement (this=0x2aaacae6ca20)
    at /project/projectdirs/xpress/hpx-lsu-cori-II/src/octotiger/src/node_server_actions_2.cpp:24
24	    if (is_refined) {
(gdb) list
19	}
20	
21	bool node_server::check_for_refinement() {
22	    bool rc = false;
23	    std::vector<hpx::future<bool>> futs;
24	    if (is_refined) {
25	        futs.reserve(children.size());
26	        for (auto& child : children) {
27	            futs.push_back(child.check_for_refinement());
28	        }
(gdb) print is_refined
$1 = 128
(gdb) whatis is_refined
type = bool

remove MPI timers

MPI_WTime() should be replaced with std::chrono and the MPI dependency should be removed.

Distributed output

Silo was not able to write large files with more than 64k sub grids. It would be beneficial to write several files, or one file per node.

Can't compile octotiger with gcc+datapar enabled

@hkaiser:
Compiling octotiger with datapar (Vc-variant) fails with gcc 6.2.
With the Vc types enabled, in https://github.com/STEllAR-GROUP/octotiger/blob/master/src/grid_output.cpp#L282:

			cnt += foo(&(G[iii][0]), sizeof(real), NGF, fp) * sizeof(real);

GCC complains that I take the address of a temporary, because the bracket operator on a Vc-type returns the underlying type (aka double in this case). However, the bracket operator of the original class returns "double &".

The error is therefore due to https://github.com/STEllAR-GROUP/octotiger/blob/master/src/simd.hpp#L403.

There are further errors. Certain constructs with a preceding "#pragma GCC ivdep" don't seem to compile with my gcc (6.2).
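One possible workaround for the address-of-temporary error, sketched with a hypothetical stand-in for the Vc-backed type: since the bracket operator returns by value, stage the lanes into a local double buffer and pass that buffer's address instead of &(G[iii][0]):

```cpp
#include <cstddef>

// Hypothetical stand-in: like the Vc-backed simd_vector, operator[] returns
// the lane by value, so &v[0] would take the address of a temporary.
struct simd_vector {
    double lanes[4];
    double operator[](int i) const { return lanes[i]; }
};

// Stage lane values into a caller-provided buffer before writing them out.
std::size_t copy_lanes(simd_vector const& v, double* out) {
    for (int i = 0; i < 4; ++i)
        out[i] = v[i];
    return 4 * sizeof(double);  // bytes staged, mirroring the fwrite-style count
}
```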

Octotiger Crashes on initialization on cori

Octotiger crashes on cori. I ran octotiger with

(gdb) run -Disableoutput -Problem=moving_star -Max_level=6 -Stopstep=1 -Xscale=32     -Odt=0.5 -Stoptime=0.1 --hpx:threads=6

from an interactive shell allocated with

salloc -N 1 -p knl -C knl,quad,flat -t 5:00 -A xpress

gdb tells me

0x00002aaab27f9b3f in memkind_check_available () from /global/common/cori/software/memkind/20161027/lib/libmemkind.so.0

Valgrind gives some additional information. It looks like the initialization of memkind fails with some nullpointer:

==92289== Process terminating with default action of signal 11 (SIGSEGV)
==92289==  Access not within mapped region at address 0x0
==92289==    at 0xC95FB3F: memkind_check_available (in /global/common/cori/software/memkind/20161027/lib/libmemkind.so.0.0.1)
==92289==    by 0xCBB5950: autohbw_load (in /global/common/cori/software/memkind/20161027/lib/libautohbw.so.0.0.1)
==92289==    by 0x400E8E9: call_init.part.0 (in /lib64/ld-2.19.so)
==92289==    by 0x400E9D2: _dl_init (in /lib64/ld-2.19.so)
==92289==    by 0x40011C9: ??? (in /lib64/ld-2.19.so)

The HPX examples (seemingly) run correctly, although they hang at shutdown.

Any ideas?

Move SCF to initialize()

main() sets up the problem (either generating it from initial conditions by calling initialize() or by loading it from a checkpoint file) and then calls node_server::start_run(). node_server::start_run() takes a bool that specifies whether SCF should be run. SCF should only be run if the problem was generated from initial conditions, so it seems like SCF should probably be in initialize(), or be called by main().

Cleanup issues

Please check our issues and close them if they are not relevant anymore.

Octotiger fails to compile with GCC 5.3

When trying to build octotiger with GCC 5.3, I get compiler errors like these:

In file included from /home/users/khuck/src/octotiger/src/grid.hpp:19:0,
                 from /home/users/khuck/src/octotiger/src/roe.cpp:9:
/home/users/khuck/src/octotiger/src/taylor.hpp: In member function ‘taylor<N, T> taylor<N, T>::operator-() const’:
/home/users/khuck/src/octotiger/src/taylor.hpp:194:14: error: expected primary-expression before ‘]’ token
             [](T const& val)
              ^
/home/users/khuck/src/octotiger/src/taylor.hpp:194:18: error: expected primary-expression before ‘const’
             [](T const& val)
                  ^

Make compute_ilist constexpr if possible

Make compute_ilist constexpr if possible, even if heroics are needed. We have way too much data movement in the gravity solve (compute_interactions()) - our FLOPS/byte ratio is really low. We've got to reduce working set size wherever possible.
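The payoff of constexpr here is that the table lives in read-only storage and costs no run-time computation or data movement. A toy sketch with a hypothetical 3x3x3 stencil (not the real interaction list; requires C++17 constexpr):

```cpp
#include <array>

// Hypothetical compile-time stencil table; the real compute_ilist would
// build the FMM interaction list the same way, just with more entries.
constexpr std::array<int, 27> make_offsets() {
    std::array<int, 27> off{};
    int n = 0;
    for (int i = -1; i <= 1; ++i)
        for (int j = -1; j <= 1; ++j)
            for (int k = -1; k <= 1; ++k)
                off[n++] = (i * 3 + j) * 3 + k;  // linearized cell offset
    return off;
}

constexpr auto offsets = make_offsets();
static_assert(offsets[13] == 0, "center cell maps to offset 0");
```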

Non-linear scaling on single KNL node

octotiger doesn't scale to a single KNL node (on cori).

For more details, see here.

I'll close this issue as soon as close to linear scaling on a single node is observed (or a good reason is given why this isn't possible).

Octotiger won't compile on KNL without Vc

I get this error building octotiger with Intel compilers (and without Vc):

[ 42%] Building CXX object CMakeFiles/octotiger_exe.dir/src/grid_fmm.cpp.o
/opt/cray/pe/craype/2.5.7/bin/CC  -shared -fPIC -DBOOST_DISABLE_ASSERTS -DHPX_APPLICATION_EXPORTS -DHPX_APPLICATION_NAME=octotiger_exe -DHPX_APPLICATION_STRING=\"octotiger_exe\" -DHPX_DISABLE_ASSERTS -DHPX_PREFIX=\"/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx-build\" -std=c++14 -Wno-attributes -Wno-deprecated-declarations -xmic-avx512 -O3 -DNDEBUG -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx-build -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/apex/src/apex -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/examples -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/tests -I/usr/common/software/boost/1.61/hsw/intel/include -I/project/projectdirs/xpress/hpx-lsu-cori-II/hwloc/include -I/project/projectdirs/xpress/hpx-lsu-cori-II/src/octotiger/src -I/global/homes/k/khuck/include    -std=c++11 -o CMakeFiles/octotiger_exe.dir/src/grid_fmm.cpp.o -c /project/projectdirs/xpress/hpx-lsu-cori-II/src/octotiger/src/grid_fmm.cpp
/project/projectdirs/xpress/hpx-lsu-cori-II/src/octotiger/src/grid_fmm.cpp(707): error: no operator "+" matches these operands
            operand types are: double + simd_vector
                  Y[d] = bnd.x[d] * dx + Xbase[d];
                                       ^

I configured HPX with:

module swap craype-haswell craype-mic-knl
module swap intel/16.0.3.210 intel/17.0.1.132
module load gcc/4.9.3 
module load boost/1.61
module load cmake/3.3.2

cmake \
-DCMAKE_TOOLCHAIN_FILE=/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/cmake/toolchains/CrayKNL.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DBOOST_ROOT=$BOOST_ROOT \
-DHPX_WITH_MALLOC=tcmalloc \
-DTCMALLOC_ROOT=/project/projectdirs/xpress/hpx-lsu-cori-II/gperftools \
-DHWLOC_ROOT=/project/projectdirs/xpress/hpx-lsu-cori-II/hwloc \
-DCMAKE_INSTALL_PREFIX=. \
-DHPX_WITH_APEX=TRUE \
-DHPX_WITH_APEX_NO_UPDATE=TRUE \
-DAPEX_WITH_ACTIVEHARMONY=TRUE \
-DACTIVEHARMONY_ROOT=/project/projectdirs/xpress/hpx-lsu-cori-II/activeharmony \
-DAPEX_WITH_OTF2=TRUE \
-DOTF2_ROOT=/project/projectdirs/xpress/hpx-lsu-cori-II/otf2 \
/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx

and I configured Octotiger with:

cmake \
-DCMAKE_TOOLCHAIN_FILE=/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx/cmake/toolchains/CrayKNL.cmake \
-DCMAKE_PREFIX_PATH=/project/projectdirs/xpress/hpx-lsu-cori-II/src/hpx-build \
-DHPX_WITH_MALLOC=tcmalloc \
-DCMAKE_BUILD_TYPE=Release \
-DOCTOTIGER_WITH_SILO=OFF \
/project/projectdirs/xpress/hpx-lsu-cori-II/src/octotiger
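The compiler error itself is a missing commutative overload: simd_vector + double exists, but double + simd_vector does not. A sketch of the usual fix, with a hypothetical minimal simd_vector standing in for the real type:

```cpp
// Hypothetical 4-lane simd_vector reduced to the essentials.
struct simd_vector {
    double v[4];
    simd_vector operator+(double d) const {
        simd_vector r{};
        for (int i = 0; i < 4; ++i)
            r.v[i] = v[i] + d;
        return r;
    }
};

// A free function restores commutativity, so expressions like
// `bnd.x[d] * dx + Xbase[d]` compile regardless of operand order.
inline simd_vector operator+(double d, simd_vector const& s) {
    return s + d;
}
```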

Move node_client code from node_server_actions*.cpp

Parts of node_client live in node_server_actions_*.cpp - the parts which need a declaration or definition of the server actions. It should be possible to put these into node_client.cpp and just use action declarations.

Distributed Input

Octotiger should have a mechanism to provide several input files for the restart; not all nodes should read one file.

Questions

  • What does regrid(gid_type root, bool rebalance_only) do when rebalance_only is true?
  • Why do we regrid(gid_type root, true) when loading from a checkpoint file? Presumably it was load balanced before the checkpoint file was written?
  • Why pass the root_gid to regrid() instead of storing it as a member in node_server? Can refinining and rebalancing be separated into two functions (instead of one with a bool flag to turn off refinement), or are they tightly coupled?
  • What does compute_ilist() do? A: Only done at startup A: Pre-computes the dependencies of each point in the subgrid.
  • What does node_server::form_tree() do (I have a basic understanding of this, octopus had a similar operation, but specifics would be good)?
  • What is the ene parameter in node_server::solve_gravity(bool ene) (I'm guessing ene is energy)?
  • Does node_server need to hold a shared_ptr to its grid? If loading from a checkpoint, is it necessary to do solve_gravity() in main()? start_run() will call solve_gravity() early on (and there's no SCF if we loaded from a checkpoint). Similarly, should regrid be called in start_run() if we've loaded from a checkpoint and we're not doing SCF?
  • How do grid::solve_gravity() and node_server::solve_gravity() differ?
  • array seems wrong, shouldn't it be array? A: No, that's correct.
  • What array layout is used? *index() appears to be layout_right (aka C++ indexing)
  • What iteration order is used?
  • Is any vectorization done today?
  • Which extents are known at compile-time?
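For the layout question above: layout_right (C ordering) means the last index varies fastest in memory. A minimal sketch of that index computation, with hypothetical extents:

```cpp
// layout_right (C / row-major): the last index k varies fastest, so
// index3(i, j, k) and index3(i, j, k + 1) are adjacent in memory.
inline int index3(int i, int j, int k, int ny, int nz) {
    return (i * ny + j) * nz + k;
}
```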

Compilation fails when using most recent HPX

I have problems building the current master branch of octotiger using the current master branch of HPX (98effdc7e9b8b2ac39da4a023aaa079ffe81c4ee).

I get a lot of compilation errors about seq when I try to compile octotiger, for example:

/home/daissgr/current-oct/src/hpx/hpx/parallel/datapar/transform_loop.hpp: In static member function ‘static typename std::enable_if<(((! hpx::parallel::util::detail::iterators_datapar_compatible<InIter1, InIter2>::value) || (! hpx::parallel::util::detail::iterator_datapar_compatible::value)) || (! hpx::parallel::util::detail::iterator_datapar_compatible::value)), std::pair<_U1, _U2> >::type hpx::parallel::util::detail::datapar_transform_loop::call(InIter, InIter, OutIter, F&&)’:
/home/daissgr/current-oct/src/hpx/hpx/parallel/datapar/transform_loop.hpp:130:45: error: ‘seq’ is not a member of ‘hpx::parallel::v1’
return util::transform_loop(parallel::v1::seq,

The compilation of hpx itself does not show any errors whatsoever.

This problem does not occur when using a slightly older HPX commit from January 22
( 29da281b87cfa1171b42f5e2f22caa29b3c4be2d)

I am using gcc 5.4.0 and boost 1.63. Full compilation log:
octotiger-compilation-log.txt

Problems in Building Octotiger on MacOS

I have encountered two problems when trying to build octotiger on MacOS:

  1. When running the following cmake command:
cmake -DBOOST_ROOT=$HOME/local/boost \
      -DTCMALLOC_ROOT="/usr/local/opt/gperftools/" \
      -DHWLOC_ROOT="/usr/local/opt/hwloc/" \
      -DBOOST_SUFFIX=-clang-darwin32-mt-1_65 \
      -DCMAKE_PREFIX_PATH="$HOME/local/hpx1.3/" \
      -DCMAKE_CXX_COMPILER=clang++ \
      -DCMAKE_CXX_FLAGS="-std=c++11" \
      -DCMAKE_BUILD_TYPE=release \
      -DCMAKE_INSTALL_PREFIX="$HOME/local/release/octotiger" \
      -DSilo_INCLUDE_DIR=$HOME/local/silo/include \
      -DSilo_LIBRARY=$HOME/local/silo/lib/libsiloh5.a \
      -DVc_DIR=/Users/shiber/local/vc/lib/cmake/Vc \
      -DSilo_BROWSER=$HOME/local/silo \
      ../octotiger

I get the following error:

CMake Error at CMakeLists.txt:348 (target_link_libraries):
  The plain signature for target_link_libraries has already been used with
  the target "octolib".  All uses of target_link_libraries with a target must
  be either all-keyword or all-plain.

  The uses of the plain signature are here:

   * /Users/shiber/local/hpx1.3/lib/cmake/HPX/HPX_SetupTarget.cmake:218 (target_link_libraries)

This error can be bypassed by running CMake without tests and without libquadmath by adding the following flags to the CMake command:

      -DOCTOTIGER_WITH_BLAST_TEST=OFF \
      -DOCTOTIGER_WITH_TESTS=OFF \
      -DOCTOTIGER_WITH_QUADMATH=OFF \

Then, the configuration completes successfully.

  1. When running the make command (after a successful configuration is done), I get these errors:
../octotiger/frontend/main.cpp:140:13: error: use of undeclared identifier '_EM_INEXACT'
        _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);
                          ^
../octotiger/frontend/main.cpp:140:27: error: use of undeclared identifier '_EM_DENORMAL'
        _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);
                                                      ^
../octotiger/frontend/main.cpp:140:42: error: use of undeclared identifier '_EM_INVALID'
        _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);
                                                                                      ^
../octotiger/frontend/main.cpp:140:55: error: use of undeclared identifier '_MCW_EM'
        _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);

Which I can bypass by just commenting the following lines (139-140) in /frontend/main.cpp:

#else
         _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);

I am using MacBook Air with macOS High Sierra 10.13.6; Boost 1.65.0; Clang 10.0.0 (clang-1000.10.44.4).
I installed HPX and Boost using Clang and libc++ as instructed in:
https://stellar-group.github.io/hpx/docs/sphinx/latest/html/manual/building_hpx.html
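_controlfp and the _EM_* / _MCW_EM constants are MSVC-only, which is why the line fails on macOS. Rather than commenting it out, a portable guard might look like the following sketch (feenableexcept is a glibc extension; macOS has no direct equivalent, so traps stay off there):

```cpp
#include <fenv.h>

// Enable floating-point exception traps where the platform supports it.
void enable_fp_traps() {
#if defined(_MSC_VER)
    _controlfp(_EM_INEXACT | _EM_DENORMAL | _EM_INVALID, _MCW_EM);
#elif defined(__GLIBC__)
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);  // glibc extension
#else
    // macOS / other libcs: no feenableexcept; leave traps disabled
#endif
}
```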

vector<bool> iterator not dereferencable

Upon starting Octotiger, the error message "vector<bool> iterator not dereferencable"
appears. The stack trace shows that it happens while executing this line:

stencil_masks[index] = true;

with index=18446744073709551217, which is invalid. The index comes from:

thread_local std::vector<bool> p2p_interaction_interface::stencil_masks =
    calculate_stencil_masks(p2p_interaction_interface::stencil).first;
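For what it's worth, 18446744073709551217 equals 2^64 - 399, i.e. -399 wrapped to unsigned, which suggests a negative stencil offset escaping the index computation. A hedged sketch of a check that turns the wrap-around into a diagnosable failure (hypothetical helper, not the actual code path):

```cpp
#include <cstddef>
#include <vector>

// Validate the (signed) index before storing; a huge unsigned value like
// 2^64 - 399 is just a negative offset (-399) that wrapped around.
bool set_mask_checked(std::vector<bool>& masks, std::ptrdiff_t index) {
    if (index < 0 || static_cast<std::size_t>(index) >= masks.size())
        return false;  // reject wrapped / out-of-range indices
    masks[static_cast<std::size_t>(index)] = true;
    return true;
}
```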

Cleanup branches

Right now we have 54 branches which is too much. Please check if your old branches can be deleted. I deleted all merged branches already.
