
This project forked from devreal/mpix-harmonize


mpix-harmonize

Experimental MPI function to synchronize processes in the space and time dimension

Installation

Basic Build

mkdir build
cd build
cmake ..
make

Specific Target

If you want to install the library into a specific target folder, configure with

cmake -DCMAKE_INSTALL_PREFIX=<YOUR_PATH> ..
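Putting the steps together, a full out-of-source build and install might look like the following (the install prefix path is only illustrative; substitute your own):

```shell
# configure, build, and install from a separate build directory
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/opt/mpix-harmonize ..
make
make install
```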

Time Source

The clock synchronization algorithms depend on the time source used internally. By default, libmpix-harmonize uses clock_gettime(CLOCK_MONOTONIC, ...) as its source of time. Alternatively, the library can use the timing function provided by the MPI library, which may be mapped to a different timing function internally.

The libmpix-harmonize library can also be configured to use a specific time function. To use clock_gettime() with CLOCK_MONOTONIC, configure with

cmake -DMPITS_CLOCK=monotonic .

and to use clock_gettime() with CLOCK_REALTIME, configure with

cmake -DMPITS_CLOCK=realtime .

Usage

A call to MPIX_Harmonize does a couple of things:

  1. It periodically synchronizes the internal clocks across all processes. By default, the clock synchronization is performed if the last call to MPIX_Harmonize was more than 1 s ago or if any process failed the previous call to MPIX_Harmonize.
  2. All processes select an internal deadline based on the synchronized internal clocks.
  3. All processes wait until that deadline before returning.

If the calling process was able to meet the deadline, it returns 1 in flag; otherwise it returns 0. Note that flag only signals success or failure of the calling process, not of all processes in the communicator: synchronizing this flag internally would again introduce skew among the processes. It is left to the application to handle cases in which processes that missed the deadline return late.

The library automatically adjusts the slack used to determine the deadline, i.e., upon successful synchronization the slack is reduced, and if a synchronization fails the slack is increased. Spurious synchronization failures may thus occur if the slack is chosen too small.

Example

The example below shows a possible use of MPIX_Harmonize: the benchmark performs NUM_ITERATIONS repetitions and in each iteration synchronizes the processes through a call to MPIX_Harmonize. After the call returns, a timestamp is taken, the collective operation under test (MPI_Allreduce in this case) is performed, and a second timestamp is taken. Using MPI_Allreduce, all processes determine whether this experiment was valid, i.e., whether all processes succeeded in the synchronization and no process missed the internal deadline. If so, the num_valid counter is incremented and the measured time is added to the accumulated time. After NUM_ITERATIONS iterations, the number of valid experiments and the average latency of MPI_Allreduce are printed.

Note that this is a simplified example. Real-world benchmarks may want to adjust NUM_ITERATIONS dynamically and/or combine multiple experiments in a single check, as is done in the OSU benchmarks included in this repository.

int flag, num_valid = 0;
double t_start, t_stop, t_sum = 0.0;
for (int i = 0; i < NUM_ITERATIONS; ++i) {
  /* synchronize processes in space and time */
  MPIX_Harmonize(MPI_COMM_WORLD, &flag);
  t_start = MPI_Wtime();
  MPI_Allreduce(sendbuf, recvbuf, num_elements,
                datatype, MPI_SUM,
                MPI_COMM_WORLD);
  t_stop = MPI_Wtime();
  /* check whether this experiment was valid on all processes */
  MPI_Allreduce(MPI_IN_PLACE, &flag, 1, MPI_INT, MPI_LAND, MPI_COMM_WORLD);
  if (flag) { /* the experiment is valid */
    num_valid++;
    t_sum += t_stop - t_start;
  }
}
if (num_valid > 0)
  printf("MPI_Allreduce: %d valid iterations, %f average latency\n",
         num_valid, t_sum / num_valid);

License

The 3-Clause BSD License

Publication

Joseph Schuchart, Sascha Hunold, and George Bosilca. 2023. Synchronizing MPI Processes in Space and Time. In Proceedings of the 30th European MPI Users' Group Meeting (EuroMPI '23). Association for Computing Machinery, New York, NY, USA, Article 7, 1–11. https://doi.org/10.1145/3615318.3615325

Contributors

devreal, hunsa
