mpigraph's Introduction

mpiGraph

Benchmark to generate network bandwidth images

Build

make

Run

Run one MPI task per node:

SLURM: srun -n <nodes> -N <nodes> ./mpiGraph 1048576 10 10 > mpiGraph.out
Open MPI: mpirun --map-by node -np <nodes> ./mpiGraph 1048576 10 10 > mpiGraph.out

General usage:

mpiGraph <size> <iters> <window>

To compute bandwidth, each task averages the bandwidth over iters iterations. In each iteration, a process sends window messages of size bytes each to another process while simultaneously receiving an equal number of messages of equal size from another process. The source and destination processes in each step are not necessarily the same process.
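As a rough illustration of the arithmetic described above, the sketch below converts a total elapsed time into an average bandwidth. The function name, the choice of dividing the bytes per iteration by the mean time per iteration, and the MB/s unit (1 MB = 10^6 bytes) are assumptions made for this example; they are not taken from the mpiGraph source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from mpiGraph.c).
 * size:          message size in bytes
 * iters:         number of timed iterations
 * window:        messages sent per iteration
 * total_seconds: time summed over all iterations
 * Returns the average bandwidth in MB/s, assuming 1 MB = 1e6 bytes. */
double average_bandwidth_mbps(size_t size, int iters, int window,
                              double total_seconds)
{
    assert(iters > 0 && window > 0 && total_seconds > 0.0);

    /* Bytes moved in one iteration: window messages of size bytes each. */
    double bytes_per_iter = (double)size * (double)window;

    /* Mean time of a single iteration. */
    double seconds_per_iter = total_seconds / (double)iters;

    return bytes_per_iter / seconds_per_iter / 1.0e6;
}
```

With the arguments from the example command line (1048576 10 10) and a hypothetical total time of 0.1 seconds, this works out to roughly 1048.6 MB/s.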

Watch progress:

tail -f mpiGraph.out

Results

Parse output and create html report:

crunch_mpiGraph mpiGraph.out

View results in a web browser:

firefox file:///path/to/mpiGraph.out_html/index.html

Description

This package consists of an MPI application called "mpiGraph", written in C to measure message bandwidth, and an associated "crunch_mpiGraph" script, written in Perl to parse the application output and generate an HTML report. The mpiGraph application is designed to inspect the health and scalability of a high-performance interconnect while subjecting it to heavy load. This is useful for detecting hardware and software problems in a system, such as slow nodes, links, or switches, or contention in switch routing. It is also useful for characterizing how interconnect performance changes with different settings or how one interconnect type compares to another.

Typically, one MPI task is run per node (or per interconnect link). For a job of N MPI tasks, the N tasks are logically arranged in a ring, counting ranks from 0 and increasing to the right, with the end wrapping back to rank 0. A series of N-1 steps is then executed. In each step, each MPI task sends to the task D units to the right and simultaneously receives from the task D units to the left. The value of D starts at 1 and runs to N-1, so that by the end of the N-1 steps, each task has sent to and received from every other task in the run, excluding itself. At the end of the run, two NxN matrices of bandwidths are gathered and written to stdout: one for send bandwidths and one for receive bandwidths.
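The ring pattern above comes down to modular arithmetic on ranks. The following is a minimal sketch of how the send and receive partners for a step could be computed; the function names are hypothetical and are not taken from mpiGraph.c.

```c
#include <assert.h>

/* Hypothetical helpers illustrating the ring schedule.
 * rank: this task's rank, 0 <= rank < n
 * d:    step distance, 1 <= d <= n-1 in the benchmark
 * n:    total number of MPI tasks */

/* Rank that lies d units to the right (the send destination). */
int ring_send_partner(int rank, int d, int n)
{
    assert(n > 0 && rank >= 0 && rank < n && d >= 0 && d < n);
    return (rank + d) % n;
}

/* Rank that lies d units to the left (the receive source).
 * Adding n before the modulo keeps the result non-negative. */
int ring_recv_partner(int rank, int d, int n)
{
    assert(n > 0 && rank >= 0 && rank < n && d >= 0 && d < n);
    return (rank - d + n) % n;
}
```

For example, with n = 4 and d = 1, rank 3 sends to rank 0 and receives from rank 2. Looping d from 1 to n-1 visits every other rank exactly once in each direction.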

The crunch_mpiGraph script is then run on this output to generate a report. It includes a pair of bitmap images representing bandwidth values between different task pairings. Pixels in these images are colored according to relative bandwidth values. The maximum bandwidth value is set to pure white (value 255), and other values are scaled toward black (0) according to their percentage of the maximum. One can then visually inspect and identify anomalous behavior in the system. One may zoom in and inspect image features in more detail by hovering the mouse cursor over the image. JavaScript embedded in the HTML report opens a pop-up tooltip with a zoomed-in view of the cursor location.
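The gray-level scaling described above can be sketched as follows. crunch_mpiGraph itself is a Perl script, so this C function is only an illustration of the mapping; the function name and the round-to-nearest behavior are assumptions.

```c
#include <assert.h>

/* Hypothetical helper (not from crunch_mpiGraph).
 * Maps a bandwidth value to a gray level: max_bw becomes 255 (white),
 * 0 becomes 0 (black), and values in between scale linearly with
 * their fraction of the maximum. */
unsigned char bandwidth_to_gray(double bw, double max_bw)
{
    assert(max_bw > 0.0 && bw >= 0.0 && bw <= max_bw);

    /* Adding 0.5 before truncation rounds to the nearest integer
     * (an assumed rounding rule). */
    return (unsigned char)(255.0 * bw / max_bw + 0.5);
}
```

Under this mapping, a task pair that reaches half of the maximum bandwidth shows up as mid-gray (128), so slow links appear as dark pixels against a light background.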

References

Contention-free Routing for Shift-based Communication in MPI Applications on Large-scale Infiniband Clusters, Adam Moody, LLNL-TR-418522, Oct 2009

mpigraph's People

Contributors

adammoody, onewayforever

mpigraph's Issues

Found Bug: Passing false request_array address

191 if (!flag_sends) {
192 MPI_Testall((k+1)/2, &request_array[(k+1)/2-1], &flag_sends, &status_array[(k+1)/2-1]);
193 if (flag_sends) { TIME_END_SEND; }
194 }

should be
192 MPI_Testall((k+1)/2, &request_array[(k+1)/2], &flag_sends, &status_array[(k+1)/2]);
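
To see why the offset matters, the sketch below reproduces only the index arithmetic from the two versions of line 192. It assumes that request_array holds the receive requests first and the send requests second, with k equal to twice the window size; that layout is an assumption not shown in the excerpt above, and the function names are hypothetical.

```c
#include <assert.h>

/* Number of requests checked by the MPI_Testall call on line 192. */
int send_test_count(int k)
{
    assert(k > 0);
    return (k + 1) / 2;
}

/* Starting index used by the reported code. */
int send_offset_reported(int k)
{
    assert(k > 0);
    return (k + 1) / 2 - 1;
}

/* Starting index used by the proposed fix. */
int send_offset_proposed(int k)
{
    assert(k > 0);
    return (k + 1) / 2;
}
```

With a window of 10 (k = 20) and the assumed layout, the reported code tests indices 9 through 18, which covers the last receive request and misses the last send request. The proposed fix tests indices 10 through 19, which are exactly the send requests.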

problem node name pattern in crunch_mpiGraph?

Hello,

I have tested mpiGraph on a cluster that has the node names in this pattern
ab1-1026.bullx

When I run crunch on the mpiGraph output the following error is printed:
Use of uninitialized value in numeric comparison (<=>) at ./crunch_mpiGraph line 459

and the node names in *_html/map.txt look as follows:

Rank    Node
0       ab1
1       ab3
2       ab3
3       ab2
4       ab1
5       ab3
6       ab5
7       ab4
...

which is obviously wrong.
The error propagates in the inspection window of the plot.

A fix for this issue would be very useful.

Kind regards,

Lucian Anton

Bug: Can't locate hostlist_lite.pm

I'm getting the below error while using crunch_mpiGraph on generated output.

crunch_mpiGraph mpiGraph.out
Can't locate hostlist_lite.pm in @INC (you may need to install the hostlist_lite module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at /home/sameer/mpiGraph/crunch_mpiGraph line 64.
BEGIN failed--compilation aborted at /home/sameer/mpiGraph/crunch_mpiGraph line 64.

Do we need to add any path variable to use in Perl?
At least the README doesn't mention it.
