ofi-bgq-buildenv's People

Contributors

mblockso, pkcoff, roblatham00

ofi-bgq-buildenv's Issues

compiler optimization flags

Here are some flags to consider for the "optimized" builds:

OPT_CFLAGS="-DNVALGRIND -falign-functions -finline-limit=2147483647 --param inline-unit-growth=300 --param ipcp-unit-growth=300 --param large-function-insns=500000000 --param large-function-growth=5000000000 --param large-stack-frame-growth=5000000 --param max-inline-insns-single=2147483647 --param max-inline-insns-auto=2147483647"

#-ggdb -O3 -DNVALGRIND -DNDEBUG -falign-functions -finline-limit=2147483647 -msse2 -msse4.2 -mcrc32 -mavx2 -mtune=generic -flto=32 -flto-partition=balanced --param inline-unit-growth=300 --param ipcp-unit-growth=300 --param large-function-insns=500000000 --param large-function-growth=5000000000 --param large-stack-frame-growth=5000000 --param max-inline-insns-single=2147483647 --param max-inline-insns-auto=2147483647 --param inline-min-speedup=0

Rec Fifo and Completion Queue processing rebalancing

While investigating an issue with FI_MULTI_RECV and how mpich uses it for active message processing, it became apparent via the manual progress engine that rfifo processing can get far ahead of completion queue processing. For mpich one-sided communication this results in higher memory consumption on the target nodes. That becomes a bigger problem at exascale, where floods of incoming active messages could take a lot of memory and lead to odd message processing scenarios. Also, in the interest of overlapping comm/comp, we want to complete messages in a timely fashion after we receive them.

What we want instead, when a flood of messages arrives, is for the network to throttle the incoming messages and back them up to the origins. If we rebalance so that we complete messages more often relative to receiving them, the originating processes will essentially wait for the targets to keep up with the completions, and the memory hit will be smaller. There is currently an attempt at a 64-packet governor for the rec fifo, but it is inadequate. A specific example with code flow follows:

Application is waiting on a request to complete. The MPI_Wait (or test or poke or whatever) eventually loops calling fi_cq_read() until the request completes. Depending on what type of request this is it could be looping for some time.

In https://github.com/pmodels/mpich/blob/master/src/mpid/ch4/netmod/ofi/ofi_am_impl.h#L155 only one cq entry can be retrieved each time the fi_cq_read() is invoked.

In https://github.com/pmodels/mpich/blob/master/src/mpid/ch4/netmod/ofi/ofi_progress.h#L35 (the regular progress loop) only MPIDI_OFI_NUM_CQ_ENTRIES (which is 8) can be retrieved each time the fi_cq_read() is invoked.

The very first thing we do in the fi_cq_read() implementation is to invoke the progress loop on all endpoints that are bound to this completion queue (https://github.com/ofiwg/libfabric/blob/master/prov/bgq/include/rdma/fi_direct_eq.h#L451). Any kind of multirecv flood scenario will quickly result in all multirecv buffers filling up, after which all incoming packets will be added to the unexpected queue.

For our scenario where each multirecv buffer can store 112 lock/unlock requests and we can pull 64 packets off the torus each time we poll the reception fifo, it will look like this:

            poll adds 64 cq entries (cq size is now 64)
            8 cq entries are copied into application (cq size is now 56)
            poll adds 65 cq entries (cq size is now 121, recycle entry is #105)
                            48 entries are from first multirecv buffer, then this multirecv buffer is full and the “recycle” cq entry is enqueued
                            16 entries are from second multirecv buffer
            8 cq entries are copied into application (cq size is now 113, recycle entry is #97)
            poll adds 64 cq entries (cq size is now 177, recycle entry is #97)
            8 cq entries are copied into application (cq size is now 169, recycle entry is #89)
            poll adds 65 cq entries (cq size is now 234, recycle entries are #89, #202)
                            32 entries are from second multirecv buffer, then this multirecv buffer is full and the “recycle” cq entry is enqueued
                            32 entries are from third multirecv buffer
            8 cq entries are copied into application (cq size is now 226, recycle entries are #81, #194)
            poll adds 64 cq entries (cq size is now 290, recycle entries are #81, #194)
            8 cq entries are copied into application (cq size is now 282, recycle entries are #73, #186)
            poll adds 65 cq entries (cq size is now 347, recycle entries are #73, #186, #299)
                            16 entries are from third multirecv buffer, then this multirecv buffer is full and the “recycle” cq entry is enqueued
                            48 entries are from fourth multirecv buffer
            8 cq entries are copied into application (cq size is now 339, recycle entries are #65, #178, #291)
            poll adds 65 cq entries (cq size is now 404, recycle entries are #65, #178, #291, #404)
                            64 entries are from fourth multirecv buffer, then this multirecv buffer is full and the “recycle” cq entry is enqueued

            ** at this point four multirecv buffers are full (but not yet reposted) and the fifth multirecv buffer is completely empty (weird, huh?) **

The first recycle entry started at #105 and is now at #65 .. it moved up 40 spots. If we consume another four multirecv buffers (5,6,7, and 8) then that first recycle cq entry will still not be processed - it is sitting in the queue at #25.
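
The bookkeeping in the walkthrough above can be replayed with a small model. This is only a sketch under the stated numbers (64 packets per poll, 8 entries copied per read, 112 entries per multirecv buffer); the function names are hypothetical, and the model assumes an unlimited supply of multirecv buffers, so it tracks queue positions only and never reaches the unexpected queue.

```python
def recycle_positions(cq):
    """1-based positions of the "recycle" entries currently in the queue."""
    return [i + 1 for i, is_recycle in enumerate(cq) if is_recycle]


def simulate(calls, packets_per_poll=64, entries_per_read=8, buffer_capacity=112):
    """Replay `calls` invocations of a read that polls first and copies second."""
    cq = []      # completion entries, oldest first; True marks a recycle entry
    used = 0     # entries consumed in the multirecv buffer currently being filled
    history = [] # (step, cq size, recycle positions) after every poll and copy
    for _ in range(calls):
        # The read starts by polling the reception fifo ...
        for _ in range(packets_per_poll):
            cq.append(False)
            used += 1
            if used == buffer_capacity:
                cq.append(True)  # buffer is full: enqueue its recycle entry
                used = 0
        history.append(("poll", len(cq), recycle_positions(cq)))
        # ... and only then copies a few entries out to the application.
        del cq[:entries_per_read]
        history.append(("copy", len(cq), recycle_positions(cq)))
    return history


for step, size, recycles in simulate(7):
    print(f"{step}: cq size {size}, recycle entries at {recycles}")
```

Each call adds at least 56 more entries than it removes, so a recycle entry advances by only 8 positions per call while the queue behind it keeps growing.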

So now we don’t have any multirecv buffers posted. The next poll will take 64 packets off the torus and add them to the unexpected queue. Other providers may abort at this point or drop packets or do something else equally catastrophic.

A potential solution is:

This will take a big code reorg of fi_bgq_cq_poll() and fi_bgq_cq_poll_inline() - probably put the two back together again.

here’s the new code flow:

  1.   check error queue, if not NULL return appropriate error

  2.   check pending queue, move completed operations to the completed queue

  3.   check completed queue, fill application buffer

  4.   if application buffer still has space

         a. for all endpoints bound to this completion queue, poll_rfifo <---------------------------
         b. check completed queue, fill application buffer
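
A toy model of that flow, to show the effect on queue depth. This is a sketch, not the provider code: the `CompletionQueue` and `FloodEndpoint` classes are hypothetical stand-ins for the cq and for poll_rfifo, and the 64-packet poll and 8-entry read sizes are taken from the example above.

```python
from collections import deque


class CompletionQueue:
    """Toy completion queue that polls the rfifo only when the caller has room."""

    def __init__(self, endpoints):
        self.errors = deque()
        self.pending = deque()      # operations not yet known to be complete
        self.completed = deque()    # entries ready to hand to the application
        self.endpoints = endpoints  # callables that append new completed entries

    def _drain(self, out, count):
        while self.completed and len(out) < count:
            out.append(self.completed.popleft())

    def read(self, count):
        if self.errors:                                   # step 1
            raise RuntimeError(self.errors[0])
        for _ in range(len(self.pending)):                # step 2
            op = self.pending.popleft()
            (self.completed if op["done"] else self.pending).append(op)
        out = []
        self._drain(out, count)                           # step 3
        if len(out) < count:                              # step 4
            for poll_rfifo in self.endpoints:
                poll_rfifo(self.completed)                # step 4a
            self._drain(out, count)                       # step 4b
        return out


class FloodEndpoint:
    """Stand-in for an endpoint whose rfifo always has 64 packets waiting."""

    def __init__(self, packets_per_poll=64):
        self.packets_per_poll = packets_per_poll
        self.polls = 0

    def __call__(self, completed):
        self.polls += 1
        completed.extend({"done": True} for _ in range(self.packets_per_poll))


endpoint = FloodEndpoint()
cq = CompletionQueue([endpoint])
depths = []
for _ in range(16):
    cq.read(8)
    depths.append(len(cq.completed))
print("polls:", endpoint.polls, "max queued entries:", max(depths))
```

In this model the rfifo is polled only once the completed queue has been emptied, so the number of queued entries stays bounded by one poll's worth of packets instead of growing with every read.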

Issue while compiling (with gnu 4.4 toolchain)

Hi Paul,

I am following the provided instructions and am getting the following error:

make[3]: Entering directory `/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/build/src/mpid/ch4/netmod/ofi/libfabric'
  CC       prov/bgq/src/src_libfabric_la-fi_bgq_atomic.lo
In file included from /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/fi_direct_eq.h:43,
                 from /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/fi_direct_endpoint.h:45,
                 from /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq.h:55,
                 from /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/src/fi_bgq_atomic.c:32:
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h: In function ‘fi_bgq_addr_create’:
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: error: unknown field ‘uid’ specified in initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: braces around scalar initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: (near initialization for ‘tmp.fi’)
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: error: unknown field ‘unused_0’ specified in initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: excess elements in union initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: (near initialization for ‘tmp’)
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: error: unknown field ‘fifo_map’ specified in initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: excess elements in union initializer
/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/OFI-BGQ-BuildEnv/mpi/mpich/src/mpid/ch4/netmod/ofi/libfabric/prov/bgq/include/rdma/bgq/fi_bgq_mu.h:236: warning: (near initialization for ‘tmp’)
make[3]: *** [prov/bgq/src/src_libfabric_la-fi_bgq_atomic.lo] Error 1
make[3]: Leaving directory `/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/build/src/mpid/ch4/netmod/ofi/libfabric'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/build/src/mpid/ch4/netmod/ofi/libfabric'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/sources/progmodels/mpich_ofi_home/build'
make: *** [all] Error 2

I don't see this mentioned in the README. It seems like I am mixing incompatible versions (?) but I am not sure which ones. The BG/Q system we have shows:

mpicc -show
/opt/ibmcmp/vacpp/bg/12.1/bin/bgxlc_r -I/bgsys/drivers/V1R2M4/ppc64/comm/include -I/bgsys/drivers/V1R2M4/ppc64/comm/lib/xl -I/bgsys/drivers/V1R2M4/ppc64 -I/bgsys/drivers/V1R2M4/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M4/ppc64/spi/include -I/bgsys/drivers/V1R2M4/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -I/bgsys/drivers/V1R2M4/ppc64/comm/include -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -lmpich-xl -lopa-xl -lmpl-xl -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread

and

/bgsys/drivers/ppcfloor/comm/bin/gcc/mpicc -show
/bgsys/drivers/V1R2M4/ppc64/gnu-linux/bin/powerpc64-bgq-linux-gcc -I/bgsys/drivers/V1R2M4/ppc64/comm/include -I/bgsys/drivers/V1R2M4/ppc64/comm/lib/gnu -I/bgsys/drivers/V1R2M4/ppc64 -I/bgsys/drivers/V1R2M4/ppc64/comm/sys/include -I/bgsys/drivers/V1R2M4/ppc64/spi/include -I/bgsys/drivers/V1R2M4/ppc64/spi/include/kernel/cnk -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/sys/lib -L/bgsys/drivers/V1R2M4/ppc64/comm/lib64 -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -L/bgsys/drivers/V1R2M4/ppc64/spi/lib -I/bgsys/drivers/V1R2M4/ppc64/comm/include -L/bgsys/drivers/V1R2M4/ppc64/comm/lib -lmpich-gcc -lopa-gcc -lmpl-gcc -lpami-gcc -lSPI -lSPI_cnk -lrt -lpthread -lstdc++ -lpthread

/bgsys/drivers/V1R2M4/ppc64/gnu-linux/bin/powerpc64-bgq-linux-gcc --version
powerpc64-bgq-linux-gcc (BGQ-V1R2M4-160823) 4.4.7

We don't have the gnu 4.7 toolchain, only gnu 4.4. I don't see source files for the bgq driver in /bgsys/source on our system, so I downloaded bgq-V1R2M4.tar.gz from here. The entire compilation script is:

#!/bin/bash

set -e
set -x

export AUTOTOOLS_HOME=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/install/common/autoconf
export PATH=$AUTOTOOLS_HOME/bin/:$PATH

BUILD_DIR="`pwd`/mpich_ofi_home"
INSTALL_DIR=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/lugbgq/softwares/install/progmodels/mpich_33

mkdir -p $BUILD_DIR
cd  $BUILD_DIR

git clone https://github.com/pkcoff/OFI-BGQ-BuildEnv.git
cd OFI-BGQ-BuildEnv
git checkout Release-0.2

cd ofi
git clone https://github.com/ofiwg/libfabric.git

cd ../mpi
git clone https://github.com/pmodels/mpich.git
cd mpich/src/mpid/ch4/netmod/ofi && ln -s -f `cd ../../../../../../../ofi/libfabric && pwd`

sed -i 's#bgq_driver=/bgsys/drivers/ppcfloor#bgq_driver=/gpfs/bbp.cscs.ch/home/kumbhar/tmp/bgq-V1R2M4#g' $BUILD_DIR/OFI-BGQ-BuildEnv/ofi/libfabric/prov/bgq/configure.m4

cd $BUILD_DIR/OFI-BGQ-BuildEnv

./autogen.sh --with-autotools=$AUTOTOOLS_HOME/bin
./configure

cd ofi/libfabric
./autogen.sh --with-autotools=$AUTOTOOLS_HOME/bin

cd ../../mpi/mpich
./autogen.sh --with-autotools=$AUTOTOOLS_HOME/bin

sed -i '$ d' $BUILD_DIR/OFI-BGQ-BuildEnv/mpi/simple_configure
sed -i '$ d' $BUILD_DIR/OFI-BGQ-BuildEnv/mpi/simple_configure
echo "--enable-handle-allocation=default \\" >> $BUILD_DIR/OFI-BGQ-BuildEnv/mpi/simple_configure
echo "--with-bgq-src=/gpfs/bbp.cscs.ch/home/kumbhar/tmp/bgq-V1R2M4" >> $BUILD_DIR/OFI-BGQ-BuildEnv/mpi/simple_configure

cd $BUILD_DIR
mkdir -p build && cd build
CONFIG_TOOLCHAIN=gnu CONFIG_FLAVOR=debug $BUILD_DIR/OFI-BGQ-BuildEnv/mpi/simple_configure
make -j VERBOSE=1

What am I missing here?
