Comments (7)
FYI, at the moment the PMIx calls are made in the __kmp_register_library_startup() function, which is the best location I found that vaguely resembles an init function. So once we figure out where the places are created, we need to figure out whether __kmp_register_library_startup() is called before or after that point. If the places are created first, then __kmp_register_library_startup() is not really an early initialization function and we will have to revisit the code.
from libomp.
Found that we can enable function entry/exit traces (KA_TRACE()) by setting the env var KMP_A_DEBUG=<level> when using a debug build of the clang/openmp compiler. This may be useful for determining whether we have defined the placement info in the proper phase.
export KMP_A_DEBUG=100
orterun -n 1 ./mpi_omp_hello.dbg
from libomp.
Quick update...
I realized we couldn't set the value of a generic key (e.g., moc.myrank) in multiple places, which makes sense after you think about it. So I needed to be able to pass the rank, but needed a way to get something unique. Then I realized we can get the rank from the results of PMIx_Init(myproc, ....) via myproc.rank. We already call PMIx_Init() from within the startup code for the OMP-RT.
2017.12.19_test-output.txt
So I modified the libomp kmp_runtime.cpp to do just this and calculate the places for each process based on its MPI rank. Currently I just print out the places with the associated cores from each rank. This is the information needed to set up the OMP place details. I'm currently working on getting that in place, and then we should be able to dynamically define an even split of the cores for each MPI rank on a node.
Here's an example of the output from the test with the relevant pieces highlighted (cleaned up) for easier viewing (full logfile is attached).
beaker:$ orterun -n 4 ./mpi_omp_hello.dbg
...<snip>...
[moc.c:MOC_Init:29] PID: 169376 PMIx_rank: 2
[moc.c:MOC_Init:29] PID: 169377 PMIx_rank: 3
[moc.c:MOC_Init:29] PID: 169375 PMIx_rank: 1
[moc.c:MOC_Init:29] PID: 169374 PMIx_rank: 0
[moc.c:MOC_Init:76] numCPUS: 12
[moc.c:MOC_Init:76] numCPUS: 12
[moc.c:MOC_Init:76] numCPUS: 12
[moc.c:MOC_Init:76] numCPUS: 12
...<snip>...
[kmp_runtime.cpp:__kmp_register_library_startup:6444] 4 job procs are running on the node
[kmp_runtime.cpp:__kmp_register_library_startup:6444] 4 job procs are running on the node
[kmp_runtime.cpp:__kmp_register_library_startup:6444] 4 job procs are running on the node
[kmp_runtime.cpp:__kmp_register_library_startup:6444] 4 job procs are running on the node
...<snip>...
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169374) Rank: 0 places: {0}, {1}, {2}
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169376) Rank: 2 places: {6}, {7}, {8}
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169375) Rank: 1 places: {3}, {4}, {5}
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169377) Rank: 3 places: {9}, {10}, {11}
...<snip>...
Here are the place prints for a 2-rank case on the same machine.
beaker:$ orterun -n 2 ./mpi_omp_hello.dbg
...<snip>...
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169471) Rank: 0 places: {0}, {1}, {2}, {3}, {4}, {5}
[kmp_runtime.cpp:__kmp_register_library_startup:6636] OMP-RT (pid:169472) Rank: 1 places: {6}, {7}, {8}, {9}, {10}, {11}
...<snip>...
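For reference, the even split shown in the output above can be sketched roughly as follows. This is only an illustration, not the code that went into kmp_runtime.cpp: build_places() is a hypothetical helper, it assumes every rank of the job runs on the same node (so the rank can be used directly as the index on the node), and it assumes any cores left over from an uneven division are simply left unused.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from libomp or MOC): writes the places for `rank`
 * into `buf` in OMP_PLACES syntax, one single-core place per core.
 * Returns the number of places, or -1 on bad input or a too-small buffer. */
int build_places(int ncpus, int nranks, int rank, char *buf, size_t len)
{
    assert(buf != NULL);
    if (ncpus <= 0 || nranks <= 0 || rank < 0 || rank >= nranks || len == 0)
        return -1;

    /* Assumption: integer division; leftover cores are not assigned. */
    int per_rank = ncpus / nranks;
    int first = rank * per_rank;   /* first core owned by this rank */
    size_t used = 0;

    buf[0] = '\0';
    for (int i = 0; i < per_rank; i++) {
        int n = snprintf(buf + used, len - used, "%s{%d}",
                         i ? "," : "", first + i);
        if (n < 0 || (size_t)n >= len - used)
            return -1;             /* output would have been truncated */
        used += (size_t)n;
    }
    return per_rank;
}
```

With 12 cores and 4 ranks, rank 2 gets the string {6},{7},{8}, which lines up with the Rank: 2 line in the 4-rank run above.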
from libomp.
I've added initial changes (#3) for libomp to set up an OMP_PLACES value defined as described above. I also added changes in the MOC repo to help display places info (see MOC repo PR-2 and PR-3).
For the ticket trail, here's the output from the current tests (using the script to pretty-print the output from the logs).
beaker:$ pwd
/home/tjn/projects/ompi-ecp/source/moc-git-br-master/test
beaker:$ make -f Makefile.devel
clang -I/home/tjn/projects/ompi-ecp/source/llvm/release_50/install/debug/include \
-I/home/tjn/projects/ompi-ecp/install/include \
-L/home/tjn/projects/ompi-ecp/source/llvm/release_50/install/debug/lib \
-L/home/tjn/projects/ompi-ecp/install/lib \
-I/home/tjn/projects/ompi-ecp/install/include/openmpi -I/home/tjn/projects/ompi-ecp/install/include/openmpi/opal/mca/hwloc/hwloc2a/hwloc/include -I/home/tjn/projects/ompi-ecp/install/include -I/home/tjn/projects/ompi-ecp/install/include -pthread -fopenmp -g -pthread -L/home/tjn/projects/ompi-ecp/install/lib -Wl,-rpath -Wl,/home/tjn/projects/ompi-ecp/install/lib -Wl,-rpath -Wl,/home/tjn/projects/ompi-ecp/install/lib -Wl,--enable-new-dtags -L/home/tjn/projects/ompi-ecp/install/lib -lmpi \
-o mpi_omp_hello.dbg mpi_omp_hello.c -lpmix -lmoc
beaker:$ orterun -n 2 ./mpi_omp_hello.dbg >& 2rank
beaker:$ orterun -n 4 ./mpi_omp_hello.dbg >& 4rank
beaker:$ orterun -n 1 ./mpi_omp_hello.dbg >& 1rank
beaker:$ orterun -n 6 ./mpi_omp_hello.dbg >& 6rank
beaker:$ cat SHOW-RESULTS.sh
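#!/bin/bash
# Pretty-print the interesting lines from a test log.
FILE=${1:-}
if [ -z "$FILE" ] ; then
    echo "Usage: $0 LOG_FILENAME"
    exit 1
fi
# Binding policy reported by the runtime
grep PROC_BIND "$FILE"
echo "=========================================="
# Thread count per rank
grep "Number of threads" "$FILE" | sort -k1
echo "=========================================="
# Places computed at startup, sorted by the rank value
grep OMP-RT "$FILE" | grep -v OVERRIDE | sort -k5
echo "=========================================="
# Per-thread place details
grep " \[RANK:" "$FILE" | sort -k1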
beaker:$ ./SHOW-RESULTS.sh 1rank
[kmp_runtime.cpp:__kmp_partition_places:4763: 77472] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
==========================================
[RANK:0,TID:0] Number of threads = 11
==========================================
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77472) Rank: 0 places: {0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}, {11}
==========================================
[RANK:0,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:10,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:10,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:1,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:1,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:2,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:2,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:3,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:3,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:4,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:4,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:5,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:5,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:6,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:6,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:7,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:7,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:8,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:8,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:0,TID:9,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:9,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 6
beaker:$ ./SHOW-RESULTS.sh 2rank
[kmp_runtime.cpp:__kmp_partition_places:4763: 77429] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
[kmp_runtime.cpp:__kmp_partition_places:4763: 77430] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
==========================================
[RANK:0,TID:0] Number of threads = 5
[RANK:1,TID:0] Number of threads = 5
==========================================
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77429) Rank: 0 places: {0}, {1}, {2}, {3}, {4}, {5}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77430) Rank: 1 places: {6}, {7}, {8}, {9}, {10}, {11}
==========================================
[RANK:0,TID:0,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 0
[RANK:0,TID:1,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 0
[RANK:0,TID:2,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 0
[RANK:0,TID:3,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 0
[RANK:0,TID:4,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 0
[RANK:1,TID:0,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 7
[RANK:1,TID:1,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 7
[RANK:1,TID:2,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 7
[RANK:1,TID:3,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 7
[RANK:1,TID:4,PLACE:0] num_places = 1, places_num_procs = 1, places_procids = 7
beaker:$ ./SHOW-RESULTS.sh 4rank
[kmp_runtime.cpp:__kmp_partition_places:4763: 77451] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
[kmp_runtime.cpp:__kmp_partition_places:4763: 77450] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
[kmp_runtime.cpp:__kmp_partition_places:4763: 77448] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
[kmp_runtime.cpp:__kmp_partition_places:4763: 77449] DBG: __kmp_partition_places() BINDING = PROC_BIND_SPREAD
==========================================
[RANK:0,TID:0] Number of threads = 2
[RANK:1,TID:0] Number of threads = 2
[RANK:2,TID:0] Number of threads = 2
[RANK:3,TID:0] Number of threads = 2
==========================================
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77448) Rank: 0 places: {0}, {1}, {2}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77449) Rank: 1 places: {3}, {4}, {5}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77450) Rank: 2 places: {6}, {7}, {8}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77451) Rank: 3 places: {9}, {10}, {11}
==========================================
[RANK:0,TID:0,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 0
[RANK:0,TID:0,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 1
[RANK:0,TID:0,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 2
[RANK:0,TID:1,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 0
[RANK:0,TID:1,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 1
[RANK:0,TID:1,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 2
[RANK:1,TID:0,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 3
[RANK:1,TID:0,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 4
[RANK:1,TID:0,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 5
[RANK:1,TID:1,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 3
[RANK:1,TID:1,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 4
[RANK:1,TID:1,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 5
[RANK:2,TID:0,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 6
[RANK:2,TID:0,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 7
[RANK:2,TID:0,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 8
[RANK:2,TID:1,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 6
[RANK:2,TID:1,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 7
[RANK:2,TID:1,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 8
[RANK:3,TID:0,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 9
[RANK:3,TID:0,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 10
[RANK:3,TID:0,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 11
[RANK:3,TID:1,PLACE:0] num_places = 3, places_num_procs = 1, places_procids = 9
[RANK:3,TID:1,PLACE:1] num_places = 3, places_num_procs = 1, places_procids = 10
[RANK:3,TID:1,PLACE:2] num_places = 3, places_num_procs = 1, places_procids = 11
beaker:$ ./SHOW-RESULTS.sh 6rank
==========================================
[RANK:0,TID:0] Number of threads = 1
[RANK:1,TID:0] Number of threads = 1
[RANK:2,TID:0] Number of threads = 1
[RANK:3,TID:0] Number of threads = 1
[RANK:4,TID:0] Number of threads = 1
[RANK:5,TID:0] Number of threads = 1
==========================================
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77490) Rank: 0 places: {0}, {1}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77491) Rank: 1 places: {2}, {3}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77492) Rank: 2 places: {4}, {5}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77493) Rank: 3 places: {6}, {7}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77494) Rank: 4 places: {8}, {9}
[kmp_runtime.cpp:__kmp_register_library_startup:6638] OMP-RT (pid:77495) Rank: 5 places: {10}, {11}
==========================================
[RANK:0,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 0
[RANK:0,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 1
[RANK:1,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 2
[RANK:1,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 3
[RANK:2,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 4
[RANK:2,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 5
[RANK:3,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 6
[RANK:3,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 7
[RANK:4,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 8
[RANK:4,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 9
[RANK:5,TID:0,PLACE:0] num_places = 2, places_num_procs = 1, places_procids = 10
[RANK:5,TID:0,PLACE:1] num_places = 2, places_num_procs = 1, places_procids = 11
beaker:$
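For anyone wanting to reproduce the [RANK:...,TID:...,PLACE:...] lines above, here is a rough sketch of how they could be generated with the standard OpenMP place-query routines (omp_get_num_places(), omp_get_place_num_procs(), omp_get_place_proc_ids()). This is not the MOC code from PR-2/PR-3: format_place_line() and print_places() are hypothetical names, the 64-id cap is arbitrary, and joining multiple proc ids with commas is an assumption, since the logs only ever show one id per place.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical formatter: builds one line shaped like the log lines above.
 * Returns the line length, or -1 if `buf` is too small. */
int format_place_line(char *buf, size_t len, int rank, int tid, int place,
                      int num_places, const int *ids, int nids)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len,
                     "[RANK:%d,TID:%d,PLACE:%d] num_places = %d, "
                     "places_num_procs = %d, places_procids = ",
                     rank, tid, place, num_places, nids);
    if (n < 0 || (size_t)n >= len)
        return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < nids; i++) {
        /* Assumption: several ids in one place are comma-separated. */
        n = snprintf(buf + used, len - used, "%s%d", i ? "," : "", ids[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

#ifdef _OPENMP
/* Call from inside a parallel region; `rank` comes from the MPI/PMIx layer. */
void print_places(int rank)
{
    enum { MAX_IDS = 64 };         /* arbitrary cap for this sketch */
    int ids[MAX_IDS];
    char line[512];
    int num_places = omp_get_num_places();

    for (int p = 0; p < num_places; p++) {
        int nids = omp_get_place_num_procs(p);
        if (nids < 0 || nids > MAX_IDS)
            continue;              /* skip places that do not fit the cap */
        omp_get_place_proc_ids(p, ids);
        if (format_place_line(line, sizeof line, rank, omp_get_thread_num(),
                              p, num_places, ids, nids) > 0)
            puts(line);
    }
}
#endif
```

The formatter is kept separate from the OpenMP calls so it can be checked without an OpenMP-enabled build; print_places() is only compiled when -fopenmp is used.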
from libomp.
This needs more testing, and I'm not sure the places/binding are what we want. But it is a start. :-)
from libomp.
So does it mean that __kmp_partition_places() is not a suitable place to set the number of places? Also, is it certain that by setting the OMP_PLACES at the proposed location, the team and the master_th structures will be correctly set so that the appropriate work will happen in __kmp_partition_places()? Maybe I am missing something but at the moment, we only know for sure that the OMP_PLACES env variable is set based on what we are trying to do, not that the actual partitioning and place creation is correct.
In other words, what is the reason for not looking at __kmp_partition_places()?
from libomp.
This was just a first step, setting OMP_PLACES. I think the info I have in startup will be used in __kmp_partition_places() for a better next step. I didn't fully understand the relationship between binding and the placement pieces in __kmp_partition_places(), so I broke it into two steps.
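To make the first step concrete, here is a minimal sketch of exporting a computed places string through the environment. export_places() is a hypothetical name, and letting an existing user-provided OMP_PLACES win is an assumption about policy, not something decided in this thread. It also only helps if it runs before the runtime reads its environment settings, which is exactly the ordering question raised above for __kmp_register_library_startup().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: export OMP_PLACES unless the user already set it.
 * Returns 1 if we set it, 0 if an existing value was kept, -1 on error. */
int export_places(const char *places)
{
    assert(places != NULL);
    const char *user = getenv("OMP_PLACES");
    if (user != NULL && strlen(user) > 0)
        return 0;                 /* assumption: a user-provided value wins */
    return setenv("OMP_PLACES", places, 1) == 0 ? 1 : -1;
}
```

Setting the variable does not by itself show that the team and master thread structures end up partitioned as intended; that still has to be checked in __kmp_partition_places().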
from libomp.