Comments (6)
There is a maintained version of Metis and ParMetis. Maybe you can try them out.
from normalmodes.
Looks like they have the same code problem.
I'd be happy to report the issue there, but there does not appear to be a way to open an issue on that BitBucket repository...?
Sorry, I don't know. I thought they might have solved this problem. It's a pity they didn't.
I think that this is going to be resolved as a bad user build of the overall stack (i.e., I think there might have been some remnants of a mixed Intel MPI + Open MPI build in there somewhere). Doing a completely clean, reproducible build from scratch seems to have fixed the MPI handle munging issue.
With a little more testing to convince myself, I'm re-opening this issue.
Although we were initially incorrect about the reason, the end effect is the same: ParMETIS is getting a Fortran MPI communicator handle, and Open MPI (rightfully) invokes an MPI exception.
The mechanism for how this is happening is a little different than we thought, however.
Specifically, ParMETIS does have Fortran API entry points that invoke the correct MPI "f2c" conversion of handles. However, NormalModes is somehow bypassing those ParMETIS Fortran API entry points and directly invoking the ParMETIS C API entry point. This leads to MPI handles not being converted from Fortran to C properly, and therefore Open MPI (rightfully) invokes an MPI exception (which, by default, aborts the job).
You can see the call stack in gdb, for example (I inserted a call to sleep(3) in the ParMETIS function ParMETIS_V3_PartKway() so that I could attach a debugger and see what was going on):
(gdb) bt
#0 0x00000034916acced in nanosleep () from /lib64/libc.so.6
#1 0x00000034916acb60 in sleep () from /lib64/libc.so.6
#2 0x00000000006e3d58 in ParMETIS_V3_PartKway (vtxdist=0x2aaac6adb600, xadj=0x2aaac6ab7ec0, adjncy=0x2aaac6ae4b80, vwgt=0x2aaac6adb740, adjwgt=0x0, wgtflag=0x7fffffffa900, numflag=0x7fffffffa8f0, ncon=0x7fffffffa920, nparts=0x7fffffffa930, tpwgts=0x2aaac6aa3f00, ubvec=0x2aaac6acbe20, options=0x2d32b20 <geometry_mod_mp_pnm_apply_parmetis_$OPTIONS>, edgecut=0x7fffffffa910, part=0x2aaac6ab7d80, comm=0x7fffffffa940) at /home/jsquyres/apps/parmetis/4.0.3/src/parmetis-4.0.3/libparmetis/kmetis.c:41
#3 0x00000000004c911a in geometry_mod::pnm_apply_parmetis () at mod_geometry.f90:757
#4 0x000000000048ccd9 in geometry_mod::build_geometry () at mod_geometry.f90:67
#5 0x00000000006e3b6b in cg_evsl () at mainnm.f90:31
#6 0x0000000000432482 in main ()
#7 0x000000349161ed1d in __libc_start_main () from /lib64/libc.so.6
#8 0x0000000000432329 in _start ()
You can see that geometry_mod::pnm_apply_parmetis() is directly invoking the ParMETIS C API entry point ParMETIS_V3_PartKway(), instead of going through any of ParMETIS's Fortran API entry points:
$ nm libparmetis.a | grep ParMETIS_V3_PartKway
0000000000000964 T PARMETIS_V3_PARTKWAY
U ParMETIS_V3_PartKway
0000000000000a44 T parmetis_v3_partkway
0000000000000b24 T parmetis_v3_partkway_
0000000000000c04 T parmetis_v3_partkway__
0000000000000000 T ParMETIS_V3_PartKway
U ParMETIS_V3_PartKway
You can see the usual convention of parmetis_v3_partkway[_[_]] and PARMETIS_V3_PARTKWAY Fortran entry points. These C functions all call the MPI "f2c" conversions before calling the C ParMETIS_V3_PartKway() function.
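To make the wrapper pattern concrete, here is a minimal sketch in C of what such a Fortran entry point does. It deliberately avoids <mpi.h> so that it compiles anywhere: every toy_* name below is a hypothetical stand-in, not the actual ParMETIS or MPI code. In a real wrapper the conversion step would be MPI_Comm_f2c().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MPI types; a real build would use <mpi.h>. */
typedef int toy_fint;                  /* Fortran handle: a plain INTEGER   */
struct toy_comm { int size; };
typedef struct toy_comm *toy_comm_t;   /* C handle: a pointer to an object  */

static struct toy_comm toy_world = { 4 };
/* The index into this table is the Fortran handle value. */
static toy_comm_t toy_handle_table[] = { &toy_world };

/* Plays the role of MPI_Comm_f2c(): integer handle -> C handle. */
static toy_comm_t toy_comm_f2c(toy_fint f)
{
    size_t n = sizeof toy_handle_table / sizeof toy_handle_table[0];
    return (f >= 0 && (size_t)f < n) ? toy_handle_table[f] : NULL;
}

/* C API entry point (mixed case): expects a pointer to a C handle. */
int Toy_PartKway(const toy_comm_t *comm)
{
    return (*comm)->size;
}

/* Fortran API entry point (lowercase, trailing underscore).
 * Fortran passes arguments by reference, so the handle arrives as a
 * pointer to an integer; it is converted before the C entry point runs. */
int toy_partkway_(const toy_fint *fcomm)
{
    toy_comm_t ccomm = toy_comm_f2c(*fcomm);
    return Toy_PartKway(&ccomm);
}
```

Calling toy_partkway_() with a pointer to the integer 0 reaches Toy_PartKway() with a valid C handle. Calling Toy_PartKway() directly with that same integer's address, which is what the backtrace above shows, skips the conversion entirely.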
I'm not enough of a Fortran expert to know why this is happening, but I suspect that NormalModes' use of "use ISO_C_BINDING" has a role to play here (i.e., it might be bypassing the "usual convention" Fortran symbol munging and directly invoking the back-end symbol instead, which has the effect of invoking ParMETIS' C API entry point rather than its Fortran API entry points).
Even though this is quite definitely wrong, it just happens to work with MPICH for the reasons previously cited: MPICH's MPI handles are integers in both C and Fortran. Open MPI's handles are pointers in C; if a Fortran MPI handle is passed to an Open MPI C function, it will (rightfully) go "kaboom".
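As a rough illustration of why the missing conversion is harmless in one case and fatal in the other, here is a toy model in C. It is a sketch under simplifying assumptions, not the real layout of either MPI implementation: all type and function names are hypothetical, and the "pointer" model just uses a small lookup table.

```c
#include <assert.h>
#include <stdint.h>

typedef int fhandle;                 /* Fortran INTEGER handle */

/* Model A: the C handle is also an int, so f2c is the identity. */
typedef int int_comm;
static int_comm int_f2c(fhandle f) { return (int_comm)f; }

/* Model B: the C handle is a pointer, so f2c is a table lookup. */
struct comm_obj { int nranks; };
typedef struct comm_obj *ptr_comm;
static struct comm_obj objs[2] = { { 1 }, { 4 } };
static ptr_comm ptr_f2c(fhandle f)
{
    return (f >= 0 && f < 2) ? &objs[f] : (ptr_comm)0;
}

/* "Skipping the conversion" means reinterpreting the Fortran integer
 * as if it already were a C handle. */
static int_comm int_skip(fhandle f) { return (int_comm)f; }
static ptr_comm ptr_skip(fhandle f) { return (ptr_comm)(intptr_t)f; }

/* Returns 1 if skipping the conversion yields the same handle as f2c. */
int skip_is_safe_int(fhandle f) { return int_skip(f) == int_f2c(f); }
int skip_is_safe_ptr(fhandle f) { return ptr_skip(f) == ptr_f2c(f); }
```

In model A the skipped conversion still yields the right handle, which matches the "happens to work" behavior described above. In model B the result is a bogus pointer that no library could safely dereference, so raising an error is the only sane response.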
You can argue who is at fault here (ParMETIS or NormalModes), but ParMETIS looks like it is abandonware, so the chances of something being fixed in it are pretty low.
Regardless, until something is changed, NormalModes will not work with any version of Open MPI because of this issue.
PR #4 opened to address this issue.