Comments (6)

Icarusradio avatar Icarusradio commented on September 28, 2024

There is a maintained version of Metis and ParMetis. Maybe you can try them out.

from normalmodes.

jsquyres avatar jsquyres commented on September 28, 2024

Looks like they have the same code problem.

I'd be happy to report the issue there, but there does not appear to be a way to open an issue on that BitBucket repository...?

[Attached screenshot: Screen Shot 2019-10-29 at 9 18 45 PM]


Icarusradio avatar Icarusradio commented on September 28, 2024

Sorry, I don't know. I thought they might have solved this problem. It's a pity they didn't.


jsquyres avatar jsquyres commented on September 28, 2024

I think that this is going to be resolved as a bad user build of the overall stack (i.e., I think there might have been some remnants of a mixed Intel MPI + Open MPI build in there somewhere). Doing a completely clean, reproducible build from scratch seems to have fixed the MPI handle munging issue.


jsquyres avatar jsquyres commented on September 28, 2024

With a little more testing to convince myself, I'm re-opening this issue.

Although we were initially incorrect about the reason, the end effect is the same: ParMETIS is getting a Fortran MPI communicator handle, and Open MPI (rightfully) invokes an MPI exception.

The mechanism for how this is happening is a little different than we thought, however.


Specifically, ParMETIS does have Fortran API entry points that invoke the correct MPI "f2c" conversion of handles. However, NormalModes is somehow bypassing those ParMETIS Fortran API entry points and directly invoking the ParMETIS C API entry point. This leads to MPI handles not being converted from Fortran to C properly, and therefore Open MPI (rightfully) invokes an MPI exception (which, by default, aborts the job).
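To make the two call paths concrete, here is a minimal sketch in C. It is a toy model, not the real MPI or ParMETIS API: every toy_* name is hypothetical, and the "communicator" is just a struct holding a size. It assumes C handles are pointers and Fortran handles are integer indices into a lookup table, which is the situation described above for Open MPI. The real bug reads the Fortran integer's storage as if it were a pointer; to keep the sketch well-defined, it models that by casting the integer's value to a pointer without ever dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all toy_* names are hypothetical, not real MPI/ParMETIS. */
typedef struct toy_comm { int size; } toy_comm;
typedef toy_comm *toy_c_handle;   /* C handle: a pointer */
typedef int toy_f_handle;         /* Fortran handle: a plain integer */

static toy_comm toy_world = { 4 };
static toy_comm *toy_table[] = { &toy_world };  /* Fortran index -> C object */
enum { TOY_TABLE_LEN = sizeof toy_table / sizeof toy_table[0] };

/* Stand-in for an "f2c" conversion: look the integer up in the table. */
toy_c_handle toy_comm_f2c(toy_f_handle f)
{
    return (f >= 0 && f < TOY_TABLE_LEN) ? toy_table[f] : NULL;
}

/* A handle is valid only if it points at a registered communicator. */
static int toy_is_valid(toy_c_handle h)
{
    for (int i = 0; i < TOY_TABLE_LEN; i++)
        if (toy_table[i] == h)
            return 1;
    return 0;
}

/* C entry point: expects a C handle. Returns -1 for an invalid handle
   (a real MPI library would raise an error instead). */
int toy_partkway(const toy_c_handle *comm)
{
    assert(comm != NULL);
    return toy_is_valid(*comm) ? (*comm)->size : -1;
}

/* Fortran entry point: converts the handle, then calls the C entry point. */
int toy_partkway_f(const toy_f_handle *fcomm)
{
    toy_c_handle c = toy_comm_f2c(*fcomm);
    return toy_partkway(&c);
}

/* Bypass: the Fortran integer reaches the C entry point unconverted. */
int toy_partkway_bypass(const toy_f_handle *fcomm)
{
    /* Integer value treated as a pointer; compared, never dereferenced. */
    toy_c_handle bogus = (toy_c_handle)(uintptr_t)*fcomm;
    return toy_partkway(&bogus);
}
```

With a Fortran handle of 0, toy_partkway_f() finds the communicator, while toy_partkway_bypass() hands the C entry point a value that matches no registered communicator and is rejected.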

You can see the call stack in gdb, for example (I inserted a call to sleep(3) in the ParMETIS function ParMETIS_V3_PartKway() so that I could attach a debugger and see what was going on):

(gdb) bt
#0  0x00000034916acced in nanosleep () from /lib64/libc.so.6
#1  0x00000034916acb60 in sleep () from /lib64/libc.so.6
#2  0x00000000006e3d58 in ParMETIS_V3_PartKway (vtxdist=0x2aaac6adb600, xadj=0x2aaac6ab7ec0, adjncy=0x2aaac6ae4b80, vwgt=0x2aaac6adb740, adjwgt=0x0, wgtflag=0x7fffffffa900, numflag=0x7fffffffa8f0, ncon=0x7fffffffa920, nparts=0x7fffffffa930, tpwgts=0x2aaac6aa3f00, ubvec=0x2aaac6acbe20, options=0x2d32b20 <geometry_mod_mp_pnm_apply_parmetis_$OPTIONS>, edgecut=0x7fffffffa910, part=0x2aaac6ab7d80, comm=0x7fffffffa940) at /home/jsquyres/apps/parmetis/4.0.3/src/parmetis-4.0.3/libparmetis/kmetis.c:41
#3  0x00000000004c911a in geometry_mod::pnm_apply_parmetis () at mod_geometry.f90:757
#4  0x000000000048ccd9 in geometry_mod::build_geometry () at mod_geometry.f90:67
#5  0x00000000006e3b6b in cg_evsl () at mainnm.f90:31
#6  0x0000000000432482 in main ()
#7  0x000000349161ed1d in __libc_start_main () from /lib64/libc.so.6
#8  0x0000000000432329 in _start ()

You can see that geometry_mod::pnm_apply_parmetis() is directly invoking the ParMETIS C API entry point ParMETIS_V3_PartKway(), instead of going through any of ParMETIS's Fortran API entry points:

$ nm libparmetis.a | grep ParMETIS_V3_PartKway
0000000000000964 T PARMETIS_V3_PARTKWAY
                 U ParMETIS_V3_PartKway
0000000000000a44 T parmetis_v3_partkway
0000000000000b24 T parmetis_v3_partkway_
0000000000000c04 T parmetis_v3_partkway__
0000000000000000 T ParMETIS_V3_PartKway
                 U ParMETIS_V3_PartKway

You can see the usual convention of parmetis_v3_partkway[_[_]] and PARMETIS_V3_PARTKWAY Fortran entry points. These C functions all call the MPI "f2c" conversions before calling the C ParMETIS_V3_PartKway() function.
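As a rough illustration of that convention (not ParMETIS source code; all demo_* and Demo_* names here are made up), a library can generate one thin wrapper per Fortran name-mangling variant, each converting the handle before forwarding to the single mixed-case C function. The sketch assumes a toy pointer-style C handle and a toy integer Fortran handle with one known value, 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not ParMETIS source code. */
typedef struct demo_comm { int size; } demo_comm;
typedef demo_comm *demo_c_handle;   /* toy C handle: a pointer */
typedef int demo_f_handle;          /* toy Fortran handle: an integer */

static demo_comm demo_world = { 4 };

/* Stand-in for an MPI "f2c" conversion (one known handle, value 0). */
static demo_c_handle demo_f2c(demo_f_handle f)
{
    return f == 0 ? &demo_world : NULL;
}

/* The C API entry point: mixed-case name, takes a C handle. */
int Demo_V3_PartKway(const demo_c_handle *comm)
{
    assert(comm != NULL);
    return *comm != NULL ? (*comm)->size : -1;
}

/* One Fortran entry point per common compiler name-mangling convention.
   Each converts the handle and then forwards to the C entry point. */
#define DEMO_FORTRAN_ENTRY(name)                 \
    int name(const demo_f_handle *fcomm)         \
    {                                            \
        demo_c_handle c = demo_f2c(*fcomm);      \
        return Demo_V3_PartKway(&c);             \
    }

DEMO_FORTRAN_ENTRY(DEMO_V3_PARTKWAY)
DEMO_FORTRAN_ENTRY(demo_v3_partkway)
DEMO_FORTRAN_ENTRY(demo_v3_partkway_)
DEMO_FORTRAN_ENTRY(demo_v3_partkway__)
```

Whichever of the four symbols a Fortran compiler's default mangling resolves to, the call passes through the conversion. A caller that binds to the exact mixed-case symbol skips all four wrappers.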

I'm not enough of a Fortran expert to know why this is happening, but I suspect that the "use ISO_C_BINDING" statement in NormalModes has a role to play here (i.e., it might be bypassing the "usual convention" Fortran symbol munging and directly invoking the back-end symbol instead, which has the effect of invoking ParMETIS's C API entry point rather than its Fortran API entry points).

Even though this is quite definitely wrong, it just happens to work with MPICH for the reasons previously cited: MPICH's MPI handles are integers in both C and Fortran. Open MPI's handles are pointers in C; if a Fortran MPI handle is passed to an Open MPI C function, it will (rightfully) go "kaboom".
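The "works by accident" case can be sketched the same way. This toy model assumes an implementation whose handles are integers in both languages, as described above for MPICH. The mini_* names and the handle value 7 are invented for illustration and do not correspond to any real MPI library.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an implementation whose handles are integers in both
   C and Fortran. All mini_* names and values are hypothetical. */
typedef int mini_c_handle;
typedef int mini_f_handle;

enum { MINI_COMM_WORLD = 7 };   /* arbitrary made-up handle value */

/* With identical representations, the conversion is the identity. */
mini_c_handle mini_comm_f2c(mini_f_handle f)
{
    return f;
}

/* C entry point: returns a communicator size, or -1 for an unknown handle. */
int mini_partkway(const mini_c_handle *comm)
{
    assert(comm != NULL);
    return *comm == MINI_COMM_WORLD ? 4 : -1;
}

/* Correct path: convert first, then call the C entry point. */
int mini_partkway_f(const mini_f_handle *fcomm)
{
    mini_c_handle c = mini_comm_f2c(*fcomm);
    return mini_partkway(&c);
}

/* Bypass: pass the Fortran integer straight through. Still incorrect in
   principle, but indistinguishable here because no conversion was needed. */
int mini_partkway_bypass(const mini_f_handle *fcomm)
{
    return mini_partkway(fcomm);
}
```

Both paths return the same result in this model, which is why the missing conversion goes unnoticed until the code is linked against an implementation whose C handles are pointers.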

You can argue who is at fault here (ParMETIS or NormalModes), but ParMETIS looks like it is abandonware, so the chances of something being fixed in it are pretty low.

Regardless, until something is changed, NormalModes will not work with any version of Open MPI because of this issue.


jsquyres avatar jsquyres commented on September 28, 2024

PR #4 opened to address this issue.
