
Comments (8)

rhc54 commented on June 6, 2024

My apologies - I wasn't entirely complete in my answers. I went back and looked at the configure code as I was a little bothered by the --with-pbs option being required. It isn't quite that simple. Because we don't link to a library to obtain the allocation, the resource discovery component automatically builds unless the user requests that it not be built. Specifically:

  • if configured --without-pbs or --with-pbs=no, then the PBS resource discovery component will not be built
  • if configured --with-pbs, or --with-pbs=yes, or if no option was specified, then the component will be built if we are on a Linux, AIX, or OSX system. AFAIK, those are the only environments supported by PBS
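
As a rough sketch of the two rules above (this is not the actual configure macro; the function name `pbs_component_decision` and the `build`/`skip` strings are made up for illustration, and treating any value other than `no` like `yes` is an assumption):

```shell
# Hypothetical helper that mirrors the rules listed above.
# $1: value of --with-pbs ("" if the option was not given, "yes", or "no")
# $2: operating system name as printed by `uname -s`
pbs_component_decision() {
    with_pbs=$1
    os=$2
    # --without-pbs is equivalent to --with-pbs=no
    if [ "$with_pbs" = "no" ]; then
        echo "skip"
        return 0
    fi
    # "yes" or no option at all: build only on the supported systems.
    # OSX reports itself as "Darwin" in `uname -s`.
    case "$os" in
        Linux|AIX|Darwin) echo "build" ;;
        *)                echo "skip" ;;
    esac
}

# Decision for the current machine when no option is given
pbs_component_decision "" "$(uname -s)"
```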

The TM launch component requires a library, and so we have to find it. Thus, its configure logic is more complicated:

  • if configured --without-tm or --with-tm=no, then the TM launch component will not be built
  • if configured --with-tm, or --with-tm=yes, then we will search the default locations for the TM library - the launch component will be built if the library is found. The configure procedure will fail if the library is not found since support was specifically requested
  • if no option is provided, then we will search the default locations for the TM library - the launch component will be built if the library is found. Configure will continue and the component will not be built if the library is not found
  • if configured --with-tm=<foo>, then we will search only the specified location - if the library is found there, then the component will be built, otherwise configure will fail
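
The four cases above can be summarized in a small sketch. This is only an illustration, not the real configure code: the function name `tm_component_decision`, the `found` marker, and the `build`/`skip`/`fail` strings are all hypothetical.

```shell
# Hypothetical helper that mirrors the four cases listed above.
# $1: value of --with-tm ("" if not given, "yes", "no", or a directory)
# $2: "found" if the TM library was located in the searched location(s)
tm_component_decision() {
    with_tm=$1
    lib_status=$2
    case "$with_tm" in
        no)
            # --without-tm or --with-tm=no: never build
            echo "skip" ;;
        "")
            # No option: build opportunistically, never fail configure
            if [ "$lib_status" = "found" ]; then echo "build"; else echo "skip"; fi ;;
        *)
            # "yes" searches the default locations, a directory searches only
            # there; either way support was requested, so a miss is fatal
            if [ "$lib_status" = "found" ]; then echo "build"; else echo "fail"; fi ;;
    esac
}

tm_component_decision "" "missing"
tm_component_decision "yes" "missing"
```

The only difference between giving no option and giving `--with-tm` is what happens when the library is missing: the first quietly skips the component, the second stops configure.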

Note that we also deal with the following change that occurred:

    # Note that Torque 2.1.0 changed the name of their back-end
    # library to "libtorque".  So we have to check for both libpbs and
    # libtorque.
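
To illustrate the idea of accepting either library name, here is a minimal sketch. It only checks for files in a directory, whereas a real configure script would normally run a link test; the function name `find_tm_library` and the order in which the two names are tried are assumptions.

```shell
# Hypothetical helper: report which TM back-end library name exists in a
# directory. Prints "torque" or "pbs" and returns 0, or returns 1 if neither
# libtorque.* nor libpbs.* is present.
find_tm_library() {
    libdir=$1
    for name in torque pbs; do
        for f in "$libdir"/lib"$name".*; do
            # An unmatched glob stays literal, so test that the file exists
            if [ -e "$f" ]; then
                echo "$name"
                return 0
            fi
        done
    done
    return 1
}
```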

Sorry this sounds complex - it is an attempt to build PBS scheduler and Torque launch support wherever possible.

Please note that I explicitly turned over maintenance of the resource manager (RM) specific components in PRRTE to the respective RM companies a couple of years ago as I attempt to retire (your rep was at the meetings that discussed this, and I'm pretty sure he at least tried to pass it along - but I know how such things can fall through the cracks). So you folks technically "own" these components - which means you can change this logic as you see fit. The last changes I see were done by me in 2022.

from ompi.

rhc54 commented on June 6, 2024

Altair changed the way they package PBSPro with respect to Torque and therefore changed the configure logic in PRRTE (which is used in OMPI v5 while OMPI v4 has the old logic in ORTE). You need to ensure you have installed the Torque library - it is missing or installed in a different location on your system. The correct options would now be:

--with-pbs=<pbs-location> --with-tm=<torque-location>


scc138 commented on June 6, 2024

Thanks for the quick information, I confirm that replacing
options="--with-tm=/opt/pbs/"
with
options="--with-pbs=/opt/pbs"
allows configure to succeed and results in a completed build, though I haven't gotten things working yet. I'll ignore that for the purposes of this issue ticket, though.

In your comment are you saying I need to have BOTH --with-pbs= and --with-tm=? I don't have torque installed on this system.

I am still puzzled about what you think changed in PBS Professional, as well. We (I am with Altair) didn't make any significant packaging change that I am aware of, and in the above example I kept the PBS Pro version the same (2024.1.0, the most current release) and changed only the OpenMPI version. If you can provide any more detail I would really appreciate it.

After seeing your message I went looking for --with-pbs in the online OpenMPI docs and don't see any mention. https://docs.open-mpi.org/en/main/installing-open-mpi/configure-cli-options/runtime.html still says:

--with-tm=DIR: Specify the directory where the TM libraries and header files are located. This option is generally only necessary if the TM headers and libraries are not in default compiler/linker search paths.

TM is the support library for the Torque and PBS Pro resource manager systems, both of which are frequently used as a batch scheduler in HPC systems.

And the information that I (only just now, I fully admit) see in configure --help also seems a bit off:

root@PBSServer:~/openmpi-5.0.3# ./configure --help | grep -i pbs
  --with-pbs              Build PBS scheduler component (default: yes)
  --with-tm(=DIR)         Build TM (Torque, PBSPro, and compatible) support,

--with-pbs looks like a Boolean here, rather than something that accepts a path to a directory, and --with-tm still mentions PBS Pro.

Thanks!


rhc54 commented on June 6, 2024

Hey Scott!! I should have recognized the name - my apologies.

Things changed a few years back - not terribly long ago, but still not last week kind of thing. It used to be that the Torque launch support and the PBS scheduler were commingled, so if we built one we built both. We then started hitting places where someone had PBS installed, but not Torque - apparently the packaging changed where that was now possible.

So we altered our configure logic to mirror that new reality by adding a --with-pbs flag alongside the --with-tm one. We don't link against any PBS libraries, so --with-pbs is indeed just a binary for now - if we someday do need to link against a lib, we can extend the configure support for that option. The --with-tm option needs to tell us where to find the Torque library.

This was all initially encountered in PRRTE and so it only made its way into OMPI v5. I don't think the OMPI folks made the changes in the OMPI v4 series, but I haven't really been tracking them.


rhc54 commented on June 6, 2024

So just to be clear:

  • --with-pbs means to build the PBS scheduler support. It builds the PRRTE component that picks up the allocation, plus the PBS support component in PMIx, assuming it was populated (it currently is not). The latter is to enable support for the PMIx scheduler integration.
  • --with-tm=<foo> means to build the Torque launcher support and points to where the Torque library can be found. Without it, some people use the ALPS or PALS launchers in place of Torque, depending on their system. If nothing else is available, we default to ssh.

Hope that helps.


subhasisb commented on June 6, 2024

Hi, so you are really saying that we need both --with-pbs and --with-tm to build with PBS + task manager launcher integration. --with-pbs will build PBS PMIx support, but OpenMPI will launch using ssh, so --with-tm will be needed to ask OpenMPI to launch via the PBS task manager interface.

Sorry for being repetitive, but would you please confirm the above statement?


rhc54 commented on June 6, 2024

Yes, that is correct - with one change. The --with-pbs option also builds the PBS PRRTE support for detecting and parsing the PBS allocation into PRRTE's resource tracker.
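
Put together, a configure invocation along these lines might look like the sketch below. The `TM_DIR` variable and its placeholder path are hypothetical and must be replaced with wherever the TM library and headers actually live on the system; the command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical install location of the TM library and headers; adjust as needed
TM_DIR=${TM_DIR:-/path/to/tm}

# --with-pbs      : boolean, builds the PBS allocation detection support
# --with-tm=DIR   : builds the TM launcher and says where to find the library
CONFIGURE_ARGS="--with-pbs --with-tm=$TM_DIR"

# Print the command instead of running it
echo "./configure $CONFIGURE_ARGS"
```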


subhasisb commented on June 6, 2024

Thanks for the detailed explanation, Ralph. We will try this out and report back further. I am not sure we (anybody from the PM/engineering teams) were aware of owning the RM component; I guess we will follow up about that separately.

