Comments (8)
My apologies - I wasn't entirely complete in my answers. I went back and looked at the configure code as I was a little bothered by the --with-pbs option being required. It isn't quite that simple. Because we don't link to a library to obtain the allocation, the resource discovery component automatically builds unless the user requests that it not be built. Specifically:
- if configured --without-pbs or --with-pbs=no, then the PBS resource discovery component will not be built
- if configured --with-pbs, or --with-pbs=yes, or if no option was specified, then the component will be built if we are on a Linux, AIX, or OSX system. AFAIK, those are the only environments supported by PBS
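The two cases above can be sketched as a small shell function. This is only an illustration of the described behavior, not the actual PRRTE configure macro: the function name `pbs_component_decision` is hypothetical, and the "skip" result on other operating systems is an assumption, since the thread does not say what happens there.

```shell
#!/bin/sh
# Hypothetical sketch of the --with-pbs decision; not the actual PRRTE macro.
# $1: value passed to --with-pbs: "yes", "no", or "" when no option was given
# $2: kernel name as printed by `uname -s` (OSX reports "Darwin")
pbs_component_decision() {
    if [ "$1" = "no" ]; then
        echo "skip"
        return 0
    fi
    case "$2" in
        Linux|AIX|Darwin) echo "build" ;;
        # Assumption: the thread does not say what happens on other systems.
        *) echo "skip" ;;
    esac
}

# Decision for the current machine when no option is given.
pbs_component_decision "" "$(uname -s)"
```

Note that `yes` and the empty value take the same path, which matches the "default: yes" shown by configure --help.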
The TM launch component requires a library, and so we have to find it. Thus, its configure logic is more complicated:
- if configured --without-tm or --with-tm=no, then the TM launch component will not be built
- if configured --with-tm, or --with-tm=yes, then we will search the default locations for the TM library - the launch component will be built if the library is found. The configure procedure will fail if the library is not found since support was specifically requested
- if no option is provided, then we will search the default locations for the TM library - the launch component will be built if the library is found. Configure will continue and the component will not be built if the library is not found
- if configured --with-tm=<foo>, then we will search only the specified location - if the library is found there, then the component will be built, otherwise configure will fail
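The four TM cases reduce to a small decision table, sketched below. The function name `tm_component_decision` and the "found"/"missing" argument are hypothetical; the real configure code performs the library search itself, which this sketch takes as an input.

```shell
#!/bin/sh
# Hypothetical sketch of the --with-tm decision; not the actual configure code.
# $1: value passed to --with-tm: "" (no option), "yes", "no", or a directory
# $2: "found" or "missing", the result of the library search
#     (default locations for "" and "yes", only the given directory otherwise)
tm_component_decision() {
    case "$1" in
        no)
            echo "skip" ;;
        "")
            # Not explicitly requested: a missing library is not an error.
            if [ "$2" = "found" ]; then echo "build"; else echo "skip"; fi ;;
        *)
            # "yes" or a directory: support was requested, so missing is fatal.
            if [ "$2" = "found" ]; then echo "build"; else echo "fail"; fi ;;
    esac
}

tm_component_decision yes missing   # prints: fail
```

The only difference between giving no option and giving --with-tm is what happens when the library is missing: the first quietly skips the component, the second aborts configure.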
Note that we also deal with the following change that occurred:
# Note that Torque 2.1.0 changed the name of their back-end
# library to "libtorque". So we have to check for both libpbs and
# libtorque.
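As a rough illustration of checking for both library names, the sketch below looks for either one under a given prefix. The helper name `has_tm_lib`, the lib and lib64 subdirectories, and the file suffixes are all assumptions made for the example; the real configure test may instead try to link against the library.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $1 holds a file that looks like
# the TM library under either its newer (libtorque) or older (libpbs) name.
# The lib/lib64 subdirectories and the suffixes below are assumptions.
has_tm_lib() {
    for name in libtorque libpbs; do
        for sub in lib lib64; do
            for ext in so a dylib; do
                if [ -e "$1/$sub/$name.$ext" ]; then
                    return 0
                fi
            done
        done
    done
    return 1
}

# /opt/pbs is the install prefix used elsewhere in this thread.
if has_tm_lib /opt/pbs; then
    echo "TM library found"
else
    echo "TM library not found"
fi
```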
Sorry this sounds complex - it is an attempt to build PBS scheduler and Torque launch support wherever possible.
Please note that I explicitly turned over maintenance of the RM-specific components in PRRTE to the respective RM companies a couple of years ago as I attempt to retire (your rep was at the meetings that discussed this, and I'm pretty sure he at least tried to pass it along - but I know how such things can fall through the cracks). So you folks technically "own" these components, which means you can change this logic as you see fit. The last changes I see were done by me in 2022.
from ompi.
Altair changed the way they package PBSPro with respect to Torque and therefore changed the configure logic in PRRTE (which is used in OMPI v5 while OMPI v4 has the old logic in ORTE). You need to ensure you have installed the Torque library - it is missing or installed in a different location on your system. The correct options would now be:
--with-pbs=<pbs-location> --with-tm=<torque-location>
from ompi.
Thanks for the quick information, I confirm that replacing
options="--with-tm=/opt/pbs/"
with
options="--with-pbs=/opt/pbs"
allows configure to succeed and results in a completed build, though I haven't gotten things working yet. I'll ignore that for the purposes of this issue ticket, though.
In your comment are you saying I need to have BOTH --with-pbs= and --with-tm=? I don't have torque installed on this system.
I am still puzzled about what you think changed in PBS Professional, as well. We (I am with Altair) didn't make any significant packaging change that I am aware of, and in the above example I kept the PBS Pro version the same (2024.1.0, the most current release) and changed only the OpenMPI version. If you can provide any more detail I would really appreciate it.
After seeing your message I went looking for --with-pbs in the online OpenMPI docs and don't see any mention. https://docs.open-mpi.org/en/main/installing-open-mpi/configure-cli-options/runtime.html still says:
--with-tm=DIR: Specify the directory where the TM libraries and header files are located. This option is generally only necessary if the TM headers and libraries are not in default compiler/linker search paths.
TM is the support library for the Torque and PBS Pro resource manager systems, both of which are frequently used as a batch scheduler in HPC systems.
And the information that I (only just now, I fully admit) see in configure --help also seems a bit off:
root@PBSServer:~/openmpi-5.0.3# ./configure --help | grep -i pbs
--with-pbs Build PBS scheduler component (default: yes)
--with-tm(=DIR) Build TM (Torque, PBSPro, and compatible) support,
--with-pbs looks like a Boolean here, rather than something that accepts a path to a directory, and --with-tm still mentions PBS Pro.
Thanks!
from ompi.
Hey Scott!! I should have recognized the name - my apologies.
Things changed a few years back - not terribly long ago, but still not last week kind of thing. It used to be that the Torque launch support and the PBS scheduler were commingled, so if we built one we built both. We then started hitting places where someone had PBS installed, but not Torque - apparently the packaging changed where that was now possible.
So we altered our configure logic to mirror that new reality by adding a --with-pbs flag alongside the --with-tm one. We don't link against any PBS libraries, so --with-pbs is indeed just a binary for now - if we someday do need to link against a lib, we can extend the configure support for that option. The --with-tm option needs to tell us where to find the Torque library.
This was all initially encountered in PRRTE and so it only made its way into OMPI v5. I don't think the OMPI folks made the changes in the OMPI v4 series, but I haven't really been tracking them.
from ompi.
So just to be clear:
- --with-pbs means to build the PBS scheduler support. Builds the PRRTE component that picks up the allocation and the PBS support component in PMIx, assuming it was populated (currently is not). The latter is to enable support for the PMIx scheduler integration.
- --with-tm=<foo> means to build the Torque launcher support and points to where the Torque library can be found. Minus that, some people use the ALPS or PALS launchers in place of Torque, depending on their system. If nothing else is available, we default to ssh.
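Put together, the options one would pass depend on whether a TM library is installed. The sketch below only assembles the option string and does not run configure; the helper name `pbs_configure_args` is hypothetical, and /opt/pbs is the prefix mentioned earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the configure options for PBS support.
# $1: directory holding the TM library, or "" if none is installed
pbs_configure_args() {
    if [ -n "$1" ]; then
        # Scheduler support plus the TM launcher.
        echo "--with-pbs --with-tm=$1"
    else
        # Scheduler support only; per the thread, launching then uses
        # another available launcher, defaulting to ssh.
        echo "--with-pbs"
    fi
}

# Intended use: ./configure $(pbs_configure_args /opt/pbs)
pbs_configure_args /opt/pbs
```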
Hope that helps.
from ompi.
hi, so you are really saying that we need both --with-pbs and --with-tm to build with PBS + task manager launcher integration. --with-pbs will build PBS PMIx support, but OpenMPI will launch using ssh, so --with-tm will be needed to ask OpenMPI to launch via the PBS task manager interface.
Sorry for being repetitive, but would you please confirm the above statement?
from ompi.
Yes, that is correct - with one change. The --with-pbs option also builds the PBS PRRTE support for detecting and parsing the PBS allocation into PRRTE's resource tracker.
from ompi.
Thanks for the detailed explanation, Ralph. We will try this out and report back further. I am not sure we (anybody from the PM/engineering teams) were aware of owning the RM component; I guess we will follow up about that separately.
from ompi.