ibmspectrumcomputing / lsf-python-api


Location for the LSF Python wrapper for controlling all things LSF

Home Page: http://ibmspectrumcomputing.github.io/lsf-python-api/

License: Eclipse Public License 1.0

Python 25.50% Makefile 2.31% C 9.15% Batchfile 0.87% SWIG 62.18%
python hpc-systems parallel-computing hpc-applications lsf batch-scheduler

lsf-python-api's People

Contributors

adamsla, andretility, anna-singleton, elstak, liuboxa, liyancn, lleiiell, mattaezell, paolostivanin, vassilisvassiliadis, xawangyd, xlqiang-learn

lsf-python-api's Issues

ls_load returning junk values along with hostName

When I retrieve hostName from ls_load, the output looks something like this:
hostname.com\x00e\x02\x00d\x08\x00<e\x08\x00i\x1e\x00e\x04\x00d\x08\x00<e\x0b\x00o\x19\x00\x01e\x0c\x00e\x08

Everything after the host name is garbage.

I want the output to be just hostname.com, like the host name returned by lsb_hostinfo.

How do I ignore the junk values that it is returning?
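
The junk begins at the first \x00, which suggests the value is being read past the end of a NUL-terminated C string. As a workaround sketch only (it does not fix the wrapper itself), the string can be cut at the first NUL on the Python side. The helper name strip_after_nul is hypothetical and not part of the API.

```python
def strip_after_nul(raw):
    """Return the part of raw that precedes the first NUL character."""
    if isinstance(raw, bytes):
        # Bytes input: cut at the first NUL byte, then decode.
        return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    # Text input: cut at the first NUL character.
    return raw.split("\x00", 1)[0]


# Sample value shaped like the one reported above.
raw = "hostname.com\x00e\x02\x00d\x08\x00<e\x08\x00i"
print(strip_after_nul(raw))  # hostname.com
```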

segmentation fault when invoking lsf.lsb_queueinfo

Hi,
I'm trying to run the example code in my Python interpreter. When I invoke the lsf.lsb_queueinfo API, it sometimes keeps printing 'lsf is down, please wait' (I checked that my LSF cluster is working fine). Sometimes no error is returned, but the next command, 'queueInfo.queue', returns the wrong queue name (I asked for the short queue but got the analog queue name). Sometimes it prints 'segmentation fault (core dumped)' and the interpreter exits.

>>> ....
>>> lsf.stringArray_setitem(strArr, 0, 'short')
>>> queueInfo = lsf.lsb_queueinfo(strArr, intp_num_queues, None, None, 0)
>>> queueInfo.queue

This is my first time using the LSF API. I hope someone can help me with this issue. Thanks.
[LSF 10.1.0.4, Python 2.6.6]

Links to documentation require updating

developerWorks is being closed, apparently. The link currently 'works' in that you are sent to the correct page, but you are immediately bounced to the new community site, which doesn't have the information.

The wiki will need to be migrated and the links updated, in order to remain relevant.

passing pointer-to-pointer struct issue with lsb_limitInfo

Hello,
In the Python API, how do I pass a pointer-to-pointer struct to lsb_limitInfo?

Below is the C API:
int lsb_limitInfo(limitInfoReq *req, limitInfoEnt **limitItemRef, int *size, struct lsInfo *lsInfo)

Please let me know how to pass limitInfoEnt ** from a Python script.

Library Documentation

It's great to have a Python API for this, but I can't find much documentation on the C code and haven't been able to find any documentation on the Python side.

Along the same lines, I'd like to request that the example code be cleaned up a bit and that thorough comments be added to explain what's going on and show other options. It would also be great to see an example of polling the queue to check whether a submitted job has finished, since that's a really common use case for some customers.
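
The polling use case requested above can be sketched independently of LSF. This is a minimal, hypothetical helper: wait_until is not part of lsf-python-api, and the is_finished callable stands in for whatever job-status query is used, which is not shown here because the page does not document one.

```python
import time


def wait_until(is_finished, interval=5.0, timeout=3600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call is_finished() every `interval` seconds until it returns True.

    Returns True if the condition was met, False if `timeout` seconds
    elapsed first. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if is_finished():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a stand-in check that succeeds on the third poll.
polls = {"count": 0}


def fake_check():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(fake_check, interval=0.0))  # True
```

In real use, is_finished would wrap a status query for the submitted job ID and compare the result against the "done" or "exited" states.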

question regarding the esub

Hi ,

The '-a' option can be used with bsub to run an esub application. How can I do the same thing in Python using the API?

Thanks and best regards,
Xinwei

SegmentationFault after upgrade to LSF10

Hi Developers,

We are facing an issue with the APIs after upgrading to LSF 10.
Running our code now results in a 'Segmentation Fault'.
We have rebuilt and reinstalled the LSF Python API package, but the issue persists.

I can send a copy of the script if required. Please advise as soon as possible.

cannot find module after installation

I installed the package as per the instructions, but I'm getting the following error when trying to run one of the examples provided in the repo:

LSF: 10.1.0.8
Python: 3.9.5

hostname[~/repo/lsf-python-api/examples][master !?]$ python3 cluster_info.py
Traceback (most recent call last):
  File "/home/myuser/repo/lsf-python-api/examples/pythonlsf/lsf.py", line 18, in swig_import_helper
    fp, pathname, description = imp.find_module('_lsf', [dirname(__file__)])
  File "/app/vbuild/RHEL7-x86_64/python/3.9.5/lib/python3.9/imp.py", line 296, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named '_lsf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/repo/lsf-python-api/examples/cluster_info.py", line 1, in <module>
    from pythonlsf import lsf
  File "/home/myuser/repo/lsf-python-api/examples/pythonlsf/lsf.py", line 28, in <module>
    _lsf = swig_import_helper()
  File "/home/myuser/repo/lsf-python-api/examples/pythonlsf/lsf.py", line 20, in swig_import_helper
    import _lsf

jobFinish exitStatus has exitcodes not in lsbatch.h / lsf.h

Hello
Is there a way to add exit detail in a consistent fashion to the output of jobFinishLog.exitStatus? We are seeing a number of interesting numbers, including:
6400, 8704, 33280, 34304, 35840, 256, 512, 65280, etc. Some appear to be a bit extension of the translated bhist values, but this seems very inconsistent, and there doesn't appear to be a hook for the translated exit cause as shown by bacct -l.
eg
31232 -> 122 What is 122 ?!
33280 -> 130 For what reason ? Mem or CPU limit reached ?

Is it possible to either:
Make the exit codes consistent with bhist / bacct and / or provide a variable hook for the exit hint (i.e. job exceeded memory limit) as well as a consistent exit code ?

Many thanks and apologies if I'm missing something obvious.
Pete
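
The two mappings quoted above (31232 -> 122 and 33280 -> 130) are what a shift right by 8 bits produces, which is how a POSIX wait status stores the exit code. The sketch below assumes exitStatus is such a wait status. That assumption fits every value listed in the issue but is not confirmed by LSF documentation here. Treating codes just above 128 as "128 + signal number" is a common shell convention, not an LSF guarantee. The function name decode_exit_status is hypothetical.

```python
def decode_exit_status(status):
    """Split a wait-status-style integer into its parts.

    Returns (exit_code, term_signal, shell_signal):
      exit_code    - high byte, the value bhist/bacct would show
      term_signal  - low 7 bits, non-zero if the process died from a signal
      shell_signal - exit_code - 128 when exit_code looks like the shell
                     convention "128 + signal number", otherwise None
    """
    exit_code = (status >> 8) & 0xFF
    term_signal = status & 0x7F
    shell_signal = exit_code - 128 if 128 < exit_code <= 192 else None
    return exit_code, term_signal, shell_signal


# Values reported in the issue above.
for raw in (6400, 8704, 31232, 33280, 34304, 35840, 256, 512, 65280):
    print(raw, decode_exit_status(raw))
```

Under this reading, 33280 decodes to exit code 130, which by shell convention would correspond to signal 2. The sketch does not say why LSF ended the job. That reason would have to come from a separate field.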

Failed to build on Power redhat

[root@lsf1p11 lsf-python-api-master]# python3 -V
Python 3.5.6
[root@lsf1p11 lsf-python-api-master]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@lsf1p11 lsf-python-api-master]# uname -a
Linux lsf1p11 3.10.0-957.el7.ppc64le #1 SMP Thu Oct 4 20:51:36 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
[root@lsf1p11 lsf-python-api-master]#

*** ERROR: No build ID note found in /scratch/dev12/xawangyd/download/lsf-python-api-source/lsf-python-api-master/build/bdist.linux-ppc64le/rpm/BUILDROOT/lsf-pythonapi-1.0.6-10.1.0.7.ppc64le/usr/local/lib/python3.5/site-packages/pythonlsf/_lsf.cpython-35m-powerpc64le-linux-gnu.so
error: Bad exit status from /var/tmp/rpm-tmp.FRhuiW (%install)

RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.FRhuiW (%install)
error: command 'rpmbuild' failed with exit status 1

Doesn't load: _lsf.so: undefined symbol: Py_InitModule4

Doesn't work with Python 2.7.11 and LSF 9.1.3

whoami@vlpt-someone:~ >--> python
Python 2.7.11 (default, Jan 20 2016, 13:52:18)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from pythonlsf import lsf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/hls/share/python/latest/lib/python2.7/site-packages/pythonlsf/lsf.py", line 5, in <module>
    import _lsf
ImportError: /hls/share/python/latest/lib/python2.7/site-packages/pythonlsf/_lsf.so: undefined symbol: Py_InitModule4

Array jobs in lsb_openjobinfo

Hi,

Querying an array job with lsb_openjobinfo returns an error:

>>> print(lsf.lsb_openjobinfo(887163[45],'', 'nxf30690','','', 0x02000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not subscriptable

Is there a better way to query a specific array index of a job?
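
The TypeError arises because 887163[45] is Python syntax for indexing into an integer. It is not an LSF job identifier. In the C API, an array element is addressed by a single 64-bit job ID that combines the base job ID and the array index. The sketch below assumes the layout used by the LSB_JOBID, LSB_ARRAY_JOBID and LSB_ARRAY_IDX macros in lsbatch.h (index in the upper 32 bits, base ID in the lower 32 bits). The helper names are hypothetical, and it is not confirmed here that the SWIG wrapper exposes those macros directly.

```python
def pack_array_jobid(base_id, index):
    """Combine a base job ID and an array index into one 64-bit job ID."""
    return (index << 32) | base_id


def unpack_array_jobid(job_id):
    """Split a 64-bit job ID back into (base_id, index)."""
    return job_id & 0xFFFFFFFF, (job_id >> 32) & 0xFFFFFFFF


job_id = pack_array_jobid(887163, 45)
print(job_id, unpack_array_jobid(job_id))
```

Under that assumption, the packed value would be passed as the first argument of lsb_openjobinfo in place of 887163[45].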

setup.py build gives ValueError: zero length field name in format

[lsf-python-api]# python setup.py build
Traceback (most recent call last):
  File "setup.py", line 81, in <module>
    set_gccflag_lsf_version()
  File "setup.py", line 54, in set_gccflag_lsf_version
    with open('{}/lsf.conf'.format(_lsf_envdir), 'r') as f:
ValueError: zero length field name in format
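
This ValueError is what Python 2.6 raises for an empty '{}' placeholder, because automatic field numbering in str.format was only added in Python 2.7 and 3.1. A minimal sketch of a spelling that works on both old and new interpreters is shown below. The directory value is a made-up placeholder, and this is not necessarily how the maintainers resolved it.

```python
# Hypothetical value standing in for the LSF_ENVDIR lookup done by setup.py.
lsf_envdir = "/opt/lsf/conf"

# An explicit index ({0}) is accepted by Python 2.6 as well as later versions.
conf_path = "{0}/lsf.conf".format(lsf_envdir)
print(conf_path)  # /opt/lsf/conf/lsf.conf
```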

failed to rebuild library

Hi, I'm seeing the following error when building the library according to the README.md

>> python3 setup.py build
running build
running build_py
copying pythonlsf/lsf.py -> build/lib.linux-x86_64-3.8/pythonlsf
running build_ext
building '_lsf' extension
swigging pythonlsf/lsf.i to pythonlsf/lsf_wrap.c
swig -python -I/arm/tools/platform/lsf/10.1_sp8/linux3.10-glibc2.17-x86_64/lib/../../include/lsf/ -DOS_HAS_THREAD -D_REENTRANT -DFLAG_PYTHONAPI_KEYVALUE_T -DNOLSFVERSION -o pythonlsf/lsf_wrap.c pythonlsf/lsf.i
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/include/python2.4 -I/arm/projectscratch/ssg/pj10000073_platforms/users/benal01/mps4_common_scripts/vivado_build/lsf-python-api/pythonlsf -I/arm/projectscratch/ssg/pj10000073_platforms/users/benal01/mps4_common_scripts/vivado_build/venv/include -I/arm/tools/python/python/3.8.2/rhe7-x86_64/include/python3.8 -c pythonlsf/lsf_wrap.c -o build/temp.linux-x86_64-3.8/pythonlsf/lsf_wrap.o -m64 -I/arm/tools/platform/lsf/10.1_sp8/linux3.10-glibc2.17-x86_64/lib/../../include/lsf/ -Wno-strict-prototypes -DFLAG_PYTHONAPI_KEYVALUE_T -DNOLSFVERSION -DOS_HAS_THREAD -D_REENTRANT -Wp,-U_FORTIFY_SOURCE -O0
cc1: error: unrecognized command line option "-Wno-unused-result"
error: command 'gcc' failed with exit status 1

LSF version: 10.1
swig version: 2.0.10
python version: 3.8.2
gcc (GCC): 3.4.2
os: RHEL 7.4

Why is this package not available on PyPI? It looks like the last update to https://pypi.org/project/platform-python-lsf-api/ was in 2014. I've tried installing that via pip, but I see different errors.

ImportError: No module named '_lsf'

I tried to install lsf-python-api following the README, but I get this error:

$ python3 -c "from pythonlsf import lsf"
Traceback (most recent call last):
  File "lsf-python-api/pythonlsf/lsf.py", line 16, in swig_import_helper
    fp, pathname, description = imp.find_module('_lsf', [dirname(__file__)])
  File "/opt/at10.0/lib64/python3.5/imp.py", line 297, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named '_lsf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "lsf-python-api/pythonlsf/lsf.py", line 26, in <module>
    _lsf = swig_import_helper()
  File "lsf-python-api/pythonlsf/lsf.py", line 18, in swig_import_helper
    import _lsf
ImportError: No module named '_lsf'

Python version is 3.5
My system is: Linux zhcc026 3.10.0-957.10.1.el7.ppc64le #1 SMP Thu Feb 7 07:19:03 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

I ran python3.5 setup.py build and then install --user.

Is there something obvious that I forgot to set?
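
In the traceback above, lsf.py is loaded from the source checkout (lsf-python-api/pythonlsf/) and not from a site-packages directory. One possible explanation is that the checkout shadows the installed package and contains no compiled _lsf extension. The sketch below is a diagnostic under that assumption. The helper name find_compiled_extension is hypothetical.

```python
import glob
import importlib.util
import os


def find_compiled_extension(package_dir, stem="_lsf"):
    """List compiled extension files named <stem>*.so or <stem>*.pyd in package_dir."""
    found = []
    for pattern in (stem + "*.so", stem + "*.pyd"):
        found.extend(glob.glob(os.path.join(package_dir, pattern)))
    return sorted(found)


if __name__ == "__main__":
    # Locate the pythonlsf package that this interpreter would import,
    # without actually importing it.
    spec = importlib.util.find_spec("pythonlsf")
    if spec is None or not spec.submodule_search_locations:
        print("pythonlsf is not importable from this directory")
    else:
        for directory in spec.submodule_search_locations:
            print(directory, find_compiled_extension(directory))
```

If the printed directory is the source tree and the list is empty, running the script from a different working directory is a quick way to test whether shadowing is the cause.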

ImportError on mosquitto_loop

Hi,

I'm trying the LSF Python API from https://github.com/IBMSpectrumComputing/lsf-python-api. I built it on Power and am trying to run the example, but I'm getting this error. Any ideas what might be wrong? I'm not sure where the mosquitto symbol comes from. I looked it up and it appears to be something from MQTT.

[root@master examples]# python queue_info.py
Traceback (most recent call last):
  File "queue_info.py", line 1, in <module>
    from pythonlsf import lsf
  File "/usr/lib64/python2.7/site-packages/pythonlsf/lsf.py", line 26, in <module>
    _lsf = swig_import_helper()
  File "/usr/lib64/python2.7/site-packages/pythonlsf/lsf.py", line 22, in swig_import_helper
    _mod = imp.load_module('_lsf', fp, pathname, description)
ImportError: /usr/lib64/python2.7/site-packages/pythonlsf/_lsf.so: undefined symbol: mosquitto_loop
[root@master examples]# ls -l /usr/lib64/python2.7/site-packages/pythonlsf/_lsf.so

[root@master examples]# uname -a
Linux master 4.14.0-115.8.1.el7a.ppc64le #1 SMP Thu May 9 14:45:13 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
[root@master examples]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

How to kill array jobs using lsb_forcekilljob?

How do I pass an array job to the function lsb_forcekilljob so that it is killed?

Passing an array job as lsf.lsb_forcekilljob(337400[4]) throws the following error:
Traceback (most recent call last):
  File "/data/cmetrics/lsf_test.py", line 64, in <module>
    lsf.lsb_forcekilljob(337400[4])
TypeError: 'int' object has no attribute '__getitem__'

Also, how do I pass a group of jobs to be killed at once?
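
One way to handle both questions is to keep job specifications as strings such as "337400[4]", convert each to a single integer job ID, and loop over the group. The sketch below assumes the 64-bit layout of the LSB_JOBID macro in lsbatch.h (array index in the upper 32 bits). It also assumes the kill function accepts such an ID. Both are assumptions, not confirmed behaviour of the wrapper. parse_job_spec and kill_all are hypothetical helpers, and kill is passed in as a parameter so the sketch runs without LSF.

```python
import re

# Matches "337400" or "337400[4]".
_JOB_SPEC = re.compile(r"^(\d+)(?:\[(\d+)\])?$")


def parse_job_spec(spec):
    """Convert 'ID' or 'ID[index]' into a single packed integer job ID."""
    match = _JOB_SPEC.match(spec.strip())
    if match is None:
        raise ValueError("not a job spec: %r" % (spec,))
    base_id = int(match.group(1))
    index = int(match.group(2) or 0)
    return (index << 32) | base_id


def kill_all(specs, kill):
    """Call kill(job_id) for each spec and return {spec: return value}."""
    return dict((spec, kill(parse_job_spec(spec))) for spec in specs)


# Stand-in for the real kill call, so the example is self-contained.
seen = []
results = kill_all(["337400[4]", "337401"], lambda job_id: seen.append(job_id) or 0)
print(results)
```

In real use, the kill argument would be lsf.lsb_forcekilljob.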

Add get_queue_info_all() to api

I found this at one point on the IBM Platform LSF portals.

Can these be appended to the bottom of lsf.i?

 PyObject * get_queue_info_all() {
     struct queueInfoEnt *queueinfo;
     char **resreq = NULL;
     int numqueues = 0;
     int options = 0;

     queueinfo = lsb_queueinfo(resreq,             // Return queries as C queueInfoEnt*
                   &numqueues, NULL, 0, options);

     PyObject *result = PyList_New(numqueues);     // Create PyObject * to get C returns
     int i;
     for (i = 0; i < numqueues; i++) {             // Save queries in a loop to result
         PyObject *o = SWIG_NewPointerObj(SWIG_as_voidptr(&queueinfo[i]),
                                          SWIGTYPE_p_queueInfoEnt, 0 | 0 );
         PyList_SetItem(result,i,o);
     }

     return result;
 }

bacct information using LSF API

Hi,

I want to query finished jobs in LSF and get the same output as 'bacct -u user_id -d'.
If that is not available, can I query all jobs completed in the last 6-12 hours?

Can you please help me out with this? Thanks for your help in advance.

LSF Version = 10.1
OS = RHEL 7.4

expose bhist and bacct into the Python API

I realized that bhist/bacct is not exposed in the Python API.
lsb.events doesn't seem to be the appropriate mechanism to get this data.

Any suggestion to get standard stats once each job is completed?

Examples don't work properly

There are C syntax and comments in the file which are not valid python; i.e. you don't need semicolons, and // is not used for comments, # is.

How to ask for exclusive use of a GPU?

Issue

I would like to use the lsf-python-api to submit a LSF job which requests exclusive use of hardware resources.

For example, here's how I request 4 cores, 1 GPU, and 1000 MB of RAM with a 1h walltime:

import pythonlsf.lsf as lsf
import os

if lsf.lsb_init("test") != 0:
    raise ValueError("Unable to initialise LSF environment")

submitreq = lsf.submit()

submitreq.options = 0
submitreq.options2 = 0
submitreq.options3 = 0
submitreq.options4 = 0

submitreq.beginTime = 0
submitreq.termTime = 0
submitreq.resReq = "rusage[mem=1000:ngpus_physical=1.00] span[ptile=1] affinity[core(4,exclusive=(core,alljobs))*1]"
submitreq.outFile = os.path.join(os.getcwd(), "my-stdout.txt")
submitreq.queue = "x86_1h"
submitreq.options = lsf.SUB_OUT_FILE | lsf.SUB_QUEUE | lsf.SUB_RES_REQ
submitreq.options2 = lsf.SUB2_OVERWRITE_OUT_FILE
submitreq.cwd = os.getcwd()
submitreq.options3 = lsf.SUB3_CWD

submitreq.command = "nvidia-smi"

submitreply = lsf.submitReply()
job_id = lsf.lsb_submit(submitreq, submitreply)

print("Job id is", job_id)

Observed behaviour

Running the above prints a job id and when I use bjobs -l <id> I see this at the bottom of the printout:

 GPU REQUIREMENT DETAILS:
 Combined: mode=shared:mps=no:j_exclusive=no:gvendor=nvidia
 Effective: mode=shared:mps=no:j_exclusive=no:gvendor=nvidia

Expected behaviour

If I run an equivalent bsub command I have the option of setting the gpu mode like so:

bsub <other fields....> -gpu num=1:mode=exclusive_process

Then when I look at the bjobs -l <id> information I see this printout:

Combined: num=1:mode=exclusive_process:mps=no:j_exclusive=yes:gvendor=nvidia
Effective: num=1:mode=exclusive_process:mps=no:j_exclusive=yes:gvendor=nvidia

Question

How can I get my python code to ask for exclusive use of the requested GPUs ?

How can I get the execution host?

So far, I am able to get the machine from which the job was submitted, but not the machine on which the job is running:

lx114
_60dda77f71550000_p_p_char
84

I guess the host names are in the char **, but how do I read it from Python?
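
A value printed as _..._p_p_char is how SWIG displays an opaque char ** pointer. Another issue on this page uses lsf.stringArray_setitem, and SWIG generates a matching stringArray_getitem accessor alongside it, so that accessor can read individual entries. The sketch below assumes the job record exposes the exHosts and numExHosts fields of jobInfoEnt in lsbatch.h. The helper execution_hosts is hypothetical, and it takes the lsf module as a parameter so it can be exercised without a cluster.

```python
from types import SimpleNamespace


def execution_hosts(lsf, job):
    """Return the execution host names of a job record as a Python list.

    lsf - the pythonlsf.lsf module (or any object with stringArray_getitem)
    job - a jobInfoEnt-like object with exHosts (char **) and numExHosts
    """
    return [lsf.stringArray_getitem(job.exHosts, i)
            for i in range(job.numExHosts)]


# Stand-ins that mimic the wrapper, so the example runs without LSF.
fake_lsf = SimpleNamespace(stringArray_getitem=lambda arr, i: arr[i])
fake_job = SimpleNamespace(exHosts=["lx201", "lx202"], numExHosts=2)
print(execution_hosts(fake_lsf, fake_job))  # ['lx201', 'lx202']
```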

lsb_queueinfo seg faulting in LSF 10.1 Update 4

#include "lsf.h"
#include "lsbatch.h"
#include <stdio.h>

int main() {
    struct queueInfoEnt *queueinfo;
    char **resreq = NULL;
    int numqueues = 0;
    int options = ALL_QUEUE;

    lsb_init("test");

    queueinfo = lsb_queueinfo(resreq,             // Return queries as C queueInfoEnt*
                  &numqueues, NULL, 0, options);

    //PyObject *result = PyList_New(numqueues);     // Create PyObject * to get C returns
    int i;
    for (i = 0; i < numqueues; i++) {             // Save queries in a loop to result
        //cout << queueinfo[numqueues].queue << endl;
        printf("Queue ID: %i\n", i);
        printf("Queue Name: %s\n", queueinfo[i].queue);
        //PyObject *o = SWIG_NewPointerObj(SWIG_as_voidptr(&queueinfo[i]),
        //                                 SWIGTYPE_p_queueInfoEnt, 0 | 0 );
        //PyList_SetItem(result,i,o);
    }

    return 0;
}

It is seg faulting when it gets to the second queue and attempts to print the queue name. I want to get all of the queues in a single query. It seg faults even if you retrieve only 2 queues.
