
Comments (6)

vchlum commented on June 14, 2024

Hi @snowyday , this error probably means you are using cgroups version 2. Check it out and switch your system to cgroups v1. Unfortunately, the hook works with v1 only.

from openpbs.

vchlum commented on June 14, 2024

I have a patch for this, I will submit it soon. If you do not want to wait, here is the patch:

diff --git a/src/hooks/cgroups/pbs_cgroups.PY b/src/hooks/cgroups/pbs_cgroups.PY
index cee1242f..6ef4ebc5 100644
--- a/src/hooks/cgroups/pbs_cgroups.PY
+++ b/src/hooks/cgroups/pbs_cgroups.PY
@@ -3976,7 +3976,11 @@ class CgroupUtils(object):
         for filename in glob.glob(pattern):
             try:
                 with open(filename, 'r') as desc:
-                    entries = desc.readline().split(' ')
+                    line = desc.readline()
+                    m = re.search('\(.*\)', line)
+                    line = re.sub('\(.*\) ', '', line)
+                    entries = line.split(' ')
+                    entries.insert(1, m.group(0))
                     if int(entries[5]) != sid:
                         continue
                     if check_tasks:

And here is the patched PY file:
pbs_cgroups.zip
You can unzip it and import it with the qmgr command:
import hook pbs_cgroups application/x-python default pbs_cgroups.PY

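The problem is that the stat file is parsed incorrectly: the old code splits the line on spaces, so a process name in parentheses that itself contains spaces shifts every later field.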
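
As a rough illustration of the parsing issue (not the hook's actual code): in /proc/<pid>/stat the second field is the command name in parentheses, and that name may contain spaces. The sketch below uses a hypothetical helper, `parse_stat_line`, and a made-up sample line. It splits around the last closing parenthesis so that the session ID always lands at index 5.

```python
def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into fields, keeping the command name whole."""
    # The command name sits between the first '(' and the last ')'.
    open_idx = line.index('(')
    close_idx = line.rindex(')')
    pid = line[:open_idx].strip()
    comm = line[open_idx:close_idx + 1]
    rest = line[close_idx + 1:].split()
    return [pid, comm] + rest


# Made-up sample line: the command name contains three spaces.
sample = "4242 (my job step one) S 1 4242 4242 0 -1 4194304"
fields = parse_stat_line(sample)
# Field order: pid, comm, state, ppid, pgrp, session, ...
session_id = int(fields[5])
```

Using the last ')' rather than a regular expression is only one possible design choice; it also copes with command names that contain parentheses themselves.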


snowyday commented on June 14, 2024

Following up on my previous post, I have now built PBS from the master commit 49874c7, since I identified a possibly related issue documented as #2553. Unfortunately, I am still unable to utilize cgroups as intended.
Further inspection of the logs at /var/spool/pbs/mom_logs/20240307 revealed an error specific to the cpuacct subsystem:

03/07/2024 04:49:12;0080;pbs_python;Hook;pbs_python;['Traceback (most recent call last):', '  File "<embedded code object>", line 6413, in main', '  File "<embedded code object>", line 2928, in __init__', '  File "<embedded code object>", line 3033, in _target_subsystems', '  File "<embedded code object>", line 3925, in enabled', 'CgroupConfigError: enabled: cgroups enabled but not mounted for subsystem cpuacct']
03/07/2024 04:49:12;0001;pbs_python;Hook;pbs_python;Unexpected error in pbs_cgroups handling exechost_periodic event: CgroupConfigError ('enabled: cgroups enabled but not mounted for subsystem cpuacct',)
03/07/2024 04:49:19;0002;pbs_mom;Svr;Log;Log opened
03/07/2024 04:49:19;0002;pbs_mom;Svr;pbs_mom;pbs_version=23.06.06

The log highlights a CgroupConfigError with the message "enabled: cgroups enabled but not mounted for subsystem cpuacct", indicating an issue with the cgroups configuration. Any insight or workaround would be greatly appreciated.
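
One way to see which cgroup v1 controllers are actually mounted is to look at the mount table. The sketch below is an assumption-laden illustration, not how the pbs_cgroups hook performs its check: `mounted_cgroup_v1_subsystems` is a hypothetical helper, the controller list is a hand-picked set, and the sample text is made up. It relies on the fact that v1 mounts have filesystem type `cgroup` and list their controllers among the mount options.

```python
def mounted_cgroup_v1_subsystems(mounts_text):
    """Return the set of cgroup v1 controllers found in /proc/mounts-style text."""
    # Hand-picked list of common v1 controllers (an assumption, not exhaustive).
    known = {"blkio", "cpu", "cpuacct", "cpuset", "devices", "freezer",
             "hugetlb", "memory", "net_cls", "net_prio", "perf_event",
             "pids", "rdma"}
    found = set()
    for line in mounts_text.splitlines():
        parts = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(parts) >= 4 and parts[2] == "cgroup":
            found.update(opt for opt in parts[3].split(",") if opt in known)
    return found


# Made-up sample resembling a v1 system with cpu and cpuacct co-mounted.
sample_mounts = (
    "tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0\n"
    "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup "
    "rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0\n"
)
subsystems = mounted_cgroup_v1_subsystems(sample_mounts)
```

On a real node the text could be read from /proc/mounts. On a pure cgroups v2 system the only entry has type `cgroup2`, so the returned set is empty.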


snowyday commented on June 14, 2024

@vchlum
Thank you for your assistance!! I have now switched cgroups from v2 to v1:

stat -fc %T /sys/fs/cgroup/
tmpfs

However, I am now facing a new error:

03/07/2024 06:21:30;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_launch, job ID is 1011.Machine
03/07/2024 06:21:31;0080;pbs_python;Hook;pbs_python;['Traceback (most recent call last):', '  File "<embedded code object>", line 6425, in main', '  File "<embedded code object>", line 1041, in invoke_handler', '  File "<embedded code object>", line 1240, in _execjob_launch_handler', '  File "<embedded code object>", line 4019, in add_pids', '  File "<embedded code object>", line 3980, in _get_pids_in_sid', "ValueError: invalid literal for int() with base 10: 'S'"]
03/07/2024 06:21:31;0001;pbs_python;Hook;pbs_python;Unexpected error in pbs_cgroups handling execjob_launch event for job 1011.Machine (system hold set): ValueError ("invalid literal for int() with base 10: 'S'",)
03/07/2024 06:21:31;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 1011.Machine, event_type 2048 (elapsed time: 0.6283)
03/07/2024 06:21:31;0100;pbs_mom;Hook;pbs_cgroups;execjob_launch request rejected by 'pbs_cgroups'
03/07/2024 06:21:31;0008;pbs_mom;Job;1011.Machine;Unexpected error in pbs_cgroups handling execjob_launch event for job 1011.Machine (system hold set): ValueError ("invalid literal for int() with base 10: 'S'",)

The traceback ends in a ValueError, "invalid literal for int() with base 10: 'S'", which means the hook found the character 'S' where it expected an integer. The error occurred during an execjob_launch event for job 1011.Machine.

with open(filename, 'r') as desc:
    entries = desc.readline().split(' ')
    if int(entries[5]) != sid:
        continue

Any insights into why this ValueError might be occurring, and potential fixes, would be greatly appreciated.
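
One plausible way to reproduce this error, consistent with the patch posted in this thread, is a process whose command name contains spaces. The stat line below is made up for illustration; with four space-separated pieces in the name, the state field 'S' ends up at index 5, where the snippet above expects the session ID.

```python
# Made-up stat line; the command name "(my job step one)" contains spaces.
sample = "4242 (my job step one) S 1 4242 4242 0 -1 4194304"

entries = sample.split(' ')
# entries[1:5] are the four pieces of the command name, so entries[5]
# is the state field 'S' rather than the session ID.
try:
    int(entries[5])
    error = None
except ValueError as exc:
    error = str(exc)
```

Whether this is what happened on the affected node depends on which processes were running there at the time; the sketch only shows that a plain split on spaces can produce exactly this message.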


snowyday commented on June 14, 2024

@vchlum

All issues have been resolved!
After completely removing openpbs and reinstalling and reconfiguring it, I executed sudo /opt/pbs/bin/qmgr -c "import hook pbs_cgroups application/x-python default pbs_cgroups.PY". I would like to express my sincere gratitude for all the help I have received. Many thanks to the openpbs community for their support!!


snowyday commented on June 14, 2024

For Ubuntu 22.04 users: to ensure proper functionality with cgroups on Ubuntu 22.04, please follow these steps:

  1. First, make sure to switch to cgroups v1.

  2. Wait for the new commit that resolves the issue or manually import the patch provided by @vchlum.

  3. Do not forget to install libcjson-dev before building: apt install libcjson-dev.

Good luck, and I hope this helps others experiencing similar issues!

