ganga-devs / ganga

Ganga is an easy-to-use frontend for job definition and management

License: GNU General Public License v3.0

hep distributed-computing cern lhcb user-interface

ganga's Introduction

Ganga

Ganga is a tool to compose, run and track computing jobs across a variety of backends and application types.

Installation

Ganga can be installed using the standard Python tool pip with

pip install ganga

Usage

Ganga primarily runs as a command-line tool, so it can be started by simply running

ganga

This will load an interactive IPython prompt at which you can type

j = Job()
j.submit()
j.peek('stdout')

to create a simple local job which runs an executable that prints some text to stdout.

Documentation

User guide and developer documentation

ganga's People

Contributors

abhijeetsharma200, alexanderrichards, apsknight, chrisburr, dg1223, drmarkwslater, dumbmachine, dvanders, egede, fstagni, gurudattapatil, ishanrai05, jacekholeczek, jha2, johannesebke, joj0s, jplews17, mesmith75, milliams, mityinzer, mjmottram, moscicki, nickhastings, po10, rob-c, ryuwd, souravpy, ubeda, vanyabelyaev, varunbankar


ganga's Issues

Hidden schema property shouldn't cause problems behind a proxy

Hi all,

I'm seeing the situation where I would like to hide a schema attribute behind a proxy, specifically 'is_prepared'. (See #29)

If I hide a schema attribute I naively expect it to be hidden at the proxy level but available again when I perform myStrippedObject = stripProxy(myObject).

The problem I'm seeing is that the attribute is not assigned (possibly just in the default case, I'm not sure) when the attribute is hidden, and I think this is due to a mismatch between what I expect and what is implemented in GangaObjects.py.

I would expect that all attributes, even hidden ones, are set to their default values behind the scenes in the Objects, and that the only 'hiding' happens when the Proxy advertises the new dictionary to the GPI.

This requires a few lines of change in the GangaObject and a bit of work in the Proxy, but does anyone see any possible problems with working this way rather than with what is currently done?

At the moment I see is_prepared in myStrippedObject.__class__.__dict__ as well as in the schema, but I don't see it as an attribute, and hasattr(myStrippedObject, 'is_prepared') fails, which is not what I'm naively expecting. (Naturally the hidden attribute is not available through myObject either, by design.)
Or am I missing something about how hidden attributes are hidden?

Thanks,

Rob
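The behaviour the issue argues for can be illustrated with a small self-contained sketch (all names here are hypothetical, not the real GangaObject/Proxy code): every schema attribute is always set on the underlying object, and the hiding happens only at the proxy layer.

```python
# Hypothetical sketch of proxy-level hiding (not the real
# GangaObject/Proxy implementation): the raw object always carries
# every schema attribute, and only the proxy filters hidden ones out.

class RawObject:
    _hidden = {'is_prepared'}

    def __init__(self):
        # All attributes, hidden or not, get their defaults set.
        self.is_prepared = None
        self.name = 'job'


class Proxy:
    def __init__(self, raw):
        object.__setattr__(self, '_raw', raw)

    def __getattr__(self, attr):
        # __getattr__ only fires when normal lookup fails, so '_raw'
        # (set in __init__) never recurses through here.
        raw = object.__getattribute__(self, '_raw')
        if attr in raw._hidden:
            raise AttributeError(attr)  # hidden at the GPI level
        return getattr(raw, attr)


def strip_proxy(proxy):
    return object.__getattribute__(proxy, '_raw')
```

With this arrangement hasattr(p, 'is_prepared') is False through the proxy but True on strip_proxy(p), which is the behaviour the issue expects.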

Race condition in monitoring when copying jobs

In a clean Ganga session with a completely wiped ~/ganga_dir, running the following:

j = Job(); j.submit(); j.copy(); j.copy()

causes an exception in the monitoring loop about 80% of the time. The exception given is:

Traceback (most recent call last):
  File ".../ganga/python/Ganga/GPIDev/Lib/Job/Job.py", line 611, in updateStatus
    self.time.timenow(str(newstatus))
  File ".../ganga/python/Ganga/GPIDev/Lib/Job/JobTime.py", line 127, in timenow
    be_statetime = j.backend.getStateTime(childstatus)
  File ".../ganga/python/Ganga/Lib/Localhost/Localhost.py", line 112, in getStateTime
    j = self.getJobObject()
  File ".../ganga/python/Ganga/GPIDev/Base/Objects.py", line 1025, in getJobObject
    raise AssertionError('no job associated with object ' + repr(self))
AssertionError: no job associated with object <Ganga.Lib.Localhost.Localhost.Localhost object at 0x314c650>

Pausing for a second after the submit() operation makes the race condition disappear:

from time import sleep
j = Job(); j.submit(); sleep(1); j.copy(); j.copy()

This came up when converting the JIRA1961 test to the new system.
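A fixed sleep() only hides the race. A more robust workaround (a sketch; the predicate to poll is up to the caller) is to wait for the actual condition rather than a fixed time:

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` expires.

    A fixed sleep() only papers over a race; polling for the actual
    condition (e.g. "the backend has a job attached") is more robust.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

In the example above one could then write something like `wait_until(lambda: j.status != 'submitting')` before the first `j.copy()` (the exact condition to test is an assumption here).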

First use of logging does strange things

Hi all,

I'm adding an issue to remind me to look into the logging system. More specifically, at least for LHCb, the first use of Ganga currently shows only the word 'NORMAL' for each logging value.

This then works correctly once the user exits and restarts Ganga, so the problem is probably in the first-use tools, which aren't run or tested very often.

This should be fixed before the 6.1.14 release is finished.

Rob

subjob.resubmit() stalling for at least some jobs on the Dirac backend

subjob.resubmit() is stalling for (at least) some subjobs. This was first reported in private communications from a user.

The tested workflow is to run a job with subjobs to completion.

  1. Take a subjob which can run on multiple sites.
  2. Run job until it completes
  3. Blacklist the site the subjob ran on in subjob.backend
  4. try to do a subjob.resubmit()

This hangs in an infinite loop which, after exiting with Ctrl+C, seems to be due to a subprocess call either not terminating correctly or hanging while waiting on some missing extra input.

This should be fixed before 6.1.12
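The symptom (a hang that only breaks on Ctrl+C) is characteristic of a subprocess call with no timeout and an inherited stdin. A defensive pattern, sketched here with Python's standard subprocess module, guards against both failure modes:

```python
import subprocess

def run_with_timeout(cmd, timeout=60):
    """Run an external command but never hang indefinitely.

    stdin is redirected to /dev/null so the child can never block
    waiting for input from us, and if it fails to terminate within
    `timeout` seconds subprocess.run raises TimeoutExpired instead
    of stalling the resubmit loop forever.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            stdin=subprocess.DEVNULL,  # child cannot wait on our stdin
            timeout=timeout,
        )
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        return None, b''
```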

The release scripts don't cope well with git conflicts

This is a placeholder for me to remember to break out the release scripts to be more resilient to git conflicts throughout the release process.

Currently the tools fall over due to a variety of different errors in the final release step, and it's not as simple as just re-running the step, because different kinds of human intervention are required.

Corrupt jobs throwing an exception which causes Ganga to not load

Hi all,

Apparently I wasn't able to catch all possible situations where a corrupt job causes Ganga to fail to load due to a repository error.

I'll work on this for 6.1.14. I think the problem is that the 'catch all' except here was written with GangaExceptions in mind.

As part of this release I'll look into wrapping the errors from SubJobXML and the VStreamer into better-defined errors.

Rob

Original exception thrown from Chris Jones below:

*** Welcome to Ganga ***
Version: 6.1.13
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.
This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.
Ganga.Utility.Config : INFO reading config file /cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/GangaLHCb/LHCb.ini
Ganga.Utility.Config : INFO reading config file /lhcb/GangaCambridge/lhcb/config/6-0-8/GangaLHCb.ini
Ganga.Utility.Config : INFO reading config file /usera/jonesc/.gangarc
Traceback (most recent call last):
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/bin/ganga", line 49, in <module>
    Ganga.Runtime._prog.bootstrap()
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Runtime/bootstrap.py", line 1113, in bootstrap
    for n, k, d in Repository_runtime.bootstrap():
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Runtime/Repository_runtime.py", line 123, in bootstrap
    registry.startup()
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/GPIDev/Lib/Registry/JobRegistry.py", line 86, in startup
    super(JobRegistry, self).startup()
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Core/GangaRepository/Registry.py", line 536, in startup
    self.repository.startup()
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Core/GangaRepository/GangaRepositoryXML.py", line 173, in startup
    self.update_index(verbose=True, firstRun=True)
  File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Core/GangaRepository/GangaRepositoryXML.py", line 436, in update_index
    logger.debug("Failed to load id %i: %s %s" % (id, getName(x.orig), x.orig))
AttributeError: 'exceptions.KeyError' object has no attribute 'orig'

Dirac proxy multi VO problems

Here is an email from simon quoting an LSST user:

We've been given the long report below from a user testing the LSST VO
using ganga + our DIRAC server. The gist of it seems to be that ganga is
getting a vanilla proxy, which the DIRAC server will then attach a VOMS
proxy to at job submission time. Unfortunately this user is a member of
multiple VOs and DIRAC sometimes picks a different VO to the one they're
trying to test... I guess the questions we need to answer are:

 - Is this behaviour reproducible by us?
 - Is there some way to get ganga to get a VOMS proxy so that there is no
   room for the DIRAC server to make any decisions on the VO?

Would you be able to have a look at this?

The original email is below. Hopefully this is a small fix but obviously the new credentials system will be the proper solution.

Most of the jobs following those 4 failed with a mixture of

Stalling for more than 11700 sec and Job stalled: pilot not running

at all sites but Birmingham where they weren't supposed to run.

Since I put the right dirac-proxy-init in .gangarc, I looked a bit more closely at what happens, and it seems not to care; it just generates a plain proxy.

if I run the dirac command standalone I get this proxy
{quote}
aforti@vm7>dirac-proxy-init -g lsst_user -M
Generating proxy...
Enter Certificate password:
Added VOMS attribute /lsst
Uploading proxy for lsst_user...
Proxy generated:
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=proxy/CN=proxy
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=proxy
identity : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
timeleft : 23:53:59
DIRAC group : lsst_user
path : /tmp/x509up_u500
username : alessandra.forti
properties : NormalUser
VOMS : True
VOMS fqan : ['/lsst']

Proxies uploaded:
DN | Group | Until (GMT)
/C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti | vo.northgrid.ac.uk_user | 2016/11/03 11:48
/C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti | gridpp_user | 2016/11/03 11:48
/C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti | lsst_user | 2016/11/03 11:48
aforti@vm7>voms-proxy-info -all
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=proxy/CN=proxy
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=proxy
identity : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=proxy
type : proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 23:53:42
key usage : Digital Signature, Key Encipherment, Data Encipherment
=== VO lsst extension information ===
VO : lsst
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
issuer : /DC=com/DC=DigiCert-Grid/O=Open Science Grid/OU=Services/CN=voms1.fnal.gov
attribute : /lsst/Role=NULL/Capability=NULL
timeleft : 23:53:42
uri : voms1.fnal.gov:15003
{quote}

when I put that command in ganga, this is what happens instead

{quote}
aforti@vm7>grep dirac-proxy-init .gangarc
[defaults_GridCommand]init = dirac-proxy-init -g lsst_user -M

aforti@vm7>ganga
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
Enter GRID pass phrase for this identity:
Creating proxy ........................................................................................................................... Done
Your proxy is valid until: Fri Nov 20 23:16:25 2015

*** Welcome to Ganga ***
Version: Ganga-6-1-6-hotfix1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

Ganga.Utility.Config : INFO reading config file /home/aforti/.gangarc

In [1]:
Do you really want to exit ([y]/n)? y
Ganga.Core.MonitoringComponent : INFO Stopping the monitoring component...
aforti@vm7>voms-proxy-info -all
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti/CN=400330830
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
identity : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=alessandra forti
type : RFC compliant proxy
strength : 1024 bits
path : /tmp/x509up_u500
timeleft : 23:59:43
key usage : Digital Signature, Key Encipherment, Data Encipherment
{quote}

it generates a plain proxy without VOMS information. With LHCb this still works because they have only LHCb on their servers, but with the multi-VO GridPP Dirac it picks the first VO I belong to to run the jobs if they are submitted without VOMS credentials.

Interactive submit failure when successful

When I run a test job with the interactive backend, the job runs fine. However, the job throws a series of errors of the following kind

Ganga.GPIDev.Adapters : WARNING ---------- error in user/extension code ----------
Ganga.GPIDev.Adapters : WARNING Traceback (most recent call last):
Ganga.GPIDev.Adapters : WARNING File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Adapters/IBackend.py", line 310, in master_resubmit
Ganga.GPIDev.Adapters : WARNING result = b.resubmit()
Ganga.GPIDev.Adapters : WARNING File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/Lib/Interactive/Interactive.py", line 105, in resubmit
Ganga.GPIDev.Adapters : WARNING return self._submit(self.getJobObject().getInputWorkspace().getPath("jobscript"))
Ganga.GPIDev.Adapters : WARNING TypeError: _submit() takes exactly 3 arguments (2 given)
Ganga.GPIDev.Adapters : WARNING --------------------------------------------------

As a result, the job then automatically tries to resubmit while in fact running fine at the same time. The resubmission stops when it reaches its maximum number of retries. The output is okay and the ntuples are stored correctly.

VO : LHCb
Ganga Version : v601r11

GangaThreadPool shutdown having troubles.

The GangaThreadPool 'shutdown' (most likely the shutdown task for GangaThread or GangaThreadPool) is throwing a named exception of 'global object logging not defined' on shutdown.

I'm currently catching this and making a debug log statement about the code.

I cannot fix or mitigate this particular exception. I have tried removing all logging entries from within the GangaThread and GangaThreadPool classes to rule out that it is these classes throwing the exception. (Unless I missed something, this didn't fix the problem.)

The shutdown method seems to do what is expected, it has a breakable wait statement attempting to give the worker threads a chance to finish with the ability to exit on demand at shutdown.

As the shutdown method works and only throws an exception at the end of its run, I'm strongly leaning towards this being a low-priority bug, but I simply cannot work out what is causing the 'logging' object to not be defined. (This could be from within the (Worker?)Thread or from within the global ThreadPool; I'm not sure.)

I'm opening this task to look into it over the next few releases to see if this shines a light on some other bug which is only showing up here.

edit:
Actual line I get is:
' Cannot run one of the exit handlers: shutdown ... Cause: global name 'logger' is not defined '
The part of the string of interest is this:
"global name 'logger' is not defined"
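A "global name 'logger' is not defined" error at shutdown is typically interpreter teardown clearing module globals before an exit handler runs. One defensive pattern (a sketch, not the actual Ganga code) is to bind the logger as a default argument so it is captured at function definition time rather than looked up in globals at call time:

```python
import atexit
import logging

logger = logging.getLogger(__name__)

def shutdown(_logger=logger):
    """Exit handler that survives interpreter teardown.

    At shutdown Python may have already set module globals to None,
    so a bare `logger.debug(...)` can raise NameError. Binding the
    logger as a default argument captures it when the function is
    defined, not when it runs.
    """
    if _logger is not None:
        _logger.debug('worker pool shut down cleanly')

atexit.register(shutdown)
```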

Glitch in LSF submission when the job name isn't empty

Hi,

If I try and submit a job with the LSF backend, I get an error and the submission fails:

Ganga.GPIDev.Adapters : INFO submitting job 14.0 to LSF backend
Ganga.GPIDev.Lib.Job : INFO job 14.0 status changed to "submitting"
Ganga.GPIDev.Adapters : WARNING ---------- error in user/extension code ----------
Ganga.GPIDev.Adapters : WARNING Traceback (most recent call last):
Ganga.GPIDev.Adapters : WARNING File "/afs/cern.ch/lhcb/software/releases/GANGA/GANGA_v601r13/install/ganga/python/Ganga/GPIDev/Adapters/IBackend.py", line 179, in master_submit
Ganga.GPIDev.Adapters : WARNING if b.submit(sc, master_input_sandbox):
Ganga.GPIDev.Adapters : WARNING File "/afs/cern.ch/lhcb/software/releases/GANGA/GANGA_v601r13/install/ganga/python/Ganga/Lib/Batch/Batch.py", line 165, in submit
Ganga.GPIDev.Adapters : WARNING if isType(self, PBS):
Ganga.GPIDev.Adapters : WARNING NameError: global name 'isType' is not defined
Ganga.GPIDev.Adapters : WARNING --------------------------------------------------
Ganga.GPIDev.Lib.Job : ERROR JobManagerError: error during submit
Ganga.GPIDev.Lib.Job : ERROR JobManagerError: error during submit ... reverting job 14 to the new status
Ganga.GPIDev.Lib.Job : INFO job 14 status changed to "new"
Ganga.Runtime.bootstrap : ERROR JobError: Error: JobManagerError: error during submit

This seems to be just a trivial missing import somewhere. The bug can be avoided by making sure the job name is an empty string.
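For illustration, isType behaves like a proxy-aware isinstance(); a minimal stand-in (hypothetical, not the real Ganga implementation, which also unwraps GPI proxies) shows what the failing guard in Batch.submit() is trying to do:

```python
# Hypothetical stand-in for Ganga's isType(): the real one also
# strips GPI proxies before checking the type.

def is_type(obj, *types):
    """Return True if obj is an instance of any of the given types."""
    return isinstance(obj, types)


class PBS:
    pass


class LSF:
    pass


# The failing line in Batch.submit() guards PBS-specific behaviour;
# for an LSF backend the guard should simply evaluate to False.
backend = LSF()
pbs_specific = is_type(backend, PBS)
```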

Removal of the old Messaging system for 6.1.14

In summary:

The MSGMS system seems to be used to push a stomp-based spyware client to the worker node to report home on a job running/completing (which is, as far as I know, simply never used; please shout now if this isn't the case).
I think this needs some careful discussion on a Thursday meeting as to whether we plan to remove this entire feature.

I'm in favour of keeping the spyware component of this system to track which versions of the client are in use and to report back on job submission statistics. I like these and would, in time, like to add a few extra bits of information to the gangamon server. But at the moment this is tied into a lot of old (apparently deprecated) code.

I'm in favour of dropping a lot of this code for several reasons:

Code appears to be fragile/broken.
This feature is not widely used (if by anyone any more).
This reduces our dependency on stomp significantly.

IMO we can consider dropping this and re-introducing the feature in a saner way in the future if it's required/requested.

Does anyone strongly want to keep/maintain the MSGMS system?

Problem Resubmitting PBS tasks

Hi,
I've got a strange problem with resubmitting PBS tasks.
This is Ganga-6-0-44 here (for the moment I need to stick to the 6.0.x branch).
A simple "task" which demonstrates the problem is ("test_exit.py"):

t = CoreTask()
trf = CoreTransform()
trf.application = Executable()
trf.application.exe = '/bin/sh'
trf.application.args = ['-c', '\"exit 1\"']
trf.backend = PBS()
trf.unit_splitter = GenericSplitter()
trf.unit_splitter.attribute = "application.args"
trf.unit_splitter.values = [ 'unused' ]
trf.abort_loop_on_submit = False
t.appendTransform( trf )
t.float = 100
t.run()

which then produces:

...
Version: Ganga-6-0-44
...
In [1]:execfile('test_exit.py')

In [2]:Ganga.GPIDev.Lib.Job               : INFO     submitting job 0
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "submitting"
Ganga.GPIDev.Lib.File              : INFO     Preparing Executable application.
Ganga.GPIDev.Lib.File              : INFO     Created shared directory: conf-41d5bb1c-8f15-40cb-8cc4-0f664e3eb3aa
Ganga.GPIDev.Adapters              : INFO     Sending file /bin/sh to shared directory.
Ganga.GPIDev.Lib.File              : INFO     Submitting a prepared application; taking any input files from conf-41d5bb1c-8f15-40cb-8cc4-0f664e3eb3aa
Ganga.GPIDev.Adapters              : INFO     submitting job 0 to PBS backend
Ganga.Lib.Batch                    : INFO     could not match the output and extract the Batch queue name
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "submitted"
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "running"
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "failed"
Ganga.GPIDev.Lib.Job               : INFO     resubmitting job 0
Ganga.GPIDev.Lib.Job               : INFO     job 0 status changed to "submitting"
Ganga.GPIDev.Adapters              : INFO     resubmitting job 0 to PBS backend
Ganga.Lib.Batch                    : WARNING  qsub: illegal -N value
usage: qsub [-a date_time] [-A account_string] [-b secs]
      [-c [ none | { enabled | periodic | shutdown |
      depth=<int> | dir=<path> | interval=<minutes>}... ]
      [-C directive_prefix] [-d path] [-D path]
      [-e path] [-h] [-I] [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}]
      [-M user_list] [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
      [-r y|n] [-S path] [-t number_to_submit] [-T type]  [-u user_list] [-w] path
      [-W additional_attributes] [-v variable_list] [-V ] [-x] [-X] [-z] [script]

Ganga.GPIDev.Lib.Job               : ERROR    failed to resubmit job, JobManagerError: error during submit
Ganga.GPIDev.Lib.Job               : WARNING  reverting job 0 to the failed status
Ganga.GPIDev.Lib.Tasks             : ERROR    Couldn't resubmit the job. Deactivating unit.


In [2]:tasks.table()
...
Out[2]:

    # |         Type |                  Name |    State |                       Comment |Jobs: done/  run/  subd/  attd/ fail/ hold/  bad |Float
------------------------------------------------------------------------------------------------------------------------------------------------
    0 |     CoreTask |               NewTask |  running |                               |   1:    0/    0/    0/     --/    1/    0/    0 |  100
  0.0 |CoreTransform |      Simple Transform |  running |                               |   1:    0/    0/    0/     --/    1/    0/    0 |     
------------------------------------------------------------------------------------------------------------------------------------------------

In [3]:exit()
Ganga.Core.MonitoringComponent     : INFO     Stopping the monitoring component...

Could you, please, help me.
Thanks in advance,
Best regards,
Jacek.
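The "qsub: illegal -N value" suggests the transform name ('Simple Transform', with a space) is being passed straight through to qsub -N on resubmit. A sanitiser along these lines would avoid the failure (a sketch: the helper name is invented, and the 15-character cap is the traditional limit of older PBS/Torque versions, an assumption here):

```python
import re

def sanitize_pbs_jobname(name, fallback='ganga_job'):
    """Make a job name safe to pass to `qsub -N` (hypothetical helper).

    PBS rejects -N values containing spaces or starting with a
    non-letter, which is one plausible cause of the
    'qsub: illegal -N value' failure seen on resubmit.
    """
    cleaned = re.sub(r'[^A-Za-z0-9_]', '_', name)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = fallback
    return cleaned[:15]  # traditional PBS limit on job-name length
```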

Get outputFileType working for ProdTrans + Jedi

For a new set of HammerCloud tests, I need to specify outputFileType for the Jedi job. This will require adding a ProdTrans RT Handler as well as the additional option in JediRequirements.

inputdata not always recorded for subjobs on disk

Hi all,

Some users are complaining that they cannot resubmit failed subjobs because the subjobs don't have their inputdata recorded on disk.
These are typically LHCb/Gaudi/Dirac jobs which are split using SplitByFiles using the Offline splitter.
All within Ganga 6.1.11 I've been told.

I cannot reproduce this error in testing and am awaiting an example script from a user demonstrating how they set up their job. As I can't reproduce this after several attempts, I'm really dependent on user input to determine what is causing it.

I'll update this if this develops.

Rob

Fix problems with DiracFile

DiracFile has some missing functionality compared to PhysicalFile upload in Ganga v600. I'll look into this and other fixes, which are mostly papercuts.

New monitoring system

The current monitoring is a mire of threads, queues and locks which has grown to the point where it is extremely fragile. Long-term we have expressed a wish for a system which looks like the following:

An external monitoring daemon, written in Python 3 and using asyncio, which, given a repository on disk, will make all the calls to external services to check job status and download output (and do the submission?).

Of course, this doesn't solve the problem of having more than one agent needing to write to a repository, so we would still need lock files, or we could make the daemon act as a server which all Ganga clients talk to (the daemon would then maintain a queue of incoming new jobs and change requests).

Since we cannot rely on Python 3 yet, we cannot go straight to this system but an intermediate system which has the same interface characteristics could aid the transition.

The aim of this issue is to log our discussions on this topic to get an understanding of how a final system could look as well as the best way to progress towards it.
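As a thought experiment, the daemon's core could look like the following asyncio sketch (all names hypothetical): per-job poller coroutines feed a queue, and a single writer task is the only place repository state is mutated, replacing the current web of threads and locks.

```python
import asyncio

async def poll_backend(job_id):
    """Stand-in for a call out to an external service (e.g. a batch
    system status query); here it just pretends every job completes."""
    await asyncio.sleep(0.01)
    return 'completed'

async def monitor(job_ids):
    updates = asyncio.Queue()

    async def watch(job_id):
        status = await poll_backend(job_id)
        await updates.put((job_id, status))

    async def write_repository(n):
        # Single writer: the only place repository state is mutated,
        # so no lock is ever needed around the write.
        statuses = {}
        for _ in range(n):
            job_id, status = await updates.get()
            statuses[job_id] = status
        return statuses

    watchers = [asyncio.create_task(watch(j)) for j in job_ids]
    writer = asyncio.create_task(write_repository(len(job_ids)))
    await asyncio.gather(*watchers)
    return await writer
```

Running `asyncio.run(monitor([1, 2, 3]))` returns a dict mapping each job id to its final status; the single-writer queue is the interface an intermediate (pre-Python-3) system could also present.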

Import missing in Grid.py for CREAM backend

Good afternoon,

I have tried using Ganga-6-1-6-hotfix1 from cvmfs:

/cvmfs/ganga.cern.ch/Ganga/install/LATEST/bin/ganga

to submit jobs to Manchester Analysis Facility but I get the error
below. Could you please have a look?

Thank you,
Rustem

*** Welcome to Ganga ***
Version: Ganga-6-1-6-hotfix1
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

This is free software (GPL), and you are welcome to redistribute it
under certain conditions; type license() for details.

Ganga.Utility.Config : INFO reading config file /afs/hep.man.ac.uk/u/rustem/.gangarc
Ganga.GPIDev.Lib.Job : INFO submitting job 3251
Ganga.GPIDev.Lib.Job : INFO job 3251 status changed to "submitting"
Ganga.Lib.Executable : INFO Preparing Executable application.
Ganga.Lib.Executable : INFO Created shared directory: conf-30cf5d3b-450d-4f1b-b234-d4ea6cf07192
Ganga.GPIDev.Adapters : INFO Sending file object /bin/bash to shared directory
Ganga.GPIDev.Lib.Job : INFO Preparing subjobs
Ganga.Lib.Executable : INFO Submitting a prepared application; taking any input files from conf-30cf5d3b-450d-4f1b-b234-d4ea6cf07192
Ganga.Lib.LCG : WARNING CREAM CE assigment from AtlasCREAMRequirements failed.
Ganga.GPIDev.Lib.Job : WARNING ---------- error in user/extension code ----------
Ganga.GPIDev.Lib.Job : WARNING Traceback (most recent call last):
Ganga.GPIDev.Lib.Job : WARNING File "/cvmfs/ganga.cern.ch/Ganga/install/LATEST/python/Ganga/GPIDev/Lib/Job/Job.py", line 1592, in submit
Ganga.GPIDev.Lib.Job : WARNING r = self.backend.master_submit( rjobs, jobsubconfig, jobmasterconfig)
Ganga.GPIDev.Lib.Job : WARNING File "/cvmfs/ganga.cern.ch/Ganga/install/LATEST/python/Ganga/Lib/LCG/CREAM.py", line 1086, in master_submit
Ganga.GPIDev.Lib.Job : WARNING if not grids['GLITE'].cream_proxy_delegation(self.CE):
Ganga.GPIDev.Lib.Job : WARNING File "/cvmfs/ganga.cern.ch/Ganga/install/LATEST/python/Ganga/Lib/LCG/Grid.py", line 914, in cream_proxy_delegation
Ganga.GPIDev.Lib.Job : WARNING t_expire = time.time() +
Ganga.GPIDev.Lib.Job : WARNING NameError: global name 'time' is not defined

Ganga.GPIDev.Lib.Job : WARNING

Ganga.GPIDev.Lib.Job : ERROR global name 'time' is not defined ... reverting job 3251 to the new status
Ganga.GPIDev.Lib.Job : INFO job 3251 status changed to "new"

JobError: global name 'time' is not defined

Require Python 2.7

All our code already runs on Python 2.7 without issue, but we would like to move to requiring it so that we can make the most of its features and make future-proofing for Python 3 easier.

I have a plan about how to enact the change and will put up a feature branch soon with the code.

IPython crash on exit

On exiting the 6.1.3 release yesterday, I had a crash from IPython. There was no obvious side effect apart from some nasty warnings. If I see it again, I will post the traceback.

As a change of IPython version is upcoming, it may be that this problem can just be ignored.

DiracFile quirks when uploading

Hi,

There are a couple of minor issues with DiracFile in 6.1.13, particularly in relation to uploading files to grid storage:

*) If you specify namePattern and localDir, but not an explicit LFN, before calling DiracFile.put(), older ganga versions would generate a path like /lhcb/user/x/xxx/GangaFiles_23.46_Wednesday_18_November_2015/MyFile.dst and upload the file there. 6.1.13 fails to add the '/MyFile.dst' part, so the LFNs are confusing and not unique if you upload more than one file per minute. (https://github.com/ganga-devs/ganga/blob/master/python/GangaDirac/Lib/Files/DiracFile.py#L720 plus lines 751-752 look to be to blame.)

*) Specifying uploadSE as an option to DiracFile.put() doesn't seem to work properly; clicking through GitHub, it looks like all_SE_list() uses a global cache, so uploadSE gets ignored on the second and subsequent uses.

Clearly these are both minor issues that can be worked around on the user side, but I think should still be considered low-priority bugs.

Cheers, Olli
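For the first issue, the expected construction (sketched with a hypothetical helper, not the real DiracFile code) appends the file name to the timestamped directory:

```python
import os
import time

def build_lfn(base_dir, name_pattern, lfn=''):
    """Sketch of the expected LFN construction (hypothetical helper).

    If the user gave no explicit LFN, older Ganga generated a
    timestamped directory *and* appended the file name, e.g.
    .../GangaFiles_23.46_Wednesday_18_November_2015/MyFile.dst.
    The bug described above is the missing final component.
    """
    if lfn:
        return lfn
    stamp = time.strftime('GangaFiles_%H.%M_%A_%d_%B_%Y')
    # Including name_pattern keeps LFNs unique even when several
    # files are uploaded within the same minute.
    return os.path.join(base_dir, stamp, name_pattern)
```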

Debugging user jobs and making more use of JobTime object

Hi all,

I'm considering adding a (maybe hidden) attribute to the JobTime object to record which version of Ganga was used for each change of state of a job.

I have a hunch that moving between Ganga 6.0 and Ganga 6.1 is causing data loss that would otherwise not be happening, and ruling this out is taking much longer than it would if the information were auto-recorded and I could just read it from the job's logging data.

Does anyone have any suggestions on whether they think this is a good idea or not?

Thanks,

Rob
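The proposal could be as small as a second per-state dictionary alongside the existing timestamps; a stand-alone sketch (hypothetical names and version string, not the real JobTime class):

```python
import time

GANGA_VERSION = '6.1.14'  # would come from Ganga's own version module

class JobTime:
    """Sketch of the proposed extension: alongside the existing
    per-state timestamps, record which Ganga version performed each
    state change, to help diagnose 6.0 <-> 6.1 migration problems."""

    def __init__(self):
        self.timestamps = {}
        self.versions = {}  # the proposed (maybe hidden) attribute

    def timenow(self, new_status):
        self.timestamps[new_status] = time.time()
        self.versions[new_status] = GANGA_VERSION
```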

Nice avatar for the ganga-ci robot

@milliams As one who is still using the github default I am hardly one to talk, but I thought it would be a bit more professional if we gave the ganga-ci account a nice avatar. I'm thinking something like the ganga logo... although this would make it indistinguishable from the ganga-devs organisation. Maybe that's a good thing? Thoughts?

ROOT 'pro' version off site isn't working as expected

The ROOT merger script doesn't appear to be correctly picking up the 'pro' version of ROOT when the ROOT location is set.

This could be due to the rewrite of the code that descends into the location path to find the version of ROOT the user requested. It is possible that descending into the full 'location/version' path isn't done correctly by the new code, but the old code contained horrible '../../' and '../../../' lines which I wanted to remove.

I'll look into this as part of 6.1.12 and see if this is related to our handling of the ROOT config or not.
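The intended lookup is simply location/version/bin/root-config, which os.path.join expresses without any '../../' gymnastics (a sketch of the expected behaviour, not the merger code):

```python
import os

def root_config_path(location, version):
    """Resolve <location>/<version>/bin/root-config cleanly.

    With location=/cvmfs/lhcb.cern.ch/lib/RootConfig and version=pro
    this yields .../RootConfig/pro/bin/root-config, which is the path
    the merger should be invoking in the report below.
    """
    return os.path.join(location, version, 'bin', 'root-config')
```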

Original email below:

From: Christopher Rob Jones
Posted: 12/10/2015 12:36
Subject: Problem with ROOT settings in ganga 601r11

Hi, 

ROOT file merging appears to be failing in ganga 601r11, with 

Ganga.Utility                      : ERROR    sh: 
/cvmfs/lhcb.cern.ch/lib/RootConfig/bin/root-config: No such file or 
directory 
Ganga.GPIDev.Adapters              : ERROR    PostProcessException: 
('Can not run ROOT correctly. Check your .gangarc file.',) 
Ganga.Utility                      : ERROR    No proper ROOT setup 

the path 

/cvmfs/lhcb.cern.ch/lib/RootConfig/bin/root-config 

is wrong, it should be 

/cvmfs/lhcb.cern.ch/lib/RootConfig/pro/bin/root-config 

This has always worked, with the following lines in your site config file 

[defaults_Root] 
# ROOT version for merging 
version = pro 

[ROOT] 
# ROOT architecture/compiler type 
arch = $$CMTCONFIG$$ 
# Directory containing local ROOT installation(s) 
# location = /lhcb/sw/sw/ROOT 
location = /cvmfs/lhcb.cern.ch/lib/RootConfig 
# ROOT version for Ganga jobs 
version = pro 
# Directory containing Python installation used for execution of PyROOT 
pythonhome = 
/cvmfs/lhcb.cern.ch/lib/GangaConfig/python/${pythonversion}/${arch} 
# Version number of Python used for execution of PyROOT 
pythonversion = pro 

and as far as I can tell they are correctly set at runtime in ganga 
In [23]:print config.ROOT 
ROOT : Options for Root backend 
*    arch = 'x86_64-slc6-gcc48-opt' 
           Architecture of ROOT 
           Type: <type 'str'> 
*    location = '/cvmfs/lhcb.cern.ch/lib/RootConfig' 
           Location of ROOT 
           Type: <type 'str'> 
      path = '' 
           Set to a specific ROOT version. Will override other options. 
           Type: <type 'str'> 
*    pythonhome = '/cvmfs/lhcb.cern.ch/lib/GangaConfig/python/pro/x86_64-slc6-gcc48-opt' 
           Location of the python used for execution of PyROOT script 
           Type: <type 'str'> 
*    pythonversion = 'pro' 
           Version number of python used for execution python ROOT script 
           Type: <type 'str'> 
*    version = 'pro' 
           Version of ROOT 
           Type: <type 'str'> 


In [24]:print config.defaults_Root 
defaults_Root : default attribute values for Root objects 
      args = [] 
           List of arguments for the script. Accepted types are numerics and 
           strings 
           Allowed types: ['str', 'int', 'list'] 
      is_prepared = None 
           Location of shared resources. Presence of this attribute 
implies the 
           application has been prepared. 
           Allowed types: ['type(None)', 'bool'] 
      script = File(name='/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/Lib/Root/defaultRootScript.C',subdir='.')
           A File object specifying the script to execute when Root starts 
           Type: <class 'Ganga.GPIDev.Lib.File.File.File'> 
      usepython = False 
           Execute 'script' using Python. The PyRoot libraries are added 
to the 
           PYTHONPATH. 
           Type: <type 'bool'> 
*    version = 'pro' 
           The version of Root to run 
           Type: <type 'str'> 


It looks like the 'version=pro' setting is not working... Any ideas what 
could be wrong ? 

Chris 

Schema version checking

Whenever we define a new GangaObject subclass we also have to give it a _schema data member with an appropriate Version argument. As far as I can tell, the only place this version is used is in the VStreamer parsing at

if not cls._schema.version.isCompatible(version):
where it will create a dummy object if the versions don't match. I'm wondering how much we actually use this feature and whether it's worth the effort of defining a Version in every GangaObject. I assume the idea is that the minor version is increased for every addition to the schema and the major version for every removal.

Looking at Job, however, shows that in 5 years, between here and here, lots of additions were made but the version did not increase once. If we had increased the version number then loading a new job in an old Ganga would fail.

At present, if we load a new job with an old Ganga and then again with a new Ganga then information is lost (e.g. setting parallel_submit=True on a Job in 6.1.13, loading it with 6.0.44 and then again with 6.1.13 resets it to False).

I'm not sure how to fix this, or even whether we need to. If we remove a schema attribute between versions then it seems to be silently ignored when loading, and if we add a new one then it seems to just use the default value. If this is the case and we're OK with it, then I suggest removing the version checking entirely. Perhaps we print a warning when loading a job with new/removed attributes?
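For reference, the rule described above (minor bump per addition, major bump per removal) can be sketched as follows; the class is illustrative and not Ganga's actual Version implementation:

```python
class SchemaVersion:
    """Illustrative schema version check: additions bump the minor
    number, removals bump the major number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def isCompatible(self, stored):
        # A schema can read a stored object if the major versions match
        # and the stored object needs no attributes newer than ours.
        return self.major == stored.major and self.minor >= stored.minor
```

Under this rule an older Ganga (lower minor) would refuse a job written by a newer schema, which matches the failure mode described above.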

Root() application doesn't handle arguments correctly

Since 6.1.13 I can't use the Root() application, as the generated command does not seem to escape the arguments properly.

In [455]:n.submit()
Ganga.GPIDev.Lib.Job               : INFO     submitting job 531
Ganga.GPIDev.Lib.Job               : INFO     job 531 status changed to "submitting"
Ganga.Lib.Root                     : INFO     Created shared directory: conf-135d3b33-2af2-4db8-b832-bcd2357f6c4b
Ganga.GPIDev.Adapters              : INFO     Sending file object /home/lupton/Kshh_svn/proctuples/makechain.cpp to shared directory
Ganga.GPIDev.Lib.Job               : INFO     Preparing subjobs
Ganga.Lib.Root                     : INFO     rootsys: /cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_79/ROOT/6.04.02/x86_64-slc6-gcc48-opt
Ganga.GPIDev.Adapters              : INFO     submitting job 531 to Interactive backend
Ganga.Lib.Interactive              : INFO     Starting job 531
Ganga.GPIDev.Lib.Job               : INFO     job 531 status changed to "submitted"
   ------------------------------------------------------------
  | Welcome to ROOT 6.04/02                http://root.cern.ch |
  |                               (c) 1995-2014, The ROOT Team |
  | Built for linuxx8664gcc                                    |
  | From tag v6-04-02, 14 July 2015                            |
  | Try '.help', '.demo', '.license', '.credits', '.quit'/'.q' |
   ------------------------------------------------------------
root [0]
Processing makechain.cpp(task-ddfix-MC.txt,D2KSHH_merged_PPLL-task-ddfix-MC,PPLL,/data/lhcb/users/lupton/akepahome/olupton/gangadir61/workspace/olupton/LocalXML/)...
input_line_15:2:12: error: use of undeclared identifier 'task'
 makechain(task-ddfix-MC.txt,D2KSHH_merged_PPLL-task-ddfix-MC,PPLL,/data/lhcb/users/lupton/akepahome/olupton/gangadir61/workspace/olupton/LocalXML/) /* invoking function corresponding to '.x' */
           ^
input_line_15:2:17: error: use of undeclared identifier 'ddfix'
 makechain(task-ddfix-MC.txt,D2KSHH_merged_PPLL-task-ddfix-MC,PPLL,/data/lhcb/users/lupton/akepahome/olupton/gangadir61/workspace/olupton/LocalXML/) /* invoking function corresponding to '.x' */
                ^

...this carries on for many lines. Clearly the (string) arguments are being interpreted as C++.
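For comparison, string arguments need quoting before being interpolated into the ROOT invocation; a minimal sketch of the idea (hypothetical helper, not the actual Ganga code):

```python
def build_root_call(script, args):
    """Build a ROOT '.x'-style call, quoting string arguments so the
    interpreter sees C++ string literals rather than bare identifiers."""
    rendered = []
    for arg in args:
        if isinstance(arg, (int, float)):
            rendered.append(str(arg))  # numbers pass through unquoted
        else:
            rendered.append('"%s"' % arg)  # strings get double quotes
    return '%s(%s)' % (script, ','.join(rendered))
```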

This test used the Interactive backend, and:

In [456]:n.application
Out[456]: Root (
 is_prepared = ShareDir(name='conf-135d3b33-2af2-4db8-b832-bcd2357f6c4b',subdir='.') ,
 args = [task-ddfix-MC.txt, D2KSHH_merged_PPLL-task-ddfix-MC, PPLL, /data/lhcb/users/lupton/akepahome/olupton/gangadir61/workspace/olupton/LocalXML/]  ,
 version = '6.04.02' ,
 usepython = False ,
 script = File (
    name = '/home/lupton/Kshh_svn/proctuples/makechain.cpp' ,
    subdir = '.'
    )
 )

As a related question, it is not clear how to pick between compiled (i.e. root script.C+) and interpreted (i.e. root script.C) running. Is this possible?

Thanks, Olli

Make jobs.select behave in a more sane way

Hi all,

This is an issue to follow on from a discussion on the ganga-developers list about changing the interface of jobs.select to be a bit more sane, looking something like jobs.select(ids=[5]), as suggested by @alexanderrichards.

I've been caught out by the 'strangeness' of this behaviour, and as it's exposed to users it would be nice to make it behave in a more standard way.

I think this is just a case of fixing the method itself, as I don't think we use it much internally, but it would probably need a changelog entry when it's changed.
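A sketch of the suggested keyword-based interface (names and criteria hypothetical):

```python
def select_jobs(registry, ids=None, status=None):
    """Filter a sequence of jobs by explicit keyword criteria,
    e.g. select_jobs(jobs, ids=[5]) or select_jobs(jobs, status='failed')."""
    matches = []
    for job in registry:
        if ids is not None and job.id not in ids:
            continue  # job id not in the requested list
        if status is not None and job.status != status:
            continue  # status does not match
        matches.append(job)
    return matches
```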

Thanks,

Rob

load() is failing for some jobs

The load() function fails for multiple reasons in the latest release.

This should be looked at, as it's a bit of a pain from a user perspective.

There is also the problem that protected attributes can't be assigned during a load. This should be addressed by not exporting protected attributes imo.

Rob

Original backtrace from Chris Jones below:

In [51]:test=load('/usera/jonesc/s24-BcDD.udsts.txt')
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)

/var/nwork/pclr/jonesc/output/stripping/S24-B2OC-Tests/<console> in
<module>()

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Persistency/__init__.py
in load(filename, returnList)
     204         if item:
     205             try:
--> 206                 this_object = eval(str(item), Ganga.GPI.__dict__)
     207                 objectList.append(this_object)
     208             except NameError as x:

/var/nwork/pclr/jonesc/output/stripping/S24-B2OC-Tests/<string> in
<module>()

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Base/Proxy.py
in _init(self, *args, **kwds)
     377         for k in kwds:
     378             if getattr(self, proxyRef)._schema.hasAttribute(k):
--> 379                 setattr(self, k, kwds[k])
     380             else:
     381                 logger.warning('keyword argument in the %s
constructur ignored: %s=%s (not defined in the schema)', name, k, kwds[k])

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Base/Proxy.py
in _setattr(self, x, v)
     518             raise GangaAttributeError("'%s' has no attribute
'%s'" % (getattr(self, proxyRef)._name, x))
     519
--> 520         object.__setattr__(self, x, v)
     521     helptext(_setattr, """Set a property of %(classname)s with
consistency and safety checks.
     522 Setting a [protected] or a unexisting property raises
AttributeError.""")

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Base/Proxy.py
in __set__(self, obj, val)
     292                 val = makeGangaList(stripper(val))
     293         else:
--> 294             val = stripper(val)
     295
     296         # apply attribute filter to component items

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/GPIDev/Base/Proxy.py
in stripAttribute(v)
     268                 v = getattr(v, proxyRef)
     269                 logger.debug('%s property: assigned a component
object (%s used)' % (self._name, proxyRef))
--> 270             return getattr(obj,
proxyRef)._attribute_filter__set__(self._name, v)
     271
     272         # unwrap proxy

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/GangaDirac/Lib/Files/DiracFile.py
in _attribute_filter__set__(self, name, value)
     177
     178             elif name == 'localDir':
--> 179                 return expandfilename(value)
     180
     181         return value

/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r11/install/ganga/python/Ganga/Utility/files.py
in expandfilename(filename)
      15 def expandfilename(filename):
      16     "expand a path or filename in a standard way so that it may
contain ~ and ${VAR} strings"
---> 17     return os.path.expandvars(os.path.expanduser(filename))
      18
      19

/cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_76root6/Python/2.7.9.p1/x86_64-slc6-gcc48-opt/lib/python2.7/posixpath.py
in expanduser(path)
     259     """Expand ~ and ~user constructions.  If user or $HOME is
unknown,
     260     do nothing."""
--> 261     if not path.startswith('~'):
     262         return path
     263     i = path.find('/', 1)

<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute
'startswith'
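The immediate crash is expandfilename receiving None for localDir. A defensive sketch of the helper shown in the traceback (returning non-string values unchanged is one possible fix, not necessarily the one adopted):

```python
import os

def expandfilename(filename):
    """Expand ~ and ${VAR} in a filename, passing non-string values
    (such as None) through unchanged instead of raising AttributeError."""
    if not isinstance(filename, str):
        return filename  # e.g. None from an unset localDir
    return os.path.expandvars(os.path.expanduser(filename))
```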

LHCbTasks background loop

[Copied from JIRA https://its.cern.ch/jira/browse/GANGA-2038]

Hi,
Sometimes the tasks background thread spits out the following error:

Ganga.GPIDev.Lib.Tasks : ERROR Full traceback:
Traceback (most recent call last):
File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r8/install/ganga/python/Ganga/GPIDev/Lib/Tasks/TaskRegistry.py", line 144, in _thread_main
p.update()
File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r8/install/ganga/python/Ganga/GPIDev/Lib/Tasks/ITask.py", line 101, in update
if trf.update() and not self.check_all_trfs:
File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r8/install/ganga/python/Ganga/GPIDev/Lib/Tasks/ITransform.py", line 241, in update
unit.getID(), self.getID(), task.id))
File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r8/install/ganga/python/Ganga/GPIDev/Lib/Tasks/ITransform.py", line 180, in getID
return task.transforms.index(self)
File "/cvmfs/lhcb.cern.ch/lib/lhcb/GANGA/GANGA_v601r8/install/ganga/python/Ganga/GPIDev/Lib/GangaList/GangaList.py", line 386, in index
return self._list.index(self.strip_proxy(obj))
ValueError: <GangaLHCb.Lib.Tasks.LHCbTransform.LHCbTransform object at 0x7fe54a4b7310> is not in list
Ganga.GPIDev.Lib.Tasks : INFO Some Transforms of Task 121 'D2KShh realdata' have been paused. Check tasks.table() for details!

Somehow this also resets the 'float' of the Task to zero when it pauses it.
I haven't noticed anything really bad happening to the task afterwards, if I start it again things seem to keep going.
As it happens in a background thread it's hard to tell exactly what triggers the error. It came immediately after a job from that Task had finished submitting:

Ganga.GPIDev.Adapters : INFO submitting job 292.53 to Dirac backend
Ganga.GPIDev.Lib.Job : INFO job 292.53 status changed to "submitting"
Ganga.GPIDev.Lib.Job : INFO job 292 status changed to "submitting"
Ganga.GPIDev.Lib.Job : INFO job 292.53 status changed to "submitted"
Ganga.GPIDev.Lib.Job : INFO job 292 status changed to "submitted"
Ganga.GPIDev.Lib.Tasks : ERROR Full traceback: <see above>

where 292.53 is the last subjob of 292.
I realise this isn't a huge amount to go on, but thought I'd open a ticket as a place to collate information about future occurrences.
Thanks, Olli

GangaFile put and remove statements should return True/False

It has been requested that GangaFile objects return True/False for put and remove commands so that scripts and monitoring tools can determine if an action has completed successfully.

We already do this for setLocation, and I think this should only be a few lines of code to change, but I'd prefer to leave implementing it until 6.1.13.
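A sketch of what the change amounts to: propagate the exit status of the underlying copy command as a boolean (hypothetical helper, not the actual GangaFile code):

```python
import subprocess

def put_file(source, destination):
    """Copy a file to its destination and return True on success,
    False on failure, mirroring what setLocation already reports."""
    return subprocess.call(['cp', source, destination]) == 0
```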

Extremely slow performance for large job submission (XML and Dataset related)

After running some profiling and benchmarking I've discovered that a large source of performance problems is in parsing the XML for a job.

In order to reproduce this I was attempting to run over a 'standard' LHCb dataset of 8082 LFNs with a standard LHCb job that effectively just counted the integrated luminosity for each LFN.

This job would stall for a few minutes at a time which didn't make much sense as I don't see this sort of performance bottleneck in simple profiling.

This turned up too late to be considered for 6.1.14, but I wanted to track down and identify what was causing this performance problem before 6.1.14 was released, to know whether it was a deal-breaker.

The problem that I see is that constructing 8k DiracFile objects takes approximately 40s, and storing them on disk as a flat array of DiracFiles causes Ganga to pause for the 2-3 minutes required to process the XML file.

I think for 6.1.15 I would like to rewrite the internals of LHCbDataset to store the LFNs in a flat array of strings and act effectively as a generator, producing each DiracFile on demand.

Given that there is now a generic GangaDataset, I would like to add this generator-type functionality there, so that GangaDataset can store 'flat' information and generate IGangaFile objects on demand. This would be more sensible than simply generating 8k (or more) DiracFile objects and storing the full objects on disk in a huge XML file.

(This could prove tricky with iterators and the like, but it should also reduce the memory footprint, which is higher than I would like at the moment: ~200 MB per job when submitting, which can grow when the dataset is copied/modified.)

Hopefully this would allow the LHCbDataset to become a loose wrapper around the core GangaDataset, providing some minor additional LHCb functionality on top.
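The proposed behaviour can be sketched like this (class and factory names illustrative; the real LHCbDataset and GangaDataset interfaces differ):

```python
class LazyDataset:
    """Store only the flat LFN strings; construct file objects on demand
    instead of persisting thousands of full objects in the XML."""

    def __init__(self, lfns, file_factory):
        self._lfns = list(lfns)       # cheap to store and serialise
        self._factory = file_factory  # e.g. a DiracFile-like constructor

    def __len__(self):
        return len(self._lfns)

    def __iter__(self):
        for lfn in self._lfns:
            yield self._factory(lfn)  # object built only when needed
```

Only the flat list of strings would need persisting; the expensive objects exist only while iterating.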

Reading ganga version from .gangarc in 6.1.12

The version stored by 6.1.11 is in the x.y.z format rather than the x-y-z format.

Shall we fix 6.1.12 to be able to read x.y.z as well as x-y-z, even though we're storing x-y-z, for forward/backward compatibility?
or,
Are we content to ask users to fix their .gangarc files by hand?

Users moving forwards/backwards are already familiar with changing/fixing this, but users who just naively update from 6.1.11 and don't pay attention to what version they're on will see 6.1.12 crash on launch.

This is a minor fix from the users perspective but are we wanting to fix this ourselves or have this as a user requirement?

(We could even use this to intentionally break backwards compatibility if we chose in the future i.e. with Ganga-v7 if we ever go down this route.)
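Accepting both separators when reading is a small normalisation; a sketch (hypothetical helper, not the actual bootstrap code):

```python
def parse_ganga_version(raw):
    """Parse a stored Ganga version, accepting both '6.1.11' (as written
    by 6.1.11) and '6-1-12' (the format being standardised on)."""
    # Normalise the separator, then split into integer components.
    return tuple(int(part) for part in raw.replace('-', '.').split('.'))
```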

Making a deep copy of a backend resets the `id`

On develop:

import time
j = Job()
j.submit()
time.sleep(2) #Wait for job to start
print j.backend.id
f = j.copy()
print j.backend.id

will print

84286
-1

and so for some reason, copying the job has reset the id of the job's backend to the default value. I narrowed this behaviour down to a small step of the copy process, when the backend is deep-copied:

import time, copy
j = Job()
j.submit()
time.sleep(2) #Wait for job to start
print j.backend.id
b = copy.deepcopy(j.backend)
print j.backend.id

Note that b.id and f.backend.id are both set to -1 as well, so the information seems to be lost entirely.

Looking at the tests now, it seems it's failing on Savannah8009 which was the test I was converting when I came across this.
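The intended behaviour is presumably that the copy gets the default id (it has not been submitted) while the original keeps its value; a sketch of that intent using __deepcopy__ (the real Ganga schema controls this via attribute flags, so the mechanism here is an assumption):

```python
import copy

class Backend:
    """Illustrative backend whose 'id' is deliberately not copied:
    a freshly copied job has not been submitted yet."""

    _reset_on_copy = {'id': -1}  # attribute -> default restored in copies

    def __init__(self):
        self.id = -1

    def __deepcopy__(self, memo):
        clone = Backend()
        for key, value in self.__dict__.items():
            if key in self._reset_on_copy:
                # Copies start from the default; the original is untouched.
                setattr(clone, key, self._reset_on_copy[key])
            else:
                setattr(clone, key, copy.deepcopy(value, memo))
        return clone
```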

Cache data vs fully initialized Job

This is an issue to remind me to look into the use of Cached data on an object from the .idx file vs loading the full object description from the .xml.

Currently this feels like it's extremely difficult to follow where the actual data is being stored and where it's being accessed.
I've made some improvements with the optimisation work for 6.1.14 but this highlights that this needs addressing properly imo.

I've some ideas on how the code could be cleaned up with regards to this but it should probably be done for 6.1.15 or onwards. Basically I want to remove the overhead in trying to access data that isn't there and make it much clearer where and when a job should be fully loaded from disk in future.
This goes hand in hand with improving the SubJobXMLList, which unfortunately still loads ALL subjobs for many tasks because some parts of the codebase request the whole subjob or all subjobs before knowing what they want to do with the information.

Add a file type which writes to a shared filesystem

It's common when running jobs on a local batch system to write the output files of the jobs manually to some shared file-system which is also accessible from the user's desktop machine. We don't currently have a file type which supports this automatically. LocalFile copies the output via the sandbox, so that is not what we want.

The closest we have is using a MassStorageFile with defaultProtocol set to '', uploadOptions.path set to the shared file-system, and uploadOptions.cp_cmd etc. set to cp etc.

Perhaps it would be worth making a new file type (e.g. SharedFile) which derives from MassStorageFile which sets these defaults to something sensible?
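A sketch of the suggested subclass; the attribute names follow the options mentioned above, but where the real defaults live (class attributes vs the config system) is an assumption, and the stand-in base class is not the real Ganga one:

```python
class MassStorageFile:
    """Stand-in for the real Ganga class, holding only the options
    relevant to this sketch (placeholder default values)."""

    def __init__(self):
        self.defaultProtocol = 'root://storage.example'
        self.uploadOptions = {'path': '/eos/example', 'cp_cmd': 'xrdcp'}

class SharedFile(MassStorageFile):
    """File type writing straight to a shared filesystem with plain
    POSIX tools, as proposed above."""

    def __init__(self, shared_path='/shared/output'):
        MassStorageFile.__init__(self)
        self.defaultProtocol = ''  # no transfer protocol needed
        self.uploadOptions = {'path': shared_path, 'cp_cmd': 'cp'}
```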

Remove the reexec machinery

We've discussed several times getting rid of the ability for Ganga to re-exec itself. Historically this was needed in order to tweak LD_LIBRARY_PATH for linking against a specific version of numpy, but this is not relevant any more.

At the very least I would like to change the default to not re-exec but if possible I would like to completely remove the code from bootstrap.py. Given that LHCb has been running for years with GANGA_NEVER_REEXEC set without problems, I don't think there should be any trouble.

Any complaints?

Performance problems in v601r13

Hi all,

I'm just adding this issue as a place to discuss problems with performance which have been highlighted when submitting large jobs (order 200+ subjobs) to LHCbDirac.

I've done some work on develop to track down and reverse some regressions, and I have some other patches which I'm testing before I upload them (unfortunately this is mainly bugfixing work so I didn't start a dedicated branch for this task).

I've identified a few severe bottlenecks, including a bug in GangaList and the SplitByFiles splitter using the Offline splitter, as well as problems with the XML repo being a bit too paranoid and hammering the filesystem.

I'll update this task with some more information later today/this evening.

Thanks,

Rob

Monitoring and worker threads

Hi,

It looks like the monitoring does not necessarily pick up the number of worker threads in the Ganga configuration (.gangarc). In my instance, I increased the number of worker threads to 10, but I have not seen more than 3 threads run at any given time - and to note, 3 is the default value of this parameter (NumWorkerThreads).

VO : LHCb
Ganga version : 6.1.13
OS : sl6

Of course, the great use of increasing the number of threads is when I have a local machine with many cores which is otherwise not being used; I can hopefully increase the speed of all possible operations with this.

Thanks and Cheers,
Raja.

Move IGangaFile from Ganga/GPIDev/Lib/File/IGangaFile to Ganga/GPIDev/Adapters/IGangaFile

Hi all,

This is a minor task with the obvious repercussion that some imports need fixing, but is there any reason why the IGangaFile interface shouldn't live with the other interfaces in Ganga.GPIDev.Adapters?

This is a relatively simple 'Friday afternoon' task to do one week, so I don't think we should have it blocking 6.1.14, but it would be nice to change this imo.

Unless anyone has any good reasons not to fix this?

Thanks,

Rob

Move to IPython 3.x by default

I would like to see us move to IPython 3.x by default by 6.1.13.

I think this is working as well as the presently default IPython 0.6 implementation we're using.

It would be nice to drop the old IPython 0.6 related code once we've settled on IPython 3 as it's unlikely we will have anyone dropping back to an older version any time soon.

I would push for it for 6.1.12 but there have been a lot of changes in Core which I'd like to see tested and released first and whilst IPython shouldn't interfere with this I'd rather not risk too many things changing under the hood in 6.1.12.

Does anyone have a strong objection to this move?

Thanks,

Rob

Problem with TaskChainInput

Hi,
I've got a strange problem with the TaskChainInput.
This is Ganga-6-0-44 here (for the moment I need to stick to the 6.0.x branch).
I cannot convince a "transform" to use a specific subset of files, coming from the previous "transform", as "input files".
Here's the skeleton of my script:

tk = SomeTask() # ... the "task" ...
# ...
trf_first = SomeTransform() # ... the first "transform" ...
trf_first.nbinputfiles = 1 # ... several tens of jobs will be run ...
trf_first.backend = PBS()
trf_first.application = SomeApplication
# ...
# "SomeApplication" produces two kinds of files, "*_A_*.root" and
# "*_B_*.root", and I want to "save" all of them in "/.../First/"
trf_first.outputfiles = [MassStorageFile(namePattern='*.root',outputfilenameformat='/'.join(['First','{fname}']))]
# ...
tk.appendTransform(trf_first)
# ...
trf_second = SomeTransform() # ... the second "transform" ...
trf_second.nbinputfiles = 0 # ... just one job for all files ...
trf_second.backend = PBS()
# ...
indata = TaskChainInput()
indata.input_trf_id = trf_first.getID() # ... here's the link ...
indata.single_unit = True;
indata.use_copy_output = False;
# only "*_B_*.root" files (from the previous "trf_first" step) should be
# used as input files for the "trf_second" step, ... but ...
# ... the "include_file_mask" seems to be ignored ...
indata.include_file_mask=['*_B_*.root']
# ... and ... the "exclude_file_mask" seems to be ignored, as well ...
indata.exclude_file_mask=['*_A_*.root']
# ... and, as a result, all available "*.root" files are passed as "input" ...
trf_second.addInputData(indata)
# ...
# "save" the output in "/.../Second/"
trf_second.outputfiles = [MassStorageFile(namePattern='*.root',outputfilenameformat='/'.join(['Second','{fname}']))]
# ...
tk.appendTransform(trf_second)
# ...
tk.float = 100
tk.run()

Could you, please, help me.
Thanks in advance,
Best regards,
Jacek.
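For reference, the semantics Jacek expects from the masks can be sketched with fnmatch (this illustrates the documented intent, not the Ganga implementation):

```python
import fnmatch

def apply_file_masks(files, include_masks=None, exclude_masks=None):
    """Keep files matching any include mask, then drop any that match
    an exclude mask -- the behaviour TaskChainInput advertises."""
    selected = []
    for name in files:
        if include_masks and not any(fnmatch.fnmatch(name, m) for m in include_masks):
            continue  # not matched by any include mask
        if exclude_masks and any(fnmatch.fnmatch(name, m) for m in exclude_masks):
            continue  # explicitly excluded
        selected.append(name)
    return selected
```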

Move workernode templates

Hi all,

The naming of the workernode templates when they were extracted from the code was a bit bad and causes problems with code profiling tools etc.
It would be nice to at least change their file extensions so that it's clear this code isn't run in place.
Maybe even some work to turn these script templates into 'pure' python code in some cases.
I'm thinking that passing a dictionary of the inline replacements as arguments to these classes should be enough to do this.

I think this should be a relatively minimal task, though it touches a fair few files; it would be nice to get it done in 6.1.15.

Thanks,

Rob

Problems Resubmitting to the ARC backend

Had a bug report from Jeppe:

Hi Mark,
I am having a little trouble resubmitting failed jobs through ganga to arc backends. When I try, I get the following output (using ganga 6.1.6-hotfix1, but identical in many of the previous versions):

In [22]:jobs(124).subjobs(26).resubmit()
Ganga.GPIDev.Lib.Job               : INFO     resubmitting job 124.26
Ganga.GPIDev.Lib.Job               : INFO     job 124.26 status changed to "submitting"
Ganga.GPIDev.Lib.Job               : INFO     job 124 status changed to "submitting"
---------------------------------------------------------------------------
<type 'exceptions.AttributeError'>        Traceback (most recent call last)

/scratch/andersen/LHHiggsSubmit/<console> in <module>()

/scratch/andersen/install/LAST/python/Ganga/GPIDev/Lib/Job/Job.py in resubmit(self, backend)
   1920
   1921         """
-> 1922         return self._resubmit(backend=backend)
   1923
   1924     def auto_resubmit(self):

/scratch/andersen/install/LAST/python/Ganga/GPIDev/Lib/Job/Job.py in _resubmit(self, backend, auto_resubmit)
   2036                 else:
   2037                     if backend is None:
-> 2038                         result = self.backend.master_resubmit(rjobs)
   2039                     else:
   2040                         result = self.backend.master_resubmit(

/scratch/andersen/install/LAST/python/Ganga/Lib/LCG/ARC.py in master_resubmit(self, rjobs)
   1111
   1112         # delegate proxy to ARC CE
-> 1113         if not grids['GLITE'].arc_proxy_delegation(self.CE):
   1114             logger.warning('proxy delegation to %s failed' % self.CE)
   1115

<type 'exceptions.AttributeError'>: 'Grid' object has no attribute 'arc_proxy_delegation'

Obviously, I created a valid proxy... Any ideas?
Thanks,
Jeppe

Check the setLocation script for master jobs and for DiracFile

There appears to be a bug where some DiracFile objects have the __postprocessinglocation__ file set to the localDir of the job, and for some jobs the master job attempts to find this in the master output even though it shouldn't.

This is just a placeholder to remind me to look into this as I suspect these are separate problems in the Job class and in the DiracBase.

Investigate the use of ordered dictionary for schema attributes

As mentioned in the developer meeting on 03/12/2015, I've spotted a problem in Core: initializing objects can require that other schema attributes of a parent object already be correctly set.

I'd like to investigate changing the attributes in the Schema to be stored in an ordered dictionary as of 6.1.15.
The default behaviour would be to have an ordered dictionary but to allow for unmodified classes to still use a dictionary style definition of the schema attributes.

This shouldn't be too invasive but I'd rather not introduce this into 6.1.14.
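A sketch of the idea, assuming schema attributes arrive as (name, item) pairs; a plain dict would still be accepted for unmodified classes (helper name hypothetical):

```python
from collections import OrderedDict

def build_schema_items(items):
    """Keep schema attributes in declaration order so attributes that
    others depend on are initialised first; fall back to a sorted order
    for classes still passing a plain dict."""
    if isinstance(items, dict):
        items = sorted(items.items())  # stable, but order carries no meaning
    return OrderedDict(items)
```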

Missing 'PBS' entries in 'backendPostprocess' plus some better documentation needed

When a new Ganga creates a fresh ".gangarc", I can find four instances of 'backendPostprocess' ("DiracFile", "GoogleFile", "LCGSEFile", "MassStorageFile") with appropriate entries for different backends, but 'PBS' entries are missing (I assume they would always equal the existing 'LSF' entries).
Could you, please, modify the procedure that creates ".gangarc" so that it adds these 'PBS'-related entries too.
