uwefladrich / scriptengine Goto Github PK

View Code? Open in Web Editor NEW

15.0 15.0 4.0 404 KB

A lightweight and extensible framework for execution scripts written in YAML.

License: GNU General Public License v3.0

Python 100.00%

climate-models earth-system-model python yaml

scriptengine's People

Contributors

Stargazers

Watchers

Forkers

zklaus mandresm jmrgonza kinow

scriptengine's Issues

Clean up imports throughout the code

ScriptEngine-internal imports (i.e. one SE module needing symbols from another SE module) are note very consistent and clear. Sometimes, whole modules are imported and at other times, only some symbols from modules. This even leads to confusing situations in which symbols are provided by hidden chains of imports, such as in cli/se.py, where

scriptengine.tasks.core.loader.load()
scriptengine.yaml.parse_file()

are used, but neither scriptengine.tasks.core.loader nor scriptengine.yaml are explicitly imported. The parse_file() function, for example, is provided via se.scripts-->se.scripts.simple_script_engine-->se.tasks.core-->se.yaml.

Jobs crash when loop modifier is empty

when a job has an empty loop modifier, it crashes with:

Traceback (most recent call last):
  File "[...]/scriptengine/se/bin/se", line 105, in <module>
    SimpleScriptEngine().run(script=[job], dryrun=DRY_RUN)
  File "[...]/scriptengine/se/scripts.py", line 8, in run
    cfg = job.run(dryrun=dryrun, **config)
  File "[...]/scriptengine/se/jobs.py", line 53, in run
    cfg = task.run(**{**config, **config_updates})
  File "[...]/scriptengine/se/jobs.py", line 53, in run
    cfg = task.run(**{**config, **config_updates})
  File "[...]/scriptengine/se/jobs.py", line 47, in run
    render_string_recursive(self.loop, **config))
  File "[...]/lib/python3.6/ast.py", line 48, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "[...]/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 0
    
    ^
SyntaxError: unexpected EOF while parsing

For example, this would trigger the problem:

- template:
    src: semorise.metadata.json.j2
    dst: "{{run_env.local_dir}}/semorise.metadata.{{item}}.json"
  loop: "{{components}}"
- template:

when {{components}} happens to be an empty string/not defined.

Include task raises wrong exception if src not found

Consider

include:
    src: foo.yml

and assume that foo.yml does not exist.

In that case, the Include task raises

scriptengine.exceptions.ScriptEngineTaskArgumentMissingError: Missing task argument: ignore_not_found

when it should raise an ScriptEngineTaskRunError.

SE does not find main module in 0.8.0

In the latest release v0.8.0, the toplevel scriptengine module is not found:

se --version
Traceback (most recent call last):
  File "/home/a001603/tmp/se-testing/.se./bin/se", line 5, in <module>
    from scriptengine.cli.se import main
ModuleNotFoundError: No module named 'scriptengine'

It is probably similar to the problem that lead to the bugfix release 0.6.1, only this time we are using setuptools.find_packages() to automatically find everything.

Review task timing implementation

Task timing in SE needs two parts:

Task classes need to have their run() methods decorated with timed_runner from tasks.base.timing,
Task timing needs to be configured at run time with the task_timer task.

Things could be simplified:

Task timing could be mandatory (at the instance level) and with default options, so no need for configuration with task_timer.
The base class Task could have a Task.timed_run() method, which calls Task.run() with timing. Hence, task developers would not need to decorate their run() methods manually.

Integrate the repository with Travis CI

Since some unit tests have been implemented for SE, the repo should be connected with Travis CI and automatic testing with pytest activated.

CLI file not found handling

If the se command doesn't find (one of) the YAML script file(s) it should log an error and stop. No traceback is needed.

Looping over list of dicts doesn't work in particular case

There is a particular case where looping over a list of dicts does not work:

- base.context:
    list:
      - foo:
          bar: 1
          baz: 2
      - foo:
          bar: 3
          baz: 4
- base.echo:
    msg: '{{item.foo.bar}} - {{item.foo.baz}}'
  loop: "{{list}}"

This raises a jinja2.exceptions.UndefinedError: 'str object' has no attribute 'config'

Command task needs to consider return code

The command task currently ignores the return code of the command and proceeds regardless of success or fail. This should be changed so that the task fails when the command returns a non-zero code. Additionally, there should be an optional parameter that controls if the task should not fail on non-zero return codes. In all cases, a config parameter should be set that indicates the return code of the command.

base.include messes up the context

A new bug has been introduced that causes the include task to change the context erroneously:

- base.context:
    foo: [1, 2]
- base.echo:
    msg: 'Before include, foo is {{foo}}'
- base.include:
    src: include.yml  # any included script, even without context access
- base.echo:
    msg: 'After include, foo is {{foo}}'

foo from the context is [1, 2] before the include task and [1, 2, 1, 2] after.

Improve logging

The current logging implementation store a logger object in each Task instance. This is not really necessary, because logging objects are handled by the logging module itself and they can be unambiguously addressed by their name. There is most likely no performance disadvantage by using a logger via logging.getLogger(name), compared to a stored logger object.

Furthermore, in the current implementation, task login can only begin when the creation of a task object is completed (i.e. the task has a valid _logger member). This prevents logging in the creation phase. Also, task initialisation is complicated by the fact that the (sub)classes determine the logger name. A consistent naming scheme for loggers should be defined, such that the Task base class can implement logging.

When clause with empty/undefined expression

Similar to #8, just for when instead for loop. Consider

base.echo:
    msg: "Foo. Bar!"
when: "{{foo}}"

which will result in an exception that is not handled.

Deprecation warning from attrdict

Pytest reports a warning that originates in the attrdict code:

DeprecationWarning: Using or importing the ABCs from 'collections' instead \
  of from 'collections.abc' is deprecated since Python 3.3, \
  and in 3.9 it will stop working

The problem is, of course, that attrdict has an unclear maintenance status. So maybe time to get rid of it. The problem is that is has only been introduced recently in affda81. However, it's use has been limited anyway.

Command task log not correct

The base.command info log is not completely correct. Example:

base.command:
    name: ls

> se ls.yml
2021-09-10 15:55:50 INFO [se.cli] Logging configured and started
2021-09-10 15:55:50 INFO [se.task:command <e9838ba3f9>] ls args=None cwd=None
2021-09-10 15:55:50 INFO [se.task:command <e9838ba3f9>] Command "{command}" output follows below:
[... ls output folows ...]

The "{command}" part in the output above should of course be replaced by the command's name.

Logging for tasks that are not executed

Consider:

- when: <some condition>
  echo: {msg: Hi!}

The echo task does not log anything, not even in debug mode, when the condition evaluates false. SE should log (in debug mode) that the task is not executed due to the when clause.

Loop on dictionaries

I think that it would be an interesting feature to add an option to use dictionaries for loops. I think that the current implementation only allows to use lists (mapping their elements by default to an item argument), and I propose to implement loops on dictionaries (mapping the keys and values contained in the dictionary).

Find task should have a `depth` argument

The find task should be able to limit the recursive search to a maximum depth, as in

- find:
    path: .
    depth: 3
- echo:
    msg: "Found file: {{item}}"
  loop: "{{result}}"

which should print all files down to three subdirectory levels from the current directory.

Review task load mechanism

Check if setup.py entry_points can provide a better mechanism to provide the task sets from packages.

Let SE stop on handled errors

ScriptEngine tasks should raise some sort of ScriptEngineTaskError if needed. When that's the case, the ScriptEngine instance should catch the exception stop execution (without traceback).

Use base.remove task in YAML tutorial to clean up

It's not necessary to use base.command to run rm, use base.remove instead, which would be more convincing.

Make sure keys from YAML are strings

This:

- config:
    no: foo
- echo:
    msg: "{{no}}"

yields:

TypeError: keyword arguments must be strings

because no is considered False in the YAML context.

Add `basename` and `dirname` filters for Jinja2

These filters are often useful.

Error when Python 3.6 deepcopy copies context

The solution to #47 uses copy.deepcopy under the hood in a few places. However, this is a problem when Python version 3.6 is used (3.7, 3.8, 3.9 are tested and not affected).

In particular, context["se"]["instance"], which holds a reference to the actual ScriptEngine instance, poses some kind of recursive reference when it is deepcopied in the run() method of the same SE instance. This leads in Python 3.6 to an error like

Traceback (most recent call last):

  [... part in ScriptEngine that calls copy.deepcopy(context)...]

  File "/usr/lib64/python3.6/copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "/usr/lib64/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)

  [...long traceback in the copy library...]

  File "/usr/lib64/python3.6/copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib64/python3.6/copy.py", line 169, in deepcopy
    rv = reductor(4)
TypeError: can't pickle _thread.RLock objects

Use setup.py entry_points for cli

Move bin/se to scriptengine/cli/ and use entry_points['console_scripts'] in setup.py to provide the executable script.

Rename `config` task to `context`

The name context would be somewhat more consistent, config refers more the the usage of the task for model configuration.

For backwards compatibility, the config name should be kept for a transitional phase.

Add an option to the template task to produce an executable script

Just a suggestion. I think that it could be useful to add an option to the template rendering task to request that the destination file has execution permissions enabled.
I think that the same result could be achieved with a combination of a template + command task, to call "chmod +x" after the template has been rendered, but I think that the readability of the scriptengine code will be enhanced if both operations are done in the same task.

Further logging improvements

review and better define hierarchy(?) of loggers for ScriptEngine; check how many different log formatters are really needed
implement logger configuration more consistent; configure from dict or file?
improve logging for exceptions (logging.exception()?)
review and clean up scriptengine.logging; try to have functions without side effects
check if the --debug and --quiet command line options are really needed; maybe --log=<level> is better?
make sure every task is logged appropriately; even if user-coded tasks do not log properly; log at SE instance level?
use entry_points as names for task logging (instead of class names)

Handling environment variables

Two extensions are proposed to handle environment variables, beyond what is possible at the moment with base.getenv:

set environment variables
get/set more than one variable with one task

The get and set tasks should be "symmetric".

stdout/stderr handling in base.command

While addressing #41 and #42 some logging issues have been introduced, namely

stderr is completely swallowed if the task completes without error, and
stdout is delayed until the task is completed.

A better implementation of base.command logging is needed. It should

log stdout and stderr with only reasonable delay
use the existing logging system to log stdout and stderr
allow to suppress stdout and/or stderr
allow to save stdout and/or stderr in the context

base.copy handles missing files incorrectly

Assuming

base.copy:
    src: foo
    dst: bar

and assuming further that foo does not exist, the base.copy task falsely assumes that foo is a directory. It will also raise an unhandled FileNotFoundError (appropriate error, but for the wrong reason).

Improved stdout handling for Command task

The base.command task has the stdout argument that is either None (stdout of the command is ignored) or a name for the context (stdout is stored in the context under this name).
This could be improved by having:

stdout is None or False: standard output of the command is ignored
stdout is True: standard output is printed in the SE logs
stdout is a valid context name: standard output is stored in the context

Note that the improvement is twofold:

Adding the option to set stdout to True
Handling the standard output via the logging system (info level)

Command line option `--version`

Implement a command line option that prints the SE version, i.e.

> se --version
0.3.2

Update README.md

Add link to documentation and really short installation description for PyPI install, i.e. pip install scriptengine

Handle error messages in Command task

When base.command encounters an error in the command that is executed, return code and stderr should be logged at the error level and the proper ScriptEngineTaskRunError should be raised. The error message (from stderr) should only be logged once, not propagated upward via the exception.

Better argument checking for Tasks

The current scheme of adding required parameters (which should be called arguments) to the constructor of the base Task class, like:

def __init__(self, logger_name, parameters=None, required_parameters=None):
    # ...

has some drawbacks:

Required arguments should not be defined on an instance level, they are a class property.
In order to define required arguments, task developers have to redefine __init__(), just to call the parent's __init__ with the added required arguments. No real code is added.
When real code is needed in a tasks __init__ function, it is difficult to decide in the base Task class, which subclass actually defined the required arguments.

So,

required arguments should be defined in (sub)classes of Task, as class attributes
argument checking should be done in a (sub)classes __init__ function, if needed
beside "required arguments", there should be "invalid arguments" and checking should be done accordingly

Nested tasked arguments not parsed

When task arguments are dictionaries, they are not parsed by Task.getarg() as they should. This is different from nested lists. Example:

- context:
    world: earth

- command:
    name: echo
    args:
        - Hello:
            planet: "{{world}}"

yields:

{'Hello': {'planet': '{{world}}'}}

i.e. {{world}} is not parsed.

Missing required argument: logging error

The following example task:

base.echo:
    foo: bar

will correctly raise a ScriptEngineTaskArgumentMissingError, but another exception (KeyError)) is raised during logging:

Loops with empty or undefined specs

Consider

- echo:
    msg: "{{item}}"
  loop: "{{foo}}"

where foo is either undefined or an empty list. The loop body should not be executed.

Consider also the corner case

- echo:
    msg: "{{item}}"
  loop:

i.e. the loop spec is YAML-null.

Command line option `--no-color`

Implement a --no-color option for the SE command line that switches off output colouring (i.e. escape sequences).

shutil does not always accept path-like objects

When using pathlib to handle files etc in the base file tasks, the objects cannot always be passed to shutil functions. This is, for example, a problem for shutil.move, which does not accept path-like objects. This means that base.move is broken.

Handle ScriptEngineStopException logging

The SE instance should log (to INFO) that a ScriptEngineStopException has been received and therefore SE is stopping. Nothing more.

The reason why the stop exception was raised should be logged (to INFO) by the task that raises the exception. E.g. base.exit and hpc.slurm.sbatch.

base.include task does not update context

The base.include task does not return a context update, thus any context changes from include tasks are lost.

Command task in 0.10 doesn't work with Python <3.7

Since f9208c9 the command task doesn't run in Python older than 3.7:

base.command:
    name: ls

> se ls.yml 
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.8.2
2021-09-11 14:36:21 INFO [se.cli] Logging configured and started
2021-09-11 14:36:21 INFO [se.task:command <86b553858d>] ls args=None cwd=None
Traceback (most recent call last):
  File "[...]/bin/se", line 8, in <module>
    sys.exit(main())
  File "[...]/lib64/python3.6/site-packages/scriptengine/cli/se.py", line 170, in main
    context['se']['instance'].run(script, context)
  File "[...]/lib64/python3.6/site-packages/scriptengine/scripts/simple_script_engine.py", line 40, in run
    self._guarded_run(script_item, context)
  File "[...]/lib64/python3.6/site-packages/scriptengine/scripts/simple_script_engine.py", line 22, in _guarded_run
    item.run(context)
  File "[...]/lib64/python3.6/site-packages/scriptengine/tasks/core/timing.py", line 56, in wrap_timed
    return func(self, context)
  File "[...]/lib64/python3.6/site-packages/scriptengine/tasks/base/command.py", line 52, in run
    check=True,
  File "/usr/lib64/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'

Implement `time` base task

Like:

- time:
    set: <context_var>

- time:
    since: <datetime>
    set: <context_var>

Let `se --help` list loaded tasks

Since task sets are no longer requested at the command line (previous -t or --task-set switch), there is no feedback as to which tasks are loaded and from where.

The suggestion is to have se --help list all loaded tasks and the modules they are loaded from.

Add sphinx theme package as dev requirement

The package sphinx-rtd-theme is needed to build the documentation. This should be added as development requirement in setup.py.

Include task crashes when ignore_not_found is not present

The base.include task crashes (at least under certain circumstances?) when the ignore_not_found argument is not present. Here is the relevant part of the traceback:

2020-12-21 16:59:58 ERROR [se.task.include <ee3ae07768>] Missing task argument: ignore_not_found
Traceback (most recent call last):
  File "[...]/scriptengine/scriptengine/tasks/base/task.py", line 129, in getarg
    arg = getattr(self, name)
AttributeError: 'Include' object has no attribute 'ignore_not_found'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  [...]
  File "[...]/scriptengine/scriptengine/tasks/base/include.py", line 43, in run
    if self.getarg('ignore_not_found', False):
  File "[...]/scriptengine/scriptengine/tasks/base/task.py", line 134, in getarg
    raise ScriptEngineTaskArgumentMissingError(
scriptengine.exceptions.ScriptEngineTaskArgumentMissingError: Missing task argument: ignore_not_found

Improve exceptions

ScriptEngine and ScriptEngine tasks should throw an own type of exceptions. This would help with debugging, in order to easily distinguish between errors in the ScriptEngine code and otherwise. It would also allow the ScriptEngine instances to handle task exceptions in a more consistent way, e.g. allow continuation of script execution despite task failures (globally or controllable from the script). Furthermore, it would help task developers by giving clearer guidance about which exceptions to throw.

Externalise ScriptEngine tasks

SE tasks are presently part of the SE code under scriptengine/tasks. However, to make SE truly extensible, it should be possible to make tasks available as separate packages. Thus, the existing tasks must be disentangled from SE core code and a mechanism to install SE task packages implemented.

Consistent argument names for base tasks

Now that SE has been in use for a while, it could be time to review the naming of task arguments as used in the YAML specs. Since the SE tasks have been externalised (#11), this involves (just) the base tasks (scriptengine.tasks.base).
Beside being consistent, the names should also be concise.