uwefladrich / scriptengine Goto Github PK
View Code? Open in Web Editor NEWA lightweight and extensible framework for execution scripts written in YAML.
License: GNU General Public License v3.0
A lightweight and extensible framework for execution scripts written in YAML.
License: GNU General Public License v3.0
ScriptEngine-internal imports (i.e. one SE module needing symbols from another SE module) are note very consistent and clear. Sometimes, whole modules are imported and at other times, only some symbols from modules. This even leads to confusing situations in which symbols are provided by hidden chains of imports, such as in cli/se.py
, where
scriptengine.tasks.core.loader.load()
scriptengine.yaml.parse_file()
are used, but neither scriptengine.tasks.core.loader
nor scriptengine.yaml
are explicitly imported. The parse_file()
function, for example, is provided via se.scripts
-->se.scripts.simple_script_engine
-->se.tasks.core
-->se.yaml
.
when a job has an empty loop
modifier, it crashes with:
Traceback (most recent call last):
File "[...]/scriptengine/se/bin/se", line 105, in <module>
SimpleScriptEngine().run(script=[job], dryrun=DRY_RUN)
File "[...]/scriptengine/se/scripts.py", line 8, in run
cfg = job.run(dryrun=dryrun, **config)
File "[...]/scriptengine/se/jobs.py", line 53, in run
cfg = task.run(**{**config, **config_updates})
File "[...]/scriptengine/se/jobs.py", line 53, in run
cfg = task.run(**{**config, **config_updates})
File "[...]/scriptengine/se/jobs.py", line 47, in run
render_string_recursive(self.loop, **config))
File "[...]/lib/python3.6/ast.py", line 48, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "[...]/lib/python3.6/ast.py", line 35, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 0
^
SyntaxError: unexpected EOF while parsing
For example, this would trigger the problem:
- template:
src: semorise.metadata.json.j2
dst: "{{run_env.local_dir}}/semorise.metadata.{{item}}.json"
loop: "{{components}}"
- template:
when {{components}}
happens to be an empty string/not defined.
Consider
include:
src: foo.yml
and assume that foo.yml
does not exist.
In that case, the Include
task raises
scriptengine.exceptions.ScriptEngineTaskArgumentMissingError: Missing task argument: ignore_not_found
when it should raise an ScriptEngineTaskRunError
.
In the latest release v0.8.0, the toplevel scriptengine
module is not found:
se --version
Traceback (most recent call last):
File "/home/a001603/tmp/se-testing/.se./bin/se", line 5, in <module>
from scriptengine.cli.se import main
ModuleNotFoundError: No module named 'scriptengine'
It is probably similar to the problem that lead to the bugfix release 0.6.1, only this time we are using setuptools.find_packages()
to automatically find everything.
Task timing in SE needs two parts:
run()
methods decorated with timed_runner
from tasks.base.timing
,task_timer
task.Things could be simplified:
task_timer
.Task
could have a Task.timed_run()
method, which calls Task.run()
with timing. Hence, task developers would not need to decorate their run()
methods manually.Since some unit tests have been implemented for SE, the repo should be connected with Travis CI and automatic testing with pytest activated.
If the se
command doesn't find (one of) the YAML script file(s) it should log an error and stop. No traceback is needed.
There is a particular case where looping over a list of dicts does not work:
- base.context:
list:
- foo:
bar: 1
baz: 2
- foo:
bar: 3
baz: 4
- base.echo:
msg: '{{item.foo.bar}} - {{item.foo.baz}}'
loop: "{{list}}"
This raises a jinja2.exceptions.UndefinedError: 'str object' has no attribute 'config'
The command task currently ignores the return code of the command and proceeds regardless of success or fail. This should be changed so that the task fails when the command returns a non-zero code. Additionally, there should be an optional parameter that controls if the task should not fail on non-zero return codes. In all cases, a config parameter should be set that indicates the return code of the command.
A new bug has been introduced that causes the include
task to change the context erroneously:
- base.context:
foo: [1, 2]
- base.echo:
msg: 'Before include, foo is {{foo}}'
- base.include:
src: include.yml # any included script, even without context access
- base.echo:
msg: 'After include, foo is {{foo}}'
foo
from the context is [1, 2]
before the include task and [1, 2, 1, 2]
after.
The current logging implementation store a logger object in each Task
instance. This is not really necessary, because logging objects are handled by the logging
module itself and they can be unambiguously addressed by their name. There is most likely no performance disadvantage by using a logger via logging.getLogger(name)
, compared to a stored logger object.
Furthermore, in the current implementation, task login can only begin when the creation of a task object is completed (i.e. the task has a valid _logger
member). This prevents logging in the creation phase. Also, task initialisation is complicated by the fact that the (sub)classes determine the logger name. A consistent naming scheme for loggers should be defined, such that the Task
base class can implement logging.
Similar to #8, just for when
instead for loop
. Consider
base.echo:
msg: "Foo. Bar!"
when: "{{foo}}"
which will result in an exception that is not handled.
Pytest reports a warning that originates in the attrdict
code:
DeprecationWarning: Using or importing the ABCs from 'collections' instead \
of from 'collections.abc' is deprecated since Python 3.3, \
and in 3.9 it will stop working
The problem is, of course, that attrdict
has an unclear maintenance status. So maybe time to get rid of it. The problem is that is has only been introduced recently in affda81. However, it's use has been limited anyway.
The base.command
info log is not completely correct. Example:
base.command:
name: ls
> se ls.yml
2021-09-10 15:55:50 INFO [se.cli] Logging configured and started
2021-09-10 15:55:50 INFO [se.task:command <e9838ba3f9>] ls args=None cwd=None
2021-09-10 15:55:50 INFO [se.task:command <e9838ba3f9>] Command "{command}" output follows below:
[... ls output folows ...]
The "{command}"
part in the output above should of course be replaced by the command's name.
Consider:
- when: <some condition>
echo: {msg: Hi!}
The echo
task does not log anything, not even in debug mode, when the condition evaluates false. SE should log (in debug mode) that the task is not executed due to the when
clause.
I think that it would be an interesting feature to add an option to use dictionaries for loops. I think that the current implementation only allows to use lists (mapping their elements by default to an item argument), and I propose to implement loops on dictionaries (mapping the keys and values contained in the dictionary).
The find
task should be able to limit the recursive search to a maximum depth, as in
- find:
path: .
depth: 3
- echo:
msg: "Found file: {{item}}"
loop: "{{result}}"
which should print all files down to three subdirectory levels from the current directory.
Check if setup.py
entry_points
can provide a better mechanism to provide the task sets from packages.
ScriptEngine tasks should raise some sort of ScriptEngineTaskError
if needed. When that's the case, the ScriptEngine instance should catch the exception stop execution (without traceback).
It's not necessary to use base.command
to run rm
, use base.remove
instead, which would be more convincing.
This:
- config:
no: foo
- echo:
msg: "{{no}}"
yields:
TypeError: keyword arguments must be strings
because no
is considered False
in the YAML context.
These filters are often useful.
The solution to #47 uses copy.deepcopy
under the hood in a few places. However, this is a problem when Python version 3.6 is used (3.7, 3.8, 3.9 are tested and not affected).
In particular, context["se"]["instance"]
, which holds a reference to the actual ScriptEngine instance, poses some kind of recursive reference when it is deepcopied in the run()
method of the same SE instance. This leads in Python 3.6 to an error like
Traceback (most recent call last):
[... part in ScriptEngine that calls copy.deepcopy(context)...]
File "/usr/lib64/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib64/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
[...long traceback in the copy library...]
File "/usr/lib64/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib64/python3.6/copy.py", line 169, in deepcopy
rv = reductor(4)
TypeError: can't pickle _thread.RLock objects
Move bin/se
to scriptengine/cli/
and use entry_points['console_scripts']
in setup.py
to provide the executable script.
The name context
would be somewhat more consistent, config
refers more the the usage of the task for model configuration.
For backwards compatibility, the config
name should be kept for a transitional phase.
Just a suggestion. I think that it could be useful to add an option to the template rendering task to request that the destination file has execution permissions enabled.
I think that the same result could be achieved with a combination of a template + command task, to call "chmod +x" after the template has been rendered, but I think that the readability of the scriptengine code will be enhanced if both operations are done in the same task.
logging.exception()
?)scriptengine.logging
; try to have functions without side effects--debug
and --quiet
command line options are really needed; maybe --log=<level>
is better?entry_points
as names for task logging (instead of class names)Two extensions are proposed to handle environment variables, beyond what is possible at the moment with base.getenv
:
The get and set tasks should be "symmetric".
While addressing #41 and #42 some logging issues have been introduced, namely
stderr
is completely swallowed if the task completes without error, andstdout
is delayed until the task is completed.A better implementation of base.command
logging is needed. It should
stdout
and stderr
with only reasonable delaystdout
and stderr
stdout
and/or stderr
stdout
and/or stderr
in the contextAssuming
base.copy:
src: foo
dst: bar
and assuming further that foo
does not exist, the base.copy
task falsely assumes that foo
is a directory. It will also raise an unhandled FileNotFoundError
(appropriate error, but for the wrong reason).
The base.command
task has the stdout
argument that is either None
(stdout of the command is ignored) or a name for the context (stdout is stored in the context under this name).
This could be improved by having:
stdout
is None
or False
: standard output of the command is ignoredstdout
is True
: standard output is printed in the SE logsstdout
is a valid context name: standard output is stored in the contextNote that the improvement is twofold:
stdout
to True
Implement a command line option that prints the SE version, i.e.
> se --version
0.3.2
Add link to documentation and really short installation description for PyPI install, i.e. pip install scriptengine
When base.command
encounters an error in the command that is executed, return code and stderr should be logged at the error level and the proper ScriptEngineTaskRunError
should be raised. The error message (from stderr) should only be logged once, not propagated upward via the exception.
The current scheme of adding required parameters (which should be called arguments) to the constructor of the base Task
class, like:
def __init__(self, logger_name, parameters=None, required_parameters=None):
# ...
has some drawbacks:
__init__()
, just to call the parent's __init__
with the added required arguments. No real code is added.__init__
function, it is difficult to decide in the base Task
class, which subclass actually defined the required arguments.So,
Task
, as class attributes__init__
function, if neededWhen task arguments are dictionaries, they are not parsed by Task.getarg()
as they should. This is different from nested lists. Example:
- context:
world: earth
- command:
name: echo
args:
- Hello:
planet: "{{world}}"
yields:
{'Hello': {'planet': '{{world}}'}}
i.e. {{world}}
is not parsed.
The following example task:
base.echo:
foo: bar
will correctly raise a ScriptEngineTaskArgumentMissingError
, but another exception (KeyError
)) is raised during logging:
Consider
- echo:
msg: "{{item}}"
loop: "{{foo}}"
where foo
is either undefined or an empty list. The loop body should not be executed.
Consider also the corner case
- echo:
msg: "{{item}}"
loop:
i.e. the loop spec is YAML-null.
Implement a --no-color
option for the SE command line that switches off output colouring (i.e. escape sequences).
The SE instance should log (to INFO) that a ScriptEngineStopException
has been received and therefore SE is stopping. Nothing more.
The reason why the stop exception was raised should be logged (to INFO) by the task that raises the exception. E.g. base.exit
and hpc.slurm.sbatch
.
The base.include
task does not return a context update, thus any context changes from include tasks are lost.
Since f9208c9 the command task doesn't run in Python older than 3.7:
base.command:
name: ls
> se ls.yml
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.8.2
2021-09-11 14:36:21 INFO [se.cli] Logging configured and started
2021-09-11 14:36:21 INFO [se.task:command <86b553858d>] ls args=None cwd=None
Traceback (most recent call last):
File "[...]/bin/se", line 8, in <module>
sys.exit(main())
File "[...]/lib64/python3.6/site-packages/scriptengine/cli/se.py", line 170, in main
context['se']['instance'].run(script, context)
File "[...]/lib64/python3.6/site-packages/scriptengine/scripts/simple_script_engine.py", line 40, in run
self._guarded_run(script_item, context)
File "[...]/lib64/python3.6/site-packages/scriptengine/scripts/simple_script_engine.py", line 22, in _guarded_run
item.run(context)
File "[...]/lib64/python3.6/site-packages/scriptengine/tasks/core/timing.py", line 56, in wrap_timed
return func(self, context)
File "[...]/lib64/python3.6/site-packages/scriptengine/tasks/base/command.py", line 52, in run
check=True,
File "/usr/lib64/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
TypeError: __init__() got an unexpected keyword argument 'capture_output'
Like:
- time:
set: <context_var>
or
- time:
since: <datetime>
set: <context_var>
Since task sets are no longer requested at the command line (previous -t
or --task-set
switch), there is no feedback as to which tasks are loaded and from where.
The suggestion is to have se --help
list all loaded tasks and the modules they are loaded from.
The package sphinx-rtd-theme
is needed to build the documentation. This should be added as development requirement in setup.py
.
The base.include
task crashes (at least under certain circumstances?) when the ignore_not_found
argument is not present. Here is the relevant part of the traceback:
2020-12-21 16:59:58 ERROR [se.task.include <ee3ae07768>] Missing task argument: ignore_not_found
Traceback (most recent call last):
File "[...]/scriptengine/scriptengine/tasks/base/task.py", line 129, in getarg
arg = getattr(self, name)
AttributeError: 'Include' object has no attribute 'ignore_not_found'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
[...]
File "[...]/scriptengine/scriptengine/tasks/base/include.py", line 43, in run
if self.getarg('ignore_not_found', False):
File "[...]/scriptengine/scriptengine/tasks/base/task.py", line 134, in getarg
raise ScriptEngineTaskArgumentMissingError(
scriptengine.exceptions.ScriptEngineTaskArgumentMissingError: Missing task argument: ignore_not_found
ScriptEngine and ScriptEngine tasks should throw an own type of exceptions. This would help with debugging, in order to easily distinguish between errors in the ScriptEngine code and otherwise. It would also allow the ScriptEngine instances to handle task exceptions in a more consistent way, e.g. allow continuation of script execution despite task failures (globally or controllable from the script). Furthermore, it would help task developers by giving clearer guidance about which exceptions to throw.
SE tasks are presently part of the SE code under scriptengine/tasks
. However, to make SE truly extensible, it should be possible to make tasks available as separate packages. Thus, the existing tasks must be disentangled from SE core code and a mechanism to install SE task packages implemented.
Now that SE has been in use for a while, it could be time to review the naming of task arguments as used in the YAML specs. Since the SE tasks have been externalised (#11), this involves (just) the base tasks (scriptengine.tasks.base
).
Beside being consistent, the names should also be concise.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.