flaxandteal / dewret

DEclarative Workflow REndering Tool

License: Apache License 2.0

Python 100.00%

dewret's Issues

Documentation improvements [1-39,44,46,50]

What is achievable for capturing imperative logic within a nested_task? [1-47]

Error expected if argument list is incomplete [1-25]

Rendering does not raise an error when a required input is left undefined.
Consider the example:

import sys
import yaml
from dewret.tasks import task, run
from dewret.renderers.cwl import render

@task()
def increment(num: int) -> int:
    return num + 1

result = increment()
workflow = run(result)
cwl = render(workflow)
yaml.dump(cwl, sys.stdout, indent=2)

Instead of raising an error, this produces an output that is, to my understanding, incorrect:

class: Workflow
cwlVersion: 1.2
inputs: {}
outputs:
  out:
    label: out
    outputSource: increment-ecfc9b657ab3a0f2bc96ee9dce5e98e3/out
    type: int
steps:
  increment-ecfc9b657ab3a0f2bc96ee9dce5e98e3:
    in: {}
    out:
    - out
    run: increment
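
One way the missing argument could be caught before rendering is to check the call against the function signature. The sketch below uses only the standard library; `check_required_arguments` is a hypothetical helper written for illustration and is not part of dewret's API.

```python
import inspect


def check_required_arguments(func, *args, **kwargs):
    """Raise TypeError if the call omits a required parameter of func.

    Hypothetical helper for illustration; not part of dewret's API.
    """
    signature = inspect.signature(func)
    try:
        # bind() applies Python's normal call rules without running func.
        signature.bind(*args, **kwargs)
    except TypeError as exc:
        raise TypeError(f"Task '{func.__name__}': {exc}") from exc


def increment(num: int) -> int:
    return num + 1


try:
    check_required_arguments(increment)  # mirrors increment() from the report
except TypeError as exc:
    message = str(exc)
    print(message)
```

The message names both the task and the missing parameter, so an incomplete argument list would surface at construction time rather than as an empty `in: {}` in the rendered workflow.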

Sympy objects in nested_tasks

Currently, arbitrary Python objects cannot be referenced within nested_tasks, since the logic of the nested task gets evaluated and embedded in the workflow. However, it would be desirable to loosen this restriction somewhat. One idea is to make use of sympy: if the object to be referenced in the nested task can be expressed in sympy, then dewret can represent it in the workflow.

Yes - good point. I'm tempted to write that once I have the sympy functionality working, so we can say "if you can write it with sympy it's gtg" or something like that. (although agree if that's not right now, then it should be noted regardless)

Originally posted by @philtweir in #15 (comment)

Align on treatment of global variables (feedback 1-20)

[id: 1-20], [CONCEPT], [FUND] The current approach to define workflow inputs is to use global variables
as shown in the Parameters help section, right? Personally, I was a bit surprised by this approach. Before
that I had played around with dewret and the way I intuitively expected to define workflow input was to
define global variables and pass them as arguments to tasks, e.g.

import sys
import yaml
from dewret.tasks import task, run
from dewret.renderers.cwl import render

some_number = 3

@task()
def increment(num: int) -> int:
    return num + 1

result = increment(num=some_number)
workflow = run(result)
cwl = render(workflow)
yaml.dump(cwl, sys.stdout, indent=2)

Which obviously does not work, i.e. the produced output reads

class: Workflow
cwlVersion: 1.2
inputs: {}
outputs:
  out:
    label: out
    outputSource: increment-012ef3b3ffb9d15c3f2837aa4bb20a8d/out
    type: int
steps:
  increment-012ef3b3ffb9d15c3f2837aa4bb20a8d:
    in:
      num:
        default: 3
    out:
    - out
    run: increment

Hence, it seems that currently the only intended way of defining workflow inputs is to use global
variables which sneak into a function (as context). I know this is very subjective (so, please, no offence!),
but I strongly reject the concept of context-aware functions because they violate
encapsulation and the very idea behind a function, i.e. that a function should be a stateless and entirely
encapsulated entity whose result depends only on its explicit inputs.
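
The encapsulation concern can be illustrated in plain Python, independent of dewret. In this minimal sketch the names `offset`, `add_offset_implicit` and `add_offset_explicit` are made up for the example.

```python
offset = 3  # module-level state


def add_offset_implicit(num: int) -> int:
    # Result depends on the global `offset`, which is invisible in the signature.
    return num + offset


def add_offset_explicit(num: int, offset: int) -> int:
    # Result depends only on the arguments passed in.
    return num + offset


before = add_offset_implicit(1)
offset = 10  # rebinding the global silently changes the next call
after = add_offset_implicit(1)

explicit = add_offset_explicit(1, offset=3)
```

The two implicit calls receive identical arguments yet return different results, whereas the explicit version can be understood from its call site alone.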

Clarify feedback issues having to do with CWL format re. input 1-16, 1-17

[id: 1-16], [CONCEPT], [MINOR]: Following the basic examples (like in the Quickstart/Usage guide), all
function input parameters (here num) which are not provided by other tasks are automatically defined
as step inputs in the CWL result rather than workflow inputs. In other words they are automatically
considered as step configuration. Am I right? If so, an explanation of this concept in the docs would be
nice.
[id: 1-17], [CONCEPT], [MAJOR]: This is more of a CWL question: why are given input values (for steps)
always defined as defaults (i.e. default: )? Is the CWL idea that a configuration always needs
to have a default, i.e. is there no possibility to define required values? And if so, does this match the
Ansatz philosophy (not 100% sure about this right now)?
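
For context on the question above: in CWL, a workflow-level input declared with a type and no default has to be supplied at run time, and a step can take its value through `source`. The sketch below writes such a workflow as a plain Python dict and lists its required inputs. The step name `increment-1`, the `offset` input and the helper `required_inputs` are assumptions made for illustration, and the optionality check is simplified (it ignores union types such as `["null", "int"]`).

```python
# A workflow-level input wired into a step via `source`, instead of a
# step-level `default`. Names are illustrative, not dewret output.
workflow = {
    "class": "Workflow",
    "inputs": {
        "num": {"type": "int"},                   # required: no default
        "offset": {"type": "int", "default": 1},  # optional: has a default
    },
    "steps": {
        "increment-1": {
            "in": {"num": {"source": "num"}},
            "out": ["out"],
            "run": "increment",
        }
    },
}


def required_inputs(workflow):
    """List workflow inputs that have no default and no optional ('?') type."""
    required = []
    for name, spec in workflow["inputs"].items():
        type_ = spec["type"]
        optional = isinstance(type_, str) and type_.endswith("?")
        if "default" not in spec and not optional:
            required.append(name)
    return required


needed = required_inputs(workflow)
```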

Add a glossary of dewret concepts, for example to clarify the concepts of `Task` and `Step`. (feedback 1-13)

Publish conda package to feed (prefix.dev is likely okay). (feedback 1-2)

Revisit CWL output of global variables (feedback 1-21)

Look into PyCharm-specific static type issues related to 1-19, limit scope to mypy.

Address 1-22 re. globals->input in documentation

[id: 1-22], [FUND]: In addition (sorry for repeating myself), referring again to my previous concern about
context-aware functions: in the CWL YAML, the in dictionary of rotate-1 suggests (to me) that the
task rotate-1 has two input parameters, which it obviously doesn't.

Better error messages [1-40,41]

[id: 1-39], [MAJOR]: Executing the example fails with the error message

File "C:\Users\...dewret\src\dewret\workflow.py", line 322, in add_step
    raise TypeError(f"All tasks should have a type annotation.")
TypeError: All tasks should have a type annotation.

That's because increment misses a return type.

[id: 1-40], [MAJOR]: The initial error stack does not contain the line of the task causing the error
(at least I didn't see it). Instead, it guides to a (from the user's perspective) non-existing line ("line
322"). For the given example, this is not a big issue/easy to spot, of course. But in general this is
an issue.

[id: 1-41], [MODERATE], [ENH]: The initial error message is fine, however, maybe a more precise
hint and/or more information would be nice. For example/some proposals:

The error message could contain the task name (and module), for example: TypeError: Task 'increment' misses complete type annotation.
The error message could be more precise about which type annotations are missing, for
example: TypeError: Task 'increment' misses type annotations for: 'return type', parameter 'foo', parameter 'bar'.
The error message could (in addition) give a proper example:

TypeError: Task 'increment' has no or incomplete type annotations.
Example for a valid task:
@task()
def add_one(num: int) -> int:
    return num + 1
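
A message along the lines of the second proposal could be assembled with the standard `inspect` module. This is only a sketch of the idea: `missing_annotations` and `require_annotations` are hypothetical names, not existing dewret functions, and the wording of the message loosely follows the proposal above.

```python
import inspect


def missing_annotations(func):
    """Return human-readable names of the unannotated parts of func."""
    signature = inspect.signature(func)
    missing = [
        f"parameter '{name}'"
        for name, param in signature.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if signature.return_annotation is inspect.Signature.empty:
        missing.append("return type")
    return missing


def require_annotations(func):
    """Raise a TypeError that names the task and what is missing."""
    missing = missing_annotations(func)
    if missing:
        raise TypeError(
            f"Task '{func.__name__}' misses type annotations for: "
            + ", ".join(missing)
        )


def increment(num: int):  # return type deliberately omitted
    return num + 1


try:
    require_annotations(increment)
except TypeError as exc:
    error_text = str(exc)
```

Because the check works on the function object itself, it could run when the task is declared, which would also put the user's own source line into the traceback.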

Follow up / review necessity of `attrs`, as highlighted in [1-23]

Feedback:

What I really don't like about the current approach is that the interface is tightly coupled to a
third-party library. I wonder if this can be relaxed by using built-in Python functions/modules
(TypedDicts, namedtuples, dataclasses, etc.) and then providing support for attrs, dataclasses, etc.
on top of that, i.e. could one define a more generic interface?
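
As a rough illustration of what a more generic interface could build on, the standard library can already read field names and types uniformly from dataclasses, NamedTuples and TypedDicts through their annotations. The helper `field_types` and the `Pack*` classes below are invented for this sketch and say nothing about how dewret handles structured outputs today.

```python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict, get_type_hints


def field_types(cls):
    """Map field names to annotated types for any class with annotations."""
    return get_type_hints(cls)


@dataclass
class PackDataclass:
    hearts: int
    label: str


class PackTuple(NamedTuple):
    hearts: int
    label: str


class PackDict(TypedDict):
    hearts: int
    label: str


results = [field_types(cls) for cls in (PackDataclass, PackTuple, PackDict)]
```

All three declarations yield the same name-to-type mapping, so a library-specific layer (for attrs, say) would only be needed for features beyond plain annotations.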
