flaxandteal / dewret
DEclarative Workflow REndering Tool
License: Apache License 2.0
Rendering does not raise an error when a required input is left undefined.
Consider the example:
import sys
import yaml
from dewret.tasks import task, run
from dewret.renderers.cwl import render
@task()
def increment(num: int) -> int:
    return num + 1
result = increment()
workflow = run(result)
cwl = render(workflow)
yaml.dump(cwl, sys.stdout, indent=2)
Instead of raising an error, the rendering creates an output that is, to my understanding, incorrect:
class: Workflow
cwlVersion: 1.2
inputs: {}
outputs:
  out:
    label: out
    outputSource: increment-ecfc9b657ab3a0f2bc96ee9dce5e98e3/out
    type: int
steps:
  increment-ecfc9b657ab3a0f2bc96ee9dce5e98e3:
    in: {}
    out:
    - out
    run: increment
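One way such a call could be caught is to bind its arguments against the function signature before a step is created. The following is a minimal sketch in plain Python, not dewret's implementation; the helper name `missing_required` is hypothetical.

```python
import inspect


def missing_required(fn, *args, **kwargs):
    """Return names of parameters without a default that the call does not supply."""
    sig = inspect.signature(fn)
    # bind_partial tolerates missing arguments, so we can inspect what was given.
    bound = sig.bind_partial(*args, **kwargs)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in sig.parameters.items()
        if param.default is inspect.Parameter.empty
        and param.kind not in variadic
        and name not in bound.arguments
    ]


def increment(num: int) -> int:
    return num + 1


print(missing_required(increment))         # ['num']
print(missing_required(increment, num=1))  # []
```

A non-empty result would be the point at which an error could be raised, or the parameter promoted to a workflow input, depending on the intended semantics.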
Currently, arbitrary Python objects cannot be referenced within nested_tasks, since the logic of the nested task gets evaluated and embedded in the workflow. However, it would be desirable to loosen this restriction somewhat. One idea is to make use of sympy, so that, if the object to be referenced in the nested task can be expressed in sympy, dewret can represent it in the workflow.
Yes - good point. I'm tempted to write that once I have the sympy functionality working, so we can say "if you can write it with sympy it's gtg" or something like that. (although agree if that's not right now, then it should be noted regardless)
Originally posted by @philtweir in #15 (comment)
[id: 1-20], [CONCEPT], [FUND]: The current approach to defining workflow inputs is to use global variables,
as shown in the Parameters help section, right? Personally, I was a bit surprised by this approach. Before
that, I had played around with dewret, and the way I intuitively expected to define a workflow input was to
define global variables and pass them as arguments to tasks, e.g.
import sys
import yaml
from dewret.tasks import task, run
from dewret.renderers.cwl import render
some_number = 3
@task()
def increment(num: int) -> int:
    return num + 1
result = increment(num=some_number)
workflow = run(result)
cwl = render(workflow)
yaml.dump(cwl, sys.stdout, indent=2)
Which obviously does not work, i.e. the produced output reads
class: Workflow
cwlVersion: 1.2
inputs: {}
outputs:
  out:
    label: out
    outputSource: increment-012ef3b3ffb9d15c3f2837aa4bb20a8d/out
    type: int
steps:
  increment-012ef3b3ffb9d15c3f2837aa4bb20a8d:
    in:
      num:
        default: 3
    out:
    - out
    run: increment
Hence, it seems that currently the only intended way of defining a workflow input is to use global
variables which sneak into a function (as context). I know this is very subjective (so, please, no offence!),
but I strongly reject the concept of context-aware functions because they violate
encapsulation and the very idea behind a function, i.e. that a function should be a stateless and entirely
encapsulated entity whose result depends only on its explicit input.
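The distinction can be illustrated in plain Python, independent of dewret. This is only a sketch of the concern being raised; the names `FACTOR`, `scale_with_context` and `scale_explicit` are made up for illustration.

```python
FACTOR = 3  # module-level state, invisible in the function signature


def scale_with_context(value: int) -> int:
    # The result depends on FACTOR, which the caller never passes in.
    return value * FACTOR


def scale_explicit(value: int, factor: int) -> int:
    # The result depends only on the arguments.
    return value * factor


print(scale_with_context(2))  # 6
print(scale_explicit(2, 3))   # 6

FACTOR = 10  # changing the global silently changes the first function

print(scale_with_context(2))  # 20
print(scale_explicit(2, 3))   # 6
```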
[id: 1-16], [CONCEPT], [MINOR]: Following the basic examples (like in the Quickstart/Usage guide), all
function input parameters (here num) which are not provided by other tasks are automatically defined
as step inputs in the CWL result rather than workflow inputs. In other words they are automatically
considered as step configuration. Am I right? If so, an explanation of this concept in the docs would be
nice.
[id: 1-17], [CONCEPT], [MAJOR]: This is more of a CWL question: why are given input values (for steps)
always defined as defaults (i.e. default: )? Is the CWL idea that a configuration always needs
to have a default, i.e. is there no possibility to define required values? And if so, does this match the
Ansatz philosophy? (I am not 100% sure about this right now.)
[id: 1-22], [FUND]: In addition (sorry for repeating myself), referring again to my previous concern about
context-aware functions: in the CWL YAML, the in dictionary of rotate-1 suggests (to me) that the
task rotate-1 has two input parameters, which it obviously doesn't.
[id: 1-39], [MAJOR]: Executing the example fails with the error message
  File "C:\Users\...dewret\src\dewret\workflow.py", line 322, in add_step
    raise TypeError(f"All tasks should have a type annotation.")
TypeError: All tasks should have a type annotation.
That's because increment is missing a return type annotation.
[id: 1-40], [MAJOR]: The initial error stack does not contain the line of the task causing the error
(at least I didn't see it). Instead, it points to a line that, from the user's perspective, does not
exist ("line 322"). For the given example, this is not a big issue and is easy to spot, of course. But in
general this is an issue.
[id: 1-41], [MODERATE], [ENH]: The initial error message is fine, however, maybe a more precise
hint and/or more information would be nice. For example/some proposals:
TypeError: Task 'increment' is missing complete type annotations.
TypeError: Task 'increment' is missing type annotations for: return type, parameter 'foo', parameter 'bar'.
TypeError: Task 'increment' has no or incomplete type annotations.
Example for a valid task:
@task()
def add_one(num: int) -> int:
    return num + 1
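A message along these lines, including where the offending task is defined, could be assembled with the standard `inspect` module. The following is a hedged sketch and not dewret's code; `check_annotations` is a hypothetical helper, and the exact wording of the message is only one possibility.

```python
import inspect


def check_annotations(fn):
    """Raise a TypeError naming every missing annotation and where fn is defined."""
    sig = inspect.signature(fn)
    missing = [
        f"parameter '{name}'"
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.insert(0, "return type")
    if missing:
        code = fn.__code__  # carries the file and first line of the definition
        raise TypeError(
            f"Task '{fn.__name__}' ({code.co_filename}:{code.co_firstlineno}) "
            f"is missing type annotations for: {', '.join(missing)}."
        )


def increment(num):
    return num + 1


try:
    check_annotations(increment)
except TypeError as err:
    print(err)
```

Reporting the file and line of the user's function, rather than only the internal frame that raised, addresses the concern that the traceback points at library code.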
Feedback:
What I really don't like about the
current approach is that the interface is tightly coupled to a third-party library. I wonder whether this can
be relaxed by using built-in Python functions/modules (TypedDicts, namedtuples, dataclasses, etc.) and then
providing support for attrs, dataclasses etc. on top of that. In other words, could one define a more generic
interface?