
pyungo's Introduction

pyungo


pyungo is a lightweight library that links a set of dependent functions together and executes them in dependency order.

pyungo is built around Graphs and Nodes, which together form a DAG (Directed Acyclic Graph). A Node represents a function that runs with a defined set of inputs and returns one or several outputs. A Graph is a collection of Nodes through which data flows in a logical manner, the output of one Node serving as input to another.

installation

>> pip install pyungo

simple example

# some releases also allow: from pyungo import Graph
from pyungo.core import Graph

graph = Graph()

@graph.register()
def f_my_function_2(d, a):
    e = d - a
    return e

@graph.register()
def f_my_function_1(c):
    d = c / 10
    return d

@graph.register()
def f_my_function_3(a, b):
    c = a + b
    return c

res = graph.calculate(data={'a': 2, 'b': 3})
print(res)

pyungo registers the functions at import time. It then resolves the DAG and figures out the order in which the functions have to run, based on their inputs and outputs. In this case, function 3 runs first, then 1, and finally 2.

The ordered Graph is run with calculate, with the given data. It returns the output of the last function being run (e), but all intermediate results are also available in the graph instance.

The result will be (a + b) / 10 - a = -1.5
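The ordering step can be illustrated without pyungo itself. The following is a minimal sketch, not pyungo's implementation: it assumes each node is stored under the name of its single output, and it uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to order the nodes. The `nodes` dictionary and the `calculate` helper are hypothetical names chosen for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical node table: output name -> (function, input names).
nodes = {
    "e": (lambda d, a: d - a, ["d", "a"]),   # f_my_function_2
    "d": (lambda c: c / 10, ["c"]),          # f_my_function_1
    "c": (lambda a, b: a + b, ["a", "b"]),   # f_my_function_3
}


def calculate(nodes, data):
    """Run every node once, in an order that respects its inputs."""
    values = dict(data)
    # A node depends only on inputs that are produced by another node;
    # inputs supplied through `data` create no dependency.
    deps = {
        out: {name for name in inputs if name in nodes}
        for out, (_, inputs) in nodes.items()
    }
    order = list(TopologicalSorter(deps).static_order())
    for out in order:
        func, inputs = nodes[out]
        values[out] = func(*(values[name] for name in inputs))
    return order, values


order, values = calculate(nodes, {"a": 2, "b": 3})
print(order)        # ['c', 'd', 'e']
print(values["e"])  # -1.5
```

The intermediate values c and d stay in `values`, which mirrors the statement that intermediate results remain available after the run.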

pyungo's People

Contributors

cedricleroy, nelsontodd, tosa95, veronicaguo


pyungo's Issues

RFE: allow for optional kwargs for input data

  • Python's kwargs are optional (since defaults are given in the function declaration).
  • This library's kwargs are not optional: each one has to exist in the input data, or an error is raised.

These two facts cause a mismatch when converting traditional code into a graph-pipeline.

If it is feasible, it would really help to add another optional keyword to Graph.add_node().

Steps to reproduce

The following code:

graph = pyungo.Graph()

@graph.register(inputs=['a'], kwargs=['b'], outputs=['c'])
def f(a, b=2):
    return a + b

graph.calculate({'a': 1})

... raises PyungoError: The following inputs are needed: ['b']
while the function is fully capable of working without b.

Proposal

This should work:

@graph.register(inputs=['a'], optional=['b'], outputs=['c'])
def f(a, b=2):
    return a + b

graph.calculate({'a': 1})

and produce 3.
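One way the proposed behaviour could work is sketched below. This is not pyungo code: `call_with_optional` is a hypothetical helper that treats names listed in `inputs` as mandatory and names listed in `optional` as keyword arguments that are passed only when present in the data, so the function's own default applies otherwise.

```python
def call_with_optional(func, data, inputs, optional=()):
    """Call func with mandatory inputs and whichever optional ones exist."""
    missing = [name for name in inputs if name not in data]
    if missing:
        raise KeyError(f"The following inputs are needed: {missing}")
    args = [data[name] for name in inputs]
    # Optional names are forwarded only if the caller supplied them.
    kwargs = {name: data[name] for name in optional if name in data}
    return func(*args, **kwargs)


def f(a, b=2):
    return a + b


print(call_with_optional(f, {"a": 1}, ["a"], ["b"]))           # 3
print(call_with_optional(f, {"a": 1, "b": 10}, ["a"], ["b"]))  # 11
```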

Why does __init__.py not import anything?

Hi. I understand this is a style choice, but why use an empty __init__.py instead of filling it with from .core import *?
If you do it the second way, you can import with from pyungo import Graph instead of from pyungo.core import Graph, which is nice because I had no idea core.py existed inside of this package until I looked. Thank you!

Issue with single output being an array

Reference: https://github.com/cedricleroy/pyungo/blob/master/pyungo/core.py#L146-L152

I ran into an issue for a specific type of node. The node returns 1 output that is a list of multiple datetime objects (like a timestamps vector). Because of the lines referenced above, the graph only saves the 1st item of that returned list, because it thinks that multiple outputs will be returned (since iter(res) doesn't fail), but there is only one output_name in the node (like "timestamps" for instance).
Essentially, the for loop goes through the timestamps list, and returns the first element of that list as data to be saved...
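The behaviour described here can be reproduced in isolation. The sketch below paraphrases the report and is not copied from core.py: `store_naive` treats any iterable result as one value per output name, while `store_by_count` is one possible fix that looks at the number of declared output names first. Both function names are hypothetical.

```python
from datetime import datetime


def store_naive(output_names, res, data):
    """Treat every iterable result as 'one value per output name'."""
    try:
        iter(res)
    except TypeError:
        res = [res]
    for name, value in zip(output_names, res):
        data[name] = value


def store_by_count(output_names, res, data):
    """Store the result whole when exactly one output name is declared."""
    if len(output_names) == 1:
        data[output_names[0]] = res
        return
    for name, value in zip(output_names, res):
        data[name] = value


timestamps = [datetime(2020, 1, 1), datetime(2020, 1, 2)]
naive, fixed = {}, {}
store_naive(["timestamps"], timestamps, naive)
store_by_count(["timestamps"], timestamps, fixed)
print(naive["timestamps"])  # only the first datetime survives
print(fixed["timestamps"])  # the full list is kept
```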

Create a node without decorator

Add a method in Graph to register a new node without using a decorator:

graph.add_node(inputs=['a', 'b'], outputs=['c'], function=f_my_function)
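A possible shape for this, sketched on a stand-in class rather than on pyungo's Graph: the decorator simply delegates to `add_node`, so both registration styles produce the same node record. `MiniGraph` and its tuple-based node storage are assumptions made for the illustration.

```python
class MiniGraph:
    """Stand-in for a graph that supports both registration styles."""

    def __init__(self):
        self.nodes = []

    def add_node(self, function, inputs, outputs):
        # Plain method: usable on functions defined elsewhere.
        self.nodes.append((function, list(inputs), list(outputs)))

    def register(self, inputs, outputs):
        # Decorator form: a thin wrapper around add_node.
        def decorator(function):
            self.add_node(function, inputs, outputs)
            return function
        return decorator


def f_my_function(a, b):
    return a + b


graph = MiniGraph()
graph.add_node(function=f_my_function, inputs=["a", "b"], outputs=["c"])


@graph.register(inputs=["c"], outputs=["d"])
def f_other(c):
    return c * 2


print([(fn.__name__, ins, outs) for fn, ins, outs in graph.nodes])
```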

RFE: allow to have nodes producing the same output

There is a use case for having multiple nodes produce the same output, and deciding only at calculation time which path to use.

Example: convert units, and have multiple input-units convert to the same output.

It would be even more useful to have a flag, set at calculation time, that decides whether to raise when duplicate outputs are detected or just to issue a warning and choose an arbitrary node, for cases where the inputs overlap and all paths produce the same result. Further down the road, the flag could become a tri-state, with a third mode that calculates all paths, compares the results, and raises only if they differ.
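The three modes could look roughly like the sketch below. Everything in it is hypothetical: `resolve_output`, the `on_duplicate` flag and its values "raise", "warn" and "compare" are illustrative names, and the candidates are plain (function, input names) pairs rather than pyungo nodes.

```python
import warnings


def resolve_output(candidates, data, on_duplicate="raise"):
    """Compute one output from several candidate nodes.

    candidates: list of (function, input_names) producing the same output.
    """
    runnable = [
        (func, inputs) for func, inputs in candidates
        if all(name in data for name in inputs)
    ]
    if not runnable:
        raise KeyError("no candidate node has all of its inputs")
    if len(runnable) > 1:
        if on_duplicate == "raise":
            raise ValueError("several nodes can produce this output")
        if on_duplicate == "compare":
            results = [f(*(data[n] for n in ins)) for f, ins in runnable]
            if any(r != results[0] for r in results[1:]):
                raise ValueError("candidate nodes disagree")
            return results[0]
        if on_duplicate != "warn":
            raise ValueError(f"unknown on_duplicate value: {on_duplicate!r}")
        warnings.warn("several nodes can produce this output; using the first")
    func, inputs = runnable[0]
    return func(*(data[name] for name in inputs))


# Unit conversion: metres can come from kilometres or from centimetres.
to_metres = [
    (lambda km: km * 1000, ["km"]),
    (lambda cm: cm / 100, ["cm"]),
]
print(resolve_output(to_metres, {"km": 1.5}))  # 1500.0
print(resolve_output(to_metres, {"km": 1.5, "cm": 150000},
                     on_duplicate="compare"))  # 1500.0
```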

Memory usage

Hi,
I'm not sure if this is a bug or a feature request.

I have a workflow that is very memory intensive but also very well suited to decomposition to a DAG.
My problem is that if I keep any intermediate outputs in memory, I will quickly exceed the capacity of one computer to hold the data in RAM.
I had hoped pyungo would be clever enough to allow intermediate states to be garbage collected, but it doesn't seem so.

See a sample program:
`
from pyungo import Graph
import numpy as np
import gc

@profile  # provided by `python -m memory_profiler`
def main():
    graph = Graph()

    @graph.register()
    def calc_a():
        a = np.random.rand(8192, 8192)
        return a

    @graph.register()
    def calc_b():
        b = np.random.rand(8192, 8192)
        return b

    @graph.register()
    def calc_c(a, b):
        gc.collect()
        c = a * b
        print("c")
        return c

    @graph.register()
    def calc_d():
        gc.collect()
        d = np.random.rand(8192, 8192)
        print("d")
        return d

    @graph.register()
    def calc_pfd(c, d):
        gc.collect()
        e = c * d
        return e

    gc.collect()
    res = graph.calculate(data={})
    gc.collect()
    print(res)
    del res
    gc.collect()
    del graph
    gc.collect()

main()
`

Output:
`
(venv) zenbook% python -m memory_profiler memtest.py
INFO:root:Starting calculation...
INFO:root:Ran Node(08f958eb-84ff-49ad-a2fb-a2ada5788705, <calc_a>, [], ['a']) in 0:00:02.127759
d
INFO:root:Ran Node(9cd9ce4e-16d7-4a43-83b7-a8e01e8bd8ba, <calc_d>, [], ['d']) in 0:00:01.618884
INFO:root:Ran Node(ea8b8c8f-d8fc-4967-a7c5-c3ba0dbcd550, <calc_b>, [], ['b']) in 0:00:01.519026
c
INFO:root:Ran Node(ac6d7004-7cb1-41ff-8476-a2a1ce9e64d6, <calc_c>, ['a', 'b'], ['c']) in 0:00:01.029356
INFO:root:Ran Node(7786d30e-7796-4d19-a692-07f2904ea6c8, <calc_pfd>, ['c', 'd'], ['e']) in 0:00:00.853072
INFO:root:Calculation finished in 0:00:07.152394
[[0.32979496 0.00617538 0.01675385 ... 0.08284045 0.03303956 0.09351132]
[0.00268712 0.20226707 0.06033366 ... 0.07918911 0.01333745 0.15655172]
[0.0007408 0.01337496 0.17597583 ... 0.19520472 0.0274126 0.07911974]
...
[0.00958562 0.00919059 0.10846052 ... 0.01235475 0.02207799 0.26674223]
[0.06822633 0.03539608 0.08139489 ... 0.08097827 0.10901089 0.02113664]
[0.01915152 0.00518849 0.34347554 ... 0.04939359 0.48837681 0.11771939]]
Filename: memtest.py

Line # Mem usage Increment Line Contents

 5   29.688 MiB   29.688 MiB   @profile
 6                             def main():
 7   29.688 MiB    0.000 MiB       graph = Graph()
 8                             
 9   29.691 MiB    0.000 MiB       @graph.register()
10   29.691 MiB    0.004 MiB       def calc_a():
11  541.562 MiB  511.871 MiB           a = np.random.rand(8192,8192)
12  541.562 MiB    0.000 MiB           return a
13                             
14 1053.578 MiB    0.000 MiB       @graph.register()
15   29.691 MiB    0.000 MiB       def calc_b():
16 1565.590 MiB  512.012 MiB           b = np.random.rand(8192,8192)
17 1565.590 MiB    0.000 MiB           return b
18                             
19 1565.590 MiB    0.000 MiB       @graph.register()
20   29.691 MiB    0.000 MiB       def calc_c(a,b):
21 1565.590 MiB    0.000 MiB           gc.collect()
22 2077.605 MiB  512.016 MiB           c = a * b 
23 2077.605 MiB    0.000 MiB           print("c")
24 2077.605 MiB    0.000 MiB           return c
25                             
26  541.562 MiB    0.000 MiB       @graph.register()
27   29.691 MiB    0.000 MiB       def calc_d():
28  541.562 MiB    0.000 MiB           gc.collect()
29 1053.578 MiB  512.016 MiB           d = np.random.rand(8192,8192)
30 1053.578 MiB    0.000 MiB           print("d")
31 1053.578 MiB    0.000 MiB           return d
32                             
33 2077.605 MiB    0.000 MiB       @graph.register()
34   29.691 MiB    0.000 MiB       def calc_pfd(c,d):
35 2077.605 MiB    0.000 MiB           gc.collect()
36 2589.621 MiB  512.016 MiB           e = c * d
37 2589.621 MiB    0.000 MiB           return e
38                             
39   29.691 MiB    0.000 MiB       gc.collect()
40 2589.621 MiB    0.000 MiB       res = graph.calculate(data={})
41 2589.621 MiB    0.000 MiB       gc.collect()
42 2589.621 MiB    0.000 MiB       print(res)
43 2589.621 MiB    0.000 MiB       del res
44 2589.621 MiB    0.000 MiB       gc.collect()
45   29.730 MiB    0.000 MiB       del graph
46   29.730 MiB    0.000 MiB       gc.collect()

`

After calc_c has run, a and b should be able to be garbage collected, but it seems a reference is held by graph to every output.
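One way a runner could release intermediates is to count how many nodes still need each value and drop it once that count reaches zero. The sketch below illustrates the idea on plain tuples; it is not how pyungo works, and `run_and_release` and its `keep` parameter are hypothetical names.

```python
def run_and_release(nodes, data, keep=()):
    """Run ordered nodes, dropping values no later node will read.

    nodes: list of (function, input_names, output_name), already ordered.
    keep:  names that must survive even after their last consumer ran.
    """
    values = dict(data)
    # Count how many times each name is consumed across all nodes.
    remaining = {}
    for _, inputs, _ in nodes:
        for name in inputs:
            remaining[name] = remaining.get(name, 0) + 1
    for func, inputs, output in nodes:
        values[output] = func(*(values[name] for name in inputs))
        for name in inputs:
            remaining[name] -= 1
            if remaining[name] == 0 and name not in keep:
                # Last consumer has run: drop our reference so the
                # object can be garbage collected.
                del values[name]
    return values


# Same dependency shape as the sample program, with small stand-in values.
nodes = [
    (lambda: 1, [], "a"),
    (lambda: 2, [], "b"),
    (lambda a, b: a * b, ["a", "b"], "c"),
    (lambda: 3, [], "d"),
    (lambda c, d: c * d, ["c", "d"], "e"),
]
print(run_and_release(nodes, {}))  # {'e': 6}
```

With this scheme a and b are dropped as soon as calc_c's counterpart has run, so at most three of the large arrays would be referenced by the runner at any one time in the sample workflow.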

RFE: auto-populate graph from function name & args

I propose to auto-populate Graph.add_node() calls based on function and argument names (using the inspect module from Python's standard library), and to allow some form of string filtering on the function and argument names.
Admittedly, this would work for functions with a single output only.

The proposal is easier to explain with sample client code:

def funcname_chopper(funcname):
    # Strip a known verb prefix; fall back to the unchanged name.
    for prefix in ['calc_', 'compute_', 'make_']:
        if funcname.startswith(prefix):
            return funcname[len(prefix):]
    return funcname

graph = pyungo.Graph(
    outname_converter=funcname_chopper)

# equivalent to: register(inputs=['a', 'b'], outputs=['c'])
@graph.register
def make_c(a, b):
    return a + b


# equivalent to: register(inputs=['a', 'b', 'c'], outputs=['calc_d'])
@graph.register(inpname_converter=lambda n: n[5:], outname_converter=None)
def calc_d(some_a, look_b, stop_c):
    return some_a + look_b
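The name inference itself needs nothing beyond inspect.signature. The sketch below shows only that part, independent of pyungo: `infer_node` is a hypothetical helper, and the converter parameters mirror the names used in the proposal rather than any existing API.

```python
import inspect


def funcname_chopper(funcname):
    """Strip a known verb prefix; fall back to the unchanged name."""
    for prefix in ["calc_", "compute_", "make_"]:
        if funcname.startswith(prefix):
            return funcname[len(prefix):]
    return funcname


def infer_node(func, outname_converter=None, inpname_converter=None):
    """Derive (inputs, outputs) from a function's signature and name."""
    inputs = list(inspect.signature(func).parameters)
    if inpname_converter is not None:
        inputs = [inpname_converter(name) for name in inputs]
    output = func.__name__
    if outname_converter is not None:
        output = outname_converter(output)
    return inputs, [output]


def make_c(a, b):
    return a + b


def calc_d(some_a, look_b, stop_c):
    return some_a + look_b


print(infer_node(make_c, outname_converter=funcname_chopper))
# (['a', 'b'], ['c'])
print(infer_node(calc_d, inpname_converter=lambda n: n[5:]))
# (['a', 'b', 'c'], ['calc_d'])
```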

RFE: support sub-graphs

It would be nice to add a method like:

bigger_graph = Graph.add_subgraph(some_graph)

and port all nodes from some_graph into bigger_graph.
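At its simplest, porting nodes amounts to appending one node list to another while guarding against two nodes that write the same output. The sketch below assumes nodes are stored as (function, inputs, outputs) tuples, which is an assumption made for the example and not pyungo's internal representation; `merge_nodes` is a hypothetical name.

```python
def merge_nodes(target, source):
    """Append nodes from source to target, rejecting clashing outputs."""
    produced = {out for _, _, outputs in target for out in outputs}
    for function, inputs, outputs in source:
        clash = produced.intersection(outputs)
        if clash:
            raise ValueError(f"outputs already produced: {sorted(clash)}")
        target.append((function, inputs, outputs))
        produced.update(outputs)
    return target


bigger = [(lambda a, b: a + b, ["a", "b"], ["c"])]
some = [(lambda c: c / 10, ["c"], ["d"])]
merge_nodes(bigger, some)
print([outputs for _, _, outputs in bigger])  # [['c'], ['d']]
```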

Is schema enforced on outputs and internal data-nodes?

Adapting the quickstart example:

schema = {
    "type": "object",
    "properties": {
        "a": {"type": "number"},
        "b": {"type": "number"}
    }
}

graph = Graph(schema=schema)

@graph.register(inputs=['a'], outputs=['b'])
def f1(a):
    return "Hey!"

@graph.register(inputs=['b'], outputs=['c'])
def f2(b):
    return 2 * b

graph.calculate(data={'a': 1})

I was expecting an error, but the pipeline went through and returned the result Hey!Hey!.
Am I doing something wrong?
This feature is particularly important for data on the internal nodes, because it is not as easy to test them as inputs/outputs.
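What the question asks for could be pictured as a check that runs every time a node stores a value. The sketch below is a hand-rolled stand-in that understands only the "number" and "string" types; it is neither a JSON Schema validator nor pyungo behaviour, and `check_value` is a hypothetical helper.

```python
import numbers

# Minimal mapping from schema type names to Python types (assumption:
# only these two types are needed for the example).
JSON_TYPES = {"number": numbers.Real, "string": str}

schema = {
    "type": "object",
    "properties": {
        "a": {"type": "number"},
        "b": {"type": "number"},
    },
}


def check_value(name, value, schema):
    """Raise TypeError if value does not match the schema entry for name."""
    spec = schema.get("properties", {}).get(name)
    if spec is None:
        return  # names without a schema entry are not checked
    expected = JSON_TYPES[spec["type"]]
    # bool is an int subclass in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, expected):
        raise TypeError(f"{name!r} should be of type {spec['type']}")


check_value("a", 1, schema)  # input passes
try:
    check_value("b", "Hey!", schema)  # output of f1 in the example
except TypeError as exc:
    print(exc)  # 'b' should be of type number
```

Calling such a check on each node's output, and not only on the initial data, would have flagged f1's string result before f2 ran.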
