
pytato's Introduction

Pytato: Get Descriptions of Array Computations via Lazy Evaluation

Gitlab Build Status

Github Build Status

Python Package Index Release Page

Imagine TensorFlow, but aimed at HPC. Produces a data flow graph, where the edges carry arrays and the nodes are (give or take) static-control programs that compute array outputs from inputs, possibly (but not necessarily) expressed in Loopy. A core assumption is that the graph represents a computation that's being repeated often enough that it is worthwhile to do expensive processing on it (code generation, fusion, OpenCL compilation, etc).

  • Documentation (read how things work, see an example)
  • Github (get latest source code, file bugs)

Pytato is licensed to you under the MIT/X Consortium license. See the documentation for further details.

Numpy compatibility

Pytato is written to pose no particular restrictions on the version of numpy used for execution. To use mypy-based type checking on Pytato itself or packages using Pytato, numpy 1.20 or newer is required, due to the typing-based changes to numpy in that release.

pytato's People

Contributors

a-alveyblanc, alexfikl, inducer, isuruf, kaushikcfd, majosm, matthiasdiener, mattwala, nchristensen, nkoskelo, xywei


pytato's Issues

Rename `IndexLambda`?

I remember someone commenting that IndexLambda was a bad name. This is to discuss whether we've got better suggestions. Some opening bids: ImplicitArray? ExpressionArray? IndexExprArray?

cc @matthiasdiener @kaushikcfd

Pooled allocation for temporaries generated in codegen

Consider the following kernel:

import numpy as np
import loopy as lp
import pytato as pt
from pytato.loopy import call_loopy

knl = lp.make_kernel(
    "{[i]: 0<=i<N}",
    """
    y[i] = 2*x[i]
    """, name="twice")

x = pt.make_placeholder(name="in", dtype=np.float64, shape=1000000)
out = {f"out{i}": call_loopy(knl, {"x": i*x})["y"]
       for i in range(10)}
pt_prg = pt.generate_loopy(out).program

The generated code in pt_prg will contain 20 temporaries, each holding 1 million doubles. Ideally, we should allocate just a single temporary of 1 million doubles (assuming we don't inline at the call site).

We could have such an implementation in pytato via some heuristic pooled allocator. Allocation here means reserving a chunk from the base_storage of a reserved temporary.

The 160-million-byte question: should we do this at the pytato level, or is the downstream user expected to do it themselves via loopy transformations? One consideration is that the def-use chain is naturally available in pytato's code generator, whereas it would be much harder to extract in downstream transformations.

I am in favor of having this logic in pt.CodeGenMapper.
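
To make the idea concrete, here is a minimal, self-contained sketch of such a heuristic (everything here, names included, is illustrative; in pytato, "allocating" would mean reserving a chunk of the base_storage as described above):

# Hypothetical sketch (not pytato code): a linear-scan pooling heuristic.
# A temporary's "lifetime" is the (first_def, last_use) span of schedule
# indices; temporaries with non-overlapping lifetimes may share storage.

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Lifetime:
    name: str
    nbytes: int
    first_def: int
    last_use: int


def pool_temporaries(lifetimes: List[Lifetime]) -> Dict[str, int]:
    """Map each temporary name to a pool id; temporaries sharing a pool
    never have overlapping lifetimes."""
    pools: List[Tuple[int, int]] = []   # per pool: (busy_until, nbytes)
    assignment: Dict[str, int] = {}

    for lt in sorted(lifetimes, key=lambda lt: lt.first_def):
        for pool_id, (busy_until, size) in enumerate(pools):
            # Reuse a pool whose previous occupant is already dead.
            if busy_until < lt.first_def and size >= lt.nbytes:
                pools[pool_id] = (lt.last_use, size)
                assignment[lt.name] = pool_id
                break
        else:
            pools.append((lt.last_use, lt.nbytes))
            assignment[lt.name] = len(pools) - 1

    return assignment


# For the example above, the ten 8 MB result temporaries are live one after
# another, so they all land in a single pool:
lifetimes = [Lifetime(f"tmp{i}", 8_000_000, i, i) for i in range(10)]
assert set(pool_temporaries(lifetimes).values()) == {0}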

Equality comparison of DAGs can have exponential complexity

Every time expressions are reused, they are re-traversed, since __eq__ has no mechanism to memoize that expressions previously compared equal. (And we probably wouldn't want such a cache on the instance. Or maybe we do? Not sure.) I'm thinking we could introduce an EqualityComparisonMapper that would facilitate such memoization. To avoid circularity, it obviously couldn't use the arrays themselves as hash keys (since that would require equality comparison), but id() is available, and we could back that up with an `a is b` check. An (IMO manageably) unfortunate consequence of this is that we'd have to reimplement equality comparison for all types that can contain Array instances, notably dict.
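
A rough sketch of what such a memoizing comparator could look like (class and method names are hypothetical, not pytato's current API):

class EqualityComparisonMapper:
    # Cache keyed on id() pairs, backed by an `a is b` fast path, so it never
    # needs to hash or compare the arrays themselves to manage the cache.

    def __init__(self):
        self._cache = {}  # (id(a), id(b)) -> bool

    def __call__(self, a, b):
        if a is b:
            return True
        key = (id(a), id(b))
        try:
            return self._cache[key]
        except KeyError:
            pass
        result = self.compare_fields(a, b)
        self._cache[key] = result
        return result

    def compare_fields(self, a, b):
        # Dispatch on the node type and compare per-field contents, recursing
        # into array-valued children through self(child_a, child_b) so that
        # repeated subexpressions hit the cache. Left unimplemented here.
        raise NotImplementedError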

@MTCam is hitting this with cache retrievals upon freeze when exercising lazy evaluation for chemistry in https://github.com/illinois-ceesd/mirgecom.

cc @kaushikcfd

pytato.array is too long

It should be broken up into pieces, maybe

  • namespace, "base" array and DictOf
  • tags
  • specific node types
  • numpy-workalike

@kaushikcfd What do you think?

Should math functions handle scalar values?

Minimal reproducer:

In [1]: import numpy as np

In [2]: np.sin(3.0)
Out[2]: 0.1411200080598672

In [3]: import pytato as pt

In [4]: pt.sin(3.0)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-85cfcccd5e8f> in <module>
----> 1 pt.sin(3.0)

/shared/home/kgk2/pack/virtual_envs/emirge/pytato/pytato/array.py in sin(x)
   1756 
   1757 def sin(x: Array) -> IndexLambda:
-> 1758     return _apply_elem_wise_func(x, "sin")
   1759 
   1760 

AttributeError: 'float' object has no attribute 'dtype'

Numpy supports scalar values for its functions. Should we handle them too? I would vote yes, as handling this easy case would improve numpy compatibility.
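
A sketch of how e.g. pytato.array.sin could handle this (not the current implementation): fall back to numpy for plain scalars and keep the existing path otherwise.

import numpy as np

def sin(x):
    # Hypothetical: defer to numpy for Python/numpy scalars, keep the
    # existing IndexLambda-producing path for pytato arrays.
    if np.isscalar(x):
        return np.sin(x)
    return _apply_elem_wise_func(x, "sin")  # existing code path in pytato/array.py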

/cc @inducer

DependencyMapper should not return every node in the graph

The one use case we have:

pytato/pytato/array.py

Lines 239 to 253 in 6e65675

def normalize_shape_component(
        s: ShapeComponent) -> ShapeComponent:
    if isinstance(s, Array):
        from pytato.transform import DependencyMapper
        if s.shape != ():
            raise ValueError("array valued shapes must be scalars")
        for d in (k for k in DependencyMapper()(s)
                  if isinstance(k, InputArgumentBase)):
            if not isinstance(d, SizeParam):
                raise NotImplementedError("shape expressions can (for now) only "
                        "be in terms of SizeParams. Depends on"
                        f" '{d.name}', a non-SizeParam array.")
        # TODO: Check affine-ness of the array expression.

only wants a specific node type. Building a set containing all nodes is expensive overkill. It'd be much better to have a base CombineMapper and then return only the nodes that are actually desired.

The same is true of one upcoming use case:
https://github.com/kaushikcfd/pytato/blob/274c51ec0e1c7f9f68f548b42797011b9befa866/pytato/loopy.py#L251-L268
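
To make the proposal concrete, a sketch of what such a collector might look like (this assumes the proposed CombineMapper base class exists; method names follow pytato's map_* convention, but nothing here is final API):

# Sketch only: assumes a CombineMapper base class whose rec() visits child
# nodes and merges their results via combine().
from functools import reduce


class InputGatherer:  # would derive from the proposed CombineMapper
    def combine(self, *results):
        return reduce(frozenset.union, results, frozenset())

    # Input nodes contribute themselves...
    def map_placeholder(self, expr):
        return frozenset([expr])

    def map_size_param(self, expr):
        return frozenset([expr])

    def map_data_wrapper(self, expr):
        return frozenset([expr])

    # ...while every other node type simply combines its children's results,
    # which the CombineMapper base class would supply as the default.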

Statistics module in pytato

Although there is already a stats module in loopy, I'm working towards a vanilla stats module in pytato.

Q. Why not just use loopy.get_op_map?
A. For the kernels I'm looking at, generating the loopy code takes around 1.5 minutes, and the stats module in loopy needs to preprocess the translation unit, which takes yet another 4 minutes.

Please let me know if there are serious drawbacks to this approach.
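
As a strawman for what "vanilla" could start from (entirely illustrative; the child-access mechanism is left as a parameter since it isn't pinned down here): a single walk over the DAG that tallies node types, with no loopy code generation or preprocessing involved.

from collections import Counter


def count_nodes(expr, children, counts=None, seen=None):
    # `children(expr)` stands in for whatever child-access mechanism the
    # real mapper would use; everything else is plain Python.
    if counts is None:
        counts = Counter()
    if seen is None:
        seen = set()
    if id(expr) in seen:
        return counts
    seen.add(id(expr))
    counts[type(expr).__name__] += 1
    for child in children(expr):
        count_nodes(child, children, counts, seen)
    return counts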

[loopy-codegen, bug] Code generation for tagged `NamedArray`s is buggy

Tagging a named array and then accessing other results of the container loses the tag information, since __getitem__ may freshly create an un-tagged variant; see

pytato/pytato/array.py

Lines 718 to 722 in 1c9ee78

@memoize_method
def __getitem__(self, name: str) -> NamedArray:
    if name not in self._data:
        raise KeyError(name)
    return NamedArray(self, name)


Loopy codegen is broken for tagged instances of NamedArray for the same reason. Here's an MWE:

import loopy as lp
import pytato as pt
from pytato.loopy import call_loopy
from pytools.tag import Tag


class FooTag(Tag):
    pass


knl = lp.make_kernel(
    "{[i]: 0<=i<10}",
    """
    y[i] = i
    """)

y = call_loopy(knl, bindings={})["y"]
y = y.tagged(FooTag())
pt.generate_loopy(2 * y)

raises:

  File "/home/kgk2/projects/ceesd/pytato/pytato/target/loopy/codegen.py", line 414, in map_named_array
    assert expr in state.results
AssertionError

Random DAG generation breaks reduction codegen

See #188 (comment).

Enabling

        v = rng.integers(0, 2)

in make_random_dag and then running

pycl test_codegen.py 'test_random_dag_against_numpy(cl._csc)'  

gives

[...]
18
Traceback (most recent call last):
  File "/home/andreas/src/pytato/test/test_codegen.py", line 1290, in <module>
    exec(sys.argv[1])
  File "<string>", line 1, in <module>
  File "/home/andreas/src/pytato/test/test_codegen.py", line 1283, in test_random_dag_against_numpy
    _, pt_result = pt.generate_loopy(dict_named_arys)(cq)
  File "/home/andreas/src/pytato/pytato/target/loopy/__init__.py", line 146, in __call__
    return self.program(queue,
  File "/home/andreas/src/loopy/loopy/translation_unit.py", line 342, in __call__
    return pex(*args, **kwargs)
  File "/home/andreas/src/loopy/loopy/target/pyopencl_execution.py", line 364, in __call__
    translation_unit_info = self.translation_unit_info(entrypoint,
  File "/home/andreas/src/pytools/pytools/__init__.py", line 704, in wrapper
    result = function(obj, *args, **kwargs)
  File "/home/andreas/src/loopy/loopy/target/pyopencl_execution.py", line 282, in translation_unit_info
    program = self.get_typed_and_scheduled_translation_unit(
  File "/home/andreas/src/loopy/loopy/target/execution.py", line 813, in get_typed_and_scheduled_translation_unit
    kernel = self.get_typed_and_scheduled_translation_unit_uncached(entrypoint,
  File "/home/andreas/src/loopy/loopy/target/execution.py", line 790, in get_typed_and_scheduled_translation_unit_uncached
    program = preprocess_program(program)
  File "/home/andreas/src/loopy/loopy/preprocess.py", line 2483, in preprocess_program
    new_subkernel = _preprocess_single_kernel(
  File "/home/andreas/src/loopy/loopy/preprocess.py", line 2386, in _preprocess_single_kernel
    check_reduction_iname_uniqueness(kernel)
  File "/home/andreas/src/loopy/loopy/preprocess.py", line 110, in check_reduction_iname_uniqueness
    raise LoopyError("iname '%s' used in more than one reduction. "
loopy.diagnostic.LoopyError: iname '_pt_sum_r0_3' used in more than one reduction. (2 of them, to be precise.) Since this usage can easily cause loop scheduling problems, this is prohibited by default. Use loopy.make_reduction_inames_unique() to fix this. If you are sure that this is OK, write the reduction as 'simul_reduce(...)' instead of 'reduce(...)'

I.e. it found a bug! :)

cc @kaushikcfd

Avoid recompiling for equivalent kernels

Problem

import pytato as pt

ns = pt.Namespace()
x = pt.make_placeholder(ns, shape=(10, 4), dtype=float, name="x")
y = pt.make_placeholder(ns, shape=(10, 4), dtype=float, name="y")
knl1 = pt.generate_loopy(2*x).program
knl2 = pt.generate_loopy(2*y).program

assert knl1 == knl2  # fails

I.e., we recompile kernels that happen to be equivalent functions.

Proposal

Pass an option to generate_loopy so that the generated BoundProgram holds a naming map, ensuring that in both of the above cases we generate identical kernels. This option would be turned on by default when running under python -O.
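
Purely to illustrate the naming-map idea (nothing here is existing pytato API; the canonical name "_pt_in_0" and the helper are made up): the BoundProgram would remember how canonical names map back to user names and translate keyword arguments at call time.

def translate_kwargs(name_map, user_kwargs):
    # name_map: {canonical_name: user_name}, as the BoundProgram would record
    # it during code generation with canonical renaming enabled.
    user_to_canonical = {v: k for k, v in name_map.items()}
    return {user_to_canonical[name]: value
            for name, value in user_kwargs.items()}


# If "x" and "y" were both canonically renamed to "_pt_in_0" during codegen,
# knl1 and knl2 above would be identical, and a call like prg(x=data) would
# be rewritten internally:
print(translate_kwargs({"_pt_in_0": "x"}, {"x": 42}))  # {'_pt_in_0': 42}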

How to implement Array.__bool__?

Right now, pytato.Array doesn't implement __bool__. That's dangerous, as comparisons (such as those used in e.g. convergence checks) always evaluate to something "truthy":

x: Array
if x < 3:
    # always True

We should definitely implement __bool__ in some fashion.

PyOpenCL will transparently transfer Boolean scalars back to the host:
https://github.com/inducer/pyopencl/blob/1527c099bc6baae6773902b3d190fac6ed691328/pyopencl/array.py#L1451-L1458
I don't know if that's a good idea, because it may lead to inadvertent transfers. (So it's functionally safe, but not that safe in terms of performance.) We could automatically evaluate scalars in boolean contexts, but then we face a similar trade-off to pyopencl's. What should we do?
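
For comparison, here is a minimal sketch of the most conservative choice, refusing boolean conversion outright so that `if x < 3:` fails loudly rather than silently evaluating to True (illustrative only, not a decision):

class Array:
    def __bool__(self):
        raise ValueError(
            "the truth value of a lazy pytato array is undefined; "
            "evaluate it explicitly and convert the resulting "
            "concrete array instead")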

Reshape on empty arrays

The following case doesn't work in pytato but numpy happily reshapes it to anything.

x = pt.make_placeholder("x", shape=(0,), dtype=np.float64)
assert pt.reshape(x, (128, 0, 17)).shape == (128, 0, 17)

and

x = pt.make_placeholder("x", shape=(), dtype=np.float64)
pt.reshape(x, (1,))

seems to fail at code generation here.

xref: inducer/arraycontext#91

Shape inference

  • Give Pytato its own shape inference mechanism, perhaps by making the one from Loopy reusable.
  • Bring back test_size_param, see also this discussion
  • See also #14.

DataWrapper and make_data_wrapper take shape arguments: does that make sense?

IIRC the intention was to use these to pass the same shape in symbolic form (if applicable) with the idea that the actual size would constrain (fix) the symbolic parameters. But I'm no longer sure this makes sense. After all, the constraining would happen as soon as the array "meets" a symbolically-shaped counterpart in (e.g.) arithmetic. Thoughts?

@xywei Do you remember if that's accurate?

cc @kaushikcfd

Representing functionally dependent data that comes from outside lazy eval

When capturing (say) a DG operator in a way that allows use of the same IR against a different mesh, arrays are encountered that have a functional dependency on the mesh, but whose computation happens outside of the lazy-evaluation world. The sizes of these arrays will often also depend on the mesh. Inter-element connectivity is a good example of this. Semantically, they're somewhere in between DataWrapper (which bakes fixed data into the expression graph) and Placeholder (which leaves inputs up for replacement).

We need (ideally unobtrusive) ways to:

  • communicate to the lazy-eval infrastructure that this is data that is expected to change, and how (e.g. by introducing new shape parameters that show which bits of the shape can change)
  • let the infrastructure ask the user to provide the correct values for those arrays (possibly via a callback) and infer associated shape parameters
  • make a way to uniquely identify these inputs so that the infrastructure can recognize their repeated introduction. Otherwise, subgraphs depending on them may look spuriously different and require repeated compilation. Forcing some sort of global naming scheme might solve this last point, but it's a bit obnoxious, and I would prefer to avoid it.

Parametric shapes are on hold until we can come up with a design that addresses these challenges.

cc @kaushikcfd

Translation to loopy is slow

Consider this small application where we take 500 numpy arrays and 500 pytato arrays and double both of them. The reported timing on my machine is:

Numpy took: 0.000705718994140625 secs.
Pytato took: 146.95382499694824 secs.
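
The script itself isn't reproduced here; a rough reconstruction along these lines (array sizes and structure are guesses) shows where the time goes, namely in generate_loopy:

import time

import numpy as np
import pytato as pt

n = 500

# numpy: double 500 arrays eagerly
x_np = [np.ones(10) for _ in range(n)]
t = time.time()
out_np = {f"out_{i}": 2*x for i, x in enumerate(x_np)}
print(f"Numpy took: {time.time()-t} secs.")

# pytato: build the same 500 expressions, then translate to loopy
x_pt = [pt.make_placeholder(name=f"x_{i}", shape=(10,), dtype=np.float64)
        for i in range(n)]
t = time.time()
prg = pt.generate_loopy({f"out_{i}": 2*x for i, x in enumerate(x_pt)})
print(f"Pytato took: {time.time()-t} secs.")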

Communication nodes in pytato

From the discussion with @inducer (see also #54):

  • Should we have separate send and receive nodes?
  • Joined communication buffers?
  • Is there (functional) overlap with other types of operations (e.g., visualizing or checkpointing)?

Mypy fail on Gitlab

On e82b132 (aka current main)

pytato/array.py:281: error: Call to untyped function "result_type" in typed context
pytato/array.py:1123: error: Call to untyped function "result_type" in typed context
pytato/array.py:1157: error: Call to untyped function "result_type" in typed context
pytato/array.py:1197: error: Call to untyped function "result_type" in typed context
pytato/array.py:2085: error: Argument "dtype" to "IndexLambda" has incompatible type "Type[bool_]"; expected "dtype[Any]"
Found 5 errors in 1 file (checked 13 source files)

Any clues?

https://gitlab.tiker.net/inducer/pytato/-/jobs/285578

cc @kaushikcfd @matthiasdiener

`Array` nodes should have a `__repr__`

(Possibly also a __str__?)

Not having a meaningful printed representation makes debugging code kind of a pain. We'll have to figure out how much of the DAG to actually print. We probably want a StringifyMapper (or some such) that's somewhat customizable, to use as the basis of this functionality.

cc @kaushikcfd

Downstream tests don't pick up the correct branch during a PR action

In

downstream_tests:
    strategy:
        matrix:
            downstream_project: [meshmode, mirgecom, arraycontext]
    name: Tests for downstream project ${{ matrix.downstream_project }}
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: "Main Script"
      env:
        DOWNSTREAM_PROJECT: ${{ matrix.downstream_project }}
      run: |
        if test "$DOWNSTREAM_PROJECT" = "mirgecom"; then
            git clone "https://github.com/illinois-ceesd/$DOWNSTREAM_PROJECT.git"
        else
            git clone "https://github.com/inducer/$DOWNSTREAM_PROJECT.git"
        fi
        cd "$DOWNSTREAM_PROJECT"
        echo "*** $DOWNSTREAM_PROJECT version: $(git rev-parse --short HEAD)"

        # Avoid slow or complicated tests in downstream projects
        export PYTEST_ADDOPTS="-k 'not (slowtest or octave or mpi)'"

        if test "$DOWNSTREAM_PROJECT" = "mirgecom"; then
            # can't turn off MPI in mirgecom
            sudo apt-get update
            sudo apt-get install openmpi-bin libopenmpi-dev
            export CONDA_ENVIRONMENT=conda-env.yml
            export CISUPPORT_PARALLEL_PYTEST=no
        else
            sed -i "/mpi4py/ d" requirements.txt
        fi

        curl -L -O https://tiker.net/ci-support-v0
        . ./ci-support-v0
        build_py_project_in_conda_env
        test_py_project

        if [[ "$DOWNSTREAM_PROJECT" = "meshmode" ]]; then
            python ../examples/simple-dg.py --lazy
        fi
I think it is missing the logic to point the downstream package's requirements.txt at the correct branch.

/cc @matthiasdiener

Should "overriding" bound arguments be allowed?

On master, the data of "x" is overridden. Was this intentional?

ns = pt.Namespace()
x_in = np.ones(10)
x = pt.make_data_wrapper(ns, x_in, name="x")
prg = pt.generate_loopy(42*x, pt.PyOpenCLTarget(queue))
evt, (out1,) = prg()
evt, (out2,) = prg(x=np.random.rand(10))

Teach Einsums Broadcasting

import numpy as np
import pytato as pt

a = np.random.rand(3, 3)
b = np.random.rand(3, 1)
np.einsum("ij,ij->ij", a, b)  # works by broadcasting the 1-long axis in 'b'

a_in = pt.make_data_wrapper(a)
b_in = pt.make_data_wrapper(b)
pt.einsum("ij,ij->ij", a_in, b_in)  # FAILS :sob:

Need while loops

Needed to implement chemistry. An additional problem is that there isn't currently a way to lower while loops to Loopy.

cc @kaushikcfd

Rejecting `__eq__` early if hashes disagree

I'm not sure it's profitable for us, but it might be worth studying. The main reason I have doubts is that freezes of equivalent computations frequently produce graphs that do compare equal, meaning an early exit would rarely be taken and thus provide no benefit. (And if we computed the hash just for this, there would be potentially substantial extra cost.)

Another option would be to

  • make the hash value cache an attribute (e.g. self._hash_value) instead of implicit (via @memoize_method) and
  • only use it for early exit if it's already computed (i.e. not None); see the sketch below
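
A sketch of that variant (the _hash_value attribute is from the proposal above; the _compute_hash/_structurally_equal helpers are stand-ins for the existing expensive paths, not pytato API):

class Array:
    _hash_value = None  # filled in lazily by __hash__

    def _compute_hash(self):
        ...  # stand-in for the existing (expensive) structural hash

    def _structurally_equal(self, other):
        ...  # stand-in for the existing (expensive) recursive comparison

    def __hash__(self):
        if self._hash_value is None:
            # a frozen/immutable class may need object.__setattr__ here
            self._hash_value = self._compute_hash()
        return self._hash_value

    def __eq__(self, other):
        if self is other:
            return True
        if (self._hash_value is not None
                and getattr(other, "_hash_value", None) is not None
                and self._hash_value != other._hash_value):
            return False  # hashes already known and disagree: not equal
        return self._structurally_equal(other)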

DependencyMapper semantics

@kaushikcfd This hadn't occurred to me when I reviewed it, but the semantics for the DependencyMapper are actually not clearly defined. The current one provides all subexpressions that go into the argument. Some other possibilities:

  • All named arrays (i.e. things known by the Namespace)
  • All scalar/integer parameters
