probsys / sppl Goto Github PK

Probabilistic programming system for fast and exact symbolic inference

License: Apache License 2.0

Shell 0.77% Python 99.23%

probabilistic-programming sum-product-networks pldi probabilistic-inference programming-languages

sppl's Introduction

Sum-Product Probabilistic Language

SPPL is a probabilistic programming language that delivers exact solutions to a broad range of probabilistic inference queries. The language handles continuous, discrete, and mixed-type probability distributions; many-to-one numerical transformations; and a query language that includes general predicates on random variables.

Users express generative models as probabilistic programs with standard imperative constructs, such as arrays, if/else branches, for loops, etc. The program is then translated to a sum-product expression (a generalization of sum-product networks) that statically represents the probability distribution of all random variables in the program. This expression is used to deliver answers to probabilistic inference queries.

A system description of SPPL is given in the following paper:

SPPL: Probabilistic Programming with Fast Exact Symbolic Inference. Saad, F. A.; Rinard, M. C.; and Mansinghka, V. K. In PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, June 20-25, Virtual, Canada. ACM, New York, NY, USA. 2021. https://doi.org/10.1145/3453483.3454078.

Installation

This software is tested on Ubuntu 20.04 and Python 3.8. SPPL is available on the PyPI repository

$ python -m pip install sppl

To install the Jupyter interface, first obtain the system-wide dependencies in requirements.sh and then run

$ python -m pip install 'sppl[magics]'

Examples

The easiest way to use SPPL is via the browser-based Jupyter interface, which allows for interactive modeling, querying, and plotting. Refer to the .ipynb notebooks under the examples directory.

Benchmarks

Please refer to the artifact at the ACM Digital Library: https://doi.org/10.1145/3453483.3454078

Guide to Source Code

Please refer to GUIDE.md for a description of the main source files in this repository.

Tests

To run the test suite as a user, first install the test dependencies:

$ python -m pip install 'sppl[tests]'

Then run the test suite:

$ python -m pytest --pyargs sppl

To run the test suite as a developer:

To run crash tests: $ ./check.sh
To run integration tests: $ ./check.sh ci
To run a specific test: $ ./check.sh [<pytest-opts>] /path/to/test.py
To run the examples: $ ./check.sh examples
To build a docker image: $ ./check.sh docker
To generate a coverage report: $ ./check.sh coverage

To view the coverage report, open htmlcov/index.html in the browser.

Language Reference

Coming Soon!

Citation

To cite this work, please use the following BibTeX.

@inproceedings{saad2021sppl,
title           = {{SPPL:} Probabilistic Programming with Fast Exact Symbolic Inference},
author          = {Saad, Feras A. and Rinard, Martin C. and Mansinghka, Vikash K.},
booktitle       = {PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Design and Implementation},
pages           = {804--819},
year            = 2021,
location        = {Virtual, Canada},
publisher       = {ACM},
address         = {New York, NY, USA},
doi             = {10.1145/3453483.3454078},
address         = {New York, NY, USA},
keywords        = {probabilistic programming, symbolic execution, static analysis},
}

License

Apache 2.0; see LICENSE.txt

Acknowledgments

The logo was designed by McCoy R. Becker.

sppl's People

Contributors

Stargazers

Watchers

Forkers

zababurinsv hermetique danhlephuoc olaronning femtomc seics joaoloula ships

sppl's Issues

Consider making Event of type Transform

An event is a Transform with a binary domain.

Piecewise should invoke invert_finite and invert_interval on the intersection of ranges of subexpressions

https://github.com/probcomp/sum-product-dsl/blob/de4d0478da0ca6e7037120ddf7a2658368719f64/src/transforms.py#L633-L642

Solve an event once at the root and call logprob_solved to the children

Blocked by #13

This way the child nodes do not have to keep solving the same event over and over

Remove logpdf from SPN interface

Reintroduce after a clean semantics is established for mixed-type distributions.

Make sure EventAnd.invert and EventOr.invert use reduce, not manual constructors

Using reduce with & and | will correctly handle the simplification

https://github.com/probcomp/sum-product-dsl/blob/de4d0478da0ca6e7037120ddf7a2658368719f64/src/transforms.py#L908-L910

https://github.com/probcomp/sum-product-dsl/blob/de4d0478da0ca6e7037120ddf7a2658368719f64/src/transforms.py#L961-L963

Settle on names for Numerical(Real) and Ordinal(Real)

Really these should be called ContinuousReal and DiscreteReal, not Numerical and Ordinal

Implement "solve" for multivariate event

Should return a dictionary of solutions, one for each symbol.

Handle EmptySet for EventBasic (EventInterval and EventFinite)

https://github.com/probcomp/sum-product-dsl/blob/de4d0478da0ca6e7037120ddf7a2658368719f64/src/transforms.py#L730

Make weight distribution of Sum node an SPN and expose the random choice

The SPN should have one symbolic variable
The SPN leaf nodes should be of Nominal type.

Allow NominalDist to take any (hashable) object

Unify testing for EmptySet, i.e., "x == EmptySet" vs "x is EmptySet"

Currently it converts them to integers.

Allow NominalValue n in x << {n}

To avoid situations like the following, where iterating over the support requires a confusing type conversion...

https://github.com/probcomp/sum-product-dsl/blob/d2745c965d19e14ff7ed6368da37e03c48b0e0e9/src/spn.py#L185-L186

Implement simplifier in interpret.IfElse

When expressing a Multi-Mixture program (synthesized e.g., using the CrossCat prior), we have:

model = (Start
    # View 0 has 3 variables and 2 clusters.
    & Z[0] >> NominalDist({'0': .8, '1': .2})
    & Cond (
        Z[0] << {'0'},
                X[0] >> Norm(loc=0, scale=1)
            &   X[1] >> Norm(loc=0, scale=1)
            &   X[2] >> Norm(loc=0, scale=1),
        Z[0] << {'1'},
                X[0] >> Norm(loc=10, scale=1)
            &   X[1] >> Norm(loc=10, scale=1)
            &   X[2] >> Norm(loc=10, scale=1))
    # View 1 has 2 variables and 3 clusters.
    & Z[1] >> NominalDist({'0': .2, '1': .8, '2': 0})
    & Cond (
        Z[1] << {'0'},
                X[3] >> Norm(loc=0, scale=1)
            &   X[4] >> Norm(loc=0, scale=1),
        Z[1] << {'1'},
                X[3] >> Norm(loc=10, scale=1)
            &   X[4] >> Norm(loc=10, scale=1),
    ))

The root should be a product not a sum.

sympify_number in sym_util should fail on string inputs

Currently it converts them to integers.

Expunge redundant combinators.IfElse (interpret.IfElse is sufficient)

Replace all instances of dict/set with non-random variants

Formalize the syntax and typing judgments of SPN language in transforms.py and distributions.py

The interesting typing judgments are around addition and multiplication for events and piecewise.

Implement Piecewise Transform

Partition of domain specified in terms of general (solvable) events.
Symbol of all subexprs must be identical.
Symbol of events must match symbol of subexprs.
Solutions of events may not overlap.
Syntax for a piecewise function (related: #6, so that events are transforms):
Y = (exp(X)) * (X << {0, 1)) + X**2 * (0 < X < 1)

Give a hint to EventFiniteNominal by implementing X.support (where possible)

Can significantly reduce system complexity I think.

FiniteSet('-v1') will by a sympy mul, not a sympy symbol.

Currently it converts them to integers.

Change Singletons.EmptySet in test_transforms_solve.py and spn.py to sym_util EmptySet

Fix implementation of Product.logprob_disjoint_union

https://github.com/probcomp/sum-product-dsl/blob/ba6e29362dec68422924cc23151cad2bce263eea/src/spn.py#L316

Faulty reasoning in disjoint-union algorithm for probabilities.

Key idea: Pr[A or B] // where A and B are in DNF
= Pr[A or (B and ~A)] // disjoint union property
= Pr[A] + Pr[B and ~A] // since events are disjoint

Since A and B are in DNF, write them as
A = ϕ(A; X) and ϕ(A; Y)
B = ϕ(B; X) and ϕ(B; Y)

To assess B and ~A, we have:
[ϕ(B; X) and ϕ(B; Y)] and ~[ϕ(A; X) and ϕ(B; Y)]

= [ϕ(B; X) and ϕ(B; Y)] and [~ϕ(A; X) or ~ϕ(B; Y)]

= [ϕ(B; X) and ϕ(B; Y) and ~ϕ(A; X)]
or [ϕ(B; X) and ϕ(B; Y) and ~ϕ(B; Y)]

= [ϕ(B; X) and ~ϕ(A; X) and ϕ(B; Y)]
or [ϕ(B; X) and ϕ(B; Y) and ~ϕ(B; Y)]

the issue is that clauses in this DNF expression not necessarily
disjoint.

Example:

ϕ(A; X) = X > 0
ϕ(A; Y) = Y < 1
ϕ(B; X) = X < 1
ϕ(B; Y) = Y < 3

Then the final expression is:

[(X < 1) and ~(X > 0) and (Y < 3)]
or [(X < 1) and (Y < 3) and ~(Y < 1)]
= [(X < 1) and (X ≤ 0) and (Y < 3)]
or [(X < 1) and (Y < 3) and (Y ≥ 1)]
= [ (X ≤ 0) and (Y < 3) ] or [ (X < 1) and (1 ≤ Y < 3) ]

which is not disjoint, and reduces to inclusion-exclusion.

Implement compiler from the IMP language

It should suffice to specify the SPN using Python if-else notation and leveraging the ast module (extended docs)

For example, we can define the Indian GPA problem as:

nationality = choose({'India': 0.5, 'USA': 0.5})
score = choose({'imperfect': 0.99, 'perfect': 0.01})

if (nationality == 'India'):
    if (score == 'perfect'):
        gpa = Atomic(10)
    if (score == 'imperfect'):
        gpa = Uniform(0, 10)

if (nationality == 'USA'):
    if (score == 'perfect'):
        gpa = Atomic(4)
    if (score == 'imperfect'):
        gpa = Uniform(0, 4)

In general, any SPN will have this specific structure.

Implement XOR of Events

A xor B = (A and ~B) or (~A and B)

Surprising that Python does not automatically implement this...

Avoid duplicate work in DNF query analysis for Product

Duplicate work factoring the query is done at each Product node in the network!

Implement simplifications in Event.to_dnf

i.e., (X > 0) & ((X < 0) | Y > 0))
-> ((X>0) & (X < 0)) | ((X > 0) & (Y > 0))
-> ((X > 0) & (Y > 0))

Add test case for Indian GPA

The model is as follows:

Using the updated SPN DSL parser, we express it as:

X = Identity('X')

model = 0.5 * ( # American student
            0.99 * Uniform(X, loc=0, scale=4) | \
            0.01 * Atomic(X, loc=4)) | \
        0.5 * ( # Indian student
            0.99 * Uniform(X, loc=0, scale=10) | \
            0.01 * Atomic(X, loc=10))

Allow general set in "X << S", including S is Union/Complement/Interval, etc.

i.e., add more cases to this method
https://github.com/probcomp/sum-product-dsl/blob/d2745c965d19e14ff7ed6368da37e03c48b0e0e9/src/transforms.py#L289-L311

Remove complement attribute from EventInterval and just create a Union upon inversion

Gracefully handle EventFinite.solve with non-numeric values

Example 1

>>> (X << {'a'}).solve()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 711, in solve
    return self.invert({1})
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 53, in invert
    return self.invert_finite(intersection)
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 764, in invert_finite
    return self.subexpr.invert(values_prime)
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 59, in invert
    assert False, 'Unknown intersection: %s' % (intersection,)
AssertionError: Unknown intersection: Intersection({a}, Union({-oo, oo}, Reals))

Example 2

>>> (~(X << {'a'})).solve()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 711, in solve
    return self.invert({1})
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 53, in invert
    return self.invert_finite(intersection)
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 764, in invert_finite
    return self.subexpr.invert(values_prime)
  File "/scratch/fsaad/sum-product-dsl/build/lib/spn/transforms.py", line 59, in invert
    assert False, 'Unknown intersection: %s' % (intersection,)
AssertionError: Unknown intersection: Reals \ {a}

The solution is to create two classes EventFiniteReal and EventFiniteNominal and handle the inversion appropriately under each case.

Implement optimization in event_to_disjoint_union

Only solve the clauses once, at the first call.
Return overlap_dict in find_dnf_non_disjoint_clauses

Expunge equivalent events in dnf_normalize

https://github.com/probcomp/sum-product-dsl/blob/a67336578c619c9df1323488cf31ffe04e8141f1/src/dnf.py#L71-L82

Specifically, apply conjunctions = set([...])

Consider implementing simplifications of And/Or in EventBasic

https://github.com/probcomp/sum-product-dsl/blob/de4d0478da0ca6e7037120ddf7a2658368719f64/src/transforms.py#L769-L781

Example of present behavior

print((1.5 <= X) & ((1 <= X) < 2))                                                                    
(1.5 < X) & (1 <= X < 2)

If the subexpressions match, we can simplify it to 1.5 < X < 2

There is not much of a simplification that we can do for Or.

Consider implementing a MultvariateNormal Leaf (not clear how)

On the conditional distribution of a multivariate Normal given a transformation – the linear case
https://www.sciencedirect.com/science/article/pii/S240584401831449X

Random sampling from a truncated multivariate normal distribution
https://www.sciencedirect.com/science/article/pii/0893965994900426

Consider implementing a Constant transform

Currently a constant c must be numeric, by creating a polynomial (0*X + c). In general, we can have constants in symbolic (nominal) domains, for Piecewise expressions such as

Y = 'negative' * (X < 0) + 'positive' * (X > 0) + 'zero' * (X << {0})

Related #19

Decide and document whether X << {} should be EventFiniteNominal or EventFiniteReal

https://github.com/probcomp/sum-product-dsl/blob/a67336578c619c9df1323488cf31ffe04e8141f1/src/transforms.py#L309

Unify variables names of "values", "x", "intervals", and other names for "y"

Everything having to do with "invert" should use a variable name "y", not "x", for consistency. Intervals, intersections, values, and finite sets should also all be "y".

Handle symbolic domains in Piecewise

One possibility is to have a generic Constant transform, in which case we can do something like:

Y = ('high')*( X << {'a', 'b'}) + ('low')*(X << {'c'})

probsys / sppl Goto Github PK

sppl's Introduction

Sum-Product Probabilistic Language

Installation

Examples

Benchmarks

Guide to Source Code

Tests

Language Reference

Citation

License

Acknowledgments

sppl's People

Contributors

Stargazers

Watchers

Forkers

sppl's Issues

Example 1

Example 2

Recommend Projects

Recommend Topics

Recommend Org