egraphs-good / egglog Goto Github PK

View Code? Open in Web Editor NEW

383.0 383.0 45.0 4 MB

egraphs + datalog!

Home Page: https://egraphs-good.github.io/egglog/

License: MIT License

Rust 78.78% HTML 2.92% JavaScript 17.22% Makefile 0.25% Python 0.06% Racket 0.77%

egglog's People

Contributors

Stargazers

Watchers

Forkers

philzook58 remysucre yihozhang ezrosent dcao oflatt saulshanabrook alstrup rtjoa clyben graydon cartazio hackedy amdahl-rs timotree3 aziemchawdhary-gs ryan-berger wbthomason xhenryzhang trevorhansen wilcoxjay hatoo gussmith23 bvanjoi alex-fischman gmh5225 kirstenmg eytans mgree pandacontron majidaldo puyral connorbaker boyland-pf pbonh riswords opacous amarshah1 mehrresearch therealbnut cgyurgyik

egglog's Issues

Check passes for wrong assertion

The following check should fail, but it doesn't:

(function f (i64) i64 :merge (min old new))

(set (f 1) 4)

(check (= (f 1) 2))

Parser might mishandle differently-named values constrained to be equal

See the below program.

(datatype list
    (Empty)
    (Cons i64 list))

(relation eq (list list))

(eq Empty Empty)

; Oddly, this version works:
; (rule ((= x (Cons x1 rest1)) (= y (Cons x2 rest2)) (= 0 (- x1 x2)) (eq rest1 rest2))
(rule ((= x (Cons x1 rest1)) (= y (Cons x2 rest2)) (= x1 x2) (eq rest1 rest2))
      ((eq (Cons x1 rest1) (Cons x2 rest2))))

(define mylist (Cons 1 Empty))

(run 100)

Output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"i64"`,
 right: `"__bogus__"`', src/typecheck.rs:654:29
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Put back nbe and typecheck tests

15cccfa removed these tests because they failed to typecheck. @philzook58 @yihozhang if you get the chance to fix them feel free to commit them back.

Desugar merge functions

Merge functions are currently broken when they affect the database because they are not desugared properly.
We plan to make merge functions desugar into rules I believe

The print command for functions has a result of the form (foo x) -> y.
I agree this is way more readable than a sexp, but it is slightly annoying / non uniform to parse with off the shelf sexp parsers.
I'd propose keeping the more readable form for the stderr info but turn the stdout output to either (foo x y) or (-> (foo x) y) keeping with the philosophy that stderr is for humans, stdout is for machines.
In addition, I think it would be useful to have a method to dump raw eid form in addition to extracted form. Maybe a parameter to print, or maybe a csv dump.

lattices should be bounded (rationals -> extended rationals)

The semantics of rewrite rules is that everything on the LHS is query, and everything on the RHS is action (populating terms only, no query). For example, in

(rewrite ((add a b)) ((= (lo x) (+ (lo a) (lo b)))))

(lo x), (lo a), (lo b) will be created on the fly with the default bottom value (so the above is usually not the rule we want). For this to be true, lattices should have a least element.

what are the current semantics for under/overflow in the built in number types?

Just read the paper. I like it! :)

Created map type is always first defined map type

I have noticed a slight oddity in the current type system. The order in which you defined map sorts influences the type of maps you create.

Example

For example, this works:

(sort MyMap (Map i64 String))
(sort MyMap1 (Map i64 i64))

(define my_map1 (insert (empty) 1 "one"))

whereas this fails:

(sort MyMap1 (Map i64 i64))
(sort MyMap (Map i64 String))

(define my_map1 (insert (empty) 1 "one"))

This means you cannot create two different types of maps in one level.

Why does it happen

When you register a map sort, like (sort MyMap (Map i64 String)), this will register the primitives, including create. If you do this twice, then two primitives will be registered under the name create. However, when you call create, the first one registered will be chosen.

Workaround

You can get around this by pushing a level, creating a map, then going up a level:

(push)
(sort MyMap1 (Map i64 i64))
(define my_map1 (insert (empty) 1 1))
(pop)

(sort MyMap (Map i64 String))

(define my_map2 (insert (empty) 1 "one"))

Solution

I am not sure if there an elegant way to address this oddity at the moment. I am not sure if it would be classified as a bug.

My understanding of the underlying issue is that type system of egg-smol is mainly the simply typed lambda calculus. Users can create custom type (sorts) and define function which map between different types. Users cannot create functions that are generic or create types that depend on other types.

Builtin maps are supported as the only parametric types, but they are added in such a way as to make them appear as simple types.

This leads to this issue, since there is no way to pass a type into a function. To solve it, I believe we would need to be able to pass a type into the create function to parameterize it with something like (create (Map i64 i64)) or (create i64 i64).

This would mean changing the type system of egg-smol from a simply typed lambda calculus to the System Fω, since it requires having types that depend on types (Map) and terms that depend on types (create). This would also allow users to define other custom types, such as lists or simply typed functions (like a lambda from int to str).

I am not suggesting this one oddity would be enough motivation for such a large change, since it is possible to work around it currently. But I wanted to open this issue to write out my thoughts here, since I had been thinking about the possibility of adding user-defined parametrics for a while :)

Thanks for all the work on this library! It's very exciting.

Persistent indices across updates

Rebuilding Indices is one of the performance bottleneck of relational e-matching. Right now, the indices are built from scratch in each iteration. A more efficient approach may be to keep the indices across iterations and keep them updated as the database changes.

`Table::len()` is not computing the right size because of stale entries

This causes issues with print-size and the printed summary. I made an ad hoc fix locally for print-size which the artifact relies on but we need a more careful fix.

Saturation not detected when `set` is a noop

The below program doesn't break early, but could. This particularly matters if we have a schedule that wants to run a ruleset to saturation.

(function foo () i64 :merge (min old new))

(set (foo) 0)

; This should break at iteration 0 because the merge doesn't cause any updates
(rule ((= f (foo))) ((set (foo) 1)))
(run 100)

Extraction Cost Example Incorrect Now

I believe the extraction cost example is no longer accurate:

(datatype Expr
  (Num i64 :cost 5))

(define x (Num 1) :cost 10)
(define y (Num 2) :cost 1)

(extract x) ;; (Num 1)
(extract y) ;; (y)

Running it yields:

./target/debug/egg-smol  tests/extraction-cost.egg 
[INFO ] Declared sort Expr.
[INFO ] Declared function Num.
[INFO ] Run (let v1___ 1).
[INFO ] Run (let v0___ (Num v1___)).
[INFO ] Run (let x v0___).
[INFO ] Run (let v3___ 2).
[INFO ] Run (let v2___ (Num v3___)).
[INFO ] Run (let y v2___).
[INFO ] Run (let v4___ x).
[INFO ] Extracted with cost 6: (Num 1)
[INFO ] Run (let v5___ y).
[INFO ] Extracted with cost 6: (Num 2)

I think some of the changes to how define works may have broken this example. I am not sure if this was intentional or not.

Bad query compilation for parent table

This test case is deleted in PR #158 because it runs very slow:

https://egraphs-good.github.io/egglog/?program=XQAAgAAYCAAAAAAAAAAdiAUH8jxQlgyj-cRXmi2Zd1J45P5Icm3waT90RZXzafssUGhO1uV_9mDZkqP3L26Lbwg1MixeqK3M5B1GbZGDJsGqTw73J2lPAzlcGJoP99puJTwt9Mfe6OyWPmVAVKbJFJiA-HS0hXHxPLXRYfm1PxhcCsCrOiy_N1URApkHiM-Q931K1d9OnuJhnUenWG_FF9goSUPpsigw_j6z3EIIrST2EhQxHLWp5CwoG4POM0V4JHUe_dg_EaJ0RKHnTGrmUGfjS13TnVR7OQ5KjaXlrsAfQuiukYarPvAHABNFH9Xfwg8Qb4fHHUIEqZ4bQb9Tu392HY2yfBtv5XK4Vr8TYM-xurlvgmqKsoiP1m1zhUh_iUM8bTSb2GkgAAQXe7ZyDmQxId7KyDw8aFW21S9nG-e5LWAXagjUUlonsEe_a6MyeSQijKDqpYi5t3y3LD1cFSYsegFwMHWYQw7r1_kGTxnXzJ2FRDeOT61gge4blXcNSn9_HaN34LwaRrXzJ81DiyL1jBsj0uC2kLA5YCgc0hKzQv4TMWcrs9Ica5BNbJ1lYQh8-r3hLTqen5XFtvDYxM_fNWBe2O1zW2VZ0BLOWRIEvpp3opQx13aZHUj9eWgxKfc8-0MXx6dXqGP_yLZV-H7s9l1h-i-kqmtsvBk713wXA9ODvLEQaKVC82usSezmDomy8qTWIAgJETmoQJrZJeur6PaZ8x9NVu03QqQ9bhUe_BCDP-ocl1jz5CIvZ2t0oofoyVlKlEMEcPaCFEutVCJqccU9zkjWDO3KQHBp9Jc__e4LJA%253D%253D

One way to fix it would be to leverage the functional dependency to look up the output from the parent table instead of joining as we currently do.

Vec include getting items and length?

I was looking at the Vec sort and as far as I can tell it doesn't have a way to retrieve an item at a certain index or update an item at a certain index? It seems to only contain a way to check if an item exists.

It seems similar to a set then, but maybe allowing duplicates and ordered.

I was curious if you would be open to a PR which adds the ability to get an item from a particular index as well as one which returns the length.

Add log2 primitive for integers

While trying to write a rewrite to handle:

a / b -> a >> log2(b), where b is a power of 2

I ran into an issue with representing log2(b). I can of course add the operation to a datatype, but I thought it would be a nice addition to egglog since rational has a log function.

I've added the primitive and opened #162

Thanks for this super awesome tool!

Add License

If possible I'd prefer to contribute with an open source license in place. My preference would be MIT or similar (like egg) but happy to discuss other options.

Lambda Calculus Example in Paper

I am trying to create a lambda calculus example in the Python bindings and was curious about the code that is listed in the "Better Together: Unifying Datalog and Equality Saturation" paper:

It looks a bit different than the current lambda calculus example in this repo. I was curious if the code for the paper is living somewhere else and if it works on the most recent version of egg-smol.

Example not working

This variant of a program in tests is not doing what I expect.

(datatype Math
  (Num i64)
  (Var String)
  (Add Math Math))

(define start (Add (Num 3) (Add (Num 4) (Var 42)))) ;; TODO typecheck this!
(define goal  (Add (Num 7) (Var 42)))

(function depth (Math) (Min i64))

(rewrite (Add x y) (Add y x))
(rewrite (Add (Add x y) z) (Add x (Add y z)))
(rewrite (Add (Num x) (Num y)) (Num (+ x y)))

(rule ((= e (Num x)))  ((= 0 (depth e))))

(run 3)
(extract (Num 4))
(extract (depth (Num 4)))
; [ERROR] Not found: (depth (Num 4))
(check (= 0 (depth (Num 4))))

I'm not sure what I am missing here. Eventually, I'm trying to use lattice values for extraction.

Unify extraction and serialize to both use the serialize format

Right now extraction extracts to an Expr, but this has the big downside of not sharing subexpressions.
The new serialization format supports terms, so we should extract to that.

Integer overflow in `semi_naive_set_function_with_proofs`

Enabling proofs for the semi_naive_set_function test results in an integer overflow, apparently in the generated proofs code; with RUST_LOG=debug enabled:

$ cargo test semi_naive_set_function_with_proofs
...
[2023-06-01T03:26:52Z DEBUG egg_smol::gj] Query: (Cons__ p1 pl p) (ProofCost__ p1 cost) (ProofListCost__ pl cost2) where (+ cost cost2 v15_____) 
    New atom: (ProofListCost__ pl cost2) 
    Vars: p1 pl p v15_____ cost cost2
    Program
     Intersect @ 1   0: Cons__.1  2: ProofListCost__.0
     Intersect @ 0   0: Cons__.0  1: ProofCost__.0
     Intersect @ 2   0: Cons__.2
     Intersect @ 5   2: ProofListCost__.1
     Intersect @ 4   1: ProofCost__.1
     Call Prim(+) [Var("cost"), Var("cost2"), Var("v15_____")] false
thread 'semi_naive_set_function_with_proofs' panicked at 'attempt to add with overflow', src/sort/i64.rs:32:61

I think other desugaring is generating costs that are really large. It seems like we shouldn't be using i64 and + for this if these costs really need to be that high, instead we could look at:

f64
Option<i64> where None is "positive infinity"
Use saturating addition.

Not sure which of these would be easiest for the proofs code.

Duplicate literals in query cause egglog to panic

The following program panics.

(relation f (i64))

(rule ((= x 1)
       (= x 2))
      ((f x)))
      
(run 1)
(print f)

Either unsound rules or a bug in egglog

Here's a set of rules that results in constants being merged.

Several people have looked at the rules, but we can't find the bug. It's potentially a bug in egglog

Define command vs define action

There is currently both a top-level define command as well as a define action, with the same names and similar signatures, but different behavior. This was initially a bit confusing for me until I dug into the source and tried out some examples. I am opening this issue to document my understanding of the current behavior (please correct me where I am wrong) and open a discussion to see if there might be a way to make it more straightforward.

Current Behavior

Top level define command

The top level command is effectively an alias for defining a function that takes no args (a thunk) and setting the value passed in as the result. So for example, I can do:

(define x 1)
(check (= (x) 1))

Which is, AFAIK, equivalent to:

(function x () i64)
(set (x) 1)
(check (= (x) 1))

The define top-level command is used in many examples to make variables which are then referred to later.

Define action

The define action however cannot currently be run at the top level of the script but can be used in a rule, to define a temporary local variable that is then accessed later in the rule actions.

For example, this is the initial fibonacci example:

(function fib (i64) i64)
(set (fib 0) 0)
(set (fib 1) 1)

(rule ((= f0 (fib x))
       (= f1 (fib (+ x 1))))
      ((set (fib (+ x 2)) (+ f0 f1))))

(run 7)

(check (= (fib 7) 13))

We can refactor it to define a temporary variable in the rule and use that to set the value:

(function fib (i64) i64)
(set (fib 0) 0)
(set (fib 1) 1)

(rule ((= f0 (fib x))
       (= f1 (fib (+ x 1))))
      ((define sum (+ f0 f1))
       (set (fib (+ x 2)) sum)))

(run 7)

(check (= (fib 7) 13))

This is different from the above define, in that it sets it as a variable, not as a function. So you cannot call the sum in the set action below it.

As far as I can tell, this isn't used much at all in the examples. The only place I can see it used is in the matrix example. In this case, the local variables defined are not ever actually accessed later. I assume something about simply defining them has an affect on the script, but I am not sure what this is:

; demand
(rule ((= e (MMul A B)))
((define demand1 (ncols A))
(define demand2 (nrows A))
(define demand3 (ncols B))
(define demand4 (nrows B)))
)

Possible changes

As I said above, I am not opening this because there are any bugs in the current implementation necessary, but simply because the ergonomic are a bit confusing. Depending on where the define is located it can either be a global function or a local variable. Also the top level one takes a cost parameter, which the local variable does not.

I wonder if it might be possible to either:

Remove the define action since it is used in only one example. In that case, it isn’t used for defining a variable but for causing a value to be evaluated. Maybe if this use case is still needed, there could be a different action defined to only evaluate the expression?
Change the name of one of them, to differentiate them better.
Keep them the same, but maybe make their implementation closer? Maybe the top-level one could also set a global variable, instead of defining a function?
Remove the define command, since you can get the same behavior with a function definition and a set, or just repeat the expression where you need. This would be fine for using egg-smol from something like Python, where you can rely on the host language to define a value and use it in multiple places, but would make the text interface much more verbose.

These are just a few initial ideas I had, but I mostly wanted to raise this issue to make sure I am understanding things properly and open up the discussion for more input. Thank you all for your work on this package! It's been very interesting to dive into and learn about.

Support predicates / operators in query

For many applications, we may want to write something like

(rule ((> (lo x) (hi x))) ((unsound x))

and

(rule ((= (typeof (Cons y ty ctx) x) t) (!= x y))((= (typeof ctx x) (typeof ctx x))))

However, we need to support predicates (and operators in general) for queries (the left-hand side) for this rule to work. And to support them, we may need to build a dependency graph for the query to the evaluation order (e.g., we need to ground x and y before evaluating (!= x y).

Matching bug with `!=`

This program should be the same if you use the commented out rule. However, it fails.

Print Line Number on Parse Error

Conversion of character number to line col number would be a much more pleasant programming experience. I get around it by installing an extension in my vscode editor that prints character number. Bonus mode if one can print the line with little squiggles under the problem. I attempted to do this a few months back based on these comments lalrpop/lalrpop#323 (comment) but got lost somewhere in custom error implementation egg-smol has.

Unable to build egglog

Hello!
I am trying to build the repository however I'm running into some issues. First when I build it using make all it gives an error when trying to compile the target test:

When I comment out those lines, the build progresses and then fails at the following point:

error[E0658]: use of unstable library feature 'int_log'
  --> src/sort/i64.rs:45:67
   |
45 |         add_primitives!(eg, "log2" = |a: i64| -> i64 { (a as i64).ilog2() as i64 });
   |                                                                   ^^^^^
   |
   = note: see issue #70887 <https://github.com/rust-lang/rust/issues/70887> for more information

error[E0658]: use of unstable library feature 'is_some_with'
    --> src/lib.rs:1029:42
     |
1029 |                 if self.get_sort(&value).is_some_and(|sort| sort.is_eq_sort()) {
     |                                          ^^^^^^^^^^^
     |
     = note: see issue #93050 <https://github.com/rust-lang/rust/issues/93050> for more information

For more information about this error, try `rustc --explain E0658`.
error: could not compile `egglog` due to 2 previous errors

I'm guessing the latter issue may be something to do with my Rust configuration, I'm not too familiar with Rust.

Weird "invalid default" error for a constant

Repro here

Panic when extracting from a partial function

The following program panics:

(function f (i64) i64 :merge (min old new))

(set (f 1) 4)

(extract (f 2))

Another parser bug to test

Here is another good test case showing a bug in egglog.

I found it implementing a flattener for egglog rules

High level Python API

Hi!

I wanted to open an issue to let folks know I have a version of a high level API wrapper working in Python. I published a new version of the Python package and put together a tutorial showing how to re-create the matrix examples in Python:

from egg_smol import *

egraph = EGraph()

@egraph.class_
class Dim(BaseExpr):
    """
    A dimension of a matix.

    >>> Dim(3) * Dim.named("n")
    Dim(3) * Dim.named("n")
    """
    def __init__(self, value: i64Like) -> None:
        ...

    @classmethod
    def named(cls, name: StringLike) -> Dim:
        ...

    def __mul__(self, other: Dim) -> Dim:
        ...

a, b, c = vars_("a b c", Dim)
i, j = vars_("i j", i64)
egraph.register(
    rewrite(a * (b * c)).to((a * b) * c),
    rewrite((a * b) * c).to(a * (b * c)),
    rewrite(Dim(i) * Dim(j)).to(Dim(i * j)),
    rewrite(a * b).to(b * a),
)

x = Dim.named("x")
ten = Dim(10)
res = x * ten * ten

graph.register(let("", res))
egraph.run(10)
egraph.extract(res)

It also exposes the rust API more directly, which the high level API is built on top of.

A couple of fun things that you get with the high level API, is static type analysis provided by any of Python's type checkers (mypy, pylance, etc), to verify that the types (aka sorts) are correct before even running the program. Also, instead of just having everything be a function, we can use Python's operator overloading so that things like a * b are translated into Add(a, b).

This requires knowing all the types at compile time, so that we can differentiate the operator overloading based on the argument types, which I have implemented on the Python side.

It also exposes a few oddnesses with at least the version of egg-smol that I am using, like the use of let as a way of adding expressions to the e-graph, which I use in the Python tutorial and is also used in the matrix example in egg-smol.

I am currently pinned to an older version of egg-smol, so I can focus on getting this API worked out. I do plan on upgrading the underlying dependency next and adding support for the new features.

I would also like to try to add on some graph visualization next, to help understand the state of the e-graph visually as things progress...

I would love any feedback on the Python API! I had to make some design decisions on how to expose things and would love to hear how they land with people. Although it's definitely more verbose than the lisp API, there are some advantages of being a DSL in a host language with tooling as I mentioned above.

Thanks for all the work on this library! It's very exciting to be getting closer to a language-independent way of expressing DSLs and translations between them :)

`math_with_proofs` fails in debug builds

The math_with_proofs test triggers an assertion failure:

thread 'math_with_proofs' panicked at 'assertion failed: `(left == right)`
  left: `Value { tag: "Proof__", bits: 59896 }`,
 right: `Value { tag: "Proof__", bits: 10121 }`: [19586] EqGraph__([Value { tag: "Ast__", bits: 99019 }, Value { tag: "Ast__", bits: 59914 }]) = TupleOutput { value: Value { tag: "Proof__", bits: 59896 }, timestamp: 14 }
ResolvedSchema { input: [EqSort { name: "Ast__" }, EqSort { name: "Ast__" }], output: EqSort { name: "Proof__" } }', src/lib.rs:277:14

The relevant assertion that's failing is:

                assert_eq!(
                    output.value,
                    self.bad_find_value(output.value),
                    "[{i}] {name}({inputs:?}) = {output:?}\n{:?}",
                    function.schema,
                )

So rebuilding missed this node! This is probably an edge-case in rebuilding somewhere

Visualization of EGraph State

I am starting to look into adding some interactive visualizations to egglog-python and wanted to open an issue here, thinking I might start with adding some visualizations upstream.

I think could be useful for explaining e-graphs to new audiences and developing a mental model in particular for how egglog works.

To start with, I was hoping to generate graphs similar to those on the egg website:

I have started to look through the internal structure to pull out the pieces I would need to traverse:

EGraph struct
- functions: HashMap<Symbol, Function>
  - In the Function struct, can get the
  - In the Function struct, the nodes.vals (Vec<(Input, TupleOutput)>) returns a number of input (list of Values) output (Value) pairs which define the nodes of the function
    - Value has a tag which defines which type it is and a bits to store the value. For the primitive types, this is the value. For an "Eq Sort" i.e. user defined type, this is a lookup in the unionfind
unionfind: UnionFind
- parents: Vector where each index is an ID and the values point to the parent item.

For egglog, I was thinking I could generate something in a .dot syntax, which could be visualized using the graphviz CLI as well as in the browser. I could either generate the strings directly or rely on a rust library like graphviz-rust to create them.

To create the graph I was going to try:

Traversing the parents in the unionfind structure to create a cluster per union, with a node in each cluster item.
Traversing all the values of each all functions, setting the text of each node to the function name, and adding an edge between the output and each input. For non-eq/primitive nodes, I would create a new node in the graph for each. For output primitive nodes, I would have the text be the value as well as the function.

I am not sure how legible this will be, but wanted to see how it works out for some of the examples.

If this does work out, then I would try extending to some interactive formats. I have had luck in the past using cytoscape.js for large graph visualization of programs (see vega/editor#1023).

Type Inference Failure

;ThCategory
(datatype TYPE)
(datatype TERM)
(function type (TERM) TYPE)
(function Ob () TYPE)
(function Hom (TERM TERM) TYPE)

(function id (TERM) TERM)
(rule ((= goofball (type (id A))))  ((let A_type (type A))))
(rewrite (type (id A)) (Hom A A) :when ((= (type A) Ob)))

(function compose (TERM TERM) TERM)
(rule ((= goofball (type (compose f g))))  ((let f_type (type f)) (let g_type (type g))))
(rewrite (type (compose f g)) (Hom A C) :when ((= (type f) (Hom A B)) (= (type g) (Hom B C))))


(birewrite (compose (compose f g) h) (compose f (compose g h)) :when ((= (type A) Ob) (= (type B) Ob) (= (type C) Ob) (= (type D) Ob) (= (type f) (Hom A B)) (= (type g) (Hom B C)) (= (type h) (Hom C D))))
(birewrite (compose f (id B)) f :when ((= (type A) Ob) (= (type B) Ob) (= (type f) (Hom A B))))
(birewrite (compose (id A) f) f :when ((= (type A) Ob) (= (type B) Ob) (= (type f) (Hom A B))))

https://www.mwillsey.com/egg-smol/?program=XQAAgAB9AwAAAAAAAAAdlQkEMyquloSsDef8Ne_CNtIKYi_rMgTflb9nB-DF-gk3Yjhk1l0lAFzyvSLtDQMTUWeekzTMgv3D9Ss2FhdWDmn3YGsm5y8dRgAwvMCVB0KoS2MxJfv7fkWXV5ZVvtbgJmePIO7P_q-JLbIg3vP3iv_lpP9EmmcBDPCHHJBFg_NH5CnfTYQKPnW33_xU20s6iKclWEzR6pcGY153-dZy7A9FRBPWSglI_kfJ5juBrwYIrb0G1_zySgYm8trnH2wL_ed1VGmlGPQi3Sz9dccLj95aiagKa6Lqu8eiBHDOH4l7wmZuMeZLVaqwSvs3QCpDPwmrDie94AM6kqOEilsJzHgqxv5Z_aY%253D

This fails with

[ERROR] Errors:
    Failed to infer a type for: __rewrite_var
    Failed to infer a type for: f

on the last two rewrite rules. I think it ought to be able to infer the type of f from the :when clause.

:merge can have side effects by calling a function with a :default

Possibly a non-issue (there's no bug), but it can lead to some spooky behavior.

In the below example, repeatedly setting bar, even though its table never changes, causes a feedback loop.

(function foo (i64) i64 :default 0)

(function bar () i64 :merge (min (foo new) 0))

(set (bar) 1)
(set (bar) 2)

(rule ((= f (foo x)))
      ((set (bar) (+ x 1))))


(print bar)      ; (bar) -> 0
(run 100)
(print bar)      ; (bar) -> 0
(print-size foo) ; Function foo has size 101

Nondeterminism

This is a tracking issue for sources of nondeterminism. @ezrosent ran into one in #33. I think that can be attributed to using HashMap instead of IndexMap in a couple places, and that is somehow leaking the fact that Symbols might be different in the face of concurrency or multiple runs in the same process.

I would like to not have to use IndexMapeverywhere, so I think we should track down why this matters.

Container rebuild bug

Here's a failing test:

https://egraphs-good.github.io/egglog/?program=XQAAgAA5AQAAAAAAAAAFCt5yvsWfR8X0ni3K7Yumf[…]6BBt6aN1pbCBmCx9E-YvCaNuOBCluRmTNujEjkyD_pWdKAA%253D%253D

Here's a more complex test that should pass:

(datatype Math
  (Num i64))

(sort MathVec (Vec Math))


(let v1 (vec-of (Num 1) (Num 2)))
(let v2 (vec-of (Num 2) (Num 2)))

(union (Num 1) (Num 2))

(check (= v1 v2))

(function MyVec (MathVec) Math)

(MyVec v1)

(check (MyVec v2))

(check (= (MyVec v1) (MyVec v2)))

(let v3 (vec-of (Num 4) (Num 5)))

(union (Num 4) (Num 6))
(union (Num 5) (Num 7))

;; We don't have any (MyVec v3) yet
(fail (check (= (MyVec v3) (MyVec (vec-of (Num 6) (Num 7))))))

(MyVec v3)
(check (= (MyVec v3) (MyVec (vec-of (Num 6) (Num 7)))))```

extract:variants panic in base type

(function f (i64) i64 :merge (max old new))

(set (f 1) 1)

(extract:variants 1 (f 1))
(extract:variants 2 (f 1))

The first extract is ok but the second extract panicked. When the function map to user-defined types, it won't panic.
Should the second extract behave the same as the first one?

panic: ConstrainEq on unbound variables

(rule ((= x 1) (= y x) (= z y)) ())
(run 1)

panics, while

(rule ((= x 1) (= y x)) ())
(run 1)

does not.

Seems related to desugaring @oflatt?

This issue is blocking @clyben from fixing a follow-up issue of #93.

More details:

Currently, the program produces body

[AssignLit(x, 1), ConstrainEq(y, x), ConstrainEq(z, y)]

while the expected body is

[AssignLit(x, 1), AssignVar(y, x), AssignVar(z, y)]

(Note that we don't have AssignVar currently.)

ConstraintEqs are produced by flatten_facts and flatten_equalities on cases like (= var1 var2).

We can also remove the checks since we are converting NormFact::ConstrainEq back to Fact::Eq anyway so it shouldn't matter, maybe?

Follow-up issues on "Semi-naive should also inspect the deltas of the right-hand side"

#93 is fixed by #134. However, there are several issues left with #134:

let-bound variables are conservatively handled. We will translate every let-bound right-hand side to the left-hand side even when they are not used in set. To handle this correctly, we may need a liveness analysis.
Namespaces of let-bound variables on the right-hand side are not correctly handled.

(rule ((= a (expr1)))
      ((let a (expr2))
       (set (...) a))

Unioned values found to be `!=`

The following program's check does not fail as expected:

(datatype r (R i64))
(union (R 1) (R 2))
(fail (check (!= (R 1) (R 2))))

Also, when a (run 0) is added after the union, or the union is changed to set, then the error changes to:

thread 'main' panicked at 'prim was partial... do we allow this?', src/typecheck.rs:723:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Another query compiler bug

Here

Variants Gone?

It seems like there isn't any code currently that tests the variants feature. Is this a dead code path? Or are there other downstream applications that are using it?

I was trying to figure out what the variants are, but it was hard without an examples.

Panic on admittedly silly axiom

https://www.mwillsey.com/egg-smol/?program=XQAAgACeAAAAAAAAAAAUGQgnfMUD9dO10BC2jbJotgLClFTWhuxd3X1pNTX5W8B1pnvqFbP-Z9e9Nklrmkcj2koM9emIop4mcnfahUjjHhdqJLXm1KNuDIX1T2pDuh93M7OwPZsIlGSZ0J0jVlzUUL5wnBvhQ8MXkradYNkf_9rAEAA%253D

(datatype KAT
    (A)
    (par KAT KAT)
)

(rewrite (par p p) p)
(rule ((= r (par q r)) (= q (par q r))) ((union r q)))

; tests
(define q (par A A))
(run 10)

[INFO ] database size: 10
[INFO ] Made 1 updates
thread 'main' panicked at 'assertion failed: i < self.len()', src/unionfind.rs:65:9
stack backtrace:

Namespace in egglog

Right now egglog is lacking a proper module system, and this causes pains in issues like #113. So it would be nice to design and implement a namespace or module system for egglog. This issue tracks the progress of module system development in egglog.

Some resources:

Our goals for egglog module design:

TODO

cc @clyben

Let-bound extraction

When extracting from egglog, sometimes the results can blow up because the sharing in the database is no longer being leveraged.
If we had another format for terms which used let, this problem would be solved.

Merge during rebuild

Today merge functions are only applied when we notice a merge while evaluating Action::Set, but we can also cause merges during a rebuild. Merge functions should also run in response to merges caused by rebuilds. Here's a small failing test case:

(datatype N (Node i64))
(function distance (N N) i64 :merge (min old new))

(define a (Node 0))
(define b (Node 1))
(define x (Node 2))
(define y (Node 3))
(set (distance x y) 1)
(set (distance a b) 2)

(union a x)
(union b y)

(run 1)
(check (= (distance x y) 1)) ; fails, the distance has gone up!

Semi-naive should also inspect the deltas of the right-hand side

The semi-naive will only evaluate based on the deltas of the left-hand side, but the deltas of the right-hand side may also impact the output. See the code below

(function f (i64) i64 :merge (max old new))

(set (f 0) 0)
(set (f 3) 0)


(rule ((= f0 (f 0))) ((set (f 1) f0)))
(rule ((= f1 (f 1))) ((set (f 2) f1)))

;; update f3 some iters later to make sure f(0) is inactive
(rule ((= f2 (f 2))) ((set (f 3) 3)))

;; This rule should fire and set f(0) to be 3, but because f0 is inactive, it does not fire (despite that f3 is active now)
(rule ((= f0 (f 0))) ((set (f 0) (f 3))))

(run 100)
(print f) ;; f0 is expected to have value 3, but has 0 in reality.

Supporting tree automata completion (program refinement--based saturation) in egglog

This is a more ambitious issue, just dumping it here so we don't forget. We want to support tree automata completion in egglog by dusugaring it into egglog/Datalog rules.

Some thoughts I had earlier:

Essentially you need this rule:

(rule (
	(less-than o1 o2)
	(func args ... o1)
) (
	(func args ... o2)
)

This requires FDs not to be enforced. And for a rewrite like (rewrite lhs rhs), it now becomes

;; rhs can be used in place of lhs
(rule (
	(= e lhs)
) (
	(less-than rhs e)
)

Note that there might be multiple e for the same lhs because we no longer have FDs, but this is fine due to transitivity.
Technically you need transitivity and reflexivity rules for the less-than relation, but I doubt you need them in this particular case. These two properties might already be implicitly maintained.
Finally, we can speed up TA completion with the following rule:

(rule (
	(less-than e1 e2)
	(less-than e2 e1)
) (
	(union e1 e2)
))

Implement default for non-eq-able functions

This is failing

(function f () i64 :default 42)
(extract (f))

with the error message

thread 'main' panicked at 'not yet implemented: Handle default expr', src/typecheck.rs:676:33