Giter Site home page Giter Site logo

egg's People

Contributors

akesling avatar bastacyclop avatar bjchambers avatar bksaiki avatar chandrakananandi avatar cospectrum avatar dandandan avatar dewert99 avatar fabric-and-ink avatar framist avatar imbrem avatar jakevossen5 avatar krypt-n avatar lyxell avatar marcusrossel avatar meithecatte avatar minimario avatar mmp avatar mwillsey avatar namin avatar nlewycky avatar oflatt avatar philzook58 avatar remysucre avatar rosekunkel avatar sgpthomas avatar tomjaguarpaw avatar waywardmonkeys avatar yihozhang avatar yycdavid avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

egg's Issues

Runner doesn't track initial rebuild

There is an initial rebuild at the beginning of run in run.rs that is there to create the initial classes_by_op hash table in the egraph. The time for this rebuild (usually very small) is not tracked by the iterations vector in the runner.

What happened to Language::children()?

This method was present in the latest release but no longer exists in the development version. I don't see anything in the changelog about it. I have the same question about Language::children_mut().

Clarify Analysis ordering

The new Analysis::merge could use a bit more doc, along the line of:

Analysis::Data must form a partial order. For example, the "constant" analysis for i32 forms the order None < Some(x: i32) for any x (but any pair of Some(x), Some(y) are not comparable). merge modifies to to be the least upper bound of to and from. For example, take merge(a, b):

  1. When b = None there is no need to modify a, as b contains no information (is the bottom of the partial order). We know for sure b <= a.
  2. When a = None and b = Some(2) modifies a to be Some(2), as b > a.
  3. When a = Some(1) and b = Some(1) there is no modification.
  4. When a = Some(1) and b = Some(2) egg should panic, as the points have no LUB (they give conflicting information for what the constant should be).

Is the above consistent with the implementation? I also can't figure out what the ordering returned by merge is supposed to mean from the tests.

Accessing analysis data after extraction

In some cases it would be useful to access the analysis data after extraction. For instance, the analysis data may include information about constraints or type information that are useful during rewriting (for instance, only apply certain rules to certain types of nodes). This information may also be useful for lower level code-generation from the extracted expression.

But, the extracted Ids are not related to the IDs in the graph, making the analysis information difficult to access.

Some possible options:

  1. A variant of extraction (extract_with_analysis) that collects the analysis information for each node added to the RecExpr.
  2. A map from IDs in the expression to IDs in the class. This seems error prone since the IDs could get mixed up.

I think the former would be pretty easy to add, if I'm reading things right:

    /// Find the cheapest (lowest cost) represented `RecExpr` in the
    /// given eclass.
    pub fn find_best_analysis(&mut self, eclass: Id) -> (CF::Cost, RecExpr<L>, Vec<N::Data>) {
        let mut expr = RecExpr::default();
        let mut data = Vec::new();
        let (_, cost) = self.find_best_analysis_rec(&mut expr, &mut data, eclass);
        (cost, expr, data)
    }

    fn find_best_analysis_rec(&mut self, expr: &mut RecExpr<L>, data: &mut Vec<N::Data>, eclass_id: Id) -> (Id, CF::Cost) {
        let eclass = self.egraph[eclass_id];
        let (best_cost, best_node) = match self.costs.get(&eclass.id) {
            Some(result) => result.clone(),
            None => panic!("Failed to extract from eclass {}", eclass.id),
        };

        let node = best_node.map_children(|child| self.find_best_rec(expr, child).0);
        data.push(eclass.data.clone());
        (expr.add(node), best_cost)
    }

It may be cleaner to create a wrapper around the data that allows indexing with Id.

Did I miss a way to get the analysis data after extraction? If not, would the above method be generally useful? Another option would be to allow the recursive call to take a function that is applied to each eclass. Then one version does nothing (and hopefully gets compiled away) while the other version pushes the data to the vector. This could also allow for versions that push just a part of the data, etc.

Change the signature of merge to take two eclasses instead of analysis data as arguments

It would help with debugging if merge took two eclasses as arguments instead of their analysis data. Consider a situation where two eclasses incorrectly get merged due to some bug. The debug log currently shows us which rule firing led to this situation. It however does not tell us which two eclasses were merged. I think changing the signature of the merge function will allow a user of egg to get this information and more easily debug their egg-based application.

BiImplication in prop example

Mostly curious, from my own projects/converting between normal forms, In the the prop.rs example, Introducing BiImplication tends to increase the complexity of the ast, because assuming you add an enum variant like: "<->" = BiImplication([Id; 2]),
and a case to make Prop::BiImplication([a, b]) => Some(x(a)? || !x(b)?) && Some(x(b)? || !x(a)?),,
you would need to either clone, or Rc, a and b.

So introducing BiImplication causes the ownership to get a bit hairier.
I thought it'd be interesting to see how that small change affects the otherwise clean example,
however I haven't tried it in egg yet.

Language::display_op should work like other formatting traits

The rust std library formatting traits (Display, Debug, Binary, etc.) are all structured the same way: they have a single method

fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result

Language::display_op should follow the same convention, either by changing its signature:

fn display_op(&self, f: &mut fmt::Formatter) -> fmt::Result

or by separating its functionality out into a new trait:

pub trait DisplayOp {
    pub fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result;
}

Beyond just consistency, this change would also have the added advantage that the operator's string representation could be generated on-the-fly, whereas the current signature (returning &dyn Display) requires the implementer to return a reference to some value that already exists. I'm happy to submit a pull request making this change.

Other types of searchers and appliers

The main one I can think of is an applier that adds a new expression based on a matched pattern, but does not substitute it (i.e. doesn't union). Instead it's just appended to the e graph as is. This could be used to create "deductions" that could then be used by conditional rewrite rules. Consider the use case of inferring properties of expressions based on the properties of their constituents. For example, a commutativity rewrite rule might be conditional on whether the operator in question is indeed commutative on the operands. The language might include set-theoretic terms like x is-a-member-of N which can be checked for with a condition so that the term (x + y) is-a-member-of N can be generated, and then the commutativity substitution (x + y) + x to x + (x + y) can be made.

I'm not sure how much this overlaps with the "analysis" feature, but it seems like it would be more powerful.

Limiting rewrites?

I found a degenerate case using rules similar to the math example. When confronted with an expression like (* x 0) it seems to first apply both zero-mul and comm-mul to get that (* x 0) = (* 0 x) = 0. After that, each iteration seems to run more and more instances of assoc-mul. I think this makes arises since the x is equivalent to a multiplication, leading to: (* x 0) = (* x (*x 0)) = (* x (* x 0)).

The result is that this simple expression times out after 10 seconds. Is there a way to prevent this expansion? I wonder if it would be enough to add guards on the associativity rules that the nested expression isn't a constant?

If there is a way, I'm happy to figure out and add a reasonable case to the existing tests for it.

Integrate delta debugging

Add functionality in test_fn to specify that set of rules do not rewrite term A to term B. Additionally, use the delta debugging crate to discover a minimal set of rules that do rewrite A to B to make debugging easier.

One thought: this should either not be turned on by default OR only run once per run of a test suite. This can take a while to run, especially on large rule sets. A single unsoundness may trigger many tests to fail, but the result of rerunning DD will usually not give much more information. I still think it would be good to complete running the entire test suite, but maybe having a DD_HAS_RUN flag somewhere and checking it before running.

Per-rule scheduling

Would be nice to have per-rule customization of scheduling. For example, apply BackOffScheduler only to certain rules.

Standalone CLI

I would like to use egg in a compiler for a programming language, but the compiler is written in a language without FFI to Rust or C. I wrote a poor version of egg in the meantime, but it would be ideal to be able to call an egg subprocess or similar.

Language must be an enum

Noticed this when trying to use a generic language like:

struct MyLang {
  op: OpKind,
  children: Vec<Id>,
}

When used as a pattern, search fails because it uses std::mem::discriminant(...) to get the key of the enum, which is unspecified behavior.

This restriction should at least be documented, since it affects the set of valid languages.

One suggestion would be to add something to the Language trait that allows the user to change the discriminant. In my case, the OpKind could be used to identify operators.

For instance:

trait Language {
  type OpDiscriminant: PartialEq + Eq + Hash + Debug;
  fn discriminant(&self) -> OpDiscriminant
}

For languages that based on enums, this could be implemented as:

impl Language for EnumLang {
  type OpDiscriminant = Discriminant<Self>;
  fn discriminant(&self) -> OpDiscriminant { std::mem::dicriminant(self) }
}

For languages like mine that use a struct, this could be:

impl Language for MyLang {
  type OpDiscriminant = OpKind;
  fn discriminant(&self) -> OpDiscriminant { self.op }

I'm not sure if this affects other parts of the code. If not, I'd be happy to try creating a pull request to address this if the approach outlined above seems reasonable.

nits in lambda test

I think there are some nits in the lambda test. For example, Analysis::make inserts the ID of variables to free, but lambda takes a symbol and CaptureAvoid::apply_one tests if that symbol is in free, which will always be false. To see this, changing lambda_capture_free to the following will fail:

 egg::test_fn! {
    #[should_panic(expected = "Could not prove goal 0")]
    lambda_capture_free, rules(),
    "(let y (+ (var x) (var x)) (lam (var x) (var y)))" => "(lam (var x) (+ (var x) (var x)))"
}

Similarly, the test cases also pass symbols to let and fix, which Analysis::make would add to free.

Furthermore, in my implementation I consistently use variables instead of symbols, but implementing the free variable test as in CaptureAvoid does not work. The ID of v2 never matches any ID stored in free (any clue why this happens? Do I have to call find?) Instead, I had to test if the free var of v2 is in the free vars of e.

union allows distinct constants to be in the same eclass

This maybe more of an RFC than an issue, but right now, union will not complain if two eclasses A and B are merged where both A and B contain distinct constants.

Some hypothetical clients may want support for such situations, but an eclass containing distinct constants will usually be a bug in a rewrite rule or other equivalence finder running over the egraph.

A straightforward fix would be to extend the range type of the eclass's hash map (currently Vec<ENode>) in the egraph type to also track what constant (if any) an eclass contains. Then union can quickly check for compatibility during merging.

More interestingly, this would be a first step towards supporting client-specific equivalence finders, constant folding in particular. I'm not sure what that interface looks like yet, but getting it right will be an important design challenge.

Non-enum Languages significantly slower

I created a copy of the math.rs test that uses a custom language based on a struct:

#[derive(Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)]
enum MathOp {
    // 2 args
    Diff,
    Integral,
    Add,
    Sub,
    Mul,
    Div,
    Pow,
    // 1 args
    Ln,
    Sqrt,
    Sin,
    Cos,
    // 0 args
    Constant(Constant),
    Symbol(Symbol),
}

#[derive(Clone, Debug, Hash, PartialEq, Eq, PartialOrd, Ord)]
pub struct Math {
    op: MathOp,
    children: Vec<Id>,
}

Running the benchmarks shows this to take 15-30% more cycles for each test:

Test math math_struct math / math_struct 1-(math / math_struct)
diff_power_harder 21,074,499 30,392,699 69.34% 30.66%
diff_power_simple 5,047,100 7,156,238 70.53% 29.47%
integ_one 1,329,639 1,533,784 86.69% 13.31%
integ_part1 8,189,299 12,439,484 65.83% 34.17%
integ_part2 30,383,957 48,027,206 63.26% 36.74%
integ_part3 3,705,544 5,267,235 70.35% 29.65%
integ_sin 1,340,420 1,547,473 86.62% 13.38%
integ_x 1,401,729 1,639,932 85.47% 14.53%
associate_adds 60,466,286 80,325,970 75.28% 24.72%
diff_different 1,337,645 1,543,119 86.68% 13.32%
diff_ln 1,350,054 1,556,474 86.74% 13.26%
diff_same 1,321,303 1,498,483 88.18% 11.82%
diff_simple1 3,352,771 4,654,982 72.03% 27.97%
diff_simple2 3,018,403 4,165,698 72.46% 27.54%
powers 1,440,284 1,731,722 83.17% 16.83%
simplify_add 2,648,652 3,418,225 77.49% 22.51%
simplify_const 1,746,411 2,237,483 78.05% 21.95%
simplify_factor 8,295,182 11,505,095 72.10% 27.90%
simplify_root 11,303,834 15,622,872 72.35% 27.65%
  1. It seems like the performance shouldn't be this much worse, since the operator can still be extracted and used as a key.
  2. In some cases, it seems like it should be even faster, since the constant or symbol can be part of the operator key, filtering out more cases (especially for 0 / 1 constants).

I suspect the problem is the use of discriminant, which causes all nodes to be considered the same, preventing short-circuiting of some searches and possibly inability to use lookup classes by operator, etc.

As a first step, I can send a PR with the test / benchmark. I think @mwillsey you had mentioned some planned work that would address this, so perhaps this issue can just serve as a tracking issue for that work?

This would be helpful (and avoid surprises) in cases where egg is being used as part of a larger project that already has a language defined.

"Found unexpected equivalence"

I'm writing a very custom Language implementation and I suspect that maybe there is a bug in my implementation that is causing this assertion in "egraph.rs" to fail. Question is, I'm not sure what sort of problem leads to "Found unexpected equivalence", so maybe the text of that assertion could be expanded to include details of what the likely cause is?

Bidirectional rewrite! macro

It's common to have rewrite rules to apply both ways, so it'd be convenient to have a bidirectional rewrite! macro. Such a macro (say equiv!) can generate a vector of 2 rules, and the user can use it like so:

vec![
    equiv!("assoc"; "(* a (* b c))" = "(* (* a b) c)"), 
    equiv!(...), 
    ...
].concat()

This will also force the user to separate bidirectional rules from one-way rules, which is good.

Add not-equal conditions

Sometimes you only want to rewrite if something is not equal.

This gets a little dicey because things may become equal later, but we should support it as it's pretty common.

`rewrite!()` `if`s are parsed in reverse order

https://github.com/mwillsey/egg/blob/db793b0117dd25349ba483248c04ab90bb241536/src/macros.rs#L304

It's because here, each if is used to create another ConditionalApplier, meaning the last if is the outermost ConditionalApplier and is evaluated last.

A potential solution that I haven't had the time to test yet: turn the ifs into a list of some kind, then fold over the list in the reverse order.

Another potential solution: can you just reverse the order of matched things in a pattern that matches repetitions?

constant folding

We should implement constant folding. The Language trait should provide the functionality to do that.

Put fuel on "expansive" rules

There should be some way to classify rewrites as expansive or not, and those that are should have some kind of fuel applied to them.

Can we do this classification automatically?

unresolved import `egg::Symbol`

The documentation suggests importing egg's interned String as egg::Symbol, but this fails to build in a clean new project (created with cargo new, egg v0.4.1):

   Compiling autocfg v1.0.0
   Compiling log v0.4.8
   Compiling cfg-if v0.1.10
   Compiling symbolic_expressions v5.0.3
   Compiling smallvec v1.4.0
   Compiling once_cell v1.4.0
   Compiling instant v0.1.4
   Compiling indexmap v1.4.0
   Compiling egg v0.4.1
   Compiling egg-test v0.1.0 (/Users/avh/research/egg-test)
error[E0432]: unresolved import `egg::Symbol`
 --> src/main.rs:1:5
  |
1 | use egg::Symbol;
  |     ^^^^^^^^^^^ no `Symbol` in the root

Expand documentation on how to properly implement the Analysis trait

I am struggling to implement the Analysis trait correctly, especially the merge function. I got very good at producing endless loops or failing asserts in the caller function. I have something working now for a very simple data type but I am not certain why it works. There are more elaborate examples in the paper and also some explanation, but I am still struggling. Also in the main branch the return value is now Option<Ordering> but I am not sure what is happening exactly depending on the respective four different results.

Please consider to add a bit more information there what invariants need to hold in the merge operation and what is happening when this function is called.

Reuse the machine between search iterations

Right now the pattern search machine isn't reused between pattern searches, so it reallocates a lot.

I can fix this with some unsafe, but doing it safely will require an API change.

`find_best` overflows the stack

I'm on Mac, where stack sizes are small, and the egraph I'm searching through is large...but still, I assume this is not expected behavior?

I'll post a link to a demo you can run yourself in a bit!

Possible soundness issue with Runner timeout

Hello! I'm running into interesting behavior when I create a Runner with a very short timeout (say 200 microseconds) and a large number of rules.

It tells me Saturated, and I extract the minimal expression, but if I increase the timeout (say to 400 microseconds), I again get Saturated with a smaller minimal expression. Unless I'm misunderstanding saturation, this behavior would seem to mean that the initial Saturated status was incorrect.

I think the source of this issue is the break statements inside run_one:
https://github.com/egraphs-good/egg/blob/master/src/run.rs#L423
https://github.com/egraphs-good/egg/blob/master/src/run.rs#L452
With a short timeout, these prevent the loop(s) from iterating through all the rules, meaning applied may be empty, even though there could be further work to do. The empty applied causes run_one to give a Saturated result.

As a temporary fix, does it make sense to just comment out those two breaks? This will allow the runner to exceed the limits, but only by one "iteration".

Maybe a better fix would be to change the two checks of the form

if self.check_limits().is_err()
    break;
}

to something like

result = result.and(self.check_limits());
if result.is_err() {
    break;
}

(where result is initially Ok(())), and then run_one could return result if its an Err? This approach is what I'm doing right now, and it seems to be working.

Let me know if I'm completely missing something here. Otherwise, I can create a pull request for the above, if it would help!

Docs feedback

I'm just dropping in my requested feedback in a place that's easy to find. I haven't moved on yet to using Egg, so this is just based on reading the docs:

  1. The installation was very smooth. I was impressed.
  2. It was also a very nice experience opening up the docs the first time.
  3. The page on egraphs is fantastic and understandable.
  4. There's a very small typo ("conceptually and") on the graphs page.
  5. The examples on both the main page and the egraph page are hard for me to understand without context. It would help to have an English explanation before each of them about what is happening.
  6. Better yet, I would love some kind of tutorial to getting starting with Egg that walks you through a real but simple use case. Maybe even in the browser!
  7. The main page could also use a "What is Egg?" section at the beginning, or just a paragraph at the top. Some of the wonderful content from the egraph page could go in that section.

I'll add other thoughts to this issue as I move on to playing around.

Support multi-patterns for rewrite rules

Description

Currently the left hand side and the right hand side of a rewrite rule can only contain one subtree each, with the equivalent nodes being the two root nodes. It would be helpful to support more general cases where each side can contain multiple subtrees (potentially with shared nodes), and a rewrite rule can have multiple matched equivalent nodes.

Example cases:

  1. (matmul ?input_1 ?input_2), (matmul ?input_1, ?input_3) => (split_0 (matmul ?input_1 (concat ?input_2 ?input_3))), (split_1 (matmul ?input_1 (concat ?input_2 ?input_3)))

Here split_0 (and split_1) refers to taking the first (and second) element from the split operation. matmul is matrix multiplication. There are two matched outputs: (matmul ?input_1 ?input_2) is equivalent to (split_0 (matmul ?input_1 (concat ?input_2 ?input_3))), and (matmul ?input_1, ?input_3) is equivalent to (split_1 (matmul ?input_1 (concat ?input_2 ?input_3))). They can't be specified as separate rules, since to apply (matmul ?input_1 ?input_2) => (split_0 (matmul ?input_1 (concat ?input_2 ?input_3))), we need to match ?input_3 in the egraph before as well (needs to also be matmul'ed with ?input_1).

  1. (relu ?input_1), (relu ?input_2) => (split_0 (relu (concat ?input_1 ?input_2))), (split_1 (relu (concat ?input_1 ?input_2)))

This case differs from above in that left hand side contains two separate subtrees, where case 1's LHS share common nodes between (matmul ?input_1 ?input_2) and (matmul ?input_1, ?input_3). Both would be useful.

Some initial idea on how to achieve this

Maybe we can still use the RecExpr to represent the patterns on both sides, but adding a mapping of the matched output pairs (instead of the two last elements of LHS and RHS). And when matching LHS (and applying RHS), we can keep a record on which indices of the RecExpr that hasn't been matched (and applied), so that we make sure all elements are matched (and applied) before we finish.

Track applied rules for each class

Herbie has an optimization that tracks the rules applied for each class, so that we can skip those rules if the class doesn't change. This could speed up searching quite a bit when each class is big. This could be a simple annotation once we have class metadata.

Document egraph[id]

It took me a while to realize that to get the EClass of a given Id you need to do egraph[id].

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.