anydsl / thorin2 Goto Github PK

The Higher ORder INtermediate representation - next gen

Home Page: https://anydsl.github.io/thorin2/

License: MIT License

CMake 1.87% C++ 95.76% Shell 0.21% Python 0.06% Rust 1.48% C 0.32% GDB 0.30%

compiler compilers compiler-construction partial-evaluation partial-evaluator dependent-types intermediate-representation optimizer ssa static-single-assignment

thorin2's People

Contributors

Stargazers

Watchers

Forkers

fodinabor neuralcoder3 f-fries gmh5225 betzinger juhigupta0 rebcabin leissa

thorin2's Issues

Closure Conversion - LLVM Execution Error

In some cases, the emitted LLVM Code writes and loads functions from a wrong slot resulting in an infinite loop.

Example:
https://github.com/NeuralCoder3/thorin2/blob/closure-test/lit/clos/bind.thorin

Commands:

%thorin -d clos %s --output-ll %t.ll -o -
clang++ lit/clos/lib.cpp %t.ll -o %t -lc -Wno-override-module

Observed bahvior:
An infinite stream of 99 is printed.

Expected behavior:
A countdown from 99 to 0.

parsing several imports over and over again

You can observe this behavior if you enable the assert in World::make_external:

    void make_external(Def* def) {
        assert(!def->name().empty());
        auto name = def->name();
        auto [i, ins] = move_.externals.emplace(def->name(), def);
        assert((ins || (def == i->second)) && "two different externals registered with the same name");
    }

Fixing this is a little bit more complicated than meets the eye. Problem is that some parts of the Parser - like the input stream - are not designed to be changed - while other parts like the scope, world, normalizers, etc. are mutable entities that should be filled during parsing.

Prec in output broken

The precedence in the output for setting pairs of parentheses is pretty broken. It's also not trivial to fix that because x may actually refer to a Var + Extract. Now, for a Extract we usually want to set parentheses but if it's just x we don't need them :/

Rewrite Phase Assertion Issue

There might be a bug in the rewrite phase or in my usage thereof.

In 277d500 in
https://github.com/NeuralCoder3/thorin2/blob/matrix_dialect/dialects/matrix/passes/lower_matrix_lowlevel.cpp#L74

we have a minimal rewrite phase that just rewrites with the identity.

But for instance ./build/bin/thorin -o - -VVVV -d matrix lit/matrix/init_const_no_ret.thorin fails at the assumption that
the world of the rewritten def or the gid is different.

Backend Preparation: Nested Extract Splitting

We need a phase that splits nested extracts into basic blocks to move this burden away from the backends.
Issue raised in regard to #181

Related code snippet:

// A call to an extract like constructed for conditionals (else,then)#cond (args)
// TODO: we can not rely on the structure of the extract (it might be a nested extract)
for (auto callee_def : ex->tuple()->projs()) {
    // dissect the tuple of lambdas
    auto callee = callee_def->isa_nom<Lam>();
    assert(callee);
    // each callees type should agree with the argument type (should be checked by type checking).
    // Especially, the number of vars should be the number of arguments.
    // TODO: does not hold for complex arguments that are not tuples.
    assert(callee->num_vars() == app->num_args());
    for (size_t i = 0, e = callee->num_vars(); i != e; ++i) {
        // emits the arguments one by one (TODO: handle together like before)
        if (auto arg = emit_unsafe(app->arg(i)); !arg.empty()) {
            auto phi = callee->var(i);
            assert(!match<mem::M>(phi->type()));
            lam2bb_[callee].phis[phi].emplace_back(arg, id(lam, true));
            locals_[phi] = id(phi);
        }
    }
}

instead of emit_unsafe(app->arg()); in the case

auto ex = app->callee()->isa<Extract>(); ex && app->callee_type()->is_basicblock()

The corresponding tests in affine are marked as *.disabled_extract.

Application Error with matching domain

I get an error that an application can not be constructed ("cannot pass argument") although the type of the argument matches the domain exactly:

  '(0:(.Idx 4294967296), _6524550, 1:(.Idx 4294967296), _6524522, forOut_0, _6524540)' of type 
  '[.Idx 4294967296, .Idx 4294967296, .Idx 4294967296, [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)], .Cn [.Idx 4294967296, [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)], .Cn [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)]], .Cn [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)]]' to 
  '%affine.For (4294967296, 2, (%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)))' of domain 
  '[.Idx 4294967296, .Idx 4294967296, .Idx 4294967296, [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)], .Cn [.Idx 4294967296, [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)], .Cn [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)]], .Cn [%mem.M, %matrix.Mat (2, (__6522473#1:(.Idx 2), __6522473#0:(.Idx 2)), T_6522475)]]'

Fork: NeuralCoder3
Branch: ad_ptr_merge
Commit: ee33da0
Command: ./build/bin/thorin -d matrix -d affine -d direct -d clos -d math -d autodiff lit/autodiff/matrix/prod.thorin -o - -VVVV
Situation: In the matrix medium level lowering pass.

Closure Conversion - Should work for functions without %mem.M type

This supersedes #119.

valgrind for lit tests

We should definitely enable valgrind for lit tests. It's just too easy with C++ to make some stupid error. Currently, I'm investigating some bugs that wouldn't have slipped through with valgrind enabled.

Normalizers are not always applied for recursive types

I would expect normalizers to be applied depth-first as soon as the object is entirely constructed.
But this seems to be not the case if the result of the axiom is a recursive type.
In these cases, the normalizer is never fired leading to errors.
This is related to the issue that normalizers for higher-order axioms (e.g. direct) take all arguments in a greedy fashion leading to unexpected normalization calls.

Example:

Axioms:

.Pi PassList: *, %compile.Pass = PassList;
.ax %compile.pass_phase: PassList -> %compile.Phase, normalize_pass_phase;
.ax %compile.combine_pass_list: Π [n:.Nat] -> «n; PassList» -> PassList, normalize_combine_pass_list;

Normalizer:

combine_pass_list K (pass_list pass11 ... pass1N) ... (pass_list passK1 ... passKM) = pass_list pass11 ... p1N ... passK1 ... passKM

Code:

%compile.pass_phase (%compile.combine_pass_list (⊤:.Nat) (pass_list A, pass_list B))

Expected result:

pass_list A B

Result:
In its normalizer, pass_phase encounters %compile.combine_pass_list which should be impossible by invariants.

Allocation of nested array

The alloc2malloc phase seems to have a problem with the allocation of nested arrays:

%mem.alloc («a; «b; .Idx 4294967296»»,0) mem

Example: https://github.com/NeuralCoder3/thorin2/blob/matrix_dialect/lit/mem/nested_alloc.thorin

It might be the case, that my application of alloc is wrong in this case.

Error:

lit/mem/nested_alloc.thorin:9:9-11:23: error: cannot pass argument 
  '(mem_1264932_303941, %core.trait.size «__1264888_303893#0:(.Idx 2); «__1264888_303893#1:(.Idx 2); .Idx 4294967296»»)' of type 
  '[%mem.M, «__1264888_303893#0:(.Idx 2); «__1264888_303893#1:(.Idx 2); .Idx 4294967296»»]' to 
  '%mem.malloc («__1264888_303893#0:(.Idx 2); «__1264888_303893#1:(.Idx 2); .Idx 4294967296»», 0)' of domain 
  '[%mem.M, .Nat]'

Direct2CPS Scheduling/Calling Problem

The direct2cps dialect has a bug as demonstrated in
https://github.com/NeuralCoder3/thorin2/blob/test/direct_old_var/lit/direct/ad_mem.thorin.disabled

The bug manifests in the test case only in complex circumstances.
Two cps2ds calls are used with functions that contain cps2ds where a function is created that is called.

Removing some of the complexity resolves the bug.

The dialect rewrites the function but only introduces new lambdas for the continuations.
To do so, the body of the old functions is replaced.
Therefore, the origin of the bug is unclear.

Code generation for floating point casts

Currently, floating point casts are not correctly resolved.
For instance,
.let acc_3187824: %math.F (52, 11) = %math.conv.f2f (52, 11) (52, 11) 0:(%math.F (52, 11));
generates the invalid code %acc_3187824 = fptrunc double 0x0000000000000000 to double.
Example:
Fork: NeuralCoder3
Branch: matrix_dialect
Command: ./build/bin/thorin -d matrix -d affine -d direct -d clos lit/matrix/print_const_prod.thorin --output-ll Tmp.ll -o - && clang lit/matrix/lib.c Tmp.ll -o Tmp.out && ./Tmp.out 2 3

Dialect Extensions

Often, one wants to keep a dialect extensible such that other dialects can attach at a well-defined interface.
An example would be a backend dialect that allows for dialects to handle code emission of new axioms.

This would also augment the rule construct on the cpp site for extensions that go beyond rewrites.

In general, such extension points could take the form of a map/list of functions that the dialect invokes.
The main difficulty in this approach is that the map with its type signature as well as interfacing declarations
have to be visible from other dialects.

A first step towards inter-dialect extensions, visibility, and backend extensions is in
https://github.com/NeuralCoder3/thorin2/tree/dialect_backends

Closure Conversion - failing impala tests

There are a couple of failing impala test cases:

Remove --clos flag from impala again
Remove --clos flag for impala in test script
Fix non-closure related test cases that fail if clos plugin is loaded
Fix codegen/cconv/typed_cc/recursion.impala
Fix codegen/cconv/typed_cc/closure_rec.impala

CPS2DS conversion error

The dependent rewrite using cps2ds_dep seems to contain a bug in special cases.
A non-minimal example can be found in lit/direct/2out_2.thorin.

The reported problem is an old variable that is still accessed after rewrites of the program.

Shadowing for variables

It would be helpful to allow the shadowing of local declarations in the thorin syntax.
This only affects parsing and does not change existing programs. It just allows additional programs that were invalid previously.

One case where this syntax would be helpful is the memory monad:

    .let (mem1, m1_t) = %matrix.transpose ((m,k), %math.F (p,e)) (mem ,m1);
    .let (mem2, m2_t) = %matrix.transpose ((k,l), %math.F (p,e)) (mem1,m2);
    .let (mem3, m1_s) = %matrix.prod      (m,l,k, (p,e))         (mem2,ms,m2_t);
    .let (mem4, m2_s) = %matrix.prod      (k,m,l, (p,e))         (mem3,m1_t,ms);
    .let (mem5, result) = %matrix.prod    (m,k,l, (p,e))         (mem4,m1,m2);
    pb_ret (mem5,m1_s,m2_s)

is cumbersome to write and error-prone as one needs to enumerate all memory instances.
It is tedious to add a new operation in between old ones.

With shadowing, the code becomes

  .let (mem, m1_t) = %matrix.transpose ((m,k), %math.F (p,e)) (mem ,m1);
  .let (mem, m2_t) = %matrix.transpose ((k,l), %math.F (p,e)) (mem,m2);
  .let (mem, m1_s) = %matrix.prod      (m,l,k, (p,e))         (mem,ms,m2_t);
  .let (mem, m2_s) = %matrix.prod      (k,m,l, (p,e))         (mem,m1_t,ms);
  .let (mem, result) = %matrix.prod    (m,k,l, (p,e))         (mem,m1,m2);
  pb_ret (mem,m1_s,m2_s)

Which is easier to read and write.

Pro:

Easier to write
Clear from the context
Used from other languages

Contra:
This syntax steps a bit away from the sea of nodes approach.
We use the explicit order of the linearized text format.
However, we already use this order to reuse names in different scopes.
Also, the linearized order is inherently connected to the way, we write thorin programs (as text).

Grammar Improvements

https://anydsl.github.io/thorin2/langref.html

The grammar seems to contain mistakes:

The rule I for index literal is never used
pack and array expressions contain i but this does not mean import here
v is used but never declared
The precedence rules seem to be not complete
s+ in g is confusing as s is optional X?+ = X*
The precedence rules list Π Sym : e does this refer to .Pi Sym ( : etype )? , edom n? (can be confused with Π b → ecodom) => unicodes should be used consistently
I think more rules are unused that I did not catch

Add mem unexpected behaviour

The add_mem phase behaves differently, depending on surrounding externals.
Even if these externals are never called and to not interfere with the functions.
Example: https://github.com/NeuralCoder3/thorin2/blob/ho_codegen/lit/mem/add_top_mem.thorin
Example2: https://github.com/NeuralCoder3/thorin2/blob/ho_codegen/lit/mem/add_top_mem2.thorin

In add_top_mem, the call remains f (⊤:%mem.M, argc_1111242, return_1111250).
In add_top_mem, the call is changed to f (_1113823, argc_1113827, return_1113835).

The only difference between the two examples are the externals g and h (h seems to be the important one).

It seems like mems in application argument position are sometimes taken in consideration as current memory.
However, we should only consider vars and application results.

Unary Tuples

A unary tuple is normalized to its content:

<<1; T>>
T
[T]

are all the same type and

<1;42>
42
(42)

are the same value.

This usually is helpful and senseful.
But it leads to errors in edge cases:

    n:.Nat,
    i:.Idx n,
    t:<< n; [I32,I32]>>,

If n is specialized to be 1, the array degenerates to a scalar.
Therefore, the projection

t#i#(1:(.Idx 2))

is invalid.

Example:
https://github.com/NeuralCoder3/thorin2/blob/matrix_dialect/lit/core/unary_tuple.thorin

Call:

./build/bin/thorin -o - lit/core/unary_tuple.thorin

Error:

lit/core/unary_tuple.thorin:13:13-13:15: error: index '0:(.Idx 1)' does not fit within arity '2'

Solution idea (possible related to an earlier discussion we had):
Also normalize the projection away.
A projection with 0:(.Idx 1) is the identity.

Add mem problem with dependent types

The add_mem pass fails to apply arguments to dependently typed functions.
The dependencies are not correctly tracked.

Example: Code involving internal_diff_core_wrap_mul fails with

  '(2:(.Idx 4294967296), _2431641#1:(.Idx 2))' of type 
  '«2; .Idx 4294967296»' to 
  'internal_diff_core_wrap_mul_5577316 (_2431346, 4294967296) 0' of domain 
  '«2; .Idx s_5577329_2429775»'

What should we do when allocating nominal arrays?

What should we do with sth like this?

«i: n; «i as Nat; T»»

See #165.

Add mem adds too many mems

The add_mem pass adds unnecessary mems in nested argument tuples but not in the functions:

.con tup_pb_5577562 _5577572::[tup_s_5577574::[_5577575: .Idx 4294967296, _5577579: .Idx 4294967296], tup_ret_cont_5577584: .Cn [%mem.M, %mem.Ptr (.Idx 4294967296, 0)]]
(readable: tup_pb: (Int*Int)*Cn (mem*Ptr(Int)))
becomes
[%mem.M, «2; .Idx 4294967296», .Cn [%mem.M, %mem.Ptr (.Idx 4294967296, 0)]]
(mem*Int²*Cn(mem*Ptr(Int)))
and the following error occurs:

<unknown location>: error: cannot pass argument 
  '(_2442813, _2442785, _2442643)' of type 
  '[%mem.M, [%mem.M, .Idx 4294967296, .Idx 4294967296], .Cn [%mem.M, %mem.Ptr (.Idx 4294967296, 0)]]' to 
  'tup_pb_5577562_reshape_2442834' of domain 
  '[%mem.M, «2; .Idx 4294967296», .Cn [%mem.M, %mem.Ptr (.Idx 4294967296, 0)]]'

The pass correctly adds the memory _2442813 in the outer tuple but also adds an additional memory in the inner tuple.

File: lit/mem/closure/out_of_nothing2_simpl.thorin in ad_ptr_merge.

Printing of functions with one argument

Functions with only one argument are printed without brackets:

    .cn return_178710 _178716: [%mem.M, .Idx 4294967296]  = {
        return_178715 _178716
    };

In cases like the one above, this is correct.
But this hinders readability in some cases (name_num vs name _num) and prevents simply adding/printing a filter @filter at the end.

In some cases like continuation arguments, the invalid code

.cn fun_name .Cn ... = ...

is generated.

Suggestion: always print brackets.

Unexpected behavior for higher-order code generation

Unexpected results are produced when the generation of LLVM Code from Thorin code containing higher-order code
is attempted.

Small example:

extern "C" {
    fn g(fn(i32)->i32) -> i32;
    fn h(i32) -> i32;
}

fn main() -> i32 {
    g(h)
}

generates the llvm code

declare i8* @malloc(i64)

declare i32 @g({i32} (i32) // Error
declare i32 @h(i32)


define i32 @main() {
main_1038:
    %g_cont_1071.ret = call i32 @g({i32} (i32 @h) // Error
    br label %return_1066
    
return_1066:
    %_1077 = phi i32 [ %g_cont_1071.ret, %main_1038 ]
    ret i32 %_1077
    
}

Steps: impala -emit-llvm -Othorin [file.impala]

The behavior occurs also at similar points where function types are used.
For instance at the allocation of a pointer slot for a function.

Observed behavior: Invalid LLVM Code is produced.
Expected behavior: The code generation should probably throw an error if unhandled higher-order code is encountered at this point.

optimization exhibits non-deterministic behavior

Sometimes, the behavior of the optimization pipeline seems to be non-deterministic.

Example:
./build/bin/thorin -d mem -o - lit/mem/no_mem.thorin -VVVV
in
https://github.com/NeuralCoder3/thorin2/tree/ad_ptr_merge
702d848

The issue might be due to the add_mem optimization, the pipeline builder, or an underlying bug in thorin.

This behavior might also be a side effect of the previous (not merged yet) changes to mem and clos conv with long-reaching impact that did not manifest up to now.

Function body is not read correctly

For instance, pow_ad_eval_simpl2_mem.thorin from
https://github.com/NeuralCoder3/thorin2/blob/autodiff/lit/clos/pow_ad_eval_simpl2_mem.thorin

can not be read by thorin.

./build/bin/thorin ./lit/clos/pow_ad_eval_simpl2_mem.thorin -o - -VVVV

Directly produces the error

In def aug_f_745675_660796_216256:
Assertion "some operands are set, others aren't"' failed.
Filter: 0:(.Idx 2)
Body: <nullptr>

Erroneous bitcast for indefinite array allocation

Allocations with a statically unknown size T::nat lead to <TODO> in LLVM code.

For instance, when an array of the same size as an indefinite array «⊤∷nat; r32» is created in Thorin,
invalid LLVM code is produced by the code generation.

Using Alloc2Malloc:
Before optimizations:

_1176: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :alloc («3∷nat; r32», 0∷nat) mem_1175;
_1190: :ptr («⊤∷nat; r32», 0∷nat) = :bitcast (:ptr («⊤∷nat; r32», 0∷nat), :ptr («3∷nat; r32», 0∷nat)) _1176#1∷i1;
...
_1159: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :alloc («⊤∷nat; r32», 0∷nat) _1158;

After optimizations

  _2412: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :malloc («3∷nat; r32», 0∷nat) (mem_2410, 12∷nat);
  _2445: i64 = :bitcast (i64, nat) ⊤∷nat;
  _2449: i64 = :Wrap_mul (3∷nat, 0∷nat) (4∷i64, _2445);
  _2450: nat = :bitcast (nat, i64) _2449;
  _2452: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :malloc («⊤∷nat; r32», 0∷nat) (_2412#0∷i1, _2450);

and the corresponding LLVM code:

  %_2755.i8 = call i8* @malloc(i64 12)
  %_2755 = bitcast i8* %_2755.i8 to [3 x float]*
  %_2788 = bitcast i64 <TODO> to i64
  %_2792 = mul nuw nsw i64 4, %_2788
  %_2793 = bitcast i64 %_2792 to i64
  %_2795.i8 = call i8* @malloc(i64 %_2793)

Without Alloc2Malloc:
After optimizations:

_2375: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :alloc («3∷nat; r32», 0∷nat) mem_2374;
_2377: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :alloc («⊤∷nat; r32», 0∷nat) _2375#0∷i1;

LLVM: (no malloc or alloc is generated)

%_2449 = getelementptr inbounds [0 x float], [0 x float]* <TODO>, i64 0, i64 %_2447

The problem here is that T::nat is used as size but does not exist in LLVM.
One solution for some cases might be a more extensive tracking of the sizes used in allocation.
For instance, the original allocation uses a constant size of 3 and the bitcasts of the array
are eliminated.
But this propagation does not extend to the newly generated array which should have the same size.

Comment: It is not possible to produce this error using valid impala code.

Automatic Dialect Loading

We probably should load dialects automatically.
Currently, the user has to specify the loaded dialects using -d [dialect].

A fine control might be helpful during development (for instance ignoring closure conversion phases).
But we can handle these settings (disabling some phases) also using the opt primitives.
Additionally, it is quite easy to forget to load a dialect and wonder why some passes are not executed (see #169).

One option would be to always load all available phases.

Another would be to load all dialects imported in the file.
This however misses dialects that are recursively needed (e.g. maths in matrix).
If we recursively look up imported dialects, we end up again loading all of them as opt imports nearly every other dialect.

Windows linker going bonkers

After a recent merge the Windows linker doesn't terminate anymore in the Debug build.

Function used as a type but is in fact a term

Observed in lit/direct/ds2cps_ax_cps2ds_dependent_short.thorin
Command line: ./build/bin/thorin -d direct -d tool ./lit/direct/ds2cps_ax_cps2ds_dependent_short.thorin --output-thorin -

Issue:

:4294967295: error: 'Π n_167425: .Nat → .infer (<nullptr>)' used as a type but is in fact a term

High-level description:
We constructed a wrapper around .Idx

Suspected problem:
Idx has problems with the duality of fundamental integer and being applied as a function.

NOp application issue

The nat operations can not be applied currently:

.cn .extern f [[a:.Nat, b:.Nat], return : .Cn .Nat] = {
    return (%core.nop.add (b,a))
};

leads to

error: cannot pass argument '__269171#0:(.Idx 2)' of type '.Nat' to '%core.nop.add' of domain '«2; .Nat»'

Expected behavior:
Accept program.

Clos Conv FDA error

The FreeDefAna in ClosConv throws an error.
In Node::add_fvs, the def is tested to be no memory.
However, this is tested in a situation that shallowly seems correct:

.con f a::[m:%mem.M, ...] = {
  ... // defined g
  g (a#0, ...)
  //  ^ error occurs here
}

Reproduction

Branch: https://github.com/NeuralCoder3/thorin2/tree/ad_ptr_merge
Commit: 8a56cf0
File: lit/autodiff/imperativ/arr.thorin
Example call: ./build/bin/thorin -o - -VVVV -d autodiff -d direct -d affine lit/autodiff/imperativ/arr.thorin

Code before error:

.import affine;
.import autodiff;
.import clos;
.import compile;
.import core;
.import direct;
.import mem;
.import opt;
.import refly;
.con printInteger [%mem.M, .Idx 4294967296, .Cn %mem.M] @(<nullptr>) = {
     <unset>
};
.con .extern main __2788883::[%mem.M, .Idx 4294967296, %mem.Ptr (%mem.Ptr (.Idx 256, 0), 0), return_2788887: .Cn [%mem.M, .Idx 4294967296]] @(1:(.Idx 2)) = {
    .con print_integer_callback2_reshape_2788874 _2788913: %mem.M @(1:(.Idx 2)) = {
        .let _2788915: %mem.M = %mem.free («100; .Idx 4294967296», 0) (_2788913, ⊥:%mem.Ptr («100; .Idx 4294967296», 0));
        return_2788887 (_2788915, 0:(.Idx 4294967296))
    };
    .con print_integer_callback_reshape_2788818 _2788819: %mem.M @(1:(.Idx 2)) = {
        .let _2788798: %mem.Ptr (.Idx 4294967296, 0) = %mem.lea (100, ‹100; .Idx 4294967296›, 0) (⊥:%mem.Ptr («100; .Idx 4294967296», 0), 0:(.Idx 100));
        .let _2788806: [%mem.M, .Idx 4294967296] = %mem.load (.Idx 4294967296, 0) (⊤:%mem.M, _2788798);
        .let _2788855: %mem.Ptr (.Idx 4294967296, 0) = %mem.lea (100, ‹100; .Idx 4294967296›, 0) (⊥:%mem.Ptr («100; .Idx 4294967296», 0), 1:(.Idx 100));
        .let _2788863: [%mem.M, .Idx 4294967296] = %mem.load (.Idx 4294967296, 0) (_2788806#0:(.Idx 2), _2788855);
        printInteger (_2788819, _2788863#1:(.Idx 2), %clos.attr.ret .Cn %mem.M print_integer_callback2_reshape_2788874)
    };
    .con init_callback_reshape_2788341 _2788374: %mem.M @(1:(.Idx 2)) = {
        printInteger (_2788374, _2788806#1:(.Idx 2), %clos.attr.ret .Cn %mem.M print_integer_callback_reshape_2788818)
    };
    .con for_reshape_2788328 _2788998::[_2789065: %mem.M, _2789000: .Idx 4294967296] @(1:(.Idx 2)) = {
        .con for_body_reshape_2788938 _2788954: %mem.M @(1:(.Idx 2)) = {
            .let _2788985: [%mem.M, %mem.Ptr («100; .Idx 4294967296», 0)] = %mem.alloc («100; .Idx 4294967296», 0) _2788863#0:(.Idx 2);
            .let _2789001: .Idx 100 = %core.bitcast (.Idx 100, .Idx 4294967296) _2789000;
            .let _2789009: %mem.Ptr (.Idx 4294967296, 0) = %mem.lea (100, ‹100; .Idx 4294967296›, 0) (_2788985#1:(.Idx 2), _2789001);
            .let _2789016: %mem.M = %mem.store (.Idx 4294967296, 0) (_2788954, _2789009, _2789000);
            .let _2789044: .Idx 4294967296 = %core.wrap.add 4294967296 0 (1:(.Idx 4294967296), _2789000);
            for_reshape_2788328 (_2789016, _2789044)
        };
        .let _2789062: .Idx 2 = %core.icmp.XygLe 4294967296 (_2789000, 100:(.Idx 4294967296));
        (init_callback_reshape_2788341, for_body_reshape_2788938)#_2789062 _2789065
    };
    for_reshape_2788328 (_2788985#0:(.Idx 2), 0:(.Idx 4294967296))
};

Error mesage:

E:dialects/clos/phase/clos_conv.h:49: memory def in free def analysis: _2788985#0:(.Idx 2) : %mem.M
E:dialects/clos/phase/clos_conv.h:50:   in nom main
thorin: dialects/clos/phase/clos_conv.h:52: auto thorin::clos::FreeDefAna::Node::add_fvs(const thorin::Def*): Assertion `!match<mem::M>(def->type())' failed.

Edit: There seem to be other problems (⊥:%mem.Ptr, ⊤:%mem.M) that were obscured by the printing.

Closure Conversion dependent type rewrites

Fork: neuralcoder3/throin2
Branch: matrix_dialect
Command: ./build/bin/thorin -d matrix lit/matrix/print_const.thorin -d affine -d direct -d clos -o - -VVVV

The type of .con print_int_matrix [mem: %mem.M, k: .Nat, l: .Nat, m: %matrix.Mat (2, (k,l), I32), return : .Cn [%mem.M]];
is rewritten to cc_print_int_matrix_3068959 : [%mem.M, [], .Nat, .Nat, %mem.Ptr («k_3068787; «l_3068793; .Idx 4294967296»», 0), .Cn %mem.M].

The dependency is lost resulting in error: cannot pass argument.

Note: Intermediate solution: Ignore dependencies in the type scheme by writing a wrapper that replaces (k,l) with (⊤:.Nat,⊤:.Nat) using bitcasts.

Core::zip Documentation

The %core.zip axiom needs better documentation.
A more high-level type scheme is

zip: 
∀ s:ℕʳ, Tᵢₙ:☆ⁿⁱ, Tₒᵤₜ:☆ⁿᵒ,
f: Tᵢₙ -> Tₒᵤₜ, // maps input tuple to output tuple
«i:nᵢ; «s; Tᵢₙ#i»» -> // input 
«o:nₒ; «s; Tₒᵤₜ#i»» -> // output

Above, Tᵢₙ is used for «i:nᵢ; Tᵢₙ#i» when appropriate.

The input seems to be an array/tuple where each element has the array s:ℕʳ as dimension/size and all their elements have the type of Tᵢₙ.
This would be something like input#i : (Tᵢₙ#i) array (mixing a few notations for alternative presentations). Where input#i is an r-dimensional array and each entry j has size s#j.
However, I might misinterpret these dependent arrays.

A clean high-level explanation is needed:

what is the concrete type (maybe also in Idris/Gallina notation)
what does this axiom do

To me, this axiom seems more like a semi-polytypic map that lifts a function on the level of dependent tensors.
Whereas I would expect a zip to combine two data streams/sources/structures.

Note: Ideally, every axiom should come with a semantical specification as the axioms are only placeholders for a meaning that is rewritten in passes or instantiated at code generation. This might however be not possible to do compactly for every axiom or overload the declarations. (We also would need a unified way to specify semantics.)

Closure Conversion - index out of bounds

The error

array.h:99: const T& thorin::ArrayRef<T>::operator[](size_t) const [with T = const thorin::Def*; size_t = long unsigned int]: Assertion `i < size() && "index out of bounds"' failed

is triggered in some examples when trying to use closure conversion.

Example Thorin file:
https://github.com/NeuralCoder3/thorin2/blob/autodiff/enzyme/brussel.thorin
Commit 84af1a0

Stacktrace

libc.so.6![Unknown/Just-In-Time compiled code] (Unbekannte Quelle:0)
libc.so.6!raise (Unbekannte Quelle:0)
libc.so.6!abort (Unbekannte Quelle:0)
libc.so.6![Unknown/Just-In-Time compiled code] (Unbekannte Quelle:0)
libc.so.6!__assert_fail (Unbekannte Quelle:0)
libthorin_clos.so!thorin::ArrayRef<thorin::Def const*>::operator[](const thorin::ArrayRef<thorin::Def const*> * const this, size_t i) (thorin/thorin/util/array.h:99)
libthorin_clos.so!operator()<unsigned long>(const struct {...} * const __closure, unsigned long j) (thorin/dialects/clos/phase/clos_conv.cpp:35)
libthorin_clos.so!std::__invoke_impl<const thorin::Def*, thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)>::<lambda(auto:58)>&, long unsigned int>(std::__invoke_other, struct {...} &)(struct {...} & __f) (/usr/include/c++/12.1.1/bits/invoke.h:61)
libthorin_clos.so!std::__invoke_r<const thorin::Def*, thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)>::<lambda(auto:58)>&, long unsigned int>(struct {...} &)(struct {...} & __fn) (/usr/include/c++/12.1.1/bits/invoke.h:114)
libthorin_clos.so!std::_Function_handler<const thorin::Def*(long unsigned int), thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)>::<lambda(auto:58)> >::_M_invoke(const std::_Any_data &, unsigned long &&)(const std::_Any_data & __functor,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:290)
libthorin_clos.so!std::function<thorin::Def const* (unsigned long)>::operator()(unsigned long) const(const std::function<const thorin::Def*(long unsigned int)> * const this,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:591)
libthorin_clos.so!thorin::clos::clos_insert_env(unsigned long, thorin::Def const*, std::function<thorin::Def const* (unsigned long)>)(size_t i, const thorin::Def * env, std::function<const thorin::Def*(long unsigned int)> f) (thorin/dialects/clos/phase/clos_conv.cpp:21)
libthorin_clos.so!operator()<unsigned long>(const struct {...} * const __closure, unsigned long i) (thorin/dialects/clos/phase/clos_conv.cpp:35)
libthorin_clos.so!std::__invoke_impl<const thorin::Def*, thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)>&, long unsigned int>(std::__invoke_other, struct {...} &)(struct {...} & __f) (/usr/include/c++/12.1.1/bits/invoke.h:61)
libthorin_clos.so!std::__invoke_r<const thorin::Def*, thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)>&, long unsigned int>(struct {...} &)(struct {...} & __fn) (/usr/include/c++/12.1.1/bits/invoke.h:114)
libthorin_clos.so!std::_Function_handler<const thorin::Def*(long unsigned int), thorin::clos::ctype(thorin::World&, thorin::Defs, const thorin::Def*)::<lambda(auto:57)> >::_M_invoke(const std::_Any_data &, unsigned long &&)(const std::_Any_data & __functor,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:290)
libthorin_clos.so!std::function<thorin::Def const* (unsigned long)>::operator()(unsigned long) const(const std::function<const thorin::Def*(long unsigned int)> * const this,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:591)
libthorin_clos.so!thorin::Array<thorin::Def const*>::Array(unsigned long, std::function<thorin::Def const* (unsigned long)>)(thorin::Array<thorin::Def const*> * const this, size_t size, std::function<const thorin::Def*(long unsigned int)> f) (thorin/thorin/util/array.h:288)
libthorin_clos.so!thorin::clos::ctype(thorin::World & w, thorin::Defs doms, const thorin::Def * env_type) (thorin/dialects/clos/phase/clos_conv.cpp:34)
libthorin_clos.so!thorin::clos::ctype(thorin::World & w, thorin::Defs doms, const thorin::Def * env_type) (thorin/dialects/clos/phase/clos_conv.cpp:30)
libthorin_clos.so!thorin::clos::ClosConv::closure_type(thorin::clos::ClosConv * const this, const thorin::Pi * pi, thorin::Def2Def & subst, const thorin::Def * env_type) (thorin/dialects/clos/phase/clos_conv.cpp:280)
libthorin_clos.so!thorin::clos::ClosConv::rewrite(thorin::clos::ClosConv * const this, const thorin::Def * def, thorin::Def2Def & subst) (thorin/dialects/clos/phase/clos_conv.cpp:188)
libthorin_clos.so!thorin::clos::ClosConv::rewrite(thorin::clos::ClosConv * const this, const thorin::Def * def, thorin::Def2Def & subst) (thorin/dialects/clos/phase/clos_conv.cpp:227)
libthorin_clos.so!operator()<unsigned long>(const struct {...} * const __closure, unsigned long i) (thorin/dialects/clos/phase/clos_conv.cpp:244)
libthorin_clos.so!std::__invoke_impl<const thorin::Def*, thorin::clos::ClosConv::rewrite(const thorin::Def*, thorin::Def2Def&)::<lambda(auto:62)>&, long unsigned int>(std::__invoke_other, struct {...} &)(struct {...} & __f) (/usr/include/c++/12.1.1/bits/invoke.h:61)
libthorin_clos.so!std::__invoke_r<const thorin::Def*, thorin::clos::ClosConv::rewrite(const thorin::Def*, thorin::Def2Def&)::<lambda(auto:62)>&, long unsigned int>(struct {...} &)(struct {...} & __fn) (/usr/include/c++/12.1.1/bits/invoke.h:114)
libthorin_clos.so!std::_Function_handler<const thorin::Def*(long unsigned int), thorin::clos::ClosConv::rewrite(const thorin::Def*, thorin::Def2Def&)::<lambda(auto:62)> >::_M_invoke(const std::_Any_data &, unsigned long &&)(const std::_Any_data & __functor,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:290)
libthorin_clos.so!std::function<thorin::Def const* (unsigned long)>::operator()(unsigned long) const(const std::function<const thorin::Def*(long unsigned int)> * const this,  __args#0) (/usr/include/c++/12.1.1/bits/std_function.h:591)
libthorin_clos.so!thorin::Array<thorin::Def const*>::Array(unsigned long, std::function<thorin::Def const* (unsigned long)>)(thorin::Array<thorin::Def const*> * const this, size_t size, std::function<const thorin::Def*(long unsigned int)> f) (thorin/thorin/util/array.h:288)
libthorin_clos.so!thorin::clos::ClosConv::rewrite(thorin::clos::ClosConv * const this, const thorin::Def * def, thorin::Def2Def & subst) (thorin/dialects/clos/phase/clos_conv.cpp:244)
libthorin_clos.so!thorin::clos::ClosConv::rewrite_body(thorin::clos::ClosConv * const this, thorin::Lam * new_lam, thorin::Def2Def & subst) (thorin/dialects/clos/phase/clos_conv.cpp:164)
libthorin_clos.so!thorin::clos::ClosConv::start(thorin::clos::ClosConv * const this) (thorin/dialects/clos/phase/clos_conv.cpp:131)
libthorin.so!thorin::Phase::run(thorin::Phase * const this) (thorin/thorin/phase/phase.cpp:10)
libthorin_clos.so!ClosConvWrapper::prepare(ClosConvWrapper * const this) (thorin/dialects/clos/clos.cpp:29)
libthorin.so!thorin::PassMan::run(thorin::PassMan * const this) (thorin/thorin/pass/pass.cpp:56)
libthorin.so!thorin::PassManPhase::start(thorin::PassManPhase * const this) (thorin/thorin/phase/phase.h:141)
libthorin.so!thorin::Phase::run(thorin::Phase * const this) (thorin/thorin/phase/phase.cpp:10)
libthorin.so!thorin::Pipeline::start(thorin::Pipeline * const this) (thorin/thorin/phase/phase.cpp:31)
libthorin.so!thorin::Phase::run(thorin::Phase * const this) (thorin/thorin/phase/phase.cpp:10)
libthorin.so!thorin::optimize(thorin::World & world, thorin::PipelineBuilder & builder) (thorin/thorin/pass/optimize.cpp:60)
main(int argc, char ** argv) (thorin/cli/main.cpp:139)

Possibly related to #117, #118

Add demo dialect

Just that we don't forget about this one ...

Float Infer Unification Issue

%math.arith.mul 0 (a,b)
is parsed as
%math.arith.mul (.infer (<nullptr>), .infer (<nullptr>)) 0 (applied to (a,b))
of domain
«2; %math.F (.infer (<nullptr>), .infer (<nullptr>))».

However, this excludes floats where (p,e) is perceived as one def.

For instance,

  .let R = %math.F (p,e);
  .con prod_comb [[mem:%mem.M, acc:R, [a:R, b:R]], ret:.Cn[%mem.M,R]] = {
      .let v = %math.arith.mul 0 (a,b);

is not accepted (in the current version) as (a,b) is of type «2; %math.F __13580».

Note: The code is present in matrix.thorin in the matrix_dialect branch.

Association Dialect

Maybe an association dialect would be helpful.

Often, when we rewrite things, we want to replace something with a semantically equal term on another level.
The destination term can often be written (more or less) conveniently in thorin.
This is especially the case for staged lowering like present in the autodiff and matrix dialect.

These associations are currently implemented using naming conventions and internals (a hack around externals that are erased).

The association dialect would keep track of associations in the thorin code and expose map on the C++ side.
No special care to quoting/reification is needed as we usually only want to associate axioms anyway.

Example:

// in associate
%assoc.associate: [T1:*,T2:*] -> %assoc.association_class -> T1 -> T2 -> %assoc.association

// in matrix
%matrix.association :  %assoc.association_class

.extern associate_matrix() {
  (%assoc.associate %matrix.association %matrix.prod internal_matrix_mapRed_prod, 
  ...)
}

There could also be generalized passes on associations (e.g. replace all terms in a class with their associated counterpart).

This seems like a simple way to cleverly utilize the dialect infrastructure and get away from string matching.

This dialect could be combined or extend the sketched rewrite-mapping dialect from previous discussions.

k:.Idx n doesn't make sense

My latest commit fixed some UB which materialized in the CI that this test here

ds2cps_ax_cps2ds_dependent2.thorin

failed to work sometimes on Windows.

Problem is: A literal 42:.Idx n doesn't make sense and should have been a type error (which it is now). It caused UB in some normalizers (which I have fixed).

One edge case is: 0:.Idx n. This kind of makes sense, as 0:.Idx 0 actually means 0:.Idx 2^64. So this literal is guaranteed to exist - but this is still kind of odd Oo Should we really allow that?

For now I disabled the test case above, as it makes use of this erroneous index.

Recent commits broke matrix axioms

In the most recent thorin version, parsing seems odd:

With the mapReduce axiom, lit/matrix/mapReduce_mult_init.thorin is parsed (printed at the start of the pipeline) as an empty function.

However, returning an earlier memory instance (e.g. mem3 instead of mem4), the function behaves more correctly and leaves in the constMat call.

A connected issue might be that parsing itself crashes when a mapReduce and return that expects a matrix is combined.

LLVM Emitter: lam == entry_ || lam->is_basicblock()

The assertion lam == entry_ || lam->is_basicblock() is thrown at unexpected points.

An example can be found here (LamSpec needs to be disabled from the optimizations to be able to run the code generation):
https://github.com/NeuralCoder3/thorin2/blob/c4326c6643b9e26777d0d7bd9b2f5dfc5ce1e885/enzyme/brussel.thorin

The problem instance in the linked file is caused when the brusselator_2d_loop function is external (otherwise it is completely deleted) and calls brusselator_f.

The body of brusselator_f seems to be irrelevant. But the error only occurs when brusselator_f is .extern.

Note: The error is also caused in other circumstances where brusselator_f is not marked with .extern but has a more complex body.

Inconsistent constant folding of one-element tuples

One-element tuples are not constant folded away if the elements' type is a nom sigma.
I.e

// The type is simplified:
auto type = w.nom_sigma(w.kind(), 2);
type->set({w.type_int(), w.type_int()});
ASSERT_EQ(type, w.sigma({type}))

// The value is not simplified:
auto v = w.tuple(type, {w.lit_int(42), w.lit_int(1337)});
ASSERT_EQ(v, w.tuple({v}));   // < fails

(complete test)

There seems to be a similar issue when the one-tuple appears inside the struct:

struct Pos {
    pos: (int, int)
}

extern fn f(wrapped: Pos) -> int {
    let (x:int, y:int) = wrapped.pos; x
}

yields

Pos_8079: ★ = {
    [«2∷nat; i32»]
}
        
cc_f_8103: Cn [:mem, Pos_8079, Cn [:mem, i32]]: (_8132, _8143, _8127) = {
    _8151: ⊥∷★ = _8122 (_8132, _8143#0∷i0#0∷i1);
    λ@(0∷i1) _8151
}
            
    _8122: Cn [:mem, i32]: (_8259, _8264) = {
        _8129: ⊥∷★ = _8127 @_8122;
        λ@(0∷i1) _8129
    }

Here we get a a index of type i0 which the BE can't handle.

Rewrite phase issue with lambda vars.

The rewrite phase crashes when the type of a lambda is rewritten.
For instance, .con .extern f [A,.Cn B] with the rewrite B to C leads to a crash.

Example call: ./build/bin/thorin -d matrix -d affine -d direct lit/matrix/init.thorin -o - -VVVV
Branch: matrix_dialect

Externals in Dialects

Dialects currently can not declare externals.
This is probably due to multiple reads of the dialect thorin file.

This behavior can be observed in the automatic differentiation branch.

The intermediate fix is to disable the insertion/containment check in world.h.

Scheduler returns wrong position dynamically

The generated code is:

.con eta_f_1436134 _1436139::[a_1436177: .Idx 4294967296, return_1436141: .Cn .Idx 4294967296] @(0:(.Idx 2)) = {
    .let _1436184: .Idx 4294967296 = %core.wrap.add 4294967296 0 (2:(.Idx 4294967296), a_1436177);
    return_1436141 _1436184
};
.con eta_f_1436234 _1436235::[a_1436239: .Idx 4294967296, return_1436237: .Cn .Idx 4294967296] @(0:(.Idx 2)) = {
    .let _1436246: .Idx 4294967296 = %core.wrap.add 4294967296 0 (2:(.Idx 4294967296), a_1436239);
    return_1436237 _1436246
};
.con .extern main __1436033::[mem_1436050: %mem.M, .Idx 4294967296, %mem.Ptr (%mem.Ptr (.Idx 256, 0), 0), return_1436037: .Cn [%mem.M, .Idx 4294967296]] @(0:(.Idx 2)) = {
    .con eta_f_cps_cont_1436350 _1436381: .Idx 4294967296 @(0:(.Idx 2)) = {
        .con eta_f_cps_cont_1436287 _1436318: .Idx 4294967296 @(0:(.Idx 2)) = {
            return_1436037 (mem_1436050, _1436381) // Start code
        };
        eta_f_1436234 (38:(.Idx 4294967296), eta_f_cps_cont_1436287)
    };
    eta_f_1436134 (_1436318, eta_f_cps_cont_1436350)
};

First we start with

.con .extern main __1436033::[mem_1436050: %mem.M, .Idx 4294967296, %mem.Ptr (%mem.Ptr (.Idx 256, 0), 0), return_1436037: .Cn [%mem.M, .Idx 4294967296]] @(0:(.Idx 2)) = {
    return_1436037 (mem_1436050, _1436381) // Start code
};

We look at the simple case:
We place eta_f_1436234 (38:(.Idx 4294967296), eta_f_cps_cont_1436287) with the computed entry nominal main and construct the scope and scheduler.
The resulting place is main. We place the app in default fashion by moving the body of main.

.con .extern main __1436033::[mem_1436050: %mem.M, .Idx 4294967296, %mem.Ptr (%mem.Ptr (.Idx 256, 0), 0), return_1436037: .Cn [%mem.M, .Idx 4294967296]] @(0:(.Idx 2)) = {
    eta_f_1436234 (38:(.Idx 4294967296), eta_f_cps_cont_1436287)
};

.con eta_f_cps_cont_1436287 _1436318: .Idx 4294967296 @(0:(.Idx 2)) = {
    return_1436037 (mem_1436050, _1436381)
};


.con eta_f_1436234 _1436235::[a_1436239: .Idx 4294967296, return_1436237: .Cn .Idx 4294967296] @(0:(.Idx 2)) = {
    .let _1436246: .Idx 4294967296 = %core.wrap.add 4294967296 0 (2:(.Idx 4294967296), a_1436239);
    return_1436237 _1436246
};

To place eta_f_1436134 (_1436318, eta_f_cps_cont_1436350), we again compute the scope entry main, construct the scope, the scheduler.
But the scheduler suggests the place main resulting in the wrong order above.
The correct place would be eta_f_cps_cont_1436287 as the tuple requires the argument _1436318 of eta_f_cps_cont_1436287.

Code used to place:

// entry is computed
Scope scope(entry);
Scheduler sched_(scope);
auto place = sched_.smart(app);

Import of files

Currently, the import mechanism operates on dialects.

One could imagine a more modular style of files with a file-based import.

.import name // imports dialect name
.import name/abc // import abc.thorin from the dialect provided files
.import path/abc // imports path/abc.thorin from a local file

Local imports would allows for standard definitions like

.let _32 =  4294967296;
.let I32 = .Idx _32;
.let r32 = %core.Real 32;
.let double = %core.Real 64;

Dialect imports would allow for a more readable and modular approach to dialect definitions.
One example would be the differentiated constructs of autodiff.

Closure Conversion - higher order return

1E) '([], ((.Idx 4294967296, aug_pow_else_745690_660849_352178, _660885_351873):closure_type_352061, ([], aug_pow_then_745900_661112_352340, ()):closure_type_352416)#_352541#2:(.Idx 3))' of type 
2E) '      (.Idx 4294967296, aug_pow_else_745690_660849_352178, _660885_351873):closure_type_352061, ([], aug_pow_then_745900_661112_352340, ()):closure_type_352416)#_352541#1:(.Idx 3)' of domain 

1T) '[★,                   ((.Idx 4294967296, aug_pow_else_745690_660849_352178, _660885_351873):closure_type_352061, ([], aug_pow_then_745900_661112_352340, ()):closure_type_352416)#_352541#0:(.Idx 3)]' to 
2T) '[closure_type_352015, ((.Idx 4294967296, aug_pow_else_745690_660849_352178, _660885_351873):closure_type_352061,  ([], aug_pow_then_745900_661112_352340, ()):closure_type_352416)#_352541#0:(.Idx 3)]'
      ^ only here

This behavior is exhibited for instance in pow_autodiff_eval_filter2_eval.thorin.

The program is a differentiated version of pow and as such a minimal recursive function that folds continuations inside each other in the return continuation.

Performance depends on the size of Arr/pack

Here Thorin is doing things 100000000 times.

.import mem;

.let i32 = .Idx 4294967296;
.let size = 100000000:.Nat;

.cn .extern main [mem: %mem.M, argc: i32, argv : %mem.Ptr («⊤:.Nat; %mem.Ptr («⊤:.Nat; .Idx 256», 0:.Nat)», 0:.Nat), return : .Cn [%mem.M, i32]] = {
    .let (alloc_mem, arr) = %mem.alloc (<<size; i32>>, 0) (mem);
    .let lea = %mem.lea (size, <size; i32>, 0) (arr, 0:(.Idx 100));
    .let (load_mem, val) = %mem.load (i32, 0) (alloc_mem, lea);

    return ( load_mem, 1:i32 )
};

unexpected normalizer behavior

An older issue that surfaces again:

Normalizers act only on fully applied terms.
This is helpful in general as we want to normalize using the arguments.
But the determination of whether something is fully applied to not fine enough.
Ideally, the normalizer should fire at the moment the axiom is applied to all of its own arguments.
The number of arguments / the moment of full application could be determined using the type of the axiom.
This would not work perfectly for axioms that always return a function. But this is also currently a problem and it would be an expected edge case.

If the axiom is higher-order and returns a function, the normalizer is given the arguments that are applied to the final function as arg which might be confusing.
Therefore, one needs to collect all arguments from nested applications, work with the first few, and reapply the others in the end.

Example:

.ax %direct.cps2ds: Π [T: *, U: *] -> (.Cn [T, .Cn U]) -> (T -> U), normalize_cps2ds;

Hoist slots to top

Due to the fix in #118 we need a pass that tries to hoist stack slots to the top. E.g., codegen/benchmark/aobench.impala is failing right now: We are running out of stack space!

anydsl / thorin2 Goto Github PK

thorin2's People

Contributors

Stargazers

Watchers

Forkers

thorin2's Issues

Recommend Projects

Recommend Topics

Recommend Org