Giter Site home page Giter Site logo

laurenarnett / tensorflock Goto Github PK

View Code? Open in Web Editor NEW
8.0 8.0 1.0 1.32 MB

A small functional tensor language with Einstein summation notation convention and shape-checking at compile-time.

TeX 27.70% Makefile 1.95% OCaml 55.57% HCL 6.92% Shell 3.35% C 4.36% Haskell 0.15%

tensorflock's Introduction

Hi there, I'm Lauren ๐Ÿ‘‹

I graduated as a Computer Science major from Columbia University in 2019. I have been working as a Full-Stack Software Developer since then, using Clojure and ClojureScript. I'm currently in a batch at the Recurse Center, where I'm making lots of Haskell projects and trying out IHP.

  • ๐ŸŒฑ Currently learning: Haskell; SIMD & working with vector intrinsics; distributed systems
  • ๐Ÿ˜„ Pronouns: she/her
  • โšก Fun fact: CFG parsing reduces to Boolean matrix multiplication (Lee, 2002), meaning that the fastest parsing algorithms can be computed in less than O(n^3)!

You can find more details about me and ways to reach out at laurenar.net!

tensorflock's People

Contributors

dawnstar63 avatar jmorag avatar laurenarnett avatar nbuonin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

jmorag

tensorflock's Issues

Code Sample

We need to write some code samples at the end of the proposal.

  • Flatten a tensor (Joseph)
  • Update a tensor element (maybe a primitive, Joseph)
  • Matrix multiplication (Nick)
  • GCD algorithm (Nick) (actually done by Joseph)
  • Perceptron algorithm (Lauren)
  • Nearest neighbor (Lauren)

Semantic Bug

We have a pretty interesting bug in the semantic checker that lets things like

wrong_type : Nat;
wrong_type = True;

pass through unscathed. This is because

  1. When the semantic checker validates expressions, for literals, it doesn't actually look at their declared type, but infers it straight from their value.
  2. We don't write enough tests that the semantic checker should reject. These are just as important as writing the ones that are valid in our language.

Tensor syntax

What do we want our tensor syntax to be? I liked declaring them

T : Tensor (r : Rank) <d_1..d_r>

but this is a pretty "functional" way of programming, and Edwards was pretty explicit about us having to give up on that if we're doing tensors. Thoughts?

Implement a block floating pass

In lambda-lift.ml once sfuncs are uniquely named, and have their parameters lifted (#117 & #118) implement a pass to float these blocks into the global scope. This can be done by consing them to sprogram's sfunc list.

Index binding in expressions

Sticking sindices in sfuncs isn't good enough because you can still use indices in main expressions; thus, index binding needs to be added to expressions.

Transforming everything to a Forall - goal is to figure out how to stick things into this.

Codegen for tensor literals

Now that we can semantically check tensor literals for their shapes, we need to generate code for them. This involves linking up the code in baby_runtime to codegen. We need to

  1. Finish writing print_tensor in C.
  2. Declare it under the other two printing functions we have in codegen.
  3. Somehow figure out how to call Talloc and Tfree from codegen when we encounter a tensor literal.

Long Compilation Times

I have a sneaking suspicion that whenever we run make, ocamlbuild -clean gets called, forcing the scanner, parser, ast, etc. to get recompiled, even if we only make changes to codegen or semant. From just looking at the Makefile, I'm not sure how easy this is to fix, but maybe @nbuonin could take a look at this.

Reorganize test files

Our current test structure leaves much to be desired. Ideally, we should have tests grouped into semantically meaningful folders. Maybe one big pass folder and one big fail folder. We can then move tests from pass into fail as we begin to implement a stricter compiler, instead of just rewriting all of the pass ones.

Tensor struct

Define a C struct representing a tensor. As we discussed in the proposal phase, tensors have

  • shape
  • contents

In an effort to cut down on our ambition, we're supporting only doubles (64-bit floats) in our tensors. We can capture the notion of a natural-valued tensor index with an unsigned int. We should probably make these of the 64-bit variety as well, since then we can avoid running into any memory allocation subtleties that come with having to put things with different sizes on the heap.

My first guess at what this struct would be is

struct Tensor {
    unsigned int size; //Internal, the actual number of doubles that the tensor is storing
    unsigned int rank;
    unsigned int *shape;
    double *array;
};

but perhaps someone has more ideas about this.

Record Syntax

Record syntax needs to be added to the proposal, in the syntax section.

data Tensor = Tensor { numType : (Num a) => a
, rank : Nat
, shape : Vect rank (\ x -> Fin x)
}

Broken parens

Parens only work on arithmetic expressions. Yikes! Trying to fix it, but it's rather annoying.

functions and variables

The time has come for TensorFlock to compile programs with functions! Probably start out with generating code for functions of the form f : SomeUnitType; f = some_unit_value and branch off of that for actual functions.

Implement fixed-size tensor allocation

This one's not too bad. Given a tensor of known size, rank, and shape, implement the c function that mallocs the struct on the heap. The real challenge is trying to do this for tensors whose structure is unknown at compile time.

Access a component of a tensor

This is the low level function in C that, given a Tensor struct pointer and indices, returns a pointer to the memory location of those indices.

typedef struct Tensor T;
typedef unsigned int uint;

double* access(T *tensor, ...) 
{
    va_list indices;
    va_start(indices, tensor);
    uint index = 0;
    uint i, j, prod;

    for(i = 0; i < tensor->rank; ++i) {
        prod = 1;
        for(j = 0; j < i - 1; ++j) {
            prod *= (tensor->shape)[j];
        }
        index += va_arg(indices, uint) + prod;
    }
    va_end(indices);
    return &((tensor->data)[index]);
}

Right now, it is agnostic to the rank of the tensor it is acting on. For efficiency, we may want to perform manual loop unrolling and specialize this for tensors of ranks 1-4. For now, though, this can serve as a black box that in our manual compilation for the next meeting.

Range function

Please compile the range function to C

range : Int -> T<n>;
range n = t; { t : T<n>; t[i] = cast i; }

Matrix-vector multiplication

Please compile matrix-vector multiplication

matTimesVec : T<n,m> -> T<m> -> T<n>;
matTimesVec mat vec = mat[i,j] * vec[j];

Pin llvm 3.8

Would it be possible, along with all the other stuff that our makefile drags in, to pin llvm to our dependencies so that it comes along?

Function to extract ids from expr

In order to create a dependency graph of a program, we need to be able to deduce which identifiers an expression depends on so that we can look for them in the rest of the program.

It should have type ids_in_expr : expr -> string list. Name can be changed to something with fewer underscores.

blah

test

  • test1
  • test2

Streamline TLit codegen

#100 works, but we can do better. I'm in an issue-happy mood now so I'm opening this up here. Probably will involve some changes to the Talloc function in the runtime. I'm thinking that we should just use it minimally to allocate space given a size and rank and let the LLVM deal with actually sticking contents inside of the returned struct.

Codegen var bug

Found a bug in codegen for global vars. We currently have a test that looks like:

main = res;

three : Nat;
three = 3;

res : Nat;
res = 2 ^ three;

It returns 8 like it should. However, if we reorder the functions like so:

main = res;
res : Nat;
res = 2 ^ three;

three : Nat;
three = 3;

then it returns 1. This is because when we do codegen for global variables, we initialize Nats to 0, and llvm is quite happy to accept that as a valid value for three and raise 2 to the 0'th power and only later reassign three, since reassignment is a thing that is allowed in llvm-land.

I think that this stems from a larger problem of us having copied large swaths of microc before we realized that our language is about as far away from microc as possible. Anyway, to solve this particular problem, we might need to change what we call globals into functions with 0 arguments and constant return values so that order of initialization stops mattering.

Makefile improvements

  • Modify "make state" to move generated files to a _state directory
  • Make a target to package project for first deliverable

Additions to Docker container

Add:

  • entr and/or watcher script
  • man pages
  • valgrind
  • LLVM suffix free
  • gcc extension that permits nested scope (for emitting LLVM that demonstrates nested scope)

Implement dynamic tensor allocation

As I mentioned in #13, this one is somewhat harder. If we declare something super general, like
T : Tensor r <d_1..d_r>, can we write the corresponding C (later LLVM) code that would actually perform such an operation under the hood? I'll take this one.

Rewrite LRM

Someday before turning in final project stuff we should rewrite some of the LRM, especially stuff on scope, lambda lifting, etc. keeping in mind the feedback that it's "not supposed to be a tutorial" like documentation.

Restricted class of expressions for bracket indexing

I implemented a restricted subset of expr, aexpr, for tensor shapes. It only has IDs, integer literals, arithmetic operations, and function application. Should we also restrict bracket indices to be of this class as well? The only argument against this would be that boolean statements can possibly be useful in conditionals, say "select this index if such-and-such and select this other one if otherwise." However, it's probably not super necessary to have this feature and this does a fair amount of type checking for us for free at parse time.

Average of vector and tensor

Please write equivalent C for the following functions:

vecAverage : T<n> -> T<>;
vecAverage v = (v[i] * ones[i]) / (cast n); {
    ones : T<n>; ones[i] = 1.0;
}

matAverage : T<n,m> -> T<>;
matAverage m = ((m[i,j] * ones'[j]) * ones[i]) / (cast (n * m)); {
    ones  : T<n>; ones[i] = 1.0;
    ones' : T<m>; ones[j] = 1.0;
}

C function to do einstein summation

@dawnstar63, could you try implementing something in C that contracts to tensor structs? I think it would look something like

T *contract(T *t1, T *t2, int idx1, int idx2, uint *new_shape) {
  // allocate the new tensor using the new shape (which is something that we'll have to pass from codegen)
  // a bunch of for loops to stick all of the correct components into the new tensor, summing over the indices given as arguments
  // return the pointer to the new tensor
}

Arbitrary Tensor Shape

How do we represent the arbitrary shape? Ideally, we would like to be able to operate on individual shape dimensions of tensors without knowing how many there are ahead of time in a type safe way, but creating syntax for this is somewhat hard.

Implement parameter lifting pass

In lambda-lift.ml implement a pass over the uniquely named sfuncs (see #117) to lift free variables in sfunc.sscope to sfunc.sfparams.

Do this by finding the intersection of ids in a sfunc's enclosing scope and sfunc.ssope, and then subtract the ids in sfunc.sfparams. The remaining ids are free variables; copy these into sfunc.sfparams.

Then update each call site in the sfuncs enclosing scope and sfunc.sscope with the additional parameters.

State table option in makefile

Would it be possible to have a makefile target that builds the table output of ocamlyacc? Generating the table output just involves passing a -v flag to ocamlyacc, but I can't figure out how to do this cleanly with ocamlbuild.

Topsort

Demote "functions" of global-vars to just global vars. Requires topsort to reorder dependencies.

Automate linking

Now that we have a real runtime, we need to better integrate it into our test process. Ideally we should have a executable that takes one .tf file as an argument and prints the output of the main expression.

Str module

Hi guys. I needed to use the ocaml Str module for some regex work with tensor literals. The files for the module are in my /usr/local/lib/ocaml/ folder, but ocamlbuild does not seem to want to find them. Any ideas to fix this?

Implement renaming pass

In lambda-lift.ml implement a pass of the list of sfuncs to:

  • give each sfunc.sfname a unique name, use incrementing ~s b/c we don't use those in our language
  • update each call site in a sfunc's enclosing scope and sfunc.sscope

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.