rahulc29 / kaleido Goto Github PK

2.0 2.0 0.0 61 KB

Abandoned due to code spaghettification😭

CMake 4.30% C++ 95.70%

kaleido's Introduction

I'm Rahul Chhabra (he/they), a student of Information Technology at Indian Institute of Information Technology, Allahabad, graduating in 2025.

My primary interests are best described by "computational trinitarianism" - logic, category theory, dependent type theory (especially cubical ones), and the development and engineering of proof assistants.

Much of my work right now is on my realizability project - a formalisation of categorical realizability from the ground up in Cubical Agda.

kaleido's People

Contributors

Stargazers

Watchers

kaleido's Issues

Add recursive descent parser

Add recursive descent + operator precedence implementation of the Parser API

Rewrite all Lexer API usages to not use deprecated API

The old Lexer API nextToken() is now deprecated. Rewrite all tests to use the newer tokenize() API instead.

Add function, variable and prototype structures for AST

The AST structure consists of nodes for simple unary and binary expressions such as addition, multiplication, division, etc. The basic structure therefore allows for parsing of expressions like 3 + 4 * ( 5 / 6 ) into an AST however not for more sophisticated ones like ( x + 5 ) * ( y - 3 )

Create Generator API to interact with LLVM API

The kaleido::gen::Generator API is to act as a layer of abstraction between the AST structure and the LLVM IR Generation API.

Add GitHub Actions

Add CMake based GitHub actions so that every push builds successfully and passes all the unit tests.

Add conditionals to the language

The language currently only allows for one-expression functions and conditionals such as if, else, elseif, are not supported. The language is not Turing complete yet.

Understand SSA form

The LLVM tutorial talks about the static single assignment form. Read about it and develop as good an understanding of the topic as possible.

Add parser driver

The method kaleido::parser::RecursiveDescentParser::parse is currently unimplemented. This method should return an std::vector<std::unique_ptr<kaleido::ast::Function>>. This can be done as of now since top-level expressions separated by newlines are each parsed into one anonymous functions.
Example

def add(a b) a + b
3 + 4 * add(5, 7)

Should return an std::vector with two elements.

The first element is a Function with a prototype consisting of 2 parameters (a and b), and the name add. The body should be an Addition with both the left and right children as Variables with names a and b respectively
The second element is the anonymous function generated for all top level functions, it should be an Addition with left child a Literal with value 3, and the right child should be a Multiplication, the left child of which would be a Literal with value 4 and the right child would be a Invocation with function-name add and argument list consisting of two Literals with values 5 and 7.

Add `if` statements

Grammatically, if may be defined as a keyword. So, something like [if](\W)* would do.
if keyword must be followed by a (
parenthesis must be followed by a boolean expression (a >= b && (c != d || e == f)) -> this can be implemented with a little bit of modification to the recursive descent parsing algorithm
the expression should legally end with a )
the condition of the if should be followed by the body declaration. The body may be modelled with an std::vector<kaleido::ast::TreeNode>. Once we add assignment operations we'll get somewhat functional if blocks.

Restructure tests into separate folder

Currently, all the tests are in the main.cpp file. This is obviously a bad idea.

Add comparison operators

implement comparison operators (<, >, <=, >=, ==, !=) in lexer and parser
implement comparison operators in AST
implement code generation for comparisons using fcmp instruction of LLVM

Add mutable variable

A new bool property for node type Variable akin to mIsMutable. Could also add a new subtype of Variable, MutableVariable to handle the case but I feel polymorphism here is unnecessary. Kaleido is very simple and mutables and immutables don't defer significantly enough for polymorphism to actually achieve anything.
A new AST node type Reassignment : BinaryExpression based around the = operator to allow for reassignments
Add methods to generator API to prevent generation of Reassignment instances iff the Variable to reassign has mIsMutable true.

Add assignment operator

The assignment operator = can be understood as yet another binary expression with an LHS and an RHS.
New keywords such as val and var may also be added to aid in parsing and declaring mutability
A legal assignment may then be val a = 3. Here the LHS is a Variable and the RHS is a Literal.
RHS can easily be parsed using recursive descent parsing while LHS must necessarily be a valid identifier
In grammatic pseudocode, we may write [val|var](\s)+{id}(\s)*=(\s)*{expr}

Add Parser API

Add a parser API with the following abilities:

store and const-access a unique reference to a Lexer and obtains the Lexer using a (hardcoded for now) factory method
iterate through the tokens provided by the lexer and create AST
return unique reference to AST and provide ownership to client code

Add Fortran I inspired parser

Add an implementation of the parser API wherein the operator precedence strategy is the simpler one used by the Fortran I compiler as described here

Allow parsing of non-pretty expressions

Non-pretty expressions are expressions wherein every tokenizable is not separated by whitespace. So while 60 + 3 * 3 is pretty, 60 + 3*3 is not.
Currently, the lexer is only able to tokenize the first kind.

Add generator functions for all AST node types

The kaleido::gen::Generator class acts as an interface between AST nodes and the LLVM IR generator. Each node generates bitcode for itself and recursively calls the generator of all it's children. This converts the entirety of the AST into LLVM IR.
Each node must therefore implement it's generator function.

Shift AST classes to separate folder

The AST classes are under the parser folder, they should be refractored into parser/ast

Dynamicise the matching of tokenizable classes

When tokenizable classes (such as keyword, identifier, operator, etc) are matched and the matching indices stored, the internal API insertMatches is called for each class. For a large number of classes, this is infeasible. The classes must therefore be put into a std::vector and dynamicise the insertions using for (auto &clazz : classes) of some sort.

Add Regex based Lexer

The current lexer is essentially an adhoc gluing of string processing. This string processing should be done via regex instead.

Add generated Lexer and implement Lexer API for it

The current Regex based Lexer RegexLexer uses an algorithm that runs in O(n^2 log(n)). Also it is incapable of handling even slightly complicated cases so that while for (int i=0;i<5;i++) can be tokenized for(int i=0;i<5;i++) does not tokenize perfectly (this particular case is flexible however I do not how many such corner cases exist). Also, it is not very extensible and adding if-else, loops and function definition using this lexer is going to be a nightmare.

Add basic structure of AST

AST must consist of

common tree node superclass
unary expression support (!, -, values themselves are unary expressions)
binary expression support (+, -, *, /, =, ==, !=, >, <, >=, <=, etc)
a different kind of node for everything else (function signatures, return statements, function calls, etc)

Add high-level client code

The high-level client code would perform the following tasks

factory methods to produce default implementations of the Lexer, Parser, and Generator APIs
determine the mode of input (standard input, file stream, HTTP, etc) and constructs the appropriate lexer
take ownership of the AST produced by Parser API
call the generator functions for the AST
provide the context for generation stage