Giter Site home page Giter Site logo

tlp's Introduction

Tulip

This is a compiled, statically typed, and stack based toy language inspired by Porth. The goal of this project is for me to explore the basics of compiler and language development.

Ultimately, I'm aiming to make the compiler self-hosted.

NOTE: THIS LANGUAGE IS NO LONGER UNDER ACTIVE DEVELOPMENT

  • See the about.md for a breakdown about the language and why I'm leaving it here.

Examples

Below are a few examples to demonstrate some of the basic concepts of the language.

Hello world:

use "std.tlp"

"Hello World\n" puts

Print numbers 0 to 99:

0 while dup 100 < do
    dup putu
    1 +
end drop

Quick Start

Before the compiler is self-hosted, the Python3 compiler can be used.

Requirements:

  • Python 3.7+
  • NASM v 2.13+

Compiling and running a Tulip program

Compile the fib.tlp example program. This will generate an executable file output.

pyton3 tulip.py examples/fib.tlp

Run the program with

./output

Running the tests.

I've got a number of tests (there's still a bunch missing) to make sure that the compiler's working properly. The ci.py program runs each of the tests and make sure the output matches the expected value.

python3 ci.py

Language Overview

This is a brief overview of the features currently in the language. I'll try to keep this up to date as new features are introduced.

Literals

Integers

A sequence of digits are treated as an unsigned integer, and pushed onto the stack.

10 20 +

Booleans

true and false are parsed as booleans and are represented with 1 and 0 respectively.

Booleans are treated separately from integers thus the following code would not compile:

1 true +

Strings

A string must be contained within two ". A string is a structure within Tulip that has both a size (int) and a pointer to the data (ptr)

// This is the internal representation of a Str
struct Str
    int // size
    ptr // data
end

When a string token is encountered, the Str structure is pushed onto the stack. As will be discussed later, structures can be treated as a single element.

When a string literal is compiled, a null terminator is placed at the end for convenience for working with the operating system. String operations do not rely on this null terminator, and rather use the size of the string. The size of the string does not include the null terminator.

Intrinsics

These are the built in operations for the language.

Stack Manipulation

Operation Signature Description
dup T -> T T Duplicates the top element on the stack.
swap A B -> B A Swaps the order of the top two elements
drop T -> Consumes the top element from the stack
putu int -> Consumes and prints the top integer on the stack
push T -> R[T] Consumes the top element of the stack, and pushes it onto the return stack.
pop R[T] -> T Consumes the top element of the return stack and pushes it onto the stack.

Comparison Operators

Not all comparison operators have been implemented yet.

Operation Signature Description
== a: int b: int -> bool Pushes a == b onto the stack
<= a: int b: int -> bool Pushes a <= b onto the stack
< a: int b: int -> bool Pushes a < b onto the stack
> a: int b:int -> bool Pushes a > b onto the stack

Syscalls

Operation Signature Description
syscall<n> T1, T2, ... Tn id: int -> int Performs the syscall with the corresponding id, with up to n arguments. 0 <= n <= 6

Group/Struct Operations

Tulip supports the creation of structures as well as anonymous structures or groups.

Operation Signature Description
<n> group T1, T2, ... TN -> Group<n> Groups the top n elements into one element
group.<n> Group<n> -> T Consumes the group and pushes the nth element onto the stack
cast(<name>) T1, T2, ... TN -> struct Groups the top elements of the stack into a struct
<name>.<n> struct -> T Consumes the struct and pushes the nth element onto the stack
split struct -> T1, T2, ... TN Breaks the struct/group into it's constituent parts

Structs and groups are treated as if they were a single element. For example the swap operation will swap the entire struct/group with the element below the struct/group, while preserving the order of elements within the struct/group.

For example the following program would print 1, then 3, then finally 2.

1 2 3   // Stack: 1 2 3 
2 group // Stack: 1 [2 3] 
swap    // Stack: [2 3] 1
putu    // Output: `1`
split   // Stack: 2 3
putu    // Output: `3`
putu    // Output: `2`

Function Pointers

Operation Signature Description
&<name> -> fn(T1, T2, ...) -> [R1, R2, ...] Pushes the pointer to function <name> onto the stack.

Control Flow

If Conditions

Type checking requires that each branch (at least two) of the if statement produces the same types onto the stack. For instance, if one branch pushes an int onto the stack, and another pushes two ints onto the stack, this will not compile.

if <condition> do
    <branch body>
else <condition> do
    <branch body>
else 
    <branch body>
end

While loops

Type checking requires that the types on the stack do not change from before the loop and after the loop. You cannot, for example, push an int onto the stack with each iteration of the loop.

while <condition> do
    <body>
end

Functions

fn <name> <Input Types> (-> <Output types>) do
    <function body>
end

// Eg. Function that takes an int and returns an int
fn foo int -> int do
    // ...
end

// Eg. Function that takes a bool and returns nothing
fn bar bool do

end

Generics Support

Tlp has some basic support for generics. By default, structs and functions have to use concrete types, and unkonwn types will be rejected. In order to make a type/function generic, prefix the definition with a with block.

Note: Functions aren't type checked until a concrete instance is created. Track the issue here.

with T
struct pair
  T T
end

with T
fn consumes_t T do
  // ...
end

Generic structs are created in the same way as a normal struct with cast(<name>).

with T
struct foo
  T
end

// creates foo<int>
1 cast(foo) 

Generic functions cannot be called directly and require a with-do block to turn the generic type into a concrete type.

with T
fn generic_put T do
    cast(int) putu
end 

1 with int do generic_put
true with bool do generic_put
3 cast(ptr) with ptr do generic_put
// This wouldn't compile
// "Hello World\n" with Str do generic_put

You can put generic types in signatures. To specify a generic struct to a type, then you need to use a with -> block.

with T
struct Foo
  T
end 

with T
fn takes_foo
  with T -> foo 
do
  // ...
end

Structs/Functions can take generic function pointers. The type is declased with another with block.

with T
struct foo
  with T &fn T -> int T end
end

with A B
fn foo 
  with A B &fn A -> B end
  with B &fn B end

Constant Expression

Tulip supports a very limited number of operations as constant expressions.

const <name> <expr> end

Reserving Memory

You can reserve fixed amounts of memory (such as for an array) with reserve blocks.

reserve <name> <int> end

Types

There are only four types in Tulip by default: int, bool, ptr, and Str.

Including From Multiple Files

You can include other files with use statements. Paths can be absolute or relative to the tulip.py compiler.

use "std.tlp"

tlp's People

Contributors

rtulip avatar

Stargazers

Daniel Quernheim avatar  avatar

Watchers

 avatar

tlp's Issues

Stack based language isn't as much fun to write as I hoped

When I first started this project, I was really interested by the notion of a stack based language because everything felt like a puzzle. But since then I've found that maintaining and revisiting code is really painful.

Every time I've had to re-visit old tlp code, I've had to go through all the logic again and again to follow exactly what is on the stack.
For instance, just look at this code:

with T
struct TypeInfo
    with T &fn ptr -> T ptr end // read
    with T &fn T ptr -> ptr end // write
    int   // SizeOf(T)
end

with T
struct Stack 
    with T -> TypeInfo
    ptr   // ptr
    ptr   // size 
    int   // cap
end

with T
fn Stack.Push
    with T -> Stack
    T
do
    push split
    push dup @64 pop 
    if < not do
        "Cannot push to stack. It's full.\n" eputs
        1 exit
    end 

    dup @64 1 + swap !64

    // How do you come back to this later???
    // What's on the stack???
    // This is gross to figure out what's actually going on!!!
    pop swap dup push @64
    cast(ptr) push swap pop swap 
    with T do TypeInfo.Write call
    pop !64

end

I haven't found any good way to make this more maintainable, unless you continue to write a ton of functions, but that's not really any better, and doesn't handle the stack manipulation part of the game.

Tsoding recently introduced let bindings into porth, which I think only reinforced how important bindings are for readability. Overall, I've really come to question if I want to continue to develop the semantics of this language, or take this as a learning opportunity, and reset the semantics.

Things I don't like:

  1. Working with many things on the stack in a function
  2. Big structures

Things I do like:

  1. Ease of "returning" multiple things
  2. The strongly typed-checked nature of the language.
  3. No AST

I think I've learnt a lot about the nature of programming languages, but I might take a reset and re-do most of the semantics.

Nested generic structs don't work.

This code doesn't compile right now, when it should be able to.

with T
struct foo T end

with T
struct bar
  with T -> foo
end

1 cast(foo) cast(bar) drop

Add constraints to generic types.

An extremely powerful feature would be to limit what types can be accepted as a generic. For example:

with T
fn generic_putu T do
  cast(int) putu
where T -> int bool ptr
end

This block would only allow generic_putu to be called if T is an int, bool, or ptr. Other types would be rejected with some nice error message. This is a prerequisite for #19.

Type Checking Skips generic functions

Any function that has a with label is skipped during type checking. This means that you don't know if you're generic function is actually sane until you try to call it.

This will be a tricky thing to resolve. Consider the following:

with T
fn generic_putu T do
    cast(int) putu
end

Until we can put bounds on T, we can't actually tell if this function is valid. As such, we may need to do something link:

with T
fn generic_putu T do
    cast(int) putu
where T -> int bool ptr
end

If there isn't a where clause, then it is assumed that T could be any type.

Multiple bounded generics could be handled with multiple where clauses, but it's kinda ugly. Though syntax really isn't the problem at this point.

with A B
fn foo A B do
    // ...
where A -> int bool ptr
where B -> int bool ptr
end

Generate Read/Write functions for Types

There's a pretty useful pattern that's arisen for writing types to buffers, which would be a lot nicer to use if they were generated automatically. This would also ensure that the implementation is consistent for all types.

For example:

fn !int int ptr -> ptr do
    dup SizeOf(int) ptr_offset push !64 pop
end

fn @int ptr -> int ptr do
    dup cast(int) SizeOf(int) - cast(ptr) push @64 pop
end

These functions will read/write to a pointer and return the updated pointer. In turn this makes creating similar functions for structs very easy:

fn !Str Str ptr -> ptr do
    push split pop
    !ptr !int
end

Generate SizeOf(T)

I've made functions such as SizeOf(int) and SizeOf(Str), but it'd be nice if these were auto-generated, or if SizeOf was just an intrinsic.

Maybe change the `if` semantics

if blocks behave very strangely if a bool is already on the stack.

// This behaves just fine
true
if do
  // ...
else <cond> do
  // ...
end

It might be worth removing the do after the if and have do just be used for else blocks.

0 1 == if
  // ...
else <cond> do
  // ...
else
  // ...
end

Support Argv and Argc

If I'm going to get this to be self hosted, I'm going to need to be able to provide a way to get to command line arguments

Add generic types to structs

If I want to be able to do lists I'm going to need to allow for generics in structs. The important thing is that a generic identifier (eg T) refers to the same type throughout the struct cast. On cast, a new concrete type can be generated.

Here's an idea of syntax I'm thinking of:

struct foo T with T end
5 cast(foo) // creates foo_int
false cast(foo) // creates foo_bool

Support Generics

Generics are getting to be very cumbersome with the current implementation, since it was done without much thought and on the fly.

This issues it to track the progress of implementing generics within both structs and functions.

Making a formal grammar

I'd be interested to try to make a proper grammar for this language. I'm wondering if that would be easier or harder to work with to make the compiler self-hosted.

Revisit `lexer.tlp`

I started trying to write tlp.tlp, but found working with lists extremely annoying. So I've since added generics supports to make that process easier. This in-turn broke both tlp.tlp and lexer.tlp. These need to be fixed up.

Add some support for lists

Lists are a real pain right now, especially to do correctly, as they require lots of duplicate code.

I'd love to create a struct which can be used to define the list and how to read from/write to it.
This is my initial brainstorming:

struct List
    fn_ptr // Read function pointer
    fn_ptr // Write function pointer
    ptr // Pointer to the start of the buffer
    ptr // Pointer to one past the last element in the buffer
    ptr // Pointer to an int to store the size.
    int // Capacity
end

This design needs to know how to pass function pointers, and know their type signatures, which might be a pain, but is probably useful to figure out.

Not sure if this is the best approach yet, will do some research

Add Names to structs

right now we just provide types, but I'm often labeling struct types with names anyways. It'd be nice to be able to do this:

struct Str
    int Size
    ptr Data
end

Str.Size and Str.Data would be generated as accessors instead of Str.0 and Str.1.

Groups with the same signature should be treated as the same.

Currently every group is considered unique, even if they have the same types and order.

For example, this currently won't compile --


1 2 
2 group

while true do
    split 2 group
end 
drop

All the loop would do is split and re-join the group, but we get the following error:

examples/test.tlp:5:1 [ERROR]: 
    While loops cannot change the stack outside of the loop
    [Note]: Stack at start of loop : [AnonStruct_0_2]
    [Note]: Stack at end of loop   : [AnonStruct_1_2]
    [Note]: Pushed at start of loop: []
    [Note]: Pushed at end of loop  : []

Add support for enums

For the moment, I currently have a bunch of consts, but this gets pretty tedious to write, it'd be much nicer to have something along the lines of:

enum <name>
    foo
    bar
    baz
end

which could be desugared to:

const <name>.foo 1 end
const <name>.bar <name>.foo end
const <name>.baz <name>.bar end

Improve test suite.

There's a number of missing compiler errors from the test cases, and a good number of functional tests that should exist.

Also, we should get the ci.py script to be run before any pull request.
The ci.py test script itself can be improved too so it's a bit easier to use & work with.

Add support for function pointers

In order to properly support polymorphism, functions pointers are going to help a lot. The type system should ensure that function pointers signatures are maintained.

Here's my thought on how I want to do the syntax. Similar to how generics are associated with the with keyword, function pointers (and maybe later addresses in general) can be associated with &.

&<name> will push a function pointer to <name> on the stack.
call will call the function pointer.
&fn [T1, ...] -> [R1, ...] end can define a function pointer type that takes [T1, ...] as input and returns [R1, ...] on the stack.

Defining generic types can be done with the same with statement as calling generic functions:
with T &fn T -> T T end

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.