elikaski / bf-it Goto Github PK

View Code? Open in Web Editor NEW

120.0 120.0 11.0 1.43 MB

A C-like language to Brainfuck compiler, written in Python

License: MIT License

Python 91.46% Makefile 0.13% C++ 8.41%

brainfuck brainfuck-compiler brainfuck-interpreter c compiler icecream python python3

bf-it's People

Contributors

Stargazers

Watchers

Forkers

lazercorn hugonikanor clflush neeeoo makolath esovm stevexyz abhra2020-smart anonymousellama21 saewitz tymonf

bf-it's Issues

Using a variable with printchar could be optimized

Printing using a variable without math operations is a bit unoptimized.
It first copies the variable and then prints the copy.

int x = 65;
printchar(x);

Results in (pseudo code)

Set x to 65
Move x to dest and temp
Move temp to x
Print dest
Clear dest

This could be optimized as

Set x to 65
Print x

Add debug prints

Add the ability to include comments in the compile code, for every source code line.
For example, if we have:

if (x) 
{y;}

Will compile to something like:

; start of if statement (line 5 column 10)
	; start of evaluate if expression (line 5 column 11)
		; evaluate variable x (line 5 column 14)
			<>>>--++<<<.......
		; end of evaluate variable x (line 5 column 15)
	; end of if expression evaluation (line 5 column 16)
	; enter if scope (line 6 column 1)
		; start of statement y (line 6 column 2)
			>>><<,++-- .......
		; end of statement y (line 6 column 3)
	; end of if scope (line 6 column 4)
; end of if statement (line 6 column 4)

Same for arithmetic calculations, loops, and so on

This will help users deeply understand BF algorithms, and will help the compiler developers to debug.

[Small Bug] Using an assignment operator which isn't =

int y *= 3;

this compiles, but it shouldn't. Other compilers restrict the token after the type to only =, , (this compiler doesn't have this yet), and ;

If the divisor is known and is not zero then omit the Divide By Zero Check

Write a README for the C++ interpreter

I see it has many great features
Would be nice if we knew how to use them
@makolath

Editor

It would be helpful if the build process was a bit simpler
This PR I just made may be the solution: #69

Add Labels and Goto

This is a thing that would probably not be implemented for a long time, due to it's completely.

Here is an example:

void main() {
    int x = 5;
    int y = 7;

    if(x>y) {
        goto invalid;
    }

    for(int a = 0; a < 10; a++) {
        if(a*y+x > 128) goto invalid;
        if(a*y+x < 3) goto invalid;
    }

invalid:
    print("Invalid Value");
    while(1){};
}

Function return types can only be void or integer

We can add more types if I know how it works because it looks like nonsense to me

[BUG] I don't know how to explain this.

This doesn't work with normal compilers. But it works on this compiler.

int b = 0;
int e = 0;
for({
    int c = 5;printchar(c+48);for({
    int d = 6;printchar(d+65);} e - 9; e++) {
        printchar(e+48);
    }} b - 5; b++) {
        printchar(b+48);
}

Make local variables in functions a bit more optimized.

int add(int a, int b) {
     return a + b; 
}
int add_optimized(int a, int b) {
     return a%% + b%%; 
}

int main() {
    add(4, 5);
    add_optimized(4, 5);
}

Code for add(4, 5)

>[-]>[-]++++>[-]+++++>[-]>[-]<<<[>>+>+<<<-]>>>[<<<+>>>-]>[-]<<<[>>+>+<<<-]>>>[<<<+>>>-]<[<+>-]<<<<[-]>>>[<<<+>>>-]<<<<

Code for add_optimized(4, 5)

>[-]>[-]++++>[-]+++++>[-]>[-]<<<[>>+>+<<<-]>>>[-]>[-]<<<[>>+>+<<<-]>>[<+>-]<<<<[-]>>>[<<<+>>>-]<<<<

The compiler could make the local (not global) variables used in the return expression, be more optimized by not making the variable not be copied back.

It could be made so it only optimizes the return expression if it doesn't assign values or use the same variable twice. Since if the return expression is a + y*f + a/2, if it applied the optimization the second use of the a variable would be zero.
If it should be able to optimize it, it should be able to find the last use of the variable. So the resulting expression would be a + y%% * f%% + a%%/2

It might be a good thing to make a flag named OPTIMIZE_LEVEL in the build flags file. And that flag could be changed with a cli flag such as -O1 and that sets OPTIMIZE_LEVEL to 1. The default value would be 0

Divide Compiler.py into several modules

Currently this file is 2000+ lines, which is not ideal
Split it to different modules

[BUG] Array assignment using +=|-=|*=|/=|%= results in error

The error is TypeError: __init__() missing 1 required positional argument: 'node_expression'

Code to replicate error:

int main() {
    int arr[10];
    arr[5] = 5;
    arr[5] *= 5;
}

Add compile time string concatenation.

int main() {
    print("Hello" /*COMMENT*/ " World");
}

The string would be compiled as "Hello World"

int main() {
    print("1");
    print("2");
    print("3");
    print("4");
    print("5");
    
    // Could then be rewritten as
    
    print(
        "1"
        "2"
        "3"
        "4"
        "5"
    );
}

Support octal and binary literals

Octal literals are defined as 0[0-7]+
Binary literals are defined as 0b[01]+

If the parameter for printnum is a literal, then convert it to compile_print(str(num)) on compile time

printint(55);
// becomes
print("55");

Add Global Variables

Add a global id map which is separate from the scope id map.

[BUG] `print("\\");` results in error.

If array index is known then hardcode the offset

Lets say that you have int matrix[4][4]; and want to get matrix[1][1] which is item 6 then we can hardcode the ptr offset as >>>>>> and copy the value into the desired cell.

I know that this works for 1 dimensional array but that was before multidimensional arrays.

This might also work with setting an item.

Add block comments

Example

int a = /* Comment */ 8;

int b = 10;
/* Comment
Still a comment
Also a comment */ int c = 8;

[BUG?] Assigning before declaring works, but it shouldn't work.

int main() {
    a = 1;
    int a;

    printint(a);
}

16-bit ints

This would probably be a long term thing.

A 16-bit is stored as 2 cells since this is an 8-bit compiler.
The 16-bit max value is 65535.

A 16-bit int is also called a short.

The number 512 would be stored as 2 0.
The number 12941 would be stored as 50 141.

The compiler would have to keep track of the type of the variable.
It would also need to keep track of the type of the variable that is going to be assigned.

Bools and chars should still be treated as ints

This would also mean we would have to implement 2 versions of every algorithm since it would have to support shorts.

Casting an 8-bit int to a 16-bit int could be hard but it could be done by making a copy of the 8-bit number and setting the lowest cell of the short to the copy.

A short is defined like this

short x = 1241;

Converting from an int to a short could be implemented as implicit casting but it would require more work from the compiler.

int x = 65;
short y = x;

But it could also be implemented this way.

int x = 65;
short y = (short) x;

But there is a problem. What if we do:

short x = 6135;
if(x > 31) {
}

The number literal 31 is an int, we would need a short.

The compiler would also need support for short temp variables for copying.

Since printing only supports 8-bit cells in an 8-bit environment, we should print the lowest cell, and reading a char would only fill the lowest cell if it is defined like this:

short ch = readchar();
// or
short ch = (short) readchar();

The library function readint would work with shorts but it would need some changes.

Since this would also need to support arrays it would need to do some extra math to get the index.

1 dimensional array:

short arr[5]; // Makes an array that is 10 (5 * 2) cells large
// 0 0, 0 0, 0 0, 0 0, 0 0
arr[2] = 2834; // Sets the 4th and 5th cells. (0-indexed)
// real index = 2 * 2 + 1 = 5
// The real index points to the lowest cell of the short.

multidimensional array:

short arr[5][5]; // Makes an array that is 50 (5 * 5 * 2) cells large
arr[3][2] = 2834; // Sets the 34th and 35th cells. (0-indexed)
// real index = ((3*5) + 2) * 2 + 1 = 35
// The real index points to the lowest cell of the short.

Implementation:

The pseudo code of shows some ways to implement some algorithms that would use shorts.

These algorithms only support shorts and not a mix of shorts and ints.

But there are some algorithms where I provided with a mix of shorts and ints

Basic Math:

# x is the value before the operator
# x = short

increment: # x++;
    x.low += 1
    if x.low == 0: # (overflow from 255 to 0)
        x.high += 1

decrement: # x--;
    x.low -= 1
    if x.low == -1: # (255)
        x.high -= 1

add: (a) # a = short
    while a.high != 0 or a.low != 0:
        a.low -= 1
        if a.low == -1:
            a.high -= 1

        x.low += 1
        if x.low == 0:
            x.high += 1

sub: (a) # a = short
    while a.high != 0 or a.low != 0:
        a.low -= 1
        if a.low == -1:
            a.high -= 1

        x.low -= 1
        if x.low == -1:
            x.high -= 1

add: (a) # a = int
    while a != 0:
        a -= 1
        x.low += 1
        if x.low == 0:
            x.high += 1

sub: (a) # a = int
    while a != 0:
        a -= 1
        x.low -= 1
        if x.low == -1:
            x.high -= 1

Multiplication could be implemented in 2 ways:
Using the add code, or making a new algorithm.

Relational Operators:

To make a mixed version you could probably just replace b.high with 0

# Tested and it works (1h taken)
>: (a, b) # This could probably be optimized
    if a.high > b.high:
        return True
    else:
        if b.high > a.high:
            return False
        else: # a.high and b.high are equal
            return a.low > b.low

# Tested and it works (1h taken)
<: (a, b) # This could probably be optimized
    if a.high < b.high:
        return True
    else:
        if b.high < a.high:
            return False
        else: # a.high and b.high are equal
            return a.low < b.low

# Tested and it works (2h 12m taken)
>=: (a, b) # This could probably be optimized
    if a.high == b.high:
        return a.low >= b.low
    else:
        if a.high > b.high:
            return True
        else:
            if b.high > a.high:
                return False
            else:
                return a.low > b.low

# Tested and it works (2h 12m taken)
<=: (a, b) # This could probably be optimized
    if a.high < b.high:
        return True
    else:
        if b.high < a.high:
            return False
        else:
            return a.low <= b.low

# Tested and it works (2h 12m taken)
!=: (a, b)
    if a.high != b.high:
        return True
    else:
        return a.low != b.low

# Tested and it works (2h 12m taken)
==: (a, b) # a = short b = short
    if a.high != b.high:
        return False
    else:
        return a.low == b.low

==: (a, b) # a = short b = int
    if a.high != 0:
        return False
    else:
        return a.low == b.low

==: (a, b) # a = int b = short
    if b.high != 0:
        return False
    else:
        return a.low == b.low

False warning in uninitialized variable in for loops

It is possible to declare an array in a for loop statement
E.g.

    i = 0;
    for (int arr[10]; i < 10; i++) {
        ...
    }

An example is available in loops.code
And even declare and initialize: for (int arr[10] = {1,2,3,4} ; i < 10; i++) {

These kind of for loops cause a "false" warning to printed:
[Warning] For loop variable 'ID arr (line 6 column 14)' isn't assigned to anything and may cause side effects

Exclude the return_value from the output if the function return type is void

Every function called adds

return_value [paramN...] [localN...]

to the cells, even for void functions. So the return_value cell can be omitted, to make the code smaller and faster execution of the code since the pointer doesn't have to move around so much.

This would make side effects if you return in a void function unless the compiler prevents it.

Add enums

Could be nice to have user-defined enums

enum RESULT {
   SUCCESS=0,
   FAILURE=1,
   UNKNOWN=2
}

That can be used inside switch-case as literals:

switch (result) {
   case SUCCESS: ...
   case FAILURE: ....
   case UNKNOWN: ...
}

And as parameters to functions

void foo(RESULT r) {
   if (r == SUCCESS) ...
}

Add support for Unary Minus and Unary Plus

Unary Plus is unneeded since it does nothing. Defined as +NUM
Unary Minus is defined as -NUM
They should not break BINOP.

Add better array initialization

Currently if you want to set an array to have specific values you need to do this

int array[5];
array[0] = 40;
array[1] = 63;
array[2] = 15;
array[3] = 33;
array[4] = 76;

it would be better to be able to do this

int array[5] = {40, 63, 15, 33, 76};

int array[5];
array = {40, 63, 15, 33, 76};

(unsure if this is valid c)

The resulting code would probably look like this

[-]>[-]+++++[-<++++++++>]
>[-]+++++++[-<+++++++++>]
+++++++++++++++>[-]
+>[-]++++[-<++++++++>]
+>[-]+++++[-<+++++++++++++++>]

This should also support multi-dimensional arrays (This might be very hard to implement)
Example:

void main() {
    int array[2][5];
    int idx = readchar() - 48;

    if(idx == 0) {
        array[0] = {40, 63, 15, 33, 76};
        array[1] = {27, 122, 75, 23, 64};
    }
    if(idx == 1) {
        array = {
            {87, 45, 23, 12, 11},
            {1, 0, 9, 2, 12}
        };
    }
    if(idx == 2) {
        array = {
            13, 4, 98, 4, 7,
            22, 85, 43, 11, 52
        };
    }
}

Add multiple file support and executing BrainF code from a function

(feature request)
It would be nice to have a way to seperate code into files.
For example, i would make a subfolder in the BF-it folder with my code's name (let's say the name is calculator), and put multiple *.code files inside, with one called main.code
If a *.code file required another file, it would have a line that'll be like this: #import calculation.code (using .code in the line could be optional)
When it will import another file, every variable and function that is inside the file will be copied, for example:
inside calculation.code:

int square(int a){
  return a*a;
}

inside main.code:

#import calculation.code

int sqresult=0;
int num=0;
void main(){
  print("Enter a number...");
  num = readint();
  sqresult = calculation.square(num);
  print("The number ");
  printint(num);
  print(" squared is equal to ");
  printint(sqresult);
  print(".\n");
}

What would be also nice is to be able to insert BrainF code at compile time, for example:

int res = 0;
void main(){
  res = insert("++++[>++++++++<-]>[<+>-]<");
  if(res==32){
    print("BrainF insert success")
  } else {
    print("Error, res = ");
    printint(res);
    print(" and not 32.")
  }
}

(if you ever implement this, is it possible that you don't make it remove comments? thanks!)

Compiler code uses a non-existing module

edited one of the example codes and ran it and the compiler immidiately outputted a "module not found" error. i tried installing it with pip and it didn't help.

D:\BF-it>python BF-it.py examples/games/snake.code
Traceback (most recent call last):
  File "D:\BF-it\BF-it.py", line 6, in <module>
    from Compiler import Compiler
  File "D:\BF-it\Compiler\Compiler.py", line 2, in <module>
    from Exceptions import BFSyntaxError, BFSemanticError
ModuleNotFoundError: No module named 'Exceptions'

D:\BF-it>pip install Exceptions
ERROR: Could not find a version that satisfies the requirement Exceptions (from versions: none)
ERROR: No matching distribution found for Exceptions

D:\BF-it>

Make the compiler smarter with variables

The code below takes 276 bytes.

int main() {
    int a = 54;
    int b = 4;

    while(b != 0) {
        a++;
        b--;
    }
}

bf code for a and b init:

>>>[-]>[-]++++++[-<+++++++++>]<><>[-]<<<[-]>>[>+<<<+>>-]>[<+>-]<><[-]++++><>[-]<<[-]>[>+<<+>-]>[<+>-]

The code below takes 548 bytes.

int main() {
    int a = 54;
    int b = 4;

    while(b != 0) {
        a++;
        b--;
    }

    int c;
    int d;
    int e;
    int f;
    int g;
    int h;
    int i;
    int j;
}

bf code for a and b init:

>>>>>>>>>>>[-]>[-]++++++[-<+++++++++>]<><>[-]<<<<<<<<<<<[-]>>>>>>>>>>[>+<<<<<<<<<<<+>>>>>>>>>>-]>[<+>-]<><[-]++++><>[-]<<<<<<<<<<[-]>>>>>>>>>[>+<<<<<<<<<<+>>>>>>>>>-]>[<+>-]

Currently, the compiler allocates every variable present in the scope.
Which means if you only use a small quantity of variables in the beginning and more later it will take up a lot more code size.

What I was thinking is that what if the compiler allocated enough and then when more variables appears it shifts the temp variables to the right
So then when a is declared only that is allocated. Every other variable technically exists in temp until it's declared.

This might also fix #35 since we could check if it is already declared.

implement else if without scope brackets

Currently the only way to implement an "else if" is to put the if in the else's brackets:

if (x) {
...
} else {
if (y) {
...
}
}

need to implement elseif or something similar

License ?

Hello, under what license is this project released ? I would like to study it to learn from it. Thank you.

Make expression evaluation use short-circuit

Support recursion

With the new design of functions, it might be possible to support recursion
By compiling the function recursively, up to a certain depth, that is given at compilation time
Add --recursion_depth flag to indicate the maximum depth of the recursion
When hitting that depth in run-time, print an error and enter an infinite loop, similar to division by zero

Allow assigning a string to a char array

This depends on #22

Example:

char test[5] = {'H', 'e', 'l', 'l', 'o'};

but this also works in c

char test[5] = "Hello";