Giter Site home page Giter Site logo

minijava's Introduction

MiniJava

MiniJava is a subset of Java. It's a simple language for educational purposes.
Here is the grammar that I used to build the compiler. The Compiler in this repo is MiniJava to X64_86.
THIS COMPILER IS FULLY HAND WRITTEN. NO TOOLS WERE USED.

Development Status

The compiler is still under development here is the stages:

Lexer: finished.
Parser: finished.
AST Building: finished.
Type Checker has to passes:
  Symbol Table building: finished.
  Checking phase by consulting symbol table: finished.
Code Generator: finished.

Build and Run

$ make to build MiniJava from the source you will get MiniJava executable
run ./MiniJava with -h flag to see the help list.
You can Run with the following flags:

  • -h to show the help list.
  • -S to see the program from the Lexer prospective (Tokens).
  • -P to see the program from the Parser prospective for example:
    if we have x + y * z it becomes (x + (y * z)).
  • -T to run the type checker.
  • -C to run the Code Generator, you will get a file called demo.s
    under runtime/ this has the assembly code for your program.
  • After Generatin the assembly code you can get the executable of your program
    by executing make run, the executable name is out
    run the executable by typing ./out

Motivation

This is a side-project I totally built myself. The project
is for educational purposes. I studied compiler theories, and
I wanted to implement the theories I learned.

Acknowledgement

I realy thank University of Washington for making CSE P 501 Open source. This course is a great help and helped me a lot.

What you should know

To understand the code, you don't have to know Compilers' theories.
But here are some highlights, that might help:

  • Regular expressions
  • Compiler components: Lexer, Parser, Type checking, IR, and Code generation.
  • Know the order of compiler phases, and why we use this order?
  • What the AST? Why We use it?
  • What is the recursive-descent parser?
  • Hash tables
  • Graph theory

Lexical Analysis

The lexer does the Lexical Analysis. Lexer implementaion is in Lexer.h and Lexer.cpp
Here are most important functions in the Lexer:

  • getToken(): Reads the Token stream, and put each group of
    charachters in the suitable group e.g, Key word, operation, ...etc.
  • getNextToken(): Gets the next token in the input stream.
  • lookahead(): Retuerns the next token in the input stream
    used in Parser as it's LL(1) type
  • eat(TOK, string): Takes two arguments the token, and the string form of it.
    Consumes the token if it's the expexted token, or emits an error.

Parsing

The Parser is hand-crafted LL(1). I didn't use any tools to build it,
as I wanted to learn how Parsers are built internally.
We needed to use lookahead() to check parse variable declarations as
they have the same start tokens as assignment statements see grammar
NOTE: The grammar is not LL but I converted it to be LL.
Files Parser.cpp, and Parser.h

Error Recovery:

The compiler Prints Colored error messages. it continues parsing always, but
It exits if the source has 10 errors, couldn't parse the main class,
If there's an error in a class, or method declaration, the compiler skips
the class, or the method, but it emits a message of the error and a note
that it's skipping the operation of the current class or method.
If there is an error in a variable declaration, it skips till the next
semi-colon.
You can see the source program from the parser view by passing the -P flag.

Error Locations

The Error message has a correct location for the parse error, but
if the error is the last charchter in the line it expects the correct token to
be in the next line for example:

1 int x
2 int z;
output:
expected `;` at line 2

it expectes ; instead of int.
The implementation is in Error.h and Error.cpp

AST

The Parser has two important jobs:

  • Check the syntax and make sure it's error free.
  • Build the AST. The AST is very important we use it in type-checking, and code generation.
    AST implementaion is in AST.h, and AST.cpp.
    NOTE: For Any AST traversing in the compiler I used visitor pattern:
  • Print the source from the parser view.
  • Building The symbol table.
  • Type-checking.
  • Code generation.

Type-Checking / Semantic Analysis

After done with parsing we should check the semantics of the source code.
We do this by two passes:

  • Build a symbol table of every symbol in the program, and store its scope.
  • Perform the type-checking consulting the symbol table.

Building Symbol Table

We use the builtin hashmap std::map for the symbol table enteries. This phase is seriously challenging assume the following problems:

  • Variables and methods' Scopes, and overriding. Assume the following case:
    class A {
      int a;
      int fii() {int a; return a;}
      int tii() {retuen 0;}
    }
    class B extends A {
      int a;
      int foo(int a) {return a;}
      int fee() {return a;}
      int tii() {return 6;}
      int a() {return a;}
    }
    
    
    The compiler must handle the above case perfectly. in SymbolTable/SymbolTable.h and SymbolTable/SymbolTable.cpp, you see
    the interface, and implementation of symbol table.
    but the functions that interests you are enterScope(string), and
    exitScope(), these functions store the information of every scope it
    hits. before entering a new scope, enterScope creates a Symbol table
    entry, store the parent scope for the current scope and start storing it's
    information.
    After done with the current scope exitScope return the control to the
    parent scope.
  • Cyclic inheritance. Assume the following case:
    class A extends B {}
    class B extends C {}
    class C extends A {}
    
    How can we know this? the solution is simple, just build a graph using the class relations. every node is a class, and every directed edge represents extends, and check for the cycles using simple dfs.
    • Class appears before it's used. This problem doesn't exist in C / C++, for example:
    class A extends B {}
    class B {}
    
    or
    Class A {B b;}
    Class B {A a;}
    
    How can we know this? answer:
    • For the first case, class B must be in the symbol table before class
      A, so we use topological sort to know the order.
    • For the second case, If class B doesn't use class A object,
      we could use topological sort, but there's a cycle in declaration.
      how could we solve this? answer, just use a map e.g, map<string, bool>
      and before building the symbol table just store all classes.
      If we find the class in map then we don't emit an error. For example:
      the map has two values: {{A, true}, {B, true}}, the we start building the
      symbol table.
      Merging the above two solutions, sort, and store, we can build the symbol table
      successfully.

Type-Checking

After building the Symbol table this phase becomes easy. Just use
getScope, and exiteScope for every scope you hit. The type-checker
consults the symbol table about variables and classes, e.g:

  • Return type mismatch.
  • Use of undefined variable.
  • Check if the object has the method e.g, a.foo().
  • Declare variable of undefined types e.g:
    class A {B b;}
    
  • non-integer array index.
  • non-boolean if expression.
  • and many more...

You can find The Symentic Phase code in: SymbolTable/, TypeCheckerVisitor.h, TypeCheckerVisitor.cpp,
and TypeVisitor.h

Code Generation

This phase was the most difficult although it seems easy, but the generated
code had many Segmentaion fault errors most of them was because of stack alignment.
The technique used here is so simple traverse the AST in execution order
and emit the code in visitor methods.
One strategy that made the implementation easy is to treat X86 as a 1-register
language with a stack for intermediate values
all operations writes %rax register
The code however uses other register for some operations but not much.
function calls was a little bit trickt to implement but the code uses O(1) dynamic dispatch.
you can find the code in CodeGeneratorVistor.h and CodeGeneratorVisitor.cpp

Language Runtime

The MiniJava program should have runtime environment e.g space for heap, and stack,
%rsp and the other registers should have correct initial values, and a way
to allocate memory for objects, and arrays.

Bootstrapping

We use the existing C runtime, under directory runtime/ you will find
a C program boot.c. This program calls the MiniJava main function
as if it were a C function. By using this technique, it's possible
to call standard C functions, and the program will now use the C ennvironment.

Copyrights

I wrote All files in this project except runtime, and SamplePrograms
they are from CSE P 501 starter code, however you can use this code freely.

minijava's People

Contributors

elhewaty avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.