degory / ghul

compiler for the ghūl programming language

Home Page: https://ghul.dev

License: GNU Affero General Public License v3.0

Shell 70.14% Dockerfile 13.87% C# 16.00%
compiler dotnet dotnet-core ghul programming-language

ghul's Introduction

repositories for the ghūl programming language, its compiler and supporting tools

NuGet package: ghul

compiler for the ghūl programming language

a Visual Studio Code extension providing ghūl language support

NuGet package: ghul.runtime

low level dependencies required by all ghūl applications

NuGet package: ghul.templates

templates for the .NET new command which create new ghūl console application and class library projects

a template for creating a new ghūl project repository on GitHub

NuGet package: ghul.test

a snapshot based integration testing framework used by the ghūl compiler project

ghūl programming language examples

ghūl ASP.NET web API example

ghūl programming language website

ghul's People

Contributors

degory, dependabot[bot], quanglewangle

ghul's Issues

parse expressions

parse expressions supporting same set of operators with same precedence as old L syntax
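
One way to get operator precedence right in a hand-written parser is precedence climbing. The sketch below is only an illustration in Python: the operator table, the token pattern and the tuple-shaped tree are assumptions, since the actual L operator set and precedences are not listed here.

```python
import re

# Hypothetical precedence table: higher binds tighter. The real L operator
# set and precedences are not given in this issue, so these are assumptions.
PRECEDENCE = {"+": 10, "-": 10, "*": 20, "/": 20}

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")

def tokenize(text):
    """Split text into number, identifier, operator and bracket tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expression(tokens, min_precedence=0):
    """Parse left-associative binary expressions into nested tuples."""
    left = parse_primary(tokens)
    while tokens and PRECEDENCE.get(tokens[0], -1) >= min_precedence:
        operator = tokens.pop(0)
        # The +1 makes operators of equal precedence associate to the left.
        right = parse_expression(tokens, PRECEDENCE[operator] + 1)
        left = (operator, left, right)
    return left

def parse_primary(tokens):
    """Parse a number, an identifier or a parenthesised expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if token in PRECEDENCE or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return token
```

For example, `parse_expression(tokenize("a + b * c"))` yields `("+", "a", ("*", "b", "c"))`. Matching the old syntax would then mostly be a matter of filling in the table with the real operators.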

parse namespaces

support parsing namespaces:

namespace Something is
    ...
si

should also support qualified names, e.g.

namespace Syntax.Parser is
    ...
si

name for new language

new language needs a name:

  • unique in domain of programming languages
  • available DNS

AST rewriters

Rewrite AST in-place and/or rewrite to copy.
Delegate tree walk to nodes where possible.
How to deal with nodes that change type when rewritten? Should we/will we be forced to relax static typing of child nodes within parents?

interprocedural data and type flow analysis

want to be able to apply some higher level optimisations, such as devirtualization, that require the compiler to prove that certain values only ever hold specific types or a subset of possible types, or are not changed across a particular call or code block. In addition, some of this information can be passed to LLVM to aid in its alias analysis.
need a framework to support this analysis.

parse enums

enum SUITS is
  CLUBS,
  DIAMONDS,
  HEARTS,
  SPADES
si

enum BITS is
  ONE = 1 << 0,
  TWO = 1 << 1,
  THREE = 1 << 2,
  FOUR = 1 << 3
si

parse indexers

support parsing readable and assignable indexers

class INDEXABLE is
    elements: int[];

    [index: int]: int
        => elements[index],
        = value is
            elements[index] = value;
        si
si

parse access modifiers and storage classes

trying to avoid the need for this stuff in the first place, and to use conventions instead. however, in the short term, need something we can translate from ghul to L with no symbol table or understanding of semantics.

place access modifiers and/or storage classes after colon and before type

property: int private static;
constant: int const = 50;

pretty printer for legacy syntax

When it comes to bootstrap the new compiler, I do not want to manually translate new compiler source from old L language syntax to new language - there could be tens of thousands of lines.

To avoid doing this, I can use the new compiler to translate new language syntax to old.

New syntax source -> [New compiler] -> Old syntax -> [Old compiler] -> Executable

Provided the new compiler is written in a subset of the new language where the new syntax maps directly onto the old syntax with the same semantics, this can be accomplished with a pretty printer.

Required:

  • Parser able to recognise a sufficiently large subset of the language to implement the new compiler in
  • Pretty printer able to accurately output the parse tree in equivalent old syntax
  • Pretty printer emits #file and #line directives so error reporting is accurate
  • Alter format of error output from old compiler in line with new, so same VS Code problem match regex will work across both sources of errors
  • Some kind of compiler driver to run the old compiler on the generated output, and merge error reports in with errors from new parser - shell scripts are probably sufficient

Nice to have:

  • Some mechanism to avoid changing timestamps on generated files if they're not changed - maybe via hashing of content.
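
The content-hashing idea for leaving timestamps alone could look roughly like this. It is a Python sketch for illustration only (the issue suggests shell scripts may be enough for the driver), and the function name `write_if_changed` is made up here.

```python
import hashlib

def write_if_changed(path, content):
    """Write content to path only if it differs from what is already there.

    Returns True if the file was written, False if it was left untouched.
    """
    new_bytes = content.encode("utf-8")
    new_digest = hashlib.sha256(new_bytes).digest()
    try:
        with open(path, "rb") as existing:
            if hashlib.sha256(existing.read()).digest() == new_digest:
                return False  # identical content: keep the old timestamp
    except FileNotFoundError:
        pass  # no previous output, so fall through and write it
    with open(path, "wb") as output:
        output.write(new_bytes)
    return True
```

Because an unchanged file is never reopened for writing, timestamp-based build tools should not see it as modified.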

Once this mechanism is in place, hand translate the new compiler to the new syntax, and all new development can be done in the new syntax.

deleted

Compiler should be capable of building itself

choose parser generator

Need to determine how to generate the parser.

Options are:

  • Bison
  • ANTLR
  • Hand-rolled recursive descent

parse list literals

parse list literals

[a, b, c, d]
[[x, y], [z, w]]
["a", b, c]: System.Object

need to support an optional explicit element type, at least until bootstrapped, because the legacy compiler's type inference for list literals is not 100% reliable

test framework

some means of automated regression testing

  • to start with, just capture compiler outputs for a set of test inputs and verify that they don't change unexpectedly from build to build.
  • once compiler can produce executables (initially via legacy compiler and ultimately directly), then also capture the output of test executables and compare against previously captured
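
The capture-and-compare step might be sketched as below. This is Python purely for illustration; `check_snapshot` and the `update` flag are hypothetical names, and how the compiler or test executable output gets captured is left to the caller.

```python
from pathlib import Path

def check_snapshot(actual, snapshot_path, update=False):
    """Compare captured output with a stored snapshot.

    Returns True when the output matches. When no snapshot exists yet, or
    update is True, the current output is stored as the new snapshot.
    """
    snapshot = Path(snapshot_path)
    if update or not snapshot.exists():
        snapshot.write_text(actual, encoding="utf-8")
        return True
    return snapshot.read_text(encoding="utf-8") == actual
```

The same function could serve both bullets: first on compiler diagnostics, later on the output of compiled test executables.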

parse assignment statements

parse assignment statements.

Consider destructuring assign, e.g.

a, b, d, e = (a, (b, c), d);
x, y = y, x;

However, may not be straightforward to dumb-translate this into L old syntax (might require temporary variables)
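
To make the temporary-variable concern concrete, here is a rough Python sketch of lowering a flat parallel assignment. The `__tmp` names and the emitted statement text are assumptions (declarations for the temporaries are omitted), and nested patterns are not handled.

```python
def lower_parallel_assignment(targets, values):
    """Lower `t1, t2 = v1, v2` into simple assignments via temporaries.

    All right-hand sides are copied into temporaries first, so a swap
    such as `x, y = y, x` still sees the original values.
    """
    if len(targets) != len(values):
        raise ValueError("target and value counts differ")
    temporaries = [f"__tmp{i}" for i in range(len(values))]
    loads = [f"{temp} = {value};" for temp, value in zip(temporaries, values)]
    stores = [f"{target} = {temp};" for target, temp in zip(targets, temporaries)]
    return loads + stores
```

For `x, y = y, x` this produces `__tmp0 = y;`, `__tmp1 = x;`, `x = __tmp0;`, `y = __tmp1;`, which is more than a purely syntactic translation would normally emit.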

parse case statements

parse case-when-default-esac statements

case a
when x: do_x();
when y: do_y();
default: do_something_else();
esac

static members or alternative

static member access on generic classes is ambiguous with array/indexer access:

    Util.Sort[String].sort(["a", "b", "c"]);
    some.variable[a_string].sort(["a", "b", "c"]);

options include

  1. just don't support static members
    if I don't support static members at all, then what would replace them? something like 'sort' can be a mixin/trait, but that requires a class opt-in to that behaviour, whereas Util.Sort can be applied to anything that implements the right interface. other use cases can probably be covered by functions/properties/variables at namespace scope, although this feels like it might be a bodge.
  2. parse accepting the ambiguity and then disambiguate based on context/type in a later pass
    just accepting the ambiguity and attempting to resolve once type information is available might work. would need to alter unary expression parser to accept postfix type construction operations (e.g. [] ref ptr ?), and the primary parser would need to accept built in types as expression primaries.
  3. rely on type inference for generic arguments so [ always unambiguously introduces an indexer access
    legacy compiler does not have type inference for access to static members of generic classes, and adding it is probably impractical. going with this option would mean forgoing this functionality in the ghūl compiler until after it's bootstrapped, or leaving code that needs it in L.
  4. alternative/additional syntax for namespace + static member access, e.g. some_class::static_member Util::Sort[String]::sort(["a", "b", "c"]).
    this would work, however it imposes an unreasonable overhead on all namespace accesses, to support what is not actually a particularly common scenario. note that to avoid needing unbounded lookahead, namespace members would need the special syntax as well as classes. otherwise the parser would still struggle with this Util.Sort[String]::sort(["a", "b", "c"])

parse indexed properties

support readable and assignable indexed properties

class DOUBLE_THINGS is
    elements: int[];

    single_things[index: int]: int
        => elements[index],
        = value is
            elements[index] = value;
        si

    double_things[index: int]: int
        => elements[index] * 2,
        = value is
            elements[index] = value / 2;
        si
si

parse nullable types

  x: int?;
  y: String?;

Ultimately want nullable-ness to be opt-in across all value and reference types and all accesses through nullables to require a preceding successful has-value operation (?), provable via data-flow analysis.
Can't implement this until we have a real middle-end - even foregoing the data-flow analysis, we cannot determine what legacy L to emit without symbol type information, as required behaviour would differ for value types vs reference types (value types need wrapping in a struct, with a has-value flag, reference types can stay as is, with null representing has-no-value)
Not sure in the interim that there's much point in bothering to parse this?

parse class definitions

parse class definitions

class name[A,B]: parent[X], parent[Y] is
    ...
si

When translating to old L syntax, just assume the first type in the ancestors list is the base class and any subsequent are interfaces.

Depends on #27 + #26

parse property definitions

support parsing properties

  • auto
  • explicit
  • either public read, private assign auto properties or private fields, if names prefixed with _
  • if translating to old L syntax then pretty printer will need to generate backing field and accessors
prop: int; // auto property, public read, private assign
_prop: int; // private field
prop: int
    => _some_field,
    = value is _some_field = value; si

parse function definitions

parse function definitions

  • support expression bodied functions as well as block statement bodies.
  • support type inference (but not if translating to old L syntax)
some_function(a: int, b: int) -> int => a + b;
other_function(a: int, b: int) -> int is return a - b; si

parse use

use some.namespace;
use some.namespace.symbol;

parse for statements

statement will be a foreach, i.e. new syntax

for i in 0..9 do
    IO.Std.err.println("blah");
od

is equivalent to old L syntax

foreach var i; 0..9 do
    IO.Std.err.println("blah");
od

Note

  • No explicit var - loop variable is auto-declared let, scoped to body of loop
  • If concerned about efficiency, could recognise range constructor as a special case.
  • Either no support for C style for, or differentiate it by presence of semicolons
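
Since the new and old loop forms differ only in the header, the pretty printer's job here is small. A Python sketch of that mapping follows; the function name and the idea of passing the body as already-printed lines are assumptions.

```python
def for_to_legacy(variable, iterable, body_lines, indent="    "):
    """Render a new-syntax for loop as an old-syntax foreach loop.

    The loop variable gets an explicit `var`, because the old syntax
    does not auto-declare it.
    """
    header = f"foreach var {variable}; {iterable} do"
    body = [indent + line for line in body_lines]
    return "\n".join([header, *body, "od"])
```

Calling `for_to_legacy("i", "0..9", ['IO.Std.err.println("blah");'])` gives the old-syntax loop shown above.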

tame visitor implementation

Separate sub-visitors for sub-types of AST with useful pluggable default behaviours - e.g. skip unhandled subtrees entirely, or walk unknown subtrees picking up again where recognized. Delegate walking subtrees to node classes to avoid duplicating in every visitor (not suitable for pretty printer as order depends on nodes, but likely applicable to future visitors)
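
A minimal Python sketch of the pluggable-default idea, with the tree walk delegated to the nodes. All class and method names here are hypothetical and the node types are stand-ins, not the compiler's real AST.

```python
class Node:
    """Base AST node: owns the knowledge of how to walk its children."""
    def __init__(self, *children):
        self.children = list(children)

    def accept(self, visitor):
        # Dispatch on the node's class name, e.g. visit_Identifier.
        method = getattr(visitor, "visit_" + type(self).__name__, None)
        if method is None:
            return visitor.visit_default(self)
        return method(self)

    def walk(self, visitor):
        for child in self.children:
            child.accept(visitor)

class Identifier(Node):
    def __init__(self, name):
        super().__init__()
        self.name = name

class Call(Node):
    pass

class Block(Node):
    pass

class WalkingVisitor:
    """Default: descend through unhandled nodes, resuming where recognised."""
    def visit_default(self, node):
        node.walk(self)

class SkippingVisitor:
    """Default: ignore unhandled subtrees entirely."""
    def visit_default(self, node):
        pass

class IdentifierCollector(WalkingVisitor):
    def __init__(self):
        self.names = []

    def visit_Identifier(self, node):
        self.names.append(node.name)

class TopLevelIdentifiers(SkippingVisitor):
    def __init__(self):
        self.names = []

    def visit_Block(self, node):
        node.walk(self)  # only blocks are entered; calls are skipped

    def visit_Identifier(self, node):
        self.names.append(node.name)
```

Each concrete visitor only names the node types it cares about and inherits the policy for everything else, so the walking code lives in `Node.walk` rather than being repeated per visitor.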

grammar

Need complete grammar for language that can be compiled in to a parser

  • Generated parser must be able to deterministically transform valid input to a parse tree

tokenizer

Need to be able to transform an input stream from bytes/characters into tokens

  • Must integrate with Bison generated parser (expose yylex compatible interface)
  • Must work with Bison grammar to disambiguate potentially ambiguous input (generic brackets)

parse loop labels

parse labels for loops, for targeted break and continue statements.

some_label: for i in list do
    break some_label;
od

pretty printer is not pretty

Fix indentation and spacing
Consider inserting line breaks and indentation in complex expressions.

  1. Add a new visitor that holds a structure for each visited node:
    • Estimated text width of this node, based on widths of children
    • Estimated text width of this node, if split across multiple lines
    • Split preference/weight/priority
  2. Walk the tree calculating text widths and split priorities
  3. Pretty printer then applies an appropriate line splitting strategy to each node to achieve both a desired target width and a never-exceed width
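
The width-based splitting steps above can be sketched as follows. This is a simplified Python illustration: the `Layout` record and its fields are hypothetical, split priorities and the never-exceed width are left out, and only a single target width is handled.

```python
from dataclasses import dataclass, field

@dataclass
class Layout:
    """Per-node layout record; field names are hypothetical."""
    text: str = ""                                 # leaf text, empty for inner nodes
    children: list = field(default_factory=list)
    separator: str = ", "

    def flat_text(self):
        """Text of this node when printed on a single line."""
        if not self.children:
            return self.text
        return self.separator.join(child.flat_text() for child in self.children)

    def flat_width(self):
        """Estimated width on one line, derived from the children."""
        return len(self.flat_text())

    def split_width(self, step=4):
        """Estimated width if each child goes on its own indented line."""
        if not self.children:
            return len(self.text)
        widest = max(child.flat_width() for child in self.children)
        return step + widest + len(self.separator.rstrip())

def render(node, target_width, indent=0, step=4):
    """Return output lines, splitting a node only when it would overflow."""
    pad = " " * indent
    if not node.children or indent + node.flat_width() <= target_width:
        return [pad + node.flat_text()]
    lines = []
    for index, child in enumerate(node.children):
        child_lines = render(child, target_width, indent + step, step)
        if index < len(node.children) - 1:
            child_lines[-1] += node.separator.rstrip()
        lines.extend(child_lines)
    return lines
```

A node that fits stays on one line; one that does not is broken at its separators, and each child is then given the same choice at a deeper indent.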

Add support for producing HTML markup:

  • Escape HTML entities
  • Wrap root node in HTML pre
  • Wrap keywords etc. in span with appropriate CSS class
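
The three HTML steps can be sketched in Python as below. The keyword set is a guess based on the snippets in these issues, and the CSS class name `keyword` is made up. A real implementation would work from the parse tree; this version scans raw text, so it would also highlight keywords inside strings or comments.

```python
import html
import re

# Hypothetical keyword subset taken from the snippets in these issues.
KEYWORDS = {"class", "namespace", "enum", "is", "si", "for", "in", "do", "od",
            "case", "when", "default", "esac", "use", "return", "break"}

WORD = re.compile(r"[A-Za-z_]\w*")

def to_html(source):
    """Escape source, wrap keywords in spans, and wrap the result in <pre>."""
    parts, pos = [], 0
    for match in WORD.finditer(source):
        # Escape whatever sits between the previous word and this one.
        parts.append(html.escape(source[pos:match.start()]))
        word = match.group(0)
        if word in KEYWORDS:
            parts.append(f'<span class="keyword">{word}</span>')
        else:
            parts.append(html.escape(word))
        pos = match.end()
    parts.append(html.escape(source[pos:]))
    return "<pre>" + "".join(parts) + "</pre>"
```

Escaping happens before any markup is added, so characters such as `<` in the source cannot be mistaken for tags.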

For website, call compiler on fly to render code examples for detected with - expose an AJAX service that can format any code snippet from its server-side file name (obviously needs to be done securely so can't be used to read arbitrary files from the server, probably by doing a lookup through a white-list table to the real file name)
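
The white-list lookup might be as simple as the Python sketch below. The table contents, snippet names, paths and the `.ghul` extension are all hypothetical; the point is only that the requested name is never turned into a path.

```python
from pathlib import Path

# Hypothetical white-list table: public snippet name -> real file path.
SNIPPETS = {
    "hello-world": Path("examples/hello-world.ghul"),
    "enums": Path("examples/enums.ghul"),
}

def resolve_snippet(name):
    """Return the real path for a white-listed snippet name, else None.

    The requested name is only ever used as a dictionary key, never joined
    into a path, so input like "../../etc/passwd" cannot reach the disk.
    """
    return SNIPPETS.get(name)
```

The service would format the file only when `resolve_snippet` returns a path, and reject the request otherwise.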
