kodek16 / colang

CO language compiler, designed for programming contests and olympiads.

License: MIT License

Rust 98.16% C 1.84%

colang's Issues

Add more checks to the last-resort validity checker

The validity checker that the program IR has to pass before it is returned from the frontend currently allows some kinds of invalid programs. More checks need to be added to it.

Add `Iterator` trait and `for` loop

Add a trait named Iterator that enables iteration over arbitrary sequences. Also consider adding an Iterable trait, to be implemented by collections, with a single iter method that returns a new Iterator.

As the main consumer of these traits, add a for loop syntax. A for loop requires an Iterable value. It creates a new variable, visible inside the loop body, that takes on the successive values returned by the created Iterator.

The syntax for for loops could look like this:

var v = [1, 2, 3];
for (x in v) {
  writeln x;
}

v must implement Iterable, and the type of x is inferred from this implementation.
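
To make the intended expansion concrete, here is a minimal sketch in Rust (the compiler's implementation language) of how the two traits and the loop desugaring could fit together. The names CoIterator, CoIterable, IntArray and for_each are assumptions made up for this illustration; the actual CO design is still open.

```rust
// Hypothetical Rust analogue of the proposed CO traits (names are assumptions).
trait CoIterator {
    type Item;
    // Returns the next value, or None once the sequence is exhausted.
    fn next(&mut self) -> Option<Self::Item>;
}

trait CoIterable {
    type Iter: CoIterator;
    // Creates a fresh iterator positioned at the start of the collection.
    fn iter(&self) -> Self::Iter;
}

struct IntArray(Vec<i64>);

struct IntArrayIter {
    data: Vec<i64>,
    pos: usize,
}

impl CoIterator for IntArrayIter {
    type Item = i64;
    fn next(&mut self) -> Option<i64> {
        let value = self.data.get(self.pos).copied();
        if value.is_some() {
            self.pos += 1;
        }
        value
    }
}

impl CoIterable for IntArray {
    type Iter = IntArrayIter;
    fn iter(&self) -> IntArrayIter {
        IntArrayIter { data: self.0.clone(), pos: 0 }
    }
}

// What `for (x in v) { body }` could expand to: create the iterator once,
// then bind `x` to each returned value until the iterator is exhausted.
fn for_each<C, F>(v: &C, mut body: F)
where
    C: CoIterable,
    F: FnMut(<C::Iter as CoIterator>::Item),
{
    let mut it = v.iter();
    while let Some(x) = it.next() {
        body(x);
    }
}

// Mirrors the example above, collecting values instead of printing them.
fn collect_demo() -> Vec<i64> {
    let v = IntArray(vec![1, 2, 3]);
    let mut seen = Vec::new();
    for_each(&v, |x| seen.push(x));
    seen
}
```

The type of x falls out of the Iterable implementation through the associated Item type, which is the inference described above.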

Add basic trait support

Traits are the second piece of the puzzle that makes templates actually powerful. There are a lot of things traits could support, but for starters the following is required:

- Traits can be defined with trait Foo { ... } syntax.
- Traits can have method specifications that must be implemented in every implementing type.
- Types can implement any number of traits by specifying them in their signature (alternatively a Rust-like approach could be explored) and providing an implementation for all trait members.
- Type parameters for all template entities can be bounded by traits.

Reimplement operator support through operator traits

Add a set of traits corresponding to various operators in the language (e.g. Add, Sub, Index), and rewrite operator support in the compiler to delegate to methods of these traits. This would also allow users to implement operator support for custom types by implementing the operator traits.

The exact design of these traits depends on how powerful the trait system is by that point, but here is an idea of what they could look like:

trait Add<T> {
  fun add(self, rhs: T): T;
}

struct MyNumber : Add<MyNumber> {
  // ...

  fun add(self, rhs: MyNumber): MyNumber {
    // ...
  }
}

fun main() {
  var x: MyNumber, y: MyNumber;

  // Expands to x.add(y).
  var z = x + y;

  // Causes an error that clearly states that `bool` does not implement `Add<bool>`.
  var b = true + false;
}
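
On the compiler side, the delegation could be sketched in Rust roughly as follows. This is only an illustration under assumptions: BinaryOp, TraitImpls and lower_binary are hypothetical names, types are modelled as plain strings, and the real analyzer would work on IR nodes rather than text.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add,
    Sub,
    Index,
}

// Maps an operator to the (trait, method) pair it delegates to.
fn operator_trait(op: BinaryOp) -> (&'static str, &'static str) {
    match op {
        BinaryOp::Add => ("Add", "add"),
        BinaryOp::Sub => ("Sub", "sub"),
        BinaryOp::Index => ("Index", "index"),
    }
}

// Hypothetical registry: a set of (type name, trait instantiation) pairs.
struct TraitImpls(HashSet<(String, String)>);

// Rewrites `lhs op rhs` into a method call, or reports which trait is missing.
fn lower_binary(
    impls: &TraitImpls,
    op: BinaryOp,
    lhs: &str,
    lhs_ty: &str,
    rhs: &str,
    rhs_ty: &str,
) -> Result<String, String> {
    let (trait_name, method) = operator_trait(op);
    let instantiation = format!("{}<{}>", trait_name, rhs_ty);
    if impls.0.contains(&(lhs_ty.to_string(), instantiation.clone())) {
        Ok(format!("{}.{}({})", lhs, method, rhs))
    } else {
        Err(format!("`{}` does not implement `{}`", lhs_ty, instantiation))
    }
}
```

Looking up the trait instantiation first is what makes the error for true + false name the missing Add<bool> implementation directly.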

Add struct constructor expressions

Currently, to create a struct with specific field values, you first need to default-initialize a new struct and then assign the field values manually:

struct Foo {
  x: int;
  y: string;
}

fun main() {
  var f: Foo;
  f.x = 42;
  f.y = "hello";
}

This is quite cumbersome (and even more so in expression contexts), and would be much better handled by a special syntax that fills the fields directly.

The added expression syntax could look like this:

var f = Foo { x: 42, y: "hello" };

Trait templates

Just like types, it should also be possible to parameterize traits with type parameters. A trait template should be defined using a syntax similar to this:

trait Into<T> {
  fun into(self): T;
}

Types may implement different instantiations of the same trait.

Switch to a recursive descent parser

Some of the other issues would be very tricky to handle with the currently used parser generator (LALRPOP). The LR grammar is already quite inflated because of technical reasons (dangling else prevention), and adding some features like f-strings (#14) or optional semicolons (#15) would be either very difficult or downright impossible to fit into a pure LR grammar.

A long-term solution would be to move to a custom recursive descent parser that handles syntactic ambiguities manually one-by-one. A parser combinator might be useful to that end.
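
As a rough illustration of handling one ambiguity by hand, the Rust sketch below resolves the dangling else with a single explicit check instead of grammar duplication. The toy grammar, the whitespace tokenizer and the Stmt and Parser types are assumptions for this example and do not reflect the real CO grammar.

```rust
// Toy grammar (an assumption for illustration only):
//   stmt := "if" IDENT stmt ("else" stmt)? | IDENT
#[derive(Debug, PartialEq)]
enum Stmt {
    Simple(String),
    If {
        cond: String,
        then: Box<Stmt>,
        otherwise: Option<Box<Stmt>>,
    },
}

struct Parser<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.peek();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    fn parse_stmt(&mut self) -> Result<Stmt, String> {
        match self.bump() {
            Some("if") => {
                let cond = self.bump().ok_or("expected condition")?.to_string();
                let then = Box::new(self.parse_stmt()?);
                // The dangling else is settled by one explicit decision:
                // an `else` always attaches to the innermost open `if`.
                let otherwise = if self.peek() == Some("else") {
                    self.bump();
                    Some(Box::new(self.parse_stmt()?))
                } else {
                    None
                };
                Ok(Stmt::If { cond, then, otherwise })
            }
            Some("else") => Err("unexpected `else`".to_string()),
            Some(name) => Ok(Stmt::Simple(name.to_string())),
            None => Err("unexpected end of input".to_string()),
        }
    }
}

fn parse(src: &str) -> Result<Stmt, String> {
    let mut parser = Parser::new(src);
    let stmt = parser.parse_stmt()?;
    if parser.peek().is_some() {
        return Err("trailing tokens".to_string());
    }
    Ok(stmt)
}
```

For the input if a if b x else y, the else ends up on the inner if, which is the conventional resolution.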

Fix and specify I/O flushing behavior

Flushing behavior needs to be designed and documented. One idea to explore is flushing output before every read instruction. This could work quite well for interactive contest tasks while still being very efficient.
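
A minimal Rust sketch of the flush-before-read idea, assuming a hypothetical ContestIo wrapper around the runtime's input and output streams (the type and its methods are invented for this example):

```rust
use std::io::{self, BufRead, Write};

// Hypothetical runtime I/O wrapper; names are assumptions for illustration.
struct ContestIo<R: BufRead, W: Write> {
    input: R,
    output: io::BufWriter<W>,
}

impl<R: BufRead, W: Write> ContestIo<R, W> {
    fn new(input: R, output: W) -> Self {
        ContestIo { input, output: io::BufWriter::new(output) }
    }

    // Writes go into the buffer and do not reach the sink immediately.
    fn write(&mut self, text: &str) -> io::Result<()> {
        self.output.write_all(text.as_bytes())
    }

    // Every read first flushes pending output, so an interactive judge
    // sees the program's last message before it is expected to answer.
    fn read_line(&mut self) -> io::Result<String> {
        self.output.flush()?;
        let mut line = String::new();
        self.input.read_line(&mut line)?;
        Ok(line.trim_end().to_string())
    }

    // Exposes the underlying sink so its contents can be inspected.
    fn sink(&self) -> &W {
        self.output.get_ref()
    }
}
```

Programs that only write pay for a flush when the buffer fills or the program exits, while interactive programs get a flush exactly at the points where the other side needs to see the output.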

Add "never" type for return statement type-checking

Consider how the following function would currently behave with the type checker:

fun is_positive(x: int): bool {
  if (x > 0) {
    return true;
  } else {
    return false;
  }
}

While overly verbose, this looks like a perfectly legitimate function that should not cause any compiler errors. Alas, the way the type checker currently works, it sees the whole if as the final expression in the function body. This expression has type void, while the function return type is bool. A type mismatch is reported and the code does not compile.

There are various ways of expressing to the compiler why exactly this reasoning is flawed, but one of the cleanest ones is the "never" type. Let us call it ! for now.

With just a few rules concerning the ! type, we can make the previous example work:

- A block that contains a return statement has type !.
- A block that contains an eval instruction with an expression of type ! has type !.
- A function body is allowed to have type ! whatever the function's return type is.

The last rule is actually just one specific case of a more general rule that says that ! can be substituted for any other type T, but this rule is a bit more tricky to implement in the current compiler. For now just this particular case should be enough to solve most of the syntactic troubles with return statements.
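
The three rules can be sketched in Rust as below. The IR here is heavily simplified and its names (Instr, Expr, Block) are assumptions; the rule that an if whose branches have the same type takes that type is also an assumption of this sketch, not something the rules above state.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Void,
    Bool,
    Never,
}

// Simplified instruction set; the real IR is richer.
enum Instr {
    Return,
    Eval(Expr),
}

enum Expr {
    Literal(Type),
    If { then: Block, otherwise: Block },
}

struct Block {
    instrs: Vec<Instr>,
    value: Option<Box<Expr>>,
}

fn expr_type(e: &Expr) -> Type {
    match e {
        Expr::Literal(t) => *t,
        Expr::If { then, otherwise } => {
            let (a, b) = (block_type(then), block_type(otherwise));
            // Assumption: equal branch types give the `if` that type,
            // anything else is treated as void in this sketch.
            if a == b { a } else { Type::Void }
        }
    }
}

fn block_type(b: &Block) -> Type {
    for instr in &b.instrs {
        match instr {
            // Rule 1: a block containing a return has type `!`.
            Instr::Return => return Type::Never,
            // Rule 2: so does a block evaluating an expression of type `!`.
            Instr::Eval(e) if expr_type(e) == Type::Never => return Type::Never,
            _ => {}
        }
    }
    match &b.value {
        Some(e) => expr_type(e),
        None => Type::Void,
    }
}

// Rule 3: a function body of type `!` is accepted for any return type.
fn body_matches(body: &Block, return_type: Type) -> bool {
    let t = block_type(body);
    t == Type::Never || t == return_type
}
```

With these definitions, a body shaped like is_positive (an if whose two branches each return) gets type ! and passes the check against bool.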

Make methods of type templates function templates

Once function templates are implemented, type template methods should be treated as function templates even if they do not introduce additional type parameters. This will make the whole template system more consistent.

Make diagnostic locations more specific

There are quite a few places in the compiler where the SourceOrigin that ultimately goes into a diagnostic is not as specific as it could be. Fixing this would improve the legibility of error messages.

Enforce the signature of `main`

Currently the analyzer seems to allow any signature for main, which causes problems down the line. It needs to assert that only fun main() is allowed.

Make semicolons optional

Semicolons should be optional delimiters between statements in ambiguous syntactic contexts, not an item that has a central role in the syntax (as it currently is, since semicolons promote expressions to statements).

Automatically dereference pointers in operator expressions

Similarly to how pointers are currently automatically dereferenced in field and method expressions, it should also be possible to perform auto-deref in operator calls.

The way this would work is: if an expression of pointer type is used as an operand in some set of binary operator expressions (e.g., a + b, where a: int and b: &int), then it is automatically wrapped in a deref expression (so the example would be the same as a + *b).

For this to work consistently, pointers must themselves not support any of the operators that support auto-deref.
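
A small Rust sketch of the wrapping step, under the assumption of a toy expression tree (Type, Expr, auto_deref and make_add are hypothetical names, and only + is modelled):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Pointer(Box<Type>),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var { name: String, ty: Type },
    Deref(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

// Returns None for ill-typed expressions such as dereferencing a non-pointer.
fn type_of(e: &Expr) -> Option<Type> {
    match e {
        Expr::Var { ty, .. } => Some(ty.clone()),
        Expr::Deref(inner) => match type_of(inner)? {
            Type::Pointer(target) => Some(*target),
            _ => None,
        },
        // Simplification: the sum takes the type of its left operand.
        Expr::Add(lhs, _) => type_of(lhs),
    }
}

// Wraps a pointer-typed operand in one deref; other operands pass through.
fn auto_deref(operand: Expr) -> Expr {
    match type_of(&operand) {
        Some(Type::Pointer(_)) => Expr::Deref(Box::new(operand)),
        _ => operand,
    }
}

// Builds `lhs + rhs`, applying auto-deref to both operands.
fn make_add(lhs: Expr, rhs: Expr) -> Expr {
    Expr::Add(Box::new(auto_deref(lhs)), Box::new(auto_deref(rhs)))
}
```

Because the wrapping is decided purely by the operand's type, it only stays unambiguous if pointers have no + of their own, which is the consistency requirement stated above.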

Add `!is` operator

Add a new operator that serves as syntactic sugar for !(a is b).

The two options that come to mind are either a is not b or a !is b. The former would introduce a new not keyword (which would confusingly not work as a negation operator) and is longer to type, so the latter seems like the better option.

Improve runtime error messages in the interpreter backend

Add more context to the runtime error messages so that they are easier to read and understand.

Improve consistency and usability of `void` type and stmt/expr distinction

(a.k.a. the void-sanity tracking issue)

The void type and the statement vs. expression syntax are currently quite easy to trip over. Missing semicolon errors point to the next line, which is most often unrelated to the issue. Both in the compiler internals and in various documentation, void and void-expressions stick out like a sore thumb.

There are a number of things that need to be done to address it:

- Make semicolons optional.
- Automatically promote expressions to statements when and only when implied by context.
- Consider removing void type from program IR. This would prevent a lot of confusion and potential void-related errors, but might require considerable code duplication (IfExpr/IfStmt, BlockExpr/BlockStmt, function calls, maybe something else?).

Bidirectional type inference in `is` expressions

Currently, in the expression a is b, only b may have a context-dependent type (which is inferred from the type of a). This covers the more common form, p is null for example, but null is p causes an ambiguous type error.

While having null on the left side looks a bit weird, it is still a valid use case (and there can be other ambiguous-type expressions as well). Need to find a way to enable type inference in both directions.
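
One possible shape for a symmetric rule, sketched in Rust. It assumes both operands of is must end up with the same type and models a context-dependent operand such as null as None; infer_is_operands is a hypothetical helper, not an existing compiler function.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Pointer(Box<Type>),
}

// `None` models an operand (such as `null`) whose type depends on context.
// Returns the common operand type, or an error if it cannot be determined.
fn infer_is_operands(lhs: Option<Type>, rhs: Option<Type>) -> Result<Type, String> {
    match (lhs, rhs) {
        (Some(a), Some(b)) if a == b => Ok(a),
        (Some(a), Some(b)) => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
        // Whichever side has a known type fixes the type of the other side.
        (Some(t), None) | (None, Some(t)) => Ok(t),
        (None, None) => Err("ambiguous type: neither operand determines it".to_string()),
    }
}
```

Under this rule p is null and null is p infer the same type, and only null is null remains ambiguous.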

Add interpolation syntax (f-syntax) for strings

Add a new string literal form that looks like f"...". Inside this literal, the $ character has a special meaning: it must be followed by an expression that is evaluated, converted to a string, and inserted into the string constructed from the literal at evaluation time.

Two interpolation patterns should be supported: $name evaluates the variable named name, while ${x} evaluates an arbitrary expression x.
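
The two patterns could be recognized by a splitter along these lines. This Rust sketch is an illustration only: Segment and split_fstring are hypothetical names, expressions are kept as raw source text, and braces nested inside ${...} are deliberately not supported.

```rust
#[derive(Debug, PartialEq)]
enum Segment {
    Text(String),
    // Source text of the expression to evaluate and convert to a string.
    Expr(String),
}

// Splits the body of an f"..." literal into text and expression segments.
fn split_fstring(body: &str) -> Result<Vec<Segment>, String> {
    let chars: Vec<char> = body.chars().collect();
    let mut segments = Vec::new();
    let mut text = String::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '$' {
            text.push(chars[i]);
            i += 1;
            continue;
        }
        if !text.is_empty() {
            segments.push(Segment::Text(std::mem::take(&mut text)));
        }
        i += 1; // skip the `$`
        let mut expr = String::new();
        if i < chars.len() && chars[i] == '{' {
            // `${x}` form: take everything up to the first closing brace.
            i += 1;
            while i < chars.len() && chars[i] != '}' {
                expr.push(chars[i]);
                i += 1;
            }
            if i == chars.len() {
                return Err("unterminated `${`".to_string());
            }
            i += 1; // skip the `}`
        } else {
            // `$name` form: take the longest run of identifier characters.
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                expr.push(chars[i]);
                i += 1;
            }
        }
        if expr.is_empty() {
            return Err("`$` must be followed by an expression".to_string());
        }
        segments.push(Segment::Expr(expr));
    }
    if !text.is_empty() {
        segments.push(Segment::Text(text));
    }
    Ok(segments)
}
```

A real implementation would hand each Expr segment to the expression parser, which is one reason this feature is awkward to express in a pure LR grammar.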

Add hexadecimal escape sequences to `char` and `string` literals

Add \xdd escape sequences to literals, where dd is a two-digit hexadecimal number corresponding to a single byte.
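
A sketch of the decoding step in Rust, assuming a hypothetical decode_escapes helper that works on the raw literal body. Only \xdd is handled here; every other byte, including other escape sequences, is copied through unchanged, which a real lexer would not do.

```rust
// Converts one ASCII hex digit to its numeric value.
fn hex_value(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

// Decodes `\xdd` sequences into single bytes (other escapes are not handled).
fn decode_escapes(literal: &str) -> Result<Vec<u8>, String> {
    let bytes = literal.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && bytes.get(i + 1) == Some(&b'x') {
            // Exactly two hex digits must follow the `\x` prefix.
            let digits = bytes
                .get(i + 2..i + 4)
                .ok_or("`\\x` must be followed by two hex digits")?;
            let hi = hex_value(digits[0]).ok_or("invalid hex digit")?;
            let lo = hex_value(digits[1]).ok_or("invalid hex digit")?;
            out.push(hi * 16 + lo);
            i += 4;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    Ok(out)
}
```

The result is a byte vector rather than a string because an arbitrary \xdd byte is not necessarily valid UTF-8 on its own.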

Optimize `writeln` in C backend

Currently a writeln statement emits two IR write instructions, the second one with the string literal "\n". The C backend converts this literal into a proper CO runtime string value before printing it, which involves unnecessary operations, including an allocation. This needs to be optimized.

Add associated functions

Associated functions, or "static methods" in some languages, are functions that belong to a type's namespace but, unlike methods, do not require an instance of that type as a self parameter.

Associated functions would be useful both for avoiding collisions in global namespace and potentially making traits more powerful by allowing a trait to specify required associated functions.

The proposed syntax is:

struct Foo {
  // A function defined within a struct definition without a self parameter should be
  // considered an associated function.
  fun bar() {
    // ...
  }
}

fun main() {
  // A new namespace navigation operator `::`.
  Foo::bar();
}

Add type conversion expressions

Add a new expression kind: a as T converts the value of the expression a of some type U to some other type T.

The conversion should be backed by a trait implementation; the choice between T: From<U> and U: Into<T> needs to be considered.

Fully-qualified method names in diagnostic messages

Right now, when a diagnostic refers to a method, it does so using only its name. This gets confusing when traits and templates come into play, as it becomes hard to tell which type the method belongs to. The solution would be to make methods aware of their "fully-qualified" names, which include the whole path from the root namespace to the method name. An example of such a name could be Foo::bar for a method bar of type Foo.

Need to consider #4 when designing the solution.

Add ranges support

Add a new Range type that would enable easy and consistent iteration over number ranges. The exact design is TBD, but the minimal requirements are:

- Ranges of int type are supported, and possibly other integer or even other numeric types.
- Ranges can be constructed using the .. and ..< operators:
  - a..b constructs a range where both ends are included.
  - a..<b constructs a range where the right end is excluded.
- Ranges implement the Iterable trait.
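
To pin down the two operators' semantics, here is a small Rust model of an int range. CoRange and CoRangeIter are hypothetical names, and Rust's own Iterator trait stands in for the proposed Iterable/Iterator pair; the actual design remains TBD.

```rust
// Hypothetical model of the proposed Range type for `int` bounds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CoRange {
    start: i64,
    end: i64,
    inclusive: bool,
}

struct CoRangeIter {
    cursor: Option<i64>,
    last: i64, // inclusive upper bound
}

impl CoRange {
    // Models `a..b`: both ends included.
    fn closed(start: i64, end: i64) -> Self {
        CoRange { start, end, inclusive: true }
    }

    // Models `a..<b`: right end excluded.
    fn half_open(start: i64, end: i64) -> Self {
        CoRange { start, end, inclusive: false }
    }

    fn iter(&self) -> CoRangeIter {
        // Normalize to an inclusive upper bound; checked_sub avoids overflow.
        let last = if self.inclusive { Some(self.end) } else { self.end.checked_sub(1) };
        match last {
            Some(last) if self.start <= last => CoRangeIter { cursor: Some(self.start), last },
            // Empty range: nothing to yield.
            _ => CoRangeIter { cursor: None, last: 0 },
        }
    }
}

impl Iterator for CoRangeIter {
    type Item = i64;
    fn next(&mut self) -> Option<i64> {
        let current = self.cursor?;
        // Stop after yielding `last` without ever computing `last + 1`.
        self.cursor = if current < self.last { Some(current + 1) } else { None };
        Some(current)
    }
}
```

Storing an inclusive upper bound internally lets a range ending at the largest int be iterated without overflow, which matters once a..b is the default form.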

Improve type deduction explanation in diagnostic messages

Currently the compiler provides explanation notes about the type of an expression when this type might not be obvious at first glance. Need to enable this behavior for more kinds of complex expressions and make sure that the notes are not superfluous.
