kymckay / sqwhiff

A C++ implementation of a preprocessor, lexer, parser and semantic analyzer for the Real Virtuality engine's SQF scripting language.

License: GNU General Public License v3.0

Languages: C++ 96.68%, Starlark 1.78%, Python 1.54%
Topics: sqf, parser, lexer, semantic-analyzer

sqwhiff's People

Contributors: kymckay

sqwhiff's Issues

Handle string literals

Format: ".+?"(?!") or '.+?'(?!')

  • Lexer needs to tokenise string literals (see the sketch below)
  • Parser needs to make a string node in the AST
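A minimal sketch of the tokenisation step, assuming a character-stream interface like std::istream (the real lexer reads position-tagged characters from the preprocessor; names here are illustrative). The negative lookahead in the format above is what treats a doubled delimiter as an escaped quote rather than a close, which is exactly the case the loop below handles:

```cpp
#include <istream>
#include <stdexcept>
#include <string>

// Reads a string literal where the opening character (" or ') is the
// delimiter and a doubled delimiter is an escape for that character.
std::string lex_string(std::istream& in) {
    const char delim = static_cast<char>(in.get());  // opening " or '
    std::string value;
    for (char c; in.get(c);) {
        if (c == delim) {
            if (in.peek() == delim) {
                value.push_back(delim);  // "" or '' -> literal quote
                in.get();
            } else {
                return value;            // closing delimiter reached
            }
        } else {
            value.push_back(c);
        }
    }
    throw std::runtime_error("unclosed string literal");
}
```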

Use C++17 at least

I've come to realise that a brand-new project should adopt a recent standard. There may be some compiler support research to do, but C++20 would be ideal, and failing that at least C++17.

  • Better to learn and use modern habits.
  • Would allow use of modern syntax and the standard library (e.g. std::make_unique, which landed in C++14).

Will need to update docs to reflect any change.

Improve syntax error handling

Currently I just output to console and throw to stop execution.

I should really log errors in some way for later output, so the program can be used as an analysis tool rather than stopping at the first problem.
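A minimal sketch of what that logging could look like, assuming a simple error record (names are illustrative, not the project's actual types):

```cpp
#include <string>
#include <utility>
#include <vector>

// Collects errors during a run instead of throwing at the first one,
// so they can all be reported together at the end.
struct SyntaxError {
    int line;
    int column;
    std::string message;
};

class ErrorLog {
    std::vector<SyntaxError> errors_;

public:
    void report(SyntaxError e) { errors_.push_back(std::move(e)); }
    const std::vector<SyntaxError>& all() const { return errors_; }
    bool clean() const { return errors_.empty(); }
};
```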

Failing to resolve GVAR macro

```cpp
#define ADDON TEST
#define DOUBLES(var1, var2) var1##_##var2
#define GVAR(var1) DOUBLES(ADDON, var1)

GVAR(var)
```

This should resolve to TEST_var (GVAR(var) → DOUBLES(TEST, var) → TEST##_##var → TEST_var), but it currently resolves to TEST_ var, which is then treated as two tokens and causes a syntax error. Presumably this is a bug in the way I handle preprocessing concatenation.
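If the stray space comes from whitespace retained around the ## operands during expansion, one plausible fix is to trim both sides before joining. A sketch under that assumption:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Joins two pasted tokens with no intervening whitespace, stripping
// any padding that survived macro expansion around the ## operator.
std::string paste(std::string lhs, const std::string& rhs) {
    while (!lhs.empty() && std::isspace(static_cast<unsigned char>(lhs.back())))
        lhs.pop_back();
    std::size_t i = 0;
    while (i < rhs.size() && std::isspace(static_cast<unsigned char>(rhs[i])))
        ++i;
    return lhs + rhs.substr(i);
}
```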

Incorrect nullary parsing

A nullary keyword followed by a binary keyword is parsed incorrectly into the AST structure.

allunits select 0 should become ((allunits) select <Dec:0>), but it is currently parsed as a unary expression: (allunits (select <Dec:0>))

Support control flow structures

Some of these may implicitly work with the current rules of constructing the AST, but at least switch is definitely unsupported:

  • if ... else
  • switch
  • while
  • for
  • forEach

Set up a better build process

Instead of relying on VS Code's tasks.json file to tell g++ to compile, I should really set up CMake or Bazel for a better, IDE-agnostic workflow that allows more configurability.

Goes hand-in-hand with #1

Handle array displays

Format: [expr (, expr)*]

  • Lexer needs to add tokenisation of the [ and ] characters
  • Parser needs to consume correctly when [ is encountered (see the sketch below)
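A self-contained sketch of the consumption logic (the token and parser shapes here are illustrative stand-ins, with a dummy expression rule in place of the real one):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class TokenType { LSQB, RSQB, COMMA, DEC, END };
struct Token { TokenType type; double value = 0; };

struct Parser {
    std::vector<Token> tokens;
    std::size_t pos = 0;

    const Token& current() const { return tokens[pos]; }
    void eat(TokenType t) {
        if (current().type != t) throw std::runtime_error("unexpected token");
        ++pos;
    }
    double expression() {  // stand-in for the real expression rule
        double v = current().value;
        eat(TokenType::DEC);
        return v;
    }
    // [expr (, expr)*] including the empty display []
    std::vector<double> array_display() {
        eat(TokenType::LSQB);
        std::vector<double> items;
        if (current().type != TokenType::RSQB) {
            items.push_back(expression());
            while (current().type == TokenType::COMMA) {
                eat(TokenType::COMMA);
                items.push_back(expression());
            }
        }
        eat(TokenType::RSQB);
        return items;
    }
};
```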

Add unit tests

It would be good to add tests for some self-documentation and to avoid regressions as things get more complex.

GoogleTest seems like a decent choice; it just requires a better build setup using CMake or Bazel.
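The shape a first test case might take (built against GoogleTest with gtest_main linked in; the reversed helper is a stand-in until the lexer/parser interfaces are exposed to tests):

```cpp
#include <gtest/gtest.h>

#include <string>

// Stand-in unit under test until the real interfaces are wired up.
static std::string reversed(std::string s) {
    return {s.rbegin(), s.rend()};
}

TEST(ExampleTest, ReversesString) {
    EXPECT_EQ(reversed("sqf"), "fqs");
}
```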

Assignment is indistinguishable from an expression

Similar to #7

Same reasoning: both an assignment statement and an expression can start with an identifier token (or a keyword token, for SQF's "private assignment" modifier).

Requires the same solution (token lookahead), but up to two tokens ahead since assignment can be: <keyword> <identifier> <assign>
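A sketch of that two-token lookahead over a token buffer (the token shapes are illustrative, not the project's actual types):

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { ID, KEYWORD, ASSIGN, OTHER };
struct Token { TokenType type; std::string raw; };

// True if the upcoming tokens begin an assignment statement:
//   <identifier> <assign>            e.g.  _x = ...
//   private <identifier> <assign>    e.g.  private _x = ...
bool starts_assignment(const std::vector<Token>& t, std::size_t i) {
    if (i < t.size() && t[i].type == TokenType::ID)
        return i + 1 < t.size() && t[i + 1].type == TokenType::ASSIGN;
    if (i < t.size() && t[i].type == TokenType::KEYWORD && t[i].raw == "private")
        return i + 2 < t.size()
            && t[i + 1].type == TokenType::ID
            && t[i + 2].type == TokenType::ASSIGN;
    return false;
}
```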

Add scalable framework for analysis strategies

Currently the analyzer only checks arity, and that check is hardcoded into the AST traversal functions.

For future scalability, it would make sense to use some sort of strategy pattern where functions are defined elsewhere and the analyzer just makes calls out to those and captures the errors they report. This would make it easier to provide a method of disabling/enabling individual linting errors/warnings too.

Should probably be done before (or as part of) the second half of #20
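A sketch of the shape this strategy pattern could take, assuming rules are callables keyed by an id (which also serves the disable/enable idea; all names are illustrative):

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct AstNode;  // the project's AST node type, forward-declared here

// Each rule inspects a node and returns any errors it found.
using Rule = std::function<std::vector<std::string>(const AstNode&)>;

class Analyzer {
    std::map<int, Rule> rules_;  // keyed by id so rules can be disabled

public:
    void add_rule(int id, Rule r) { rules_[id] = std::move(r); }
    void disable(int id) { rules_.erase(id); }

    // Runs every registered rule against a node, capturing the errors.
    std::vector<std::string> analyze(const AstNode& node) const {
        std::vector<std::string> errors;
        for (const auto& [id, rule] : rules_) {
            auto found = rule(node);
            errors.insert(errors.end(), found.begin(), found.end());
        }
        return errors;
    }
};
```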

Add command line API to disable specified rules

Currently analysis rules are stored in a map of int -> rule. This was done with the idea of allowing the user to specify rules to ignore/skip.

The thinking is:

  • User gives list of ints.
  • Those rules are removed from the map before use in the main program.
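The removal step itself is small; a sketch (the map's value type is a placeholder for the rule type, and the --skip flag name is hypothetical):

```cpp
#include <map>
#include <string>
#include <vector>

// Erase every user-specified rule id before analysis begins.
void disable_rules(std::map<int, std::string>& rules,
                   const std::vector<int>& skipped) {
    for (int id : skipped) rules.erase(id);
}

// e.g. an invocation like `sqwhiff --skip 3 7` would end up calling
// disable_rules(rules, {3, 7});
```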

NoOp can only be followed by ; (not ,)

Having multiple semicolons in a row is valid syntax (a series of no-operations), but multiple commas in a row produce an error. However, a comma can still end a statement.

The implementation currently doesn't reflect this behaviour.

Improve macro expansion

  • Introduce MacroExpansion class
    • Takes
      • A line and column (start of token, can be used to find relative argument positions)
      • A multimap of macro definitions (used for nested expansion and parameter replacement)
      • The word and arguments as a single string to expand (done this way for recursive use)
    • Sequentially processes the string, performing the same dance the preprocessor does when encountering the start of a word (with recursive use of self to resolve inner expansion first)
  • When preprocessor encounters the start of a word:
    1. Read it all
    2. If it is a macro, read any arguments (accounting for balanced parentheses)
      • If not, just push it to the peek buffer
    3. Pass the details to a new MacroExpansion instance which will recursively resolve the expansion and populate a vector of PosChar for insertion into the peek buffer

Part of #13 - a more thought-through method of expansion that should allow easier handling of nested expansion and of expansion within arguments, as well as parameter replacement. Sequential processing resolves some of the edge cases that occur when attempting this via regex.

This should also improve the code structure and keep each class's responsibilities more focused (at the moment there's a lot going on in the preprocessor).
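A skeleton of the class as I read the notes above (the signatures are tentative, not settled API):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct PosChar { char c = 0; int line = 0; int column = 0; };
struct Macro { std::vector<std::string> params; std::string body; };

class MacroExpansion {
public:
    MacroExpansion(int line, int column,
                   const std::multimap<std::string, Macro>& defs,
                   std::string text)
        : line_(line), column_(column), defs_(defs), text_(std::move(text)) {}

    // Sequentially scans text_, recursing into a new MacroExpansion for
    // nested macros, and returns position-tagged characters ready for
    // the preprocessor's peek buffer. Body omitted in this sketch.
    std::vector<PosChar> expand();

private:
    int line_;
    int column_;
    const std::multimap<std::string, Macro>& defs_;
    std::string text_;
};
```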

Handle #include directive

Sub-task of #13

Loose design spec:

  • Want to legitimately include the other file (if not found, raise exception)
    • Filepath can be relative (e.g. ../)
    • Can be in quotation marks or angle brackets (i.e. "file.txt" or <file.txt>)
  • Arma supports internal filesystem if path starts with \, provide means of specifying a directory to act as this path
    • If not specified and an internal path encountered then raise exception
  • Macros defined in an included file should be usable after the include line
  • Errors in an included file should point to the correct location in that file
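A sketch of the path-resolution half of this spec; the internal_root parameter is my reading of "a directory to act as this path", and all names are illustrative:

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Resolves an #include target to a real file, raising on failure.
// Note: on POSIX the backslashes in an internal path would still need
// converting to the host separator; elided here for brevity.
fs::path resolve_include(const std::string& target,
                         const fs::path& including_file,
                         const fs::path& internal_root) {
    fs::path result;
    if (!target.empty() && target.front() == '\\') {
        if (internal_root.empty())
            throw std::runtime_error("internal path but no root set: " + target);
        result = internal_root / target.substr(1);
    } else {
        // Relative includes resolve against the including file's directory
        result = including_file.parent_path() / target;
    }
    if (!fs::exists(result))
        throw std::runtime_error("include not found: " + target);
    return result;
}
```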

Handle #ifdef, #ifndef, #else and #endif

Sub-task of #13

Should be relatively straightforward:

  • If the macro is defined (or not), then process up to the next #endif or #else as appropriate
  • Skip to #else block in the negative case

In terms of implementation, this could either run through and push all applicable characters into the lookahead cache, or save state so that future calls act accordingly.
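A sketch of the save-state approach, which also handles nesting (illustrative; a real implementation would additionally report unmatched directives as errors):

```cpp
#include <stack>
#include <string>
#include <unordered_set>

// Tracks nested #ifdef/#ifndef blocks and whether output is active.
class Conditionals {
    struct Level { bool parent_active; bool cond; };
    std::stack<Level> levels_;

public:
    // True when characters should be forwarded on to the lexer.
    bool active() const {
        return levels_.empty()
            || (levels_.top().parent_active && levels_.top().cond);
    }
    // #ifdef NAME (negate = false) or #ifndef NAME (negate = true)
    void open(const std::string& name,
              const std::unordered_set<std::string>& defined, bool negate) {
        const bool cond = (defined.count(name) > 0) != negate;
        levels_.push({active(), cond});
    }
    void flip() { levels_.top().cond = !levels_.top().cond; }  // #else
    void close() { levels_.pop(); }                            // #endif
};
```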

Lexer errors aren't being logged

Parser errors seem to be working fine and are output to the console when running the main program, but there's no output when I expect lexer errors (e.g. on encountering a ? character, or an unclosed string).

Improve output formatting

Currently errors are just spat out in the order they are encountered, with no file information (relevant now that inclusion preprocessing is in). I think it would be nice to:

  • Collect all errors associated with each file processed and output them under the file path at the end, something like:

```
path/to/example.sqf
    1:5 PreprocessingError - Recursive inclusion of file: ./example.sqf
```
  • Don't include the file and line information in the error description itself; store them on the error for command line output. This is more of a refactor to clean up the error structure semantics for testing and reuse.
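A sketch of the grouped output described in the first bullet (the error shape is illustrative):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Error { int line; int col; std::string kind; std::string message; };

// Prints each file's errors under its path, matching the format above.
void print_report(const std::map<std::string, std::vector<Error>>& by_file) {
    for (const auto& [path, errors] : by_file) {
        std::cout << path << '\n';
        for (const auto& e : errors)
            std::cout << "    " << e.line << ':' << e.col << ' '
                      << e.kind << " - " << e.message << '\n';
    }
}
```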

Move token peek logic to lexer

Currently the parser implements peek logic to look ahead at upcoming tokens (for assignment identification).

For consistency with the other interfaces, this logic should really live in the lexer (istream allows peeking before the next get, as does the preprocessor).
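A sketch of moving that buffer behind the lexer's interface (illustrative; the real tokenising routine is only declared here):

```cpp
#include <cstddef>
#include <deque>

struct Token { /* ... */ };

class Lexer {
    std::deque<Token> lookahead_;
    Token next_token();  // the existing tokenising routine (not shown)

public:
    Token get() {
        if (lookahead_.empty()) return next_token();
        Token t = lookahead_.front();
        lookahead_.pop_front();
        return t;
    }
    // peek(1) is the token the next get() will return, mirroring how
    // istream and the preprocessor expose their next item.
    Token& peek(std::size_t n = 1) {
        while (lookahead_.size() < n) lookahead_.push_back(next_token());
        return lookahead_[n - 1];
    }
};
```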

Handle preprocessor directives

I probably need to introduce a lexical preprocessor to truly handle these, since they'd be resolved before the SQF lexer takes over:

  • #include
  • #define MACRO value
  • #define MACRO_FUNC(ARG1, ARG2, ...)
  • # (stringification)
  • ## (token concatenation)
  • \ (exactly before a newline, multi-line definition)
  • #undef
  • #if
  • #ifdef
  • #ifndef
  • #else
  • #endif

Handle code displays

Format: {statement_list}

  • Lexer needs to tokenise the { and } characters
  • Parser needs to consume the tokens into a code node in the AST

Raise semantic errors if incorrect arity used

Now that binary and unary command data is captured, the most basic check to implement in the analyzer is:

  • If used as unary check the command is part of the unary keywords map
  • If used as binary check the command is part of the binary keywords map
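The check then reduces to set membership; a self-contained sketch with a few real SQF commands standing in for the full keyword maps (count genuinely has both unary and binary forms):

```cpp
#include <string>
#include <unordered_set>
#include <vector>

const std::unordered_set<std::string> unary_keywords{"count", "hint"};
const std::unordered_set<std::string> binary_keywords{"select", "count"};

// Returns an error message when a command is used with an arity it
// does not support, and nothing otherwise.
std::vector<std::string> check_arity(const std::string& cmd, bool used_binary) {
    const auto& valid = used_binary ? binary_keywords : unary_keywords;
    if (valid.count(cmd) == 0)
        return {cmd + " cannot be used as a "
                + (used_binary ? "binary" : "unary") + " operator"};
    return {};
}
```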

Use namespaces to prevent global pollution

Currently some objects are instantiated in the global namespace for use in various places (e.g. the SQF command data maps).

These can be put into appropriate namespaces to avoid polluting the global namespace and to improve code readability (it's clearer where they come from when used).
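For example (the map's value type is a placeholder here):

```cpp
#include <string>
#include <unordered_map>

// Previously a global; now reached as sqf::unary_keywords, so call
// sites make the origin of the data obvious.
namespace sqf {
const std::unordered_map<std::string, int> unary_keywords = { /* ... */ };
}
```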

Enable analysis of command arguments

  • Add information (an unordered multimap?) describing the possible datatype configurations allowed
  • Use the information to analyse nullary, unary and binary operation nodes
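A sketch of what that multimap could hold, using select's real overloads as the example (type names abbreviated and illustrative):

```cpp
#include <string>
#include <unordered_map>

enum class Type { Scalar, Array, Bool, Code, Object, String };
struct Signature { Type left; Type right; };

// A command appears once per accepted left/right type pair.
const std::unordered_multimap<std::string, Signature> binary_signatures = {
    {"select", {Type::Array, Type::Scalar}},  // array select index
    {"select", {Type::Array, Type::Array}},   // array select [start, count]
    {"select", {Type::Array, Type::Bool}},    // array select boolean
};
```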

Nullary operators are indistinguishable from unary operators

As a result of my decision to tokenise SQF commands as a single type of token and to spot misuse errors semantically rather than syntactically (the alternative would complicate parsing, since commands can have all or some of the nullary, unary and binary forms), the parser is currently unable to distinguish nullary from unary commands: both start with a keyword token.

I need to add a way for the parser to see ahead to the next token in order to handle this.
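A sketch of the decision that one token of lookahead enables (this would also fix the allunits select 0 mis-parse described above; names are illustrative):

```cpp
#include <string>
#include <unordered_set>

enum class TokenType { KEYWORD, DEC, ID, END };
struct Token { TokenType type; std::string raw; };

// Stand-in for the project's binary command map.
const std::unordered_set<std::string> binary_keywords{"select", "then", "do"};

// A keyword is a nullary use when what follows cannot be its operand,
// e.g. in `allunits select 0` the token after allunits is the binary
// keyword select, so allunits must be nullary.
bool is_nullary_use(const Token& current, const Token& next) {
    if (current.type != TokenType::KEYWORD) return false;
    return next.type == TokenType::END
        || (next.type == TokenType::KEYWORD
            && binary_keywords.count(next.raw) > 0);
}
```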
