kymckay / sqwhiff

A C++ implementation of a preprocessor, lexer, parser and semantic analyzer for the Real Virtuality engine's SQF scripting language.

License: GNU General Public License v3.0

Languages: C++ 96.68%, Starlark 1.78%, Python 1.54%
Topics: sqf, parser, lexer, semantic-analyzer

sqwhiff's People

Contributors: kymckay

sqwhiff's Issues

Handle string literals

Format: ".+?"(?!") or '.+?'(?!')

  • Lexer needs to tokenise string literals (see the sketch below)
  • Parser needs to make a string node in the AST
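A minimal sketch of the tokenisation step, assuming a character-stream interface like std::istream (the real lexer reads position-tagged characters from the preprocessor; names here are illustrative). The negative lookahead in the format above is what treats a doubled delimiter as an escaped quote rather than a close, which is exactly the case the loop below handles:

```cpp
#include <istream>
#include <stdexcept>
#include <string>

// Reads a string literal where the opening character (" or ') is the
// delimiter and a doubled delimiter is an escape for that character.
std::string lex_string(std::istream& in) {
    const char delim = static_cast<char>(in.get());  // opening " or '
    std::string value;
    for (char c; in.get(c);) {
        if (c == delim) {
            if (in.peek() == delim) {
                value.push_back(delim);  // "" or '' -> literal quote
                in.get();
            } else {
                return value;            // closing delimiter reached
            }
        } else {
            value.push_back(c);
        }
    }
    throw std::runtime_error("unclosed string literal");
}
```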

Use C++17 at least

I've come to realise that a brand-new project should adopt a recent standard. There may be some compiler support research to do, but C++20 would be ideal, and failing that at least C++17.

  • Better to learn and use modern habits.
  • Would allow use of modern syntax and the standard library (e.g. std::make_unique, which landed in C++14).

Will need to update docs to reflect any change.

Improve syntax error handling

Currently I just output to console and throw to stop execution.

I should really log errors in some way for later output, so the program can be used as an analysis tool rather than stopping at the first problem.
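A minimal sketch of what that logging could look like, assuming a simple error record (names are illustrative, not the project's actual types):

```cpp
#include <string>
#include <utility>
#include <vector>

// Collects errors during a run instead of throwing at the first one,
// so they can all be reported together at the end.
struct SyntaxError {
    int line;
    int column;
    std::string message;
};

class ErrorLog {
    std::vector<SyntaxError> errors_;

public:
    void report(SyntaxError e) { errors_.push_back(std::move(e)); }
    const std::vector<SyntaxError>& all() const { return errors_; }
    bool clean() const { return errors_.empty(); }
};
```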

Failing to resolve GVAR macro

```cpp
#define ADDON TEST
#define DOUBLES(var1, var2) var1##_##var2
#define GVAR(var1) DOUBLES(ADDON, var1)

GVAR(var)
```

This should resolve to TEST_var (GVAR(var) → DOUBLES(TEST, var) → TEST##_##var → TEST_var), but it currently resolves to TEST_ var, which is then treated as two tokens and causes a syntax error. Presumably this is a bug in the way I handle preprocessing concatenation.
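If the stray space comes from whitespace retained around the ## operands during expansion, one plausible fix is to trim both sides before joining. A sketch under that assumption:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Joins two pasted tokens with no intervening whitespace, stripping
// any padding that survived macro expansion around the ## operator.
std::string paste(std::string lhs, const std::string& rhs) {
    while (!lhs.empty() && std::isspace(static_cast<unsigned char>(lhs.back())))
        lhs.pop_back();
    std::size_t i = 0;
    while (i < rhs.size() && std::isspace(static_cast<unsigned char>(rhs[i])))
        ++i;
    return lhs + rhs.substr(i);
}
```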

Incorrect nullary parsing

A nullary keyword followed by a binary keyword is parsed incorrectly into the AST structure.

allunits select 0 should become ((allunits) select <Dec:0>), but it is currently parsed as a unary expression: (allunits (select <Dec:0>))

Support control flow structures

Some of these may implicitly work with the current rules of constructing the AST, but at least switch is definitely unsupported:

  • if ... else
  • switch
  • while
  • for
  • forEach

Set up a better build process

Instead of relying on VS Code's tasks.json file to tell g++ to compile, I should really set up CMake or Bazel for a better, IDE-agnostic workflow that allows more configurability.

Goes hand-in-hand with #1

Handle array displays

Format: [expr (, expr)*]

  • Lexer needs to add tokenisation of the [ and ] characters
  • Parser needs to consume correctly when [ is encountered (see the sketch below)
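A self-contained sketch of the consumption logic (the token and parser shapes here are illustrative stand-ins, with a dummy expression rule in place of the real one):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

enum class TokenType { LSQB, RSQB, COMMA, DEC, END };
struct Token { TokenType type; double value = 0; };

struct Parser {
    std::vector<Token> tokens;
    std::size_t pos = 0;

    const Token& current() const { return tokens[pos]; }
    void eat(TokenType t) {
        if (current().type != t) throw std::runtime_error("unexpected token");
        ++pos;
    }
    double expression() {  // stand-in for the real expression rule
        double v = current().value;
        eat(TokenType::DEC);
        return v;
    }
    // [expr (, expr)*] including the empty display []
    std::vector<double> array_display() {
        eat(TokenType::LSQB);
        std::vector<double> items;
        if (current().type != TokenType::RSQB) {
            items.push_back(expression());
            while (current().type == TokenType::COMMA) {
                eat(TokenType::COMMA);
                items.push_back(expression());
            }
        }
        eat(TokenType::RSQB);
        return items;
    }
};
```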

Add unit tests

It would be good to add tests for some self-documentation and to avoid regressions as things get more complex.

GoogleTest seems like a decent choice; it just requires a better build setup using CMake or Bazel.
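The shape a first test case might take (built against GoogleTest with gtest_main linked in; the reversed helper is a stand-in until the lexer/parser interfaces are exposed to tests):

```cpp
#include <gtest/gtest.h>

#include <string>

// Stand-in unit under test until the real interfaces are wired up.
static std::string reversed(std::string s) {
    return {s.rbegin(), s.rend()};
}

TEST(ExampleTest, ReversesString) {
    EXPECT_EQ(reversed("sqf"), "fqs");
}
```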

Assignment is indistinguishable from an expression

Similar to #7

Same reasoning: both an assignment statement and an expression can start with an identifier token (or a keyword token, for SQF's "private assignment" modifier).

Requires the same solution (token lookahead), but up to two tokens ahead since assignment can be: <keyword> <identifier> <assign>
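A sketch of that two-token lookahead over a token buffer (the token shapes are illustrative, not the project's actual types):

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { ID, KEYWORD, ASSIGN, OTHER };
struct Token { TokenType type; std::string raw; };

// True if the upcoming tokens begin an assignment statement:
//   <identifier> <assign>            e.g.  _x = ...
//   private <identifier> <assign>    e.g.  private _x = ...
bool starts_assignment(const std::vector<Token>& t, std::size_t i) {
    if (i < t.size() && t[i].type == TokenType::ID)
        return i + 1 < t.size() && t[i + 1].type == TokenType::ASSIGN;
    if (i < t.size() && t[i].type == TokenType::KEYWORD && t[i].raw == "private")
        return i + 2 < t.size()
            && t[i + 1].type == TokenType::ID
            && t[i + 2].type == TokenType::ASSIGN;
    return false;
}
```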

Add scalable framework for analysis strategies

Currently the analyzer only checks arity, and that check is hardcoded into the AST traversal functions.

For future scalability, it would make sense to use some sort of strategy pattern where functions are defined elsewhere and the analyzer just makes calls out to those and captures the errors they report. This would make it easier to provide a method of disabling/enabling individual linting errors/warnings too.

Should probably be done before (or as part of) the second half of #20
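A sketch of the shape this strategy pattern could take, assuming rules are callables keyed by an id (which also serves the disable/enable idea; all names are illustrative):

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct AstNode;  // the project's AST node type, forward-declared here

// Each rule inspects a node and returns any errors it found.
using Rule = std::function<std::vector<std::string>(const AstNode&)>;

class Analyzer {
    std::map<int, Rule> rules_;  // keyed by id so rules can be disabled

public:
    void add_rule(int id, Rule r) { rules_[id] = std::move(r); }
    void disable(int id) { rules_.erase(id); }

    // Runs every registered rule against a node, capturing the errors.
    std::vector<std::string> analyze(const AstNode& node) const {
        std::vector<std::string> errors;
        for (const auto& [id, rule] : rules_) {
            auto found = rule(node);
            errors.insert(errors.end(), found.begin(), found.end());
        }
        return errors;
    }
};
```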

Add command line API to disable specified rules

Currently analysis rules are stored in a map of int -> rule. This was done with the idea of allowing the user to specify rules to ignore/skip.

The thinking is:

  • User gives list of ints.
  • Those rules are removed from the map before use in the main program.
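The removal step itself is small; a sketch (the map's value type is a placeholder for the rule type, and the --skip flag name is hypothetical):

```cpp
#include <map>
#include <string>
#include <vector>

// Erase every user-specified rule id before analysis begins.
void disable_rules(std::map<int, std::string>& rules,
                   const std::vector<int>& skipped) {
    for (int id : skipped) rules.erase(id);
}

// e.g. an invocation like `sqwhiff --skip 3 7` would end up calling
// disable_rules(rules, {3, 7});
```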

NoOp can only be followed by ; (not ,)

Having multiple semicolons in a row is valid syntax (a series of no-operations), but multiple commas in a row produce an error. However, a comma can still end a statement.

The implementation currently doesn't reflect this behaviour.

Improve macro expansion

  • Introduce MacroExpansion class
    • Takes
      • A line and column (start of token, can be used to find relative argument positions)
      • A multimap of macro definitions (used for nested expansion and parameter replacement)
      • The word and arguments as a single string to expand (done this way for recursive use)
    • Sequentially processes the string, performing the same dance the preprocessor does when encountering the start of a word (with recursive use of self to resolve inner expansion first)
  • When preprocessor encounters the start of a word:
    1. Read it all
    2. If it is a macro, read any arguments (accounting for balanced parentheses)
      • If not, just push it to the peek buffer
    3. Pass the details to a new MacroExpansion instance which will recursively resolve the expansion and populate a vector of PosChar for insertion into the peek buffer

Part of #13 - a more thought-through method of expansion that should allow easier handling of nested expansion and of expansion within arguments, as well as parameter replacement. Sequential processing resolves some of the edge cases that occur when attempting this via regex.

This should also improve the code structure and keep each class's responsibilities more focused (at the moment there's a lot going on in the preprocessor).
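A skeleton of the class as I read the notes above (the signatures are tentative, not settled API):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct PosChar { char c = 0; int line = 0; int column = 0; };
struct Macro { std::vector<std::string> params; std::string body; };

class MacroExpansion {
public:
    MacroExpansion(int line, int column,
                   const std::multimap<std::string, Macro>& defs,
                   std::string text)
        : line_(line), column_(column), defs_(defs), text_(std::move(text)) {}

    // Sequentially scans text_, recursing into a new MacroExpansion for
    // nested macros, and returns position-tagged characters ready for
    // the preprocessor's peek buffer. Body omitted in this sketch.
    std::vector<PosChar> expand();

private:
    int line_;
    int column_;
    const std::multimap<std::string, Macro>& defs_;
    std::string text_;
};
```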

Handle #include directive

Sub-task of #13

Loose design spec:

  • Want to legitimately include the other file (if not found, raise exception)
    • Filepath can be relative (e.g. ../)
    • Can be in quotation marks or angle brackets (i.e. "file.txt" or <file.txt>)
  • Arma supports internal filesystem if path starts with \, provide means of specifying a directory to act as this path
    • If not specified and an internal path encountered then raise exception
  • Macros defined in an included file should be usable after the include line
  • Errors in an included file should point to the correct location in that file
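A sketch of the path-resolution half of this spec; the internal_root parameter is my reading of "a directory to act as this path", and all names are illustrative:

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Resolves an #include target to a real file, raising on failure.
// Note: on POSIX the backslashes in an internal path would still need
// converting to the host separator; elided here for brevity.
fs::path resolve_include(const std::string& target,
                         const fs::path& including_file,
                         const fs::path& internal_root) {
    fs::path result;
    if (!target.empty() && target.front() == '\\') {
        if (internal_root.empty())
            throw std::runtime_error("internal path but no root set: " + target);
        result = internal_root / target.substr(1);
    } else {
        // Relative includes resolve against the including file's directory
        result = including_file.parent_path() / target;
    }
    if (!fs::exists(result))
        throw std::runtime_error("include not found: " + target);
    return result;
}
```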

Handle #ifdef, #ifndef, #else and #endif

Sub-task of #13

Should be relatively straightforward:

  • If the macro is defined (or not), then process up to the next #endif or #else as appropriate
  • Skip to #else block in the negative case

In terms of implementation, this could either run through and push all applicable characters into the lookahead cache, or save state so that future calls act accordingly.
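A sketch of the save-state approach, which also handles nesting (illustrative; a real implementation would additionally report unmatched directives as errors):

```cpp
#include <stack>
#include <string>
#include <unordered_set>

// Tracks nested #ifdef/#ifndef blocks and whether output is active.
class Conditionals {
    struct Level { bool parent_active; bool cond; };
    std::stack<Level> levels_;

public:
    // True when characters should be forwarded on to the lexer.
    bool active() const {
        return levels_.empty()
            || (levels_.top().parent_active && levels_.top().cond);
    }
    // #ifdef NAME (negate = false) or #ifndef NAME (negate = true)
    void open(const std::string& name,
              const std::unordered_set<std::string>& defined, bool negate) {
        const bool cond = (defined.count(name) > 0) != negate;
        levels_.push({active(), cond});
    }
    void flip() { levels_.top().cond = !levels_.top().cond; }  // #else
    void close() { levels_.pop(); }                            // #endif
};
```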

Lexer errors aren't being logged

Parser errors seem to be working fine and are output to the console when running the main program, but there's no output when I expect lexer errors (e.g. on encountering a ? character, or an unclosed string).

Improve output formatting

Currently errors are just spat out in the order they are encountered, with no file information (relevant now that inclusion preprocessing is in). I think it would be nice to:

  • Collect all errors associated with each file processed and output them under the file path at the end, something like:

```
path/to/example.sqf
    1:5 PreprocessingError - Recursive inclusion of file: ./example.sqf
```
  • Don't include the file and line information in the error description itself; store them on the error for command line output. This is more of a refactor to clean up the error structure semantics for testing and reuse.
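A sketch of the grouped output described in the first bullet (the error shape is illustrative):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Error { int line; int col; std::string kind; std::string message; };

// Prints each file's errors under its path, matching the format above.
void print_report(const std::map<std::string, std::vector<Error>>& by_file) {
    for (const auto& [path, errors] : by_file) {
        std::cout << path << '\n';
        for (const auto& e : errors)
            std::cout << "    " << e.line << ':' << e.col << ' '
                      << e.kind << " - " << e.message << '\n';
    }
}
```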

Move token peek logic to lexer

Currently the parser implements peek logic to look ahead at upcoming tokens (for assignment identification).

For consistency with the other interfaces, this logic should really live in the lexer (istream allows peeking before the next get, as does the preprocessor).
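A sketch of moving that buffer behind the lexer's interface (illustrative; the real tokenising routine is only declared here):

```cpp
#include <cstddef>
#include <deque>

struct Token { /* ... */ };

class Lexer {
    std::deque<Token> lookahead_;
    Token next_token();  // the existing tokenising routine (not shown)

public:
    Token get() {
        if (lookahead_.empty()) return next_token();
        Token t = lookahead_.front();
        lookahead_.pop_front();
        return t;
    }
    // peek(1) is the token the next get() will return, mirroring how
    // istream and the preprocessor expose their next item.
    Token& peek(std::size_t n = 1) {
        while (lookahead_.size() < n) lookahead_.push_back(next_token());
        return lookahead_[n - 1];
    }
};
```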

Handle preprocessor directives

I probably need to introduce a lexical preprocessor to truly handle these, since they'd be resolved before the SQF lexer takes over:

  • #include
  • #define MACRO value
  • #define MACRO_FUNC(ARG1, ARG2, ...)
  • # (stringification)
  • ## (token concatenation)
  • \ (exactly before a newline, multi-line definition)
  • #undef
  • #if
  • #ifdef
  • #ifndef
  • #else
  • #endif

Handle code displays

Format: {statement_list}

  • Lexer needs to tokenise the { and } characters
  • Parser needs to consume the tokens into a code node in the AST

Raise semantic errors if incorrect arity used

Now that binary and unary command data is captured, the most basic check to implement in the analyzer is:

  • If used as unary check the command is part of the unary keywords map
  • If used as binary check the command is part of the binary keywords map
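The check then reduces to set membership; a self-contained sketch with a few real SQF commands standing in for the full keyword maps (count genuinely has both unary and binary forms):

```cpp
#include <string>
#include <unordered_set>
#include <vector>

const std::unordered_set<std::string> unary_keywords{"count", "hint"};
const std::unordered_set<std::string> binary_keywords{"select", "count"};

// Returns an error message when a command is used with an arity it
// does not support, and nothing otherwise.
std::vector<std::string> check_arity(const std::string& cmd, bool used_binary) {
    const auto& valid = used_binary ? binary_keywords : unary_keywords;
    if (valid.count(cmd) == 0)
        return {cmd + " cannot be used as a "
                + (used_binary ? "binary" : "unary") + " operator"};
    return {};
}
```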

Use namespaces to prevent global pollution

Currently some objects are instantiated in the global namespace for use in various places (e.g. the SQF command data maps).

These can be put into appropriate namespaces to avoid polluting the global namespace and to improve code readability (it's clearer where they come from when used).
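For example (the map's value type is a placeholder here):

```cpp
#include <string>
#include <unordered_map>

// Previously a global; now reached as sqf::unary_keywords, so call
// sites make the origin of the data obvious.
namespace sqf {
const std::unordered_map<std::string, int> unary_keywords = { /* ... */ };
}
```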

Enable analysis of command arguments

  • Add information (an unordered multimap?) describing the possible datatype configurations allowed
  • Use the information to analyse nullary, unary and binary operation nodes
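A sketch of what that multimap could hold, using select's real overloads as the example (type names abbreviated and illustrative):

```cpp
#include <string>
#include <unordered_map>

enum class Type { Scalar, Array, Bool, Code, Object, String };
struct Signature { Type left; Type right; };

// A command appears once per accepted left/right type pair.
const std::unordered_multimap<std::string, Signature> binary_signatures = {
    {"select", {Type::Array, Type::Scalar}},  // array select index
    {"select", {Type::Array, Type::Array}},   // array select [start, count]
    {"select", {Type::Array, Type::Bool}},    // array select boolean
};
```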

Nullary operators are indistinguishable from unary operators

As a result of my decision to tokenise SQF commands as a single type of token and to spot misuse errors semantically rather than syntactically (the alternative would complicate parsing, since commands can have all or some of the nullary, unary and binary forms), the parser is currently unable to distinguish nullary from unary commands: both start with a keyword token.

I need to add a way for the parser to see ahead to the next token in order to handle this.
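A sketch of the decision that one token of lookahead enables (this would also fix the allunits select 0 mis-parse described above; names are illustrative):

```cpp
#include <string>
#include <unordered_set>

enum class TokenType { KEYWORD, DEC, ID, END };
struct Token { TokenType type; std::string raw; };

// Stand-in for the project's binary command map.
const std::unordered_set<std::string> binary_keywords{"select", "then", "do"};

// A keyword is a nullary use when what follows cannot be its operand,
// e.g. in `allunits select 0` the token after allunits is the binary
// keyword select, so allunits must be nullary.
bool is_nullary_use(const Token& current, const Token& next) {
    if (current.type != TokenType::KEYWORD) return false;
    return next.type == TokenType::END
        || (next.type == TokenType::KEYWORD
            && binary_keywords.count(next.raw) > 0);
}
```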
