fstirlitz / luaparse

A Lua parser written in JavaScript

Home Page: https://fstirlitz.github.io/luaparse/

License: MIT License

Makefile 0.14% Lua 3.91% Shell 0.28% JavaScript 95.29% HTML 0.24% CSS 0.14%
lua lua-parser ast javascript-library luaparse javascript

luaparse's Introduction

luaparse

A Lua parser written in JavaScript, originally written by Oskar Schöldström for his bachelor's thesis at Arcada.

Installation

Install through npm install luaparse.

Usage

CommonJS

var parser = require('luaparse');
var ast = parser.parse('i = 0');
console.log(JSON.stringify(ast));

AMD

require(['luaparse'], function(parser) {
  var ast = parser.parse('i = 0');
  console.log(JSON.stringify(ast));
});

Browser

<script src="luaparse.js"></script>
<script>
var ast = luaparse.parse('i = 0');
console.log(JSON.stringify(ast));
</script>

Parser Interface

Basic usage:

luaparse.parse(code, options);

The output of the parser is an Abstract Syntax Tree (AST) formatted in JSON.

The available options are:

  • wait: false Explicitly tell the parser when the input ends.
  • comments: true Store comments as an array in the chunk object.
  • scope: false Track identifier scopes.
  • locations: false Store location information on each syntax node.
  • ranges: false Store the start and end character locations on each syntax node.
  • onCreateNode: null A callback which will be invoked when a syntax node has been completed. The node which has been created will be passed as the only parameter.
  • onCreateScope: null A callback which will be invoked when a new scope is created.
  • onDestroyScope: null A callback which will be invoked when the current scope is destroyed.
  • onLocalDeclaration: null A callback which will be invoked when a local variable is declared. The identifier will be passed as the only parameter.
  • luaVersion: '5.1' The version of Lua the parser will target; supported values are '5.1', '5.2', '5.3' and 'LuaJIT'.
  • extendedIdentifiers: false Whether to allow code points ≥ U+0080 in identifiers, like LuaJIT does. Note: setting luaVersion: 'LuaJIT' currently does not enable this option; this may change in the future.
  • encodingMode: 'none' Defines the relation between code points ≥ U+0080 appearing in parser input and raw bytes in source code, and how Lua escape sequences in JavaScript strings should be interpreted. See the Encoding modes section below for more information.

The default options are also exposed through luaparse.defaultOptions, where they can be overridden globally.

There is a second interface which might be preferable when using the wait option.

var parser = luaparse.parse({ wait: true });
parser.write('foo = "');
parser.write('bar');
var ast = parser.end('"');

This would be identical to:

var ast = luaparse.parse('foo = "bar"');

AST format

If the following code is executed:

luaparse.parse('foo = "bar"');

then the returned value will be:

{
  "type": "Chunk",
  "body": [
    {
      "type": "AssignmentStatement",
      "variables": [
        {
          "type": "Identifier",
          "name": "foo"
        }
      ],
      "init": [
        {
          "type": "StringLiteral",
          "value": "bar",
          "raw": "\"bar\""
        }
      ]
    }
  ],
  "comments": []
}
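Because the AST is plain JSON-like data, it can be traversed without any help from the library. The following is a minimal sketch, not part of luaparse: walk is a hypothetical helper that visits every object carrying a string type property, and the ast value is copied by hand from the output above.

```javascript
// The AST from the example above, written out as a plain object.
var ast = {
  type: 'Chunk',
  body: [{
    type: 'AssignmentStatement',
    variables: [{ type: 'Identifier', name: 'foo' }],
    init: [{ type: 'StringLiteral', value: 'bar', raw: '"bar"' }]
  }],
  comments: []
};

// Hypothetical helper (not a luaparse API): depth-first visit of every
// object that has a string `type` property.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach(function(child) { walk(child, visit); });
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') visit(node);
    Object.keys(node).forEach(function(key) { walk(node[key], visit); });
  }
}

var types = [];
walk(ast, function(node) { types.push(node.type); });
console.log(types.join(' > '));
```

The walker makes no assumption about which properties hold child nodes, so it should keep working if optional fields such as loc or range are present.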

Encoding modes

Unlike strings in JavaScript, Lua strings are not Unicode strings, but bytestrings (sequences of 8-bit values); likewise, implementations of Lua parse the source code as a sequence of octets. However, the input to this parser is a JavaScript string, i.e. a sequence of 16-bit code units (not necessarily well-formed UTF-16). This poses a problem of how those code units should be interpreted, particularly if they are outside the Basic Latin block ('ASCII').

The encodingMode option specifies how these issues should be handled. Possible values are as follows:

  • 'none': Source code characters all pass through as-is and string literals are not interpreted at all; the string literal nodes contain the value null. This is the default mode.
  • 'x-user-defined': Source code has been decoded with the WHATWG x-user-defined encoding; escapes of bytes in the range [0x80, 0xff] are mapped to the Unicode range [U+F780, U+F7FF].
  • 'pseudo-latin1': Source code has been decoded with the IANA iso-8859-1 encoding; escapes of bytes in the range [0x80, 0xff] are mapped to Unicode range [U+0080, U+00FF]. Note that this is not the same as how WHATWG standards define the iso-8859-1 encoding, which is to say, as a synonym of windows-1252.
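The two non-default modes differ only in where the bytes 0x80 to 0xff land in Unicode. As a rough sketch of that mapping, the hypothetical helper toBytes below (not part of luaparse) turns a string in either representation back into the bytes it stands for; the sample strings are made up for illustration.

```javascript
// Hypothetical helper, not a luaparse API: map a string back to the bytes
// it represents under the given encodingMode.
function toBytes(value, encodingMode) {
  var bytes = new Uint8Array(value.length);
  for (var i = 0; i < value.length; i++) {
    var unit = value.charCodeAt(i);
    if (unit < 0x80) {
      bytes[i] = unit; // ASCII is identical in both modes
    } else if (encodingMode === 'pseudo-latin1' && unit <= 0xff) {
      bytes[i] = unit; // U+0080..U+00FF -> 0x80..0xff
    } else if (encodingMode === 'x-user-defined' &&
               unit >= 0xf780 && unit <= 0xf7ff) {
      bytes[i] = unit - 0xf700; // U+F780..U+F7FF -> 0x80..0xff
    } else {
      throw new RangeError('code unit out of range at index ' + i);
    }
  }
  return bytes;
}

// The same five bytes ("caf" followed by 0xc3 0xa9) in both representations.
var latin1 = toBytes('caf\u00c3\u00a9', 'pseudo-latin1');
var userDefined = toBytes('caf\uf7c3\uf7a9', 'x-user-defined');
console.log(Array.from(latin1));
console.log(Array.from(userDefined));
```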

Custom AST

The default AST structure is somewhat inspired by the Mozilla Parser API, but it can easily be overridden to customize the structure or to inject custom logic.

luaparse.ast is an object containing all the functions used to create the AST. If, for example, you wanted to trigger an event on node creation, you could use the following:

var luaparse = require('luaparse'),
    events = new (require('events').EventEmitter);

Object.keys(luaparse.ast).forEach(function(type) {
  var original = luaparse.ast[type];
  luaparse.ast[type] = function() {
    var node = original.apply(null, arguments);
    events.emit(node.type, node);
    return node;
  };
});
events.on('Identifier', function(node) { console.log(node); });
luaparse.parse('i = "foo"');

This is only an example to illustrate what is possible, and this particular example might not suit your needs, as the end location of the node has not been determined yet. If you want events, use the onCreateNode callback instead.

Lexer

The lexer used by luaparse can be used independently of the recursive descent parser. The lex function is exposed as luaparse.lex() and it will return the next token up until EOF is reached.

Each token consists of:

  • type expressed as an enum flag which can be matched with luaparse.tokenTypes.
  • value
  • line, lineStart
  • range can be used to slice out raw values, e.g. foo = "bar" will return a StringLiteral token with the value bar. Slicing out the range, on the other hand, will return "bar".

var parser = luaparse.parse('foo = "bar"', { wait: true });
parser.lex(); // { type: 8, value: "foo", line: 1, lineStart: 0, range: [0, 3] }
parser.lex(); // { type: 32, value: "=", line: 1, lineStart: 0, range: [4, 5] }
parser.lex(); // { type: 2, value: "bar", line: 1, lineStart: 0, range: [6, 11] }
parser.lex(); // { type: 1, value: "<eof>", line: 1, lineStart: 0, range: [11, 11] }
parser.lex(); // { type: 1, value: "<eof>", line: 1, lineStart: 0, range: [11, 11] }
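To make the difference between value and the sliced range concrete, here is a small sketch. The token object is copied from the output above, and rawText is a hypothetical helper, not something luaparse exports.

```javascript
var source = 'foo = "bar"';

// Token shape copied from the lexer output above.
var token = { type: 2, value: 'bar', line: 1, lineStart: 0, range: [6, 11] };

// Hypothetical helper: recover the raw source text covered by a token.
function rawText(source, token) {
  return source.slice(token.range[0], token.range[1]);
}

console.log(token.value);            // bar
console.log(rawText(source, token)); // "bar"
```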

Examples

Have a look in the examples directory of the repository for some code examples or check them out live.

luaparse(1)

The luaparse executable can be used in your shell by installing luaparse globally using npm:

$ npm install -g luaparse
$ luaparse --help

Usage: luaparse [option]... [file|code]...

Options:
  -c|--code [code]   parse code snippet
  -f|--file [file]   parse from file
  -b|--beautify      output an indented AST
  --[no]-comments    store comments. defaults to true
  --[no]-scope       store variable scope. defaults to false
  --[no]-locations   store location data on syntax nodes. defaults to false
  --[no]-ranges      store start and end character locations. defaults to false
  -q|--quiet         suppress output
  -h|--help
  -v|--version
  --verbose

Examples:
  luaparse --no-comments -c "local foo = \"bar\""
  luaparse foo.lua bar.lua

Example usage

$ luaparse "i = 0"

{"type":"Chunk","body":[{"type":"AssignmentStatement","variables":[{"type":"Identifier","name":"i"}],"init":[{"type":"NumericLiteral","value":0,"raw":"0"}]}],"comments":[]}

Support

Has been tested in at least IE6+, Firefox 3+, Safari 4+, Chrome 10+, Opera 10+, Node 0.4.0+, RingoJS 0.8-0.9, Rhino 1.7R4-1.7R5, Nashorn 1.8.0.

Quality Assurance

TL;DR simply run make qa. This will run all quality assurance scripts but assumes you have it set up correctly.

Begin by cloning the repository and installing the development dependencies with npm install.

The luaparse test suite uses testem as a test runner, which makes it very easy to run the tests using different JavaScript engines or even locally installed browsers.

Test runners

  • make test uses node.
  • make testem-engines uses node, ringo and rhino 1.7R5. This requires that you have the engines installed.
  • make test-node uses a custom command line reporter to make the output easier on the eyes while practicing TDD.
  • By installing testem globally you can also run the tests in a locally installed browser.

Other quality assurance measures

  • You can check function complexity with complexity-report by running make complexity-analysis.
  • Running make coverage will generate the coverage report. To simply check that all code has coverage you can run make coverage-analysis.
  • make lint, make benchmark, make profile.

Documentation

By running make docs all documentation will be generated.

Projects using/extending luaparse

  • luamin, a Lua minifier written by Mathias Bynens.
  • Ace, an online code editor.

Acknowledgements

  • Initial tests are scaffolded from yueliang and then manually checked for errors.
  • Much of the code is based on LuaMinify, the Lua source and Esprima. All awesome projects.

License

MIT

luaparse's People

Contributors

dapetcu21, dependabot[bot], fstirlitz, mathiasbynens, oxyc, simsaens

luaparse's Issues

Unfinished long string literals generate a valid AST

Test:

luaparse.parse('a = [====[]')

It generates a valid AST, but should have thrown an error such as "unfinished long string".

The long string literal value is also incorrect

console.log(
    luaparse.parse('a = [====[]').body[0].init[0].value // nothing.
  , luaparse.parse('a = [====[]]').body[0].init[0].value // ]
);

Cannot analyze symbols with Chinese characters

luaparse cannot analyze Chinese function names and variable names; please add support for them.
LuaJIT supports Chinese function and variable names encoded in GBK or UTF-8.

Example:

function 中文函数名(参数1,参数2)
local 中文变量 = "Chinese variable name"
end

Unicode Issue

It appears luaparse is having trouble parsing non-English characters such as í. I implemented a temporary fix, but I thought I should let you know. If I have time to prepare a permanent solution, I'll initiate a pull request.

Error:

SyntaxError: [1:240] unexpected symbol 'í' near 'Palad'

Unterminated (single-line) string literal does not raise an error

Samples

luaparse.parse("s='");
luaparse.parse('s="');
luaparse.parse("s='\\\n");
luaparse.parse('s="\\\n');

(unterminated multi-line string seems to raise an exception unless it is run on command-line.)

From command-line

>luaparse -c "local s=\""
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"\""}]}],"comments":[]}

>luaparse -c "local s='"
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"'"}]}],"comments":[]}

>luaparse -c "local s=[["
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"[["}]}],"comments":[]}

Rundown

The path goes down to parsePrimaryExpression(), which calls next(), which calls lex(), appending an EOF token.

PrefixExpression incorrect range/location

parsePrefixExpression seems to be incorrectly handling the propagation of child ranges and locations, as demonstrated with the following Lua snippet:

base['outer']['inner']()

Expected behaviour

The outer IndexExpression's range should contain the inner expression without the CallExpression's parentheses, and the inner expression should only contain its own range.

Output

{
  "type": "Chunk",
  "body": [
    {
      "type": "CallStatement",
      "expression": {
        "type": "CallExpression",
        "base": {
          "type": "IndexExpression",
          "base": {
            "type": "IndexExpression",
            "base": {
              "type": "Identifier",
              "name": "base",
              "range": [
                0,
                4
              ]
            },
            "index": {
              "type": "StringLiteral",
              "value": "outer",
              "raw": "'outer'",
              "range": [
                5,
                12
              ]
            },
            "range": [
              13,
              22
            ]
          },
          "index": {
            "type": "StringLiteral",
            "value": "inner",
            "raw": "'inner'",
            "range": [
              14,
              21
            ]
          },
          "range": [
            0,
            22
          ]
        },
        "arguments": [],
        "range": [
          0,
          24
        ]
      },
      "range": [
        0,
        24
      ]
    }
  ],
  "range": [
    0,
    24
  ],
  "comments": []
}

Actual behaviour

Both IndexExpressions refer to the same range as the entire CallExpression (including its (), which is incorrect).

Output

{
  "type": "Chunk",
  "body": [
    {
      "type": "CallStatement",
      "expression": {
        "type": "CallExpression",
        "base": {
          "type": "IndexExpression",
          "base": {
            "type": "IndexExpression",
            "base": {
              "type": "Identifier",
              "name": "base",
              "range": [
                0,
                4
              ]
            },
            "index": {
              "type": "StringLiteral",
              "value": "outer",
              "raw": "'outer'",
              "range": [
                5,
                12
              ]
            },
            "range": [
              0,
              24
            ]
          },
          "index": {
            "type": "StringLiteral",
            "value": "inner",
            "raw": "'inner'",
            "range": [
              14,
              21
            ]
          },
          "range": [
            0,
            24
          ]
        },
        "arguments": [],
        "range": [
          0,
          24
        ]
      },
      "range": [
        0,
        24
      ]
    }
  ],
  "range": [
    0,
    24
  ],
  "comments": []
}
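One way to inspect what a range covers is to slice the source string with it. The sketch below uses the ranges from the outputs above; textOf is a hypothetical helper introduced only for illustration.

```javascript
var source = "base['outer']['inner']()";

// Hypothetical helper: the source text covered by a node's range.
function textOf(source, range) {
  return source.slice(range[0], range[1]);
}

// [0, 24] is the range both IndexExpressions report in the actual output;
// it includes the call parentheses.
console.log(textOf(source, [0, 24]));  // base['outer']['inner']()
// [0, 22] is the outer IndexExpression's range in the expected output.
console.log(textOf(source, [0, 22]));  // base['outer']['inner']
// The two string literals have the same ranges in both outputs.
console.log(textOf(source, [5, 12]));  // 'outer'
console.log(textOf(source, [14, 21])); // 'inner'
```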

Lua 5.4 support

This issue tracks all changes required to support parsing Lua 5.4 code.

While PUC-Rio hasn't officially released Lua 5.4 yet, it already appears it is going to bring at least one syntactic innovation: attributes in local declarations, to implement immutable bindings and 'to-be-closed variables', i.e. a form of lexical cleanup/with statement/RAII. The EBNF is as follows:

	stat ::= local ‘<’ Name ‘>’ Name ‘=’ exp

I assume this particular syntax will change, as the current one does not permit declaring multiple attributes simultaneously, and there is no syntax for combining attributes with the local function construct.

Rethink string representation

(cribbed from README.md)

Unlike strings in JavaScript, Lua strings are not Unicode strings, but bytestrings (sequences of 8-bit values); likewise, implementations of Lua parse the source code as a sequence of octets. However, the input to this parser is a JavaScript string, i.e. a sequence of 16-bit code units (not necessarily well-formed UTF-16). This poses a problem of how those code units should be interpreted, particularly if they are outside the Basic Latin block ('ASCII').

Currently, this parser handles Unicode input by encoding it in WTF-8, and reinterpreting the resulting code units as Unicode code points. This applies to string literals and (if extendedIdentifiers is enabled) to identifiers as well. Lua byte escapes inside string literals are interpreted directly as code points, while Lua 5.3 \u{} escapes are similarly decoded as UTF-8 code units reinterpreted as code points. It is as if the parser input was being interpreted as ISO-8859-1, while actually being encoded in UTF-8.

This ensures that no otherwise-valid input will be rejected due to encoding errors. Assuming the input was originally encoded in UTF-8 (which includes the case of only containing ASCII characters), it also preserves the following properties:

  • String literals (and identifiers, if extendedIdentifiers is enabled) will have the same representation in the AST if and only if they represent the same string in the source code: e.g. the Lua literals '💩', '\u{1f4a9}' and '\240\159\146\169' will all have "\u00f0\u009f\u0092\u00a9" in their .value property, and likewise local 💩 will have the same string in its .name property;
  • The String.prototype.charCodeAt method in JS can be directly used to emulate Lua's string.byte (with one argument, after shifting offsets by 1), and likewise String.prototype.substr can be used similarly to Lua's string.sub;
  • The .length property of decoded string values in the AST is equal to the value that the # operator would return in Lua.

Maintaining those properties makes the logic of static analysers and code transformation tools simpler. However, it poses a problem when displaying strings to the user and serialising AST back into a string; to recover the original bytestrings, values transformed in this way will have to be encoded in ISO-8859-1.
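The properties listed above can be illustrated with the sample value from the first bullet. This is only a sketch: luaByte and toByteString are hypothetical helpers, and TextDecoder is assumed to be available as a global (as it is in current browsers and Node.js).

```javascript
// The .value that the text above gives for the Lua literal '\240\159\146\169'.
var value = '\u00f0\u009f\u0092\u00a9';

// Hypothetical helper emulating Lua's string.byte(s, i) with a 1-based index.
function luaByte(s, i) {
  return s.charCodeAt(i - 1);
}

// Hypothetical helper: encode as ISO-8859-1 to recover the original bytes.
// Assumes every code unit is <= 0xff, which holds for this representation.
function toByteString(s) {
  var bytes = new Uint8Array(s.length);
  for (var i = 0; i < s.length; i++) bytes[i] = s.charCodeAt(i);
  return bytes;
}

console.log(value.length);      // 4, the same as #s in Lua
console.log(luaByte(value, 1)); // 240

// Decoding the recovered bytes as UTF-8 gives back U+1F4A9.
var text = new TextDecoder('utf-8').decode(toByteString(value));
console.log(text === '\ud83d\udca9'); // true
```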

Other solutions to this problem may be considered in the future. Some of them have been listed below, with their drawbacks:

  1. A mode that instead treats the input as if it were decoded according to ISO-8859-1 (or the x-user-defined encoding) and rejects code points that cannot appear in that encoding; may be useful for source code in encodings other than UTF-8
    • Still tricky to get semantics correctly
    • x-user-defined cannot take advantage of compact representation of ISO-8859-1 strings in certain JavaScript engines
  2. Using an ArrayBuffer or Uint8Array for source code and/or string literals
    • May fail to be portable to older JavaScript engines
    • Cannot be (directly) serialised as JSON
    • Values of those types are fixed-length, which makes manipulation cumbersome; they cannot be incrementally built by appending.
    • They cannot be used as keys in objects; one has to use Map and WeakMap instead
  3. Using a plain Array of numbers in the range [0, 256)
    • May be memory-inefficient in naïve JavaScript engines
    • May bloat the JSON serialisation considerably
    • Cannot be used as keys in objects either
  4. Storing string literal values as ordinary String values, and requiring that escape sequences in literals constitute well-formed UTF-8; an exception is thrown if they do not
    • UTF-8 chauvinism; imposes semantics that may be unwanted
    • Reduced compatibility with other Lua implementations
  5. Like above, but instead of throwing an exception, ill-formed escapes are transformed to unpaired surrogates, just like Python's surrogateescape encoding error handler
    • UTF-8 chauvinism, though to a lesser extent
    • Destroys the property that ("\xc4" .. "\x99") == "\xc4\x99"
    • If the AST is encoded in JSON, some JSON libraries may refuse to parse it

Cf. discussion under c05822d.

Reject newlines between an expression and opening parenthesis

PUC Lua 5.1 (and LuaJIT without the LUAJIT_ENABLE_LUA52COMPAT option) rejects code in which an expression is followed by a newline and an opening parenthesis. This is to avoid a parsing ambiguity discussed in the Lua 5.2 manual, §3.3.1, which Lua 5.2 and later instead resolve by introducing an optional explicit statement terminator, ;. (One can also use do...end, which also works in Lua 5.1.)

Currently, luaparse accepts such code with Lua 5.2 semantics (that is, interprets it as a function call); in Lua 5.1 mode, which is the default, it should probably be rejected instead.

`StringCallExpression` needs a `rawArgument` property (or something similar)

StringLiterals get a .raw property that contains an escaped version of the string.

E.g., the following Lua code:

x="\n"

…translates to the following AST:

[ { type: 'Chunk',
    body:
     [ { type: 'AssignmentStatement',
         variables: [ { type: 'Identifier', name: 'x' } ],
         init:
          [ { type: 'Literal',
              value: '\n',
              raw: '"\\n"' } ] } ], // ← this is awesome
    comments: [] } ]

However, StringCallExpressions lack such a property. E.g.:

f"\n"

…translates to:

[ { type: 'Chunk',
    body:
     [ { type: 'CallStatement',
         expression:
          { type: 'StringCallExpression',
            base: { type: 'Identifier', name: 'f' },
            argument: '\n' } } ],
    comments: [] } ]

Any chance this could be added, either as a separate rawArgument property, or (and this would be even better IMHO) by making the argument property more like a StringLiteral object?

Make it possible to tweak the `defaultOptions` in the `luaparse` binary

It would be nice if the luaparse binary accepted shell arguments to enable/disable the defaultOptions settings:

  var defaultOptions = exports.defaultOptions = {
    // Explicitly tell the parser when the input ends.
      wait: false
    // Store comments as an array in the chunk object.
    , comments: true
    // Track identifier scopes by adding an isLocal attribute to each
    // identifier-node.
    , scope: false
  };
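As a rough idea of what this could look like, the sketch below maps --name and --no-name arguments onto boolean options. applyFlags is hypothetical and is not the code used by the luaparse binary; the flag spelling is an assumption.

```javascript
// Hypothetical flag parser: copies the defaults, then lets `--name` and
// `--no-name` switch any option whose default is a boolean.
function applyFlags(argv, defaults) {
  var options = {};
  Object.keys(defaults).forEach(function(key) {
    options[key] = defaults[key];
  });
  argv.forEach(function(arg) {
    var match = /^--(no-)?([a-z]+)$/.exec(arg);
    if (match && typeof defaults[match[2]] === 'boolean') {
      options[match[2]] = !match[1]; // `--no-x` disables, `--x` enables
    }
  });
  return options;
}

var defaults = { wait: false, comments: true, scope: false };
var parsed = applyFlags(['--no-comments', '--scope'], defaults);
console.log(parsed);
```

Arguments that do not name a boolean option are ignored here; a real binary would more likely report them as unknown.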

Tag each release

This would enable installing luaparse using Bower. Just run these commands (I’ve looked up the commit hashes for you):

git tag -a v0.0.1 517fd5867b2a5be5fa0d549738812fec8fd98d48
git tag -a v0.0.2 ab62aae1a66e944585ee6b018ba78b31cca2ff31
git tag -a v0.0.3 308cc2de0f6fb9ac731c746c11591ec24f38d47d
git tag -a v0.0.4 299db4c477d72af94368f496e1fd5a6f4f478945
git tag -a v0.0.5 5af52528c4e44142353d5ea1c60eaf368d5977e3
git tag -a v0.0.6 df8fa49102f1b40ed9d074912c6a535266b45285
git tag -a v0.0.7 52f0be2fa26d63414d23472e9e912d63711941ca
git tag -a v0.0.8 609a4e76b2e6ab6d5669ad8043fa2da4e352dbb4
git tag -a v0.0.9 be14df51606ae49111bf8a0f95fa01acea3d4aa0
git tag -a v0.0.10 762d1669a0bae61fc41be4af4e42d7d8867e671d
git tag -a v0.0.11 68a110fb7fb656775c4f406b58ab8958ceea1136
git push --tags

Multiple independent parsers / reentrancy

Currently, this code works:

luaparse.parse({ wait: true }).write('foo = "');
console.info(luaparse.parse('bar"'));

It prints out:

{
	"type": "Chunk",
	"body": [
		{
			"type": "AssignmentStatement",
			"variables": [
				{
					"type": "Identifier",
					"name": "foo"
				}
			],
			"init": [
				{
					"type": "StringLiteral",
					"value": "bar",
					"raw": "\"bar\""
				}
			]
		}
	],
	"comments": []
}

This is because the library maintains a single lexer and parser state shared between invocations of the parse function; there is no way to concurrently parse multiple Lua scripts. Code that expects each .parse({ wait: true }) to create a new parser independent of any previously created one is in for a nasty surprise.

There should be a way to create multiple isolated parser states. This will probably necessitate quite an invasive rewrite, and may break some backwards compatibility unless it is done through separate API calls. Then again, the sort of code that relies on non-reentrancy is not one I wish to personally support.

Incorrect uses of goto are not detected

I think it makes sense for luaparse to throw these errors. Lua throws them at compile time too.

Examples with respective errors

Tests from the goto test file in the Lua repo:

goto l1; do ::l1:: end
do ::l1:: end goto l1;

Error: no visible label 'l1' for <goto>


::l1:: ::l1::

Error: label 'l1' already defined


goto l1; local aa ::l1:: ::l2:: print(3)
do local bb, cc; goto l1; end; local aa; ::l1:: print(3)

Error: <goto l1> at line 1 jumps into the scope of local 'aa'


repeat
    if x then goto cont end
    local xuxu = 10
    ::cont::
  until xuxu < x

Error: <goto cont> at line 2 jumps into the scope of local 'xuxu'


Quote from the Lua manual regarding this:

A label is visible in the entire block where it is defined, except inside nested blocks where a label with the same name is defined and inside nested functions. A goto may jump to any visible label as long as it does not enter into the scope of a local variable.


I am currently looking into implementing this in my own project based on the AST generated by luaparse. I can start a PR for it here as soon as I have something working.
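As a starting point, the simplest of these checks (a label defined twice in the same block) could be sketched as follows. The AST is written by hand for ::l1:: ::l1::, the LabelStatement shape is an assumption about luaparse's output, and duplicateLabels is a hypothetical helper. The visibility and local-scope rules are not covered.

```javascript
// Hand-written AST for `::l1:: ::l1::`. The LabelStatement shape
// ({ type, label: Identifier }) is an assumption about luaparse's output.
var chunk = {
  type: 'Chunk',
  body: [
    { type: 'LabelStatement', label: { type: 'Identifier', name: 'l1' } },
    { type: 'LabelStatement', label: { type: 'Identifier', name: 'l1' } }
  ]
};

// Hypothetical check: report labels defined more than once in one block.
// Nested blocks and jumps into the scope of a local are not handled.
function duplicateLabels(body) {
  var seen = Object.create(null);
  var errors = [];
  body.forEach(function(statement) {
    if (statement.type !== 'LabelStatement') return;
    var name = statement.label.name;
    if (seen[name]) errors.push("label '" + name + "' already defined");
    seen[name] = true;
  });
  return errors;
}

var labelErrors = duplicateLabels(chunk.body);
console.log(labelErrors);
```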

Upgrade the build environment

This project has hardly been keen on keeping up with the latest developments in the JavaScript world. As a result, the test suite has experienced some failures as of late. In particular, the spec repository referred to in the dependency list has disappeared from GitHub; as a stop-gap measure, I switched to version 1.0.1 from npmjs.

An overhaul of the build process is probably long overdue. Suggestions are welcome on how to replace deprecated packages without breaking everything and rewriting the project from scratch.

Consider adding a `-c` option to the `luaparse` binary

Currently, the luaparse binary accepts either file names or Lua scripts (strings) as arguments:

luaparse 'file' # file name
luaparse 'a = 42' # code

What do you think about adding an explicit option for passing a Lua string (e.g. -c | --code)?

Currently, it’s impossible to parse the following Lua code using the binary without storing it in a file first:

--foo

This is because the shell command to parse this code would be:

$ luaparse "--foo"
Unknown option: --foo

Of course, there are many other issues that could potentially occur due to the argument overloading. (For this reason, I’ve removed the argument overloading option entirely from the luamin binary.)

Look for performance improvements

This issue is to keep track of possible performance improvements, debunked or not.

  • ArrayBuffer: Converting it back to a string makes it slower (no improvement).
  • charCodeAt in isKeyword: Function won't get inlined because of its size (no improvement).
  • Prototype lookup instead of inArray for scope tracking: Not sure exactly why it's slower but it is significantly slower (no improvement).

asm.js techniques

  • Casting arguments (no improvement).
  • Actually using asm.js: Not really possible

Reflect parentheses that discard extra values in the AST

Currently, the latest released version does not distinguish e.g. f() from (f()) in the AST. These expressions are not equivalent; the former evaluates to all the return values of f(), and the latter evaluates to only the first of them (the rest are discarded). Current git master distinguishes them by the inParens field, but that is just a workaround for #24 and I expect to remove it when that issue is fixed properly.

A more principled approach to distinguishing these expressions is desired. I may keep the inParens field, or I may introduce a new kind of node that expresses the operation of discarding extra values.

Usage of '...' outside a vararg function is not throwing an error

For example:

x=function()print(...)end x()
function y()print(...)end y()
_G['z']=function()print(...)end z()

The above are parsed "correctly" by luaparse; however, Lua rejects them with the following error:

cannot use '...' outside a vararg function near '...'

The following is valid:

print(...) -- in the root scope, probably to obtain arguments on command-line

I think this behaviour is documented.
Would this be hard to fix? (Not sure.)
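One possible approach is a pass over the finished AST. The sketch below is an assumption-laden illustration rather than a proposed patch: the AST for x = function() print(...) end is written by hand, the node shapes (FunctionDeclaration with a parameters array, VarargLiteral) are assumed to match luaparse's output, and countIllegalVarargs is a hypothetical helper.

```javascript
// Hand-written AST for `function() print(...) end`; node shapes are
// assumptions modelled on luaparse's output.
var fn = {
  type: 'FunctionDeclaration',
  identifier: null,
  parameters: [],
  body: [{
    type: 'CallStatement',
    expression: {
      type: 'CallExpression',
      base: { type: 'Identifier', name: 'print' },
      arguments: [{ type: 'VarargLiteral', value: '...', raw: '...' }]
    }
  }]
};

// Hypothetical check: count uses of `...` inside a function whose
// parameter list does not itself declare `...`.
function countIllegalVarargs(node, inVarargFunction) {
  if (Array.isArray(node)) {
    return node.reduce(function(sum, child) {
      return sum + countIllegalVarargs(child, inVarargFunction);
    }, 0);
  }
  if (!node || typeof node !== 'object') return 0;
  if (node.type === 'VarargLiteral') return inVarargFunction ? 0 : 1;
  if (node.type === 'FunctionDeclaration') {
    var declaresVararg = node.parameters.some(function(parameter) {
      return parameter.type === 'VarargLiteral';
    });
    return countIllegalVarargs(node.body, declaresVararg);
  }
  return Object.keys(node).reduce(function(sum, key) {
    return sum + countIllegalVarargs(node[key], inVarargFunction);
  }, 0);
}

// The main chunk is itself vararg, so the walk starts with `true`.
console.log(countIllegalVarargs(fn, true)); // 1
```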

Is `IfClause` missing, or is it `ElseifClause` by design?

E.g. if (true) then print(x) end results in an ElseifClause in the AST. Is this intentional, or should it be IfClause instead?

IMHO it would make sense to have a separate IfClause type, since ElseClause and ElseifClause are exposed separately too — but I may be missing something here.

`Literal` is too generic

Currently, all kinds of literals have type: 'Literal' (well, except for VarargLiteral). How about being more specific about what kind of Literal it is? E.g. StringLiteral, NumericLiteral, BooleanLiteral, NilLiteral, etc.

This would allow for more fine-grained beautification/minification.

Thoughts?

Keep track of local/global variables somehow?

We briefly discussed this on Twitter already but I’d like to move the discussion here.

What are your thoughts on exposing the names of local variables for each scope?

For example, Identifier nodes could get a boolean local or isLocal property that is true if they had previously been declared using a LocalStatement within that block, or the other way around, using a global / isGlobal property.

Or do you think this information just doesn’t belong in an AST?

Tolerant error handling

Implement a mode where errors are stored instead of thrown. Necessary recovery functions should be exposed so that users can hook into it.

I don't think the parser itself should include recovery functionality, but there should definitely be an example available.

Yet more misparsing: invalid parentheses in call statements and assignments

Each of the lines that follow is a syntax error, yet luaparse does not recognise these as such:

(a.b) = 0
(a) = 0
a:b() = 0
a() = 0
a.b:c() = 0
a[b]() = 0
(0) = 0
a, b() = 0, 0

Note, however, that the following are not (syntactical, at least) errors:

a().b = 1
({})[b] = 1
a""[b] = 1
a{}[b] = 1
({{}})[a][b] = 1
(a).b = 1
(1).a = 2 -- runtime error, unless you use debug.setmetatable

Relevant portions of the Lua 5.2 manual (essentially identical in Lua 5.0 and 5.1, and to 5.3-work2): §3.2 "Variables", §3.3.3 "Assignment", §9 "The Complete Syntax of Lua"
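The rule behind both lists is the var production of the Lua grammar: an assignment target must be a name, an index expression or a member expression, and must not itself be wrapped in parentheses. A minimal sketch of that test on AST nodes follows; isAssignable is hypothetical, the node objects are written by hand, and inParens is assumed to mark a parenthesised expression.

```javascript
// Hypothetical check based on the `var` production of the Lua grammar:
//   var ::= Name | prefixexp '[' exp ']' | prefixexp '.' Name
// `inParens` is assumed to mark an expression wrapped in parentheses.
function isAssignable(node) {
  if (node.inParens) return false;
  return node.type === 'Identifier' ||
         node.type === 'IndexExpression' ||
         node.type === 'MemberExpression';
}

var a = { type: 'Identifier', name: 'a' };
var b = { type: 'Identifier', name: 'b' };

// (a) = 0 : a parenthesised name is not a valid target
console.log(isAssignable({ type: 'Identifier', name: 'a', inParens: true })); // false

// a() = 0 : a call result is not a valid target
console.log(isAssignable({ type: 'CallExpression', base: a, arguments: [] })); // false

// (a).b = 1 : member access on a parenthesised base is fine
console.log(isAssignable({
  type: 'MemberExpression',
  indexer: '.',
  identifier: b,
  base: { type: 'Identifier', name: 'a', inParens: true }
})); // true
```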

Revamp QA suite

A lot of the QA suite was experimentation on my part, but I think it could be loosened now.

  • Drop support (or at least testing) for ancient JavaScript engines.
  • Drop complexity analysis. Or just print it without failing.
  • I think we should still keep test coverage analysis and fail if there's a reduction.
  • Maybe drop jshint and adopt prettier so contributors don't need to care about code standards. To me it doesn't matter which convention we use, as long as there is one.
  • Scaffolding tests should be easier and better documented.
  • I have nothing against spec as a testing framework, but if we drop support for ancient JavaScript engines this is also open for discussion, e.g. ava, which is async. But as spec works and is fast enough in my opinion, I don't see much worth in changing it.
  • Definitely drop testem; unless it's easy to add browser testing, I think Node testing would be enough for luaparse.
  • Is it possible to drop the UMD wrapper and use some library that wraps everything during the build step? Luaparse shouldn't be keeping it up to date.

Thoughts?

Wrong locations with LF line endings on Windows

code:

--[[


THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

]]

local json = {}
 

With LF line endings, the reported line is 12 (the correct line is 14):

{
    "type": "Chunk",
    "body": [
        {
            "type": "LocalStatement",
            "variables": [
                {
                    "type": "Identifier",
                    "name": "json",
                    "loc": {
                        "start": {
                            "line": 12,
                            "column": 6
                        },
                        "end": {
                            "line": 12,
                            "column": 10
                        }
                    },
                    "isLocal": true
                }
            ],
            "init": [
                {
                    "type": "TableConstructorExpression",
                    "fields": [],
                    "loc": {
                        "start": {
                            "line": 12,
                            "column": 13
                        },
                        "end": {
                            "line": 12,
                            "column": 15
                        }
                    }
                }
            ],
            "loc": {
                "start": {
                    "line": 12,
                    "column": 0
                },
                "end": {
                    "line": 12,
                    "column": 15
                }
            }
        }
    ],
    "loc": {
        "start": {
            "line": 12,
            "column": 0
        },
        "end": {
            "line": 12,
            "column": 15
        }
    },
    "globals": []
}

With CRLF line endings, the locations are correct.

System: Windows 10, version 1803
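
To cross-check a reported line independently of the parser, a small line counter can help. The sketch below is not luaparse code; `lineAt` is a hypothetical helper that treats CRLF, a lone LF and a lone CR each as one line break, and the input mirrors the layout of the sample above with placeholder text.

```javascript
// Hypothetical helper: 1-based line number of the character at `offset`.
// "\r\n" counts as a single break; a lone "\n" or "\r" counts as one as well.
function lineAt(source, offset) {
  let line = 1;
  for (let i = 0; i < offset && i < source.length; i++) {
    const ch = source[i];
    if (ch === '\r') {
      if (source[i + 1] === '\n') i++; // consume the LF of a CRLF pair
      line++;
    } else if (ch === '\n') {
      line++;
    }
  }
  return line;
}

// Same layout as the sample: comment opener, two blank lines, seven text
// lines, a blank line, the closer, a blank line, then the statement.
const lines = ['--[[', '', ''].concat(Array(7).fill('text'), ['', ']]', '', 'local json = {}']);
const lf = lines.join('\n');
const crlf = lines.join('\r\n');

console.log(lineAt(lf, lf.indexOf('local')));     // 14
console.log(lineAt(crlf, crlf.indexOf('local'))); // 14
```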

Fix typos

There are a couple of misspellings/typos in the comments.
These can be easily noticed and fixed.
I think I will make a PR for this.

Thank you for sharing!

This parser is really great. As someone familiar with The ESTree spec, I feel right at home. Thank you for building it and for sharing your contribution! ❤️

variable with negative Integer/Number gives UnaryExpression

Sample Line:

priority = -1

Currently working around it with the following:

switch (content.type) {
    // Plain numbers
    case 'NumericLiteral':
        value = parseInt(content.value, 10);
        break;

    // Negative numbers arrive as a unary minus applied to a literal
    case 'UnaryExpression':
        if (content.operator === '-') {
            value = -parseInt(content.argument.value, 10);
        }
        break;
}
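
For context, Lua's grammar has no negative numeric literals: `-1` is the unary minus operator applied to the literal `1`, so a UnaryExpression is the expected shape here. A consumer that wants plain numbers can fold the two cases together. The sketch below uses hand-built nodes; `numericValue` is a hypothetical helper, and it assumes that NumericLiteral nodes carry an already-parsed number in `value`.

```javascript
// Hypothetical helper: numeric value of a literal or a negated literal,
// or undefined for any other kind of node.
// Assumption: NumericLiteral.value is already a JavaScript number.
function numericValue(node) {
  if (node.type === 'NumericLiteral') return node.value;
  if (node.type === 'UnaryExpression' && node.operator === '-') {
    const inner = numericValue(node.argument);
    return inner === undefined ? undefined : -inner;
  }
  return undefined;
}

// Hand-built node for the right-hand side of `priority = -1`
const minusOne = {
  type: 'UnaryExpression',
  operator: '-',
  argument: { type: 'NumericLiteral', value: 1, raw: '1' }
};

console.log(numericValue(minusOne)); // -1
```

Unlike the `parseInt` workaround, this keeps fractional values such as `-1.5` intact.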

Track locations

Figure out how to track locations without impacting performance when disabled.

  1. Wrap all parse functions with a tracking function -- minor refactoring required. This might be slow because of arguments usage.
  2. Inline it with if checks -- This would be a lot faster but poses a problem due to ast node-functions.
  3. Function compilation -- The initial compilation cost is too expensive really.
  4. Still looking for that 4th solution...
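
As a rough illustration of option 1, the wrapper can be chosen once at setup time, so that nothing extra runs per call when tracking is disabled. Everything here is hypothetical (`withLocation`, the `state` object, `parseName`), and whether this is fast enough in practice has not been measured.

```javascript
// Hypothetical sketch of option 1. `state` stands in for the lexer position.
// Rest parameters are used instead of the `arguments` object.
function withLocation(parseFn, state, enabled) {
  if (!enabled) return parseFn; // disabled: hand back the original function
  return function (...args) {
    const start = { line: state.line, column: state.column };
    const node = parseFn(...args);
    node.loc = { start, end: { line: state.line, column: state.column } };
    return node;
  };
}

// Toy parse function that "consumes" three characters
const state = { line: 1, column: 0 };
function parseName() {
  state.column += 3;
  return { type: 'Identifier', name: 'foo' };
}

const tracked = withLocation(parseName, state, true);
console.log(JSON.stringify(tracked()));
```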

Parse error

I was testing some real-world™ Lua libraries (to see how well luaparse/luamin perform) and came across this issue.

luaparse fails to parse this piece of code, even though lua file.lua works fine (as in, it doesn’t throw an error and exits with status 0).

foo.lua:

function SetSetting(SETTING_PATH, SETTING_VALUE)
    assert(([[string]]):find(type(SETTING_PATH)), sprintf([[bad argument #1 to 'System.SetSetting' (string expected, got %s)]], type(SETTING_PATH)))
    assert(([[string]]):find(type(SETTING_VALUE)), sprintf([[bad argument #2 to 'System.SetSetting' (string expected, got %s)]], type(SETTING_VALUE)))

    local SETTING_PATH_PARTS = (((SETTING_PATH:gsub([[%\%\]], [[%\]])):gsub([[%/%/]], [[%/]])):gsub([[%/]], [[%\]])):explode([[\]])

    if (not (SETTING_PATH_PARTS[1] == [[Settings]])) then
        table.insert(SETTING_PATH_PARTS, 1, [[Settings]])
    end

    return setsettings(table.concat(SETTING_PATH_PARTS, [[\]]), SETTING_VALUE)
end

$ lua foo.lua
$ luaparse foo.lua

/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:386
    throw error;
          ^
SyntaxError: [7:50] ')' expected near 'then'
    at raise (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:375:15)
    at expect (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:953:10)
    at parsePrefixExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1663:7)
    at parseSubExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1621:22)
    at parseExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1562:22)
    at parseLocalStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1359:45)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1145:41)
    at parseBlock (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1125:19)
    at parseFunctionDeclaration (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1477:16)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1150:18)

(Note: this piece of code is part of a ~6300 LOC Lua library. If you want I can post it here.)

Rename arguments in TableCallExpression node to argument

TableCallExpression currently stores the function argument in the property arguments. This is a minor inconsistency, as a property with that name is expected to be an array. It should be renamed to argument, as there will only ever be one. This also makes it consistent with StringCallExpression.
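
Until the rename happens, consumers can hide the difference behind a small accessor. The sketch below is hypothetical (`callArguments` is not part of luaparse) and assumes the node shapes described in this issue: an array on CallExpression, a single node on the other two.

```javascript
// Hypothetical helper: the arguments of any call node, always as an array.
function callArguments(node) {
  switch (node.type) {
    case 'CallExpression':
      return node.arguments;
    case 'TableCallExpression':
      // Accept both the current `arguments` and the proposed `argument`.
      return [node.argument !== undefined ? node.argument : node.arguments];
    case 'StringCallExpression':
      return [node.argument];
    default:
      throw new TypeError('not a call node: ' + node.type);
  }
}

// Hand-built node for `f{}` in the current shape
const table = { type: 'TableConstructorExpression', fields: [] };
const tableCall = {
  type: 'TableCallExpression',
  base: { type: 'Identifier', name: 'f' },
  arguments: table
};

console.log(callArguments(tableCall).length); // 1
```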

Locations of long string literals

Hi,

Thanks for the great library, I've depended on this a few times and it has always worked superbly well.

However, I've come across an issue in the locations with long string literals. It seems that when the input expression spans many lines, the resulting column value for that expression is negative. For example:

const parser = require('luaparse');
const lua = 'local a = [[hello\nworld]]';
const ast = parser.parse(lua, { locations: true });

console.log(JSON.stringify(ast));

Results in:

{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"a","loc":{"start":{"line":1,"column":6},"end":{"line":1,"column":7}}}],"init":[{"type":"StringLiteral","value":"hello\nworld","raw":"[[hello\nworld]]","loc":{"start":{"line":2,"column":-8},"end":{"line":2,"column":7}}}],"loc":{"start":{"line":1,"column":0},"end":{"line":2,"column":7}}}],"loc":{"start":{"line":1,"column":0},"end":{"line":2,"column":7}},"comments":[]}

Note the "column":-8 in the middle.

I'll look into this sometime this week and open a PR if I can. However, if a possible cause pops into your head or you can give me any pointers in the meantime, that'd be great.

Cheers.
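
One observation that may help narrow it down: the reported values are what you would get if the start offset of the string were measured from the beginning of the second line instead of the first. The arithmetic below only illustrates that; it does not confirm where in luaparse this happens. `offsetToLoc` is a hypothetical helper using the same convention as the output above (1-based line, 0-based column).

```javascript
// Hypothetical helper: converts a character offset into { line, column }.
function offsetToLoc(source, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

const lua = 'local a = [[hello\nworld]]';
const start = lua.indexOf('[[');               // 10
const secondLineStart = lua.indexOf('\n') + 1; // 18

console.log(offsetToLoc(lua, start));      // { line: 1, column: 10 }
console.log(offsetToLoc(lua, lua.length)); // { line: 2, column: 7 }
console.log(start - secondLineStart);      // -8
```

So the expected start would be line 1, column 10, while the end location (line 2, column 7) already matches.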

Parse error caused by inline block comments `--[[comment--]]`

function foo()
  return a >= b --[[and x--]] and b > 0
end

$ luaparse test.lua

/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:386
    throw error;
          ^
SyntaxError: [2:30] 'end' expected near 'nd'
    at raise (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:375:15)
    at expect (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:953:10)
    at parseFunctionDeclaration (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1478:5)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1150:18)
    at parseBlock (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1125:19)
    at parseChunk (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1102:16)
    at end (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1826:17)
    at Object.parse (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1803:31)
    at /usr/local/share/npm/lib/node_modules/luaparse/bin/luaparse:53:22
    at Array.forEach (native)
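
For reference, in Lua 5.1 and later a long comment opened by `--[[` ends at the first `]]`, so the `--` right before the closer is just comment text and the rest of the line is code. A minimal sketch of that rule, independent of luaparse's actual lexer (`skipComment` is a hypothetical helper):

```javascript
// Hypothetical helper: given the index of a comment's leading "--", returns
// the index just past the comment. "--[" + N "=" + "[" opens a long comment
// that ends at the matching "]" + N "=" + "]"; anything else is a line
// comment that runs up to (not including) the next newline.
function skipComment(source, index) {
  const i = index + 2; // skip "--"
  if (source[i] === '[') {
    let level = 0;
    let j = i + 1;
    while (source[j] === '=') { level++; j++; }
    if (source[j] === '[') {
      const close = ']' + '='.repeat(level) + ']';
      const end = source.indexOf(close, j + 1);
      if (end === -1) throw new Error('unfinished long comment');
      return end + close.length;
    }
  }
  const newline = source.indexOf('\n', i);
  return newline === -1 ? source.length : newline;
}

const line = '  return a >= b --[[and x--]] and b > 0';
const rest = line.slice(skipComment(line, line.indexOf('--')));
console.log(JSON.stringify(rest)); // " and b > 0"
```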

Reject break statements outside loops

There is a disabled expected-failure testcase in the test/scaffolding/functions file:

function a(p) break end                 -- FAIL

Currently, it parses successfully; although it matches the basic recursive grammar, it semantically makes no sense and is rejected by all Lua implementations at compilation stage (although apparently only after the entire body is parsed). It would probably make sense to reject it here too.
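
One possible shape for such a check, sketched as a walk over the finished AST rather than a change to the parser itself. `findStrayBreaks` is a hypothetical helper; it assumes the usual luaparse node type names and resets the loop context at every function body.

```javascript
const LOOPS = new Set([
  'WhileStatement', 'RepeatStatement', 'ForNumericStatement', 'ForGenericStatement'
]);

// Hypothetical checker: collects every BreakStatement that is not enclosed
// by a loop within the same function.
function findStrayBreaks(node, insideLoop = false, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => findStrayBreaks(child, insideLoop, found));
  } else if (node && typeof node === 'object') {
    if (node.type === 'BreakStatement' && !insideLoop) found.push(node);
    let next = insideLoop;
    if (LOOPS.has(node.type)) next = true;
    if (node.type === 'FunctionDeclaration') next = false; // function body starts fresh
    Object.keys(node).forEach(key => findStrayBreaks(node[key], next, found));
  }
  return found;
}

// function a(p) break end
const bad = {
  type: 'FunctionDeclaration',
  identifier: { type: 'Identifier', name: 'a' },
  parameters: [{ type: 'Identifier', name: 'p' }],
  body: [{ type: 'BreakStatement' }]
};
// while true do break end
const good = {
  type: 'WhileStatement',
  condition: { type: 'BooleanLiteral', value: true },
  body: [{ type: 'BreakStatement' }]
};

console.log(findStrayBreaks(bad).length, findStrayBreaks(good).length); // 1 0
```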

Change parse error type from SyntaxError to something else

The SyntaxError type is reserved for errors raised by the host runtime (i.e. JavaScript), not errors raised by user code. Using SyntaxError to signal errors in inputs to the Lua parser causes problems in some engines, like the one worked around in 8536fdf.

Fixing this will be a breaking change, given that some downstream users already rely on the error to be an instance of SyntaxError, e.g. the ACE editor.

Related PR: #33.
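
A dedicated error type might look roughly like this. The class name and fields are purely illustrative and not a decided API; the message format is copied from the errors luaparse prints today.

```javascript
// Hypothetical error type for Lua parse failures.
class LuaParseError extends Error {
  constructor(message, line, column) {
    super('[' + line + ':' + column + '] ' + message);
    this.name = 'LuaParseError';
    this.line = line;
    this.column = column;
  }
}

try {
  throw new LuaParseError("')' expected near 'then'", 7, 50);
} catch (e) {
  console.log(e instanceof LuaParseError, e instanceof SyntaxError); // true false
  console.log(e.message); // [7:50] ')' expected near 'then'
}
```

The second check is where the breakage comes from: downstream code that tests `instanceof SyntaxError` would stop matching.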
