
fstirlitz / luaparse


A Lua parser written in JavaScript

Home Page: https://fstirlitz.github.io/luaparse/

License: MIT License

Makefile 0.14% Lua 3.91% Shell 0.28% JavaScript 95.29% HTML 0.24% CSS 0.14%
lua lua-parser ast javascript-library luaparse javascript

luaparse's People

Contributors

dapetcu21, dependabot[bot], fstirlitz, mathiasbynens, oxyc, simsaens

luaparse's Issues

`Literal` is too generic

Currently, all kinds of literals have type: 'Literal' (well, except for VarargLiteral). How about being more specific about what kind of Literal it is? E.g. StringLiteral, NumericLiteral, BooleanLiteral, NilLiteral, etc.

This would allow for more fine-grained beautification/minification.

Thoughts?
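Until dedicated node types exist, a consumer can recover the kind of literal from the JavaScript type of its value. A minimal sketch, assuming the old `Literal` node stores strings, numbers and booleans as the matching JS types and nil as `null`; the helper name `specificLiteralType` is hypothetical.

```javascript
// Hypothetical helper: map a generic Literal node to a more specific type name
// by inspecting the JavaScript type of its value (assumes nil is stored as null).
function specificLiteralType(node) {
  if (node.type !== 'Literal') return node.type; // e.g. VarargLiteral is left as-is
  if (node.value === null) return 'NilLiteral';
  switch (typeof node.value) {
    case 'string': return 'StringLiteral';
    case 'number': return 'NumericLiteral';
    case 'boolean': return 'BooleanLiteral';
    default: throw new Error('unexpected literal value: ' + String(node.value));
  }
}

console.log(specificLiteralType({ type: 'Literal', value: 'x', raw: '"x"' })); // StringLiteral
console.log(specificLiteralType({ type: 'Literal', value: null, raw: 'nil' })); // NilLiteral
```

This only works because each Lua literal kind happens to map onto a distinct JS type; a separate node type per kind, as proposed, removes that dependency.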

Unterminated (single-line) string literal does not raise an error

Samples

luaparse.parse("s='");
luaparse.parse('s="');
luaparse.parse("s='\\\n");
luaparse.parse('s="\\\n');

(An unterminated multi-line string seems to raise an exception, except when luaparse is run from the command line.)

From the command line:

>luaparse -c "local s=\""
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"\""}]}],"comments":[]}

>luaparse -c "local s='"
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"'"}]}],"comments":[]}

>luaparse -c "local s=[["
{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"s"}],"init":[{"type":"StringLiteral","value":"","raw":"[["}]}],"comments":[]}

Rundown

The code path leads to parsePrimaryExpression(), which calls next(), which in turn calls lex(), appending an EOF token.
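A simplified sketch of a quoted-string scanner that raises an error when it reaches the end of input (or a raw line break) before the closing quote. The function `scanQuotedString` and its error messages are hypothetical, and only a handful of escapes are decoded; a real lexer handles the full Lua escape set.

```javascript
// Scan a Lua short string starting at input[start] (which must be ' or ").
// Returns the decoded value and the index just past the closing quote.
function scanQuotedString(input, start) {
  const quote = input[start];
  const simpleEscapes = { n: '\n', t: '\t', '\\': '\\', '"': '"', "'": "'" };
  let value = '';
  let i = start + 1;
  for (;;) {
    if (i >= input.length) {
      throw new Error('unfinished string near ' + input.slice(start)); // end of input
    }
    const ch = input[i];
    if (ch === quote) return { value, end: i + 1 };
    if (ch === '\n' || ch === '\r') {
      throw new Error('unfinished string near ' + input.slice(start, i)); // raw line break
    }
    if (ch === '\\') {
      const next = input[i + 1];
      if (next === undefined) {
        throw new Error('unfinished string near ' + input.slice(start)); // "\" then EOF
      }
      if (next === '\n') {
        value += '\n'; // backslash-newline is an escaped line break in Lua
      } else if (Object.prototype.hasOwnProperty.call(simpleEscapes, next)) {
        value += simpleEscapes[next];
      } else {
        throw new Error('escape \\' + next + ' not handled in this sketch');
      }
      i += 2;
      continue;
    }
    value += ch;
    i += 1;
  }
}

try {
  scanQuotedString("'", 0); // the s=' case from the samples
} catch (e) {
  console.log(e.message); // unfinished string near '
}
```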

Rename arguments in TableCallExpression node to argument

TableCallExpression currently puts the function argument in the property arguments, which leads to a minor inconsistency, since that property is expected to be an array. It should be renamed to argument, as there will only ever be one. This would also make it consistent with StringCallExpression.

Unicode Issue

It appears luaparse has trouble parsing non-English characters such as í. I have implemented a temporary fix, but I thought I should let you know. If I have time to prepare a permanent solution, I'll open a pull request.

Error:

SyntaxError: [1:240] unexpected symbol 'í' near 'Palad'

Change parse error type from SyntaxError to something else

The SyntaxError type is reserved for errors raised by the host runtime (i.e. JavaScript), not errors raised by user code. Using SyntaxError to signal errors in inputs to the Lua parser causes problems in some engines, like the one worked around in 8536fdf.

Fixing this will be a breaking change, given that some downstream users already rely on the error to be an instance of SyntaxError, e.g. the ACE editor.

Related PR: #33.
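One possible shape for a dedicated error type, sketched under the assumption that the parser records the error position as index, line and column properties; the class name `LuaParseError` is hypothetical. Because it extends `Error` rather than `SyntaxError`, downstream `instanceof SyntaxError` checks would stop matching, which is the breaking change described above.

```javascript
// Hypothetical error type for problems in the Lua input, distinct from
// the host runtime's SyntaxError.
class LuaParseError extends Error {
  constructor(message, index, line, column) {
    super('[' + line + ':' + column + '] ' + message);
    this.name = 'LuaParseError';
    this.index = index;   // offset into the source string
    this.line = line;     // line number as reported by the parser
    this.column = column; // column as reported by the parser
  }
}

try {
  throw new LuaParseError('unexpected symbol', 5, 1, 5);
} catch (e) {
  console.log(e instanceof LuaParseError, e instanceof SyntaxError); // true false
  console.log(e.message); // [1:5] unexpected symbol
}
```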

Reflect parentheses that discard extra values in the AST

Currently, the latest released version does not distinguish e.g. f() from (f()) in the AST. These expressions are not equivalent; the former evaluates to all the return values of f(), and the latter evaluates to only the first of them (the rest are discarded). Current git master distinguishes them by the inParens field, but that is just a workaround for #24 and I expect to remove it when that issue is fixed properly.

A more principled approach to distinguishing these expressions is desired. I may keep the inParens field, or I may introduce a new kind of node that expresses the operation of discarding extra values.
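To make the distinction concrete: only call expressions and ... can produce more than one value, and wrapping them in parentheses truncates the result to one. A sketch of the check a consumer has to perform today, assuming the inParens flag from git master is present on parenthesised nodes; the helper name `mayYieldMultipleValues` is hypothetical.

```javascript
// Node types that can evaluate to more than one value in Lua.
const MULTI_VALUE_TYPES = new Set([
  'CallExpression', 'StringCallExpression', 'TableCallExpression', 'VarargLiteral',
]);

// True for f() or ..., which may expand to several values;
// false for (f()) or (...), which are truncated to their first value.
function mayYieldMultipleValues(node) {
  return MULTI_VALUE_TYPES.has(node.type) && !node.inParens;
}

const call = { type: 'CallExpression', base: { type: 'Identifier', name: 'f' }, arguments: [] };
console.log(mayYieldMultipleValues(call)); // true
console.log(mayYieldMultipleValues({ ...call, inParens: true })); // false
```

A dedicated "discard extra values" node would let this check look at the node type alone instead of an auxiliary flag.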

Track locations

Figure out how to track locations without impacting performance when disabled.

  1. Wrap all parse functions with a tracking function -- minor refactoring required. This might be slow because of the use of the arguments object.
  2. Inline it with if checks -- This would be a lot faster but poses a problem due to ast node-functions.
  3. Function compilation -- The initial compilation cost is too expensive really.
  4. Still looking for that 4th solution...
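Option 1 can avoid any cost when tracking is disabled by deciding once, at setup time, whether to wrap at all. A minimal sketch, assuming a shared `state` object whose `pos` field the parse functions advance; `withLocations`, `state` and `parseIdentifier` are hypothetical names. A rest parameter replaces `arguments`, which sidesteps one of the concerns above, though whether the wrapped path is fast enough would still need measuring.

```javascript
// Shared, hypothetical parser state: the current offset into the input.
const state = { pos: 0 };

// Wrap a node-producing parse function so the node it returns gets a
// [start, end) range. When tracking is off, the function is returned
// unchanged, so the disabled path pays nothing.
function withLocations(parseFn, enabled) {
  if (!enabled) return parseFn;
  return function (...args) {
    const start = state.pos;
    const node = parseFn(...args);
    node.range = [start, state.pos];
    return node;
  };
}

// Toy parse function: consumes an identifier of the given name.
function parseIdentifier(name) {
  state.pos += name.length;
  return { type: 'Identifier', name };
}

const tracked = withLocations(parseIdentifier, true);
console.log(JSON.stringify(tracked('foo'))); // {"type":"Identifier","name":"foo","range":[0,3]}
```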

Incorrect uses of goto are not detected

I think it makes sense for luaparse to throw these errors. Lua throws them at compile time too.

Examples with respective errors

Tests from the goto test file in the Lua repository:

goto l1; do ::l1:: end
do ::l1:: end goto l1;

Error: no visible label 'l1' for <goto>


::l1:: ::l1::

Error: label 'l1' already defined


goto l1; local aa ::l1:: ::l2:: print(3)
do local bb, cc; goto l1; end; local aa; ::l1:: print(3)

Error: <goto l1> at line 1 jumps into the scope of local 'aa'


repeat
    if x then goto cont end
    local xuxu = 10
    ::cont::
until xuxu < x

Error: <goto cont> at line 2 jumps into the scope of local 'xuxu'


Quote from lua manual regarding this

A label is visible in the entire block where it is defined, except inside nested blocks where a label with the same name is defined and inside nested functions. A goto may jump to any visible label as long as it does not enter into the scope of a local variable.


I am currently looking into implementing this in my own project based on the AST generated by luaparse. I can start a PR for it here as soon as I have something working.
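A sketch of the first two checks (no visible label, duplicate label in the same block), written against luaparse-style LabelStatement and GotoStatement nodes that carry a `label` identifier; those node shapes are assumptions. The harder "jumps into the scope of local" check is not attempted, and functions that appear only inside expressions are not visited.

```javascript
// Check goto/label usage in a list of statements.
// `visible` holds label names visible from enclosing blocks of the same function.
function checkGotos(body, visible = new Set()) {
  // Labels defined directly in this block are visible throughout it.
  const here = new Set();
  for (const stat of body) {
    if (stat.type === 'LabelStatement') {
      const name = stat.label.name;
      if (here.has(name)) throw new Error("label '" + name + "' already defined");
      here.add(name);
    }
  }
  const inScope = new Set([...visible, ...here]);

  for (const stat of body) {
    if (stat.type === 'GotoStatement' && !inScope.has(stat.label.name)) {
      throw new Error("no visible label '" + stat.label.name + "' for <goto>");
    }
    if (stat.type === 'FunctionDeclaration') {
      checkGotos(stat.body); // labels do not cross function boundaries
    } else if (Array.isArray(stat.body)) {
      checkGotos(stat.body, inScope); // do/while/repeat/for blocks
    } else if (Array.isArray(stat.clauses)) {
      for (const clause of stat.clauses) checkGotos(clause.body, inScope); // if/elseif/else
    }
  }
}

const label = (name) => ({ type: 'LabelStatement', label: { type: 'Identifier', name } });
const gotoStat = (name) => ({ type: 'GotoStatement', label: { type: 'Identifier', name } });

// goto l1; do ::l1:: end
try {
  checkGotos([gotoStat('l1'), { type: 'DoStatement', body: [label('l1')] }]);
} catch (e) {
  console.log(e.message); // no visible label 'l1' for <goto>
}
```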

Cannot parse identifiers containing Chinese characters

luaparse cannot parse Chinese function names and variable names; please add support for them.
LuaJIT accepts Chinese function and variable names encoded in GBK or UTF-8.

Example:
function 中文函数名(参数1,参数2)
local 中文变量 = "Chinese variable name"
end
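A sketch of the character test involved, assuming identifiers are checked one UTF-16 code unit at a time: in an extended mode, every code unit at or above 0x80 is accepted as an identifier character, which admits names such as 中文变量. The function names and the `extended` flag are hypothetical; keywords are not excluded here.

```javascript
// Can `code` (a UTF-16 code unit) start an identifier?
function isIdentifierStart(code, extended) {
  return (code >= 65 && code <= 90)   // A-Z
    || (code >= 97 && code <= 122)    // a-z
    || code === 95                    // _
    || (extended && code >= 0x80);    // any non-ASCII unit in extended mode
}

// Can `code` continue an identifier? Digits are allowed after the first character.
function isIdentifierPart(code, extended) {
  return isIdentifierStart(code, extended) || (code >= 48 && code <= 57);
}

function isValidIdentifier(name, extended) {
  if (name.length === 0 || !isIdentifierStart(name.charCodeAt(0), extended)) return false;
  for (let i = 1; i < name.length; i++) {
    if (!isIdentifierPart(name.charCodeAt(i), extended)) return false;
  }
  return true;
}

console.log(isValidIdentifier('中文变量', false)); // false
console.log(isValidIdentifier('中文变量', true));  // true
```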

Upgrade the build environment

This project has hardly been very keen on keeping up with the latest developments in the JavaScript world. As a result, the test suite has experienced some failures as of late. In particular, the spec repository referred to in the dependency list has disappeared from GitHub; as a stop-gap measure, I switched to version 1.0.1 from npmjs.

An overhaul of the build process is probably long overdue. Suggestions are welcome on how to replace deprecated packages without breaking everything and rewriting the project from scratch.

Tag each release

This would enable installing luaparse using Bower. Just run these commands (I’ve looked up the commit hashes for you):

git tag -a v0.0.1 517fd5867b2a5be5fa0d549738812fec8fd98d48
git tag -a v0.0.2 ab62aae1a66e944585ee6b018ba78b31cca2ff31
git tag -a v0.0.3 308cc2de0f6fb9ac731c746c11591ec24f38d47d
git tag -a v0.0.4 299db4c477d72af94368f496e1fd5a6f4f478945
git tag -a v0.0.5 5af52528c4e44142353d5ea1c60eaf368d5977e3
git tag -a v0.0.6 df8fa49102f1b40ed9d074912c6a535266b45285
git tag -a v0.0.7 52f0be2fa26d63414d23472e9e912d63711941ca
git tag -a v0.0.8 609a4e76b2e6ab6d5669ad8043fa2da4e352dbb4
git tag -a v0.0.9 be14df51606ae49111bf8a0f95fa01acea3d4aa0
git tag -a v0.0.10 762d1669a0bae61fc41be4af4e42d7d8867e671d
git tag -a v0.0.11 68a110fb7fb656775c4f406b58ab8958ceea1136
git push --tags

Tolerant error handling

Implement a mode where errors are stored instead of thrown. Necessary recovery functions should be exposed so that users can hook into it.

I don't think the parser itself should include recovery functionality but should definitely have an example available.
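A minimal sketch of the "store instead of throw" mode, assuming all errors funnel through a single reporting function, as the raise frames in the stack traces later on this page suggest. `createReporter` and the `onError` hook are hypothetical names; the recovery itself is left to the caller, in line with the suggestion above.

```javascript
// Create an error reporter. In tolerant mode errors are collected and the
// optional onError hook is called so the caller can attempt recovery;
// otherwise the first error is thrown as before.
function createReporter({ tolerant = false, onError } = {}) {
  const errors = [];
  return {
    errors,
    raise(message, index) {
      const error = new Error(message);
      error.index = index;
      if (!tolerant) throw error;
      errors.push(error);
      if (onError) onError(error); // hook for user-supplied recovery
    },
  };
}

const reporter = createReporter({ tolerant: true });
reporter.raise("'end' expected", 10);
reporter.raise('unexpected symbol', 42);
console.log(reporter.errors.map((e) => e.index)); // [ 10, 42 ]
```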

Locations of long string literals

Hi,

Thanks for the great library, I've depended on this a few times and it has always worked superbly well.

However, I've come across an issue with the locations of long string literals. When the literal spans more than one line, the resulting start column for that expression is negative. For example:

const parser = require('luaparse');
const lua = 'local a = [[hello\nworld]]';
const ast = parser.parse(lua, { locations: true });

console.log(JSON.stringify(ast));

Results in:

{"type":"Chunk","body":[{"type":"LocalStatement","variables":[{"type":"Identifier","name":"a","loc":{"start":{"line":1,"column":6},"end":{"line":1,"column":7}}}],"init":[{"type":"StringLiteral","value":"hello\nworld","raw":"[[hello\nworld]]","loc":{"start":{"line":2,"column":-8},"end":{"line":2,"column":7}}}],"loc":{"start":{"line":1,"column":0},"end":{"line":2,"column":7}}}],"loc":{"start":{"line":1,"column":0},"end":{"line":2,"column":7}},"comments":[]}

Note the "column":-8 in the middle.

I'll look into this sometime this week and open a PR if I can. However, if a possible cause pops into your head or you can give me any pointers in the meantime, that'd be great.

Cheers.
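The numbers in the report are consistent with the start column being computed against the start of the line on which the literal ends: the literal begins at offset 10 on line 1, but line 2 begins at offset 18, and 10 - 18 = -8. A sketch that derives each end of the node independently from its offset; the helper name `offsetToLocation` is hypothetical, and a real lexer would remember the line start when the token begins rather than rescanning.

```javascript
// Convert a 0-based offset into a 1-based line and 0-based column.
// Only '\n' is treated as a line break in this sketch.
function offsetToLocation(source, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart };
}

const lua = 'local a = [[hello\nworld]]';
const start = lua.indexOf('[[');
const end = lua.length;
console.log(offsetToLocation(lua, start)); // { line: 1, column: 10 }
console.log(offsetToLocation(lua, end));   // { line: 2, column: 7 }
```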

Yet more misparsing: invalid parentheses in call statements and assignments

Each of the lines that follow is a syntax error, yet luaparse does not recognise these as such:

(a.b) = 0
(a) = 0
a:b() = 0
a() = 0
a.b:c() = 0
a[b]() = 0
(0) = 0
a, b() = 0, 0

Note, however, that the following are not (syntactical, at least) errors:

a().b = 1
({})[b] = 1
a""[b] = 1
a{}[b] = 1
({{}})[a][b] = 1
(a).b = 1
(1).a = 2 -- runtime error, unless you use debug.setmetatable

Relevant portions of the Lua 5.2 manual (essentially identical in Lua 5.0 and 5.1, and to 5.3-work2): §3.2 "Variables", §3.3.3 "Assignment", §9 "The Complete Syntax of Lua"
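A sketch of the check the grammar implies: an assignment target must be a var (a name, a field access or an index), and the var itself must not be parenthesised, while its base may be. It assumes luaparse's Identifier, MemberExpression and IndexExpression node types and a boolean inParens flag on parenthesised nodes; treat the availability of that flag as an assumption. The helper name `isAssignable` is hypothetical.

```javascript
// Lua's `var` production: Name | prefixexp '[' exp ']' | prefixexp '.' Name.
// Calls and parenthesised expressions are prefixexps but not vars.
function isAssignable(node) {
  if (node.inParens) return false; // (a) = 0, (a.b) = 0
  return node.type === 'Identifier'
    || node.type === 'MemberExpression'
    || node.type === 'IndexExpression';
}

const id = (name, inParens = false) => ({ type: 'Identifier', name, inParens });

// (a) = 0 -> invalid
console.log(isAssignable(id('a', true))); // false
// (a).b = 1 -> valid: only the base is parenthesised
console.log(isAssignable({ type: 'MemberExpression', indexer: '.', base: id('a', true), identifier: id('b') })); // true
// a() = 0 -> invalid
console.log(isAssignable({ type: 'CallExpression', base: id('a'), arguments: [] })); // false
```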

Look for performance improvements

This issue is to keep track of possible performance improvements, debunked or not.

  • ArrayBuffer: Converting it back to a string makes it slower (no improvement).
  • charCodeAt in isKeyword: Function won't get inlined because of its size (no improvement).
  • Prototype lookup instead of inArray for scope tracking: Not sure exactly why it's slower but it is significantly slower (no improvement).

asm.js techniques

  • Casting arguments (no improvement).
  • Actually using asm.js: Not really possible

Is `IfClause` missing, or is it `ElseifClause` by design?

E.g. if (true) then print(x) end results in an ElseifClause in the AST. Is this intentional, or should it be IfClause instead?

IMHO it would make sense to have a separate IfClause type, since ElseClause and ElseifClause are exposed separately too — but I may be missing something here.

Unfinished long string literal generates a valid AST

Test:

luaparse.parse('a = [====[]')

It generates a valid AST, but it should have thrown an error such as 'unfinished long string'.

The long string literal value is also incorrect:

console.log(
    luaparse.parse('a = [====[]').body[0].init[0].value // nothing.
  , luaparse.parse('a = [====[]]').body[0].init[0].value // ]
);
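A sketch of scanning a long bracket so that only a closing bracket of the same level (the same number of = signs) ends it, with an error otherwise. Under this rule `[====[]]` is unfinished as well, since `]]` does not close a level-4 bracket. The function name is hypothetical; skipping a line break right after the opening bracket follows the Lua manual.

```javascript
// Scan a long bracket ([[...]], [==[...]==], ...) starting at input[start].
// Returns the contents and the index just past the closing bracket.
function scanLongBracket(input, start) {
  const open = /^\[(=*)\[/.exec(input.slice(start));
  if (!open) throw new Error('not a long bracket');
  const level = open[1].length;
  const close = ']' + '='.repeat(level) + ']';
  const contentStart = start + open[0].length;
  const closeAt = input.indexOf(close, contentStart);
  if (closeAt === -1) {
    throw new Error('unfinished long string near ' + input.slice(start));
  }
  let value = input.slice(contentStart, closeAt);
  // A line break right after the opening bracket is not part of the string.
  if (value.startsWith('\r\n') || value.startsWith('\n\r')) value = value.slice(2);
  else if (value[0] === '\n' || value[0] === '\r') value = value.slice(1);
  return { value, end: closeAt + close.length };
}

console.log(scanLongBracket('[==[a]]b]==]', 0).value); // a]]b
```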

Revamp QA suite

A lot of the QA suite was experimentation on my part, but I think it could be loosened now.

  • Drop support (or at least testing) for ancient JavaScript engines.
  • Drop complexity analysis. Or just print it without failing.
  • I think we should still keep test coverage analysis and fail if there's a reduction.
  • Maybe drop jshint and adopt prettier so contributors don't need to care about code standards. To me, which convention doesn't matter, as long as there is one.
  • Scaffolding tests should be easier and better documented.
  • I have nothing against spec as a testing framework, but if we drop support for ancient JavaScript engines, this is also open for discussion, e.g. ava, which is async. But as spec works and is fast enough IMO, I don't see much value in changing it.
  • Definitely drop testem; unless it's easy to add browser testing, I think Node testing would be enough for luaparse.
  • Is it possible to drop the UMD wrapper and use some library that wraps everything during the build step? Luaparse shouldn't be keeping it up to date.

Thoughts?

Fix typos

There are a couple of misspellings/typos in the comments.
These can be easily noticed and fixed.
I think I will make a PR for this.

Reject newlines between an expression and opening parenthesis

PUC Lua 5.1 (and LuaJIT without the LUAJIT_ENABLE_LUA52COMPAT option) rejects code in which an expression is followed by a newline and an opening parenthesis. This is to avoid a parsing ambiguity discussed in the Lua 5.2 manual, §3.3.1, which Lua 5.2 and later instead resolve by introducing an optional explicit statement terminator, ;. (One can also use do...end, which also works in Lua 5.1.)

Currently, luaparse accepts such code with Lua 5.2 semantics (that is, interprets it as a function call); in Lua 5.1 mode, which is the default, it should probably be rejected instead.
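The Lua 5.1 rule amounts to a single line comparison at the point where the parser is about to treat ( as the start of call arguments. A sketch, assuming each token carries the line it starts on and that the line of the previous token's end is tracked as `lastLine`; the function and parameter names are hypothetical, while the message is the one PUC Lua 5.1 reports.

```javascript
// Called when the current token is '(' following a prefix expression.
// `lastLine` is the line on which the previous token ended.
function checkCallParenthesis(parenToken, lastLine, luaVersion) {
  if (luaVersion === '5.1' && parenToken.line !== lastLine) {
    throw new Error('[' + parenToken.line + ':' + parenToken.column + '] '
      + 'ambiguous syntax (function call x new statement)');
  }
}

// a = f
// (g or h)()
const paren = { value: '(', line: 2, column: 0 };
checkCallParenthesis(paren, 1, '5.2'); // accepted: treated as a call
try {
  checkCallParenthesis(paren, 1, '5.1');
} catch (e) {
  console.log(e.message); // [2:0] ambiguous syntax (function call x new statement)
}
```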

Reject break statements outside loops

There is a disabled expected-failure testcase in the test/scaffolding/functions file:

function a(p) break end                 -- FAIL

Currently, it parses successfully; although it matches the basic recursive grammar, it semantically makes no sense and is rejected by all Lua implementations at compilation stage (although apparently only after the entire body is parsed). It would probably make sense to reject it here too.
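A sketch of the check as a post-parse walk that tracks whether the current block is inside a loop and resets that at function boundaries. It assumes luaparse-style statement nodes (loop statements with a `body` array, IfStatement with `clauses`); the helper name and error message are hypothetical, and functions nested inside expressions are not visited.

```javascript
const LOOP_TYPES = new Set([
  'WhileStatement', 'RepeatStatement', 'ForNumericStatement', 'ForGenericStatement',
]);

// Throw if a break statement appears outside every enclosing loop of its function.
function checkBreaks(body, inLoop = false) {
  for (const stat of body) {
    if (stat.type === 'BreakStatement' && !inLoop) {
      throw new Error('break outside a loop');
    }
    if (stat.type === 'FunctionDeclaration') {
      checkBreaks(stat.body, false); // a function body starts outside any loop
    } else if (Array.isArray(stat.body)) {
      checkBreaks(stat.body, inLoop || LOOP_TYPES.has(stat.type));
    } else if (Array.isArray(stat.clauses)) {
      for (const clause of stat.clauses) checkBreaks(clause.body, inLoop);
    }
  }
}

// function a(p) break end
try {
  checkBreaks([{ type: 'FunctionDeclaration', parameters: [], body: [{ type: 'BreakStatement' }] }]);
} catch (e) {
  console.log(e.message); // break outside a loop
}
```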

PrefixExpression incorrect range/location

parsePrefixExpression seems to be incorrectly handling the propagation of child ranges and locations, as demonstrated with the following Lua snippet:

base['outer']['inner']()

Expected behaviour

The outer IndexExpression's range should contain the inner expression but not the CallExpression's parentheses, and the inner expression should only cover its own range.

Output

{
  "type": "Chunk",
  "body": [
    {
      "type": "CallStatement",
      "expression": {
        "type": "CallExpression",
        "base": {
          "type": "IndexExpression",
          "base": {
            "type": "IndexExpression",
            "base": {
              "type": "Identifier",
              "name": "base",
              "range": [
                0,
                4
              ]
            },
            "index": {
              "type": "StringLiteral",
              "value": "outer",
              "raw": "'outer'",
              "range": [
                5,
                12
              ]
            },
            "range": [
              0,
              13
            ]
          },
          "index": {
            "type": "StringLiteral",
            "value": "inner",
            "raw": "'inner'",
            "range": [
              14,
              21
            ]
          },
          "range": [
            0,
            22
          ]
        },
        "arguments": [],
        "range": [
          0,
          24
        ]
      },
      "range": [
        0,
        24
      ]
    }
  ],
  "range": [
    0,
    24
  ],
  "comments": []
}

Actual behaviour

Both IndexExpressions are given the same range as the entire CallExpression, including its (), which is incorrect.

Output

{
  "type": "Chunk",
  "body": [
    {
      "type": "CallStatement",
      "expression": {
        "type": "CallExpression",
        "base": {
          "type": "IndexExpression",
          "base": {
            "type": "IndexExpression",
            "base": {
              "type": "Identifier",
              "name": "base",
              "range": [
                0,
                4
              ]
            },
            "index": {
              "type": "StringLiteral",
              "value": "outer",
              "raw": "'outer'",
              "range": [
                5,
                12
              ]
            },
            "range": [
              0,
              24
            ]
          },
          "index": {
            "type": "StringLiteral",
            "value": "inner",
            "raw": "'inner'",
            "range": [
              14,
              21
            ]
          },
          "range": [
            0,
            24
          ]
        },
        "arguments": [],
        "range": [
          0,
          24
        ]
      },
      "range": [
        0,
        24
      ]
    }
  ],
  "range": [
    0,
    24
  ],
  "comments": []
}
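A sketch of building the suffix chain so that each node's range is closed as soon as its own suffix has been consumed, rather than once after the whole chain. The suffix format (`{ index, end }` per bracketed suffix) and the `buildIndexChain` name are hypothetical simplifications; with the positions of `base['outer']['inner']()`, it yields [0, 13] for `base['outer']` and [0, 22] for the outer index expression.

```javascript
// Build nested IndexExpression nodes for base[i1][i2]..., giving each node a
// range that starts at the base and ends at its own closing bracket.
function buildIndexChain(base, suffixes) {
  let node = base;
  for (const suffix of suffixes) {
    node = {
      type: 'IndexExpression',
      base: node,
      index: suffix.index,
      range: [base.range[0], suffix.end], // closed per suffix, not at the end
    };
  }
  return node;
}

// base['outer']['inner']  (the trailing () belongs to the CallExpression only)
const chain = buildIndexChain(
  { type: 'Identifier', name: 'base', range: [0, 4] },
  [
    { index: { type: 'StringLiteral', value: 'outer', raw: "'outer'", range: [5, 12] }, end: 13 },
    { index: { type: 'StringLiteral', value: 'inner', raw: "'inner'", range: [14, 21] }, end: 22 },
  ]
);
console.log(chain.base.range, chain.range); // [ 0, 13 ] [ 0, 22 ]
```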

Make it possible to tweak the `defaultOptions` in the `luaparse` binary

It would be nice if the luaparse binary accepted shell arguments to enable/disable the defaultOptions settings:

  var defaultOptions = exports.defaultOptions = {
    // Explicitly tell the parser when the input ends.
      wait: false
    // Store comments as an array in the chunk object.
    , comments: true
    // Track identifier scopes by adding an isLocal attribute to each
    // identifier-node.
    , scope: false
  };
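A sketch of how boolean flags could be mapped onto those options, assuming a --name / --no-name convention; the flag names and the `parseFlags` helper are hypothetical and not part of the current binary.

```javascript
// Defaults mirroring the snippet above.
const defaultOptions = { wait: false, comments: true, scope: false };

// Turn arguments such as --scope or --no-comments into an options object.
// Anything that is not a recognised flag is kept as a positional argument.
function parseFlags(argv) {
  const options = { ...defaultOptions };
  const rest = [];
  for (const arg of argv) {
    const match = /^--(no-)?([a-z]+)$/.exec(arg);
    if (match && Object.prototype.hasOwnProperty.call(defaultOptions, match[2])) {
      options[match[2]] = !match[1]; // --name sets true, --no-name sets false
    } else {
      rest.push(arg);
    }
  }
  return { options, rest };
}

console.log(JSON.stringify(parseFlags(['--scope', '--no-comments', 'file.lua'])));
// {"options":{"wait":false,"comments":false,"scope":true},"rest":["file.lua"]}
```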

Consider adding a `-c` option to the `luaparse` binary

Currently, the luaparse binary accepts either file names or Lua code (strings) as arguments:

luaparse 'file' # file name
luaparse 'a = 42' # code

What do you think about adding an explicit option for passing a Lua string (e.g. -c | --code)?

Currently, it’s impossible to parse the following Lua code using the binary without storing it in a file first:

--foo

This is because the shell command to parse this code would be:

$ luaparse "--foo"
Unknown option: --foo

Of course, there are many other issues that could potentially occur due to the argument overloading. (For this reason, I’ve removed the argument overloading option entirely from the luamin binary.)
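A sketch of the proposed separation, assuming -c / --code takes the next argument verbatim as Lua source and every other non-option argument is a file name; the helper `splitInputs` is hypothetical. With it, `luaparse -c "--foo"` would parse the comment instead of treating it as an option.

```javascript
// Split command-line arguments into inline code snippets and file names.
// -c / --code consumes the following argument verbatim, even if it starts with '-'.
function splitInputs(argv) {
  const inputs = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '-c' || arg === '--code') {
      if (i + 1 >= argv.length) throw new Error(arg + ' requires an argument');
      inputs.push({ kind: 'code', text: argv[++i] });
    } else if (arg.startsWith('-')) {
      throw new Error('Unknown option: ' + arg);
    } else {
      inputs.push({ kind: 'file', path: arg });
    }
  }
  return inputs;
}

console.log(JSON.stringify(splitInputs(['-c', '--foo', 'file.lua'])));
// [{"kind":"code","text":"--foo"},{"kind":"file","path":"file.lua"}]
```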

Parse error

I was testing some real-world™ Lua libraries (to see how well luaparse/luamin perform) and came across this issue.

luaparse fails to parse this piece of code, even though lua file.lua works fine (as in, it doesn’t throw an error and exits with status 0).

foo.lua:

function SetSetting(SETTING_PATH, SETTING_VALUE)
    assert(([[string]]):find(type(SETTING_PATH)), sprintf([[bad argument #1 to 'System.SetSetting' (string expected, got %s)]], type(SETTING_PATH)))
    assert(([[string]]):find(type(SETTING_VALUE)), sprintf([[bad argument #2 to 'System.SetSetting' (string expected, got %s)]], type(SETTING_VALUE)))

    local SETTING_PATH_PARTS = (((SETTING_PATH:gsub([[%\%\]], [[%\]])):gsub([[%/%/]], [[%/]])):gsub([[%/]], [[%\]])):explode([[\]])

    if (not (SETTING_PATH_PARTS[1] == [[Settings]])) then
        table.insert(SETTING_PATH_PARTS, 1, [[Settings]])
    end

    return setsettings(table.concat(SETTING_PATH_PARTS, [[\]]), SETTING_VALUE)
end
$ lua foo.lua
$ luaparse foo.lua

/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:386
    throw error;
          ^
SyntaxError: [7:50] ')' expected near 'then'
    at raise (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:375:15)
    at expect (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:953:10)
    at parsePrefixExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1663:7)
    at parseSubExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1621:22)
    at parseExpression (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1562:22)
    at parseLocalStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1359:45)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1145:41)
    at parseBlock (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1125:19)
    at parseFunctionDeclaration (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1477:16)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1150:18)

(Note: this piece of code is part of a ~6300 LOC Lua library. If you want I can post it here.)

Multiple independent parsers / reentrancy

Currently, this code works:

luaparse.parse({ wait: true }).write('foo = "');
console.info(luaparse.parse('bar"'));

It prints out:

{
	"type": "Chunk",
	"body": [
		{
			"type": "AssignmentStatement",
			"variables": [
				{
					"type": "Identifier",
					"name": "foo"
				}
			],
			"init": [
				{
					"type": "StringLiteral",
					"value": "bar",
					"raw": "\"bar\""
				}
			]
		}
	],
	"comments": []
}

This is because the library maintains a single lexer and parser state shared between invocations of the parse function; there is no way to concurrently parse multiple Lua scripts. Code that expects each .parse({ wait: true }) to create a new parser independent of any previously created one is in for a nasty surprise.

There should be a way to create multiple isolated parser states. This will probably necessitate a fairly invasive rewrite, and may break some backwards compatibility unless it is done through separate API calls. Then again, the sort of code that relies on non-reentrancy is not something I wish to personally support.
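A sketch of the isolation being asked for: each call to a factory gets its own buffer held in a closure, so two pending parsers cannot see each other's input. `createParser` and its write/end methods are hypothetical stand-ins, and "parsing" is reduced to returning the accumulated source.

```javascript
// Each parser instance owns its own input buffer; nothing is shared.
function createParser() {
  let buffer = '';
  let finished = false;
  return {
    write(chunk) {
      if (finished) throw new Error('parser already finished');
      buffer += chunk;
      return this;
    },
    end(chunk = '') {
      this.write(chunk);
      finished = true;
      return buffer; // a real implementation would lex and parse here
    },
  };
}

const first = createParser().write('foo = "');
const second = createParser();
console.log(second.end('bar"')); // bar"  (unaffected by the first parser)
console.log(first.end('baz"'));  // foo = "baz"
```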

Thank you for sharing!

This parser is really great. As someone familiar with the ESTree spec, I feel right at home. Thank you for building it and for sharing it! ❤️

Assigning a negative integer/number produces a UnaryExpression

Sample line:

priority = -1

Currently working around it with the following:

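// Returns the numeric value of an expression node, treating unary minus
// applied to a numeric literal as a negative number.
function numericValue(content) {
    var value;
    switch (content.type) {
        // Plain numbers: .value already holds the parsed JavaScript number
        case 'NumericLiteral':
            value = content.value;
            break;

        case 'UnaryExpression':
            /* Work around negative numbers being parsed as unary minus */
            if (content.operator === '-' && content.argument.type === 'NumericLiteral') {
                value = -content.argument.value;
            }
            break;
    }
    return value;
}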

`StringCallExpression` needs a `rawArgument` property (or something similar)

StringLiterals get a .raw property that contains an escaped version of the string.

E.g., the following Lua code:

x="\n"

…translates to the following AST:

[ { type: 'Chunk',
    body:
     [ { type: 'AssignmentStatement',
         variables: [ { type: 'Identifier', name: 'x' } ],
         init:
          [ { type: 'Literal',
              value: '\n',
              raw: '"\\n"' } ] } ], // ← this is awesome
    comments: [] } ]

However, StringCallExpressions lack such a property. E.g.:

f"\n"

…translates to:

[ { type: 'Chunk',
    body:
     [ { type: 'CallStatement',
         expression:
          { type: 'StringCallExpression',
            base: { type: 'Identifier', name: 'f' },
            argument: '\n' } } ],
    comments: [] } ]

Any chance this could be added, either as a separate rawArgument property, or (and this would be even better IMHO) by making the argument property more like a StringLiteral object?

Parse error caused by inline block comments `--[[comment--]]`

function foo()
  return a >= b --[[and x--]] and b > 0
end
$ luaparse test.lua

/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:386
    throw error;
          ^
SyntaxError: [2:30] 'end' expected near 'nd'
    at raise (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:375:15)
    at expect (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:953:10)
    at parseFunctionDeclaration (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1478:5)
    at parseStatement (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1150:18)
    at parseBlock (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1125:19)
    at parseChunk (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1102:16)
    at end (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1826:17)
    at Object.parse (/usr/local/share/npm/lib/node_modules/luaparse/lib/luaparse.js:1803:31)
    at /usr/local/share/npm/lib/node_modules/luaparse/bin/luaparse:53:22
    at Array.forEach (native)
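For reference, a sketch of the comment rule the lexer has to follow: after --, an opening long bracket starts a block comment that runs to the closing bracket of the same level, and anything else is a line comment. The function name is hypothetical; it returns the offset just past the comment, which for the snippet above should leave " and b > 0" to be parsed.

```javascript
// Skip a comment starting at input[start] (which must be '--').
// Returns the index of the first character after the comment.
function skipComment(input, start) {
  let i = start + 2;
  const open = /^\[(=*)\[/.exec(input.slice(i));
  if (open) {
    // Block comment: ends at the closing bracket of the same level.
    const close = ']' + open[1] + ']';
    const closeAt = input.indexOf(close, i + open[0].length);
    if (closeAt === -1) throw new Error('unfinished long comment');
    return closeAt + close.length;
  }
  // Line comment: ends before the next line break (or at end of input).
  while (i < input.length && input[i] !== '\n' && input[i] !== '\r') i++;
  return i;
}

const line = 'return a >= b --[[and x--]] and b > 0';
const after = skipComment(line, line.indexOf('--'));
console.log(JSON.stringify(line.slice(after))); // " and b > 0"
```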

Lua 5.4 support

This issue tracks all changes required to support parsing Lua 5.4 code.

While PUC-Rio hasn't officially released Lua 5.4 yet, it already appears it is going to bring at least one syntactic innovation: attributes in local declarations, to implement immutable bindings and 'to-be-closed variables', i.e. a form of lexical cleanup/with statement/RAII. The EBNF is as follows:

	stat ::= local ‘<’ Name ‘>’ Name ‘=’ exp

I assume this particular syntax will change, as the current one does not permit declaring multiple attributes simultaneously, and there is no syntax for combining attributes with the local function construct.

Rethink string representation

(cribbed from README.md)

Unlike strings in JavaScript, Lua strings are not Unicode strings, but bytestrings (sequences of 8-bit values); likewise, implementations of Lua parse the source code as a sequence of octets. However, the input to this parser is a JavaScript string, i.e. a sequence of 16-bit code units (not necessarily well-formed UTF-16). This poses a problem of how those code units should be interpreted, particularly if they are outside the Basic Latin block ('ASCII').

Currently, this parser handles Unicode input by encoding it in WTF-8, and reinterpreting the resulting code units as Unicode code points. This applies to string literals and (if extendedIdentifiers is enabled) to identifiers as well. Lua byte escapes inside string literals are interpreted directly as code points, while Lua 5.3 \u{} escapes are similarly decoded as UTF-8 code units reinterpreted as code points. It is as if the parser input was being interpreted as ISO-8859-1, while actually being encoded in UTF-8.

This ensures that no otherwise-valid input will be rejected due to encoding errors. Assuming the input was originally encoded in UTF-8 (which includes the case of only containing ASCII characters), it also preserves the following properties:

  • String literals (and identifiers, if extendedIdentifiers is enabled) will have the same representation in the AST if and only if they represent the same string in the source code: e.g. the Lua literals '💩', '\u{1f4a9}' and '\240\159\146\169' will all have "\u00f0\u009f\u0092\u00a9" in their .value property, and likewise local 💩 will have the same string in its .name property;
  • The String.prototype.charCodeAt method in JS can be directly used to emulate Lua's string.byte (with one argument, after shifting offsets by 1), and likewise String.prototype.substr can be used similarly to Lua's string.sub;
  • The .length property of decoded string values in the AST is equal to the value that the # operator would return in Lua.

Maintaining those properties makes the logic of static analysers and code transformation tools simpler. However, it poses a problem when displaying strings to the user and serialising AST back into a string; to recover the original bytestrings, values transformed in this way will have to be encoded in ISO-8859-1.

Other solutions to this problem may be considered in the future. Some of them have been listed below, with their drawbacks:

  1. A mode that instead treats the input as if it were decoded according to ISO-8859-1 (or the x-user-defined encoding) and rejects code points that cannot appear in that encoding; may be useful for source code in encodings other than UTF-8
    • Still tricky to get the semantics right
    • x-user-defined cannot take advantage of compact representation of ISO-8859-1 strings in certain JavaScript engines
  2. Using an ArrayBuffer or Uint8Array for source code and/or string literals
    • May fail to be portable to older JavaScript engines
    • Cannot be (directly) serialised as JSON
    • Values of those types are fixed-length, which makes manipulation cumbersome; they cannot be incrementally built by appending.
    • They cannot be used as keys in objects; one has to use Map and WeakMap instead
  3. Using a plain Array of numbers in the range [0, 256)
    • May be memory-inefficient in naïve JavaScript engines
    • May bloat the JSON serialisation considerably
    • Cannot be used as keys in objects either
  4. Storing string literal values as ordinary String values, and requiring that escape sequences in literals constitute well-formed UTF-8; an exception is thrown if they do not
    • UTF-8 chauvinism; imposes semantics that may be unwanted
    • Reduced compatibility with other Lua implementations
  5. Like above, but instead of throwing an exception, ill-formed escapes are transformed to unpaired surrogates, just like Python's surrogateescape encoding error handler
    • UTF-8 chauvinism, though to a lesser extent
    • Destroys the property that ("\xc4" .. "\x99") == "\xc4\x99"
    • If the AST is encoded in JSON, some JSON libraries may refuse to parse it

Cf. discussion under c05822d.
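A sketch of the current scheme described above: the input is encoded as WTF-8 (UTF-8 that also passes unpaired surrogates through as three-byte sequences), and each resulting byte becomes one code unit, as if decoded as ISO-8859-1. The function name is hypothetical, and spreading the byte array into String.fromCharCode is only suitable for short strings. The result for '💩' matches the value given in the first bullet.

```javascript
// Encode a JS string as WTF-8 and reinterpret every byte as one code unit
// (U+0000..U+00FF), so that .length equals the byte length.
function toByteString(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    // Combine a well-formed surrogate pair into one code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
      const low = str.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        i++;
      }
    }
    // Unpaired surrogates fall through and are encoded like other BMP code points.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return String.fromCharCode(...bytes);
}

const s = toByteString('💩');
console.log(s.length); // 4, what Lua's # operator returns for the same literal
console.log(Array.from(s, (c) => c.charCodeAt(0))); // [ 240, 159, 146, 169 ]
```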

Keep track of local/global variables somehow?

We briefly discussed this on Twitter already but I’d like to move the discussion here.

What are your thoughts on exposing the names of local variables for each scope?

For example, Identifier nodes could get a boolean local or isLocal property that is true if they had previously been declared using a LocalStatement within that block, or the other way around, using a global / isGlobal property.

Or do you think this information just doesn’t belong in an AST?
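A sketch of the bookkeeping involved: a stack of scopes, one per block, where a name counts as local if any enclosing scope has declared it. The `ScopeTracker` class is hypothetical; it illustrates the idea behind an isLocal-style property, not how luaparse implements it.

```javascript
// Track declared local names per block to classify identifiers.
class ScopeTracker {
  constructor() {
    this.scopes = [new Set()]; // chunk-level scope
  }
  enter() { this.scopes.push(new Set()); }
  exit() { this.scopes.pop(); }
  declare(name) { this.scopes[this.scopes.length - 1].add(name); }
  isLocal(name) {
    return this.scopes.some((scope) => scope.has(name));
  }
}

// local x = 1; do local y = 2 end; print(x, y)
const tracker = new ScopeTracker();
tracker.declare('x');
tracker.enter();
tracker.declare('y');
console.log(tracker.isLocal('y')); // true (inside the do block)
tracker.exit();
console.log(tracker.isLocal('x'), tracker.isLocal('y')); // true false
```

A LocalStatement's names should only be declared after its right-hand side has been processed, since in local x = x the second x refers to the outer binding.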

Wrong locations with LF line endings on Windows

code:

--[[


THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

]]

local json = {}
 

With LF line endings, the reported line is 12 instead of the correct 14:

{
    "type": "Chunk",
    "body": [
        {
            "type": "LocalStatement",
            "variables": [
                {
                    "type": "Identifier",
                    "name": "json",
                    "loc": {
                        "start": {
                            "line": 12,
                            "column": 6
                        },
                        "end": {
                            "line": 12,
                            "column": 10
                        }
                    },
                    "isLocal": true
                }
            ],
            "init": [
                {
                    "type": "TableConstructorExpression",
                    "fields": [],
                    "loc": {
                        "start": {
                            "line": 12,
                            "column": 13
                        },
                        "end": {
                            "line": 12,
                            "column": 15
                        }
                    }
                }
            ],
            "loc": {
                "start": {
                    "line": 12,
                    "column": 0
                },
                "end": {
                    "line": 12,
                    "column": 15
                }
            }
        }
    ],
    "loc": {
        "start": {
            "line": 12,
            "column": 0
        },
        "end": {
            "line": 12,
            "column": 15
        }
    },
    "globals": []
}

With CRLF line endings, the locations are correct.

system:
windows10 1803
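For reference, Lua treats \n, \r, \r\n and \n\r each as a single line break, so the same file should report the same line numbers whatever its line endings. A sketch of counting lines under that rule; the function name `lineAt` is hypothetical.

```javascript
// Return the 1-based line number of `offset`, treating \n, \r, \r\n and \n\r
// each as one line break, as the Lua lexer does.
function lineAt(source, offset) {
  let line = 1;
  let i = 0;
  while (i < offset) {
    const ch = source[i];
    if (ch === '\n' || ch === '\r') {
      const next = source[i + 1];
      // A two-character break is a \r\n or \n\r pair (two different characters).
      i += (next === '\n' || next === '\r') && next !== ch ? 2 : 1;
      line += 1;
    } else {
      i += 1;
    }
  }
  return line;
}

const crlf = '--[[\r\n]]\r\n\r\nlocal json = {}';
const lf = crlf.replace(/\r\n/g, '\n');
console.log(lineAt(crlf, crlf.indexOf('local')), lineAt(lf, lf.indexOf('local'))); // 4 4
```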

Usage of '...' outside a vararg function does not throw an error

For example:
x=function()print(...)end x()
function y()print(...)end y()
_G['z']=function()print(...)end z()

The above are parsed "correctly" by luaparse, but Lua rejects them when the code is compiled, with:

cannot use '...' outside a vararg function near '...'

The following is valid:
print(...) -- in the root scope, probably to obtain arguments from the command line

I think this behaviour is documented.
Would this be hard to fix? (Not sure.)
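A sketch of the check as a generic AST walk, assuming luaparse-style nodes in which a function declared with ... lists a VarargLiteral among its `parameters`, and ... in expressions is also a VarargLiteral; both shapes are assumptions. The chunk itself counts as a vararg function, which is why the root-level print(...) is valid.

```javascript
// Throw if '...' is used inside a function that does not declare '...'.
function checkVarargs(node, isVararg = true) {
  if (node === null || typeof node !== 'object') return;
  if (Array.isArray(node)) {
    for (const child of node) checkVarargs(child, isVararg);
    return;
  }
  if (node.type === 'FunctionDeclaration') {
    const declares = node.parameters.some((p) => p.type === 'VarargLiteral');
    checkVarargs(node.body, declares); // parameters themselves are not uses
    return;
  }
  if (node.type === 'VarargLiteral' && !isVararg) {
    throw new Error("cannot use '...' outside a vararg function near '...'");
  }
  for (const key of Object.keys(node)) checkVarargs(node[key], isVararg);
}

const vararg = { type: 'VarargLiteral', value: '...', raw: '...' };
const printVararg = {
  type: 'CallStatement',
  expression: { type: 'CallExpression', base: { type: 'Identifier', name: 'print' }, arguments: [vararg] },
};
checkVarargs({ type: 'Chunk', body: [printVararg] }); // fine at the top level
try {
  // function y() print(...) end
  checkVarargs({ type: 'Chunk', body: [{ type: 'FunctionDeclaration', parameters: [], body: [printVararg] }] });
} catch (e) {
  console.log(e.message); // cannot use '...' outside a vararg function near '...'
}
```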
