tunnelvisionlabs / antlr4ts Goto Github PK

View Code? Open in Web Editor NEW

625.0 23.0 104.0 2.6 MB

Optimized TypeScript target for ANTLR 4

License: Other

TypeScript 92.70% Java 1.65% ANTLR 5.57% JavaScript 0.08%

typescript antlr4 antlr

antlr4ts's People

Contributors

Stargazers

Watchers

Forkers

sharwell carlansley amirms mike-lischke shaunburdick nehaagrawal actioniq-oss petermetz curryfishballs9527 swimlane adityapurwa rators jasonvasquez zigmondy angrypowman keitase bvanreeven que3216 dalestone jonfreedman aotearoacoder houfu alfadb jessitron m-adoo lguzzon-scratchbook nmarusov mindhivefi dberlin grsan45 davidhockey22 steverash zyc1314 heniu75 johnbendi caroe233 stewartmcgown soneymathew alexhenkel raveler unsupervisednn iimachines brucificus stevengum tridungle dreaf belaczek josejulio kmikko plusxp zhengyutay yostinso linbolen popo1221 polpware-frontend-devops burtharris apalaniuk papgeri tyrrx jabo-lang nickjs cosium reda-alaoui donaldxdonald darmie adiun hhy5277 linecode cshang2017 cedececa kaby76 baihouping panda2134 s-oravec theunlocked lawrence-shu manuelbaun melin boardthatpowder chengchengpei niklasschloegel ng-galien jhvanderven cipacda voodooattack kipper-lang qiu-weidong elribonazo itsharex cianx onlyflyer cubuspl42 br0nstein 3fuyang gbromios vitineth damianw27 tianbinraindrop justinstroudbah

antlr4ts's Issues

Selecting a test framework...

I don't have a lot of experience with JavaScript test frameworks, so after reviewing the existing Java tests so I asked for help on stack overflow.

The only answer received so far (from Basarat who has very strong reputation) said that Mocha is the current King of JavaScript Test Frameworks. He cited this as evidence, which looks pretty significant.

Mocha has a lot of features we probably won't need (e.g. async testing), but it looks to me like they won't get in the way. But Mocha seems very adaptable. The default API style quite different from JUnit, but I have seen some projects that are trying to put a decorator-based model on it, like mocha-typescript.

P.S. Basarat also has an interesting-looking IDE project at: http://alm.tools, and TypeScript deep dive e-book at http://basarat.gitbooks.io/typescript/, seems like both are review-worthy.

Document ATNConfig basics

The intent/behavior of the ATNConfig class is not obvious to new readers. Many suggestions for improvements were given in the code review for #108.

Document ATNConfigSet basics

The intent/behavior of the ATNConfigSet class is not obvious to new readers. Some suggestions for improvements were given in the code review for #108.

TsTool could generate `.js` + `.d.ts` files rather than `.ts`

For later consideration...

We could have the tool's output already in JavaScript, while simultaneously emitting a corresponding .d.ts file that declares type information. This would avoid making a consuming package dependent on TypeScript at build time. Seems like a nice-to-have.

Implementation-wise in the tool, .d.ts files may be considered similar to the "C/C++" concept of a .h file. It seems like that target is getting close to integration, so we should probably defer this.

Line ending question

I noticed something early in about line ending normalization. I understand the different conventions, but not how you (and git) want to deal with it.

I have seen messages like this, which I don't understand:

C:\code\antlr4ts>git add package.json
warning: LF will be replaced by CRLF in package.json.
The file will have its original line endings in your working directory.

I think the npm tool created the file. Is there anything I should do differently?

Check the intent of this assert is implemented

AbstractEdgeMap.ts, line 44 the assert and comment seem mismatched, given the semantics of JavaScript's number type...

        // the allowed range (with minIndex and maxIndex inclusive) should be less than 2^32
        assert(maxIndex - minIndex + 1 >= 0);

Optimize ATNConfig types for JavaScript memory behavior

The implementation of ATNConfig uses two primary strategies for optimizing memory for this heavily-allocated type:

Derived types are used to avoid storing unnecessary data where possible.
- Most ATNConfig instances use SemanticContext.NONE as the semantic context, so storage is not provided for anything else in the base type.
- Only ATNConfig instances in the lexer need to store a LexerActionExecutor, so storage is not provided for it in the base type (many more instances are allocated by the parser than the lexer).
Bit fields are used to store three base properties in a single number.

The design decisions for optimizing memory are currently interleaved with design decisions for algorithmic correctness without explanation. See comments in the code review for #108 for several specific requests related to improving clarity.

📝 This issue specifically relates to clarity and efficiency with respect to the performance characteristics of ATNConfig. A separate issue will be filed for questions related to the correctness characteristics of ATNConfig.

Consider bounding hash code calculation to 31 bits

As mentioned in #86, JavaScript runtimes box integers which have the bit 0x80000000 set. Consider updating hashCode implementations to mask intermediate values and/or results to avoid this scenario.

ANTLR tool code generation for TypeScript

To wrap this up, we'll need to provide a new code generator for the tool. I was thinking that it should probably be usable for JavaScript tools without a TypeScript dependency. To do this the tool should emit .js files, with added .d.ts files to enable TypeScript users to benefit from the typings.

This probably belongs in a later milestone.

Perf: measure if SparseEdgeMap performs any better than JavaScripts native Map

I'm guessing this might be perf critical.

Similarly consider if ArrayEdgeMap might derive from Array rather than include it.

Maximize performance where "optional modules" are used in PredictionContext

Currently PredictionContext uses the "Optional Modules" strategy for loading derived types only when/where they are needed. These loads are located in performance-critical code so we should figure out how to best implement the required functionality.

See comments in #36.

Type derived from Array2DHashSet generates runtime error

On branch ex, the purpose was to convert the exception types, and I added a few stub classes to misc/Stubs.ts. One of those stubs, ATNConfigSet, is generating a runtime error when imported, but I haven't figured out why... Now that I think about it, perhaps this is one of those circular-dependency problems.

Here's the message I get:

PS C:\try\antlr4ts\src\atn> npm test

> [email protected] test C:\try\antlr4ts
> tsc & mocha


C:\try\antlr4ts\src\misc\Stubs.ts:197
export class ATNConfigSet extends Array2DHashSet<ATNConfig> {
                                  ^
TypeError: Class extends value undefined is not a function or null
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\Stubs.ts:197:35)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\DefaultEqualityComparator.ts:32:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\src\misc\Array2DHashSet.ts:35:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at Object.<anonymous> (C:\try\antlr4ts\test\TestArray2DHashSet.ts:3:1)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.require (module.js:483:17)
    at require (internal/module.js:20:19)
    at C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:220:27
    at Array.forEach (native)
    at Mocha.loadFiles (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:217:14)
    at Mocha.run (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:485:10)
    at Object.<anonymous> (C:\try\antlr4ts\node_modules\mocha\bin\_mocha:457:18)
    at Module._compile (module.js:556:32)
    at Object.Module._extensions..js (module.js:565:10)
    at Module.load (module.js:473:32)
    at tryModuleLoad (module.js:432:12)
    at Function.Module._load (module.js:424:3)
    at Module.runMain (module.js:590:10)
    at run (bootstrap_node.js:394:7)
    at startup (bootstrap_node.js:149:9)
    at bootstrap_node.js:509:3
npm ERR! Test failed.  See above for more details.

Add a linter, probably tslint

Consider using non-null arguments for VocabularyImpl constructor

The VocabularyImpl constructor in #34 was ported rather literally from the original Java code. Since we don't have to worry about legacy generated code in TypeScript (prior to the time when Vocabulary was introduced), it might not be necessary to support null values for these constructor arguments. After we have completed more of the port, we can examine all uses of VocabularyImpl in generated code and the runtime library to determine if one of the following approaches would be better.

Using non-null arguments, and require all values be passed by the caller to the constructor
Allowing one or more arguments to be omitted, but providing default values for the parameters so the resulting call is made with non-null arguments

📝 First mentioned by @BurtHarris in a comment in #34.

Add tryGet* methods

Several methods which return null in the Java code were updated to validate their arguments and throw exceptions if information is missing. Methods should be added for these which do not throw, but instead return undefined. These methods include:

RuleNode.tryGetChild (non-throwing form of RuleNode.getChild)
ParserRuleContext.tryGetToken (non-throwing form of ParserRuleContext.getToken)
ParserRuleContext.tryGetRuleContext (non-throwing form of ParserRuleContext.getRuleContext)

Update submodule reference to point at tunnelvisionlabs group

Currently the home of the optimized fork of ANTLR 4 is the sharwell/antlr4 repository. After this is moved so tunnelvisionlabs/antlr4 is the home, the submodule in this repository should be updated to point at the new location.

Search/destroy throw "literal string"

At a minimum, this should be throw new Error("literal string").

Verify Lexer.MAX_CHAR_VALUE after Lexer is ported

The Java version relies on Character.MAX_VALUE instead of Lexer.MAX_CHAR_VALUE to establish the bounds on transitions in LexerATNSimulator. The TypeScript target uses Lexer.MAX_CHAR_VALUE instead. After the port is complete, verify that Lexer.MAX_CHAR_VALUE in the TypeScript target is set to 0xFFFF, which is higher than the value seen in the Java target.

Consider readonly properties in interfaces (replacing getXxx())

Currently the Tree interface contains the following method (#5):

getParent(): Tree;

We should consider using the following form, where a property is exposed instead of a method:

readonly parent: Tree;

Consider removing AbstractEqualityComparator<T>

This class may not be useful in the TypeScript target. See comments in #6.

📝 This, along with other items labeled ts-flavor, will be addressed after the port is initially complete and working.

Document minDfaEdge and maxDfaEdge

These constants control the selection of EdgeMap implementations to minimize the overall amount of memory required to store the DFA. In addition to documenting them, the specific heuristic used to choose between SparseEdgeMap and ArrayEdgeMap likely needs to be updated for TypeScript, since JavaScript implementations use a different underlying data structure than the JVM for the type T[].

In addition to minDfaEdge and maxDfaEdge, there are also two constants (0 and 200) used when creating initial states for a precedence DFA. The meaning of these constants should be described in code.

📝 See also comment(s) by @BurtHarris in #107.

Document the precedence DFA identification code in the DFA constructor

This special case in ANTLR is not obvious to users who are used to a more classical DFA implementation. It should be documented so the implementation is clear(er) to readers.

📝 See comments during review of #107.

In ATNConfigSet, is mergedConfigs being a JavaScript Map OK?

I noticed that in the member mergedConfigs is declared a Map with { state: number, alt: number } as a key. JavaScript maps are by identity for object as keys, so this seems suspect. Should this be a Array2DHashMap with a special EqualityComparer?

Add leading underscore to private/protected members

After the sources are ported, it would be a good idea for us to go through all the non-public properties and methods, adding a leading underscore to the names. This is a convention in JavaScript indicating a member is a implementation detail. Might be good to do for some public variables that are only public due to lack of friend or nested class escapes to private/protected.

Don't block the event loop...

In JavaScript asynchronous I/O is the rule, rather than the exception. This is important in many contexts because JavaScript generally runs in a single-threaded environment with an event loop. To remain responsive, that event loop should not be blocked while waiting for input.

IMHO its too much work to recast ANTLR's API to allow it to do asynchronous I/O in JavaScript environments. To remain compliant with JavaScript environment norms, it probably make sense to avoid doing any I/O in the antlr4ts runtime , and expect callers deal with I/O.

Rather than follow the Java this pattern, I suggest that the only implementation of IntStream/CharStream we should provide by default would be a simple wrapper over a string containing the entire input stream!. That way we can move any blocking I/O operations to the tests, and/or testrig.

Background: The Java Model (with threads) isn't the JavaScript way.

The JavaScript asynchronous I/O model is different from the assumptions of design of ANTLR's Java runtime, in particular UnbufferedCharStream which uses blocking read operations so that it can maintain only a small portion of a file's stream in memory at a time. Directly porting this model to JavaScript would lead to code that is wrong by JavaScript and Node.js standards. See for example http://www.nodewiz.biz/your-doing-node-js-wrong-avoid-synchronous-code/.

The current JavaScript runtime blocks in a constructor...

The current JavaScript runtime for ANTLR4 deals with this in a file named FileStream.js, who's constructor function accepts a file name to read. In the constructor, it uses one of the Node.JS rare synchronous I/O calls, fs.readFileSync(), operation to read the entire file into memory as a string, then passing this data to its superclass (InputStream) constructor which proceeds to copy the string into an array of character codes. It should be noted that `fs.readFileSystem()' is generally only available in NodeJS, and not in browser environments.

Create constants for Array2DHashSet<T>

The implementation of Array2DHashSet<T> contains the following values which are intended to be constants. We should investigate if there is a better way to express this intent, since the properties in their current form are technically mutable.

private static INITAL_CAPACITY: number =  16; // must be power of 2
private static INITAL_BUCKET_CAPACITY: number =  8;
private static LOAD_FACTOR: number =  0.75;

Document the difference between DFA.s0 and DFA.s0full

These fields have special behavior in the optimized runtime, but that behavior is never documented. The behavior of these fields, including the special cases for precedence decisions, should be documented.

📝 See comments from @BurtHarris in #107.

Avoid unnecessary allocations in Array2DHashMap

See comments in #36.

Consider using user-defined type guards to remove casts in DFA

Several expressions in the implementation of DFA require explicit casts to DFAState. Many of these casts are conditioned on the result of isPrecedenceDfa, which means it may be possible to use user-defined type guards to eliminate the need to cast.

📝 See comments in #107.

📝 A similar situation occurs with DFAState.isContextSensitive(). If they are not reviewed at the same time, a separate issue should be opened to examine the use of user-defined type guards for the DFAState case.

Fix doc comments re null.

There number of places where where changes in a return type were made allowing undefined in the return type, but the comments still talk about returning null.

For example: Vocabulary.ts line 82-85:

     * @return The string literal associated with the specified token type, or
     * {@code null} if no string literal is associated with the type.
     */
    getLiteralName(tokenType: number): string | undefined;

After the port work is more stable, let's review and decide if we should update the comments or declarations.

Fix copyright headers and notice prior to public repository

It would be preferable from a maintenance standpoint to continue with a copyright assignment plus a list of contributors. In other words, the copyright notice in each file should be the same. Due to @BurtHarris' contributions to the derived work (the TypeScript target), it makes sense to me to have three names listed for each file.

Terence Parr
Sam Harwell
Burt Harris

Broken test discovery VS Test Explorer and mocha-typescript test

Visual studio gets confused when more than one file with these tests is present with the "Mocha" test framework property. Furthermore, it gets confused and associated the tests with the generated .js file, rather than the .ts source.

I can take another look at this once we get things better merged. For now I suggest we use the command-line tests, via npm test.

Add test code coverage tool

There is one called Istanbul that seems to be used a lot for javascript projects, but I have no particular experience with it. Sam are you interested in tackling this?

Should the argument to ATNDeserializer.deserialize() be a string?

Currently it's a UInt16Array. Seems like it should be string.

Review PredictionMode for unreferenced code

As mentioned in #134, some functionality in PredictionMode might not be needed by this target. Unreferenced code should be removed after we complete the initial code conversion.

Document lexer use of MAX_DFA_EDGE

See comments in #107.

Document the DFA and DFAState constructors

These constructors are not documented, which led to several questions during the review of #107.

Determine if project should use strict null checking

Makes us dependent on TypeScript 2.0, perhaps adding technical risk, but it seems within reach given all the existing decorators in the Java code about this. A lot to read about this, see https://www.typescriptlang.org/docs/release-notes/typescript-2.0.html

Evaluate performance of MurmurHash.update

Currently this method operates on both primitives and objects. Since it is a performance-critical method, we should evaluate if it would improve performance to create separate methods for these two cases and remove the dynamic type checking from the one operating on number.

Found during review of #6.

Consider adding a default hashCode() and equals() to Object.prototype

As I understand it, in JavaScript it's possible to actually add methods to the root Object type's prototype!

We might consider writing a module that adds default implementations of hashCode() and equals(other: any) to Object, and optimized ones to String. At a minimum this might make it easier to write test-cases for the collection classes, possibly eliminate the need for the HashedValueType interface outside those collections.

Because reflection is built-in, the central version might do for many low-frequency cases, with per-class implementations of those methods considered an optimization.

Your thoughts?

Document ATNConfig.toDotString

Not all users are familiar with GraphViz, or that "dot" is a term associated with that application. This method should be properly explained in a documentation comment.

📝 This was requested in #108.

Integrate w/ typedoc

http://typedoc.org/guides/installation/.

Looks like the typedoc project hasn't yet addressed TypeScript 2.0, so this is currently blocked. TypeStrong/typedoc#234

Consider changing DFA.states to Array2DHashSet

It seems that Array2DHashSet.putIfAbsent provides the necessary functionality for the DFA.states property. Since it's a lighter-weight type than Array2DHashMap, we should using it instead. See comment from @BurtHarris in #107.

📝 The comment for this property needs to be updated to reflect the actual final behavior as implemented in TypeScript.

Document the purpose of the DFASerializer class

It appears that DFASerializer and LexerDFASerializer may not be used for a critical purpose in the core of the algorithm, but instead used for emitting debug/diagnostic information. This should be reviewed and noted in the documentation for these types, since it would be a notable contrast especially from the ATNDeserializer type.

📝 See comments in the review of #107.

Avoid unintended whitespace changes

I think these are being introduced by Visual Studio and/or the Reshaper add-in. Seems like there should be a setting to avoid it.

ParseTreePatternMatcher matchImpl

Seems to be an odd condition in the Java reference implementation of ParseTreePatternMatcher I noticed while porting. In that file, there are four if statements like this:

                if (mismatchedNode == null) {
                    mismatchedNode = ...;
                }

What's odd is that prior to testing it for null, no code ever assigns it a non-null value. Is this a bug?

Provide overview documentation for ProfilingATNSimulator

Requested during code review in #154.

Document ATNConfig.contains

This method is currently undocumented.

⚠️ ~~This is a priority documentation item since this method is part of the ATNConfigSet reduction and tighter SLL termination condition implemented by the optimized fork of ANTLR 4.~~

📝 This was originally requested in #108.

Prepare for/export to npmjs.com

There are some change we should probably make in the project.json file so that npm will know about our package. Once that is done, I think we'll be able to use npm link to simplify referring to the package for debugging, samples, etc. And the import directives can avoid some or all of the ..\ prefixes.

I've never done this before, so learning curve is probably about the same for either of us.

Review cases where a parameter default value is null

See comments by @BurtHarris in #55.