tunnelvisionlabs / antlr4ts Goto Github PK
View Code? Open in Web Editor NEWOptimized TypeScript target for ANTLR 4
License: Other
Optimized TypeScript target for ANTLR 4
License: Other
I don't have a lot of experience with JavaScript test frameworks, so after reviewing the existing Java tests so I asked for help on stack overflow.
The only answer received so far (from Basarat who has very strong reputation) said that Mocha is the current King of JavaScript Test Frameworks. He cited this as evidence, which looks pretty significant.
Mocha has a lot of features we probably won't need (e.g. async testing), but it looks to me like they won't get in the way. But Mocha seems very adaptable. The default API style quite different from JUnit, but I have seen some projects that are trying to put a decorator-based model on it, like mocha-typescript.
P.S. Basarat also has an interesting-looking IDE project at: http://alm.tools, and TypeScript deep dive e-book at http://basarat.gitbooks.io/typescript/, seems like both are review-worthy.
The intent/behavior of the ATNConfig
class is not obvious to new readers. Many suggestions for improvements were given in the code review for #108.
The intent/behavior of the ATNConfigSet
class is not obvious to new readers. Some suggestions for improvements were given in the code review for #108.
For later consideration...
We could have the tool's output already in JavaScript, while simultaneously emitting a corresponding .d.ts
file that declares type information. This would avoid making a consuming package dependent on TypeScript at build time. Seems like a nice-to-have.
Implementation-wise in the tool, .d.ts
files may be considered similar to the "C/C++" concept of a .h
file. It seems like that target is getting close to integration, so we should probably defer this.
I noticed something early in about line ending normalization. I understand the different conventions, but not how you (and git) want to deal with it.
I have seen messages like this, which I don't understand:
C:\code\antlr4ts>git add package.json
warning: LF will be replaced by CRLF in package.json.
The file will have its original line endings in your working directory.
I think the npm
tool created the file. Is there anything I should do differently?
AbstractEdgeMap.ts, line 44 the assert
and comment seem mismatched, given the semantics of JavaScript's number
type...
// the allowed range (with minIndex and maxIndex inclusive) should be less than 2^32
assert(maxIndex - minIndex + 1 >= 0);
The implementation of ATNConfig
uses two primary strategies for optimizing memory for this heavily-allocated type:
ATNConfig
instances use SemanticContext.NONE
as the semantic context, so storage is not provided for anything else in the base type.ATNConfig
instances in the lexer need to store a LexerActionExecutor
, so storage is not provided for it in the base type (many more instances are allocated by the parser than the lexer).number
.The design decisions for optimizing memory are currently interleaved with design decisions for algorithmic correctness without explanation. See comments in the code review for #108 for several specific requests related to improving clarity.
๐ This issue specifically relates to clarity and efficiency with respect to the performance characteristics of ATNConfig
. A separate issue will be filed for questions related to the correctness characteristics of ATNConfig
.
As mentioned in #86, JavaScript runtimes box integers which have the bit 0x80000000 set. Consider updating hashCode
implementations to mask intermediate values and/or results to avoid this scenario.
To wrap this up, we'll need to provide a new code generator for the tool. I was thinking that it should probably be usable for JavaScript tools without a TypeScript dependency. To do this the tool should emit .js
files, with added .d.ts
files to enable TypeScript users to benefit from the typings.
This probably belongs in a later milestone.
I'm guessing this might be perf critical.
Similarly consider if ArrayEdgeMap might derive from Array rather than include it.
Currently PredictionContext
uses the "Optional Modules" strategy for loading derived types only when/where they are needed. These loads are located in performance-critical code so we should figure out how to best implement the required functionality.
See comments in #36.
On branch ex, the purpose was to convert the exception types, and I added a few stub classes to misc/Stubs.ts
. One of those stubs, ATNConfigSet
, is generating a runtime error when imported, but I haven't figured out why... Now that I think about it, perhaps this is one of those circular-dependency problems.
Here's the message I get:
PS C:\try\antlr4ts\src\atn> npm test
> [email protected] test C:\try\antlr4ts
> tsc & mocha
C:\try\antlr4ts\src\misc\Stubs.ts:197
export class ATNConfigSet extends Array2DHashSet<ATNConfig> {
^
TypeError: Class extends value undefined is not a function or null
at Object.<anonymous> (C:\try\antlr4ts\src\misc\Stubs.ts:197:35)
at Module._compile (module.js:556:32)
at Object.Module._extensions..js (module.js:565:10)
at Module.load (module.js:473:32)
at tryModuleLoad (module.js:432:12)
at Function.Module._load (module.js:424:3)
at Module.require (module.js:483:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (C:\try\antlr4ts\src\misc\DefaultEqualityComparator.ts:32:1)
at Module._compile (module.js:556:32)
at Object.Module._extensions..js (module.js:565:10)
at Module.load (module.js:473:32)
at tryModuleLoad (module.js:432:12)
at Function.Module._load (module.js:424:3)
at Module.require (module.js:483:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (C:\try\antlr4ts\src\misc\Array2DHashSet.ts:35:1)
at Module._compile (module.js:556:32)
at Object.Module._extensions..js (module.js:565:10)
at Module.load (module.js:473:32)
at tryModuleLoad (module.js:432:12)
at Function.Module._load (module.js:424:3)
at Module.require (module.js:483:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (C:\try\antlr4ts\test\TestArray2DHashSet.ts:3:1)
at Module._compile (module.js:556:32)
at Object.Module._extensions..js (module.js:565:10)
at Module.load (module.js:473:32)
at tryModuleLoad (module.js:432:12)
at Function.Module._load (module.js:424:3)
at Module.require (module.js:483:17)
at require (internal/module.js:20:19)
at C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:220:27
at Array.forEach (native)
at Mocha.loadFiles (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:217:14)
at Mocha.run (C:\try\antlr4ts\node_modules\mocha\lib\mocha.js:485:10)
at Object.<anonymous> (C:\try\antlr4ts\node_modules\mocha\bin\_mocha:457:18)
at Module._compile (module.js:556:32)
at Object.Module._extensions..js (module.js:565:10)
at Module.load (module.js:473:32)
at tryModuleLoad (module.js:432:12)
at Function.Module._load (module.js:424:3)
at Module.runMain (module.js:590:10)
at run (bootstrap_node.js:394:7)
at startup (bootstrap_node.js:149:9)
at bootstrap_node.js:509:3
npm ERR! Test failed. See above for more details.
The VocabularyImpl
constructor in #34 was ported rather literally from the original Java code. Since we don't have to worry about legacy generated code in TypeScript (prior to the time when Vocabulary
was introduced), it might not be necessary to support null values for these constructor arguments. After we have completed more of the port, we can examine all uses of VocabularyImpl
in generated code and the runtime library to determine if one of the following approaches would be better.
๐ First mentioned by @BurtHarris in a comment in #34.
Several methods which return null
in the Java code were updated to validate their arguments and throw exceptions if information is missing. Methods should be added for these which do not throw, but instead return undefined
. These methods include:
RuleNode.tryGetChild
(non-throwing form of RuleNode.getChild
)ParserRuleContext.tryGetToken
(non-throwing form of ParserRuleContext.getToken
)ParserRuleContext.tryGetRuleContext
(non-throwing form of ParserRuleContext.getRuleContext
)Currently the home of the optimized fork of ANTLR 4 is the sharwell/antlr4 repository. After this is moved so tunnelvisionlabs/antlr4 is the home, the submodule in this repository should be updated to point at the new location.
At a minimum, this should be throw new Error("literal string").
The Java version relies on Character.MAX_VALUE
instead of Lexer.MAX_CHAR_VALUE
to establish the bounds on transitions in LexerATNSimulator
. The TypeScript target uses Lexer.MAX_CHAR_VALUE
instead. After the port is complete, verify that Lexer.MAX_CHAR_VALUE
in the TypeScript target is set to 0xFFFF, which is higher than the value seen in the Java target.
Currently the Tree
interface contains the following method (#5):
getParent(): Tree;
We should consider using the following form, where a property is exposed instead of a method:
readonly parent: Tree;
This class may not be useful in the TypeScript target. See comments in #6.
๐ This, along with other items labeled ts-flavor, will be addressed after the port is initially complete and working.
These constants control the selection of EdgeMap
implementations to minimize the overall amount of memory required to store the DFA. In addition to documenting them, the specific heuristic used to choose between SparseEdgeMap
and ArrayEdgeMap
likely needs to be updated for TypeScript, since JavaScript implementations use a different underlying data structure than the JVM for the type T[]
.
In addition to minDfaEdge
and maxDfaEdge
, there are also two constants (0
and 200
) used when creating initial states for a precedence DFA. The meaning of these constants should be described in code.
๐ See also comment(s) by @BurtHarris in #107.
This special case in ANTLR is not obvious to users who are used to a more classical DFA implementation. It should be documented so the implementation is clear(er) to readers.
๐ See comments during review of #107.
I noticed that in the member mergedConfigs
is declared a Map
with { state: number, alt: number }
as a key. JavaScript maps are by identity for object as keys, so this seems suspect. Should this be a Array2DHashMap
with a special EqualityComparer
?
After the sources are ported, it would be a good idea for us to go through all the non-public properties and methods, adding a leading underscore to the names. This is a convention in JavaScript indicating a member is a implementation detail. Might be good to do for some public variables that are only public due to lack of friend
or nested class escapes to private/protected.
In JavaScript asynchronous I/O is the rule, rather than the exception. This is important in many contexts because JavaScript generally runs in a single-threaded environment with an event loop. To remain responsive, that event loop should not be blocked while waiting for input.
IMHO its too much work to recast ANTLR's API to allow it to do asynchronous I/O in JavaScript environments. To remain compliant with JavaScript environment norms, it probably make sense to avoid doing any I/O in the antlr4ts
runtime , and expect callers deal with I/O.
Rather than follow the Java this pattern, I suggest that the only implementation of IntStream/CharStream
we should provide by default would be a simple wrapper over a string containing the entire input stream!. That way we can move any blocking I/O operations to the tests, and/or testrig.
The JavaScript asynchronous I/O model is different from the assumptions of design of ANTLR's Java runtime, in particular UnbufferedCharStream
which uses blocking read operations so that it can maintain only a small portion of a file's stream in memory at a time. Directly porting this model to JavaScript would lead to code that is wrong by JavaScript and Node.js standards. See for example http://www.nodewiz.biz/your-doing-node-js-wrong-avoid-synchronous-code/.
The current JavaScript runtime for ANTLR4 deals with this in a file named FileStream.js
, who's constructor function accepts a file name to read. In the constructor, it uses one of the Node.JS rare synchronous I/O calls, fs.readFileSync()
, operation to read the entire file into memory as a string, then passing this data to its superclass (InputStream
) constructor which proceeds to copy the string into an array of character codes. It should be noted that `fs.readFileSystem()' is generally only available in NodeJS, and not in browser environments.
The implementation of Array2DHashSet<T>
contains the following values which are intended to be constants. We should investigate if there is a better way to express this intent, since the properties in their current form are technically mutable.
private static INITAL_CAPACITY: number = 16; // must be power of 2
private static INITAL_BUCKET_CAPACITY: number = 8;
private static LOAD_FACTOR: number = 0.75;
These fields have special behavior in the optimized runtime, but that behavior is never documented. The behavior of these fields, including the special cases for precedence decisions, should be documented.
๐ See comments from @BurtHarris in #107.
See comments in #36.
Several expressions in the implementation of DFA
require explicit casts to DFAState
. Many of these casts are conditioned on the result of isPrecedenceDfa
, which means it may be possible to use user-defined type guards to eliminate the need to cast.
๐ See comments in #107.
๐ A similar situation occurs with DFAState.isContextSensitive()
. If they are not reviewed at the same time, a separate issue should be opened to examine the use of user-defined type guards for the DFAState
case.
There number of places where where changes in a return type were made allowing undefined
in the return type, but the comments still talk about returning null.
For example: Vocabulary.ts line 82-85:
* @return The string literal associated with the specified token type, or
* {@code null} if no string literal is associated with the type.
*/
getLiteralName(tokenType: number): string | undefined;
After the port work is more stable, let's review and decide if we should update the comments or declarations.
It would be preferable from a maintenance standpoint to continue with a copyright assignment plus a list of contributors. In other words, the copyright notice in each file should be the same. Due to @BurtHarris' contributions to the derived work (the TypeScript target), it makes sense to me to have three names listed for each file.
Visual studio gets confused when more than one file with these tests is present with the "Mocha" test framework property. Furthermore, it gets confused and associated the tests with the generated .js file, rather than the .ts source.
I can take another look at this once we get things better merged. For now I suggest we use the command-line tests, via npm test
.
There is one called Istanbul that seems to be used a lot for javascript projects, but I have no particular experience with it. Sam are you interested in tackling this?
Currently it's a UInt16Array
. Seems like it should be string
.
As mentioned in #134, some functionality in PredictionMode
might not be needed by this target. Unreferenced code should be removed after we complete the initial code conversion.
See comments in #107.
These constructors are not documented, which led to several questions during the review of #107.
Makes us dependent on TypeScript 2.0, perhaps adding technical risk, but it seems within reach given all the existing decorators in the Java code about this. A lot to read about this, see https://www.typescriptlang.org/docs/release-notes/typescript-2.0.html
Currently this method operates on both primitives and objects. Since it is a performance-critical method, we should evaluate if it would improve performance to create separate methods for these two cases and remove the dynamic type checking from the one operating on number
.
Found during review of #6.
As I understand it, in JavaScript it's possible to actually add methods to the root Object type's prototype!
We might consider writing a module that adds default implementations of hashCode()
and equals(other: any)
to Object, and optimized ones to String. At a minimum this might make it easier to write test-cases for the collection classes, possibly eliminate the need for the HashedValueType interface outside those collections.
Because reflection is built-in, the central version might do for many low-frequency cases, with per-class implementations of those methods considered an optimization.
Your thoughts?
Not all users are familiar with GraphViz, or that "dot" is a term associated with that application. This method should be properly explained in a documentation comment.
๐ This was requested in #108.
http://typedoc.org/guides/installation/.
Looks like the typedoc project hasn't yet addressed TypeScript 2.0, so this is currently blocked. TypeStrong/typedoc#234
It seems that Array2DHashSet.putIfAbsent
provides the necessary functionality for the DFA.states
property. Since it's a lighter-weight type than Array2DHashMap
, we should using it instead. See comment from @BurtHarris in #107.
๐ The comment for this property needs to be updated to reflect the actual final behavior as implemented in TypeScript.
It appears that DFASerializer
and LexerDFASerializer
may not be used for a critical purpose in the core of the algorithm, but instead used for emitting debug/diagnostic information. This should be reviewed and noted in the documentation for these types, since it would be a notable contrast especially from the ATNDeserializer
type.
๐ See comments in the review of #107.
I think these are being introduced by Visual Studio and/or the Reshaper add-in. Seems like there should be a setting to avoid it.
Seems to be an odd condition in the Java reference implementation of ParseTreePatternMatcher
I noticed while porting. In that file, there are four if statements like this:
if (mismatchedNode == null) {
mismatchedNode = ...;
}
What's odd is that prior to testing it for null, no code ever assigns it a non-null value. Is this a bug?
Requested during code review in #154.
This method is currently undocumented.
This is a priority documentation item since this method is part of the ATNConfigSet
reduction and tighter SLL termination condition implemented by the optimized fork of ANTLR 4.
๐ This was originally requested in #108.
There are some change we should probably make in the project.json
file so that npm
will know about our package. Once that is done, I think we'll be able to use npm link
to simplify referring to the package for debugging, samples, etc. And the import
directives can avoid some or all of the ..\
prefixes.
I've never done this before, so learning curve is probably about the same for either of us.
See comments by @BurtHarris in #55.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.