antlr / grammars-v4
Grammars written for ANTLR v4; expectation that the grammars are free of actions.
License: MIT License
I usually handle my grammar this way:
if (ctx.someBlock() != null){
SomeBlockContext bcx = ctx.someBlock();
}
However, if I try to do the same with the Python grammar:
if (p.file_input() != null){
for (StmtContext ctx : p.file_input().stmt())
statements.add(ctx);
}
I get no statements, because they are consumed by the first call to file_input()...
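A possible explanation, sketched without the ANTLR runtime: if p is the parser itself rather than a context object, then p.file_input() is a rule invocation that consumes tokens, not an accessor like ctx.someBlock(). The TinyParser class below is a hypothetical stand-in for a generated parser; it only illustrates why a second call finds nothing left and why keeping the result of the first call avoids the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class RuleCallDemo {
    // Hypothetical stand-in for a generated parser (not an ANTLR class).
    static final class TinyParser {
        private final Iterator<String> tokens;

        TinyParser(List<String> input) {
            this.tokens = input.iterator();
        }

        // Like a rule method: consumes every remaining token and returns the result.
        List<String> fileInput() {
            List<String> statements = new ArrayList<>();
            while (tokens.hasNext()) {
                statements.add(tokens.next());
            }
            return statements;
        }
    }

    // Returns the number of statements seen by the first and the second call.
    static int[] countsAfterTwoCalls() {
        TinyParser p = new TinyParser(Arrays.asList("a = 1", "b = 2"));
        List<String> first = p.fileInput();   // parse once and keep this result
        List<String> second = p.fileInput();  // the input is already consumed
        return new int[] { first.size(), second.size() };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterTwoCalls();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

With a real generated parser the equivalent would be to call the start rule once, store the returned context, and do the null check and the loop on that stored context.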
annotationName
: Identifier ('.' Identifier)*
;
qualifiedName
: Identifier ('.' Identifier)*
;
Hi,
if the last line of a Java source file is something like:
// my comment (no new line added)
parsing fails. I guess it's because LINE_COMMENT makes \n mandatory, so there is no match.
The error is:
line 187:0 extraneous input '/' expecting {, 'interface', 'abstract', 'strictfp', '@', 'class', 'public', 'private', 'final', 'static', 'protected', ';', 'enum'}
Thanks.
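The guess can be illustrated outside ANTLR with two regular expressions. This is only an analogy, assuming the grammar's LINE_COMMENT ends with a required newline: the first pattern mirrors that shape, and the second stops before the line terminator, as in the lexer rule form '//' ~[\r\n]* that many grammars use. The class and constant names are made up for the example.

```java
import java.util.regex.Pattern;

final class LineCommentDemo {
    // Mirrors a rule that insists on a newline after the comment text.
    static final Pattern NEWLINE_REQUIRED = Pattern.compile("//[^\\r\\n]*\\r?\\n");
    // Mirrors a rule that stops before the line terminator, so end of file is fine.
    static final Pattern NEWLINE_OPTIONAL = Pattern.compile("//[^\\r\\n]*");

    static boolean matches(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        String lastLine = "// my comment"; // no trailing newline, as in the report
        System.out.println(matches(NEWLINE_REQUIRED, lastLine));
        System.out.println(matches(NEWLINE_OPTIONAL, lastLine));
    }
}
```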
input:
def func():
return 42
parser reports:
no error
expected:
An error message that indent before function definition is unexpected/extraneous.
The parse trees generated by the objective c grammar have very long repetitive structure with a lot of superfluous nodes before reaching a leaf inside expressions.
https://github.com/antlr/grammars-v4/blob/master/objc/ObjC.g4
hello,
Can you help me add the advanced SystemVerilog grammar to the Verilog grammar?
Thanks,
Avdhesh yadav
I know that ANTLR4 can take LL(k) grammars. I'm just wondering whether the Java7 grammar here uses that feature. I'm looking for an LL(1) grammar for Java 7 and would like to base it on something rather than building the whole thing on my own.
Given this class:
package j7;
public class SimpleTest {
char doSomething(int i1) {
return (char) myMethod(i1);
}
char myMethod(int zz) {
return 'z';
}
}
I produced a tree for return (char) myMethod(i1);
[910·822·813·686·445·417·414·386·294·252·217]
├─[180·910·822·813·686·445·417·414·386·294·252·217]
│ ├─ (
│ ├──[1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │ └─[589·1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │ └─ char
│ ├─ )
│ └──[1065·180·910·822·813·686·445·417·414·386·294·252·217]
│ └─[1071·1065·180·910·822·813·686·445·417·414·386·294·252·217]
│ └─ myMethod
├─ (
├──[1165·910·822·813·686·445·417·414·386·294·252·217]
│ └─[1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│ └─[1071·1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│ └─ i1
└─ )
However, it looks like the tree should be as listed below:
├─ (
├─ char
└─ )
└─ myMethod
├─ (
├─ i1
└─ )
Am I wrong, or did I miss something?
Related to #106.
ECMAScript.g4 parses true || false && false as (true || false) && false. This should be true || (false && false).
The bit operators &, ^ and | are also wrong: 1 | 0 & 0 should be 1 | (0 & 0).
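To make the expected grouping concrete, here is a small precedence-climbing sketch, independent of ANTLR and of ECMAScript.g4. It uses the ECMAScript ordering (from loosest to tightest: ||, &&, |, ^, &) and returns a fully parenthesized string. The class name and the space-separated input format are assumptions of the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PrecedenceDemo {
    // Higher number binds tighter.
    private static final Map<String, Integer> PREC = new HashMap<>();
    static {
        PREC.put("||", 1);
        PREC.put("&&", 2);
        PREC.put("|", 3);
        PREC.put("^", 4);
        PREC.put("&", 5);
    }

    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // Input: operands and operators separated by spaces, e.g. "1 | 0 & 0".
    static String group(String spaceSeparated) {
        List<String> tokens = Arrays.asList(spaceSeparated.trim().split("\\s+"));
        return new PrecedenceDemo(tokens).parse(1);
    }

    // Precedence climbing: fold in only operators with precedence >= minPrec.
    private String parse(int minPrec) {
        String left = tokens.get(pos++); // operand
        while (pos < tokens.size()) {
            Integer prec = PREC.get(tokens.get(pos));
            if (prec == null || prec < minPrec) {
                break;
            }
            String op = tokens.get(pos++);
            String right = parse(prec + 1); // prec + 1 keeps operators left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        System.out.println(group("true || false && false"));
        System.out.println(group("1 | 0 & 0"));
    }
}
```

In an ANTLR 4 left-recursive expression rule the same ordering is normally expressed by listing the tighter-binding alternatives first.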
I found the grammar to be too strict on generic invocations, like
Collections.<String[]>emptyList()
Seems like explicitGenericInvocation rule does not allow an array to be a generic argument. I fixed this to meet my needs by changing the rule to be like this:
explicitGenericInvocation
: typeArguments Identifier arguments
;
but there may be a tighter way.
java -mx512m -jar /projects/hnd_tools/java/lib/antlr4.4/antlr-4.4-complete.jar -Dlanguage=Python3 -no-listener -visitor src/grammar/Verilog2001.g4 -o generated
error(134): Verilog2001.g4:360:0: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:242:73: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:244:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:246:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:248:52: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:385:16: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:190:29: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:203:28: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:210:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:214:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:215:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:217:39: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:218:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:219:33: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:232:40: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:409:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:431:46: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:471:51: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:525:48: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1352:50: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1397:48: symbol range conflicts with generated code in target language or runtime
make: *** [compile] Error 1
Hi,
I'm getting the following error when the C# files I'm parsing contain preprocessor directives:
Result StandardError:
line 20:0 missing '}' at '\t#if Testing\r\n'
line 28:0 extraneous input '}' expecting {, 'abstract', 'class', 'delegate', 'enum', 'extern', 'interface', 'internal', 'namespace', 'new', 'override', 'partial', 'private', 'protected', 'public', 'readonly', 'sealed', 'static', 'struct', 'unsafe', 'virtual', 'volatile', '['}
This is the file I'm using to test:
namespace test.test1
{
public class Testing
{
static void Main (string[] args){
#if Testing
String x = "abc";
#else
String y = "xyz";
#endif
String a, b;
}
}
}
I had the following VHDL, which failed to parse. I believe it is valid (it synthesizes OK).
seven_seg_led_instance : seven_seg_led port map (
mclk=>mclk,
an(3 downto 0)=>an(3 downto 0),
seg(6 downto 0)=>seg(6 downto 0),
dp=>dp
);
To support this I made the following modification to vhdl.g4
--- a/vhdl/vhdl.g4
+++ b/vhdl/vhdl.g4
@@ -696,7 +696,6 @@ formal_parameter_list
formal_part
: identifier
- | identifier LPAREN explicit_range RPAREN
;
Wasn't sure how to submit a patch, so I offer it here.
Gary
It now recognizes
#id {
color:blue
}
but not
#id{ // no space between id and {
color:blue
}
Hi,
it seems your c.g4 file isn't working properly; it can't even recognize #include. Please let me know if I have misunderstood something. Thank you in advance.
It can be found here: https://github.com/teverett/phpGrammar
ECMAScript.g4 parses 1 - 2 + 3 as 1 - (2 + 3). This should be (1 - 2) + 3.
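The two groupings give different results, which is why this matters. The sketch below is a hypothetical helper, not related to the grammar's code. It evaluates a space-separated chain of + and - once left-associatively and once right-associatively.

```java
final class AssociativityDemo {
    // Left-associative: ((a op b) op c) ...
    static int evalLeft(String expr) {
        String[] t = expr.trim().split("\\s+");
        int acc = Integer.parseInt(t[0]);
        for (int i = 1; i + 1 < t.length; i += 2) {
            int rhs = Integer.parseInt(t[i + 1]);
            acc = t[i].equals("+") ? acc + rhs : acc - rhs;
        }
        return acc;
    }

    // Right-associative: a op (b op (c ...)), the grouping the report describes.
    static int evalRight(String expr) {
        return evalRight(expr.trim().split("\\s+"), 0);
    }

    private static int evalRight(String[] t, int i) {
        int lhs = Integer.parseInt(t[i]);
        if (i + 1 >= t.length) {
            return lhs;
        }
        int rhs = evalRight(t, i + 2);
        return t[i + 1].equals("+") ? lhs + rhs : lhs - rhs;
    }

    public static void main(String[] args) {
        System.out.println(evalLeft("1 - 2 + 3"));  // (1 - 2) + 3
        System.out.println(evalRight("1 - 2 + 3")); // 1 - (2 + 3)
    }
}
```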
The objective-c grammar has no license.
Here is the documentation http://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html
and here is some of the relevant rules:
LambdaExpression:
LambdaParameters -> LambdaBody
LambdaParameters:
Identifier
( [FormalParameterList] )
( InferredFormalParameterList )
LambdaBody:
Expression
Block
I haven't taken the time to figure out exactly how to add (and test) these in the grammar yet, but figured I'd post about it in case someone else can solve it faster.
Also, I haven't gone through and seen if any other java 8 changes weren't yet added either.
The {Character.func} actions do not work on targets other than Java.
https://github.com/antlr/grammars-v4/blob/master/java/Java.g4#L983
Maybe something like what is documented here should be employed?
https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Python+Target
ID {$text.equals("test")}?
Unfortunately, this is not portable, but you can work around it. The trick involves:
deriving your parser from a parser you provide, such as BaseParser
implementing utility methods in this BaseParser, such as "isEqualText"
adding a "self" field to the Java/C# BaseParser and initializing it with "this"
Thanks to the above, you should be able to rewrite the above semantic predicate as follows:
ID {$self.isEqualText($text,"test")}?
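A minimal Java-side sketch of those three steps follows. It is deliberately free of the ANTLR runtime so that it compiles on its own. In a real project BaseParser would extend org.antlr.v4.runtime.Parser and the grammar would name it through the superClass option. BaseParserDemo and its check method are hypothetical and exist only to exercise the helper.

```java
// Hypothetical base class for the generated parser (steps 1 and 2 above).
abstract class BaseParser {
    // Step 3: a "self" field initialized with "this", so the same predicate
    // text can be used for targets where the receiver is called self.
    protected final BaseParser self = this;

    // Target-neutral replacement for $text.equals("test").
    public boolean isEqualText(String actual, String expected) {
        return actual != null && actual.equals(expected);
    }
}

// Stand-in for a generated parser that derives from BaseParser.
final class BaseParserDemo extends BaseParser {
    static boolean check(String tokenText) {
        BaseParserDemo parser = new BaseParserDemo();
        return parser.self.isEqualText(tokenText, "test");
    }

    public static void main(String[] args) {
        System.out.println(check("test"));
        System.out.println(check("other"));
    }
}
```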
Hi, I really love ANTLR, it's a great tool.
I found that a single message-send statement is parsed into a very deep tree. Obviously, there is no conditional or logical expression in this statement, so I think there is something wrong in the grammar.
[self doSomething];
* 9 expression: '[' - ']'
* 10 assignment_expression: '[' - ']'
* 11 conditional_expression: '[' - ']'
* 12 logical_or_expression: '[' - ']'
* 13 logical_and_expression: '[' - ']'
* 14 inclusive_or_expression: '[' - ']'
* 15 exclusive_or_expression: '[' - ']'
* 16 and_expression: '[' - ']'
* 17 equality_expression: '[' - ']'
* 18 relational_expression: '[' - ']'
* 19 shift_expression: '[' - ']'
* 20 additive_expression: '[' - ']'
* 21 multiplicative_expression: '[' - ']'
* 22 cast_expression: '[' - ']'
* 23 unary_expression: '[' - ']'
* 24 postfix_expression: '[' - ']'
* 25 primary_expression: '[' - ']'
* 26 message_expression: '[' - ']'
* 27 receiver: 'self'
* 28 expression: 'self'
* 29 assignment_expression: 'self'
* 30 conditional_expression: 'self'
* 31 logical_or_expression: 'self'
* 32 logical_and_expression: 'self'
* 33 inclusive_or_expression: 'self'
* 34 exclusive_or_expression: 'self'
* 35 and_expression: 'self'
* 36 equality_expression: 'self'
* 37 relational_expression: 'self'
* 38 shift_expression: 'self'
* 39 additive_expression: 'self'
* 40 multiplicative_expression: 'self'
* 41 cast_expression: 'self'
* 42 unary_expression: 'self'
* 43 postfix_expression: 'self'
* 44 primary_expression: 'self'
* 27 message_selector: 'doSomething'
* 28 selector: 'doSomething'
Thanks.
When running ANTLRWorks with the Smalltalk.g4 grammar found here, there is a parse error when parsing multiple expressions containing unary messages.
Here is the valid Smalltalk:
self class initialize.
self class update.
If the above was reduced to just the first line 'self class initialize.' then all parses without error. However, add the second line in and it fails to parse. Error below:
line 2:0 no viable alternative at input '\nself'
These are the tokens that are being found:
Arguments: [Smalltalk, script, -encoding, UTF-8, -tokens, -tree, -gui, /home/jamesl/dev/antlr/temp.st]
[@0,0:3='self',<22>,1:0]
[@1,4:4=' ',<31>,1:4]
[@2,5:9='class',<27>,1:5]
[@3,10:10=' ',<31>,1:10]
[@4,11:20='initialize',<27>,1:11]
[@5,21:21='.',<2>,1:21]
[@6,22:22='\n',<31>,1:22]
[@7,23:26='self',<22>,2:0]
[@8,27:27=' ',<31>,2:4]
[@9,28:32='class',<27>,2:5]
[@10,33:33=' ',<31>,2:10]
[@11,34:39='update',<27>,2:11]
[@12,40:40='.',<2>,2:17]
[@13,41:41='\n',<31>,2:18]
[@14,42:41='',<-1>,3:19]
(script (sequence ws ws (statements (expressions (expression (binarySend (unarySend (operand (literal (parsetimeLiteral (pseudoVariable self)))) (ws ) (unaryTail (unaryMessage ws (unarySelector class) ) ws (unaryTail (unaryMessage ws (unarySelector initialize) .) ws (ws \n self)) ws)))))) (ws )) class update . \n)
I was expecting the expressionS and expressionList rules to cater for this input.
It is almost like whitespace is gobbling up the \n and self tokens rather than returning from the unaryTail rule.
Please help.
if so I could be interested to help :)
The new Java8 grammar produces a parser that is two orders of magnitude slower than the old Java 7 grammar.
This is an example test run, parsing all files in any large java project:
package com.antlr.test.java;
import com.codescore.grammars.Java7Lexer;
import com.codescore.grammars.Java7Parser;
import com.codescore.grammars.Java8Lexer;
import com.codescore.grammars.Java8Parser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class JavaScanDirRawTest {
// A path to a large java project
String path = "/path/to/large/java/project";
@Test
public void scanDirsJava7() throws IOException {
InputStream fileStream;
File rootDir = new File(path);
List<File> javaFiles = getAllFiles(rootDir, "java");
int count = 0;
System.out.println("files:" + javaFiles.size());
Long start = System.currentTimeMillis();
for (File javaFile : javaFiles) {
fileStream = new FileInputStream(javaFile.getPath());
Java7Lexer lexer = new Java7Lexer(new ANTLRInputStream(fileStream));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Java7Parser parser = new Java7Parser(tokenStream);
Java7Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();
count++;
if (count % 10 == 0)
System.out.println("file: " + count + "/" + javaFiles.size() + " avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
}
}
@Test
public void scanDirsJava8() throws IOException {
InputStream fileStream;
File rootDir = new File(path);
List<File> javaFiles = getAllFiles(rootDir, "java");
int count = 0;
System.out.println("files:" + javaFiles.size());
Long start = System.currentTimeMillis();
for (File javaFile : javaFiles) {
fileStream = new FileInputStream(javaFile.getPath());
Java8Lexer lexer = new Java8Lexer(new ANTLRInputStream(fileStream));
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
Java8Parser parser = new Java8Parser(tokenStream);
Java8Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();
count++;
if (count % 10 == 0)
System.out.println("file: " + count + "/" + javaFiles.size() + " avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
}
}
private static List<File> getAllFiles(File rootDir, String fileExtension) {
List<File> javaFiles = new ArrayList<>();
collectFilesInDirectoryTree(rootDir, javaFiles, fileExtension);
return javaFiles;
}
private static void collectFilesInDirectoryTree(File directory, List<File> fileList, final String fileExtension) {
File[] files = directory.listFiles(new FilenameFilter() {
@Override
public boolean accept(File dir, String name) {
return name.endsWith("." + fileExtension);
}
});
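// listFiles() returns null if the path is not a directory or an I/O error occurs
if (files != null) {
fileList.addAll(Arrays.asList(files));
}
File[] dirs = directory.listFiles();
if (dirs == null) {
return;
}
for (File dir : dirs) {
if (dir.isDirectory()) {
collectFilesInDirectoryTree(dir, fileList, fileExtension);
}
}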
}
}
there is a line which specifies the grammar name:
grammar vhdl
Could you please rename vhdl to something such as Vhdl or VHDL so that the generated class names follow Java conventions?
Java naming conventions expect class names to start with a capital letter.
Thanks :)
Java's grammar contains this rule:
fragment
StringCharacters
: StringCharacter+
;
Shouldn't it be non-greedy?
fragment
StringCharacters
: StringCharacter+?
;
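One way to reason about the question, assuming StringCharacter is defined so that it cannot match an unescaped double quote (the rule itself is not shown here): a greedy + over such characters cannot run past the closing quote, so the non-greedy form would not change what is matched. The regular expression below is only an analogy for that rule shape, not the grammar itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyStringDemo {
    // Quote, then one or more of (any char except quote and backslash, or a
    // backslash escape), then quote. The + is greedy, but the character class
    // can never consume an unescaped quote.
    static final Pattern STRING = Pattern.compile("\"(?:[^\"\\\\]|\\\\.)+\"");

    static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Two separate literals are found, not one match spanning both.
        System.out.println(literals("\"a\" + \"b\""));
    }
}
```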
gradle test can't continue.
I'll make a pull request.
My StackOverflow question highlights a possible bug: http://stackoverflow.com/questions/24562551/wrong-parsing-with-antlr4s-c-g4.
My target is generated C# code. When I generated it using ECMAScript.CSharpTarget.g4, I found that it raises compilation errors like "The name '_input' does not exist in the current context". So the variable '_input' is not declared anywhere, and I don't know what type it should have in order to fix it.
Does the IDEA plugin support actions for the lexer and parser (@lexer::members, @parser::members)?
I'm asking because my code doesn't work (my question on Stack Overflow has no answers):
@members{
public boolean isAsteriskCommentLine(){
Token t[2];
t[0] = this.getCurrentToken();
t[1] = this.nextToken();
if( ( t[0].getText().equals('\n') || t[0].getText().equals('\r') ) && t[1].getText().equals('*') )
return true;
return false;
}
}
.....
COMMENT: {getCharPositionInLine() == 0}?'*' ~[\r\n]*? (('\r'? '\n')| EOF) {isAsteriskCommentLine()}?;
I can't understand what I'm doing wrong. I have read the book, the documentation, and examples from Stack Overflow threads.
cannot see that... or is this processed by some other means?
Identifiers in VHDL are restricted with respect to underscores as follows:
Underscores are significant characters in an identifier
and basic identifiers may contain underscores,
but it is not allowed to place an underscore as a first or last character of an identifier.
Moreover, two adjacent underscores are not allowed either.
The current grammar does not reflect this behaviour. It allows two underscores in a row or as the last character.
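The restriction follows from the VHDL definition of a basic identifier: a letter followed by any number of letters or digits, each optionally preceded by a single underscore. The check below expresses that shape as a regular expression. The class name is made up, and extended identifiers are out of scope.

```java
import java.util.regex.Pattern;

final class VhdlIdentifierDemo {
    // letter { [ underscore ] letter_or_digit }
    // This rules out a leading underscore, a trailing underscore,
    // and two underscores in a row.
    static final Pattern BASIC_IDENTIFIER =
            Pattern.compile("[A-Za-z](?:_?[A-Za-z0-9])*");

    static boolean isValid(String candidate) {
        return BASIC_IDENTIFIER.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("seven_seg_led"));
        System.out.println(isValid("seven__seg"));
        System.out.println(isValid("seven_seg_"));
    }
}
```

The same structure should carry over to a lexer rule, with the optional underscore placed inside the repeated group rather than in the character set.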
Hi,
It seems that the Swift parser generated by ANTLR is taking too long to parse using the GameScene.swift test file.
I'm using this in an Android environment in which the ANTLR runtime source has been modified to remove all Swing components (since they are not supported on Android) and the @Nullable and @NotNull annotations. No other code has been modified or added to what ANTLR generated.
Tested using the Java grammar provided here and parsing is significantly faster.
input:
def func():
return 42
# comment
parser reports error:
line 3:11 no viable alternative at input '<EOF>'
expected:
no error
$ antlr4 scala.g4
error(8): scala.g4:34:8: grammar name Scala and file name scala.g4 differ
I believe this should just be a matter of renaming scala.g4 to Scala.g4.
Hello,
I am very new to ANTLR and to the whole parsing thing itself. I was wondering if I could get some help in converting the Python grammar file from ANTLRv3 to ANTLRv4.
I am using the grammar created by Ales Teska as a starting point, which was originally created for ANTLR3. I have tried to convert it to ANTLR4. ANTLR4 can generate a lexer and parser for the new grammar, but the result is not able to parse Python code correctly. Any help would be really appreciated.
Here is the grammar file: https://gist.github.com/ajinkyakulkarni/ada5ec1792d25fc1264e
In ANTLRWorks 2, using the Java.g4 grammar with the code below causes
line 18:21 extraneous input 'Main' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 18:30 extraneous input '";\r\n\t\tANTLRInputStream input = new ANTLRInputStream(Directory+filename); \r\n\t\tJavaLexer lexer = new JavaLexer(input);\r\n\t\tCommonTokenStream tokens = new CommonTokenStream(lexer);\r\n\t\tJavaParser parser = new JavaParser(tokens);\r\n\t\tParseTree tree = parser.compilationUnit(); // parse\r\n\t\tSystem.out.println(tree.toStringTree());\r\n\t\tParseTreeWalker walker = new ParseTreeWalker(); // create standard walker\r\n\t\tExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 26:77 mismatched input 'c' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 31:0 no viable alternative at input 'temp'
package cx.ath.journeyman.JavaTranslator;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import cx.ath.journeyman.JavaTranslator.generated.JavaLexer;
import cx.ath.journeyman.JavaTranslator.generated.JavaParser;
public class JavaTranslator {
/**
* @param args
*/
public static void main(String[] args) {
String Directory = "D:\\blah\\";
String filename = "Main.java";
ANTLRInputStream input = new ANTLRInputStream(Directory+filename);
JavaLexer lexer = new JavaLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaParser parser = new JavaParser(tokens);
ParseTree tree = parser.compilationUnit(); // parse
System.out.println(tree.toStringTree());
ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "c:\\temp\\"+filename);
walker.walk(extractor, tree); // initiate walk of tree with listener
}
}
A command keyword inside a string literal is recognized as a keyword, which then raises an error:
print "print line1:7 mission STRING at 'print'
or
print "if line1:7 mission STRING at 'if'
or
print "repeat line1:7 mission STRING at 'repeat'
Is it intentional that the grammar accepts constructs such as
public public void myMethod() { }
where a modifier is repeated?
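Whether this is intentional is for the maintainers to say, but a grammar that accepts a loose list of modifiers can leave the rejection of repeats to a later pass over the parse tree. The Java Language Specification itself treats a repeated modifier keyword as a compile-time error. A sketch of such a check, with hypothetical names and plain strings in place of parse-tree nodes:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ModifierCheckDemo {
    // Returns the modifiers that occur more than once, in order of first repeat.
    static Set<String> duplicates(List<String> modifiers) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String modifier : modifiers) {
            if (!seen.add(modifier)) { // add() returns false if already present
                repeated.add(modifier);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        // Modifiers of: public public void myMethod() { }
        System.out.println(duplicates(Arrays.asList("public", "public")));
    }
}
```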
Hi,
I noticed a small bug in the Python3 grammar.
Code:
public static void main(String[] args) {
String input = "def func():\n if";
ANTLRInputStream source = new ANTLRInputStream(input);
Python3Lexer lexer = new Python3Lexer(source);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Python3Parser parser = new Python3Parser(tokens);
Python3Parser.File_inputContext tree = parser.file_input();
}
Expected:
line 0:-1 no viable alternative at input '\n'
Got:
line 0:-1 no viable alternative at input '\n'
Exception in thread "main" java.lang.NullPointerException
at org.antlr.v4.runtime.DefaultErrorStrategy.getMissingSymbol(DefaultErrorStrategy.java:591)
at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:477)
at org.antlr.v4.runtime.Parser.match(Parser.java:223)
at python.Python3Parser.if_stmt(Python3Parser.java:2959)
at python.Python3Parser.compound_stmt(Python3Parser.java:2857)
at python.Python3Parser.stmt(Python3Parser.java:1253)
at python.Python3Parser.suite(Python3Parser.java:3488)
at python.Python3Parser.funcdef(Python3Parser.java:600)
at python.Python3Parser.compound_stmt(Python3Parser.java:2887)
at python.Python3Parser.stmt(Python3Parser.java:1253)
at python.Python3Parser.file_input(Python3Parser.java:296)
at python.PythonMain.main(PythonMain.java:24)
Best regards,
Michael
C.g4 interprets the code int a1; as declarationSpecifiers, i.e., the tokens int and a1 are both interpreted as declarationSpecifier. I expect a1 to be interpreted as part of initDeclaratorList. I fixed this by changing the declaration rule to use initDeclaratorList instead of initDeclaratorList?. But I'm not sure if this breaks something else.
I'm working on it.
I have a big rule for keywords (over 817 of them), and when I compiled the generated *.java files, compilation of the parser file failed:
error: code too large for try statement
catch (RecognitionException re) {
^
I found a workaround for the case where an identifier could be a keyword by creating a rule, for example:
identifier: [a-zA-Z]+ | keyword;
When I looked into the parser source, I saw that all keywords were inserted into the code 'as is'. What's wrong with my approach?
They're missing. I just discovered ANTLR so I have no idea what the difficulty for this task is. I'm also not sure if ANTLR has any built-in barriers that would make parsing these languages impossible.
@parrt , is this a feasible task?
Where would I start looking to write an ANTLR4 grammar for these?
Run this script:
#!/bin/bash
LANG=Java
rm -f ${LANG}*.java ${LANG}*.class ${LANG}*.tokens
antlr4 ${LANG}.g4
javac -classpath /usr/local/Cellar/antlr/4.4/antlr-4.4-complete.jar:. ${LANG}*.java
cat <<EOF | grun ${LANG} compilationUnit -gui
public class Demo {
public int[] intArray;
}
EOF
Note that the [] of int[] are not recognized as array literals:
In comparison, this is the tree generated by Java8.g4:
I'm not sure what exactly is going on, but I notice that even in very simple cases, automatic semicolon insertion is not working properly using the ECMAScript grammar, and that the resulting trees are missing branches, essentially mangling follow-up lines into branch with the missing semicolon. I've created a test case gist here
The resulting output is included, but as long as java/javac is on the PATH, you can run the test case yourself by simply doing
git clone https://gist.github.com/5ff47f7652918e5a5717.git
cd 5ff47f7652918e5a5717
chmod +x run-test.bash
./run-test.bash
It will download antlr 4.5, generate the parser and use it to parse two files. One with semicolons, and one where some are missing. These are the two examples:
function Test() {
o = {};
o.n = "alice";
o.n = null;
o.n = "bob";
}
function Test() {
o = {};
o.n = "alice"
o.n = null
o.n = "bob"
}
In the second example, a lot of errors are printed and the resulting tree is essentially mangling the "null" and "bob" lines into the "alice" branch.
line 4:2 no viable alternative at input 'function Test() {\n o = {};\n o.n = "alice"\n o'
line 1:16 extraneous input '{' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 4:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 5:2 no viable alternative at input '.n = null\n o'
line 5:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 6:0 no viable alternative at input '.n = "bob"\n}'
line 6:0 mismatched input '}' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
(program (sourceElements sourceElement (sourceElement function) (sourceElement (statement (expressionStatement (expressionSequence (singleExpression (singleExpression Test) (arguments ( )))) <missing ';'>))) (sourceElement (statement (block { (statementList (statement (expressionStatement (expressionSequence (singleExpression (singleExpression o) = (expressionSequence (singleExpression (objectLiteral { }))))) ;)) (statement (expressionStatement (expressionSequence (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression o) . (identifierName n)) = (expressionSequence (singleExpression (literal "alice") o))) . (identifierName n)) = (expressionSequence (singleExpression (literal null) o))) . (identifierName n)) = (expressionSequence (singleExpression (literal "bob"))))) <missing ';'>))) })))) <EOF>)
The gist also includes an output of the tree as it should be (with 4 statement branches inside the block, instead of just 2).
I am writing a lexer using the Eclipse IDE. I want to check whether a string contains the null character.
Previously, when I wrote the same thing in Flex, I used "\x00" to detect the null character. I tried the same here, but it did not work.
Currently, for string constants I am using the following rule:
STR_CONST : '"'
( '\\'
( 'n' {buf.append('\n');}
| 't' {buf.append('\t');}
| 'b' {buf.append('\b');}
| 'f' {buf.append('\f');}
| 'r' {buf.append('\r');}
| '"' {buf.append('"');}
| '\'' {buf.append('\'');}
| '\\' {buf.append('\\');}
)
| ~('\\'|'"') {buf.append((char)_input.LA(-1));}
)*
'"'
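Neither Java nor ANTLR grammar literals have a Flex-style "\x00" escape. ANTLR 4 literals accept four-digit Unicode escapes, so the NUL character is written there as a backslash, u and four zeros. If the check is done in Java after the string has been collected, which is an assumption about how buf is used, it could look like the sketch below. The class and method names are hypothetical.

```java
final class NulCheckDemo {
    // '\0' is Java's octal escape for the NUL character (code point 0).
    static boolean containsNul(String text) {
        return text.indexOf('\0') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsNul("ab"));
        System.out.println(containsNul("a\0b"));
    }
}
```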
I found that the rule "typeName" was never used anywhere in the grammar. Therefore, it would be impossible to parse a statement like this:
java.lang.Object x = 1; //auto-boxing?
BTW, when ANTLR parses this, what is the lookahead depth? At the syntax level it can't distinguish a qualified name from a chained attribute access while "java.lang.Object" is being parsed, so only when it reaches 'x' does it know which one it is.
Hi, I have a question about grammar ambiguities.
I have a rule for comments that start with '*' at the beginning of a line, and this symbol is also used for multiplication.
Rule for multiply:
exp: exp '*' exp
| ID ;
Rule for asterisk comment:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) -> skip
;
ID : [a-zA-Z]+;
I also tried adding {getCharPositionInLine() == 0}? before '\n' to detect the start of a line, because a comment can also appear at the very first position of the file.
That does not work for me either.
I need some advice.
I also can't understand why IDEA (and the Eclipse plugin too) does not produce the right condition for this block and for my code in the @lexer::members{} part. Look:
public boolean isFirstPosition(){
Token tkn = super.nextToken();
if( 0 == tkn.getCharPositionInLine() )
if( tkn.getText().equals('*') )
return true;
return false;
}
and the rule for ASTERISK_COMMENT:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) {isFirstPosition()}? -> skip
;
Maybe I need to test it from the terminal?
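One thing that can be checked without any IDE or ANTLR at all: in Java, comparing a String to a char literal with equals, as in getText().equals('*'), compiles but is always false, because the char is boxed to a Character and String.equals returns false for any argument that is not a String. This may not be the only problem in the code above, but the helper as written can never return true. A minimal demonstration with made-up names:

```java
final class EqualsPitfallDemo {
    // The argument is boxed to Character, so this is false for every input.
    static boolean compareToChar(String text) {
        return text.equals('*');
    }

    // Comparing against a String literal behaves as intended.
    static boolean compareToString(String text) {
        return text.equals("*");
    }

    public static void main(String[] args) {
        System.out.println(compareToChar("*"));
        System.out.println(compareToString("*"));
    }
}
```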