antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

License: MIT License

ANTLR 72.33% Shell 1.59% Java 13.71% Makefile 0.02% Python 5.64% Swift 0.10% C# 2.23% JavaScript 0.85% Lex 0.04% Yacc 0.05% Go 0.41% C++ 0.90% TypeScript 0.59% C 0.07% sed 0.01% Batchfile 0.01% PowerShell 0.96% Dart 0.26% CMake 0.17% PHP 0.08%

grammars-v4's Introduction

Grammars-v4

This repository is a collection of formal grammars written for ANTLR v4.

Each grammar's root directory is named with the all-lowercase name of the language or file format the grammar parses, for example java, cpp, csharp, or c.

FAQ

Please refer to the grammars-v4 Wiki

Code of Conduct

Please refer to House Rules

grammars-v4's Issues

scss: the Gradle script is not compatible with the latest Gradle

gradle test fails to run.

I'll make a pull request.

Is someone maintaining a JavaParser based on the Java grammar presented here?

If so, I would be interested in helping :)

Add a license to the objective-c grammar

The objective-c grammar has no license.

scss: identifier before a block is not recognized in some situations

It currently recognizes

#id {
color:blue
}

but not

#id{  // no space between id and {
color:blue
}

Python3 Grammar: Comment at end of input causes "no viable alternative"

input:

def func():
  return 42
  # comment

parser reports error:

line 3:11 no viable alternative at input '<EOF>'

expected:
no error

Wrong parsing with C.g4

C.g4 interprets the code int a1; as declarationSpecifiers, i.e., both tokens int and a1 are interpreted as declarationSpecifier. I expected a1 to be interpreted as part of initDeclaratorList. I fixed this by changing the declaration rule to use initDeclaratorList instead of initDeclaratorList?, but I'm not sure whether this breaks something else.

too strict on explicit generic invocations

I found the grammar to be too strict on generic invocations, like

Collections.<String[]>emptyList()

Seems like explicitGenericInvocation rule does not allow an array to be a generic argument. I fixed this to meet my needs by changing the rule to be like this:

explicitGenericInvocation
: typeArguments Identifier arguments
;

but there may be a tighter way.

Possible error in Java.g4

I found that the rule "typeName" was never used anywhere in the grammar. Therefore, it would be impossible to parse a statement like this:

java.lang.Object x = 1; //auto-boxing?

BTW, when ANTLR parses this, what is the lookahead depth? At the syntax level it cannot distinguish a qualified name from a chained attribute access while "java.lang.Object" is being parsed, so it only knows which one it is once it reaches 'x'.

vhdl (lowercase) is not an acceptable Java class name

There is a line that specifies the grammar name:

grammar vhdl

Could you please rename vhdl to something such as Vhdl or VHDL so that the generated classes follow Java naming conventions?
By convention, Java class names start with a capital letter.

Thanks :)

Parse error with the attached Java file

In ANTLRWorks 2, using the Java.g4 grammar with the code below causes the following errors:

line 18:21 extraneous input 'Main' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 18:30 extraneous input '";\r\n\t\tANTLRInputStream input = new ANTLRInputStream(Directory+filename); \r\n\t\tJavaLexer lexer = new JavaLexer(input);\r\n\t\tCommonTokenStream tokens = new CommonTokenStream(lexer);\r\n\t\tJavaParser parser = new JavaParser(tokens);\r\n\t\tParseTree tree = parser.compilationUnit(); // parse\r\n\t\tSystem.out.println(tree.toStringTree());\r\n\t\tParseTreeWalker walker = new ParseTreeWalker(); // create standard walker\r\n\t\tExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 26:77 mismatched input 'c' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 31:0 no viable alternative at input 'temp'

package cx.ath.journeyman.JavaTranslator;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import cx.ath.journeyman.JavaTranslator.generated.JavaLexer;
import cx.ath.journeyman.JavaTranslator.generated.JavaParser;

public class JavaTranslator {

/**
 * @param args
 */
public static void main(String[] args) {
    String Directory = "D:\\blah\\";
    String filename = "Main.java";
    ANTLRInputStream input = new ANTLRInputStream(Directory+filename); 
    JavaLexer lexer = new JavaLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaParser parser = new JavaParser(tokens);
    ParseTree tree = parser.compilationUnit(); // parse
    System.out.println(tree.toStringTree());
    ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
    ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "c:\\temp\\"+filename);
    walker.walk(extractor, tree); // initiate walk of tree with listener
}

}

Swift grammar taking too long to parse

Hi,

It seems that the Swift parser generated by ANTLR takes too long to parse the GameScene.swift test file.

I'm using it in an Android environment, where the ANTLR runtime source has been modified to remove all Swing components (since they are not supported on Android) and the @Nullable and @NotNull annotations. No other code has been modified or added beyond what ANTLR generated.

I tested with the Java grammar provided here, and parsing is significantly faster.

Java.g4 repeated modifier ?

Is it intentional that the grammar accepts constructs such as

public public void myMethod() { }

where a modifier is repeated?
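
For context, the Java Language Specification treats a repeated modifier as a compile-time error, so a grammar that accepts it is more permissive than the language itself. One option is to keep the grammar loose and flag repeats in a later pass. A minimal sketch of such a check, assuming the modifier texts have already been collected from the parse tree (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ModifierCheck {
    /** Returns the modifiers that occur more than once, in first-repeat order. */
    static List<String> repeatedModifiers(List<String> modifiers) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (String modifier : modifiers) {
            // Set.add returns false when the element was already present.
            if (!seen.add(modifier)) {
                repeated.add(modifier);
            }
        }
        return new ArrayList<>(repeated);
    }
}
```

A listener or visitor could call this once per declaration and report an error when the returned list is not empty.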

How to detect null character in string?

I am writing a lexer in the Eclipse IDE and want to check whether a string contains a null character.
When I wrote the same lexer in Flex, I used "\x00" to detect the null character. I tried the same here, but it did not work.
Currently I am using the following rule for string constants:
STR_CONST : '"'
( '\\'
( 'n' {buf.append('\n');}
| 't' {buf.append('\t');}
| 'b' {buf.append('\b');}
| 'f' {buf.append('\f');}
| 'r' {buf.append('\r');}
| '"' {buf.append('"');}
| '\'' {buf.append('\'');}
| '\\' {buf.append('\\');}
)
| ~('\\'|'"') {buf.append((char)_input.LA(-1));}
)*
'"'
;
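
Java has no \x00 escape, so the Flex spelling does not carry over; in Java source the null character is usually written as a Unicode escape. A minimal sketch of checking the collected string text in plain Java, assuming the check runs on the token text rather than inside the grammar (the class and method names are hypothetical):

```java
class NullCharCheck {
    /** Returns true if the text contains the NUL character (code point 0). */
    static boolean containsNul(String text) {
        return text.indexOf('\u0000') >= 0;
    }
}
```

Such a helper could be called on the buffer contents once the closing quote has been matched.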

"error(8): scala.g4:34:8: grammar name Scala and file name scala.g4 differ"

$ antlr4 scala.g4
error(8): scala.g4:34:8: grammar name Scala and file name scala.g4 differ

I believe this should just be a matter of renaming scala.g4 to Scala.g4.

String in Java grammar

Java's grammar contains this rule:

fragment
StringCharacters
    :   StringCharacter+
    ;

Shouldn't it be non-greedy?

fragment
StringCharacters
    :   StringCharacter+?
    ;
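
One way to reason about this: if StringCharacter can never match an unescaped double quote, a greedy loop has to stop at the closing quote anyway. The sketch below uses java.util.regex as a stand-in for the lexer rule. The pattern is an illustrative assumption, not the grammar's actual definition, and regex matching is not identical to ANTLR lexing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyStringDemo {
    // Greedy loop over "any char except quote or backslash, or a backslash escape pair".
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    /** Returns the first string literal found in the input, or null if there is none. */
    static String firstLiteral(String source) {
        Matcher matcher = STRING_LITERAL.matcher(source);
        return matcher.find() ? matcher.group() : null;
    }
}
```

For an input containing two literals, the greedy loop still ends at the first closing quote, because the quote itself is excluded from the repeated element.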

Is the Java7 Grammar LL(1)?

I know that ANTLR 4 can handle LL(k) grammars. I'm wondering whether the Java7 grammar here relies on that feature. I'm looking for an LL(1) grammar for Java 7 and would like to base it on something rather than build the whole thing on my own.

provide less css grammar

I'm working on it.

I'm getting an error with C# Preprocessors

Hi,

I'm getting the following error when using preprocessor directives in the C# files I'm parsing:

Result StandardError:
line 20:0 missing '}' at '\t#if Testing\r\n'
line 28:0 extraneous input '}' expecting {<EOF>, 'abstract', 'class', 'delegate', 'enum', 'extern', 'interface', 'internal', 'namespace', 'new', 'override', 'partial', 'private', 'protected', 'public', 'readonly', 'sealed', 'static', 'struct', 'unsafe', 'virtual', 'volatile', '['}

This is the file I'm using to test:
namespace test.test1
{
public class Testing
{
static void Main (string[] args){
#if Testing
String x = "abc";
#else
String y = "xyz";
#endif
String a, b;
}
}
}

Wrong parsing with C.g4 (2)

My StackOverflow question highlights a possible bug: http://stackoverflow.com/questions/24562551/wrong-parsing-with-antlr4s-c-g4.

Python3 Grammar: Unexpected indent not recognized

input:

  def func():
  return 42

parser reports:
no error
expected:
An error message that indent before function definition is unexpected/extraneous.

Code too large for try statement

I have a big rule for keywords (over 817 of them), and when I compiled the generated *.java files, compilation of the parser file failed:

error: code too large for try statement
catch (RecognitionException re) {
^

In my language an identifier can also be a keyword, so I created a rule for that, for example:
identifier: [a-zA-Z]+ | keyword;

When I looked into the generated parser source, I saw that all the keywords were inserted into the code as is. What's wrong with my approach?
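
The "code too large" error comes from the JVM limit of 64 KB of bytecode per method, which a generated rule with hundreds of alternatives can exceed. One common way around it is to match every word with a single identifier rule and decide at runtime whether the text is a keyword. A minimal sketch of such a lookup, where the keyword list, the case-insensitive comparison, and the names are all placeholder assumptions:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class KeywordTable {
    // Placeholder entries; a real table would hold the full keyword list.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("select", "from", "where"));

    /** Returns true if the word is a keyword, ignoring case (an assumption). */
    static boolean isKeyword(String word) {
        return KEYWORDS.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

Because the table is data rather than generated code, its size no longer affects the size of any single parser method.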

Java.g4: Arrays are not being recognized

Run this script:

#!/bin/bash

LANG=Java
rm -f ${LANG}*.java ${LANG}*.class ${LANG}*.tokens
antlr4 ${LANG}.g4
javac -classpath /usr/local/Cellar/antlr/4.4/antlr-4.4-complete.jar:. ${LANG}*.java

cat <<EOF | grun ${LANG} compilationUnit -gui
public class Demo {
    public int[] intArray;
}
EOF

Note that [] of int[] are not recognized as array literals:

In comparison this is the tree generated by Java8.g4:

Python grammar for ANTLR4

Hello,

I am very new to ANTLR and to parsing in general. I was wondering if I could get some help converting the Python grammar file from ANTLR v3 to ANTLR v4.

As a starting point I am using the grammar created by Ales Teska, which was originally written for ANTLR 3, and I have tried to convert it to ANTLR 4. ANTLR 4 can generate a lexer and parser from the new grammar, but the result does not parse Python code correctly. Any help would be really appreciated.

Here is the grammar file: https://gist.github.com/ajinkyakulkarni/ada5ec1792d25fc1264e

ECMAScript.g4 parses 1 - 2 + 3 as 1 - (2 + 3)

ECMAScript.g4 parses 1 - 2 + 3 as 1 - (2 + 3). It should be parsed as (1 - 2) + 3.
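
The two groupings give different results, which is why the associativity matters. A small self-contained sketch that evaluates the same operand and operator sequence both ways (the class and method names are hypothetical, and only + and - are handled):

```java
class AssociativityDemo {
    /** Folds from the left: ((a op b) op c) ... */
    static int evalLeft(int[] operands, char[] operators) {
        int result = operands[0];
        for (int i = 0; i < operators.length; i++) {
            result = apply(result, operators[i], operands[i + 1]);
        }
        return result;
    }

    /** Folds from the right: (a op (b op c)) ... */
    static int evalRight(int[] operands, char[] operators) {
        return evalRight(operands, operators, 0);
    }

    private static int evalRight(int[] operands, char[] operators, int index) {
        if (index == operators.length) {
            return operands[index];
        }
        return apply(operands[index], operators[index],
                evalRight(operands, operators, index + 1));
    }

    private static int apply(int left, char operator, int right) {
        return operator == '+' ? left + right : left - right;
    }
}
```

With operands 1, 2, 3 and operators -, +, the left fold yields 2 and the right fold yields -4.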

IDEA ANTLR4 plugin and lexer and parser actions

Does the IDEA plugin support actions for the lexer and parser (@lexer::members, @parser::members)?

I'm asking because my code isn't working (my question on Stack Overflow has no answers):

@members{

    public boolean isAsteriskCommentLine(){

        Token t[2];

        t[0] = this.getCurrentToken();

        t[1] = this.nextToken();


        if( ( t[0].getText().equals('\n') || t[0].getText().equals('\r') ) && t[1].getText().equals('*') )
            return true;


        return false;

    }

}


.....

COMMENT:  {getCharPositionInLine() == 0}?'*' ~[\r\n]*? (('\r'? '\n')| EOF) {isAsteriskCommentLine()}?;

I can't understand what I'm doing wrong. I have read the book, the documentation, and examples from Stack Overflow threads.

vhdl not allowing port mappings with ranges.

I had the following VHDL, which failed to parse. I believe it is valid (it synthesizes OK).

seven_seg_led_instance : seven_seg_led port map (
     mclk=>mclk,
     an(3 downto 0)=>an(3 downto 0),
     seg(6 downto 0)=>seg(6 downto 0),
     dp=>dp
);

To support this I made the following modification to vhdl.g4

--- a/vhdl/vhdl.g4
+++ b/vhdl/vhdl.g4
@@ -696,6 +696,7 @@ formal_parameter_list

 formal_part
   : identifier
+  | identifier LPAREN explicit_range  RPAREN 
   ;

Wasn't sure how to submit a patch, so I offer it here.

Gary

missing '>>>' operation

I cannot find the '>>>' operator in the grammar. Or is it handled by some other means?

Java grammar is not target agnostic

The {Character.func} actions do not work on targets other than Java.
https://github.com/antlr/grammars-v4/blob/master/java/Java.g4#L983

Maybe something like what is documented here should be employed?
https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Python+Target

The only production code absolutely required to sit with the grammar should be semantic predicates, like: ID {$text.equals("test")}?

Unfortunately, this is not portable, but you can work around it. The trick involves:
deriving your parser from a parser you provide, such as BaseParser
implementing utility methods in this BaseParser, such as "isEqualText"
adding a "self" field to the Java/C# BaseParser, and initialize it with "this"
Thanks to the above, you should be able to rewrite the above semantic predicate as follows:
ID {$self.isEqualText($text,"test")}?

duplicate rules

annotationName
    : Identifier ('.' Identifier)*
    ;
qualifiedName
    :   Identifier ('.' Identifier)*
    ;

Python3 eats input when only queried

I usually work with my grammar this way:

if (ctx.someBlock() != null){
     SomeBlockContext bcx = ctx.someBlock();
}

However, if I try to do the same with the Python3 grammar:

if (p.file_input() != null){
            for (StmtContext ctx : p.file_input().stmt())
                statements.add(ctx);
        }

I get no statements, because they were already consumed by the first call to file_input().
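
A likely explanation: ctx.someBlock() is an accessor on a context that has already been built and only looks up a child node, whereas p.file_input() is a rule method on the parser and runs the rule each time it is called. The toy class below is not ANTLR; it is a hypothetical stand-in that only mimics the "each call consumes input" behaviour, to show why the result should be stored after a single call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ToyParser {
    private final Iterator<String> tokens;

    ToyParser(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    /** Stand-in for a generated rule method: every call consumes what is left. */
    List<String> fileInput() {
        List<String> statements = new ArrayList<>();
        while (tokens.hasNext()) {
            statements.add(tokens.next());
        }
        return statements;
    }
}

class ParseOnceDemo {
    /** Returns the statement counts seen by the first and the second call. */
    static int[] countsOfTwoCalls() {
        ToyParser parser = new ToyParser(Arrays.asList("a = 1", "b = 2"));
        List<String> first = parser.fileInput();  // parse once and keep this result
        List<String> second = parser.fileInput(); // nothing is left to consume
        return new int[] { first.size(), second.size() };
    }
}
```

Applied to the snippet above, that would mean calling p.file_input() once, keeping the returned context in a variable, and iterating over its stmt() list.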

Support System verilog in verilog

Hello,
Can you help me add SystemVerilog support to the Verilog grammar?

Thanks,
Avdhesh yadav

ECMAScript.g4 - problems with automatic semicolon insertion

I'm not sure exactly what is going on, but even in very simple cases automatic semicolon insertion does not work properly with the ECMAScript grammar. The resulting trees are missing branches: the lines that follow a missing semicolon are mangled into the branch of the statement that lacks it. I've created a test case gist.

The resulting output is included, but as long as java/javac is on the PATH, you can run the test case yourself by running:

git clone https://gist.github.com/5ff47f7652918e5a5717.git
cd 5ff47f7652918e5a5717
chmod +x run-test.bash
./run-test.bash

It downloads ANTLR 4.5, generates the parser, and uses it to parse two files: one with semicolons and one where some are missing. These are the two examples:

function Test() {
  o = {};
  o.n = "alice";
  o.n = null;
  o.n = "bob";
}

function Test() {
  o = {};
  o.n = "alice"
  o.n = null
  o.n = "bob"
}

In the second example, a lot of errors are printed, and the resulting tree essentially mangles the "null" and "bob" lines into the "alice" branch.

line 4:2 no viable alternative at input 'function Test() {\n  o = {};\n  o.n = "alice"\n  o'
line 1:16 extraneous input '{' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 4:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 5:2 no viable alternative at input '.n = null\n  o'
line 5:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 6:0 no viable alternative at input '.n = "bob"\n}'
line 6:0 mismatched input '}' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
(program (sourceElements sourceElement (sourceElement function) (sourceElement (statement (expressionStatement (expressionSequence (singleExpression (singleExpression Test) (arguments ( )))) <missing ';'>))) (sourceElement (statement (block { (statementList (statement (expressionStatement (expressionSequence (singleExpression (singleExpression o) = (expressionSequence (singleExpression (objectLiteral { }))))) ;)) (statement (expressionStatement (expressionSequence (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression o) . (identifierName n)) = (expressionSequence (singleExpression (literal "alice") o))) . (identifierName n)) = (expressionSequence (singleExpression (literal null) o))) . (identifierName n)) = (expressionSequence (singleExpression (literal "bob"))))) <missing ';'>))) })))) <EOF>)

The gist also includes the tree as it should be (with 4 statement branches inside the block, instead of just 2).

Objective-C grammar creates a very deep tree for a single message-send expression.

Hi, I really love ANTLR. It's a great tool.

I found that a single message-send statement is parsed into a very deep tree. There is obviously no conditional or logical expression in this statement, so I think something is wrong in the grammar.

[self doSomething];

 * 9                    expression: '[' - ']'
 * 10                      assignment_expression: '[' - ']'
 * 11                        conditional_expression: '[' - ']'
 * 12                          logical_or_expression: '[' - ']'
 * 13                            logical_and_expression: '[' - ']'
 * 14                              inclusive_or_expression: '[' - ']'
 * 15                                exclusive_or_expression: '[' - ']'
 * 16                                  and_expression: '[' - ']'
 * 17                                    equality_expression: '[' - ']'
 * 18                                      relational_expression: '[' - ']'
 * 19                                        shift_expression: '[' - ']'
 * 20                                          additive_expression: '[' - ']'
 * 21                                            multiplicative_expression: '[' - ']'
 * 22                                              cast_expression: '[' - ']'
 * 23                                                unary_expression: '[' - ']'
 * 24                                                  postfix_expression: '[' - ']'
 * 25                                                    primary_expression: '[' - ']'
 * 26                                                      message_expression: '[' - ']'
 * 27                                                        receiver: 'self'
 * 28                                                          expression: 'self'
 * 29                                                            assignment_expression: 'self'
 * 30                                                              conditional_expression: 'self'
 * 31                                                                logical_or_expression: 'self'
 * 32                                                                  logical_and_expression: 'self'
 * 33                                                                    inclusive_or_expression: 'self'
 * 34                                                                      exclusive_or_expression: 'self'
 * 35                                                                        and_expression: 'self'
 * 36                                                                          equality_expression: 'self'
 * 37                                                                            relational_expression: 'self'
 * 38                                                                              shift_expression: 'self'
 * 39                                                                                additive_expression: 'self'
 * 40                                                                                  multiplicative_expression: 'self'
 * 41                                                                                    cast_expression: 'self'
 * 42                                                                                      unary_expression: 'self'
 * 43                                                                                        postfix_expression: 'self'
 * 44                                                                                          primary_expression: 'self'
 * 27                                                        message_selector: 'doSomething'
 * 28                                                          selector: 'doSomething'

Thanks.

grammars-v4/c

Hi,
It seems that your c.g4 file is not working as expected: it can't even recognize #include. Please let me know if I have misunderstood something. Thank you in advance.

Logo.g4 about print stringliteral

Command names inside a string literal are recognized as commands, which raises an error:

print "print
line 1:7 missing STRING at 'print'

or

print "if
line 1:7 missing STRING at 'if'

or

print "repeat
line 1:7 missing STRING at 'repeat'

Objective C grammar

The parse trees generated by the Objective-C grammar have a very long, repetitive structure, with many superfluous nodes inside expressions before a leaf is reached.

https://github.com/antlr/grammars-v4/blob/master/objc/ObjC.g4

Java grammar for 1.8 is missing Lambda Expression

Here is the documentation http://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

and here is some of the relevant rules:

LambdaExpression:
     LambdaParameters -> LambdaBody

LambdaParameters:
      Identifier 
      ( [FormalParameterList] ) 
      ( InferredFormalParameterList )

LambdaBody:
      Expression 
      Block

I haven't taken the time to figure out exactly how to add (and test) these in the grammar yet, but I figured I'd post about it in case someone else can solve it faster.

Also, I haven't checked whether any other Java 8 changes are still missing.

ECMAScript.g4 and ECMAScript.CSharpTarget.g4 use _input member, that hasn't been declared

My target is C#, and when I generated the code using ECMAScript.CSharpTarget.g4, I found that it raises compilation errors like "The name '_input' does not exist in the current context". The variable '_input' is not declared anywhere, and I don't know what type it should have in order to fix this.

Java.g4 cast priority over function call

Given this class:

package j7;

public class SimpleTest {
    char doSomething(int i1) {
        return (char) myMethod(i1);
    }

    char myMethod(int zz) {
        return 'z';
    }
}

I produced a tree for return (char) myMethod(i1);

[910·822·813·686·445·417·414·386·294·252·217]
├─[180·910·822·813·686·445·417·414·386·294·252·217]
│ ├─ (
│ ├──[1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │  └─[589·1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │    └─ char
│ ├─ )
│ └──[1065·180·910·822·813·686·445·417·414·386·294·252·217]
│    └─[1071·1065·180·910·822·813·686·445·417·414·386·294·252·217]
│      └─ myMethod
├─ (
├──[1165·910·822·813·686·445·417·414·386·294·252·217]
│  └─[1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│    └─[1071·1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│      └─ i1
└─ )

However, it looks like the tree should be as follows:

├─ (
├─ char
└─ )
   └─ myMethod
      ├─ (
      ├─ i1
      └─ )

Am I wrong, or have I missed something?

Slight error with Smalltalk grammar - I can't fix

When running ANTLRWorks with the Smalltalk.g4 grammar found here, there is a parse error when parsing multiple expressions containing unary messages.

Here is the valid Smalltalk:

self class initialize.
self class update.

If the above is reduced to just the first line, 'self class initialize.', everything parses without error. However, when the second line is added, parsing fails with the error below:

line 2:0 no viable alternative at input '\nself'

These are the tokens that are being found:

Arguments: [Smalltalk, script, -encoding, UTF-8, -tokens, -tree, -gui, /home/jamesl/dev/antlr/temp.st]
[@0,0:3='self',<22>,1:0]
[@1,4:4=' ',<31>,1:4]
[@2,5:9='class',<27>,1:5]
[@3,10:10=' ',<31>,1:10]
[@4,11:20='initialize',<27>,1:11]
[@5,21:21='.',<2>,1:21]
[@6,22:22='\n',<31>,1:22]
[@7,23:26='self',<22>,2:0]
[@8,27:27=' ',<31>,2:4]
[@9,28:32='class',<27>,2:5]
[@10,33:33=' ',<31>,2:10]
[@11,34:39='update',<27>,2:11]
[@12,40:40='.',<2>,2:17]
[@13,41:41='\n',<31>,2:18]
[@14,42:41='',<-1>,3:19]

(script (sequence ws ws (statements (expressions (expression (binarySend (unarySend (operand (literal (parsetimeLiteral (pseudoVariable self)))) (ws ) (unaryTail (unaryMessage ws (unarySelector class) ) ws (unaryTail (unaryMessage ws (unarySelector initialize) .) ws (ws \n self)) ws)))))) (ws )) class update . \n)

I was expecting the expressions and expressionList rules to cater for this input.
It looks as if whitespace is gobbling up the \n and self tokens rather than returning from the unaryTail rule.

Please help.

vhdl grammar allows invalid identifiers

Identifiers in VHDL are restricted with respect to underscores as follows:

Underscores are significant characters in an identifier 
and basic identifiers may contain underscores, 
but it is not allowed to place an underscore as a first or last character of an identifier. 
Moreover, two underscores side by side are not allowed as well. 

source

The current grammar does not reflect this behaviour. It allows two underscores in a row or as the last character.
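
The restriction quoted above corresponds to the shape "letter, then any number of letter-or-digit characters, each optionally preceded by a single underscore". A minimal sketch of that check using java.util.regex, limited to ASCII basic identifiers (extended identifiers are not covered, and the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class VhdlIdentifier {
    // A letter, then repeated "optional single underscore + letter or digit".
    private static final Pattern BASIC_IDENTIFIER =
            Pattern.compile("[A-Za-z](_?[A-Za-z0-9])*");

    /** Returns true if the text has the shape of a VHDL basic identifier. */
    static boolean isValid(String text) {
        return BASIC_IDENTIFIER.matcher(text).matches();
    }
}
```

Because an underscore is only allowed directly before a letter or digit, the pattern rejects leading, trailing, and doubled underscores without any extra checks.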

Error when last line is a line comment

Hi,
if the last line of a Java source file is a line comment with no trailing newline, such as:
// my comment (no new line added)

parsing fails. I guess this is because LINE_COMMENT makes \n mandatory, so there is no match.

The error is:
line 187:0 extraneous input '/' expecting {<EOF>, 'interface', 'abstract', 'strictfp', '@', 'class', 'public', 'private', 'final', 'static', 'protected', ';', 'enum'}

Thanks.
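
The effect described here can be reproduced outside ANTLR. The sketch below uses java.util.regex as a stand-in for the lexer rule (the patterns and names are illustrative assumptions): one pattern requires a line terminator after the comment text, the other stops before it. In grammar terms, the second form would correspond to a rule that matches '//' followed by any characters other than \r and \n, without consuming the newline.

```java
import java.util.regex.Pattern;

class LineCommentDemo {
    // Requires a line terminator after the comment text.
    private static final Pattern NEWLINE_REQUIRED =
            Pattern.compile("//[^\\r\\n]*\\r?\\n");
    // Stops before the line terminator, so end of input is accepted too.
    private static final Pattern NEWLINE_OPTIONAL =
            Pattern.compile("//[^\\r\\n]*");

    /** True only if the whole input is a comment followed by a line terminator. */
    static boolean matchesStrict(String input) {
        return NEWLINE_REQUIRED.matcher(input).matches();
    }

    /** True if the input starts with a comment, with or without a line terminator. */
    static boolean matchesLenient(String input) {
        return NEWLINE_OPTIONAL.matcher(input).lookingAt();
    }
}
```

With the lenient form, the newline (if present) is left for the whitespace rule to handle.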

Add coffeescript and javascript grammar

They're missing. I just discovered ANTLR so I have no idea what the difficulty for this task is. I'm also not sure if ANTLR has any built-in barriers that would make parsing these languages impossible.

@parrt , is this a feasible task?

Where would I start looking to write an ANTLR4 grammar for these?

NullPointerException in Python3 grammar

Hi,
I noticed a small bug in the Python3 grammar.

Code:

public static void main(String[] args) {
    String input = "def func():\n  if";

    ANTLRInputStream source = new ANTLRInputStream(input);
    Python3Lexer lexer = new Python3Lexer(source);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    Python3Parser parser = new Python3Parser(tokens);

    Python3Parser.File_inputContext tree = parser.file_input();
}

Expected:

line 0:-1 no viable alternative at input '\n'

Got:

line 0:-1 no viable alternative at input '\n'

Exception in thread "main" java.lang.NullPointerException
    at org.antlr.v4.runtime.DefaultErrorStrategy.getMissingSymbol(DefaultErrorStrategy.java:591)
    at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:477)
    at org.antlr.v4.runtime.Parser.match(Parser.java:223)
    at python.Python3Parser.if_stmt(Python3Parser.java:2959)
    at python.Python3Parser.compound_stmt(Python3Parser.java:2857)
    at python.Python3Parser.stmt(Python3Parser.java:1253)
    at python.Python3Parser.suite(Python3Parser.java:3488)
    at python.Python3Parser.funcdef(Python3Parser.java:600)
    at python.Python3Parser.compound_stmt(Python3Parser.java:2887)
    at python.Python3Parser.stmt(Python3Parser.java:1253)
    at python.Python3Parser.file_input(Python3Parser.java:296)
    at python.PythonMain.main(PythonMain.java:24)

Best regards,
Michael

Errors found with -Dlanguage=Python3

java -mx512m -jar /projects/hnd_tools/java/lib/antlr4.4/antlr-4.4-complete.jar -Dlanguage=Python3 -no-listener -visitor src/grammar/Verilog2001.g4 -o generated
error(134): Verilog2001.g4:360:0: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:242:73: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:244:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:246:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:248:52: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:385:16: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:190:29: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:203:28: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:210:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:214:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:215:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:217:39: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:218:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:219:33: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:232:40: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:409:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:431:46: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:471:51: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:525:48: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1352:50: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1397:48: symbol range conflicts with generated code in target language or runtime
make: *** [compile] Error 1

Grammar's ambiguities

Hi, I have a question about grammar ambiguities.

I have a rule for comments, which start with '*' at the beginning of a line, and this symbol is also used for multiplication.

Rule for multiplication:

exp: exp '*' exp
| ID ;

Rule for asterisk comment:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) -> skip
;

ID : [a-zA-Z]+;

I also tried adding {getCharPositionInLine() == 0}? to detect the start of a line instead of matching the preceding '\n', because a comment can also appear at the first position of the file.
That does not work for me either.

Any advice would be appreciated.

I also can't understand why IDEA (and the Eclipse plugin too) does not evaluate the condition for this block correctly, using the following code in my @lexer::members{} part:

public boolean isFirstPosition(){

    Token tkn = super.nextToken();

    if( 0 == tkn.getCharPositionInLine() )
        if( tkn.getText().equals('*') )
            return true;

    return false;

}

and the rule for the asterisk comment:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) {isFirstPosition()}? -> skip
;

Maybe I need to test from the terminal?
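
One detail in the isFirstPosition() helper is worth checking regardless of the tooling: getText() returns a String, and String.equals called with a char literal compiles (the char is boxed to a Character) but is always false. A minimal sketch of the difference, independent of ANTLR (the class and method names are hypothetical):

```java
class EqualsPitfall {
    /** Compares against a char literal: the argument is boxed to Character, so this is always false. */
    static boolean equalsCharLiteral(String text) {
        return text.equals('*');
    }

    /** Compares against a String literal: true when the text is exactly "*". */
    static boolean equalsStringLiteral(String text) {
        return text.equals("*");
    }
}
```

If that is the cause here, comparing against "*" instead of '*' would at least let the condition become true; whether the predicate then behaves as intended in the lexer is a separate question.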

Java8 generated parser is two orders of magnitude slower than the old Java parser

The new Java8 grammar produces a parser that is two orders of magnitude slower than the one produced by the old Java 7 grammar.

This is an example test run, parsing all files in any large java project:

package com.antlr.test.java;

import com.codescore.grammars.Java7Lexer;
import com.codescore.grammars.Java7Parser;
import com.codescore.grammars.Java8Lexer;
import com.codescore.grammars.Java8Parser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JavaScanDirRawTest {

    // A path to a large java project
    String path = "/path/to/large/java/project";

    @Test
    public void scanDirsJava7() throws IOException {

        InputStream fileStream;

        File rootDir = new File(path);
        List<File> javaFiles = getAllFiles(rootDir, "java");
        int count = 0;
        System.out.println("files:" + javaFiles.size());
        Long start = System.currentTimeMillis();
        for (File javaFile : javaFiles) {

            fileStream = new FileInputStream(javaFile.getPath());

            Java7Lexer lexer = new Java7Lexer(new ANTLRInputStream(fileStream));
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            Java7Parser parser = new Java7Parser(tokenStream);
            Java7Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();

            count++;
            if (count % 10 == 0)
                System.out.println("file: " + count + "/" + javaFiles.size() + "  avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
        }
    }

    @Test
    public void scanDirsJava8() throws IOException {

        InputStream fileStream;

        File rootDir = new File(path);
        List<File> javaFiles = getAllFiles(rootDir, "java");
        int count = 0;
        System.out.println("files:" + javaFiles.size());
        Long start = System.currentTimeMillis();
        for (File javaFile : javaFiles) {

            fileStream = new FileInputStream(javaFile.getPath());

            Java8Lexer lexer = new Java8Lexer(new ANTLRInputStream(fileStream));
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            Java8Parser parser = new Java8Parser(tokenStream);
            Java8Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();

            count++;
            if (count % 10 == 0)
                System.out.println("file: " + count + "/" + javaFiles.size() + "  avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
        }
    }

    private static List<File> getAllFiles(File rootDir, String fileExtension) {
        List<File> javaFiles = new ArrayList<>();
        collectFilesInDirectoryTree(rootDir, javaFiles, fileExtension);
        return javaFiles;
    }

    private static void collectFilesInDirectoryTree(File directory, List<File> fileList, final String fileExtension) {
        File[] files = directory.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return name.endsWith("." + fileExtension);
            }
        });
        fileList.addAll(Arrays.asList(files));

        File[] dirs = directory.listFiles();
        for (File dir : dirs) {
            if (dir.isDirectory()) {
                collectFilesInDirectoryTree(dir, fileList, fileExtension);
            }
        }
    }
}