antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

License: MIT License

ANTLR 71.80% Shell 1.61% Java 14.10% Makefile 0.02% Python 5.70% Swift 0.10% C# 2.26% JavaScript 0.86% Lex 0.04% Yacc 0.05% Go 0.39% C++ 0.91% TypeScript 0.60% C 0.07% sed 0.01% Batchfile 0.01% PowerShell 0.97% Dart 0.26% CMake 0.17% PHP 0.08%

hacktoberfest

grammars-v4's Issues

Python3 eats input when only queried

I am used to handling my grammar this way:

if (ctx.someBlock() != null) {
    SomeBlockContext bcx = ctx.someBlock();
}

However, if I try to do the same with the Python3 grammar:

if (p.file_input() != null) {
    for (StmtContext ctx : p.file_input().stmt())
        statements.add(ctx);
}

I get no statements, because they are consumed by the first call to file_input()...
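
A note on why this likely happens: in ANTLR-generated Java code, ctx.someBlock() on a context object only reads a child of a tree that already exists, whereas p.file_input() on the parser object runs the rule and consumes tokens. The sketch below imitates that with a hypothetical ToyParser (not an ANTLR class; all names here are made up for illustration) to show why calling the rule once and keeping the result avoids the empty list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RuleCallDemo {
    /** Hypothetical stand-in for a generated parser: each rule call consumes tokens. */
    public static class ToyParser {
        private final Iterator<String> tokens;

        public ToyParser(List<String> tokens) {
            this.tokens = tokens.iterator();
        }

        /** Plays the role of p.file_input(): reads every remaining token. */
        public List<String> fileInput() {
            List<String> stmts = new ArrayList<>();
            while (tokens.hasNext()) {
                stmts.add(tokens.next());
            }
            return stmts;
        }
    }

    /** Mirrors the pattern from the issue: the rule is invoked twice. */
    public static int countCallingTwice(List<String> src) {
        ToyParser p = new ToyParser(src);
        List<String> statements = new ArrayList<>();
        if (p.fileInput() != null) {          // first call consumes all input
            statements.addAll(p.fileInput()); // second call finds nothing left
        }
        return statements.size();
    }

    /** Invokes the rule once and reuses the returned result. */
    public static int countCallingOnce(List<String> src) {
        ToyParser p = new ToyParser(src);
        List<String> tree = p.fileInput();
        return tree.size();
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("a = 1", "b = 2");
        System.out.println("called twice: " + countCallingTwice(src));
        System.out.println("called once:  " + countCallingOnce(src));
    }
}
```

With a real generated parser the equivalent would be to store the result of p.file_input() in a local variable and query that context object afterwards.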

duplicate rules

annotationName
    : Identifier ('.' Identifier)*
    ;
qualifiedName
    :   Identifier ('.' Identifier)*
    ;

Error when last line is a line comment

Hi,
if the last line of a Java source file is something like:
// my comment (no new line added)

it fails. I guess this is because LINE_COMMENT makes \n mandatory, so there is no match.

The error is:
line 187:0 extraneous input '/' expecting {<EOF>, 'interface', 'abstract', 'strictfp', '@', 'class', 'public', 'private', 'final', 'static', 'protected', ';', 'enum'}

Thanks.
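
To illustrate the guess above without the ANTLR toolchain, here is a small Java sketch that uses java.util.regex as a stand-in for the lexer rule. The two patterns are assumptions that only approximate a comment rule with and without a mandatory newline; a commonly used ANTLR form that leaves the newline out of the rule is '//' ~[\r\n]*.

```java
import java.util.regex.Pattern;

public class LineCommentDemo {
    // Approximates a rule that insists on a line terminator after the comment.
    public static final Pattern NEWLINE_REQUIRED = Pattern.compile("//[^\\r\\n]*\\r?\\n");

    // Approximates a rule that stops before the line terminator (or at end of input).
    public static final Pattern NEWLINE_NOT_REQUIRED = Pattern.compile("//[^\\r\\n]*");

    /** Returns true if the whole string is matched by the pattern. */
    public static boolean matchesWhole(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        String lastLine = "// my comment"; // no trailing newline, as in the report
        System.out.println(matchesWhole(NEWLINE_REQUIRED, lastLine));
        System.out.println(matchesWhole(NEWLINE_NOT_REQUIRED, lastLine));
    }
}
```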

Python3 Grammar: Unexpected indent not recognized

input:

  def func():
  return 42

parser reports:
no error
expected:
An error message that indent before function definition is unexpected/extraneous.

Objective C grammar

The parse trees generated by the Objective-C grammar have a very long, repetitive structure with a lot of superfluous nodes before reaching a leaf inside expressions.

https://github.com/antlr/grammars-v4/blob/master/objc/ObjC.g4

Support SystemVerilog in the Verilog grammar

Hello,
Can you help me add the advanced SystemVerilog constructs to the Verilog grammar?

Thanks,
Avdhesh yadav

Is the Java7 Grammar LL(1)?

I know that ANTLR4 can take LL(k) grammars. I'm just wondering if the Java7 grammar here is using that feature. I'm looking for an LL(1) grammar for Java 7 and would like to base it on something rather than building the whole thing on my own.

Java.g4 cast priority over function call

Given this class:

package j7;

public class SimpleTest {
    char doSomething(int i1) {
        return (char) myMethod(i1);
    }

    char myMethod(int zz) {
        return 'z';
    }
}

I produced a tree for return (char) myMethod(i1);

[910·822·813·686·445·417·414·386·294·252·217]
├─[180·910·822·813·686·445·417·414·386·294·252·217]
│ ├─ (
│ ├──[1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │  └─[589·1063·180·910·822·813·686·445·417·414·386·294·252·217]
│ │    └─ char
│ ├─ )
│ └──[1065·180·910·822·813·686·445·417·414·386·294·252·217]
│    └─[1071·1065·180·910·822·813·686·445·417·414·386·294·252·217]
│      └─ myMethod
├─ (
├──[1165·910·822·813·686·445·417·414·386·294·252·217]
│  └─[1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│    └─[1071·1049·1165·910·822·813·686·445·417·414·386·294·252·217]
│      └─ i1
└─ )

However, it looks like the tree should be as listed below:

├─ (
├─ char
└─ )
   └─ myMethod
      ├─ (
      ├─ i1
      └─ )

Am I wrong, or did I miss something?

ECMAScript.g4 parses "true || false && false" as "(true || false) && false"

Related to #106.
ECMAScript.g4 parses true || false && false as (true || false) && false.
This should be true || (false && false).

The bitwise operators &, ^ and | also have the wrong precedence.
1 | 0 & 0 should be 1 | (0 & 0).
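
The two groupings give different results, which is why the precedence matters. The sketch below uses Java only as a stand-in, on the assumption (true for these operators) that Java and ECMAScript rank && above || and & above |.

```java
public class PrecedenceDemo {
    /** Grouping required by the language rules: && binds tighter than ||. */
    public static boolean logicalExpected() {
        return true || (false && false);
    }

    /** Grouping reported for the grammar in this issue. */
    public static boolean logicalReported() {
        return (true || false) && false;
    }

    /** Grouping required by the language rules: & binds tighter than |. */
    public static int bitwiseExpected() {
        return 1 | (0 & 0);
    }

    /** The other grouping, which yields a different value. */
    public static int bitwiseReported() {
        return (1 | 0) & 0;
    }

    public static void main(String[] args) {
        System.out.println(logicalExpected() + " vs " + logicalReported());
        System.out.println(bitwiseExpected() + " vs " + bitwiseReported());
    }
}
```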

too strict on explicit generic invocations

I found the grammar to be too strict on generic invocations, like

Collections.<String[]>emptyList()

Seems like explicitGenericInvocation rule does not allow an array to be a generic argument. I fixed this to meet my needs by changing the rule to be like this:

explicitGenericInvocation
: typeArguments Identifier arguments
;

but there may be a tighter way.

Errors found with -Dlanguage=Python3

java -mx512m -jar /projects/hnd_tools/java/lib/antlr4.4/antlr-4.4-complete.jar -Dlanguage=Python3 -no-listener -visitor src/grammar/Verilog2001.g4 -o generated
error(134): Verilog2001.g4:360:0: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:242:73: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:244:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:246:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:248:52: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:385:16: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:190:29: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:203:28: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:210:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:214:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:215:58: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:217:39: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:218:38: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:219:33: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:232:40: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:409:72: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:431:46: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:471:51: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:525:48: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1352:50: symbol range conflicts with generated code in target language or runtime
error(134): Verilog2001.g4:1397:48: symbol range conflicts with generated code in target language or runtime
make: *** [compile] Error 1

I'm getting an error with C# Preprocessors

Hi,

I'm getting the following error when using preprocessor directives in the C# files I'm parsing:

Result StandardError:
line 20:0 missing '}' at '\t#if Testing\r\n'
line 28:0 extraneous input '}' expecting {<EOF>, 'abstract', 'class', 'delegate', 'enum', 'extern', 'interface', 'internal', 'namespace', 'new', 'override', 'partial', 'private', 'protected', 'public', 'readonly', 'sealed', 'static', 'struct', 'unsafe', 'virtual', 'volatile', '['}

This is the file I'm using to test:
namespace test.test1
{
    public class Testing
    {
        static void Main (string[] args){
            #if Testing
            String x = "abc";
            #else
            String y = "xyz";
            #endif
            String a, b;
        }
    }
}

vhdl not allowing port mappings with ranges.

I had the following VHDL, which failed to parse. I believe it is valid (it synthesizes OK):

seven_seg_led_instance : seven_seg_led port map (
     mclk=>mclk,
     an(3 downto 0)=>an(3 downto 0),
     seg(6 downto 0)=>seg(6 downto 0),
     dp=>dp
);

To support this I made the following modification to vhdl.g4

--- a/vhdl/vhdl.g4
+++ b/vhdl/vhdl.g4
@@ -696,7 +696,6 @@ formal_parameter_list

 formal_part
   : identifier
-  | identifier LPAREN explicit_range  RPAREN 
   ;

Wasn't sure how to submit a patch, so I offer it here.

Gary

scss: doesn't recognize an identifier before a block in some situations

It currently recognizes

#id {
color:blue
}

but not

#id{  // no space between id and {
color:blue
}

grammars-v4/c

Hi,
It seems your c.g4 file isn't working correctly: it can't even recognize #include. Please let me know if I have misunderstood something. Thank you in advance.

PHP grammar is missing

It can be found here: https://github.com/teverett/phpGrammar

ECMAScript.g4 parses 1 - 2 + 3 as 1 - (2 + 3)

ECMAScript.g4 parses 1 - 2 + 3 as 1 - (2 + 3). This should be (1 - 2) + 3.
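
A short illustration of the difference between the two groupings: the sketch below folds a chain of + and - operators once from the left and once from the right. The class and method names are made up for this example and are not part of the grammar.

```java
public class AssociativityDemo {
    /** Left-associative evaluation: ((a op b) op c) ... */
    public static int evalLeft(int[] operands, char[] ops) {
        int acc = operands[0];
        for (int i = 0; i < ops.length; i++) {
            acc = apply(acc, ops[i], operands[i + 1]);
        }
        return acc;
    }

    /** Right-associative evaluation: a op (b op (c ...)) */
    public static int evalRight(int[] operands, char[] ops) {
        int acc = operands[operands.length - 1];
        for (int i = ops.length - 1; i >= 0; i--) {
            acc = apply(operands[i], ops[i], acc);
        }
        return acc;
    }

    // Only '+' and '-' are handled in this sketch.
    private static int apply(int left, char op, int right) {
        return op == '+' ? left + right : left - right;
    }

    public static void main(String[] args) {
        int[] operands = {1, 2, 3};
        char[] ops = {'-', '+'};
        System.out.println("(1 - 2) + 3 = " + evalLeft(operands, ops));
        System.out.println("1 - (2 + 3) = " + evalRight(operands, ops));
    }
}
```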

Add a license to the objective-c grammar

The objective-c grammar has no license.

Java grammar for 1.8 is missing Lambda Expression

Here is the documentation http://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

and here is some of the relevant rules:

LambdaExpression:
     LambdaParameters -> LambdaBody

LambdaParameters:
      Identifier 
      ( [FormalParameterList] ) 
      ( InferredFormalParameterList )

LambdaBody:
      Expression 
      Block

I haven't taken the time to figure out exactly how to add (and test) these in the grammar yet, but figured I'd post about it in case someone else can solve it faster.

Also, I haven't gone through and seen if any other java 8 changes weren't yet added either.

Java grammar is not target agnostic

The {Character.func} actions do not work on targets other than Java.
https://github.com/antlr/grammars-v4/blob/master/java/Java.g4#L983

Maybe something like what is documented here should be employed?
https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Python+Target

The only production code absolutely required to sit with the grammar should be semantic predicates, like: ID {$text.equals("test")}?

Unfortunately, this is not portable, but you can work around it. The trick involves:
deriving your parser from a parser you provide, such as BaseParser
implementing utility methods in this BaseParser, such as "isEqualText"
adding a "self" field to the Java/C# BaseParser, and initialize it with "this"
Thanks to the above, you should be able to rewrite the above semantic predicate as follows:
ID {$self.isEqualText($text,"test")}?
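
The steps quoted above can be sketched roughly as follows. This is a stand-alone illustration under stated assumptions: BaseParser here does not extend the ANTLR runtime's Parser class (a real one would, and the grammar would typically point at it with the superClass option), and DemoParser with its idPredicate method is a hypothetical stand-in for the generated parser and its predicate.

```java
public class BaseParserSketch {
    /** Stand-in base class; a real one would extend org.antlr.v4.runtime.Parser. */
    public abstract static class BaseParser {
        // The "self" field from the quoted recipe, initialized with "this".
        protected final BaseParser self = this;

        /** Utility method that the portable predicate calls. */
        public boolean isEqualText(String actual, String expected) {
            return actual != null && actual.equals(expected);
        }
    }

    /** Hypothetical stand-in for the generated parser class. */
    public static class DemoParser extends BaseParser {
        /** Mirrors the predicate {$self.isEqualText($text,"test")}? */
        public boolean idPredicate(String tokenText) {
            return self.isEqualText(tokenText, "test");
        }
    }

    public static void main(String[] args) {
        DemoParser parser = new DemoParser();
        System.out.println(parser.idPredicate("test"));
        System.out.println(parser.idPredicate("other"));
    }
}
```

The point of the indirection is that the predicate text in the grammar no longer contains Java-specific calls such as equals, so each target only has to supply its own BaseParser.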

Objective-C grammar creates a very deep tree for a single message-send expression.

Hi, I really love ANTLR, it's a great tool.

I found that a single message-send statement is parsed into a very deep tree. Obviously, there is no conditional or logical expression in this statement, so I think there is something wrong in the grammar.

[self doSomething];

 * 9                    expression: '[' - ']'
 * 10                      assignment_expression: '[' - ']'
 * 11                        conditional_expression: '[' - ']'
 * 12                          logical_or_expression: '[' - ']'
 * 13                            logical_and_expression: '[' - ']'
 * 14                              inclusive_or_expression: '[' - ']'
 * 15                                exclusive_or_expression: '[' - ']'
 * 16                                  and_expression: '[' - ']'
 * 17                                    equality_expression: '[' - ']'
 * 18                                      relational_expression: '[' - ']'
 * 19                                        shift_expression: '[' - ']'
 * 20                                          additive_expression: '[' - ']'
 * 21                                            multiplicative_expression: '[' - ']'
 * 22                                              cast_expression: '[' - ']'
 * 23                                                unary_expression: '[' - ']'
 * 24                                                  postfix_expression: '[' - ']'
 * 25                                                    primary_expression: '[' - ']'
 * 26                                                      message_expression: '[' - ']'
 * 27                                                        receiver: 'self'
 * 28                                                          expression: 'self'
 * 29                                                            assignment_expression: 'self'
 * 30                                                              conditional_expression: 'self'
 * 31                                                                logical_or_expression: 'self'
 * 32                                                                  logical_and_expression: 'self'
 * 33                                                                    inclusive_or_expression: 'self'
 * 34                                                                      exclusive_or_expression: 'self'
 * 35                                                                        and_expression: 'self'
 * 36                                                                          equality_expression: 'self'
 * 37                                                                            relational_expression: 'self'
 * 38                                                                              shift_expression: 'self'
 * 39                                                                                additive_expression: 'self'
 * 40                                                                                  multiplicative_expression: 'self'
 * 41                                                                                    cast_expression: 'self'
 * 42                                                                                      unary_expression: 'self'
 * 43                                                                                        postfix_expression: 'self'
 * 44                                                                                          primary_expression: 'self'
 * 27                                                        message_selector: 'doSomething'
 * 28                                                          selector: 'doSomething'

Thanks.

Swift String_literal does not permit empty string (i.e. "")

Slight error with Smalltalk grammar - I can't fix

When running ANTLRWorks with the Smalltalk.g4 grammar found here, there is a parse error when parsing multiple expressions containing unary messages.

Here is the valid Smalltalk:

self class initialize.
self class update.

If the above was reduced to just the first line 'self class initialize.' then all parses without error. However, add the second line in and it fails to parse. Error below:

line 2:0 no viable alternative at input '\nself'

These are the tokens that are being found:

Arguments: [Smalltalk, script, -encoding, UTF-8, -tokens, -tree, -gui, /home/jamesl/dev/antlr/temp.st]
[@0,0:3='self',<22>,1:0]
[@1,4:4=' ',<31>,1:4]
[@2,5:9='class',<27>,1:5]
[@3,10:10=' ',<31>,1:10]
[@4,11:20='initialize',<27>,1:11]
[@5,21:21='.',<2>,1:21]
[@6,22:22='\n',<31>,1:22]
[@7,23:26='self',<22>,2:0]
[@8,27:27=' ',<31>,2:4]
[@9,28:32='class',<27>,2:5]
[@10,33:33=' ',<31>,2:10]
[@11,34:39='update',<27>,2:11]
[@12,40:40='.',<2>,2:17]
[@13,41:41='\n',<31>,2:18]
[@14,42:41='',<-1>,3:19]

(script (sequence ws ws (statements (expressions (expression (binarySend (unarySend (operand (literal (parsetimeLiteral (pseudoVariable self)))) (ws ) (unaryTail (unaryMessage ws (unarySelector class) ) ws (unaryTail (unaryMessage ws (unarySelector initialize) .) ws (ws \n self)) ws)))))) (ws )) class update . \n)

I was expecting the expressionS and expressionList rules to cater for this input.
It is almost like whitespace is gobbling up the \n and self tokens rather than returning from the unaryTail rule.

Please help.

Is someone maintaining a JavaParser based on the Java grammar presented here?

If so, I would be interested in helping :)

Java8 generated parser is two orders of magnitude slower than the old Java parser

The new Java8 grammar produces a parser that is two orders of magnitude slower than old Java 7 grammar.

This is an example test run, parsing all files in any large java project:

package com.antlr.test.java;

import com.codescore.grammars.Java7Lexer;
import com.codescore.grammars.Java7Parser;
import com.codescore.grammars.Java8Lexer;
import com.codescore.grammars.Java8Parser;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JavaScanDirRawTest {

    // A path to a large java project
    String path = "/path/to/large/java/project";

    @Test
    public void scanDirsJava7() throws IOException {

        InputStream fileStream;

        File rootDir = new File(path);
        List<File> javaFiles = getAllFiles(rootDir, "java");
        int count = 0;
        System.out.println("files:" + javaFiles.size());
        Long start = System.currentTimeMillis();
        for (File javaFile : javaFiles) {

            fileStream = new FileInputStream(javaFile.getPath());

            Java7Lexer lexer = new Java7Lexer(new ANTLRInputStream(fileStream));
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            Java7Parser parser = new Java7Parser(tokenStream);
            Java7Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();

            count++;
            if (count % 10 == 0)
                System.out.println("file: " + count + "/" + javaFiles.size() + "  avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
        }
    }

    @Test
    public void scanDirsJava8() throws IOException {

        InputStream fileStream;

        File rootDir = new File(path);
        List<File> javaFiles = getAllFiles(rootDir, "java");
        int count = 0;
        System.out.println("files:" + javaFiles.size());
        Long start = System.currentTimeMillis();
        for (File javaFile : javaFiles) {

            fileStream = new FileInputStream(javaFile.getPath());

            Java8Lexer lexer = new Java8Lexer(new ANTLRInputStream(fileStream));
            CommonTokenStream tokenStream = new CommonTokenStream(lexer);
            Java8Parser parser = new Java8Parser(tokenStream);
            Java8Parser.CompilationUnitContext compilationUnit = parser.compilationUnit();

            count++;
            if (count % 10 == 0)
                System.out.println("file: " + count + "/" + javaFiles.size() + "  avg_speed: " + 1000 * count / (System.currentTimeMillis() - start) + " files/sec");
        }
    }

    private static List<File> getAllFiles(File rootDir, String fileExtension) {
        List<File> javaFiles = new ArrayList<>();
        collectFilesInDirectoryTree(rootDir, javaFiles, fileExtension);
        return javaFiles;
    }

    private static void collectFilesInDirectoryTree(File directory, List<File> fileList, final String fileExtension) {
        File[] files = directory.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return name.endsWith("." + fileExtension);
            }
        });
        fileList.addAll(Arrays.asList(files));

        File[] dirs = directory.listFiles();
        for (File dir : dirs) {
            if (dir.isDirectory()) {
                collectFilesInDirectoryTree(dir, fileList, fileExtension);
            }
        }
    }
}

vhdl (lowercase) is not an acceptable java class

there is a line which specifies the grammar name:

grammar vhdl

Could you please rename vhdl to something such as Vhdl or VHDL so that it is compatible with Java?
Java naming conventions expect class names to start with a capital letter.

Thanks :)

String in Java grammar

Java's grammar contains this rule:

fragment
StringCharacters
    :   StringCharacter+
    ;

Shouldn't it be non-greedy?

fragment
StringCharacters
    :   StringCharacter+?
    ;
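
One way to explore the question is with a regular-expression analogy, offered here as a sketch only and not as a statement about how the ANTLR lexer behaves internally. The pattern STRING_LIKE assumes that a string character is anything except a quote or backslash, or an escape pair; under that assumption a greedy + cannot run past the closing quote. NAIVE_ANY_CHAR shows the case where greediness does matter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyStringDemo {
    // Greedy repetition over characters that exclude the quote itself.
    public static final Pattern STRING_LIKE =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)+\"");

    // Greedy repetition over any character: this one does overrun.
    public static final Pattern NAIVE_ANY_CHAR = Pattern.compile("\".+\"");

    /** Counts non-overlapping matches of the pattern in the text. */
    public static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "\"ab\" + \"cd\"";
        System.out.println(countMatches(STRING_LIKE, source));
        System.out.println(countMatches(NAIVE_ANY_CHAR, source));
    }
}
```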

scss: the Gradle script does not conform to the latest Gradle

gradle test can't continue.

I'll make a pull request.

Wrong parsing with C.g4 (2)

My StackOverflow question highlights a possible bug: http://stackoverflow.com/questions/24562551/wrong-parsing-with-antlr4s-c-g4.

ECMAScript.g4 and ECMAScript.CSharpTarget.g4 use _input member, that hasn't been declared

My target is generated C# code. When I generated it using ECMAScript.CSharpTarget.g4, I found that it raises compilation errors like "The name '_input' does not exist in the current context". So the variable '_input' is not declared anywhere, and I don't know what type it should be in order to fix it.

IDEA ANTLR4 plugin and lexer and parser actions

Does the IDEA plugin support actions for the lexer and parser (@lexer::members, @parser::members)?

The reason I'm asking is that my code isn't working (my question on Stack Overflow has no answers):

@members{
    public boolean isAsteriskCommentLine(){
        Token t[2];

        t[0] = this.getCurrentToken();
        t[1] = this.nextToken();

        if( ( t[0].getText().equals('\n') || t[0].getText().equals('\r') ) && t[1].getText().equals('*') )
            return true;

        return false;
    }
}


.....

COMMENT:  {getCharPositionInLine() == 0}?'*' ~[\r\n]*? (('\r'? '\n')| EOF) {isAsteriskCommentLine()}?;

I can't understand what I'm doing wrong. I have read the book, the documentation, and examples from Stack Overflow threads.
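
Independently of the plugin question, one thing in the members code above can be checked in plain Java: getText() returns a String, and String.equals called with a char literal such as '*' receives a boxed Character, so it returns false even when the text is "*". (Token t[2]; is also not a valid Java array declaration; Token[] t = new Token[2]; would be.) A minimal sketch with made-up method names:

```java
public class EqualsCharDemo {
    /** Compares against a char literal; the argument is boxed to Character. */
    public static boolean equalsCharLiteral(String text) {
        return text.equals('*');
    }

    /** Compares against a String literal, which is what was probably intended. */
    public static boolean equalsStringLiteral(String text) {
        return text.equals("*");
    }

    public static void main(String[] args) {
        System.out.println(equalsCharLiteral("*"));
        System.out.println(equalsStringLiteral("*"));
    }
}
```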

missing '>>>' operation

I cannot find it in the grammar... or is it handled by some other means?

vhdl grammar allows invalid identifiers

Identifiers in VHDL are restricted with respect to underscores as follows:

Underscores are significant characters in an identifier 
and basic identifiers may contain underscores, 
but it is not allowed to place an underscore as a first or last character of an identifier. 
Moreover, two underscores side by side are not allowed as well. 
Underscores are significant characters in an identifier.

source

The current grammar does not reflect this behaviour. It allows two underscores in a row or as the last character.
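
The restriction quoted above can be written down compactly. The sketch below expresses it as a Java regular expression, assuming ASCII letters and digits only and ignoring extended identifiers and reserved words; a lexer rule of the same shape would be one possible way to tighten the grammar.

```java
import java.util.regex.Pattern;

public class VhdlIdentifierDemo {
    // A letter, then any number of groups of "optional single underscore + letter or digit".
    // This rules out leading, trailing, and doubled underscores.
    public static final Pattern BASIC_IDENTIFIER =
            Pattern.compile("[A-Za-z](?:_?[A-Za-z0-9])*");

    /** Returns true if the whole name satisfies the underscore rules. */
    public static boolean isValid(String name) {
        return BASIC_IDENTIFIER.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"seven_seg_led", "a__b", "ab_", "_ab"}) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```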

Swift grammar taking too long to parse

Hi,

It seems that the Swift parser generated by ANTLR is taking too long to parse using the GameScene.swift test file.

I'm using this in an Android environment in which the ANTLR runtime source has been modified to remove all Swing components (since they are not supported on Android) and the @Nullable and @NotNull annotations. No other code has been modified or added to what ANTLR generated.

Tested using the Java grammar provided here and parsing is significantly faster.

Python3 Grammar: Comment at end of input causes "no viable alternative"

input:

def func():
  return 42
  # comment

parser reports error:

line 3:11 no viable alternative at input '<EOF>'

expected:
no error

"error(8): scala.g4:34:8: grammar name Scala and file name scala.g4 differ"

$ antlr4 scala.g4
error(8): scala.g4:34:8: grammar name Scala and file name scala.g4 differ

I believe this should just be a matter of renaming scala.g4 to Scala.g4.

Python grammar for ANTLR4

Hello,

I am very new to ANTLR and to parsing in general. I was wondering if I could get some help converting the Python grammar file from ANTLR v3 to ANTLR v4.

I am using the grammar created by Ales Teska as a starting point; it was originally written for ANTLR3, and I have tried to convert it to ANTLR4. ANTLR4 can generate a lexer and parser for the new grammar, but the result is not able to parse Python code correctly. Any help would be really appreciated.

Here is the grammar file: https://gist.github.com/ajinkyakulkarni/ada5ec1792d25fc1264e

Parse error with the attached Java file

In ANTLRWorks 2, using the Java.g4 grammar with the code below causes

line 18:21 extraneous input 'Main' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 18:30 extraneous input '";\r\n\t\tANTLRInputStream input = new ANTLRInputStream(Directory+filename); \r\n\t\tJavaLexer lexer = new JavaLexer(input);\r\n\t\tCommonTokenStream tokens = new CommonTokenStream(lexer);\r\n\t\tJavaParser parser = new JavaParser(tokens);\r\n\t\tParseTree tree = parser.compilationUnit(); // parse\r\n\t\tSystem.out.println(tree.toStringTree());\r\n\t\tParseTreeWalker walker = new ParseTreeWalker(); // create standard walker\r\n\t\tExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', ',', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 26:77 mismatched input 'c' expecting {'&', '', '[', '<', '--', '!=', '%', '=', '=', '|=', '|', '-=', '-', '(', '&=', '?', '+=', '^=', '++', '^', '.', '+', ';', '&&', '||', '>', '%=', '/=', '/', '==', 'instanceof'}
line 31:0 no viable alternative at input 'temp'

package cx.ath.journeyman.JavaTranslator;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import cx.ath.journeyman.JavaTranslator.generated.JavaLexer;
import cx.ath.journeyman.JavaTranslator.generated.JavaParser;

public class JavaTranslator {

/**
 * @param args
 */
public static void main(String[] args) {
    String Directory = "D:\\blah\\";
    String filename = "Main.java";
    ANTLRInputStream input = new ANTLRInputStream(Directory+filename); 
    JavaLexer lexer = new JavaLexer(input);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    JavaParser parser = new JavaParser(tokens);
    ParseTree tree = parser.compilationUnit(); // parse
    System.out.println(tree.toStringTree());
    ParseTreeWalker walker = new ParseTreeWalker(); // create standard walker
    ExtractInterfaceListener extractor = new ExtractInterfaceListener(parser, "c:\\temp\\"+filename);
    walker.walk(extractor, tree); // initiate walk of tree with listener
}

}

Logo.g4 about print stringliteral

A command keyword inside a string literal is recognized as a command, which then raises an error:

print "print
line 1:7 missing STRING at 'print'
or
print "if
line 1:7 missing STRING at 'if'
or
print "repeat
line 1:7 missing STRING at 'repeat'

Java.g4 repeated modifier ?

Is it intentional that the grammar accepts constructs such as

public public void myMethod() { }

where a modifier is repeated?

NullPointerException in Python3 grammar

Hi,
I noticed a small bug in the Python3 grammar.

Code:

public static void main(String[] args) {
    String input = "def func():\n  if";

    ANTLRInputStream source = new ANTLRInputStream(input);
    Python3Lexer lexer = new Python3Lexer(source);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    Python3Parser parser = new Python3Parser(tokens);

    Python3Parser.File_inputContext tree = parser.file_input();
}

Expected:

line 0:-1 no viable alternative at input '\n'

Got:

line 0:-1 no viable alternative at input '\n'

Exception in thread "main" java.lang.NullPointerException
    at org.antlr.v4.runtime.DefaultErrorStrategy.getMissingSymbol(DefaultErrorStrategy.java:591)
    at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:477)
    at org.antlr.v4.runtime.Parser.match(Parser.java:223)
    at python.Python3Parser.if_stmt(Python3Parser.java:2959)
    at python.Python3Parser.compound_stmt(Python3Parser.java:2857)
    at python.Python3Parser.stmt(Python3Parser.java:1253)
    at python.Python3Parser.suite(Python3Parser.java:3488)
    at python.Python3Parser.funcdef(Python3Parser.java:600)
    at python.Python3Parser.compound_stmt(Python3Parser.java:2887)
    at python.Python3Parser.stmt(Python3Parser.java:1253)
    at python.Python3Parser.file_input(Python3Parser.java:296)
    at python.PythonMain.main(PythonMain.java:24)

Best regards,
Michael

Wrong parsing with C.g4

C.g4 interprets the code int a1; as declarationSpecifiers, i.e., the tokens int and a1 are each interpreted as a declarationSpecifier. I expect a1 to be interpreted as part of initDeclaratorList. I fixed this by changing the declaration rule to use initDeclaratorList instead of initDeclaratorList?. But I'm not sure if this breaks something else.

provide less css grammar

I'm working on it.

Code too large for try statement

I have a big rule for keywords (over 817 of them), and when I compiled the generated *.java files, compilation of the parser file failed:

error: code too large for try statement
catch (RecognitionException re) {
^

I found a workaround for the case where an identifier could also be a keyword by creating a rule, for example:
identifier: [a-zA-Z]+ | keyword;

When I looked into the parser source, I saw that all the keywords were inserted into the code 'as is'. What is wrong with my approach?

Add coffeescript and javascript grammar

They're missing. I just discovered ANTLR so I have no idea what the difficulty for this task is. I'm also not sure if ANTLR has any built-in barriers that would make parsing these languages impossible.

@parrt , is this a feasible task?

Where would I start looking to write an ANTLR4 grammar for these?

Java.g4: Arrays are not being recognized

Run this script:

#!/bin/bash

LANG=Java
rm -f ${LANG}*.java ${LANG}*.class ${LANG}*.tokens
antlr4 ${LANG}.g4
javac -classpath /usr/local/Cellar/antlr/4.4/antlr-4.4-complete.jar:. ${LANG}*.java

cat <<EOF | grun ${LANG} compilationUnit -gui
public class Demo {
    public int[] intArray;
}
EOF

Note that [] of int[] are not recognized as array literals:

In comparison this is the tree generated by Java8.g4:

ECMAScript.g4 - problems with automatic semicolon insertion

I'm not sure what exactly is going on, but I notice that even in very simple cases, automatic semicolon insertion is not working properly using the ECMAScript grammar, and that the resulting trees are missing branches, essentially mangling follow-up lines into branch with the missing semicolon. I've created a test case gist here

The resulting output is included, but as long as java/javac is on the PATH, you can run the test case yourself by simply doing

git clone https://gist.github.com/5ff47f7652918e5a5717.git
cd 5ff47f7652918e5a5717
chmod +x run-test.bash
./run-test.bash

It will download antlr 4.5, generate the parser and use it to parse two files. One with semicolons, and one where some are missing. These are the two examples:

function Test() {
  o = {};
  o.n = "alice";
  o.n = null;
  o.n = "bob";
}

function Test() {
  o = {};
  o.n = "alice"
  o.n = null
  o.n = "bob"
}

In the second example, a lot of errors are printed and the resulting tree is essentially mangling the "null" and "bob" lines into the "alice" branch.

line 4:2 no viable alternative at input 'function Test() {\n  o = {};\n  o.n = "alice"\n  o'
line 1:16 extraneous input '{' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 4:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 5:2 no viable alternative at input '.n = null\n  o'
line 5:2 extraneous input 'o' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
line 6:0 no viable alternative at input '.n = "bob"\n}'
line 6:0 mismatched input '}' expecting {'[', '(', ';', ',', '=', '?', '.', '++', '--', '+', '-', '*', '/', '%', '>>', '<<', '>>>', '<', '>', '<=', '>=', '==', '!=', '===', '!==', '&', '^', '|', '&&', '||', '*=', '/=', '%=', '+=', '-=', '<<=', '>>=', '>>>=', '&=', '^=', '|=', 'instanceof', 'in'}
(program (sourceElements sourceElement (sourceElement function) (sourceElement (statement (expressionStatement (expressionSequence (singleExpression (singleExpression Test) (arguments ( )))) <missing ';'>))) (sourceElement (statement (block { (statementList (statement (expressionStatement (expressionSequence (singleExpression (singleExpression o) = (expressionSequence (singleExpression (objectLiteral { }))))) ;)) (statement (expressionStatement (expressionSequence (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression (singleExpression o) . (identifierName n)) = (expressionSequence (singleExpression (literal "alice") o))) . (identifierName n)) = (expressionSequence (singleExpression (literal null) o))) . (identifierName n)) = (expressionSequence (singleExpression (literal "bob"))))) <missing ';'>))) })))) <EOF>)

The gist also include an output of the tree as it should be (with 4 statement branches inside the block, instead of just 2).

How to detect null character in string?

I am writing a lexer using the Eclipse IDE. I want to check whether a string contains a null character.
Previously, when I wrote the same thing in Flex, I used "\x00" to detect the null character. I tried the same here, but it did not work.
Currently, for string constants, I am using the following rule:
STR_CONST : '"'
    ( '\\'
        ( 'n' {buf.append('\n');}
        | 't' {buf.append('\t');}
        | 'b' {buf.append('\b');}
        | 'f' {buf.append('\f');}
        | 'r' {buf.append('\r');}
        | '"' {buf.append('"');}
        | '\'' {buf.append('\'');}
        | '\\' {buf.append('\\');}
        )
    | ~('\\'|'"') {buf.append((char)_input.LA(-1));}
    )*
    '"'
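
On the Java side, once the text has been collected in a buffer such as buf above, a null character can be looked for directly. The sketch below is an assumption about how such a check might be written and is not taken from the issue; Java has no "\x00" escape, but '\u0000' (or '\0') denotes the same character.

```java
public class NullCharDemo {
    /** Returns true if the text contains the NUL character (code point 0). */
    public static boolean containsNul(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\u0000') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder("abc");
        System.out.println(containsNul(buf));
        buf.append('\0').append("def");
        System.out.println(containsNul(buf));
    }
}
```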

Possible error in Java.g4

I found that the rule "typeName" was never used anywhere in the grammar. Therefore, it would be impossible to parse a statement like this:

java.lang.Object x = 1; //auto-boxing?

BTW, when ANTLR parses this, what is the lookahead depth? At the syntax level it can't distinguish a qualified name from a chained attribute access while "java.lang.Object" is being parsed, so only when it reaches 'x' does it know which one it is.

Grammar ambiguities

Hi, I have a question about grammar ambiguities.

I have a rule for comments that should start with '*' at the beginning of a line, and this symbol is also used for multiplication.

Rule for multiply:

exp: exp '*' exp
| ID ;

Rule for the asterisk comment:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) -> skip
;

ID : [a-zA-Z]+;

I also tried adding {getCharPositionInLine() == 0}? to detect the start of a line instead of matching the preceding '\n', because a comment can also appear at the very first position of the file, but that does not work for me either.

I would appreciate any advice.

I also can't understand why IDEA (and the Eclipse plugin too) does not evaluate the condition correctly for this block and for my code in the @lexer::members{} part. Look:

public boolean isFirstPosition(){

    Token tkn = super.nextToken();

    if( 0 == tkn.getCharPositionInLine() )
        if( tkn.getText().equals('*') )
            return true;

    return false;

}

and the rule for ASTERISK_COMMENT:
ASTERISK_COMMENT
: '\n' '*' ~[\r\n]*? (('\r'? '\n')|EOF) {isFirstPosition()}? -> skip
;

Maybe I need to test from the terminal?
