renatahodovan / grammarinator
ANTLR v4 grammar-based test generator
License: Other
Hi,
I'm exploring the use of grammarinator for a project.
I'd like to generate an alternation for which the order does not matter, but each alternative should be produced at most once. I want to avoid listing all the N! possibilities for N alternatives.
I noticed that grammarinator has some support for ANTLR's actions, and I got as far as defining the following grammar (the number of alternatives is larger than 3 in my real use case), which produces syntactically correct Python code. Generation fails with a ValueError: Total of weights must be greater than zero, which makes sense as well.
entry [x=1,y=1,z=1]
: component[x,y,z] EOF
;
component [x,y,z]
: {x}? x_component (',' component[0,y,z])*
| {y}? y_component (',' component[x,0,z])*
| {z}? z_component (',' component[x,y,0])*
;
Is there a way to accomplish what I want to do without listing the alternatives? I thought of guarding the (',' component[...])* with {y+z>0}? (and similar).
I have already written a custom generator that overrides the code-generated one that does what I want, but I'm not sure how well that will play with mutations/transformations, yet. I'd be interested if you have thoughts on this as well.
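For what it's worth, the once-each requirement can be sketched outside the grammar with a plain shuffle. This is only an illustration of the intended output shape in a hand-written generator override, not grammarinator API; all names below are made up:

```python
import random

# Illustrative sketch only: emit N components in a random order, each
# exactly once, without enumerating all N! orderings in the grammar.
def generate_components(names=('x_component', 'y_component', 'z_component')):
    parts = list(names)
    random.shuffle(parts)      # order does not matter
    return ','.join(parts)     # each alternative produced at most once
```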
Thanks,
Matthias
Command executed:
grammarinator-generate -l example/test/MySqlUnlexer.py -p example/test/MySqlUnparser.py -r selectStatement -n 100 -d 20 -o tests/data/test_%d.txt -t grammarinator.runtime.simple_space_transformer
which returns data such as:
( ( SELECT SQL_CALC_FOUND_ROWS * ORDER BY ! !
๐ DESC , @JTOV2B2 :=
๐๐ฑ๐ฅณ NOT RLIKE @97 > @"" :=
เฝเฆฒื IS NOT \N ) ) UNION ( SELECT * ORDER BY @M6 NOT REGEXP
๐ RLIKE @8 :=
เฎ LIKE @'๊' := @.$ IS UNKNOWN DESC , FALSE OR @`` :=
๐ ๐ดเบ๐ฒ | | 2 < ALL ( ( SELECT * ) ) IS FALSE LIMIT @@`` , @@I ) ORDER BY @'' := 0 NOT REGEXP @"" := NULL NOT RLIKE @@45T LIKE NULL NOT REGEXP @@LWD REGEXP @VF NOT RLIKE @'' := \N < ANY ( ( ( ( SELECT STRAIGHT_JOIN * ) ) ) LOCK IN SHARE MODE ) BETWEEN @`` := @@RT < = SOME ( ( SELECT SQL_CACHE * ) LOCK IN SHARE MODE ) < = @'''\Z' := @@K. IS NOT NULL AND NULL SOUNDS LIKE @'' := NOT NULL < = NOT NULL NOT BETWEEN @_ AND @"" := @W IS FALSE DESC , BINARY CURRENT_TIME < = SOME ( ( ( SELECT SQL_CALC_FOUND_ROWS HIGH_PRIORITY SQL_BIG_RESULT CURDATE ( ) LIMIT 0 , 2 INTO
เญ , @. ) ) FOR UPDATE ) IS NOT TRUE
Some special symbols appear in the output. Is there any way to constrain the characters in the resulting code? Please help.
We are facing the following error trying to run the command line tool on a Mac 10.12.5, using Python 3.6 (all files are encoded in UTF8):
grammarinator-process Lexer.g4 Parser.g4 -o output/
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-process", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 488, in execute
FuzzerFactory(args.out, args.antlr).generate_fuzzer(args.grammars, args.actions, args.out, args.pep8)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 403, in generate_fuzzer
root, grammar_parser = self.parse(grammar)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 447, in parse
current_root, current_parser = self.parse_single(grammar)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 435, in parse_single
token_stream = CommonTokenStream(self.lexer(FileStream(grammar)))
File "/usr/local/lib/python3.6/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding))
File "/usr/local/lib/python3.6/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9204: ordinal not in range(128)
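The traceback boils down to ANTLR's FileStream decoding the grammar with the default ascii codec. A minimal reproduction of the decode failure, independent of grammarinator:

```python
import codecs

# A UTF-8 encoded string containing a non-ASCII byte, like the grammar file.
data = 'caf\u00e9'.encode('utf-8')
try:
    codecs.decode(data, 'ascii')     # same call as in the traceback above
    failed = False
except UnicodeDecodeError:
    failed = True                    # ascii cannot decode the non-ASCII byte
assert failed
assert codecs.decode(data, 'utf-8') == 'caf\u00e9'  # utf-8 decodes fine
```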
When I use the latest version of grammarinator, there is no -p parameter for grammarinator-generate.
Hi,
Grammarinator looks very nice and I plan to use it in a compiler course I am developing so that everyone learns about grammar-based fuzzing.
Given a simple grammar without actions (see below), running these two commands:
$ grammarinator-process ../antlr4/Kal.g4 --no-actions
$ grammarinator-generate -l KalUnlexer.py -p KalUnparser.py
yields:
Test generation failed: name 'local_ctx' is not defined.
I'm using grammarinator 18.10 on Ubuntu 18.04
Thanks for developing this tool and for any help you can provide.
grammar Kal;
// program is a sequence of decls and definitions with one top level expr
program : (extern_decl | function)* expr
;
extern_decl : 'extern' prototype
;
prototype : ID '(' proto_args ')'
;
proto_args : ID (',' ID) *
;
function : 'def' prototype expr
;
//
// operator precedence in ANTLR4 comes from ordering
//
// # creates a named alternative that can be accessed in visitor
//
// factoring out binary expressions (which is nice for visitors)
// leads to mutual left recursive rules which ANTLR4 doesn't handle
//
expr : SUB expr #unaryMinusExpr
| expr op=(MUL | DIV) expr #multiplicativeExpr
| expr op=(ADD | SUB) expr #additiveExpr
| expr op=(LT | LTE | GT | GTE) expr #relationalExpr
| expr op=(EQ | NE) expr #equalityExpr
| atom #atomExpr
;
atom : ite #iteExpr
| call #callExpr
| '(' expr ')' #parenExpr
| NUM #numExpr
| ID #idExpr
;
ite : 'if' expr 'then' expr 'else' expr
;
call : ID '(' call_args ')'
;
call_args : expr (',' expr)*
;
// Lexer
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
LT : '<' ;
LTE : '<=' ;
GT : '>' ;
GTE : '>=' ;
EQ : '==' ;
NE : '!=' ;
NUM : [0-9]+ ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
// "-> skip" defines a lexical action which skips the matched characters
WS : [ \t\n]+ -> skip ;
COMMENT : '#' ~[\n]* -> skip ;
I've been having a bit of trouble getting the current string-value of a lex token. Here is what I'm currently doing:
from ExprGenerator import ExprGenerator
from grammarinator.runtime import *
import random

class MyExprGenerator(ExprGenerator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def Int(self, parent=None):
        with RuleContext(self, UnlexerRule(name='Int', parent=parent)) as current:
            UnlexerRule(src=str(random.random()), parent=current)
            # such a hackish way to get the current value?
            _current_val = "{}".format(current)
            if float(_current_val) < 0.5:
                return self.Int(parent=parent)
            return current
Is there a better way to get current? I tried a few things, but I was basically getting a parent/child node or None, for example:
>>> current.__dict__
# {'name': 'Int', 'parent': <grammarinator.runtime.tree.UnparserRule object at 0x107f0deb0>, 'children': [<grammarinator.runtime.tree.UnlexerRule object at 0x107f1ca90>], 'level': None, 'depth': None, 'src': None}
What's the suggested way to get the current value? What I'm trying to do is validation: for example, making sure a number is > 0.5, which is pretty difficult to do directly from the ANTLR grammar, so I'm doing some additional validation in the listeners/generators. Thanks so much for your time and help!
David
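One direction that avoids stringifying the node to recover its own value: generate and validate the source string first, and only then build the rule. A minimal sketch of the idea, independent of the grammarinator API:

```python
import random

# Sketch: rejection-sample the token text before wrapping it in a node,
# so no round-trip through str(current) is needed.
def gen_int_src(minimum=0.5):
    while True:
        src = str(random.random())
        if float(src) >= minimum:   # validation happens on the raw string
            return src
```

The accepted string could then be passed as src=... when constructing the UnlexerRule.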
It seems the format was perhaps updated to accept two different files, one for lexing and one for parsing. Running the default example I get:
LA-DEV-IM-MM:grammarinator david$ grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
> -o examples/fuzzer/
LA-DEV-IM-MM:grammarinator david$ grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator -r htmlDocument -d 20 \
> -o examples/tests/test_%d.html -n 100 \
> -s HTMLGenerator.html_space_serializer \
> --sys-path examples/fuzzer/
usage: grammarinator-generate [-h] -p FILE -l FILE [-r NAME]
[-t LIST [LIST ...]]
[--test-transformers LIST [LIST ...]]
[-d NUM] [-c NUM] [--population DIR]
[--no-generate] [--no-mutate]
[--no-recombine] [--keep-trees]
[-j NUM] [-o FILE] [--encoding ENC]
[--log-level LEVEL] [-n NUM]
[--sys-recursion-limit NUM] [--version]
grammarinator-generate: error: the following arguments are required: -p/--unparser, -l/--unlexer
Actually, I wasn't able to get this to work with the separated files (the tree itself would always error). Any feedback or update on how to use this would be great. Thanks
Hello, I ran the example case for HTML and generated 100 HTML files, but many labels contained unprintable characters. What should I do if I only want printable characters in labels? Thanks!
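As a workaround until there is a built-in option, a post-processing filter could strip non-printable characters from the generated files. This is only a sketch, not a grammarinator feature:

```python
# Keep only printable characters (plus common whitespace) in generated text.
def printable_only(text: str) -> str:
    return ''.join(ch for ch in text if ch.isprintable() or ch in '\n\t')
```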
I'm trying to understand how to run Grammarinator with minimal knowledge beyond Linux/Bash/Antlr4. Unfortunately, I can't get far past the first two lines (one to build it after cloning, the other grammarinator-process).
Following the instructions starting here and filling in the details as though it's written for GitHub Actions:
git clone https://github.com/renatahodovan/grammarinator.git
cd grammarinator
pip --version
# pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
pip install .
mkdir x
cd x
cat << END > Expr.g4
grammar Expr;
file_ : expr EOF;
expr : expr ('*' | '/') expr | expr ('+' | '-') expr | '(' expr ')' | ('+' | '-')* atom ;
atom : INT;
INT : [0..9]+;
WS : [ \r\n\t] + -> channel(HIDDEN) ;
END
java --version
# openjdk 11.0.13 2021-10-19
# OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
# OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
java -jar /mnt/c/Users/Kenne/Downloads/antlr-4.9.3-complete.jar Expr.g4
# (Antlr4 works.)
grammarinator-process Expr.g4 -o .
# (grammarinator-process works.)
ls -l ExprGenerator.py
# -rwxrwxrwx 1 ken ken 3808 Feb 15 18:29 ExprGenerator.py*
grammarinator-generate
# grammarinator-generate: error: the following arguments are required: NAME
grammarinator-generate ExprGenerator
# Traceback (most recent call last):
# File "/home/ken/.local/bin/grammarinator-generate", line 8, in <module>
# sys.exit(execute())
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 295, in execute
# with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
# self.generator_cls = import_object(generator) if generator else None
# File "/home/ken/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
# return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
# File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
# return _bootstrap._gcd_import(name[level:], package, level)
# File "<frozen importlib._bootstrap>", line 1011, in _gcd_import
# File "<frozen importlib._bootstrap>", line 950, in _sanity_check
# ValueError: Empty module name
grammarinator-generate ExprGenerator.ExprGenerator
# Traceback (most recent call last):
# File "/home/ken/.local/bin/grammarinator-generate", line 8, in <module>
# sys.exit(execute())
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 295, in execute
# with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
# self.generator_cls = import_object(generator) if generator else None
# File "/home/ken/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
# return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
# File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
# return _bootstrap._gcd_import(name[level:], package, level)
# File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
# File "<frozen importlib._bootstrap>", line 991, in _find_and_load
# File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
# ModuleNotFoundError: No module named 'ExprGenerator'
I just want to see what grammarinator-generate outputs. The instructions are inadequate.
I installed the latest grammarinator from master and got this error when trying to generate JSON (I separated some rules from the original JSON grammar file into a lexer and a parser in order to generate output successfully):
cityoflight:json cityoflight$ grammarinator-process JSONLexer.g4 JSONParser.g4 --no-actions -o output
-sh: grammarinator-process: command not found
cityoflight:json cityoflight$ source ~/.bash_profile
cityoflight:json cityoflight$ grammarinator-process JSONLexer.g4 JSONParser.g4 --no-actions -o output
Traceback (most recent call last):
File "/Users/cityoflight/Library/Python/3.9/bin/grammarinator-process", line 5, in <module>
from grammarinator.process import execute
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/__init__.py", line 11, in <module>
from .process import FuzzerFactory
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/process.py", line 28, in <module>
from .parser import ANTLRv4Lexer, ANTLRv4Parser
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/parser/__init__.py", line 8, in <module>
from .ANTLRv4Lexer import ANTLRv4Lexer
ModuleNotFoundError: No module named 'grammarinator.parser.ANTLRv4Lexer'
But when I uninstall the latest one and use the old one via pip3 install grammarinator, I can generate JSON output successfully.
Hi,
The latest release on pip (19.3) is currently out of date with master and is missing a few features.
Also, this project looks really cool!
EDIT: The documentation explicitly says python3; my mistake was to run a plain python setup.py install on my guest OS without realizing it defaults to python2, which is why I had syntax errors. You can always add a shebang #!/usr/bin/env python3 to avoid this, but in this case it was my mistake.
Thanks for your projects and keep up the good work.
Grammarinator generates incorrect code with some grammars, in particular when a non-terminal appears in the definition of a token. When generating the code with grammarinator-process, no error is thrown; the error only appears when generating inputs with grammarinator-generate.
The bug can be reproduced with the following grammar:
json
: token
;
token
: NUMBER ;
NUMBER
: '-'? INT ('.' [0-9] +)? token?
;
fragment INT
: '0' | [1-9] [0-9]*
;
After running grammarinator-process and grammarinator-generate, the error message thrown is:
Traceback (most recent call last):
File "/home/bachir/.local/bin/grammarinator-generate", line 8, in <module>
sys.exit(execute())
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 281, in execute
with Generator(unlexer_path=args.unlexer, unparser_path=args.unparser, rule=args.rule, out_format=args.out,
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 71, in __init__
self.unlexer_cls = import_entity('.'.join([unlexer, unlexer]))
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 59, in import_entity
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 844, in exec_module
File "<frozen importlib._bootstrap_external>", line 981, in get_code
File "<frozen importlib._bootstrap_external>", line 911, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/bachir/grammar-mutation/json/grammarinator-bugs/13/generate/JSONUnlexer.py", line 42
return current
^
IndentationError: expected an indented block
When looking at the file JSONUnlexer.py
, we see the following invalid code:
if self.unlexer.max_depth >= 0:
for _ in self.zero_or_one():
return current
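The loop body is simply missing, which Python rejects at compile time; the same error can be reproduced without grammarinator:

```python
# The generated for-loop has no indented body, so compiling the source
# raises IndentationError, exactly as in the traceback above.
# (compile only parses; zero_or_one need not be defined.)
src = "for _ in zero_or_one():\nreturn current\n"
try:
    compile(src, '<generated>', 'exec')
    ok = False
except IndentationError:
    ok = True
```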
While, as described in #40, grammarinator-process
failed for the SQLite and MySQL grammars, it processed the grammars of PostgreSQL without any issues.
However, executing grammarinator-generate PostgreSQLGenerator.PostgreSQLGenerator -r root -d 5 -o test%d.sql -n 100 --sys-path . in the directory of PostgreSQLGenerator.py resulted in errors.
First, I found that grammarinator included a multi-line comment starting with /* from the grammar, which is not valid in Python and caused it to fail with an error:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./PostgreSQLGenerator.py", line 22
/* This field stores the tags which are used to detect the end of a dollar-quoted string literal.
^
After fixing this in the generated code, a follow-up failure with an invalid indent appeared:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./PostgreSQLGenerator.py", line 15292
ParseRoutineBody(_localctx);
^
By reducing the indent, the failure disappeared, but the next failure appeared which I did not investigate:
File "./PostgreSQLGenerator.py", line 11, in <module>
from PostgreSQLParserBase import PostgreSQLParserBase
ModuleNotFoundError: No module named 'PostgreSQLParserBase'
I was trying to generate JSON files using the --population option and the ANTLR JSON grammar. The grt files were prepared without problems, but I get an error when I run the following command:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/test_%d.json -d 20 -n 1 --population ../seeds/grts --no-generate --sys-path .
The error output:
Test generation failed.
Traceback (most recent call last):
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 119, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 197, in recombine
options = self.default_selector(node for rule_name in common_types for node in tree_1.node_dict[rule_name])
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 213, in default_selector
return [node for node in iterable if node.name is not None and node.parent is not None and node.name != 'EOF' and node.level + min_depth(node) < self.max_depth]
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 213, in <listcomp>
return [node for node in iterable if node.name is not None and node.parent is not None and node.name != 'EOF' and node.level + min_depth(node) < self.max_depth]
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 211, in min_depth
return getattr(getattr(self.generator_cls, node.name), 'min_depth', 0)
AttributeError: type object 'JSONGenerator' has no attribute '<INVALID>'
The error does not occur if the --population option is omitted. In addition, the seed inputs were successfully parsed into grt files. The error occurs with any provided grt file.
Any idea what could be the cause of this?
Hi, thanks for your awesome tool!
I am wondering how grammarinator deals with the C/C++ ANTLR v4 grammar. Specifically, CPP14.g4 is not separated into lexer and parser files. Can grammarinator also work in this situation? How can I get test cases from a single CPP14.g4 file?
Any suggestions are welcome, thank you very much!
I am trying to use the --random-seed option in order to generate the same input when given the same seed number. However, it's not working: the generated inputs differ.
I have experimented with multiple grammars, including the json antlr grammar. The following is the command I am using:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/test_1.json -n 1 --random-seed 22 --sys-path .
The versions I tried:
grammarinator-generate 19.3.post125+geecdcb7
grammarinator-generate 19.3.post136+g350ce45
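For reference, this is the behaviour one would expect --random-seed to provide; Python's own PRNG is fully reproducible when seeded:

```python
import random

# Seeding with the same value must reproduce the same random sequence.
random.seed(22)
first = [random.random() for _ in range(5)]
random.seed(22)
second = [random.random() for _ in range(5)]
assert first == second
```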
Hey,
I'm writing a converter from ANTLR to another format. To test this converter I used Grammarinator and a parser generator to fuzz my conversion and verify that it can parse every word.
I have encountered the following problem:
When using the NOT feature of ANTLR, Grammarinator generates invalid words.
Consider the following ANTLR grammar not.g4
grammar not;
start: '>' No_comma '<';
No_comma: ~',';
Now all possible words should come out like >a< or >6<, but the word >,< is not valid because we don't allow a comma with the NOT feature.
I tested different variants of the No_comma rule to match the ANTLR documentation, both as lexer and parser rules; in every run Grammarinator still outputs invalid words.
Finding this error took me a lot of time because I first looked for the error in my converter. Even if it is not fixed, it would be very nice to have it confirmed so that I can be sure.
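For comparison, a correct treatment of ~',' has to exclude the negated character when sampling. A minimal sketch of how such a set could be sampled (rejection over printable ASCII, purely illustrative and not grammarinator's implementation):

```python
import random

# Sample a character matching ~',' by rejecting the excluded codepoint.
def sample_not_comma():
    while True:
        ch = chr(random.randrange(0x20, 0x7f))  # printable ASCII for illustration
        if ch != ',':
            return ch
```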
Used versions:
Grammarinator 19.3
antlr-4.9.3-complete.jar
Python 3.7
Thanks a lot,
38b394ce01
It is not possible to use grammars that contain Python keywords as rule names.
Consider the following ANTLR grammar:
grammar keyword;
start: False;
False: 'anything';
This will produce keywordGenerator.py with the following function:
@depthcontrol
def False(self, parent=None):
    current = UnlexerRule(name='False', parent=parent)
    self._enter_rule(current)
    UnlexerRule(src='anything', parent=current)
    self._exit_rule(current)
    return current
False.min_depth = 0
Because False is a reserved keyword in Python, this is not valid Python code.
The bug occurs with every keyword.
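A common way out (a sketch of a possible fix, not what grammarinator currently does) is to detect colliding rule names with the standard keyword module and mangle them when emitting method names:

```python
import keyword

# Append an underscore to rule names that collide with Python keywords,
# so the generated method definitions stay valid Python.
def safe_method_name(rule: str) -> str:
    return rule + '_' if keyword.iskeyword(rule) else rule
```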
Hi @renatahodovan!
First of all, thank you very much for sharing this incredible tool with us! It's really helpful!
We realized that the tool does not support specifying the import directory for grammars whose imported files are located in a different directory. To give you some context, our project structure looks like this:
grammars
base
BaseParser.g4
BaseLexer.g4
java
JavaParser.g4
JavaLexer.g4
...
As our grammars use semantic predicates, we've split them to keep the target-language code out of the base grammar. This approach allows us to reuse the grammar for generating parsers for different target languages without duplicating any rule. So, is it possible to introduce a new argument to specify where the generator should look when importing files, exactly as the ANTLR command line tool does?
I am trying to test grammars-v4/verilog/verilog/ using Grammarinator, but I'm having problems parsing some generated output. When I look at the output from Trees.print(), the tree doesn't seem to contain all the tokens, or the generated text sometimes contains tokens that aren't in the printed tree.
Here is the code that I am executing:
git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4
git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
cd verilog/verilog
# Already cloned and build grammarinator from sources.
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
# Already built a standardized Antlr4 parser driver for the grammar.
for i in tests/test_*; do echo $i; ./Generated/bin/Debug/net5.0/Test.exe -file $i; status=$?; if [[ $status != 0 ]]; then break; fi; done
This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.
I've assumed that Grammarinator would construct a valid CST ("unparser" tree) and output that. While most tests parse, some do not, and failures only appear when -d 15 is specified. I've included the --no-mutate and --no-recombine options so that the tree is output unmodified.
To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py after this line with this code:
print("Index = ")
print(index)
tree.print()
I now rerun the grammarinator-generate command, save the human-readable parse trees, and rerun the parser.
Selecting a test that fails, I've noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized ANTLR parser.
For example,
Output from tree.print():
...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
COMMA
SIMPLE_IDENTIFIER
...
Tokens recognized by parser:
...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
SIMPLE_IDENTIFIER
...
(Note, only one COMMA.)
Relevant sequence in generated file:
| $random , J
(Note, only one COMMA.)
I have noticed similar token differences at other times. It seems that Grammarinator includes some tokens in the CST that are not being output.
Incidentally, I tried to just save the trees using --keep-trees, but there is no tool to print out the trees after reading them back. I tried something like this, but it did not work:
from pydoc import importfile
module = importfile('/full/path/to/trees.py')
module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))
Heya, so I was testing out the JSON grammar fuzzing and noticed a lot of duplication in the output testcases. It may have been something I was doing wrong, but I was following the instructions in the README.md.
Testcase Generation I was using (standard Unlexer/Unparser):
grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
The above would finish incredibly fast, at ~37 seconds for the 1 million testcases, which was awesome. But when testing the uniqueness of a smaller sample via "for i in `ls`; do md5sum $i >> hashes.txt; done", I got the following:
~grammarinator/json_fuzz/# wc -l hashes.txt
133614 hashes.txt
~/grammarinator/json_fuzz/# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
29594
Anyway, I wrote a patch for getting grammarinator-generate to produce unique testcases, which can be found below. It's just a hack and could probably be done better to reduce the runtime cost. As it stands, the runtime is significantly increased, but the testcases seem to be unique:
time grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
real 41m58.709s
user 5m4.388s
sys 69m52.101s
/json_fuzzer_test2# cat hashes.txt | wc -l
76184
/json_fuzzer_test2# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
76184
18,19d17
< import hashlib
<
21c19
< from multiprocessing import Pool, Manager, Lock
---
> from multiprocessing import Pool
56,59c54
< cleanup=True, encoding='utf-8', shared_dict={}, shared_lock = None):
<
< self.shared_dict = shared_dict
< self.shared_lock = shared_lock
---
> cleanup=True, encoding='utf-8'):
147a143,144
> with codecs.open(test_fn, 'w', self.encoding) as f:
> f.write(str(Generator.transform(tree.root, self.test_transformers)))
149,163c146
< output = str(Generator.transform(tree.root, self.test_transformers))
< output_hash = hashlib.md5(output.encode('utf-8')).digest()
<
< try:
< with self.shared_lock:
< _ = self.shared_dict[output_hash]
< return self.create_new_test()
< except KeyError:
< with self.shared_lock:
< self.shared_dict[output_hash] = 1
<
< with codecs.open(test_fn, 'w', self.encoding) as f:
< f.write(output)
<
< return test_fn, tree_fn
---
> return test_fn, tree_fn
302,305d284
< sync_manager = Manager()
< shared_dict_ = sync_manager.dict()
< lock = sync_manager.Lock()
<
310c289
< cleanup=False, encoding=args.encoding, shared_dict=shared_dict_, shared_lock = lock) as generator:
---
> cleanup=False, encoding=args.encoding) as generator:
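Stripped of the multiprocessing plumbing, the core of the patch is a shared set of content hashes consulted before writing a test. A minimal single-process sketch of that idea:

```python
import hashlib

seen = set()

# Return True (and remember the hash) only for outputs not seen before;
# a caller would regenerate whenever this returns False.
def is_new(output: str) -> bool:
    digest = hashlib.md5(output.encode('utf-8')).digest()
    if digest in seen:
        return False
    seen.add(digest)
    return True
```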
Using the current HEAD of master and running the command:
grammarinator-process --no-actions Cypher.g4 --pep8 --encoding utf-8
After a few seconds, the following traceback is generated:
Traceback (most recent call last):
File "/home/tower-linux/.local/bin/grammarinator-process", line 11, in <module>
load_entry_point('grammarinator==19.3+53.gbfde275', 'console_scripts', 'grammarinator-process')()
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 748, in execute
FuzzerFactory(args.language, args.out, args.antlr).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 656, in generate_fuzzer
graph = build_graph(self.antlr_parser_cls, actions, lexer_root, parser_root)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 600, in build_graph
build_rules(root)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 582, in build_rules
build_rule(*rule_args)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 527, in build_rule
build_expr(node, rule.id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 522, in build_expr
build_expr(child, parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 398, in build_expr
build_expr(children[0], parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 426, in build_expr
build_expr(child, parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 445, in build_expr
build_expr(node.children[0], parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 497, in build_expr
ranges = lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 362, in lexer_charset_interval
codepoint, offset = lexer_charset_char(s, offset)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 330, in lexer_charset_char
raise ValueError('Unicode properties (\\p{...}) are not supported')
ValueError: Unicode properties (\p{...}) are not supported
Cypher.g4 can be found here - https://gist.github.com/jeffreylovitz/25f6ed569e3fc474e1d360fff9407446
ANTLR 4 supports extended Unicode escapes of the form \u{12345}; Python 3, however, wants long escapes in the form \U00012345. This results in the following exception:
Traceback (most recent call last):
File "/home/phasip/.local/bin/grammarinator-process", line 11, in <module>
sys.exit(execute())
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 709, in execute
FuzzerFactory(args.out, args.antlr).generate_fuzzer(args.grammars, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 629, in generate_fuzzer
for name, src in fuzzer_generator.generate(lexer_root, parser_root):
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 268, in generate
self.generate_grammar(root)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 368, in generate_grammar
self.unlexer_body += self.generate_single(rule, None)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 393, in generate_single
rule_code += self.generate_single(rule_block, rule_name)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 578, in generate_single
return ''.join([self.generate_single(child, parent_id) for child in node.children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 578, in <listcomp>
return ''.join([self.generate_single(child, parent_id) for child in node.children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 425, in generate_single
return self.generate_single(children[0], parent_id)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 475, in generate_single
return ''.join([self.generate_single(child, parent_id) for child in children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 475, in <listcomp>
return ''.join([self.generate_single(child, parent_id) for child in children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 497, in generate_single
return self.generate_single(node.children[0], parent_id)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 553, in generate_single
ranges = self.lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 235, in lexer_charset_interval
element = bytes(element, 'utf-8').decode('unicode_escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 144-145: truncated \uXXXX escape
An ugly hack to fix this is to search-and-replace with a regex, but that would fail on cases with an even number of backslashes before the u, e.g. "\\u{10000}".
Sources:
https://github.com/antlr/antlr4/blob/master/doc/unicode.md
https://docs.python.org/3/library/codecs.html
Edit: As this seems very rare, a very simple solution is to catch the UnicodeDecodeError, give the user a description of this problem, and let them change their .g4 files manually.
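For anyone who prefers to pre-process the grammar files instead, a regex that only rewrites unescaped \u{...} escapes (skipping a \u whose backslash is itself escaped) could look like the sketch below; convert_antlr_unicode is a hypothetical helper, not part of grammarinator:

```python
import re

def convert_antlr_unicode(s):
    # Convert ANTLR-style \u{10000} escapes to Python-style \U00010000.
    # Group 1 soaks up any preceding escaped-backslash pairs, and the
    # lookbehind guarantees we start at an even-backslash boundary, so an
    # escaped backslash followed by a literal "u{...}" is left alone.
    def repl(m):
        return m.group(1) + '\\U' + format(int(m.group(2), 16), '08X')
    return re.sub(r'((?<!\\)(?:\\\\)*)\\u\{([0-9a-fA-F]+)\}', repl, s)
```

This keeps charset ranges intact, e.g. `[\u{10000}-\u{10010}]` becomes `[\U00010000-\U00010010]`.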
Is there a way to make sure that a rule gets covered at least once during the generation process?
For example, I want integers to be included in all of my generated JSON inputs. It would be nice if I could annotate the grammar so that the corresponding rule is always exercised by Grammarinator.
git clone https://github.com/renatahodovan/grammarinator (at commit f85b80ccd)
cd grammarinator && pip3 install .
mkdir lua-examples lua-fuzzer
curl -O https://raw.githubusercontent.com/antlr/grammars-v4/master/lua/Lua.g4
grammarinator-process Lua.g4 -o lua-fuzzer/ --pep8 -v
grammarinator-generate LuaGenerator.LuaGenerator -r chunk -d 20 -o lua-examples/test_%d.lua -n 100 -s grammarinator.runtime.simple_space_serializer --sys-path lua-fuzzer/
Syntax error in the generated file:
Traceback (most recent call last):
File "/home/sergeyb/.local/bin/grammarinator-generate", line 8, in <module>
sys.exit(execute())
File "/home/sergeyb/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 294, in execute
with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
File "/home/sergeyb/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
self.generator_cls = import_object(generator) if generator else None
File "/home/sergeyb/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 844, in exec_module
File "<frozen importlib._bootstrap_external>", line 981, in get_code
File "<frozen importlib._bootstrap_external>", line 911, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/sergeyb/sources/MRG/tarantool/grammarinator/lua-fuzzer/LuaGenerator.py", line 988
elif choice == 1:
^
SyntaxError: invalid syntax
It seems the indentation is wrong; the patch below fixes the problem:
--- lua-fuzzer/LuaGenerator.py.orig 2022-07-04 15:52:37.284564569 +0300
+++ lua-fuzzer/LuaGenerator.py 2022-07-04 15:52:56.883836234 +0300
@@ -985,7 +985,8 @@
choice = self._model.choice(current, 0, [0 if [
0, 0, 0, 0][i] > self._max_depth else w for i, w in enumerate([1, 1, 1, 1])])
if choice == 0:
- elif choice == 1:
+ pass
+ elif choice == 1:
UnlexerRule(src='[', parent=current)
if self._max_depth >= 0:
for _ in self._model.quantify(current, 0, min=0, max=inf):
Awesome tool, really useful. Thanks!
I read the README and accompanying paper but failed to realise that random outputs may be chosen which would be rejected by a corresponding ANTLR-generated lexer. The unlexer doesn't capture the rules implied by the rule ordering in the ANTLR lexer file.
Here's a simple example:
parser grammar ExampleParser;
options {
tokenVocab = ExampleLexer;
}
session: command ARG EOF;
command: A | B;
lexer grammar ExampleLexer;
A: 'a';
B: 'b';
ARG: [a-z];
WS: [ \t\u000C\r\n]+ -> channel(HIDDEN);
Inputs such as "a a" and "a b" would not be accepted by the ANTLR-generated lexer. The matching rules are such that the first rule (reading the lexer file from start to end) that matches is selected, so an ARG can be any letter in [a-z] as long as it doesn't match A or B, i.e. [c-z]. ExampleGenerator has the following code, showing that it will generate an ARG in the full range [a-z]:
@depthcontrol
def ARG(self, parent=None):
current = UnlexerRule(name='ARG', parent=parent)
self.enter_rule(current)
UnlexerRule(src=self.model.charset(current, 0, self._charsets[1]), parent=current)
self.exit_rule(current)
return current
ARG.min_depth = 0
...
_charsets = {
0: list(chain.from_iterable([range(32, 127)])),
1: list(chain.from_iterable([range(97, 123)])),
2: list(chain.from_iterable([range(9, 10), range(10, 11), range(12, 13), range(13, 14), range(32, 33)])),
}
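The missing step can be sketched as a set difference: ARG should draw from [a-z] minus the characters already claimed by the earlier, higher-priority rules A and B (exclude_chars is an illustrative helper, not grammarinator code):

```python
def exclude_chars(charset, reserved):
    # Remove codepoints claimed by earlier (higher-priority) lexer rules.
    return sorted(set(charset) - set(reserved))

# [a-z] minus {'a', 'b'} leaves [c-z], the set ANTLR's lexer would
# actually accept for ARG.
arg_charset = exclude_chars('abcdefghijklmnopqrstuvwxyz', 'ab')
```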
In a more realistic setting, fuzzing languages like Lua, where variable names cannot be keywords like "or", "and", etc. (and this is captured in the order of rules in the lexer), will require a few tweaks to avoid wasting time fuzzing uninteresting parts of the language runtime. I opted to override unlexer methods so that names are chosen from a pool that won't collide with reserved words.
The README and paper cover fixing the random outputs to meet semantic requirements really well. I only noticed the above when I saw coverage in the lexer's error paths, so maybe it's subtle enough to be worth documenting? Could be that I'm being dense: happy either way.
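The override mentioned above boils down to something like this sketch; the keyword list is abbreviated and the name pool is arbitrary, both hypothetical:

```python
import random

LUA_KEYWORDS = {'and', 'or', 'not', 'if', 'then', 'else', 'end',
                'function', 'local', 'nil', 'true', 'false'}
NAME_POOL = [n for n in ('foo', 'bar', 'baz', 'qux', 'v1', 'v2')
             if n not in LUA_KEYWORDS]

def safe_name(rng=random):
    # What an overridden NAME unlexer method could return: an identifier
    # guaranteed not to collide with a reserved word.
    return rng.choice(NAME_POOL)
```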
The generate function often crashes due to exceeding Python's recursion depth. A way to control recursion depth with grammarinator-generate would be a useful feature. Are there any options for controlling it? Thanks
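One possible stopgap, assuming the crash is CPython's RecursionError rather than grammarinator's own depth control, is to raise the interpreter's recursion limit before driving generation from Python; the -d/--max-depth flag bounds the tree depth separately:

```python
import sys

# CPython's default limit is usually 1000; deeply nested grammars can
# exceed it even when --max-depth is modest, because rule methods nest
# on the interpreter stack.
sys.setrecursionlimit(20000)
```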
I tried to set it up using the ANTLR grammar, but grammarinator-process generates references to the JavaScriptBaseLexer class:
Any help would be great:)
grammarinator-process ../grammars-v4/javascript/javascript/JavaScriptLexer.g4 ../grammars-v4/javascript/javascript/JavaScriptParser.g4 -o testout
grammarinator-generate -l testout/JavaScriptUnlexer.py -p testout/JavaScriptUnparser.py -r htmlDocument -o examples/tests/test_%d.js -n 100 -d 20
Traceback (most recent call last):
File "/home/detlef/tmp/fuzz/venv/bin/grammarinator-generate", line 11, in <module>
load_entry_point('grammarinator==19.3+15.g6f43afe.d20200213', 'console_scripts', 'grammarinator-generate')()
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 293, in execute
cleanup=False, encoding=args.encoding) as generator:
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 69, in __init__
self.unlexer_cls = import_entity('.'.join([unlexer, unlexer]))
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 57, in import_entity
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "testout/JavaScriptUnlexer.py", line 10, in <module>
from JavaScriptBaseLexer import JavaScriptBaseLexer
ImportError: No module named 'JavaScriptBaseLexer'
We're facing the following error when generating our grammar:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/usr/local/lib/python3.6/site-packages/grammarinator/generate.py", line 21, in generate
root = getattr(parser_cls(lexer_cls()), rule)()
File "output/BaseCclUnparser.py", line 17, in domain
current += self.lexer.WHEN()
AttributeError: 'TestUnlexer' object has no attribute 'WHEN'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-generate", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.6/site-packages/grammarinator/generate.py", line 74, in execute
pool.starmap(generate, [(lexer_cls, parser_cls, args.rule, transformers, args.out % i) for i in range(args.n)])
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 268, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
AttributeError: 'TestUnlexer' object has no attribute 'WHEN'
Parser:
parser grammar TestParser;
options {
tokenVocab=TestLexer;
}
domain: WHEN ;
Lexer:
lexer grammar TestLexer;
WHEN: W H E N;
// Letters
A: [Aa];
B: [Bb];
C: [Cc];
D: [Dd];
E: [Ee];
F: [Ff];
G: [Gg];
H: [Hh];
I: [Ii];
J: [Jj];
K: [Kk];
L: [Ll];
M: [Mm];
N: [Nn];
O: [Oo];
P: [Pp];
Q: [Qq];
R: [Rr];
S: [Ss];
T: [Tt];
U: [Uu];
V: [Vv];
W: [Ww];
X: [Xx];
Y: [Yy];
Z: [Zz];
Our grammar is case insensitive, hence most tests look like the following:
aBc mAtcHes CdA
AbC mAtcHes cDa
However, case isn't relevant for us, so we've created a transformer that converts all UnlexerRules to lowercase. The problem is that now more than 50% of the tests are duplicates.
Do you have any idea how to handle this?
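One option is to deduplicate after generation rather than fight the generator: hash the lowercased text and drop repeats (dedup is an illustrative helper, not part of grammarinator):

```python
import hashlib

def dedup(tests):
    # Keep only the first test per case-insensitive content hash.
    seen, unique = set(), []
    for t in tests:
        h = hashlib.sha1(t.lower().encode('utf-8')).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(t)
    return unique
```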
I'm following the instructions in the README (grammarinator-process PhpLexer.g4 PhpParser.g4 -o out).
Download PhpLexer.g4 and PhpParser.g4 from grammars-v4, then
$ grammarinator-process PhpLexer.g4 PhpParser.g4 -o out
$ tree out
├── PhpUnlexer.py
└── PhpUnparser.py
I've also downloaded PhpLexerBase.py from grammars-v4 and placed it in out/.
Now when I try to generate:
$ grammarinator-generate -p out/PhpUnparser.py -l out/PhpUnlexer.py -o generated_%d -j 1
Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 129, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/user/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 167, in generate
unlexer = self.unlexer_cls(**dict(self.unlexer_kwargs, max_depth=max_depth))
File "/home/user/userspace/php-gramm/out/PhpUnlexer.py", line 47, in __init__
super(PhpUnlexer, self).__init__()
TypeError: __init__() missing 1 required positional argument: 'input'
I tried to fix this manually by figuring out the hierarchy of the grammars-v4 PhpLexerBase, which extends the Lexer class, but that leads to other undefined fields in PhpUnlexer.py. It seems this integration is not working, or I'm missing something.
Please assist
Sabr
I would like to know how to invoke grammarinator-generate from a Python script instead of the CLI.
Is there any documentation or are there any examples, please?
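Lacking a documented Python API, the most version-proof approach may be to shell out to the CLI from Python; build_cmd and run_generate below are hypothetical wrappers, and the flags mirror the invocations shown elsewhere in these issues:

```python
import subprocess

def build_cmd(generator, rule, out_pattern, n=10, depth=20):
    # Assemble the same argv a manual CLI invocation would use.
    return ['grammarinator-generate', generator, '-r', rule,
            '-o', out_pattern, '-n', str(n), '-d', str(depth)]

def run_generate(*args, **kwargs):
    # Raises CalledProcessError if generation fails.
    return subprocess.run(build_cmd(*args, **kwargs), check=True)
```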
I cannot process the JavaScript grammar because it returns an error. The grammarinator version I'm using is 19.3.post79+g820a01a.d20210726.
Before processing, I remove options { superClass=JavaScriptLexerBase; } from JavaScriptLexer.g4 and options { tokenVocab=JavaScriptLexer; superClass=JavaScriptParserBase; } from JavaScriptParser.g4.
Then I run the grammarinator-process JavaScriptLexer.g4 JavaScriptParser.g4 --no-actions -o output command:
javascript cityoflight77$ grammarinator-process JavaScriptLexer.g4 JavaScriptParser.g4 --no-actions -o output
Traceback (most recent call last):
File "/Users/cityoflight77/anaconda3/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 602, in build_graph
build_rules(root)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 584, in build_rules
build_rule(*rule_args)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 529, in build_rule
build_expr(node, rule.id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 411, in build_expr
build_expr(child, alternative_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 447, in build_expr
build_expr(node.children[0], parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 499, in build_expr
ranges = lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 364, in lexer_charset_interval
codepoint, offset = lexer_charset_char(s, offset)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 332, in lexer_charset_char
raise ValueError('Unicode properties (\\p{...}) are not supported')
ValueError: Unicode properties (\p{...}) are not supported
How do I generate a ? None of the tests have whitespace and the input often gives invalid characters.
I got the g4 for https://github.com/antlr/grammars-v4/blob/master/javascript/javascript/
grammarinator-process JavaScriptParser.g4 JavaScriptLexer.g4 -o out --no-actions
grammarinator-generate -p out/JavaScriptUnparser.py -l out/JavaScriptUnlexer.py -r program -d 30 -n 3000
MacOS Python3.9
What is a generator? Do I need to write a generator program?
How do I write a transformer, or change the grammar, to add spaces? (Looking at the closed issues, this seems to be the solution, but I have no idea what a transformer is.)
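For context, a transformer/serializer is just a Python callable applied to the generated tree before it is written out. A rough sketch of what a whitespace-inserting serializer does, with a minimal Node class standing in for grammarinator's rule objects:

```python
class Node:
    # Minimal stand-in for grammarinator's UnparserRule/UnlexerRule.
    def __init__(self, src=None, children=None):
        self.src = src
        self.children = children or []

def space_serializer(root):
    # Collect every leaf token and join them with single spaces, roughly
    # what grammarinator.runtime.simple_space_serializer provides.
    tokens = []
    def walk(node):
        if node.children:
            for child in node.children:
                walk(child)
        elif node.src is not None:
            tokens.append(node.src)
    walk(root)
    return ' '.join(tokens)
```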
After manually fixing issues due to #15 in the grammar at https://github.com/chungkwong/fooledit/tree/490c4bc0a4ba6ceec3ac0c4cd1947a54e397ef34/mode.xml/src/main/antlr4/cc/fooledit/editor/text/mode/xml, I receive this error:
Data: <function XMLUnparser.choice at 0x7f26101f06a8>, ([1, 1],), {}
Test generation failed.
Traceback (most recent call last):
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/generate.py", line 129, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/generate.py", line 168, in generate
tree = Tree(getattr(self.unparser_cls(unlexer) if rule[0].islower() else unlexer, rule)())
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
File "XMLUnparser.py", line 21, in document
current += self.element()
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
File "XMLUnparser.py", line 244, in element
choice = self.choice([0 if [2, 2][i] > self.unlexer.max_depth else w * self.unlexer.weights.get(('alt_147', i), 1) for i, w in enumerate([1, 1])])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
TypeError: choice() takes 1 positional argument but 2 were given
Note: the "Data" line comes from a print I added at grammarinator.py line 60: print("Data: %s, %s, %s" % (fn, args, kwargs), file=sys.stderr)
Content of XMLUnparser.py choice function:
@depthcontrol
def choice(self):
current = self.create_node(UnparserRule(name='choice'))
current += self.create_node(UnlexerRule(src='('))
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.cp()
if self.unlexer.max_depth >= 0:
for _ in self.one_or_more():
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.create_node(UnlexerRule(src='|'))
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.cp()
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.create_node(UnlexerRule(src=')'))
return current
choice.min_depth = 2
Hi @renatahodovan!
I'm facing the following error when trying to run poetry build in the root dir of grammarinator:
==> Starting build()...
PyProjectException
[tool.poetry] section not found in /home/noptrix/blackarch/repos/blackarch/packages/grammarinator/src/grammarinator/pyproject.toml
at /usr/lib/python3.10/site-packages/poetry/core/pyproject/toml.py:56 in poetry_config
52│ def poetry_config(self): # type: () -> Optional[TOMLDocument]
53│ if self._poetry_config is None:
54│ self._poetry_config = self.data.get("tool", {}).get("poetry")
55│ if self._poetry_config is None:
→ 56│ raise PyProjectException(
57│ "[tool.poetry] section not found in {}".format(self._file)
58│ )
59│ return self._poetry_config
60│
==> ERROR: A failure occurred in build().
Aborting...
Any ideas? Here is the PKGBUILD used to build grammarinator under Arch Linux.
Versions of packages:
$ pip install --user grammarinator
Requirement already satisfied: grammarinator in /home/user/.local/lib/python3.8/site-packages (19.3)
Requirement already satisfied: antlerinator==4.7.1-1 in /home/user/.local/lib/python3.8/site-packages (from grammarinator) (4.7.1.post1)
Requirement already satisfied: autopep8 in /home/user/.local/lib/python3.8/site-packages (from grammarinator) (1.5.4)
Requirement already satisfied: antlr4-python3-runtime==4.7.1 in /home/user/.local/lib/python3.8/site-packages (from antlerinator==4.7.1-1->grammarinator) (4.7.1)
Requirement already satisfied: pycodestyle>=2.6.0 in /home/user/.local/lib/python3.8/site-packages (from autopep8->grammarinator) (2.6.0)
Requirement already satisfied: toml in /home/user/.local/lib/python3.8/site-packages (from autopep8->grammarinator) (0.10.1)
I have lexer rules:
STRING_LITERAL: QUOTE_SINGLE ( ~([\\']) | (BACKSLASH .) )* QUOTE_SINGLE;
BACKSLASH: '\\';
QUOTE_SINGLE: '\'';
And a parser rule (one of alternatives):
TRIM LPAREN (BOTH | LEADING | TRAILING) STRING_LITERAL FROM columnExpr RPAREN
I run generator like this:
grammarinator-generate -r queryList -o /tmp/sql_test_%d.sql -n 100 -c 0.3 -d 20 -p Unparser.py -l Unlexer.py --test-transformers SpaceTransformer.single_line_whitespace
And sometimes get the following output (partial):
tRIM ( bOTH ''' FrOM ( *
I don't understand why there are triple single-quotes here. Looks like a bug.
I just tried grammarinator on the SQLite and MySQL grammars from https://github.com/antlr/grammars-v4/tree/master/sql (commit 8dca3622acbea8fce8726c73364af232cb6eacce), but the latest version of grammarinator-process failed on both of them.
grammarinator-process SQLiteLexer.g4 SQLiteParser.g4 -o out/
Traceback (most recent call last):
File "/home/manuel/.local/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 604, in build_graph
graph.calc_min_depths()
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 236, in calc_min_depths
assert all(min_depths[node.id] < inf for node in self.vertices[ident].out_neighbours), '{ident} has an alternative that isn\'t reachable.'.format(ident=ident)
AssertionError: 739 has an alternative that isn't reachable.
grammarinator-process MySqlLexer.g4 MySqlParser.g4
Traceback (most recent call last):
File "/home/manuel/.local/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 602, in build_graph
build_rules(root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 584, in build_rules
build_rule(*rule_args)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 529, in build_rule
build_expr(node, rule.id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 411, in build_expr
build_expr(child, alternative_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 415, in build_expr
build_expr(node.alternative(), parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 456, in build_expr
build_expr(node.children[0], quant_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 400, in build_expr
build_expr(children[0], parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 447, in build_expr
build_expr(node.children[0], parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 509, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 513, in build_expr
graph.add_edge(frm=parent_id, to=str(node.TOKEN_REF()))
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 215, in add_edge
assert to in self.vertices, '{to} not in vertices.'.format(to=to)
AssertionError: ADMIN not in vertices.
Is this expected?
In continuation of issue #5, we're still getting the following errors trying to generate tests for a grammar with recursive rules:
> grammarinator-process TestLexer.g4 TestParser.g4 -o test
> grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 1 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
> clean grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 5 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
> clean grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 10 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
It now seems the warning is repeated once for every requested test.
I got an error when running grammarinator-process. I'm using the latest grammarinator (19.3.post79+g820a01a.d20210726), installed with pip3 install .
I'm on macOS 10.14 with Python 3.9.6.
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-process", line 5, in <module>
from grammarinator.process import execute
File "/usr/local/lib/python3.9/site-packages/grammarinator/__init__.py", line 11, in <module>
from .process import FuzzerFactory
File "/usr/local/lib/python3.9/site-packages/grammarinator/process.py", line 29, in <module>
from .parser import ANTLRv4Lexer, ANTLRv4Parser
File "/usr/local/lib/python3.9/site-packages/grammarinator/parser/__init__.py", line 8, in <module>
from .ANTLRv4Lexer import ANTLRv4Lexer
ModuleNotFoundError: No module named 'grammarinator.parser.ANTLRv4Lexer'
Is there a way to assign weights/probabilities to grammar alternatives?
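For reference, the generated code already multiplies per-alternative weights into the choice (the expression w * weights.get((alt_id, i), 1) is visible in other tracebacks in this thread). Conceptually, the selection looks like this sketch:

```python
import random

def weighted_choice(weights, rng=random):
    # Pick index i with probability weights[i] / sum(weights);
    # zero-weight alternatives are never chosen.
    total = sum(weights)
    r = rng.uniform(0, total)
    upto = 0.0
    for i, w in enumerate(weights):
        upto += w
        if w > 0 and r <= upto:
            return i
    return max(i for i, w in enumerate(weights) if w > 0)
```

So boosting an alternative's weight makes it proportionally more likely, and a weight of 0 disables it entirely.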
When executing grammarinator-process without the --antlr argument, grammarinator asks antlerinator to download the ANTLR jarfile to ~/.antlerinator/antlr-4.7.1-complete.jar. This fails on my machine with an SSLError and the following stacktrace:
Traceback (most recent call last):
File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/usr/lib/python3.10/http/client.py", line 1454, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.10/ssl.py", line 1062, in _create
self._sslobj = self._context._wrap_socket(
ssl.SSLError: Cannot create a client socket with a PROTOCOL_TLS_SERVER context (_ssl.c:801)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<redacted>/.venv/bin/grammarinator-process", line 33, in <module>
sys.exit(load_entry_point('grammarinator==19.3', 'console_scripts', 'grammarinator-process')())
File "<redacted>/.venv/lib/python3.10/site-packages/grammarinator/process.py", line 708, in execute
antlerinator.install(lazy=True)
File "<redacted>/.venv/lib/python3.10/site-packages/antlerinator/install.py", line 47, in install
with contextlib.closing(urlopen(tool_url, context=ssl_context)) as response:
File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error Cannot create a client socket with a PROTOCOL_TLS_SERVER context (_ssl.c:801)>
I am able to circumvent this issue by downloading the required ANTLR jarfile manually and pointing to it with the --antlr argument.
Have the SSL certificates perhaps expired?
Is grammarinator not compatible with my version of Python?
Is there any way to keep the spaces in the output? In our tests, it's generating invalid expressions.
We noticed a space transformer in the examples, but we could not understand how to use it in our case.
Will the program generated by this tool have undefined variables?
I'm using the grammar here and generated tests using the latest on master:
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator --sys-path . -d 30 -n 10 -r source_text --serializer grammarinator.runtime.simple_space_serializer
I'm not sure I understand why half of the files generated have zero character length.
When mutation and recombination modes are enabled, grammarinator-generate doesn't generate the same output consistently, despite setting --random-seed to a fixed number.
Here is the example command I am using:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/input.in -d 60 -n 1 --random-seed 4 --sys-path . -j=1 --population ../seeds/grts --no-generate
I used a JSON grammar to build JSONGenerator.
uname -a
Linux hn0-sparkd 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
sudo grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 -o examples/fuzzer/
Traceback (most recent call last):
  File "/usr/local/bin/grammarinator-process", line 11, in <module>
    load_entry_point('grammarinator==17.7.post0', 'console_scripts', 'grammarinator-process')()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 572, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2769, in load_entry_point
    return ep.load()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2422, in load
    return self.resolve()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2428, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/__init__.py", line 8, in <module>
    from . import runtime
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/runtime/__init__.py", line 8, in <module>
    from .grammarinator import depthcontrol, Grammarinator, multirange_diff, printable_ascii_ranges, printable_unicode_ranges
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/runtime/grammarinator.py", line 71
    def __init__(self, *, max_cnt=8000):
                        ^
SyntaxError: invalid syntax
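For what it's worth, the `SyntaxError` above points at the bare `*` in the argument list: keyword-only parameters are Python 3 syntax, so importing the package under Python 2.7 (visible in the traceback paths) fails before anything runs. Under Python 3 the same definition is valid; `DepthControl` below is a hypothetical class used only to illustrate the construct:

```python
# Keyword-only arguments (the bare '*') require Python 3;
# Python 2.7 raises SyntaxError at the '*', as in the traceback above.
class DepthControl:
    def __init__(self, *, max_cnt=8000):  # max_cnt must be passed by keyword
        self.max_cnt = max_cnt

dc = DepthControl(max_cnt=100)
print(dc.max_cnt)  # 100
```

So the fix is to run grammarinator under a Python 3 interpreter rather than the system Python 2.7.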
Hi Renata,
For the last few days I have been running grammarinator with a JS grammar and it works perfectly, thanks. Looking at the latest commits, I saw that there is an evolutionary setting to generate files from existing files. I tried to run it with the "--population" argument, but the tool didn't recognize it.
Is there any wiki, paper, or other documentation explaining how it works?
thanks
It's not clear from the documentation whether the grammar must be written in Python. Our grammar uses semantic predicates (those are supported, right?), and we're getting errors when generating the tests.