renatahodovan / grammarinator
ANTLR v4 grammar-based test generator
License: Other
Hi,
I'm exploring the use of grammarinator for a project.
I'd like to generate an alternation for which the order does not matter, but each alternative should be produced at most once. I want to avoid listing all the N! possibilities for N alternatives.
I noticed that grammarinator has some support for ANTLR's actions, and I got as far as defining the following grammar (the number of alternatives is larger than 3 in my real use case), which produces syntactically correct Python code. Generation fails with a ValueError: Total of weights must be greater than zero, which makes sense as well.
entry [x=1,y=1,z=1]
: component[x,y,z] EOF
;
component [x,y,z]
: {x}? x_component (',' component[0,y,z])*
| {y}? y_component (',' component[x,0,z])*
| {z}? z_component (',' component[x,y,0])*
;
Is there a way to accomplish what I want to do without listing the alternatives? I thought of guarding the (',' component[...])* with {y+z>0}? (and similar).
I have already written a custom generator that overrides the code-generated one that does what I want, but I'm not sure how well that will play with mutations/transformations, yet. I'd be interested if you have thoughts on this as well.
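For what it's worth, the once-each requirement can be sketched outside the grammar with a plain shuffle. This is only an illustration of the intended output shape in a hand-written generator override, not grammarinator API; all names below are made up:

```python
import random

# Illustrative sketch only: emit N components in a random order, each
# exactly once, without enumerating all N! orderings in the grammar.
def generate_components(names=('x_component', 'y_component', 'z_component')):
    parts = list(names)
    random.shuffle(parts)      # order does not matter
    return ','.join(parts)     # each alternative produced at most once
```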
Thanks,
Matthias
Command executed:
grammarinator-generate -l example/test/MySqlUnlexer.py -p example/test/MySqlUnparser.py -r selectStatement -n 100 -d 20 -o tests/data/test_%d.txt -t grammarinator.runtime.simple_space_transformer
which returns data such as:
( ( SELECT SQL_CALC_FOUND_ROWS * ORDER BY ! !
๐ DESC , @JTOV2B2 :=
๐๐ฑ๐ฅณ NOT RLIKE @97 > @"" :=
เฝเฆฒื IS NOT \N ) ) UNION ( SELECT * ORDER BY @M6 NOT REGEXP
๐ RLIKE @8 :=
เฎ LIKE @'๊' := @.$ IS UNKNOWN DESC , FALSE OR @`` :=
๐ ๐ดเบ๐ฒ | | 2 < ALL ( ( SELECT * ) ) IS FALSE LIMIT @@`` , @@I ) ORDER BY @'' := 0 NOT REGEXP @"" := NULL NOT RLIKE @@45T LIKE NULL NOT REGEXP @@LWD REGEXP @VF NOT RLIKE @'' := \N < ANY ( ( ( ( SELECT STRAIGHT_JOIN * ) ) ) LOCK IN SHARE MODE ) BETWEEN @`` := @@RT < = SOME ( ( SELECT SQL_CACHE * ) LOCK IN SHARE MODE ) < = @'''\Z' := @@K. IS NOT NULL AND NULL SOUNDS LIKE @'' := NOT NULL < = NOT NULL NOT BETWEEN @_ AND @"" := @W IS FALSE DESC , BINARY CURRENT_TIME < = SOME ( ( ( SELECT SQL_CALC_FOUND_ROWS HIGH_PRIORITY SQL_BIG_RESULT CURDATE ( ) LIMIT 0 , 2 INTO
เญ , @. ) ) FOR UPDATE ) IS NOT TRUE
Some special symbols appear in the output. Is there any way to constrain the characters in the resulting code? Please help.
We are facing the following error trying to run the command line tool on a Mac 10.12.5, using Python 3.6 (all files are encoded in UTF8):
grammarinator-process Lexer.g4 Parser.g4 -o output/
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-process", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 488, in execute
FuzzerFactory(args.out, args.antlr).generate_fuzzer(args.grammars, args.actions, args.out, args.pep8)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 403, in generate_fuzzer
root, grammar_parser = self.parse(grammar)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 447, in parse
current_root, current_parser = self.parse_single(grammar)
File "/usr/local/lib/python3.6/site-packages/grammarinator/process.py", line 435, in parse_single
token_stream = CommonTokenStream(self.lexer(FileStream(grammar)))
File "/usr/local/lib/python3.6/site-packages/antlr4/FileStream.py", line 20, in __init__
super().__init__(self.readDataFrom(fileName, encoding))
File "/usr/local/lib/python3.6/site-packages/antlr4/FileStream.py", line 27, in readDataFrom
return codecs.decode(bytes, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9204: ordinal not in range(128)
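The traceback boils down to ANTLR's FileStream decoding the grammar with the default ascii codec. A minimal reproduction of the decode failure, independent of grammarinator:

```python
import codecs

# A UTF-8 encoded string containing a non-ASCII byte, like the grammar file.
data = 'caf\u00e9'.encode('utf-8')
try:
    codecs.decode(data, 'ascii')     # same call as in the traceback above
    failed = False
except UnicodeDecodeError:
    failed = True                    # ascii cannot decode the non-ASCII byte
assert failed
assert codecs.decode(data, 'utf-8') == 'caf\u00e9'  # utf-8 decodes fine
```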
When I use the latest version of grammarinator, there is no -p parameter for grammarinator-generate.
Hi,
Grammarinator looks very nice and I plan to use it in a compiler course I am developing so that everyone learns about grammar-based fuzzing.
Given a simple grammar without actions (see below), running these two commands:
$ grammarinator-process ../antlr4/Kal.g4 --no-actions
$ grammarinator-generate -l KalUnlexer.py -p KalUnparser.py
yields:
Test generation failed: name 'local_ctx' is not defined.
I'm using grammarinator 18.10 on Ubuntu 18.04
Thanks for developing this tool and for any help you can provide.
grammar Kal;
// program is a sequence of decls and definitions with one top level expr
program : (extern_decl | function)* expr
;
extern_decl : 'extern' prototype
;
prototype : ID '(' proto_args ')'
;
proto_args : ID (',' ID) *
;
function : 'def' prototype expr
;
//
// operator precedence in ANTLR4 comes from ordering
//
// # creates a named alternative that can be accessed in visitor
//
// factoring out binary expressions (which is nice for visitors)
// leads to mutual left recursive rules which ANTLR4 doesn't handle
//
expr : SUB expr #unaryMinusExpr
| expr op=(MUL | DIV) expr #multiplicativeExpr
| expr op=(ADD | SUB) expr #additiveExpr
| expr op=(LT | LTE | GT | GTE) expr #relationalExpr
| expr op=(EQ | NE) expr #equalityExpr
| atom #atomExpr
;
atom : ite #iteExpr
| call #callExpr
| '(' expr ')' #parenExpr
| NUM #numExpr
| ID #idExpr
;
ite : 'if' expr 'then' expr 'else' expr
;
call : ID '(' call_args ')'
;
call_args : expr (',' expr)*
;
// Lexer
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
LT : '<' ;
LTE : '<=' ;
GT : '>' ;
GTE : '>=' ;
EQ : '==' ;
NE : '!=' ;
NUM : [0-9]+ ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
// "-> skip" defines a lexical action which skips the matched characters
WS : [ \t\n]+ -> skip ;
COMMENT : '#' ~[\n]* -> skip ;
I've been having a bit of trouble getting the current string-value of a lex token. Here is what I'm currently doing:
from ExprGenerator import ExprGenerator
from grammarinator.runtime import *
import random

class MyExprGenerator(ExprGenerator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def Int(self, parent=None):
        with RuleContext(self, UnlexerRule(name='Int', parent=parent)) as current:
            UnlexerRule(src=str(random.random()), parent=current)
            # such a hackish way to get the current value?
            _current_val = "{}".format(current)
            if float(_current_val) < 0.5:
                return self.Int(parent=parent)
            return current
Is there a better way to get current? I tried a few things, but I was basically getting a parent/child node or None, for example:
>>> current.__dict__
# {'name': 'Int', 'parent': <grammarinator.runtime.tree.UnparserRule object at 0x107f0deb0>, 'children': [<grammarinator.runtime.tree.UnlexerRule object at 0x107f1ca90>], 'level': None, 'depth': None, 'src': None}
What's the suggested way to get the current value? What I'm trying to do is validation: for example, making sure a number is > 0.5, which is pretty difficult to do directly from the ANTLR grammar, so I'm doing some additional validation in the listeners/generators. Thanks so much for your time and help!
David
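One direction that avoids stringifying the node to recover its own value: generate and validate the source string first, and only then build the rule. A minimal sketch of the idea, independent of the grammarinator API:

```python
import random

# Sketch: rejection-sample the token text before wrapping it in a node,
# so no round-trip through str(current) is needed.
def gen_int_src(minimum=0.5):
    while True:
        src = str(random.random())
        if float(src) >= minimum:   # validation happens on the raw string
            return src
```

The accepted string could then be passed as src=... when constructing the UnlexerRule.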
It seems the format was perhaps updated to accept two different files, one for lexing and one for parsing. Running the default example I get:
LA-DEV-IM-MM:grammarinator david$ grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
> -o examples/fuzzer/
LA-DEV-IM-MM:grammarinator david$ grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator -r htmlDocument -d 20 \
> -o examples/tests/test_%d.html -n 100 \
> -s HTMLGenerator.html_space_serializer \
> --sys-path examples/fuzzer/
usage: grammarinator-generate [-h] -p FILE -l FILE [-r NAME]
[-t LIST [LIST ...]]
[--test-transformers LIST [LIST ...]]
[-d NUM] [-c NUM] [--population DIR]
[--no-generate] [--no-mutate]
[--no-recombine] [--keep-trees]
[-j NUM] [-o FILE] [--encoding ENC]
[--log-level LEVEL] [-n NUM]
[--sys-recursion-limit NUM] [--version]
grammarinator-generate: error: the following arguments are required: -p/--unparser, -l/--unlexer
Actually, I wasn't able to get this to work with the separated files (the tree itself would always error). Any feedback or update on how to use this would be great. Thanks
Hello, I ran the example case for HTML and generated 100 HTML files, but many labels contained unprintable characters. What should I do if I only want printable characters in labels? Thanks!
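As a workaround until there is a built-in option, a post-processing filter could strip non-printable characters from the generated files. This is only a sketch, not a grammarinator feature:

```python
# Keep only printable characters (plus common whitespace) in generated text.
def printable_only(text: str) -> str:
    return ''.join(ch for ch in text if ch.isprintable() or ch in '\n\t')
```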
I'm trying to understand how to run Grammarinator with minimal knowledge beyond Linux/Bash/Antlr4. Unfortunately, I can't get far past the first two lines (one to build it after cloning, the other grammarinator-process).
Following the instructions starting here and filling in the details as though it's written for GitHub Actions:
git clone https://github.com/renatahodovan/grammarinator.git
cd grammarinator
pip --version
# pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)
pip install .
mkdir x
cd x
cat << END > Expr.g4
grammar Expr;
file_ : expr EOF;
expr : expr ('*' | '/') expr | expr ('+' | '-') expr | '(' expr ')' | ('+' | '-')* atom ;
atom : INT;
INT : [0..9]+;
WS : [ \r\n\t] + -> channel(HIDDEN) ;
END
java --version
# openjdk 11.0.13 2021-10-19
# OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
# OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
java -jar /mnt/c/Users/Kenne/Downloads/antlr-4.9.3-complete.jar Expr.g4
# (Antlr4 works.)
grammarinator-process Expr.g4 -o .
# (grammarinator-process works.)
ls -l ExprGenerator.py
# -rwxrwxrwx 1 ken ken 3808 Feb 15 18:29 ExprGenerator.py*
grammarinator-generate
# grammarinator-generate: error: the following arguments are required: NAME
grammarinator-generate ExprGenerator
# Traceback (most recent call last):
# File "/home/ken/.local/bin/grammarinator-generate", line 8, in <module>
# sys.exit(execute())
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 295, in execute
# with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
# self.generator_cls = import_object(generator) if generator else None
# File "/home/ken/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
# return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
# File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
# return _bootstrap._gcd_import(name[level:], package, level)
# File "<frozen importlib._bootstrap>", line 1011, in _gcd_import
# File "<frozen importlib._bootstrap>", line 950, in _sanity_check
# ValueError: Empty module name
grammarinator-generate ExprGenerator.ExprGenerator
# Traceback (most recent call last):
# File "/home/ken/.local/bin/grammarinator-generate", line 8, in <module>
# sys.exit(execute())
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 295, in execute
# with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
# File "/home/ken/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
# self.generator_cls = import_object(generator) if generator else None
# File "/home/ken/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
# return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
# File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
# return _bootstrap._gcd_import(name[level:], package, level)
# File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
# File "<frozen importlib._bootstrap>", line 991, in _find_and_load
# File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
# ModuleNotFoundError: No module named 'ExprGenerator'
I just want to see what grammarinator-generate outputs. The instructions are inadequate.
I installed the latest grammarinator from master and got this error when trying to generate JSON (I separated some rules from the original JSON grammar file into a lexer and a parser in order to generate output successfully):
cityoflight:json cityoflight$ grammarinator-process JSONLexer.g4 JSONParser.g4 --no-actions -o output
-sh: grammarinator-process: command not found
cityoflight:json cityoflight$ source ~/.bash_profile
cityoflight:json cityoflight$ grammarinator-process JSONLexer.g4 JSONParser.g4 --no-actions -o output
Traceback (most recent call last):
File "/Users/cityoflight/Library/Python/3.9/bin/grammarinator-process", line 5, in <module>
from grammarinator.process import execute
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/__init__.py", line 11, in <module>
from .process import FuzzerFactory
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/process.py", line 28, in <module>
from .parser import ANTLRv4Lexer, ANTLRv4Parser
File "/Users/cityoflight/Library/Python/3.9/lib/python/site-packages/grammarinator/parser/__init__.py", line 8, in <module>
from .ANTLRv4Lexer import ANTLRv4Lexer
ModuleNotFoundError: No module named 'grammarinator.parser.ANTLRv4Lexer'
But when I uninstall the latest one and use the old one via pip3 install grammarinator, I can generate JSON output successfully.
Hi,
The latest release on pip (19.3) is currently out of date with master and is missing a few features.
Also, this project looks really cool!
EDIT: The documentation explicitly says python3; my mistake was to run a plain python setup.py install on my guest OS without realizing it defaults to python2, which is why I had syntax errors. You can always add a shebang #!/usr/bin/env python3 to avoid this, but in this case it was my mistake.
Thanks for your projects and keep up the good work.
Grammarinator generates incorrect code with some grammars, in particular when a non-terminal appears in the definition of a token. When generating the code with grammarinator-process, no error is thrown; the error only appears when generating inputs with grammarinator-generate.
The bug can be reproduced with the following grammar:
json
: token
;
token
: NUMBER ;
NUMBER
: '-'? INT ('.' [0-9] +)? token?
;
fragment INT
: '0' | [1-9] [0-9]*
;
After running grammarinator-process and grammarinator-generate, the error message thrown is:
Traceback (most recent call last):
File "/home/bachir/.local/bin/grammarinator-generate", line 8, in <module>
sys.exit(execute())
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 281, in execute
with Generator(unlexer_path=args.unlexer, unparser_path=args.unparser, rule=args.rule, out_format=args.out,
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 71, in __init__
self.unlexer_cls = import_entity('.'.join([unlexer, unlexer]))
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 59, in import_entity
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 844, in exec_module
File "<frozen importlib._bootstrap_external>", line 981, in get_code
File "<frozen importlib._bootstrap_external>", line 911, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/bachir/grammar-mutation/json/grammarinator-bugs/13/generate/JSONUnlexer.py", line 42
return current
^
IndentationError: expected an indented block
When looking at the file JSONUnlexer.py
, we see the following invalid code:
if self.unlexer.max_depth >= 0:
for _ in self.zero_or_one():
return current
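The loop body is simply missing, which Python rejects at compile time; the same error can be reproduced without grammarinator:

```python
# The generated for-loop has no indented body, so compiling the source
# raises IndentationError, exactly as in the traceback above.
# (compile only parses; zero_or_one need not be defined.)
src = "for _ in zero_or_one():\nreturn current\n"
try:
    compile(src, '<generated>', 'exec')
    ok = False
except IndentationError:
    ok = True
```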
While, as described in #40, grammarinator-process
failed for the SQLite and MySQL grammars, it processed the grammars of PostgreSQL without any issues.
However, executing grammarinator-generate PostgreSQLGenerator.PostgreSQLGenerator -r root -d 5 -o test%d.sql -n 100 --sys-path . in the directory of PostgreSQLGenerator.py resulted in errors.
First, I found that grammarinator included a multi-line comment starting with /* from the grammar, which is not valid in Python and caused it to fail with an error:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./PostgreSQLGenerator.py", line 22
/* This field stores the tags which are used to detect the end of a dollar-quoted string literal.
^
After fixing this in the generated code, a follow-up failure with an invalid indent appeared:
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./PostgreSQLGenerator.py", line 15292
ParseRoutineBody(_localctx);
^
By reducing the indent, the failure disappeared, but the next failure appeared which I did not investigate:
File "./PostgreSQLGenerator.py", line 11, in <module>
from PostgreSQLParserBase import PostgreSQLParserBase
ModuleNotFoundError: No module named 'PostgreSQLParserBase'
I was trying to generate JSON files using the --population option and the ANTLR JSON grammar. The grt files were prepared without problems, but I get an error when I run the following command:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/test_%d.json -d 20 -n 1 --population ../seeds/grts --no-generate --sys-path .
The error output:
Test generation failed.
Traceback (most recent call last):
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 119, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 197, in recombine
options = self.default_selector(node for rule_name in common_types for node in tree_1.node_dict[rule_name])
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 213, in default_selector
return [node for node in iterable if node.name is not None and node.parent is not None and node.name != 'EOF' and node.level + min_depth(node) < self.max_depth]
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 213, in <listcomp>
return [node for node in iterable if node.name is not None and node.parent is not None and node.name != 'EOF' and node.level + min_depth(node) < self.max_depth]
File "/home/bachir/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 211, in min_depth
return getattr(getattr(self.generator_cls, node.name), 'min_depth', 0)
AttributeError: type object 'JSONGenerator' has no attribute '<INVALID>'
The error does not occur if the --population option is omitted. In addition, the seed inputs were successfully parsed into grt files. The error occurs with any provided grt file.
Any idea what could be the cause of this?
Hi, thanks for your awesome tool!
I am wondering how grammarinator deals with the C/C++ ANTLR v4 grammar. Specifically, CPP14.g4 is not separated into lexer and parser files. Can grammarinator also work in this situation? How can I get test cases from a single CPP14.g4 file?
Any suggestions are welcome, thank you very much!
I am trying to use the --random-seed option in order to generate the same input when given the same seed number. However, it's not working: the generated inputs differ.
I have experimented with multiple grammars, including the json antlr grammar. The following is the command I am using:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/test_1.json -n 1 --random-seed 22 --sys-path .
The versions I tried:
grammarinator-generate 19.3.post125+geecdcb7
grammarinator-generate 19.3.post136+g350ce45
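For reference, this is the behaviour one would expect --random-seed to provide; Python's own PRNG is fully reproducible when seeded:

```python
import random

# Seeding with the same value must reproduce the same random sequence.
random.seed(22)
first = [random.random() for _ in range(5)]
random.seed(22)
second = [random.random() for _ in range(5)]
assert first == second
```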
Hey,
I'm writing a converter from ANTLR to another format. To test this converter I used Grammarinator and a parser generator to fuzz my conversion and verify that it can parse every word.
I have encountered the following problem:
When using the NOT feature of ANTLR, Grammarinator generates invalid words.
Consider the following ANTLR grammar not.g4
grammar not;
start: '>' No_comma '<';
No_comma: ~',';
Now all possible words should come out like >a< or >6<, but the word >,< is not valid because we don't allow a comma with the NOT feature.
I tested different variants of the No_comma rule to match the ANTLR documentation, both as lexer and parser rules; in every run Grammarinator still outputs invalid words.
Finding this error took me a lot of time because I first looked for the error in my converter. Even if it is not fixed, it would be very nice to have it confirmed so that I can be sure.
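For comparison, a correct treatment of ~',' has to exclude the negated character when sampling. A minimal sketch of how such a set could be sampled (rejection over printable ASCII, purely illustrative and not grammarinator's implementation):

```python
import random

# Sample a character matching ~',' by rejecting the excluded codepoint.
def sample_not_comma():
    while True:
        ch = chr(random.randrange(0x20, 0x7f))  # printable ASCII for illustration
        if ch != ',':
            return ch
```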
Used versions:
Grammarinator 19.3
antlr-4.9.3-complete.jar
Python 3.7
Thanks a lot,
38b394ce01
It is not possible to use grammars that contain Python keywords as rule names.
Consider the following ANTLR grammar:
grammar keyword;
start: False;
False: 'anything';
This will produce keywordGenerator.py with the following function:
@depthcontrol
def False(self, parent=None):
    current = UnlexerRule(name='False', parent=parent)
    self._enter_rule(current)
    UnlexerRule(src='anything', parent=current)
    self._exit_rule(current)
    return current
False.min_depth = 0
Because False is a reserved keyword in Python, this is not valid Python code.
The bug occurs with every keyword.
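A common way out (a sketch of a possible fix, not what grammarinator currently does) is to detect colliding rule names with the standard keyword module and mangle them when emitting method names:

```python
import keyword

# Append an underscore to rule names that collide with Python keywords,
# so the generated method definitions stay valid Python.
def safe_method_name(rule: str) -> str:
    return rule + '_' if keyword.iskeyword(rule) else rule
```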
Hi @renatahodovan!
First of all, thank you very much for sharing this incredible tool with us! It's really helpful!
We realized that the tool does not support specifying the import directory for grammars whose imported files are located in a different directory. To give you some context, our project structure looks like this:
grammars
base
BaseParser.g4
BaseLexer.g4
java
JavaParser.g4
JavaLexer.g4
...
As our grammars use semantic predicates, we've split them to keep the target-language code out of the base grammar. This approach allows us to reuse the grammar for generating parsers for different target languages without duplicating any rule. So, is it possible to introduce a new argument to specify where the generator should look when importing files, exactly as the ANTLR command line tool does?
I am trying to test grammars-v4/verilog/verilog/ using Grammarinator, but I'm having problems parsing some generated output. When I look at the output from Trees.print(), the tree doesn't seem to contain all the tokens, or the generated text sometimes contains tokens that aren't in the printed tree.
Here is the code that I am executing:
git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4
git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
cd verilog/verilog
# Already cloned and build grammarinator from sources.
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
# Already built a standardized Antlr4 parser driver for the grammar.
for i in tests/test_*; do echo $i; ./Generated/bin/Debug/net5.0/Test.exe -file $i; status=$?; if [[ $status != 0 ]]; then break; fi; done
This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.
I've assumed that Grammarinator would construct a valid CST ("unparser" tree) and output that. While most tests parse, some do not, and failures only appear when -d 15 is specified. I've included the --no-mutate and --no-recombine options so that the tree is output unmodified.
To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py after this line with this code:
print("Index = ")
print(index)
tree.print()
I now rerun the grammarinator-generate command, save the human-readable parse trees, and rerun the parser.
Selecting a test that fails, I've noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized ANTLR parser.
For example,
Output from tree.print():
...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
COMMA
SIMPLE_IDENTIFIER
...
Tokens recognized by parser:
...
VERTICAL_BAR
DOLLAR_RANDOM
COMMA
SIMPLE_IDENTIFIER
...
(Note, only one COMMA.)
Relevant sequence in generated file:
| $random , J
(Note, only one COMMA.)
I have noticed similar token differences at other times. It seems that Grammarinator includes some tokens in the CST that are not being output.
Incidentally, I tried to just save the trees using --keep-trees, but there is no tool to print out the trees after reading them back. I tried something like this, but it did not work:
from pydoc import importfile
module = importfile('/full/path/to/trees.py')
module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))
Heya, so I was testing out the JSON grammar fuzzing and noticed a lot of duplication in the output testcases. It may have been something I was doing wrong, but I was following the instructions in the README.md.
Testcase Generation I was using (standard Unlexer/Unparser):
grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
The above would finish incredibly fast, at ~37 seconds for the 1 million testcases, which was awesome. But when testing the uniqueness of a smaller sample via "for i in `ls`; do md5sum $i >> hashes.txt; done", I got the following:
~grammarinator/json_fuzz/# wc -l hashes.txt
133614 hashes.txt
~/grammarinator/json_fuzz/# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
29594
Anyway, I wrote a patch for getting grammarinator-generate to produce unique testcases, which can be found below. It's just a hack and could probably be done better to reduce the runtime cost. As it stands, the runtime is significantly increased, but the testcases seem to be unique:
time grammarinator-generate -l JSONUnlexer.py -p JSONUnparser.py -d 10 -n 1000000 -o json_fuzzer_test2
real 41m58.709s
user 5m4.388s
sys 69m52.101s
/json_fuzzer_test2# cat hashes.txt | wc -l
76184
/json_fuzzer_test2# cat hashes.txt | cut -d " " -f 1 | sort -u | wc -l
76184
18,19d17
< import hashlib
<
21c19
< from multiprocessing import Pool, Manager, Lock
---
> from multiprocessing import Pool
56,59c54
< cleanup=True, encoding='utf-8', shared_dict={}, shared_lock = None):
<
< self.shared_dict = shared_dict
< self.shared_lock = shared_lock
---
> cleanup=True, encoding='utf-8'):
147a143,144
> with codecs.open(test_fn, 'w', self.encoding) as f:
> f.write(str(Generator.transform(tree.root, self.test_transformers)))
149,163c146
< output = str(Generator.transform(tree.root, self.test_transformers))
< output_hash = hashlib.md5(output.encode('utf-8')).digest()
<
< try:
< with self.shared_lock:
< _ = self.shared_dict[output_hash]
< return self.create_new_test()
< except KeyError:
< with self.shared_lock:
< self.shared_dict[output_hash] = 1
<
< with codecs.open(test_fn, 'w', self.encoding) as f:
< f.write(output)
<
< return test_fn, tree_fn
---
> return test_fn, tree_fn
302,305d284
< sync_manager = Manager()
< shared_dict_ = sync_manager.dict()
< lock = sync_manager.Lock()
<
310c289
< cleanup=False, encoding=args.encoding, shared_dict=shared_dict_, shared_lock = lock) as generator:
---
> cleanup=False, encoding=args.encoding) as generator:
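Stripped of the multiprocessing plumbing, the core of the patch is a shared set of content hashes consulted before writing a test. A minimal single-process sketch of that idea:

```python
import hashlib

seen = set()

# Return True (and remember the hash) only for outputs not seen before;
# a caller would regenerate whenever this returns False.
def is_new(output: str) -> bool:
    digest = hashlib.md5(output.encode('utf-8')).digest()
    if digest in seen:
        return False
    seen.add(digest)
    return True
```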
Using the current HEAD of master and running the command:
grammarinator-process --no-actions Cypher.g4 --pep8 --encoding utf-8
After a few seconds, the following traceback is generated:
Traceback (most recent call last):
File "/home/tower-linux/.local/bin/grammarinator-process", line 11, in <module>
load_entry_point('grammarinator==19.3+53.gbfde275', 'console_scripts', 'grammarinator-process')()
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 748, in execute
FuzzerFactory(args.language, args.out, args.antlr).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 656, in generate_fuzzer
graph = build_graph(self.antlr_parser_cls, actions, lexer_root, parser_root)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 600, in build_graph
build_rules(root)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 582, in build_rules
build_rule(*rule_args)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 527, in build_rule
build_expr(node, rule.id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 522, in build_expr
build_expr(child, parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 398, in build_expr
build_expr(children[0], parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 426, in build_expr
build_expr(child, parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 445, in build_expr
build_expr(node.children[0], parent_id)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 497, in build_expr
ranges = lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 362, in lexer_charset_interval
codepoint, offset = lexer_charset_char(s, offset)
File "/home/tower-linux/.local/lib/python3.8/site-packages/grammarinator-19.3+53.gbfde275-py3.8.egg/grammarinator/process.py", line 330, in lexer_charset_char
raise ValueError('Unicode properties (\\p{...}) are not supported')
ValueError: Unicode properties (\p{...}) are not supported
Cypher.g4 can be found here - https://gist.github.com/jeffreylovitz/25f6ed569e3fc474e1d360fff9407446
ANTLR 4 supports extended Unicode escapes of the form \u{12345}; Python 3, however, wants long escapes in the form \U00012345. This results in the following exception:
Traceback (most recent call last):
File "/home/phasip/.local/bin/grammarinator-process", line 11, in <module>
sys.exit(execute())
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 709, in execute
FuzzerFactory(args.out, args.antlr).generate_fuzzer(args.grammars, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 629, in generate_fuzzer
for name, src in fuzzer_generator.generate(lexer_root, parser_root):
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 268, in generate
self.generate_grammar(root)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 368, in generate_grammar
self.unlexer_body += self.generate_single(rule, None)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 393, in generate_single
rule_code += self.generate_single(rule_block, rule_name)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 578, in generate_single
return ''.join([self.generate_single(child, parent_id) for child in node.children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 578, in <listcomp>
return ''.join([self.generate_single(child, parent_id) for child in node.children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 425, in generate_single
return self.generate_single(children[0], parent_id)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 475, in generate_single
return ''.join([self.generate_single(child, parent_id) for child in children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 475, in <listcomp>
return ''.join([self.generate_single(child, parent_id) for child in children])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 497, in generate_single
return self.generate_single(node.children[0], parent_id)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 553, in generate_single
ranges = self.lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/process.py", line 235, in lexer_charset_interval
element = bytes(element, 'utf-8').decode('unicode_escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 144-145: truncated \uXXXX escape
An ugly hack to fix this is to search-and-replace with a regex, but that would fail on cases with an even number of backslashes before the u, e.g. "\\u{10000}".
Sources:
https://github.com/antlr/antlr4/blob/master/doc/unicode.md
https://docs.python.org/3/library/codecs.html
Edit: As this seems very rare, a very simple solution is to catch the UnicodeDecodeError, give the user a description of this problem, and let them change their .g4 files manually.
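For anyone who prefers to pre-process the grammar files instead, a regex that only rewrites unescaped \u{...} escapes (skipping a \u whose backslash is itself escaped) could look like the sketch below; convert_antlr_unicode is a hypothetical helper, not part of grammarinator:

```python
import re

def convert_antlr_unicode(s):
    # Convert ANTLR-style \u{10000} escapes to Python-style \U00010000.
    # Group 1 soaks up any preceding escaped-backslash pairs, and the
    # lookbehind guarantees we start at an even-backslash boundary, so an
    # escaped backslash followed by a literal "u{...}" is left alone.
    def repl(m):
        return m.group(1) + '\\U' + format(int(m.group(2), 16), '08X')
    return re.sub(r'((?<!\\)(?:\\\\)*)\\u\{([0-9a-fA-F]+)\}', repl, s)
```

This keeps charset ranges intact, e.g. `[\u{10000}-\u{10010}]` becomes `[\U00010000-\U00010010]`.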
Is there a way to make sure that a rule gets covered at least once during the generation process?
For example, I want integers to be included in all of my generated JSON inputs. It would be nice if I could annotate the grammar so that the corresponding rule is always exercised by Grammarinator.
git clone https://github.com/renatahodovan/grammarinator (at commit f85b80ccd)
cd grammarinator && pip3 install .
mkdir lua-examples lua-fuzzer
curl -O https://raw.githubusercontent.com/antlr/grammars-v4/master/lua/Lua.g4
grammarinator-process Lua.g4 -o lua-fuzzer/ --pep8 -v
grammarinator-generate LuaGenerator.LuaGenerator -r chunk -d 20 -o lua-examples/test_%d.lua -n 100 -s grammarinator.runtime.simple_space_serializer --sys-path lua-fuzzer/
Syntax error in the generated file:
Traceback (most recent call last):
File "/home/sergeyb/.local/bin/grammarinator-generate", line 8, in <module>
sys.exit(execute())
File "/home/sergeyb/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 294, in execute
with Generator(generator=args.generator, rule=args.rule, out_format=args.out,
File "/home/sergeyb/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 72, in __init__
self.generator_cls = import_object(generator) if generator else None
File "/home/sergeyb/.local/lib/python3.8/site-packages/inators/imp.py", line 24, in import_object
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 844, in exec_module
File "<frozen importlib._bootstrap_external>", line 981, in get_code
File "<frozen importlib._bootstrap_external>", line 911, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/sergeyb/sources/MRG/tarantool/grammarinator/lua-fuzzer/LuaGenerator.py", line 988
elif choice == 1:
^
SyntaxError: invalid syntax
It seems the indentation is wrong; the patch below fixes the problem:
--- lua-fuzzer/LuaGenerator.py.orig 2022-07-04 15:52:37.284564569 +0300
+++ lua-fuzzer/LuaGenerator.py 2022-07-04 15:52:56.883836234 +0300
@@ -985,7 +985,8 @@
choice = self._model.choice(current, 0, [0 if [
0, 0, 0, 0][i] > self._max_depth else w for i, w in enumerate([1, 1, 1, 1])])
if choice == 0:
- elif choice == 1:
+ pass
+ elif choice == 1:
UnlexerRule(src='[', parent=current)
if self._max_depth >= 0:
for _ in self._model.quantify(current, 0, min=0, max=inf):
Awesome tool, really useful. Thanks!
I read the README and accompanying paper but failed to realise that random outputs may be chosen which would be rejected by a corresponding ANTLR-generated lexer. The unlexer doesn't capture the rules implied by the rule ordering in the ANTLR lexer file.
Here's a simple example:
parser grammar ExampleParser;
options {
tokenVocab = ExampleLexer;
}
session: command ARG EOF;
command: A | B;
lexer grammar ExampleLexer;
A: 'a';
B: 'b';
ARG: [a-z];
WS: [ \t\u000C\r\n]+ -> channel(HIDDEN);
Inputs such as "a a" and "a b" would not be accepted by the ANTLR-generated lexer. The matching rules are such that the first rule (reading the lexer file from start to end) that matches is selected, so an ARG can be any letter in [a-z] as long as it doesn't match A or B, i.e. [c-z]. ExampleGenerator has the following code, showing that it will generate an ARG in the full range [a-z]:
@depthcontrol
def ARG(self, parent=None):
current = UnlexerRule(name='ARG', parent=parent)
self.enter_rule(current)
UnlexerRule(src=self.model.charset(current, 0, self._charsets[1]), parent=current)
self.exit_rule(current)
return current
ARG.min_depth = 0
...
_charsets = {
0: list(chain.from_iterable([range(32, 127)])),
1: list(chain.from_iterable([range(97, 123)])),
2: list(chain.from_iterable([range(9, 10), range(10, 11), range(12, 13), range(13, 14), range(32, 33)])),
}
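The missing step can be sketched as a set difference: ARG should draw from [a-z] minus the characters already claimed by the earlier, higher-priority rules A and B (exclude_chars is an illustrative helper, not grammarinator code):

```python
def exclude_chars(charset, reserved):
    # Remove codepoints claimed by earlier (higher-priority) lexer rules.
    return sorted(set(charset) - set(reserved))

# [a-z] minus {'a', 'b'} leaves [c-z], the set ANTLR's lexer would
# actually accept for ARG.
arg_charset = exclude_chars('abcdefghijklmnopqrstuvwxyz', 'ab')
```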
In a more realistic setting, fuzzing languages like Lua, where variable names cannot be keywords like "or", "and", etc. (and this is captured in the order of rules in the lexer), will require a few tweaks to avoid wasting time fuzzing uninteresting parts of the language runtime. I opted to override unlexer methods so that names are chosen from a pool that won't collide with reserved words.
The README and paper cover fixing the random outputs to meet semantic requirements really well. I only noticed the above when I saw coverage in the lexer's error paths, so maybe it's subtle enough to be worth documenting? Could be that I'm being dense: happy either way.
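The override mentioned above boils down to something like this sketch; the keyword list is abbreviated and the name pool is arbitrary, both hypothetical:

```python
import random

LUA_KEYWORDS = {'and', 'or', 'not', 'if', 'then', 'else', 'end',
                'function', 'local', 'nil', 'true', 'false'}
NAME_POOL = [n for n in ('foo', 'bar', 'baz', 'qux', 'v1', 'v2')
             if n not in LUA_KEYWORDS]

def safe_name(rng=random):
    # What an overridden NAME unlexer method could return: an identifier
    # guaranteed not to collide with a reserved word.
    return rng.choice(NAME_POOL)
```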
The generate function often crashes due to exceeding Python's recursion depth. A way to control recursion depth with grammarinator-generate would be a useful feature. Are there any options for controlling it? Thanks
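One possible stopgap, assuming the crash is CPython's RecursionError rather than grammarinator's own depth control, is to raise the interpreter's recursion limit before driving generation from Python; the -d/--max-depth flag bounds the tree depth separately:

```python
import sys

# CPython's default limit is usually 1000; deeply nested grammars can
# exceed it even when --max-depth is modest, because rule methods nest
# on the interpreter stack.
sys.setrecursionlimit(20000)
```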
I tried to set it up using the ANTLR grammar, but grammarinator-process generates references to the JavaScriptBaseLexer class:
Any help would be great:)
grammarinator-process ../grammars-v4/javascript/javascript/JavaScriptLexer.g4 ../grammars-v4/javascript/javascript/JavaScriptParser.g4 -o testout
grammarinator-generate -l testout/JavaScriptUnlexer.py -p testout/JavaScriptUnparser.py -r htmlDocument -o examples/tests/test_%d.js -n 100 -d 20
Traceback (most recent call last):
File "/home/detlef/tmp/fuzz/venv/bin/grammarinator-generate", line 11, in <module>
load_entry_point('grammarinator==19.3+15.g6f43afe.d20200213', 'console_scripts', 'grammarinator-generate')()
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 293, in execute
cleanup=False, encoding=args.encoding) as generator:
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 69, in __init__
self.unlexer_cls = import_entity('.'.join([unlexer, unlexer]))
File "/home/detlef/tmp/fuzz/venv/lib/python3.5/site-packages/grammarinator-19.3+15.g6f43afe.d20200213-py3.5.egg/grammarinator/generate.py", line 57, in import_entity
return getattr(importlib.import_module('.'.join(steps[0:-1])), steps[-1])
File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "testout/JavaScriptUnlexer.py", line 10, in <module>
from JavaScriptBaseLexer import JavaScriptBaseLexer
ImportError: No module named 'JavaScriptBaseLexer'
We're facing the following error when generating our grammar:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/usr/local/lib/python3.6/site-packages/grammarinator/generate.py", line 21, in generate
root = getattr(parser_cls(lexer_cls()), rule)()
File "output/BaseCclUnparser.py", line 17, in domain
current += self.lexer.WHEN()
AttributeError: 'TestUnlexer' object has no attribute 'WHEN'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-generate", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python3.6/site-packages/grammarinator/generate.py", line 74, in execute
pool.starmap(generate, [(lexer_cls, parser_cls, args.rule, transformers, args.out % i) for i in range(args.n)])
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 268, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
AttributeError: 'TestUnlexer' object has no attribute 'WHEN'
Parser:
parser grammar TestParser;
options {
tokenVocab=TestLexer;
}
domain: WHEN ;
Lexer:
lexer grammar TestLexer;
WHEN: W H E N;
// Letters
A: [Aa];
B: [Bb];
C: [Cc];
D: [Dd];
E: [Ee];
F: [Ff];
G: [Gg];
H: [Hh];
I: [Ii];
J: [Jj];
K: [Kk];
L: [Ll];
M: [Mm];
N: [Nn];
O: [Oo];
P: [Pp];
Q: [Qq];
R: [Rr];
S: [Ss];
T: [Tt];
U: [Uu];
V: [Vv];
W: [Ww];
X: [Xx];
Y: [Yy];
Z: [Zz];
Our grammar is case insensitive, hence most tests look like the following:
aBc mAtcHes CdA
AbC mAtcHes cDa
However, case isn't relevant for us, so we've created a transformer that converts all UnlexerRules to lowercase. The problem is that now more than 50% of the tests are duplicates.
Do you have any idea how to handle this?
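One option is to deduplicate after generation rather than fight the generator: hash the lowercased text and drop repeats (dedup is an illustrative helper, not part of grammarinator):

```python
import hashlib

def dedup(tests):
    # Keep only the first test per case-insensitive content hash.
    seen, unique = set(), []
    for t in tests:
        h = hashlib.sha1(t.lower().encode('utf-8')).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(t)
    return unique
```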
I'm following the instructions in the README (grammarinator-process PhpLexer.g4 PhpParser.g4 -o out).
Download PhpLexer.g4 and PhpParser.g4 from grammars-v4, then
$ grammarinator-process PhpLexer.g4 PhpParser.g4 -o out
$ tree out
├── PhpUnlexer.py
└── PhpUnparser.py
I've also downloaded PhpLexerBase.py from grammars-v4 and placed it in out/.
Now when I try to generate:
$ grammarinator-generate -p out/PhpUnparser.py -l out/PhpUnlexer.py -o generated_%d -j 1
Traceback (most recent call last):
File "/home/user/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 129, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/user/.local/lib/python3.8/site-packages/grammarinator/generate.py", line 167, in generate
unlexer = self.unlexer_cls(**dict(self.unlexer_kwargs, max_depth=max_depth))
File "/home/user/userspace/php-gramm/out/PhpUnlexer.py", line 47, in __init__
super(PhpUnlexer, self).__init__()
TypeError: __init__() missing 1 required positional argument: 'input'
I tried to fix this manually by figuring out the hierarchy of the grammars-v4 PhpLexerBase, which extends the Lexer class, but that leads to other undefined fields in PhpUnlexer.py. It seems this integration is not working, or I'm missing something.
Please assist
Sabr
I would like to know how to invoke grammarinator-generate from a Python script instead of the CLI.
Is there any documentation or are there any examples, please?
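Lacking a documented Python API, the most version-proof approach may be to shell out to the CLI from Python; build_cmd and run_generate below are hypothetical wrappers, and the flags mirror the invocations shown elsewhere in these issues:

```python
import subprocess

def build_cmd(generator, rule, out_pattern, n=10, depth=20):
    # Assemble the same argv a manual CLI invocation would use.
    return ['grammarinator-generate', generator, '-r', rule,
            '-o', out_pattern, '-n', str(n), '-d', str(depth)]

def run_generate(*args, **kwargs):
    # Raises CalledProcessError if generation fails.
    return subprocess.run(build_cmd(*args, **kwargs), check=True)
```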
I cannot process the JavaScript grammar because it returns an error. The grammarinator version I'm using is 19.3.post79+g820a01a.d20210726.
Before processing, I remove options { superClass=JavaScriptLexerBase; } from JavaScriptLexer.g4 and options { tokenVocab=JavaScriptLexer; superClass=JavaScriptParserBase; } from JavaScriptParser.g4.
Then I run the grammarinator-process JavaScriptLexer.g4 JavaScriptParser.g4 --no-actions -o output command:
javascript cityoflight77$ grammarinator-process JavaScriptLexer.g4 JavaScriptParser.g4 --no-actions -o output
Traceback (most recent call last):
File "/Users/cityoflight77/anaconda3/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 602, in build_graph
build_rules(root)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 584, in build_rules
build_rule(*rule_args)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 529, in build_rule
build_expr(node, rule.id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 411, in build_expr
build_expr(child, alternative_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 447, in build_expr
build_expr(node.children[0], parent_id)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 499, in build_expr
ranges = lexer_charset_interval(str(node.LEXER_CHAR_SET())[1:-1])
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 364, in lexer_charset_interval
codepoint, offset = lexer_charset_char(s, offset)
File "/Users/cityoflight77/anaconda3/lib/python3.6/site-packages/grammarinator/process.py", line 332, in lexer_charset_char
raise ValueError('Unicode properties (\\p{...}) are not supported')
ValueError: Unicode properties (\p{...}) are not supported
How do I generate a ? None of the tests have whitespace and the input often gives invalid characters.
I got the g4 for https://github.com/antlr/grammars-v4/blob/master/javascript/javascript/
grammarinator-process JavaScriptParser.g4 JavaScriptLexer.g4 -o out --no-actions
grammarinator-generate -p out/JavaScriptUnparser.py -l out/JavaScriptUnlexer.py -r program -d 30 -n 3000
MacOS Python3.9
What is a generator? Do I need to write a generator program?
How do I write a transformer, or change the grammar, to add spaces? (Looking at the closed issues, this seems to be the solution, but I have no idea what a transformer is.)
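For context, a transformer/serializer is just a Python callable applied to the generated tree before it is written out. A rough sketch of what a whitespace-inserting serializer does, with a minimal Node class standing in for grammarinator's rule objects:

```python
class Node:
    # Minimal stand-in for grammarinator's UnparserRule/UnlexerRule.
    def __init__(self, src=None, children=None):
        self.src = src
        self.children = children or []

def space_serializer(root):
    # Collect every leaf token and join them with single spaces, roughly
    # what grammarinator.runtime.simple_space_serializer provides.
    tokens = []
    def walk(node):
        if node.children:
            for child in node.children:
                walk(child)
        elif node.src is not None:
            tokens.append(node.src)
    walk(root)
    return ' '.join(tokens)
```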
After manually fixing issues due to #15 in the grammar at https://github.com/chungkwong/fooledit/tree/490c4bc0a4ba6ceec3ac0c4cd1947a54e397ef34/mode.xml/src/main/antlr4/cc/fooledit/editor/text/mode/xml, I receive this error:
Data: <function XMLUnparser.choice at 0x7f26101f06a8>, ([1, 1],), {}
Test generation failed.
Traceback (most recent call last):
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/generate.py", line 129, in create_new_test
tree = generator(self.rule, self.max_depth)
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/generate.py", line 168, in generate
tree = Tree(getattr(self.unparser_cls(unlexer) if rule[0].islower() else unlexer, rule)())
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
File "XMLUnparser.py", line 21, in document
current += self.element()
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
File "XMLUnparser.py", line 244, in element
choice = self.choice([0 if [2, 2][i] > self.unlexer.max_depth else w * self.unlexer.weights.get(('alt_147', i), 1) for i, w in enumerate([1, 1])])
File "/home/phasip/.local/lib/python3.6/site-packages/grammarinator/runtime/grammarinator.py", line 61, in controlled_fn
result = fn(obj, *args, **kwargs)
TypeError: choice() takes 1 positional argument but 2 were given
Note: the "Data" line comes from a print I added at grammarinator.py line 60: print("Data: %s, %s, %s" % (fn, args, kwargs), file=sys.stderr)
Content of XMLUnparser.py choice function:
@depthcontrol
def choice(self):
current = self.create_node(UnparserRule(name='choice'))
current += self.create_node(UnlexerRule(src='('))
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.cp()
if self.unlexer.max_depth >= 0:
for _ in self.one_or_more():
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.create_node(UnlexerRule(src='|'))
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.cp()
if self.unlexer.max_depth >= 3:
for _ in self.zero_or_one():
current += self.s()
current += self.create_node(UnlexerRule(src=')'))
return current
choice.min_depth = 2
Hi @renatahodovan!
I'm facing the following error when trying to run poetry build in the root dir of grammarinator:
==> Starting build()...
PyProjectException
[tool.poetry] section not found in /home/noptrix/blackarch/repos/blackarch/packages/grammarinator/src/grammarinator/pyproject.toml
at /usr/lib/python3.10/site-packages/poetry/core/pyproject/toml.py:56 in poetry_config
52│ def poetry_config(self): # type: () -> Optional[TOMLDocument]
53│ if self._poetry_config is None:
54│ self._poetry_config = self.data.get("tool", {}).get("poetry")
55│ if self._poetry_config is None:
→ 56│ raise PyProjectException(
57│ "[tool.poetry] section not found in {}".format(self._file)
58│ )
59│ return self._poetry_config
60│
==> ERROR: A failure occurred in build().
Aborting...
Any ideas? Here is the PKGBUILD used to build grammarinator under Arch Linux.
Versions of packages:
$ pip install --user grammarinator
Requirement already satisfied: grammarinator in /home/user/.local/lib/python3.8/site-packages (19.3)
Requirement already satisfied: antlerinator==4.7.1-1 in /home/user/.local/lib/python3.8/site-packages (from grammarinator) (4.7.1.post1)
Requirement already satisfied: autopep8 in /home/user/.local/lib/python3.8/site-packages (from grammarinator) (1.5.4)
Requirement already satisfied: antlr4-python3-runtime==4.7.1 in /home/user/.local/lib/python3.8/site-packages (from antlerinator==4.7.1-1->grammarinator) (4.7.1)
Requirement already satisfied: pycodestyle>=2.6.0 in /home/user/.local/lib/python3.8/site-packages (from autopep8->grammarinator) (2.6.0)
Requirement already satisfied: toml in /home/user/.local/lib/python3.8/site-packages (from autopep8->grammarinator) (0.10.1)
I have lexer rules:
STRING_LITERAL: QUOTE_SINGLE ( ~([\\']) | (BACKSLASH .) )* QUOTE_SINGLE;
BACKSLASH: '\\';
QUOTE_SINGLE: '\'';
And a parser rule (one of alternatives):
TRIM LPAREN (BOTH | LEADING | TRAILING) STRING_LITERAL FROM columnExpr RPAREN
I run generator like this:
grammarinator-generate -r queryList -o /tmp/sql_test_%d.sql -n 100 -c 0.3 -d 20 -p Unparser.py -l Unlexer.py --test-transformers SpaceTransformer.single_line_whitespace
And sometimes get the following output (partial):
tRIM ( bOTH ''' FrOM ( *
I don't understand why there are triple single-quotes here. Looks like a bug.
I just tried grammarinator on the SQLite and MySQL grammars from https://github.com/antlr/grammars-v4/tree/master/sql (commit 8dca3622acbea8fce8726c73364af232cb6eacce), but the latest version of grammarinator-process failed on both of them.
grammarinator-process SQLiteLexer.g4 SQLiteParser.g4 -o out/
Traceback (most recent call last):
File "/home/manuel/.local/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 604, in build_graph
graph.calc_min_depths()
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 236, in calc_min_depths
assert all(min_depths[node.id] < inf for node in self.vertices[ident].out_neighbours), '{ident} has an alternative that isn\'t reachable.'.format(ident=ident)
AssertionError: 739 has an alternative that isn't reachable.
grammarinator-process MySqlLexer.g4 MySqlParser.g4
Traceback (most recent call last):
File "/home/manuel/.local/bin/grammarinator-process", line 8, in <module>
sys.exit(execute())
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 743, in execute
FuzzerFactory(args.language, args.antlr, args.out).generate_fuzzer(args.grammar, options=options, encoding=args.encoding, lib_dir=args.lib, actions=args.actions, pep8=args.pep8)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 650, in generate_fuzzer
graph = build_graph(actions, lexer_root, parser_root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 602, in build_graph
build_rules(root)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 584, in build_rules
build_rule(*rule_args)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 529, in build_rule
build_expr(node, rule.id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 411, in build_expr
build_expr(child, alternative_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 415, in build_expr
build_expr(node.alternative(), parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 456, in build_expr
build_expr(node.children[0], quant_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 524, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 400, in build_expr
build_expr(children[0], parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 428, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 447, in build_expr
build_expr(node.children[0], parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 509, in build_expr
build_expr(child, parent_id)
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 513, in build_expr
graph.add_edge(frm=parent_id, to=str(node.TOKEN_REF()))
File "/home/manuel/.local/lib/python3.7/site-packages/grammarinator/process.py", line 215, in add_edge
assert to in self.vertices, '{to} not in vertices.'.format(to=to)
AssertionError: ADMIN not in vertices.
Is this expected?
In continuation of issue #5, we're still getting the following errors trying to generate tests for a grammar with recursive rules:
> grammarinator-process TestLexer.g4 TestParser.g4 -o test
> grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 1 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
> clean grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 5 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
> clean grammarinator-generate -l test/TestUnlexer.py -p test/TestUnparser.py -r domain -d 1 -n 10 -o output/tests/test_%d.ccl
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
domain cannot be generated within the given depth (min needed: 5).
It now seems the warning is repeated once for every requested test.
I got an error when running grammarinator-process. I'm using the latest grammarinator (19.3.post79+g820a01a.d20210726), installed with pip3 install .
I'm on macOS 10.14 with Python 3.9.6.
Traceback (most recent call last):
File "/usr/local/bin/grammarinator-process", line 5, in <module>
from grammarinator.process import execute
File "/usr/local/lib/python3.9/site-packages/grammarinator/__init__.py", line 11, in <module>
from .process import FuzzerFactory
File "/usr/local/lib/python3.9/site-packages/grammarinator/process.py", line 29, in <module>
from .parser import ANTLRv4Lexer, ANTLRv4Parser
File "/usr/local/lib/python3.9/site-packages/grammarinator/parser/__init__.py", line 8, in <module>
from .ANTLRv4Lexer import ANTLRv4Lexer
ModuleNotFoundError: No module named 'grammarinator.parser.ANTLRv4Lexer'
Is there a way to assign weights/probabilities to grammar alternatives?
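For reference, the generated code already multiplies per-alternative weights into the choice (the expression w * weights.get((alt_id, i), 1) is visible in other tracebacks in this thread). Conceptually, the selection looks like this sketch:

```python
import random

def weighted_choice(weights, rng=random):
    # Pick index i with probability weights[i] / sum(weights);
    # zero-weight alternatives are never chosen.
    total = sum(weights)
    r = rng.uniform(0, total)
    upto = 0.0
    for i, w in enumerate(weights):
        upto += w
        if w > 0 and r <= upto:
            return i
    return max(i for i, w in enumerate(weights) if w > 0)
```

So boosting an alternative's weight makes it proportionally more likely, and a weight of 0 disables it entirely.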
When executing grammarinator-process without the --antlr argument, grammarinator asks antlerinator to download the ANTLR jarfile to ~/.antlerinator/antlr-4.7.1-complete.jar. This fails on my machine with an SSLError and the following stacktrace:
Traceback (most recent call last):
File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/usr/lib/python3.10/http/client.py", line 1454, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "/usr/lib/python3.10/ssl.py", line 1062, in _create
self._sslobj = self._context._wrap_socket(
ssl.SSLError: Cannot create a client socket with a PROTOCOL_TLS_SERVER context (_ssl.c:801)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<redacted>/.venv/bin/grammarinator-process", line 33, in <module>
sys.exit(load_entry_point('grammarinator==19.3', 'console_scripts', 'grammarinator-process')())
File "<redacted>/.venv/lib/python3.10/site-packages/grammarinator/process.py", line 708, in execute
antlerinator.install(lazy=True)
File "<redacted>/.venv/lib/python3.10/site-packages/antlerinator/install.py", line 47, in install
with contextlib.closing(urlopen(tool_url, context=ssl_context)) as response:
File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error Cannot create a client socket with a PROTOCOL_TLS_SERVER context (_ssl.c:801)>
I am able to circumvent this issue by downloading the required ANTLR jarfile manually and pointing to it with the --antlr argument.
Have the SSL certificates perhaps expired?
Is grammarinator not compatible with my version of Python?
Is there any way to keep the spaces in the output? In our tests, it's generating invalid expressions.
We noticed a space transformer in the examples, but we could not understand how to use it in our case.
Will the program generated by this tool have undefined variables?
I'm using the grammar here and generated tests using the latest on master:
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator --sys-path . -d 30 -n 10 -r source_text --serializer grammarinator.runtime.simple_space_serializer
I'm not sure I understand why half of the files generated have zero character length.
When mutation and recombination modes are enabled, grammarinator-generate doesn't generate the same output consistently, despite setting --random-seed to a fixed number.
Here is the example command I am using:
grammarinator-generate JSONGenerator.JSONGenerator -r json -o tests/input.in -d 60 -n 1 --random-seed 4 --sys-path . -j=1 --population ../seeds/grts --no-generate
I used a JSON grammar to build JSONGenerator.
uname -a
Linux hn0-sparkd 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
sudo grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 -o examples/fuzzer/
Traceback (most recent call last):
  File "/usr/local/bin/grammarinator-process", line 11, in <module>
    load_entry_point('grammarinator==17.7.post0', 'console_scripts', 'grammarinator-process')()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 572, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2769, in load_entry_point
    return ep.load()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2422, in load
    return self.resolve()
  File "/home/sshuser/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2428, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/__init__.py", line 8, in <module>
    from . import runtime
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/runtime/__init__.py", line 8, in <module>
    from .grammarinator import depthcontrol, Grammarinator, multirange_diff, printable_ascii_ranges, printable_unicode_ranges
  File "/usr/local/lib/python2.7/dist-packages/grammarinator-17.7.post0-py2.7.egg/grammarinator/runtime/grammarinator.py", line 71
    def __init__(self, *, max_cnt=8000):
                        ^
SyntaxError: invalid syntax
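For what it's worth, the `SyntaxError` above points at the bare `*` in the argument list: keyword-only parameters are Python 3 syntax, so importing the package under Python 2.7 (visible in the traceback paths) fails before anything runs. Under Python 3 the same definition is valid; `DepthControl` below is a hypothetical class used only to illustrate the construct:

```python
# Keyword-only arguments (the bare '*') require Python 3;
# Python 2.7 raises SyntaxError at the '*', as in the traceback above.
class DepthControl:
    def __init__(self, *, max_cnt=8000):  # max_cnt must be passed by keyword
        self.max_cnt = max_cnt

dc = DepthControl(max_cnt=100)
print(dc.max_cnt)  # 100
```

So the fix is to run grammarinator under a Python 3 interpreter rather than the system Python 2.7.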
Hi Renata,
For the last few days I have been running grammarinator with a JS grammar and it works perfectly, thanks. Looking at the latest commits, I saw that there is an evolutionary setting to generate files from existing files. I tried to run it with the "--population" argument, but the tool didn't recognize it.
Is there any wiki, paper, or other documentation explaining how it works?
thanks
It's not clear from the documentation whether the grammar must be written in Python. Our grammar uses semantic predicates (those are supported, right?), and we're getting errors when generating the tests.