Comments (6)
Hi @kaby76
It's not a bell curve but a (1/x)^n curve (in this case, (1/2)^n), which is exactly what we expect from quantifiers by definition/implementation. Quantifiers are generated according to the following pseudocode:
source_text = UnparserRule(name='source_text')
while random_decision():
    source_text += UnparserRule(name='description')
This means that the probability of generating one description is 1/2, of two descriptions (1/2)^2, of three (1/2)^3, etc., i.e., (1/2)^n, which is what your plot shows as well.
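The decay is easy to check empirically. Below is a minimal simulation of the coin-flip loop above (the fair-coin probability of 1/2 and the function names are my own stand-ins, not grammarinator code):

```python
import random
from collections import Counter

def count_descriptions(rng):
    """Mimic the pseudocode: keep appending descriptions while a fair coin says 'continue'."""
    count = 0
    while rng.random() < 0.5:  # random_decision(): continue with probability 1/2
        count += 1
    return count

rng = random.Random(0)
freq = Counter(count_descriptions(rng) for _ in range(100_000))

# Frequencies roughly halve with every extra description: the (1/2)^n decay.
for n in range(4):
    print(n, freq[n] / 100_000)
```

Plotting `freq` reproduces the monotonically decreasing curve rather than a bell curve.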
from grammarinator.
@kaby76 I was just about to leave a comment pointing you to models, in case you wanted to tweak the default "let's flip a coin" approach. You can write your own decision model with the same API as DefaultModel. Every random decision of the generated fuzzer (e.g., how to choose an alternative from A | B, or how many times to iterate over *) actually happens there. And the default model can even be replaced from the command line using the -m or --model switch:
https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/generate.py#L237-L238
As the documentation of models is incomplete (so to say), let me introduce quantify(self, node, idx, min, max). Whenever a quantifier is reached during test case generation, the model's quantify method is called in a for loop. Actually, quantify should be a generator: it should yield as many times as the loop is expected to iterate, and it is expected to yield between min and max times (inclusive). To help quantify make the decision, the current node (the one whose children are being generated) is passed as an argument; e.g., node.name gives the grammar rule corresponding to the node. Moreover, idx is also passed, which uniquely identifies the quantifier within the rule. (E.g., in S: A* B?;, * has index 0 and ? has index 1.)
I know that the above is a bit brief, but I hope it helps.
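To make the generator contract concrete, here is a minimal, self-contained sketch. The class name and the coin-flip policy are my own; a real model would subclass DefaultModel and be selected via the -m/--model switch:

```python
import random

class SourceTextModel:
    """Hypothetical model sketch (not grammarinator's real DefaultModel).

    quantify must yield once per loop iteration, between min and max
    times inclusive, as described in the comment above.
    """

    def quantify(self, node_name, idx, min, max):
        count = 0
        while count < min:          # always satisfy the lower bound first
            yield
            count += 1
        # then flip a coin for each extra iteration, up to the upper bound
        while count < max and random.random() < 0.5:
            yield
            count += 1

model = SourceTextModel()
iterations = sum(1 for _ in model.quantify('source_text', 0, 2, 5))
print(iterations)
```

Every run yields between 2 and 5 times, so a quantifier driven by this model always stays within its declared bounds.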
BTW, there is also a subclass of DefaultModel, called DispatchingModel. It simplifies tweaking the random decisions of selected rules by writing methods named quantify_<RULE>. E.g., in your example:
class VerilogModel(grammarinator.runtime.DispatchingModel):

    def quantify_source_text(self, node, idx, min, max):
        yield
        yield
        yield
(And this would create test cases that always contain exactly three descriptions. The rest of the quantifiers would still use the flip-the-coin approach.)
I'm facing the same issue: all the grammars I tested generated empty files, but I don't know whether that's intended or not.
Hi @kaby76 and @CityOfLight77
It's not a surprise if you look carefully at the grammar you generate test cases from. In the case of VerilogGenerator, the start rule used in the example is source_text. Its definition in the grammar is:
// START SYMBOL
source_text
    : description* EOF
    ;
This means that source_text is constructed from zero or more descriptions (due to the Kleene star quantifier * after description), i.e., empty files should be recognized by a Verilog parser.
Grammarinator does exactly the same in the opposite direction: before every generation, it rolls a die to decide whether to generate zero or more descriptions (i.e., whether to generate an empty file or not).
Although this random decision about zero-or-more quantifier expansion is quite useful deeper in the derivation tree to avoid infinite recursion, at the very beginning, around the start rule, it's worth manually replacing the * with + (Kleene plus, the "one or more" quantifier) to avoid empty output files.
I hope this helps!
@CityOfLight77 If it doesn't solve your problem with empty files, please share the grammar and I'll look into it.
Cheers,
Reni
For grammarinator-generate.exe VerilogGenerator.VerilogGenerator --sys-path . -d 10 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer, I then used Trash to get the number of children of the source_text rule (for i in tests/*; do trparse -t gen $i 2>/dev/null | trxgrep ' /source_text/*' | trtext -c ; done > o) and made a histogram of the child counts of source_text across the 100 generated tests. It seems the "sampling" for the LL-derivations follows a bell curve. Why is that?
Thanks. That explains quite a bit of what the generated code is doing. I can now follow what for _ in self._model.quantify(current, 0, min=0, max=inf) does.