Comments (6)
Hi @kaby76
It's not a bell curve but a (1/x)^n curve (in this case, (1/2)^n), which is exactly what we expect from quantifiers by definition/implementation. Quantifiers are generated according to the following pseudocode:
source_text = UnparserRule(name='source_text')
while random_decision():
    source_text += UnparserRule(name='description')
This means that the probability of generating one description is 1/2, of two descriptions (1/2)^2, of three (1/2)^3, etc., i.e., (1/2)^n, which is what your plot shows as well.
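The decay is easy to check empirically. Below is a minimal simulation of the coin-flip loop above (the fair-coin probability of 1/2 and the function names are my own stand-ins, not grammarinator code):

```python
import random
from collections import Counter

def count_descriptions(rng):
    """Mimic the pseudocode: keep appending descriptions while a fair coin says 'continue'."""
    count = 0
    while rng.random() < 0.5:  # random_decision(): continue with probability 1/2
        count += 1
    return count

rng = random.Random(0)
freq = Counter(count_descriptions(rng) for _ in range(100_000))

# Frequencies roughly halve with every extra description: the (1/2)^n decay.
for n in range(4):
    print(n, freq[n] / 100_000)
```

Plotting `freq` reproduces the monotonically decreasing curve rather than a bell curve.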
from grammarinator.
@kaby76 I was just about to leave a comment pointing you to models, in case you wanted to tweak the default "let's flip a coin" approach. You can write your own decision model with the same API as DefaultModel. Every random decision of the generated fuzzer (e.g., how to choose an alternative from A | B, or how many times to iterate over *) actually happens there. And the default model can even be replaced from the command line using the -m or --model switch:
https://github.com/renatahodovan/grammarinator/blob/master/grammarinator/generate.py#L237-L238
As the documentation of models is incomplete (so to say), let me introduce quantify(self, node, idx, min, max). Whenever a quantifier is reached during test case generation, the model's quantify method is called in a for loop. Actually, quantify should be a generator: it should yield as many times as the loop is expected to iterate, and it is expected to yield between min and max times (inclusive). To help quantify make the decision, the current node (the one whose children are being generated) is passed as an argument; e.g., node.name gives the grammar rule corresponding to the node. Moreover, idx is also passed, which uniquely identifies the quantifier within the rule. (E.g., in S: A* B?;, * has index 0 and ? has index 1.)
I know that the above is a bit brief, but I hope it helps.
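To make the generator contract concrete, here is a minimal, self-contained sketch. The class name and the coin-flip policy are my own; a real model would subclass DefaultModel and be selected via the -m/--model switch:

```python
import random

class SourceTextModel:
    """Hypothetical model sketch (not grammarinator's real DefaultModel).

    quantify must yield once per loop iteration, between min and max
    times inclusive, as described in the comment above.
    """

    def quantify(self, node_name, idx, min, max):
        count = 0
        while count < min:          # always satisfy the lower bound first
            yield
            count += 1
        # then flip a coin for each extra iteration, up to the upper bound
        while count < max and random.random() < 0.5:
            yield
            count += 1

model = SourceTextModel()
iterations = sum(1 for _ in model.quantify('source_text', 0, 2, 5))
print(iterations)
```

Every run yields between 2 and 5 times, so a quantifier driven by this model always stays within its declared bounds.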
BTW, there is also a subclass of DefaultModel, called DispatchingModel. It simplifies tweaking the random decisions of selected rules by writing methods named quantify_<RULE>. E.g., in your example:
class VerilogModel(grammarinator.runtime.DispatchingModel):

    def quantify_source_text(self, node, idx, min, max):
        yield
        yield
        yield
(And this would create test cases that always contain exactly three descriptions. The rest of the quantifiers would still use the flip-the-coin approach.)
I'm facing the same issue: all the grammars I tested generated empty files, but I don't know whether that's intended or not.
Hi @kaby76 and @CityOfLight77
It's not a surprise if you look carefully at the grammar you generate test cases from. In the case of VerilogGenerator, the start rule used in the example is source_text. Its definition in the grammar is:
// START SYMBOL
source_text
    : description* EOF
    ;
This means that source_text is constructed from zero or more descriptions (due to the Kleene star quantifier * after description), i.e., empty files should be recognized by a Verilog parser.
Grammarinator does exactly the same in the opposite direction: before every generation, it rolls a die to decide whether to generate zero or more descriptions (i.e., whether to generate an empty file or not).
Although this random decision about zero-or-more quantifier expansion is quite useful deeper in the derivation tree to avoid infinite recursion, at the very beginning, around the start rule, it's worth manually replacing the * with + (Kleene plus, the "one or more" quantifier) to avoid empty output files.
I hope this helps!
@CityOfLight77 If it doesn't solve your problem with empty files, please share the grammar and I'll look into it.
Cheers,
Reni
For grammarinator-generate.exe VerilogGenerator.VerilogGenerator --sys-path . -d 10 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer, I then used Trash to get the number of children of the source_text rule (for i in tests/*; do trparse -t gen $i 2>/dev/null | trxgrep ' /source_text/*' | trtext -c ; done > o) and made a histogram of the child counts of source_text across the 100 generated tests. It seems the "sampling" for the LL-derivations follows a bell curve. Why is that?
Thanks. That explains quite a bit of what the generated code is doing. I can now follow what for _ in self._model.quantify(current, 0, min=0, max=inf) does.