Comments (4)

renatahodovan commented on September 26, 2024

Hi @Beliefuture !

Thanks for your interest in Grammarinator!

Empty outputs are generated by HTMLGenerator if all the quantified components of the starting htmlDocument rule decide to stop generation at the first iteration. Since there is a 0.5 chance of stopping or continuing the loop at every iteration, and since there are 6 quantified components in htmlDocument, empty output happens with a 0.5^6 chance (about 1.6%).
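The arithmetic above can be checked with a short sketch. It is a simplified model, not Grammarinator code: it assumes the six quantified components decide independently, each stopping before its first iteration with probability 0.5. The names used here (STOP_PROBABILITY, empty_output_probability, simulate_empty_rate) are made up for this illustration.

```python
import random

STOP_PROBABILITY = 0.5     # chance that a quantifier loop stops (from the comment above)
QUANTIFIED_COMPONENTS = 6  # quantified components in htmlDocument (from the comment above)


def empty_output_probability(components=QUANTIFIED_COMPONENTS, stop=STOP_PROBABILITY):
    """Probability that every quantified component stops before its first iteration."""
    return stop ** components


def simulate_empty_rate(trials=100_000, seed=0):
    """Estimate the same probability by sampling each component independently."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        # The output is empty only if all components stop immediately.
        if all(rng.random() < STOP_PROBABILITY for _ in range(QUANTIFIED_COMPONENTS)):
            empty += 1
    return empty / trials


if __name__ == "__main__":
    print(empty_output_probability())  # 0.015625
    print(simulate_empty_rate())       # an estimate close to the value above
```

So under this model roughly one generated file in 64 is expected to be empty.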

As for the invalid output... these outputs might look like invalid HTML (and some of them indeed are), but they fulfill all the requirements defined by the grammar. The grammar doesn't have any information about tag or attribute names or attribute values. It doesn't know anything about spaces between the tokens. It doesn't know the semantics of style, script or xml tags. Etc. This is simply because these grammars are parser grammars. They are responsible only for checking the syntax of an input, and all further checks are usually implemented manually. Similarly, if these grammars are used to generate output, then the additional information needs to be defined manually: either by editing the grammar itself with rule rewrites, custom predicates or actions (probably losing the possibility of using the grammar for parsing), or by implementing custom generator subclasses and/or models/listeners/serializers etc. HTMLCustomGenerator is a basic example of such a custom generator.
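The subclassing idea can be illustrated with a toy sketch. This is not the Grammarinator API and not the real HTMLCustomGenerator: ToyTagGenerator and ToyCustomTagGenerator are hypothetical stand-ins that only show the pattern of overriding one rule method to add knowledge the grammar lacks.

```python
import random


class ToyTagGenerator:
    """Hypothetical stand-in for a generator derived from a parser grammar."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def tag_name(self):
        # The grammar only says that a tag name is a run of name characters,
        # so any random lowercase string is syntactically acceptable.
        length = self.rng.randint(1, 8)
        return "".join(self.rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))

    def element(self):
        name = self.tag_name()
        return f"<{name}></{name}>"


class ToyCustomTagGenerator(ToyTagGenerator):
    """Subclass that supplies information the grammar does not contain."""

    KNOWN_TAGS = ("div", "p", "span", "table", "ul")  # illustrative whitelist

    def tag_name(self):
        # Override the rule method so that only meaningful tag names are produced.
        return self.rng.choice(self.KNOWN_TAGS)


if __name__ == "__main__":
    print(ToyTagGenerator(seed=1).element())        # some random tag name
    print(ToyCustomTagGenerator(seed=1).element())  # one of the known tags
```

The base class stays usable as generated, and all domain knowledge lives in the subclass, which is the trade-off described above compared with editing the grammar itself.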

Regarding the PostgreSQL issue, are you sure you commented out the superClass options in both the lexer and parser grammars and regenerated the generator? Another way to control the superclass of the produced generator is to override the superClass option from the CLI like this:

grammarinator-process -DsuperClass=Generator ...

Getting only empty output from PostgreSQL is weird. Although stmtmulti is completely quantified, it should produce empty output at most 50% of the time. Could you paste the command you used that resulted in only empty output?

from grammarinator.

Beliefuture commented on September 26, 2024

Hi @renatahodovan !

Thanks for your detailed explanation and sorry for the late response.

  1. For the empty output from HTMLGenerator, is there a way to customize the rules so that the quantified components do not stop generation at the first (or a later) iteration?

    Also, how can I control the complexity (i.e., the number of tokens) of the generated files (by specifying the value of the -d parameter?), since I have found that the generated files are mostly short, with only a few tokens.

  2. Since there are messy characters in the generated files (e.g., 𧞢䯸, 왘𥍾𤏖, 𗉊𬱦......) in the cases shown above, I wonder how this tool populates these values and how to restrict the set of values to make the generated files more reasonable?

  3. For the class issue of PostgreSQL, I have checked the files, and I am sorry that I forgot to comment out the superClass option in the PostgreSQLLexer.g4 file.

    For the empty-output issue of PostgreSQL, I generated ten cases again for testing and only three of them were non-empty (≈0.3). Based on your explanation that the default probability of empty output is 0.5, I don't think this is an issue. Does this tool currently use a random strategy to generate test cases? Could I set preferences to make it generate the specific clauses or expressions I want?

    Again, I would like to know how to specify the literal values of the generated SQL to make them more reasonable as test cases (since they look strange). I have listed a case below:

CREATE
OPERATOR CLASS RIGHT . U&"V" . OVERLAPS . HEADER . XMLFOREST DEFAULT FOR TYPE U&"X" % ROWTYPE USING ROWTYPE FAMILY OPEN AS STORAGE INTERVAL ( 9 ) [ 96 ] , STORAGE SETOF BIT VARYING ARRAY , OPERATOR 626 * ( COALESCE % TYPE , NONE ) FOR SEARCH , STORAGE :"" % ROWTYPE , STORAGE INTEGER ARRAY [ 7 ] , STORAGE SETOF TRANSLATE ARRAY [ 53 ]
  4. Another minor issue is that I encountered the following messages when I generated the PostgreSQLGenerator file. I think they might be attributable to the ANTLR source grammar file, but I am not sure whether they will prevent this tool from functioning properly.
The farthest rule from 'root' is 'a_expr_typecast' (25 steps). 
150 rule(s) unreachable from 'root': 'Dollar', 'DOT_DOT', 'OperatorEndingWithPlusMinus', 'OperatorCharacterNotAllowPlusMinusAtEnd', 'OperatorCharacterAllowPlusMinusAtEnd', 'WHILE', 'FOREACH', 'LOOP', 'InvalidQuotedIdentifier', 'InvalidUnterminatedQuotedIdentifier', 'UnterminatedUnicodeQuotedIdentifier', 'InvalidUnicodeQuotedIdentifier', 'InvalidUnterminatedUnicodeQuotedIdentifier', 'BeginEscapeStringConstant', 'InvalidBinaryStringConstant', 'InvalidUnterminatedBinaryStringConstant', 'InvalidHexadecimalStringConstant', 'InvalidUnterminatedHexadecimalStringConstant', 'NumericFail', 'Whitespace', 'Newline', 'LineComment', 'BlockComment', 'UnterminatedBlockComment', 'ErrorCharacter', 'UnterminatedEscapeStringConstant', 'InvalidEscapeStringConstant', 'InvalidUnterminatedEscapeStringConstant', 'InvalidEscapeStringText', 'AfterEscapeStringConstantMode_Whitespace', 'AfterEscapeStringConstantMode_Newline', 'AfterEscapeStringConstantMode_NotContinued', 'AfterEscapeStringConstantWithNewlineMode_Whitespace', 'AfterEscapeStringConstantWithNewlineMode_Newline', 'AfterEscapeStringConstantWithNewlineMode_Continued', 'AfterEscapeStringConstantWithNewlineMode_NotContinued', 'plsqlroot', 'pl_function', 'comp_options', 'comp_option', 'sharp', 'option_value', 'opt_semi', 'pl_block', 'decl_sect', 'decl_start', 'decl_stmts', 'label_decl', 'decl_stmt', 'decl_statement', 'opt_scrollable', 'decl_cursor_query', 'decl_cursor_args', 'decl_cursor_arglist', 'decl_cursor_arg', 'decl_is_for', 'decl_aliasitem', 'decl_varname', 'decl_const', 'decl_datatype', 'decl_collate', 'decl_notnull', 'decl_defval', 'decl_defkey', 'assign_operator', 'proc_sect', 'proc_stmt', 'stmt_perform', 'stmt_call', 'opt_expr_list', 'stmt_assign', 'stmt_getdiag', 'getdiag_area_opt', 'getdiag_list', 'getdiag_list_item', 'getdiag_item', 'getdiag_target', 'assign_var', 'stmt_if', 'stmt_elsifs', 'stmt_else', 'stmt_case', 'opt_expr_until_when', 'case_when_list', 'case_when', 'opt_case_else', 'stmt_loop', 'stmt_while', 'stmt_for', 'for_control', 'opt_for_using_expression', 'opt_cursor_parameters', 'opt_reverse', 'opt_by_expression', 'for_variable', 'stmt_foreach_a', 'foreach_slice', 'stmt_exit', 'exit_type', 'stmt_return', 'opt_return_result', 'stmt_raise', 'opt_stmt_raise_level', 'opt_raise_list', 'opt_raise_using', 'opt_raise_using_elem', 'opt_raise_using_elem_list', 'stmt_assert', 'opt_stmt_assert_message', 'loop_body', 'stmt_execsql', 'stmt_dynexecute', 'opt_execute_using', 'opt_execute_using_list', 'opt_execute_into', 'stmt_open', 'opt_open_bound_list_item', 'opt_open_bound_list', 'opt_open_using', 'opt_scroll_option', 'opt_scroll_option_no', 'stmt_fetch', 'opt_cursor_from', 'opt_fetch_direction', 'stmt_move', 'stmt_close', 'stmt_null', 'stmt_commit', 'stmt_rollback', 'plsql_opt_transaction_chain', 'stmt_set', 'cursor_variable', 'exception_sect', 'proc_exceptions', 'proc_exception', 'proc_conditions', 'proc_condition', 'opt_block_label', 'opt_loop_label', 'opt_label', 'opt_exitcond', 'any_identifier', 'sql_expression', 'expr_until_then', 'expr_until_semi', 'expr_until_rightbracket', 'expr_until_loop', 'make_execsql_stmt', 'opt_returning_clause_into', 'c_expr_c_expr_expr'
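As a rough check on the ten-sample observation in point 3, the sketch below computes how likely it is to see at least seven empty outputs in ten runs. It assumes each run is independently empty with probability 0.5, which is the upper bound mentioned earlier in the thread; prob_at_least is a helper name introduced only for this illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial tail: probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Seven or more empty outputs out of ten, with a 0.5 chance per run.
    print(prob_at_least(7, 10, 0.5))  # 0.171875
```

Under that assumption, an outcome at least this skewed turns up in about one batch of ten out of six, so the sample is too small to suggest a problem.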

Please leave messages if you have any questions :)


Beliefuture commented on September 26, 2024

@renatahodovan

Besides, I have found that the SQL statements generated for PostgreSQL are typically incomplete and not executable, and they do not seem to obey the grammar rules strictly:

SELECT INTERSECT ALL SELECT ; SELECT INTERSECT DISTINCT SELECT SELECT INTERSECT DISTINCT SELECT UNION SELECT INTERSECT ALL SELECT INTERSECT DISTINCT SELECT INTERSECT DISTINCT SELECT UNION DISTINCT SELECT FOR READ ONLY ;
SELECT INTERSECT SELECT EXCEPT SELECT ; SELECT EXCEPT ALL SELECT INTERSECT ALL SELECT


Beliefuture commented on September 26, 2024

Maybe the incomplete generated queries can be attributed to truncation caused by the -d parameter?
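One way a depth limit can work in a grammar-based generator is by filtering out alternatives that cannot finish within the remaining depth, rather than by cutting the output off. The toy sketch below shows that design. The grammar, min_depth and generate are all made up for illustration, and whether -d behaves this way in Grammarinator is an assumption here, not something confirmed in this thread.

```python
import random
from math import inf

# Toy grammar: each rule maps to a list of alternatives, and each alternative
# is a list of symbols. Symbols that are not keys of GRAMMAR are literal tokens.
GRAMMAR = {
    "query": [["select"], ["select", "UNION", "query"]],
    "select": [["SELECT"], ["SELECT", "expr"]],
    "expr": [["1"], ["(", "expr", "+", "expr", ")"]],
}


def min_depth(symbol, seen=()):
    """Smallest number of nested rule expansions needed to finish `symbol`."""
    if symbol not in GRAMMAR:
        return 0  # tokens need no expansion
    if symbol in seen:
        return inf  # a recursive path can never be the shortest one
    path = seen + (symbol,)
    return 1 + min(max(min_depth(s, path) for s in alt) for alt in GRAMMAR[symbol])


def generate(symbol, budget, rng):
    """Expand `symbol` using only alternatives that fit into the depth budget."""
    if symbol not in GRAMMAR:
        return [symbol]
    fitting = [alt for alt in GRAMMAR[symbol]
               if all(min_depth(s) <= budget - 1 for s in alt)]
    if not fitting:
        raise ValueError(f"depth budget too small for rule {symbol!r}")
    tokens = []
    for s in rng.choice(fitting):
        tokens.extend(generate(s, budget - 1, rng))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(" ".join(generate("query", 2, rng)))  # SELECT
    print(" ".join(generate("query", 6, rng)))  # a longer but still complete derivation
```

In this model a small budget yields short output that is still a complete derivation of the grammar, so a bare SELECT would point to what the grammar permits rather than to truncation.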
