zac-hd / hypothesmith

Hypothesis strategies for generating Python programs, something like CSmith

Home Page: https://pypi.org/project/hypothesmith/

License: Mozilla Public License 2.0

Python 100.00%

hypothesis python fuzzing

hypothesmith's Introduction

hypothesmith

Hypothesis strategies for generating Python programs, something like CSmith.

This is definitely pre-alpha, but if you want to play with it feel free! You can even keep the shiny pieces when - not if - it breaks.

Get it today with pip install hypothesmith, or by cloning the GitHub repo.

You can run the tests, such as they are, with tox on Python 3.6 or later. Use tox -va to see what environments are available.

Usage

This package provides two Hypothesis strategies for generating Python source code.

The generated code will always be syntactically valid, and is useful for testing parsers, linters, auto-formatters, and other tools that operate on source code.

DO NOT EXECUTE CODE GENERATED BY THESE STRATEGIES.

It could do literally anything that running Python code is able to do, including changing, deleting, or uploading important data. Arbitrary code can be useful, but "arbitrary code execution" can be very, very bad.
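Tools under test usually only need to parse the generated source, not run it. A minimal sketch of checking validity without executing anything, using only the standard library (the helper name `is_syntactically_valid` is hypothetical, not part of hypothesmith):

```python
import ast


def is_syntactically_valid(source: str) -> bool:
    """Return True if `source` parses as a Python module; the code is never run."""
    try:
        # ast.parse only builds a syntax tree; nothing in `source` is executed.
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as null bytes on some Python versions.
        return False
    return True


print(is_syntactically_valid("import os\nos.getcwd()\n"))  # True, and os.getcwd() is not called
print(is_syntactically_valid("x = = 1\n"))  # False
```

Anything that goes further than parsing or compiling (for example `exec` or `eval`) is exactly what the warning above is about.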

`hypothesmith.from_grammar(start="file_input", *, auto_target=True)`

Generates syntactically-valid Python source code based on the grammar.

Valid values for start are "single_input", "file_input", or "eval_input"; respectively a single interactive statement, a module or sequence of commands read from a file, and input for the eval() function.

If auto_target is True, this strategy uses hypothesis.target() internally to drive towards larger and more complex examples. We recommend leaving this enabled, as the grammar is quite complex and only simple examples tend to be generated otherwise.
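The three start symbols line up with the three modes of Python's built-in `compile()`. The sketch below illustrates that correspondence with the standard library only; the names `START_TO_COMPILE_MODE` and `compiles_as` are hypothetical, and this is not necessarily how hypothesmith validates its own output.

```python
# Grammar start symbol -> mode accepted by the built-in compile().
START_TO_COMPILE_MODE = {
    "single_input": "single",  # a single interactive statement
    "file_input": "exec",      # a module or sequence of statements
    "eval_input": "eval",      # a single expression, as passed to eval()
}


def compiles_as(source: str, start: str) -> bool:
    """Check that `source` compiles in the mode matching `start`, without running it."""
    try:
        compile(source, "<string>", START_TO_COMPILE_MODE[start])
    except SyntaxError:
        return False
    return True


print(compiles_as("1 + 1", "eval_input"))  # True: an expression
print(compiles_as("x = 1", "eval_input"))  # False: a statement is not an expression
print(compiles_as("x = 1\n", "file_input"))  # True
```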

`hypothesmith.from_node(node=libcst.Module, *, auto_target=True)`

Generates syntactically-valid Python source code based on the node types defined by the LibCST project.

You can pass any subtype of libcst.CSTNode. Alternatively, you can use Hypothesis' built-in from_type(node_type).map(lambda n: libcst.Module([n]).code), after Hypothesmith has registered the required strategies. However, this does not include automatic targeting, and limitations of LibCST may lead to invalid code being generated.

Notable bugs found with Hypothesmith

BPO-40661, a segfault in the new parser, was given maximum priority and blocked the planned release of CPython 3.9 beta1.
BPO-38953 tokenize -> untokenize roundtrip bugs.
BPO-42218 mishandled error case in new PEG parser.
lib2to3 errors on \r in comment
Black fails on files ending in a backslash
At least three round-trip bugs in LibCST (search commits for "hypothesis")
Invalid code generated by LibCST
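Several of the bugs above are round-trip failures. As a rough illustration of the property being checked, here is a standard-library-only sketch of a tokenize -> untokenize round trip; the helper names are hypothetical, and in a real test the `source` argument would come from one of the strategies rather than a literal.

```python
import io
import tokenize


def token_pairs(source: str) -> list:
    """Tokenize `source` and return its (type, string) pairs."""
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]


def tokenize_round_trips(source: str) -> bool:
    """True if tokenize -> untokenize -> tokenize yields the same token stream."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    regenerated = tokenize.untokenize(tokens)
    return token_pairs(source) == token_pairs(regenerated)


print(tokenize_round_trips("x = 1\n"))  # True
```

Inputs that the tokenizer rejects raise `tokenize.TokenError` (or a `SyntaxError` on newer Python versions) instead of returning, so a property test would need to decide how to treat those cases.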

Changelog

Patch notes can be found in CHANGELOG.md.

hypothesmith's Issues

Generated programs are missing spaces

I must be missing something simple. I run this program:

from hypothesis import given, settings
import hypothesmith as hs

@given(hs.from_grammar("eval_input"))
@settings(max_examples=50)
def test(expr):
    print(repr(expr))

test()

This prints examples like these:

'A'
'A\n'
'A,A,'
'A,A'
'A,'
'A,A,\n'
'A\n'
'AorA,A'
'AorA,'
'A,\n'
'lambda:A,\n'

I am pretty sure it meant A or A, not AorA. (I saw more similar examples in other runs and variations of the program.)
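To make the difference concrete: both spellings are valid expressions, but they parse to different trees. A small sketch using only the standard library:

```python
import ast

# Without spaces, the whole run of letters is one identifier.
joined = ast.parse("AorA", mode="eval").body
# With spaces, `or` is a keyword joining two names.
spaced = ast.parse("A or A", mode="eval").body

print(type(joined).__name__, joined.id)  # Name AorA
print(type(spaced).__name__)  # BoolOp
```

So the generated strings are still syntactically valid for eval_input, even though they do not exercise the `or` operator.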

It also occasionally prints a traceback and this error:

hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 6 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls

This is Python 3.9.2 on Windows.

hypothesis          6.13.0
hypothesmith        0.1.8

I figure I'm doing something wrong or not understanding something?

Document and validate start productions for from_grammar

No git version tags

According to https://pypi.org/project/hypothesmith/ the latest version is 0.1.8; however, there are no version tags in the git repo.
Is it possible to add a version tag for the latest version?

hypothesmith on PyPI breaks with FileNotFoundError when importing from_grammar or from_node

Repro:

create a clean virtualenv
pip install hypothesmith
open python prompt (3.10.8)
any of the following doesn't work:
- from hypothesmith import from_grammar
- from hypothesmith import from_node

Error:

$ python          
Python 3.10.8 (main, Nov  1 2022, 14:18:21) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hypothesmith import grammar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/test_hypothesmith/.virtualenv/test_hypothesmith-dgeh/lib/python3.10/site-packages/hypothesmith/__init__.py", line 3, in <module>
    from hypothesmith.cst import from_node
  File "/tmp/test_hypothesmith/.virtualenv/test_hypothesmith-dgeh/lib/python3.10/site-packages/hypothesmith/cst.py", line 24, in <module>
    from hypothesmith.syntactic import identifiers
  File "/tmp/test_hypothesmith/.virtualenv/test_hypothesmith-dgeh/lib/python3.10/site-packages/hypothesmith/syntactic.py", line 21, in <module>
    LARK_GRAMMAR = read_text("hypothesmith", "python.lark")
  File "/usr/lib/python3.10/importlib/resources.py", line 103, in read_text
    with open_text(package, resource, encoding, errors) as fp:
  File "/usr/lib/python3.10/importlib/resources.py", line 82, in open_text
    open_binary(package, resource), encoding=encoding, errors=errors
  File "/usr/lib/python3.10/importlib/resources.py", line 46, in open_binary
    return reader.open_resource(resource)
  File "/usr/lib/python3.10/importlib/abc.py", line 433, in open_resource
    return self.files().joinpath(resource).open('rb')
  File "/usr/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/test_hypothesmith/.virtualenv/test_hypothesmith-dgeh/lib/python3.10/site-packages/hypothesmith/python.lark'

I assume it's related to the recent lark/lark-parser snafus

Add release

Would it be possible to add a release so this can be properly packaged? What is your roadmap?

hypothesmith powered testing for isort

Hi!

This is a really cool project! I saw the PR posted on the black project here and was wondering if there would be any interest in helping to integrate it into isort? isort currently has some property testing in place, but only to deal with the complexity around the number of configuration options, and attempting to try many combinations together. However, if you look at the test suite it is riddled with countless parsing issues that unfortunately users found for the project first. I'd love to integrate hypothesmith as part of a strategy to get ahead of future parsing errors.

Thanks!

~Timothy

Generate names which collide when NFKC-normalized

See this comment on Reddit and this blog post:

Be warned that Python always applies NFKC normalization to characters. Therefore, two distinct characters may actually produce the same variable name. For example:
>>> ª = 1 # FEMININE ORDINAL INDICATOR
>>> a # LATIN SMALL LETTER A (i.e., ASCII lowercase 'a')
1

Hypothesmith should deliberately violate this rule, to expose tools which compare identifiers as strings without correctly normalizing them first.
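A minimal sketch of the collision described above, using the standard library's `unicodedata`. The helper `nfkc_collisions` is hypothetical; it only shows how a tool could detect distinct spellings that Python treats as the same identifier.

```python
import unicodedata


def nfkc_collisions(names):
    """Group distinct spellings that normalize to the same NFKC form."""
    groups = {}
    for name in names:
        normalized = unicodedata.normalize("NFKC", name)
        groups.setdefault(normalized, set()).add(name)
    # Keep only normalized forms that have more than one spelling.
    return {norm: spellings for norm, spellings in groups.items() if len(spellings) > 1}


# U+00AA is FEMININE ORDINAL INDICATOR; it normalizes to ASCII "a".
print(unicodedata.normalize("NFKC", "\u00aa") == "a")  # True
collisions = nfkc_collisions(["a", "\u00aa", "b"])
print(sorted(collisions))  # ['a']
```

A tool that compares the raw strings "a" and "\u00aa" would see two variables where Python sees one.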

0.3.3: pytest fails in `tests/test_syntactic.py::test_black_autoformatter_from_grammar` unit

I'm packaging your module as an rpm package, so I'm using the typical PEP 517 based build, install and test cycle used when building packages from a non-root account.

- python3 -sBm build -w --no-isolation
- because I'm calling build with --no-isolation, only locally installed modules are used during all processes
- install the .whl file in </install/prefix> using the installer module
- run pytest with $PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>
- the build is performed in an env which is cut off from access to the public network (pytest is executed with -m "not network")

Here is pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-hypothesmith-0.3.3-2.fc36.x86_64/usr/lib64/python3.9/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-hypothesmith-0.3.3-2.fc36.x86_64/usr/lib/python3.9/site-packages
+ /usr/bin/pytest -ra -m 'not network'
============================= test session starts ==============================
platform linux -- Python 3.9.18, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3
configfile: tox.ini
plugins: hypothesis-6.99.6
collected 181 items

tests/test_cst.py ....s................................................. [ 29%]
...................................F....F..........................s.... [ 69%]
............................F............x...                            [ 94%]
tests/test_syntactic.py xx.....                                          [ 98%]
tests/test_version.py ...                                                [100%]

=================================== FAILURES ===================================
________________ test_source_code_from_libcst_node_type[Match] _________________
tests/test_cst.py:25: in test_source_code_from_libcst_node_type
    @given(data=st.data())
E   hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of test_source_code_from_libcst_node_type
---------------------------------- Hypothesis ----------------------------------
You can add @seed(41049388112553499489934747796162445915) to this test or run pytest with --hypothesis-seed=41049388112553499489934747796162445915 to reproduce this failure.
______________ test_source_code_from_libcst_node_type[MatchList] _______________
tests/test_cst.py:25: in test_source_code_from_libcst_node_type
    @given(data=st.data())
E   hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of test_source_code_from_libcst_node_type
---------------------------------- Hypothesis ----------------------------------
You can add @seed(189423066481151525614644471526471958627) to this test or run pytest with --hypothesis-seed=189423066481151525614644471526471958627 to reproduce this failure.
_______________ test_source_code_from_libcst_node_type[TryStar] ________________
tests/test_cst.py:25: in test_source_code_from_libcst_node_type
    @given(data=st.data())
E   hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of test_source_code_from_libcst_node_type
---------------------------------- Hypothesis ----------------------------------
You can add @seed(94447068965149540313353865934997293123) to this test or run pytest with --hypothesis-seed=94447068965149540313353865934997293123 to reproduce this failure.
================================== XFAILURES ===================================
_____________________ test_black_autoformatter_from_nodes ______________________
/usr/lib/python3.9/site-packages/black/__init__.py:1526: in assert_equivalent
    src_ast = parse_ast(src)
/usr/lib/python3.9/site-packages/black/parsing.py:148: in parse_ast
    raise SyntaxError(first_error)
E   SyntaxError: invalid character '▒' (U+2592) (<unknown>, line 1)

The above exception was the direct cause of the following exception:
tests/test_cst.py:57: in test_black_autoformatter_from_nodes
    @example("A\u2592", black.Mode())
/usr/lib/python3.9/site-packages/hypothesis/core.py:1277: in _raise_to_user
    raise the_error_hypothesis_found
tests/test_cst.py:69: in test_black_autoformatter_from_nodes
    result = black.format_file_contents(source_code, fast=False, mode=mode)
/usr/lib/python3.9/site-packages/black/__init__.py:1083: in format_file_contents
    check_stability_and_equivalence(
/usr/lib/python3.9/site-packages/black/__init__.py:1057: in check_stability_and_equivalence
    assert_equivalent(src_contents, dst_contents)
/usr/lib/python3.9/site-packages/black/__init__.py:1528: in assert_equivalent
    raise ASTSafetyError(
E   black.parsing.ASTSafetyError: cannot use --safe with this file; failed to parse source file AST: invalid character '▒' (U+2592) (<unknown>, line 1)
E   This could be caused by running Black with an older Python version that does not support new syntax used in your source file.
E   Falsifying explicit example: test_black_autoformatter_from_nodes(
E       source_code='A▒',
E       mode=Mode(target_versions=set(), line_length=88, string_normalization=True, is_pyi=False, is_ipynb=False, skip_source_first_line=False, magic_trailing_comma=True, python_cell_magics=set(), preview=False, unstable=False, enabled_features=set()),
E   )
________________________ test_tokenize_round_trip_bytes ________________________
  + Exception Group Traceback (most recent call last):
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 340, in from_call
  |     result: Optional[TResult] = func()
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 240, in <lambda>
  |     lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
  |   File "/usr/lib/python3.9/site-packages/pluggy/_hooks.py", line 501, in __call__
  |     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_manager.py", line 119, in _hookexec
  |     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 181, in _multicall
  |     return outcome.get_result()
  |   File "/usr/lib/python3.9/site-packages/pluggy/_result.py", line 99, in get_result
  |     raise exc.with_traceback(exc.__traceback__)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/threadexception.py", line 87, in pytest_runtest_call
  |     yield from thread_exception_runtest_hook()
  |   File "/usr/lib/python3.9/site-packages/_pytest/threadexception.py", line 63, in thread_exception_runtest_hook
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/unraisableexception.py", line 90, in pytest_runtest_call
  |     yield from unraisable_exception_runtest_hook()
  |   File "/usr/lib/python3.9/site-packages/_pytest/unraisableexception.py", line 65, in unraisable_exception_runtest_hook
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/logging.py", line 849, in pytest_runtest_call
  |     yield from self._runtest_for(item, "call")
  |   File "/usr/lib/python3.9/site-packages/_pytest/logging.py", line 832, in _runtest_for
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/capture.py", line 883, in pytest_runtest_call
  |     return (yield)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/skipping.py", line 256, in pytest_runtest_call
  |     return (yield)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 102, in _multicall
  |     res = hook_impl.function(*args)
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 182, in pytest_runtest_call
  |     raise e
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 172, in pytest_runtest_call
  |     item.runtest()
  |   File "/usr/lib/python3.9/site-packages/_pytest/python.py", line 1777, in runtest
  |     self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_hooks.py", line 501, in __call__
  |     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_manager.py", line 119, in _hookexec
  |     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 138, in _multicall
  |     raise exception.with_traceback(exception.__traceback__)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 102, in _multicall
  |     res = hook_impl.function(*args)
  |   File "/usr/lib/python3.9/site-packages/_pytest/python.py", line 200, in pytest_pyfunc_call
  |     result = testfunction(**testargs)
  |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 23, in test_tokenize_round_trip_bytes
  |     @example("#")
  |   File "/usr/lib/python3.9/site-packages/hypothesis/core.py", line 1573, in wrapped_test
  |     _raise_to_user(errors, state.settings, [], " in explicit examples")
  |   File "/usr/lib/python3.9/site-packages/hypothesis/core.py", line 1277, in _raise_to_user
  |     raise the_error_hypothesis_found
  | exceptiongroup.ExceptionGroup: Hypothesis found 2 distinct failures in explicit examples. (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 32, in test_tokenize_round_trip_bytes
    |     tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    |   File "/usr/lib64/python3.9/tokenize.py", line 521, in _tokenize
    |     raise TokenError("EOF in multi-line statement", (lnum, 0))
    | tokenize.TokenError: ('EOF in multi-line statement', (3, 0))
    | Falsifying explicit example: test_tokenize_round_trip_bytes(
    |     source_code='\n\\\n',
    | )
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 35, in test_tokenize_round_trip_bytes
    |     assert [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in output]
    | AssertionError: assert [(62, 'utf-8'...4, '\n'), ...] == [(62, 'utf-8'...60, '#'), ...]
    |
    |   At index 3 diff: (1, 'pass') != (5, ' ')
    |   Right contains 2 more items, first extra item: (6, '')
    |   Use -v to get more diff
    | Falsifying explicit example: test_tokenize_round_trip_bytes(
    |     source_code='#\n\x0cpass#\n',
    | )
    +------------------------------------
_______________________ test_tokenize_round_trip_string ________________________
  + Exception Group Traceback (most recent call last):
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 340, in from_call
  |     result: Optional[TResult] = func()
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 240, in <lambda>
  |     lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
  |   File "/usr/lib/python3.9/site-packages/pluggy/_hooks.py", line 501, in __call__
  |     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_manager.py", line 119, in _hookexec
  |     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 181, in _multicall
  |     return outcome.get_result()
  |   File "/usr/lib/python3.9/site-packages/pluggy/_result.py", line 99, in get_result
  |     raise exc.with_traceback(exc.__traceback__)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/threadexception.py", line 87, in pytest_runtest_call
  |     yield from thread_exception_runtest_hook()
  |   File "/usr/lib/python3.9/site-packages/_pytest/threadexception.py", line 63, in thread_exception_runtest_hook
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/unraisableexception.py", line 90, in pytest_runtest_call
  |     yield from unraisable_exception_runtest_hook()
  |   File "/usr/lib/python3.9/site-packages/_pytest/unraisableexception.py", line 65, in unraisable_exception_runtest_hook
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/logging.py", line 849, in pytest_runtest_call
  |     yield from self._runtest_for(item, "call")
  |   File "/usr/lib/python3.9/site-packages/_pytest/logging.py", line 832, in _runtest_for
  |     yield
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/capture.py", line 883, in pytest_runtest_call
  |     return (yield)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 166, in _multicall
  |     teardown.throw(outcome._exception)
  |   File "/usr/lib/python3.9/site-packages/_pytest/skipping.py", line 256, in pytest_runtest_call
  |     return (yield)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 102, in _multicall
  |     res = hook_impl.function(*args)
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 182, in pytest_runtest_call
  |     raise e
  |   File "/usr/lib/python3.9/site-packages/_pytest/runner.py", line 172, in pytest_runtest_call
  |     item.runtest()
  |   File "/usr/lib/python3.9/site-packages/_pytest/python.py", line 1777, in runtest
  |     self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_hooks.py", line 501, in __call__
  |     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_manager.py", line 119, in _hookexec
  |     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 138, in _multicall
  |     raise exception.with_traceback(exception.__traceback__)
  |   File "/usr/lib/python3.9/site-packages/pluggy/_callers.py", line 102, in _multicall
  |     res = hook_impl.function(*args)
  |   File "/usr/lib/python3.9/site-packages/_pytest/python.py", line 200, in pytest_pyfunc_call
  |     result = testfunction(**testargs)
  |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 41, in test_tokenize_round_trip_string
  |     @example("#")
  |   File "/usr/lib/python3.9/site-packages/hypothesis/core.py", line 1573, in wrapped_test
  |     _raise_to_user(errors, state.settings, [], " in explicit examples")
  |   File "/usr/lib/python3.9/site-packages/hypothesis/core.py", line 1277, in _raise_to_user
  |     raise the_error_hypothesis_found
  | exceptiongroup.ExceptionGroup: Hypothesis found 2 distinct failures in explicit examples. (2 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 46, in test_tokenize_round_trip_string
    |     tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
    |   File "/usr/lib64/python3.9/tokenize.py", line 521, in _tokenize
    |     raise TokenError("EOF in multi-line statement", (lnum, 0))
    | tokenize.TokenError: ('EOF in multi-line statement', (3, 0))
    | Falsifying explicit example: test_tokenize_round_trip_string(
    |     source_code='\n\\\n',
    | )
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/home/tkloczko/rpmbuild/BUILD/hypothesmith-0.3.3/tests/test_syntactic.py", line 49, in test_tokenize_round_trip_string
    |     assert [(t.type, t.string) for t in tokens] == [(t.type, t.string) for t in output]
    | AssertionError: assert [(60, '#'), (...\n'), (0, '')] == [(60, '#'), (...4, '\n'), ...]
    |
    |   At index 2 diff: (1, 'pass') != (5, ' ')
    |   Right contains 2 more items, first extra item: (6, '')
    |   Use -v to get more diff
    | Falsifying explicit example: test_tokenize_round_trip_string(
    |     source_code='#\n\x0cpass#\n',
    | )
    +------------------------------------
=========================== short test summary info ============================
SKIPPED [2] tests/test_cst.py:43: codegen not supported yet, e.g. Annotation
XFAIL tests/test_cst.py::test_black_autoformatter_from_nodes
XFAIL tests/test_syntactic.py::test_tokenize_round_trip_bytes
XFAIL tests/test_syntactic.py::test_tokenize_round_trip_string
FAILED tests/test_cst.py::test_source_code_from_libcst_node_type[Match] - hyp...
FAILED tests/test_cst.py::test_source_code_from_libcst_node_type[MatchList]
FAILED tests/test_cst.py::test_source_code_from_libcst_node_type[TryStar] - h...
======= 3 failed, 173 passed, 2 skipped, 3 xfailed in 3907.94s (1:05:07) =======

List of installed modules in build env:

Package            Version
------------------ -----------
attrs              23.2.0
black              24.3.0
build              1.1.1
click              8.1.7
exceptiongroup     1.1.3
hypothesis         6.99.11
importlib_metadata 7.0.1
iniconfig          2.0.0
installer          0.7.0
lark               1.1.9
libcst             1.2.0
mypy_extensions    1.0.0
packaging          24.0
parso              0.8.3
pathspec           0.12.1
platformdirs       4.2.0
pluggy             1.4.0
pyproject_hooks    1.0.0
pytest             8.1.1
python-dateutil    2.9.0.post0
PyYAML             6.0.1
setuptools         69.1.1
sortedcontainers   2.4.0
tokenize_rt        5.2.0
tomli              2.0.1
typing_extensions  4.10.0
typing_inspect     0.9.0
wheel              0.43.0
zipp               3.17.0

Please let me know if you need more details or want me to perform some diagnostics.

`hypothesmith` needs a logo!

Every project with aspirations to greatness needs a logo, and hypothesmith is no exception. Are you the generous designer who can help?

Hypothesmith is, as the name suggests, built on Hypothesis. You may therefore want to draw on that project's logo and brand, though it's not required.
The other major inspiration is CSmith. I wouldn't copy them too closely, but the blacksmith theme is pretty obvious. Perhaps other kinds of smithing (silver, gold, etc.) would look cool?
Hypothesmith creates Python code, so you could also work in the Python snakes somehow.
Once hypothesmith has a logo I like, I'll be printing it on stickers - and will send you some wherever you are if you would like some. The logo need not include the project name, but it would be nice to have a sticker design that does for easier recognition.

Ideas or sketches are welcome, not just finished proposals 😁

Cannot import on 3.13, importlib.resources.read_text is removed

tests/test_flake8_trio.py:24: in <module>
    from hypothesmith import from_grammar, from_node
.tox/py313/lib/python3.13/site-packages/hypothesmith/__init__.py:3: in <module>
    from hypothesmith.cst import from_node
.tox/py313/lib/python3.13/site-packages/hypothesmith/cst.py:24: in <module>
    from .syntactic import ALLOWED_CHARS
.tox/py313/lib/python3.13/site-packages/hypothesmith/syntactic.py:7: in <module>
    from importlib.resources import read_text
E   ImportError: cannot import name 'read_text' from 'importlib.resources' (/usr/lib/python3.13/importlib/resources/__init__.py)

https://docs.python.org/3/library/importlib.resources.html#importlib.resources.read_text says calls to the method can be replaced by

files(package).joinpath(resource).read_text(encoding=encoding)

This is the only blocker (I think) from testing flake8-trio with 3.13
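For reference, the replacement suggested by the docs can be sketched as below. This is an illustration and not the actual hypothesmith patch: `read_package_text` and the throwaway `demo_pkg` package are hypothetical, and `importlib.resources.files` requires Python 3.9 or later.

```python
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def read_package_text(package: str, resource: str, encoding: str = "utf-8") -> str:
    """Read a text resource shipped inside `package` via the files() API."""
    return files(package).joinpath(resource).read_text(encoding=encoding)


# Build a throwaway package with one data file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "grammar.lark").write_text("start: NAME\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    try:
        text = read_package_text("demo_pkg", "grammar.lark")
    finally:
        sys.path.remove(tmp)

print(repr(text))  # 'start: NAME\n'
```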

Error when generating single_input

With the following test:

@hypothesis.given(hypothesmith.from_grammar(start="single_input"))
def test_statements(self, statement: str):
    statement = statement.strip()
    hypothesis.assume(statement and not statement.startswith('#'))
    print(repr(statement))
    # Assert logic omitted

I get the following error in hypothesmith:

Traceback (most recent call last):
  File "/projects/test/tests/test_matcher.py", line 47, in test_statements
    def test_statements(self, statement: str):
  File "/.virtualenvs/test/lib/python3.7/site-packages/hypothesis/core.py", line 1142, in wrapped_test
    raise the_error_hypothesis_found
  File "/.virtualenvs/test/lib/python3.7/site-packages/hypothesmith/syntactic.py", line 93, in do_draw
    nodes = list(ast.walk(ast.parse(result)))
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 2
    �xrÁÔ�󩅲�ró򣯽
              ^
SyntaxError: invalid character in identifier

Failure on py3.9: SystemError: Negative size passed to PyUnicode_New

Hi! My CI crashed with something rather interesting:

Link: https://github.com/wemake-services/wemake-python-styleguide/runs/4175506726?check_suite_focus=true

 =================================== FAILURES ===================================
______________________________ test_no_exceptions ______________________________

self = <hypothesmith.syntactic.GrammarStrategy object at 0x7f4930fc3e80>
data = ConjectureData(VALID, 28 bytes, frozen)
symbol = NonTerminal('simple_stmt')
draw_state = DrawState(result=['global', 'A', '\\ \n', '#\n'])

    def draw_symbol(self, data, symbol, draw_state):  # type: ignore
        count = len(draw_state.result)
        super().draw_symbol(data, symbol, draw_state)
        if symbol.name in COMPILE_MODES:
            try:
>               compile(
                    source="".join(draw_state.result[count:]),
                    filename="<string>",
                    mode=COMPILE_MODES[symbol.name],
                )
E               SystemError: Negative size passed to PyUnicode_New

.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:110: SystemError

The above exception was the direct cause of the following exception:

default_options = options(min_name_length=2, max_name_length=45, i_control_code=True, i_dont_control_code=True, max_noqa_comments=10, ne...gnitive_average=8, max_call_level=3, max_annotation_complexity=3, max_import_from_members=8, max_tuple_unpack_length=4)
parse_ast_tree = <function parse_ast_tree.<locals>.factory at 0x7f479cd01430>
parse_tokens = <function parse_tokens.<locals>.factory at 0x7f479ce863a0>

>   ???

tests/test_checker/test_hypothesis.py:40: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:89: in do_draw
    result = super().do_draw(data)
.venv/lib/python3.9/site-packages/hypothesis/extra/lark.py:153: in do_draw
    self.draw_symbol(data, start, state)
.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:107: in draw_symbol
    super().draw_symbol(data, symbol, draw_state)
.venv/lib/python3.9/site-packages/hypothesis/extra/lark.py:181: in draw_symbol
    self.draw_symbol(data, e, draw_state)
.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:107: in draw_symbol
    super().draw_symbol(data, symbol, draw_state)
.venv/lib/python3.9/site-packages/hypothesis/extra/lark.py:181: in draw_symbol
    self.draw_symbol(data, e, draw_state)
.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:107: in draw_symbol
    super().draw_symbol(data, symbol, draw_state)
.venv/lib/python3.9/site-packages/hypothesis/extra/lark.py:181: in draw_symbol
    self.draw_symbol(data, e, draw_state)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <hypothesmith.syntactic.GrammarStrategy object at 0x7f4930fc3e80>
data = ConjectureData(VALID, 28 bytes, frozen)
symbol = NonTerminal('simple_stmt')
draw_state = DrawState(result=['global', 'A', '\\ \n', '#\n'])

    def draw_symbol(self, data, symbol, draw_state):  # type: ignore
        count = len(draw_state.result)
        super().draw_symbol(data, symbol, draw_state)
        if symbol.name in COMPILE_MODES:
            try:
                compile(
                    source="".join(draw_state.result[count:]),
                    filename="<string>",
                    mode=COMPILE_MODES[symbol.name],
                )
            except SyntaxError:
                # Python's grammar doesn't actually fully describe the behaviour of the
                # CPython parser and AST-post-processor, so we just filter out errors.
                assume(False)
            except Exception as err:  # pragma: no cover
                # Attempting to compile almost-valid strings has triggered a wide range
                # of bizzare errors in CPython, especially with the new PEG parser,
                # and so we maintain this extra clause to ensure that we get a decent
                # error message out of it.
                if isinstance(err, SystemError) and sys.version_info[:3] == (3, 9, 0):
                    # We've triggered https://bugs.python.org/issue42218 - it's been
                    # fixed upstream, so we'll treat it as if it were a SyntaxError.
                    assume(False)
                source_code = ascii("".join(draw_state.result[count:]))
>               raise type(err)(
                    f"compile({source_code}, '<string>', "
                    f"{COMPILE_MODES[symbol.name]!r}) "
                    f"raised {type(err).__name__}: {str(err)}"
                ) from err
E               SystemError: compile('globalA\\ \n#\n', '<string>', 'single') raised SystemError: Negative size passed to PyUnicode_New

.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:129: SystemError
---------------------------------- Hypothesis ----------------------------------
Highest target scores:
               6  (label='(hypothesmith) number of unique ast node types')
               9  (label='(hypothesmith) instructions in bytecode')
              13  (label='(hypothesmith) total number of ast nodes')


You can reproduce this example by temporarily adding @reproduce_failure('6.24.2', b'AXicY2RkYGBlABIgwMwIYgAJCB8AAUAAEQ==') as a decorator on your test case
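
The error-handling pattern visible in the traceback above can be sketched in isolation. This is a simplified illustration, not hypothesmith's code: the `checked_compile` name and the injectable `compiler` parameter are assumptions added here so the wrapping can be exercised without a buggy interpreter.

```python
def checked_compile(source, mode, compiler=compile):
    """Return True if `source` compiles and False on SyntaxError.

    Any other exception is re-raised as the same type, with the offending
    source embedded in the message and the original error chained.
    `compiler` is a hypothetical hook used only for demonstration.
    """
    try:
        compiler(source, "<string>", mode)
    except SyntaxError:
        # Almost-valid input: the caller can simply discard it.
        return False
    except Exception as err:
        # ascii() keeps control characters and non-ASCII text readable.
        raise type(err)(
            f"compile({ascii(source)}, '<string>', {mode!r}) "
            f"raised {type(err).__name__}: {err}"
        ) from err
    return True
```

Chaining with `from err` keeps the original interpreter error attached as `__cause__`, which is why the report shows both tracebacks.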

Register a strategy for `libcst.MatchSingleton`

I packaged hypothesmith 0.2.0, ran its self-tests, and got:

_____________________________________________________________________ test_source_code_from_libcst_node_type[MatchSingleton] _____________________________________________________________________
tests/test_cst.py:23: in test_source_code_from_libcst_node_type
    @given(data=st.data())
E   hypothesis.errors.Unsatisfiable: Unable to satisfy assumptions of test_source_code_from_libcst_node_type
------------------------------------------------------------------------------------------- Hypothesis -------------------------------------------------------------------------------------------
You can add @seed(77441592302281572436730079892928709310) to this test or run pytest with --hypothesis-seed=77441592302281572436730079892928709310 to reproduce this failure.

The installed dependencies are libcst-0.4.1, lark-parser-0.12.0, and hypothesis 6.36.1, with python 3.10.2; on NetBSD/amd64 in case it matters.

Can pysource-codegen be used to generate code with hypothesmith?

Hi, I want to let you know about pysource-codegen.

My Story:

I needed something like hypothesmith, but my problem was that I needed really long source files,
maybe 10000 lines of code. I already use Hypothesis and found your project, but it did not fit my needs.

Hypothesis has some limits (if I am correct) on the size of generated examples, and #15 did not sound very promising either. So I decided to have some fun trying to generate Python code myself.

And now I have something working, which might also be useful for you.

How it works:

It generates a random AST (using Python's own ast module). It does this by parsing some help strings to figure out which nodes can have which child nodes.
This AST can still represent things like 1 + 1 = x, which is not valid Python.
I check for these cases and fix them either before or after I create the AST, depending on the kind of problem. It took me some time, but I was able to fix all invalid AST constructs.
The generated tree can then be converted to source code using ast.unparse.
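
A minimal sketch of the generate-then-unparse idea described above. It is not pysource-codegen's actual implementation: the `random_expr` and `random_source` helpers, the operator list, and the depth limit are assumptions made for illustration, and `ast.unparse` needs Python 3.9 or later.

```python
import ast
import random

# Deliberately small vocabulary, mirroring the "almost normal" code idea.
NAMES = [f"name_{i}" for i in range(5)]
STRINGS = ["some string", ""]
OPERATORS = [ast.Add, ast.Sub, ast.Mult]


def random_expr(rng, depth=3):
    """Build a random expression node from names, strings and binary ops."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ast.Name(id=rng.choice(NAMES), ctx=ast.Load())
        return ast.Constant(value=rng.choice(STRINGS))
    return ast.BinOp(
        left=random_expr(rng, depth - 1),
        op=rng.choice(OPERATORS)(),
        right=random_expr(rng, depth - 1),
    )


def random_source(seed):
    """Turn a seeded random expression tree into source text."""
    return ast.unparse(random_expr(random.Random(seed)))
```

Because only expression nodes are combined here, every tree unparses to text that parses again; statement-level generation is where the invalid constructs mentioned above have to be filtered or repaired.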

And this might be the difference between hypothesmith and pysource-codegen: I generate almost normal Python code. For example, I use only the identifiers name_0 to name_4 and only two strings, "some string" and "".
Nevertheless, I was able to find 3 bugs in black.

The generated code can then be reduced with pysource-minimize. It parses the code again and performs a binary search.

I don't understand hypothesmith well enough to tell why it generates different code.
I also don't know whether the two can be combined somehow.

But I wanted to let you know about my approach. Maybe you will find it useful or inspiring.

Let me know if you have any ideas about how this could be integrated into hypothesmith, or if I should explain anything further.

New failure on Python3.9

Latest minimal example: compile('A.\u018a\\ ', '<string>', 'single')

Hi! We are using hypothesmith to test our wemake-python-styleguide linter.
And today I got this failure:

self = <hypothesmith.syntactic.GrammarStrategy object at 0x7f702a236ee0>
data = ConjectureData(VALID, 1628 bytes, frozen)
symbol = NonTerminal('simple_stmt')
draw_state = DrawState(result=['\r\n \t\t\t \t \t\n #²\x97\x9c\n\t\r\n \t \t#\U000cf1a7`\x1e£Í\x0b#Ã\U0009460b\U00041f90\U000b0ffc§...b6¯\n#r×í@d¿\x08\r\n  \t\t\t\t\t\t \t\t\t  \t\t \t\t\t\t#\U000dca7e\U000cad5c?\U0005caae¤ÂÁ0Z\U000aa6eap#\U0010c8a6{'])

    def draw_symbol(self, data, symbol, draw_state):  # type: ignore
        count = len(draw_state.result)
        super().draw_symbol(data, symbol, draw_state)
        if symbol.name in COMPILE_MODES:
            try:
                compile(
                    source="".join(draw_state.result[count:]),
                    filename="<string>",
                    mode=COMPILE_MODES[symbol.name],
                )
            except SystemError as err:  # pragma: no cover
                # Extra output to help track down a possible upstream issue
                # https://github.com/Zac-HD/stdlib-property-tests/issues/14
                source_code = "".join(draw_state.result[count:])
>               raise Exception(
                    "unexpected error while attempting to compile "
                    f"{ascii(source_code)!r} in mode={COMPILE_MODES[symbol.name]}"
                ) from err
E               Exception: unexpected error while attempting to compile "'import\\u012c\\u0155u\\u011f\\u9782\\u0117\\xc4\\xd8.\\xcd\\u0179\\u0152\\u0139\\u010b\\xdb.\\xd0\\U0002852aD\\u017b\\xd6\\\\\\t \\t\\r\\nas#\\U0009a4a1\\u0178T\\u5408\\U00029fe8\\xcb\\u0141\\U0002b2b9\\u013c\\xf34D\\xd1p\\xf1\\t\\x0c \\x0c\\x0c \\x0c\\t\\x0c \\x0c\\t \\x0c\\x0c\\x0c \\\\\\x0c\\t\\n;pass\\r\\n\\t \\t\\t  \\r\\n\\r\\n\\t\\t\\t \\t\\t#\\x8b\\n\\r\\n  \\t\\t#\\xe6k\\x81E\\r\\n\\t  \\t \\t#\\U0001ed6b\\xd1#^\\r\\n \\r\\n  \\t\\r\\n  \\t\\t\\t #G\\x08\\x08#\\xe3^6\\xf1\\x87\\U000d8254\\U00070572 oQ}\\x10\\xce\\xc1\\U000563b6\\xaf\\n#r\\xd7\\xed@d\\xbf\\x08\\r\\n  \\t\\t\\t\\t\\t\\t \\t\\t\\t  \\t\\t \\t\\t\\t\\t#\\U000dca7e\\U000cad5c?\\U0005caae\\xa4\\xc2\\xc10Z\\U000aa6eap#\\U0010c8a6{'" in mode=single

.venv/lib/python3.9/site-packages/hypothesmith/syntactic.py:119: Exception
---------------------------------- Hypothesis ----------------------------------
You can add @seed(11935010665446469984514748066733166299) to this test or run pytest with --hypothesis-seed=11935010665446469984514748066733166299 to reproduce this failure.

Test file: https://github.com/wemake-services/wemake-python-styleguide/blob/master/tests/test_checker/test_hypothesis.py
Version used: 0.1.5

Hypothesmith 0.0.3 Appears Incompatible with Hypothesis above 4.32.3

Commit HypothesisWorks/hypothesis@2b7ddef#diff-a6a0e1e84af3282e1fc162b4ebec8fdf introduced a new required parameter to LarkStrategy in hypothesis, but hypothesmith seems to still initialize without it. A marginally useful stack trace from LibCST is as follows:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/usr/lib/python3.6/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python3.6/unittest/main.py", line 141, in parseArgs
    self.createTests()
  File "/usr/lib/python3.6/unittest/main.py", line 148, in createTests
    self.module)
  File "/usr/lib/python3.6/unittest/loader.py", line 219, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python3.6/unittest/loader.py", line 219, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/home/dragonminded/LibCST/libcst/tests/test_fuzz.py", line 49, in <module>
    class FuzzTest(unittest.TestCase):
  File "/home/dragonminded/LibCST/libcst/tests/test_fuzz.py", line 55, in FuzzTest
    @hypothesis.given(source_code=from_grammar(start="file_input"))
  File "/home/dragonminded/LibCST/.tox/fuzz/lib/python3.6/site-packages/hypothesmith/syntactic.py", line 106, in from_grammar
    return GrammarStrategy(grammar, start, explicit_strategies).map(check_and_fix)
  File "/home/dragonminded/LibCST/.tox/fuzz/lib/python3.6/site-packages/hypothesmith/syntactic.py", line 55, in __init__
    LarkStrategy.__init__(self, grammar, start=start)  # type: ignore
TypeError: __init__() missing 1 required positional argument: 'explicit'

Pinning to hypothesis 4.32.3 works around the problem.
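
As an alternative to pinning, a caller could inspect the constructor signature before calling it. The sketch below uses two hypothetical stand-in classes rather than the real `LarkStrategy`, and the `init_compat` helper is an assumption for illustration; only the parameter name `explicit` comes from the error message above.

```python
import inspect


class OldStrategy:
    """Hypothetical stand-in for the constructor without `explicit`."""

    def __init__(self, grammar, start=None):
        self.args = (grammar, start)


class NewStrategy:
    """Hypothetical stand-in for the constructor requiring `explicit`."""

    def __init__(self, grammar, start, explicit):
        self.args = (grammar, start, explicit)


def init_compat(cls, grammar, start, explicit):
    """Pass `explicit` only when the constructor accepts it."""
    params = inspect.signature(cls.__init__).parameters
    if "explicit" in params:
        return cls(grammar, start, explicit)
    return cls(grammar, start=start)
```

Signature inspection avoids hard-coding a version boundary, at the cost of depending on parameter names staying stable.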

Broken with `hypothesis==6.84.0`

Source: https://github.com/wemake-services/wemake-python-styleguide/blob/master/tests/test_checker/test_hypothesis.py
PR: wemake-services/wemake-python-styleguide#2729

==================================== ERRORS ====================================
____________ ERROR collecting tests/test_checker/test_hypothesis.py ____________
.venv/lib/python3.8/site-packages/_pytest/runner.py:341: in from_call
    result: Optional[TResult] = func()
.venv/lib/python3.8/site-packages/_pytest/runner.py:372: in <lambda>
    call = CallInfo.from_call(lambda: list(collector.collect()), "collect")
.venv/lib/python3.8/site-packages/_pytest/doctest.py:547: in collect
    module = import_path(
.venv/lib/python3.8/site-packages/_pytest/pathlib.py:567: in import_path
    importlib.import_module(module_name)
/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1014: in _gcd_import
    ???
<frozen importlib._bootstrap>:991: in _find_and_load
    ???
<frozen importlib._bootstrap>:975: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:671: in _load_unlocked
    ???
.venv/lib/python3.8/site-packages/_pytest/assertion/rewrite.py:178: in exec_module
    exec(co, module.__dict__)
tests/test_checker/test_hypothesis.py:19: in <module>
    import hypothesmith
.venv/lib/python3.8/site-packages/hypothesmith/__init__.py:3: in <module>
    from hypothesmith.cst import from_node
.venv/lib/python3.8/site-packages/hypothesmith/cst.py:24: in <module>
    from hypothesmith.syntactic import identifiers
.venv/lib/python3.8/site-packages/hypothesmith/syntactic.py:12: in <module>
    from hypothesis.internal.charmap import _union_intervals
E   ImportError: cannot import name '_union_intervals' from 'hypothesis.internal.charmap' (/home/runner/work/wemake-python-styleguide/wemake-python-styleguide/.venv/lib/python3.8/site-packages/hypothesis/internal/charmap.py)

Related commit: HypothesisWorks/hypothesis@26ffda9
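
One way to avoid breaking when a private helper disappears is to keep a small local equivalent. The sketch below assumes the helper merges two sequences of inclusive `(low, high)` integer intervals and joins adjacent ones; the name `union_intervals` and that exact behaviour are assumptions, not a copy of the Hypothesis internals.

```python
def union_intervals(x, y):
    """Merge two sequences of inclusive (low, high) integer intervals.

    Overlapping or directly adjacent intervals are combined, and the
    result is returned as a sorted tuple of tuples.
    """
    merged = []
    for low, high in sorted([*x, *y]):
        if merged and low <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return tuple(merged)
```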

AST-based program generation

Grammar-based generation works, and gives us syntactically valid source code.

The next step is to get semantically valid source code! Clearly the best approach for this is to generate a syntax tree and "unparse" it into source code. Based on experiments at the PyCon Australia sprints, the best AST to use is probably the one from lib2to3, and that will give us the unparsing for free via black.

After that, I'd like to go to a concrete syntax tree where we draw formatting information at the same time as the node. This would massively improve our usefulness for black, but it's a lot of extra work.

Generate programs matching a syntactic pattern... using `libcst` matchers

Sometimes you want to test code on some particular syntactic structure - nested context managers perhaps, or binary operators with comments between the clauses (both real examples!).

I was recently explaining to a friend that I couldn't work out how to generate these automatically, because it was really hard to come up with an expressive yet ergonomic way to specify "just enough" structure... and then immediately realized that LibCST Matchers provide exactly that, and we're already working with their node objects anyway for the from_node() strategy.

Let's make a from_matcher() strategy!

I don't even have a design sketch here, but while I expect the implementation to be quite a lot of work, it'll probably be more tedious than especially difficult (relative to, you know, state-of-the-art random program generation without leaning on a fantastic CST library).

Python3.9 related bug.

Please see: psf/black#1749

How to get hypothesmith to generate a FunctionDef with arguments?

I have been experimenting with this library for a bit for purposes of testing refactoring tools. I can generate lots of extremely weird identifiers for all kinds of things, but the structure of everything generated seems fairly simple - too simple.

For example, I am trying to generate function signatures. I can imagine many possibilities for the arguments: annotations, positional-only arguments, **kwargs, default arguments, and so on.

Here is my code to generate function definitions that have at least something in their argument brackets (run it with pytest's -s flag to show the prints):

import hypothesmith
from hypothesis import given, settings, HealthCheck, assume
import libcst
import re

s = hypothesmith.from_node(node=libcst.FunctionDef, auto_target=True)

@given(code = s)
@settings(suppress_health_check=[HealthCheck.filter_too_much, HealthCheck.too_slow], max_examples=5000)
def test(code):
    assume(re.search(r"\([^)(]+\)\s*:", code) is not None) # should match 'def foo(x):pass' but not 'def foo():pass'
    print("------")
    print(code)
    print("------")

This runs for a long time but finds absolutely nothing. What am I doing wrong? Or did I misunderstand the purpose of this library?
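
When a filter never passes, it can help to check the predicate on hand-written inputs first. The sketch below reuses the regular expression from the test above; the `has_arguments` helper name is introduced here for illustration.

```python
import re

# Same pattern as in the test: a non-empty, paren-free argument list
# followed directly (modulo whitespace) by a colon.
HAS_ARGS = re.compile(r"\([^)(]+\)\s*:")


def has_arguments(code):
    """Return True if the pattern finds a non-empty argument list."""
    return HAS_ARGS.search(code) is not None


print(has_arguments("def foo(x):pass"))
print(has_arguments("def foo():pass"))
print(has_arguments("def foo(x) -> int: pass"))
```

The pattern accepts the first input and rejects the second, as intended, but it also rejects the third because the return annotation sits between the closing bracket and the colon. This does not by itself explain an empty result, but it shows that the predicate is stricter than "has arguments".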

FailedHealthCheck when generating eval_input

When generating eval input I consistently get the following error:

hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 3 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls
See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.filter_too_much to the suppress_health_check settings for this test.

The test is really straightforward:

@hypothesis.given(hypothesmith.from_grammar(start="eval_input"))
def test_expressions(self, expression: str):
    expression = expression.strip()
    print(expression)
    self.assertTrue(expression)

Note: the error only appears in subsequent runs. The first run raises no error but is quite slow; whenever I re-run the tests, the error is triggered and the tests run much faster.

0.3.1: no git tag and no commits

According to https://github.com/Zac-HD/hypothesmith the latest version is 0.3.1.
However, the repository has no 0.3.1 commits and no version tags at all.

Please push the recent changes to the git repo and start adding version tags.
Without those tags it is really hard to figure out what has changed between versions 😞

Grammar definition file handling is broken

The very idea of fetching the grammar definition file at import time is broken:

hypothesmith/src/hypothesmith/syntactic.py

Lines 16 to 25 in 167d3f4

URL = "https://raw.githubusercontent.com/lark-parser/lark/master/lark/grammars/python.lark"
fname = Path(__file__).with_name(URL.split("/")[-1])

if fname.exists():
    with open(fname) as f:
        lark_grammar = f.read()
else:  # pragma: no cover
    # To update the grammar definition, delete the file and execute this.
    with urllib.request.urlopen(URL) as handle:
        lark_grammar = handle.read().decode()

It doesn't work offline (either physically or when network access is blocked, such as within sandboxed CI). It will break on transient network problems, if the file is moved within the lark repository, or if the repository itself is moved. It also doesn't work when the directory where the file is located is not writable (e.g. when hypothesmith is installed system-wide). hypothesmith should either install its own copy of the file, or point to the copy that should already be installed by lark.

Here's the patch for the latter solution, which I've used for the FreeBSD port:

--- src/hypothesmith/syntactic.py.orig	2022-11-26 03:56:51 UTC
+++ src/hypothesmith/syntactic.py
@@ -7,24 +7,20 @@ import urllib.request
 from functools import lru_cache
 from pathlib import Path
 
+import lark.grammars
 from hypothesis import assume, strategies as st
 from hypothesis.extra.lark import LarkStrategy
 from hypothesis.internal.charmap import _union_intervals
 from lark import Lark
 from lark.indenter import Indenter
 
-URL = "https://raw.githubusercontent.com/lark-parser/lark/master/lark/grammars/python.lark"
-fname = Path(__file__).with_name(URL.split("/")[-1])
+fname = Path(lark.grammars.__file__).with_name('python.lark')
 
 if fname.exists():
     with open(fname) as f:
         lark_grammar = f.read()
 else:  # pragma: no cover
-    # To update the grammar definition, delete the file and execute this.
-    with urllib.request.urlopen(URL) as handle:
-        lark_grammar = handle.read().decode()
-    with open(fname, "w") as f:
-        f.write(lark_grammar)
+    raise RuntimeError(f'Grammar definition file not found at {fname}')
 
 COMPILE_MODES = {
     "eval_input": "eval",

Use `from_regex(..., alphabet=st.characters(min_codepoint=1, codec="utf-8"))` to reduce rejection sampling

Only a subset of Unicode is permitted in Python source files, and thanks to new features in Hypothesis 6.84 we can finally express that by constraining our regex-based generation instead of rejection sampling with .filter(). This is probably a modest performance improvement, but it is easy to implement and the gains will grow as we fix other issues.
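
To make the two constraints in the title concrete, here is a plain-Python check of what `min_codepoint=1` and `codec="utf-8"` exclude. It is an illustration only, not how Hypothesis implements the alphabet, and the `allowed_in_source` name is made up for this sketch.

```python
def allowed_in_source(char):
    """True if `char` is not NUL and can be encoded as UTF-8."""
    if ord(char) < 1:
        # min_codepoint=1 rules out the NUL character.
        return False
    try:
        char.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates cannot be encoded as UTF-8.
        return False
    return True
```

Generating only from characters that pass such a check means no example has to be drawn and then thrown away for containing one of them.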

Distribute license and tests in pypi tarball

For distribution purposes we need the license to be shipped with the source code.
The tests would allow us to verify our Python stack against hypothesmith and see whether we break something.
An alternative would be to add tags here, so that we could fetch the tag tarball from GitHub.

hypothesmith-inspired compiler fuzzer used to validate PEP 709 implementation

Hi!

Thought you might be interested to know that I wrote (and @JelleZijlstra greatly improved) a compiler fuzzer to validate the implementation of PEP 709 (inlined comprehensions) in Python 3.12: https://github.com/carljm/compgenerator

I started out with Hypothesmith, but I wanted to constrain the generated examples a lot more, and as I started replacing Hypothesmith strategies one by one, I realized that it would be a lot simpler to work with AST rather than with LibCST. We're fuzzing the compiler, not the parser, so we don't care about syntactic trivia.

The resulting fuzzer works pretty well (although it's slow, because we filter too much; I haven't worked out yet how to better constrain generation so we don't have to filter so much), and it's caught, so far, I think five or six different bugs in the PEP 709 implementation (all now fixed.)

No particular action item here, feel free to close :)

I guess one possible thing to consider would be whether AST-based strategies would make any sense for Hypothesmith, for non-parser-focused uses.

libcst version requirement should be updated

With libcst 0.3.23, hypothesmith 0.2.1:

tests/test_cst.py:12: in <module>
    import hypothesmith
/usr/local/lib/python3.9/site-packages/hypothesmith/__init__.py:3: in <module>
    from hypothesmith.cst import from_node
/usr/local/lib/python3.9/site-packages/hypothesmith/cst.py:100: in <module>
    libcst.MatchSingleton,
E   AttributeError: module 'libcst' has no attribute 'MatchSingleton'

It looks like the newer version is required and the version requirement should be updated.

Invalid source code generated

Hi!

Thanks a lot for writing this! Today I stumbled upon this package in gforcada/flake8-builtins#46
and decided to implement the same check in my own project: wemake-services/wemake-python-styleguide#1080

It is really useful for linters and code quality tools!

I am not sure whether this is actually a bug, but it looks like the generated source code is not valid Python:

E         File "<unknown>", line 1
E           pass\;#
E                 ^
E       SyntaxError: unexpected character after line continuation character

I can get around this problem by rejecting code that is not valid:

from hypothesis import reject

try:
    compile(code_to_parse, '<filename>', 'exec')
except SyntaxError:
    # Discard this example instead of failing the test.
    reject()
