falsify's People

Contributors

edsko, ejconlon, pgujjula, voidus

falsify's Issues

example of how to use Test.Falsify.Predicate.at?

I'd like to be able to use P.at to supply some more useful string context rather than the Show'ed form. I can't see how it's possible to make up the Expr argument needed to pass to it, though. Have I missed something?

I guess I can always newtype up something with a more pleasant Show instance, but it'd be nice to be able to do it in a more localised way.
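
For reference, the newtype workaround would look roughly like the sketch below; Foo and its fields are hypothetical stand-ins for the real type:

-- Hypothetical record whose derived Show output is too noisy.
data Foo = Foo { fooName :: String, fooCount :: Int }

-- Wrapper whose Show instance provides the friendlier string context.
newtype PrettyFoo = PrettyFoo Foo

instance Show PrettyFoo where
  show (PrettyFoo (Foo name count)) = name ++ " (x" ++ show count ++ ")"

The value would then be wrapped in PrettyFoo before being handed to the predicate, which is exactly the non-local step I'd like to avoid.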

Test database a la Hypothesis?

Might it be possible to capture the state of the input to the test that failed? I know there's --falsify-replay for reproducibility, but ideally I'd be able to capture the exact way that a test failed, generate the input from that state, and save it to a test database as something to try quickly, rather than having it spend ten minutes each run grinding through finding and shrinking a complex test case.

Obviously it's possible to just capture the actual input outside the library, but presumably there's some internal value that fully determines the smallest test case that falsify could find, and it could be fed back in as something to try? I'm willing to have a bit of a hack at an implementation, but if it all boils down to the Word64 from prim, that would be far easier and more efficient to stash. (Obviously it would probably not be preserved as semantically interesting across versions of falsify.)

https://hypothesis.readthedocs.io/en/latest/database.html
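
As a rough illustration of the shape this could take, here is a minimal sketch of a Hypothesis-style failure database: one file per test name, storing whatever replay token falsify accepts (the directory layout and the function names here are hypothetical, not part of falsify):

import System.Directory (createDirectoryIfMissing, doesFileExist)
import System.FilePath ((</>))

-- Record the replay token (e.g. the string passed to --falsify-replay)
-- for a failing test, keyed by test name.
saveFailure :: FilePath -> String -> String -> IO ()
saveFailure dbDir testName replayToken = do
  createDirectoryIfMissing True dbDir
  writeFile (dbDir </> testName) replayToken

-- Look up a previously recorded failure, if any, so it can be tried first.
loadFailure :: FilePath -> String -> IO (Maybe String)
loadFailure dbDir testName = do
  let path = dbDir </> testName
  exists <- doesFileExist path
  if exists
    then Just <$> readFile path
    else pure Nothing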

Support progress reporting in tasty

instance IsTest Test where
  -- @tasty@ docs (1.4.3) explicitly say to ignore the @reportProgress@ argument
  run opts (Test testOpts prop) _reportProgress =

This will cease to be true in tasty-1.5. It would be great if falsify got updated to support progress reporting. You can try the latest tasty with the following cabal.project:

packages: .

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: core

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: hunit

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: quickcheck

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: smallcheck

source-repository-package
  type: git
  location: https://github.com/Bodigrim/tasty-bench

source-repository-package
  type: git
  location: https://github.com/ocharles/tasty-rerun

Cf. UnkindPartition/tasty#379

Introduce a uniform `signedFraction` generator

The paper describes a way to generate signed fractions (Section 5.3), but the resulting generator will not be uniform in the range. @brunjlar proposes the following approach instead:

I think I have a relatively elegant solution for the problem, using the "Probability Integral Transform": https://en.wikipedia.org/wiki/Probability_integral_transform

First, we need to determine the CDF of your construction, i.e. the CDF of taking the minimum of two uniformly distributed random variables from [0,1], but that's easy:

P(min X Y < a) = 1 - P(X >= a && Y >= a) = 1 - (1 - a)(1 - a) = 2a - a^2.

This also shows that I was right, and the minimum is NOT uniformly distributed. However, using the theorem from the above link, we get that

let m = min x y in 2 * m - m * m

IS uniformly distributed in [0,1]. Furthermore, f(a) = 2a - a^2 has derivative f'(a) = 2 - 2a, which is positive on [0,1), so the transformation is strictly monotonically increasing; this means that you still get the desired effect of being shrunk towards zero.

So to sum up, if I'm not mistaken, all you have to do is use your method from the library and the paper, and then, instead of taking min X Y directly, apply f(a) = 2a - a^2 to it.
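
Concretely, this could look roughly like the sketch below, assuming a primitive generator Gen.prim :: Gen Word64 as described in the paper (the conversion to Double and the module name are assumptions, not necessarily falsify's exact API):

import Data.Word (Word64)
import Test.Falsify.Generator (Gen)
import qualified Test.Falsify.Generator as Gen

-- Interpret one Word64 sample as a fraction in [0,1].
toFraction :: Word64 -> Double
toFraction w = fromIntegral w / fromIntegral (maxBound :: Word64)

-- Take the minimum of two samples (shrinks towards 0, but its CDF is
-- 2a - a^2), then apply the probability integral transform f(a) = 2a - a^2
-- to make the result uniform while preserving shrinking towards 0.
uniformFraction :: Gen Double
uniformFraction = do
    x <- toFraction <$> Gen.prim
    y <- toFraction <$> Gen.prim
    let m = min x y
    pure (2 * m - m * m)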

Figure out why the `StringToBool` test is so slow

It turns out that the variant function is called more than 3 million times in this test; that explains why #26 had such a dramatic impact. However, it's not entirely clear to me why it's called quite that often. It would be good to investigate, as it might reduce the need to solve #25.

Indexed Arbitrary typeclass

Hello,

Great library so far, thanks for publishing it!

Even with integrated shrinking libraries, I find it useful to define my own project-local Arbitrary typeclass with Generics support to build Gen values for boring records. I understand that Arbitrary is problematic in general because for some types there isn't just one way to make a generator, but what about an indexed variant, something like:

class Arbitrary p a where
  arbitrary :: Proxy p -> Gen a

This way projects can define their own local types for p and corresponding (non-orphan) instances as they see fit.
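
As an illustration, a project could then do something like the sketch below; MyProject, User, and the ranges are hypothetical, and the imports are my best guess at the relevant falsify modules:

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Proxy (Proxy (..))
import Test.Falsify.Generator (Gen)
import qualified Test.Falsify.Generator as Gen
import qualified Test.Falsify.Range as Range

class Arbitrary p a where
  arbitrary :: Proxy p -> Gen a

-- Project-local index type; instances written against it are never orphans.
data MyProject

data User = User { userAge :: Int, userScore :: Int }

instance Arbitrary MyProject User where
  arbitrary _ =
    User <$> Gen.inRange (Range.between (0, 120))
         <*> Gen.inRange (Range.between (0, 100))

-- Pick the project-local instance explicitly via the proxy.
genUser :: Gen User
genUser = arbitrary (Proxy :: Proxy MyProject)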

If it interests you, I would be happy to open a PR with the typeclass and Generic instances.

Release for GenDefault typeclass

Hi @edsko - what do you think about cutting a release to pick up the default typeclass changes? I would like to push a lib to hackage that would make use of them. Thanks!

IO in Propertys

Heya,

I'm thinking about how to test some code that needs IO to run.
The situations I'm thinking about are more like using STM internally or something; the possibility of IO making the property unreliable is not really my focus here.

I didn't find anything in the docs, so I looked into the code. Relevant definitions:

newtype TestResultT e m a = TestResultT {
      runTestResultT :: m (TestResult e a)
    }
newtype Property' e a = WrapProperty {
      unwrapProperty :: TestResultT e (StateT TestRun Gen) a
    }
newtype Gen a = Gen { runGen :: SampleTree -> (a, [SampleTree]) }

So there's no base IO behind Gen (like in hedgehog) that we could expose through MonadIO, even if we wanted to. And we don't:

-- NOTE: 'Gen' is /NOT/ an instance of 'Alternative'; this would not be
-- compatible with the generation of infinite data structures. For the same
-- reason, we do not have a monad transformer version of Gen either.

This leaves the question: Is there a way to do what I'm looking for?

I've tried:

  1. Changing TestResultT to m (IO (TestResult e a)), but I couldn't write the Monad instance, presumably because m isn't a monad transformer.
    Since TestResultT is applied to StateT TestRun Gen, requiring a transformer is out.
  2. I've reworked Predicate to basically run in IO, but got stuck when implementing assert, which is not a big surprise, I guess.

Right now, I can't really see a way to implement this at all (short of making Gen a transformer).

Maybe an interesting approach is to split the Property into a generation phase (without IO, with gen) and an assertion phase (with IO, without gen)? Seems annoying type-wise, but that would prevent the obvious footgun of depending on external input during test prep.
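
For concreteness, the shape I have in mind is something like the sketch below; this is not falsify API, just an illustration of the two phases:

import Test.Falsify.Generator (Gen)

-- Hypothetical two-phase property: a pure generation phase (no IO) followed
-- by an IO assertion phase (no access to generators).
data IOProperty a = IOProperty
  { generateInput :: Gen a
  , checkInput    :: a -> IO Bool
  }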

A generator for Text

Heya,

I think it would be nice to have a generator for Text values. (String would probably be good as well, but I think most use cases will prefer Text).

This will probably not be possible in an unopinionated way, but that doesn't feel like a good reason not to do it.

I've tried the following, which works but seems very slow. I've looked at the others (hedgehog, genvalidity, quickcheck) and tried to figure out something sensible.

My goals:

  • No further configuration (hedgehog requires passing a Gen Char to generate Text)
  • Shrink towards printable
  • Generate long values occasionally (Not sure if this is actually valuable)
  • Cover the whole range of unicode

This is what I came up with

genChar :: Gen Char
genChar =
    let
        charBetween a b = chr <$> Gen.int (between (a, b))
    in Gen.frequency
        [(1, charBetween 0x61 0x7A) -- a-z
        ,(1, charBetween 0x21 0x7E) -- printable ascii
        ,(1, foldl1 Gen.choose
                -- anything excluding noncharacters and invalid standalone surrogates
                -- (This is what hedgehog calls "unicode")
                [ charBetween (ord $ minBound @Char) 55295
                , charBetween 57344 65533
                , charBetween 65536 (ord $ maxBound @Char)
                ]
            )
        ,(1, Gen.choose
            -- UTF16Surrogate codepoints. GenValidity lists them separately,
            -- maybe they often cause problems?
            (charBetween 0xD800 0xDBFF)
            (charBetween 0xDC00 0xDFFF)
         )
        ]

genText :: Gen Text
genText = do
    len :: Range Word <- Gen.frequency
        [( 1, pure $ constant 0)
        ,(20, pure (skewedBy 10 (1, 1_000_000)))
        ]
    -- TextBuilder.run . mconcat . map TextBuilder.char
    toText
        <$> Gen.list len genChar

test_foo :: TestTree
test_foo = testProperty "it's not slow" do
    t <- gen genText
    collect "length" [T.length t]
    assert $ P.ne .$ ("", "something extremely specific") .$ ("t", t)

It's super slow though. The test takes about 10 seconds on my machine; here's a run:

Build profile: -w ghc-9.4.5 -O1
In order, the following will be built (use -v for more details):
 - package-0.1.0.0 (test:test) (file test/Guestlist/Handlers/PublicTest.hs changed)
Preprocessing test suite 'test' for package-0.1.0.0..
Building test suite 'test' for package-0.1.0.0..
[1 of 2] Compiling Module ( test/Module.hs, /home/path/dist-newstyle/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test-tmp/Module.o ) [Source file changed]
[2 of 2] Compiling Main             ( test/test.hs, /home/path/dist-newstyle/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test-tmp/Main.o ) [Module changed]
[3 of 3] Linking /home/path/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test [Objects changed]
test/test.hs
  it's not slow: OK (11.31s)
    100 successful tests
    
    Label "length":
        8.0000% 0
       33.0000% 1
        1.0000% 10235
        1.0000% 103
        1.0000% 11396
        1.0000% 114996
        1.0000% 12
        1.0000% 13
        1.0000% 1301
        1.0000% 1345
        1.0000% 137
        1.0000% 154
        2.0000% 16
        1.0000% 175151
        1.0000% 18640
        1.0000% 2
        1.0000% 20210
        1.0000% 2265
        1.0000% 2395
        1.0000% 25
        1.0000% 25015
        1.0000% 266042
        1.0000% 27241
        2.0000% 3
        1.0000% 30
        1.0000% 3192
        1.0000% 32
        1.0000% 3405
        1.0000% 3485
        1.0000% 3699
        1.0000% 37
        1.0000% 376
        1.0000% 37735
        1.0000% 390
        1.0000% 3955
        1.0000% 4114
        1.0000% 416
        1.0000% 4771
        1.0000% 47725
        1.0000% 483
        1.0000% 4844
        1.0000% 5118
        1.0000% 6
        1.0000% 6395
        1.0000% 64
        1.0000% 68
        1.0000% 7336
        1.0000% 765
        1.0000% 76819
        1.0000% 77191
        1.0000% 77225
        1.0000% 87304
        1.0000% 8739
        1.0000% 9
        1.0000% 90972
        1.0000% 931
        1.0000% 96
        1.0000% 97994
        1.0000% 98

All 1 tests passed (11.31s)

I had this for the length before:

genText :: Gen Text
genText = do
    len :: Range Word <- Gen.frequency
        [( 1, pure $ constant 0)
        ,(20, pure (skewedBy 10 (1, 1_000)))
        ,( 8, pure (skewedBy 10 (1_000, 1_000_000)))
        ]
    TextBuilder.run . mconcat . map TextBuilder.char <$> Gen.list len genChar

But the shrinker got stuck with the third option, ending up with a 1000-character counterexample, which sucks.

I'm still starting to figure out property testing, so if you think this is not going anywhere, let me know.

Think about generation of functions

Some references:

Pretty sure that at least promote is not definable, because it relies on capturing the PRNG. Would it make sense to have an additional constructor in Gen for this...?

data Gen a where
  Pure  :: a -> Gen a
  Prim  :: Prim a -> Gen a
  Bind  :: Gen a -> (a -> Gen b) -> Gen b
  STree :: Gen SampleTree

perhaps...? How would that then shrink...?

Improve separation of concerns between `Function` and `Perturb`

Currently when generating a Fun Char .., we need both Perturb Char and Function Char: the Perturb is necessary to generate the original random function, and Function is necessary to then reify this function. The strange thing is that when we then shrink the reified function, we might end up Perturbing at a different type (in this case, Int, because the Function instance for Char is defined in terms of the Function instance for Int).

This is confusing and annoying; for example, it might seem beneficial to improve the Perturb instance for Char so that common characters get a shorter Focus, but that would not really help: the instance for Char is only used for the initial value of the function, not during shrinking, so we would not actually get much benefit.

Feature request: Don't generate unused values (a la LazySmallcheck2012)

What is lazy generation

LazySmallcheck2012 (original, and my fork that works on newer GHCs) generates data only when the test forces its evaluation. (I just checked the original lazysmallcheck package, and it doesn't seem to behave this way.)

For example, consider this simple test in falsify:

prop_unneededElements :: Property ()
prop_unneededElements = do
  lst :: [Int] <- gen (Gen.list
                        (Range.between (0, 100))
                        (Gen.inRange (Range.between (0, 100))))
  if length lst > 3 && head lst < 10
     then testFailed "known failure, to check strictness"
     else return ()

This is asserting that every lst :: [Int] with more than three elements also has a first element that is at least 10. It fails (as expected):

Running 1 test suites...
Test suite falsify-test-test: RUNNING...
Playing
  Unneeded elements: FAIL (0.12s)
    failed after 28 successful tests and 5 shrinks
    known failure, to check strictness
    Logs for failed test run:
    generated [0,0,0,0] at CallStack (from HasCallStack):
      gen, called at test/Main.hs:18:19 in main:Main
    
    Use --falsify-replay=010985dabce2ff4eb5c78d1e70747b669b to replay.

1 out of 1 tests failed (0.13s)

Test suite falsify-test-test: FAIL

In this case falsify reports a minimal counterexample of [0, 0, 0, 0]. Whilst that's the smallest fully-normalised value, it is actually over-specified: the test never forces the last three elements, so they are irrelevant to the outcome (only the spine and the first element are relevant). If we compare this to the same check in LazySmallcheck2012 (in GHCi):

*LSC2012> test (\(lst :: [Int]) -> not (length lst > 3 && head lst < 10))
LSC: Depth 0:
LSC: Property holds after 3 tests.
LSC: Depth 1:
LSC: Property holds after 5 tests.
LSC: Depth 2:
LSC: Property holds after 7 tests.
LSC: Depth 3:
LSC: Property holds after 9 tests.
LSC: Depth 4:
LSC: Counterexample found after 12 tests.
Var 0: -3:_:_:_:[]
*** Exception: ExitFailure 1

It gives -3:_:_:_:[], AKA [-3, _, _, _]. Here _ indicates a value which was never forced, so never ran its generator, and is hence irrelevant to the cause of the test failure. Unfortunately in this case the first value (-3) isn't minimal, since (Lazy)Smallcheck doesn't do any shrinking; the ideal counterexample would be [0, _, _, _].

Why lazy generation

There are a few reasons this behaviour is nicer, as a user, than seeing fully normalised counterexamples:

  • Minimal values, like 0, are ambiguous: sometimes it means they can be ignored, but sometimes there is significance to that value. Using _ for the former removes some ambiguity.
  • If we expect a value to be important for branching, seeing it appear as _ indicates a problem with our control flow.
  • Using _ instead of a particular value avoids accidental coincidences that we can waste time ruling out, e.g. [0,0,0,0] contains duplicate elements, but that's just a coincidence and isn't related to the failure.
  • More properties can be proved correct. Those with small, finite domains like Bool can already be proved by exhaustion; lazy generators complement this, since a property which passes without forcing some part of its input must therefore be true for all values of that part.

There may be performance benefits from avoiding a bunch of generator runs, but I haven't found that particularly noticeable.

How LazySmallcheck does it

As far as I know, LazySmallcheck2012 works by starting with all inputs as exceptions, and retrying a test with more refined inputs if any of those exceptions are thrown (until some depth limit is reached). In the above example it might have run the property on an input like throw A, which causes the exception A to be thrown; so it ran again with [] (which passed) and throw A : throw B (which throws B). The latter is refined to throw A : [] (which passed) and throw A : throw B : throw C (which throws C) and so on until throw A : throw B : throw C : throw D : [] finally satisfies the length check, and causes the head check to throw A; causing it to be retried with an input like -3 : throw A : throw B : throw C : [] which finally causes the test to fail, and that is our counterexample (with throw X shown as _).
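
The core trick can be sketched in a few standalone lines (this is an illustration, not LazySmallcheck2012's actual code): run the property on a partially-defined input in which every hole is a tagged exception, and catch that exception to learn which hole the test forced first.

import Control.Exception (Exception, evaluate, throw, try)

-- A tagged exception standing in for an as-yet-ungenerated value.
newtype Hole = Hole Int deriving Show
instance Exception Hole

-- Run a property on a partially-defined input. Left names the hole that was
-- forced (and should be refined next); Right is the property's verdict.
runPartial :: ([Int] -> Bool) -> [Int] -> IO (Either Hole Bool)
runPartial prop input = try (evaluate (prop input))

-- A spine of n elements, each an unforced hole.
holes :: Int -> [Int]
holes n = [throw (Hole i) | i <- [0 .. n - 1]]

-- For the example property from earlier:
--   runPartial (\xs -> not (length xs > 3 && head xs < 10)) (holes 2)
--     ==> Right True    (the head is never forced)
--   runPartial (\xs -> not (length xs > 3 && head xs < 10)) (holes 4)
--     ==> Left (Hole 0) (refine the first element next)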

How falsify might do it

Reading through https://well-typed.com/blog/2023/04/falsify/ it appears that falsify's sample trees could be used for lazy generation. From my understanding, each generator gets its own sub-tree; each node contains a PRNG sample which a generator can use directly; and calls to other generators can be given the child subtrees. It seems to me that we can tell which parts of the input were forced by a test by keeping track of which nodes of the sample tree have been forced.

Unfortunately I can't think of an elegant, pure way to do this. We could wrap each node in a thunk which has the side-effect of inserting an ID in a set, then freeze a copy of the set after running the test, to see which nodes were forced. Ugly, but workable.
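
A minimal sketch of that idea, with the node IDs and the instrumentation entirely hypothetical:

import Data.IORef (IORef, atomicModifyIORef', readIORef)
import qualified Data.Set as Set
import System.IO.Unsafe (unsafePerformIO)

-- Wrap a value in a thunk that records the node ID as a side effect when
-- (and only when) the value is forced.
trackForced :: IORef (Set.Set Int) -> Int -> a -> a
trackForced ref nodeId x = unsafePerformIO $ do
  atomicModifyIORef' ref (\s -> (Set.insert nodeId s, ()))
  pure x
{-# NOINLINE trackForced #-}

-- After running the property against the instrumented input, read off which
-- sample-tree nodes were ever demanded.
forcedNodes :: IORef (Set.Set Int) -> IO (Set.Set Int)
forcedNodes = readIORef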

Once we know which parts of the sample tree were used/unused, we need to map that over to actual counterexamples, which would require running the generator again. We'd need a pretty-printing mechanism that's similar to LazySmallcheck's, i.e. capable of inserting a _ in place of undefined values (e.g. by replacing the unused sample nodes with such printers, or exceptions, or something else).

I'm sure someone could think of a more elegant approach than this, but I at least think it's (a) desirable behaviour and (b) plausible to implement, given the way (I think) falsify works.

Memoization

We have the withoutShrinking combinator for generators that we want to run only once against a SampleTree. Unfortunately, we currently have to re-run those generators every single time, which leads to pretty bad performance; this is most notable in tests that depend on fromShrinkTree; see the TestSuite.Sanity.Functions.StringToBool test for a particularly bad example.

It feels like it should be relatively simple to memoize these results. For example, we could imagine that shrinking could return new generators, such that if we have

data Gen a where
  ..
  Once :: (SampleTree -> (a, SampleTree)) -> Gen a

then

shrink (Once f) st = first Pure $ f st

replacing the generator by a constant one that just returns the value it produces. Unfortunately, this does not work: we cannot return new generators, because generators are monadic (how would we update a continuation a -> Gen b?).

The only state we have is in the SampleTree. Now, we could of course do something like this:

data Sample = .. | NotShrunk Word64 (Maybe Dynamic)

where we cache the constructed value right in the same tree. This too would make sense, but it's unclear how this works with context re-interpretation. For example, what if we have a generator such as

example :: Gen ..
example = do
    b <- bool
    if b then once ..
         else once ..

We now run two different generators against the same sample tree; surely we don't expect to get the result from one of those generators in the other one? That would be very confusing (consider the case where one generator is prim, and the other (* 2) <$> prim or something like that; it would be very strange if we got an odd number out of that generator).

Add to Stackage?

It would be useful to have this package in Stackage. Are there any plans to do so?

Feature Request: Pure interface for Gen

Hi,

First of all, thank you for your great library and great expositions! This library is really cool indeed.

Feature Request: Pure interface for Gen

Currently, there is an impure interface to sample a value from Gen:

sample :: Gen a -> IO a

It would also be nice to have a pure analogue for this. The most straightforward one could be:

sampleFrom :: SMGen -> Gen a -> a

Unfortunately, this leaks an implementation detail. So perhaps it could be something like this:

data GenSource -- opaque (could be just a newtype wrapper around SMGen or SampleTree)
mkGenSource :: Word64 -> GenSource
sampleFrom :: GenSource -> Gen a -> a

Background

Recently, I have been using falsify to test the validity of transformations on propositional formulae.
The tests worked well, and I eventually had several valid implementations with different performance characteristics, so I decided to benchmark them.

To do that, I have to generate random formulae of a specific size. Surely, we can do this with sample :: Gen a -> IO a. One downside of the sample function is the lack of reproducibility: since sample initialises SMGen out of thin air on every invocation, we don't have any access to the seed it used.
Once we have sampled some formulae from Gen, we must save them to disk. This becomes tedious when one needs multiple formulae. If we could sample a value from a specified seed, we could just hard-code the seed instead.
