well-typed / falsify
License: Other
I'd like to be able to use P.at to supply some more useful string context rather than the Show'ed form. I can't see how it's possible to make up the Expr argument needed to pass to it, though. Have I missed something?
I guess I can always newtype up something with a more pleasant Show instance, but it'd be nice to be able to do it in a more localised way.
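For what it's worth, the newtype workaround is straightforward; a minimal sketch (UserId and PrettyUserId are made-up names, purely for illustration):

```haskell
-- Hypothetical domain type whose derived Show is too noisy in
-- counterexample output.
newtype UserId = UserId Int
  deriving Show

-- Test-local wrapper with a friendlier rendering; tests generate
-- PrettyUserId values so failures print the pleasant form.
newtype PrettyUserId = PrettyUserId UserId

instance Show PrettyUserId where
  show (PrettyUserId (UserId n)) = "user #" ++ show n
```

The annoyance, as noted above, is that this lives at the type level rather than being a localised tweak at the use site.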
Might it be possible to capture the state of the input to the test that failed? I know there's --falsify-replay for reproducibility, but ideally, I'd be able to capture the exact way that a test failed, generate the input off that state, and save it to a test database as something to try quickly, rather than have it spend ten minutes each time grinding through finding and shrinking a complex test case.
Obviously it's possible to just capture the actual input outside the library, but presumably there's some internal value that fully determines the smallest test case that falsify could find, and it could be fed back in as something to try? I'm willing to have a bit of a hack at an implementation, but if it all boils down to the Word64 from prim, that would be far easier and more efficient to stash. (Obviously it would probably not be preserved as semantically interesting across versions of falsify.)
falsify/lib/src/Test/Falsify/Internal/Driver/Tasty.hs
Lines 67 to 69 in f490150
This will cease to be true in tasty-1.5. It would be great if falsify gets updated to support progress reporting. You can try the latest tasty with the following cabal.project:
packages: .

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: core

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: hunit

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: quickcheck

source-repository-package
  type: git
  location: https://github.com/UnkindPartition/tasty
  subdir: smallcheck

source-repository-package
  type: git
  location: https://github.com/Bodigrim/tasty-bench

source-repository-package
  type: git
  location: https://github.com/ocharles/tasty-rerun
The paper describes a way to generate signed fractions (Section 5.3), but the resulting generator will not be uniform in the range. @brunjlar proposes the following approach instead:
I think I have a relatively elegant solution for the problem, using the "Probability Integral Transform": https://en.wikipedia.org/wiki/Probability_integral_transform
First, we need to determine the CDF of your construction, i.e. the CDF of taking the minimum of two uniformly distributed random variables on [0,1], but that's easy:
P(min X Y < a) = 1 - P(X >= a && Y >= a) = 1 - (1 - a)(1 - a) = 2a - a^2.
This also shows that I was right, and the minimum is NOT uniformly distributed. However, using the theorem from the link above, we get that
let m = min x y in 2 * m - m * m
IS uniformly distributed in [0,1]. Furthermore, the derivative of f(a) = 2a - a^2 is f'(a) = 2 - 2a, which is positive on [0,1), so this transformation is strictly monotonically increasing, which means that you still get the desired effect of being shrunk towards zero.
So to sum up, if I'm not mistaken, all you have to do is use your method from the library and the paper and then - instead of taking min X Y directly - apply f(a) = 2a - a^2 to it.
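The algebra is easy to sanity-check in plain Haskell, independent of any falsify API (uniformFromMin is a made-up name):

```haskell
-- f(a) = 2a - a^2 is the CDF of min X Y for X, Y uniform on [0,1];
-- applying it to the minimum yields a uniform value again, while the
-- strictly increasing f preserves shrinking towards zero.
uniformFromMin :: Double -> Double -> Double
uniformFromMin x y = let m = min x y in 2 * m - m * m
```

Note that f(0) = 0 and f(1) = 1, so the endpoints are fixed, and f'(a) = 2 - 2a > 0 on [0,1) gives the monotonicity.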
Hello,
Great library so far, thanks for publishing it!
Even with integrated-shrinking libraries, I find it useful to define my own project-local Arbitrary typeclass with Generics support to build Gen values for boring records. I understand that Arbitrary is problematic in general because for some types there isn't just one way to make a generator, but what about an indexed variant, something like:
class Arbitrary p a where
  arbitrary :: Proxy p -> Gen a
This way projects can define their own local types for p and corresponding (non-orphan) instances as they see fit.
If it interests you, I would be happy to open a PR with the typeclass and Generic instances.
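A sketch of the proposal from the consumer's side; Gen here is a stand-in newtype just so the fragment is self-contained (the real one would be falsify's), and MyProject is a hypothetical project-local index:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Proxy (Proxy (..))

-- Stand-in for falsify's Gen, only to make this sketch compile on its own.
newtype Gen a = Gen { runGen :: Int -> a }

-- The proposed indexed class: the index p picks which "flavour" of
-- generator a project wants for a given type.
class Arbitrary p a where
  arbitrary :: Proxy p -> Gen a

-- A project-local index type; instances against it are never orphans.
data MyProject

instance Arbitrary MyProject Int where
  arbitrary _ = Gen (`mod` 100)
```

Because p is project-local, two projects can give the same type different generators without the instances clashing.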
I think this can result in important performance gains. See the Compound.list property.
Hi @edsko - what do you think about cutting a release to pick up the default typeclass changes? I would like to push a lib to hackage that would make use of them. Thanks!
Heya,
I'm thinking about how to test some code that needs IO to run.
The situations I'm thinking about are more like using STM internally or something, the possibility of IO making the property unreliable is not really my focus here.
I didn't find anything in the docs, so I looked into the code. Relevant definitions:
newtype TestResultT e m a = TestResultT {
    runTestResultT :: m (TestResult e a)
  }

newtype Property' e a = WrapProperty {
    unwrapProperty :: TestResultT e (StateT TestRun Gen) a
  }

newtype Gen a = Gen { runGen :: SampleTree -> (a, [SampleTree]) }
So there's no base IO behind Gen (like in hedgehog) that we could expose through MonadIO, even if we wanted to. And we don't:
-- NOTE: 'Gen' is /NOT/ an instance of 'Alternative'; this would not be
-- compatible with the generation of infinite data structures. For the same
-- reason, we do not have a monad transformer version of Gen either.
This leaves the question: Is there a way to do what I'm looking for?
I've tried:
- Changing TestResultT to m (IO (TestResult e a)), but couldn't write the monad instance, presumably because m isn't a monad transformer.
- TestResultT is applied to StateT TestRun Gen, so requiring a transformer is out.
- ... assert, which is not a big surprise I guess.
Right now, I can't really see a way to implement this at all (short of making Gen a transformer).
Maybe an interesting approach is to split the Property into a generation phase (without IO, with gen) and an assertion phase (with IO, without gen)? Seems annoying type-wise, but that would prevent the obvious footgun of depending on external input during test prep.
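The two-phase split can be prototyped with toy types to see whether the ergonomics are bearable; none of these names exist in falsify, and a seed-indexed function stands in for Gen:

```haskell
-- Phase 1: pure generation (a function of a seed, standing in for Gen).
-- Phase 2: an assertion that is free to use IO.
data IOProperty a = IOProperty
  { generate :: Int -> a       -- no IO possible during generation
  , assertIO :: a -> IO Bool   -- IO allowed only when checking
  }

-- Running one case: generate purely, then check effectfully.
runIOProperty :: Int -> IOProperty a -> IO Bool
runIOProperty seed p = assertIO p (generate p seed)
```

Because generate is pure, shrinking and replay stay deterministic; the footgun of reading external input during test prep is ruled out by the types.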
Heya,
I think it would be nice to have a generator for Text values. (String would probably be good as well, but I think most use cases will prefer Text.)
This will probably not be possible in an unopinionated way, but that doesn't feel like a good reason not to do it.
I've tried the following, which works but seems very slow. I've looked at the others (hedgehog, genvalidity, quickcheck) and tried to figure out something sensible.
My goals:
- ... (use a Gen Char to generate Text)
This is what I came up with:
genChar :: Gen Char
genChar =
  let
    charBetween a b = chr <$> Gen.int (between (a, b))
  in
    Gen.frequency
      [ (1, charBetween 0x61 0x7A)  -- a-z
      , (1, charBetween 0x21 0x7E)  -- printable ascii
      , (1, foldl1 Gen.choose
          -- anything excluding noncharacters and invalid standalone surrogates
          -- (This is what hedgehog calls "unicode")
          [ charBetween (ord $ minBound @Char) 55295
          , charBetween 57344 65533
          , charBetween 65536 (ord $ maxBound @Char)
          ])
      , (1, Gen.choose
          -- UTF16Surrogate codepoints. GenValidity lists them separately,
          -- maybe they often cause problems?
          (charBetween 0xD800 0xDBFF)
          (charBetween 0xDC00 0xDFFF))
      ]
genText :: Gen Text
genText = do
  len :: Range Word <- Gen.frequency
    [ ( 1, pure $ constant 0)
    , (20, pure (skewedBy 10 (1, 1_000_000)))
    ]
  -- TextBuilder.run . mconcat . map TextBuilder.char
  toText <$> Gen.list len genChar
test_foo :: TestTree
test_foo = testProperty "it's not slow" do
  t <- gen genText
  collect "length" [T.length t]
  assert $ P.ne .$ ("", "something extremely specific") .$ ("t", t)
It's super slow though. The test takes about 10 seconds on my machine, here's a run:
Build profile: -w ghc-9.4.5 -O1
In order, the following will be built (use -v for more details):
- package-0.1.0.0 (test:test) (file test/Guestlist/Handlers/PublicTest.hs changed)
Preprocessing test suite 'test' for package-0.1.0.0..
Building test suite 'test' for package-0.1.0.0..
[1 of 2] Compiling Module ( test/Module.hs, /home/path/dist-newstyle/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test-tmp/Module.o ) [Source file changed]
[2 of 2] Compiling Main ( test/test.hs, /home/path/dist-newstyle/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test-tmp/Main.o ) [Module changed]
[3 of 3] Linking /home/path/build/x86_64-linux/ghc-9.4.5/package-0.1.0.0/t/test/build/test/test [Objects changed]
test/test.hs
it's not slow: OK (11.31s)
100 successful tests
Label "length":
8.0000% 0
33.0000% 1
1.0000% 10235
1.0000% 103
1.0000% 11396
1.0000% 114996
1.0000% 12
1.0000% 13
1.0000% 1301
1.0000% 1345
1.0000% 137
1.0000% 154
2.0000% 16
1.0000% 175151
1.0000% 18640
1.0000% 2
1.0000% 20210
1.0000% 2265
1.0000% 2395
1.0000% 25
1.0000% 25015
1.0000% 266042
1.0000% 27241
2.0000% 3
1.0000% 30
1.0000% 3192
1.0000% 32
1.0000% 3405
1.0000% 3485
1.0000% 3699
1.0000% 37
1.0000% 376
1.0000% 37735
1.0000% 390
1.0000% 3955
1.0000% 4114
1.0000% 416
1.0000% 4771
1.0000% 47725
1.0000% 483
1.0000% 4844
1.0000% 5118
1.0000% 6
1.0000% 6395
1.0000% 64
1.0000% 68
1.0000% 7336
1.0000% 765
1.0000% 76819
1.0000% 77191
1.0000% 77225
1.0000% 87304
1.0000% 8739
1.0000% 9
1.0000% 90972
1.0000% 931
1.0000% 96
1.0000% 97994
1.0000% 98
All 1 tests passed (11.31s)
I had this for the length before:
genText :: Gen Text
genText = do
  len :: Range Word <- Gen.frequency
    [ ( 1, pure $ constant 0)
    , (20, pure (skewedBy 10 (1, 1_000)))
    , ( 8, pure (skewedBy 10 (1_000, 1_000_000)))
    ]
  TextBuilder.run . mconcat . map TextBuilder.char <$> Gen.list len genChar
But the shrinker got stuck with the third option, ending up with a 1000 char counterexample, which sucks.
I'm still just starting to figure out property testing, so if you think this is not going anywhere, let me know.
Some references:
- ... (The Argument class in the presentation is called Function in modern QuickCheck.)

Pretty sure that at least promote is not definable, because it relies on capturing the PRNG. Would it make sense to have an additional constructor in Gen for this...?
data Gen a where
  Pure  :: a -> Gen a
  Prim  :: Prim a -> Gen a
  Bind  :: Gen a -> (a -> Gen b) -> Gen b
  STree :: Gen SampleTree
perhaps...? How would that then shrink...?
Currently when generating a Fun Char .., we need both Perturb Char and Function Char: the Perturb is necessary to generate the original random function, and Function is necessary to then reify this function. The strange thing is that when we then shrink the reified function, we might end up Perturbing at a different type (in this case, Int, because the Function instance for Char is defined in terms of the Function instance for Int).
This is confusing and annoying; for example, it might be beneficial to improve the Perturb instance for Char so that common characters get a shorter Focus; but that's not possible, since the instance for Char is only used for the initial value of the function, not during shrinking, so we're not actually getting much benefit.
LazySmallcheck2012 (original, and my fork that works on newer GHCs) behaves in a way where data is only generated when the test forces its evaluation. (I just checked the original lazysmallcheck package, and it doesn't seem to behave this way.)
For example, consider this simple test in falsify:
prop_unneededElements :: Property ()
prop_unneededElements = do
  lst :: [Int] <- gen (Gen.list
    (Range.between (0, 100))
    (Gen.inRange (Range.between (0, 100))))
  if length lst > 3 && head lst < 10
    then testFailed "known failure, to check strictness"
    else return ()
This is asserting that every lst :: [Int] with more than three elements also has a first element that is at least 10. It fails (as expected):
Running 1 test suites...
Test suite falsify-test-test: RUNNING...
Playing
Unneeded elements: FAIL (0.12s)
failed after 28 successful tests and 5 shrinks
known failure, to check strictness
Logs for failed test run:
generated [0,0,0,0] at CallStack (from HasCallStack):
gen, called at test/Main.hs:18:19 in main:Main
Use --falsify-replay=010985dabce2ff4eb5c78d1e70747b669b to replay.
1 out of 1 tests failed (0.13s)
Test suite falsify-test-test: FAIL
In this case falsify reports a minimal counterexample of [0, 0, 0, 0]. Whilst that's the smallest fully-normalised value, it is actually over-specified: the test never forces the last three elements, so they are irrelevant to the outcome (only the spine and the first element are relevant). If we compare this to the same check in LazySmallcheck2012 (in GHCi):
*LSC2012> test (\(lst :: [Int]) -> not (length lst > 3 && head lst < 10))
LSC: Depth 0:
LSC: Property holds after 3 tests.
LSC: Depth 1:
LSC: Property holds after 5 tests.
LSC: Depth 2:
LSC: Property holds after 7 tests.
LSC: Depth 3:
LSC: Property holds after 9 tests.
LSC: Depth 4:
LSC: Counterexample found after 12 tests.
Var 0: -3:_:_:_:[]
*** Exception: ExitFailure 1
It gives -3:_:_:_:[], AKA [-3, _, _, _]. Here _ indicates a value which was never forced, so never ran its generator, and is hence irrelevant to the cause of the test failure. Unfortunately in this case the first value (-3) isn't minimal, since (Lazy)Smallcheck doesn't do any shrinking; the ideal counterexample would be [0, _, _, _].
There are a few reasons this behaviour is nicer, as a user, than seeing fully normalised counterexamples:
- Default-looking values like 0 are ambiguous: sometimes it means they can be ignored, but sometimes there is significance to that value. Using _ for the former removes some ambiguity.
- An unexpected _ indicates a problem with our control flow.
- Using _ instead of a particular value avoids accidental coincidences that we can waste time ruling out, e.g. [0,0,0,0] contains duplicate elements, but that's just a coincidence and isn't related to the failure.
- Small types like Bool can already be proved by exhaustion; lazy generators complement this, since a property which passes without forcing some part of its input must therefore be true for all values of that part.
- There may be performance benefits, by avoiding a bunch of generators running; but I haven't found that particularly noticeable.
As far as I know, LazySmallcheck2012 works by starting with all inputs as exceptions, and retrying a test with more refined inputs if any of those exceptions are thrown (until some depth limit is reached). In the above example it might have run the property on an input like throw A, which causes the exception A to be thrown; so it ran again with [] (which passed) and throw A : throw B (which throws B). The latter is refined to throw A : [] (which passed) and throw A : throw B : throw C (which throws C) and so on until throw A : throw B : throw C : throw D : [] finally satisfies the length check, and causes the head check to throw A; causing it to be retried with an input like -3 : throw A : throw B : throw C : [] which finally causes the test to fail, and that is our counterexample (with throw X shown as _).
Reading through https://well-typed.com/blog/2023/04/falsify/ it appears like falsify's sample trees could be used for lazy generation. From my understanding, each generator gets its own sub-tree; each node contains a PRNG sample which a generator can use directly; and calls to other generators can be given the child subtrees. It seems to me that we can tell which parts of the input were forced by a test, by keeping track of which nodes of the sample tree have been forced.
Unfortunately I can't think of an elegant, pure way to do this. We could wrap each node in a thunk which has the side-effect of inserting an ID in a set, then freeze a copy of the set after running the test, to see which nodes were forced. Ugly, but workable.
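The ugly-but-workable version can be sketched directly; tracked and forcedIds are made-up names, and the global IORef models instrumenting sample-tree nodes:

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Set as Set
import System.IO.Unsafe (unsafePerformIO)

-- Global record of which node IDs have been forced so far.
forcedIds :: IORef (Set.Set Int)
forcedIds = unsafePerformIO (newIORef Set.empty)
{-# NOINLINE forcedIds #-}

-- Wrap a value in a thunk that, when forced to WHNF, logs its ID
-- as a side effect before yielding the value.
tracked :: Int -> a -> a
tracked i x =
  unsafePerformIO (modifyIORef' forcedIds (Set.insert i)) `seq` x
{-# NOINLINE tracked #-}
```

After the test runs against tracked inputs, reading forcedIds tells us which nodes were actually demanded; anything absent from the set could be rendered as _.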
Once we know which parts of the sample tree were used/unused, we need to map that over to actual counterexamples, which would require running the generator again. We'd need a pretty-printing mechanism that's similar to LazySmallcheck's, i.e. capable of inserting a _ in place of undefined values (e.g. by replacing the unused sample nodes with such printers, or exceptions, or something else).
I'm sure someone could think of a more elegant approach than this, but I at least think it's (a) desirable behaviour and (b) plausible to implement, given the way (I think) falsify works.
We have the withoutShrinking combinator that marks generators that we want to run only once against a SampleTree. Unfortunately, we currently have to re-run those generators every single time they run, which leads to pretty bad performance; this is most notable in tests that depend on fromShrinkTree; see the TestSuite.Sanity.Functions.StringToBool test for a particularly bad example.
It feels like it should be relatively simple to memoize these results. For example, we could imagine that shrinking could return new generators, such that if we have
data Gen a where
  ..
  Once :: (SampleTree -> (a, SampleTree)) -> Gen a
then

shrink (Once f) st = first Pure $ f st

replacing the generator by a constant one that just returns the value it produces. Unfortunately, this does not work: we cannot return new generators, because generators are monadic (how would we update a continuation a -> Gen b?).
The only state we have is in the SampleTree. Now, we could of course do something like this:

data Sample = .. | NotShrunk Word64 (Maybe Dynamic)

where we cache the constructed value right in the same tree. This too would make sense, but it's unclear how this works with context re-interpretation. For example, what if we have a generator such as
example :: Gen ..
example = do
  b <- bool
  if b then once ..
       else once ..
We now run two different generators against the same sample tree; surely we don't expect to get the result from one of those generators in the other one? That would be very confusing (consider the case where one generator is prim, and the other (* 2) <$> prim or something like that; it would be very strange if we got an odd number out of that generator).
We introduced bst in #29, but can probably also use it in list generation: this would enable us to drop entire parts of the list in one go.
Would be useful to have this package in stackage. Are there any plans to do so?
Hi,
First of all, thank you for your great library and great expositions! This library is really cool indeed.
Currently, there is an impure interface to sample a value from Gen:

sample :: Gen a -> IO a
It would also be nice to have a pure analogue for this. The most straightforward one could be:
sampleFrom :: SMGen -> Gen a -> a
Unfortunately, this leaks the implementation detail. So perhaps it could be something like this:
data GenSource -- opaque (could be just a newtype wrapper around SMGen or SampleTree)
mkGenSource :: Word64 -> GenSource
sampleFrom :: GenSource -> Gen a -> a
Recently, I used falsify to test the validity of transformations on propositional formulae.
The tests worked well, and I eventually had several valid implementations with different performance characteristics. So I decided to run some benchmarks on them.
To do that, I have to generate some random formulae of a specific size. Surely, we can do this with sample :: Gen a -> IO a. One downside of the sample function is the lack of reproducibility: as sample initialises SMGen out of thin air on every invocation, we don't have any access to the seed it used.
Once we have sampled some formulae from Gen, we must save them to disk. This becomes tedious when one needs multiple formulae. If we could sample a value from a specified seed, then we could just hard-code the seed instead.