stedolan / crowbar Goto Github PK
View Code? Open in Web Editor NEWProperty fuzzing for OCaml
License: MIT License
Property fuzzing for OCaml
License: MIT License
I'm very much looking forward to using this library across quite a few projects! The first one I have in mind is profuse, where the central data structure has some interesting invariants, fairly straightforward to express, but tricky to check. Crowbar seems like an ideal tool for such cases. I'm also looking forward to using it in a serialisation/rpc library under development at the moment.
In trying to understand the design of crowbar, I've made a few notes about things that it might be useful to express slightly differently. These are currently just suggestions, sometimes quite rough, but if you like the sound of any of them, I'm happy to turn them into pull requests. This first issue is mostly about implementation, but I think it spills over into the interface a bit, too.
The central gen
type is currently defined as a datatype, with constructors for lists, options, non-empty lists, etc. There are sometimes reasons to be cautious about defining things this way, so I'd like to suggest exploring an alternative approach.
Since there's essentially only one operation defined over the gen
type, it should be straightforward to define it in a more compositional/extensible style. (This is really just the old distinction between oo-style, which makes it easy to add new types and hard to add new operations, and typed-fp-style, which makes it hard to add new types and easy to add new operations.)
In other words, rather than a definition of this form
type 'a gen =
...
| Option : 'a gen -> 'a option gen
| List : 'a gen -> 'a list gen
| List1 : 'a gen -> 'a list gen
...
along with a function like this, where generation is defined by case analysis:
let rec generate : type a . int -> state -> a gen -> a * unit printer =
fun size input gen -> match gen with
(* ... *)
| Option gen ->
if read_bool input then
let v, pv = generate size input gen in
Some v, fun ppf () -> pp ppf "Some (%a)" pv ()
else
None, fun ppf () -> pp ppf "None"
| List gen ->
(* etc. *)
| List1 gen ->
(* etc. *)
I think it'd be worth considering a definition more like this:
type +'a gen = bool stream -> bool stream * 'a
along with a separate function for each constructor
let option = (* ... *)
let list = (* ... *)
There are three main benefits to doing things this way.
First, it's easy to add new constructors, either inside or outside the library, since list
and option
are just functions with no special status rather than constructors of a distinguished types.
Second, it keeps things compositional. With a datatype definition it's always tempting to do case analysis on subvalues, which tends to lead to uneven behaviour. (Indeed, the current definition of the generate
function does inspect subvalues in at least the Choose
case.) With the separate-function design it's essentially impossible to analyse subvalues, so the semantics is almost certain to end up cleaner / more "denotational".
Third, it keeps the core small. With the separate-function design the core can be tiny (see below), which makes it easier to make interesting changes to it, and easier to ensure that the core is correct. It's also a useful test of the design: if functions like list
and option
can easily be built on top of the core then it's likely that almost any other user-defined function can be built that way, too; on the other hand, if you start with a larger core, with everything you need provided as a built in then there's nothing to ensure that the langauge will be easy to extend outside the library, and it's more likely that you'll have to extend the core with additional operations later.
Here's a sketch of such a design. (It's entirely possible that I've misunderstood some details -- if so, I'm keen to understand which -- but I expect the general idea can be made to work.)
The central idea is generating data from binary input, so it's natural to express the core as a monad with a single operation that reads and returns one bit, like this:
module Generate :
sig
type +'a t
val return : 'a -> 'a t
val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
val bit : bool t
val run : 'a t -> bool stream -> bool stream * 'a
end
Then everything else can be built on top of the monad operations plus bit
, and so lives outside of the core. (I think it's possible to implement this interface fairly efficiently, but it's possible that operations that read multiple bits at a go might also be useful.)
The stream
here is intended to be functional, where reading an element doesn't change the state of the stream, rather than imperative/OCaml-style, where Stream.next
returns a new element each time. Here's a suitable definition in the even style:
type 'a stream_ = Stream of 'a * 'a stream
and 'a stream = 'a stream_ Lazy.t
Then run
can be invoked with some source of bits -- either a PRNG for quickcheck-style checking, or a bit vector that is iteratively perturbed by the afl framework for fuzzing. (Again, since the source of bits is abstracted, it's straightforward to add alternative sources later, even outside the library.)
The combination of the monad operations and bit
should be sufficiently expressive to define more building blocks, e.g.
val (<*>) : 'a t -> 'b t -> ('a * 'b) t
(** Combine two generators *)
val (<|>) : 'a t -> 'b t -> ('a, 'b) either t
(** Pick a generator based on the next bit in the stream *)
val option : 'a t -> 'a option t
(** Build a generator for [t option] from a generator for [t] *)
Perhaps I'm not using correctly crowbar, but it seems that creating a correct test case is not user friendly: an empty file is not enough and I don't know if depending on the test cases you have to create a file arbitrary big. Instead of premature end of file
couldn't crowbar consider that it is filled with null bytes?
PS: the makefile in examples should have examples that uses afl-fuzz
even if they are not run by default. Except that crowbar is great!
When generating characters from their ASCII code if we want every char except the controls chars
Controls char: from 0 to 31 and 127
map [choose [range 32; const 127]] Char.chr
Non controls char:
choose [
map [range 94] (fun n -> n + 32); (* 32 -> 126 *)
map [range 127] (fun n -> n + 128)] (*128 -> 255*)
]
I suggest instead:
choose [range 32 127; range 128 256]
Same for alphanumeric, and all sets of contiguous values in ASCII.
I have noticed that lists generated by list
are short and I found no easy way to control their length. Should list
not take a generator as argument for the length? This would make it easy to plug range
or const
for the length. Otherwise I would suggest to solve this problem in one of the examples.
Bugs found in OPAM packages:
xmldiff: zoggy/xmldiff#4
calendar: https://forge.ocamlcore.org/tracker/index.php?func=detail&aid=1748&group_id=83&atid=415
yojson: ocaml-community/yojson#20 (reproduced, not originally found with crowbar)
cbor: ygrek/ocaml-cbor#1
bson: MassD/bson#16
bes: (via email)
uunf: dbuenzli/uunf#8, dbuenzli/uunf#10
bencode: rgrinberg/bencode#4
fpath: dbuenzli/fpath#10
@yomimono did some serious testing of charrua-core:
charrua-core: mirage/charrua#47, mirage/charrua#48, mirage/charrua#48, mirage/charrua#49, mirage/charrua#50, mirage/charrua#51, mirage/charrua#52, mirage/charrua#53, mirage/charrua#54, mirage/charrua#55
I'm new to the library, and I tried reading through the source code and all existing open and closed issues and PRs, but I couldn't find any guidance to this.
I have a simple recursive generator for a tree datatype that causes stack overflows when tested extensively (on the order of millions of executions). What should I do to work around this? (By the way, I tried using lazy
/unlazy
and fix
, but the problem occurs with both.)
I saw some reference to a trick where you "pull constants out of the choose", but I couldn't quite figure out what that meant or how to do it.
Thanks!
I try currently to use crowbar
and afl
to test an implementation of RFC 1951 available here:
https://github.com/dinosaure/z
Fuzzer is available here: https://github.com/dinosaure/z/blob/master/fuzz/fuzz.ml#L156
hxd
is necessary (to pin) for debugging. afl-fuzz
got an error with this output:
$ xxd fuzz0.in
00000000: 01f4 ffff ff00 0000 0000 0000 1a00 ..............
Which is the output of:
[# 255; #1 [0; 0]]
In other words, a Literal '\255'
and a Copy (0, 0)
. So I launched it on my server, get back fuzz0.in
and run locally the same fuzzer on my computer:
$ dune exec fuzz/fuzz.exe -- fuzz0.in
z/zlib: PASS
It seems that the result differ from what afl-fuzz
said on my server. On my server, I ran the same command and it tells me the expected error. On my computer, it seems that Copy (0, 0)
does not appear and fuzzer can not get the error.
Finally, I can not reproduce error from my server locally with:
fuzz0.in
4.07.1+afl
crowbar.0.1
(no pin)I can deliver more details if you want.
It's possible to install and use crowbar in a non-afl-switch, and then try to run the resulting binaries under afl-fuzz
. The error message that afl-fuzz
gives is, eh, somewhat unhelpful.
Crowbar should warn when building without afl support. (Or maybe bun should do this?)
Hi !
First of all thanks for this great tool !
I've been willing to use it over the past few days, and when I needed to generate bytes
and strings
, I realized:
bytes gen
generator.val bytes : string gen
sounds a bit confusing to me ; as I expected it either to be named string
or to return a bytes gen
.Of course, one could easily use map
to get a string gen
out of a bytes gen
, but it would also probably be easier to use and understand if Crowbar could provide generators for basic types, named accordingly.
So, here is my proposal (I'd be happy to open a PR for that):
bytes
to string
(since it returns a string gen
).val bytes : bytes gen
.Using a slightly modified version of the identity example:
let identity x = Crowbar.check_eq x (x+1)
let () = Crowbar.(add_test ~name:"identity function" [int] (fun i -> identity i))
Running the test without afl fails as expected:
% ./_build/default/test.exe
identity function: ....
identity function: FAIL
When given the input:
-2701223470281276764
the test failed:
different
But running through afl doesn't report any crashes, even after several cycles:
+- process timing -------------------------------------+- overall results -----+
| run time : 0 days, 0 hrs, 0 min, 6 sec | cycles done : 76 |
| last new path : 0 days, 0 hrs, 0 min, 5 sec | total paths : 2 |
| last uniq crash : none seen yet | uniq crashes : 0 |
| last uniq hang : none seen yet | uniq hangs : 0 |
This is on 4.06.0+afl using the master version of crowbar and afl-fuzz 2.52b.
There've been a lot of additions since the released version (0.1
, February 2018). Is it time for a new release?
String grammar generation could be improved and made easier when concatenating a list of generator
Sentence = Subject Verb Location
...
in order to make my sentence i have to do something like:
map [subj; verb; loc] (fun subj verb loc -> subj ^ verb ^ loc)
or map [subj; verb; loc] (fun subj verb loc -> String.concat separator [subj; verb; loc])
if i need a separator. The more generators you want to concatenate the more unreadable it gets !
The concatenation isn't well supported, this would be great for generating grammar such as http.
I suggest adding functions for strings
(* concatenate a List of string gen inserting the separator string sep between each *)
let concat_gen_list sep l =
List.fold_left (fun acc e ->
map [acc; sep; e] (fun acc sep e -> acc ^ e ^ sep)
) empty l
(* Concatenate a Crowbar.list equivalent String.concat on Crowbar.list gen *)
let concat_list_gen sep l =
map [sep; l] String.concat
We could even improve the concept by adding the operator as an argument
(* concatenate a List of 'a gen inserting the separator sep between each *)
let concat_gen_list sep l op =
List.fold_left (fun acc e ->
map [acc; sep; e] (fun acc sep e -> op (op acc e) sep )
) empty l
This is just an idea, PR to come if you approve !
In order to test an encoding library (à la json-typed) I'm having to generate values for a recursive type with some specific restrictions. Here's a simplified version:
type 'a e =
| Ranged_int: (int * int) -> int e
| List: 'a t -> 'a list e
| Tuple: 'a t * 'b t -> ('a * 'b) t
(* other constructors omitted *)
This type describes ways to encode/decode values and can be passed to the following functions:
val constr: 'a e -> 'a -> json
val destr: 'a e -> json -> 'a result
Note that in int list t
, all integers must share the same range. And similarly, in int list list t
, int list list list t
, etc. However, in a tuple, each side can be of different rangedness.
This lends itself nicely to a generator such as:
let rec gengen: ('a e * 'a gen) gen = fix (fun gen ->
choose [
map [int; int] (fun min max -> Ranged_int (min, max), range ~min (max - min));
map [gen] (fun (e, g) -> List e, list g);
map [gen; gen] (fun (e1, g1) (e2, g2) -> Tuple (e1, e2), map [g1, g2] (fun v1 v2 -> (v1, v2)));
])
Which needs some form of bind
to extricate the nested gen
.
let gen =
bind gengen (fun (e, g) ->
bind g (fun v ->
return (e, v)
))
Is this doable with the current interface? I don't think it is, but maybe I missed some combinator trickery.
Is it possible to implement?
There are alternatives:
join
operatorgen_tests
that's like add_test
except it just queues more tests for later)(After writing this up, I noticed something similar-sounding in the TODO, which is probably a good sign. I'm sending this anyway, to add support for the idea, to flesh out the design a bit, and to open up discussion.)
At present the interface to tests is quite stateful: there's a function for adding tests to a global list
val add_test : ?name:string -> ('f, test_result) gens -> 'f -> unit
and the tests are implicitly invoked by a function installed with at_exit
.
This interface makes it a little difficult to integrate crowbar into larger test frameworks.
It would be helpful to have an ounit/alcotest-style interface, where the user constructs test suites, perhaps hierarchically, and where invoking the tests is the user's responsibility.
For example, here's a basic less-stateful interface that sticks quite closely to the spirit of the current design:
type test
val test : ?name:string -> ('f, test_result) gens -> 'f -> test
val run_main : test list -> unit
With an interface like this one might write:
let tests = [
Crowbar.test "some property"
test_some_property [int; string] @@ fun i s ->
(* ... *)
;
Crowbar.test "another property"
test_another_property [list1 string] @@ fun ss ->
(* ... *)
;
]
let () = run_main tests
Crucially, since this interface separates the operations of defining tests and registering/invoking them, it's easier to select tests (or even decide whether or not to invoke the crowbar-based testsuite at all) based on command-line options or other input.
Is there some reason why it'd be tricky to add such an interface? I wonder if the current approach with atexit
is just for convenience, or if there's some deeper technical reason why it's better to do things that way.
When I look at @yomimono's heroic code, it appears clear that without a metaprogramming way to generate Crowbar generators for algebraic datatypes to be enumerated in "the obvious way" (with annotations to override special cases, etc.), it can currently be very painful to deploy Crowbar for a library.
Do some people here have plans for such a plugin?
I looked around a bit and there already exist at least one ppx_deriving plugin for random generation, @paurkedal's ppx_deriving_random. Unfortunately, because of the difference in type signatures for generators and generator language in general, I don't think that it would be possible to reuse it directly. It should be a very good start for hacking a crowbar generator, though.
(In general the distance between Crowbar-style generation interfaces and Quickcheck-style generation interface is an uncanny zero, so it sounds like a nice factorization should be within reach, but right now I'm tempted to try Crowbar on stuff and I would really like even a separate/specialized generator.)
If a test is too restrictive:
let is_bad_data : ... -> bool
f data =
if is_bad_data data
then Crowbar.bad_test ()
else ... (* do test *)
let () =
Crowbar.add_test ~name:"test f" [generator] f
I mean if Crowbar.bad_test
is hit some number of times, it would be better to fail at some point, to give feedback to the caller. At the time of writing, Crowbar
doesn't terminate. For example Haskell's quickcheck
does this, along the lines of giving up generating data after X trials and only Y successes
.
As this is a breaking change (it could make tests that pass now (slowly though) fail), care should be taken regarding adoption.
Thanks for your work on crowbar ❤️
Hello!
AFAIU there is currently no shrinking feature in Crowbar (I could only find a few words about it in the Wiki design notes).
What's the status on the idea of shrinking in case of test failure?
Thank you for your time!
@pascutto and I recently ran into a weird issue while trying to fuzz https://github.com/mirage/index.
We tried fuzzing an uninstrumented binary by mistake and got a Fork server handshake failed
error from afl-fuzz
instead of the usual No instrumentation detected
one which made it a bit hard to realize our mistake.
I tried reproducing this on simpler examples from https://github.com/NathanReb/ocaml-afl-examples and it seems to indicate that this happens when using crowbar but not otherwise.
To reproduce you can clone the repo and run the following commands from non afl
opam switch:
$ dune build @simple-parser/fuzz
afl-fuzz alias simple-parser/fuzz/fuzz (exit 1)
(cd _build/default/simple-parser/fuzz && /usr/bin/afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@)
afl-fuzz 2.52b by <[email protected]>
...
[-] Looks like the target binary is not instrumented! The fuzzer depends on
compile-time instrumentation to isolate interesting test cases while
mutating the input data. For more information, and for tips on how to
instrument binaries, please see /usr/share/doc/afl-doc/docs/README.
When source code is not available, you may be able to leverage QEMU
mode support. Consult the README for tips on how to enable this.
(It is also possible to use afl-fuzz as a traditional, "dumb" fuzzer.
For that, you can use the -n option - but expect much worse results.)
[-] PROGRAM ABORT : No instrumentation detected
Location : check_binary(), afl-fuzz.c:6920
and
$ dune build @awesome-list/fuzz
afl-fuzz alias awesome-list/fuzz/fuzz (exit 1)
(cd _build/default/awesome-list/fuzz && /usr/bin/afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@)
afl-fuzz 2.52b by <[email protected]>
...
[-] Hmm, looks like the target binary terminated before we could complete a
handshake with the injected code. There are two probable explanations:
- The current memory limit (50.0 MB) is too restrictive, causing an OOM
fault in the dynamic linker. This can be fixed with the -m option. A
simple way to confirm the diagnosis may be:
( ulimit -Sv $[49 << 10]; /path/to/fuzzed_app )
Tip: you can use http://jwilk.net/software/recidivm to quickly
estimate the required amount of virtual memory for the binary.
- Less likely, there is a horrible bug in the fuzzer. If other options
fail, poke <[email protected]> for troubleshooting tips.
[-] PROGRAM ABORT : Fork server handshake failed
Location : init_forkserver(), afl-fuzz.c:2253
As you can see, the first example is just a simple binary trying to parse an int from the input. It doesn't use crowbar and we get the expected No instrumentation detected
error.
The second one on the other hand uses crowbar and leads to the Fork server handsake failed
.
Do you have any idea why afl-fuzz
isn't able to detect that the binary isn't instrumented?
I have the following property tests:
open Crowbar
let env (type a) (value : a gen) : (string * a) list gen =
map
[ list (pair bytes value) ]
(List.fold_left (fun env (name, value) -> (name, value) :: env) [])
let evenp x = x mod 2 = 0
let sign i = if i < 0 then `Negative else `Positive
let () =
add_test ~name:"example true property" [ int ] (fun i ->
Crowbar.check_eq (evenp (i - 1)) (evenp (i + 1)));
add_test ~name:"example false property" [ int ] (fun i ->
Crowbar.check (i = -1 || sign i = sign (i + 1)));
if true then
add_test
[ pair (pair bytes int) (pair bytes int); env int ]
(fun ((l, v), (l', v')) s ->
check_eq
(List.assoc_opt l' ((l, v') :: (l, v) :: s))
(List.assoc_opt l' ((l, v') :: s)))
I build this with the following dune
file:
(executable
(libraries crowbar)
(name example_proptest)
(ocamlopt_flags (-afl-instrument)))
When I run this under AFL or AFL++, I get PROGRAM ABORT : No instrumentation detected
, which of course seems like it's a problem with my configuration. However, if I change the if true
to if false
(or otherwise remove the last test), it works perfectly fine under either (and they find the bug in the second property in seconds).
I can reproduce this against OCaml 5.0.0 or 4.14.1, both running on NixOS 23.11 (Linux) on an x86_64 machine.
Is there something particularly weird about that property, or broken with that test, or is this a broader bug?
Would you consider adding a:
val shuffle : 'a list -> 'a list gen
to crowbar
? I have a use case for it and I think it could be a worthy addition generally speaking.
I'm happy to contribute if you feel like it should be part of the library!
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.