stedolan / crowbar Goto Github PK

View Code? Open in Web Editor NEW

180.0 180.0 31.0 202 KB

Property fuzzing for OCaml

License: MIT License

OCaml 100.00%

crowbar's People

Contributors

Stargazers

Watchers

crowbar's Issues

Design suggestion: smaller core, compositional constructors

I'm very much looking forward to using this library across quite a few projects! The first one I have in mind is profuse, where the central data structure has some interesting invariants, fairly straightforward to express, but tricky to check. Crowbar seems like an ideal tool for such cases. I'm also looking forward to using it in a serialisation/rpc library under development at the moment.

In trying to understand the design of crowbar, I've made a few notes about things that it might be useful to express slightly differently. These are currently just suggestions, sometimes quite rough, but if you like the sound of any of them, I'm happy to turn them into pull requests. This first issue is mostly about implementation, but I think it spills over into the interface a bit, too.

datatype-centric vs compositional design

The central gen type is currently defined as a datatype, with constructors for lists, options, non-empty lists, etc. There are sometimes reasons to be cautious about defining things this way, so I'd like to suggest exploring an alternative approach.

Since there's essentially only one operation defined over the gen type, it should be straightforward to define it in a more compositional/extensible style. (This is really just the old distinction between oo-style, which makes it easy to add new types and hard to add new operations, and typed-fp-style, which makes it hard to add new types and easy to add new operations.)

In other words, rather than a definition of this form

type 'a gen =
  ...
  | Option : 'a gen -> 'a option gen
  | List : 'a gen -> 'a list gen
  | List1 : 'a gen -> 'a list gen
  ...

along with a function like this, where generation is defined by case analysis:

let rec generate : type a . int -> state -> a gen -> a * unit printer =
  fun size input gen -> match gen with
   (* ... *)
  | Option gen ->
     if read_bool input then
       let v, pv = generate size input gen in
       Some v, fun ppf () -> pp ppf "Some (%a)" pv ()
     else
       None, fun ppf () -> pp ppf "None"
  | List gen ->
   (* etc. *)
  | List1 gen ->
   (* etc. *)

I think it'd be worth considering a definition more like this:

type +'a gen = bool stream -> bool stream * 'a

along with a separate function for each constructor

let option = (* ... *)

let list = (* ... *)

There are three main benefits to doing things this way.

First, it's easy to add new constructors, either inside or outside the library, since list and option are just functions with no special status rather than constructors of a distinguished types.

Second, it keeps things compositional. With a datatype definition it's always tempting to do case analysis on subvalues, which tends to lead to uneven behaviour. (Indeed, the current definition of the generate function does inspect subvalues in at least the Choose case.) With the separate-function design it's essentially impossible to analyse subvalues, so the semantics is almost certain to end up cleaner / more "denotational".

Third, it keeps the core small. With the separate-function design the core can be tiny (see below), which makes it easier to make interesting changes to it, and easier to ensure that the core is correct. It's also a useful test of the design: if functions like list and option can easily be built on top of the core then it's likely that almost any other user-defined function can be built that way, too; on the other hand, if you start with a larger core, with everything you need provided as a built in then there's nothing to ensure that the langauge will be easy to extend outside the library, and it's more likely that you'll have to extend the core with additional operations later.

design sketch

Here's a sketch of such a design. (It's entirely possible that I've misunderstood some details -- if so, I'm keen to understand which -- but I expect the general idea can be made to work.)

The central idea is generating data from binary input, so it's natural to express the core as a monad with a single operation that reads and returns one bit, like this:

module Generate :
sig
  type +'a t
  val return : 'a -> 'a t
  val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
 
  val bit : bool t
 
  val run : 'a t -> bool stream -> bool stream * 'a
end

Then everything else can be built on top of the monad operations plus bit, and so lives outside of the core. (I think it's possible to implement this interface fairly efficiently, but it's possible that operations that read multiple bits at a go might also be useful.)

The stream here is intended to be functional, where reading an element doesn't change the state of the stream, rather than imperative/OCaml-style, where Stream.next returns a new element each time. Here's a suitable definition in the even style:

type 'a stream_ = Stream of 'a * 'a stream
and 'a stream = 'a stream_ Lazy.t

Then run can be invoked with some source of bits -- either a PRNG for quickcheck-style checking, or a bit vector that is iteratively perturbed by the afl framework for fuzzing. (Again, since the source of bits is abstracted, it's straightforward to add alternative sources later, even outside the library.)

The combination of the monad operations and bit should be sufficiently expressive to define more building blocks, e.g.

val (<*>) : 'a t -> 'b t -> ('a * 'b) t
(** Combine two generators *)

val (<|>) : 'a t -> 'b t -> ('a, 'b) either t
(** Pick a generator based on the next bit in the stream *)

val option : 'a t -> 'a option t
(** Build a generator for [t option] from a generator for [t] *)

avoid `Premature end of file`

Perhaps I'm not using correctly crowbar, but it seems that creating a correct test case is not user friendly: an empty file is not enough and I don't know if depending on the test cases you have to create a file arbitrary big. Instead of premature end of file couldn't crowbar consider that it is filled with null bytes?

PS: the makefile in examples should have examples that uses afl-fuzz even if they are not run by default. Except that crowbar is great!

Better range function

When generating characters from their ASCII code if we want every char except the controls chars

Controls char: from 0 to 31 and 127

map [choose [range 32; const 127]] Char.chr

Non controls char:

choose [
  map [range 94] (fun n -> n + 32); (* 32 -> 126 *)
  map [range 127] (fun n -> n + 128)] (*128 -> 255*)
]

I suggest instead:

choose [range 32 127; range 128 256]

Same for alphanumeric, and all sets of contiguous values in ASCII.

Length of lists

I have noticed that lists generated by list are short and I found no easy way to control their length. Should list not take a generator as argument for the length? This would make it easy to plug range or const for the length. Otherwise I would suggest to solve this problem in one of the examples.

decreased stability in xmldiff example with abstract gens API

Fuzzing xmldiff in cffd7df gets 100% reported stability in afl-fuzz.
Fuzzing xmldiff in 8540109 reports stability that fluctuates in the high 99%s.

I wouldn't expect for them to be different?

Bugs found with Crowbar

Bugs found in OPAM packages:

xmldiff: zoggy/xmldiff#4
calendar: https://forge.ocamlcore.org/tracker/index.php?func=detail&aid=1748&group_id=83&atid=415
yojson: ocaml-community/yojson#20 (reproduced, not originally found with crowbar)
cbor: ygrek/ocaml-cbor#1
bson: MassD/bson#16
bes: (via email)
uunf: dbuenzli/uunf#8, dbuenzli/uunf#10
bencode: rgrinberg/bencode#4
fpath: dbuenzli/fpath#10

@yomimono did some serious testing of charrua-core:
charrua-core: mirage/charrua#47, mirage/charrua#48, mirage/charrua#48, mirage/charrua#49, mirage/charrua#50, mirage/charrua#51, mirage/charrua#52, mirage/charrua#53, mirage/charrua#54, mirage/charrua#55

What is the best way to avoid stack overflows with recursive generators?

I'm new to the library, and I tried reading through the source code and all existing open and closed issues and PRs, but I couldn't find any guidance to this.

I have a simple recursive generator for a tree datatype that causes stack overflows when tested extensively (on the order of millions of executions). What should I do to work around this? (By the way, I tried using lazy/unlazy and fix, but the problem occurs with both.)

I saw some reference to a trick where you "pull constants out of the choose", but I couldn't quite figure out what that meant or how to do it.

Thanks!

Replication of crashe

I try currently to use crowbar and afl to test an implementation of RFC 1951 available here:
https://github.com/dinosaure/z
Fuzzer is available here: https://github.com/dinosaure/z/blob/master/fuzz/fuzz.ml#L156

hxd is necessary (to pin) for debugging. afl-fuzz got an error with this output:

$ xxd fuzz0.in
00000000: 01f4 ffff ff00 0000 0000 0000 1a00      ..............

Which is the output of:

    [# 255; #1 [0; 0]]

In other words, a Literal '\255' and a Copy (0, 0). So I launched it on my server, get back fuzz0.in and run locally the same fuzzer on my computer:

$ dune exec fuzz/fuzz.exe -- fuzz0.in
z/zlib: PASS

It seems that the result differ from what afl-fuzz said on my server. On my server, I ran the same command and it tells me the expected error. On my computer, it seems that Copy (0, 0) does not appear and fuzzer can not get the error.

Finally, I can not reproduce error from my server locally with:

the same input file fuzz0.in
the same version of the compiler 4.07.1+afl
the same version of crowbar.0.1 (no pin)

I can deliver more details if you want.

Warn when afl support unavailable

It's possible to install and use crowbar in a non-afl-switch, and then try to run the resulting binaries under afl-fuzz. The error message that afl-fuzz gives is, eh, somewhat unhelpful.

Crowbar should warn when building without afl support. (Or maybe bun should do this?)

Bytes and string generators

Hi !
First of all thanks for this great tool !

I've been willing to use it over the past few days, and when I needed to generate bytes and strings, I realized:

There is no bytes gen generator.
val bytes : string gen sounds a bit confusing to me ; as I expected it either to be named string or to return a bytes gen.

Of course, one could easily use map to get a string gen out of a bytes gen, but it would also probably be easier to use and understand if Crowbar could provide generators for basic types, named accordingly.

So, here is my proposal (I'd be happy to open a PR for that):

Rename bytes to string (since it returns a string gen).
Add val bytes : bytes gen.
These would obviously be breaking changes, so let me know what you think :)

Failing test not reported as crash

Using a slightly modified version of the identity example:

let identity x = Crowbar.check_eq x (x+1)
let () = Crowbar.(add_test ~name:"identity function" [int] (fun i -> identity i))

Running the test without afl fails as expected:

% ./_build/default/test.exe
identity function: ....
identity function: FAIL

When given the input:

    -2701223470281276764
    
the test failed:

    different

But running through afl doesn't report any crashes, even after several cycles:

+- process timing -------------------------------------+- overall results -----+
|        run time : 0 days, 0 hrs, 0 min, 6 sec        |  cycles done : 76     |
|   last new path : 0 days, 0 hrs, 0 min, 5 sec        |  total paths : 2      |
| last uniq crash : none seen yet                      | uniq crashes : 0      |
|  last uniq hang : none seen yet                      |   uniq hangs : 0      |

This is on 4.06.0+afl using the master version of crowbar and afl-fuzz 2.52b.

New release?

There've been a lot of additions since the released version (0.1, February 2018). Is it time for a new release?

Improving grammar concatenation - string - 'a

String grammar generation could be improved and made easier when concatenating a list of generator

Sentence = Subject Verb Location
...

in order to make my sentence i have to do something like:
map [subj; verb; loc] (fun subj verb loc -> subj ^ verb ^ loc)
or map [subj; verb; loc] (fun subj verb loc -> String.concat separator [subj; verb; loc]) if i need a separator. The more generators you want to concatenate the more unreadable it gets !

The concatenation isn't well supported, this would be great for generating grammar such as http.
I suggest adding functions for strings

(* concatenate a List of string gen inserting the separator string sep between each *)
let concat_gen_list sep l =
  List.fold_left (fun acc e ->
    map [acc; sep; e] (fun acc sep e -> acc ^ e ^ sep)
  ) empty l

(* Concatenate a Crowbar.list equivalent String.concat on Crowbar.list gen *)
let concat_list_gen sep l =
  map [sep; l] String.concat

We could even improve the concept by adding the operator as an argument

(* concatenate a List of 'a gen inserting the separator sep between each *)
let concat_gen_list sep l op =
  List.fold_left (fun acc e ->
    map [acc; sep; e] (fun acc sep e -> op (op acc e) sep )
  ) empty l

This is just an idea, PR to come if you approve !

bind (or join or similar) combinator

In order to test an encoding library (à la json-typed) I'm having to generate values for a recursive type with some specific restrictions. Here's a simplified version:

type 'a e =
  | Ranged_int: (int * int) -> int e
  | List: 'a t -> 'a list e
  | Tuple: 'a t * 'b t -> ('a * 'b) t
  (* other constructors omitted *)

This type describes ways to encode/decode values and can be passed to the following functions:

val constr: 'a e -> 'a -> json
val destr: 'a e -> json -> 'a result

Note that in int list t, all integers must share the same range. And similarly, in int list list t, int list list list t, etc. However, in a tuple, each side can be of different rangedness.

This lends itself nicely to a generator such as:

let rec gengen: ('a e * 'a gen) gen = fix (fun gen ->
  choose [
    map [int; int] (fun min max -> Ranged_int (min, max), range ~min (max - min));
    map [gen] (fun (e, g) -> List e, list g);
    map [gen; gen] (fun (e1, g1) (e2, g2) -> Tuple (e1, e2), map [g1, g2] (fun v1 v2 -> (v1, v2)));
  ])

Which needs some form of bind to extricate the nested gen.

let gen =
    bind gengen (fun (e, g) ->
    bind g (fun v ->
    return (e, v)
))

Is this doable with the current interface? I don't think it is, but maybe I missed some combinator trickery.
Is it possible to implement?

There are alternatives:

providing a join operator
providing a way to add more tests from within a test (e.g., a function gen_tests that's like add_test except it just queues more tests for later)

OUnit/Alcotest-style interface?

(After writing this up, I noticed something similar-sounding in the TODO, which is probably a good sign. I'm sending this anyway, to add support for the idea, to flesh out the design a bit, and to open up discussion.)

At present the interface to tests is quite stateful: there's a function for adding tests to a global list

val add_test : ?name:string -> ('f, test_result) gens -> 'f -> unit

and the tests are implicitly invoked by a function installed with at_exit.

This interface makes it a little difficult to integrate crowbar into larger test frameworks.

It would be helpful to have an ounit/alcotest-style interface, where the user constructs test suites, perhaps hierarchically, and where invoking the tests is the user's responsibility.

For example, here's a basic less-stateful interface that sticks quite closely to the spirit of the current design:

type test
val test : ?name:string -> ('f, test_result) gens -> 'f -> test
val run_main : test list -> unit

With an interface like this one might write:

let tests = [
  Crowbar.test "some property"
   test_some_property [int; string] @@ fun i s ->
     (* ... *)
  ;

  Crowbar.test "another property"
   test_another_property [list1 string] @@ fun ss ->
     (* ... *)
  ;
]
    
let () = run_main tests

Crucially, since this interface separates the operations of defining tests and registering/invoking them, it's easier to select tests (or even decide whether or not to invoke the crowbar-based testsuite at all) based on command-line options or other input.

Is there some reason why it'd be tricky to add such an interface? I wonder if the current approach with atexit is just for convenience, or if there's some deeper technical reason why it's better to do things that way.

a ppx_deriving plugin would be super-useful

When I look at @yomimono's heroic code, it appears clear that without a metaprogramming way to generate Crowbar generators for algebraic datatypes to be enumerated in "the obvious way" (with annotations to override special cases, etc.), it can currently be very painful to deploy Crowbar for a library.

Do some people here have plans for such a plugin?

I looked around a bit and there already exist at least one ppx_deriving plugin for random generation, @paurkedal's ppx_deriving_random. Unfortunately, because of the difference in type signatures for generators and generator language in general, I don't think that it would be possible to reuse it directly. It should be a very good start for hacking a crowbar generator, though.

(In general the distance between Crowbar-style generation interfaces and Quickcheck-style generation interface is an uncanny zero, so it sounds like a nice factorization should be within reach, but right now I'm tempted to try Crowbar on stuff and I would really like even a separate/specialized generator.)

Fail instead of looping when data cannot be generated (bad_test hit too often)

If a test is too restrictive:

let is_bad_data : ... -> bool

f data =
  if is_bad_data data
  then Crowbar.bad_test ()
  else ... (* do test *)

let () =
  Crowbar.add_test ~name:"test f" [generator] f

I mean if Crowbar.bad_test is hit some number of times, it would be better to fail at some point, to give feedback to the caller. At the time of writing, Crowbar doesn't terminate. For example Haskell's quickcheck does this, along the lines of giving up generating data after X trials and only Y successes.

As this is a breaking change (it could make tests that pass now (slowly though) fail), care should be taken regarding adoption.

Thanks for your work on crowbar ❤️

Shrinking

Hello!

AFAIU there is currently no shrinking feature in Crowbar (I could only find a few words about it in the Wiki design notes).

What's the status on the idea of shrinking in case of test failure?

I don't want this feature to land in Crowbar
I want this feature to land in Crowbar but I won't have time/resources to make it happen
I want this feature to land in Crowbar and I will do it
Crowbar is no longer maintained (I hope this is not seen as rude; I see that there was no code change in more than a year, which may or may not indicate EOL)
Other?

Thank you for your time!

Uninstrumented crowbar binaries trigger weird error in `afl-fuzz`

@pascutto and I recently ran into a weird issue while trying to fuzz https://github.com/mirage/index.
We tried fuzzing an uninstrumented binary by mistake and got a Fork server handshake failed error from afl-fuzz instead of the usual No instrumentation detected one which made it a bit hard to realize our mistake.

I tried reproducing this on simpler examples from https://github.com/NathanReb/ocaml-afl-examples and it seems to indicate that this happens when using crowbar but not otherwise.

To reproduce you can clone the repo and run the following commands from non afl opam switch:

$ dune build @simple-parser/fuzz
    afl-fuzz alias simple-parser/fuzz/fuzz (exit 1)
(cd _build/default/simple-parser/fuzz && /usr/bin/afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@)
afl-fuzz 2.52b by <[email protected]>

...

[-] Looks like the target binary is not instrumented! The fuzzer depends on
    compile-time instrumentation to isolate interesting test cases while
    mutating the input data. For more information, and for tips on how to
    instrument binaries, please see /usr/share/doc/afl-doc/docs/README.

    When source code is not available, you may be able to leverage QEMU
    mode support. Consult the README for tips on how to enable this.
    (It is also possible to use afl-fuzz as a traditional, "dumb" fuzzer.
    For that, you can use the -n option - but expect much worse results.)

[-] PROGRAM ABORT : No instrumentation detected
         Location : check_binary(), afl-fuzz.c:6920

and

$ dune build @awesome-list/fuzz
    afl-fuzz alias awesome-list/fuzz/fuzz (exit 1)
(cd _build/default/awesome-list/fuzz && /usr/bin/afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@)
afl-fuzz 2.52b by <[email protected]>

...

[-] Hmm, looks like the target binary terminated before we could complete a
    handshake with the injected code. There are two probable explanations:

    - The current memory limit (50.0 MB) is too restrictive, causing an OOM
      fault in the dynamic linker. This can be fixed with the -m option. A
      simple way to confirm the diagnosis may be:

      ( ulimit -Sv $[49 << 10]; /path/to/fuzzed_app )

      Tip: you can use http://jwilk.net/software/recidivm to quickly
      estimate the required amount of virtual memory for the binary.

    - Less likely, there is a horrible bug in the fuzzer. If other options
      fail, poke <[email protected]> for troubleshooting tips.

[-] PROGRAM ABORT : Fork server handshake failed
         Location : init_forkserver(), afl-fuzz.c:2253

As you can see, the first example is just a simple binary trying to parse an int from the input. It doesn't use crowbar and we get the expected No instrumentation detected error.

The second one on the other hand uses crowbar and leads to the Fork server handsake failed.

Do you have any idea why afl-fuzz isn't able to detect that the binary isn't instrumented?

"No instrumentation detected" after adding a particular simple property

I have the following property tests:

open Crowbar

let env (type a) (value : a gen) : (string * a) list gen =
  map
    [ list (pair bytes value) ]
    (List.fold_left (fun env (name, value) -> (name, value) :: env) [])

let evenp x = x mod 2 = 0
let sign i = if i < 0 then `Negative else `Positive

let () =
  add_test ~name:"example true property" [ int ] (fun i ->
      Crowbar.check_eq (evenp (i - 1)) (evenp (i + 1)));
  add_test ~name:"example false property" [ int ] (fun i ->
      Crowbar.check (i = -1 || sign i = sign (i + 1)));

  if true then
    add_test
      [ pair (pair bytes int) (pair bytes int); env int ]
      (fun ((l, v), (l', v')) s ->
        check_eq
          (List.assoc_opt l' ((l, v') :: (l, v) :: s))
          (List.assoc_opt l' ((l, v') :: s)))

I build this with the following dune file:

(executable
 (libraries crowbar)
 (name example_proptest)
 (ocamlopt_flags (-afl-instrument)))

When I run this under AFL or AFL++, I get PROGRAM ABORT : No instrumentation detected, which of course seems like it's a problem with my configuration. However, if I change the if true to if false (or otherwise remove the last test), it works perfectly fine under either (and they find the bug in the second property in seconds).

I can reproduce this against OCaml 5.0.0 or 4.14.1, both running on NixOS 23.11 (Linux) on an x86_64 machine.

Is there something particularly weird about that property, or broken with that test, or is this a broader bug?

Shuffled list generator

Would you consider adding a:

val shuffle : 'a list -> 'a list gen

to crowbar? I have a use case for it and I think it could be a worthy addition generally speaking.

I'm happy to contribute if you feel like it should be part of the library!

stedolan / crowbar Goto Github PK

crowbar's People

Contributors

Stargazers

Watchers

Forkers

crowbar's Issues

datatype-centric vs compositional design

design sketch

Recommend Projects

Recommend Topics

Recommend Org