
shnatsel / libdiffuzz

Custom memory allocator that helps discover reads from uninitialized memory

License: Apache License 2.0

Rust 100.00%
security-tools security security-audit security-testing sanitizer memory-allocator fuzzing fuzz-testing

libdiffuzz's Introduction

libdiffuzz: security-oriented alternative to Memory Sanitizer

This is a drop-in replacement for the OS memory allocator that can be used to detect uses of uninitialized memory. It is designed for cases where Memory Sanitizer is not applicable for some reason, such as:

  • Your code contains inline assembly or links to proprietary libraries that cannot be instrumented by MSAN
  • You want to find vulnerabilities in black-box binaries that you do not have the source code for (not always straightforward, see below)
  • You want to check if the bug MSAN found is actually exploitable, i.e. if the uninitialized memory contents actually show up in the output
  • You're debugging code that is specific to an exotic CPU architecture or operating system where MSAN is not available, such as macOS. If you're on a really obscure platform that doesn't have a Rust compiler, a less robust C99 implementation is available.

This is not a drop-in replacement for Memory Sanitizer! It will likely require changes to your code or your testing setup; see below.

How it works

When injected into a process, this library fills each newly allocated region of memory with a different value. This lets you detect uses of uninitialized memory simply by running the same operation twice in the same process and comparing the outputs: if they differ, the code reads uninitialized memory somewhere.
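
As a rough illustration of the idea, here is a minimal Rust sketch of a clobbering allocator built on the GlobalAlloc trait. It is not libdiffuzz's source: the type ClobberAlloc is hypothetical, and the fill strategy (a global byte counter that starts at 1 and wraps after 256 allocations) is an assumption made for this example.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical allocator: every fresh allocation is filled with a byte that
/// changes from one allocation to the next (not libdiffuzz's actual code).
pub struct ClobberAlloc {
    // Starts at 1 so the first fill is not 0 (which would look like zeroed memory);
    // fetch_add wraps around after 256 allocations.
    counter: AtomicU8,
}

impl ClobberAlloc {
    pub const fn new() -> Self {
        ClobberAlloc { counter: AtomicU8::new(1) }
    }
}

unsafe impl GlobalAlloc for ClobberAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = System.alloc(layout);
        if !p.is_null() {
            // Fill the whole region so any byte read before being written
            // carries this allocation's marker value.
            let fill = self.counter.fetch_add(1, Ordering::Relaxed);
            ptr::write_bytes(p, fill, layout.size());
        }
        p
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        System.dealloc(p, layout);
    }
}

// Route every heap allocation in this program through the sketch allocator.
#[global_allocator]
static GLOBAL: ClobberAlloc = ClobberAlloc::new();
```

Because the counter restarts from the same value in every process, a given allocation sequence is filled identically from run to run, while repeating an operation inside one process hands out different fill bytes (until the counter wraps).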

Combine this with a fuzzer (e.g. AFL, honggfuzz) to automatically discover cases when this happens. This is called "differential fuzzing", hence the name.

Naturally, this only works if running the same operation twice normally produces the same result. If that is not the case in your program and you cannot make it deterministic, you're out of luck.

TL;DR: usage

  1. Clone this repository, run cargo build --release; this will build libdiffuzz.so and put it in target/release
  2. Make your code run the same operation twice in the same process and compare outputs.
  3. Run your code like this:
    • On Linux/BSD/etc: LD_PRELOAD=/path/to/libdiffuzz.so /path/to/your/binary
    • On macOS: DYLD_INSERT_LIBRARIES=/path/to/libdiffuzz.so DYLD_FORCE_FLAT_NAMESPACE=1 /path/to/your/binary
    • If you're fuzzing with AFL: AFL_PRELOAD=/path/to/libdiffuzz.so afl-fuzz ... regardless of platform. If you're not fuzzing with AFL yet, you should!
  4. Wait for it to crash
  5. Brag that you've used differential fuzzing to find vulnerabilities in real code
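
Step 2 can be as simple as a helper that runs the operation twice and compares the results. A minimal sketch; outputs_differ is a hypothetical name, and the operation is any closure whose output implements PartialEq:

```rust
/// Runs `op` twice on the same input and reports whether the two results differ.
/// With libdiffuzz preloaded, a difference suggests that `op` lets
/// uninitialized heap memory reach its output.
pub fn outputs_differ<T, F>(input: &[u8], mut op: F) -> bool
where
    T: PartialEq,
    F: FnMut(&[u8]) -> T,
{
    let first = op(input);
    let second = op(input);
    first != second
}
```

In a fuzz target you would turn a true result into a panic (for example with assert!) so the fuzzer records it as a crash.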

Quick start for Rust code

Note: Memory Sanitizer now works with Rust. You should probably use it instead of libdiffuzz!

If your code does not contain unsafe blocks, you don't need to do a thing! Your code is already secure!

However, if you have read from the black book and invoked the Old Ones...

  1. Clone this repository, run cargo build --release; this will build libdiffuzz.so and put it in target/release
  2. Make sure the membleed test program doesn't reliably crash when run on its own, but does crash when you run it like this: LD_PRELOAD=/path/to/libdiffuzz.so target/release/membleed
  3. If you haven't set up regular fuzzing yet, do it now with AFL. It's not that hard.
  4. In your fuzz target, run the same operation twice and assert! that both runs produce the same result. See the example fuzz target for lodepng-rust for reference. A more complicated example is also available.
  5. Add the following to your fuzz harness:
// Use the system allocator so we can substitute it with a custom one via LD_PRELOAD
use std::alloc::System;
#[global_allocator]
static GLOBAL: System = System;
  6. Run the fuzz target like this: AFL_PRELOAD=/path/to/libdiffuzz.so cargo afl fuzz ...
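
Putting steps 4 and 5 together, a harness might look roughly like the following. decode is a hypothetical placeholder for the code under test and fuzz_one is a hypothetical name; with cargo-afl you would call fuzz_one from the closure passed to its fuzz! macro.

```rust
use std::alloc::System;

// Use the system allocator so it can be swapped for libdiffuzz via LD_PRELOAD / AFL_PRELOAD.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical stand-in for the code under test (replace with your decoder or parser).
pub fn decode(data: &[u8]) -> Vec<u8> {
    data.iter().rev().copied().collect()
}

/// One fuzzing iteration: decode the same input twice and require identical results.
/// Under libdiffuzz, leaked uninitialized memory makes the two results differ,
/// the assertion fails, and the fuzzer records a crash.
pub fn fuzz_one(data: &[u8]) {
    let first = decode(data);
    let second = decode(data);
    assert_eq!(first, second, "output differs between runs: possible uninitialized memory read");
}
```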

Auditing black-box binaries

Simply preload libdiffuzz into a binary (see "TL;DR: usage" above), feed it the same input twice and compare the outputs. If they differ, the binary exposes uninitialized memory in its output.

If your binary only accepts one input and then terminates, set the environment variable LIBDIFFUZZ_NONDETERMINISTIC; this makes the contents of uninitialized memory, and therefore the output, differ between separate runs. Without that variable set, libdiffuzz tries to be as deterministic as possible to make its results reproducible.

If the output varies between runs under normal conditions, try forcing the binary to use just one thread and overriding any sources of randomness it has.
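
The two runs can be scripted rather than compared by hand. The sketch below is not a tool shipped with libdiffuzz and rests on several assumptions: run_once and exposes_uninit are hypothetical names; it only handles the Linux/BSD LD_PRELOAD mechanism (macOS needs the DYLD_* variables from the usage section); it feeds input on stdin and compares stdout; and it sets LIBDIFFUZZ_NONDETERMINISTIC=1 on the assumption that libdiffuzz only checks whether the variable is present. Since each run is a separate process, the variable is always set.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};
use std::thread;

/// Runs `binary` once with `preload` injected via LD_PRELOAD, feeds `input`
/// on stdin and returns everything the process wrote to stdout.
pub fn run_once(binary: &str, preload: &str, input: &[u8]) -> io::Result<Vec<u8>> {
    let mut cmd = Command::new(binary);
    cmd.env("LD_PRELOAD", preload)
        // Assumption: libdiffuzz only checks that this variable is set; "1" is arbitrary.
        .env("LIBDIFFUZZ_NONDETERMINISTIC", "1")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::null());
    let mut child = cmd.spawn()?;
    let mut stdin = child.stdin.take().expect("stdin is piped");
    let data = input.to_vec();
    // Feed stdin from another thread so a chatty child cannot deadlock on a full stdout pipe.
    let writer = thread::spawn(move || stdin.write_all(&data));
    let output = child.wait_with_output()?;
    match writer.join().expect("stdin writer thread panicked") {
        // A child that exits without reading all of its input is not treated as an error.
        Err(e) if e.kind() != io::ErrorKind::BrokenPipe => return Err(e),
        _ => {}
    }
    Ok(output.stdout)
}

/// Runs the binary twice on the same input; `true` means the outputs differ,
/// which under libdiffuzz points at uninitialized memory reaching the output.
pub fn exposes_uninit(binary: &str, preload: &str, input: &[u8]) -> io::Result<bool> {
    let first = run_once(binary, preload, input)?;
    let second = run_once(binary, preload, input)?;
    Ok(first != second)
}
```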

Limitations and future work

Stack-based uninitialized reads are not detected.

Unlike Memory Sanitizer, this tool will not make your program crash as soon as a read from uninitialized memory occurs. Instead, it lets you detect after the fact that such a read has occurred, and only if the contents of uninitialized memory leak into the output. In other words, it will help you notice security vulnerabilities, but will not really aid in debugging.

Trophy case

List of previously unknown (i.e. zero-day) vulnerabilities found using this tool, to show that this whole idea is not completely bonkers:

  1. Memory disclosure in Claxon

If you find bugs using libdiffuzz, please open a PR to add them here.

See also

Valgrind, a perfectly serviceable tool to detect reads from uninitialized memory if you're willing to tolerate a 20x slowdown and occasional false positives.

Dr. Memory, which claims to be an improvement over Valgrind.

MIRI, an interpreter for Rust code that detects violations of Rust's safety rules. Great for debugging but unsuitable for guided fuzzing.

libdislocator, a substitute for Address Sanitizer that also works with black-box binaries.

For background on how this project came about, see "How I've found vulnerability in a popular Rust crate (and you can too)".

libdiffuzz's People

Contributors

plasmapower, rofrol, shnatsel

libdiffuzz's Issues

Passing options by environment variables may set them too late and is not portable

Currently libdiffuzz switches to non-deterministic mode after reading an environment variable in a function registered in the link-time "constructors" section:

libdiffuzz/src/lib.rs

Lines 31 to 33 in f0c7a8f

#[cfg_attr(any(target_os = "macos", target_os = "ios"), link_section = "__DATA,__mod_init_func")]
#[cfg_attr(not(any(target_os = "macos", target_os = "ios")), link_section = ".ctors")]
pub static CONSTRUCTOR: extern fn() = libdiffuzz_init_config;

This is not a great idea for two reasons:

  1. This is not portable. The code already takes different codepaths on Linux/BSD and macOS, and Windows is currently not supported. What's worse, there is no way to tell whether this actually works on your platform!
  2. This may kick in too late: other libraries with similar hooks may allocate heap memory before the setting is applied, so libdiffuzz may fail to expose some errors.
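
One possible way around the constructor section, and hence its timing problem, is to read the setting lazily the first time an allocation needs it. This is a hypothetical design, not what libdiffuzz does; nondeterministic_mode and the MODE cache are illustrative names. It assumes a C library that provides getenv and relies on getenv not allocating (true of glibc), since it would be called from inside malloc.

```rust
use std::os::raw::c_char;
use std::sync::atomic::{AtomicU8, Ordering};

unsafe extern "C" {
    // getenv(3) from the C library; glibc's implementation does not allocate,
    // which matters because this would run inside malloc.
    fn getenv(name: *const c_char) -> *mut c_char;
}

const UNKNOWN: u8 = 0;
const DETERMINISTIC: u8 = 1;
const NONDETERMINISTIC: u8 = 2;

// Cached configuration; stays UNKNOWN until the first allocation asks for it.
static MODE: AtomicU8 = AtomicU8::new(UNKNOWN);

/// Reads LIBDIFFUZZ_NONDETERMINISTIC on first use instead of in a constructor.
/// Threads racing on the first call all compute the same answer, so the race is benign.
pub fn nondeterministic_mode() -> bool {
    let mut mode = MODE.load(Ordering::Relaxed);
    if mode == UNKNOWN {
        let name = b"LIBDIFFUZZ_NONDETERMINISTIC\0";
        let is_set = unsafe { !getenv(name.as_ptr() as *const c_char).is_null() };
        mode = if is_set { NONDETERMINISTIC } else { DETERMINISTIC };
        MODE.store(mode, Ordering::Relaxed);
    }
    mode == NONDETERMINISTIC
}
```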

Thread safety

The global counter inside malloc() may occasionally fail to be incremented in multi-threaded programs due to data races. As a result, uses of uninitialized memory in multi-threaded programs may not be reliably detected.

It can be fixed by using atomic types from the C11 standard. This will require including <stdatomic.h>, adding -std=c11 to the compiler flags in the Makefile, and probably replacing every alloc_clobber_counter++ with atomic_fetch_add(&alloc_clobber_counter, 1), but I'm not really sure about the details.
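
The Rust counterpart of the proposed atomic_fetch_add fix is AtomicUsize::fetch_add. In this sketch (helper names are hypothetical) several threads draw counter values at once; because each fetch_add is a single atomic read-modify-write, no two callers can receive the same value, whereas a plain non-atomic counter++ can lose increments.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Rust analogue of a global alloc_clobber_counter.
pub static ALLOC_CLOBBER_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Returns the next counter value; an allocator would derive its fill byte from it.
pub fn next_clobber_value() -> usize {
    ALLOC_CLOBBER_COUNTER.fetch_add(1, Ordering::Relaxed)
}

/// Draws `per_thread` values on each of `threads` threads and returns them all, sorted.
pub fn draw_concurrently(threads: usize, per_thread: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                (0..per_thread).map(|_| next_clobber_value()).collect::<Vec<usize>>()
            })
        })
        .collect();
    let mut all: Vec<usize> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect();
    all.sort_unstable();
    all
}
```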

Support #![no_std]

libdiffuzz doesn't make much use of the standard library. It can probably be switched to the corresponding libcore primitives and compiled in #![no_std] mode.

Among other things, this will reduce the size of the generated binary and may allow cross-compilation to more obscure architectures.

Detect out-of-bounds reads

It would be nice to be able to detect out-of-bounds reads as well. This is actually pretty easy to implement: just allocate more memory than was requested and clobber the extra bytes with the same per-allocation value as the rest of the buffer. If any of the clobbered values show up in the output, then the program is definitely exploitable, either via reads from uninitialized memory or via out-of-bounds reads.

Use case: I needed this functionality to determine whether sile/libflate#16 is exploitable or not.

I have already implemented checks for out-of-bounds reads to the right of the buffer in the detect-oob-reads branch, but the ones to the left are still TODO: there's just a static canary there that's inherited from libdislocator.
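
The over-allocation idea can be sketched as a GlobalAlloc wrapper that asks the system for a few extra bytes and fills both the buffer and the slack with the same per-allocation byte. Assumptions: RedzoneAlloc and the 16-byte REDZONE are hypothetical, and like the branch described above it only covers reads to the right of the buffer.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;
use std::sync::atomic::{AtomicU8, Ordering};

/// Extra bytes appended to every allocation (an arbitrary choice for this sketch).
pub const REDZONE: usize = 16;

/// Hypothetical allocator that over-allocates and clobbers buffer and redzone alike.
pub struct RedzoneAlloc {
    counter: AtomicU8,
}

impl RedzoneAlloc {
    pub const fn new() -> Self {
        RedzoneAlloc { counter: AtomicU8::new(1) }
    }

    // Layout actually requested from the system: same alignment, REDZONE more bytes.
    fn padded(layout: Layout) -> Option<Layout> {
        let size = layout.size().checked_add(REDZONE)?;
        Layout::from_size_align(size, layout.align()).ok()
    }
}

unsafe impl GlobalAlloc for RedzoneAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Report failure with null; a global allocator must not unwind.
        let padded = match Self::padded(layout) {
            Some(l) => l,
            None => return ptr::null_mut(),
        };
        let p = System.alloc(padded);
        if !p.is_null() {
            // Same byte in the buffer and the redzone: an out-of-bounds read to the
            // right leaks the same marker as an uninitialized read.
            let fill = self.counter.fetch_add(1, Ordering::Relaxed);
            ptr::write_bytes(p, fill, padded.size());
        }
        p
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        // Hand back the same padded layout that alloc requested.
        if let Some(padded) = Self::padded(layout) {
            System.dealloc(p, padded);
        }
    }
}
```

Covering the left side would additionally require returning a pointer offset into the padded block (and undoing that offset in dealloc), which this sketch does not attempt. It could be installed with #[global_allocator] in the same way as the System example in the Rust quick start.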

Segfault with `hashbrown`

Hey @Shnatsel,

I was trying to work out whether msan was giving me false positives when I happened upon libdiffuzz. It segfaulted immediately, but in a completely different part of the code.

I've isolated a small reproducible test case here that uses toml and hashbrown to trigger the segfault: https://github.com/michaelsproul/hashbrown-crash

Have you seen segfaults like this before when using libdiffuzz? Is this a type of false positive, or is hashbrown really doing something sketchy with uninitialized memory? The fault seems to happen in an unsafe drop_in_place call, so I'm wondering whether hashbrown does contain some optimisation that assumes uninitialized memory to be 0, or something.

The full backtrace is here for reference:

#0  hashbrown::raw::RawIterRange<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>)>::new<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>)> (ctrl=0x7ffff7e2c0c8, data=..., len=<optimised out>) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:1862
#1  hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>::iter<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:945
#2  hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>::drop_elements<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8) at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:603
#3  0x0000555555562ca5 in hashbrown::raw::{impl#17}::drop<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global> (self=0x7fffffffdbd8)
    at /cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.12.0/src/raw/mod.rs:1801
#4  core::ptr::drop_in_place<hashbrown::raw::RawTable<(alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>), alloc::alloc::Global>> ()
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
#5  core::ptr::drop_in_place<hashbrown::map::HashMap<alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>, std::collections::hash::map::RandomState, alloc::alloc::Global>>
    () at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
#6  core::ptr::drop_in_place<std::collections::hash::map::HashMap<alloc::vec::Vec<alloc::borrow::Cow<str>, alloc::alloc::Global>, alloc::vec::Vec<usize, alloc::alloc::Global>, std::collections::hash::map::RandomState>> ()
    at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ptr/mod.rs:448
#7  toml::de::{impl#0}::deserialize_any<hashbrown_crash::_::{impl#0}::deserialize::__Visitor> (self=0x7fffffffdc68, visitor=...) at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:244
#8  toml::de::{impl#0}::deserialize_struct<hashbrown_crash::_::{impl#0}::deserialize::__Visitor> (self=0x7fffffffdc68, name=..., fields=..., visitor=...)
    at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:315
#9  0x000055555555fa3e in hashbrown_crash::_::{impl#0}::deserialize<&mut toml::de::Deserializer> (__deserializer=0x7fffffffdc68) at src/main.rs:3
#10 toml::de::from_str<hashbrown_crash::Input> (s=...) at /home/michael/.cargo/registry/src/github.com-1ecc6299db9ec823/toml-0.5.9/src/de.rs:80
#11 0x000055555555e4cd in hashbrown_crash::main () at src/main.rs:11

Drop remaining libdislocator checks

There are at least two superfluous checks this project has inherited from libdislocator:

  1. An additional protected page is allocated at the end of each region. This crashes the binary on buffer overflows, but uses more memory than normal.
  2. There is a global counter for the total amount of memory allocated, and at some point the library refuses to allocate any more. This is useful for detecting memory leaks.

The primary reason I want to get rid of these is that I want to be sure that when a binary crashes under libdiffuzz and doesn't crash without it, it's because uninitialized memory is leaked into the output.

There are other excellent tools to isolate buffer overflows or memory leaks (asan, libdislocator), which should be used if those are the issues you're looking for.
