intellabs / kafl

A fuzzer for full VM kernel/driver targets

Home Page: https://intellabs.github.io/kAFL/

License: MIT License

Makefile 49.44% Jinja 7.68% C 11.58% Dockerfile 31.30%

security fuzzing firmware kernel intel validation research redqueen grimoire qemu

kafl's Introduction

kAFL

HW-assisted Feedback Fuzzer for x86 VMs

kAFL/Nyx is a fast guided fuzzer for the x86 VM. It is great for anything that executes as QEMU/KVM guest, in particular x86 firmware, kernels and full-blown operating systems.

Note: All components are provided for research and validation purposes only. Use at your own risk.

Targets

kAFL is the main fuzzer driving the Linux Security Hardening for Confidential Compute effort, identifying vulnerabilities in a complex setup and improving the security of the Linux kernel for all CC solutions.

Among other successful targets for kAFL/Nyx :

Additionally, kAFL has been used internally at Intel for x86 firmware and drivers validation as well as SMM handlers fuzzing.

Features

kAFL/Nyx uses Intel VT, Intel PML and Intel PT to achieve efficient execution, snapshot reset and coverage feedback for greybox or whitebox fuzzing scenarios. It allows running many x86 FW and OS kernels with any desired toolchain and minimal code modifications.
kAFL uses a custom kAFL-Fuzzer written in Python. The kAFL-Fuzzer follows an AFL-like design and is optimized for working with many QEMU instances in parallel, supporting flexible VM configuration, logging and debug options.
kAFL integrates the Radamsa fuzzer as well as Redqueen and Grimoire extensions. Redqueen uses VM introspection to extract runtime inputs to conditional instructions, overcoming typical magic byte and other input checks. Grimoire attempts to identify keywords and syntax from fuzz inputs in order to generate more clever large-scale mutations.

For details on Redqueen, Grimoire, IJON, Nyx, please visit nyx-fuzz.com.
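
To make the Redqueen idea above more concrete, here is a toy Python sketch of substituting observed comparison operands back into the input. It is not the Redqueen or kAFL implementation: the function name `apply_comparison_hints` and the `(observed, expected)` hint format are assumptions made purely for illustration.

```python
from typing import Iterable, List, Tuple


def apply_comparison_hints(
    payload: bytes, hints: Iterable[Tuple[bytes, bytes]]
) -> List[bytes]:
    """Return candidate inputs derived from observed comparison operands.

    Each hint is (observed, expected): `observed` is the operand value that
    came from the input, `expected` is what the target compared it against.
    The hint format is hypothetical, chosen for this sketch only.
    """
    candidates = []
    for observed, expected in hints:
        # Only substitute when the observed operand is visible in the input
        # and the replacement keeps the payload length unchanged.
        if observed and len(observed) == len(expected) and observed in payload:
            candidates.append(payload.replace(observed, expected, 1))
    return candidates
```

For example, if the target compares the first four input bytes `AAAA` against the magic value `KAFL`, the single hint `(b"AAAA", b"KAFL")` turns `b"AAAAdata"` into the candidate `b"KAFLdata"` in one step instead of relying on random mutation.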

Requirements

Intel Skylake or later: The setup requires a Gen-6 or newer Intel CPU (for Intel PT) and adequate system memory (~2GB RAM per CPU).
Patched Host Kernel: A modified Linux host kernel will be installed as part of the setup. Running kAFL inside a VM may work on Ice Lake or later CPUs.
Recent Debian/Ubuntu: The installation and tutorials are tested for recent Ubuntu LTS (>=20.04) and Debian (>=bullseye).
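
As a quick first check of the CPU requirement, the following Python sketch looks for the `intel_pt` feature flag that Linux lists in /proc/cpuinfo. The helper names are made up for this example and are not part of kAFL. The flag only shows that the host CPU advertises Intel PT; it does not show that the patched KVM can use it.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_intel_pt(cpuinfo_text: str) -> bool:
    """Return True if the 'intel_pt' flag is present in the given text."""
    return "intel_pt" in cpu_flags(cpuinfo_text)
```

On a Linux host this could be called as `has_intel_pt(open("/proc/cpuinfo").read())`.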

Getting Started

Once you have python3-venv and make installed, you can install kAFL using make deploy:

sudo apt install python3-venv make git
git clone https://github.com/IntelLabs/kAFL.git
cd kAFL
make deploy

Installation may take some time and require a reboot to update your kernel.

Check the detailed installation guide in case of trouble, or the deployment guide for detailed information and customizing the kAFL setup for your project.

Fuzzing your first target

As a first fuzzing example, we recommend Fuzzing the Linux Kernel.

Other targets are available such as:

Improved documentation for these targets is in progress.

Maintainers

License

kafl's Issues

optionally deploy Ghidra

We should probably also replace the Ghidra install script with an ansible task: https://github.com/IntelLabs/kafl.fuzzer/blob/master/scripts/ghidra_install.sh

I guess it can be installed to kafl/ folder to keep everything packaged properly. We can add a new GHIDRA_ROOT to .env and adapt the plugin runner: https://github.com/IntelLabs/kafl.fuzzer/blob/master/scripts/ghidra_run.sh

Failed to match op str on bison

Thanks for efforts maintaining this cool project.

Platform: Ubuntu Server 18.04
CPU: Intel i7 8700k
QEMU: 5.0.0
Kernel of host: 5.8.12

I'm also including both bison_fuzz_initrd.gz and bison_info_initrd.gz as well as full debug.log.

I initially tried snapshot fuzzing, but I kept hitting the Got b'R', Expected: b'DZ' issue, even after adding a delay as discussed in #10. When I added synchronization_lock(cpu); after hypercall_snd_char(KAFL_PROTO_RELEASE); my target would hang forever.

So I took the advice from #17 and set up the user_bench bison test. I got past the qemu_handshake but it quits with the SHM/socket error. It is not an invalid packet like in #17 :

failed to match opstr [rdi*8 + 0x10]
qemu-system-x86_64: /home/rich/qemu-5.0.0/pt/asm_decoder.c:156: asm_decoder_parse_op: Assertion `false' failed.

Any idea for this?

bison_fuzz_initrd.gz
bison_info_initrd.gz
debug.log

If my CPU does not support the intel pt feature, can I still use it to trace branch coverage?

As mentioned in the title

Lock/Release mechanism in qemu instance start/restart need improvements

Hello,
Thanks for publishing this project.
In line 530 of kAFL-Fuzzer/common/qemu.py, sleep(0.1) is added to prevent a race between the old QEMU instance and the new one.
In many test cases I found that 0.1 seconds is not enough to prevent this condition. I increased the delay little by little up to 3 seconds. With a 3-second sleep I no longer get any errors, but a fixed sleep is not a good way to prevent the race, as you mention in the code:

TODO: Need to wait here or else the next instance dies in set_payload()
# Perhaps Qemu should do proper munmap()/close() on exit?

system config:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
stepping : 10
microcode : 0xde
cpu MHz : 800.059
cache size : 9216 KB

With 16 GB of RAM and 4096 MB per QEMU instance.
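
One general alternative to a fixed sleep is to poll for the condition that actually matters, with an upper bound on the wait. The Python sketch below shows that pattern only. It is not the code in qemu.py, and the name `wait_until` and its parameters are assumptions made for this example.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller could, for instance, wait until the old QEMU process has really exited before starting the next instance, and treat a `False` result as an error instead of continuing blindly.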

Redqueen -fix_hashes

Redqueen frontend in kAFL-Fuzzer/fuzzer/techniques/redqueen was conservatively ported to Python3 but has not been validated/tested beyond checking that LAVA-M bugs still seem to be caught. The -fix-hashes feature is certainly broken and already disabled upstream.

Failed to reuse ram.qcow2 in parallel fuzzing mode.

The snapshot file "ram.qcow2" is used concurrently by the instances in parallel fuzzing mode. But maybe modern QEMU introduced a new feature "file lock"?

The error log looks like:

qemu-system-x86_64: -hdb /home/jungu/kafl/snapshots/win8_x64/ram.qcow2: Failed to get "write" lock

Does this error take place in your environment?

Kernel Panic when running loader

<< kAFL Usermode Load for Linux x86-64 >>
Kernel Panic Handler Address: ffffffffb9e9b10e
segment fault

Document system-wide changes made by Ansible

As raised in #80 (comment), we need to document the system-wide changes made by Ansible, so users can revert their system back into their original state if desired.

Since Ansible is a configuration management system, maybe it can output a summary of the system modifications at the end of the run ?

Root snapshot does not seem to work with pre-snapshot

Thank you for sharing this great project!

I tried the kAFL sample driver fuzzing with a pre-snapshot on an Ubuntu 20.04 x64 host with a Windows 11 x64 guest.
The only thing I changed was inserting the hypercall kAFL_hypercall(HYPERCALL_KAFL_LOCK, 0); as the first instruction in vuln_test.c.
It worked well up to a point: the pre-image was dumped correctly, the CR3 value was correct, and the handshake result was fine, but I couldn't get any coverage information. I thought this was a pre-image bug, so I tried QEMU's save/load snapshot function, but the result was the same.
Maybe the root image does not store device state correctly?

Fuzzing VM: Error in debug_recv(): Got b'R', Expected: b'DZ'

When I start fuzzing with

./kAFL-Fuzzer/kafl_fuzz.py -work_dir <work-dir> -seed_dir <seed-dir> -vm_dir <snapshot-dir> -mem 1024 -agent targets/linux_x86_64/bin/fuzzer/kafl_vuln_test --purge -v -ip0 0xXXXXXXXXXXXXXXX-0xXXXXXXXXXXXXXXX

I get this error:

[WARNING] Slave 0: Error in debug_recv(): Got b'R', Expected: b'DZ'
[FATAL] Failed to launch Qemu, please see logs. Error: Killed Qemu due to protocol error.
[FATAL] Slave has died - check logs!
[FATAL] Master exit: All slaves have died.

I do not understand what this error actually means. It doesn't depend on the IP trace filter range. Everything was set up properly using the provided scripts, the kafl_info.py is working. The debug.log also does not provide more information than above.

Don't get any coverage

Hey,

I'm currently trying again to get fuzzing to work on a real Ubuntu guest.
I have a kAFL setup with everything running properly, and now I want to fuzz my target. I installed Ubuntu Server on the virtual machine and picked this module to fuzz:

0xffffffffc028d000-0xffffffffc03c0000 btrfs

I chose it for no particular purpose other than to get to know kAFL in general, and because you already provide an agent for btrfs. I supplied some random seeds, but also some img files containing a btrfs filesystem. However, the only feedback I get is that no new coverage was produced.

$ ./kAFL/kAFL-Fuzzer/kafl_fuzz.py -work_dir working/ --purge -seed_dir seed/ -vm_dir qemu/snapshot/ -mem 1024 -ip0 0xffffffffc028d000-0xffffffffc03c0000 -agent kAFL/targets/linux_x86_64/bin/fuzzer/btrfs -v

[...]
Importing payload from /home/simon/working/imports/seed_00000
Imported payload produced no new coverage, skipping..
[...]

Could you help me a bit? What else can I try? Am I still doing something wrong with the seeds, or could it be something else? Thanks.

Host: Ubuntu 20.04.1 LTS with 5.8.12-kAFL+ kernel
Guest: Ubuntu-Server 20.04.1 LTS with default kernel
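
One thing worth ruling out in a report like this is a mistyped or stale -ip0 range. The Python sketch below derives the range string from a single /proc/modules line (name, size, refcount, dependencies, state, load address). The function name is made up for this example, and the module size in the usage note is chosen only so that it reproduces the range quoted above. Note that /proc/modules shows zeroed addresses to unprivileged users, so the line has to be read as root inside the guest.

```python
def module_ip_range(proc_modules_line: str) -> str:
    """Turn one /proc/modules line into a 'start-end' address range string."""
    fields = proc_modules_line.split()
    size = int(fields[1])            # module size in bytes (decimal)
    load_addr = int(fields[5], 16)   # load address, e.g. 0xffffffffc028d000
    return f"0x{load_addr:x}-0x{load_addr + size:x}"
```

For the illustrative line `btrfs 1257472 0 - Live 0xffffffffc028d000` this yields `0xffffffffc028d000-0xffffffffc03c0000`, which can be compared against the value passed to -ip0.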

Windows guest hanging

My Windows guest works fine with the test driver but keeps hanging on a real target for some reason.
I used trace-cmd to trace KVM. The guest keeps exiting (kvm_exit) due to EXTERNAL_INTERRUPT, but I'm not sure why:

 qemu-system-x86-10500 [000]  3136.489008: kvm_entry:            vcpu 0
 qemu-system-x86-10500 [000]  3136.493003: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfffff80135ccba9b info 0 800000ec
 qemu-system-x86-10500 [000]  3136.493004: kvm_entry:            vcpu 0
 qemu-system-x86-10500 [000]  3136.497003: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfffff80135ccba9b info 0 800000ec
 qemu-system-x86-10500 [000]  3136.497004: kvm_entry:            vcpu 0
 qemu-system-x86-10500 [000]  3136.501003: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfffff80135ccba9b info 0 800000ec
 qemu-system-x86-10500 [000]  3136.501005: kvm_entry:            vcpu 0
 qemu-system-x86-10500 [000]  3136.501301: kvm_exit:             reason EXTERNAL_INTERRUPT rip 0xfffff80135ccba9b info 0 800000fd
 qemu-system-x86-10500 [000]  3136.501549: kvm_fpu:              unload
 qemu-system-x86-10500 [000]  3136.501550: kvm_userspace_exit:   reason KVM_EXIT_INTR (10)

The RIP of the hang is inside ntoskrn!HalpVpptAcknowledgeInterrupt. I searched around, and the issue seems to be related to the TSC counter interrupt, but I can't figure out why.
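
When a trace like the one above runs to thousands of lines, a small summary helps confirm that the guest is stuck at a single address. The Python sketch below counts kvm_exit events per exit reason and guest RIP. It assumes the trace-cmd line layout shown in this report, and the function name is made up for this example.

```python
import re
from collections import Counter

# Matches the "kvm_exit: reason <NAME> rip 0x<hex>" part of a trace-cmd line.
KVM_EXIT_RE = re.compile(r"kvm_exit:\s+reason (\S+) rip (0x[0-9a-fA-F]+)")


def count_exits(trace_text: str) -> Counter:
    """Count (exit reason, guest RIP) pairs in trace-cmd kvm_exit lines."""
    counts = Counter()
    for match in KVM_EXIT_RE.finditer(trace_text):
        counts[(match.group(1), int(match.group(2), 16))] += 1
    return counts
```

Calling `count_exits(text).most_common(5)` on the full trace lists the most frequent exit locations first.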

Redqueen question

Hi!

I noticed problems getting full coverage on test Windows driver. It looked like Redqueen was not working sometimes, and kAFL was not finding full coverage test set even after tens of millions of attempts. Initially I suspected a race condition occurring somewhere between kAFL Redqueen and QEMU Redqueen, but after reading the code found this line. After removing it, Redqueen started to work perfectly, full coverage test set was found in seconds using Redqueen and not deterministic or havoc approaches.

My question is: looks like you added this line, I don't see it in original RUB-SysSec/redqueen, so what's its purpose? Is it somehow beneficial on large targets?

Reproduce Linux ext4 Fuzzer Results

Hello,

Awesome project, many thanks for maintaining it.

I'm attempting to reproduce the Linux ext4 fuzzer as described in the kAFL USENIX paper. I'm using the -vm_dir/-vm_ram method to start an Ubuntu 16.10 VM, as follows:

# -i: as the paper mentions, limit the deterministic mutations to the first two KB (and last 4B) of the image
python3 kAFL-Fuzzer/kafl_fuzz.py \
  -vm_dir snapshot \
  -vm_ram snapshot/ram.qcow2 \
  -mem 512 \
  -seed_dir fs-seed \
  -p 8 \
  --purge \
  -abort_time 32 \
  -work_dir /dev/shm/kafl-fs-fuzz \
  -catch_resets \
  -i 2048-65535 \
  -ip0 0xffffffff812bc000-0xffffffff81357af8

Some more details on this configuration:

My machine has a Xeon 4214 (48 cores @ 2.20GHz) and 128GB DDR4 RAM.
I'm using debootstrap to create the Ubuntu image. The kernel I'm using is a precompiled 4.8.0-22-generic.
I'm using dd/mkfs.ext4/tune2fs to generate a few 64KB seed images, then I'm appending 4 bytes to the end for the flags to mount. In practice I usually need to pass at least 4 seeds to stop the fuzzer from getting stuck on the import phase.
To get the -ip0 range I'm looking through /proc/kallsyms for ext4 functions. My first attempt used the entire kernel core range, but that seemed to slow down the fuzzer too much.
I modified fs-fuzzer.c to add HYPERCALL_KAFL_LOCK and a handshake at the beginning so I wouldn't need to use the loader binary.
I made a couple of other modifications to kAFL itself, namely changing the machine type from q35 to pc to get loadvm working, and the parallel -vm_ram fix mentioned here: #15 (comment)
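
The /proc/kallsyms lookup described in the list above can be scripted. The Python sketch below returns the lowest and highest start address among code symbols with a given name prefix. The function name is made up for this example. The upper value is the start of the last matching symbol, not its end, so the range still needs to be extended to cover that function.

```python
from typing import Optional, Tuple


def symbol_range(kallsyms_text: str, prefix: str) -> Optional[Tuple[int, int]]:
    """Return (lowest, highest) start address of code symbols with `prefix`."""
    addrs = []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        addr, sym_type, name = parts[0], parts[1], parts[2]
        # 't' and 'T' mark local and global symbols in the text section.
        if sym_type in ("t", "T") and name.startswith(prefix):
            addrs.append(int(addr, 16))
    if not addrs:
        return None
    return min(addrs), max(addrs)
```

A call such as `symbol_range(open("/proc/kallsyms").read(), "ext4_")`, run as root inside the guest, gives a starting point for the -ip0 argument.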

So far my fuzzer appears to run in a stable manner, but the performance I'm seeing makes me suspicious that something isn't configured right:

Execs/s hovers around 400-600 after the first hour or so
Number of paths gets to about 1000 after the full 32 hours
No timeouts or panics yet

Any idea if there's an issue in my configuration that could be hampering performance? Any guidance is greatly appreciated.

Imported payload produced no new coverage, skipping

I ran into the problem above. Maybe my seed did not add new coverage, but I don't think a lack of new coverage should stop the fuzzer from running.
Some communicator log messages indicate the fuzzer is ready. I was testing selffuzz_test.exe and want to start a normal fuzz run.
What can I do? Thanks!

macOS Virtual Machine creation

Hello,

I'm looking for information about the creation of macOS Virtual Machine.

With the help of the kholia/OSX-KVM project I've managed to create a VM running macOS Big Sur or the Monterey beta.

In order to create a snapshot I removed several options (CPU: invtsc ...) that were making the VM non-migratable.

I'm 'able' to create the snapshot, but there are several issues with this approach:

I can't use separate ram and overlay files; only the overlay file is used for storing the snapshot.
On qemu-2.9.0 the savevm command does not terminate. I need to run qemu-img info overlay.qcow2 and kill QEMU once the snapshot shows up in its output (the problem is not present with qemu-4.2.0, by the way).
Before using the savevm command I need to remove the OpenCoreBoot drive used by the kholia/OSX-KVM project, to prevent QEMU from writing the snapshot into it.

My snapshot seems to be valid, as I'm able to load it, but I guess there is one problem:

When I send the vuln_test agent to crash the vulnerable_driver, I always get phys_addr == -1 errors in read_virtual_memory and write_virtual_memory in memory_access.c.
When I send the info agent the error does not occur, and I'm able to get the information written to the info_buffer.
I have the same issue across the macOS versions I tested (High Sierra and Big Sur).

So it's possible to read/write a virtual address in userland, but not when the address is in kernel space?

I bet the way I'm creating the VM, and the snapshot, is not good (or incomplete):

- Does anyone have any idea why I'm encountering this problem?
- Could anyone share a better way to create the macOS VM and snapshot?

Thank you

ansible deployment bugs

Still a few issues with current deploy playbook. It works fine for normal fresh install but can be buggy when doing partial flows or upgrading existing install:

missing the Makefile in remote mode, and actually the deploy recipes are not useful in this mode. Maybe it is fine to consider remote install only for automated deployments?
partial execution due to existing files or using tags is not reliable, e.g. check_hardware or grub_menuentry_ids can be undefined. Maybe we have to encode these using set_fact or define a separate role to depend on?
playbook should check HW capabilities before anything else. kernel install and grub update should depend on uname -a output, not /dev/kvm ioctl support.

The "Obtaining driver virtual address" output is blank

Thank you for your work! I am using kAFL to fuzz the Windows kernel. When I reach the step "Obtaining driver virtual address" and run the command ./kAFL/kAFL-Fuzzer/kafl_info.py -work_dir ./work -vm_dir . -mem 1024 -agent kAFL/targets/windows_x86_64/bin/info/info.exe to get the driver addresses, there is no output. I do not know why this happens. My host is Ubuntu 20.04 and the guest in QEMU is Windows 10 Enterprise Evaluation, version 20H2.

guest memory may be paged out during read/decode

See related item here RUB-SysSec/kAFL#27

don't reset managed repos by default

ansible git repos are currently force-cloned as part of default playbook. local changes will be reset when running make or make deploy at the toplevel. at the same time, there is no convenient shortcut to just update.

we could use some README notes on useful ansible commands
make should not reset repos by default. if repos exist, maybe deploy should abort and request the user to make distclean?
more generally, we could use a set of tags/tasks and makefile shortcuts to
- update current repos (but do not force, and also not update system packages)
- get current status for all managed repos, to quickly see if there are any local changes in a given install
- trigger rebuild of userspace components after repo update or local changes (but no system-level config or grub changes or repo updates)

Malformed manifest or wrong version of west?

Following the setup guide I did

git clone $this_repo ~/kafl; cd ~/kafl
make env       # create and activate environment
west update -k # download required sub-components

then I get the error

FATAL ERROR: Malformed manifest file: /home/t/fuzzing/kafl/manifest/west.yml 
  Schema file: /home/t/.local/lib/python3.8/site-packages/west/manifest-schema.yml
  Hint: /home/t/fuzzing/kafl/kafl: "self: import: .submanifests": file /home/t/fuzzing/kafl/kafl/.submanifests not found

Do I need to change the version of west or pip or python or something?
I have not made any edits to the west.yml. It is the same one as in the repo.

Windows guest VM with NIC attached doesn't complete qemu_handshake.

I created a Windows 11 VM with the "-net nic" option and edited "kAFL-Fuzzer/common/qemu.py" to pass "-net nic" instead of "-net none". When I execute kafl_info.py, the VM gets stuck at the QEMU handshake.

Is there anything that I'm doing wrong?

`install.sh deps` may trigger machine suspend

On vanilla Ubuntu install, install.sh deps may pull in desktop packages as a dependency on qemu, which in turn may activate suspend mode and disable a remote machine. Maybe disabling recommended packages helps. Need testing..

Failed to gather trace data on snapshot mode for Linux

Hi,
First of all, I am grateful to you for your efforts to maintain this fascinating project.

I'm testing this version of kAFL with Linux snapshot(Ubuntu Server 18.04.7) and the vulnerable driver sample given at tests/test_cases/simple/linux_x86-64. I compiled and loaded the kernel module on the QEMU VM and could get an address range for that like below with info feature.

<< kAFL-Fuzzer/kafl_info.py: Agent Info Dumper >>

kAFL Linux x86-64 Kernel Addresses (41 Modules)

START-ADDRESS      END-ADDRESS		DRIVER
0xffffffffc02b0000-0xffffffffc02b4000	kafl_vuln_test
0xffffffffc02a1000-0xffffffffc02a6000	ppdev
...

However when I start fuzzing with kafl_fuzz.py, I keep getting "Imported payload produced no new coverage" no matter of the number of retries or the sample inputs. I slightly modified the code for logging and it seems that the VM keeps emitting MSG_BUSY. (This time I tried with 3 inputs as follows: "aa", "aaa", "abcd")

<< kAFL-Fuzzer/kafl_fuzz.py: Kernel Fuzzer >>

Importing payload from /home/user/kAFL/out/imports/seed_00002
{'type': 1, 'task': {'type': 'import', 'payload': b'aa\n'}}
Imported payload produced no new coverage, skipping..
Importing payload from /home/user/kAFL/out/imports/seed_00000
{'type': 1, 'task': {'type': 'import', 'payload': b'abcd\n'}}
Imported payload produced no new coverage, skipping..
Importing payload from /home/user/kAFL/out/imports/seed_00001
{'type': 1, 'task': {'type': 'import', 'payload': b'aaa\n'}}
Imported payload produced no new coverage, skipping..
{'type': 5}
{'type': 5}

Even stranger, the same thing happens when I try the two inputs "SERGEJ" and "KERNELAFL", which should crash the sample driver. The results seem somewhat delayed, but nothing changes in debug.log.

<< kAFL-Fuzzer/kafl_fuzz.py: Kernel Fuzzer >>

Importing payload from /home/user/kAFL/out/imports/seed_00000
{'type': 1, 'task': {'type': 'import', 'payload': b'SERGEJ\n'}}
Imported payload produced no new coverage, skipping..
Importing payload from /home/user/kAFL/out/imports/seed_00001
{'type': 1, 'task': {'type': 'import', 'payload': b'KERNELAFL\n'}}
Imported payload produced no new coverage, skipping..
{'type': 5}
{'type': 5}

Would you please try the snapshot mode in your environment and check whether this happens? I am curious whether this happens only to me or whether there really is an issue with the snapshot mode. Thanks.

BTW, I applied the 'sleeping' patch mentioned here(issue #10) to bypass the race issue between the fuzzing logic and the KVM.

Slave has died — fs_shm.seek(0) error in set_payload

So, I tried fuzzing readelf from user_bench a bit, and after several hours I got this:

[FATAL] Failed to set new payload - Qemu crash?
[FATAL] Failed to set new payload - Qemu crash?
[FATAL] Failed to set new payload - Qemu crash?
[FATAL] Failed to set new payload - Qemu crash?
Process Slave 2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 59, in slave_loader
    slave_process.loop()
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 133, in loop
    self.handle_node(msg)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 107, in handle_node
    results, new_payload = self.logic.process_node(payload, meta_data)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/state_logic.py", line 92, in process_node
    new_payload = self.handle_initial(payload, metadata)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/state_logic.py", line 185, in handle_initial
    havoc.mutate_seq_havoc_array(payload, self.execute, num_execs)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/technique/havoc.py", line 64, in mutate_seq_havoc_array
    func(data)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/state_logic.py", line 324, in execute
    bitmap, is_new = self.slave.execute(payload, parent_info)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 251, in execute
    exec_res = self.__execute(data)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 245, in __execute
    return self.__execute(data, retry=retry+1)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 245, in __execute
    return self.__execute(data, retry=retry+1)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 245, in __execute
    return self.__execute(data, retry=retry+1)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/fuzzer/process/slave.py", line 235, in __execute
    self.q.set_payload(data)
  File "/home/vient/git/kAFL/kAFL-Fuzzer/common/qemu.py", line 737, in set_payload
    self.fs_shm.seek(0)
ValueError: mmap closed or invalid
[FATAL] Slave has died - check logs!

Indeed, four files for this slave were gone when I looked at it: bitmap, pt_trace_dump, qemu_payload and interface. QEMU process was missing, and slave was stuck in Z state.

What may have caused it? I had 8 workers, one of them died, others were perfectly normal. I observed same pattern at least one more time (one worker dies, other work) but since it takes a long time I don't know if other workers would die too eventually. Also, kafl_gui was running in both cases all the time. In this particular case kafl_gui have frozen, restart helped. I'll try to fuzz without gui now.

Maybe something can be done to restart broken slaves automatically?
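
The traceback above shows `__execute` retrying recursively until `set_payload` finally raises. One possible shape for automatic recovery is a bounded retry loop that restarts the backend between attempts. The Python sketch below illustrates only that pattern. It is not kAFL code: `run` and `restart` are hypothetical callables standing in for "execute the payload" and "relaunch QEMU and remap shared memory".

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_restart(run: Callable[[], T], restart: Callable[[], None],
                         max_retries: int = 3) -> T:
    """Call `run`; on ValueError, call `restart` and retry a bounded number of times."""
    for attempt in range(max_retries + 1):
        try:
            return run()
        except ValueError:
            # For example "mmap closed or invalid" once the backing file is gone.
            if attempt == max_retries:
                raise
            restart()
    raise AssertionError("unreachable")
```

Bounding the retries keeps a permanently broken worker from looping forever, and re-raising the last error leaves the final decision (for example, respawning the whole worker process) to the caller.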

kafl_gui should warn on common problems

kafl_gui.py should warn on stalled/slow execution, non-determinism in the target ('funky' stats counter), empty/overfull bitmap etc. In general, there are several incorrect/mocked UI elements that could be filled with life.
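
As a rough illustration of the empty/overfull bitmap check, the Python sketch below computes the fraction of non-zero bytes in a coverage bitmap and returns warning strings. The function name and both thresholds are arbitrary assumptions for this example, not values taken from kAFL.

```python
from typing import List


def bitmap_warnings(bitmap: bytes, low: float = 0.0001,
                    high: float = 0.5) -> List[str]:
    """Flag a coverage bitmap that looks empty or saturated."""
    if not bitmap:
        return ["bitmap has zero length"]
    # Fraction of bitmap bytes that recorded at least one hit.
    fill = sum(1 for b in bitmap if b) / len(bitmap)
    warnings = []
    if fill == 0:
        warnings.append("bitmap is empty: no coverage feedback received")
    elif fill < low:
        warnings.append(f"bitmap nearly empty ({fill:.4%} of bytes set)")
    if fill > high:
        warnings.append(f"bitmap overfull ({fill:.1%} of bytes set)")
    return warnings
```

A GUI could run such a check on each refresh and show the returned strings next to the worker statistics.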

Not all workers connecting when running on many cores?

When launching many parallel workers (-p 120), kafl_gui shows the last N workers in STALLED mode. Not clear if this is just the GUI overloaded with looking up worker_stats files, or python multiprocessing failing to connect due to some timeout/buffer limits, or perhaps the scheduler failing to hand out work beyond first 100 workers..?

Update Windows target and others

Hi. I wish to contribute.
What needs to be done in order to migrate targets from kafl_user.h to nyx_api.h? I'm most interested in the Windows target.
I guess I would also need to read some additional info on the topic (kAFL, QEMU, snapshot fuzzing etc.). Any recommendations for that?

'make env' should check for existing pipenv

make env can fail in unfortunate ways when the current new workspace directory already has a pipenv from a previous installation. Solution is to pipenv --rm; but this could also affect an existing pipenv from a parent directory.. :-/

Release Builds

First off, thanks for stepping up and maintaining this project!

This project has made it very easy for someone interested in whole-system fuzzing to jump on board. I see that this has been ported to newer kernels which is awesome. The original project made it seem as if Ubuntu is the only OS which can support this. I seem to recall some conversation backing this. Is there any truth to this claim?

Can an iso of some distro be slipstreamed to include a patched kernel in addition to qemu and fuzzing utilities?

I have several projects where I'd like to integrate kAFL. This usually means recompiling the kernel with my personal patches or something a tad better like the patches found here. To meet my own needs, I was thinking of mirroring the Linux kernel and applying patches to some long term tag. Newer kAFL releases could be rebased on future Linux tags. I'm definitely over-scoping here, but this would make development a lot easier... at least for the KVM portion. It is definitely easier for the typical user to download a specific version and apply patches.

kvm.h patch file problem

There is a problem with the KVM VMX PT ioctl codes defined in the kvm.h patch file.
The ioctl code KVM_VMX_PT_ENABLE_ADDR3 conflicts with the ioctl code KVM_KVMCLOCK_CTRL in kvm.h.
See the issue I opened in the redqueen project for more information.
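
Conflicts of this kind can be spotted mechanically. The Python sketch below scans header text for `_IO`, `_IOR`, `_IOW` and `_IOWR` definitions and reports macro names that share the same (type, number) pair. It is a coarse check that ignores the direction and size encoded in the final ioctl value, and the function name is made up for this example.

```python
import re
from collections import defaultdict

# Captures: macro name, ioctl type (e.g. KVMIO), ioctl number.
IOCTL_RE = re.compile(
    r"#define\s+(\w+)\s+_IO[RW]*\(\s*(\w+)\s*,\s*(0x[0-9a-fA-F]+|\d+)"
)


def find_ioctl_conflicts(header_text: str) -> dict:
    """Map (type, number) to the macro names sharing it; conflicts only."""
    seen = defaultdict(list)
    for name, io_type, number in IOCTL_RE.findall(header_text):
        value = int(number, 16) if number.lower().startswith("0x") else int(number)
        seen[(io_type, value)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

Running it over the patched kvm.h lists every pair of definitions that reuse a number, so a new ioctl can be moved to a free slot.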

Document ghidra usage

Document the flow for obtaining traces of one or more files and visualizing them in Ghidra:

install ghidra and add ghidra_cov_analysis.py to scripts dir
kafl_fuzz.py to create a corpus
kafl_cov.py to create trace files
ghidra_cov_analysis.sh to sort out unique found edges and return coverage report

How to make kafl-ready VM snapshot?

Hi! I can't seem to understand how to launch kAFL, specifically I don't understand how to advance from "created VM with QEMU" state to "got kafl snapshot".

In README, you are referring to original README.kAFL, where I found following instructions:

Execute loader binary <...>. VM should freeze. Switch to the QEMU management console and create a snapshot:

I tried to do the same with my Windows VM and vuln_test.exe loader. VM is not freezing by default when using kAFL patched QEMU. It is freezing when launched with non-patched QEMU, as well as when I add -device kafl,reload_mode=False to kAFL QEMU.

So I used the last method (QEMU with the kAFL patch, plus the kafl device), executed the loader, and the VM froze. The next step is to make a snapshot, but QEMU hangs in pause_all_vcpus().

What am I doing wrong? Can you elaborate on ways to prepare fuzzing snapshot?

Fuzzing does not progress past "Importing payload from path/to/imports/seed_00000"

Dear developer: when I got to the final step, I created a seed file containing "0123456789abcdef". Next I used kafl_fuzz.py to start fuzzing, but it only shows Importing payload from path/to/imports/seed_00000. I waited a long time, but nothing else happened. I then checked the import file, and it is blank. Did I perform some step incorrectly, or miss an important step?

Initial seeds sometimes not traced correctly

The feedback bitmap obtained from qemu is sometimes empty. When starting with only a few seeds, the corresponding inputs are discarded and we end up with an empty queue and execution is stalled.

kafl_fuzz.py will detect and warn about this behavior. Simple workaround is to manually copy the desired seed to $workdir/imports/ where it will be picked up and imported as a new seed on the next scheduler iteration.

The behavior seems sporadic and often disappears after a few attempts to debug.. :-/
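
The manual workaround described above amounts to a single file copy. A minimal Python sketch follows; the function name is made up for this example, and the only layout assumed is the $workdir/imports/ directory mentioned in the text.

```python
import shutil
from pathlib import Path


def reimport_seed(seed: str, workdir: str) -> Path:
    """Copy `seed` into <workdir>/imports/ and return the new path."""
    imports_dir = Path(workdir) / "imports"
    imports_dir.mkdir(parents=True, exist_ok=True)
    target = imports_dir / Path(seed).name
    shutil.copyfile(seed, target)
    return target
```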

steps to make kAFL work

Are there any steps to follow to make kAFL work on Ubuntu?
I followed the steps below and it is not working. I can see the code has been updated, so my previously working steps no longer do the job.

Debian binary

Hello,
While installing redqueen on an Ubuntu 16 host, is it possible to target a binary that was compiled on a Debian OS?
I am having issues with dependencies that are not found locally when generating the packed binaries (binary_info + binary_fuzz).
Any idea about this ? thank you.

Tested compatibility with Comet Lake?

Hi, this issue is not about a technical difficulty but a question for well-experienced kAFL users and developers.

I heard through the grapevine that some processor models can give erratic PT data for fuzzing with kAFL or sibling projects like Redqueen. Has anyone tested if kAFL works fine with the PT facilities of recent Comet Lake processors? I'm eyeing an i9-10900K for buying a machine where I plan to use kAFL and derived projects. Or perhaps suggestions for kAFL-tested high-end i9 CPUs?

Apologies if my request may look odd or off-topic, I thought this may be the best place where one can ask :)

Something wrong

PT support is not detected by KVM

[os]
Linux version 5.3.0-64-generic (buildd@lcy01-amd64-026)(gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2))

[cpu]
Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz

No bitmap updates in linux kernel fuzzing

When launching test linux kernel module fuzzing, kAFL cannot detect crashes and seems to not update bitmaps after the first testcase.
I slightly modified the agent to output the payload through hprintf and see that there are no "fixed" hit values in the payload. Even values that should increase coverage, as I know from the source code, are treated like all others. When I provide an "almost" crashing input such as "KERNELoFL", it leads to the crash after some random mutations, of course, and the fuzzer prints this:

[HPRINTF]	00000000: 4b 45 52 4e 45 4c 5b 46 4c 0a d3 d3 20 e4 e4 fa "KERNEL[FL... ..."
[HPRINTF]	00000000: 4b 45 52 4e 45 4c 5c 46 4c 0a d3 d3 20 e4 e4 fa "KERNEL\FL... ..."
[HPRINTF]	00000000: 4b 45 52 4e 45 4c 42 46 4c 0a d3 d3 20 e4 e4 fa "KERNELBFL... ..."
[HPRINTF]	00000000: 4b 45 52 4e 45 4c 5d 46 4c 0a d3 d3 20 e4 e4 fa "KERNEL]FL... ..."
[HPRINTF]	00000000: 4b 45 52 4e 45 4c 41 46 4c 0a d3 d3 20 e4 e4 fa "KERNELAFL... ..."
Crashing input found (crash), but not new (discarding) 0

If I provide a second test case, the fuzzer takes it, runs it, decides that it produces no new coverage, and then goes on with dumb mutations without any feedback.

[HPRINTF]	00000000: 4b 45 52 4e 45 4c ff ff ff 7f d3 d3 20 e4 e4 fa "KERNEL...... ..."
Importing payload from /home/user/Documents/fuzz/tools/kafl/fuzz/imports/1
[HPRINTF]	00000000: 53 45 52 31 32 33 0a ff ff 7f d3 d3 20 e4 e4 fa "SER123...... ..."
Imported payload produced no new coverage, skipping..
[HPRINTF]	00000000: 00 20 52 31 32 33 0a ff ff 7f d3 d3 20 e4 e4 fa ". R123...... ..."

I enabled verbose logging in the QEMU kAFL component by commenting out this line in qemu-5.0.0/pt/debug.h: #define PT_DEBUG_DISABLE. The log then shows the assembly code produced by capstone inside QEMU.

[QEMU-PT] Diasm: Analyse ASM: ffffffffc02b2000 (4096), max_addr=ffffffffc02b2300
[QEMU-PT] Diasm: Loop: ffffffffc02b2000:        nop     dword ptr [rax + rax], last_nop=0
[QEMU-PT] Diasm: Loop: ffffffffc02b2005:        push    rbp, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2006:        mov     rbp, rsp, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2009:        push    r13, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b200b:        push    r12, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b200d:        push    rbx, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b200e:        mov     r12, rsi, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2011:        mov     rbx, rdx, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2014:        mov     esi, 0x24000c0, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2019:        mov     edx, 0x534, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b201e:        sub     rsp, 0x108, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2025:        mov     rdi, qword ptr [rip - 0x3e0d4b54], last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b202c:        mov     rax, qword ptr gs:[0x28], last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2035:        mov     qword ptr [rbp - 0x20], rax, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2039:        xor     eax, eax, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b203b:        call    0xffffffff811fb980, last_nop=1
[QEMU-PT] Redq.: insert hook call ffffffffc02b203b
[QEMU-PT] Diasm: Loop: ffffffffc02b2040:        cmp     rbx, 0xff, last_nop=0
[QEMU-PT] Diasm: Loop: ffffffffc02b2047:        ja      0xffffffffc02b218c, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b204d:        lea     rdi, qword ptr [rbp - 0x120], last_nop=0
[QEMU-PT] Redq.: got boring index
[QEMU-PT] Diasm: Loop: ffffffffc02b2054:        mov     edx, ebx, last_nop=1
[QEMU-PT] Diasm: Loop: ffffffffc02b2056:        mov     rsi, r12, last_nop=1

Disassembling the test module with objdump gives a slightly different listing.

# objdump -d -M intel kafl_vuln_test.ko

kafl_vuln_test.ko:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <write_info>:
   0:	e8 00 00 00 00       	call   5 <write_info+0x5>
   5:	55                   	push   rbp
   6:	48 89 e5             	mov    rbp,rsp
   9:	41 55                	push   r13
   b:	41 54                	push   r12
   d:	53                   	push   rbx
   e:	49 89 f4             	mov    r12,rsi
  11:	48 89 d3             	mov    rbx,rdx
  14:	be c0 00 40 02       	mov    esi,0x24000c0
  19:	ba 34 05 00 00       	mov    edx,0x534
  1e:	48 81 ec 08 01 00 00 	sub    rsp,0x108
  25:	48 8b 3d 00 00 00 00 	mov    rdi,QWORD PTR [rip+0x0]        # 2c <write_info+0x2c>
  2c:	65 48 8b 04 25 28 00 	mov    rax,QWORD PTR gs:0x28
  33:	00 00 
  35:	48 89 45 e0          	mov    QWORD PTR [rbp-0x20],rax
  39:	31 c0                	xor    eax,eax
  3b:	e8 00 00 00 00       	call   40 <write_info+0x40>
  40:	48 81 fb ff 00 00 00 	cmp    rbx,0xff
  47:	0f 87 3f 01 00 00    	ja     18c <write_info+0x18c>

The first machine instruction is not disassembled correctly by capstone in QEMU. Also, at 0x3b and the other call offsets, capstone calculates wrong destination addresses for the calls. Could this affect the trace-to-bitmap conversion?

My setup:
Host OS: Ubuntu 20.04 (also tried Debian 10.2)
Guest OS: Ubuntu 16.04 (ro nokaslr nopti oops=panic mitigations=off)
CPU: Intel Core i7-8550U (Kaby Lake R)
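
Regarding the objdump and capstone listings above, one possible explanation (an assumption, not a diagnosis confirmed in this thread) is that the two tools are looking at different bytes. objdump reads the unrelocated .ko file, where the rel32 displacement of every call is still zero until the module loader applies relocations, while capstone in QEMU decodes the module as loaded in guest memory. The 5-byte call at offset 0 may likewise be the function-tracing entry call that the kernel replaces with a 5-byte nop at load time. The sketch below only shows how the target of a `call rel32` follows from its displacement; the helper names are hypothetical and are not part of kAFL or QEMU.

```python
MASK64 = (1 << 64) - 1
CALL_REL32_LEN = 5  # opcode e8 plus a 4-byte little-endian displacement


def call_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the destination of an x86-64 `call rel32` (opcode e8)."""
    if len(insn_bytes) != CALL_REL32_LEN or insn_bytes[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    rel32 = int.from_bytes(insn_bytes[1:], "little", signed=True)
    # The displacement is relative to the address of the next instruction.
    return (insn_addr + CALL_REL32_LEN + rel32) & MASK64


def encode_call(insn_addr: int, target: int) -> bytes:
    """Build the bytes a loader would leave behind after fixing up the call."""
    rel32 = target - (insn_addr + CALL_REL32_LEN)
    if not -(1 << 31) <= rel32 < (1 << 31):
        raise ValueError("target out of rel32 range")
    return b"\xe8" + rel32.to_bytes(4, "little", signed=True)


# Unrelocated object file: zero displacement, so the call "targets" the next
# instruction, which is what objdump prints as `call 40 <write_info+0x40>`.
print(hex(call_target(0x3B, bytes.fromhex("e800000000"))))  # 0x40

# Loaded module: the same instruction with a resolved displacement.
patched = encode_call(0xFFFFFFFFC02B203B, 0xFFFFFFFF811FB980)
print(hex(call_target(0xFFFFFFFFC02B203B, patched)))
```

If this assumption holds, running objdump with its `-r` option on the .ko should list a relocation entry next to each zeroed call.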

processor requirements

Hi,
I would like to know which Intel processor is best suited to run kAFL.
Do we need PTWRITE and the other features?

My processor setup looks like this:

Intel PT : YES
CR3 Filtering : YES
IP Filtering : YES
PSB/CYC Accurate : YES
MTC Packets : YES
PTWRITE : no
Power Events : no
Single-Range Output: YES
ToPA Output : YES
ToPA Multi-Output : YES
TT Subsys Output : no
IP Payloads : RIP
Address Ranges : 2

intelpt trace feature is not working properly.

Thank you for this awesome project.

I tested kAFL, version kafl_0.2, on Ubuntu 20.04 x64, fuzzing the Windows 10 x64 kernel.
I want to fuzz a default Windows kernel driver.
But the target driver is not traced properly.
I debugged the Windows guest in QEMU with gdb, and the breakpoint in the target driver is hit properly.
ntoskrnl.exe, however, is traced well.
I think this problem is caused by Intel PT.
Can you advise me which part of the code handles this?

Thank you.

Testing the testing branch

Multiple proposed fixes and updates in the testing branch, including long-standing issues like Qemu hanging on snapshot.
Also a new Linux tutorial based on #27 and #31.

I have kind of moved on to using the Qemu-PT/KVM-PT from Nyx, but let me know how it works and we can merge any fixes.

Cannot find the log file containing the syscalls the fuzzer executes

I am trying to access the log file containing the syscalls. I ran the fuzzer command with the -v flag, hoping that the debug log would contain the syscalls. The command:

However, the log says "Input validation failed! Target is funky?.."
The debug.log file:

If the -v flag is the way to get the syscalls, how do I fix the input validation error? If not, can someone guide me on how to get a log file of all the syscalls the fuzzer tries to execute?

Mutation and scheduling improvements

You are saying havoc should be better - but compared to what, and how can we fix it?

Well, I don't really know. What can you say about these tweets? https://twitter.com/gamozolabs/status/1284854239232053248 and https://twitter.com/gamozolabs/status/1284854896483659776.

I can also say that I have a feeling more time was spent in afl_splice than in afl_havoc on this target.

ARITH_MAX=128 looks interesting for example, but I would only make it default if it shows good results across multiple targets. Makes sense?

Since it's a deterministic stage, it surely won't make things worse. I think the question is whether it is beneficial, in terms of edges per second, to make the deterministic stage 4 times longer. At least it can be said that this way we fully bruteforce each byte in the test before moving to the havoc stage, which is a nice property.
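
The "fully bruteforce each byte" remark can be checked with a small sketch. It assumes an AFL-style 8-bit arithmetic stage that tries adding and subtracting every delta from 1 to ARITH_MAX, and it assumes the stock AFL default of 35 as the baseline; the function name is hypothetical. Real AFL also skips values that a bit flip would already have produced, which is omitted here.

```python
AFL_DEFAULT_ARITH_MAX = 35  # assumed baseline: ARITH_MAX in stock AFL's config.h


def arith8_candidates(byte: int, arith_max: int) -> set:
    """Byte values reachable by adding or subtracting 1..arith_max (mod 256)."""
    reached = set()
    for delta in range(1, arith_max + 1):
        reached.add((byte + delta) & 0xFF)
        reached.add((byte - delta) & 0xFF)
    reached.discard(byte)  # the unchanged byte is not a mutation
    return reached


for arith_max in (AFL_DEFAULT_ARITH_MAX, 128):
    reached = arith8_candidates(0x41, arith_max)
    print(f"ARITH_MAX={arith_max}: {len(reached)} of 255 other byte values tried")
```

Under these assumptions a limit of 128 reaches every other value of a byte, at the cost of roughly 128/35 times as many executions in this stage.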

I think this can be a good starting point for a regular performance/regression test.

Yeah, I think I can start testing with some Linux programs. Do you consider Objdump and NASM to be the best two examples from user_bench? I don't have the computing resources to run extensive tests on all programs there, but I may run one or two.

Originally posted by @vient in #12 (comment)

PT filter range must be page aligned

The IP range given as -ip0 must be page aligned due to implicit assumptions in the KVM/Qemu backend. The Python frontend should probably perform a check on this.
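
A minimal sketch of such a frontend check follows. It is not the actual kAFL code: the function name, the "start-end" hex format of the argument, the 4 KiB page size and the choice to check both bounds are all assumptions made for illustration.

```python
from typing import Tuple

PAGE_SIZE = 0x1000  # assumed 4 KiB x86 page granularity


def parse_ip_range(text: str, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Parse a 'start-end' hex range and reject bounds that are not page aligned."""
    start_s, end_s = text.split("-", 1)
    start, end = int(start_s, 16), int(end_s, 16)
    if start >= end:
        raise ValueError(f"empty or inverted range: {text}")
    for name, value in (("start", start), ("end", end)):
        if value % page_size != 0:
            raise ValueError(f"{name} address {value:#x} is not page aligned")
    return start, end


# An aligned range is accepted and returned as integers.
print([hex(v) for v in parse_ip_range("0xffffffffc02b2000-0xffffffffc02b3000")])

# An unaligned end address is reported instead of being passed to the backend.
try:
    parse_ip_range("0xffffffffc02b2000-0xffffffffc02b2300")
except ValueError as err:
    print(err)
```

Failing early in the frontend would turn a silent tracing problem into an explicit error message.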

Inconsistent payload and bitmap sizes

The maximum payload size and bitmap size are defined in several places across Qemu and Python code. It would be nice to harmonize those so that we can actually make them configurable.

Just a question

How does Intel PT in kAFL notify AFL when new coverage has been hit?
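
As a general illustration rather than a description of the kAFL source: in AFL-style fuzzers the tracing hardware does not notify anyone by itself. Each run's trace is converted into a coverage bitmap (another report on this page mentions the trace-to-bitmap conversion), and the fuzzer compares that bitmap with a global one after every execution. The sketch below assumes a plain byte-per-edge bitmap and uses a hypothetical function name; real implementations also bucket hit counts.

```python
def has_new_coverage(trace: bytes, global_map: bytearray) -> bool:
    """Merge one run's bitmap into the global map and report whether anything was new."""
    found_new = False
    for index, hits in enumerate(trace):
        # Bits set in this run that were never seen in any earlier run.
        if hits & ~global_map[index]:
            global_map[index] |= hits
            found_new = True
    return found_new


global_map = bytearray(8)
first_run = bytes([0, 1, 0, 0, 0, 0, 0, 0])
print(has_new_coverage(first_run, global_map))  # first sighting of entry 1
print(has_new_coverage(first_run, global_map))  # same trace again, nothing new
```

An input is kept in the queue only when this comparison finds something new, which is the feedback that guides later mutations.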

Breaking change on master branch

Heads up on planned re-organization of the installation flow + git repo structure

The current "workspace" installation approach where we separate the overall setup from python fuzzer and other components seems to work well. Folks can fork the "workspace" and pin some particular versions or add more repos for their desired fuzzing pipeline. Problems are:

workspace as a branch is confusing, people tend to just click "Clone" on the main kAFL page
setup with custom Makefile/install.sh is fragile and does not scale well for automated deployment
the kAFL-Fuzzer is now just one of the components to deploy, and will be better to maintain as such
Docker kind of works for automating kAFL deployment but we also tend to require a special kernel or other system-level properties

So going forward (next few days) we plan to:

move kAFL-Fuzzer in the current master branch to a separate repo (kafl.fuzzer, to keep with the fantastic naming scheme)
turn the 'kAFL' project into the top-level repo with scripts + docs to deploy and use kAFL/Nyx stack
replace install steps + scripts with Ansible playbooks for local and remote installation (feature branch)

The result of Ansible setup will be quite similar to current make install steps, but it may have some gaps initially. We can keep a branch "v0.4" of the current kAFL master for anyone too busy to switch their stuff to the new install.

I don't expect a lot of users being impacted here, basically you can just hold off on updating until you can switch over to the new install flow. But wanted to give a heads-up + reasoning either way.

Let me know if you have any input. Ansible certainly feels like pulling in yet another big piece of infrastructure, but in my experience so far it has been very lightweight, basically just requiring sudo + ssh. It comes with a rich ecosystem to automate the system-level deployment tasks we need and it can also replace West, which has not been as unproblematic as I'd hoped.