Giter Site home page Giter Site logo

jp2k_fuzz's Introduction

jp2k_fuzz

This repository contains a harness that can be used with WinAFL to fuzz Acrobat's JPEG2000 library. It was used to find CVE-2019-7794.

Details

JP2KLib.dll is a closed source DLL that is used by Adobe Acrobat to decode JPEG2000 images. Since it's a binary with no source code, its exports have an unknown API. Our goal is to invoke the exported functions properly and get the library to decode our image independently of Acrobat. Before we can invoke the exported functions, we need to observe how they're used by the application and we can do that through API Monitor. Doing so shows us the following sequence of calls:

JP2KLibInitEx(MemObj *obj);
MemObj *obj = JP2KGetMemObjEx();
DecOpt *opt = JP2KDecOptCreate();
JP2KDecOptInitToDefaults(opt);
Image *img = JP2KImageCreate();
JP2KImageInitDecoderEx(img, struct_unk_1, JP2KStreamProcsEx*, opt, struct_unk_3);

You might be wondering where the JPEG2000 image data is passed in. Perhaps through JP2KImageCreate? Nope, that's where the parsed data is written. What actually happens is that Acrobat reads the data from the PDF and initializes a stream object called JP2KCodeStm that JP2KImageInitDecoderEx reads during its decoding process. Luckily for us, there exist some symbols for it in the DLL. We then use the frida_trace.js script to identify which of its methods are called during ordinary image decoding. Once we know what we need to stub, we perform some quick reversing:

JP2KCodeStm::InitJP2KCodeStm(unsigned __int64,int,void *,JP2KStreamProcsEx *,JP2KStmOpenMode,int)
JP2KCodeStm::GetCurPos(void) - returns current pos (+28)
JP2KCodeStm::IsSeekable(void) - returns 1
JP2KCodeStm::read(uchar *outBuf, int nCount) - read nCount bytes from stream into outBuf, returns num bytes read
JP2KCodeStm::seek(int flag, __int64 pos) - seek to pos, returns current pos

Now that we know how JP2KImageInitDecoderEx reads data from the stream we can detour the above functions to read/seek from our buffer containing JPEG2000 data! We use adobe_jp2k.py to read our JPEG2000 image and insert the bytes as an array in our frida_harness.js script. This script will attempt to emulate the above functions, e.g. for JP2KCodeStm::read:

var jp2kBytes = [<bytes from JPEG2000 image>];
...
case "?read@JP2KCodeStm@@QAEHPAEH@Z":
    Interceptor.replace(exports[i].address, new NativeCallback(function(outBuf, nCount) {
        console.log('[i] JP2KCodeStm::read() - writing ' + nCount + ' bytes to ' + outBuf + ' curpos=' + curPos);
        var readBytes = jp2kBytes.slice(curPos, curPos + nCount);
        curPos += nCount;
        Memory.writeByteArray(outBuf, readBytes);
        dumpAddr('read()', outBuf, nCount);
        return readBytes.length;
    }, 'int', ['pointer', 'int'], 'stdcall'));
    break;

Provided our reversed implementation is correct, this should work right? But it doesn't. We're missing one more thing - the decoding function uses MemObj for memory management which in turn uses Acrobat's own memory management primitives. Since we're calling into the DLL directly, we don't have these available so the decoding fails. We have to emulate MemObj ourselves and register it through JP2KLibInitEx. We start by reversing the MemObjEx struct:

struct MemObjEx
{
	int(__cdecl *init_something)(int);
	int(__cdecl *get_something)(int);
	int(__cdecl *not_impl)();
	void(__cdecl *free_2)(void *);
	void *(__cdecl *malloc_1)(int);
	void(__cdecl *free_1)(void *);
	void *(__cdecl *memcpy_memset)(void *dest, void *src, int size);
	void *(__cdecl *memset_wrapper)(void *dest, int val, int size);
};

At this point we write our C++ harness where we can (mostly) route the above methods to whatever heap we have available. Our final harness has the following high-level logic:

// Load our target library
HINSTANCE JP2KLib = LoadLibrary(L"JP2KLib.dll");

// Obtain references to the exported functions we're interested in
libInit = (JP2KLibInitEx)GetProcAddress(JP2KLib, (LPCSTR)185);
imgCreate = (JP2KImageCreate)GetProcAddress(JP2KLib, (LPCSTR)58);
decOptCreate = (JP2KDecOptCreate)GetProcAddress(JP2KLib, (LPCSTR)43);
decOptInit = (JP2KDecOptInitToDefaults)GetProcAddress(JP2KLib, (LPCSTR)45);
decode = (JP2KImageInitDecoderEx)GetProcAddress(JP2KLib, (LPCSTR)157);

// Use Detours to hook the JP2KCodeStm methods above to return data from our file buffer
hook_jp2kstm();

// Init our fake MemObj with JP2KImageInitDecoderEx 
hook_memobj();

// WinAFL will pass the filename of the mutated data through argv
// This function will initialize a JP2KCodeStm from the file and attempt to decode the image 
fuzz_jp2k(argv[1]);

Once we compile our corpus and minimize it while maximizing coverage, we achieve decent performance with WinAFL and even manage to find a bug (CVE-2019-7794).

Unfortunately for me, Checkpoint was doing all of this and more at the same time: https://research.checkpoint.com/2018/50-adobe-cves-in-50-days/. That research was published a few months after I had written all of this. ¯\(ツ)

jp2k_fuzz's People

Contributors

ronwai avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.