bao - the PDB compiler

bao generates debug information from C code in the CodeView format, which is best known for its use in PDBs. A JSON configuration assigns the declared types and functions to addresses within the binary.

Showcase

To showcase how bao works, I will use a module from Valve's anti-cheat "solution" VAC. It's an ideal sample to test on because the modules are rather small and can therefore be fully analyzed very easily. Complementing these conditions, the different modules are similar to each other: all of the ones I've analyzed contain the ICE cipher.

Before

[Screenshot: before]

After applying a PDB generated by bao

[Screenshot: after]

Code

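/* Global flag; its address comes from the "globals" entry in the configuration. */
int ice_sboxes_initialised;

/* IceSubkey is only referenced through a pointer, so it can stay an incomplete type. */
struct IceKey {
    int _size;
    int _rounds;
    struct IceSubkey *_keysched;
};

/* Prototypes; each name matches a "functions" entry in the configuration. */
struct IceKey* __thiscall IceKey__IceKey(struct IceKey*, int nRounds);
void* __fastcall VAC_malloc(size_t dwBytes);
void ice_sboxes_init(void);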

Configuration

{
  "functions": [
    {
      "name": "IceKey__IceKey",
      "pattern": "56 57 33 FF"
    },
    {
      "name": "VAC_malloc",
      "pattern": "E8 ? ? ? ? 89 7E 04",
      "extra": 1,
      "rip_relative": true,
      "rip_offset": 4
    },
    {
      "name": "ice_sboxes_init",
      "pattern": "75 0B E8 ? ? ? ?",
      "extra": 3,
      "rip_relative": true,
      "rip_offset": 4
    }
  ],
  "globals": [
    {
      "name": "ice_sboxes_initialised",
      "pattern": "47 83 3D ? ? ? ? ?",
      "offsets": [
        3
      ],
      "relative": true
    }
  ]
}
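
To make the configuration above a little more concrete, here is a minimal sketch of how such a pattern could be located and how a call target could be derived from it. It is not bao's actual implementation. It assumes, based only on the example entries, that "?" matches any byte, that "extra" is added to the match position to reach the 4-byte displacement, and that "rip_offset" is the distance from that displacement to the next instruction. The helper names are hypothetical, the code works on plain buffer offsets rather than addresses in a loaded image, and the "offsets" and "relative" fields used for globals are not covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Find the first match of a pattern such as "E8 ? ? ? ? 89 7E 04" in buf.
 * Tokens are separated by spaces; "?" matches any byte, every other token
 * is read as a hex byte. Patterns longer than 64 tokens are truncated.
 * Returns the offset of the match, or -1 if there is none. */
static long find_pattern(const uint8_t *buf, size_t len, const char *pattern)
{
    int16_t tokens[64]; /* -1 = wildcard, otherwise the expected byte value */
    size_t n = 0;

    for (const char *p = pattern; *p != '\0' && n < 64;) {
        if (*p == ' ') { p++; continue; }
        if (*p == '?') {
            tokens[n++] = -1;
            while (*p == '?') p++;      /* accept both "?" and "??" */
        } else {
            char *end;
            tokens[n++] = (int16_t)strtoul(p, &end, 16);
            if (end == p) return -1;    /* token is not a hex byte */
            p = end;
        }
    }
    if (n == 0 || n > len) return -1;

    for (size_t i = 0; i + n <= len; i++) {
        size_t j = 0;
        while (j < n && (tokens[j] < 0 || tokens[j] == buf[i + j])) j++;
        if (j == n) return (long)i;
    }
    return -1;
}

/* Read the signed little-endian 32-bit displacement at pos and add it to
 * the offset of the following instruction (assumed to be pos + rip_offset).
 * Returns 1 and stores the result in *out on success, 0 if pos is out of range. */
static int resolve_rel32(const uint8_t *buf, size_t len, size_t pos,
                         long rip_offset, long *out)
{
    if (len < 4 || pos > len - 4) return 0;
    uint32_t raw = (uint32_t)buf[pos]
                 | (uint32_t)buf[pos + 1] << 8
                 | (uint32_t)buf[pos + 2] << 16
                 | (uint32_t)buf[pos + 3] << 24;
    int32_t rel;
    memcpy(&rel, &raw, sizeof rel);     /* reinterpret the bits as signed */
    *out = (long)pos + rip_offset + rel;
    return 1;
}

/* Hypothetical helper: match the pattern, skip `extra` bytes, follow the rel32. */
static int locate_call_target(const uint8_t *buf, size_t len, const char *pattern,
                              long extra, long rip_offset, long *out)
{
    long hit = find_pattern(buf, len, pattern);
    if (hit < 0) return 0;
    return resolve_rel32(buf, len, (size_t)(hit + extra), rip_offset, out);
}
```

Under these assumptions, the VAC_malloc entry reads as: find the call instruction (E8), step one byte past the opcode to the displacement, and compute where the call lands.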

Dependencies

  • LLVM 10 is used to generate the resulting PDB files. This is delegated to the accompanying pdb_wrapper.
  • Clang 10 is used for parsing the C files. This enables us to use the powerful preprocessor.

Known issues

  • enums are not supported (yet?).
  • unions are not supported (yet?).
  • C++ support is experimental and untested.
  • Functions that don't have parameters will not be assigned a type by IDA Pro. This might be a bug in IDA Pro.
  • The GUID and age from the original binary aren't applied to the generated PDB. This means that you'll have to confirm an extra warning dialog in IDA Pro when loading the PDB.
  • The memory model is LLP64 (sizeof(long) == 4), regardless of host platform.
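
The LLP64 note matters mostly for headers written with another data model in mind, where long may be meant as an 8-byte type. A minimal sketch of one way to sidestep the issue is to spell out field widths with fixed-width types. The struct below is hypothetical and not taken from VAC, and it assumes the standard headers are available to the Clang instance bao uses.

```c
#include <stdint.h>

/* Hypothetical record; field sizes no longer depend on the data model. */
struct FileRecord {
    int32_t  id;        /* always 4 bytes */
    uint32_t flags;     /* instead of `unsigned long` when 4 bytes are meant */
    int64_t  timestamp; /* instead of `long` when 8 bytes are meant */
};
```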

Usage

Generating the PDB as seen in the example is as easy as running bao -c config.json -- vac.dll src/structs.c. The resulting PDB will be saved as vac.dll.pdb in this example. You may pass the -o option to save the resulting PDB somewhere else.

Setup

The easiest way to run bao is to use Docker:

~$ git clone https://github.com/not-wlan/bao.git
~$ cd bao
~$ docker build . -t bao:latest
~$ docker run -v /path/to/project:/project -it bao:latest
#$ cd /project
#$ bao-pdb -o vac.pdb -c vac.json vac.dll structs.c

The first three commands are only necessary on your first run or after an update of bao.

Alternatively, you can install the dependencies on your own machine. Be warned, though: this is not recommended on a Windows machine!

Use cases

You can use bao to:

  • transfer your reverse engineering efforts from one version of a binary to another one
  • emulate virtual method tables by building structs of function pointers
  • generate PDBs from leaked source code that wouldn't build in its entirety
  • transfer your reverse engineering efforts from one tool to another one
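
As an illustration of the virtual method table use case, here is a minimal sketch of a struct of function pointers standing in for a vtable. All names are hypothetical. It assumes a simple class without base classes, where the vtable pointer is commonly the first member of the object. The stand-in implementations exist only to make the sketch executable; a header meant for bao would only need the two struct definitions.

```c
struct Shape; /* forward declaration so the table can refer to it */

/* One function pointer per virtual method, in vtable order. */
struct ShapeVtbl {
    void (*destroy)(struct Shape *self);
    int  (*area)(const struct Shape *self);
};

/* The object starts with a pointer to its method table. */
struct Shape {
    const struct ShapeVtbl *vtbl;
    int width;
    int height;
};

/* Stand-in implementations, only here so the sketch can be run. */
static void shape_destroy(struct Shape *self) { self->width = self->height = 0; }
static int shape_area(const struct Shape *self) { return self->width * self->height; }

static const struct ShapeVtbl shape_vtbl = { shape_destroy, shape_area };
```

Calling-convention keywords such as __thiscall, as used in the Code section, are left out here so that the sketch compiles with any C compiler.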

Thanks to

  • FakePDB for inspiring this project and helping me with some of the LLVM API calls.
  • hazedumper-rs for the pattern format.
  • All the people who have helped me along the way.

Contributors

  • not-wlan
  • jfm535
  • jfmherokiller
