
sleigh craft!

License: Apache License 2.0

Python 0.79% Rust 9.45% CMake 0.66% Makefile 1.69% C++ 86.38% C 0.89% JavaScript 0.14%
rust sleigh ghidra python-api static-analysis binary-analysis

sleighcraft's Introduction

SleighCraft

SleighCraft is part of the BinCraft project.

SleighCraft is a decoder (or linear disassembler) based on Ghidra's decompiler implementation. SleighCraft can be used from Rust or Python, with both high-level and low-level APIs.

In general, SleighCraft is just like Capstone, but with IR and more architectures.

Features:

  • Rust based API and Python scripting API.
  • Decoding with IR as the semantic meaning.
  • Archs: 110 architectures.

️️✔️: provided

❌: not provided

🚧: in construction

🤔: not sure, maybe not

Comparison with capstone:

Feature SleighCraft Capstone Engine
disassemble ✔️ ✔️
IR ✔️️
C API 🚧 ✔️
custom architecture ️✔️

Architecture comparison with Capstone (according to the Capstone arch list):

Arch Names SleighCraft Capstone Engine
6502 ✔️ 🤔
6805 ✔️ 🤔
8051 ✔️ 🤔
8048 ✔️ 🤔
8085 ✔️ 🤔
68000 ✔️ 🤔
aarch64(armv8) ✔️ ️️✔️
arm ✔️ ️️✔️
cp1600 ✔️ 🤔
cr16 ✔️ 🤔
avr8 ✔️ ️️🤔
dalvik ✔️ 🤔
jvm ✔️ 🤔
mips ✔️ ️️✔️
powerpc ✔️ ️️✔️
sparc ✔️ ️️✔️
tricore ✔️ 🤔
riscv ✔️ 🤔
z80 ✔️ 🤔
System Z ✔️
xCore ✔️

How to install

Rust

Use cargo by adding this line to the [dependencies] section of your Cargo.toml:

sleighcraft = { git = "https://github.com/StarCrossPortal/sleighcraft" }

The repo is a bit too large to publish on crates.io (because of the predefined .sla files), but bundling them saves you the complexity of compiling the Sleigh files yourself.

Python:

# quick install it with pip
$ pip3 install bincraft

# or download the binaries, then choose the wheel for your architecture
$ pip3 install bincraft-0.1.0-cp39-cp39-Arch.whl

# or build manually; this requires maturin and a Rust compiler
# (preferably installed through rustup)
$ pip3 install maturin
$ maturin build
$ pip3 install bincraft-0.1.0-cp39-cp39-Arch.whl

NodeJs:

# quick install it with npm 
$ npm i bincraft

# or build manually; this requires Node.js, neon, and a Rust compiler
# (preferably installed through rustup)
$ npm install -g neon-cli
$ neon build

How to Use

Refer to docs.rs to see how the Rust binding can be used.

Python binding:

from bincraft import Sleigh

code = [0x90, 0x31, 0x32] # code to disassemble

# init the sleigh engine Sleigh(arch, code)
sleigh = Sleigh("x86", code)

# now we are prepared to disassemble!
# disasm(start_addr)
for asm in sleigh.disasm(0):
    addr = asm.addr()
    mnem = asm.mnemonic()
    body = asm.body()

    # quite like capstone, right?
    print(f'Addr: {addr}\t  mnemonic: {mnem}\t body: {body}')

    # but! we also have the IR!
    pcodes = asm.pcodes()
    for pcode in pcodes:
        opcode = pcode.opcode()
        pcode_vars = pcode.vars()  # named so it does not shadow the built-in vars()
        print(f'opcode: {opcode}\t vars: {pcode_vars}\t')
    print()

Nodejs binding:

const Sleigh = require('bincraft');
//or const Sleigh = require('.');

// init the sleigh engine Sleigh(arch, code) like python
const sleigh = new Sleigh("x86",[0x90,90]);

// disasm(start_addr) 
// - start: Default is 0
const asms = sleigh.disasm();

asms.forEach(asm => {
    let addr = asm.addr();
    let mnemonic = asm.mnemonic();
    let body = asm.body();
    // dump instruction
    console.log(`addr: ${addr}\t mnemonic: ${mnemonic}\t body: ${body}`);
    
    // And we have IR!
    let pcodes = asm.pcodes();
    pcodes.forEach(pcode => {
        const opcode = pcode.opcode();
        const vars = pcode.vars();
        
        console.log(`opcode: ${opcode}\t vars: ${vars}`);
    });
});

Rust (kinda low level):

// Overall procedure:
// 1. Get the spec; this is where we know how to decode anything.
// 2. Get a loader; this is where we feed the input bytes to the engine.
//    A predefined loader is provided: `PlainLoadImage`, which sets
//    the bytes to decode from a single buffer.
// 3. Set the AssemblyEmit and PcodeEmit instances; these are two
//    traits that define the callbacks invoked at decode time.
// 4. Do the decode.
use sleighcraft::*;

fn main() {
    let mut sleigh_builder = SleighBuilder::default();
    let spec = arch("x86").unwrap();
    let buf = [0x90, 0x32, 0x31];
    let mut loader = PlainLoadImage::from_buf(&buf, 0);
    sleigh_builder.loader(&mut loader);
    sleigh_builder.spec(spec);
    let mut asm_emit = CollectingAssemblyEmit::default();
    let mut pcode_emit = CollectingPcodeEmit::default();
    sleigh_builder.asm_emit(&mut asm_emit);
    sleigh_builder.pcode_emit(&mut pcode_emit);
    let mut sleigh = sleigh_builder.try_build().unwrap();

    // Decode starting at address 0; results land in the two collectors.
    sleigh.decode(0).unwrap();

    println!("{:?}", asm_emit.asms);
    println!("{:?}", pcode_emit.pcode_asms);
}

More detailed documentation of the Rust API is still under development.

About Us

This is a project started by StarCrossTech PortalLab.

Any contribution through pull request is welcome. ✌️

sleighcraft's People

Contributors

escapingbug, ioo0s, wcampbell0x2a


sleighcraft's Issues

Change different variables to the same name

Take IDA, for example.

In some reverse-engineering challenges there are too many variables named vXX that are in fact the same variable.

After disassembly, operations on a variable show up as assignments between many registers or other locations, so IDA cannot recognize that they are the same variable (the file's symbol table has been stripped).

IDA gives them different names just to be safe.

IDA also does not support giving them the same name, which makes life hard when the input data is heavily processed: the input string can end up under a dozen different names.

Perhaps BinCraft can do this better than IDA.

x86-64 throws BadDataError on basic disassembly

While attempting to use sleighcraft for x86-64 disassembly I've run into a problem with BadDataErrors being thrown on valid code.

Code to reproduce when using the python package:

from bincraft import Sleigh
# Opcodes for xor rax, rax in x86-64
# Also for dec eax; xor eax, eax in x86
code = [72, 49, 192]

# x86 test case
sleigh = Sleigh("x86", code)
print(sleigh.disasm(0)) # works

sleigh = Sleigh("x86-64", code)
print(sleigh.disasm(0)) # fails with OSError: cpp exception: BadDataError

I'm unsure as to what could be causing this issue, or if I missed a configuration step somewhere in the process.
Any feedback is appreciated.
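
For anyone triaging this report, the byte list above is written in decimal; a hex view is usually easier to compare against an opcode reference. A minimal sketch using only the Python standard library (it does not call bincraft at all):

```python
# Bytes from the report above, written there in decimal.
code = [72, 49, 192]

# bytes.hex() with a separator (Python 3.8+) gives a conventional hex dump.
hex_view = bytes(code).hex(" ")
print(hex_view)  # 48 31 c0
```

The leading 0x48 is the byte that x86-64 reads as a REX.W prefix and 32-bit x86 reads as dec eax, which matches the two readings given in the report's comments.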

X86-64 architecture program decompile error

Decoding the bytes code = [15, 31, 128, 0, 0, 0, 0] with sleighcraft in 64-bit mode fails with a BadDataError.

crash demo

    let mut sleigh_builder = SleighBuilder::default();
    let spec = arch("x86-64").unwrap();
    let buf = [15, 31, 128, 0, 0, 0, 0];
    let mut loader = PlainLoadImage::from_buf(&buf, 0);
    sleigh_builder.loader(&mut loader);
    sleigh_builder.spec(spec);
    sleigh_builder.mode(MODE64);
    let mut asm_emit = CollectingAssemblyEmit::default();
    let mut pcode_emit = CollectingPcodeEmit::default();
    sleigh_builder.asm_emit(&mut asm_emit);
    sleigh_builder.pcode_emit(&mut pcode_emit);
    let mut sleigh = sleigh_builder.try_build().unwrap();

    sleigh.decode(0).unwrap();

    println!("{:?}", asm_emit.asms);
    println!("{:?}", pcode_emit.pcode_asms);

Capstone, however, decodes the same bytes without any problem.

More fluent convert to char/hex/dec display experience

Currently, the convert functionality (provided by EquatePlugin) behaves inconsistently: after a conversion, the decompiler sometimes follows it and sometimes does not.
No explicit message is shown to the user, and the expected behavior is not guaranteed to succeed every time.

The reason is that the current decompiler only supports showing constants in five possible formats:

  • decimal
  • oct
  • hex
  • char
  • binary

This can be verified in the decompiler source file database.hh.

Formats such as floating point are not yet supported (strings may not be either; not sure).
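
To make the five supported formats concrete, here is a small sketch that renders one integer constant in each of them. It is plain Python and only illustrates the idea; the helper name constant_views is made up for this example, and the decompiler's own C-style notation will differ (for instance in how octal is prefixed).

```python
def constant_views(value: int) -> dict:
    """Render one integer constant in the five formats listed above."""
    return {
        "decimal": str(value),
        "oct": oct(value),
        "hex": hex(value),
        "char": repr(chr(value)),
        "binary": bin(value),
    }


views = constant_views(0x41)
for name, text in views.items():
    print(f"{name:8}{text}")
```

A floating-point or string view has no entry here, which mirrors the gap described above.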

There are two possible solutions to this problem:

  1. show an error message when the decompiler cannot follow the conversion
  2. fix the decompiler by adding the missing formats.

One more task is to add the conversion support to the decompiler, but we have decided not to include it in the 0.1 release.

Better Rust API?

As noted in the README, the Rust API is kind of low level. Users need to construct internal structures like CollectingAssemblyEmit and call internal methods sleigh.decode(0).unwrap() (what does this do?) to get the results.
I guess the developer may expect Rust users to be skilled enough, so they can even develop fancier features based on those low level APIs?
How about also providing a higher level one, like shown in the Python/Nodejs bindings?

By the way, the implementation code contains many wrappers of XXXEmit, such as AssemblyEmit, RustAssemblyEmit, CollectingAssemblyEmit, and the Pcode-series emitters. They are basically doing similar things.
I guess the authors may want to provide a callback mechanism and also a default callback that collects the emitted code into a vector. But I think it is kind of over-designed.
Maybe a cleaner way is to simply return an iterator, so users can iterate through the generated code and collect it in whatever way they want.

IDA-like default variable names in decompiler

Ghidra's current default variable names are verbose, especially those that contain a stringified address. Most of the time, those addresses are not useful. We should strip them out to provide cleaner decompiler output.

Example (ghidra):
image

Same function in IDA:
image

Ignore the array analysis and the alloca thing. v2 is definitely visually better than names like uStack77832. Nobody cares about the 77832 part.

This functionality could also be offered upstream. In our case we could enable it by default, while the official upstream should keep it disabled.

My previous commit illustrates how this can be done properly. It is not a complete commit, though, as it does not cover all of the variable name generation algorithm.

One can implement this by following my commit and completing the whole variable name generation.
A better, simpler variable name generation algorithm is also welcome.

UI modernize: dark theme

Tracking issue for the complete UI modernization.

Tasks:

  • introduce a dark theme (with FlatLaf)
  • fix foreground coloring: Ghidra uses hard-coded colors all across the project. After introducing the dark theme, some text can be hard to read because of the default colors. (The basic level is already done, but the next steps rely on #17, so we consider this done for now.)
  • other elements flattening
  • better icons

Current showcase:
image
image

Rust decompiler improvement

Ghidra currently has a hard time decompiling Rust programs.

Fixes:

  • display a proper string representation in the case (seen in Rust) where strings are concatenated into one. This is already resolved in this PR.
  • wrong stack analysis. Resolved by this PR.
  • wrong parameter analysis. Resolved by this PR

Pcode patching

This is required for more flexible IR arrangements.

The background is that currently the only way to modify the semantics of the program is through instruction patching.
However, instruction patching has some drawbacks:

  1. instruction patching cannot insert any instruction
  2. instruction patching cannot patch in a longer instruction and keep the next instruction untouched

And, to be honest, those drawbacks are preventing strong analysis such as deobfuscating control flow flattening.

Obfuscations like control flow flattening would rearrange the basic blocks.
But because of the drawbacks mentioned, no possible rearrangements can be done in Ghidra (or IDA).
At least, not easily possible.
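
The second drawback can be seen on a raw byte buffer. The sketch below is plain Python over hand-picked x86 bytes (it uses neither Ghidra nor sleighcraft): replacing a 1-byte instruction with a 2-byte one in place necessarily overwrites the start of the instruction that follows.

```python
# Original x86 bytes: nop (1 byte) followed by xor eax, eax (2 bytes).
original = bytes([0x90, 0x31, 0xC0])

# Replacement for the nop: xor ebx, ebx, which needs 2 bytes.
patch = bytes([0x31, 0xDB])

# In-place patching writes over existing bytes; it cannot make room.
patched = bytearray(original)
patched[0:len(patch)] = patch

print(patched.hex(" "))  # 31 db c0

# The buffer length is unchanged, so the first byte of the old
# `xor eax, eax` was overwritten and only its second byte (0xc0) remains.
assert len(patched) == len(original)
```

Patching at the pcode level avoids this because the IR is not tied to byte offsets in the same way.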

The solution to this problem is to allow pcode patching.
That is, we allow the user to display the raw pcode and patch it.

What we need:

  • an action that pops up only when raw pcode is clicked (this is possible by checking which "row" of the instruction the user clicked on)
  • parsing of the user's input pcode, as the reverse of the PcodeFormatter
  • recording of the patched pcode
  • use of the recorded pcode, so that the decompiler bypasses its call to the sleigh engine

The reason for the last two is that pcode is not stored in the database; it is lifted each time by the sleigh engine, as mentioned in this issue.

So maybe we could find some way to bypass the translation, remember the pcode from the last lift, and use it for the pcode patching feature.
Note that not all functions need their pcode stored, only the patched ones. Otherwise the database might explode in disk space.

UI color configuration refactor

Currently, UI colors are scattered throughout the code that describes the UI components. Many components use hard-coded colors that do not take the overall look and feel into account in any way and may not be configurable. For example, if you search for "Color.BLUE" you will likely get tons of such hard-coded colors.

One possible solution is to refactor every place that needs a color and use a separate config file (xml or json, whatever) to describe those colors.

As a side note on how this can be implemented, a ghidra instance always needs an ApplicationLayout to start.

For example, the GhidraLauncher uses GhidraApplicationLayout class:

image

The application layout stores much of the ghidra-wide directory structure:

image

So, when instantiating the GhidraApplicationLayout, we can use the directory info to find the color configuration file residing in the ghidra installation dir.

Then, we can instantiate a singleton called something like ColorConfig and parse the file (xml or json?) to get a configuration.
To ease the choice of colors, we can just name them, like "primary", "secondary", "foreground_default".

Whenever some class wants a color, it should query the singleton and get the color of a particular name.
In this way, if the Java Swing LaF is switched (as with the dark theme), we just provide a new color configuration so that the colors switch accordingly.
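
The lookup described above can be sketched as follows. This is a language-neutral illustration written in Python rather than Ghidra's Java; the JSON layout, the "#rrggbb" value format, and the method names load, get, and color are all assumptions made for the example, not an existing Ghidra API.

```python
import json


class ColorConfig:
    """Process-wide lookup of named colors (hypothetical sketch)."""

    _instance = None

    def __init__(self, colors):
        self._colors = dict(colors)

    @classmethod
    def load(cls, text):
        """Parse a JSON object mapping color names to '#rrggbb' strings."""
        cls._instance = cls(json.loads(text))
        return cls._instance

    @classmethod
    def get(cls):
        """Return the shared instance; load() must have been called first."""
        if cls._instance is None:
            raise RuntimeError("ColorConfig.load() has not been called")
        return cls._instance

    def color(self, name):
        """Return the named color as an (r, g, b) tuple."""
        value = self._colors[name].lstrip("#")
        return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


# Example configuration for a dark theme (values are made up).
dark = '{"primary": "#3c3f41", "secondary": "#2b2b2b", "foreground_default": "#bbbbbb"}'
ColorConfig.load(dark)
print(ColorConfig.get().color("foreground_default"))  # (187, 187, 187)
```

Switching the look and feel then amounts to calling load again with a different configuration; components that query by name pick up the new colors without code changes.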

Equate Symbol Storage

When you set a new equate on a number that appears in the Listing area but is not identified as a variable in the decompile area, and then try to rename a variable in the decompile area, you get an error message.

I have found the reason: ghidra stores all equate symbols in the symbol table and puts their hash storage in the LocalSymbolMap, so when you rename a variable, ghidra searches the LocalSymbolMap and the operation fails.

Tool requirement: binary generation from API

The Sleigh engine is the core of the ghidra decompiler. It takes a binary stream, disassembles it into instructions, and lifts them into IR.

However, its restriction is that it can only deal with binary streams, not text streams.
Sometimes we are given a text stream, and we know the underlying semantics of each text instruction. Using the sleigh engine in such a situation is hard.

A possible solution is to write a tool (possibly in Python?) that generates the binary from the text instructions, along with a sleigh specification that can translate the binary back to the text format.

This works around the restriction of the sleigh engine and lets ghidra do the rest of the job as it is.

What we need:

  • API design
  • instruction choice algorithm (choose the binary format of each instruction when instructions are fed into the API)
  • sleigh generation algorithm
  • complete tool
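
As a starting point for the API design and the instruction choice step, here is a deliberately tiny sketch. It assumes the simplest possible scheme, one byte per distinct text instruction in first-seen order, and it does not attempt the sleigh generation step at all; the function names build_encoding, assemble, and disassemble are hypothetical.

```python
def build_encoding(lines):
    """Assign each distinct text instruction a one-byte code, in first-seen order."""
    table = {}
    for line in lines:
        if line not in table:
            if len(table) >= 256:
                raise ValueError("more than 256 distinct instructions")
            table[line] = len(table)
    return table


def assemble(lines, table):
    """Turn the text instructions into a binary stream."""
    return bytes(table[line] for line in lines)


def disassemble(blob, table):
    """Inverse of assemble; stands in for what a generated spec would do."""
    reverse = {code: text for text, code in table.items()}
    return [reverse[b] for b in blob]


program = ["push a", "push b", "add", "push a", "ret"]
table = build_encoding(program)
blob = assemble(program, table)
print(blob.hex(" "))  # 00 01 02 00 03

# Round trip: the binary decodes back to the original text.
assert disassemble(blob, table) == program
```

A real tool would need operand fields and variable-length encodings, and the table built here is the information a generated sleigh specification would have to express.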
