Giter Site home page Giter Site logo

leanderhang / bitpunch Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jonathan-gramain/bitpunch

0.0 1.0 0.0 2.89 MB

Swiss Army knife for arbitrarily structured binary files

License: Other

Makefile 0.43% Shell 0.05% C 78.20% Lex 0.60% Yacc 3.00% Python 15.48% C++ 2.24%

bitpunch's Introduction

BitPunch

A Swiss Army knife for binary data files

BitPunch is a core library and a set of tools that allow a user to access meaningful data from arbitrary binary files.

BitPunch schemas form a descriptive, declarative language used to describe the structure of the data stored in the files, and schemas can express a wide range of binary structures.

Once a schema is defined, APIs in various programming languages can be used to access data in an object-oriented way using path expressions or iteration. Currently APIs in C and Python2 are available.

What for?

  • to convert binary data coming from custom binary formats into human-friendly representations like JSON with minimal effort (e.g. for interoperable data exchange);

  • to check sanity of files and be able to recover sane data from corrupt files that cannot be read with standard tools;

  • to implement a reliable and safe parser for some binary format instead of writing it directly in a low-level language like C;

  • for reverse-engineering of uncommonly used or proprietary file formats, and extract data from them or translate to a better data interchange format like JSON;

  • for writing a variety of applications that can benefit from knowing the structure of the files they manipulate. For example, a structure-aware hex editor that can highlight or collapse sections in a tree structure that maps to the binary schema;

  • because it's cool to find out about all the crap that is in your favorite MP4 videos using the CLI! :)

Development status

The current code is to be considered in alpha stage:

  • heavy development ongoing

  • syntax of BP files likely to evolve

Overview

Right now bitpunch is made of:

  • a C library (libbitpunch) that contains low-level C routines to:

    • parse schema specifications found in .bp files

    • open binary files to expose their contents to filters defined in .bp files

  • a Python module that allows accessing raw data through an object model

  • a CLI with analysis commands to access binary files in an interactive fashion, with powerful autocompletion support and inline help. The CLI relies on the Python library.

It understands a custom specialized syntax (let's call it the BitPunch syntax, or BP syntax) that describes precisely the layout of the binary format. Once loaded, binary files can be opened and mapped to the described layout, so that their meaningful content becomes readily available through iteration and path expressions.

It's meant to be able to cover as many as possible of the numerous binary formats in existence, which is possible if a conformant .bp file can be written for those formats. This is a key design goal because esoteric or legacy formats are notoriously hard to exploit by the lack of available tools to do it, or lack of support on modern OSes and languages. With some effort, even undocumented proprietary formats should be exploitable with some reverse-engineering and help from bitpunch tooling.

It shall support very large binary files without trouble, since mmap() is used to map the file contents into memory, and browsing specific elements does not require to browse the entire file first, information is gathered in a lazy fashion. (The other side of the coin is that you may be notified of a format error only when you attempt to access items affected by the error).

More

  • Install instructions and basic usage in INSTALL.md

  • For details about the BP file format, see BP.md

  • A list of planned features is in TODO.md

History

The project started in Boston, Massachusetts in May 2015.

The original motivation was to investigate a compatibility issue with MP4 files on an embedded device, and the tool was meant to help in understanding the format and comparing files in their detailed layout. It then evolved in scope towards a general-purpose tool for any kind of structured binary file.

License

  • The libbitpunch C library and the bitpunch Python package (the Python bindings) are distributed under a 3-Clause BSD License.

  • The bitpunch Command-Line tool and its helper Python package "nestedcmd" are distributed under a GPL v3 license.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.