
pstirparo / machofile


machofile is a module to parse Mach-O binary files

Home Page: http://threatresearch.ch

License: MIT License

Python 100.00%
mach-o malware-analysis malware-research python python3 macos


machofile

Inspired by Ero Carrera's pefile, this module aims to provide a similar capability for Mach-O binaries. The basic structures and constants are taken from the reference material and documentation listed below, which I used to learn the file format.

machofile is self-contained: the module has no dependencies, it is endianness-independent, and it works on macOS, Windows, and Linux.
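Endianness independence usually comes down to reading the magic number first and then decoding every later field with the byte order it implies. The sketch below shows one way to do that with the standard `struct` module. `detect_byte_order` is a hypothetical helper, not part of machofile's API, and it only handles thin (non-fat) files.

```python
from __future__ import annotations

import struct

# Magic numbers from <mach-o/loader.h>. A file whose bytes are swapped
# relative to the reader shows up as MH_CIGAM / MH_CIGAM_64 instead.
MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O


def detect_byte_order(data: bytes) -> tuple[str, bool]:
    """Return (struct byte-order prefix, is_64_bit) for a thin Mach-O buffer.

    The magic is tried as little-endian ('<') and then big-endian ('>');
    whichever reading matches a known magic is the file's byte order, and
    that prefix can be passed to every later struct.unpack call,
    independently of the host CPU's endianness.
    """
    if len(data) < 4:
        raise ValueError("buffer too short to hold a Mach-O magic")
    for prefix in ("<", ">"):
        (magic,) = struct.unpack(prefix + "I", data[:4])
        if magic == MH_MAGIC:
            return prefix, False
        if magic == MH_MAGIC_64:
            return prefix, True
    raise ValueError("not a thin Mach-O file (fat/universal binaries not handled)")
```

For instance, an x86 sample begins with the bytes `ce fa ed fe`, which this function reports as little-endian 32-bit, `('<', False)`.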

While there are other Mach-O parsing modules out there, the motivations behind developing this one are:

  • first and foremost, for me this was a great way to deep dive and learn more about the Mach-O format and structures
  • to provide a simple way to parse Mach-O files for analysis
  • to avoid depending on external modules (e.g. lief, macholib, macho), since everything is extracted directly from the file in pure Python.

This is still the very first (alpha) version (2023.11.04), so please let me know if you try it or find bugs, but also be gentle ;) The code will be optimized and more features will be added in the near future.

Current Features:

  • Parse Mach-O Header
  • Parse Load Commands
  • Parse File Segments
  • Parse Dylib Commands
  • Parse Dylib List
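To illustrate what parsing the header and load commands involves, here is a minimal sketch, assuming a thin Mach-O buffer and the `mach_header`/`mach_header_64` and `load_command` layouts from Apple's `<mach-o/loader.h>`. The function name and the small `LOAD_COMMAND_NAMES` table are hypothetical and cover only a handful of IDs; unknown IDs are reported in hex instead of raising.

```python
from __future__ import annotations

import struct

MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF

# Illustrative subset of load command IDs from <mach-o/loader.h>.
LOAD_COMMAND_NAMES = {
    0x1: "LC_SEGMENT",
    0x2: "LC_SYMTAB",
    0x5: "LC_UNIXTHREAD",
    0xB: "LC_DYSYMTAB",
    0xC: "LC_LOAD_DYLIB",
    0xE: "LC_LOAD_DYLINKER",
    0x19: "LC_SEGMENT_64",
    0x1B: "LC_UUID",
    0x1D: "LC_CODE_SIGNATURE",
}

HEADER_FIELDS = ("magic", "cputype", "cpusubtype", "filetype",
                 "ncmds", "sizeofcmds", "flags")


def parse_header_and_load_cmds(data: bytes) -> tuple[dict, list[dict]]:
    # Pick the byte order whose reading of the first 4 bytes is a known magic.
    for prefix in ("<", ">"):
        (magic,) = struct.unpack_from(prefix + "I", data, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            break
    else:
        raise ValueError("not a thin Mach-O file")

    # mach_header: uint32 magic, int32 cputype, int32 cpusubtype, then four
    # uint32 fields. mach_header_64 appends a uint32 'reserved' (32 bytes total).
    values = struct.unpack_from(prefix + "IiiIIII", data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    offset = 32 if magic == MH_MAGIC_64 else 28

    # Load commands follow the header back to back; each begins with
    # uint32 cmd and uint32 cmdsize (cmdsize includes these 8 bytes).
    load_cmds = []
    for _ in range(header["ncmds"]):
        cmd, cmdsize = struct.unpack_from(prefix + "II", data, offset)
        if cmdsize < 8:
            raise ValueError(f"malformed load command at offset {offset:#x}")
        load_cmds.append({"cmd": LOAD_COMMAND_NAMES.get(cmd, hex(cmd)),
                          "cmdsize": cmdsize})
        offset += cmdsize
    return header, load_cmds
```

The returned list has the same `{'cmd': ..., 'cmdsize': ...}` shape as the Load Cmd table in the example output further down.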

Note: as of now, this has been tested only against x86 and x86_64 Mach-O samples.

Next features to be implemented:

  • Extract Entry Point
  • Parse Code Signature information
  • Embedded strings
  • File Attributes
  • Data entropy calculation
  • Flag for suspicious libraries
  • Packer detection
  • Hashes: dylib hash, import hash, export hash, ...
  • Prettify console output
  • Add YAML and JSON output options
  • Add options to parse only specific structures
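For the planned entropy calculation, a common choice is Shannon entropy over byte values, which ranges from 0.0 to 8.0 bits per byte. This is a sketch of that standard formula, not the module's implementation; `shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Counter over a bytes object counts each byte value (an int 0-255).
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Applied per segment (for example to the bytes at the `OFFSET` and `SIZE` listed for `__TEXT` in the segment table), a result close to 8.0 can hint at packed or encrypted content, though it is only a heuristic.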

Credits

These are the people I would like to thank for being the inspiration that led me to write this module:

Usage and example

You can either use it from the command line or import it as a module in your Python code and call each function individually to parse only the structures you are interested in.

Module version

It expects to be supplied with either a file path or a data buffer to parse.

import machofile
macho = machofile.MachO(file_path='/path/to/machobinary')  # keyword form
macho = machofile.MachO('/path/to/machobinary')            # positional form, equivalent

The two lines above are equivalent: both load the Mach-O file from the given path. If the data buffer is already available, it can be supplied directly with:

import machofile
macho = machofile.MachO(data=bytes_variable)  # bytes_variable holds the raw file contents

You will then need to invoke the parse() method to start the parsing process, and can then call each function individually to parse only the structures you are interested in.

macho.parse()
dylib_cmd_list, dylib_lst = macho.get_dylib_commands()
...
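Each dylib entry corresponds to a `dylib_command`: `cmd` and `cmdsize`, then a `struct dylib` holding the name offset (relative to the start of the command), a timestamp, and the current and compatibility versions. The sketch below decodes one such command from raw bytes with `struct`; `parse_dylib_command` is hypothetical and independent of how `get_dylib_commands()` is actually implemented.

```python
import struct


def parse_dylib_command(cmd_bytes: bytes, byte_order: str = "<") -> dict:
    """Decode one dylib_command: cmd, cmdsize, then struct dylib.

    struct dylib = lc_str name offset, timestamp, current_version,
    compatibility_version (all uint32).
    """
    cmd, cmdsize, name_off, timestamp, cur_ver, compat_ver = struct.unpack_from(
        byte_order + "6I", cmd_bytes, 0
    )
    # The name starts at name_off, is NUL-terminated, and is padded to cmdsize.
    name = cmd_bytes[name_off:cmdsize].split(b"\x00", 1)[0]
    return {
        "cmd": cmd,
        "name_offset": name_off,
        "timestamp": timestamp,
        "current_version": cur_ver,
        "compat_version": compat_ver,
        "name": name,
    }
```

With the name `/usr/lib/libSystem.B.dylib` placed at offset 24, NUL-terminated and padded to a 4-byte boundary (as 32-bit files require), the command is 52 bytes long, which matches the `cmdsize` of the `LC_LOAD_DYLIB` entries in the example output below.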

Command Line version

From the CLI, at the moment it retrieves all the parsed structures; in the future there will be flags to get just one specific structure or a list of them.

% python3 machofile-cli.py -h
usage: machofile-cli.py [-h] -f FILE [-a] [-i] [-hd] [-l] [-sg] [-d]

Parse Mach-O file structures.

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the file to be parsed
  -a, --all             Print all info about the file
  -i, --info            Print general info about the file
  -hd, --header         Print Mach-O header info
  -l, --load_cmd_t      Print Load Command Table and Command list
  -sg, --segments       Print File Segments info
  -d, --dylib           Print Dylib Command Table and Dylib list
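For readers who want to script around the CLI, the option set in the help text maps naturally onto `argparse`. The following is a hypothetical reconstruction of that option set, not the actual source of `machofile-cli.py`; `build_parser` and `selected_sections` are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: mirrors the options listed in the help text above.
    p = argparse.ArgumentParser(prog="machofile-cli.py",
                                description="Parse Mach-O file structures.")
    p.add_argument("-f", "--file", required=True, help="Path to the file to be parsed")
    p.add_argument("-a", "--all", action="store_true", help="Print all info about the file")
    p.add_argument("-i", "--info", action="store_true", help="Print general info about the file")
    p.add_argument("-hd", "--header", action="store_true", help="Print Mach-O header info")
    p.add_argument("-l", "--load_cmd_t", action="store_true",
                   help="Print Load Command Table and Command list")
    p.add_argument("-sg", "--segments", action="store_true", help="Print File Segments info")
    p.add_argument("-d", "--dylib", action="store_true",
                   help="Print Dylib Command Table and Dylib list")
    return p


def selected_sections(args: argparse.Namespace) -> list:
    # '-a' selects everything; otherwise only the sections whose flags were given.
    names = ["info", "header", "load_cmd_t", "segments", "dylib"]
    if args.all:
        return names
    return [n for n in names if getattr(args, n)]
```

For example, `-f sample -hd -d` selects only the header and dylib sections, while `-a` selects all of them.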

Example output:

% python3 machofile-cli.py -a -f b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b 

[General File Info]
        Filename:    b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b
        Filesize:    54240
        Filetype:    Mach-O i386 executable
        Flags:       <NOUNDEFS|DYLDLINK|TWOLEVEL>
        MD5:         20ffe440e4f557b9e03855b5da2b3c9c
        SHA1:        1bf61ecad8568a774f9fba726a254a9603d09f33
        SHA256:      b4f68a58658ceceb368520dafc35b270272ac27b8890d5b3ff0b968170471e2b

[Mac-O Header]
        magic:       MH_MAGIC (32-bit)
        cputype:     Intel i386
        cpusubtype:  x86_ALL, x86_64_H, x86_64_LIB64
        filetype:    MH_EXECUTE
        ncmds:       13
        sizeofcmds:  1180
        flags:       MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL

[Load Cmd table]
        {'cmd': 'LC_SEGMENT', 'cmdsize': 56}
        {'cmd': 'LC_SEGMENT', 'cmdsize': 192}
        {'cmd': 'LC_SEGMENT', 'cmdsize': 328}
        {'cmd': 'LC_SEGMENT', 'cmdsize': 192}
        {'cmd': 'LC_SEGMENT', 'cmdsize': 56}
        {'cmd': 'LC_SYMTAB', 'cmdsize': 24}
        {'cmd': 'LC_DYSYMTAB', 'cmdsize': 80}
        {'cmd': 'LC_LOAD_DYLINKER', 'cmdsize': 28}
        {'cmd': 'LC_UUID', 'cmdsize': 24}
        {'cmd': 'LC_UNIXTHREAD', 'cmdsize': 80}
        {'cmd': 'LC_LOAD_DYLIB', 'cmdsize': 52}
        {'cmd': 'LC_LOAD_DYLIB', 'cmdsize': 52}
        {'cmd': 'LC_CODE_SIGNATURE', 'cmdsize': 16}

[Load Commands]
        LC_CODE_SIGNATURE
        LC_DYSYMTAB
        LC_LOAD_DYLIB
        LC_LOAD_DYLINKER
        LC_SYMTAB
        LC_UNIXTHREAD
        LC_UUID

[File Segments]
        SEGNAME    VADDR VSIZE OFFSET SIZE  MAX_VM_PROTECTION INITIAL_VM_PROTECTION NSECTS FLAGS
        ----------------------------------------------------------------------------------------
        __PAGEZERO 0     4096  0      0     0                 0                     0      0    
        __TEXT     4096  28672 0      28672 7                 5                     2      0    
        __DATA     32768 4096  28672  4096  7                 3                     4      0    
        __IMPORT   36864 4096  32768  4096  7                 7                     2      0    
        __LINKEDIT 40960 20480 36864  17376 7                 1                     0      0    

[Dylib Commands]
        DYLIB_NAME_OFFSET DYLIB_TIMESTAMP DYLIB_CURRENT_VERSION DYLIB_COMPAT_VERSION DYLIB_NAME                   
        ----------------------------------------------------------------------------------------------------------
        24                2               65536                 65536                b'/usr/lib/libgcc_s.1.dylib' 
        24                2               7274759               65536                b'/usr/lib/libSystem.B.dylib'

[Dylib Names]
        b'/usr/lib/libgcc_s.1.dylib'
        b'/usr/lib/libSystem.B.dylib'
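Several numeric fields in the output are packed values. The helpers below, all hypothetical, show one way to decode them: header flag bits (here 0x85, i.e. `MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL`), the VM protection bitmasks in the segment table (`VM_PROT_READ` = 1, `VM_PROT_WRITE` = 2, `VM_PROT_EXECUTE` = 4), and dylib versions packed as 16.8.8 bits. A last helper recomputes the MD5/SHA1/SHA256 digests with `hashlib`. Constant values follow `<mach-o/loader.h>` and `<mach/vm_prot.h>`; only a subset of flags is listed.

```python
import hashlib

# Subset of mach_header flag bits from <mach-o/loader.h>.
MH_FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x2: "MH_INCRLINK",
    0x4: "MH_DYLDLINK",
    0x8: "MH_BINDATLOAD",
    0x10: "MH_PREBOUND",
    0x80: "MH_TWOLEVEL",
    0x200000: "MH_PIE",
}


def decode_flags(flags: int) -> list:
    """Names of the known flag bits set in a mach_header 'flags' value."""
    return [name for bit, name in MH_FLAGS.items() if flags & bit]


def decode_vm_prot(prot: int) -> str:
    """Render a vm_prot_t bitmask (READ=1, WRITE=2, EXECUTE=4) as 'rwx'."""
    return (("r" if prot & 1 else "-")
            + ("w" if prot & 2 else "-")
            + ("x" if prot & 4 else "-"))


def decode_version(packed: int) -> str:
    """Unpack a dylib version stored as xxxx.yy.zz (16.8.8 bits)."""
    return f"{packed >> 16}.{(packed >> 8) & 0xFF}.{packed & 0xFF}"


def file_hashes(data: bytes) -> dict:
    """MD5, SHA1 and SHA256 hex digests of the whole file contents."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}
```

Applied to the sample above, `decode_version(7274759)` gives `111.1.7` and `decode_version(65536)` gives `1.0.0`, while the `__TEXT` protections 7 and 5 read as `rwx` and `r-x`.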

Reference/Documentation links:


machofile's Issues

Problem parse()

Traceback (most recent call last):
  File "test.py", line 3, in <module>
    macho.parse()
  File "machofile.py", line 393, in parse
    self.load_commands, self.load_commands_set = self.get_macho_load_cmd_table()
  File "machofile.py", line 519, in get_macho_load_cmd_table
    load_commands.append({"cmd": LOAD_COMMAND_TYPES[cmd], "cmdsize": cmdsize})
KeyError: 44

Fixed: add ("LC_ENCRYPTION_INFO_64", 0x2C),
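The `KeyError` above comes from indexing a name table with an ID it does not contain (44 = 0x2C). A more defensive lookup, sketched below under the assumption of a plain dict keyed by command ID, falls back to a placeholder name and also recognizes IDs carrying the `LC_REQ_DYLD` bit (0x80000000), such as `LC_MAIN`. The table and function names are hypothetical, and only a few entries are shown.

```python
LC_REQ_DYLD = 0x80000000

# Hypothetical name table keyed by the full 32-bit command ID (small subset).
KNOWN_LOAD_COMMANDS = {
    0x1: "LC_SEGMENT",
    0x19: "LC_SEGMENT_64",
    0x2C: "LC_ENCRYPTION_INFO_64",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}


def load_command_name(cmd: int) -> str:
    """Map a load command ID to a name without ever raising KeyError."""
    name = KNOWN_LOAD_COMMANDS.get(cmd)
    if name is not None:
        return name
    # Unknown ID: keep parsing and label it, noting whether the LC_REQ_DYLD
    # bit is set (i.e. dyld must understand this command to load the binary).
    base = cmd & ~LC_REQ_DYLD
    suffix = " (LC_REQ_DYLD)" if cmd & LC_REQ_DYLD else ""
    return f"UNKNOWN_{base:#x}{suffix}"
```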

add a changelog

add a changelog section in the readme or as a separate file to keep track of changes between versions

JSON output and Pypi package

hey, first of all great idea :)

I would love to add this tool at the speed of light in IntelOwl if there was a JSON output available and a pypi package :P
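Since the parsed structures include `bytes` values (the dylib names in the example output) and possibly sets, `json.dumps` cannot serialize them directly. One possible approach, sketched here with a hypothetical `to_jsonable` helper and no claim about how machofile will implement it, is to normalize them recursively before dumping.

```python
import json


def to_jsonable(obj):
    """Recursively convert parsed values into types json.dumps accepts."""
    if isinstance(obj, bytes):
        # Dylib names are bytes; decode leniently so odd paths don't abort output.
        return obj.decode("utf-8", errors="replace")
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (set, frozenset)):
        # Sets have no stable order; sort for reproducible output.
        return sorted((to_jsonable(v) for v in obj), key=str)
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    return obj
```

Usage would then be along the lines of `print(json.dumps(to_jsonable(parsed), indent=2))`, where `parsed` stands for whatever dict of results the caller has assembled.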
