
xcyclopedia's Introduction


xCyclopedia

Encyclopedia for Executables

What is xCyclopedia?

The xCyclopedia project attempts to document all executable binaries (and eventually scripts) that reside on a typical operating system. Currently, this includes all observed EXE and DLL files, as well as COM Objects (new!). It provides a web page to view the data, as well as machine-readable formats (JSON and CSV) that can be used directly in other systems, such as SIEMs, to enrich observed executions with contextual data.
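As a rough illustration of the SIEM enrichment idea, the sketch below indexes xCyclopedia-style JSON records by SHA256 and attaches a few context fields to a process-execution event with a matching hash. The field names (hash_sha256, file_description, company_name, signature_status, and the event's sha256) are hypothetical placeholders, not the project's documented schema; adjust them to whatever the exported JSON actually contains.

```python
import json
from typing import Any, Dict, List

# Hypothetical field names: the actual xCyclopedia JSON schema may use different keys.
RECORD_HASH_FIELD = "hash_sha256"
EVENT_HASH_FIELD = "sha256"
CONTEXT_FIELDS = ("file_description", "company_name", "signature_status")


def build_index(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each record's SHA256 (lower-cased) to the record itself."""
    index: Dict[str, Dict[str, Any]] = {}
    for record in records:
        digest = record.get(RECORD_HASH_FIELD)
        if digest:
            index[str(digest).lower()] = record
    return index


def enrich_event(event: Dict[str, Any], index: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of an execution event with xCyclopedia context attached, if any."""
    enriched = dict(event)
    match = index.get(str(event.get(EVENT_HASH_FIELD, "")).lower())
    if match is not None:
        enriched["xcyclopedia"] = {name: match.get(name) for name in CONTEXT_FIELDS}
    return enriched


if __name__ == "__main__":
    sample = json.loads(
        '[{"hash_sha256": "ABC123", "file_description": "Example tool",'
        ' "company_name": "Example Corp", "signature_status": "Valid"}]'
    )
    idx = build_index(sample)
    print(enrich_event({"host": "ws01", "sha256": "abc123"}, idx))
```

Hashes are lower-cased on both sides so that upper-case and lower-case hex digests still match.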

What data points are available?

  • Runtime data (Standard Out, Standard Error, Child Processes, Screenshots, Open Handles, Loaded Modules, Window Title)
  • File metadata (File Description, Original File Name, Product Name, Comments, Company Name, File Version, Product Version, Copyright, PE Machine Type)
  • Digital signature validity and associated metadata (Serial, Thumbprint, Issuer, Subject)
  • File hashes (MD5, SHA1, SHA256, SHA384, SHA512, IMP, PESHA1, PE256)
  • Fuzzy file hash (ssdeep)
  • Similar files* (available on xCyclopedia web page only)
  • External References* (available on xCyclopedia web page only)
    • Examples of misuse (e.g. malicious use of legitimate executable)
    • Microsoft Documentation
  • File scan results (VirusTotal)
  • DLL Exported Functions (DLL files only)
  • (NEW!) COM Objects (CLSID, Friendly Names, Mappings to EXE/DLLs, Exposed methods/properties, other metadata) - Gathered via Get-ComObjects
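To compare a local file against the hash values listed above, the standard digests can be recomputed with Python's hashlib. This is a minimal sketch covering only MD5, SHA1, SHA256, SHA384 and SHA512; IMP (imphash), PESHA1/PE256 and ssdeep need PE-aware or third-party tooling and are deliberately left out.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union

# Only the standard digests from the list above. IMP (imphash), PESHA1/PE256
# and ssdeep require PE-aware or third-party tooling and are not computed here.
ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512")


def file_hashes(path: Union[str, Path], chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Read the file once, feeding each chunk to every hash algorithm."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # hexdigest() is lower-case; compare against other sources case-insensitively.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, file_hashes(r"C:\Windows\System32\cmd.exe")["sha256"] yields a value that can be looked up on the web page or in the JSON/CSV data.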

How is this done?

The results provided in the output directory were gathered in virtual machines running various Windows OS versions and patch levels (currently a very manual process). Because the collector scripts can start every executable they find, it is strongly recommended that you run them in a test environment first.

Get-Xcyclopedia

The Get-Xcyclopedia script iterates recursively through all directories and starts any executables found. It then gathers a multitude of artifacts (a list that is slowly being improved). For example, it captures command-line output in search of helpful syntax messages, and if a window is visible, it takes a screenshot.
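The "start it and capture its output" step can be sketched in Python with subprocess.run. This is an analogous illustration, not the PowerShell logic Get-Xcyclopedia actually uses: it starts one binary with no stdin, captures stdout/stderr, and kills it after a timeout. Only the direct child is killed; grandchildren are not tracked, which is the same difficulty noted in the TODO list about child processes.

```python
import subprocess
import sys
from typing import Dict, Sequence, Union

Result = Dict[str, Union[str, int, bool, None]]


def _as_text(data: Union[bytes, str, None]) -> str:
    """TimeoutExpired may carry bytes even when text=True, so normalise both cases."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def capture_usage(executable: str, args: Sequence[str] = (), timeout: float = 10.0) -> Result:
    """Start a binary once and capture stdout/stderr, killing it if it outlives the timeout."""
    try:
        completed = subprocess.run(
            [executable, *args],
            stdin=subprocess.DEVNULL,   # never wait for keyboard input
            capture_output=True,
            text=True,
            errors="replace",           # tolerate non-UTF-8 console output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # run() has already killed the direct child before re-raising.
        return {"stdout": _as_text(exc.stdout), "stderr": _as_text(exc.stderr),
                "exit_code": None, "timed_out": True}
    except OSError as exc:
        # Not executable, missing, blocked, etc.
        return {"stdout": "", "stderr": str(exc), "exit_code": None, "timed_out": False}
    return {"stdout": completed.stdout, "stderr": completed.stderr,
            "exit_code": completed.returncode, "timed_out": False}


if __name__ == "__main__":
    print(capture_usage(sys.executable, ["--version"]))
```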

Get-ComObjects

The Get-ComObjects script iterates through each CLSID and enumerates its associated registry keys and exposed methods/properties.
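A read-only version of the CLSID walk can be sketched with Python's winreg module. This is an assumption-laden illustration, not Get-ComObjects itself: it reads only the CLSID key's default value (friendly name) and the default values of its ProgID, InprocServer32 and LocalServer32 subkeys, and it never creates COM instances, so it cannot list exposed methods. The registry access runs only on Windows; summarize_clsid is a hypothetical helper kept separate so it can be exercised anywhere.

```python
import sys
from typing import Dict, List, Optional

# Subkeys of HKEY_CLASSES_ROOT\CLSID\{...} whose default values are collected.
SUBKEYS = ("ProgID", "InprocServer32", "LocalServer32")


def summarize_clsid(clsid: str, friendly_name: Optional[str],
                    subkey_defaults: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Flatten the default values read for one CLSID into a record."""
    return {
        "clsid": clsid,
        "friendly_name": friendly_name,
        "prog_id": subkey_defaults.get("ProgID"),
        "inproc_server": subkey_defaults.get("InprocServer32"),  # usually a DLL path
        "local_server": subkey_defaults.get("LocalServer32"),    # usually an EXE path
    }


def _default_value(key, sub_key: Optional[str]) -> Optional[str]:
    """Default value of key\\sub_key (or of key itself when sub_key is None)."""
    import winreg
    try:
        return winreg.QueryValue(key, sub_key)
    except OSError:
        return None


def read_clsids(limit: Optional[int] = None) -> List[Dict[str, Optional[str]]]:
    """Enumerate HKCR\\CLSID via winreg (Windows only). No COM instances are created."""
    if sys.platform != "win32":
        raise OSError("COM registry enumeration requires Windows")
    import winreg

    records: List[Dict[str, Optional[str]]] = []
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "CLSID") as root:
        index = 0
        while limit is None or len(records) < limit:
            try:
                clsid = winreg.EnumKey(root, index)
            except OSError:
                break  # no more subkeys
            index += 1
            try:
                with winreg.OpenKey(root, clsid) as key:
                    defaults = {name: _default_value(key, name) for name in SUBKEYS}
                    friendly = _default_value(key, None)
            except OSError:
                continue  # e.g. access denied on this CLSID
            records.append(summarize_clsid(clsid, friendly, defaults))
    return records
```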

Where is this data stored?

JSON/CSV

For the machine-readable data (JSON & CSV):

Web Page (Markdown)

For a web-based view of the data, see strontic.github.io/xcyclopedia. Note: the web view includes a few bonus features that the JSON/CSV files do not currently include, namely similar files and external references (marked with * in the list above).

Can I collect this data myself?

Sure! The PowerShell scripts are here! See the syntax/usage section below.

Collector Script Usage

Syntax

 Get-Xcyclopedia
 #Synopsis: Iterate through all executable files in a specified directory (default target is .EXE). Gather CLI usage/syntax, screenshots, file hashes, file metadata, signature validity, and child processes.
   -save_path                  #path to save output
   -target_path                #target path for enumerating files (non-recursive). Comma-delimited for multiple paths.
   -target_path_recursive      #target path for enumerating files (recursive). Comma-delimited for multiple paths.
   -target_file_extension      #File extension to target (default = ".exe")
   -execute_files    [bool]    #Execute each for gathering syntax/usage info (stdout/stderr)
   -take_screenshots [bool]    #Take a screenshot if a given process has a window visible. This requires execute_files to be enabled.
   -minimize_windows [bool]    #Minimizing windows helps with screenshots, so that other windows do not get in the way. This only takes effect if execute_files and take_screenshots are both enabled.
   -xcyclopedia_verbose   [bool] #Verbose Output
   -transcript_file       [bool] #Write console output to a file (job.txt)
   -export_ssdeep_list    [bool] #Export ssdeep results to a ssdeep-compatible csv file
   -export_ssdeep_list_with_md5 [bool] #Include MD5 with ssdeep file export. Useful for determining similarity of unique files.
   -get_sigcheck          [bool] #Use Sigcheck (Sysinternals) to obtain additional file signatures and PE metadata.
   -get_virustotal        [bool] #Use Sigcheck (Sysinternals) to obtain the VirusTotal detection ratio. It does NOT submit files by default.
   -accept_virustotal_tos [bool] #Accept VirusTotal's Terms of Service (https://www.virustotal.com/en/about/terms-of-service/)
   -path_to_file_arg1            #This filepath will be provided as an argument to each binary (to test their response to a file being provided as input)
   -path_to_file_arg2            #This filepath will be provided as an argument to each binary (to test their response to a file being provided as input)
   -convert_to_csv        [bool] #CSV export is enabled by default but can be disabled if desired -- JSON will always be exported.

 Coalesce-Json
   #Synopsis: Combine JSON files into a single file. Only works with PowerShell-compatible JSON files.
   -target_files          #List of JSON files (comma-delimited) to combine. NOTE: The first file listed takes precedence in case of duplicates.
   -save_path             #Path to save the combined JSON file.
   -verbose_output [bool]
   -save_json      [bool] #Save file as JSON
   -save_csv       [bool] #Save file as CSV
   
 Get-ComObjects
   #Iterate through all COM Objects by CLSID. Gather ProgIDs, File Paths, Descriptions, and any other data present in the Classes Root. COM Methods can also be collected. Saves as JSON and CSV.
   -save_path              #path to save output
   -transcript_file [bool] #Write console output to a file (job.txt)
   -create_instance [bool] #UNSAFE! System crash may occur. When enabled, a COM instance is created for CLSID. This is required for determining COM methods.
   -verbose         [bool]
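The Coalesce-Json rule that "the first file listed takes precedence in case of duplicates" amounts to an ordered, first-wins merge. The sketch below shows that rule in Python; what counts as a duplicate is not stated above, so the key function is a caller-supplied assumption (for example a hypothetical file_path field), and each file is assumed to hold a top-level JSON array.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Hashable, Iterable, List, Union

Record = Dict[str, Any]


def coalesce(record_lists: Iterable[List[Record]], key: Callable[[Record], Hashable]) -> List[Record]:
    """Merge record lists in order; on a duplicate key the earliest list wins."""
    seen = set()
    merged: List[Record] = []
    for records in record_lists:
        for record in records:
            k = key(record)
            if k in seen:
                continue  # already supplied by an earlier file
            seen.add(k)
            merged.append(record)
    return merged


def coalesce_files(paths: Iterable[Union[str, Path]], key: Callable[[Record], Hashable]) -> List[Record]:
    """Load each JSON file (a UTF-8 BOM is tolerated) and coalesce their records."""
    lists = []
    for path in paths:
        data = json.loads(Path(path).read_text(encoding="utf-8-sig"))
        lists.append(data if isinstance(data, list) else [data])
    return coalesce(lists, key)
```

Usage, with a hypothetical duplicate key: coalesce_files([r"c:\temp\A.json", r"c:\temp\B.json"], key=lambda r: r.get("file_path")).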

Example

Get-Xcyclopedia -save_path "c:\xCyclopedia\out\" -target_path "$env:windir\system32" -target_file_extension ".exe"
Coalesce-Json -save_path "c:\xCyclopedia\out\" -target_files "c:\temp\A.json","c:\temp\B.json"
Get-ComObjects -save_path "c:\xCyclopedia\out\" -create_instance $true

Optional Dependencies:

  • ssdeep: For obtaining ssdeep fuzzy hashes (useful for finding similar files). You must extract the ssdeep ZIP file (available here) into a subfolder called "bin/ssdeep-2.14.1".
  • Sysinternals Handle: For obtaining the open handles of a given process. You must place handle64.exe (available here) in a subfolder called "bin/sysinternals/handle".
  • Sysinternals Sigcheck: For obtaining additional file hashes, VirusTotal detections, and PE machine-type. You must place sigcheck64.exe (available here) in a subfolder called "bin/sysinternals/sigcheck".
  • DLL Export Viewer: For obtaining Exported Functions from DLLs. You must place dllexp.exe (available here) in a subfolder called "bin/dllexp-x64".
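Before a run, it can help to confirm that the optional tools sit where the list above says they should. This Python sketch assumes the bin/... paths are relative to the folder containing the scripts; for ssdeep it only checks that the extracted folder exists, since the executable name inside the ZIP is not specified here.

```python
from pathlib import Path
from typing import Dict, Union

# Locations listed above, assumed to be relative to the folder holding the scripts.
DEPENDENCIES = {
    "ssdeep": Path("bin/ssdeep-2.14.1"),                         # extracted ZIP folder
    "handle": Path("bin/sysinternals/handle/handle64.exe"),
    "sigcheck": Path("bin/sysinternals/sigcheck/sigcheck64.exe"),
    "dllexp": Path("bin/dllexp-x64/dllexp.exe"),
}


def check_dependencies(base_dir: Union[str, Path]) -> Dict[str, bool]:
    """Report which optional tools are present under base_dir."""
    base = Path(base_dir)
    return {name: (base / relative).exists() for name, relative in DEPENDENCIES.items()}
```

A missing entry simply means the corresponding data points (fuzzy hashes, open handles, extra signatures/VirusTotal, DLL exports) will not be collected.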

How can I contribute?

  • Share it with friends
  • Provide feedback

TODO

  • Convince a Linux/macOS guru to script this for other OSes :)
  • Use a more reliable method for determining child processes (and for stopping them)
  • Use Logman.exe (or equivalent) to determine which ETW providers are being populated by a given process.
  • Use SilkETW (or equivalent) for vastly improved runtime metadata gathering.
  • Identify runtime deltas in different executable versions. (e.g. when a new command-line switch is added to the standard output)

xcyclopedia's Issues

[Feature request] Search filename or filehash through URL

Hi @strontic,

Thanks a lot for your project, very useful for security investigation! ❤️

Is it possible to search for a file by filename or file hash directly through the URL?
Something like:

The idea is to take advantage of the 'Custom search engine' feature in Firefox and Google Chrome, and then add your project to https://github.com/wikijm/Awesome-Custom-Search-Engines

Regards,
WikiJM

[Feature request] Versatile management of file versions and architectures

A given executable file will typically exist in many flavours, mainly varying by:

  • supported architecture: x86, x86-64/x64/amd64, IA64, ARM, ...
  • OS build number (related to the development lifecycle phase: alpha, beta, RTM, post-RTM)
  • build type (free/checked)
  • SKU

The SKU would be related to:

  • editions of the OS
  • supported languages
  • each licensing mode (VL, retail, ...)
  • timebomb or not
  • eval or not
  • each market channel: OEM, volume licensing, MSDN, ...
  • each installation mode (upgrade only or full packaged product)

Creating such an index would be very useful for security analysis and investigations.
From there, a smart UI and API could be developed to leverage the database model.
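One possible shape for the requested index is a flavour record keyed by architecture, OS build, build type and SKU attributes, with a forward map (file name to flavours) and a reverse map (hash to file name and flavour). The class and field names below are hypothetical; this is a sketch of the data model the request describes, not anything xCyclopedia currently implements.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Flavour:
    """One build variant of an executable (field names are illustrative)."""
    architecture: str                # e.g. "x86", "x64", "ia64", "arm"
    os_build: str                    # OS build number
    build_type: str = "free"         # "free" or "checked"
    edition: Optional[str] = None    # SKU: OS edition
    language: Optional[str] = None   # SKU: language
    channel: Optional[str] = None    # SKU: e.g. "oem", "volume licensing", "msdn"


class FlavourIndex:
    """Maps file name -> flavour -> SHA256, plus a reverse lookup by hash."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Dict[Flavour, str]] = defaultdict(dict)
        self._by_hash: Dict[str, List[Tuple[str, Flavour]]] = defaultdict(list)

    def add(self, file_name: str, flavour: Flavour, sha256: str) -> None:
        name, digest = file_name.lower(), sha256.lower()
        old = self._by_name[name].get(flavour)
        if old is not None:
            # Re-registering a flavour replaces its previous hash in the reverse map.
            self._by_hash[old].remove((name, flavour))
        self._by_name[name][flavour] = digest
        self._by_hash[digest].append((name, flavour))

    def flavours_of(self, file_name: str) -> List[Flavour]:
        return list(self._by_name.get(file_name.lower(), {}))

    def identify(self, sha256: str) -> List[Tuple[str, Flavour]]:
        return list(self._by_hash.get(sha256.lower(), []))
```

The reverse map is what an investigation would use: given a hash seen on a host, identify() answers which file and which architecture/build/SKU it corresponds to.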

JSON files produced cannot be parsed with jq

Thanks for the initiative. It's great.

It seems that the JSON produced cannot be parsed with jq, which is a standard JSON parser.

cat strontic-xcyclopedia.json | jq .
parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 22163, column 3776

strontic-xcyclopedia.json is not valid

Hi there! When you try to work with this JSON in Python, ValueError: No JSON object could be decoded appears. This happens because the JSON is not valid.

dos2unix strontic-xcyclopedia.json
dos2unix: Binary symbol 0x1B found at line 22163
dos2unix: Skipping binary file strontic-xcyclopedia.json
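Both reports point at raw control characters (such as ESC, 0x1B, likely from captured console output) inside JSON strings, which strict parsers reject. Assuming that is the only defect, a Python workaround is to parse with json.loads(..., strict=False), which tolerates control characters inside strings, and write the data back out with json.dump, which escapes U+0000 through U+001F. The UTF-8 BOM handling is an extra precaution, not something the reports mention.

```python
import json
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def load_lenient(path: PathLike) -> Any:
    """Parse JSON that contains raw control characters (e.g. ESC, 0x1B) inside strings."""
    # utf-8-sig also strips a UTF-8 byte-order mark, in case the file starts with one.
    text = Path(path).read_text(encoding="utf-8-sig")
    return json.loads(text, strict=False)


def rewrite_strict(src: PathLike, dst: PathLike) -> None:
    """Re-serialise the data; json.dump escapes U+0000..U+001F as required by RFC 8259."""
    data = load_lenient(src)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False)
```

After rewrite_strict("strontic-xcyclopedia.json", "strontic-xcyclopedia.clean.json"), the cleaned file should be accepted by jq and by strict JSON loaders.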

-target_path / -target_path_recursive are ignored, scripts try to scan my whole system

PS C:\git\xcyclopedia\script> .\Get-Xcyclopedia.ps1 -target_path "C:\users\Adam\OneDrive\bin"
Transcript stopped, output file is C:\temp\strontic-xcyclopedia\2021-11-27T13-28-19-job.txt
Transcript started, output file is c:\temp\strontic-xcyclopedia\2021-11-27T13-29-50-job.txt
Starting directory listing...
--> Starting directory listing... C:\Windows\system32 (recursive)
--> Starting directory listing... C:\Windows\SysWOW64 (recursive)
--> Starting directory listing... C:\ProgramData (recursive)
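The expected behaviour here is that comma-delimited -target_path / -target_path_recursive values replace the default scan locations, with defaults used only when neither is given. The Python sketch below illustrates that parameter-resolution logic; it is not a patch for the PowerShell script, and the fallback list is simply the three directories the log above shows being scanned, treated as hypothetical defaults.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical fallback: the directories the log above shows being scanned.
DEFAULT_RECURSIVE = [r"C:\Windows\system32", r"C:\Windows\SysWOW64", r"C:\ProgramData"]


def split_paths(value: Optional[str]) -> List[str]:
    """Split a comma-delimited path argument, dropping empty entries."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def resolve_targets(target_path: Optional[str] = None,
                    target_path_recursive: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Use the caller's paths; fall back to defaults only when neither is given."""
    flat = split_paths(target_path)
    recursive = split_paths(target_path_recursive)
    if not flat and not recursive:
        recursive = list(DEFAULT_RECURSIVE)
    return flat, recursive


def enumerate_files(flat: List[str], recursive: List[str], extension: str = ".exe") -> List[Path]:
    """List files with the given extension (case-insensitive), flat or recursive per root."""
    ext = extension.lower()
    found: List[Path] = []
    for root in flat:
        found.extend(p for p in Path(root).glob("*") if p.is_file() and p.suffix.lower() == ext)
    for root in recursive:
        found.extend(p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() == ext)
    return sorted(found)
```

With this logic, -target_path "C:\users\Adam\OneDrive\bin" would resolve to that single non-recursive root and no system directories.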
