Project dead? Need takeover? (aleph issue, 22 comments, closed)

deadbits avatar deadbits commented on August 20, 2024 1
Project dead? Need takeover?

from aleph.

Comments (22)

deadbits avatar deadbits commented on August 20, 2024 3

Well, initially I have some ideas for a bit of everything. Is this the type of work you see as in line with the project? If so, I can take over leading development here, or at the very least implement some new features and help with PRs and issues.

General:

  • Migrate to Python 3 (EOL for 2.7 is about a year away)
  • creating tests
  • Pull Request and Issue templates
  • Tests before merging new PRs

Collectors (updated):

  • paste hunting via YARA signatures / regex
    • Base64 PE header post, Hex encoded PE, PowerShell with suspicious keywords, etc
    • these would go through decoder plugin and check if it's worth keeping the sample
  • Twitter monitoring for search keywords (hxxp, hashes from researchers' tweets to fetch from VTI, opendir, dailyscriptlet, etc.)
  • VirusTotal Intelligence Hunting notifications
  • REST API submission endpoint for Collector
  • SQS, S3, and/or DigitalOcean Spaces watcher for Collector
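
To make the Twitter collector idea a bit more concrete: most of it boils down to pulling hashes and defanged URLs out of free text. A minimal sketch follows, assuming Python 3. `extract_indicators` and `refang` are hypothetical names for illustration, not existing Aleph functions, and the regexes are deliberately simple.

```python
import re

# Hex strings of SHA-256 (64), SHA-1 (40) or MD5 (32) length, on word boundaries.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
# Normal or defanged URLs: http://, https://, hxxp://, hxxps://
URL_RE = re.compile(r"\bh(?:xx|tt)ps?://\S+", re.IGNORECASE)


def refang(url):
    """Turn a defanged URL (hxxp, [.]) back into a fetchable one."""
    url = re.sub(r"^hxxp", "http", url, flags=re.IGNORECASE)
    return url.replace("[.]", ".").replace("(.)", ".")


def extract_indicators(text):
    """Return (hashes, urls) found in a piece of text such as a tweet."""
    hashes = sorted({h.lower() for h in HASH_RE.findall(text)})
    urls = sorted({refang(u) for u in URL_RE.findall(text)})
    return hashes, urls
```

The hashes would then be looked up on VTI and the URLs checked for liveness before download; neither step is shown here.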

Parsing & Enrichment (updated):

  • ability to parse Outlook messages for the Email monitor
  • Plugin to YARA scan any file type
  • adding more information to the PE model
    • Check for commonly suspicious APIs
    • Check for and verify Authenticode
  • adding ELF and Mach-O models
  • Enrichment of extracted IOCs (ASN, geoip, DNS resolution, etc)

Exporting:

  • export data to arbitrary REST endpoint
  • export JSON pipeline result to SQS, S3, or a DigitalOcean Spaces bucket so users can act on pipeline results any way they want to
  • export JSON pipeline result on disk for use in systems other than Elasticsearch, or just plain old viewing the data

jseidl avatar jseidl commented on August 20, 2024 2

merces avatar merces commented on August 20, 2024 1

Hi @deadbits!

Thank you very much for your offer. It's really appreciated!

I definitely believe in Aleph and the only reason it seems abandoned is indeed the lack of developers/time. Are you interested in leading the development here? What exactly do you have in mind?

Bringing up this discussion already helps. =)

jseidl avatar jseidl commented on August 20, 2024 1

jseidl avatar jseidl commented on August 20, 2024 1

deadbits avatar deadbits commented on August 20, 2024

Some more feature thoughts (updated) :

  • tokenizing emails and using those as passwords for protected attachments instead of the hard-coded list that exists now
  • add plug-ins to send files to different online/free or local sandbox services (hybrid, any.run, cuckoo)
  • submit url artifacts to URLScan.io? Idk if we'd want to send everything there
  • VBA extraction from Documents
    • Attempted deobfuscation
  • Check strings for highly suspicious keywords
    • Living off the land binary references, etc.
  • Process scriptlet files for malicious indications and send through pipeline if found
    • HTA, SCT, XML, WS, etc.
  • Reputation DB check for extracted URLs
  • Optional ability to allow for SOCKS5 proxy use for external requests
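
For the email tokenizing idea, a rough sketch of building the password list, assuming Python 3. `password_candidates` and its parameters are hypothetical names, and the tokenizing rule (split on whitespace, also try each token with surrounding punctuation stripped) is only one possible choice.

```python
def password_candidates(email_body, extra_keywords=(), min_len=4):
    """Build an ordered, de-duplicated password list from an email body."""
    candidates = []
    seen = set()
    # User-supplied keywords go first, then every token from the message.
    for token in list(extra_keywords) + email_body.split():
        # Try the raw token and a copy with surrounding punctuation removed.
        for variant in (token, token.strip(".,;:!?\"'()[]<>")):
            if len(variant) >= min_len and variant not in seen:
                seen.add(variant)
                candidates.append(variant)
    return candidates
```

Each candidate would then be tried against the protected attachment, for example through the `pwd` argument of `zipfile.ZipFile.read` for Zip archives.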

deadbits avatar deadbits commented on August 20, 2024

deadbits avatar deadbits commented on August 20, 2024

I know we'll likely move this to another ticket, or several, but I put all my ideas into one list so it's easier to view instead of my comments above:

General

  • Migrate to Python 3
  • Create tests
  • Create Pull Request and Issue templates
  • GitHub integration tests on Pull Requests

Samples Object

  • Add filenames list
  • Add first seen / last seen
    • If file is seen more than once just update last seen
    • filenames list can be updated too if available
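
A minimal sketch of how the first seen / last seen bookkeeping could look, assuming Python 3.7+ and an in-memory index keyed by SHA-256. `Sample`, `SampleIndex` and `observe` are hypothetical names, not the current Aleph model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Sample:
    sha256: str
    first_seen: datetime
    last_seen: datetime
    filenames: list = field(default_factory=list)


class SampleIndex:
    """In-memory stand-in for whatever backend stores the samples."""

    def __init__(self):
        self._samples = {}

    def observe(self, sha256, filename=None, when=None):
        """Record one sighting: create the sample or just update last_seen."""
        when = when or datetime.now(timezone.utc)
        sample = self._samples.get(sha256)
        if sample is None:
            sample = Sample(sha256, first_seen=when, last_seen=when)
            self._samples[sha256] = sample
        else:
            sample.last_seen = max(sample.last_seen, when)
        # Keep every distinct filename the sample has been seen under.
        if filename and filename not in sample.filenames:
            sample.filenames.append(filename)
        return sample
```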

Collectors

  • Collectors service could be separate so not all collectors are stopped if user stops the service
  • REST API endpoint
    • Submit files directly
    • Submit PCAP, extract files as sample
  • S3 bucket monitor
  • SQS Collector / Polling
  • DigitalOcean Spaces Collector
  • Twitter searches Collector
    • Query and hashtag search of popular researchers and tags; extract hashes and URLs
    • Try to fetch hashes from VTI/Hybrid
    • Check if url is alive, download sample
  • Paste site monitor (this may be more trouble than it's worth and leaning too far away from the Aleph purpose imho, but just a thought either way)
    • Keywords
    • Regex
    • YARA
    • Only keep pastes that look to be malware samples

Plugins

  • PE file parser enhancements
    • Certificate
    • Authenticode
    • Suspicious API exports
    • Anti-VM & Anti-Analysis checks
  • Accept and parse ELF and Mach-O files
  • Subfile extraction (hachoir-subfile + dd, etc)
  • Extract Base64 from files
    • Decode and if interesting MIME type create Sample
  • Extract email attachments for Collectors
  • Outlook email parser
  • Office document parser
    • VBA Macro extraction & deobfuscation
  • YARA scan
  • ELF parser
  • Mach-O parser
  • Strings extraction enhancements
    • Find interesting patterns (emails, URLs, IPs, domains, BTC address, phone number)
    • Enrich extracted interesting patterns (hosts, URLs, IPs, emails)
  • VirusTotal plugin enhancements:
    • Get report if it exists
    • Optionally submit file for analysis if no report found (Aleph might be used in sensitive envs where not all files should be uploaded)
  • VTI daily check (requires paid account)
    • Clear notifications once completed
  • HTML parser enhancements
    • Extract links / urls
      • If URL matches pre-defined MIME types to keep, save as sample
      • Maybe crawl found links for more files with interesting MIME types and create child Samples
  • Check hosts against reputation databases
    • threatexpert, FireHOL lists, VT host check, ShadowServer whitelist check (there are too many choices to list)
  • Submit samples to free online sandboxes or a local installation (python-sandboxapi)
  • Zip and GZip enhancements
    • Tokenize emails and use the tokens as a brute-force password list
    • Let user define list of keywords in a config file to try as passwords
  • Ability to define SOCKS5 proxy for web requests
  • Scriptlet file parsers (HTA, SCT, XML, WS, etc)
    • Either as direct submission or as a child
    • If as child, feed back to Collector for pipeline processing
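
As a rough illustration of the strings extraction enhancement, here is a sketch that pulls printable ASCII runs out of a binary and matches a few interesting patterns, assuming Python 3. The function names and the regexes are assumptions for illustration and far from complete (no domains, BTC addresses or phone numbers).

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "ipv4": re.compile(r"\b(?:%s\.){3}%s\b" % (OCTET, OCTET)),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
}


def find_interesting(strings):
    """Map each pattern name to the sorted unique matches across all strings."""
    found = {name: set() for name in PATTERNS}
    for text in strings:
        for name, pattern in PATTERNS.items():
            found[name].update(pattern.findall(text))
    return {name: sorted(values) for name, values in found.items()}
```

The enrichment step (ASN, geoip, DNS resolution) would take the output of `find_interesting` as its input.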

Decoders (subset of plugin to run under certain conditions?)

  • Base64
  • Reverse Base64
  • Hex to binary decoding
    Note: These are based on the Paste scraper finding these types of encoded files
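
A small sketch of the three decoders, assuming Python 3 and assuming the "is it worth keeping" check is simply whether the decoded bytes start with the `MZ` magic of a PE file. `decode_candidates` is a hypothetical name.

```python
import base64
import binascii


def _looks_like_pe(data):
    """PE files start with the two-byte 'MZ' DOS header magic."""
    return data[:2] == b"MZ"


def decode_candidates(text):
    """Try each decoder on text; return {decoder_name: bytes} for PE-looking output."""
    compact = "".join(text.split())  # pastes often wrap encoded data across lines
    attempts = {
        "base64": lambda: base64.b64decode(compact, validate=True),
        "reverse_base64": lambda: base64.b64decode(compact[::-1], validate=True),
        "hex": lambda: binascii.unhexlify(compact),
    }
    hits = {}
    for name, decode in attempts.items():
        try:
            data = decode()
        except (binascii.Error, ValueError):
            continue  # not valid input for this decoder
        if _looks_like_pe(data):
            hits[name] = data
    return hits
```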

Export Options

  • Send to Elasticsearch
  • Send to Splunk
  • Save JSON to disk as storage
  • Send to S3 bucket as storage
  • Send to SQS so user can integrate with other systems and workflows
  • Send to DigitalOcean Spaces as storage

merces avatar merces commented on August 20, 2024

Wow. This is a lot of good ideas indeed! Thanks for that, @deadbits!

@jseidl We can leverage the "Development" branch as this is not being used by anyone. Would you be able to upload this new code there by the end of next week? I think we should leverage the energy @deadbits is willing to put into it and start as soon as possible. 🙂

Thank you all!

deadbits avatar deadbits commented on August 20, 2024

@merces
Definitely a lot of ideas indeed, heh. I don't know how many fit with the direction of this project, and imho some would be higher priority than others. Not to mention implementing all of those would take quite some time.

There's also a handful of open-source Python libraries I have in mind to lean on for some of the ideas so it's not code written from scratch. Though, I think a decent amount of them are quick-wins while others require more major work.

Regardless, I'm definitely up to help out in any way I can, and working with you and @jseidl to figure out what should be kept or scrapped, what should be prioritized, etc., etc.

deadbits avatar deadbits commented on August 20, 2024

jseidl avatar jseidl commented on August 20, 2024

jseidl avatar jseidl commented on August 20, 2024

jseidl avatar jseidl commented on August 20, 2024

Ok, attaching PDF from mail didn't work. Uploaded to my Drive here: https://drive.google.com/open?id=1lvNFhJcguHfLgXHm865XXWVnfahTQcOA

deadbits avatar deadbits commented on August 20, 2024

Also I'd like to have all the collectors first save the sample locally, then consume the local file into the transport, to avoid losing the sample in case the connection fails abruptly or something else weird happens during collection.
...
On the processor side, on starting up, reprocess samples left in the temp dir, and delete from the temp dir only after making sure all data is stored on the backends.

I built a project similar to this architecture and this is definitely the best approach. I'm guessing you're already planning this but storing locally by hash is a solid way to avoid collisions (instead of uuid4 or what not)

Basically:

  • Receive sample from wherever
  • Store locally with a unique file name
  • Put the file into transport
    • When you're sure it's stored on the backend DB (or at least accepted by the Consumer as an Object), delete it locally
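
The steps above could be sketched roughly like this, assuming Python 3 and a local spool directory. `Spool`, `collect` and the `send` callable are hypothetical; `send` stands in for whatever transport ends up being used and is assumed to return True only once the backend has accepted the sample.

```python
import hashlib
import os
import tempfile


class Spool:
    """Keeps each sample on local disk until the backend confirms it."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def store(self, data):
        """Write the sample under its SHA-256 and return the path."""
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.directory, digest)
        if not os.path.exists(path):
            # Write to a temp file first so a crash never leaves a partial sample.
            fd, tmp = tempfile.mkstemp(dir=self.directory, prefix=".tmp-")
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
            os.replace(tmp, path)
        return path

    def pending(self):
        """Samples still on disk, e.g. to reprocess after a restart."""
        return sorted(
            os.path.join(self.directory, name)
            for name in os.listdir(self.directory)
            if not name.startswith(".tmp-")
        )

    def confirm(self, path):
        """Delete the local copy once the backend has stored the sample."""
        os.remove(path)


def collect(spool, data, send):
    """Store locally, hand to the transport, delete only on confirmed storage."""
    path = spool.store(data)
    if send(path):  # hypothetical transport callable
        spool.confirm(path)
    return path
```

Naming by hash means a sample received twice maps to the same file, and anything returned by `pending()` at startup can simply be sent again.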

@jseidl We can schedule some time to sync up next week maybe? Or this weekend even if that works for you. My weekday evenings are typically open, tomorrow I'm out most of the afternoon. Outside of that, I'm ready to get rolling 🚀

deadbits avatar deadbits commented on August 20, 2024

Read through your presentation last night - good stuff! Overall it sounds like a really solid framework and the ideas on how to scale it, create the components separately, etc., are all awesome.

I saw you wrote that plugins would "run in order". I might have misread or skipped a part, but is the idea for plugins to run one at a time on any given Processor, or would plugins for a MIME type run in parallel via threading/multiprocessing?


These are thoughts for way down the road but just had it on my mind after reading your PDF:
Another idea could be to have the plugins follow an order of execution per MIME type, so each plugin can act on the results of the last. For example, maybe a Zip file comes in so it hits the "brute_zip" plugin; inside is an executable so the "yara_scan" plugin runs; the results of "yara_scan" say that the executable is Trojan ABC, so the "malware_decoder" plugin runs, and then "extract_iocs" runs on the results of malware_decoder, and so on... That way you still get the results of all the plugins but get to provide deeper levels of context, as opposed to, say: if file == EXE, run strings and extract_iocs, sort of thing.

Basically sending files down different plugins "paths" depending on their MIME type and any useful information from the previous plugin.


Also, the malware analysis framework FAME has a pretty cool feature for its plugins, where a plugin inheriting the base class can use "acts_on", "generates", "triggered_by" and a few others. It's an interesting idea, and it might be useful to think about how to implement something similar: "generates" alerts of various types, or "triggered_by" another module as in my example above.
https://github.com/certsocietegenerale/fame/blob/ab0e9cc3640b2337dbd873a41e03987ba1ba8035/docs/modules.rst#scope
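
To show what I mean by plugin "paths", here is a toy sketch assuming Python 3.7+. The attribute names `acts_on` and `triggered_by` are borrowed from the FAME idea, but this is not FAME's API; `Plugin`, `run_pipeline` and the four plugin classes are hypothetical, and every result is a placeholder.

```python
class Plugin:
    """Base class; attribute names loosely mirror the FAME idea, not its API."""
    name = "base"
    acts_on = ()        # MIME types that start this plugin
    triggered_by = ()   # names of plugins whose results start this plugin

    def run(self, sample, results):
        """Return a result dict, or None when there is nothing to report."""
        raise NotImplementedError


def run_pipeline(sample, mime_type, plugins):
    """Run plugins along a path; each one sees the results gathered so far."""
    results = {}
    started = set()
    queue = [p for p in plugins if mime_type in p.acts_on]
    while queue:
        plugin = queue.pop(0)
        if plugin.name in started:
            continue
        started.add(plugin.name)
        result = plugin.run(sample, results)
        if result is None:
            continue  # nothing found, so nothing downstream is triggered
        results[plugin.name] = result
        queue.extend(p for p in plugins if plugin.name in p.triggered_by)
    return results


class BruteZip(Plugin):
    name, acts_on = "brute_zip", ("application/zip",)

    def run(self, sample, results):
        return {"extracted": ["payload.exe"]}  # placeholder result


class YaraScan(Plugin):
    name, triggered_by = "yara_scan", ("brute_zip",)

    def run(self, sample, results):
        return {"family": "Trojan ABC"}  # placeholder result


class MalwareDecoder(Plugin):
    name, triggered_by = "malware_decoder", ("yara_scan",)

    def run(self, sample, results):
        # Only worth running when the previous plugin identified the family.
        if results["yara_scan"]["family"] != "Trojan ABC":
            return None
        return {"c2": "203.0.113.7"}  # placeholder result


class ExtractIocs(Plugin):
    name, triggered_by = "extract_iocs", ("malware_decoder",)

    def run(self, sample, results):
        return {"ips": [results["malware_decoder"]["c2"]]}


PLUGINS = [ExtractIocs(), MalwareDecoder(), YaraScan(), BruteZip()]
```

Calling `run_pipeline(data, "application/zip", PLUGINS)` runs the four plugins in the order brute_zip, yara_scan, malware_decoder, extract_iocs regardless of how the list is ordered. This version is sequential; running independent plugins in parallel would need a different scheduler.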

jseidl avatar jseidl commented on August 20, 2024

deadbits avatar deadbits commented on August 20, 2024

jseidl avatar jseidl commented on August 20, 2024

deadbits avatar deadbits commented on August 20, 2024

deadbits avatar deadbits commented on August 20, 2024

@jseidl we'll have to use Meet since Duo is mobile only and doesn't support screen share etc.
I just need your email address or you can send me an invite to [email protected] for today at 3PM Eastern if that still works

deadbits avatar deadbits commented on August 20, 2024

We can probably close this at this point 😏
