maldoca's Introduction

MalDocA - Malicious Document Analyzer

MalDocA is a library to parse and extract features from Microsoft Office documents. It supports both OLE and OOXML documents.

The project's goal is to analyze potentially malicious documents to improve user safety and security.

REQUIREMENTS

  • Bazel (recommended version: 4 or 5)
  • Clang (recommended version: 11 or 12)
  • OS: Linux or Windows

GENERAL

Some testdata files contain malicious code! As a safety measure, we therefore XOR-encode some of them (key = 0x42). These files are prefixed with "MALICIOUS_" and suffixed with "_xor_0x42_encoded". In general, be very careful when opening or processing test files!

For convenience, we provide a Python script ("testdata_encode.py") to encode and decode those files. The script writes its output to the same directory, with "_xored" appended to the file name. Keep in mind that encoding a file twice decodes it again, i.e. restores the original file.

Example usage: python testdata_encode.py maldoca/service/testdata/c98661bcd5bd2e5df06d3432890e7a2e8d6a3edcb5f89f6aaa2e5c79d4619f3d.docx
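
The round-trip property described above follows from XOR being its own inverse. The following is a minimal sketch of that idea, not the actual contents of "testdata_encode.py". The function names xor_bytes and xor_file are hypothetical, and the exact placement of the "_xored" suffix (here: appended to the full file name) is an assumption.

```python
from pathlib import Path

XOR_KEY = 0x42  # single-byte key stated in the README


def xor_bytes(data: bytes, key: int = XOR_KEY) -> bytes:
    """XOR every byte with a single-byte key.

    Because (b ^ key) ^ key == b, applying this twice restores the input.
    """
    return bytes(b ^ key for b in data)


def xor_file(path: str, key: int = XOR_KEY) -> Path:
    """Write an XOR-ed copy next to the input file and return its path.

    Assumption: "_xored" is appended to the full file name; the real
    script may place the suffix differently.
    """
    src = Path(path)
    dst = src.with_name(src.name + "_xored")
    dst.write_bytes(xor_bytes(src.read_bytes(), key))
    return dst
```

Under these assumptions, calling xor_file on an encoded test file yields the decoded document, and calling it again on that output yields the encoded bytes once more. Decoded files contain live malicious content, so handle them only in an isolated environment.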

WINDOWS

  • Bazel has some Windows-related problems, e.g. maximum path length limitations. Make sure to read Bazel's best practices for Windows to avoid them.
  • Enable symlink support, as Bazel requires it.

CHECKOUT

git clone --recurse-submodules https://github.com/google/maldoca.git

cd maldoca

BUILD

Linux: bazel build --config=linux //maldoca/...

Windows: bazel build --config=windows //maldoca/...

TEST

Linux: bazel test --config=linux //maldoca/...

Windows: bazel test --config=windows //maldoca/...

DOCKER

We provide a Dockerfile in "docker/Dockerfile". This is the reference platform we use for continuous integration, and we recommend (but do not require) it for development as well. See the documentation in "docker/Dockerfile" for how to build and use it for development.

CONTACT

[email protected]

DISCLAIMER

This is not an official Google product.

maldoca's People

Contributors

b-maldoca, dtao-oss, oanise93, pi-rate14

maldoca's Issues

oss_utils.cc uses deprecated libxml functions

xmlSAXParseMemory, which is used in oss_utils.cc, has been marked as deprecated in libxml, which I discovered while rolling new commits into chromium's copy of libxml:
https://chromium-review.googlesource.com/c/chromium/src/+/3863846
https://ci.chromium.org/ui/p/chromium/builders/try/linux-rel/1116534/overview

I'm not sure exactly how to migrate this off of xmlSAXParseMemory, but the comment says to use xmlNewSAXParserCtxt and xmlCtxtReadMemory instead.

cc @nwellnhof @b-maldoca @dtao-oss

Remove all references to base/cxx17_backports.h in the code

According to the instructions of the following issue, we need to convert base::clamp to std::clamp. After this work is completed, the base/cxx17_backports.h file needs to be deleted. This work is still in progress.

https://bugs.chromium.org/p/chromium/issues/detail?id=1373621

At this stage, I found that only bazel/mini_chromium.BUILD needs to be modified in the code base of maldoca. The other references come from mini_chromium, and I have submitted the code to the mini_chromium code base.
