samarthdd / cdr-plugin-folder-to-folder

This project forked from lucia15/cdr-plugin-folder-to-folder

License: Apache License 2.0

cdr-plugin-folder-to-folder

Project

  • This project aims to create a deployment that is able to process 1 TB of unique unsafe files, shared on the hard drive.
  • The implementation should pick up these files and process them via the Rebuild engine in a deployment based on 2 ESXi servers.

Architecture

  • Workflow cluster
  • Worker cluster
  • Load balancer
  • Monitoring
  • 3 hard disks (source, evidence, and target)

[Figure: cdr-plugin-folder-to-folder architecture]

Flow diagram

[Figure: flow diagram]

Data mapping

[Figure: data mapping]

The Metadata module

The Metadata_Service class manages the creation and updating of the metadata.json files in the HASH directories on HD2.

  • get_metadata - takes the path of the file and creates the JSON object
  • get_from_file - gets the JSON object from the metadata.json file in the HASH directory
  • write_metadata_to_file - saves the current JSON object to the metadata.json file in the HASH directory
  • get_original_file_path - obtains the original file path from the metadata in the HASH directory
  • get_status - gets the current status stored in the metadata.json file of the HASH directory
  • set_status - updates the status stored in the metadata.json file of the HASH directory
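
The method list above can be pictured with a minimal Python sketch. It is not the project's implementation: only the method names come from the list, while the class name MetadataServiceSketch, the method signatures, the JSON field names (file_name, original_file_path, original_hash, status) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

METADATA_FILE_NAME = "metadata.json"


class MetadataServiceSketch:
    """Illustrative stand-in for Metadata_Service; field names are assumed."""

    def __init__(self):
        self.metadata = {}

    def get_metadata(self, file_path):
        # Create the JSON object for a source file.
        # The hash algorithm (SHA-256) is an assumption.
        data = Path(file_path).read_bytes()
        self.metadata = {
            "file_name": Path(file_path).name,
            "original_file_path": str(file_path),
            "original_hash": hashlib.sha256(data).hexdigest(),
            "status": "INITIAL",
        }
        return self.metadata

    def get_from_file(self, hash_dir):
        # Load the JSON object stored in <hash_dir>/metadata.json.
        path = Path(hash_dir) / METADATA_FILE_NAME
        self.metadata = json.loads(path.read_text())
        return self.metadata

    def write_metadata_to_file(self, hash_dir):
        # Save the current JSON object to <hash_dir>/metadata.json.
        path = Path(hash_dir) / METADATA_FILE_NAME
        path.write_text(json.dumps(self.metadata, indent=2))

    def get_original_file_path(self, hash_dir):
        return self.get_from_file(hash_dir)["original_file_path"]

    def get_status(self, hash_dir):
        return self.get_from_file(hash_dir)["status"]

    def set_status(self, hash_dir, status):
        # Reload, update, and write back so the file stays the source of truth.
        self.get_from_file(hash_dir)
        self.metadata["status"] = status
        self.write_metadata_to_file(hash_dir)
```

In this sketch every read goes back to metadata.json, so a status written by one process is visible to the next one that opens the same HASH directory.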

PreProcessing Module Flow

[Figure: pre-processing module flow]

Processing Module Flow

  • Iterates through the HASH folders created during pre-processing on HD2

  • For each HASH folder:

    • If the status in the metadata is not "INITIAL", does nothing
    • Otherwise:
      • Updates the status in metadata to "IN PROGRESS"
      • Sends the file to be processed
      • Saves the processed file to the corresponding directory in HD3
      • Saves the processing report to the HASH folder
      • Updates the status to "COMPLETED"
  • In the Loops class, the LoopHashDirectories function iterates through the HASH directories on HD2. For each directory, it starts file processing by calling processDirectory of the File_Processing class.

  • The File_Processing class is accessed through the processDirectory function, which takes the path of a HASH directory on HD2 as a parameter and processes it.
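
The status transitions described above can be sketched in Python as follows. This is a simplified illustration, not the code of the Loops or File_Processing classes: the function names, the rebuild callable (a stand-in for the call to the Rebuild engine), the file name "source" for the stored file, the "report.json" name, and the metadata fields are all assumptions.

```python
import json
from pathlib import Path


def process_directory(hash_dir, hd3_dir, rebuild):
    """Process one HASH folder; return True if the file was processed."""
    hash_dir = Path(hash_dir)
    metadata_path = hash_dir / "metadata.json"
    metadata = json.loads(metadata_path.read_text())

    # Anything that is not "INITIAL" is skipped.
    if metadata["status"] != "INITIAL":
        return False

    def save_status(status):
        metadata["status"] = status
        metadata_path.write_text(json.dumps(metadata, indent=2))

    save_status("IN PROGRESS")

    # "source" is an assumed name for the file stored during pre-processing.
    original_bytes = (hash_dir / "source").read_bytes()
    # rebuild is a hypothetical callable returning (processed bytes, report dict).
    rebuilt_bytes, report = rebuild(original_bytes)

    # Save the processed file on HD3 (flat layout assumed for simplicity).
    target = Path(hd3_dir) / metadata["file_name"]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(rebuilt_bytes)

    # Save the processing report next to the metadata in the HASH folder.
    (hash_dir / "report.json").write_text(json.dumps(report))
    save_status("COMPLETED")
    return True


def loop_hash_directories(hd2_dir, hd3_dir, rebuild):
    """Iterate over the HASH folders on HD2; return how many were processed."""
    processed = 0
    for hash_dir in sorted(Path(hd2_dir).iterdir()):
        if not (hash_dir / "metadata.json").is_file():
            continue  # not a HASH folder
        if process_directory(hash_dir, hd3_dir, rebuild):
            processed += 1
    return processed
```

Because the status check comes first, running the loop a second time leaves completed folders untouched. A folder whose processing failed midway would remain "IN PROGRESS" in this sketch; how the project handles that case is not described here.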

[Figure: processing module flow]

Usage

  • Set the MinIO and Jupyter Notebook password

     export ACCESS_TOKEN=

  • Run docker-compose up

Services

FastAPI

  • Open http://localhost:8880
  • Endpoints
    API Endpoint    Method    Description
    /               GET       Home screen
    /pre-process    GET       Pre-processing of files from HD1 to HD2
    /loop           GET       Processing of HD2 files and storing the results in HD3
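
As a rough illustration of how the two endpoints might be called in sequence, here is a small Python sketch that uses only the standard library. The base URL and paths come from the table above; the helper names http_get and run_pipeline are hypothetical, and the response format is not documented here, so the body is simply returned as text.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8880"


def http_get(url):
    """Send a GET request and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def run_pipeline(base_url=BASE_URL, fetch=http_get):
    """Call /pre-process, then /loop, and return both response bodies."""
    results = {}
    for path in ("/pre-process", "/loop"):
        results[path] = fetch(urljoin(base_url, path))
    return results
```

With the services started by docker-compose up, calling run_pipeline() should trigger pre-processing first and processing second. The fetch parameter exists only so the HTTP call can be replaced, for example in a test.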

Jupyter Notebook

  • Open http://localhost:8888/
  • Use the value of ACCESS_TOKEN as the password

MinIO

  • Open http://localhost:9000/
  • Use the value of ACCESS_TOKEN as the password

Elasticsearch

  • Open http://localhost:5601/
  • Use the value of ACCESS_TOKEN as the password

Contributors

diniscruz, ggrig, samarthdd, pranaysahith, lucia15, dpatel7, girish-pingala, lestat06, ginagc, mane2020, dinis-cruz-gw
