An automated tool for validating OSINT. This forms part of the final step of OSINT production as detailed by NATO's open source handbook (2001). It is a research artefact for my dissertation at the University of Portsmouth.

Home Page: https://up2040499.github.io/auto-osint-v/

License: Creative Commons Zero v1.0 Universal

Topics: google-custom-search-api google-custom-search-engine osint-tool python3 requests spacy-ner transformers selenium-wire


auto-osint-v


See the results of the different Entity Recognition language models here. Note how the standard spaCy 'en_core_web_sm' NER model struggles to recognise military information compared to the model used for this project, which was trained on the Defence Science and Technology Laboratory's 're3d' dataset.

📁 Installation

Note: please try the Google Colab notebook first; more info below.

Linux / Windows

  • Clone this GitHub repository git clone https://github.com/UP2040499/auto-osint-v.git
  • Install conda (mamba also works)
  • Check conda is installed by checking the version: conda --version
  • Move into the repository:
    cd ~/<install directory>/auto-osint-v
  • Create a conda or mamba environment and install dependencies:
    • With conda:
      conda env create -f environment.yml -n auto-osint-v-python38
    • With mamba:
      mamba env create -f environment.yml
  • Activate the environment and run the tool.

    Linux (bash)

    eval "$(conda shell.bash hook)"  # initialise conda in this shell
    conda activate auto-osint-v-python38
    python -m auto_osint_v

    Windows

    Open an 'Anaconda Powershell Prompt' from the Start Menu, then run the following:

    conda init powershell
    conda activate auto-osint-v-python38
    python -m auto_osint_v

🚀 Usage

💻 Command line instructions:

python -m auto_osint_v <ARGS>

🚧 Arguments 🚧

The following descriptions can also be found by running python -m auto_osint_v -h.

  • -s/--Silent: assumes you have already entered the intelligence statement.
  • -n/--NoEditor: input the intelligence statement on the command line rather than in a text editor.
  • --html: output in HTML (default: csv).
  • -m/--markdown: output in Markdown (default: csv).
  • -f/--FileToUse: specify the file to read the intelligence statement from.
  • -p/--output_postfix: specify the output file's postfix, e.g. 'output3.txt' rather than the default 'output.txt'.
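The flags above could be wired up with Python's standard argparse module roughly as follows. The option spellings come from the list above; the wiring itself is a sketch, not the project's actual code.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of how the documented flags might be declared (assumed, not actual code)."""
    parser = argparse.ArgumentParser(prog="auto_osint_v")
    parser.add_argument("-s", "--Silent", action="store_true",
                        help="Assume the intelligence statement has already been entered")
    parser.add_argument("-n", "--NoEditor", action="store_true",
                        help="Read the statement from the command line, not a text editor")
    parser.add_argument("--html", action="store_true",
                        help="Output in HTML (default: csv)")
    parser.add_argument("-m", "--markdown", action="store_true",
                        help="Output in Markdown (default: csv)")
    parser.add_argument("-f", "--FileToUse",
                        help="File to read the intelligence statement from")
    parser.add_argument("-p", "--output_postfix", default="",
                        help="Postfix for the output file name, e.g. '0' -> output0.md")
    return parser

# Mirrors the "Use with options" example below: -s -m -p 0
args = build_parser().parse_args(["-s", "-m", "-p", "0"])
```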

Example usage:

Typical use / First time use

python -m auto_osint_v

Use with options

This reads the statement from the existing intelligence file and outputs the results in a Markdown file called 'output0.md'.

python -m auto_osint_v -s -m -p 0

The postfix (0 in this case) is useful if you are running the tool multiple times and want to save the results separately.


🎓 Google Colab

Previously, I recommended using Google Colab to run this tool. However, the default Google Colab machine performs worse than most local machines would (likely due to CPU limits in place). You can pay for a higher-performing machine with a GPU, which does improve performance.

The Google Colab can be found here

The reason Google Colab is recommended is that it runs the tool remotely. While performance on a local machine may be better, the tool used most of my (underpowered) machine's available resources (CPU, RAM).

If the tool struggles to run on your local machine, use Google Colab to avoid hogging your computer's resources.


auto-osint-v's Issues

Popular information finder

This finds information that is popular amongst the sources found in Source Aggregation #17.
Once found, individual (distinct) entities are stored in a Popular Entity Store.
This is accessed by the Priority Manager #20.
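The popularity count described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the store is modelled as a plain dict, and each source contributes at most one "vote" per entity.

```python
from collections import Counter

def find_popular_entities(sources, min_mentions=2):
    """Count how many distinct sources mention each entity; keep the popular ones.

    `sources` is a list of entity lists, one per aggregated source.
    Converting each list to a set means repeats within a single source
    do not inflate an entity's popularity.
    """
    counts = Counter()
    for entities in sources:
        counts.update(set(entities))  # one vote per source
    # The "Popular Entity Store" is modelled here as a plain dict.
    return {entity: n for entity, n in counts.items() if n >= min_mentions}

store = find_popular_entities([
    ["Kyiv", "T-72", "Kyiv"],  # entity repeated within one source counts once
    ["Kyiv", "convoy"],
    ["T-72", "Kyiv"],
])
```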

Final Summary of Sources

Takes all sources from Evidence Store #22 and Bias Sources Store #16.

Outputs a list of sources that corroborate the intelligence statement, sorted by confidence scores. Higher = better.

For each source, summary information and the results of the semantic analysis are output for the user to see.

This summary is meant to help guide the user to the most useful (and most validating) open sources for their given intelligence statement.
This can help shape their conclusions with regard to the validity of the information.
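The merge-and-rank step described above amounts to sorting the combined stores by confidence. A minimal sketch, assuming each source is a dict with a `confidence` field (the field name is an assumption for illustration):

```python
def summarise_sources(evidence_store, bias_store):
    """Merge the two stores and sort by confidence score, highest first."""
    merged = list(evidence_store) + list(bias_store)
    return sorted(merged, key=lambda s: s["confidence"], reverse=True)

# Hypothetical example entries; real stores hold richer records.
ranked = summarise_sources(
    [{"url": "https://a.example", "confidence": 0.7}],
    [{"url": "https://b.example", "confidence": 0.9}],
)
```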

Source Aggregation

This consists of the following components:

  • Google Search
  • Social Media Search
  • Source similarity checker
  • Key information gatherer
  • Semantic analyser of key information and headlines
    • Sources with very poor semantics are discarded
  • Each source, along with its associated semantic analysis results and web links, is stored in a Potential Corroboration Store #23.
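The discard-and-store step at the end of this pipeline can be sketched as follows. The threshold, field names, and toy analyser are assumptions for illustration; the real semantic analyser is a separate component.

```python
def aggregate_sources(candidates, analyse, threshold=-0.5):
    """Score each candidate's headline and discard those with very poor semantics.

    `analyse` is assumed to score text in [-1, 1]; anything below
    `threshold` is dropped. Survivors are kept with their score,
    standing in for the Potential Corroboration Store.
    """
    store = []
    for source in candidates:
        score = analyse(source["headline"])
        if score < threshold:
            continue  # very poor semantics: discard the source
        store.append({**source, "semantics": score})
    return store

# Toy analyser: strongly negative score for sensationalist wording.
toy_analyser = lambda text: -1.0 if "SHOCKING" in text else 0.2
kept = aggregate_sources(
    [{"url": "https://a.example", "headline": "Convoy sighted near Kyiv"},
     {"url": "https://b.example", "headline": "SHOCKING secret exposed"}],
    analyse=toy_analyser,
)
```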

Create Evidence Source Store

This is a store of all sources that will be useful for validating the intelligence statement.

It includes all sources, together with the semantic analysis of the intelligence statement #18.

Google Search

For all searching, we don't want to simply search open sources for the intelligence statement itself.
Key information (keywords) must be extracted from the statement first. This differs from entity extraction in that it requires finding the keywords/information that can be used in a search.
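As a rough illustration of keyword extraction (the project itself uses NER models; this naive stopword-filtering version is only a sketch):

```python
import re

# A tiny, assumed stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "is",
             "was", "were", "and", "that", "near", "by", "for"}

def extract_keywords(statement, limit=6):
    """Naive keyword extraction: drop stopwords, prefer longer tokens."""
    tokens = re.findall(r"[A-Za-z0-9-]+", statement.lower())
    seen, keywords = set(), []
    for tok in sorted(tokens, key=len, reverse=True):
        if tok in STOPWORDS or tok in seen:
            continue
        seen.add(tok)
        keywords.append(tok)
    return keywords[:limit]

keywords = extract_keywords("A convoy of T-72 tanks was seen near Kyiv")
```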

Search Query Generator

Should be reusable: used to generate key information both from the statement and from sources.
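A reusable generator could simply build a query string from whatever keywords it is given, whether they came from the statement or from a source. The `site:` and exact-phrase operators shown are standard Google search syntax; the function itself is a hypothetical sketch.

```python
def generate_query(keywords, site=None, exact=()):
    """Build a search query string from extracted keywords.

    Reusable for both the statement and each source: pass the keywords,
    optionally restrict results to one site, and quote exact phrases.
    """
    parts = [f'"{phrase}"' for phrase in exact] + list(keywords)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)

query = generate_query(["convoy", "kyiv"], site="twitter.com", exact=["T-72"])
```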

Priority Manager

This takes information from Popular Entity Store #19, Target Information Stores #15, and Potential Corroboration Store #23.

Assigns higher scores to sources that independently mention ‘target information’ and relatively lower scores to sources that proffer ‘popular information’.

Each source gets points for independent mentions, i.e. a source that repeats the given information will only get points for the first mention.

Good (or neutral) semantic analysis results increase the source’s overall score. Bad (or emotionally charged) semantic analysis results give the source no points.

Sources are added to Evidence Store #22 with an associated priority score.
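The scoring rules above can be sketched for a single source. The point values and field names are assumptions chosen only to show the structure: target information outweighs popular information, each piece of information scores once regardless of repeats, and only good or neutral semantics earn the bonus.

```python
def score_source(source, target_info, popular_info,
                 target_points=3, popular_points=1, semantic_bonus=1):
    """Assign a priority score to one source under the rules described above."""
    score = 0
    for info in set(source["mentions"]):  # repeats score only once
        if info in target_info:
            score += target_points        # independent mention of target info
        elif info in popular_info:
            score += popular_points       # popular info scores lower
    if source.get("semantics", 0) >= 0:   # good or neutral semantics
        score += semantic_bonus           # bad semantics: no points added
    return score

# "T-72" is repeated but scores once: 3 (target) + 1 (popular) + 1 (bonus) = 5
src = {"mentions": ["T-72", "T-72", "Kyiv"], "semantics": 0.2}
priority = score_source(src, target_info={"T-72"}, popular_info={"Kyiv"})
```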
