An automated tool for validating OSINT. This forms part of the final step of OSINT production as detailed by NATO's open source handbook (2001). It is a research artefact for my dissertation at the University of Portsmouth.

Home Page: https://up2040499.github.io/auto-osint-v/

License: Creative Commons Zero v1.0 Universal

Topics: google-custom-search-api google-custom-search-engine osint-tool python3 requests spacy-ner transformers selenium-wire


auto-osint-v


See the results of the different Entity Recognition language models here. Note how the standard spaCy 'en_core_web_sm' NER model struggles to recognise military information compared to the model used for this project, which was trained on the Defence Science and Technology Laboratory's 're3d' dataset.

📁 Installation

Note: please try the Google Colab notebook first; more info below.

Linux / Windows

  • Clone this GitHub repository git clone https://github.com/UP2040499/auto-osint-v.git
  • Install conda (mamba also works)
  • Check conda is installed by checking the version: conda --version
  • Move into the repository:
    cd ~/<install directory>/auto-osint-v
  • Create a conda or mamba environment and install dependencies:
    • With conda:
      conda env create -f environment.yml -n auto-osint-v-python38
    • With mamba:
      mamba env create -f environment.yml
  • Activate the environment and run the tool.

    Linux (bash)

    eval "$(conda shell.bash hook)"  # initialise conda in this shell
    conda activate auto-osint-v-python38
    python -m auto_osint_v

    Windows

    Open an 'Anaconda Powershell Prompt' from the Start Menu, then run the following:

    conda init powershell
    conda activate auto-osint-v-python38
    python -m auto_osint_v

🚀 Usage

💻 Command line instructions:

python -m auto_osint_v <ARGS>

🚧 Arguments 🚧

The following descriptions can also be found by running python -m auto_osint_v -h.

  • -s/--Silent: assumes you have already entered the intelligence statement.
  • -n/--NoEditor: input the intelligence statement on the command line rather than in a text editor.
  • --html: output in HTML (default: csv).
  • -m/--markdown: output in Markdown (default: csv).
  • -f/--FileToUse: specify the file to read the intelligence statement from.
  • -p/--output_postfix: specify the output file's postfix, e.g. 'output3.txt' rather than the default 'output.txt'.
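The flags above could be wired up with Python's standard argparse module roughly as follows. The option spellings come from the list above; the wiring itself is a sketch, not the project's actual code.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of how the documented flags might be declared (assumed, not actual code)."""
    parser = argparse.ArgumentParser(prog="auto_osint_v")
    parser.add_argument("-s", "--Silent", action="store_true",
                        help="Assume the intelligence statement has already been entered")
    parser.add_argument("-n", "--NoEditor", action="store_true",
                        help="Read the statement from the command line, not a text editor")
    parser.add_argument("--html", action="store_true",
                        help="Output in HTML (default: csv)")
    parser.add_argument("-m", "--markdown", action="store_true",
                        help="Output in Markdown (default: csv)")
    parser.add_argument("-f", "--FileToUse",
                        help="File to read the intelligence statement from")
    parser.add_argument("-p", "--output_postfix", default="",
                        help="Postfix for the output file name, e.g. '0' -> output0.md")
    return parser

# Mirrors the "Use with options" example below: -s -m -p 0
args = build_parser().parse_args(["-s", "-m", "-p", "0"])
```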

Example usage:

Typical use / First time use

python -m auto_osint_v

Use with options

This reads the statement from the existing intelligence file and outputs the results in a Markdown file called 'output0.md'.

python -m auto_osint_v -s -m -p 0

The postfix (0 in this case) is useful if you are running the tool multiple times and want to save the results separately.


🎓 Google Colab

Previously, I recommended using Google Colab to run this tool. However, the default Google Colab machine performs worse than most local machines would (likely due to CPU limits in place). You can pay for a higher-performing machine with a GPU, which does improve performance.

The Google Colab can be found here

The reason Google Colab is recommended is that it runs the tool remotely. While performance on a local machine may be better, the tool used most of my (underpowered) machine's available resources (CPU, RAM).

If the tool struggles to run on your local machine, use Google Colab to avoid hogging your computer's resources.


auto-osint-v's Issues

Popular information finder

This finds information that is popular amongst the sources found in Source Aggregation #17.
Once found, individual (distinct) entities are stored in a Popular Entity Store.
This is accessed by the Priority Manager #20.
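The popularity count described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the store is modelled as a plain dict, and each source contributes at most one "vote" per entity.

```python
from collections import Counter

def find_popular_entities(sources, min_mentions=2):
    """Count how many distinct sources mention each entity; keep the popular ones.

    `sources` is a list of entity lists, one per aggregated source.
    Converting each list to a set means repeats within a single source
    do not inflate an entity's popularity.
    """
    counts = Counter()
    for entities in sources:
        counts.update(set(entities))  # one vote per source
    # The "Popular Entity Store" is modelled here as a plain dict.
    return {entity: n for entity, n in counts.items() if n >= min_mentions}

store = find_popular_entities([
    ["Kyiv", "T-72", "Kyiv"],  # entity repeated within one source counts once
    ["Kyiv", "convoy"],
    ["T-72", "Kyiv"],
])
```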

Final Summary of Sources

Takes all sources from Evidence Store #22 and Bias Sources Store #16.

Outputs a list of sources that corroborate the intelligence statement, sorted by confidence scores. Higher = better.

For each source, summary information and the results of the semantic analysis are output for the user to see.

This summary is meant to help guide the user to the most useful (and most validating) open sources for their given intelligence statement.
This can help shape their conclusions with regard to the validity of the information.
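The merge-and-rank step described above amounts to sorting the combined stores by confidence. A minimal sketch, assuming each source is a dict with a `confidence` field (the field name is an assumption for illustration):

```python
def summarise_sources(evidence_store, bias_store):
    """Merge the two stores and sort by confidence score, highest first."""
    merged = list(evidence_store) + list(bias_store)
    return sorted(merged, key=lambda s: s["confidence"], reverse=True)

# Hypothetical example entries; real stores hold richer records.
ranked = summarise_sources(
    [{"url": "https://a.example", "confidence": 0.7}],
    [{"url": "https://b.example", "confidence": 0.9}],
)
```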

Source Aggregation

This consists of the following components:

  • Google Search
  • Social Media Search
  • Source similarity checker
  • Key information gatherer
  • Semantic analyser of key information and headlines
    • Sources with very poor semantics are discarded
  • Each source, along with its associated semantic analysis results and web links, is stored in a Potential Corroboration Store #23.
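The discard-and-store step at the end of this pipeline can be sketched as follows. The threshold, field names, and toy analyser are assumptions for illustration; the real semantic analyser is a separate component.

```python
def aggregate_sources(candidates, analyse, threshold=-0.5):
    """Score each candidate's headline and discard those with very poor semantics.

    `analyse` is assumed to score text in [-1, 1]; anything below
    `threshold` is dropped. Survivors are kept with their score,
    standing in for the Potential Corroboration Store.
    """
    store = []
    for source in candidates:
        score = analyse(source["headline"])
        if score < threshold:
            continue  # very poor semantics: discard the source
        store.append({**source, "semantics": score})
    return store

# Toy analyser: strongly negative score for sensationalist wording.
toy_analyser = lambda text: -1.0 if "SHOCKING" in text else 0.2
kept = aggregate_sources(
    [{"url": "https://a.example", "headline": "Convoy sighted near Kyiv"},
     {"url": "https://b.example", "headline": "SHOCKING secret exposed"}],
    analyse=toy_analyser,
)
```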

Create Evidence Source Store

This is a store of all sources that will be useful for validating the intelligence statement.

It includes all sources, together with the semantic analysis of the intelligence statement #18.

Google Search

For all searching, we don't want to simply search open sources for the intelligence statement itself.
Key information (keywords) must be extracted from the statement first. This differs from entity extraction in that it requires finding the keywords/information that can be used in a search.
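As a rough illustration of keyword extraction (the project itself uses NER models; this naive stopword-filtering version is only a sketch):

```python
import re

# A tiny, assumed stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "is",
             "was", "were", "and", "that", "near", "by", "for"}

def extract_keywords(statement, limit=6):
    """Naive keyword extraction: drop stopwords, prefer longer tokens."""
    tokens = re.findall(r"[A-Za-z0-9-]+", statement.lower())
    seen, keywords = set(), []
    for tok in sorted(tokens, key=len, reverse=True):
        if tok in STOPWORDS or tok in seen:
            continue
        seen.add(tok)
        keywords.append(tok)
    return keywords[:limit]

keywords = extract_keywords("A convoy of T-72 tanks was seen near Kyiv")
```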

Search Query Generator

Should be reusable: used to generate key information both from the statement and from sources.
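A reusable generator could simply build a query string from whatever keywords it is given, whether they came from the statement or from a source. The `site:` and exact-phrase operators shown are standard Google search syntax; the function itself is a hypothetical sketch.

```python
def generate_query(keywords, site=None, exact=()):
    """Build a search query string from extracted keywords.

    Reusable for both the statement and each source: pass the keywords,
    optionally restrict results to one site, and quote exact phrases.
    """
    parts = [f'"{phrase}"' for phrase in exact] + list(keywords)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)

query = generate_query(["convoy", "kyiv"], site="twitter.com", exact=["T-72"])
```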

Priority Manager

This takes information from Popular Entity Store #19, Target Information Stores #15, and Potential Corroboration Store #23.

Assigns higher scores to sources that independently mention ‘target information’ and relatively lower scores to sources that proffer ‘popular information’.

Each source gets points for independent mentions, i.e. a source that repeats the given information will only get points for the first mention.

Good (or neutral) semantic analysis results increase the source’s overall score. Bad (or emotionally charged) semantic analysis results give the source no points.

Sources are added to Evidence Store #22 with an associated priority score.
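The scoring rules above can be sketched for a single source. The point values and field names are assumptions chosen only to show the structure: target information outweighs popular information, each piece of information scores once regardless of repeats, and only good or neutral semantics earn the bonus.

```python
def score_source(source, target_info, popular_info,
                 target_points=3, popular_points=1, semantic_bonus=1):
    """Assign a priority score to one source under the rules described above."""
    score = 0
    for info in set(source["mentions"]):  # repeats score only once
        if info in target_info:
            score += target_points        # independent mention of target info
        elif info in popular_info:
            score += popular_points       # popular info scores lower
    if source.get("semantics", 0) >= 0:   # good or neutral semantics
        score += semantic_bonus           # bad semantics: no points added
    return score

# "T-72" is repeated but scores once: 3 (target) + 1 (popular) + 1 (bonus) = 5
src = {"mentions": ["T-72", "T-72", "Kyiv"], "semantics": 0.2}
priority = score_source(src, target_info={"T-72"}, popular_info={"Kyiv"})
```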
