Blacklist Name Matcher

Description:

  • If using a file: imports a file containing a list of blacklisted names and cleans it against a noise file of irrelevant words
  • If using the EUROPA sanctions list: downloads the XML from the EUROPA sanctions website and processes it into a list
  • Then performs a strict (exact) search for the query name
  • If the strict search returns no results, a partial search is run instead

Contents

Components:

  • scraper.py (1 method) - scrapes the EUROPA XML data and processes it into a list
  • importer.py (1 method) - imports a file
  • processor.py (1 method) - processes the imported file into a list
  • cleaner.py (1 method) - cleans the list against a noise file of irrelevant words
  • terrorist_finder.py (3 methods) - finds entries in the list that match the query
  • main.py - command-line program
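The scraper's job of turning the EUROPA XML into a list of names can be sketched with the standard library's ElementTree. Several parts of this are assumptions: the element name `WHOLENAME` and the sample layout are placeholders and have not been checked against the real EUROPA schema, and the function name `names_from_xml` is hypothetical. Downloading the file from ec.europa.eu (for example with `urllib`) is left out.

```python
import xml.etree.ElementTree as ET


def names_from_xml(xml_text, tag="WHOLENAME"):
    """Collect the text of every element named `tag` into a list of names.

    `tag` is a placeholder (assumption): check the element name actually
    used in the downloaded EUROPA file before relying on it.
    """
    root = ET.fromstring(xml_text)
    names = []
    for element in root.iter(tag):
        # Skip empty elements and strip surrounding whitespace.
        if element.text and element.text.strip():
            names.append(element.text.strip())
    return names


# Illustrative input only; the real file's structure may differ.
sample = """<ENTITIES>
  <ENTITY><NAME><WHOLENAME>Robert Gabriel Mugabe</WHOLENAME></NAME></ENTITY>
  <ENTITY><NAME><WHOLENAME>Grace Mugabe</WHOLENAME></NAME></ENTITY>
</ENTITIES>"""

print(names_from_xml(sample))
# ['Robert Gabriel Mugabe', 'Grace Mugabe']
```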

Data:

  • blacklist.tsv, blacklist.txt, ...
  • noisefile.tsv, noisefile.txt, ...
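Cleaning the blacklist against the noise file can be sketched as below. This assumes that "cleaning" means removing noise words from each name and dropping names that contain nothing else. That is one plausible reading of cleaner.py, not a description of it. The function name `clean` and the noise words in the example are illustrative.

```python
def clean(names, noise_words):
    """Remove noise words (case-insensitive) from each name.

    Assumption: names that consist only of noise words are dropped entirely.
    """
    noise = {w.lower() for w in noise_words}
    cleaned = []
    for name in names:
        kept = [w for w in name.split() if w.lower() not in noise]
        if kept:
            cleaned.append(" ".join(kept))
    return cleaned


print(clean(["General Robert Mugabe", "Mr", "Grace Mugabe"], ["general", "mr"]))
# ['Robert Mugabe', 'Grace Mugabe']
```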

Example command-line session

tanel@tanel:~/Documents/pyScript$ python main.py
Please enter name to search in the terrorist list 
> Robert Mugabe
Do you want to import a file or use the EUROPA database? (Input 'file', otherwise EUROPA is used) 
> europa
We will use the default sanctions list on ec.europa.eu and 8110 records (as at 27.12.2016)

Give it a few seconds...

------------------------------------------------------------------------------
IMPORTANT!
If this text is followed by an error
it is most likely you requested # names that's more than there are in the list
The list has  8110  existing names
------------------------------------------------------------------------------
No strict match, looking for partial matches...
TERRORIST MATCHED!
Certainty:  100.0 %
Name:  Robert Gabriel Mugabe

TERRORIST MATCHED!
Certainty:  100.0 %
Name:  Robert Gabriel Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Grace Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Grace Mugabe

TERRORIST MATCHED!
Certainty:  50.0 %
Name:  Robert Konars
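The certainty values in the sample output fit a simple rule: the percentage of query words that also appear in the matched name. For "Robert Mugabe", both words occur in "Robert Gabriel Mugabe" (100 %), while only one occurs in "Grace Mugabe" or "Robert Konars" (50 %). This rule is inferred from the output, not taken from the source. The sketch below, with the hypothetical name `certainty`, reproduces those numbers.

```python
def certainty(query, name):
    """Percentage of the query's words that also appear in `name` (inferred rule)."""
    query_words = query.lower().split()
    name_words = set(name.lower().split())
    if not query_words:
        return 0.0
    found = sum(1 for w in query_words if w in name_words)
    return 100.0 * found / len(query_words)


for candidate in ["Robert Gabriel Mugabe", "Grace Mugabe", "Robert Konars"]:
    print("Certainty: {0} %  Name: {1}".format(certainty("Robert Mugabe", candidate), candidate))
# Certainty: 100.0 %  Name: Robert Gabriel Mugabe
# Certainty: 50.0 %  Name: Grace Mugabe
# Certainty: 50.0 %  Name: Robert Konars
```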

Program logic

[Diagram: program logic (image not included)]

Program setup:

[Diagram: program setup (image not included)]

Details

  • Input files contain one name per row (no XML, JSON or other structured formatting)
  • Typical file types: txt, csv, tsv
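Reading a file with one name per row might look like the sketch below. It is not the project's importer.py. The function name `import_names` is hypothetical, and the sketch assumes UTF-8 files and that, for .csv/.tsv files, the name sits in the first column. Blank rows are skipped.

```python
import io
import os
import tempfile


def import_names(path):
    """Read one name per row from a .txt, .csv or .tsv file.

    Assumption: for .csv/.tsv files only the first column holds the name.
    """
    ext = os.path.splitext(path)[1].lower()
    delimiter = {".csv": ",", ".tsv": "\t"}.get(ext)
    names = []
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            field = line.split(delimiter)[0] if delimiter else line
            field = field.strip()
            if field:  # skip blank rows
                names.append(field)
    return names


# Demo: write a small .tsv file to a temporary directory and read it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "blacklist.tsv")
with io.open(demo_path, "w", encoding="utf-8") as out:
    out.write(u"Robert Gabriel Mugabe\textra column\n\nGrace Mugabe\textra column\n")
print(import_names(demo_path))
# ['Robert Gabriel Mugabe', 'Grace Mugabe']
```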

Tech used:

  • Python 2.7
  • Lubuntu 16.04

Current issues:

  • Every time a name is queried, the data is re-imported and re-processed
    • In production, a cron job would refresh the data at a fixed interval instead
  • Partial matches are not ordered
  • Raw user input is not cleaned
  • Non-Latin input (e.g. Cyrillic) is not handled
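Two of these issues, unclean input and unordered partial matches, could be addressed along the lines of the sketch below. It is a suggestion under stated assumptions, not existing project code. It assumes Python 3, where `str` is Unicode. Input is NFKC-normalised, punctuation and digits are turned into spaces, and the text is lowercased. Matches are deduplicated and sorted by the share of query words they contain. Cyrillic letters survive normalisation, but Latin and Cyrillic spellings of the same name are still not matched to each other. The names `normalize` and `rank` are hypothetical.

```python
import unicodedata


def normalize(text):
    """NFKC-normalise, replace non-letters with spaces, lowercase, collapse spaces."""
    text = unicodedata.normalize("NFKC", text)
    kept = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in text)
    return " ".join(kept.lower().split())


def rank(query, matches):
    """Deduplicate matches and order them by share of query words found (best first)."""
    query_words = normalize(query).split()

    def score(name):
        name_words = set(normalize(name).split())
        found = sum(1 for w in query_words if w in name_words)
        return found / float(len(query_words) or 1)

    seen, unique = set(), []
    for name in matches:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    # sorted() is stable, so equally scored names keep their original order.
    return sorted(unique, key=score, reverse=True)


print(normalize("  Robert   MUGABE!! "))
# robert mugabe
print(rank("Robert Mugabe", ["Grace Mugabe", "Robert Gabriel Mugabe", "Grace Mugabe", "Robert Konars"]))
# ['Robert Gabriel Mugabe', 'Grace Mugabe', 'Robert Konars']
```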

Working, but not yet integrated:

  • (levenshteinDistance.py): fuzzy search using Levenshtein distance, for when neither the strict nor the partial search returns anything
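The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below shows the standard dynamic-programming computation with two rolling rows and a simple fuzzy filter built on it. It is an illustration, not the contents of levenshteinDistance.py. The names `levenshtein` and `fuzzy_matches`, and the `max_distance=2` threshold, are hypothetical choices.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def fuzzy_matches(query, names, max_distance=2):
    """Names within `max_distance` edits of the query (case-insensitive), closest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, n.lower()), n) for n in names)
    return [n for d, n in scored if d <= max_distance]


print(levenshtein("kitten", "sitting"))
# 3
print(fuzzy_matches("Robert Mugabi", ["Grace Mugabe", "Robert Mugabe"]))
# ['Robert Mugabe']
```

Comparing whole names keeps the sketch short. Comparing word by word would tolerate reordered first and last names better.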

Contributors

  • tanel3203
