kais-viz / scrubadub

This project forked from leapbeyond/scrubadub

Clean personally identifiable information from dirty dirty text.

Home Page: http://scrubadub.readthedocs.io/

License: Apache License 2.0

Shell 0.27% Python 99.73%

scrubadub's Introduction

scrubadub

Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize. Other times we don't. This package makes it easy to seamlessly scrub personal information from free text, without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

  • Names
  • Email addresses
  • Addresses/Postal codes (US, GB, CA)
  • Credit card numbers
  • Dates of birth
  • URLs
  • Phone numbers
  • Username and password combinations
  • Skype/Twitter usernames
  • Social security numbers (US and GB national insurance numbers)
  • Tax numbers (GB)
  • Driving licence numbers (GB)

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:

>>> import scrubadub

# My cat may be more tech-savvy than most, but he doesn't want other people to know it.
>>> text = "My cat can be contacted on [email protected], or 1800 555-5555"

# Replaces the phone number and email address with anonymous IDs.
>>> scrubadub.clean(text)
'My cat can be contacted on {{EMAIL}}, or {{PHONE}}'

There are many ways to tailor the behavior of scrubadub using different Detectors and PostProcessors. Scrubadub is highly configurable and supports localisation for different languages and regions.
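
To make the idea of a detector concrete, the following stdlib-only sketch mimics the detect-and-replace pattern shown in the quick start. It is not scrubadub's implementation: the `SIMPLE_DETECTORS` table, its deliberately naive regular expressions, and the `simple_clean` function are hypothetical names invented for illustration, and the sample email address is made up.

```python
import re

# Hypothetical, deliberately naive patterns; real detectors handle many more formats.
SIMPLE_DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{4} \d{3}-\d{4}\b"),
}


def simple_clean(text):
    """Replace every match of each pattern with a {{LABEL}} placeholder."""
    for label, pattern in SIMPLE_DETECTORS.items():
        text = pattern.sub("{{" + label + "}}", text)
    return text


print(simple_clean("Contact me on jane@example.com, or 1800 555-5555"))
```

Each entry in the table plays the role of one detector, so adding or removing an entry changes what gets scrubbed. scrubadub itself is considerably more thorough than these two patterns.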

Installation

To install scrubadub using pip, simply type:

pip install scrubadub

There are several other packages that can optionally be installed to enable extra detectors: scrubadub_address, scrubadub_spacy and scrubadub_stanford. See the relevant documentation (address detector documentation and name detector documentation) for more information, as these packages require additional dependencies. This package requires at least Python 3.6. For Python 2.7 or 3.5 support, use v1.2.2, which is the last version that supports them.
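
As a rough way to check these requirements before running anything, the sketch below tests the interpreter version and reports which optional packages can be found. The `check_environment` helper and the `OPTIONAL_PACKAGES` tuple are hypothetical names, and the sketch assumes each optional package is importable under the same name it is installed with.

```python
import importlib.util
import sys

# Assumption: each optional package is imported under the same name it is installed with.
OPTIONAL_PACKAGES = ("scrubadub_address", "scrubadub_spacy", "scrubadub_stanford")


def check_environment(minimum=(3, 6)):
    """Return (python_ok, {package_name: is_installed})."""
    python_ok = sys.version_info >= minimum
    installed = {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_PACKAGES
    }
    return python_ok, installed


python_ok, installed = check_environment()
print("Python version supported:", python_ok)
for name, present in installed.items():
    print(name, "installed" if present else "not installed")
```

The lookup only locates each package and does not import it, so it says nothing about whether that package's own dependencies are present.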

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.

scrubadub's People

Contributors

thomasbird, acampello, mynameisbasit, mirandachong, kais-viz, hugofvs, anatolyilinleap, jrings, ivyleavedtoadflax, movermeyer, roman-y-korolev
