ppomes / myanon

A mysqldump anonymizer

Home Page: https://ppomes.github.io/myanon/

License: Other

Makefile 0.75% Shell 0.02% M4 6.29% C 79.83% Yacc 6.43% Lex 6.56% Dockerfile 0.13%
mysqldump anonymizer database-anonymizer gpdr rgpd anonymization mysql anonymized-data anonymized-database maskingdb obfuscator anonymize sqldump dump sql

myanon's Introduction

Myanon

Myanon is a MySQL dump anonymizer, reading a dump from stdin, and producing an anonymized version to stdout.

Anonymization is done through deterministic HMAC processing based on SHA-256. Because identical input values always produce identical output values, constraints are kept when the same rule is applied to fields acting as foreign keys.
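
As a minimal sketch of why a deterministic keyed hash keeps foreign keys aligned, the Python below hashes the same value twice with the same secret. The function name text_hash and the byte-to-letter mapping are assumptions made for illustration; they are not myanon's actual implementation.

```python
import hashlib
import hmac
import string


def text_hash(secret: bytes, value: str, length: int) -> str:
    """Map `value` to `length` lowercase letters, deterministically for a given secret."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    letters = string.ascii_lowercase
    # Hypothetical mapping: one letter per digest byte, cycling if length > 32.
    return "".join(letters[digest[i % len(digest)] % 26] for i in range(length))


secret = b"lapin"
# The same customer id appears as a primary key and as a foreign key.
customer_row = {"id": "42"}
order_row = {"customer_id": "42"}

anon_pk = text_hash(secret, customer_row["id"], 8)
anon_fk = text_hash(secret, order_row["customer_id"], 8)
print(anon_pk == anon_fk)  # prints True: the reference still matches
```

Both columns need the same rule and the same secret for this to hold.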

A configuration file is used to store the HMAC secret and to select which fields need to be anonymized. A self-commented sample is provided (main/myanon-sample.conf).

This tool is in alpha stage. Please report any issue.

Simple use case

Example to create both a real encrypted (sensitive) backup and an anonymized (non-sensitive) backup from a single mysqldump command:

mysqldump mydb | tee >(myanon -f myanon.cfg | gzip > mydb_anon.sql.gz) | gpg -e -r [email protected] > mydb.sql.gz.gpg

Installation from sources

Build Requirements

  • autoconf
  • automake
  • make
  • a C compiler (gcc or clang)
  • flex
  • bison

Example on a Fedora system:

$ sudo dnf install autoconf automake gcc make flex bison
[...]

Example on a Debian/Ubuntu system:

$ sudo apt-get install autoconf automake flex bison build-essential
[...]

On macOS, you need to install Xcode and homebrew, and then:

$ brew install autoconf automake flex bison
[...]

(Please ensure binaries installed by brew are in your $PATH)

Build/Install

./autogen.sh
./configure
make
make install

Compilation/link flags

Flags are controlled by using CFLAGS/LDFLAGS when invoking make. To create a debug build:

make CFLAGS="-O0 -g"

To create a static build on Linux:

make LDFLAGS="-static"

Run/Tests

main/myanon -f tests/test1.conf < tests/test1.sql
zcat tests/test2.sql.gz | main/myanon -f tests/test2.conf

Installation from packages (Ubuntu)

A PPA is available at: https://launchpad.net/~pierrepomes/+archive/ubuntu/myanon

Docker Build / Run

tl;dr:

docker build --tag myanon .
docker run -it --rm -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql | gzip > /app/dump-anon.sql.gz'

Why Docker?

An alternative to the above build or run options is to use the provided Dockerfile to build inside an isolated environment, and run myanon from a container.

It's useful when:

  • you can't or don't want to install a full C development environment on your host
  • you want to quickly build for or run on a different architecture (e.g.: amd64 or arm64)
  • you want to easily distribute a self-contained myanon (e.g.: for remote execution & processing on a Kubernetes cluster)

The provided multistage build Dockerfile is using the official gcc Docker image for the build phase and the alpine Docker image for runtime (some myanon use-cases need a shell, so a distroless base image would not work here).

Build using Docker

Build a static binary using the provided Dockerfile:

# recommended, to start from a clean state 
make clean
# build using your default architecture
docker build --tag myanon .

For Apple Silicon users who want to build for amd64:

# recommended, to start from a clean state 
make clean
# build using the amd64 architecture
docker build --tag myanon --platform=linux/amd64 .

Run using Docker

In this example we will:

  • use a myanon configuration file (myanon.conf)
  • use a MySQL dump (dump.sql)
  • generate an anonymized dump (dump-anon.sql) based on the configuration and the full dump.

Sharing the local folder as /app inside the container:

docker run -it --rm -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql > /app/dump-anon.sql'

For Apple Silicon users who want to run as amd64:

docker run -it --rm --platform linux/amd64 -v ${PWD}:/app myanon sh -c '/bin/myanon -f /app/myanon.conf < /app/dump.sql > /app/dump-anon.sql' 

Refer to the documentation above for detailed usage options.

myanon's People

Contributors

asgrim, pierrepomes, ppomes, sjourdan, trilliot

myanon's Issues

Documentation

Please update the documentation for Ubuntu to include the following:

sudo apt-get install build-essential

Once the user runs that step on a fresh Ubuntu 20.04 server the ./configure process runs without error.

Blob values are not quoted

Given the following config:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `lottypes` = {
     `int1`      = inthash 2
     `int2`      = fixed '9'
     `datetime1` = fixed '1970-01-01 12:00:00'
     `text1`     = texthash 5
     `text2`     = fixed null
     `blob1`     = fixed 'hello'
     `blob2`     = texthash 5
#      `blob3`     = fixed '\'hi\''
   }
}

When I run

build/main/myanon -f tests/test1.conf < tests/test1.sql

I expect to see

INSERT INTO `lottypes` VALUES (... ,'hello','migez', ...);

But I actually see

INSERT INTO `lottypes` VALUES (... ,hello,migez, ...);

I tried quoting/escaping (by uncommenting the config line for blob3), but I received the error:

Config parsing error at line 14: Syntax error - Unexpected [h]
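
The quoting the reporter expects can be sketched as follows. This is an illustration of SQL string-literal quoting in general, not myanon's code, and the helper name sql_quote is hypothetical.

```python
def sql_quote(value: str) -> str:
    """Return `value` as a single-quoted SQL string literal.

    Backslashes are escaped first so that the backslashes added
    for quotes are not escaped a second time.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


print(sql_quote("hello"))  # prints 'hello'
print(sql_quote("'hi'"))   # prints '\'hi\''
```

With a helper like this on the output side, a rule such as fixed 'hello' would be emitted as 'hello' instead of the bare word hello.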

Generate fields with data from other fields

Given a field like id and a field like username, I'd like to keep id the same but set username to user<id>.

I imagine something like:

tables = {
   `people` = {
     `id`   = texthash 10
     `username` = sql CONCAT('user', id);
   }
}

Consider this a feature request. Thanks :)
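
The requested behaviour, keeping id and deriving username from it, could look like this on a single row. This is only a sketch of the idea: the row-as-dict representation and the function name derive_username are assumptions, and no such rule exists in the configuration syntax shown above.

```python
def derive_username(row: dict) -> dict:
    """Return a copy of `row` whose username is rebuilt from its id."""
    out = dict(row)  # leave the input row untouched
    out["username"] = f"user{row['id']}"
    return out


print(derive_username({"id": 7, "username": "alice"}))
# prints {'id': 7, 'username': 'user7'}
```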

Adding column names results in syntax error

Using SQL:

CREATE TABLE `test_with_column_names` (
    `a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (`a`) VALUES (1);

And configuration:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `test_with_column_names` = {
     `a` = inthash 2
   }
}

This gives a syntax error. When running with myanon -d, the output is:

main/myanon -d -f tests/test1.conf
CREATE TABLE `test_with_column_names` (
    `a` int(10) unsigned NOT NULL
) ENGINE=InnoDB;
INSERT INTO `test_with_column_names` (FOUND TABLE `test_with_column_names`

ENTERING STATE ST_TABLELOOKING FOR  `test_with_column_names`:`a`

ENTERING STATE INITIALFOUND TABLE `test_with_column_names`

ENTERING STATE ST_VALUES
Dump parsing error at line 3: syntax error - Unexpected [(]

Process finished with exit code 1
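
For illustration, here is a small Python sketch of an INSERT header matcher that accepts the optional column list. myanon's real parser is generated with flex and bison, so this regular expression only shows the shape of the grammar change, and parse_insert_header is a hypothetical name.

```python
import re

INSERT_RE = re.compile(
    r"INSERT INTO `(?P<table>[^`]+)`\s*"
    r"(?:\((?P<columns>[^)]*)\)\s*)?"  # optional (`a`, `b`, ...) column list
    r"VALUES",
    re.IGNORECASE,
)


def parse_insert_header(statement: str):
    """Return (table, column_names); column_names is None if no list is given."""
    match = INSERT_RE.match(statement)
    if match is None:
        raise ValueError("not an INSERT statement")
    columns = match.group("columns")
    names = re.findall(r"`([^`]+)`", columns) if columns is not None else None
    return match.group("table"), names


print(parse_insert_header("INSERT INTO `test_with_column_names` (`a`) VALUES (1);"))
# prints ('test_with_column_names', ['a'])
```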

NULL values are not maintained for rows with both NULL and non-NULL values

When anonymizing a column that contains both NULL and non-NULL values the NULL values are hashed. I would expect only the non-null values to be hashed and the NULL rows to maintain their current NULL value.

I've attached a tar.gz with an example:

  • null-example.conf = the myanon config
  • null-example-mysqldump.sql = the mysqldump of the database I'm using as an example
  • null-example-anonymized.sql = the myanon result after anonymizing null-example-mysqldump.sql

On line 48 of null-example-anonymized.sql you can see that rows 3 & 5 hash the NULL value to 'ahavykafkojauwmdriqpohobuuttmiif'. I would expect those values to remain NULL.
null-example.tar.gz
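
The expected behaviour can be sketched in a few lines. Here Python's None stands in for SQL NULL, and the function name and the hex-digest output format are assumptions, not myanon's actual texthash.

```python
import hashlib
import hmac


def anonymize_nullable(secret: bytes, value, length: int = 32):
    """Hash non-NULL values only; a NULL (None) passes through unchanged."""
    if value is None:
        return None
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # hexdigest is 64 characters long


column = ["alice", None, "bob", None]
print([anonymize_nullable(b"lapin", v, 8) is None for v in column])
# prints [False, True, False, True]
```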

Support for complex fields

I have field data stored as JSON arrays and as simple comma-separated lists in strings. Right now the whole field is anonymized, which "destroys" its array format.
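
One possible shape for this request is to anonymize each element separately and then rebuild the container. The sketch below assumes flat JSON arrays and plain comma-separated strings; the function names and the 8-character hex output are hypothetical.

```python
import hashlib
import hmac
import json


def hash_item(secret: bytes, item, length: int = 8) -> str:
    """Deterministic keyed hash of one element, truncated to `length` hex chars."""
    data = str(item).encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).hexdigest()[:length]


def anonymize_json_array(secret: bytes, field: str) -> str:
    """Hash every element of a flat JSON array and keep the array format."""
    return json.dumps([hash_item(secret, item) for item in json.loads(field)])


def anonymize_csv(secret: bytes, field: str) -> str:
    """Hash every element of a comma-separated list and keep the separators."""
    return ",".join(hash_item(secret, item.strip()) for item in field.split(","))


print(anonymize_json_array(b"lapin", '["alice", "bob"]'))
print(anonymize_csv(b"lapin", "alice, bob"))
```

Both results keep the original element count, so code that parses the field keeps working on the anonymized dump.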

Randomise the seed

After running this a few times, on different environments, I've noted that my first "texthash" username is always the same value. It seems some randomisation needs to be added to the base?
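
The repeatable output is what a deterministic HMAC produces when the secret stays the same. The sketch below shows plain HMAC-SHA256 from Python's standard library, not myanon's texthash, and the secret values are made up. It illustrates that the result depends on the secret, so a different secret per environment would give different values.

```python
import hashlib
import hmac


def keyed_hash(secret: bytes, value: str) -> str:
    """HMAC-SHA256 of `value` under `secret`, as a hex string."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


first_run = keyed_hash(b"lapin", "alice")
second_run = keyed_hash(b"lapin", "alice")
other_env = keyed_hash(b"another-secret", "alice")

print(first_run == second_run)  # prints True: same secret, same output
print(first_run == other_env)   # prints False: different secret
```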

Support for data-only dumps (--no-create-info)

It would be a really nice feature for myself if it could parse a data-only dump (mysqldump --no-create-info). I fiddled with it for a while and couldn't get it to work.

An easy workaround for now is to process the entire dump, and then just remove the create statements from the resulting script. This is a great little tool, thank you!

Can't parse "set"

myanon can't parse a set column. A table definition like this:

CREATE TABLE some_table (
some_field int NOT NULL AUTO_INCREMENT,
some_field timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
some_field timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
some_field int NOT NULL,
some_field int NOT NULL,
some_field decimal(28,8) NOT NULL,
flags set('vat_included') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
some_field int NOT NULL,

fails with the error:

Dump parsing error at line 8: Unable to read table definition - Unexpected [s]

Could you fix it please?

Using mysqldump with `--hex-blob` flag breaks myanon

Given the following SQL:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

And the following myanon configuration:

# Config file for test1.sql
secret = 'lapin'
stats  = 'no'

tables = {
   `the_blobs` = {
     `blob1`     = fixed '0x0000000000'
   }
}

When I run build/main/myanon -f tests/test1.conf < tests/test1.sql, I expect to see:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x0000000000,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

But I actually see:

DROP TABLE IF EXISTS `the_blobs`;
CREATE TABLE `the_blobs` (
  `blob1` blob,
  `blob2` tinyblob,
  `blob3` mediumblob,
  `blog4` longblob
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `the_blobs` VALUES (
  0x0000000000x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f,
  0x68656c6c6f
);

Dump parsing error at line 7: syntax error - Unexpected []
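
The actual output suggests the hex literal is not consumed as a single value. As a hedged sketch of the expected behaviour, the Python tokenizer below treats 0x... as one token before replacing a field. The token set is a simplification and replace_field is a hypothetical helper; myanon's real lexer is written with flex.

```python
import re

TOKEN = re.compile(
    r"""
      0x[0-9A-Fa-f]+            # hex literal, as written by mysqldump --hex-blob
    | '(?:[^'\\]|\\.|'')*'      # quoted string with backslash or doubled-quote escapes
    | NULL
    | -?\d+(?:\.\d+)?           # plain number
    """,
    re.VERBOSE,
)


def replace_field(values: str, index: int, new_value: str) -> str:
    """Replace the field at `index` in a comma-separated VALUES tuple body."""
    tokens = TOKEN.findall(values)
    tokens[index] = new_value
    return ",".join(tokens)


row = "0x68656c6c6f,0x68656c6c6f,0x68656c6c6f,0x68656c6c6f"
print(replace_field(row, 0, "0x0000000000"))
# prints 0x0000000000,0x68656c6c6f,0x68656c6c6f,0x68656c6c6f
```

The hex alternative is listed before the number alternative so that the leading 0 is not matched as a number on its own.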

Fixed text truncated

We have a rule:

key = fixed '0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF'

The value is 64 characters, going into a varchar(64) column, but the last character is being truncated, which then affects all our other data. If we try to extend the value, we get an error because it is longer than 64 characters. We would really love a fix :) or a push of new code if you have any.

Does fixed even replace if null?

Got some areas where I have to use fixed to set the value, but it looks like fixed still replaces the value when it was NULL (I believe?). texthash only seems to replace the value when it's not NULL, so maybe fixed needs a "leave NULL" flag/option, e.g. fixed "1234567" true (not the default: don't replace NULL).

Config errors

It would be nice to get some feedback when a column specified in the config file doesn't exist, rather than having it silently ignored. I had misspelled one column, which had the potential to leak PII (which we are trying to avoid).
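
A check of this kind could compare the configured column names with the columns found in the CREATE TABLE statement. The sketch below is an assumption-laden illustration: it relies on mysqldump's one-column-per-line layout with backquoted names, and the function names are hypothetical.

```python
import re


def columns_in_create(create_sql: str) -> set:
    """Collect column names: lines that start with a backquoted identifier."""
    return set(re.findall(r"^\s*`([^`]+)`\s", create_sql, flags=re.MULTILINE))


def unknown_columns(configured, create_sql: str) -> list:
    """Return configured column names that the table does not define."""
    return sorted(set(configured) - columns_in_create(create_sql))


create = """CREATE TABLE `people` (
  `id` int NOT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;"""

print(unknown_columns(["email", "emial"], create))  # prints ['emial']
```

A non-empty result would be a reason to warn or abort before any data is written.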
