tonicai / masquerade

A Postgres Proxy to Mask Data in Realtime

Home Page: https://www.tonic.ai/post/masquerade-a-postgres-proxy/

License: MIT License

C# 100.00%

masquerade's Introduction

Side by side of psql running against the proxy and against the database

Check out our blog post on Masquerade!

Masquerade: A Postgres proxy that masks sensitive datasets

Masquerade is two things. First, it is a TCP proxy that sits between your Postgres database and your client and relays data back and forth. The proxy listens to all messages sent from the database to the client and provides a hook that lets users modify data in transit, before the client receives it. Second, it is a masker: it has built-in functionality that masks sensitive data in the database so that the client never sees it.
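To make the idea of modifying data in transit concrete, here is a minimal sketch of rewriting a single Postgres DataRow message before it is forwarded to the client. It is not Masquerade's actual code. The class DataRowMaskSketch and the method MaskDataRow are hypothetical names, and the sketch assumes the buffer holds exactly one complete message, that values are sent in text format, and that the client encoding is UTF-8.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

public static class DataRowMaskSketch
{
    // Rewrites one complete DataRow message, applying `mask` to every non-NULL value.
    public static byte[] MaskDataRow(byte[] message, Func<string, string> mask)
    {
        // Layout: 'D' (1 byte), Int32 length, Int16 column count, then per column
        // an Int32 value length (-1 for NULL) followed by that many value bytes.
        if (message.Length < 7 || message[0] != (byte)'D')
            throw new ArgumentException("Not a DataRow message.", nameof(message));

        short columnCount = BinaryPrimitives.ReadInt16BigEndian(message.AsSpan(5, 2));
        int offset = 7;

        using (var body = new MemoryStream())
        {
            WriteInt16(body, columnCount);
            for (int i = 0; i < columnCount; i++)
            {
                int length = BinaryPrimitives.ReadInt32BigEndian(message.AsSpan(offset, 4));
                offset += 4;
                if (length < 0)
                {
                    WriteInt32(body, -1); // NULL: no value bytes follow
                    continue;
                }

                string original = Encoding.UTF8.GetString(message, offset, length);
                offset += length;

                // The masked value may have a different byte length, so the
                // per-column length is written again from the masked bytes.
                byte[] masked = Encoding.UTF8.GetBytes(mask(original));
                WriteInt32(body, masked.Length);
                body.Write(masked, 0, masked.Length);
            }

            byte[] payload = body.ToArray();
            byte[] result = new byte[5 + payload.Length];
            result[0] = (byte)'D';
            // The length field counts itself and the payload, but not the type byte.
            BinaryPrimitives.WriteInt32BigEndian(result.AsSpan(1, 4), 4 + payload.Length);
            payload.CopyTo(result, 5);
            return result;
        }
    }

    private static void WriteInt16(Stream stream, short value)
    {
        byte[] buffer = new byte[2];
        BinaryPrimitives.WriteInt16BigEndian(buffer, value);
        stream.Write(buffer, 0, buffer.Length);
    }

    private static void WriteInt32(Stream stream, int value)
    {
        byte[] buffer = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(buffer, value);
        stream.Write(buffer, 0, buffer.Length);
    }
}
```

Because a masked value can be longer or shorter than the original, both the per-column length and the overall message length are recomputed rather than copied.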

Masquerade works out of the box, so you can start masking your data immediately. Simply run the proxy (see below) and begin issuing SQL queries as you normally would, and you'll see the masked results.

Masquerade is developed by Tonic.ai

Requirements

Getting started

This project comes with a Postgres database inside a Docker container. The database contains a toy dataset which you can use to try out the proxy before connecting it to your own database.

Start the Docker container with:

docker-compose up -d

This will start a Postgres database on local port 15432. The default configuration of the proxy will connect to this database. If you want to connect to your own database see below.

Run the proxy by navigating to the repository's root folder and executing:

dotnet run

This will run the proxy with a default configuration. The default configuration applies a masking function to all supported data types (see Issue #1 below).

Connect to the proxy as you would to the real database, but change the host and port to those the proxy is running on. By default, the proxy listens on 127.0.0.1:20000. You still provide the username, password, and database name to your Postgres client; only the IP and port must be those of the proxy.

For example, to connect to our Postgres database with psql you would do the following:

psql "host=localhost port=20000 dbname=test_data" user

The schema of the included database is in data/init.sql. Currently, it consists of a single table called 'events' with a variety of columns.

Connecting to your own database

If you would like to connect to your own database instead of the toy database, open config.json and change the connection details of the database. The values to modify are in the "db_connection_details" object.

"db_connection_details": {
    "port": 15432,
    "ip": "127.0.0.1",
    "user": "user",
    "password": "password",
    "database": "test_data"
},

To connect to the proxy, enter the username, password, and database into your Postgres client, but make sure you enter the IP and port of the proxy.

Note: You must set sslmode=disable on your given Postgres client

Configuration and Additional Setup

The configuration can be found in the config.json file in the root directory of this repository. Below is a description.

Config Variable | Description
proxy_port | Local port on which the proxy listens for client connections
proxy_source_ip | Local IP address on which the proxy listens for client connections
db_connection_details | Connection details of the Postgres database. See config.json for the list of required fields.
masking_options | Details of how you want to mask your data, including how to handle primary and foreign keys and how to mask based on column names or data types. See below for a more detailed description.

Masking Options

Config Variable | Description
preserve_keys | true/false denoting whether or not primary and foreign keys should be masked
data_type_masks | Masking information to be applied to columns with specific data types
column_masks | Masking information to be applied to specific columns based on the schema, table, and column name

data_type_masks expects an array of JSON objects which look like this:

{
    "data_type": "bigint",
    "masking_function":"maskbigint"
}

where "data_type" is the Postgres data type and "masking_function" is the name of the masking function to be applied to values of columns with that data type.

Likewise, column_masks is an array of JSON objects, each of which looks like this:

{
    "column":"first_name",
    "table":"example",
    "schema":"public",
    "masking_function":"character_substitution"
}

You must specify the column, table, and schema names as well as the masking_function.
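As a rough illustration of that requirement, the sketch below checks a column_masks array for missing fields before the proxy is started. It is a hypothetical helper rather than part of Masquerade: the class name ColumnMaskValidationSketch is invented, and it assumes a runtime that ships System.Text.Json (.NET Core 3.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ColumnMaskValidationSketch
{
    // The four fields every column_masks entry must specify.
    static readonly string[] RequiredFields = { "column", "table", "schema", "masking_function" };

    // Returns one message per missing or empty required field in a column_masks array.
    public static List<string> Validate(string columnMasksJson)
    {
        var problems = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(columnMasksJson))
        {
            int index = 0;
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                foreach (string field in RequiredFields)
                {
                    bool present = entry.TryGetProperty(field, out JsonElement value)
                                   && value.ValueKind == JsonValueKind.String
                                   && value.GetString().Length > 0;
                    if (!present)
                        problems.Add($"column_masks[{index}]: missing \"{field}\"");
                }
                index++;
            }
        }
        return problems;
    }
}
```

Passing the example entry above (wrapped in an array) yields an empty list, while an entry without "schema" yields a single message naming that field.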

Masking Functions

When masking you must provide a masking function. A masking function is a function that takes the original value as input and returns a masked value. You can create your own custom masking functions by creating a new function in the TonicPgProxy.Maskers namespace. Here is an example function which replaces all of the characters of a value with 'x'.

// Replaces every character of the input with 'x', preserving its length.
public static string MaskX(string value)
{
    return new string('x', value.Length);
}
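Along the same lines, a custom masker might keep part of the value visible. The following is a minimal sketch that assumes the same string-in, string-out signature as MaskX. The class name ExampleMaskers and the function MaskKeepLastFour are hypothetical and not part of the project; only the TonicPgProxy.Maskers namespace comes from this README.

```csharp
using System;

namespace TonicPgProxy.Maskers
{
    // Hypothetical container class; this README only names the namespace.
    public static class ExampleMaskers
    {
        // Masks everything except the last four characters.
        // Values of four characters or fewer are masked completely,
        // so short values are never revealed in full.
        public static string MaskKeepLastFour(string value)
        {
            if (string.IsNullOrEmpty(value))
                return value;

            const int visible = 4;
            if (value.Length <= visible)
                return new string('x', value.Length);

            int hidden = value.Length - visible;
            return new string('x', hidden) + value.Substring(hidden);
        }
    }
}
```

For example, MaskKeepLastFour("4111111111111111") returns "xxxxxxxxxxxx1111". Judging by the examples in this README, where "maskx" refers to MaskX, such a function would presumably be referenced from config.json by its name, but that mapping is an assumption here.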

Masking Priority

column_masks override data_type_masks. To illustrate, consider the example table below along with the following masking_options.

CREATE TABLE example
(
    first_name text,
    last_name text
);

"masking_options": {
    "preserve_keys": false,
    "column_masks": [
        {
            "column": "first_name",
            "table": "example",
            "schema": "public",
            "masking_function": "maskx"
        }
    ],
    "data_type_masks": [
        {
            "data_type": "text",
            "masking_function": "character_substitution"
        }
    ]
}

Connecting to the proxy and running SELECT * FROM public.example will result in the character_substitution masking function being applied to the last_name column, while the first_name column uses the maskx function.
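The priority rule can be sketched as a two-step lookup: try the column mask first, then fall back to the data type mask. This is an illustration of the rule as described here, not the proxy's actual implementation; the class MaskResolutionSketch and its Resolve method are hypothetical names, and the two dictionaries simply hard-code the masking_options from the example.

```csharp
using System;
using System.Collections.Generic;

public static class MaskResolutionSketch
{
    // column_masks from the example, keyed by (schema, table, column).
    static readonly Dictionary<(string, string, string), string> ColumnMasks =
        new Dictionary<(string, string, string), string>
        {
            [("public", "example", "first_name")] = "maskx",
        };

    // data_type_masks from the example, keyed by Postgres data type.
    static readonly Dictionary<string, string> DataTypeMasks =
        new Dictionary<string, string>
        {
            ["text"] = "character_substitution",
        };

    // Returns the name of the masking function to apply, or null if none matches.
    public static string Resolve(string schema, string table, string column, string dataType)
    {
        // A column mask, when present, takes priority over the data type mask.
        if (ColumnMasks.TryGetValue((schema, table, column), out string columnMask))
            return columnMask;

        return DataTypeMasks.TryGetValue(dataType, out string typeMask) ? typeMask : null;
    }
}
```

With these tables, Resolve("public", "example", "first_name", "text") returns "maskx" and Resolve("public", "example", "last_name", "text") returns "character_substitution". A column whose data type has no entry resolves to null, meaning no mask applies.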

Known Issues

These are known issues and limitations of the product. If any of them blocks you from using the proxy, please reach out to [email protected] or file a GitHub issue.

Issue 1

We currently do not support data type masking on the following data types:

box, bytea, cidr, circle, interval, line, lseg, path, point, polygon, tsquery, tsvector

This can lead to data being leaked. For example, consider the following table and query:

CREATE TABLE example ( val text );

SELECT val::tsvector FROM example;

The SELECT statement will return unmasked data, since tsvector doesn't have an assigned data type masker.
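One way to guard against this kind of leak is to check the data types a query returns against the unsupported list before trusting the masked output. The sketch below is a hypothetical audit helper, not something Masquerade provides; the class UnsupportedTypeAuditSketch and the method FindUnmaskable are invented names, and the type list is copied from above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnsupportedTypeAuditSketch
{
    // Data types listed above as having no data type masking support.
    static readonly HashSet<string> Unsupported =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "box", "bytea", "cidr", "circle", "interval", "line",
            "lseg", "path", "point", "polygon", "tsquery", "tsvector",
        };

    // Returns the distinct result-column types that no data type mask can cover.
    public static List<string> FindUnmaskable(IEnumerable<string> resultColumnTypes)
    {
        return resultColumnTypes
            .Where(type => Unsupported.Contains(type))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

For the result types text, tsvector, bigint, and bytea, FindUnmaskable returns tsvector and bytea, the two types whose values would pass through unmasked.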

Issue 2

Consider the following table and query:

CREATE TABLE example ( val text );

SELECT val AS v FROM example;

Using AS renames the Postgres field name from the column name 'val' to 'v'. This allows the field to bypass any associated column masker. However, if a data type masker exists for the text data type, it will still be applied.

Issue 3

The proxy currently supports only a single connection.

Issue 4

SSL is not yet supported. If you connect with sslmode=require you will not be able to connect.

Supported clients

At least some amount of testing has been done on the following Postgres clients:

  • Datagrip
  • pgAdmin
  • Postico
  • psql

masquerade's People

Contributors

akamor, binarydev, calexander3, dependabot[bot], karlhanson, yuriipolishchuk

masquerade's Issues

What is a mask function, and how do I leave some columns in a table unmasked?

Hello, I want to mask my data with TonicAI / masquerade.
I did everything in the getting started guide and it works, but how do I find out about the masking functions?
Is it possible to leave some columns in my tables unmasked?

And here is an answer:
Go to the folder masquerade/Maskers/ and look through the namespaces.
There is a mask function called Identity.
It returns the same value.

Pass sslmode parameter

Hi,
I want to pass sslmode: "disable" in the JSON config file, but it seems that the proxy ignores this line.

COPY statements

I am trying to run pg_dump, but it fails with the message below:

pg_dump: error: query failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
pg_dump: error: query was: COPY public.table_name (<COLUMNS_NAMES>) TO stdout;

pg_dump with insert statements works fine.

Because a restore with inserts takes too long, it would be nice to have support for COPY statements.

Trying not to be too hideous myself, is there interest in supporting this?
Thank you.

Proxy terminates connection when running within a docker container

Hey guys,

Awesome tool you've created here, but I've run into some snags when attempting to dockerize it.

I've gotten the tool itself running successfully within a docker container. It can see my dockerized DB, and it's able to connect according to the following output:

$ docker-compose up proxy
Recreating masquerade-proxy_proxy_1 ... done
Attaching to masquerade-proxy_proxy_1
proxy_1  | Starting Proxy...
proxy_1  | Proxy Running:
proxy_1  | 	Proxy Port: 20000
proxy_1  | 	Database Details: [email protected]:5432/test_db_proxy

Docker status while running shows it online and forwarding the proper port:

CONTAINER ID        IMAGE                    COMMAND                  CREATED              STATUS              PORTS                      NAMES
a7bc6476f7fe        masquerade-proxy_proxy   "/bin/sh -c ./start.…"   About a minute ago   Up About a minute   0.0.0.0:20000->20000/tcp   masquerade-proxy_proxy_1

Here's my dockerfile:

FROM mcr.microsoft.com/dotnet/core/sdk:2.2
RUN apt-get update && apt-get -y install git
WORKDIR /app
RUN git clone https://github.com/TonicAI/masquerade.git .
COPY start.sh ./
RUN chmod +x start.sh
CMD ./start.sh

the start.sh entrypoint:

#!/bin/bash
# Find the IP of the PG container and use it to populate the config.json file
POSTGRES_IP=`getent hosts postgres | awk '{ print $1 }'`
sed 's/POSTGRES_IP/'"$POSTGRES_IP"'/g' config.sample.json > config.json
dotnet run

docker-compose file:

version: "3"
services:
  proxy:
    build: .
    ports:
      - 20000:20000
    external_links:
      - postgres_db:postgres
    volumes:
      - "~/Projects/masquerade-proxy/config.sample.json:/app/config.sample.json"
networks:
  default:
    external:
      name: test_default

and config.sample.json as well:

{
  "proxy_port":20000,
  "db_connection_details": {
      "port": 5432,
      "ip": "POSTGRES_IP",
      "user":"postgres",
      "password":"dev",
      "database":"test_db_proxy"
  },
  "masking_options": {
      "preserve_keys": false,
      "column_masks": [{
          "column":"full_name",
          "table":"users",
          "schema":"public",
          "masking_function":"maskx"
      }],
      "data_type_masks": [
          {
              "data_type": "text",
              "masking_function":"maskcharacters"
          }
      ]
  }
}

However, when I try to connect to the proxy, which has port 20000 exposed to the docker host, I get the following:

$ psql "host=127.0.0.1 port=20000 dbname=test_db_proxy sslmode=disable" postgres
psql: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

When I terminate the proxy, I get the expected error of Connection refused because the PG server cannot be found, like you would if you tried to connect on a random port where nothing is listening. This would indicate that my DB clients (tried psql, Postico, and DBeaver) are able to see the proxy, but they cannot properly connect to it.

Any ideas as to what could be causing this would be appreciated!
