Giter Site home page Giter Site logo

datamasker's Introduction

.NET Core

DataMasker

A free data masking and/or anonymizer library for Sql Server written in .NET

If you've ever needed to pull down databases from a live environment to stage or even dev you'll need to think about masking any personal information. There are options out there paid and free, however the free ones I've found do not provide genuine data and the paid options are too pricey when it's only a few tables.

Data generation is provided by https://github.com/bchavez/Bogus

Bogus

Hello. I'm your host Brian Chavez (twitter). Bogus is a simple and sane fake data generator for .NET languages like C#, F# and VB.NET. Bogus is fundamentally a C# port of faker.js and inspired by FluentValidation's syntax sugar.

Bogus will help you load databases, UI and apps with fake data for your testing needs. If you like Bogus star โญ๏ธ the repository and show your friends! ๐Ÿ˜„

Nuget

DataMasker is available as a library over on nuget.org should you need to customize any part of the process. Alternatively just ask for a new feature and I'll see what I can do.

install-package DataMasker

Configuration

All configuration is done via a .json file, of which an example can be seen below.

Example Config

{
  "dataSource": {
    "type": "SqlServer",
    "config": {
       "name": "databasename",
       "userName": "databaseUserName",
       "password": "databasePassword",
       "server": "databaseServer"
    }   
  },  
  "tables": [    
    {
      "name": "Users",
      "schema": "dbo",
      "primaryKeyColumn": "UserId",
      "columns": [
        {
          "name": "FirstName",
          "type": "FirstName",
          "useGenderColumn": "Gender"
        },
        {
          "name": "LastName",
          "type": "LastName",
          "retainEmptyStringValues": true,
          "useGlobalValueMappings": true,
          "valueMappings": {
            "oldValue": "Jones",
            "newValue": "Smith"
          }
        },
        {
          "name": "Password",
          "type": "None",
          "retainNullValues": false,
          "useValue": "PasswordForDevEnvironment"
        },
        {
          "name": "DOB",
          "type": "DateOfBirth",
          "min" :"1901-01-01",
          "max": "2000-0101"
        },
        {
          "name": "EmailAddress",
          "type": "Bogus",
          "unique": true,
          "stringFormatPattern": "{{internet.email}}"
        },
        {
          "name": "Gender",
          "ignore": true,
          "type": "None"
        },
        {
          "name": "Address",
          "type": "Bogus",
          "retainNullValues": false,
          "stringFormatPattern": "{{address.fullAddress}}"
        },
        {
          "name": "ContactNumber",
          "type": "PhoneNumber",
          "retainNullValues": false,
          "stringFormatPattern": "+447#########"
        },
        {
          "name": "CompanyName",
          "type": "Sql",
          "sqlValue": {
            "query":"SELECT Name FROM SampleCompanyNames WHERE UserId = @UserId",
            "valueHandling": "Null"
          }
        }
      ]
    }
  ]
}

Table configuration is supplied by either the "tables" property or "tablesConfigPath" property.

If you want to keep your table configuration external to the main .json file this can be done by supplying "tablesConfigPath" with a file path. An example of this can be found in the example-configs directory in the example project.

External table config json

{
    "dataSource": {
    "type": "SqlServer",
    "config": {
       "name": "databasename",
       "userName": "databaseUserName",
       "password": "databasePassword",
       "server": "databaseServer"
    }   
  }, 
  "tablesConfigPath": "example-configs\\config-example2-table-config.json",
}

or embedded

{
    "dataSource": {
    "type": "SqlServer",
    "config": {
       "name": "databasename",
       "userName": "databaseUserName",
       "password": "databasePassword",
       "server": "databaseServer"
    }   
  }, 
  "tables": [... insert config here ...],
}

If both tables and tablesConfigPath is supplied then tablesConfigPath wins.

Column Configuration

Property Name Values
type None, Bogus, FirstName, lastName, DateOfBirth, Rant, StringFormat, FullAddress, PhoneNumber
name Database column name
schema Name of the schema in which the tables lives, defaults to dbo
valueMappings Object with value mappings, e.g map "James" to "John"
useGenderColumn Name of the database column to use as for the gender
ignore true/false
min Minimum value to use for the given data type
max Maxiumum value to use for the given data type
stringFormatPattern From Bogus, numbers #, letters ?, or * random number or letter
useValue A hardcoded value to use for every row
retainNullValues true/false
retainEmptyStringValues true/false - when true if the existing value is null or empty (whitespace) then it will use the original value
useLocalValueMappings true/false
useGlobalValueMappings true/false
unique true/false - when true it will attempt to generate a unique value for this column
sqlValue Used when type is set to sql

Sql Value Configuration

When using data type Sql this allows you to get values from other tables within the same database. The configuration object is made up of the following properties

Property Name Values
query The query to use for the lookup, the current row will be passed into the query as parameters for use, see the example config above that uses @UserId. Columns are passed back into the query as parameters, any columns with spaces in their name, will be replaced with '_' . e.g. A column name "User Id" would become "User_Id"
valueHandling "Null" or "KeepValue". If the query executes and no data is returned, this tells the masker what to do, null will set the value to Null while KeepValue will keep the existing value on that row

Data types

None

To use None you must specify either valueMappings or useValue, no data will be generated for this type. If you specify only valueMappings and the target value is not found, an error will be thrown.

{
  "name": "Title",
  "type": "None",
  "valueMappings": {    
    "Mr": "Master"
  },
  "ignore":"true/false",
  "useValue": "Miss",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false"
}
Bogus

Bogus is a type that when specified requires the stringFormatPattern option, which is passed directly to the Bogus API, see here for available options

{
  "name": "PhoneNumber",
  "type": "Bogus",
  "valueMappings": {    
    "+555-555-555": "+444-555-555-55"
  },
  "ignore":"true/false",
  "useValue": "+50559-5-5-555",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "stringFormatPattern": "{{phonenumbers.phonenumber}}"
}
FirstName
{
  "name": "FirstName",
  "type": "FirstName",
  "valueMappings": {    
    "James": "Bob"
  },
  "ignore":"true/false",
  "useValue": "Steve",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "useGenderColumn": "Gender"
}
LastName
{
  "name": "Surname",
  "type": "LastName",
  "valueMappings": {    
    "Smith": "Jojnes"
  },
  "ignore":"true/false",
  "useValue": "Timms",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false"
}
DateOfBirth
  • min will default to 1901-01-01
  • max will default to current date
  • date format is fullyear-month-day
{
  "name": "DOB",
  "type": "DateOfBirth",
  "valueMappings": {    
    "1990-01-02": "1990-02-02"
  },
  "ignore":"true/false",
  "useValue": "1940-02-02",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "min": "1901-12-25",
  "max": "2000-11-20"
}
Rant
  • max will default to 25
{
  "name": "Comments",
  "type": "Rant",
  "valueMappings": {    
    "A comment": "Becomes this"
  },
  "ignore":"true/false",
  "useValue": "A really important comment",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "max": 15
}
Lorem
  • min will default to 5
  • max will default to 30
{
  "name": "Comments",
  "type": "Lorem",
  "valueMappings": {    
    "A comment": "Becomes this"
  },
  "ignore":"true/false",
  "useValue": "A really important comment",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "min": 5,
  "max": 15
}
StringFormat

Check out the Bogus API for supported values

{
  "name": "Comments",
  "type": "StringFormat",
  "valueMappings": {    
    "A comment": "Becomes this"
  },
  "ignore":"true/false",
  "useValue": "A really important comment",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "stringFormatPattern": "#####****?????"
}
FullAddress
{
  "name": "Address",
  "type": "FullAddress",
  "valueMappings": {    
    "55 Long Name Street, Long Name Village, Long Name Town...": "Becomes this"
  },
  "ignore":"true/false",
  "useValue": "55 Long Name Street, Long Name Village, Long Name Town...",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
}
PhoneNumber
{
  "name": "PhoneNumber",
  "type": "PhoneNumber",
  "valueMappings": {    
    "+555-555-555": "+444-555-555-55"
  },
  "ignore":"true/false",
  "useValue": "+50559-5-5-555",
  "retainNullValues": "true/false",
  "retainEmptyStringValues": "true/false",
  "useLocalValueMappings": "true/false",
  "useGlobalValueMappings": "true/false",
  "stringFormatPattern": "+1 ########-#-###-#"
}
Sql

The current row is passed in as parameters and can be accessed using @ColumnName

{
  "name":"CompanyName",
  "type": "Sql",
  "sqlValue": {
    "query":"SELECT Name FROM SampleCompanyNames WHERE UserId = @UserId",
    "valueHandling": "Null/KeepValue"
  }
}

name & type are required everything else is optional unless specified

Most data can be generated perfectly fine just by using the Bogus or StringFormat data types.

Min & max

Only some data types currently use the min/max properties on the column configurations.

  • Lorem, Rant & DOB

Example Usage

//load our configuration
Config config = Config.Load($"example-configs\\config-example1.json")
// you also can pass the JSON content directly:
// Config config = Config.LoadFromString(....);

//create a data masker
IDataMasker dataMasker = new DataMasker(new DataGenerator(config.DataGeneration));

//grab our dataSource from the config, note: you could just ignore the config.DataSource.Type
//and initialize your own instance
IDataSource dataSource = DataSourceProvider.Provide(config.DataSource.Type, config.DataSource);

//Enumerable all our tables and begin masking the data
foreach (TableConfig tableConfig in config.Tables)
{
    //load the data, this needs optimizing for large tables
    IEnumerable<IDictionary<string, object>> rows = dataSource.GetData(tableConfig);
    foreach (IDictionary<string, object> row in rows)
    {
        //mask each row
        dataMasker.Mask(row, tableConfig);

        //update per row, or see below,
        //dataSource.UpdateRow(row, tableConfig);
    }

    //update all rows, in batches of 100
    dataSource.UpdateRows(rows, tableConfig, 100);
}

All of the objects/datatypes from Bogus are supported, you can use the type "Bogus" in combination with "stringFormatPattern" to acheive any valueMappings

{
  "name": "Address",
  "type": "Bogus",
  "stringFormatPattern": "{{address.fullAddress}}"
}

You can combine multiple objects to generate complex data

{
  "name": "Name",
  "type": "Bogus",
  "stringFormatPattern": "{{name.prefix}} {{name.firstName}} {{name.lastName}}"
}

Data Sources

There are only two DataSources available at the moment

  • InMemoryFake - is there only for the examples
  • SqlServer - can pull and push data to SQL Server

There is some additional configuration required when using SqlServer, on the dataSource object a dynamic config property is available, you'll need to supply the name, server, userName & password for the connection or a connection string.

N.B. if the "connectionString" value is set name, server, userName & password will be ignored

{
  "dataSource":{
    "config":{
      "name": "xxx",
      "server": "xxx",
      "userName": "xxx",
      "password": "xxx",
      "connectionString": "Data Source=(localdb)\\mssqllocaldb;Initial Catalog=Clients;Integrated Security=SSPI;"
    }
  }
}

SqlServer Data Source

Dry run is supported. A transaction is created, the update statement is executed and then the transaction is rolled back.

Gender

To ensure the new data is more accurate you may want to take gender into consideration when generating certain data types such as names. This can be achieved with a small amount of additional configuration. If no gender is specified then non gender specific names are generated.

You must define the gender column and then tell your target column to use this when generating data.

Here we are saying, use the column "Gender" when generating data for "FirstName". We then include the "Gender" column but tell it to be ignored by the IDataMasker, it is purley there as a dependency to "FirstName".

"columns": [
  {
    "name": "FirstName",
    "type": "FirstName",
    "useGenderColumn": "Gender"
  },
  {
    "name": "Gender",
    "ignore": true,
    "type": "None"
  },  
]

Locale

By default the locale is "en", the locale.

The locale is used by Bogus to generate data, the locale can be changedby setting a property on the dataGeneration object

{
  "dataGeneration": {
      "locale": "en"
    }  
}

Check out the Bogus page for a list of supported locales

DataMasker.Runner (CLI)

The latest version can be found at https://github.com/Steveiwonder/DataMasker/releases

This is a CLI interface for the data masking tool. You might want to use this as part of your continuous integration if you backup/restore your live environments back to stage/dev after a release

The options are as follows

-c, --config-file Required. the json configuration to be

-d, --dry-run (Default: false) dry run, only supported by some data sources

-l, --locale set the locale

-u, --update-batchsize batch size to use when upating records

--print-options (Default: false) prints the arguments passed into this tool in a json format with executing anything else

--no-output (Default: false) if set, no output to the console will be written

--help Display this help screen.

--version Display version information.

datamasker's People

Contributors

dependabot[bot] avatar gabbersepp avatar kungkung134 avatar mataqvazadeh avatar mgblackwater avatar shawnstrickland avatar simonmckenzie avatar steveiwonder avatar verbeurew avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

datamasker's Issues

Release assets missing CLI executable

When downloading the latest release asset "DataMasker.Runner.zip", I found the DataMasker.Runner release is missing the DataMasker.Runner.exe file. I ended up needing to download the source and building to get all of the needed files.

[Question] How to substitute 300,000 names

Hi

How can I substitute 300,000 names? In the examples in the readme it has an entry for a name substitution, it's not really feasible to put 300,000 names in there, is there another way?

Thanks

Incorrect syntax near 'nvarchar' when using column type Sql

Hi,

I have a table where I want to update the value in the column Search Name to the uppercase value (already anonymized) in the column Name.
So in the case below, I should get HEBER STREICH as result in the column Search Name

image

In my config file, my tables are defined as followed:

"tables": [    
    {
      "name": "CRONUS International Ltd_$Contact$437dbf0e-84ff-417a-965d-ed2bb9650972",
      "schema": "dbo",
      "primaryKeyColumn": "No_",
      "columns": [
        {
          "name": "Name",
          "type": "Bogus",
          "retainNullValues": false,
          "stringFormatPattern": "{{Name.fullName}}"
        },
        {
          "name": "Address",
          "type": "Bogus",
          "retainNullValues": false,
          "stringFormatPattern": "{{address.StreetAddress}}"
        }
      ]
    },
    {
      "name": "CRONUS International Ltd_$Contact$437dbf0e-84ff-417a-965d-ed2bb9650972",
      "schema": "dbo",
      "primaryKeyColumn": "No_",
      "columns": [
        {
          "name": "Search Name",
          "type": "Sql",
          "retainNullValues": false,
          "sqlValue": {
            "query":"SELECT UPPER(Name) FROM [CRONUS International Ltd_$Contact$437dbf0e-84ff-417a-965d-ed2bb9650972] WHERE No_ = @No_",
            "valueHandling": "KeepValue"
          }
        }
      ]
    }
  ]

When I try to run, I receive following error: System.Data.SqlClient.SqlException: 'Incorrect syntax near 'nvarchar'.'

image

Sequence contains no matching element

Hi,

When running DataMasker in a Form Application, I receive following message:

Exception thrown: 'System.InvalidOperationException' in System.Linq.dll
An unhandled exception of type 'System.InvalidOperationException' occurred in System.Linq.dll
Sequence contains no matching element

This is my configuration file:

{
  "dataSource": {
    "type": "SqlServer",
    "config": {
      "connectionString": "Data Source=datamask\\SQLEXPRESS;Initial Catalog=CRONUS;Persist Security Info=True;User ID=aaa;Password=bbb"
    }
  },
  "dataGeneration": {
    "locale": "en"
  },
  "tables": [    
    {
      "name": "CRONUS International Ltd_$Contact$437dbf0e-84ff-417a-965d-ed2bb9650972",
      "schema": "dbo",
      "primaryKeyColumn": "No_",
      "columns": [
        {
          "name": "Name",
          "type": "Bogus",
          "retainNullValues": true,
          "stringFormatPattern": "{{Name.fullName}}"
        }
      ]
    }
  ]
}

I copied the code from the example folder and the debugger stops on:

                //update all rows
                var masked = rows.Select(row =>
                {
                    //mask the data
                    return dataMasker.Mask(row, tableConfig);
                });

image

Net Standard upgrade

Hello Steve,

I had a chance to use DataMasker with one of my projects recently. I must say it made my job a lot easier and efficient. I would like to thank you first.

I am using .Net Core 3.1 now. Do you have any plans to upgrade the project to net standard?

Thanks,
Ramesh

Create new table instead of replacing

Hi again
I was thinking it would be a cool feature to allow the user to create a new table (on the same server or in a different one) instead of modifying directly the source table.
Would this be hard?

Best regards

Use Select in JSON (Question)

Hi,

is it possible to add SQL statements in the config.json file in order to get an already anonimized value from another table?
for example:

{
  "dataSource": {
    "type": "SqlServer",
    "config": {
       "name": "databasename",
       "userName": "databaseUserName",
       "password": "databasePassword",
       "server": "databaseServer"
    }   
  },  
  "tables": [    
    {
      "name": "Users",
      "schema": "dbo",
      "primaryKeyColumn": "UserId",
      "columns": [
        {
          "name": "FirstName",
          "type": "FirstName"
        }
      ]
    },
    {
      "name": "Tools",
      "schema": "dbo",
      "primaryKeyColumn": "ToolId",
      "columns": [
        {
          "name": "FirstName",
          "type": "calculated",
          "value": "SELECT FirstName FROM Users Where UserId = UserId"
        }
      ]
    }
  ]
}

Breaking application

When i run the app in android it break my phone! Now i need to reinstall windows! Please fix this, thanks.

Other DBs?

Hi there
Amazing tool
I wanted to ask if it would be possible to add Oracle DB support and other DBs.
Thank you in advance.

Example using SQL Server

Hi,
Using your example, I get 'Config' does not contain a definition for 'LoadConfig' - Config.Config seems to work though - which one is correct?

Is there like full doumentation for this?

I can't seem to ligin to a SQL Server DB with Windows Authentication - e.g. i leave the username / password blank it doesn't log in - if I leave them out of the config, same problem.

Also, what I in reality want to do is this ...
Login to Source DB on server1
login to target DB on server 2

loop through tables/data from Source, and UpdateRow(?) on Target DB

Is this possible?

Regards,
Steve Bowker

Enforce Uniqueness on columns?

Is there any way to enforce uniqueness on a column? Some columns might for instance have a unique constraint. I assume you can get around this by assuming that most values will be unique and to implement a retry mechanism when exception is thrown, but it would be nice if the library could handle this...

Possible enhancements

Hi @Steveiwonder,

Sorry to contact you this way, but I could not find your contact details in your profile, so ....

This looks like a useful really tool which does many of the things I am trying to achieve, although not quite yet all of them.

I have the use-case of an upgrade to a vendor supplied system:

My workflow would be:

  1. Make a 'masked' copy of a live database on 'day 1' and keep a record of what the orginal and new values were - ideally in a different server/database to the 'subject' database.
  2. On 'day 2', take another fresh copy of the live database and apply the same exact changes to the records which were already there (using the previously stored values), and also 'mask' any new records which had appeared since the previous day (again adding to the record of what the orginal and new values had been).
  3. Repeat from 2 until the upgrade is complete.

It would also be useful to produce audit reports showing what was changed, and when, so it would be useful to have date/time/user for each time a 'before/after' record was added.

Is this this kind of direction you have in mind ?

Cheers,

SilverZippo

Invalid object name

In config file,
"tables": [
{
"name": "User", // the default schema is dbo, and this table is under dbo schema
.............................
}
]
ABOVE CONFIG WORKS LIKE A CHARM.

When selecting table from different schema it will show invalid object name error.
Like if I had:
"tables": [
{
"name": "schemaName.tableName",
.............................
}
]

It will show Invalid Object name. Can you help please? How can I choose two different table from two different schema? Thanks

DataMasker runs out of memory when processing large datasets

Hi,

Firstly, thanks for the great app - I've been finding it very useful.

I see that the app always runs as 32-bit - would it be possible for this to run at native bit-ness on 64-bit machines (i.e. by turning off "Prefer 32-bit")?

I have built the masker on my own machine with this flag turned off, and it worked fine, so it seems that it'd be a good improvement for the pre-built artifacts.

Thanks,

Simon

Does support database integrity

Does DataMasker support database integrity like preserving the relations between tables based on the foreign key relations between tables?

json schema isn't in Repo

Thanks for the effort on this! It's just what I was looking for.

The json schema isn't present in the repo.

Thanks!
Nick

System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime

Hi,

if I want to use the cli of the DataMasker.Runner.dll, I receive dependency errors.
For example:

DataMasker.Runner.dll --version

Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=4.2.2.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.

Any idea how to solve this?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.