
my_obfuscate's Introduction

MyObfuscate


You want to develop against real production data, but you don’t want to violate your users’ privacy. Enter MyObfuscate: standalone Ruby code for the selective rewriting of SQL dumps in order to protect user privacy. It supports MySQL, Postgres, and SQL Server.

Install

(sudo) gem install my_obfuscate

Example Usage

Make an obfuscator.rb script:

#!/usr/bin/env ruby
require "rubygems"
require "my_obfuscate"

obfuscator = MyObfuscate.new({
  :people => {
    :email                     => { :type => :email, :skip_regexes => [/^[\w\.\_]+@my_company\.com$/i] },
    :ethnicity                 => :keep,
    :crypted_password          => { :type => :fixed, :string => "SOME_FIXED_PASSWORD_FOR_EASE_OF_DEBUGGING" },
    :salt                      => { :type => :fixed, :string => "SOME_THING" },
    :remember_token            => :null,
    :remember_token_expires_at => :null,
    :age                       => { :type => :null, :unless => lambda { |person| person[:email] == "[email protected]" } },
    :photo_file_name           => :null,
    :photo_content_type        => :null,
    :photo_file_size           => :null,
    :photo_updated_at          => :null,
    :postal_code               => { :type => :fixed, :string => "94109", :unless => lambda {|person| person[:postal_code] == "12345"} },
    :name                      => :name,
    :full_address              => :address,
    :bio                       => { :type => :lorem, :number => 4 },
    :relationship_status       => { :type => :fixed, :one_of => ["Single", "Divorced", "Married", "Engaged", "In a Relationship"] },
    :has_children              => { :type => :integer, :between => 0..1 },
  },

  :invites                     => :truncate,
  :invite_requests             => :truncate,
  :tags                        => :keep,

  :relationships => {
    :account_id                => :keep,
    :code                      => { :type => :string, :length => 8, :chars => MyObfuscate::USERNAME_CHARS }
  }
})
obfuscator.fail_on_unspecified_columns = true # if you want it to require every column in the table to be in the above definition
obfuscator.globally_kept_columns = %w[id created_at updated_at] # if you set fail_on_unspecified_columns, you may want this as well
# If you'd like to also validate against your schema.rb file to make sure all fields and tables are present, see https://gist.github.com/cantino/5376e73b0ad806dc4da4
obfuscator.obfuscate(STDIN, STDOUT)

And to get an obfuscated dump:

mysqldump -c --add-drop-table --hex-blob -u user -ppassword database | ruby obfuscator.rb > obfuscated_dump.sql

Note that the -c (--complete-insert) option on mysqldump is required to use my_obfuscate. Additionally, by default mysqldump writes blob content as raw bytes, which can include special characters that cause trouble; you can request hex-encoded blob content with --hex-blob instead. If you get MySQL errors due to very long lines, try some combination of --max_allowed_packet=128M, --single-transaction, --skip-extended-insert, and --quick.

Database Server

By default the database type is assumed to be MySQL, but you can use the built-in SQL Server or Postgres support by setting one of:

obfuscator.database_type = :sql_server
obfuscator.database_type = :postgres

If using Postgres, use pg_dump to get a dump:

pg_dump database | ruby obfuscator.rb > obfuscated_dump.sql

Types

Available types include: email, string, lorem, name, first_name, last_name, address, street_address, secondary_address, city, state, zip_code, phone, company, ipv4, ipv6, url, integer, fixed, null, and keep.

Helping with creation of the “obfuscator.rb” script

If you don’t want to type all those table names and column names into your obfuscator.rb script, you can use my_obfuscate to do some of that work for you. It can consume your database dump file and create a “scaffold” for the script. To run my_obfuscate in this mode, start with an “empty” scaffolder.rb script as follows:

#!/usr/bin/env ruby
require "rubygems"
require "my_obfuscate"

obfuscator = MyObfuscate.new({})
obfuscator.scaffold(STDIN, STDOUT)

Then feed in your database dump:

mysqldump -c --hex-blob -u user -ppassword database | ruby scaffolder.rb > obfuscator_scaffold.rb_snippet   # MySQL
pg_dump database | ruby scaffolder.rb > obfuscator_scaffold.rb_snippet   # Postgres

The output will be a series of configuration statements of the form:

  :table_name => {
    :column1_name     => :keep   # scaffold
    :column2_name     => :keep   # scaffold
	... etc.

Scaffolding also works if you have a partial configuration. If your configuration is missing some tables or some columns, a call to ‘scaffold’ will pass through the configuration that exists and augment it with scaffolding for the missing tables or columns.

Changes

  • Support for Postgres. Thanks @samuelreh!

  • Support for SQL Server

  • :unless and :if now support :nil as a shorthand for a Proc that checks for nil

  • :name, :lorem, and :address are all now supported types. You can pass :number to :lorem to specify how many sentences to generate. The default is one.

  • { :type => :whatever } is now optional when no additional options are needed. Just use :whatever.

  • Warnings are thrown when an unknown column type or table is encountered. Use :keep in both cases.

  • { :type => :fixed, :string => proc { |row| ... } } is now available.

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Commit, but do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)

  • Send me a pull request. Bonus points for topic branches.

Thanks

Thanks to Honk for the original gem, Iteration Labs for prior maintenance work, and Pivotal Labs for patches and updates!

LICENSE

This work is provided under the MIT License. See the included LICENSE file.

The included English word frequency list used for generating random text is provided under the Creative Commons – Attribution / ShareAlike 3.0 license by invokeit.wordpress.com/frequency-word-lists/

my_obfuscate's People

Contributors

benmelz, bobziuchkovski, cantino, indirect, perspectivezoom, samuelreh, werebus


my_obfuscate's Issues

Exception when INSERT lines are found inside stored procedures

my_obfuscate-0.5.3/lib/my_obfuscate/copy_statement_parser.rb:20:in `block in parse': Cannot obfuscate Postgres dumps containing INSERT statements. Please use COPY statments. (RuntimeError)

copy_statement_parser.rb throws an error when it finds INSERT statements inside PL/pgSQL stored procedure definitions, even though the rest of the SQL uses COPY statements:

CREATE FUNCTION actualizar_existencia() RETURNS void
    LANGUAGE plpgsql
    AS $$
declare
...
        if not found then
            -- if it doesn't exist, create it
            insert into lote_deposito_temp(cantidad, lote_id, deposito_id)
...

It would be good if my_obfuscate just ignored the INSERTs inside stored procedures.
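The distinction being asked for is that INSERT text inside a dollar-quoted function body is part of the function definition, not a data row. A minimal sketch of that check, assuming bodies are delimited by a bare `$$` as in the dump above (tagged quotes such as `$body$` are not handled). `top_level_insert_lines` is a hypothetical helper, not part of my_obfuscate, and this is not how copy_statement_parser.rb currently works.

```ruby
# Hypothetical helper (not part of my_obfuscate): returns the 1-based
# line numbers of INSERT statements that sit outside $$-quoted function
# bodies. Assumes bodies are delimited by a bare $$, not by $tag$.
def top_level_insert_lines(lines)
  in_body = false
  hits = []
  lines.each_with_index do |line, index|
    hits << index + 1 if !in_body && line.match?(/\A\s*insert\s+into\b/i)
    # An odd number of $$ markers on a line opens or closes a body.
    in_body = !in_body if line.scan("$$").size.odd?
  end
  hits
end

dump = [
  'CREATE FUNCTION actualizar_existencia() RETURNS void',
  '    LANGUAGE plpgsql',
  '    AS $$',
  '            insert into lote_deposito_temp(cantidad, lote_id, deposito_id)',
  '$$;',
  'INSERT INTO people (id) VALUES (1);'
]
p top_level_insert_lines(dump) # => [6]
```

Only the last line is reported; the insert inside the function body is ignored.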

How to handle uniqueness?

I want to generate integer ids from 1..100000, but for some reason I'm getting 2 or 3 duplicate values, which makes loading the dump of that one particular table fail. Is there any way to generate numbers that are unique across the whole table?
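Drawing each value independently at random from 1..100000 produces repeats long before the range is used up (the birthday effect). A minimal sketch with hypothetical names: `expected_collisions` estimates how many duplicate pairs to expect, and `UniqueIntegerPool` hands out values without replacement instead.

```ruby
# Birthday-problem estimate: expected number of colliding pairs when
# drawing n values independently and uniformly from m possibilities.
def expected_collisions(n, m)
  n * (n - 1) / (2.0 * m)
end

# Hypothetical helper: shuffles the range once, then hands values out
# without replacement, so no value is ever returned twice.
class UniqueIntegerPool
  def initialize(range, random: Random.new)
    @values = range.to_a.shuffle(random: random)
  end

  def next_value
    raise "pool exhausted" if @values.empty?
    @values.pop
  end
end

puts expected_collisions(700, 100_000) # about 2.4 duplicate pairs for 700 rows

pool = UniqueIntegerPool.new(1..100_000)
ids = Array.new(700) { pool.next_value }
puts ids.uniq.size # 700
```

Wiring the pool into a config as `{ :type => :fixed, :string => proc { |row| pool.next_value.to_s } }` is untested here; whether my_obfuscate writes that value as a quoted string or a bare integer is an assumption to check against your dump.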

Skip some rows?

Is there a way to selectively drop certain rows?
For example:

Obfuscating this:

INSERT INTO table (id, value) VALUES (1, "a")
INSERT INTO table (id, value) VALUES (1, "a")
INSERT INTO table (id, value) VALUES (1, "b")
INSERT INTO table (id, value) VALUES (1, "b")

Results in this:

INSERT INTO table (id, value) VALUES (1, "a")
INSERT INTO table (id, value) VALUES (1, "a")
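The options documented in the README work per column, and :truncate drops a whole table; nothing above filters individual rows. One workaround, sketched under assumptions, is a separate pre-filter that drops matching INSERT lines before the dump reaches obfuscator.rb. It only works when each row is its own INSERT (for example, mysqldump with --skip-extended-insert), and it matches raw SQL text, so the pattern needs care. `drop_matching_rows` is a hypothetical helper.

```ruby
# Hypothetical pre-filter: copies input to output, skipping single-row
# INSERT lines for one table whose text matches `pattern`.
def drop_matching_rows(input, output, table:, pattern:)
  insert_prefix = /\AINSERT INTO `?#{Regexp.escape(table)}`? /
  input.each_line do |line|
    next if line.match?(insert_prefix) && line.match?(pattern)
    output << line
  end
end

dump = <<~SQL
  INSERT INTO table (id, value) VALUES (1, "a");
  INSERT INTO table (id, value) VALUES (1, "b");
SQL
filtered = +""
drop_matching_rows(dump, filtered, table: "table", pattern: /"b"\)/)
print filtered # only the "a" row remains
```

In a standalone script, `drop_matching_rows(STDIN, STDOUT, ...)` could sit between mysqldump and `ruby obfuscator.rb` in the pipeline.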

Invalid byte sequence in UTF-8

Hello,

I'm getting this error with the utf8mb4_general_ci collation:

/var/lib/gems/2.3.0/gems/my_obfuscate-0.3.7/lib/my_obfuscate/mysql.rb:6:in `match': invalid byte sequence in UTF-8 (ArgumentError)
	from /var/lib/gems/2.3.0/gems/my_obfuscate-0.3.7/lib/my_obfuscate/mysql.rb:6:in `parse_insert_statement'
	from /var/lib/gems/2.3.0/gems/my_obfuscate-0.3.7/lib/my_obfuscate.rb:41:in `block in obfuscate'

Any idea how to fix it?
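The message can be reproduced in plain Ruby: matching a regexp against a UTF-8 string that contains an invalid byte raises this ArgumentError. The sketch shows that and two generic workarounds; whether either fits my_obfuscate's parser is an assumption, and scrubbing changes the data. Dumping with --hex-blob, as the README recommends, may keep raw binary bytes out of the text in the first place.

```ruby
# A dump line containing a byte (\xFF) that is not valid UTF-8.
bad_line = "INSERT INTO `people` VALUES (1,'caf\xFF');"

# Returns the error message raised when a regexp is matched against
# the line, or nil if the match succeeds.
def match_error(line)
  line.match(/INSERT INTO/)
  nil
rescue ArgumentError => e
  e.message
end

puts match_error(bad_line)               # prints: invalid byte sequence in UTF-8

# Option 1 (lossy): replace invalid bytes with "?" before parsing.
puts bad_line.scrub("?").valid_encoding? # true

# Option 2 (keeps the bytes): view the line as binary. ASCII-only
# patterns still match, but the text is no longer treated as UTF-8.
puts bad_line.b.match?(/INSERT INTO/)    # true
```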

Scaffolding throwing error

Hey guys,

I'm trying to run a scaffold, but it throws an error:


mysqldump -c  --hex-blob -uroot -p db | ruby scaffolder.rb > obfuscator_scaffold.rb_snippet

scaffolder.rb:6:in `<main>': undefined method `scaffold' for #<MyObfuscate:0x007fc91597ef38 @config={}> (NoMethodError)

When I list the methods on a MyObfuscate instance, scaffold isn't among them. Any idea why?

irb(main):011:0> ob = MyObfuscate.new
=> #<MyObfuscate:0x007f88c222b490 @config={}>
irb(main):012:0> ob.methods.sort
=> [:!, :!=, :!~, :<=>, :==, :===, :=~, :__id__, :__send__, :check_for_defined_columns_not_in_table, :check_for_table_columns_not_in_definition, :class, :clone, :config, :config=, :database_helper, :database_type, :database_type=, :define_singleton_method, :display, :dup, :enum_for, :eql?, :equal?, :extend, :fail_on_unspecified_columns, :fail_on_unspecified_columns=, :fail_on_unspecified_columns?, :freeze, :frozen?, :globally_kept_columns, :globally_kept_columns=, :hash, :inspect, :instance_eval, :instance_exec, :instance_of?, :instance_variable_defined?, :instance_variable_get, :instance_variable_set, :instance_variables, :is_a?, :kind_of?, :method, :methods, :missing_column_list, :nil?, :obfuscate, :obfuscate_bulk_insert_line, :object_id, :private_methods, :protected_methods, :public_method, :public_methods, :public_send, :reassembling_each_insert, :remove_instance_variable, :respond_to?, :send, :singleton_class, :singleton_method, :singleton_methods, :taint, :tainted?, :tap, :to_enum, :to_s, :trust, :untaint, :untrust, :untrusted?]


"" in postgres table names

The script doesn't seem to work with Postgres table names that are wrapped in double quotes:

I get:

Deprecated: "user" was not specified in the config.  A future release will cause this to be an error.  Please specify the table definition or set it to :keep.

with the following specified in obfuscate.rb

  :"user" => {...}
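One detail that may explain the mismatch: in Ruby, `:"user"` and `:user` are the same symbol, because the quotes are only syntax. The warning prints the table name with its double quotes, which suggests (an assumption, not confirmed in this report) that the parser keeps them, in which case the config key would need to contain literal quote characters.

```ruby
# The quotes in :"user" are Ruby syntax, so it is identical to :user.
puts :"user" == :user   # true

# A symbol whose name really includes the double quotes:
quoted_key = :'"user"'
puts quoted_key.to_s    # prints "user" including the quotes

# Hypothetical config keyed on the quoted name; untested against
# my_obfuscate's Postgres parser.
config = { quoted_key => :keep }
```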

uninitialized constant MyObfuscate::ConfigApplicator::Faker

Sorry for what may be an easy question...

I have made a couple of improvements and want to contribute them back. However, before doing that, I naturally want to include some RSpec tests.

For some reason I can't get the rspec suite to run successfully (even before my changes).
For 29 of the 90 tests it is throwing the error: uninitialized constant MyObfuscate::ConfigApplicator::Faker.

For example:

1) MyObfuscate::ConfigApplicator.apply_table_config should work on email addresses
   Failure/Error: new_row = MyObfuscate::ConfigApplicator.apply_table_config(["blah", "something_else"], {:a => {:type => :email}}, [:a, :b])
   NameError:
   uninitialized constant MyObfuscate::ConfigApplicator::Faker
   # ./lib/my_obfuscate/config_applicator.rb:33:in `block in apply_table_config'
   # ./lib/my_obfuscate/config_applicator.rb:8:in `each'
   # ./lib/my_obfuscate/config_applicator.rb:8:in `apply_table_config'
   # ./spec/my_obfuscate/config_applicator_spec.rb:8:in `block (4 levels) in <top (required)>'
   # ./spec/my_obfuscate/config_applicator_spec.rb:7:in `times'
   # ./spec/my_obfuscate/config_applicator_spec.rb:7:in `block (3 levels) in <top (required)>'

Any clues?

License missing from gemspec

Some companies will only use gems with a certain license.
The canonical and easy way to check is via the gemspec, e.g.:

spec.license = 'MIT'
# or
spec.licenses = ['MIT', 'GPL-2']

There is even a License Finder to help companies ensure all gems they use meet their licensing needs. This tool depends on license information being available in the gemspec. Including a license in your gemspec is a good practice in any case.
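To see what such a tool reads, Gem::Specification exposes the declared licenses. This sketch builds a throwaway spec in memory with a hypothetical gem name; for an installed gem, `Gem::Specification.find_by_name("my_obfuscate").licenses` should return the same kind of array (empty if no license is declared).

```ruby
require "rubygems"

# Throwaway spec built in memory; a real project puts the same
# license line in its .gemspec file.
spec = Gem::Specification.new do |s|
  s.name    = "example_gem" # hypothetical gem name
  s.version = "0.0.1"
  s.license = "MIT"
end

p spec.licenses # => ["MIT"]
```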

If you need help choosing a license, GitHub has created a license picker tool.

How did I find you?

I'm using a script to collect stats on gems. I was originally looking for download data, but decided to collect licenses too, and to open issues for missing ones as a public service :)
https://gist.github.com/bf4/5952053#file-license_issue-rb-L13 So far it's going pretty well.
I've written a blog post about it

Source code inconsistencies within gem repository

Amazon Linux 2015.03 gem for my_obfuscate 0.5.3 has inconsistent my_obfuscate.rb and my_obfuscate/config_applicator.rb. The former requires ffaker, the latter references Faker, resulting in uninitialized constant MyObfuscate::ConfigApplicator::Faker (NameError).

Specs are failing occasionally

"MyObfuscate.apply_table_config should work on email addresses" is failing occasionally. Here is the output:

.....................................F................................................

Failures:

  1) MyObfuscate MyObfuscate.apply_table_config should work on email addresses
     Failure/Error: new_row.first.should =~ /^[\w\.]+\@\w+\.\w+\.[a-f0-9]{5}\.example\.com$/
       expected: /^[\w\.]+\@\w+\.\w+\.[a-f0-9]{5}\.example\.com$/
            got: "[email protected]" (using =~)
       Diff:
       @@ -1,2 +1,2 @@
       -/^[\w\.]+\@\w+\.\w+\.[a-f0-9]{5}\.example\.com$/
       +"[email protected]"
     # ./spec/my_obfuscate_spec.rb:30:in `block (3 levels) in <top (required)>'

Finished in 0.0774 seconds
86 examples, 1 failure

Failed examples:

rspec ./spec/my_obfuscate_spec.rb:27 # MyObfuscate MyObfuscate.apply_table_config should work on email addresses
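The failing address is redacted in the output above, so the cause cannot be confirmed here. One plausible explanation, offered as an assumption, is that the random name or domain sometimes contains a character the pattern does not allow: `\w` matches letters, digits, and underscore, but not hyphens or apostrophes. The sketch checks the spec's pattern against a few hypothetical addresses and shows a relaxed variant.

```ruby
# Pattern copied from spec/my_obfuscate_spec.rb:30 in the output above.
SPEC_PATTERN = /^[\w\.]+\@\w+\.\w+\.[a-f0-9]{5}\.example\.com$/

# Hypothetical relaxed variant that also allows ' and - in the name
# parts; \A and \z anchor the whole string rather than a single line.
RELAXED_PATTERN = /\A[\w.'-]+@[\w-]+\.\w+\.[a-f0-9]{5}\.example\.com\z/

samples = [
  "jane.doe@smith.biz.a1b2c.example.com",   # plain letters
  "jane@smith-jones.biz.a1b2c.example.com", # hyphen in the domain
  "o'keefe@smith.biz.a1b2c.example.com"     # apostrophe in the name
]

samples.each do |address|
  puts format("%-42s spec=%-5s relaxed=%s", address,
              SPEC_PATTERN.match?(address), RELAXED_PATTERN.match?(address))
end
```

Only the first sample passes the spec's pattern; all three pass the relaxed one.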

Postgres 10.3's pg_dump scopes tables with public schema

Perhaps due to CVE-2018-1058, pg_dump now prefixes table names with their schema, which by default is public. The README documentation is now out of date with regard to Postgres 10.3+; I was able to update an obfuscator config like this to make it work:

obfuscator = MyObfuscate.new({
  :'public.users'     => :keep
})
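For a large existing config, renaming every key by hand is error-prone. A minimal sketch, assuming all tables live in one schema: `qualify_tables` is a hypothetical helper that prefixes each top-level table key, leaving column definitions untouched. `Hash#transform_keys` needs Ruby 2.5 or later.

```ruby
# Hypothetical helper: prefixes every table key with a schema name so
# a config written for older pg_dump output matches "public.users".
def qualify_tables(config, schema = "public")
  config.transform_keys { |table| :"#{schema}.#{table}" }
end

config = qualify_tables({
  :users   => :keep,
  :invites => :truncate
})
p config.keys # :"public.users" and :"public.invites"
```

The result can then be passed as `MyObfuscate.new(qualify_tables({ ... }))`. Tables in other schemas would need their own prefix.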

Get obfuscated value

Hello,

I tried to explore the code but haven't found an answer. Is it possible to access already-obfuscated data inside a proc?

Like here:

  :users => {
    :username           => :email,
    :username_canonical => { :type => :fixed, :string => proc { |row| row[:username] }},
    :email              => { :type => :fixed, :string => proc { |row| row[:username] }},
    :email_canonical    => { :type => :fixed, :string => proc { |row| row[:username] }},
    :first_name         => :first_name,
 },

I would like the first through fourth fields to share the same obfuscated value. In the example above, the second through fourth fields are not obfuscated.
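As described, the proc receives the original row, so it cannot read another column's obfuscated output. A workaround sketch under that assumption: derive the fake value from the original username with a deterministic function, so every column that calls it for a given row gets the same result. `FAKE_EMAIL` and `SALT` are hypothetical names; the salt matters because an unsalted hash of a guessable username could be reversed by hashing likely usernames.

```ruby
require "digest"

# Hypothetical secret mixed into the hash so fake values cannot be
# reversed by hashing a list of likely usernames.
SALT = "replace-with-a-private-value"

# Deterministic: the same original username always maps to the same
# fake address, so all four columns agree within a row.
FAKE_EMAIL = proc do |row|
  token = Digest::SHA256.hexdigest(SALT + row[:username].to_s)[0, 12]
  "user_#{token}@example.com"
end

users_config = {
  :username           => { :type => :fixed, :string => FAKE_EMAIL },
  :username_canonical => { :type => :fixed, :string => FAKE_EMAIL },
  :email              => { :type => :fixed, :string => FAKE_EMAIL },
  :email_canonical    => { :type => :fixed, :string => FAKE_EMAIL },
  :first_name         => :first_name,
}

puts FAKE_EMAIL.call({ :username => "alice" })
```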

Truncate to last x records

Is there a way to truncate the data to the last (or first) x records?

Production databases often contain more data than needed in development. Say I have an invoices table with a few thousand records. I obviously need them all in production, but to keep my development database small and fast, I probably only want to keep about 10 in development.

Thanks.
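The README documents :truncate for whole tables but no row limit. A sketch of a separate pre-filter, under the same one-row-per-INSERT assumption (mysqldump with --skip-extended-insert): keep the first `limit` rows of one table and pass everything else through. `keep_first_rows` is hypothetical. Keeping the last rows instead would mean buffering that table's lines, since a streaming filter cannot know which rows are last until the table ends. Rows in other tables that reference dropped invoices may fail foreign-key checks on import.

```ruby
# Hypothetical pre-filter: passes the dump through unchanged except that
# only the first `limit` single-row INSERTs for `table` are kept.
def keep_first_rows(input, output, table:, limit:)
  insert_prefix = /\AINSERT INTO `?#{Regexp.escape(table)}`? /
  kept = 0
  input.each_line do |line|
    if line.match?(insert_prefix)
      next if kept >= limit # drop rows beyond the limit
      kept += 1
    end
    output << line
  end
end

dump = <<~SQL
  INSERT INTO `invoices` (`id`) VALUES (1);
  INSERT INTO `invoices` (`id`) VALUES (2);
  INSERT INTO `invoices` (`id`) VALUES (3);
  INSERT INTO `people` (`id`) VALUES (1);
SQL
trimmed = +""
keep_first_rows(dump, trimmed, table: "invoices", limit: 2)
print trimmed # invoices 1 and 2, plus the people row
```

In a standalone script, `keep_first_rows(STDIN, STDOUT, table: "invoices", limit: 10)` could run between mysqldump and obfuscator.rb.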
