gdpr-dump's People

Contributors

balintbrews, beuss, bomoko, geek-merlin, jancis

gdpr-dump's Issues

Add non-mysql settings paths and files

Something that will definitely be useful for our (read: Amazee(Labs/io)'s) use case is support for extra MySQL option files.

The problem with the existing my.cnf files (and paths) is that it's impossible to tell whether there is a gdpr-specific file. Here I'm thinking of cases where the setup of gdpr-dump is separate from the setup of mysql and mysqldump itself.
For instance, if we have a module that exports the gdpr-expressions data, it may be useful to store it in a file named something other than .my.cnf (and related files).

My suggestion, to begin, is to add the environment variable GDPR_DUMP_HOME where we would check for gdpr.cnf and .gdpr.cnf files, as well as testing for the existence of a .gdpr.cnf file in the MYSQL_HOME directory.
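
The lookup proposed above could be sketched roughly like this. It is a language-neutral illustration in Python, not the project's PHP code: the function names are hypothetical, and the order in which the candidates are tried is an assumption. Only the variable names GDPR_DUMP_HOME and MYSQL_HOME and the file names gdpr.cnf and .gdpr.cnf come from the suggestion itself.

```python
import os


def candidate_gdpr_config_files(environ=None):
    """Return the gdpr-specific option files to look for, in order.

    The ordering (GDPR_DUMP_HOME first, then MYSQL_HOME) is an assumption
    made for illustration; the proposal does not fix a precedence.
    """
    env = os.environ if environ is None else environ
    candidates = []
    gdpr_home = env.get("GDPR_DUMP_HOME")
    if gdpr_home:
        candidates.append(os.path.join(gdpr_home, "gdpr.cnf"))
        candidates.append(os.path.join(gdpr_home, ".gdpr.cnf"))
    mysql_home = env.get("MYSQL_HOME")
    if mysql_home:
        candidates.append(os.path.join(mysql_home, ".gdpr.cnf"))
    return candidates


def existing_gdpr_config_files(environ=None):
    """Keep only the candidates that actually exist on disk."""
    return [p for p in candidate_gdpr_config_files(environ) if os.path.isfile(p)]
```

Keeping the candidate list separate from the existence check makes the search order easy to unit test without touching the filesystem.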

Testing framework/approach for gdpr-dump

One of the major issues I'm having when evaluating PRs is the lack of a test framework. Since we're building on a POC that was put together pretty quickly, we're severely lacking tests.

I think that this should be addressed as a matter of immediate and serious concern.

I'd love to bounce some ideas off anyone out there about this.

Essentially we not only need to think about unit testing the various bits, but also some way of easily testing that the DB dumps themselves are applying (and keep applying) the appropriate transformations after we've incorporated new code.

Please, if anyone has any suggestions here, I'd really love to hear them.

From my side, I'm going to be working on a minimal set of containerized tests to try to get us going, but I'm open to any suggestions (and PRs).

Which license?

Hi @axel-rutz (cc @fjgarlin)

Is there any particular license you wanted to use for this project?
Are you happy for me to add, say, MIT?
Any other preference?

Simplify gdpr-replacement to make it Faker agnostic

I suggest accepting a set of named formatters that might or might not be implemented via Faker. Faker is still the way to go, but it might not always be suitable. We'd just need a map from each formatter key to the code that produces the replacement value.

This will also simplify the gdpr replacements expression. See below.

Suggested gdpr-replacements:
{"tableName":{"columnName1":{"formatter":"username"},"columnName2":{"formatter":"password"},"columnName3":{"formatter":"email"}}}
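
A formatter map of this kind might look like the sketch below. It is written in Python purely for illustration (the project itself is PHP), and everything in it is hypothetical: the `FORMATTERS` registry, the `apply_replacements` helper, and the placeholder values each formatter returns. A Faker-backed callable could be registered under the same keys without changing the configuration format.

```python
import json

# Hypothetical registry: formatter name -> callable producing a replacement.
# The concrete output values are placeholders chosen for this example.
FORMATTERS = {
    "username": lambda row_id: f"user_{row_id}",
    "email": lambda row_id: f"user_{row_id}@example.com",
    "password": lambda row_id: "",
}


def apply_replacements(table, row, row_id, replacements):
    """Return a copy of `row` with the configured columns replaced."""
    result = dict(row)
    for column, spec in replacements.get(table, {}).items():
        if column not in result:
            continue  # configured column is absent from this row
        formatter = FORMATTERS[spec["formatter"]]
        result[column] = formatter(row_id)
    return result


config = json.loads(
    '{"users":{"name":{"formatter":"username"},"mail":{"formatter":"email"}}}'
)
row = {"name": "alice", "mail": "alice@corp.test", "status": 1}
print(apply_replacements("users", row, 7, config))
# prints {'name': 'user_7', 'mail': 'user_7@example.com', 'status': 1}
```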

Add effective transformation output

It would be useful to output the effective sanitization details that are read from the configuration files, that is, to display the transformation mappings that would be used if the dump were run.

Since we read configuration information from several places, this would be useful in terms of displaying exactly what we can expect on run.
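
One way to picture this is a merge step followed by a pretty-printer, as in the Python sketch below. The function names are hypothetical, and the precedence rule (later sources override earlier ones, column by column) is an assumption; the issue does not say how conflicting sources should be resolved.

```python
import json


def merge_replacements(*sources):
    """Merge table -> column -> rule mappings.

    Assumption for this sketch: later sources win on a per-column basis.
    """
    effective = {}
    for source in sources:
        for table, columns in source.items():
            effective.setdefault(table, {}).update(columns)
    return effective


def format_effective(effective):
    """Render the merged mapping in a stable, readable form."""
    return json.dumps(effective, indent=2, sort_keys=True)


from_file = {"users": {"name": {"formatter": "username"}, "mail": {"formatter": "email"}}}
from_cli = {"users": {"mail": "uid"}}
print(format_effective(merge_replacements(from_file, from_cli)))
```

Printing the merged result before dumping shows exactly which rule applies to each column, whichever file or switch it came from.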

mysqldump password behaviour?

When using original mysqldump, I can write

mysqldump -u username -p db_name

And mysqldump will prompt for a password.
Using gdpr-dump's mysqldump replacement, it doesn't understand that "db_name" might not be a password; instead it complains "Not enough arguments (missing: "db-name")." If I use the format

mysqldump db_name -u username -p
it tries to access the db without a password and doesn't prompt for one.

I just don't like typing my password into the commandline, that's why I always use mysqldump as mentioned above.

Could you make it behave like the original mysqldump: if -p is given but db-name appears to be missing, treat the last parameter as the db-name and prompt for a password?
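
The requested behaviour could be sketched as follows. This is a hypothetical Python illustration of the parsing rule only, not gdpr-dump's actual argument handling: a bare -p never consumes the next token and only triggers a prompt, while -pSECRET carries the password inline. The `parse_dump_args` name and the small subset of supported options are assumptions.

```python
import getpass


def parse_dump_args(argv, prompt=getpass.getpass):
    """Parse a small subset of mysqldump-style arguments."""
    user = None
    password = None
    ask_password = False
    positionals = []
    extras = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-u":
            if i + 1 >= len(argv):
                raise ValueError("-u requires a user name")
            user = argv[i + 1]
            i += 2
            continue
        if arg == "-p":
            ask_password = True      # bare -p: prompt later, consume nothing
        elif arg.startswith("-p"):
            password = arg[2:]       # -pSECRET: inline password
        elif arg.startswith("-"):
            extras.append(arg)       # other options are passed through untouched
        else:
            positionals.append(arg)  # the first positional is the db name
        i += 1
    if not positionals:
        raise ValueError('Not enough arguments (missing: "db-name").')
    if ask_password and password is None:
        password = prompt("Enter password: ")
    return {"user": user, "password": password,
            "db_name": positionals[0], "extras": extras}
```

With this rule, both `-u username -p db_name` and `db_name -u username -p` resolve to the same database name and both prompt for the password.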

Integrate Faker - discussion

Hi @axel-rutz

Thanks a bunch for pushing this. I think it's a super exciting approach that's able to be used in several places.

Second, I've been thinking about the approach and one of the things I was thinking is that perhaps we might want to integrate Faker into the process, rather than the expressions based approach (or, better, the two might live side by side).

I've been doing some work on my side, and while it's a total WIP at the moment (in particular, I'm really uncomfortable with the way I've abstracted the expression stuff, and will rewrite it ASAP), I thought that since I've got something, this would be as good an opportunity as any to start talking about it.

So what I've done for now is changed up the --gdpr-expressions switch a little to accept something like this

--gdpr-expressions=\'{"fakertest":{"name":{"transformer":"faker","formatter":"name"}, "telephone":{"transformer":"faker","formatter":"phoneNumber"}}}\'

Which marks particular columns as engaging Faker and which formatter will be used for output. Obviously this could be expanded to include Faker arguments etc. if we wanted them.
If an object isn't passed, it interprets it as a DB expression and uses your current approach.
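
The dispatch described here, object means transformer and anything else means DB expression, might be sketched like this. It is a Python illustration with a hypothetical helper name; the actual WIP code is PHP and is not reproduced here.

```python
import json


def classify_column_rule(rule):
    """Classify one column rule from the --gdpr-expressions JSON.

    Objects name a transformer and a formatter; plain strings are treated
    as DB expressions, matching the fallback described above.
    """
    if isinstance(rule, dict):
        return ("transformer", rule["transformer"], rule["formatter"])
    if isinstance(rule, str):
        return ("expression", rule)
    raise TypeError(f"unsupported rule type: {type(rule).__name__}")


expressions = json.loads(
    '{"fakertest":{"name":{"transformer":"faker","formatter":"name"},'
    '"telephone":{"transformer":"faker","formatter":"phoneNumber"},'
    '"mail":"uid"}}'
)
for column, rule in expressions["fakertest"].items():
    print(column, classify_column_rule(rule))
```

The "mail": "uid" entry is added to the example only to show the string fallback; it is not part of the switch value quoted above.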

I'd love to get a discussion started about this.

Generate phar

It seems as though this would be useful to distribute as a single .phar. Thoughts?

Broken table data export

Exporting data, even without any parameters, breaks the SQL dump. It outputs the table structure, but at the point where the data should be returned, the process stops without showing the cause.

  KEY `user_field__created` (`created`),
  KEY `user_field__access` (`access`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='The data table for user entities.';
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `users_field_data`
--

LOCK TABLES `users_field_data` WRITE;
/*!40000 ALTER TABLE `users_field_data` DISABLE KEYS */;
SET autocommit=0;
 [error]  Database dump failed 
 [error]  Unable to dump database. Rerun with --debug to see any error message. 

Adding --debug does not reveal anything.

I found that it breaks here:

$columnTypes = $this->tableColumnTypes()[$tableName];

This is because ifsnop/mysqldump-php#141 was never accepted, so the new solution using hooks was not implemented. Also, I have a situation where I can't edit the project's JSON file to add patches, and just using composer require does not apply patches (137 to be specific).

My proposal is adding a check on $this->gdprExpressions[$tableName] before invoking $this->tableColumnTypes()[$tableName]. See PR here: #27
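
The shape of the proposed guard, expressed as a Python analogue, is shown below. This is not the PHP change from the PR; `column_types_for` and `load_column_types` are hypothetical stand-ins for the lookup around `tableColumnTypes()`, used only to show that the lookup is skipped for tables without gdpr expressions.

```python
def column_types_for(table_name, gdpr_expressions, load_column_types):
    """Look up column types only for tables that have gdpr expressions."""
    if table_name not in gdpr_expressions:
        # Nothing to transform for this table, so the column-type
        # lookup is skipped and rows would be dumped unchanged.
        return None
    return load_column_types().get(table_name, {})


calls = []


def fake_loader():
    """Stand-in for the column-type lookup; records that it was invoked."""
    calls.append("loaded")
    return {"users_field_data": {"mail": "varchar"}}


print(column_types_for("cache", {}, fake_loader))  # loader is never called
print(column_types_for("users_field_data", {"users_field_data": {"mail": "uid"}}, fake_loader))
```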

Does not work with current ifsnop/mysqldump-php

The composer.json specifies dev-master for ifsnop/mysqldump-php; however, this program is currently not compatible with versions above 2.7. Until support for newer versions is added, the composer.json should specify:

"ifsnop/mysqldump-php": "2.7"

Support for truncating table

Is there any support for truncating a table?
For example, I don't want to sanitize the data in my webform submissions table, I want to truncate it.
So basically I just want the structure of this table to be exported.

Accept gdpr-skip-tables argument

When dumping a database, we might not want to copy across or sanitize all tables (e.g. cache tables). I suggest accepting this parameter (on the command line and/or in the configuration files) so that those tables are created but no data is transferred.

Suggested:
"gdpr-skip-tables":["tableName1","tableName2","tableName3"]
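
As a rough Python sketch of the intended behaviour (the `plan_dump` helper is hypothetical, and only the "gdpr-skip-tables" key comes from the suggestion): every table keeps its structure, and only the tables not listed have their rows dumped.

```python
import json


def plan_dump(tables, skip_tables):
    """Return (table, include_data) pairs; skipped tables are structure-only."""
    skip = set(skip_tables)
    return [(table, table not in skip) for table in tables]


config = json.loads('{"gdpr-skip-tables":["cache","sessions"]}')
plan = plan_dump(["users", "cache", "sessions", "node"], config["gdpr-skip-tables"])
for table, include_data in plan:
    print(table, "structure + data" if include_data else "structure only")
```

The same plan would also cover the truncation use case, where a table such as webform submissions should be exported without its rows.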

Drush & Faker

When using drush and a Faker formatter, I get the following error:

SQLSTATE[42S22]: Column not found: 1054 Unknown column 'Array' in 'field list'

This is my drush command:

drush sql-dump --tables-list=users_field_data --extra-dump=$'--gdpr-expressions='{"users_field_data":{"name":"uid","mail":"uid","init":"uid","pass":{"formatter":"clear"}}}''

Am I doing something wrong or is this a bug?
