Giter Site home page Giter Site logo

mysql-to-google-bigquery's Introduction

MySQL to Google BigQuery Logo

MySQL to Google BigQuery Sync Tool

Table of Contents

How it works

Steps when no order column has been supplied:

  • Count MySQL table rows
  • Count BigQuery table rows
  • MySQL rows > BigQuery rows?
  • Get the rows diff, split in batches of XXXXX rows/batch

Steps when order column has been supplied:

  • Get max value for order column from MySQL table
  • Get max value for order column from BigQuery table
  • Max value MySQL > Max value BigQuery?
  • Delete all rows with order column value = max value BigQuery to make sure no duplicate records are being created in BigQuery
  • Get max value for order column from BigQuery table
  • Get the rows diff based on new max value BigQuery, split in batches of XXXXX rows/batch

Final three steps:

  • Dump MySQL rows to a JSON
  • Send JSON to BigQuery
  • Repeat until all batches are sent

Tip: Create a cron job for keep syncing the tables using an interval like 15 minutes (respect the Load Jobs quota policy)

Requirements

The following PHP versions are supported:

  • PHP 7
  • HHVM
  • PDO Extension with MySQL driver

Usage

Download the library using composer:

$ composer require memeddev/mysql-to-google-bigquery

Now, define some environment variables or create a .env file on the root of the project, replacing the values:

BQ_PROJECT_ID=bigquery-project-id
BQ_KEY_FILE=google-service-account-json-key-file.json
BQ_DATASET=bigquery-dataset-name

DB_DATABASE_NAME=mysql-database-name
DB_USERNAME=mysql_username
DB_PASSWORD=mysql_password
DB_HOST=mysql-host
DB_PORT=3306

IGNORE_COLUMNS=password,hidden_column,another_column

PS: To create the Google Service Account JSON Key File, access https://console.cloud.google.com/apis/credentials/serviceaccountkey

Run:

vendor/bin/console sync table-name

If you want to auto create the table on BigQuery:

vendor/bin/console sync table-name --create-table

If you want to delete (and create) the table on BigQuery for a full dump:

vendor/bin/console sync table-name --delete-table

Credits

❤️ Memed SA (memed.com.br)

License

MIT license, see LICENSE

mysql-to-google-bigquery's People

Contributors

gabrielrcouto avatar vidaxl avatar bviolier avatar silverwolfx10 avatar mmann099 avatar pionell avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.