Giter Site home page Giter Site logo

sqldump2json's Introduction

sqldump2json

PyPI - Python Version PyPI - Version Total Downloads

Converts SQL dump to a JSON stream.

A tool for administrators, data scientists and hackers. With this tool you no longer need to import dumps into Databases. You can extract INSERT data as JSON and analyze them with jq or insert into MongoDB/Elastic/etc. The dump is not read entirely into RAM, so this utility can be used to process files of any size. And it can even parse corrupted dumps. No dependencies!

Supported DBMS: MySQL, SQL Server, PotsgreSQL and some other (not all formats).

RESTRICTIONS:

  • Syntax is checked only for INSERT INTO and CREATE TABLE.
  • The common SQL syntax is used which does not fully correspond to either MySQL or Postgres.
  • Function calls and subquieries in INSERT satetements are not supported.

Installation for normal Arch-based Linux ditros:

# install pipx
yay -S python-pipx

# install package from pypi
pipx install sqldump2json

# install lastet version from github
pipx install git+https://github.com/s3rgeym/sqldump2json.git

For other shit like Ubuntu you need to do more steps:

  • Install pyenv or asdf-vm.
  • Install latest python version and make it global via pyenv or asdf-vm.
  • Install sqldump2json.
  • Or use Docker.

CLI

Usage:

sqldump2json [ -h ] [ -i INPUT ] [ -o OUTPUT ] [ ... ]

Output format is JSONL:

echo "INSERT INTO db.data VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');" | sqldump2json
{"table": "data", "schema": "db", "values": [1, "foo"]}
{"table": "data", "schema": "db", "values": [2, "bar"]}
{"table": "data", "schema": "db", "values": [3, "baz"]}

Values are converted to dict only if the INSERT INTO contains a list of fields or the fields are declared in CREATE TABLE:

$ sqldump2json <<< "INSERT INTO data VALUES (NULL, 3.14159265, FALSE, 'Привет', 0xDEADBEEF);" | jq
{
  "table": "data",
  "values": [
    null,
    3.14159265,
    false,
    "Привет",
    "3q2+7w=="
  ]
}

$ sqldump2json <<< 'INSERT INTO `page` (title, contents) VALUES ("Title", "Text goes here");' | jq
{
  "table": "page",
  "values": {
    "title": "Title",
    "contents": "Text goes here"
  }
}

Using together with grep:

grep 'INSERT INTO `users`' /path/to/dump.sql | sqldump2json | jq -r '.values | [.username, .email, .password] | @tsv' > output.csv

Scripting

If you were looking for a way how to import data from SQL to NoSQL databases and etc:

#!/usr/bin/env python
from sqldump2json import DumpParser
...
if __name__ == '__main__':
    parse = DumpParser()
    for val in parse("/path/to/dump.sql"):
        do_something(val)

Development

Run tests:

poetry run python -m unittest

TODO LIST

  • Add support mysql strings with charset (eg, _binary '\x00...'). + X'...'
  • Строки должны конкатенироваться, если идут подряд.
  • Ускорить парсинг.

Notes

После создания этого пакета я случайно узнал про существование sqldump-to. Тот проект заброшен, и та утилита НЕ МОЖЕТ ПАРСИТЬ ДАМПЫ ПО 100500 ГИГАБАЙТ.

Я пробовал ускорить парсинг с помощью orjson (реализован на говнорасте и отвечает за парсинг JSON), но вопреки заявленному ускорению в 10 раз, получил замедление при парсинге 23-гигового дампа, содержащего 160 миллинов вставок, с 5 часов до 7.

sqldump2json's People

Contributors

s3rgeym avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.