Giter Site home page Giter Site logo

ddl-generator's Introduction

DDL Generator

Infers SQL DDL (Data Definition Language) from table data.

Use at command line:

$ ddlgenerator -i postgresql '[{"Name": "Alfred", "species": "wart hog", "kg": 22}]'

DROP TABLE generated_table;
CREATE TABLE generated_table (
        name VARCHAR(6) NOT NULL,
        kg INTEGER NOT NULL,
        species VARCHAR(8) NOT NULL
)
;
INSERT INTO generated_table (kg, Name, species) VALUES (22, 'Alfred', 'wart hog');

Reads data from files:

$ ddlgenerator postgresql mydata.yaml > mytable.sql

Enables one-line creation of tables with their data

$ ddlgenerator --inserts postgresql mydata.json | psql

To use in Python:

>>> from ddlgenerator.ddlgenerator import Table
>>> table = Table([{"Name": "Alfred", "species": "wart hog", "kg": 22}])
>>> sql = table.sql('postgresql', inserts=True)

Supported data formats

  • Pure Python
  • YAML
  • JSON
  • CSV
  • Pickle
  • HTML

Features

  • Supports all SQL dialects supported by SQLAlchemy
  • Coerces data into most specific data type valid on all column's values
  • Takes table name from file name
  • Guesses format of input data if unspecified by file extension
  • with -i/--inserts flag, adds INSERT statements
  • with -u/--uniques flag, surmises UNIQUE constraints from data
  • Handles nested data, creating child tables as needed
  • Reads HTML tables, including those embedded in noisy websites

Options

-h, --help            show this help message and exit
-k KEY, --key KEY     Field to use as primary key
-r, --reorder         Reorder fields alphabetically, ``key`` first
-u, --uniques         Include UNIQUE constraints where data is unique
-t, --text            Use variable-length TEXT columns instead of VARCHAR
-d, --drops           Include DROP TABLE statements
-i, --inserts         Include INSERT statements
--no-creates          Do not include CREATE TABLE statements
--save-metadata-to FILENAME
                      Save table definition in FILENAME for later --use-
                      saved-metadata run
--use-metadata-from FILENAME
                      Use metadata saved in FROM for table definition, do
                      not re-analyze table structure
-l LOG, --log LOG     log level (CRITICAL, FATAL, ERROR, DEBUG, INFO, WARN)

Generate SQLAlchemy models

Use sqlalchemy as the model to generate Python for defining SQLAlchemy models:

$ ddlgenerator sqlalchemy '[{"Name": "Alfred", "species": "wart hog", "kg": 22}]'

Table0 = Table('Table0', metadata,
  Column('species', Unicode(length=8), nullable=False),
  Column('kg', Integer(), nullable=False),
  Column('name', Unicode(length=6), nullable=False),
  schema=None)

Generate Django models

If Django is installed on the path then using django as the model will run the generated ddl through Django's inspectdb management command to produce a model file:

$ ddlgenerator django '[{"Name": "Alfred", "species": "wart hog", "kg": 22}]'

# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#   * Rearrange models' order
#   * Make sure each model has one field with primary_key=True
#   * Remove `managed = False` lines if you wish to allow Django to create and delete the table
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin.py sqlcustom [appname]'
# into your database.
from __future__ import unicode_literals

from django.db import models

class Table0(models.Model):
    species = models.CharField(max_length=8)
    kg = models.IntegerField()
    name = models.CharField(max_length=6)
    class Meta:
        managed = False
        db_table = 'Table0'

Large tables

As of now, ddlgenerator is not well-designed for table sizes approaching your system's available memory.

One approach to save time and memory for large tables is to break your input data into multiple files, then run ddlgenerator with --save-metadata against a small but representative sample. Then run with --no-creates and -use-saved-metadata to generate INSERTs from the remaining files without needing to re-determine the column types each time.

Installing

Requires Python3.

From PyPI:

pip3 install ddlgenerator

From source:

git clone https://github.com/catherinedevlin/ddl-generator.git
cd ddl-generator
pip3 install .

Alternatives

Credits

  • Mike Bayer for sqlalchemy
  • coldfix and Mark Ransom for their StackOverflow answers
  • Audrey Roy for cookiecutter
  • Brandon Lorenz for Django model generation

ddl-generator's People

Contributors

blorenz avatar catherinedevlin avatar kmmbvnr avatar mdarmetko avatar techtonik avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ddl-generator's Issues

deceptive import trap

If the user tries to use ddlgenerator with psycopg2, but it is not installed, the import trap at line 6 of console.py makes it look like the import of Table failed. It's hard to figure out that psycopg2 is what is actually missing.

handle reserved keyword column names

When a column name happens to be a database reserved keyword, generated INSERT statements need to be quote-enclosed to make legal SQL. For instance: column named "user" in PostgreSQL.

move units from value to column name

so a "cost" column with ["$6", "$3.22"] should become a "cost_$" column with [Decimal('6'), Decimal('3.22')]

likewise physical units like lb, km2, etc

Can not set database schema

I'm not sure if I am missing something, but when trying to include a schema in table_name - the period gets converted to underscore.

Is there another parameter for setting schema? (I could not find one)

RuntimeError: OrderedDict mutated during iteration

mattjoiner@mattjoiners-LM-MBP:~$ ddlgenerator -i postgresql sample.json
Traceback (most recent call last):
File "/usr/local/bin/ddlgenerator", line 11, in
sys.exit(generate())
File "/usr/local/lib/python3.5/site-packages/ddlgenerator/console.py", line 112, in generate
generate_one(datafile, args, file=file)
File "/usr/local/lib/python3.5/site-packages/ddlgenerator/console.py", line 58, in generate_one
loglevel=args.log, limit=args.limit)
File "/usr/local/lib/python3.5/site-packages/ddlgenerator/ddlgenerator.py", line 170, in init
force_pk=force_pk)
File "/usr/local/lib/python3.5/site-packages/ddlgenerator/reshape.py", line 289, in unnest_children
for (key, val) in row.items():
RuntimeError: OrderedDict mutated during iteration

Read from stdio

Allow generating DDL from the output of another command. This should be easy to implement and give a ton of flexibility. "-" seems to be a standard to specify stdin. But it could also be implemented as a command line option.

Two ways I see this being used.

  1. A command generates CSV that you want to generate DDL for.:
    getdata -f csv | ddlcreate oracle -

  2. Adding a header when the CSV file doesn't include one. We can add a header to the csv via:
    cat header.csv data.csv | ddlcreate oracle -

Unnecessary dependency on "dateutils"

It seems that you have dateutils in your setup.py and requirements.txt. That refers to this package which is not currently being used. It is possible this started out as a confusion about the python-dateutil package name which was corrected by adding python-dateutil but you forgot to remove dateutils? You don't seem to be using it.

Generate linked graph format

It is very useful for reversing to get structure of DB with all linked entities in some visual form (for example some tree format, suitable for d3.js representation).

pk_name='index' cause NotImplementedError?

Hi, this is so nice package but I found a bug.

version

Python 3.7.6
ddlgenerator 0.1.9

Code/Output

from csv

table = Table('my_data.csv', pk_name='index')
sql = table.sql('postgresql', inserts=False)
print(sql)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-c070dabded13> in <module>
----> 1 table = Table('db_data/shop_master.csv', pk_name='index')
      2 sql = table.sql('postgresql', inserts=False)
      3 print(sql)
      4 
      5 # table_names = {'shop_master': 'shop_master', 'brand_master': 'brand_master', 'cateogry_master': 'cateogry_master', 'all_product_df': 'product_price'}

/opt/conda/lib/python3.7/site-packages/ddlgenerator/ddlgenerator.py in __init__(self, data, table_name, default_dialect, save_metadata_to, metadata_source, varying_length_text, uniques, pk_name, force_pk, data_size_cushion, _parent_table, _fk_field_name, reorder, loglevel, limit)
    168                                             parent_name=self.table_name,
    169                                             pk_name=pk_name,
--> 170                                             force_pk=force_pk)
    171 
    172         self.default_dialect = default_dialect

/opt/conda/lib/python3.7/site-packages/ddlgenerator/reshape.py in unnest_children(data, parent_name, pk_name, force_pk)
    284     field_names_used_by_children = defaultdict(set)
    285     child_fk_names = {}
--> 286     parent = ParentTable(data, parent_name, pk_name=pk_name, force_pk=force_pk)
    287     for row in parent:
    288         try:

/opt/conda/lib/python3.7/site-packages/ddlgenerator/reshape.py in __init__(self, data, singular_name, pk_name, force_pk)
    214         self.pk_name = pk_name
    215         if force_pk or (self.pk_name and self.is_in_all_rows(self.pk_name)):
--> 216             self.assign_pk()
    217         else:
    218             self.pk = None

/opt/conda/lib/python3.7/site-packages/ddlgenerator/reshape.py in assign_pk(self)
    254             raise Exception('Duplicate values in %s.%s, unsuitable primary key'
    255                             % (self.name, self.pk_name))
--> 256         self.use_this_pk(self.pk_name, key_type)
    257         if suitability in ('absent', 'partial'):
    258             for row in self:

/opt/conda/lib/python3.7/site-packages/ddlgenerator/reshape.py in use_this_pk(self, pk_name, key_type)
    238     def use_this_pk(self, pk_name, key_type):
    239         if key_type == int:
--> 240             self.pk = UniqueKey(pk_name, key_type, max([0, ] + all_values_for(self, pk_name)))
    241         else:
    242             self.pk = UniqueKey(pk_name, key_type)

TypeError: '>' not supported between instances of 'str' and 'int'

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.