Giter Site home page Giter Site logo

goose's Introduction

Hi there ๐Ÿ‘‹

goose's People

Contributors

steder avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

goose's Issues

Python Migrations

Python Migrations allow you to run migrations run some logic that is particularly unwieldy in SQL and integrate that migration logic into the rest of your migration process. They might also be used to implement migrations for NoSQL stores.

I'm thinking that every python migration is something like:

begin python migration

def forward(cursor):
"""Migrate the database forward to the next version by manipulating the cursor"""

def reverse(cursor):
"""Roll back this migration using the cursor provided"""

For SQL migrations the cursor allows us to control the transaction boundary within Goose, the python migration scripts can do whatever they want and apply whatever logic, but they should run within a single transaction if possible.

For NoSQL migrations the signature might be more like:

def forward(connection_to_db):
""""Do migrations using the provided connection_to_db"""

This way, while the migrations really don't have transactions to leverage they can at least not have to figure out independently how to connect to the DB.

SQL Output for Deployment

When a migration tool is part of a larger deployment process it usually makes more sense to do the dependency analysis once and create a single migration file to be applied to the production database. The other benefit here is that the migration tool doesn't need to run in the production environments just on the build machine. (You won't need to install dependencies just for the migration tool everywhere).

Basically something like:

goose [options] sql > release_2012_03_06.sql

Would output all the SQL that needs to be run to migrate the database to the latest version.

Someday

Ideas I'm considering that don't really have a time frame.

Add reverse migrations

Being able to undo or reverse a migration could be a useful feature for testing and rollback.

I think the configuration file could be updated from:

migrations:

  • migration1.sql
  • migration2.sql

To something more like:

migrations:

  • migration1: # defines both forward and reverse
    forward: migration1.sql
    reverse: migration1_rollback.sql
  • migration2.sql # no reverse migration/rollback

The actual logic for running migrations in reverse should be pretty straightforward, we'll just need a toggle to explicitly enable rollbacks.

However, how do we handle something like:

migrations:

  • migration 0.sql
  • migration1:
    forward: migration1.sql
    reverse: migration1.rollback.sql
  • migration2.sql
  • migration3:
    forward: migration3.sql
    reverse: migration3.rollback.sql

Can we rollback from 3 to 0? Or just from 3 to 2?

Must there be a reverse migration for every version?

Is it an error to try to rollback a migration that has no reverse or do you allow it?

Say we allow a rollback from 3 to 0. When rolling forward again to 3 or 4, do we rerun migration 2? I think we can reasonably be expected to remember that 2 was run and not run it again, but depending on the contents of 2 it may need to be rerun...

Ultimately this depends on having a workflow / migration policy for your specific project or schema so there's only so much we can do in the tool. We should be conservative in what we do and liberal in what we accept.

Nicer UX: Commandline examples are kind of verbose / repetitive

For example, every command requires you to specify the database / dsn and migrations directory:

goose -d sqlite:///my.db -m migrations/ init

I'm thinking the above would look a lot nicer if it used context and/or a configuration file to determine what to do and we could instead just write:

goose init

So we might consider a goose.conf in the current directory used to determine what commandline arguments to use or we might be able to use convention and say that if there is a migrations directory in the CWD that's what is used. Exactly what DSN is harder to handle through convention although I'm thinking that maybe init should write a configuration file that's used on subsequent invocations from that same directory.

So you'd do:

goose -d sqlite:///mysqlite.db init

-m migrations_dir would be figured out based on convention and a configuration file would be written to the current workind directory with the database DSN.

Future commands like would work without specifying either of these redundant arguments as long as those commands were run from the appropriate directory.

Alternatively goose could become a bit more knowledgeable about application configuration files and could read the appropriate DSN from settings.py (for django) or some other application config file.

Migration failure needs to be obvious and clean

Currently migration failures are obvious because we just throw the sqlalchemy.exc.OperationalError exception and the big ol' most irrelevant stack trace.

The user needs to know what sql it was trying to run that failed so that they can go fix it.

The current behavior is correct because it's a obvious failure and it stops the migration run.

However, I think that throwing a stack trace at the user when it is not a Python problem is inappropriate and misleading. The real problem in this particular case is with their SQL. I'd prefer someone only see a stack trace from Goose when they've encountered a bug in the program.

If Goose is used as a library then it's perfectly reasonable to catch and rethrow the exception but as a commandline tool it should behave, display an appropriate error message and exit with a non-zero status code.

A python script that exits with an uncaught exception will return status code 1 appropriately indicating failure.

The current stack trace is:

Traceback (most recent call last):
File "./bin/goose", line 5, in
core.main(sys.argv[1:])
File "/Users/steder/Sites/Goose/goose/core.py", line 283, in main
init=init)
File "/Users/steder/Sites/Goose/goose/core.py", line 249, in migrate
migrator.migrate(fromVersion, toVersion, selectedMigrations)
File "/Users/steder/Sites/Goose/goose/core.py", line 190, in migrate
rval = self.runSql(migration, version)
File "/Users/steder/Sites/Goose/goose/core.py", line 162, in runSql
executeBatch(self.session, sql)
File "/Users/steder/Sites/Goose/goose/core.py", line 44, in executeBatch
cursor.execute(st)
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/orm/session.py", line 801, in execute
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1405, in execute
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1538, in _execute_clauseelement
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1646, in _execute_context
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1639, in _execute_context
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/default.py", line 330, in do_execute
sqlalchemy.exc.OperationalError: (OperationalError) near "FRO": syntax error u'\nSELECT * FRO Track' ()

Modifying that to re-raise a Goose exception: (I'm just using Exception for expediency) still presents the user with:

Traceback (most recent call last):
File "./bin/goose", line 5, in
core.main(sys.argv[1:])
File "/Users/steder/Sites/Goose/goose/core.py", line 283, in main
init=init)
File "/Users/steder/Sites/Goose/goose/core.py", line 249, in migrate
migrator.migrate(fromVersion, toVersion, selectedMigrations)
File "/Users/steder/Sites/Goose/goose/core.py", line 190, in migrate
rval = self.runSql(migration, version)
File "/Users/steder/Sites/Goose/goose/core.py", line 170, in runSql
raise Exception("%s"%(e,))
Exception: (OperationalError) near "FRO": syntax error u'\nSELECT * FRO Track' ()

I'm proposing the output look like:

Error occured running migration: bad.sql
(OperationalError) near "FRO": syntax error (bad.sql:1)

Of course, if the script no longer exits with a traceback we need to still exit appropriately with a non-zero status code just like python would've done had we raised an exception.

Why it gots to be yaml?

Seriously, why introduce a dependency on a yaml parser, when we already have json available in python2.6 and higher?

Templates for Migrations

$ goose [options] new --engine=postgresql

Creates a template file .sql and .rollback.sql and adds them to the configuration file.

These templates would include in the comments a quick reference to the common syntax for creating a stored proc, view, table, or altering any of the above for a specific DB engine. The template would only include comments and would not do anything by default but act as a quick starting point.

This could be smarter to be templates for specific things. E.g.:

$ goose [options] alter table
$ goose [options] create table
$ goose [options] create view

Structuring Migrations (Removing Monolithic Migrations)

I'm kicking around some ideas for a more structured approach to migrations. Something between modeling ORM Entity changes to dynamically build migrations and the current state of Goose which is just a migration is just any SQL. Currently all migrations can change anything they want in any order. In some cases this can cause problems that require developers to maybe split common changes into another migration script or reorder some migrations to deal with dependencies.

I thinking that an approach that helps organize and order SQL migrations could offer some benefits.

For example, by separating SQL by type (stored proc, table, view, etc) or by affected table you could:

  • order things correctly (all stored procs, all table creates/alters, all views, etc)
  • keep changes for a particular table together in one place to easily reconstruct the history of the table
  • post process table migrations to create a single up to date create table statement to view the current schema and to create new databases more quickly
  • order and re-order table migrations based on foreign keys (dependency tracking)

Basically by making the unit of work smaller and more structured we can make it potentially much easier to read the migrations as they'll be nicely categorized by type / table but you'll also be able to make the tool smarter and capable of merging work from many developers.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.