steder / goose Goto Github PK
View Code? Open in Web Editor NEWSimple configuration driven SQL migration tool
License: MIT License
Simple configuration driven SQL migration tool
License: MIT License
Python Migrations allow you to run migrations run some logic that is particularly unwieldy in SQL and integrate that migration logic into the rest of your migration process. They might also be used to implement migrations for NoSQL stores.
I'm thinking that every python migration is something like:
def forward(cursor):
"""Migrate the database forward to the next version by manipulating the cursor"""
def reverse(cursor):
"""Roll back this migration using the cursor provided"""
For SQL migrations the cursor allows us to control the transaction boundary within Goose, the python migration scripts can do whatever they want and apply whatever logic, but they should run within a single transaction if possible.
For NoSQL migrations the signature might be more like:
def forward(connection_to_db):
""""Do migrations using the provided connection_to_db"""
This way, while the migrations really don't have transactions to leverage they can at least not have to figure out independently how to connect to the DB.
When a migration tool is part of a larger deployment process it usually makes more sense to do the dependency analysis once and create a single migration file to be applied to the production database. The other benefit here is that the migration tool doesn't need to run in the production environments just on the build machine. (You won't need to install dependencies just for the migration tool everywhere).
Basically something like:
goose [options] sql > release_2012_03_06.sql
Would output all the SQL that needs to be run to migrate the database to the latest version.
Ideas I'm considering that don't really have a time frame.
Being able to undo or reverse a migration could be a useful feature for testing and rollback.
I think the configuration file could be updated from:
migrations:
To something more like:
migrations:
The actual logic for running migrations in reverse should be pretty straightforward, we'll just need a toggle to explicitly enable rollbacks.
However, how do we handle something like:
migrations:
Can we rollback from 3 to 0? Or just from 3 to 2?
Must there be a reverse migration for every version?
Is it an error to try to rollback a migration that has no reverse or do you allow it?
Say we allow a rollback from 3 to 0. When rolling forward again to 3 or 4, do we rerun migration 2? I think we can reasonably be expected to remember that 2 was run and not run it again, but depending on the contents of 2 it may need to be rerun...
Ultimately this depends on having a workflow / migration policy for your specific project or schema so there's only so much we can do in the tool. We should be conservative in what we do and liberal in what we accept.
http://pypi.python.org/pypi/cram
We could actually test commandline interactions. They're slower of course so we don't want a lot of these.
For example, every command requires you to specify the database / dsn and migrations directory:
goose -d sqlite:///my.db -m migrations/ init
I'm thinking the above would look a lot nicer if it used context and/or a configuration file to determine what to do and we could instead just write:
goose init
So we might consider a goose.conf in the current directory used to determine what commandline arguments to use or we might be able to use convention and say that if there is a migrations directory in the CWD that's what is used. Exactly what DSN is harder to handle through convention although I'm thinking that maybe init should write a configuration file that's used on subsequent invocations from that same directory.
So you'd do:
goose -d sqlite:///mysqlite.db init
-m migrations_dir would be figured out based on convention and a configuration file would be written to the current workind directory with the database DSN.
Future commands like would work without specifying either of these redundant arguments as long as those commands were run from the appropriate directory.
Alternatively goose could become a bit more knowledgeable about application configuration files and could read the appropriate DSN from settings.py (for django) or some other application config file.
Currently migration failures are obvious because we just throw the sqlalchemy.exc.OperationalError exception and the big ol' most irrelevant stack trace.
The user needs to know what sql it was trying to run that failed so that they can go fix it.
The current behavior is correct because it's a obvious failure and it stops the migration run.
However, I think that throwing a stack trace at the user when it is not a Python problem is inappropriate and misleading. The real problem in this particular case is with their SQL. I'd prefer someone only see a stack trace from Goose when they've encountered a bug in the program.
If Goose is used as a library then it's perfectly reasonable to catch and rethrow the exception but as a commandline tool it should behave, display an appropriate error message and exit with a non-zero status code.
A python script that exits with an uncaught exception will return status code 1 appropriately indicating failure.
The current stack trace is:
Traceback (most recent call last):
File "./bin/goose", line 5, in
core.main(sys.argv[1:])
File "/Users/steder/Sites/Goose/goose/core.py", line 283, in main
init=init)
File "/Users/steder/Sites/Goose/goose/core.py", line 249, in migrate
migrator.migrate(fromVersion, toVersion, selectedMigrations)
File "/Users/steder/Sites/Goose/goose/core.py", line 190, in migrate
rval = self.runSql(migration, version)
File "/Users/steder/Sites/Goose/goose/core.py", line 162, in runSql
executeBatch(self.session, sql)
File "/Users/steder/Sites/Goose/goose/core.py", line 44, in executeBatch
cursor.execute(st)
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/orm/session.py", line 801, in execute
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1405, in execute
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1538, in _execute_clauseelement
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1646, in _execute_context
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/base.py", line 1639, in _execute_context
File "/Users/steder/VirtualEnvs/goose/lib/python2.7/site-packages/SQLAlchemy-0.7.5-py2.7-macosx-10.4-x86_64.egg/sqlalchemy/engine/default.py", line 330, in do_execute
sqlalchemy.exc.OperationalError: (OperationalError) near "FRO": syntax error u'\nSELECT * FRO Track' ()
Modifying that to re-raise a Goose exception: (I'm just using Exception for expediency) still presents the user with:
Traceback (most recent call last):
File "./bin/goose", line 5, in
core.main(sys.argv[1:])
File "/Users/steder/Sites/Goose/goose/core.py", line 283, in main
init=init)
File "/Users/steder/Sites/Goose/goose/core.py", line 249, in migrate
migrator.migrate(fromVersion, toVersion, selectedMigrations)
File "/Users/steder/Sites/Goose/goose/core.py", line 190, in migrate
rval = self.runSql(migration, version)
File "/Users/steder/Sites/Goose/goose/core.py", line 170, in runSql
raise Exception("%s"%(e,))
Exception: (OperationalError) near "FRO": syntax error u'\nSELECT * FRO Track' ()
I'm proposing the output look like:
Error occured running migration: bad.sql
(OperationalError) near "FRO": syntax error (bad.sql:1)
Of course, if the script no longer exits with a traceback we need to still exit appropriately with a non-zero status code just like python would've done had we raised an exception.
Reverse migration cases (from version 4 to version 3) should probably throw an error instead of just not doing anything.
Let's test Goose against different Python versions.
Goose isn't really a great name. Perhaps "North" works as a pun on the http://south.aeracode.org/ project.
Seriously, why introduce a dependency on a yaml parser, when we already have json available in python2.6 and higher?
I think it's recommended to use Distribute instead of Setuptools.
Let's break out Sphinx and Sphinx doctests.
$ goose [options] new --engine=postgresql
Creates a template file .sql and .rollback.sql and adds them to the configuration file.
These templates would include in the comments a quick reference to the common syntax for creating a stored proc, view, table, or altering any of the above for a specific DB engine. The template would only include comments and would not do anything by default but act as a quick starting point.
This could be smarter to be templates for specific things. E.g.:
$ goose [options] alter table
$ goose [options] create table
$ goose [options] create view
I'm kicking around some ideas for a more structured approach to migrations. Something between modeling ORM Entity changes to dynamically build migrations and the current state of Goose which is just a migration is just any SQL. Currently all migrations can change anything they want in any order. In some cases this can cause problems that require developers to maybe split common changes into another migration script or reorder some migrations to deal with dependencies.
I thinking that an approach that helps organize and order SQL migrations could offer some benefits.
For example, by separating SQL by type (stored proc, table, view, etc) or by affected table you could:
Basically by making the unit of work smaller and more structured we can make it potentially much easier to read the migrations as they'll be nicely categorized by type / table but you'll also be able to make the tool smarter and capable of merging work from many developers.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.