Overview
This RFC document tackles the issue of multiple databases in Synnefo. In the first section, we explain what is the current situation and what limitations it poses. Next, we proceed with an overview of existing Django mechanisms that are used to handle multiple databases and finally, we present the design and implementation of the proposed solution.
Current situation and limitations
Synnefo is designed for installation in an arbitrary number of nodes. Usually, in multi-node installations, the administrators can assign Synnefo components to specific nodes for redundancy, security or performance reasons. This logic also applies to the databases used by Synnefo, which in production environments are usually installed in separate nodes.
Currently, the suggested way of registering a remote database for a Synnefo node is to edit the 10-snf-webproject-databases.conf
file and add a "default" database entry:
DATABASES = {
'default': {
# 'postgresql_psycopg2', 'postgresql','mysql', 'sqlite3' or 'oracle'
'ENGINE': 'django.db.backends.postgresql_psycopg2',
# ATTENTION: This *must* be the absolute path if using sqlite3.
# See: http://docs.djangoproject.com/en/dev/ref/settings/#name
'NAME': 'snf_apps',
'USER': 'synnefo', # Not used with sqlite3.
'PASSWORD': 'example_passw0rd', # Not used with sqlite3.
# Set to empty string for localhost. Not used with sqlite3.
'HOST': '203.0.113.1',
# Set to empty string for default. Not used with sqlite3.
'PORT': '5432',
}
}
So, if we had four nodes, we could do the following:
- Install the Cyclades database in Node 1.
- Install the Astakos database in Node 2.
- Install the Cyclades app in Node 3.
- Also, register the Cyclades database of Node 1 as "default".
- Install the Astakos app in Node 4.
- Also, register the Astakos database of Node 2 as "default".
and that would work perfectly.
However, what would happen if the administrator wanted to install Astakos and Cyclades in the same node? In this case, this node would need to have access to two remote databases, which is not possible if we use only one database in the settings. Thus we need a way of managing multiple databases.
Django database routing 101
Django has introduced the notion of "multiple databases" in version 1.2. You can find detailed documentation on this subject for Django 1.4 here. In the following paragraphs, we'll try to briefly explain how Django handles multiple databases. For a more in-depth explanation, we'd suggest you take a look at the docs.
Multiple databases
In order to have multiple databases, we need to add extra ones in the DATABASES dictionary of Django settings, like so:
DATABASES = {
'default' : {...}
'db1': {...}
'db2': {...}
}
Note1: Django 1.4 always needs a database called "default", so we can't have a database dictionary with only "db1" and "db2" as entries.
Note2: Django does not provide cross-database relations. This means that if a model uses a field of another model as its foreign key, then both models must be in the same database.
Automatic database routing
Having multiple databases means that we need a way to select a database for operations on a model. To assist us in this task, Django suggests using database routers. Now, simply put, database routers are Python classes that have methods which can be called by Django when it needs to operate on a model.
There are four router class methods that Django 1.4 defines:
- db_for_read: Expects a model and returns a database to read from.
- db_for_write: Expects a model and returns a database to write to.
- allow_relation: Expects two models and decides if these two models can point to each other with foreign keys
- allow_syncdb: Decides if a model can be synced to a given database.
Moreover, we need a way to enable these routers. To do this, Django provides the DATABASE_ROUTERS list setting. DATABASE_ROUTERS is a list of stringified paths to the router classes, e.g. "path.to.my.router.MyRouter1", "path.to.my.router.MyRouter2". The order is important, as the routers are processed in the order they are listed. This is the same logic that is used in urls.py
, so most should be familiar with it.
Finally, if no router has taken a decision, the django falls back to its default routing scheme, e.g. what you would expect if you had only one database.
Note1: Django specifies that not all of the four methods need to exist. If one does not exist, it will continue with the next router.
Note2: A router may choose not to take a decision, e.g. because it knows nothing about a model. In this case, it can return None, and Django will continue with the rest of the routers.
Manual database selection
Django also has a handy "method" called using
. It applies to a single Django operation (save
/delete
/get
) and takes as a single argument an existing database. For more information about using
, we suggest you take a look at the Django documentation.
Proposed routing scheme
Basic goals
The first goal is to provide the administrator with maximum elasticity. This means that if the administrator wants to split the databases in any imaginable way, the software should offer a basic set of tools to do so.
The second goal is to provide the necessary logic to warn the administrator when he/she attempts an unreasonable setup (e.g. due to cross-database relations).
The third goal is to allow and encourage the power user to plug routers of its own, if he/she uses an exotic database setup.
The fourth and final goal is to do all the above with as few changes as possible :-).
Design
We believe that the right way to go is with database routers. The alternative of using
is not flexible enough and it requires to rewrite our codebase or provide custom QuerySets.
So, we suggest that each app defines its own routers. These routers, when queried about one of the app models, should return the appropriate database. Routers should be disabled by default and to enable them, the administrator will need to add them in the DATABASE_ROUTERS list. An interesting feature of this approach is that the administrator can add custom routers, therefore allowing any database setup (as always, within reason).
Implementation details
-
In the top directory of the cyclades, astakos and pithos apps, we add a routers.py
file.
-
In this file we import the installed_apps
setting of the specific app, as well as the global DATABASES
setting.
-
The database router class has the following simple skeleton (you can replace cyclades with pithos or astakos):
def db_for_read(model):
def db_for_write(model):
find the app of this model
if app not in installed_apps:
return None
if 'cyclades' in DATABASES:
return 'cyclades'
else
return None
def allow_relation(model1, model2):
find the apps of the models
if none of the apps in installed_apps
return None
elif both apps in installed_apps:
return True
else:
return False
def allow_syncdb(db, model):
find the app of this model
if app not in installed_apps:
return None
if 'cyclades' in DATABASES: # We sync only in 'cyclades'...
if db is 'cyclades':
return True
elif db is 'default': # ...unless they don't exist, so we use the defaults
return None
return False
-
On 10-snf-webproject-databases.conf
, we add a DATABASE_ROUTERS
empty list and in the comments we list the available databases as well as the databases they expect to operate on:
# Available databases:
# 'synnefo.routers.CycladesRouter':
# Operates on 'cyclades' database, falls back on 'default'
# 'astakos.routers.AstakosRouter':
# Operates on 'astakos' database, falls back on 'default'
# 'pithos.routers.PithosRouter':
# Operates on 'pithos' database, falls back on 'default'
DATABASE_ROUTERS = []
This way, if the database setup is typical, the administrator doesn't need to configure anything. Even if a router is added without a change in DATABASES, the router will fall back to the default database. For more advanced database setups, the user will be able to add the necessary routers in the DATABASE_ROUTERS list, as well as the new databases in the DATABASES list. Finally, for exotic database configurations, the power user can do just about anything using custom routers.