Giter Site home page Giter Site logo

sqlalchemy_bulk_lazy_loader's Introduction

SQLAlchemy Bulk Lazy Loader

A custom lazy loader for SQLAlchemy relations which ensures relations are always loaded efficiently. This loader automatically solves the n+1 query problem without needing to manually add joinedload or subqueryload statements to all your queries.

The Problem

The n + 1 query problem arises whenever relationships are lazy-loaded in a loop. For example:

students = session.query(Student).limit(100).all()
for student in students:
    print('{} studies at {}'.format(student.name, student.school.name))

In the above code 101 SQL queries will be generated - one to load the list of students and then 100 individual queries for each student to load that student's school. The statements look like:

SELECT * FROM students LIMIT 100;
SELECT * FROM schools WHERE schools.student_id = 1 LIMIT 1;
SELECT * FROM schools WHERE schools.student_id = 2 LIMIT 1;
SELECT * FROM schools WHERE schools.student_id = 3 LIMIT 1;
SELECT * FROM schools WHERE schools.student_id = 4 LIMIT 1;
...

This is bad.

The traditional way to solve this with SQLAlchemy is to add a joinedload or subqueryload to the initial query to include the schools along with the students, like below:

students = (
    session.query(Student)
        .options(subqueryload(Student.school))
        .limit(100)
        .all()
)
for student in students:
    print('{} studies at {}'.format(student.name, student.school.name))

While this works, it needs to be added to every query that's performed, and when there's a lot of related models being added you can easily end up with a massive list of subqueryload and joinedload. If you forget even one anywhere then you're silently back to the n + 1 query problem. Also, if you stop needing a relation later you need to remember to remove it from the original query or else you're now loading too much data. Furthermore, it's just a huge pain to have to maintain these lists of related models throughout your code everywhere there's a database query.

Wouldn't it be great if you didn't have to worry about adding subqueryload and joinedload and yet also be guaranteed that all your relations are loading efficiently?

How The Bulk Lazy Loader Works

99% of the time, if there is a list of models loaded in memory and a relation on one of them is lazy-loaded then you're in a loop and the same relationship is going to be requested on every other model. SQLAlchemy Bulk Lazy Loader assumes this is the case and whenever a relation on a model is lazy-loaded, it will look through the current session for any other similar models that need that same relation loaded and will issue a single, bulk SQL statement to load them all at once.

This means you can load all the relations you want in loops while being guaranted that all relations are loaded performantly, and only the relations that are used are loaded. For example, here's the same code from above:

students = session.query(Student).limit(100).all()
for student in students:
    print('{} studies at {}'.format(student.name, student.school.name))

The Bulk Lazy Loader will issue only 2 SQL statements, the same as if you had specified subqueryload on the initial query, except that now your code is a lot cleaner and you're guaranteed to be loading just the relations you need. Yay!

Installation

SQLAlchemy Bulk Lazy Loader can be installed via pip

pip install SQLAlchemy-bulk-lazy-loader

Usage

Before you declare your SQLAlchemy mappings you need to run the following:

from sqlalchemy_bulk_lazy_loader import BulkLazyLoader
BulkLazyLoader.register_loader()

This registers the loader with sqlalchemy and makes it available on your relations by specifying lazy='bulk' in your relation mappings. For example:

class Student(db.model):
    id = db.Column(db.Integer, primary_key=True)
    school_id = db.Column(db.Integer, db.ForeignKey('school.id'))

class School(db.model):
    id = db.Column(db.Integer, primary_key=True)
    students = db.relationship('Student', lazy='bulk', backref=db.backref('school', lazy='bulk'))

And that's it! The bulk lazy loader will be used for student.school and school.students relations.

Limitations

Currently only relations on a single primary key or a simple secondary join are supported.

students = relationship('Student', lazy='bulk') # OK!
students = relationship('Student', lazy='bulk', order_by=Student.id) # OK!
student = relationship('Student', lazy='bulk', uselist=False) # OK!
students = relationship('Student', lazy='bulk', secondary=school_to_students) # OK!
students = relationship('Student', lazy='bulk', secondary=school_to_students, primaryjoin='and_(...)') # NOT SUPPORTED

Python 2 is not supported.

But I have this one case where I want to load the relations differently!

If you want to load relations in the query still using subqueryload or joinedload you can still do that - the bulk lazy loader will only kick in when it's asked for a relation on a model that isn't already loaded. If you really need fine-grained control of relation loading in a specific case you can also use attributes.set_committed_value(model, <relation_name>, <related_model/s>) to explicitly set related models. In fact this is how BulkLazyLoader works behind the scenes.

Contributing

Contributions are welcome! Create a pull request and make sure to add test coverage. Tests use the SQLAlchemy test framework and can be run with py.test.

Happy loading!

sqlalchemy_bulk_lazy_loader's People

Contributors

chanind avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.