spenceforce / cuttlepool

A resource pool implementation.

License: BSD 3-Clause "New" or "Revised" License

cuttlepool's Introduction

DEPRECATED

I don't have the time or desire to continue maintaining this project.

If somebody wants to maintain this project and continue publishing to PyPI, please reach out and I will relinquish the name cuttlepool on PyPI if you want it.

CuttlePool

CuttlePool is a general-purpose, thread-safe resource pooling implementation for use with long-lived resources and/or resources that are expensive to instantiate. Its key features are:

Pool overflow: Creates additional resources if the pool capacity has been reached and will remove the overflow when demand for resources decreases.
Resource harvesting: Any resources that haven't been returned to the pool and are no longer referenced by anything outside the pool are returned to the pool. This helps prevent pool depletion when resources aren't explicitly returned to the pool and the resource wrapper is garbage collected.
Resource queuing: If all else fails and no resource can be immediately found or made, the pool will wait a specified amount of time for a resource to be returned to the pool before raising an exception.
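
The harvesting idea can be illustrated with the standard library's weakref module. This is a minimal sketch, not CuttlePool's actual code: the Wrapper and Tracker classes and the lost() method are hypothetical names invented for the illustration. The pool side holds only a weak reference to the wrapper it handed out, so it can notice when the caller has dropped the wrapper without returning it.

```python
import gc
import weakref


class Wrapper:
    """Hypothetical stand-in for the object handed out to the caller."""

    def __init__(self, resource):
        self.resource = resource


class Tracker:
    """Hypothetical bookkeeping entry the pool keeps for each resource."""

    def __init__(self, resource):
        self.resource = resource      # strong reference: the pool owns it
        self._wrapper_ref = None      # weak reference to the wrapper

    def wrap(self):
        wrapper = Wrapper(self.resource)
        # A weak reference does not keep the wrapper alive.
        self._wrapper_ref = weakref.ref(wrapper)
        return wrapper

    def lost(self):
        # True if a wrapper was handed out and nothing references it any more.
        return self._wrapper_ref is not None and self._wrapper_ref() is None


tracker = Tracker(resource=object())
wrapper = tracker.wrap()
in_use = tracker.lost()        # False while the caller holds the wrapper

del wrapper                    # the caller drops the wrapper without closing it
gc.collect()                   # force collection on non-refcounting interpreters
harvestable = tracker.lost()   # True: the pool may reclaim the resource
```

Because the tracker keeps its own strong reference to the underlying resource, the resource survives the wrapper and can be handed out again.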

How-to Guide

Using CuttlePool requires subclassing CuttlePool and, optionally, defining the methods normalize_resource() and ping(). The example below uses mysqlclient connections as a resource, but CuttlePool is not limited to connection drivers.

>>> import MySQLdb
>>> from cuttlepool import CuttlePool
>>> class MySQLPool(CuttlePool):
...     def ping(self, resource):
...         try:
...             c = resource.cursor()
...             c.execute('SELECT 1')
...             rv = (1,) in c.fetchall()
...             c.close()
...             return rv
...         except MySQLdb.OperationalError:
...             return False
...     def normalize_resource(self, resource):
...         # For example purposes, but not necessary.
...         pass
>>> pool = MySQLPool(factory=MySQLdb.connect, db='ricks_lab', passwd='aGreatPassword')

Let's break this down line by line.

First, the MySQLdb module is imported. MySQLdb.connect will be the underlying resource factory.

CuttlePool is imported and subclassed. The ping() method is implemented; it takes a resource as a parameter and checks that the resource is functional, in this case that the MySQLdb.Connection instance is open. If the resource is functional, ping() returns True; otherwise it returns False. In the above example, a simple statement is executed, and if the expected result comes back, the resource is open and True is returned. The implementation of this method depends on the resource created by the pool, and the method may not even be necessary.

There is an additional method, normalize_resource(), that can be implemented. It takes a resource, in this case a MySQLdb.Connection instance created by MySQLdb.connect, as a parameter and changes its properties. This can be important because a resource can be modified while it's outside of the pool, and any modifications made during that time will persist; this can have unintended consequences when the resource is later retrieved from the pool. Essentially, normalize_resource() allows the resource to be set to an expected state before it is released from the pool for use. Here it does nothing (and in this case, it's not necessary to define the method), but it's shown for example purposes.
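
As an illustration of a normalize_resource() that does real work, the sketch below resets an attribute that a borrower may have changed. Everything here is hypothetical: FakeConnection, BaseCursor and DictCursor are stand-ins for a driver's classes, and ExamplePool does not inherit from CuttlePool so that the snippet runs without any driver installed. In real use the method would be defined on the CuttlePool subclass.

```python
class BaseCursor:
    """Hypothetical default cursor type."""


class DictCursor(BaseCursor):
    """Hypothetical alternative cursor type a caller might switch to."""


class FakeConnection:
    """Hypothetical resource; stands in for a real driver connection."""

    def __init__(self):
        self.cursorclass = BaseCursor


class ExamplePool:
    # In real use this method would be defined on a CuttlePool subclass.
    def normalize_resource(self, resource):
        # Undo any change a previous borrower made to the resource.
        resource.cursorclass = BaseCursor


con = FakeConnection()
con.cursorclass = DictCursor            # a borrower changes the resource
ExamplePool().normalize_resource(con)   # reset before the next borrower sees it
```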

Finally, an instance of MySQLPool is made. The MySQLdb.connect function is passed to the instance along with the database name and password.

CuttlePool, and therefore MySQLPool, accepts as keyword arguments any parameters that the underlying resource factory accepts. The pool also accepts three parameters that are unrelated to the resource factory. capacity sets the maximum number of resources the pool will hold at any given time. overflow sets the maximum number of additional resources the pool will create when depleted. An overflow resource is removed rather than kept when it is returned to a pool that is already at capacity. timeout sets the time in seconds the pool will wait for a resource to become free if the pool is depleted when a resource is requested.
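
How the three parameters interact can be sketched with a toy pool built on queue.Queue. This is an assumption-laden illustration of the semantics described above, not CuttlePool's implementation: ToyPool, its get() and put() methods, and the RuntimeError it raises are all invented for the example, and the size counter is not protected by a lock.

```python
import queue


class ToyPool:
    """Illustrative only; not CuttlePool's implementation and not thread-safe."""

    def __init__(self, factory, capacity=1, overflow=0, timeout=None):
        self._factory = factory
        self._maxsize = capacity + overflow   # hard limit on live resources
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=capacity)
        self._size = 0                        # resources created and not discarded

    def get(self):
        try:
            return self._idle.get_nowait()    # reuse an idle resource
        except queue.Empty:
            pass
        if self._size < self._maxsize:        # room left, including overflow
            self._size += 1
            return self._factory()
        try:
            # Depleted: wait up to `timeout` seconds for a resource to come back.
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise RuntimeError("pool depleted") from None

    def put(self, resource):
        try:
            self._idle.put_nowait(resource)
        except queue.Full:
            self._size -= 1                   # overflow resource: discard it


pool = ToyPool(factory=object, capacity=1, overflow=1, timeout=0.01)
first, second = pool.get(), pool.get()        # the second one is overflow
try:
    pool.get()                                # nothing comes back within the timeout
    depleted = False
except RuntimeError:
    depleted = True
pool.put(first)                               # kept: the pool has room
pool.put(second)                              # dropped: the pool is at capacity
```

After the two put() calls the toy pool holds one idle resource, matching capacity=1, and the overflow resource is gone.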

A resource from the pool can be treated the same way as an instance created by the resource factory passed to the pool. In our example a resource can be used just like a MySQLdb.Connection instance.

>>> con = pool.get_resource()
>>> cur = con.cursor()
>>> cur.execute(('INSERT INTO garage (invention_name, state) '
...              'VALUES (%s, %s)'), ('Space Cruiser', 'damaged'))
>>> con.commit()
>>> cur.close()
>>> con.close()

Calling close() on the resource returns it to the pool instead of closing it. Calling close() is not strictly necessary: the pool tracks resources, so any unreferenced resources will be collected and returned to the pool. It is still a good idea to call close(), since explicit is better than implicit.

Note

Once close() is called on the resource object, the object is useless. The resource object received from the pool is a wrapper around the actual resource; calling close() on it returns the resource to the pool and removes it from the wrapper, leaving an empty shell to be garbage collected.

To automatically "close" resources, get_resource() can be used in a with statement.

>>> with pool.get_resource() as con:
...     cur = con.cursor()
...     cur.execute(('INSERT INTO garage (invention_name, state) '
...                  'VALUES (%s, %s)'), ('Space Cruiser', 'damaged'))
...     con.commit()
...     cur.close()
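
The wrapper behavior described in the note, together with the with statement support, can be sketched as follows. ToyWrapper is a hypothetical class written for this illustration, and a plain list stands in for the pool; CuttlePool's real wrapper may differ in its details.

```python
class ToyWrapper:
    """Hypothetical wrapper; CuttlePool's real wrapper class may differ."""

    def __init__(self, resource, idle):
        self._resource = resource
        self._idle = idle                 # a list standing in for the pool

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself, so
        # everything else is forwarded to the wrapped resource.
        resource = self.__dict__.get("_resource")
        if resource is None:
            raise AttributeError(name)
        return getattr(resource, name)

    def close(self):
        if self._resource is not None:
            self._idle.append(self._resource)   # hand the resource back
            self._resource = None               # the wrapper is now an empty shell

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()                      # runs even if the block raised
        return False                      # do not swallow exceptions


idle = []
with ToyWrapper("fake-connection", idle) as con:
    shouted = con.upper()                 # forwarded to the wrapped str
```

Once the with block exits, the string is back in idle and con no longer forwards anything, which is why a closed wrapper should not be reused.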

API

The API documentation can be found on Read the Docs.

FAQ

How do I install it?

pip install cuttlepool

How do I use `cuttlepool` with sqlite3?

Don't.

SQLite does not play nice with multiple connections and threads. If you need to make concurrent writes to a database from multiple connections, consider using a database with a dedicated server like MySQL, PostgreSQL, etc.
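
One concrete symptom can be reproduced with the standard library alone. By default, Python's sqlite3 module refuses to use a connection from a thread other than the one that created it, which is exactly what a pool shared between threads would do. The snippet below is a small demonstration of that check; it does not use cuttlepool.

```python
import sqlite3
import threading

con = sqlite3.connect(":memory:")     # created in the main thread
errors = []


def use_from_other_thread():
    try:
        con.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        # Raised because the connection belongs to a different thread.
        errors.append(exc)


worker = threading.Thread(target=use_from_other_thread)
worker.start()
worker.join()
con.close()
```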

Contributing

It's highly recommended to develop in a virtualenv.

Fork the repository.

Clone the repository:

git clone https://github.com/<your_username>/cuttlepool.git

Install the package in editable mode:

cd cuttlepool
pip install -e .[dev]

Now you're set. See the next section for running tests.

Running the tests

Tests can be run with the command pytest.

Where can I get help?

If you haven't read the How-to Guide above, please do that first. Otherwise, check the issue tracker. Your issue may be addressed there, and if it isn't, please file an issue :)

cuttlepool's Issues

Update README to use different example than sqlite3

sqlite3 connections do not work across threads. Use a different example, such as some kind of socket server.

Reset cursor class on connection

A connection's cursor class can be modified while the connection is out of the pool, which can cause undesired behavior when it is later retrieved: it will be expected to create regular cursors but will not. When the connection is returned to the CuttlePool object, its cursorclass attribute should be set to the base cursor class.

Clean up bare exceptions

There are a few bare exceptions like:

try:
    ...
except:  # bare exception
    ...

These should catch more specific exceptions.
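
A bare except also catches KeyboardInterrupt and SystemExit, which is rarely intended. The sketch below shows the narrower form, using queue.Empty as the expected failure; try_get is a hypothetical helper, not a function from cuttlepool.

```python
import queue

idle = queue.Queue()


def try_get(q):
    """Return an item from q, or None if q is empty."""
    try:
        return q.get_nowait()
    except queue.Empty:      # only the expected failure is handled
        return None


result = try_get(idle)       # the queue is empty, so this is None
```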

Offset use of threading locks to queue.Queue's native support

Python's queue.Queue already implements thread locking, so any locking done by CuttlePool should be aimed at internal processes unrelated to the underlying Queue.
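
The point about queue.Queue can be checked with a short standard-library example: several threads put items into one queue with no external lock, and nothing is lost. The producer function and the item counts are arbitrary choices for the demonstration.

```python
import queue
import threading

q = queue.Queue()


def producer(start):
    for i in range(start, start + 1000):
        q.put(i)          # no external lock: Queue synchronizes internally


threads = [threading.Thread(target=producer, args=(n * 1000,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All producers have finished, so draining by qsize() is safe here.
items = sorted(q.get_nowait() for _ in range(q.qsize()))
```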

Clean up tests

Test public API and use environment variables to determine which type of sql to use.

Use ping instead of try/except in _close_connection.

Using ping will check if the connection is open and it moves the burden of proper exception handling to the user. It's impossible to determine the user's needs given any SQL driver so it's best left to them to decide what's best. Related to #21

Migrate tests to pytest

Pytest is easier to work with.

Does normalize_connection() need to be set by the user?

Is it possible to set normalize_connection() based on the original connection properties?

Improper use of RLock

Currently RLock is instantiated every time it is needed. The proper usage is for a connection pool object to have one RLock object that handles all locking instead.
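
A lock that is created afresh on every call protects nothing, because each caller acquires a different lock object. The pattern the issue asks for looks roughly like the sketch below, where SharedLockPool, grow() and grow_twice() are hypothetical names and only the locking structure matters.

```python
import threading


class SharedLockPool:
    """Hypothetical pool skeleton; only the locking pattern matters here."""

    def __init__(self):
        self._lock = threading.RLock()   # created once, shared by all methods
        self.size = 0

    def grow(self):
        with self._lock:
            self.size += 1

    def grow_twice(self):
        with self._lock:                 # re-entrant: grow() re-acquires it
            self.grow()
            self.grow()


pool = SharedLockPool()
workers = [threading.Thread(target=pool.grow_twice) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

An RLock rather than a plain Lock is what lets grow_twice() call grow() while already holding the lock.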

CuttlePool del method should call empty_pool()

It currently calls _close_connections(), which no longer exists.

Allow subclasses of PoolConnection to be used by CuttlePool.

A CuttlePool instance should be able to accept a subclass of PoolConnection for use on all calls to get_connection(). get_connection() should also accept a subclass of PoolConnection to supersede the default connection wrapper.

Add more drivers to test suite.

Internally track connections?

Keep references to all connections in CuttlePool object whether they are in the queue or not?

Would make it easier to prevent improper things being passed in and would simplify dealing with the size increments/decrements.

Is the sqlite3 option "check_same_thread" not supported?

I want to use cuttlepool in a multi-threaded environment, but it doesn't seem to support the sqlite3 option "check_same_thread=False".
Am I wrong, or do I have to look for another library?
self.pool = SQLitePool(factory=sqlite3.connect, capacity=4, database='/mnt/config/test.db', isolation_level=None, check_same_thread=False)

It is not working at all.

Connection arguments should be modifiable.

To allow connection arguments to be modified, a mechanism has to be in place to close incoming connections instead of recycling them.

Give option for user to default ping to True

Change directions for running tests

With mock sql connection objects, the previous testing instructions are invalid.

Fix tutorial paragraph about `normalize_connection()`

Here's the paragraph:

CuttlePool is imported and subclassed. The normalize_connection() method takes a Connection object as a parameter and changes it's properties. This is important because a Connection object can be modified while it's outside of the pool and any modifications made during that time

The final sentence is unfinished.

Advise forking project instead of cloning directly.

Add API to README

Make default values global

A mild change for potentially improved usability in development (for cuttlepool or extensions).

Write Docs

Context Manager functionality for getting connections.

Write README

Update changelog for best practices

Follow guidelines here: http://keepachangelog.com/en/1.0.0/

Make _close_connections() a public method

Make Cuttle Pool sql driver independent

Design cuttle pool to work with any sql driver.

Add repr methods

Race-Condition

The class CuttlePool has a race condition in its get_resource method.

Setup:

the pool contains one available resource
no resources are in use
two threads are using the pool

Sequence of events:

thread-1:: calls "get_resource".
thread-1:: the method "_get()" returns the object "_ResourceTracker-1" and moves this object to the part of the "_reference_queue" that contains the resources that are in use.
thread-1:: calls "self.ping" using the resource from "_ResourceTracker-1" as argument. Note: at this point in time a call to the method "available()" of the object "_ResourceTracker-1" returns "True" because the "weakref" to the wrapped resource has not yet been established. This happens later in "get_resource" with a call to the method "wrap_resource".
thread-2:: calls "get_resource".
thread-2:: the pool's "empty()" returns "True", so "_harvest_lost_resources" is called to look for resources that haven't been properly returned to the pool.
thread-2:: "_harvest_lost_resources" loops over the part of the "_reference_queue" that contains the "_ResourceTracker" objects of resources that are in use. It finds "_ResourceTracker-1" and calls its "available" method, which returns "True". The method "_harvest_lost_resources" then returns the object "_ResourceTracker-1" to the part of the "_reference_queue" that contains the available resources.
thread-2:: the method "_get()" returns "_ResourceTracker-1".

As a result, both threads are using the same resource.
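
The core of the report can be replayed sequentially, without real threads. The sketch below is a hypothetical simplification: the Tracker class borrows the method names available() and wrap_resource() from the issue text, but its internals are assumed rather than copied from cuttlepool. It shows that a tracker whose weak reference has not been set yet looks identical to one whose wrapper was garbage collected.

```python
import weakref


class Wrapper:
    """Hypothetical stand-in for the object handed to the caller."""

    def __init__(self, resource):
        self.resource = resource


class Tracker:
    """Hypothetical simplification of the _ResourceTracker described above."""

    def __init__(self, resource):
        self.resource = resource
        self._weakref = None

    def available(self):
        # Mirrors the reported behavior: with no weakref yet, report "available".
        return self._weakref is None or self._weakref() is None

    def wrap_resource(self):
        wrapper = Wrapper(self.resource)
        self._weakref = weakref.ref(wrapper)
        return wrapper


tracker = Tracker(resource=object())

# thread-1 has taken the tracker via _get() but has not called wrap_resource() yet.
seen_by_thread_2 = tracker.available()     # harvesting would reclaim it here

wrapper_1 = tracker.wrap_resource()        # thread-1 finishes get_resource()
wrapper_2 = tracker.wrap_resource()        # thread-2 was handed the same tracker
shared = wrapper_1.resource is wrapper_2.resource
```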