
redis-collections's Introduction

Redis Collections

redis-collections is a Python library that provides a high-level interface to Redis, the excellent key-value store.

As of 2024, this project is retired. This repository will remain available as a public archive.

Quickstart

Import the collections from the top-level redis_collections package.

Standard collections

The standard collections (e.g. Dict, List, Set) behave like their Python counterparts:

>>> from redis_collections import Dict, List, Set

>>> D = Dict()
>>> D['answer'] = 42
>>> D['answer']
42
| Collection  | Redis type | Description                                |
|-------------|------------|--------------------------------------------|
| Dict        | Hash       | Emulates Python's dict                     |
| List        | List       | Emulates Python's list                     |
| Set         | Set        | Emulates Python's set                      |
| Counter     | Hash       | Emulates Python's collections.Counter      |
| DefaultDict | Hash       | Emulates Python's collections.defaultdict  |
| Deque       | List       | Emulates Python's collections.deque        |
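
Each collection also accepts an existing connection and a key (the redis and key parameters referenced throughout the issues below), so the same data can be opened from another process. A minimal sketch, with a made-up key name:

import redis
from redis_collections import Dict

# Connect to a specific server and bind the collection to a known key,
# so another client can open the same Dict later.
conn = redis.StrictRedis(host='localhost', port=6379, db=0)
D = Dict(key='shared-answers', redis=conn)
D['answer'] = 42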

Syncable collections

The syncable collections in this package keep their contents in memory and write them to Redis when their sync method is called:

>>> from redis_collections import SyncableDict

>>> with SyncableDict() as D:
...     D['a'] = 1  # No write to Redis
...     D['a'] += 1  # No read from or write to Redis
>>> D['a']  # D.sync() is called at the end of the with block
2
| Collection          | Python type             | Description           |
|---------------------|-------------------------|-----------------------|
| SyncableDict        | dict                    | Syncs to a Redis Hash |
| SyncableList        | list                    | Syncs to a Redis List |
| SyncableSet         | set                     | Syncs to a Redis Set  |
| SyncableCounter     | collections.Counter     | Syncs to a Redis Hash |
| SyncableDeque       | collections.deque       | Syncs to a Redis List |
| SyncableDefaultDict | collections.defaultdict | Syncs to a Redis Hash |

Other collections

The LRUDict collection stores recently used items in memory and pushes older items to Redis:

>>> from redis_collections import LRUDict

>>> D = LRUDict(maxsize=2)
>>> D['a'] = 1
>>> D['b'] = 2
>>> D['c'] = 3  # 'a' is pushed to Redis and 'c' is stored locally
>>> D['a']  # 'b' is pushed to Redis and 'a' is retrieved for local storage
1
>>> D.sync()  # All items are copied to Redis

The SortedSetCounter provides access to the Redis Sorted Set type:

>>> from redis_collections import SortedSetCounter

>>> ssc = SortedSetCounter([('earth', 300), ('mercury', 100)])
>>> ssc.set_score('venus', 200)
>>> ssc.get_score('venus')
200.0
>>> ssc.items()
[('mercury', 100.0), ('venus', 200.0), ('earth', 300.0)]

Documentation

For more information, see redis-collections.readthedocs.io

License: ISC

© 2016-2024 Bo Bayles <[email protected]> and contributors
© 2013-2016 Honza Javorek <[email protected]> and contributors

This work is licensed under ISC license.

This library is not affiliated with Redis Labs, Redis, or redis-py. Govern yourself accordingly!

redis-collections's People

Contributors

bbayles, doc-hex, happyholic1203, honzajavorek, icecrime, lionelnicolas, mjschultz


redis-collections's Issues

I'm working on the same idea...

Hello, folks!

I just now found Redis Collections, which is crazy, because I'm working on the same idea. I call my package Pottery. I see that you support Python 2.6+, while I'm focused on Python 3.4+. You had this idea first, but I came up with it independently. Is there any way for us to collaborate?

Thanks, and keep up the great work!
Raj

Return normal Python objects for methods that make copies

This might be controversial, but I'd like to change the behavior of the collections such that methods that return "new" collections create normal Python objects rather than new Redis collections. For example, slicing a list creates a new Redis-backed List, but I'd like it to create a standard list.

My reasoning is that you can't set the redis, key, or writeback from these methods, and in my experience you more often want to iterate over a slice than store it permanently.

For methods like copy where it's possible to pass in kwargs I would keep the current behavior, but for things that use the slicing or arithmetic operators (not the in-place ones) I'd return the Python version.

I'd rather not make this a per-instance setting, but I could compromise on that.

Redis version support policy

This issue is to clarify what Redis features are supported by the library.

The goals are:

  • Add a note to documentation about which Redis versions are supported
  • Add workarounds for behaviors that don't work on old-but-still-supported Redis versions

My inclination is that the oldest supported version of Redis will be the one provided by the oldest supported Ubuntu LTS release.

As of this writing, that release is 12.04, and it provides Redis 2.2.

In 2017, 14.04 will be the oldest supported Ubuntu LTS release, and it provides Redis 2.8.


Of the commands currently used by the library, these are the ones that have additional features since 2.0.0 (when the HASH data type was introduced):

| Command     | Available since | Changes                  |
|-------------|-----------------|--------------------------|
| hdel        | 2.0.0           | Multi-delete since 2.4.0 |
| lpush       | 1.0.0           | Multi-push since 2.4.0   |
| rpush       | 1.0.0           | Multi-push since 2.4.0   |
| sadd        | 1.0.0           | Multi-add since 2.4.0    |
| srandmember | 1.0.0           | Count since 2.6          |
| srem        | 1.0.0           | Multi-remove since 2.4.0 |

Of these, the library only relies on newer-than-2.0.0 behavior for Set.random_sample (the srandmember count argument, available since 2.6). It should be possible to backport that to older versions in Python, as sketched below.
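
One possible backport, sketched here as a hypothetical helper (fetching everything is expensive, but it works on any Redis version):

import random

def random_sample_compat(redis_client, key, k=1):
    # Fallback for Redis < 2.6, where SRANDMEMBER takes no count
    # argument: fetch all members and sample client-side.
    members = list(redis_client.smembers(key))
    return random.sample(members, min(k, len(members)))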

Suggest testing against standard objects

The tests could be generalized and improved if they compared results against the standard (non-Redis) implementations of these objects.

For example, the List class should behave the same as the built-in list class, at least for all operations it supports. So if the same methods are called with the same arguments, the results should be the same.

This would have exposed the bugs in the List.insert implementation, fixed in a recent pull request.
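
A sketch of what such a parity test could look like, assuming List accepts initial data as the built-in does:

import unittest
from redis_collections import List

class ListParityTest(unittest.TestCase):
    # Run identical operations on both types and assert identical results.

    def test_getitem_matches_builtin(self):
        redis_list = List(['a', 'b', 'c', 'd'])
        python_list = ['a', 'b', 'c', 'd']
        for index in (-4, -1, 0, 3):
            self.assertEqual(redis_list[index], python_list[index])

    def test_append_matches_builtin(self):
        redis_list, python_list = List(), []
        for value in (1, 2, 3):
            redis_list.append(value)
            python_list.append(value)
        self.assertEqual(list(redis_list), python_list)

if __name__ == '__main__':
    unittest.main()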

Implement and test equality consistently

I think we want Lists to compare equal to lists, Dicts to compare equal to dicts, etc., provided their elements are the same.

This issue is to make sure that's the case and add test cases for it.
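
Concretely, the invariant to enforce and cover with tests might read like this (assuming the constructors accept initial data, like their Python counterparts):

from redis_collections import Dict, List

# Redis-backed collections should compare equal to plain Python
# collections with the same elements.
assert List([1, 2, 3]) == [1, 2, 3]
assert Dict({'a': 1}) == {'a': 1}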

synchronization among multiple instances

Not a bug, but a feature request.

Assume two instances (A and B) are running.

A:

L = List(key=..., redis=...)

B:

L = List(key=..., redis=...)

If A is updated and synced to Redis, the B instance will not update; it can only be refreshed by reinitializing:

L = List(key=..., redis=...)

Is there an easier way to achieve this? e.g. with a method:

L.sync_from_db()

def sync_from_db(self):
    return self.__init__(key=self.key, redis=self.redis)

I feel that the functional form is easier to read and understand, and is more concise syntax-wise. Thanks

version mismatch with redis

Version 0.7.1 has the following requirement on redis:

redis<3.4.0,>=3.1.0

Since redis is currently at 3.4.1, this breaks builds.

List.__getitem__ creating random keys

When we access List.__getitem__, it does return the correct result, but it also generates a random key in the database. I traced the root cause to redis_collections/base.py:136. The section reads:

settings = {
    'key': key,
    'redis': self.redis,
    'pickler': self.pickler,
}

This is called by slice_trans() in lists.py at line 122. The calling function does not pass the key, which should have been self.key.

Support SCAN for iterating over collection elements

Some collection operations, such as iterating over a List, require retrieving everything stored in Redis into memory.

redis-py provides Pythonic support for the Redis SCAN commands, which allow for incremental iteration.
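
For example, redis-py's sscan_iter helper yields set members in chunks rather than materializing everything at once (the key name here is made up):

import redis

conn = redis.StrictRedis()

# sscan_iter wraps SSCAN, so iteration never holds the whole set in
# memory; members may repeat if the set changes during the scan.
for member in conn.sscan_iter('example-set-key', count=100):
    print(member)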


The Redis docs warn:

A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times.

If that limitation is difficult to work around I may postpone this feature.


The scan features were added in Redis 2.8, so per #74, this will not replace the current __iter__ methods until 2019 at the earliest.

1.0 and 1 shouldn't both be allowed in a Set

I fixed this for the Dict classes, but it still affects the Set class as well.

1.0 and 1 have the same hash and are equal to each other, but they pickle to different values. This means that when they're stored in Redis they're kept as separate members.

>>> from redis_collections import Set
>>> for name, init in (('redis', Set), ('python', set)):
...     s = init()
...     s.add(1.0)
...     s.add(1)
... 
...     t = init()
...     t.add(1)
...     t.add(1.0)
...     
...     print(name, list(s), list(t), sep='\t')
redis   [1.0, 1]    [1.0, 1]
python  [1.0]   [1]

On Python 2 the same problem exists with u'a' and b'a' in the same Set.

The trick that worked with Dict won't work for Set, so this issue is for figuring out a workaround. One idea is to let Redis do its thing but to normalize values on the way back to Python; however, this makes pop problematic. Another would be to special-case the pickling of floats and unicode strings, as sketched below.

If anyone knows any other examples of things that have equal hashes but pickle to different things let me know!
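
To illustrate the float special-casing idea, a minimal sketch (not a complete fix; u'a' vs. b'a' on Python 2 would need similar treatment):

import pickle

def normalized_dumps(value, protocol=2):
    # Integer-valued floats are pickled as ints, so 1 and 1.0 serialize
    # identically and collapse to a single Redis member. The tradeoff:
    # 1.0 comes back as 1, which still differs from Python semantics
    # (a Python set keeps whichever of the two was added first).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return pickle.dumps(value, protocol)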

Manually-synchronized collections

The collections currently available always write to Redis when a change is made (even when using a local cache for Dict and List instances). This means that an application that quits unexpectedly can retrieve its collections afterward (if it knows their key), but it also means that writes are much slower than they would be for a local collection.

This issue is to explore and (probably) implement versions of the current collections that normally use in-memory Python objects, but can sync their contents to Redis on demand.

Possibly something like this recipe, but potentially something simpler.


API: Something like this, I think?

from redis_collections import SyncDict

# Load contents from 'some_key' into memory
with SyncDict(key='some_key') as D:
    # Retrieve something from an earlier session from memory
    previous_value = D.get('previous_key')
    # Store something in memory, not to Redis
    D['new_key'] = 'new_value'

# Changes are automatically written to Redis
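
A rough sketch of how the machinery could work (the class name comes from the proposal above; everything else is assumed):

import pickle
import redis

class SyncDict(dict):
    # A plain dict that loads from a Redis hash on creation and writes
    # everything back on sync().

    def __init__(self, key, redis_client=None):
        self.key = key
        self.redis = redis_client or redis.StrictRedis()
        stored = self.redis.hgetall(self.key)  # earlier session, if any
        super(SyncDict, self).__init__(
            (pickle.loads(k), pickle.loads(v)) for k, v in stored.items()
        )

    def sync(self):
        # Replace the hash contents in a single pipeline round trip
        pipe = self.redis.pipeline()
        pipe.delete(self.key)
        for k, v in self.items():
            pipe.hset(self.key, pickle.dumps(k), pickle.dumps(v))
        pipe.execute()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.sync()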

Naming: Are SyncDict, SyncList, SyncSet good names? Would LocalDict, etc. be better? PersistentDict, etc?

Complete slicing support for List

Currently we raise NotImplementedError for:

  • Setting a slice to a value
  • Deleting a slice with a step
  • Deleting a slice out of the middle of a list
  • Deleting an item from the middle of a list
  • Inserting an item not at the beginning of a list
  • Popping from the middle of a list

This issue is to implement these, in at least an inefficient way, such that they no longer raise an exception. A sketch of one approach follows.
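
For slice assignment, for instance, the strategy could look like this hypothetical helper (assumes List accepts initial data; a real implementation on List itself would wrap the rewrite in a transaction):

from redis_collections import List

def set_slice(redis_list, sl, values):
    # Materialize the list in Python, apply the slice assignment there,
    # then write everything back.
    items = list(redis_list)
    items[sl] = list(values)
    while len(redis_list):
        redis_list.pop()  # popping from the end is supported
    redis_list.extend(items)

L = List([0, 1, 2, 3, 4])
set_slice(L, slice(1, 4), ['a', 'b'])  # L is now [0, 'a', 'b', 4]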

JSON support?

Hi - is there an easy way to serialize data using JSON instead of pickle? One major motivation for this is that pickle embeds class names, which broke some things when I moved a file.

If it's not supported, any thoughts on difficulty adding? If I wrote a PR, would the repo owners be willing to merge it and make it part of the official package?

Thanks.
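
It may already be close to possible: base.py (quoted in the List.__getitem__ issue above) passes a pickler setting around, so any object exposing dumps and loads could plausibly be swapped in. A sketch, with the constructor argument assumed rather than confirmed:

import json

class JSONPickler(object):
    # Mimics the pickle module's dumps/loads interface but produces
    # JSON. Only JSON-serializable values work: no arbitrary classes,
    # and tuples come back as lists.

    @staticmethod
    def dumps(value):
        return json.dumps(value).encode('utf-8')

    @staticmethod
    def loads(data):
        return json.loads(data.decode('utf-8'))

# Assumed usage:
# D = Dict(pickler=JSONPickler)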

Regroup classes in modules

Put all sets in one module sets, dicts into dicts, etc. Do not forget to adjust tests and documentation.

0.3.0 release plan

My plan for the next release:

  • Since it will involve breaking changes, it will be 0.3.0
  • Work will be done on the 0.3.0 branch. PRs targeting the 0.3.0 issues will be merged into that branch
  • Once it's time for a release the 0.3.0 branch will be merged into master

Also:

  • Bug fixes for 0.2.x will be merged into master and released to PyPI until 0.3.0 is ready.
  • After 0.3.0 is out bug fixes will go into 0.3.x releases
  • The next release with breaking changes will be 0.4.0
  • At some point we may call a release "1.0" - after that we'll use semantic versioning - major version increments for breaking changes, minor version increments for backward-compatible additions, and patch version increments for bug fixes.

Mutable values cache for List

#26 applies to Dict and mutable values, but something similar can happen with List.

>>> from redis_collections import List
>>> redis_list = List()
>>> python_list = []
>>> redis_list.append({'one': 1})
>>> python_list.append({'one': 1})
>>> redis_list[0]['one'] = 2
>>> python_list[0]['one'] = 2
>>> print(list(redis_list), python_list, sep='\t')
[{'one': 1}]    [{'one': 2}]

This issue is to add the Dict-style writeback and context manager features for List.

Design notes:

  • The cache should be a Python dict rather than a Python list so the whole list doesn't need to be in memory
  • Using a dict makes negative indexing a problem; specifying a negative index should retrieve the correct thing from the cache (see the sketch below)
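
The negative-index point could be handled by normalizing indexes before touching the cache, e.g.:

def normalize_index(index, length):
    # Map a negative index to its non-negative equivalent so the
    # dict-based cache has exactly one key per list position.
    if index < 0:
        index += length
    if not 0 <= index < length:
        raise IndexError('list index out of range')
    return index

# normalize_index(-1, 5) == 4, so lookups for -1 and 4 hit the same
# cache entry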

Allow custom key

  • A custom key for a collection, so it can be used by other code that accesses Redis
  • The auto-generated ID is maybe just a redundant layer of complexity

Loss of performance after X number of items in dict?

I ran the following code:

import time

import redis
from redis_collections import SyncableDict

con = redis.StrictRedis(host='localhost', port=6379, db=0)

writes = [
    10,
    100,
    1000,
    10000,
    100000,
    1000000,
    100000,
    10000,
    1000,
    100,
    10,
]

def test(c, w):
    t = time.time()
    data = SyncableDict(key='test', redis=c, writeback=True)

    with data as d:
        for i in range(w):
            d[i] = 'test%d' % i
    print('writes: %7d time:   %8f' % (w, time.time() - t))

for wr in writes:
    test(con, wr)

and got the following stats:

writes:      10 time:   1.0080
writes:     100 time:   0.0050
writes:    1000 time:   0.0360
writes:   10000 time:   0.3300
writes:  100000 time:   3.3900
writes: 1000000 time:   34.9170
writes:  100000 time:   2.1360
writes:   10000 time:   14.4460
writes:    1000 time:   14.6300
writes:     100 time:   14.4970
writes:      10 time:   14.5950

If I run the code again against the same key test, I get the following stats:

writes:      10 time:   15.7820
writes:     100 time:   14.7450
writes:    1000 time:   14.7040
writes:   10000 time:   14.7170
writes:  100000 time:   14.7690
writes: 1000000 time:   35.4770
writes:  100000 time:   2.1710
writes:   10000 time:   14.7980
writes:    1000 time:   14.7320
writes:     100 time:   14.8530
writes:      10 time:   14.7780

I get the same speeds with .get() after there are ~1m keys in the SyncableDict.
I guess that after a certain number of key:val pairs in a SyncableDict, its performance deteriorates?
When I try to write 10m, Python's memory usage goes into the gigabytes and it never finishes.
Is there a limit on the number of keys, or am I doing it wrong?
My Python version is Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32

Better repr

Something like

>>> repr(d)
"{'a': 1, 'b': 2}"

Calling Redis to build the repr doesn't matter, since repr is used mostly in the shell or during debugging.

Find maintainers

I don't currently have the time (or motivation) to maintain redis-collections much, not even for PR reviews, which is pretty sad, so I'm looking for a maintainer. It wouldn't be much hassle: there are just minor bugs with existing PRs, and the only major feature is Python 3 support, which also has a PR in progress. If anyone feels like taking over this library, it would be much appreciated.

Inconsistent pickling causes problems in Set and Dict classes (Python 2)

The output of pickle.dumps can cause problems when checking whether an element is present in a Set or a key is present in a Dict.

# Python 2.7.11
>>> from redis_collections import Set
>>> s = Set()
>>> key = (1, u'foo')
>>> s.add((1, u'foo'))
True
>>> (1, u'foo') in s
True
>>> key in s
False

This has been a problem for Set since forever, and Dict since my changes in #65. As far as I can tell it's not a problem in Python 3.


Python Issue 5518 discusses how the pickle module can produce different output for seemingly identical inputs, and that this is not a bug. The gist is that pickle.dumps is not suitable for hashing.

This blog post illustrates the problem and describes a solution. Unfortunately, it's the same solution I was using in #45, which is (a) not suitable for Set, and (b) not suitable for Python 3.3+ when used with multiple processes.

I'm not sure how to fix this! I don't think the built-in shelve module has this issue, so I'll try to see what it does.

Redis Cluster Support

Opening a ticket for providing Redis cluster support. Will work on this in the meantime.

Defaultdict support?

Similar line of questions to #112: does this library have any plans for supporting defaultdict? If I added it, would the owners be willing to merge a PR and make it official? Thanks.

Version 0.5.x should document changes in default pickle protocol

When using it with ElastiCache Redis, 0.5.0 no longer works. I pinned my application to 0.4.2 to keep everything working.
A simple Dict instance didn't return any value, whether using square brackets or .get().
However, when traversing the dictionary with a for loop, the keys are returned, and len() returns the correct size of the dictionary.

High-level interface for GEO commands

I'd like to include a means of accessing the Redis 3.2+ GEO commands with redis-collections. My thought is to introduce a GeoDB class that provides a high-level interface.

I have some test code in the geodb branch, and I'm interested in whether any watchers or users have suggestions for what might be useful here.


My current implementation allows for setting and retrieving places:

>>> from redis_collections import GeoDB
>>> geodb = GeoDB()

# Adding items to the DB
>>> geodb.set_location('St. Louis', 38.6270, -90.1994)
>>> geodb.set_location('Bahia', -11.4099, -41.2809)
>>> geodb.set_location('Berlin', 52.5200, 13.4050)
>>> geodb.set_location('Sydney', -33.8562, 151.2153)

# Retrieving an item from the DB
>>> geodb.get_location('St. Louis')
{u'latitude': 38.62699975742192, u'longitude': -90.19939810037613}

It allows for distance computations with different units:

>>> geodb.distance_between('St. Louis', 'Bahia')  # Default unit is km
7528.1327
>>> geodb.distance_between('St. Louis', 'Bahia', unit='m')
7528132.7209

It allows for searches within an area, by place or by location:

>>> geodb.places_within_radius(place='St. Louis', radius=7600)
[{u'distance': 0.0,
  u'latitude': 38.62699975742192,
  u'longitude': -90.19939810037613,
  u'place': 'St. Louis',
  u'unit': u'km'},
 {u'distance': 7501.6718,
  u'latitude': 52.51999907056681,
  u'longitude': 13.405002057552338,
  u'place': 'Berlin',
  u'unit': u'km'},
 {u'distance': 7528.1327,
  u'latitude': -11.409899017576471,
  u'longitude': -41.280899941921234,
  u'place': 'Bahia',
  u'unit': u'km'}]
>>> geodb.places_within_radius(latitude=38.6, longitude=-90.2, radius=200, count=1)
[{u'distance': 3.0035,
  u'latitude': 38.62699975742192,
  u'longitude': -90.19939810037613,
  u'place': 'St. Louis',
  u'unit': u'km'}]
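
For reference, a rough sketch (class and method bodies hypothetical; the actual geodb branch may differ) of how these operations map onto the raw GEO commands:

import redis

class GeoDBSketch(object):
    def __init__(self, key='geodb', redis_client=None):
        self.key = key
        self.redis = redis_client or redis.StrictRedis()

    def set_location(self, place, latitude, longitude):
        # Note that GEOADD takes longitude before latitude
        self.redis.execute_command(
            'GEOADD', self.key, longitude, latitude, place)

    def get_location(self, place):
        longitude, latitude = self.redis.execute_command(
            'GEOPOS', self.key, place)[0]
        return {'latitude': float(latitude), 'longitude': float(longitude)}

    def distance_between(self, place_1, place_2, unit='km'):
        return float(self.redis.execute_command(
            'GEODIST', self.key, place_1, place_2, unit))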

Open questions:

  • Add __getitem__ and __setitem__ support so this can be used like a Dict?
    • I didn't do this with SortedSetCounter because it's not clear whether ssc[thing] should index by rank or retrieve by key.
  • Should places_within_radius accept positional args (two means place and radius, three means latitude, longitude, and radius)?
  • Should places_within_radius return a list of dicts or a list of namedtuples? Or something else?
  • Should get_location return a dict or a namedtuple?

nested dict?

Just to be sure, is this supposed to work:

d = Dict()
d['a'] = {}
d['a']['b'] = 'c'

It seems that it should not, but I've been obtaining conflicting results (still trying to figure out why exactly), and the doc mentions nested Redis Collections, but nothing explicitly about nested regular data structures.
