
redis-collections's Introduction

Redis Collections

redis-collections is a Python library that provides a high-level interface to Redis, the excellent key-value store.

As of 2024, this project is retired. This repository will remain available as a public archive.

Quickstart

Import the collections from the top-level redis_collections package.

Standard collections

The standard collections (e.g. Dict, List, Set) behave like their Python counterparts:

>>> from redis_collections import Dict, List, Set

>>> D = Dict()
>>> D['answer'] = 42
>>> D['answer']
42
| Collection  | Redis type | Description                                |
|-------------|------------|--------------------------------------------|
| Dict        | Hash       | Emulates Python's dict                     |
| List        | List       | Emulates Python's list                     |
| Set         | Set        | Emulates Python's set                      |
| Counter     | Hash       | Emulates Python's collections.Counter      |
| DefaultDict | Hash       | Emulates Python's collections.defaultdict  |
| Deque       | List       | Emulates Python's collections.deque        |
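
Each collection also accepts an existing connection and a key (the redis and key parameters referenced throughout the issues below), so the same data can be opened from another process. A minimal sketch, with a made-up key name:

import redis
from redis_collections import Dict

# Connect to a specific server and bind the collection to a known key,
# so another client can open the same Dict later.
conn = redis.StrictRedis(host='localhost', port=6379, db=0)
D = Dict(key='shared-answers', redis=conn)
D['answer'] = 42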

Syncable collections

The syncable collections in this package keep their contents in memory and write them to Redis when their sync method is called:

>>> from redis_collections import SyncableDict

>>> with SyncableDict() as D:
...     D['a'] = 1  # No write to Redis
...     D['a'] += 1  # No read from or write to Redis
>>> D['a']  # D.sync() is called at the end of the with block
2
| Collection          | Python type             | Description           |
|---------------------|-------------------------|-----------------------|
| SyncableDict        | dict                    | Syncs to a Redis Hash |
| SyncableList        | list                    | Syncs to a Redis List |
| SyncableSet         | set                     | Syncs to a Redis Set  |
| SyncableCounter     | collections.Counter     | Syncs to a Redis Hash |
| SyncableDeque       | collections.deque       | Syncs to a Redis List |
| SyncableDefaultDict | collections.defaultdict | Syncs to a Redis Hash |

Other collections

The LRUDict collection stores recently used items in memory and pushes older items to Redis:

>>> from redis_collections import LRUDict

>>> D = LRUDict(maxsize=2)
>>> D['a'] = 1
>>> D['b'] = 2
>>> D['c'] = 3  # 'a' is pushed to Redis and 'c' is stored locally
>>> D['a']  # 'b' is pushed to Redis and 'a' is retrieved for local storage
1
>>> D.sync()  # All items are copied to Redis

The SortedSetCounter provides access to the Redis Sorted Set type:

>>> from redis_collections import SortedSetCounter

>>> ssc = SortedSetCounter([('earth', 300), ('mercury', 100)])
>>> ssc.set_score('venus', 200)
>>> ssc.get_score('venus')
200.0
>>> ssc.items()
[('mercury', 100.0), ('venus', 200.0), ('earth', 300.0)]

Documentation

For more information, see redis-collections.readthedocs.io

License: ISC

© 2016-2024 Bo Bayles <[email protected]> and contributors
© 2013-2016 Honza Javorek <[email protected]> and contributors

This work is licensed under ISC license.

This library is not affiliated with Redis Labs, Redis, or redis-py. Govern yourself accordingly!

redis-collections's People

Contributors

bbayles, doc-hex, happyholic1203, honzajavorek, icecrime, lionelnicolas, mjschultz


redis-collections's Issues

I'm working on the same idea...

Hello, folks!

I just now found Redis Collections, which is crazy, because I'm working on the same idea. I call my package Pottery. I see that you support Python 2.6+, while I'm focused on Python 3.4+. You had this idea first, but I came up with it independently. Is there any way for us to collaborate?

Thanks, and keep up the great work!
Raj

Return normal Python objects for methods that make copies

This might be controversial, but I'd like to change the behavior of the collections such that methods that return "new" collections create normal Python objects rather than new Redis collections. For example, slicing a list creates a new Redis-backed List, but I'd like it to create a standard list.

My reasoning is that you can't set the redis, key, or writeback from these methods, and in my experience you more often want to iterate over a slice than store it permanently.

For methods like copy where it's possible to pass in kwargs I would keep the current behavior, but for things that use the slicing or arithmetic operators (not the in-place ones) I'd return the Python version.

I'd rather not make this a per-instance setting, but I could compromise on that.

Redis version support policy

This issue is to clarify what Redis features are supported by the library.

The goals are:

  • Add a note to documentation about which Redis versions are supported
  • Add workarounds for behaviors that don't work on old-but-still-supported Redis versions

My inclination is that the oldest supported version of Redis will be the one provided by the oldest supported Ubuntu LTS release.

As of this writing, that release is 12.04, and it provides Redis 2.2.

In 2017, 14.04 will be the oldest supported Ubuntu LTS release, and it provides Redis 2.8.


Of the commands currently used by the library, these are the ones that have additional features since 2.0.0 (when the HASH data type was introduced):

| Command     | Available since | Changes                  |
|-------------|-----------------|--------------------------|
| hdel        | 2.0.0           | Multi-delete since 2.4.0 |
| lpush       | 1.0.0           | Multi-push since 2.4.0   |
| rpush       | 1.0.0           | Multi-push since 2.4.0   |
| sadd        | 1.0.0           | Multi-add since 2.4.0    |
| srandmember | 1.0.0           | Count since 2.6          |
| srem        | 1.0.0           | Multi-remove since 2.4.0 |

Of these, the library only relies on newer-than-2.0.0 behavior for Set.random_sample (the srandmember count argument, available since 2.6). It should be possible to backport that to older versions in Python, as sketched below.
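
One possible backport, sketched here as a hypothetical helper (fetching everything is expensive, but it works on any Redis version):

import random

def random_sample_compat(redis_client, key, k=1):
    # Fallback for Redis < 2.6, where SRANDMEMBER takes no count
    # argument: fetch all members and sample client-side.
    members = list(redis_client.smembers(key))
    return random.sample(members, min(k, len(members)))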

Suggest testing against standard objects

The tests could be generalized and improved if they compared results against the standard (non-Redis) implementations of these objects.

For example, the List class should behave the same as the built-in list class, at least for all operations it supports. So if the same methods are called with the same arguments, the results should be the same.

This would have exposed the bugs in the List.insert implementation, fixed in a recent pull request.
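
A sketch of what such a parity test could look like, assuming List accepts initial data as the built-in does:

import unittest
from redis_collections import List

class ListParityTest(unittest.TestCase):
    # Run identical operations on both types and assert identical results.

    def test_getitem_matches_builtin(self):
        redis_list = List(['a', 'b', 'c', 'd'])
        python_list = ['a', 'b', 'c', 'd']
        for index in (-4, -1, 0, 3):
            self.assertEqual(redis_list[index], python_list[index])

    def test_append_matches_builtin(self):
        redis_list, python_list = List(), []
        for value in (1, 2, 3):
            redis_list.append(value)
            python_list.append(value)
        self.assertEqual(list(redis_list), python_list)

if __name__ == '__main__':
    unittest.main()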

Implement and test equality consistently

I think we want Lists to compare equal to lists, Dicts to compare equal to dicts, etc., provided their elements are the same.

This issue is to make sure that's the case and add test cases for it.
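
Concretely, the invariant to enforce and cover with tests might read like this (assuming the constructors accept initial data, like their Python counterparts):

from redis_collections import Dict, List

# Redis-backed collections should compare equal to plain Python
# collections with the same elements.
assert List([1, 2, 3]) == [1, 2, 3]
assert Dict({'a': 1}) == {'a': 1}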

synchronization among multiple instances

Not a bug, but a feature request.

Assume two instances (A and B) are running.

A:

L = List(key=..., redis=...)

B:

L = List(key=..., redis=...)

If A is updated and synced to Redis, the B instance will not update; it can only be refreshed by reinitializing:

L = List(key=..., redis=...)

Is there an easier way to achieve this? e.g. with a method:

L.sync_from_db()

def sync_from_db(self):
    return self.__init__(key=self.key, redis=self.redis)

I feel that the functional form is easier to read and understand, and is more concise syntax-wise. Thanks

version mismatch with redis

Version 0.7.1 has the following requirement on redis:

redis<3.4.0,>=3.1.0

Since redis is currently at 3.4.1, this breaks builds.

List.__getitem__ creating random keys

When we access List.__getitem__, it does return the correct result, but it also generates a random key in the database. I traced the root cause to redis_collections/base.py:136. The section reads:

settings = {
    'key': key,
    'redis': self.redis,
    'pickler': self.pickler,
}

This is called by slice_trans() in lists.py at line 122. The calling function does not pass the key, which should have been self.key.

Support SCAN for iterating over collection elements

Some collection operations, such as iterating over a List, require retrieving everything stored in Redis into memory.

redis-py provides Pythonic support for the Redis SCAN commands, which allow for incremental iteration.
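
For example, redis-py's sscan_iter helper yields set members in chunks rather than materializing everything at once (the key name here is made up):

import redis

conn = redis.StrictRedis()

# sscan_iter wraps SSCAN, so iteration never holds the whole set in
# memory; members may repeat if the set changes during the scan.
for member in conn.sscan_iter('example-set-key', count=100):
    print(member)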


The Redis docs warn:

A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times.

If that limitation is difficult to work around I may postpone this feature.


The scan features were added in Redis 2.8, so per #74, this will not replace the current __iter__ methods until 2019 at the earliest.

1.0 and 1 shouldn't both be allowed in a Set

I fixed this for the Dict classes, but it still affects the Set class as well.

1.0 and 1 have the same hash and are equal to each other, but they pickle to different values. This means that when they're stored in Redis they're kept as separate members.

>>> from redis_collections import Set
>>> for name, init in (('redis', Set), ('python', set)):
...     s = init()
...     s.add(1.0)
...     s.add(1)
... 
...     t = init()
...     t.add(1)
...     t.add(1.0)
...     
...     print(name, list(s), list(t), sep='\t')
redis   [1.0, 1]    [1.0, 1]
python  [1.0]   [1]

On Python 2 the same problem exists with u'a' and b'a' in the same Set.

The trick that worked with Dict won't work for Set, so this issue is for figuring out a workaround. One idea is to let Redis do its thing but to normalize values on the way back to Python; however, this makes pop problematic. Another would be to special-case the pickling of floats and unicode strings, as sketched below.

If anyone knows any other examples of things that have equal hashes but pickle to different things let me know!
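
To illustrate the float special-casing idea, a minimal sketch (not a complete fix; u'a' vs. b'a' on Python 2 would need similar treatment):

import pickle

def normalized_dumps(value, protocol=2):
    # Integer-valued floats are pickled as ints, so 1 and 1.0 serialize
    # identically and collapse to a single Redis member. The tradeoff:
    # 1.0 comes back as 1, which still differs from Python semantics
    # (a Python set keeps whichever of the two was added first).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return pickle.dumps(value, protocol)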

Manually-synchronized collections

The collections currently available always write to Redis when a change is made (even when using a local cache for Dict and List instances). This means that an application that quits unexpectedly can retrieve its collections afterward (if it knows their key), but it also means that writes are much slower than they would be for a local collection.

This issue is to explore and (probably) implement versions of the current collections that normally use in-memory Python objects, but can sync their contents to Redis on demand.

Possibly something like this recipe, but potentially something simpler.


API: Something like this, I think?

from redis_collections import SyncDict

# Load contents from 'some_key' into memory
with SyncDict(key='some_key') as D:
    # Retrieve something from an earlier session from memory
    previous_value = D.get('previous_key')
    # Store something in memory, not to Redis
    D['new_key'] = 'new_value'

# Changes are automatically written to Redis
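
A rough sketch of how the machinery could work (the class name comes from the proposal above; everything else is assumed):

import pickle
import redis

class SyncDict(dict):
    # A plain dict that loads from a Redis hash on creation and writes
    # everything back on sync().

    def __init__(self, key, redis_client=None):
        self.key = key
        self.redis = redis_client or redis.StrictRedis()
        stored = self.redis.hgetall(self.key)  # earlier session, if any
        super(SyncDict, self).__init__(
            (pickle.loads(k), pickle.loads(v)) for k, v in stored.items()
        )

    def sync(self):
        # Replace the hash contents in a single pipeline round trip
        pipe = self.redis.pipeline()
        pipe.delete(self.key)
        for k, v in self.items():
            pipe.hset(self.key, pickle.dumps(k), pickle.dumps(v))
        pipe.execute()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.sync()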

Naming: Are SyncDict, SyncList, SyncSet good names? Would LocalDict, etc. be better? PersistentDict, etc?

Complete slicing support for List

Currently we raise NotImplementedError for:

  • Setting a slice to a value
  • Deleting a slice with a step
  • Deleting a slice out of the middle of a list
  • Deleting an item from the middle of a list
  • Inserting an item not at the beginning of a list
  • Popping from the middle of a list

This issue is to implement these, in at least an inefficient way, such that they no longer raise an exception. A sketch of one approach follows.
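
For slice assignment, for instance, the strategy could look like this hypothetical helper (assumes List accepts initial data; a real implementation on List itself would wrap the rewrite in a transaction):

from redis_collections import List

def set_slice(redis_list, sl, values):
    # Materialize the list in Python, apply the slice assignment there,
    # then write everything back.
    items = list(redis_list)
    items[sl] = list(values)
    while len(redis_list):
        redis_list.pop()  # popping from the end is supported
    redis_list.extend(items)

L = List([0, 1, 2, 3, 4])
set_slice(L, slice(1, 4), ['a', 'b'])  # L is now [0, 'a', 'b', 4]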

JSON support?

Hi - is there an easy way to serialize data using JSON instead of pickle? One major motivation for this is that pickle embeds class names, which broke some things when I moved a file.

If it's not supported, any thoughts on difficulty adding? If I wrote a PR, would the repo owners be willing to merge it and make it part of the official package?

Thanks.
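
It may already be close to possible: base.py (quoted in the List.__getitem__ issue above) passes a pickler setting around, so any object exposing dumps and loads could plausibly be swapped in. A sketch, with the constructor argument assumed rather than confirmed:

import json

class JSONPickler(object):
    # Mimics the pickle module's dumps/loads interface but produces
    # JSON. Only JSON-serializable values work: no arbitrary classes,
    # and tuples come back as lists.

    @staticmethod
    def dumps(value):
        return json.dumps(value).encode('utf-8')

    @staticmethod
    def loads(data):
        return json.loads(data.decode('utf-8'))

# Assumed usage:
# D = Dict(pickler=JSONPickler)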

Regroup classes in modules

Put all sets in one module sets, dicts into dicts, etc. Do not forget to adjust tests and documentation.

0.3.0 release plan

My plan for the next release:

  • Since it will involve breaking changes, it will be 0.3.0
  • Work will be done on the 0.3.0 branch. PRs targeting the 0.3.0 issues will be merged into that branch
  • Once it's time for a release the 0.3.0 branch will be merged into master

Also:

  • Bug fixes for 0.2.x will be merged into master and released to PyPI until 0.3.0 is ready.
  • After 0.3.0 is out bug fixes will go into 0.3.x releases
  • The next release with breaking changes will be 0.4.0
  • At some point we may call a release "1.0" - after that we'll use semantic versioning - major version increments for breaking changes, minor version increments for backward-compatible additions, and patch version increments for bug fixes.

Mutable values cache for List

#26 applies to Dict and mutable values, but something similar can happen with List.

>>> from redis_collections import List
>>> redis_list = List()
>>> python_list = []
>>> redis_list.append({'one': 1})
>>> python_list.append({'one': 1})
>>> redis_list[0]['one'] = 2
>>> python_list[0]['one'] = 2
>>> print(list(redis_list), python_list, sep='\t')
[{'one': 1}]    [{'one': 2}]

This issue is to add the Dict-style writeback and context manager features for List.

Design notes:

  • The cache should be a Python dict rather than a Python list so the whole list doesn't need to be in memory
  • Using a dict makes negative indexing a problem; specifying a negative index should retrieve the correct thing from the cache (see the sketch below)
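
The negative-index point could be handled by normalizing indexes before touching the cache, e.g.:

def normalize_index(index, length):
    # Map a negative index to its non-negative equivalent so the
    # dict-based cache has exactly one key per list position.
    if index < 0:
        index += length
    if not 0 <= index < length:
        raise IndexError('list index out of range')
    return index

# normalize_index(-1, 5) == 4, so lookups for -1 and 4 hit the same
# cache entry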

Allow custom key

  • A custom key for a collection, so it can be used by other code that accesses Redis
  • The auto-generated ID is maybe just a redundant layer of complexity

Loss of performance after X number of items in dict?

I ran the following code:

import time

import redis
from redis_collections import SyncableDict

con = redis.StrictRedis(host='localhost', port=6379, db=0)

writes = [
    10,
    100,
    1000,
    10000,
    100000,
    1000000,
    100000,
    10000,
    1000,
    100,
    10,
]

def test(c, w):
    t = time.time()
    data = SyncableDict(key='test', redis=c, writeback=True)

    with data as d:
        for i in range(w):
            d[i] = 'test%d' % i
    print('writes: %7d time:   %8f' % (w, time.time() - t))

for wr in writes:
    test(con, wr)

and got the following stats:

writes:      10 time:   1.0080
writes:     100 time:   0.0050
writes:    1000 time:   0.0360
writes:   10000 time:   0.3300
writes:  100000 time:   3.3900
writes: 1000000 time:   34.9170
writes:  100000 time:   2.1360
writes:   10000 time:   14.4460
writes:    1000 time:   14.6300
writes:     100 time:   14.4970
writes:      10 time:   14.5950

If I run the code again against the same key test, I get the following stats:

writes:      10 time:   15.7820
writes:     100 time:   14.7450
writes:    1000 time:   14.7040
writes:   10000 time:   14.7170
writes:  100000 time:   14.7690
writes: 1000000 time:   35.4770
writes:  100000 time:   2.1710
writes:   10000 time:   14.7980
writes:    1000 time:   14.7320
writes:     100 time:   14.8530
writes:      10 time:   14.7780

I get the same speeds with .get() after there are ~1m keys in the SyncableDict.
I guess that after a certain number of key:val pairs in a SyncableDict, its performance deteriorates?
When I try to write 10m, Python's memory usage goes into the gigabytes and it never finishes.
Is there a limit on the number of keys, or am I doing it wrong?
My Python version is Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32

Better repr

Something like

>>> repr(d)
"{'a': 1, 'b': 2}"

Calling Redis to build the repr doesn't matter, since repr is used mostly in the shell or during debugging.

Find maintainers

I don't currently have the time (or motivation) to maintain redis-collections much, not even for PR reviews, which is pretty sad, so I'm looking for a maintainer. It wouldn't be much hassle: there are just minor bugs with existing PRs, and the only major feature is Python 3 support, which also has a PR in progress. If anyone feels like taking over this library, it would be much appreciated.

Inconsistent pickling causes problems in Set and Dict classes (Python 2)

The output of pickle.dumps can cause problems when checking whether an element is present in a Set or a key is present in a Dict.

# Python 2.7.11
>>> from redis_collections import Set
>>> s = Set()
>>> key = (1, u'foo')
>>> s.add((1, u'foo'))
True
>>> (1, u'foo') in s
True
>>> key in s
False

This has been a problem for Set since forever, and Dict since my changes in #65. As far as I can tell it's not a problem in Python 3.


Python Issue 5518 discusses how the pickle module can produce different output for seemingly identical inputs, and that this is not a bug. The gist is that pickle.dumps is not suitable for hashing.

This blog post illustrates the problem and describes a solution. Unfortunately, it's the same solution I was using in #45, which is (a) not suitable for Set, and (b) not suitable for Python 3.3+ when used with multiple processes.

I'm not sure how to fix this! I don't think the built-in shelve module has this issue, so I'll try to see what it does.

Redis Cluster Support

Opening a ticket for providing Redis cluster support. Will work on this in the meantime.

Defaultdict support?

Similar line of questions to #112: does this library have any plans for supporting defaultdict? If I added it, would the owners be willing to merge a PR and make it official? Thanks.

Version 0.5.x should document changes in default pickle protocol

When using it with ElastiCache Redis, 0.5.0 no longer works. I pinned my application to 0.4.2 to keep everything working.
A simple Dict instance didn't return any value, whether using square brackets or .get().
However, when traversing the dictionary with a for loop, the keys are returned, and len() returns the correct size of the dictionary.

High-level interface for GEO commands

I'd like to include a means of accessing the Redis 3.2+ GEO commands with redis-collections. My thought is to introduce a GeoDB class that provides a high-level interface.

I have some test code in the geodb branch, and I'm interested in whether any watchers or users have suggestions for what might be useful here.


My current implementation allows for setting and retrieving places:

>>> from redis_collections import GeoDB
>>> geodb = GeoDB()

# Adding items to the DB
>>> geodb.set_location('St. Louis', 38.6270, -90.1994)
>>> geodb.set_location('Bahia', -11.4099, -41.2809)
>>> geodb.set_location('Berlin', 52.5200, 13.4050)
>>> geodb.set_location('Sydney', -33.8562, 151.2153)

# Retrieving an item from the DB
>>> geodb.get_location('St. Louis')
{u'latitude': 38.62699975742192, u'longitude': -90.19939810037613}

It allows for distance computations with different units:

>>> geodb.distance_between('St. Louis', 'Bahia')  # Default unit is km
7528.1327
>>> geodb.distance_between('St. Louis', 'Bahia', unit='m')
7528132.7209

It allows for searches within an area, by place or by location:

>>> geodb.places_within_radius(place='St. Louis', radius=7600)
[{u'distance': 0.0,
  u'latitude': 38.62699975742192,
  u'longitude': -90.19939810037613,
  u'place': 'St. Louis',
  u'unit': u'km'},
 {u'distance': 7501.6718,
  u'latitude': 52.51999907056681,
  u'longitude': 13.405002057552338,
  u'place': 'Berlin',
  u'unit': u'km'},
 {u'distance': 7528.1327,
  u'latitude': -11.409899017576471,
  u'longitude': -41.280899941921234,
  u'place': 'Bahia',
  u'unit': u'km'}]
>>> geodb.places_within_radius(latitude=38.6, longitude=-90.2, radius=200, count=1)
[{u'distance': 3.0035,
  u'latitude': 38.62699975742192,
  u'longitude': -90.19939810037613,
  u'place': 'St. Louis',
  u'unit': u'km'}]
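
For reference, a rough sketch (class and method bodies hypothetical; the actual geodb branch may differ) of how these operations map onto the raw GEO commands:

import redis

class GeoDBSketch(object):
    def __init__(self, key='geodb', redis_client=None):
        self.key = key
        self.redis = redis_client or redis.StrictRedis()

    def set_location(self, place, latitude, longitude):
        # Note that GEOADD takes longitude before latitude
        self.redis.execute_command(
            'GEOADD', self.key, longitude, latitude, place)

    def get_location(self, place):
        longitude, latitude = self.redis.execute_command(
            'GEOPOS', self.key, place)[0]
        return {'latitude': float(latitude), 'longitude': float(longitude)}

    def distance_between(self, place_1, place_2, unit='km'):
        return float(self.redis.execute_command(
            'GEODIST', self.key, place_1, place_2, unit))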

Open questions:

  • Add __getitem__ and __setitem__ support so this can be used like a Dict?
    • I didn't do this with SortedSetCounter because it's not clear whether ssc[thing] should index by rank or retrieve by key.
  • Should places_within_radius accept positional args (two means place and radius, three means latitude, longitude, and radius)?
  • Should places_within_radius return a list of dicts or a list of namedtuples? Or something else?
  • Should get_location return a dict or a namedtuple?

nested dict?

Just to be sure, is this supposed to work:

d = Dict()
d['a'] = {}
d['a']['b'] = 'c'

It seems that it should not, but I've been obtaining conflicting results (still trying to figure out why exactly), and the doc mentions nested Redis Collections, but nothing explicitly about nested regular data structures.
