Giter Site home page Giter Site logo

ordered-set's Introduction

Pypi

An OrderedSet is a mutable data structure that is a hybrid of a list and a set. It remembers the order of its entries, and every entry has an index number that can be looked up.

Installation

ordered_set is available on PyPI and packaged as a wheel. You can list it as a dependency of your project, in whatever form that takes.

To install it into your current Python environment:

pip install ordered-set

To install the code for development, after checking out the repository:

pip install flit
flit install

Usage examples

An OrderedSet is created and used like a set:

>>> from ordered_set import OrderedSet

>>> letters = OrderedSet('abracadabra')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd'])

>>> 'r' in letters
True

It is efficient to find the index of an entry in an OrderedSet, or find an entry by its index. To help with this use case, the .add() method returns the index of the added item, whether it was already in the set or not.

>>> letters.index('r')
2

>>> letters[2]
'r'

>>> letters.add('r')
2

>>> letters.add('x')
5

OrderedSets implement the union (|), intersection (&), and difference (-) operators like sets do.

>>> letters |= OrderedSet('shazam')

>>> letters
OrderedSet(['a', 'b', 'r', 'c', 'd', 'x', 's', 'h', 'z', 'm'])

>>> letters & set('aeiou')
OrderedSet(['a'])

>>> letters -= 'abcd'

>>> letters
OrderedSet(['r', 'x', 's', 'h', 'z', 'm'])

The __getitem__() and index() methods have been extended to accept any iterable except a string, returning a list, to perform NumPy-like "fancy indexing".

>>> letters = OrderedSet('abracadabra')

>>> letters[[0, 2, 3]]
['a', 'r', 'c']

>>> letters.index(['a', 'r', 'c'])
[0, 2, 3]

OrderedSet implements __getstate__ and __setstate__ so it can be pickled, and implements the abstract base classes collections.MutableSet and collections.Sequence.

OrderedSet can be used as a generic collection type, similar to the collections in the typing module like List, Dict, and Set. For example, you can annotate a variable as having the type OrderedSet[str] or OrderedSet[Tuple[int, str]].

OrderedSet in data science applications

An OrderedSet can be used as a bi-directional mapping between a sparse vocabulary and dense index numbers. As of version 3.1, it accepts NumPy arrays of index numbers as well as lists.

This combination of features makes OrderedSet a simple implementation of many of the things that pandas.Index is used for, and many of its operations are faster than the equivalent pandas operations.

For further compatibility with pandas.Index, get_loc (the pandas method for looking up a single index) and get_indexer (the pandas method for fancy indexing in reverse) are both aliases for index (which handles both cases in OrderedSet).

Authors

OrderedSet was implemented by Elia Robyn Lake (maiden name: Robyn Speer). Jon Crall contributed changes and tests to make it fit the Python set API. Roman Inflianskas added the original type annotations.

Comparisons

The original implementation of OrderedSet was a recipe posted to ActiveState Recipes by Raymond Hettiger, released under the MIT license.

Hettiger's implementation kept its content in a doubly-linked list referenced by a dict. As a result, looking up an item by its index was an O(N) operation, while deletion was O(1).

This version makes different trade-offs for the sake of efficient lookups. Its content is a standard Python list instead of a doubly-linked list. This provides O(1) lookups by index at the expense of O(N) deletion, as well as slightly faster iteration.

In Python 3.6 and later, the built-in dict type is inherently ordered. If you ignore the dictionary values, that also gives you a simple ordered set, with fast O(1) insertion, deletion, iteration and membership testing. However, dict does not provide the list-like random access features of OrderedSet. You would have to convert it to a list in O(N) to look up the index of an entry or look up an entry by its index.

ordered-set's People

Contributors

amagee avatar bmwant avatar dahlia avatar eric-arellano avatar erotemic avatar felixonmars avatar ghisvail avatar hvox avatar jdufresne avatar jeltef avatar jlowryduda avatar kairi003 avatar kkirsche avatar leo-labs avatar marcinkonowalczyk avatar moss avatar rominf avatar rspeer avatar toilal avatar xsuchy avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ordered-set's Issues

version inconsistent?

setup.py was updated to 2.0, but line 20 in ordered_set.py states __version__ = '1.5.0'

Comparison operators are inconsistent

Because we implement an order-sensitive __eq__ but inherit methods such as __le__ from OrderedSet, we get this inconsistent result:

>>> from ordered_set import OrderedSet
>>> a = OrderedSet(['b', 'a'])
>>> b = OrderedSet(['a', 'b'])
>>> a < b
False
>>> a > b
False
>>> a == b
False

Dropping Python 2 support

Hello. Thank you for this awesome library!

I'm wondering if you have thought about removing Python 2 support?

The main benefit would be being able to move the type hints from the type stub library directly to the source implementation. This would simplify things for type-hint users because type hints would work out of the box, rather than needing to install a separate library and needing to ensure that MyPy points to the stubs.

--

Thanks to setup.py's python_requires, Python 2 users would safely resolve ordered-set==3.1.1 when calling pip2 install ordered-set. Meanwhile, Python 3 users would resolve the newest version, e.g. ordered-set==4.0.0.

That is, this won't negatively impact Python 2 users at all outside of missing possible future changes (although, this library seems very stable already).

--

Given the go-ahead, I'd be happy to put up a PR that inlines the type hints, tweaks setup.py, and runs MyPy in CI.

(Type stubs give some issues to the wrapper around MyPy that I'm using, so I would love to see this change and am eager to help however possible.)

--

Finally, see https://python3statement.org for precedent from most of the Python ecosystem for no longer supporting Python 2 once it goes EOL in 5 days. Includes Pytest, TensorFlow, Apache Spark, and Requests. https://python3statement.org

Tag 3.1.1

Could you push a git tag for the v3.1.1 release currently on PyPI, please?

Thanks!

Overriding add() requires overriding append() as well

When overriding the .add() method in a subclass, I think the desired behavior of obj.append() would be to call the .add() method of the subclass, right? However, the append method is currently "implemented" as append = add in OrderedSet, which does not lead to a call of .add() in the subclass.

Would the following implementation of the .append() method in OrderedSet maybe resolve the issue?

def append(self, key: T) -> int:
    return self.add(key)

a bug in a new version of ordered_set

I run a pylatex program on ubuntu on Ubuntu 16.04 with Python 3.5. I got following error:
Traceback (most recent call last):
File "test1.py", line 2, in
import pylatex as pl
File "/home/dongl/.local/lib/python3.5/site-packages/pylatex/init.py", line 8, in
from .basic import HugeText, NewPage, LineBreak, NewLine, HFill, LargeText,
File "/home/dongl/.local/lib/python3.5/site-packages/pylatex/basic.py", line 9, in
from .base_classes import CommandBase, Environment, ContainerCommand
File "/home/dongl/.local/lib/python3.5/site-packages/pylatex/base_classes/init.py", line 8, in
from .latex_object import LatexObject
File "/home/dongl/.local/lib/python3.5/site-packages/pylatex/base_classes/latex_object.py", line 9, in
from ordered_set import OrderedSet
File "/home/dongl/.local/lib/python3.5/site-packages/ordered_set.py", line 51, in
class OrderedSet(MutableSet[T], Sequence[T]):
File "/home/dongl/.local/lib/python3.5/site-packages/ordered_set.py", line 186, in OrderedSet
def update(self, sequence: SetLike[T]) -> int:
File "/usr/lib/python3.5/typing.py", line 546, in getitem
"Cannot subscript an existing Union. Use Union[u, t] instead.")
TypeError: Cannot subscript an existing Union. Use Union[u, t] instead.

Siunds like a bug in view version of ordered_set, how can I solve this?

Interest in including OrderedSet in python standard library?

Hello, I am interested to know if the maintainers of this package would have interest in including this functionality in the python standard library. It seems natural for OrderedSet to sit next to OrderedDict in the collections package. There is a discussion on the Python Discussion Forums. It is not clear-cut that it should be added but I wanted to gauge the interest of the folks maintaining the current Python ordered set implementations.

https://discuss.python.org/t/add-orderedset-to-stdlib/12730/15

Add py.typed to distribution

ordered-set is great, and even has type annotations. 👏

Unfortunately, mypy is rather strict about PEP 561 and doesn't support use the type annotations in consuming applications. 😿

It would be great to update the source distribution to:

ordered_set/
  __init__.py # current contents of ordered_set.py
  py.typed

So mypy and other type checkers will properly recognize the library.

Adding FrozenOrderedSet

Hey Robyn, would you be willing to consider me putting up a PR to create a FrozenOrderedSet to go along with OrderedSet, similar to the stdlib's set vs. frozenset? (Followup to #35).

My project wants a collection with three properties:

  1. deduplicates elements
  2. consistent iteration order
  3. immutability

There are multiple collections that implement 2 of the 3, but none that implement all 3 in the Python ecosystem.

I know we could inline this or I could create my own library, but I wanted to try contributing to this repo first as I believe it will be helpful to others like @rominf and I don't want to bifurcate the community.

orderedset.update -> should be iterable

Current:

    def update(self, sequence: Union[Sequence[T], Set[T]]) -> int:

Proposed:

    def update(self, sequence: Iterable[T]) -> int:

The reason is that I push in a generator, which works just fine. But the IDE complains because it's neither a Sequence, nor a Set.

Importing ordered_set fails on Python 3.5

Error:

Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AM
D64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ordered_set
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Rohan\AppData\Local\Programs\Python\Python35\lib\site-packages\
ordered_set.py", line 55, in <module>
    class OrderedSet(MutableSet[T], Sequence[T]):
  File "C:\Users\Rohan\AppData\Local\Programs\Python\Python35\lib\site-packages\
ordered_set.py", line 84, in OrderedSet
    def __getitem__(self, index: Sequence[int]) -> List[T]:
  File "C:\Users\Rohan\AppData\Local\Programs\Python\Python35\lib\typing.py", li
ne 1181, in overload
    raise RuntimeError("Overloading is only supported in library stubs")
RuntimeError: Overloading is only supported in library stubs

Typing library:

Name: typing
Version: 3.7.4.1
Summary: Type Hints for Python
Home-page: https://docs.python.org/3/library/typing.html

I do not understand some code of the oredered-set

the method "add".
if key not in self.map:
self.map[key] = len(self.items)
self.items.append(key)
It seems like the items (a list) does not make any senses. You already add the index to the key. The better way is that self.map[key]=len(self.map). I cannot see any differences.

Unit tests & travis-ci

Would you want to have unit tests and travis-ci configuration for this project ?

I can open a pull request to perform this, but you will have to open an account on travis-ci and enable the travis-ci service on github.

Support for 3.10

Hello,

I am using orderedset for Nuitka (https://nuitka.net/), however building it from Source for Windows fails in this way:

Pardon the German in here, it basically says "unresolved symbol":

  _orderedset.obj : error LNK2001: Nicht aufgel”stes externes Symbol "_PyGen_Send".
  build\lib.win-amd64-3.10\orderedset\_orderedset.cp310-win_amd64.pyd : fatal error LNK1120: 1 nicht aufgel”ste Externe
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\link.exe' failed with exit code 1120

This seems no more exported. Do you have any plans to support 3.10?

Thanks in advance,
Kay Hayen

Python 3.5 is not actually supported

I'm building ConceptNet 5.7 on Ubuntu 16.04 with Python 3.5. I get following error:

  File "/home/conceptnet/conceptnet5/conceptnet5/db/prepare_data.py", line 6, in <module>
    from ordered_set import OrderedSet
  File "/home/conceptnet/env/lib/python3.5/site-packages/ordered_set.py", line 51, in <module>
    class OrderedSet(MutableSet[T], Sequence[T]):
  File "/home/conceptnet/env/lib/python3.5/site-packages/ordered_set.py", line 186, in OrderedSet
    def update(self, sequence: SetLike[T]) -> int:
  File "/usr/lib/python3.5/typing.py", line 546, in __getitem__
    "Cannot subscript an existing Union. Use Union[u, t] instead.")
TypeError: Cannot subscript an existing Union. Use Union[u, t] instead.

Looks like compatibility with Python 3.5 was lost. I suggest to remove it from a list of supported Python versions.

Equality comparisons with regular set don't commute

Equality with set is not commutative:

>>> s = {'x', 'y'}
>>> os = OrderedSet(['x', 'y'])
>>> s == os
False
>>> os == s
True

It will be difficult to fix because this returns False instead of NotImplemented

>>> s.__eq__(os)
False

I'm not sure the best way forward. You can allow OrderedSet.__eq__ to take the precedence if it is a proper set subclass..

OrderedSet.index(...) function throws error when key is a frozenset

I'm unable to use OrderedSet.index(...) function with frozensets as values. It throws an error when the key is a frozenset.

Code to recreate the error:

os = OrderedSet([frozenset([1, 2])])
i = os.index(frozenset([1, 2]))
Traceback (most recent call last):
  File "<PATH>\src\main.py", line 11, in <module>
    i = os.index(frozenset([1, 2]))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\Python311\Lib\site-packages\ordered_set\__init__.py", line 246, in index
    return [self.index(subkey) for subkey in key]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<PATH>\Python\Python311\Lib\site-packages\ordered_set\__init__.py", line 246, in <listcomp>
    return [self.index(subkey) for subkey in key]
            ^^^^^^^^^^^^^^^^^^
  File "<PATH>\Python\Python311\Lib\site-packages\ordered_set\__init__.py", line 247, in index
    return self.map[key]
           ~~~~~~~~^^^^^
KeyError: 1

Thanks for this amazing utility!

Merge upstream? StableSet: Adding an additional implementation, new features and tests and more.

Hi @rspeer
Thanks for this nice library!

I made a very large refactor, added another implementation, many features, tests and compatibility with other implementations and some more. The new version passes all the tests from the other 3 repositories almost completely unchanged.
https://github.com/idanmiara/ordered-set

Since that's a major addition, I didn't know how/if you would want to merge this work and since I needed to use this for some project already - I released it under to pypi under a new name stableset (with proper credits).
Having said that, I'd be happy to merge this work upstream if you are interested, if so, let me know how you'd like to proceed with this.

Performance

I have benchmarked this implementation, but it seems that it does not have a better performance.

import time

import collections

class OrderedSet(collections.MutableSet):

    def __init__(self, iterable=None):
        self.end = end = [] 
        end += [None, end, end]         # sentinel node for doubly linked list
        self.map = {}                   # key --> [key, prev, next]
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.map)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        if key not in self.map:
            end = self.end
            curr = end[1]
            curr[2] = end[1] = self.map[key] = [key, curr, end]

    def discard(self, key):
        if key in self.map:        
            key, prev, next = self.map.pop(key)
            prev[2] = next
            next[1] = prev

    def __iter__(self):
        end = self.end
        curr = end[2]
        while curr is not end:
            yield curr[0]
            curr = curr[2]

    def __reversed__(self):
        end = self.end
        curr = end[1]
        while curr is not end:
            yield curr[0]
            curr = curr[1]

    def pop(self, last=True):
        if not self:
            raise KeyError('set is empty')
        key = self.end[1][0] if last else self.end[2][0]
        self.discard(key)
        return key

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and list(self) == list(other)
        return set(self) == set(other)


"""
An OrderedSet is a custom MutableSet that remembers its order, so that every
entry has an index that can be looked up.

Based on a recipe originally posted to ActiveState Recipes by Raymond Hettiger,
and released under the MIT license.

Rob Speer's changes are as follows:

    - changed the content from a doubly-linked list to a regular Python list.
      Seriously, who wants O(1) deletes but O(N) lookups by index?
    - add() returns the index of the added item
    - index() just returns the index of an item
    - added a __getstate__ and __setstate__ so it can be pickled
    - added __getitem__
"""
import collections

SLICE_ALL = slice(None)
__version__ = '2.0.1'


def is_iterable(obj):
    """
    Are we being asked to look up a list of things, instead of a single thing?
    We check for the `__iter__` attribute so that this can cover types that
    don't have to be known by this module, such as NumPy arrays.

    Strings, however, should be considered as atomic values to look up, not
    iterables. The same goes for tuples, since they are immutable and therefore
    valid entries.

    We don't need to check for the Python 2 `unicode` type, because it doesn't
    have an `__iter__` attribute anyway.
    """
    return hasattr(obj, '__iter__') and not isinstance(obj, str) and not isinstance(obj, tuple)


class OrderedSet2(collections.MutableSet):
    """
    An OrderedSet is a custom MutableSet that remembers its order, so that
    every entry has an index that can be looked up.
    """
    def __init__(self, iterable=None):
        self.items = []
        self.map = {}
        if iterable is not None:
            self |= iterable

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        """
        Get the item at a given index.

        If `index` is a slice, you will get back that slice of items. If it's
        the slice [:], exactly the same object is returned. (If you want an
        independent copy of an OrderedSet, use `OrderedSet.copy()`.)

        If `index` is an iterable, you'll get the OrderedSet of items
        corresponding to those indices. This is similar to NumPy's
        "fancy indexing".
        """
        if index == SLICE_ALL:
            return self
        elif hasattr(index, '__index__') or isinstance(index, slice):
            result = self.items[index]
            if isinstance(result, list):
                return OrderedSet(result)
            else:
                return result
        elif is_iterable(index):
            return OrderedSet([self.items[i] for i in index])
        else:
            raise TypeError("Don't know how to index an OrderedSet by %r" %
                    index)

    def copy(self):
        return OrderedSet(self)

    def __getstate__(self):
        if len(self) == 0:
            # The state can't be an empty list.
            # We need to return a truthy value, or else __setstate__ won't be run.
            #
            # This could have been done more gracefully by always putting the state
            # in a tuple, but this way is backwards- and forwards- compatible with
            # previous versions of OrderedSet.
            return (None,)
        else:
            return list(self)

    def __setstate__(self, state):
        if state == (None,):
            self.__init__([])
        else:
            self.__init__(state)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        """
        Add `key` as an item to this OrderedSet, then return its index.

        If `key` is already in the OrderedSet, return the index it already
        had.
        """
        if key not in self.map:
            self.map[key] = len(self.items)
            self.items.append(key)
        return self.map[key]
    append = add

    def update(self, sequence):
        """
        Update the set with the given iterable sequence, then return the index
        of the last element inserted.
        """
        item_index = None
        try:
            for item in sequence:
                item_index = self.add(item)
        except TypeError:
            raise ValueError('Argument needs to be an iterable, got %s' % type(sequence))
        return item_index

    def index(self, key):
        """
        Get the index of a given entry, raising an IndexError if it's not
        present.

        `key` can be an iterable of entries that is not a string, in which case
        this returns a list of indices.
        """
        if is_iterable(key):
            return [self.index(subkey) for subkey in key]
        return self.map[key]

    def pop(self):
        """
        Remove and return the last element from the set.

        Raises KeyError if the set is empty.
        """
        if not self.items:
            raise KeyError('Set is empty')

        elem = self.items[-1]
        del self.items[-1]
        del self.map[elem]
        return elem

    def discard(self, key):
        """
        Remove an element.  Do not raise an exception if absent.

        The MutableSet mixin uses this to implement the .remove() method, which
        *does* raise an error when asked to remove a non-existent item.
        """
        if key in self:
            i = self.map[key]
            del self.items[i]
            del self.map[key]
            for k, v in self.map.items():
                if v >= i:
                    self.map[k] = v - 1

    def clear(self):
        """
        Remove all items from this OrderedSet.
        """
        del self.items[:]
        self.map.clear()

    def __iter__(self):
        return iter(self.items)

    def __reversed__(self):
        return reversed(self.items)

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and self.items == other.items
        try:
            other_as_set = set(other)
        except TypeError:
            # If `other` can't be converted into a set, it's not equal.
            return False
        else:
            return set(self) == other_as_set


def processing(num, set1, item):
  for i in range(num):
    set1.add(item)
    item in set1
    set1.remove(item)
    set1.add(item)
    set1.pop()
    set1.add(item)
    set1.discard(item)

def benchmark(num):
  set = OrderedSet(list(range(200)))
  t1 = time.time()
  processing(num, set, 300)
  t2 = time.time()
  elapsed = t2 - t1
  print "total time: %.4f" % elapsed
  print "sum: %d" % num

  set = OrderedSet2(list(range(200)))
  t1 = time.time()
  processing(num, set, 300)
  t2 = time.time()
  elapsed = t2 - t1
  print "total time: %.4f" % elapsed
  print "sum: %d" % num

if __name__ == '__main__':
  benchmark(10000)

The result is

total time: 0.0568
sum: 10000
total time: 0.3500
sum: 10000

Add support for typing annotations

It would be nice if OrderedSet supported PEP 484 typing module, so that we can do in Python 3:

my_set: OrderedSet[Context] = OrderedSet()

Right now pycharm/pylint flag it as a typing error:

Class 'type' does not define '__getitem__', so the '[]' operator cannot be used on its instances

Related: pylint-dev/pylint#1657

The __getitem__ type annotations are missing single int index overload.

Thanks for this library. It works like a charm!
I found a very minor 'issue'. Getting a single item isn't supported according to the type hints but it does function properly.

image

^ Screenshot of code in PyCharm

This feature works properly (and is also shown in the docstring example). The reason it's not showing as type annotation is because there's no overload for it. Adding the 3 lines below resolves the 'issue'.

@overload
def __getitem__(self, index: int) -> T:
    ...

Location: https://github.com/rspeer/ordered-set/blob/master/ordered_set.py#L90

The actual __getitem__ has the : int definition, but that is ignored for type hinting if @overloads are used.

Support item assignment

Since OrderedSet supports indexing, it's natural to assume that assignments and del would work as well, but they don't:

>>> s = OrderedSet(['foo'])
>>> s[0]
'foo'
>>> s[0] = 'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'OrderedSet' object does not support item assignment
>>> del s[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'OrderedSet' object doesn't support item deletion

There's an easy workaround for del (s.remove(s[0])), but not for item assignments. As far as I can tell, there is no way to reorder items at all, so it's impossible to replace an element with another one (while preserving its index).

OrderedSet is slower than OrderedDict; should use OrderedDict instead

Since it requires Python >= 3.5 anyway, OrderedSet can be reduced to an adapter of collections.OrderedDict. One can easily use a dict as a set by ignoring the dict's values.

A synthetic benchmark I ran shows that, if CPU caching does not come into play (a big if), an OrderedDict is faster (especially in the case of deletion, which is O(n) ):

Wrote profile results to set_test.py.lprof
Timer unit: 1e-06 s

Total time: 6.21423 s
File: set_test.py
Function: bench at line 13

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    13                                           @profile
    14                                           def bench():
    15                                               # Creation
    16         1       5228.0   5228.0      0.1      s = OrderedSet(included)
    17         1      17110.0  17110.0      0.3      d = OrderedDict({i: 1 for i in included})
    18                                           
    19                                               # Insertion
    20      3001       1518.0      0.5      0.0      for i in missing:
    21      3000       5976.0      2.0      0.1          s.add(i)
    22      3001       1518.0      0.5      0.0      for i in missing:
    23      3000       2231.0      0.7      0.0          d[i] = 1
    24                                           
    25                                               # Deletion
    26      3001       2161.0      0.7      0.0      for i in missing:
    27      3000    6159236.0   2053.1     99.1          s.remove(i)
    28      3001       1483.0      0.5      0.0      for i in missing:
    29      3000       2533.0      0.8      0.0          d.pop(i)
    30                                           
    31                                               # Membership test (included)
    32      3001       1492.0      0.5      0.0      for i in included:
    33      3000       2764.0      0.9      0.0          assert i in s
    34      3001       1534.0      0.5      0.0      for i in included:
    35      3000       1786.0      0.6      0.0          assert i in d
    36                                           
    37                                               # Membership test (missing)
    38      3001       1510.0      0.5      0.0      for i in missing:
    39      3000       2836.0      0.9      0.0          assert i not in s
    40      3001       1519.0      0.5      0.0      for i in missing:
    41      3000       1795.0      0.6      0.0          assert i not in d

I am quite convinced there would be performance improvements, if we changed the implementation to use an OrderedDict instead of a set and a list (or a Python 3.6 dict, which is ordered, as @Eric-Arellano suggests).

Edit: However, this would make the access by index (like OrderedSet[1]) much slower (complexity O(n) instead of the current O(1)).

Would you accept a PR doing so?

.index method handles tuple keys as iterables

Apparently .index(key) handles all iterables (except strings) as list of subkeys.
When using tuples as keys, this will break.

Can we either add an extra case for tuples or remove the extra logic for iterables?

Please consider switching to flit_core

I'm the @gentoo maintainer for core Python packages. We're currently considering options for bootstrapping setuptools while unvendoring its dependencies in the PEP 517 world. One of the interesting options is to use flit as it has much fewer dependencies than setuptools and we've got its bootstrap trivially covered.

Using flit as ordered-set's build system would mean that we can cleanly install it before building setuptools. Flit can also generate a backwards-compatible setup.py (or be used alongside setup.py, though for this to work for us pyproject.toml would have to specify flit_core as the build-backend). FWICS it is becoming increasingly popular these days.

Would you be interested in this? If yes, then I can prepare a PR either converting the project to flit entirely, or adding it alongside existing setup.py, as you prefer.

SyntaxError in main line 218

Attempting to use ordered set from main, it fails with a SyntaxError and fails to install.

 ~ 🐚 pip-run git+https://github.com/rspeer/ordered-set -- -c "import ordered_set"
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Traceback (most recent call last):
        File "/Users/jaraco/.local/pip-run/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
          ~~~~^^
        File "/Users/jaraco/.local/pip-run/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/jaraco/.local/pip-run/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/pip-build-env-ys6j54g6/overlay/lib/python3.13/site-packages/flit_core/buildapi.py", line 32, in get_requires_for_build_wheel
          docstring, version = get_docstring_and_version_via_ast(module)
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
        File "/private/var/folders/f2/2plv6q2n7l932m2x004jlw340000gn/T/pip-build-env-ys6j54g6/overlay/lib/python3.13/site-packages/flit_core/common.py", line 149, in get_docstring_and_version_via_ast
          node = ast.parse(f.read())
        File "/opt/python/lib/python3.13/ast.py", line 54, in parse
          return compile(source, filename, mode, flags,
                         _feature_version=feature_version, optimize=optimize)
        File "<unknown>", line 218
          raise ValueError(f"Argument needs to be an iterable, got {type(sequence}")
                                                                                 ^
      SyntaxError: closing parenthesis '}' does not match opening parenthesis '('
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Pickling then unpickling fails for the empty OrderedSet().

The following fails:

>>> import pickle
>>> from ordered_set import OrderedSet
>>> pickle.loads(pickle.dumps(OrderedSet()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/ordered_set.py", line 126, in __repr__
    if not self:
  File "/Users/mj/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/ordered_set.py", line 49, in __len__
    return len(self.items)
AttributeError: 'OrderedSet' object has no attribute 'items'

because OrderedSet.__getstate__ returns an empty list, causing pickle to assume you don't want __setstate__ to be called when loading again.

This can be fixed by using a __getstate__ that wraps the list in a 1-element tuple:

def __getstate__(self):
    return (list(self),)

def __setstate__(self, state):
    if isinstance(state, tuple):
        state = state[0]   # new pickle format
    self.__init__(state)    

By explicitly testing for a tuple in __setstate__ previously pickled data can still be loaded.

Also see http://stackoverflow.com/questions/24846605/cannot-pickle-empty-ordered-set

Efficiency of difference_update is much worse than set

I use ordered-set quite a bit, and I noticed the difference_update op was taking an extremely long time.

After looking into it I found that the current implementation of difference_update makes M calls to discard where M is the number of items to remove.

        for item in it.chain.from_iterable(sets):
            self.discard(item)

Then I took a look at the discard logic and it actually needs to make an O(N) loop to update indices after each discard.

I wrote a benchmark script to investigate how much of a problem this causes:

"""
Requirements:
    pip install timerit
"""
import timerit
import copy
import ordered_set


def benchmark_difference_update():
    N = 1000
    items = list(range(N))

    o_set = ordered_set.OrderedSet(items)
    u_set = set(items)

    to_remove = items[::2]

    for timer in timerit.Timerit(100, bestof=3, label='(Unordered) difference update (inplace)'):
        _u_set = copy.deepcopy(u_set)
        with timer:
            _u_set.difference_update(to_remove)

    for timer in timerit.Timerit(10, bestof=3, label='(Ordered) difference update (inplace)'):
        _o_set = copy.deepcopy(o_set)
        with timer:
            _o_set.difference_update(to_remove)

    for timer in timerit.Timerit(100, bestof=3, label='(Ordered) difference (not inplace)'):
        with timer:
            _new_o_set = o_set.difference(to_remove)

    def _discard_update_alternative(self, *sets):
        import itertools as it
        keys = set(it.chain(*sets))
        _discard_many(self, keys)

    def _discard_many(self, keys):
        """
        sets = [to_remove]
        self = o_set
        """
        idxs = [self.map[key] for key in keys]
        # Remove indices in reverse order
        for idx in sorted(idxs)[::-1]:
            del self.items[idx]

        # Update mapping requires O(N) traversal
        # I don't see any way around it (doesn't mean there isn't one)
        for idx, key in enumerate(self.items):
            self.map[key] = idx

    for timer in timerit.Timerit(100, bestof=3, label='(Ordered) difference update (inplace alt impl)'):
        _o_set_alt = copy.deepcopy(o_set)
        with timer:
            _discard_update_alternative(_o_set_alt, to_remove)

    assert _o_set_alt == _new_o_set
    assert _o_set_alt == _o_set
    assert _u_set == _o_set

In these tests I test difference_update with (1) the builtin set class, (2) the OrderedSet class, (3) the non-inplace version of OrderedSet.difference and (4) an alternate implementation of OrderedSet.difference_update.

The results are clear:

Timed best=9.976 µs, mean=10.163 ± 0.1 µs for (Unordered) difference update (inplace)
Timed best=39.947 ms, mean=40.133 ± 0.2 ms for (Ordered) difference update (inplace)
Timed best=367.376 µs, mean=378.990 ± 22.6 µs for (Ordered) difference (not inplace)

Timed best=177.687 µs, mean=187.137 ± 8.0 µs for (Ordered) difference update (inplace alt impl)

Unordered set differences are by far the fastest. When N=1000 OrderedSet is 4000x slower than builtins.set.

If one was to use OrderedSet.difference instead of OrderedSet.difference_update, you are only 40x slower, which is much better.

My alternate implementation of difference_update makes use of a discard_many function, which is much more efficient when multiple items are discarded as it saves the O(N) map update until the end. I'm not sure if there is any way around this O(N) update. If there is it would likely require modifying the underlying data structure. However, the O(N) overhead cost is much better than the O(N*M) overhead that was previously incurred. With my alt implementation OrderedSet is only 20x slower than builtins.set. Not perfect, but better.

The intersection_update also has the O(N * M) overhead. It probably makes sense to simply rewrite it in terms of difference_update, so we only have to worry about optimizing code in one place. IE:

    def intersection_update_alt(self, other):
        to_remove = [item for item in self if item not in other]
        self.difference_update(to_remove)

Maintenance plan

Given that this project accepted a SyntaxError in the code that's gone unaddressed for over two years (#90), I'm worried this project may be abandoned. Are there plans to maintain it? How can I help?

OrderedSet.pop() with a non-default index breaks the contract

>>> from ordered_set import OrderedSet
>>> a = OrderedSet(["a", "b", "c"])
>>> a.index("b")
1
>>> a.pop(0)
'a'
>>> a.index("b")
1
>>> a[1]
'c'

This was revealed by looking into #79. pop() would need to rebuild the mapping in place, like discard() does. Its current behavior, which only works for index=-1, would be an optimization.

"native" implementation on python 3.7?

Since in python 3.7 the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec (see What’s New In Python 3.7), dict is already almost an ordered-set, except for that it doesn't have the interface of a set. Likely, wrapping dict such that it's keys are exposed with the interface of ordered-set would be more efficient than the python implementation.

Inefficient code in discard?

I think perhaps I'm missing something, but it seems to me the following replaces an O(n) operation with an O(1) operation, leaving functionality identical.

diff --git a/ordered_set.py b/ordered_set.py
index bf88cd8..118672e 100644
--- a/ordered_set.py
+++ b/ordered_set.py
@@ -160,7 +160,7 @@ class OrderedSet(collections.MutableSet):
         *does* raise an error when asked to remove a non-existent item.
         """
         if key in self:
-            i = self.items.index(key)
+            i = self.map[key]
             del self.items[i]
             del self.map[key]
             for k, v in self.map.items():

I initially thought perhaps the index was looked up via the list because the indices in the map weren't necessarily up to date, but the trailing loop updates them when deleting an element. Is there some other reason the list needs to be used for the lookup here? Thanks for your time.

new version on pypi?

In the 1.4 version of OrderedSet, pop is implemented (via MutableSet), but in such a way that it returns the first item instead of the last. It's kind of annoying, and it's already fixed in github, so it would be nice if there were a release. Thanks.

Adapt with normal set?

Why not wrap up the OrderedSet and some set operation(such as issubset injection) in a new class,
The new class is an ordered set and can interact with normal set

Changelog?

Is there any chance you could keep a changelog? I happily use this library in my project, but always like to check changes before upgrading dependencies. Thank you.

Upload wheels to PyPI

Wheels (.whl) for this package are currently missing from PyPI. Could wheels be uploaded for the current and future releases?

Read more about the advantages of wheels to understand why generating wheel distributions are important.

To create a wheel along with source distribution:

(venv) $ pip install --upgrade pip setuptools wheel
(venv) $ python setup.py sdist bdist_wheel

# See dist/*.whl

To upload wheels:

(venv) $ pip install twine
(venv) $ twine upload dist/*

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.