
UltraDict

Synchronized, streaming Python dictionary that uses shared memory as a backend

Warning: This is an early hack. There are only a few unit tests and so on. It may not be stable!

Features:

  • Fast (compared to other sharing solutions)
  • No running manager processes
  • Works in spawn and fork context
  • Safe locking between independent processes
  • Tested with Python >= v3.8 on Linux, Windows and Mac
  • Convenient, no setters or getters necessary
  • Optional recursion for nested dicts


General Concept

UltraDict uses multiprocessing.shared_memory to synchronize a dict between multiple processes.

It does so by using a stream of updates in a shared memory buffer. This is efficient because only changes have to be serialized and transferred.

If the buffer is full, UltraDict will automatically do a full dump to a new shared memory space, reset the streaming buffer and continue to stream further updates. All users of the UltraDict will automatically load full dumps and continue using streaming updates afterwards.
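
The idea can be sketched in a few lines (a simplified illustration of the streaming concept only; UltraDict's actual buffer layout and framing are more involved, and the names here are hypothetical):

import pickle
from multiprocessing import shared_memory

# Simplified sketch: append one serialized (key, value) update to a shared
# buffer instead of re-serializing the whole dict on every change.
buf = shared_memory.SharedMemory(create=True, size=10_000, name='demo_stream')

def append_update(pos, key, value):
    # Write one pickled update at the current stream position.
    blob = pickle.dumps((key, value))
    if pos + len(blob) > buf.size:
        # A real implementation would now do a full dump and reset the stream.
        raise MemoryError('stream buffer full')
    buf.buf[pos:pos + len(blob)] = blob
    return pos + len(blob)

pos = append_update(0, 'counter', 1)  # only the change is transferred

buf.close()
buf.unlink()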

Issues

On Windows, if no process holds a handle on the shared memory, the OS will release all of the shared memory, making it inaccessible to future processes. To work around this issue, you can currently set full_dump_size, which causes the creator of the dict to allocate a static full dump memory of the requested size. This full dump memory lives as long as the creator lives. The downside of this approach is that you need to plan ahead for your data size: if it does not fit into the full dump memory, it will break.
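
For example, on Windows you could create the dict like this (the size below is an arbitrary assumption; your data plus serialization overhead must fit into it):

from UltraDict import UltraDict

# Pre-allocate a static full dump memory that lives as long as this
# creator process lives. 1 MB is only an example size.
ultra = UltraDict(name='my-dict-name', full_dump_size=1_000_000)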

Alternatives

There are many alternatives.

How to use?

Simple

In one Python REPL:

Python 3.9.2 on linux
>>>
>>> from UltraDict import UltraDict
>>> ultra = UltraDict({ 1:1 }, some_key='some_value')
>>> ultra
{1: 1, 'some_key': 'some_value'}
>>>
>>> # We need the shared memory name in the other process.
>>> ultra.name
'psm_ad73da69'

In another Python REPL:

Python 3.9.2 on linux
>>>
>>> from UltraDict import UltraDict
>>> # Connect to the shared memory with the name above
>>> other = UltraDict(name='psm_ad73da69')
>>> other
{1: 1, 'some_key': 'some_value'}
>>> other[2] = 2

Back in the first Python REPL:

>>> ultra[2]
2

Nested

In one Python REPL:

Python 3.9.2 on linux
>>>
>>> from UltraDict import UltraDict
>>> ultra = UltraDict(recurse=True)
>>> ultra['nested'] = { 'counter': 0 }
>>> type(ultra['nested'])
<class 'UltraDict.UltraDict'>
>>> ultra.name
'psm_0a2713e4'

In another Python REPL:

Python 3.9.2 on linux
>>>
>>> from UltraDict import UltraDict
>>> other = UltraDict(name='psm_0a2713e4')
>>> other['nested']['counter'] += 1

Back in the first Python REPL:

>>> ultra['nested']['counter']
1
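
Note that += on a counter is a read-modify-write operation and is not atomic. If several processes increment the same nested counter concurrently, guard it with the dict's lock (see the Locking section below):

# Safe concurrent increment of the nested counter
with other.lock:
    other['nested']['counter'] += 1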

Performance comparison

Let's compare a classical Python dict, UltraDict, multiprocessing.Manager and Redis.

Note that this comparison is not a real life workload. It was executed on Debian Linux 11 with Redis installed from the Debian package and with the default configuration of Redis.

Python 3.9.2 on linux
>>>
>>> from UltraDict import UltraDict
>>> ultra = UltraDict()
>>> for i in range(10_000): ultra[i] = i
...
>>> len(ultra)
10000
>>> ultra[500]
500
>>> # Now let's do some performance testing
>>> import multiprocessing, redis, timeit
>>> orig = dict(ultra)
>>> len(orig)
10000
>>> orig[500]
500
>>> managed = multiprocessing.Manager().dict(orig)
>>> len(managed)
10000
>>> r = redis.Redis()
>>> r.flushall()
>>> r.mset(orig)

Read performance

>>> timeit.timeit('orig[1]', globals=globals()) # original
0.03832335816696286
>>> timeit.timeit('ultra[1]', globals=globals()) # UltraDict
0.5248982920311391
>>> timeit.timeit('managed[1]', globals=globals()) # Manager
40.85506196087226
>>> timeit.timeit('r.get(1)', globals=globals()) # Redis
49.3497632863
>>> timeit.timeit('ultra.data[1]', globals=globals()) # UltraDict data cache
0.04309639008715749

We are a factor of ~15 slower than a real, local dict, but way faster than using a Manager. If you need full read performance, you can access the underlying cache ultra.data directly and get almost original dict performance, at the cost of no longer getting real-time updates.

Write performance

>>> min(timeit.repeat('orig[1] = 1', globals=globals())) # original
0.028232071083039045
>>> min(timeit.repeat('ultra[1] = 1', globals=globals())) # UltraDict
2.911152713932097
>>> min(timeit.repeat('managed[1] = 1', globals=globals())) # Manager
31.641707635018975
>>> min(timeit.repeat('r.set(1, 1)', globals=globals())) # Redis
124.3432381930761

We are a factor of ~100 slower than a real, local Python dict, but still a factor of ~10 faster than using a Manager and much faster than Redis.

Testing performance

There is an automated performance test in tests/performance/performance.py. If you run it, you get something like this:

python ./tests/performance/performance.py

Testing Performance with 1000000 operations each

Redis (writes) = 24,351 ops per second
Redis (reads) = 30,466 ops per second
Python MPM dict (writes) = 19,371 ops per second
Python MPM dict (reads) = 22,290 ops per second
Python dict (writes) = 16,413,569 ops per second
Python dict (reads) = 16,479,191 ops per second
UltraDict (writes) = 479,860 ops per second
UltraDict (reads) = 2,337,944 ops per second
UltraDict (shared_lock=True) (writes) = 41,176 ops per second
UltraDict (shared_lock=True) (reads) = 1,518,652 ops per second

Ranking:
  writes:
    Python dict = 16,413,569 (factor 1.0)
    UltraDict = 479,860 (factor 34.2)
    UltraDict (shared_lock=True) = 41,176 (factor 398.62)
    Redis = 24,351 (factor 674.04)
    Python MPM dict = 19,371 (factor 847.33)
  reads:
    Python dict = 16,479,191 (factor 1.0)
    UltraDict = 2,337,944 (factor 7.05)
    UltraDict (shared_lock=True) = 1,518,652 (factor 10.85)
    Redis = 30,466 (factor 540.9)
    Python MPM dict = 22,290 (factor 739.31)

I am interested in extending the performance testing to other solutions (like sqlite, memcached, etc.) and to more complex use cases with multiple processes working in parallel.

Parameters

UltraDict(*args, name=None, create=None, buffer_size=10000, serializer=pickle, shared_lock=False, full_dump_size=None, auto_unlink=None, recurse=False, recurse_register=None, **kwargs)

name: Name of the shared memory. A random name will be chosen if not set. By default, if a name is given, a new shared memory space is created if it does not exist yet; otherwise the existing shared memory space is attached.

create: Can be either True, False or None. If set to True, a new UltraDict will be created and an exception is thrown if one already exists with the given name. If kept at the default value None, either a new UltraDict will be created if the name is not taken, or an existing UltraDict will be attached.

Setting create=True ensures you do not accidentally attach to an existing UltraDict that might be left over.
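
A minimal example of the difference (the exact exception type is not documented here, so this sketch catches broadly):

from UltraDict import UltraDict

ultra = UltraDict(name='my-unique-name', create=True)  # fails if the name exists already
other = UltraDict(name='my-unique-name')               # attaches to the existing dict

try:
    duplicate = UltraDict(name='my-unique-name', create=True)
except Exception as e:
    print('Name already taken:', e)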

buffer_size: Size of the shared memory buffer used for streaming changes of the dict. The buffer size limits the biggest change that can be streamed, so when you use large values or deeply nested dicts you might need a bigger buffer. Otherwise, if the buffer is too small, it will fall back to a full dump. Creating full dumps can be slow, depending on the size of your dict.

Whenever the buffer is full, a full dump will be created. A new shared memory is allocated just big enough for the full dump. Afterwards the streaming buffer is reset. All other users of the dict will automatically load the full dump and continue streaming updates.

(Also see the section Memory management below!)
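
For example, if you plan to stream single values of around 1 MB, give the buffer some headroom (the sizes below are illustrative):

from UltraDict import UltraDict

# A 2 MB streaming buffer fits single updates of ~1 MB plus
# serialization overhead without falling back to full dumps.
ultra = UltraDict(buffer_size=2_000_000)
ultra['blob'] = b'x' * 1_000_000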

serializer: Use a different serializer than the default pickle, e.g. marshal, dill or jsons. The module or object provided must support the loads() and dumps() methods.
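
For example, with the standard library's marshal module (faster than pickle for plain built-in types, but Python-version specific and more limited):

import marshal
from UltraDict import UltraDict

# Any module or object exposing loads() and dumps() can be plugged in.
ultra = UltraDict(serializer=marshal)
ultra['key'] = [1, 2, 3]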

shared_lock: When multiple, independent processes write to the same dict at the same time, they need a shared lock to synchronize and not overwrite each other's changes. Shared locks are slow. They rely on the atomics package for atomic locks. By default, UltraDict will use a multiprocessing.RLock() instead, which works well in fork context and is much faster.

(Also see the section Locking below!)

full_dump_size: If set, uses a static full dump memory instead of creating it dynamically. This might be necessary on Windows depending on your write behaviour. On Windows, the full dump memory goes away when the process that created it goes away. Thus you must plan ahead which processes might be writing to the dict and therefore creating full dumps.

auto_unlink: If True, the creator of the shared memory will automatically unlink the handle at exit so it is not visible or accessible to new processes. All existing, still connected processes can continue to use the dict.
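
For example, to let the buffers outlive the creating process (the name and flow are illustrative):

from UltraDict import UltraDict

# The creator would normally get auto_unlink=True automatically;
# disable it so the dict survives after this process exits.
ultra = UltraDict(name='long-lived-dict', auto_unlink=False)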

recurse: If True, any nested dict objects will be automatically wrapped in an UltraDict, allowing transparent nested updates.

recurse_register: Has to be either the name of an UltraDict or an UltraDict instance itself. Will be used internally to keep track of dynamically created, recursive UltraDicts for proper cleanup when using recurse=True. Usually does not have to be set by the user.

Memory management

UltraDict uses shared memory buffers, and those usually live in RAM. UltraDict does not use any management processes to keep track of buffers. It also cannot know when to free those shared memory buffers, because you might want the buffers to outlive the process that created them.

By convention you should set the parameter auto_unlink to True for exactly one of the processes that is using the UltraDict. The first process that creates a certain UltraDict will automatically get the flag auto_unlink=True unless you explicitly set it to False. When the process with the auto_unlink=True flag ends, it will try to unlink (free) all shared memory buffers.

A special case is the recursive mode using the recurse=True parameter. This mode uses an additional internal UltraDict to keep track of recursively nested UltraDict instances. All child UltraDicts write the names of the shared memory buffers they create to this register. This allows the buffers to outlive their creating processes and still be cleaned up correctly at the end of the program.

Buffer sizes and read performance:

There are 3 cases that can occur when you read from an UltraDict:

  1. No new updates: This is the fastest case. UltraDict was optimized for this case, to find out as quickly as possible whether there are any updates on the stream and then just return the desired data. If you want even better read performance, you can directly access the underlying data attribute of your UltraDict, though at the cost of no longer getting real-time updates.

  2. Streaming update: This is usually fast, depending on the size and amount of the data that was changed, but not on the size of the whole UltraDict. Only the data that was actually changed has to be deserialized.

  3. Full dump load: This can be slow, depending on the total size of your data. If your UltraDict is big, it might take long to deserialize it.

Given the above 3 cases, you need to balance the size of your data and your write patterns with the streaming buffer_size of your UltraDict. If the streaming buffer is full, a full dump has to be created. Thus, if your full dumps are expensive due to their size, try to find a good buffer_size to avoid creating too many full dumps.

On the other hand, if for example you only change the value of one single key back and forth, it might be useless to process a stream of all these changes. It might be much more efficient to simply do one full dump, which might be very small because it contains only that one key.
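
You can observe this trade-off empirically: full_dump_counter in the print_status() output tells you how many full dumps have happened (a sketch; the exact numbers depend on serialization overhead):

from UltraDict import UltraDict

# A deliberately tiny 1 kB stream buffer: many small updates will
# overflow it and force full dumps. Watch full_dump_counter grow.
ultra = UltraDict(buffer_size=1_000)
for i in range(1_000):
    ultra[i] = i
ultra.print_status()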

Locking

Every UltraDict instance has a lock attribute which is either a multiprocessing.RLock or an UltraDict.SharedLock if you set shared_lock=True when creating the UltraDict.

RLock is the fastest locking method and the default, but you can only use it if you fork your child processes. Forking is the default on Linux systems.

In contrast, on Windows systems forking is not available and Python will automatically use the spawn method when creating child processes. In that case, use the parameter shared_lock=True when creating the UltraDict. This requires the external atomics package to be installed.
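
A minimal sketch of the spawn case (the process structure is illustrative; it assumes attaching processes pick up the shared lock setting from the control memory):

import multiprocessing
from UltraDict import UltraDict

def worker(name):
    # Attach to the existing shared dict by name in the spawned child.
    d = UltraDict(name=name)
    with d.lock:
        d['counter'] += 1

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    ultra = UltraDict({'counter': 0}, shared_lock=True)
    procs = [multiprocessing.Process(target=worker, args=(ultra.name,)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(ultra['counter'])  # expected: 4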

How to use the locking?

ultra = UltraDict(shared_lock=True)

with ultra.lock:
	ultra['counter'] += 1

# The same as above with all default parameters
with ultra.lock(timeout=None, block=True, steal=False, sleep_time=0.000001):
	ultra['counter'] += 1

# Busy wait, will result in 99% CPU usage, fastest option
# Ideally number of processes using the UltraDict should be < number of CPUs
with ultra.lock(sleep_time=0):
	ultra['counter'] += 1

try:
	result = ultra.lock.acquire(block=False)
	ultra.lock.release()
except UltraDict.Exceptions.CannotAcquireLock as e:
	print(f'Process with PID {e.blocking_pid} is holding the lock')

try:
	with ultra.lock(timeout=1.5):
		ultra['counter'] += 1
except UltraDict.Exceptions.CannotAcquireLockTimeout:
	print('Stale lock?')

with ultra.lock(timeout=1.5, steal_after_timeout=True):
	ultra['counter'] += 1

Explicit cleanup

Sometimes, when your program crashes, no cleanup happens and you might be left with a corrupted shared memory buffer that only goes away if you manually delete it.

On Linux/Unix systems, those buffers usually live in a memory-based filesystem in the folder /dev/shm. You can simply delete the files there.

Another way to do this in code is like this:

# Unlink both shared memory buffers possibly used by UltraDict
name = 'my-dict-name'
UltraDict.unlink_by_name(name, ignore_errors=True)
UltraDict.unlink_by_name(f'{name}_memory', ignore_errors=True)
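
To make this a habit, you can register the same cleanup at interpreter exit with the standard atexit module (a sketch; atexit runs on normal shutdown and after unhandled exceptions, but not after a hard kill):

import atexit
from UltraDict import UltraDict

name = 'my-dict-name'

def cleanup():
    # Unlink both shared memory buffers possibly used by UltraDict
    UltraDict.unlink_by_name(name, ignore_errors=True)
    UltraDict.unlink_by_name(f'{name}_memory', ignore_errors=True)

atexit.register(cleanup)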

Advanced usage

See the examples folder.

>>> ultra = UltraDict({ 'init': 'some initial data' }, name='my-name', buffer_size=100_000)
>>> # Let's use a value with 100k bytes length.
>>> # This will not fit into our 100k bytes buffer due to the serialization overhead.
>>> ultra[0] = ' ' * 100_000
>>> ultra.print_status()
{'buffer': SharedMemory('my-name_memory', size=100000),
 'buffer_size': 100000,
 'control': SharedMemory('my-name', size=1000),
 'full_dump_counter': 1,
 'full_dump_counter_remote': 1,
 'full_dump_memory': SharedMemory('psm_765691cd', size=100057),
 'full_dump_memory_name_remote': 'psm_765691cd',
 'full_dump_size': None,
 'full_dump_static_size_remote': <memory at 0x7fcbf5ca6580>,
 'lock': <RLock(None, 0)>,
 'lock_pid_remote': 0,
 'lock_remote': 0,
 'name': 'my-name',
 'recurse': False,
 'recurse_remote': <memory at 0x7fcbf5ca6700>,
 'serializer': <module 'pickle' from '/usr/lib/python3.9/pickle.py'>,
 'shared_lock_remote': <memory at 0x7fcbf5ca6640>,
 'update_stream_position': 0,
 'update_stream_position_remote': 0}

Note: All status keys ending with _remote are stored in the control shared memory space and shared across processes.

Other things you can do:

>>> # Create a full dump
>>> ultra.dump()

>>> # Load latest full dump if one is available
>>> ultra.load()

>>> # Show statistics
>>> ultra.print_status()

>>> # Force load of latest full dump, even if we had already processed it.
>>> # There might also be streaming updates available after loading the full dump.
>>> ultra.load(force=True)

>>> # Apply full dump and stream updates to
>>> # underlying local dict, this is automatically
>>> # called by accessing the UltraDict in any usual way,
>>> # but can be useful to call after a forced load.
>>> ultra.apply_update()

>>> # Access underlying local dict directly for maximum performance
>>> ultra.data

>>> # Use any serializer you like, provided it supports the loads() and dumps() methods
>>> import jsons
>>> ultra = UltraDict(serializer=jsons)

>>> # Close connection to shared memory; will return the data as a dict
>>> ultra.close()

>>> # Unlink all shared memory, it will not be visible to new processes afterwards
>>> ultra.unlink()

Contributing

Contributions are always welcome!


UltraDict's Issues

Observer or Change Event

Hi,

Can you imagine an easy way to subscribe to changes on the dictionary? Or a way to extend UltraDict to be able to subscribe to changes?

I have two processes sharing a configuration through an UltraDict, but I would like to avoid polling the values every x seconds to ensure that both processes use the currently set configuration if one of them changes some values. It would be nicer to have a callback function or something similar.

I am not a native Python programmer, so maybe there are already other approaches for this in Python, but I couldn't figure out something that I could assume would work in this scenario. What are your thoughts?

best regards,
ahorn

Shared memory not always cleared

Hi,

I'm using UltraDict to share data between a master process and several subprocesses.

I have auto_unlink=True on all declarations, but sometimes if the script fails (meaning something is wrong in the code, or an unexpected error occurs) it won't clear the memory. Thus on the next run, when the master process creates the "new" UltraDict object, it reuses the information from the previous execution (as the UltraDict names are predefined).

Is there a way to clear the memory of previous executions without having to reboot the server?

Thanks.

locked

One process reads, one process writes, seems to be locked?

AttributeError: data when updating nested UltraDicts

Hi @ronny-rentner,

I am trying to use nested UltraDicts to communicate between processes. The information flow is structured as follows:

  • Data is coming into a regular dict that has the following structure: info = {'data': {'example': 0}}
  • The example key is updated frequently
  • I have an UltraDict constructed as follows: ultra_dict = UltraDict(recurse=True)
  • I update the ultra_dict as follows: ultra_dict.update(info), every x seconds
  • After 2 or 3 updates, ultra_dict.update(info) fails with the following stacktrace:
  File "C:\venv\lib\site-packages\UltraDict\UltraDict.py", line 815, in update
    self[k] = v
  File "C:\venv\lib\site-packages\UltraDict\UltraDict.py", line 849, in __setitem__
    if item.name not in self.recurse_register.data:
AttributeError: data

Because of design limitations it is not possible to have the information flow into ultra_dict directly. I expect that I should be able to call ultra_dict.update(info) consecutively, even if the information in info has not changed.
There is another process that is reading the information out of ultra_dict.

Is this expected behaviour? If so, how should I update the UltraDict such that information is correctly passed through to the other process?

Code to reproduce the issue:

ultra_dict = UltraDict(recurse=True)
info = {'data': {'example': 0}}
ultra_dict.update(info)
print(ultra_dict)
info['data']['example'] = 1
ultra_dict.update(info)
print(ultra_dict)
info['data']['example'] = 2
ultra_dict.update(info)

Python version: 3.8.10
UltraDict version: 0.0.6

UltraDict dependency 'atomics' is not compatible with MacBook silicon (m1)

Version: branch master
OS: macOS Big Sur version 11.6
The scenario:
I'm using this module in an algotrading bot app.
One mechanism I'm driving with this is helping the bot get quick updates from another process which is responsible for transmitting price updates.
The dictionary is a smart move as it is the right tool for the job. Process A fills a dictionary with prices. Process B consumes those prices and makes math calculations based on them.

My issue:
As the bot runs inside a while loop, it never really exits gracefully but through an interrupt (SIGINT, then SIGTERM).
If the producer of the dict (Process A) exits by SIGINT it's fine, but if Process B (the consumer) exits by SIGINT, the dictionary seems to enter a state in which you can't clear it, even with unlink() and close(). Only a restart helps in this scenario (I checked /dev/shm, but /shm does not exist on my hd).

That led me to try the shared lock mechanism (because I thought it might help with accessing this map with a lock).
When I ran the code again I was given an error stating "atomics" is not found. After a short pip install atomics I found out they don't have a wheel for Mac ARM, only a universal one. When running again I get the error "mach-o: wrong architecture".
Even if I exclude shared_lock=True it keeps throwing errors on the same thing. A restart of the computer is the only thing which clears it.

I suggest sorting this out quickly, as MacBook M1 computers are not that rare, and it's actually quite a great library which I currently cannot really use :\

Extremely slow initialization from existing dict

Initializing from a (large) existing dict is slow -- it seems to serialize every key-value pair as an update:

Traceback (most recent call last):
  File "/global/homes/p/pfasano/group_stats_dict.py", line 327, in <module>
    group_dict = read_groups(partitions, sp_bin)
  File "/global/homes/p/pfasano/group_stats_dict.py", line 206, in read_groups
    return UltraDict(group_dict, auto_unlink=True)
  File "/global/homes/p/pfasano/.local/perlmutter/3.9-anaconda-2021.11/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 301, in __init__
    super().__init__(*args, **kwargs)
  File "/global/common/software/nersc/pm-2022q2/sw/python/3.9-anaconda-2021.11/lib/python3.9/collections/__init__.py", line 1046, in __init__
    self.update(dict)
  File "/global/homes/p/pfasano/.local/perlmutter/3.9-anaconda-2021.11/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 541, in update
    self[k] = v
  File "/global/homes/p/pfasano/.local/perlmutter/3.9-anaconda-2021.11/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 568, in __setitem__
    self.append_update(key, item)
  File "/global/homes/p/pfasano/.local/perlmutter/3.9-anaconda-2021.11/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 482, in append_update
    self.dump()
  File "/global/homes/p/pfasano/.local/perlmutter/3.9-anaconda-2021.11/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 374, in dump
    marshalled = self.serializer.dumps(self.data)

It seems like somehow super().__init__ is calling collections.UserDict.__init__, which in turn calls UltraDict.__setitem__.

I guess I don't quite understand yet how UltraDict works, but why does every key need to be serialized as an update to an empty dict?

Problem updating iterating on values

Hi! I started using your dictionary in my project; however, I found a bug while trying to iterate over the dictionary values. A few lines of code (shown in the screenshot) trigger the bug.

[screenshot of the reproducing code omitted]

It can be solved by calling apply_update before trying to iterate over the values; however, that function is already called by the same process before iterating (I added a print), so I do not really understand why it solves the issue. I'm probably going to iterate over keys instead; I tried to bypass it by iterating over items, but that is not working either :-)

Duplicate logs

Hello, maybe this is a noob question, but I'm having a problem where some logs get duplicated when using the library.

[screenshots of duplicated log output omitted]

This is a very basic setup of FastAPI with UltraDict

Memory usage analysis

[memory usage screenshots: before, during and after the test]

Seems OK; UltraDict didn't eat memory after the test was done. I was afraid it allocates memory without releasing it, in which case the server would eventually OOM.

If you have any thoughts on how to test this, please let me know. I want to use UltraDict in our prod env but am afraid something might go wrong.

Dict is not being shared amongst gunicorn workers

Using Starlette framework.

# this is assigned to a global variable ultra on startup / or just declared at the top
ultra = UltraDict(name="room_chats", create=None, recurse=True)

and then I am trying to store websockets inside the dict.

Sample code:

# this is on connect
if ultra.get(f"room_chat_{room_id}") is None:
    ultra[f"room_chat_{room_id}"] = set()

ultra[f"room_chat_{room_id}"].add(websocket)

# then in a broadcast method
for connection in ultra[f"room_chat_{room_id}"]:
...

If I print the ultra dict, it only shows one item, but it will sometimes show multiple connections if the same worker gets assigned to those websocket connections.

{'room_chat_1': {<starlette.websockets.WebSocket object at 0x10740d490>}}

Gunicorn command: gunicorn test:app --name asgi --log-level debug -w 10 --threads 10 -k uvicorn.workers.UvicornH11Worker

Also, on shutdown:

async def shutdown_stuff():
    UltraDict.unlink_by_name("room_chats", ignore_errors=True)
    UltraDict.unlink_by_name('room_chats_memory', ignore_errors=True)

It throws error:
FileNotFoundError: [Errno 2] No such file or directory: '/room_chats'

What am I doing wrong? Please let me know if any additional info is required.

Add configurable timeout when waiting to acquire a lock

Currently this is hardcoded to 100_000 loops.

In Python 3.11 there's a new nanosleep(). Before that, it's hard to sleep for a nanosecond in Python without busy waiting.

We need to find a better solution for waiting in Python < 3.11.

Issue with import

output from terminal:

~/Downloads/UltraDict-0.0.6$ pip install UltraDict
Defaulting to user installation because normal site-packages is not writeable
Collecting UltraDict
Using cached UltraDict-0.0.6.tar.gz (197 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: UltraDict
Building wheel for UltraDict (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for UltraDict (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/UltraDict
copying ./cmpxchg.py -> build/lib.linux-x86_64-cpython-310/UltraDict
copying ./Exceptions.py -> build/lib.linux-x86_64-cpython-310/UltraDict
copying ./UltraDict.py -> build/lib.linux-x86_64-cpython-310/UltraDict
copying ./__init__.py -> build/lib.linux-x86_64-cpython-310/UltraDict
copying ./setup.py -> build/lib.linux-x86_64-cpython-310/UltraDict
running build_ext
building 'UltraDict' extension
creating build/temp.linux-x86_64-cpython-310
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.10 -c UltraDict.c -o build/temp.linux-x86_64-cpython-310/UltraDict.o
UltraDict.c:18:10: fatal error: Python.h: No such file or directory
18 | #include "Python.h"
| ^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for UltraDict
Failed to build UltraDict
ERROR: Could not build wheels for UltraDict, which is required to install pyproject.toml-based projects

Question - pickle.UnpicklingError: pickle data was truncated

I got the error

pickle.UnpicklingError: pickle data was truncated

while trying to use this library... How does this error message get generated, and how can I avoid it in the future?

Another weird one:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 201: invalid continuation byte

json str issue

==1== <class 'UltraDict.UltraDict.UltraDict'>
==2== {'1': {'video_local_path': '/Users/wenke/github/tiktoka-studio-uploader-app/tests/videos/1.mp4', 'video_filename': '1.mp4', 'video_title': '1', 'heading': '', 'subheading': '', 'extraheading': '', 'video_description': '', 'thumbnail_bg_image_path': '/Users/wenke/github/tiktoka-studio-uploader-app/tests/videos/1/sp/1-003.jpg', 'thumbnail_local_path': [], 'release_date': '', 'release_date_hour': '10:15', 'is_not_for_kid': True, 'categories': '', 'comments_ratings_policy': 1, 'is_age_restriction': False, 'is_paid_promotion': False, 'is_automatic_chapters': True, 'is_featured_place': True, 'video_language': '', 'captions_certification': 0, 'video_film_date': '', 'video_film_location': '', 'license_type': 0, 'is_allow_embedding': True, 'is_publish_to_subscriptions_feed_notify': True, 'shorts_remixing_type': 0, 'is_show_howmany_likes': True, 'is_monetization_allowed': True, 'first_comment': '', 'subtitles': '', 'tags': ''}}
==3== <class 'dict'>
==4== {'1': {'captions_certification': 0, 'categories': '', 'comments_ratings_policy': 1, 'extraheading': '', 'first_comment': '', 'heading': '3333', 'is_age_restriction': False, 'is_allow_embedding': True, 'is_automatic_chapters': True, 'is_featured_place': True, 'is_monetization_allowed': True, 'is_not_for_kid': True, 'is_paid_promotion': False, 'is_publish_to_subscriptions_feed_notify': True, 'is_show_howmany_likes': True, 'license_type': 0, 'release_date': '', 'release_date_hour': '10:15', 'shorts_remixing_type': 0, 'subheading': '2222', 'subtitles': '', 'tags': '', 'thumbnail_bg_image_path': '', 'thumbnail_local_path': [], 'video_description': '', 'video_filename': '1.mp4', 'video_film_date': '', 'video_film_location': '', 'video_language': '', 'video_local_path': '/Users/wenke/github/tiktoka-studio-uploader-app/tests/videos/1.mp4', 'video_title': '1'}}
wohhha data

The code is:

print('==1==', type(ultra[folder]['videos']))
print('==2==', ultra[folder]['videos'])

print('==3==', type(json.loads(df.to_json())))
print('==4==', json.loads(df.to_json()))
try:
    ultra[folder]['videos'] = json.loads(df.to_json())
except Exception as e:
    print(f'wohhha {e}')

Unable to access UltraDict after a certain loop limit; issue occurs only on Linux

from UltraDict import UltraDict

ultra = UltraDict({ 'init': 'some initial data' }, name='myname1')

for i in range(1, 5000):
    print(UltraDict(name='myname1'))

############### ERROR #################
File "/home/merit/miniconda3/lib/python3.9/site-packages/UltraDict/UltraDict.py", line 659, in unlink
self.control.unlink()
File "/home/merit/miniconda3/lib/python3.9/multiprocessing/shared_memory.py", line 241, in unlink
_posixshmem.shm_unlink(self._name)
FileNotFoundError: [Errno 2] No such file or directory: '/myname1'

Spelling mistake in readme.md

Didn't want to create a PR because I hate people who fix spelling mistakes just to get onto the contributor list.

recurse: If True, any nested dict objects will be automaticall wrapped in an UltraDict allowing transparent nested updates.

automaticall -> automatically

Crashes under high load

The master process is writing to one nested dict1 (recurse=True) shared between 20-40 processes; total dict1 size is ~1500 keys, each with a small nested dict as value.

The processes are created via multiprocessing.Process, and each writes to another shared dict, dict2[process_id], once per second; dict2 has the same size, but multiplied by num_processes.

The main process analyzes statistics from dict2 (for process_id in dict2: dict2[process_id]: ...)
and writes changes to shared dict1 once per second: for change in changes: dict1['nested'][change] = {'time': 123, 'blah': '123'}

Crashes appear if the number of changes is 300-2000 per second and read lookups are HUGE (>100k/sec). I tried to cache it once per second to a local dict using deepcopy, but that doesn't help...
Total memory usage does not exceed 2-4 GB I think (free RAM is about 60 GB); CPU usage goes up to 100%.

dict1 size in bytes, determined on a local dict with the same structure, is less than 150 kB.

I tried:

  1. copy.deepcopy(dict1) once per second to create a local copy in the processes for cached lookups - doesn't help
  2. shared_lock
  3. with dict1.lock, etc.
  4. increasing the buffer to huge values, increasing the full dump size, etc.

Nothing helps... at low speeds (or with no/small changes from master to dict1) everything works, and using multiprocessing.Manager().dict() everything works too, but slowly.

Examples of exceptions:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
	self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
	self._target(*self._args, **self._kwargs)
  File "zvshield.py", line 793, in zvshield.accept_connections
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 585, in __contains__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 511, in apply_update
	assert bytes(self.buffer.buf[pos:pos+1]) == b'\x00'
AssertionError

File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 248, in __init__
	self.buffer = self.get_memory(create=True, name=self.name + '_memory', size=buffer_size)
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 347, in get_memory
	full_dump = self.serializer.loads(bytes(buf[pos:pos+length]))
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 304, in __init__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 520, in apply_update
	memory = multiprocessing.shared_memory.SharedMemory(name=name)
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 114, in __init__
	mode, key, value = self.serializer.loads(bytes(self.buffer.buf[pos:pos+length]))
	self._mmap = mmap.mmap(self._fd, size)
OSError: [Errno 12] Cannot allocate memory

EOFError: Ran out of input
Exception ignored in: <function SharedMemory.__del__ at 0x7fb80639e820>
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 184, in __del__
	self.close()
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 227, in close
Exception ignored in: <function SharedMemory.__del__ at 0x7fb80639e820>
	self._mmap.close()
Traceback (most recent call last):
BufferError: cannot close exported pointers exist
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 184, in __del__
	self.close()
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 227, in close
	self._mmap.close()
BufferError: cannot close exported pointers exist
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 184, in __del__
	self.close()
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 227, in close
	self._mmap.close()
BufferError: cannot close exported pointers exist
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
	self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
	self._target(*self._args, **self._kwargs)
  File "zvshield.py", line 793, in zvshield.accept_connections
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 585, in __contains__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 500, in apply_update
	self.load(force=True)
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 450, in load
	full_dump = self.serializer.loads(bytes(buf[pos:pos+length]))
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 304, in __init__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 520, in apply_update
	mode, key, value = self.serializer.loads(bytes(self.buffer.buf[pos:pos+length]))
EOFError: Ran out of input
Exception ignored in: <function SharedMemory.__del__ at 0x7fc48f4d4820>
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 184, in __del__
	self.close()
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 227, in close
	self._mmap.close()
BufferError: cannot close exported pointers exist
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
	self.run()
  File "/usr/lib/python3.9/threading.py", line 892, in run
	self._target(*self._args, **self._kwargs)
  File "zvshield.py", line 793, in zvshield.accept_connections
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 585, in __contains__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 500, in apply_update
	self.load(force=True)
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 450, in load
	full_dump = self.serializer.loads(bytes(buf[pos:pos+length]))
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 304, in __init__
	self.apply_update()
  File "/usr/local/lib/python3.9/dist-packages/UltraDict/UltraDict.py", line 520, in apply_update
	mode, key, value = self.serializer.loads(bytes(self.buffer.buf[pos:pos+length]))
EOFError: Ran out of input
Exception ignored in: <function SharedMemory.__del__ at 0x7fc48f4d4820>
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 184, in __del__
	self.close()
  File "/usr/lib/python3.9/multiprocessing/shared_memory.py", line 227, in close
	self._mmap.close()
BufferError: cannot close exported pointers exist

The dict does not delete items but puts an empty string

Hello,
I am currently using your dictionary for my project.
I found a problem when I try to delete an item from the dict. Instead of deleting it, the dict replaces the value to be deleted with an empty string, which leads to a bug in my project.
I wrote a small piece of code to reproduce this behavior, which you can find below. I hope it helps you figure out the problem.

from UltraDict import UltraDict
import random
import string
letters = string.ascii_lowercase
rand_str = ''.join(random.choice(letters) for i in range(1000))
my_dict = UltraDict()
for i in range(10000):
	my_dict[i] = rand_str
for i in list(my_dict.keys()):
	del my_dict[i]
print (my_dict)

and here are the results I got
{379: b'', 750: b'', 1121: b'', 1492: b'', 1863: b'', 2234: b'', 2605: b'', 2976: b'', 3347: b'', 3718: b'', 4089: b'', 4460: b'', 4831: b'', 5202: b'', 5573: b'', 5944: b'', 6315: b'', 6686: b'', 7057: b'', 7428: b'', 7799: b'', 8170: b'', 8541: b'', 8912: b'', 9283: b'', 9654: b''}

Thank you

Crash

I cannot restart my app, even after restarting the computer.

C:\Users\marce\PycharmProjects\srsapp\venv310\Scripts\python.exe C:/Users/marce/PycharmProjects/srsapp/launcher.py --enable_file_cache True
Traceback (most recent call last):
File "C:\Users\marce\PycharmProjects\srsapp\launcher.py", line 7, in
import globalVariables
File "C:\Users\marce\PycharmProjects\srsapp\globalVariables.py", line 574, in
config = UltraDict(name='config1', size=500000)
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 288, in init
super().init(*args, **kwargs)
File "C:\Users\marce\AppData\Local\Programs\Python\Python310\lib\collections_init_.py", line 1092, in init
self.update(kwargs)
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 498, in update
self[k] = v
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 514, in setitem
self.apply_update()
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 464, in apply_update
self.load(force=True)
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 398, in load
full_dump_memory = self.get_memory(create=False, name=name)
File "C:\Users\marce\PycharmProjects\srsapp\venv310\lib\site-packages\UltraDict\UltraDict.py", line 329, in get_memory
raise Exception("Could not get memory: ", name)
Exception: ('Could not get memory: ', 'wnsm_0ce9a65a')
Exception ignored in: <function SharedMemory.__del__ at 0x000001F84F477880>
Traceback (most recent call last):
File "C:\Users\marce\AppData\Local\Programs\Python\Python310\lib\multiprocessing\shared_memory.py", line 184, in del
self.close()
File "C:\Users\marce\AppData\Local\Programs\Python\Python310\lib\multiprocessing\shared_memory.py", line 227, in close
self._mmap.close()
BufferError: cannot close exported pointers exist

python3.11 -m build fails with FileNotFoundError

Summary

python3.11 -m build fails with "FileNotFoundError: [Errno 2] No such file or directory: '/tmp/build-via-sdist-k5fbi61p/UltraDict-0.0.6/readme.md'".

I worked out a new build system that works:

  • Deleted setup.py
  • Modified pyproject.toml
  • Created setup.cfg

Build/Install/Test System

Raspberry Pi OS (Debian) Bookworm version 12.2, Raspberry Pi 4 (4 GB RAM), Python 3.11.5

By The Way...

This is an awesome good project. I've been using it for a year or so. The abstraction (shared dictionaries) works really well with my programming style.
