swansonk14 / p_tqdm
Parallel processing with progress bars
License: MIT License
I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().
This notebook is self-explanatory:
https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb
p_map() is orders of magnitude slower.
However, the same function seems to work fine, fast enough, for another task - reading 25k files off the disk.
Windows 10, Python 3.8.8, Jupyter Notebook
I have the following code:
import p_tqdm
d = dict()
def modify_dict(a):
    d[a] = a ** 2
p_tqdm.p_map(modify_dict, list(range(10)))
print(d)
It outputs {} instead of {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}.
In comparison, concurrent.futures works as expected:
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    executor.map(modify_dict, list(range(10)))
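A likely explanation, offered as an assumption rather than a confirmed diagnosis: p_map runs the function in separate worker processes, each with its own copy of d, while the threads of a ThreadPoolExecutor share the parent's memory. A minimal sketch of a pattern that does not depend on shared state is to return each key/value pair and build the dict in the parent. The names square_item and build_dict are hypothetical, and the builtin map stands in for p_map so the sketch runs without extra packages.

```python
def square_item(a):
    # Return the result instead of writing to a module-level dict.
    return a, a ** 2


def build_dict(values, mapper=map):
    # `mapper` is any map-like callable. The builtin map is used here;
    # p_tqdm.p_map could be passed instead (assumption: it yields one
    # result per input item).
    return dict(mapper(square_item, values))


if __name__ == "__main__":
    print(build_dict(range(10)))
```

Because every result carries its own key, the final dict does not depend on the order in which the workers finish.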
Is there a way to cleanly halt execution when a large job is running? I am using this in a jupyter notebook on macOS 10.15.4 and when I interrupt the cell (or one of the processes exits with an error) the cell shows as finished but my CPU and memory are still being used. In fact, there appears to be a memory leak because after exiting the whole jupyter and python instance, the memory usage of "kernel_task" (the process which was showing high CPU during execution) does not drop.
Thanks for your repository, it's a nice and useful work.
Sorry if this issue is trivial, but I'm a newbie with parallel processing.
I just managed to parallelize my code with Pathos and was searching for a progress bar, when I found p_tqdm.
My code currently uses Pathos' amap (asynchronous):
https://pathos.readthedocs.io/en/latest/pathos.html#pathos.multiprocessing.ProcessPool.amap
That does not seem to be supported in your work; umap (unordered) is offered instead. Or are they equivalent? Would you perhaps be so kind as to extend your library?
Thank you in advance
I can't get this module to work with other modules in the called function. For example, the following code returns an error:
import time
from p_tqdm import p_map
def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square
if __name__ == '__main__':
    r = p_map(_foo, list(range(0, 30)))
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "c:/Test p_tqdm.py", line 6, in _foo
time.sleep(1)
NameError: name 'time' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:/Test p_tqdm.py", line 10, in <module>
r = p_map(_foo, list(range(0, 30)))
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 86, in p_map
result = list(iterator)
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 75, in _parallel
for item in tqdm(map_func(function, *arrays), total=num_iter, **kwargs):
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\tqdm\std.py", line 1081, in __iter__
for obj in iterable:
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 748, in next
raise value
NameError: name 'time' is not defined
The following code throws a ValueError:
from p_tqdm import p_uimap
def increment(x):
    return x + 1
it = (i for i in range(5))  # don't use range directly, because it is Sized
for x in p_uimap(increment, it):
    print(x)
the error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cthoyt/.virtualenvs/integrator/lib/python3.8/site-packages/p_tqdm/p_tqdm.py", line 42, in _parallel
length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
ValueError: min() arg is an empty sequence
This happens because there are no sized iterables, and it's trying to take the min() of an empty sequence. This could be solved a few ways:
1. Catch the ValueError and fall back to length=None. The total is an optional argument to tqdm(), so this is okay, but tqdm will no longer be able to give an estimate:
try:
    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
except ValueError:
    length = None
2. Collect the lengths first and use length=None when there are none:
# Determine length of tqdm (equal to length of shortest iterable), if possible
lengths = [len(iterable) for iterable in iterables if isinstance(iterable, Sized)]
length = min(lengths) if lengths else None
I'm not sure which you would prefer, but they effectively accomplish the same thing. I made a PR #28 that uses the second solution.
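For anyone who wants to try the idea outside the library, here is a minimal, self-contained sketch of the second approach. The helper name shortest_length is hypothetical and not part of p_tqdm. It only illustrates returning None when no argument supports len().

```python
from collections.abc import Sized


def shortest_length(*iterables):
    # Collect the lengths of the arguments that support len();
    # generators and other unsized iterables are skipped.
    lengths = [len(it) for it in iterables if isinstance(it, Sized)]
    # None signals an unknown total.
    return min(lengths) if lengths else None


if __name__ == "__main__":
    print(shortest_length([1, 2, 3], "ab"))
    print(shortest_length(i for i in range(5)))
```

With a total of None, tqdm still counts completed items but cannot show a percentage or a remaining-time estimate.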
Traceback (most recent call last):
File "group_notes_by_visit.py", line 13, in <module>
from p_tqdm import p_uimap
File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
from collections import Sized
File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
File "/home/ga2530/miniconda3/lib/python3.7/collections/__init__.py", line 52, in __getattr__
DeprecationWarning, stacklevel=2)
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
First time I've gotten this. Not sure what caused it
Trying to nest p_map (e.g. to implement generating a square matrix) causes Python to fail with an assertion error.
Steps to reproduce:
from functools import partial
from p_tqdm import p_map
l = [1, 2, 3]
def add(a, b):
    return a + b
def gen_mat(x):
    # l is a module-level name, so no nonlocal declaration is needed
    return p_map(partial(add, x), l)
mat = p_map(gen_mat, l)
print(mat)
Expected Result:
[[2,3,4],[3,4,5],[4,5,6]]
Actual Result:
AssertionError: daemonic processes are not allowed to have children
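One way to sidestep the nesting, sketched here under the assumption that the inner work can be expressed as independent tasks, is to flatten the two levels into a single map over all pairs and regroup the results afterwards. The names add_pair and square_matrix are hypothetical, and the builtin map stands in for p_map so the sketch runs without extra packages.

```python
from itertools import product


def add_pair(pair):
    # Each task receives one (row_value, column_value) pair.
    a, b = pair
    return a + b


def square_matrix(values, mapper=map):
    values = list(values)
    n = len(values)
    # Build every pair up front as a list, so the mapper sees a sized input.
    pairs = list(product(values, repeat=2))
    # One flat pass over all n * n pairs instead of a map inside a map.
    flat = list(mapper(add_pair, pairs))
    # Regroup the flat results into rows of length n.
    return [flat[i * n:(i + 1) * n] for i in range(n)]


if __name__ == "__main__":
    print(square_matrix([1, 2, 3]))
```

This relies on the mapper preserving input order, so an ordered map would be needed rather than an unordered one.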
Hi. I was excited to find your package for making a progress bar work with pathos multiprocessing. When I change my original code..
p = pp.ProcessPool(4)
p.map(method, generator)
p.close()
p.join()
to
p_map(method, generator)
I get the following error:
PicklingError: Can't pickle 'tkapp' object: <tkapp object at 0x000000000B66C930>
Should p_tqdm work with Tk objects?
Thanks,
Nick
The progress bar does not advance while the function is running; it jumps straight to a fully completed bar at 100%.
Hi, I have a p_map set up in jupyter notebook, but instead of the progress bar, I get:
HBox(children=(FloatProgress(value=0.0, max=1827402.0), HTML(value='')))
I have tried the following in terminal:
jupyter nbextension enable --py widgetsnbextension
jupyter labextension install @jupyter-widgets/jupyterlab-manager
but the issue persists after restarting the kernel.
Hi.
I'm seeing the following. I'm clearing my cache, removing my pip --user directory, seeing p_tqdm have trouble installing anyway, and then I show what OS I'm on. Although it's not shown here, I also tried upgrading my version of pip.
$ rm -rf ~/.cache/pip
below cmd output started 2020 Tue Jan 14 12:21:00 PM PST
above cmd output done 2020 Tue Jan 14 12:21:00 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521
$ rm -rf ~/.local/lib/python3.6
below cmd output started 2020 Tue Jan 14 12:21:04 PM PST
above cmd output done 2020 Tue Jan 14 12:21:04 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521
$ python3.6 -m pip install --user p_tqdm
below cmd output started 2020 Tue Jan 14 12:21:09 PM PST
Collecting p_tqdm
Downloading https://files.pythonhosted.org/packages/7c/49/e0d744c3aace9e8951725c7e47c4beabf9311cc47b0ead9879a6957e18a4/p_tqdm-1.3.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-xpwevc0i/p-tqdm/setup.py", line 4, in <module>
long_description = f.read()
File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1645: ordinal not in range(128)
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xpwevc0i/p-tqdm/
WARNING: You are using pip version 19.1.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
above cmd output done 2020 Tue Jan 14 12:21:16 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521
$ cat /etc/lsb-release
below cmd output started 2020 Tue Jan 14 12:21:20 PM PST
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=19.3
DISTRIB_CODENAME=tricia
DISTRIB_DESCRIPTION="Linux Mint 19.3 Tricia"
Is this happening to everyone? What can I do to get it installed?
Thanks!
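The traceback suggests setup.py reads a file containing a non-ASCII byte (0xe2 is the lead byte of many UTF-8 punctuation characters) while the default encoding is ASCII. A minimal sketch of the usual fix on the reading side is to pass the encoding explicitly. The function name read_long_description and the file name README.md are assumptions for illustration and are not taken from the package.

```python
import os
import tempfile


def read_long_description(path):
    # Decode as UTF-8 regardless of the locale's default encoding.
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Write a throwaway file containing a non-ASCII character, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        readme = os.path.join(tmp, "README.md")
        with open(readme, "w", encoding="utf-8") as f:
            f.write("p_tqdm \u2013 parallel progress bars\n")
        print(read_long_description(readme))
```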
I tried to use p_map to iterate over a 3D NumPy array, but the answer was not identical to those of a for loop and map. Below is a simple example.
import numpy as np
a = np.arange(12).reshape([2,2,3])
result = []
for i in a:
    result.append(np.sum(i))
result is [15, 51].
a = np.arange(12).reshape([2,2,3])
result = list(map(lambda x: np.sum(x), a))
result is [15, 51].
a = np.arange(12).reshape([2,2,3])
result = p_map(lambda x: np.sum(x), a)
result is [66].
I tried p_tqdm to do multiprocessing within a function. This works extremely slowly:
import spacy
from pathos.pools import ThreadPool as Pool
import time
from p_tqdm import p_map
# Install with python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")
def preworker(text, nlp):
    return [w.lemma_ for w in nlp(text)]
worker = lambda text: preworker(text, nlp)
texts = ["Este es un texto muy interesante en español"] * 1000
st = time.time()
pool = Pool(3)
r = pool.map(worker, texts)
print(f"Usual pool took {time.time()-st:.3f} seconds")
def out_worker(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    pool = Pool(3)
    return pool.map(worker, texts)
st = time.time()
r = out_worker(texts, nlp)
print(f"Pool within a function took {time.time()-st:.3f} seconds")
def out_worker_tqdm(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    return p_map(worker, texts)
st = time.time()
r = out_worker_tqdm(texts, nlp)
print(f"p_tqdm within a function took {time.time()-st:.3f} seconds")
def out_worker2(texts, nlp, pool):
    worker = lambda text: preworker(text, nlp)
    return pool.map(worker, texts)
st = time.time()
pool = Pool(3)
r = out_worker2(texts, nlp, pool)
print(f"Pool passed to a function took {time.time()-st:.3f} seconds")
The output is
Usual pool took 0.052 seconds
Pool within a function took 0.062 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00, 1.23it/s]
p_tqdm within a function took 8.341 seconds
Pool passed to a function took 0.055 seconds
I got the tip of using a thread pool instead of the usual pool (I guess p_tqdm uses the usual pool underneath, but I haven't checked) from the pathos author.
Very nice library!
It seems to work only with Python 3.
Is there a Python 2 version? It's OK if not, though it might be worth mentioning that in the README.
It would be nice if one could choose the number of workers. It looks like it currently defaults to the number of CPU cores.
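The _parallel code quoted in a later issue on this page pops a num_cpus keyword and treats a float as a fraction of the available cores. A minimal sketch of that resolution logic follows, with a hypothetical helper name resolve_num_cpus. The lower bound of one worker is an addition of this sketch and is not in the quoted code.

```python
import os


def resolve_num_cpus(num_cpus=None):
    # os.cpu_count() may return None, so fall back to a single core.
    total = os.cpu_count() or 1
    if num_cpus is None:
        # Default: use every available core.
        return total
    if isinstance(num_cpus, float):
        # A float is read as a fraction of the available cores;
        # never go below one worker (addition of this sketch).
        return max(1, int(round(num_cpus * total)))
    return int(num_cpus)


if __name__ == "__main__":
    print(resolve_num_cpus())
    print(resolve_num_cpus(2))
    print(resolve_num_cpus(0.5))
```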
Stable Diffusion runs perfectly well until you install any extension that requires tqdm. I've tried multiple extensions, and it's hit and miss which ones trigger this error, but it completely breaks Stable Diffusion, which then refuses to launch once the extension has been added:
Launching Web UI with arguments: --xformers --medvram
Traceback (most recent call last):
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 41, in <module>
main()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 37, in main
start()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\launch_utils.py", line 439, in start
import webui
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\webui.py", line 13, in <module>
initialize.imports()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\initialize.py", line 21, in imports
import gradio # noqa: F401
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\__init__.py", line 3, in <module>
import gradio.components as components
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\__init__.py", line 1, in <module>
from gradio.components.annotated_image import AnnotatedImage
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\annotated_image.py", line 8, in <module>
from gradio_client.documentation import document, set_documentation_group
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\__init__.py", line 1, in <module>
from gradio_client.client import Client
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\client.py", line 24, in <module>
from huggingface_hub import CommitOperationAdd, SpaceHardware, SpaceStage
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\__init__.py", line 322, in __getattr__
submod = importlib.import_module(submod_path)
File "C:\Users\Aaron\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\hf_api.py", line 35, in <module>
from huggingface_hub.utils import (
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\__init__.py", line 18, in <module>
from . import tqdm as _tqdm # _tqdm is the module
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\tqdm.py", line 63, in <module>
from tqdm.auto import tqdm as old_tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
Press any key to continue . . .
When loading my scripts I get the following error using p-tqdm >= 1.3. It works for p-tqdm 1.2, though
Matplotlib support failed
Traceback (most recent call last):
File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 25, in do_import
succeeded = activate_func()
File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/pydev_ipython/matplotlibtools.py", line 155, in activate_pylab
pylab = sys.modules['pylab']
KeyError: 'pylab'
My pipfile looks as follows:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
beautifulsoup4 = "*"
easy-google-docs = "*"
google = "*"
google_auth_oauthlib = "*"
google-api-python-client = "*"
numpy = "*"
openpyxl = "*"
pandas = "*"
pytest = "*"
pytest-parallel = "*"
pyyaml = "*"
requests = "*"
scipy = "*"
soupsieve = "*"
tqdm = "*"
pathos = "*"
p-tqdm = "==1.3"
[requires]
python_version = "3.7"
Hi,
I changed my preprocessing code so that some iterations take a fairly long time (1-2 minutes). I think this irregular speed is causing everything to hang at the end (usually at about 99%). It never finishes so I think there must be a parallel processing issue. I've tried setting miniters=1 and also changing to p_imap but nothing seems to resolve it. Any ideas?
Thanks
Within the function call which is wrapped by p_uimap, I've put in a counter to see if anything takes longer than 2 minutes, and it's never triggered, so I imagine it's a parallel issue. Thanks
I've been using v1.2 for a while with no issues (along with tqdm 4.45.0) and recently upgraded p_tqdm and got this error. Didn't go away until I downgraded back to 1.2.
from p_tqdm import p_imap
File "/usr/local/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
File "/usr/local/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 15, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
This is on a Docker image where I'm installing a bunch of things, but mostly relevant would be:
ENV TQDM_VERSION 4.45.0
ENV P_TQDM_VERSION 1.2
...
RUN pip install -U p_tqdm==$P_TQDM_VERSION
RUN pip install -U tqdm==$TQDM_VERSION
Increasing the version beyond 1.2 leads to the above error. Any idea what's going wrong?
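One thing that may be worth checking, as an assumption rather than a diagnosis, is which tqdm installation the interpreter actually imports and whether it provides the auto submodule. A minimal diagnostic sketch using only the standard library follows. The helper name describe_submodule is hypothetical.

```python
import importlib
import importlib.util


def describe_submodule(package, submodule):
    # Report where `package` is imported from and whether
    # `package.submodule` can be located.
    try:
        pkg = importlib.import_module(package)
    except ImportError:
        return {"package_found": False}
    try:
        spec = importlib.util.find_spec(package + "." + submodule)
    except ImportError:
        # Raised when `package` is a plain module rather than a package.
        spec = None
    return {
        "package_found": True,
        "file": getattr(pkg, "__file__", None),
        "version": getattr(pkg, "__version__", None),
        "submodule_found": spec is not None,
    }


if __name__ == "__main__":
    print(describe_submodule("tqdm", "auto"))
```

If the reported file or version is not the one expected, the import is resolving to a different copy of tqdm than the one pinned in the Dockerfile.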
Hello,
Thanks for the nice work. Do I need any line of code such as multiprocess.close or equivalent at the end, or does p_tqdm take care of closing the workers once the work is done?
Maybe it would be nice to add this info in the readme.
Best,
Tommaso
This code breaks with NameError: name 'time' is not defined:
import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap
numbers = list(range(0, 1000))
def heavy_processing(number):
    time.sleep(0.05)
    output = number + 1
    return output
results = p_map(heavy_processing, numbers)
print(results)
Error message:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "<ipython-input-2-a99c335b4b30>", line 5, in add
NameError: name 'time' is not defined
"""
The above exception was the direct cause of the following exception:
NameError Traceback (most recent call last)
<ipython-input-4-1bbf104dda25> in <module>
----> 1 added = p_map(add, l1, l2)
~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in p_map(function, *iterables, **kwargs)
58 ordered = True
59 generator = _parallel(ordered, function, *iterables, **kwargs)
---> 60 result = list(generator)
61
62 return result
~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in _parallel(ordered, function, *iterables, **kwargs)
47 map_func = getattr(pool, map_type)
48
---> 49 for item in tqdm(map_func(function, *iterables), total=length, **kwargs):
50 yield item
51
~\AppData\Roaming\Python\Python38\site-packages\tqdm\notebook.py in __iter__(self)
252 def __iter__(self):
253 try:
--> 254 for obj in super(tqdm_notebook, self).__iter__():
255 # return super(tqdm...) will not catch exception
256 yield obj
~\AppData\Roaming\Python\Python38\site-packages\tqdm\std.py in __iter__(self)
1176
1177 try:
-> 1178 for obj in iterable:
1179 yield obj
1180 # Update and possibly print the progressbar.
~\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py in next(self, timeout)
866 if success:
867 return value
--> 868 raise value
869
870 __next__ = next # XXX
NameError: name 'time' is not defined
However, this works fine - the difference being, I've passed the time module as a name to my function:
import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap
numbers = list(range(0, 1000))
def heavy_processing(number, time=time):
    time.sleep(0.05)
    output = number + 1
    return output
results = p_map(heavy_processing, numbers)
print(results)
I have Windows 10, Jupyter Notebook, Python 3.8.8, and these packages:
p-tqdm==1.3.3
tqdm==4.61.1
pathos==0.2.8
multiprocess==0.70.12.2
Is it a bug with p-tqdm? Or with one of the other modules? (in which case I will move this bug report to the appropriate repo)
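A workaround often used for this kind of NameError is to import the module inside the function, so the name is bound wherever the function runs. This is a sketch resting on the assumption that module-level names are not carried over to the worker process in this setup. It is not a confirmed diagnosis of which package is at fault. The builtin map stands in for p_map so the sketch runs without extra packages.

```python
def heavy_processing(number):
    # Importing here binds `time` every time the function runs, so the
    # function does not rely on a module-level import being available
    # in the worker process.
    import time

    time.sleep(0.001)
    return number + 1


if __name__ == "__main__":
    print(list(map(heavy_processing, range(5))))
```

Unlike the time=time default argument, this keeps the function signature unchanged.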
When I run my function with p_map, it runs just fine. If I run it in my script that has more operations after the call to p_map, it takes a very long time and then I start getting various memory errors.
Is there a way to force it to wait for the processes to finish before continuing on to the next line of code?
Thanks!
Thanks for writing this awesome library!
It would be nice if the desc argument of tqdm would be propagated when using p_tqdm.p_map(..., desc='My awesome parallel task') to the internal call of tqdm.
Hi and thanks for a great package!
I was wondering if it would make sense to add a keyword to disable the progress bar? I know that the progress bar is the reason why one would use this package to begin with, but I still sometimes run into cases where it would be useful to be able to disable the progress bar. I was thinking something along the line of:
def _parallel(ordered, function, *iterables, **kwargs):
    """Returns a generator for a parallel map with a progress bar.

    Arguments:
        ordered(bool): True for an ordered map, false for an unordered map.
        function(Callable): The function to apply to each element of the given Iterables.
        iterables(Tuple[Iterable]): One or more Iterables containing the data to be mapped.

    Returns:
        A generator which will apply the function to each element of the given Iterables
        in parallel in order with a progress bar.
    """
    # Extract num_cpus
    num_cpus = kwargs.pop('num_cpus', None)
    do_tqdm = kwargs.pop('do_tqdm', True)
    # Determine num_cpus
    if num_cpus is None:
        num_cpus = cpu_count()
    elif type(num_cpus) == float:
        num_cpus = int(round(num_cpus * cpu_count()))
    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
    # Create parallel generator
    map_type = 'imap' if ordered else 'uimap'
    pool = Pool(num_cpus)
    map_func = getattr(pool, map_type)
    # create iterable
    items = map_func(function, *iterables)
    # add progress bar
    if do_tqdm:
        items = tqdm(items, total=length, **kwargs)
    for item in items:
        yield item
    pool.clear()
What do people think about this?
Cheers,
Christian
Hi,
Is there any way to update the postfix string inside the loop, like loop.set_postfix(name=value) with tqdm?
When trying to p_map on an iterator/generator, p_tqdm fails when checking the length while setting up tqdm.
Steps to reproduce:
from p_tqdm import p_imap, t_imap
def id(x):
    return x
a = t_imap(id, [1,2,3,4])
b = p_imap(id, a)
for c in b:
    print(c)
Expected result:
Prints the numbers 1,2,3,4; each on its own line.
Actual result:
Crashes inside p_tqdm before setting up tqdm.
Debugging:
Both p_tqdm._parallel and p_tqdm._sequential are affected.
Apologies for not having a small, reproducible snippet for this, but I'm using p_imap in my codebase, which has been working fine for a while. However, recently (perhaps since I upgraded from 1.2?) I've had a couple of reproducible instances where my call to p_imap hangs. I print right before the call, and the function(s) invoked by p_imap also print (with flush=True) on their first line, and it appears the functions never get invoked.
I'm passing a list of dictionaries to p_imap, and the thing these examples have in common is that the list is longer and the dictionaries are larger. Is there some sort of size limitation on the parameters, or is something else going on?
p_tqdm is the easiest and most beautiful way to parallelize computations.
I use it often but sometimes I need to cancel some calculation. When this happens, I need to restart the kernel.
I thought that encapsulating the usage of the Pool in a with ... as ... clause could solve this problem.
I am presenting a PR implementing this small change to demonstrate it.
There should be an option to allow the caller to provide the chunk size used by the thread pool created by p_tqdm._parallel. Using the default can be quite inefficient, especially when the caller knows that each of the operations inside the map is usually quite fast.
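Until such an option exists, one possible workaround is to batch the inputs by hand so that each task processes a whole chunk. The sketch below assumes a positive chunk size. The names chunked, apply_to_chunk, and chunked_map are hypothetical, and the builtin map stands in for a parallel map so the sketch runs without extra packages.

```python
from itertools import islice


def chunked(iterable, size):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def apply_to_chunk(task):
    # Run the function over one whole chunk inside a single task.
    func, chunk = task
    return [func(x) for x in chunk]


def chunked_map(func, iterable, chunk_size, mapper=map):
    # Pair the function with each chunk so the worker is a plain
    # top-level function that takes a single argument.
    tasks = [(func, chunk) for chunk in chunked(iterable, chunk_size)]
    results = []
    for part in mapper(apply_to_chunk, tasks):
        results.extend(part)
    return results


if __name__ == "__main__":
    print(chunked_map(abs, [-1, -2, 3, -4, 5], chunk_size=2))
```

A progress bar wrapped around such a map would advance once per chunk rather than once per item.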
After installing this project, I run into the following error upon import:
from p_tqdm import p_umap
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 13, in <module>
from pathos.helpers import cpu_count
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/__init__.py", line 55, in <module>
from . import pools
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/pools.py", line 31, in <module>
from pathos.helpers import ProcessPool as _ProcessPool
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/__init__.py", line 9, in <module>
from . import pp_helper
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/pp_helper.py", line 30, in <module>
from pp import _Task
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/__init__.py", line 12, in <module>
from ._pp import *
File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/_pp.py", line 62, in <module>
import six
ModuleNotFoundError: No module named 'six'
My poetry dependencies are:
python = "^3.7"
tqdm = "^4.41.1"
numpy = "^1.18.1"
p_tqdm = "^1.3"
Is there any explicit dependency missing here?
This would allow getting the pool object's details, such as the process ID.
I am getting this error using v1.3.3 of this package:
from p_tqdm import p_imap
File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/__init__.py", line 1, in <module>
from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
from collections import Sized
ImportError: cannot import name 'Sized' from 'collections' (/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/collections/__init__.py)
The import should come from collections.abc instead of collections.
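A minimal sketch of the import described above. The fallback branch only matters on interpreters that predate collections.abc and is included as an assumption about what a backwards-compatible version might look like. The helper name has_length is hypothetical.

```python
try:
    # Location of the abstract base classes in Python 3.3 and later.
    from collections.abc import Sized
except ImportError:
    # Fallback for very old interpreters that lack collections.abc.
    from collections import Sized


def has_length(obj):
    # True for objects that implement __len__.
    return isinstance(obj, Sized)


if __name__ == "__main__":
    print(has_length([1, 2, 3]))
    print(has_length(x for x in range(3)))
```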
Great library! I'm not sure whether this is something you've already looked into/tried or if this would be a new feature addition.
Anyhow, I have a script I am running using p_tqdm and I'd like to achieve something similar to tqdm.write, where you can have the progress bar fixed to the bottom whilst messages printed to stdout end up above it. The script is rather large, and rather than manually going in and changing those print statements I've borrowed a SO answer to overload print.
A simple repro script is as follows:
import time
import inspect
from p_tqdm import p_map
from p_tqdm.p_tqdm import tqdm # NOTE: here I've also tried importing tqdm as `from tqdm.auto import tqdm`.. no luck
def divert_stdout_to_tqdm() -> None:
    old_print = print
    def new_print(*args, **kwargs) -> None:
        # if tqdm.write raises error, use builtin print
        try:
            tqdm.write(*args, **kwargs)
        except:
            old_print(*args, **kwargs)
    inspect.builtins.print = new_print
def do_stuff(num: int) -> None:
    print("HIIIIII")
    time.sleep(0.5)
divert_stdout_to_tqdm()
# doesn't work
results = p_map(do_stuff, range(100))
# works
for i in tqdm(range(100)):
    do_stuff(i)
results = p_map(do_stuff, range(100))
doesn't work exactly as I'd intend as it produces output as such:
HIIIIII
0%| | 0/100 [00:00<?, ?it/s]HIIIIII
HIIIIII
1%|▉ | 1/100 [00:00<01:35, 1.03it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
17%|███████████████▎ | 17/100 [00:01<00:57, 1.44it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
for i in tqdm(range(100)):
    do_stuff(i)
works as intended and produces output as such:
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
5%|████▌ | 5/100 [00:02<00:47, 1.99it/s]
I am also not an expert in the multiprocessing library, and it is very well possible that this is more related to multiprocessing than it is to p_tqdm.