swansonk14 / p_tqdm

Parallel processing with progress bars

License: MIT License

Python 100.00%

parallel-processing tqdm progress-bar

p_tqdm's Issues

p_map() very slow compared to multiprocess.Pool.map()

I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function seems to work fine, fast enough, for another task - reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook

Cannot modify global variable

I have the following code:

import p_tqdm
d = dict()

def modify_dict(a):
    d[a] = a ** 2

p_tqdm.p_map(modify_dict, list(range(10)))
print(d)

It outputs {} instead of {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}.

In comparison, concurrent.futures works as expected:

import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
     executor.map(modify_dict, list(range(10)))
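
A likely explanation, stated here as an assumption rather than a confirmed diagnosis: the tracebacks elsewhere on this page show p_map running on a multiprocess/pathos process pool, so each worker would mutate its own copy of d, whereas ThreadPoolExecutor threads share the parent's memory. A minimal sketch of a pattern that avoids shared state is to return each key/value pair and build the dict in the parent. The names square_pair and build_squares are hypothetical, and the built-in map stands in for p_map so the sketch runs without extra packages.

```python
def square_pair(a):
    """Return the key and value instead of writing to a global dict."""
    return a, a ** 2


def build_squares(values, parallel_map=map):
    """Collect (key, value) pairs in the parent process.

    parallel_map defaults to the built-in map. Passing p_tqdm.p_map
    instead is the intended use (assumption: it yields one result per
    input item, like map does).
    """
    return dict(parallel_map(square_pair, values))


d = build_squares(list(range(10)))
print(d)
```

Because every result carries its own key, this pattern does not depend on the order in which the workers finish.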

Cleanly halt execution

Is there a way to cleanly halt execution when a large job is running? I am using this in a jupyter notebook on macOS 10.15.4 and when I interrupt the cell (or one of the processes exits with an error) the cell shows as finished but my CPU and memory are still being used. In fact, there appears to be a memory leak because after exiting the whole jupyter and python instance, the memory usage of "kernel_task" (the process which was showing high CPU during execution) does not drop.

amap or umap?

Thanks for your repository, it's a nice and useful work.

Sorry if this issue is trivial, but I'm a newbie with parallel processing.
I just managed to parallelize my code with Pathos and was searching for a progress bar, when I found p_tqdm.

My code currently uses Pathos' amap (asynchronous):
https://pathos.readthedocs.io/en/latest/pathos.html#pathos.multiprocessing.ProcessPool.amap
This does not seem to be supported in your library, which offers umap (unordered) instead. Or are they equivalent? If not, would you be so kind as to extend your library?

Thank you in advance

Global variables not visible by p_tqdm module

I can't get this module to work with other modules in the called function. For example, the following code returns an error:

import time
from p_tqdm import p_map

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = p_map(_foo, list(range(0, 30)))

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
func = lambda args: f(*args)
File "c:/Test p_tqdm.py", line 6, in _foo
time.sleep(1)
NameError: name 'time' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "c:/Test p_tqdm.py", line 10, in <module>
r = p_map(_foo, list(range(0, 30)))
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 86, in p_map
result = list(iterator)
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 75, in _parallel
for item in tqdm(map_func(function, *arrays), total=num_iter, **kwargs):
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\tqdm\std.py", line 1081, in __iter__
for obj in iterable:
File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 748, in next
raise value
NameError: name 'time' is not defined

ValueError thrown if no iterables are sized

The following code throws a ValueError:

from p_tqdm import p_uimap
def increment(x):
    return x + 1
it = (i for i in range(5))  # don't use range directly, because it is Sized
for x in p_uimap(increment, it):
    print(x)

The error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cthoyt/.virtualenvs/integrator/lib/python3.8/site-packages/p_tqdm/p_tqdm.py", line 42, in _parallel
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
ValueError: min() arg is an empty sequence

This happens because there are no sized iterables, and it's trying to take the min() of an empty sequence. This could be solved a few ways:

Surround this line with try/except and set length=None on failure. The total is an optional argument to tqdm(), so this is okay, but the progress bar will no longer be able to give an estimate:

try:
    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
except ValueError:
    length = None

Build the list of lengths of the sized iterables (this won't be long) and check explicitly whether it is empty. If it is, set length=None:

# Determine length of tqdm (equal to length of shortest iterable), if possible
lengths = [len(iterable) for iterable in iterables if isinstance(iterable, Sized)]
length = min(lengths) if lengths else None

I'm not sure which you would prefer, but they effectively accomplish the same thing. I made a PR #28 that uses the second solution.
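
A third variant, sketched here as an alternative rather than as what PR #28 does, relies on the default argument of min(), which is returned when the sequence is empty. The function name shortest_sized_length is hypothetical.

```python
from collections.abc import Sized


def shortest_sized_length(*iterables):
    """Return the length of the shortest sized iterable, or None if none is sized."""
    return min(
        (len(iterable) for iterable in iterables if isinstance(iterable, Sized)),
        default=None,
    )


# A generator has no len(), so it is skipped when computing the length.
print(shortest_sized_length([1, 2, 3], (i for i in range(5))))
print(shortest_sized_length((i for i in range(5))))
```

Passing the result as total to tqdm() keeps the bar working in both cases; with None it shows a running count instead of a percentage.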

Deprecation error

Traceback (most recent call last):
  File "group_notes_by_visit.py", line 13, in <module>
    from p_tqdm import p_uimap
  File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
    from collections import Sized
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/ga2530/miniconda3/lib/python3.7/collections/__init__.py", line 52, in __getattr__
    DeprecationWarning, stacklevel=2)
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

First time I've gotten this. Not sure what caused it

Nested p_map causing AssertionError in thread pool

Trying to nest p_map (e.g. to implement generating a square matrix) causes Python to barf out with a failed assertion.

Steps to reproduce:

from functools import partial
from p_tqdm import p_map

l = [1, 2, 3]

def add(a, b):
    return a + b

def gen_mat(x):
    # l is a module-level name, so it can be read here without a declaration
    return p_map(partial(add, x), l)

mat = p_map(gen_mat, l)
print(mat)

Expected Result:

[[2,3,4],[3,4,5],[4,5,6]]

Actual Result:

AssertionError: daemonic processes are not allowed to have children
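
The error message says that the pool's worker processes are daemonic and therefore cannot start a pool of their own. One way around this, sketched under the assumption that a single level of parallelism is enough, is to flatten the nested loops into one map over all pairs and reshape afterwards. The names add_pair and gen_square_matrix are hypothetical, and the built-in map stands in for p_map so the sketch runs on its own.

```python
from itertools import product

l = [1, 2, 3]


def add_pair(pair):
    """Add the two elements of a (row value, column value) pair."""
    a, b = pair
    return a + b


def gen_square_matrix(values, parallel_map=map):
    """Compute all pairwise sums with one flat map, then cut into rows.

    parallel_map must return results in input order (the built-in map
    does; p_map is assumed to as well, unlike the unordered variants).
    """
    pairs = list(product(values, repeat=2))
    flat = list(parallel_map(add_pair, pairs))
    n = len(values)
    return [flat[i * n:(i + 1) * n] for i in range(n)]


mat = gen_square_matrix(l)
print(mat)  # [[2, 3, 4], [3, 4, 5], [4, 5, 6]]
```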

PicklingError: Can't pickle 'tkapp' object: <tkapp object at 0x000000000B66C930>

Hi. I was excited to find your package for making a progress bar work with pathos multiprocessing. When I change my original code (the first snippet below) to use p_map (the second snippet):

p = pp.ProcessPool(4)
p.map(method, generator)
p.close()
p.join()

p_map(method, generator)

I get the following error:

PicklingError: Can't pickle 'tkapp' object: <tkapp object at 0x000000000B66C930>

Should p_tqdm work with Tk objects?

Thanks,
Nick

Progress bar not progressing while running function

The progress bar does not advance while the function is running; it only appears as a fully completed bar at 100% once everything has finished.

progress bar not showing in jupyter notebook

Hi, I have a p_map set up in jupyter notebook, but instead of the progress bar, I get:
HBox(children=(FloatProgress(value=0.0, max=1827402.0), HTML(value='')))

I have tried the following in terminal:
jupyter nbextension enable --py widgetsnbextension
jupyter labextension install @jupyter-widgets/jupyterlab-manager

but the issue persists after restarting the kernel.

p_tqdm not installing today on Python 3.6

Hi.

I'm seeing the following. I'm clearing my cache, removing my pip --user directory, seeing p_tqdm have trouble installing anyway, and then I show what OS I'm on. Although it's not shown here, I also tried upgrading my version of pip.

$ rm -rf ~/.cache/pip
below cmd output started 2020 Tue Jan 14 12:21:00 PM PST
above cmd output done    2020 Tue Jan 14 12:21:00 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ rm -rf ~/.local/lib/python3.6
below cmd output started 2020 Tue Jan 14 12:21:04 PM PST
above cmd output done    2020 Tue Jan 14 12:21:04 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ python3.6 -m pip install --user p_tqdm
below cmd output started 2020 Tue Jan 14 12:21:09 PM PST
Collecting p_tqdm
  Downloading https://files.pythonhosted.org/packages/7c/49/e0d744c3aace9e8951725c7e47c4beabf9311cc47b0ead9879a6957e18a4/p_tqdm-1.3.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-xpwevc0i/p-tqdm/setup.py", line 4, in <module>
        long_description = f.read()
      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1645: ordinal not in range(128)
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xpwevc0i/p-tqdm/
WARNING: You are using pip version 19.1.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
above cmd output done    2020 Tue Jan 14 12:21:16 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ cat /etc/lsb-release 
below cmd output started 2020 Tue Jan 14 12:21:20 PM PST
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=19.3
DISTRIB_CODENAME=tricia
DISTRIB_DESCRIPTION="Linux Mint 19.3 Tricia"

Is this happening to everyone? What can I do to get it installed?

Thanks!
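
The traceback shows setup.py failing in f.read() with the ASCII codec, which suggests the file is opened with the locale's default encoding. A minimal sketch of the usual remedy on the packaging side is to pass the encoding explicitly. The helper name read_long_description and the file name README.md are assumptions for illustration, not taken from the package. On the installing side, running pip under a UTF-8 locale may also avoid the error, though that is not verified here.

```python
import io
import os
import tempfile


def read_long_description(path):
    """Read a text file as UTF-8 regardless of the active locale."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


# Demonstration with a temporary file containing a non-ASCII character.
# U+2019 encodes to bytes starting with 0xe2, the byte named in the error.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with io.open(readme, "w", encoding="utf-8") as f:
        f.write(u"p_tqdm\u2019s progress bars")
    text = read_long_description(readme)

print(len(text))
```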

Numpy array iteration problem

I tried to use p_map to iterate over a 3D NumPy array, but the result differed from those of a for loop and map. Below is a simple example.

import numpy as np

a = np.arange(12).reshape([2,2,3])
result = []
for i in a:
    result.append(np.sum(i))

result is [15, 51].

a = np.arange(12).reshape([2,2,3])
result = list(map(lambda x: np.sum(x), a))

result is [15, 51].

a = np.arange(12).reshape([2,2,3])
result = p_map(lambda x: np.sum(x), a)

result is [66].

Use ThreadPool instead of usual Pool

I tried p_tqdm to do multiprocessing within a function. This works extremely slowly:

import spacy
from pathos.pools import ThreadPool as Pool
import time
from p_tqdm import p_map

# Install with python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

def preworker(text, nlp):
    return [w.lemma_ for w in nlp(text)]

worker = lambda text: preworker(text, nlp)

texts = ["Este es un texto muy interesante en español"] * 1000

st = time.time()
pool = Pool(3)
r = pool.map(worker, texts)
print(f"Usual pool took {time.time()-st:.3f} seconds")

def out_worker(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    pool = Pool(3)
    return pool.map(worker, texts)

st = time.time()
r = out_worker(texts, nlp)
print(f"Pool within a function took {time.time()-st:.3f} seconds")

def out_worker_tqdm(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    return p_map(worker, texts)

st = time.time()
r = out_worker_tqdm(texts, nlp)
print(f"p_tqdm within a function took {time.time()-st:.3f} seconds")

def out_worker2(texts, nlp, pool):     
    worker = lambda text: preworker(text, nlp)     
    return pool.map(worker, texts)

st = time.time()
pool = Pool(3) 
r = out_worker2(texts, nlp, pool)
print(f"Pool passed to a function took {time.time()-st:.3f} seconds")

The output is

Usual pool took 0.052 seconds
Pool within a function took 0.062 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.23it/s]
p_tqdm within a function took 8.341 seconds
Pool passed to a function took 0.055 seconds

I got the tip of using a thread pool instead of the usual pool (I guess p_tqdm uses the usual pool underneath, but I haven't checked) from the pathos author.

python 2 version?

Very nice library!
It seems to work only with Python 3.
Is there a Python 2 version? OK if not, though it might be worth mentioning that in the README.

Number of workers

It would be nice if one could choose the number of workers. It looks like it currently defaults to the number of CPU cores.
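
The _parallel snippet quoted in the "Add keyword to disable progress bar" issue further down this page pops a num_cpus keyword, so a call such as p_map(f, xs, num_cpus=4) may already be supported in the version shown there. The resolution logic from that snippet can be sketched on its own as follows. The function name resolve_num_cpus and its total parameter are hypothetical, and os.cpu_count replaces the pathos cpu_count helper.

```python
import os


def resolve_num_cpus(num_cpus=None, total=None):
    """Turn a num_cpus setting into a worker count.

    None  -> use all cores
    float -> treat as a fraction of the available cores
    int   -> use as given
    """
    if total is None:
        total = os.cpu_count() or 1
    if num_cpus is None:
        return total
    if isinstance(num_cpus, float):
        return int(round(num_cpus * total))
    return num_cpus


print(resolve_num_cpus(0.5, total=8))  # 4
```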

ModuleNotFoundError: No module named 'tqdm.auto'

Stable Diffusion runs perfectly well until I install any extension that requires tqdm. I've tried multiple extensions, and it's hit and miss which ones trigger this error, but once such an extension has been added, Stable Diffusion refuses to launch:

Launching Web UI with arguments: --xformers --medvram
Traceback (most recent call last):
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 41, in <module>
main()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 37, in main
start()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\launch_utils.py", line 439, in start
import webui
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\webui.py", line 13, in <module>
initialize.imports()
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\initialize.py", line 21, in imports
import gradio # noqa: F401
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\__init__.py", line 3, in <module>
import gradio.components as components
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\__init__.py", line 1, in <module>
from gradio.components.annotated_image import AnnotatedImage
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\annotated_image.py", line 8, in <module>
from gradio_client.documentation import document, set_documentation_group
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\__init__.py", line 1, in <module>
from gradio_client.client import Client
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\client.py", line 24, in <module>
from huggingface_hub import CommitOperationAdd, SpaceHardware, SpaceStage
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\__init__.py", line 322, in __getattr__
submod = importlib.import_module(submod_path)
File "C:\Users\Aaron\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\hf_api.py", line 35, in <module>
from huggingface_hub.utils import (
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\__init__.py", line 18, in <module>
from . import tqdm as _tqdm # _tqdm is the module
File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\tqdm.py", line 63, in <module>
from tqdm.auto import tqdm as old_tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
Press any key to continue . . .

KeyError 'pylab'

When loading my scripts I get the following error using p-tqdm >= 1.3. It works for p-tqdm 1.2, though.

Matplotlib support failed
Traceback (most recent call last):
  File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 25, in do_import
    succeeded = activate_func()
  File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/pydev_ipython/matplotlibtools.py", line 155, in activate_pylab
    pylab = sys.modules['pylab']
KeyError: 'pylab'

My pipfile looks as follows:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
beautifulsoup4 = "*"
easy-google-docs = "*"
google = "*"
google_auth_oauthlib = "*"
google-api-python-client = "*"
numpy = "*"
openpyxl = "*"
pandas = "*"
pytest = "*"
pytest-parallel = "*"
pyyaml = "*"
requests = "*"
scipy = "*"
soupsieve = "*"
tqdm = "*"
pathos = "*"
p-tqdm = "==1.3"

[requires]
python_version = "3.7"

task hangs indefinitely at the end of p_uimap

Hi,

I changed my preprocessing code so that some iterations take a fairly long time (1-2 minutes). I think this irregular speed is causing everything to hang at the end (usually at about 99%). It never finishes so I think there must be a parallel processing issue. I've tried setting miniters=1 and also changing to p_imap but nothing seems to resolve it. Any ideas?

Thanks

Within the function call which is wrapped by p_uimap, I've put in a counter to see if anything takes longer than 2 minutes, and it's never triggered, so I imagine it's a parallel processing issue. Thanks

ModuleNotFoundError: No module named 'tqdm.auto'

I've been using v1.2 for a while with no issues (along with tqdm 4.45.0) and recently upgraded p_tqdm and got this error. Didn't go away until I downgraded back to 1.2.

    from p_tqdm import p_imap
  File "/usr/local/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/usr/local/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 15, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'

This is on a Docker image where I'm installing a bunch of things, but mostly relevant would be:

ENV TQDM_VERSION 4.45.0
ENV P_TQDM_VERSION 1.2

...

RUN pip install -U p_tqdm==$P_TQDM_VERSION
RUN pip install -U tqdm==$TQDM_VERSION

Increasing the version beyond 1.2 leads to the above error. Any idea what's going wrong?

Is anything like multiprocess.close() needed?

Hello,
Thanks for the nice work. Do I need a line of code such as multiprocess.close or equivalent at the end, or does p_tqdm take care of closing the workers once the work is done?

Maybe it would be nice to add this info in the readme.

Best,
Tommaso

NameError: name 'time' is not defined

This code breaks with NameError: name 'time' is not defined:

import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap

numbers = list(range(0, 1000))

def heavy_processing(number):
    time.sleep(0.05)
    output = number + 1
    return output

results = p_map(heavy_processing, numbers)

print(results)

Error message:

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "<ipython-input-2-a99c335b4b30>", line 5, in add
NameError: name 'time' is not defined
"""

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
<ipython-input-4-1bbf104dda25> in <module>
----> 1 added = p_map(add, l1, l2)

~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in p_map(function, *iterables, **kwargs)
     58     ordered = True
     59     generator = _parallel(ordered, function, *iterables, **kwargs)
---> 60     result = list(generator)
     61 
     62     return result

~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in _parallel(ordered, function, *iterables, **kwargs)
     47     map_func = getattr(pool, map_type)
     48 
---> 49     for item in tqdm(map_func(function, *iterables), total=length, **kwargs):
     50         yield item
     51 

~\AppData\Roaming\Python\Python38\site-packages\tqdm\notebook.py in __iter__(self)
    252     def __iter__(self):
    253         try:
--> 254             for obj in super(tqdm_notebook, self).__iter__():
    255                 # return super(tqdm...) will not catch exception
    256                 yield obj

~\AppData\Roaming\Python\Python38\site-packages\tqdm\std.py in __iter__(self)
   1176 
   1177         try:
-> 1178             for obj in iterable:
   1179                 yield obj
   1180                 # Update and possibly print the progressbar.

~\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py in next(self, timeout)
    866         if success:
    867             return value
--> 868         raise value
    869 
    870     __next__ = next                    # XXX

NameError: name 'time' is not defined

However, this works fine - the difference being, I've passed the time module as a name to my function:

import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap

numbers = list(range(0, 1000))

def heavy_processing(number, time=time):
    time.sleep(0.05)
    output = number + 1
    return output

results = p_map(heavy_processing, numbers)

print(results)

I have Windows 10, Jupyter Notebook, Python 3.8.8, and these packages:

p-tqdm==1.3.3
tqdm==4.61.1
pathos==0.2.8
multiprocess==0.70.12.2

Is it a bug with p-tqdm? Or with one of the other modules? (in which case I will move this bug report to the appropriate repo)
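
A workaround that is often suggested for this symptom is to import the module inside the function body, so that the name is bound wherever the function actually runs. The underlying assumption, not confirmed for this setup, is that the function reaches the worker process without the notebook's module-level imports. In this sketch the built-in map stands in for p_map, and the sleep is shortened.

```python
def heavy_processing(number):
    # Importing here binds the name inside the worker as well,
    # independent of what the worker's global namespace contains.
    import time

    time.sleep(0.001)
    output = number + 1
    return output


numbers = list(range(0, 5))
results = list(map(heavy_processing, numbers))
print(results)
```

Compared with passing time=time as a default argument, this keeps the function signature unchanged.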

P_map not waiting for processes to finish before moving on and crashing due to a memory error

When I run my function with p_map, it runs just fine. If I run it in my script that has more operations after the call to p_map, it takes a very long time and then I start getting various memory errors.
Is there a way to force it to wait for the processes to finish before continuing on to the next line of code?

Thanks!

Support tqdm(..., desc='Put description here')

Thanks for writing this awesome library!

It would be nice if the desc argument of tqdm were propagated to the internal tqdm call when using p_tqdm.p_map(..., desc='My awesome parallel task').

Add keyword to disable progress bar

Hi and thanks for a great package!

I was wondering if it would make sense to add a keyword to disable the progress bar? I know that the progress bar is the reason why one would use this package to begin with, but I still sometimes run into cases where it would be useful to be able to disable the progress bar. I was thinking something along the lines of:

def _parallel(ordered, function, *iterables, **kwargs):
    """Returns a generator for a parallel map with a progress bar.
    Arguments:
        ordered(bool): True for an ordered map, false for an unordered map.
        function(Callable): The function to apply to each element of the given Iterables.
        iterables(Tuple[Iterable]): One or more Iterables containing the data to be mapped.
    Returns:
        A generator which will apply the function to each element of the given Iterables
        in parallel in order with a progress bar.
    """

    # Extract num_cpus
    num_cpus = kwargs.pop('num_cpus', None)
    do_tqdm = kwargs.pop('do_tqdm', True)

    # Determine num_cpus
    if num_cpus is None:
        num_cpus = cpu_count()
    elif type(num_cpus) == float:
        num_cpus = int(round(num_cpus * cpu_count()))

    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))

    # Create parallel generator
    map_type = 'imap' if ordered else 'uimap'
    pool = Pool(num_cpus)
    map_func = getattr(pool, map_type)

    # create iterable
    items = map_func(function, *iterables)

    # add progress bar
    if do_tqdm:
        items = tqdm(items, total=length, **kwargs)

    for item in items:
        yield item

    pool.clear()

What do people think about this?

Cheers,
Christian

Update the tqdm postfix?

Hi,

Is there any way to update the postfix string inside the loop, as with tqdm's loop.set_postfix(name=value)?

Parallel mapping on Generators

When trying to p_map on an iterator/generator p_tqdm fails when checking the length while setting up tqdm.

Steps to reproduce:

from p_tqdm import p_imap, t_imap

def id(x):
    return x

a = t_imap(id, [1,2,3,4])
b = p_imap(id, a)
for c in b:
    print(c)

Expected result:
Prints the numbers 1,2,3,4; each on its own line.

Actual result:
Crashes inside p_tqdm before setting up tqdm.

Debugging:
Both p_tqdm._parallel and p_tqdm._sequential are affected.

What could cause p_imap to hang on invocation?

Apologies for not having a small, reproducible snippet for this. I'm using p_imap in my codebase, where it has been working fine for a while. However, recently (perhaps since I upgraded from 1.2?) I've had a couple of reproducible instances where my call to p_imap hangs. I print right before the call, and the functions invoked by p_imap also print (with flush=True) on their first line, and it appears the functions never get invoked.

I'm passing a list of dictionaries to p_imap and the thing these examples have in common is that the list is longer and the dictionaries are larger. Is there some sort of size limitation on the parameters or is something else going on?

When using p_tqdm in a Jupyter session and the cell is killed, the processes continue.

p_tqdm is the easiest and most beautiful way to parallelize computations.
I use it often but sometimes I need to cancel some calculation. When this happens, I need to restart the kernel.

I thought that encapsulating the usage of the Pool in a with ... as ... clause could solve this problem.
I am presenting a PR implementing this small change to demonstrate it.
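
The idea can be sketched with the standard library, whose pool objects support the with statement: leaving the block calls terminate() on the pool, even when the body raises. This sketch uses multiprocessing.pool.ThreadPool as a stand-in; whether the pathos pool used by p_tqdm supports the same protocol is not established here, and the names square and run_with_cleanup are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def run_with_cleanup(function, items, workers=2):
    """Map function over items and always shut the pool down afterwards."""
    # Leaving the with block calls pool.terminate(), also when the body
    # raises (for example on a KeyboardInterrupt from an interrupted cell).
    with ThreadPool(workers) as pool:
        # Consume the iterator inside the block, while the pool is alive.
        return list(pool.imap(function, items))


results = run_with_cleanup(square, range(5))
print(results)
```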

Allow to set the chunk size used by the thread pool

There should be an option to allow the caller to provide the chunk size used by the thread pool created by p_tqdm._parallel. Using the default can be quite inefficient, especially when the caller knows that each of the operations inside the map is usually quite fast.

Rationale:
https://medium.com/@rvprasad/data-and-chunk-sizes-matter-when-using-multiprocessing-pool-map-in-python-5023c96875ef
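
Until such an option exists, one possible workaround (a sketch, not a p_tqdm feature) is to batch the work by hand so that each task processes several items. The names chunked and map_in_chunks are hypothetical, and the built-in map stands in for the parallel map. With a progress bar, the bar would then count chunks rather than individual items.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most size items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def map_in_chunks(function, items, size, parallel_map=map):
    """Apply function to every item, dispatching one task per chunk."""
    def run_chunk(chunk):
        return [function(item) for item in chunk]

    # Materialize the chunks as a list so the mapper can take its length.
    chunks = list(chunked(items, size))
    results = []
    for chunk_result in parallel_map(run_chunk, chunks):
        results.extend(chunk_result)
    return results


print(map_in_chunks(lambda x: x + 1, range(7), 3))
```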

No six dependency?

After installing this project, I run into the following error upon import:

    from p_tqdm import p_umap
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 13, in <module>
    from pathos.helpers import cpu_count
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/__init__.py", line 55, in <module>
    from . import pools
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/pools.py", line 31, in <module>
    from pathos.helpers import ProcessPool as _ProcessPool
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/__init__.py", line 9, in <module>
    from . import pp_helper
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/pp_helper.py", line 30, in <module>
    from pp import _Task
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/__init__.py", line 12, in <module>
    from ._pp import *
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/_pp.py", line 62, in <module>
    import six
ModuleNotFoundError: No module named 'six'

My poetry dependencies are:

python = "^3.7"
tqdm = "^4.41.1"
numpy = "^1.18.1"
p_tqdm = "^1.3"

Is there any explicit dependency missing here?

Expose the Pool object?

This would allow callers to get details of the Pool object, such as process IDs.

Does not work with Python 3.10

I am getting this error using v1.3.3 of this package:

    from p_tqdm import p_imap
  File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
    from collections import Sized
ImportError: cannot import name 'Sized' from 'collections' (/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/collections/__init__.py)

The import should come from collections.abc instead of collections.
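
A minimal sketch of the corrected import follows. The fallback branch only matters for very old interpreters and is included as an assumption about how backward compatibility might be kept, not as what the package does.

```python
try:
    from collections.abc import Sized  # available since Python 3.3
except ImportError:  # only reached on very old interpreters
    from collections import Sized

# Lists and ranges define __len__, generators do not.
print(isinstance([1, 2, 3], Sized))
print(isinstance((i for i in range(3)), Sized))
```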

tqdm.write

Great library! I'm not sure whether this is something you've already looked into/tried or if this would be a new feature addition.

Anyhow, I have a script I am running using p_tqdm and I'd like to achieve something similar to tqdm.write where you can have the progress bar fixed to the bottom whilst printed messages to stdout end up above that. The script is rather large and rather than manually going in and changing those print statements I've borrowed a SO answer to overload print.

A simple repro script is as follows:

import time
import inspect
from p_tqdm import p_map
from p_tqdm.p_tqdm import tqdm  #  NOTE: here I've also tried importing tqdm as `from tqdm.auto import tqdm`.. no luck


def divert_stdout_to_tqdm() -> None:
    old_print = print

    def new_print(*args, **kwargs) -> None:
        # if tqdm.tqdm.write raises error, use builtin print
        try:
            tqdm.write(*args, **kwargs)
        except Exception:
            old_print(*args, **kwargs)

    inspect.builtins.print = new_print


def do_stuff(num: int) -> None:
    print("HIIIIII")
    time.sleep(0.5)


divert_stdout_to_tqdm()

# doesn't work
results = p_map(do_stuff, range(100))

# works
for i in tqdm(range(100)):
    do_stuff(i)

results = p_map(do_stuff, range(100))

doesn't work exactly as I'd intend as it produces output as such:

HIIIIII
  0%|                                                                                                   | 0/100 [00:00<?, ?it/s]HIIIIII
HIIIIII
  1%|▉                                                                                          | 1/100 [00:00<01:35,  1.03it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
 17%|███████████████▎                                                                          | 17/100 [00:01<00:57,  1.44it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII

for i in tqdm(range(100)):
    do_stuff(i)

works as intended and produces output as such:

HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
  5%|████▌                                                                                      | 5/100 [00:02<00:47,  1.99it/s]

I am also not an expert in the multiprocessing library, and it is quite possible that this has more to do with multiprocessing than with p_tqdm.