
p_tqdm's Introduction

p_tqdm


p_tqdm makes parallel processing with progress bars easy.

p_tqdm is a wrapper around pathos.multiprocessing and tqdm. Unlike Python's default multiprocessing library, pathos provides a more flexible parallel map which can apply almost any type of function, including lambda functions, nested functions, and class methods, and can easily handle functions with multiple arguments. tqdm is applied on top of pathos's parallel map and displays a progress bar including an estimated time to completion.

Installation

pip install p_tqdm

Example

Let's say you want to add two lists element by element. Without any parallelism, this can be done easily with a Python map.

l1 = ['1', '2', '3']
l2 = ['a', 'b', 'c']

def add(a, b):
    return a + b
    
added = list(map(add, l1, l2))  # map returns an iterator in Python 3
# added == ['1a', '2b', '3c']

But if the lists are much larger or the computation is more intense, parallelism becomes a necessity. However, the syntax is often cumbersome. p_tqdm makes it easy and adds a progress bar too.

from p_tqdm import p_map

added = p_map(add, l1, l2)
# added == ['1a', '2b', '3c']
  0%|                                    | 0/3 [00:00<?, ?it/s]
 33%|████████████                        | 1/3 [00:01<00:02, 1.00s/it]
 66%|████████████████████████            | 2/3 [00:02<00:01, 1.00s/it]
100%|████████████████████████████████████| 3/3 [00:03<00:00, 1.00s/it]

p_tqdm functions

Parallel maps

  • p_map - parallel ordered map
  • p_imap - iterator for parallel ordered map
  • p_umap - parallel unordered map
  • p_uimap - iterator for parallel unordered map

Sequential maps

  • t_map - sequential ordered map
  • t_imap - iterator for sequential ordered map

p_map

Performs an ordered map in parallel.

from p_tqdm import p_map

def add(a, b):
    return a + b

added = p_map(add, ['1', '2', '3'], ['a', 'b', 'c'])
# added == ['1a', '2b', '3c']

p_imap

Returns an iterator for an ordered map in parallel.

from p_tqdm import p_imap

def add(a, b):
    return a + b

iterator = p_imap(add, ['1', '2', '3'], ['a', 'b', 'c'])

for result in iterator:
    print(result) # prints '1a', '2b', '3c'

p_umap

Performs an unordered map in parallel.

from p_tqdm import p_umap

def add(a, b):
    return a + b

added = p_umap(add, ['1', '2', '3'], ['a', 'b', 'c'])
# added is a list with '1a', '2b', '3c' in any order

p_uimap

Returns an iterator for an unordered map in parallel.

from p_tqdm import p_uimap

def add(a, b):
    return a + b

iterator = p_uimap(add, ['1', '2', '3'], ['a', 'b', 'c'])

for result in iterator:
    print(result) # prints '1a', '2b', '3c' in any order

t_map

Performs an ordered map sequentially.

from p_tqdm import t_map

def add(a, b):
    return a + b

added = t_map(add, ['1', '2', '3'], ['a', 'b', 'c'])
# added == ['1a', '2b', '3c']

t_imap

Returns an iterator for an ordered map to be performed sequentially.

from p_tqdm import t_imap

def add(a, b):
    return a + b

iterator = t_imap(add, ['1', '2', '3'], ['a', 'b', 'c'])

for result in iterator:
    print(result) # prints '1a', '2b', '3c'

Shared properties

Arguments

All p_tqdm functions accept any number of iterables as input, as long as the number of iterables matches the number of arguments of the function.

To repeat a non-iterable argument along with the iterables, use Python's partial from the functools library. See the example below.

from functools import partial
from p_tqdm import p_map

l1 = ['1', '2', '3']
l2 = ['a', 'b', 'c']

def add(a, b, c=''):
    return a + b + c

added = p_map(partial(add, c='!'), l1, l2)
# added == ['1a!', '2b!', '3c!']

CPUs

All the parallel p_tqdm functions can be passed the keyword num_cpus to indicate how many CPUs to use. The default is all CPUs. num_cpus can either be an integer to indicate the exact number of CPUs to use or a float to indicate the proportion of CPUs to use.

Note that the parallel Pool objects used by p_tqdm are automatically closed when the map finishes processing.
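
As a rough illustration of how an integer or float num_cpus could be turned into a worker count, here is a minimal sketch. It mirrors the logic quoted in the "Add keyword to disable progress bar" issue further down; the helper name resolve_num_cpus and its total parameter are hypothetical and not part of p_tqdm.

```python
import os


def resolve_num_cpus(num_cpus=None, total=None):
    """Turn a num_cpus setting into a concrete worker count (hypothetical helper)."""
    if total is None:
        total = os.cpu_count() or 1  # cpu_count() may return None
    if num_cpus is None:
        return total  # default: use every CPU
    if isinstance(num_cpus, float):
        return int(round(num_cpus * total))  # float: proportion of CPUs
    return num_cpus  # int: exact number of CPUs


print(resolve_num_cpus(None, total=8))
print(resolve_num_cpus(2, total=8))
print(resolve_num_cpus(0.5, total=8))
```

With a machine of 8 CPUs, num_cpus=0.5 resolves to 4 workers under this reading.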

tqdm instance

All the parallel p_tqdm functions can be passed the keyword tqdm to choose a specific flavor of tqdm. By default, this value is taken from tqdm.auto. The tqdm parameter can be used to pass p_tqdm output to tqdm.gui, tqdm.tk or any customized subclass of tqdm.

p_tqdm's People

Contributors

cthoyt, fcoclavero, harenbrs, jan-janssen, swansonk14, varal7, wassname

p_tqdm's Issues

ValueError thrown if no iterables are sized

The following code throws a ValueError:

from p_tqdm import p_uimap
def increment(x):
    return x + 1
it = (i for i in range(5))  # don't use range directly, because it is Sized
for x in p_uimap(increment, it):
    print(x)

The error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cthoyt/.virtualenvs/integrator/lib/python3.8/site-packages/p_tqdm/p_tqdm.py", line 42, in _parallel
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
ValueError: min() arg is an empty sequence

This happens because there are no sized iterables, and it's trying to take the min() of an empty sequence. This could be solved a few ways:

  1. Surround this line with try/except and set length=None on failure. The length is an optional argument to tqdm(), so this is okay, but tqdm will no longer be able to give a time estimate.
try:
    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))
except ValueError:
    length = None
  2. Save the lengths of the iterables that are sized (this list won't be long) and check explicitly whether it is empty. If it is, set length=None.
# Determine length of tqdm (equal to length of shortest iterable), if possible
lengths = [len(iterable) for iterable in iterables if isinstance(iterable, Sized)]
length = min(lengths) if lengths else None

I'm not sure which you would prefer, but they effectively accomplish the same thing. I made a PR #28 that uses the second solution.
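
For reference, the second option can be tried in isolation. This is a minimal standalone sketch, not the code from the PR; the helper name shortest_length is hypothetical, and Sized is imported from collections.abc rather than collections.

```python
from collections.abc import Sized


def shortest_length(*iterables):
    """Return the length of the shortest sized iterable, or None if none are sized."""
    lengths = [len(iterable) for iterable in iterables if isinstance(iterable, Sized)]
    return min(lengths) if lengths else None


gen = (i for i in range(5))  # a generator has no len(), so it is not Sized

print(shortest_length(gen))
print(shortest_length([1, 2, 3], range(5)))
print(shortest_length([1, 2, 3], gen))
```

A None result can then be handed to tqdm as total, in which case the bar shows a running count without an estimate.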

p_tqdm not installing today on Python 3.6

Hi.

I'm seeing the following. I'm clearing my cache, removing my pip --user directory, seeing p_tqdm have trouble installing anyway, and then I show what OS I'm on. Although it's not shown here, I also tried upgrading my version of pip.

$ rm -rf ~/.cache/pip
below cmd output started 2020 Tue Jan 14 12:21:00 PM PST
above cmd output done    2020 Tue Jan 14 12:21:00 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ rm -rf ~/.local/lib/python3.6
below cmd output started 2020 Tue Jan 14 12:21:04 PM PST
above cmd output done    2020 Tue Jan 14 12:21:04 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ python3.6 -m pip install --user p_tqdm
below cmd output started 2020 Tue Jan 14 12:21:09 PM PST
Collecting p_tqdm
  Downloading https://files.pythonhosted.org/packages/7c/49/e0d744c3aace9e8951725c7e47c4beabf9311cc47b0ead9879a6957e18a4/p_tqdm-1.3.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-xpwevc0i/p-tqdm/setup.py", line 4, in <module>
        long_description = f.read()
      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1645: ordinal not in range(128)
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xpwevc0i/p-tqdm/
WARNING: You are using pip version 19.1.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
above cmd output done    2020 Tue Jan 14 12:21:16 PM PST
dstromberg@dstromberg-inspiron-5570:~/src/grok/RM-454-test-train-transient x86_64-pc-linux-gnu 2521

$ cat /etc/lsb-release 
below cmd output started 2020 Tue Jan 14 12:21:20 PM PST
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=19.3
DISTRIB_CODENAME=tricia
DISTRIB_DESCRIPTION="Linux Mint 19.3 Tricia"

Is this happening to everyone? What can I do to get it installed?

Thanks!
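
The traceback suggests that setup.py reads the README with the platform's default encoding, which is ASCII in this environment. The sketch below reproduces that failure mode on a temporary file and shows that passing an explicit encoding avoids it. It assumes the offending byte comes from a non-ASCII character such as the progress-bar block (its UTF-8 encoding starts with 0xe2); the file name and contents are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")  # stand-in for the real README

    # Write a line containing U+2588 (full block), as used in tqdm progress bars.
    with open(readme, "w", encoding="utf-8") as f:
        f.write("100%|\u2588\u2588\u2588| 3/3\n")

    # Reading as ASCII fails in the same way as the traceback above.
    try:
        with open(readme, encoding="ascii") as f:
            f.read()
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False

    # Reading with an explicit UTF-8 encoding succeeds regardless of locale.
    with open(readme, encoding="utf-8") as f:
        long_description = f.read()

print(ascii_ok)
print(long_description)
```

If this is indeed the cause, the fix on the package side would be an explicit encoding in setup.py; on the user side, running pip under a UTF-8 locale may work around it.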

ModuleNotFoundError: No module named 'tqdm.auto'

Stable Diffusion runs perfectly well until you install any extension that calls for tqdm. I've tried multiple extensions, and it's hit and miss which ones trigger this error, but once such an extension has been added, Stable Diffusion refuses to launch:

Launching Web UI with arguments: --xformers --medvram
Traceback (most recent call last):
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 41, in <module>
    main()
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\launch.py", line 37, in main
    start()
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\launch_utils.py", line 439, in start
    import webui
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\webui.py", line 13, in <module>
    initialize.imports()
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\modules\initialize.py", line 21, in imports
    import gradio # noqa: F401
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio.components as components
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio\components\annotated_image.py", line 8, in <module>
    from gradio_client.documentation import document, set_documentation_group
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\__init__.py", line 1, in <module>
    from gradio_client.client import Client
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\gradio_client\client.py", line 24, in <module>
    from huggingface_hub import CommitOperationAdd, SpaceHardware, SpaceStage
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\__init__.py", line 322, in __getattr__
    submod = importlib.import_module(submod_path)
  File "C:\Users\Aaron\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\hf_api.py", line 35, in <module>
    from huggingface_hub.utils import (
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\__init__.py", line 18, in <module>
    from . import tqdm as _tqdm # _tqdm is the module
  File "C:\1ststable\stable-diffusion-webui-1.6.0-RC\venv\lib\site-packages\huggingface_hub\utils\tqdm.py", line 63, in <module>
    from tqdm.auto import tqdm as old_tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
Press any key to continue . . .

Use ThreadPool instead of usual Pool

I tried p_tqdm to do multiprocessing within a function. This works extremely slowly:

import spacy
from pathos.pools import ThreadPool as Pool
import time
from p_tqdm import p_map

# Install with python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

def preworker(text, nlp):
    return [w.lemma_ for w in nlp(text)]

worker = lambda text: preworker(text, nlp)

texts = ["Este es un texto muy interesante en español"] * 1000

st = time.time()
pool = Pool(3)
r = pool.map(worker, texts)
print(f"Usual pool took {time.time()-st:.3f} seconds")

def out_worker(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    pool = Pool(3)
    return pool.map(worker, texts)

st = time.time()
r = out_worker(texts, nlp)
print(f"Pool within a function took {time.time()-st:.3f} seconds")

def out_worker_tqdm(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    return p_map(worker, texts)

st = time.time()
r = out_worker_tqdm(texts, nlp)
print(f"p_tqdm within a function took {time.time()-st:.3f} seconds")

def out_worker2(texts, nlp, pool):     
    worker = lambda text: preworker(text, nlp)     
    return pool.map(worker, texts)

st = time.time()
pool = Pool(3) 
r = out_worker2(texts, nlp, pool)
print(f"Pool passed to a function took {time.time()-st:.3f} seconds")

The output is

Usual pool took 0.052 seconds
Pool within a function took 0.062 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.23it/s]
p_tqdm within a function took 8.341 seconds
Pool passed to a function took 0.055 seconds

I got the tip of using ThreadPool instead of the usual pool from the pathos author. (I guess p_tqdm uses the usual pool underneath, but I haven't checked.)

Nested p_map causing AssertionError in thread pool

Trying to nest p_map (e.g. to implement generating a square matrix) causes Python to barf out with a failed assertion.

Steps to reproduce:

from functools import partial
from p_tqdm import p_map

l = [1, 2, 3]

def add(a, b):
    return a + b

def gen_mat(x):
    # l is a module-level global, so no nonlocal declaration is needed
    return p_map(partial(add, x), l)

mat = p_map(gen_mat, l)
print(mat)

Expected Result:

[[2,3,4],[3,4,5],[4,5,6]]

Actual Result:

AssertionError: daemonic processes are not allowed to have children
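
One possible way around the nesting, assuming the inner map exists only to build the matrix, is to flatten the work into a single map over all (row, column) pairs and reshape afterwards. The sketch below uses the built-in map so it runs without any pool; swapping in p_map(add, xs, ys) is an assumption based on the multi-iterable call shown in the README, and it has not been checked against this issue.

```python
from itertools import product

l = [1, 2, 3]


def add(a, b):
    return a + b


# Enumerate every (row value, column value) pair in row-major order.
pairs = list(product(l, l))
xs, ys = zip(*pairs)

# A single flat map; with p_tqdm this would be p_map(add, xs, ys) (assumption).
flat = list(map(add, xs, ys))

# Reshape the flat results back into rows of len(l) entries.
n = len(l)
mat = [flat[i * n:(i + 1) * n] for i in range(n)]
print(mat)
```

Because only one level of parallelism is used, no worker process has to spawn children of its own.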

Is any line such as multiprocess.close() needed?

Hello,
Thanks for the nice work. Do I need a line of code such as multiprocess.close() or equivalent at the end, or does p_tqdm take care of closing the workers once the work is done?

Maybe it would be nice to add this info in the readme.

Best,
Tommaso

Support tqdm(..., desc='Put description here')

Thanks for writing this awesome library!

It would be nice if tqdm's desc argument were propagated to the internal tqdm call when using p_tqdm.p_map(..., desc='My awesome parallel task').

KeyError 'pylab'

When loading my scripts I get the following error using p-tqdm >= 1.3. It works with p-tqdm 1.2, though.

Matplotlib support failed
Traceback (most recent call last):
  File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 25, in do_import
    succeeded = activate_func()
  File "/home/irazall/.local/share/JetBrains/Toolbox/apps/PyCharm-P/ch-0/201.7846.77/plugins/python/helpers/pydev/pydev_ipython/matplotlibtools.py", line 155, in activate_pylab
    pylab = sys.modules['pylab']
KeyError: 'pylab'

My pipfile looks as follows:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
beautifulsoup4 = "*"
easy-google-docs = "*"
google = "*"
google_auth_oauthlib = "*"
google-api-python-client = "*"
numpy = "*"
openpyxl = "*"
pandas = "*"
pytest = "*"
pytest-parallel = "*"
pyyaml = "*"
requests = "*"
scipy = "*"
soupsieve = "*"
tqdm = "*"
pathos = "*"
p-tqdm = "==1.3"

[requires]
python_version = "3.7"

Global variables not visible by p_tqdm module

I can't get this module to work with other modules in the called function. For example, the following code returns an error:

import time
from p_tqdm import p_map

def _foo(my_number):
   square = my_number * my_number
   time.sleep(1)
   return square 

if __name__ == '__main__':
   r = p_map(_foo, list(range(0, 30)))

multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "c:/Test p_tqdm.py", line 6, in _foo
    time.sleep(1)
NameError: name 'time' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:/Test p_tqdm.py", line 10, in <module>
    r = p_map(_foo, list(range(0, 30)))
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 86, in p_map
    result = list(iterator)
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\p_tqdm\p_tqdm.py", line 75, in _parallel
    for item in tqdm(map_func(function, *arrays), total=num_iter, **kwargs):
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\tqdm\std.py", line 1081, in __iter__
    for obj in iterable:
  File "C:\Users\kb\AppData\Local\Continuum\anaconda3\lib\site-packages\multiprocess\pool.py", line 748, in next
    raise value
NameError: name 'time' is not defined

When using p_tqdm in a Jupyter session and the cell is killed, the processes continue.

p_tqdm is the easiest and most beautiful way to parallelize computations.
I use it often but sometimes I need to cancel some calculation. When this happens, I need to restart the kernel.

I thought that encapsulating the usage of the Pool in a with ... as ... clause could solve this problem.
I am presenting a PR implementing this small change to demonstrate it.
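
The idea can be illustrated with the standard library, whose pools support the with statement and are terminated when the block exits. This is only a sketch of the pattern using multiprocessing.pool.ThreadPool; whether the pathos pool used by p_tqdm behaves the same way is an assumption, and the helper name run_in_pool is hypothetical.

```python
from multiprocessing.pool import ThreadPool


def run_in_pool(function, items, workers=2):
    """Map function over items; the pool is torn down when the with block exits."""
    with ThreadPool(workers) as pool:
        # Consume all results inside the block, since leaving it terminates the pool.
        return list(pool.imap(function, items))


print(run_in_pool(abs, [-1, 2, -3]))
```

The point of the pattern is that the teardown also runs when an exception such as KeyboardInterrupt propagates out of the block.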

amap or umap?

Thanks for your repository, it's a nice and useful work.

Sorry if this issue is trivial, but I'm a newbie with parallel processing.
I just managed to parallelize my code with Pathos and was searching for a progress bar, when I found p_tqdm.

My code currently uses Pathos' amap (asynchronous)
https://pathos.readthedocs.io/en/latest/pathos.html#pathos.multiprocessing.ProcessPool.amap
This does not seem to be supported by p_tqdm, which offers umap (unordered) instead. Or are they equivalent? Would you perhaps be so kind as to extend your library?

Thank you in advance
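
For readers with the same question: asynchronous and unordered are different properties. The sketch below shows the distinction with the standard library's ThreadPool, where map_async plays the role of an asynchronous map and imap_unordered the role of an unordered iterator. Treating these as analogues of pathos's amap and uimap is an assumption made for illustration.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


with ThreadPool(2) as pool:
    # Asynchronous: the call returns a handle immediately; get() blocks until
    # everything is done and returns the results in input order.
    async_result = pool.map_async(square, range(5))
    ordered = async_result.get()

    # Unordered: results are yielded as they finish, in no guaranteed order,
    # so they are sorted here only to make the comparison deterministic.
    unordered = sorted(pool.imap_unordered(square, range(5)))

print(ordered)
print(unordered)
```

An asynchronous map lets the caller do other work before collecting results, while an unordered map only relaxes the order in which results arrive.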

Deprecation error

Traceback (most recent call last):
  File "group_notes_by_visit.py", line 13, in <module>
    from p_tqdm import p_uimap
  File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/home/ga2530/miniconda3/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
    from collections import Sized
  File "<frozen importlib._bootstrap>", line 1032, in _handle_fromlist
  File "/home/ga2530/miniconda3/lib/python3.7/collections/__init__.py", line 52, in __getattr__
    DeprecationWarning, stacklevel=2)
DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working

First time I've gotten this. Not sure what caused it

tqdm.write

Great library! I'm not sure whether this is something you've already looked into/tried or if this would be a new feature addition.

Anyhow, I have a script I am running using p_tqdm and I'd like to achieve something similar to tqdm.write where you can have the progress bar fixed to the bottom whilst printed messages to stdout end up above that. The script is rather large and rather than manually going in and changing those print statements I've borrowed a SO answer to overload print.

A simple repro script is as follows:

import time
import inspect
from p_tqdm import p_map
from p_tqdm.p_tqdm import tqdm  #  NOTE: here I've also tried importing tqdm as `from tqdm.auto import tqdm`.. no luck


def divert_stdout_to_tqdm() -> None:
    old_print = print

    def new_print(*args, **kwargs) -> None:
        # if tqdm.write raises an error, fall back to the builtin print
        try:
            tqdm.write(*args, **kwargs)
        except Exception:
            old_print(*args, **kwargs)

    inspect.builtins.print = new_print


def do_stuff(num: int) -> None:
    print("HIIIIII")
    time.sleep(0.5)


divert_stdout_to_tqdm()

# doesn't work
results = p_map(do_stuff, range(100))

# works
for i in tqdm(range(100)):
    do_stuff(i)

results = p_map(do_stuff, range(100))

doesn't work exactly as I'd intend, as it produces output like this:

HIIIIII
  0%|                                                                                                   | 0/100 [00:00<?, ?it/s]HIIIIII
HIIIIII
  1%|| 1/100 [00:00<01:35,  1.03it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
 17%|███████████████▎                                                                          | 17/100 [00:01<00:57,  1.44it/s]HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII
HIIIIII

for i in tqdm(range(100)):
    do_stuff(i)

works as intended and produces output like this:

HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
HIIIIII                                                                                                                         
  5%|████▌                                                                                      | 5/100 [00:02<00:47,  1.99it/s]

I am also not an expert in the multiprocessing library and it is very well possible that this is more related to multiprocessing than it is p_tqdm.

NameError: name 'time' is not defined

This code breaks with NameError: name 'time' is not defined:

import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap

numbers = list(range(0, 1000))

def heavy_processing(number):
    time.sleep(0.05)
    output = number + 1
    return output

results = p_map(heavy_processing, numbers)

print(results)

Error message:

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\flori\AppData\Roaming\Python\Python38\site-packages\pathos\helpers\mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "<ipython-input-2-a99c335b4b30>", line 5, in add
NameError: name 'time' is not defined
"""

The above exception was the direct cause of the following exception:

NameError                                 Traceback (most recent call last)
<ipython-input-4-1bbf104dda25> in <module>
----> 1 added = p_map(add, l1, l2)

~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in p_map(function, *iterables, **kwargs)
     58     ordered = True
     59     generator = _parallel(ordered, function, *iterables, **kwargs)
---> 60     result = list(generator)
     61 
     62     return result

~\AppData\Roaming\Python\Python38\site-packages\p_tqdm\p_tqdm.py in _parallel(ordered, function, *iterables, **kwargs)
     47     map_func = getattr(pool, map_type)
     48 
---> 49     for item in tqdm(map_func(function, *iterables), total=length, **kwargs):
     50         yield item
     51 

~\AppData\Roaming\Python\Python38\site-packages\tqdm\notebook.py in __iter__(self)
    252     def __iter__(self):
    253         try:
--> 254             for obj in super(tqdm_notebook, self).__iter__():
    255                 # return super(tqdm...) will not catch exception
    256                 yield obj

~\AppData\Roaming\Python\Python38\site-packages\tqdm\std.py in __iter__(self)
   1176 
   1177         try:
-> 1178             for obj in iterable:
   1179                 yield obj
   1180                 # Update and possibly print the progressbar.

~\AppData\Roaming\Python\Python38\site-packages\multiprocess\pool.py in next(self, timeout)
    866         if success:
    867             return value
--> 868         raise value
    869 
    870     __next__ = next                    # XXX

NameError: name 'time' is not defined

However, this works fine - the difference being, I've passed the time module as a name to my function:

import time
from tqdm.auto import tqdm
from p_tqdm import p_map, p_umap, p_imap, p_uimap

numbers = list(range(0, 1000))

def heavy_processing(number, time=time):
    time.sleep(0.05)
    output = number + 1
    return output

results = p_map(heavy_processing, numbers)

print(results)

I have Windows 10, Jupyter Notebook, Python 3.8.8, and these packages:

p-tqdm==1.3.3
tqdm==4.61.1
pathos==0.2.8
multiprocess==0.70.12.2

Is it a bug with p-tqdm? Or with one of the other modules? (in which case I will move this bug report to the appropriate repo)
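
Besides passing the module as a default argument, another variant of the same workaround is to import the module inside the function body, so the name is resolved wherever the function runs. This is a sketch, not a confirmed fix for the underlying cause. It uses the built-in map so that it runs anywhere; replacing it with p_map(heavy_processing, numbers) is the intended use.

```python
def heavy_processing(number):
    # Import inside the function so the name is bound in the worker as well.
    import time

    time.sleep(0.001)
    return number + 1


numbers = list(range(5))

# Stand-in for p_map(heavy_processing, numbers) so the sketch needs no pool.
results = list(map(heavy_processing, numbers))
print(results)
```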

Cleanly halt execution

Is there a way to cleanly halt execution when a large job is running? I am using this in a jupyter notebook on macOS 10.15.4 and when I interrupt the cell (or one of the processes exits with an error) the cell shows as finished but my CPU and memory are still being used. In fact, there appears to be a memory leak because after exiting the whole jupyter and python instance, the memory usage of "kernel_task" (the process which was showing high CPU during execution) does not drop.

Numpy array iteration problem

I tried to use p_map to do iteration on a 3d Numpy array, but the answer was not identical to those of for loop and map. Below is a simple example.

import numpy as np
from p_tqdm import p_map

a = np.arange(12).reshape([2,2,3])
result = []
for i in a:
    result.append(np.sum(i))

result is [15, 51].

a = np.arange(12).reshape([2,2,3])
result = list(map(lambda x: np.sum(x), a))

result is [15, 51].

a = np.arange(12).reshape([2,2,3])
result = p_map(lambda x: np.sum(x), a)

result is [66].

No six dependency?

After installing this project, I run into the following error upon import:

    from p_tqdm import p_umap
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 13, in <module>
    from pathos.helpers import cpu_count
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/__init__.py", line 55, in <module>
    from . import pools
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/pools.py", line 31, in <module>
    from pathos.helpers import ProcessPool as _ProcessPool
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/__init__.py", line 9, in <module>
    from . import pp_helper
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pathos/helpers/pp_helper.py", line 30, in <module>
    from pp import _Task
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/__init__.py", line 12, in <module>
    from ._pp import *
  File "/home/werner/.cache/pypoetry/virtualenvs/streaming-qoe-logs-9HyVvwyT-py3.7/lib/python3.7/site-packages/pp/_pp.py", line 62, in <module>
    import six
ModuleNotFoundError: No module named 'six'

My poetry dependencies are:

python = "^3.7"
tqdm = "^4.41.1"
numpy = "^1.18.1"
p_tqdm = "^1.3"

Is there any explicit dependency missing here?

progress bar not showing in jupyter notebook

Hi, I have a p_map set up in jupyter notebook, but instead of the progress bar, I get:
HBox(children=(FloatProgress(value=0.0, max=1827402.0), HTML(value='')))

I have tried the following in terminal:
jupyter nbextension enable --py widgetsnbextension
jupyter labextension install @jupyter-widgets/jupyterlab-manager

but the issue persists after restarting the kernel.

PicklingError: Can't pickle 'tkapp' object: <tkapp object at 0x000000000B66C930>

Hi. I was excited to find your package for making a progress bar work with pathos multiprocessing. When I change my original code..

p = pp.ProcessPool(4)
p.map(method, generator)
p.close()
p.join()

to

p_map(method, generator)

I get the following error:

PicklingError: Can't pickle 'tkapp' object: <tkapp object at 0x000000000B66C930>

Should p_tqdm work with Tk objects?

Thanks,
Nick

Update the tqdm postfix?

Hi,

Is there any way to update the postfix str like loop.set_postfix(name=value) inside the loop like tqdm?

Parallel mapping on Generators

When trying to p_map on an iterator/generator p_tqdm fails when checking the length while setting up tqdm.

Steps to reproduce:

from p_tqdm import p_imap, t_imap

def identity(x):
    return x

a = t_imap(identity, [1,2,3,4])  # a is a generator, so it has no len()
b = p_imap(identity, a)
for c in b:
    print(c)

Expected result:
Prints the numbers 1,2,3,4; each on its own line.

Actual result:
Crashes inside p_tqdm before setting up tqdm.

Debugging:
Both p_tqdm._parallel and p_tqdm._sequential are affected.

ModuleNotFoundError: No module named 'tqdm.auto'

I've been using v1.2 for a while with no issues (along with tqdm 4.45.0) and recently upgraded p_tqdm and got this error. Didn't go away until I downgraded back to 1.2.

    from p_tqdm import p_imap
  File "/usr/local/lib/python3.7/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/usr/local/lib/python3.7/site-packages/p_tqdm/p_tqdm.py", line 15, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'

This is on a Docker image where I'm installing a bunch of things, but mostly relevant would be:

ENV TQDM_VERSION 4.45.0
ENV P_TQDM_VERSION 1.2

...

RUN pip install -U p_tqdm==$P_TQDM_VERSION
RUN pip install -U tqdm==$TQDM_VERSION

Increasing the version beyond 1.2 leads to the above error. Any idea what's going wrong?

Add keyword to disable progress bar

Hi and thanks for a great package!

I was wondering if it would make sense to add a keyword to disable the progress bar? I know that the progress bar is the reason why one would use this package to begin with, but I still sometimes run into cases where it would be useful to be able to disable the progress bar. I was thinking something along the line of:

def _parallel(ordered, function, *iterables, **kwargs):
    """Returns a generator for a parallel map with a progress bar.
    Arguments:
        ordered(bool): True for an ordered map, false for an unordered map.
        function(Callable): The function to apply to each element of the given Iterables.
        iterables(Tuple[Iterable]): One or more Iterables containing the data to be mapped.
    Returns:
        A generator which will apply the function to each element of the given Iterables
        in parallel in order with a progress bar.
    """

    # Extract num_cpus
    num_cpus = kwargs.pop('num_cpus', None)
    do_tqdm = kwargs.pop('do_tqdm', True)

    # Determine num_cpus
    if num_cpus is None:
        num_cpus = cpu_count()
    elif isinstance(num_cpus, float):
        num_cpus = int(round(num_cpus * cpu_count()))

    # Determine length of tqdm (equal to length of shortest iterable)
    length = min(len(iterable) for iterable in iterables if isinstance(iterable, Sized))

    # Create parallel generator
    map_type = 'imap' if ordered else 'uimap'
    pool = Pool(num_cpus)
    map_func = getattr(pool, map_type)

    # create iterable
    items = map_func(function, *iterables)

    # add progress bar
    if do_tqdm:
        items = tqdm(items, total=length, **kwargs)

    for item in items:
        yield item

    pool.clear()

What do people think about this?

Cheers,
Christian

What could cause p_imap to hang on invocation?

Apologies for not having a small, reproducible snippet for this. I've been using p_imap in my codebase, and it has worked fine for a while. Recently, however (perhaps since I upgraded from 1.2?), I've had a couple of reproducible instances where my call to p_imap hangs. I print right before the call, and the functions invoked by p_imap also print (with flush=True) on their first line, yet it appears the functions never get invoked.

I'm passing a list of dictionaries to p_imap and the thing these examples have in common is that the list is longer and the dictionaries are larger. Is there some sort of size limitation on the parameters or is something else going on?
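
If argument size is a suspect, one rough check is to measure how many bytes each item serializes to before handing the list to p_imap. This sketch uses the standard-library pickle as a stand-in; pathos serializes with dill, so the numbers are only approximate. `pickled_size` is a hypothetical helper, and the sample data is invented.

```python
import pickle


def pickled_size(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Invented stand-in data: lists of dictionaries, small and large.
small = [{"id": i} for i in range(3)]
large = [{"id": i, "payload": "x" * 10_000} for i in range(3)]

sizes_small = [pickled_size(item) for item in small]
sizes_large = [pickled_size(item) for item in large]

print("small items (bytes):", sizes_small)
print("large items (bytes):", sizes_large)
```

Comparing the sizes of inputs that hang against inputs that complete would at least show whether size correlates with the problem.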

task hangs indefinitely at the end of p_uimap

Hi,

I changed my preprocessing code so that some iterations take a fairly long time (1-2 minutes). I think this irregular speed is causing everything to hang at the end (usually at about 99%). It never finishes so I think there must be a parallel processing issue. I've tried setting miniters=1 and also changing to p_imap but nothing seems to resolve it. Any ideas?

Thanks

Within the function call which is wrapped by p_uimap, I've put in a counter to see if anything takes longer than 2 minutes, and it's never triggered, so I imagine it's a parallel issue. Thanks

Number of workers

It would be nice if one could choose the number of workers. It looks like it currently defaults to the number of CPU cores.
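
For reference, the `_parallel` snippet quoted in the progress-bar request in this thread pops a `num_cpus` keyword and accepts either a count or a fraction of the available cores. The sketch below isolates that logic in a hypothetical helper, `resolve_num_cpus`; the `total` parameter is an addition made only so the behaviour can be shown independently of the machine's core count.

```python
from multiprocessing import cpu_count


def resolve_num_cpus(num_cpus=None, total=None):
    """Turn a num_cpus setting into a concrete worker count.

    None  -> use every core
    float -> treat as a fraction of the cores
    int   -> use as given
    """
    if total is None:
        total = cpu_count()
    if num_cpus is None:
        return total
    if isinstance(num_cpus, float):
        return int(round(num_cpus * total))
    return num_cpus


print(resolve_num_cpus(None, total=8))  # 8
print(resolve_num_cpus(0.5, total=8))   # 4
print(resolve_num_cpus(3, total=8))     # 3
```

If the installed version matches that snippet, a call such as p_map(add, l1, l2, num_cpus=2) would be expected to use two workers.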

Does not work with Python 3.10

I am getting this error using v1.3.3 of this package:

    from p_tqdm import p_imap
  File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/__init__.py", line 1, in <module>
    from p_tqdm.p_tqdm import p_map, p_imap, p_umap, p_uimap, t_map, t_imap
  File "/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/site-packages/p_tqdm/p_tqdm.py", line 11, in <module>
    from collections import Sized
ImportError: cannot import name 'Sized' from 'collections' (/Users/werner/.pyenv/versions/3.10.0/lib/python3.10/collections/__init__.py)

The import should come from collections.abc instead of collections.
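
A minimal sketch of the suggested change, assuming the only use of Sized is the length check shown in the `_parallel` snippet elsewhere in this thread. The fallback branch is only needed if Python 2 still has to be supported, and the sample iterables are invented.

```python
try:
    # Abstract base classes live here on Python 3.3 and later.
    from collections.abc import Sized
except ImportError:
    # Fallback for Python 2 only.
    from collections import Sized

# Invented inputs: a list, a string, and an iterator without a length.
iterables = ([1, 2, 3], "ab", iter(range(10)))

# Same idea as the length computation in _parallel: shortest sized iterable.
length = min(len(it) for it in iterables if isinstance(it, Sized))
print(length)  # 2
```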

Cannot modify global variable

I have the following code:

import p_tqdm
d = dict()

def modify_dict(a):
    d[a] = a ** 2

p_tqdm.p_map(modify_dict, list(range(10)))
print(d)

It outputs {} instead of {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}.

In comparison, concurrent.futures works as expected:

import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    executor.map(modify_dict, list(range(10)))
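
A likely explanation, offered as an assumption rather than a confirmed diagnosis: p_map runs the function in separate worker processes, each of which modifies its own copy of `d`, whereas ThreadPoolExecutor uses threads that share the parent's memory. A common workaround is to return the values and build the dictionary in the parent process. In the sketch below the built-in map stands in for p_map so it runs without p_tqdm installed, and `square_item` is a hypothetical replacement for modify_dict.

```python
def square_item(a):
    """Return a (key, value) pair instead of mutating a global dict."""
    return a, a ** 2


# With p_tqdm this line would presumably become:
#     pairs = p_tqdm.p_map(square_item, list(range(10)))
pairs = list(map(square_item, range(10)))

# Build the dictionary in the parent from the returned pairs.
d = dict(pairs)
print(d)
```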

python 2 version?

Very nice library!
It seems to work only with Python 3.
Is there a Python 2 version? It's OK if not, though it might be worth mentioning that in the README.

p_map() very slow compared to multiprocess.Pool.map()

I'm trying to accelerate Pandas df.apply(), and also get a progress bar. The problem is, p_map is orders of magnitude slower than plain multiprocess.Pool.map() for a job where most of the processing is done by nltk.sentiment.vader.SentimentIntensityAnalyzer().

This notebook is self-explanatory:

https://github.com/FlorinAndrei/misc/blob/master/p_tqdm_bug_1.ipynb

p_map() is orders of magnitude slower.

However, the same function seems to work fine, fast enough, for another task - reading 25k files off the disk.

Windows 10, Python 3.8.8, Jupyter Notebook
