
Comments (5)

mdbartos commented on July 16, 2024

Thanks for pointing this out. I initially started with numba's polymorphic dispatching but moved to strict types later on. Strict typing offers advantages like easier debugging. For some users (myself included), the JIT compilation overhead also adds up over multiple Python sessions and can be frustrating when experimenting in an interactive context.

For performance, my inclination would be to export multiple versions of each numba function as shown below and then infer types from the source data: https://numba.readthedocs.io/en/stable/user/pycc.html#standalone-example

from pysheds.

groutr commented on July 16, 2024

@mdbartos I'm not sure if you noticed the note in the documentation about the pending deprecation of AOT compilation. I can understand the ease-of-debugging argument; however, the inputs to most of these functions are already strictly typed, homogeneous arrays. The main request here is to allow using the smaller native dtypes of the source data. The compiled functions can still be cached, so the comment about compilation overhead adding up doesn't really make sense to me.


mdbartos commented on July 16, 2024

Hi @groutr, in my experience numba still recompiles the function each time unless types are specified, even when cache=True.

Try running the following code sample, then resetting the kernel and trying again. The untyped one incurs compilation overhead on subsequent runs while the typed one does not:

from numba import njit
import numpy as np
from numba.types import float64

# Eager compilation: the signature is declared up front.
@njit(float64(float64[:]), cache=True)
def norm_squared_typed(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

# Lazy compilation: the signature is inferred on the first call.
@njit(cache=True)
def norm_squared_untyped(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

vec = np.arange(10, dtype=np.float64)

%time norm_squared_typed(vec)
%time norm_squared_untyped(vec)
CPU times: user 150 µs, sys: 478 µs, total: 628 µs
Wall time: 80.3 µs
CPU times: user 9.54 ms, sys: 45.7 ms, total: 55.2 ms
Wall time: 6.04 ms


groutr commented on July 16, 2024

@mdbartos I think there is a little bit of misunderstanding of what numba caching is actually doing. Your benchmark isn't measuring what you think it's measuring.

After playing around with your example, here is what I think is happening:
When you define norm_squared_typed, the types are completely specified, so numba can compile and cache the function at definition time. By the time you run your timing statement, it has already been compiled. When you reset your kernel, the function is still recompiled when you define it; the compilation simply happens somewhere other than where you expect.
On the other hand, because norm_squared_untyped uses lazy compilation, it cannot be compiled at definition time: there is not enough type information. Numba uses the argument types of the first call to specialize, compile, and cache the function. All subsequent calls with the same signature use the cached version; calling with different types compiles a new version of the function for those types and caches it as well.

When you reset the kernel, both functions get recompiled; it's just that one gets compiled much earlier, at definition time, making it appear faster than the lazily compiled function on its first call. In reality, they take the same time to compile.
You can see this happening if you turn on numba's cache debugging (e.g. by setting the `NUMBA_DEBUG_CACHE=1` environment variable).


mdbartos commented on July 16, 2024

Thanks @groutr for the explanation. I re-tested with the timing taken at different points and that appears to be correct.

I would be open to removing strict typing, but it would have to be done carefully. Maybe as part of a longer term release.

