
Comments (5)

mdbartos commented on July 16, 2024

Thanks for pointing this out. I initially started with numba's polymorphic dispatching but moved to strict types later on. Strict typing offers advantages like easier debugging. For some users (myself included), the JIT compilation overhead also adds up over multiple Python sessions and can be frustrating when experimenting in an interactive context.

For performance, my inclination would be to export multiple versions of each numba function as shown below and then infer types from the source data: https://numba.readthedocs.io/en/stable/user/pycc.html#standalone-example

from pysheds.

groutr commented on July 16, 2024

@mdbartos I'm not sure if you noticed the note in the documentation about the pending deprecation of AOT compilation. I can understand the ease-of-debugging argument; however, the inputs to most of these functions are already strictly typed, homogeneous arrays. The main request here is to allow using the smaller native dtypes of the source data. The compiled functions can still be cached, so the comment about compilation overhead adding up doesn't really make sense to me.


mdbartos commented on July 16, 2024

Hi @groutr, in my experience numba still recompiles the function each time unless types are specified, even when cache=True.

Try running the following code sample, then resetting the kernel and trying again. The untyped one incurs compilation overhead on subsequent runs while the typed one does not:

from numba import njit
import numpy as np
from numba.types import float64

# Eager compilation: the signature is declared up front.
@njit(float64(float64[:]), cache=True)
def norm_squared_typed(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

# Lazy compilation: the signature is inferred on the first call.
@njit(cache=True)
def norm_squared_untyped(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

vec = np.arange(10, dtype=np.float64)

%time norm_squared_typed(vec)
%time norm_squared_untyped(vec)
CPU times: user 150 µs, sys: 478 µs, total: 628 µs
Wall time: 80.3 µs
CPU times: user 9.54 ms, sys: 45.7 ms, total: 55.2 ms
Wall time: 6.04 ms


groutr commented on July 16, 2024

@mdbartos I think there is a little bit of misunderstanding of what numba caching is actually doing. Your benchmark isn't measuring what you think it's measuring.

After playing around with your example, here is what I think is happening:
When you define norm_squared_typed, the types are completely specified, so numba can compile and cache the function at definition time. By the time you run your timing statement, it has already been compiled. When you reset your kernel, the function is still recompiled when you define it; the compilation simply happens somewhere other than where you expect.
On the other hand, because norm_squared_untyped uses lazy compilation, it cannot be compiled at definition time: there is not enough type information. Numba uses the argument types of the first call to specialize, compile, and cache the function. All subsequent calls with the same signature use the cached version; calling with different types compiles a new version of the function for those types and caches it as well.

When you reset the kernel, both functions get recompiled; it's just that one gets compiled much earlier, at definition time, making it appear faster than the lazily compiled function on its first call. In reality, they take the same time to compile.
You can see this happening if you turn on numba's cache debugging (e.g. by setting the `NUMBA_DEBUG_CACHE=1` environment variable).


mdbartos commented on July 16, 2024

Thanks @groutr for the explanation. I re-tested with the timing taken at different points and that appears to be correct.

I would be open to removing strict typing, but it would have to be done carefully. Maybe as part of a longer term release.

