Comments (5)
Thanks for pointing this out. I initially started with numba's polymorphic dispatching but moved to strict types later on. There are advantages offered by strict typing like ease of debugging. For some users (like myself), the JIT compilation overhead also starts to add up over multiple Python sessions and can be somewhat frustrating when experimenting in an interactive context.
For performance, my inclination would be to export multiple versions of each numba function as shown below and then infer types from the source data: https://numba.readthedocs.io/en/stable/user/pycc.html#standalone-example
from pysheds.
@mdbartos I'm not sure if you noticed the note in the documentation about the pending deprecation of AOT compilation. I can understand the ease of debugging; however, the inputs to most of these functions are already strictly typed, homogeneous arrays. The main request here is to allow using the smaller native dtypes of the source data. The compiled functions can still be cached, so the comment about compilation overhead adding up doesn't really make sense to me.
Hi @groutr, in my experience numba still recompiles the function each time unless types are specified, even when cache=True.
Try running the following code sample, then resetting the kernel and trying again. The untyped one incurs compilation overhead on subsequent runs while the typed one does not:
from numba import njit
import numpy as np
from numba.types import float64

# Explicit signature: types are fixed up front
@njit(float64(float64[:]), cache=True)
def norm_squared_typed(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

# No signature: types are inferred from the first call
@njit(cache=True)
def norm_squared_untyped(vec):
    n = len(vec)
    result = 0.
    for i in range(n):
        result += vec[i]**2
    return result

vec = np.arange(10, dtype=np.float64)

%time norm_squared_typed(vec)
%time norm_squared_untyped(vec)
CPU times: user 150 µs, sys: 478 µs, total: 628 µs
Wall time: 80.3 µs
CPU times: user 9.54 ms, sys: 45.7 ms, total: 55.2 ms
Wall time: 6.04 ms
@mdbartos I think there is a little bit of misunderstanding of what numba caching is actually doing. Your benchmark isn't measuring what you think it's measuring.
After playing around with your example, what I think is happening is:
When you define norm_squared_typed, because the types are completely defined, numba can compile and cache the function at function definition time. This means that by the time you run your timing statement, it has already been compiled. When you reset your kernel, the function is still recompiled when you define it. It is simply a matter of the compilation happening somewhere other than where you expect.
On the other hand, because norm_squared_untyped has to use lazy compilation, it cannot be compiled at function definition because there is not enough type information. The function uses the types of the first call to specialize the function and compile and cache it. All subsequent calls with the same calling signature use the cached version. If you call the function with different types, that will compile a new version of the function for those types and cache it as well.
When you reset the kernel, both functions get recompiled. One function is simply compiled much earlier, when you define it, which makes it appear faster than the lazily compiled function on the first call. In reality, they take the same time to complete.
You can see this happening if you turn on the numba cache debugging.
Thanks @groutr for the explanation. I re-tested with different breakpoints for timing and that appears to be correct.
I would be open to removing strict typing, but it would have to be done carefully, maybe as part of a longer-term release.