Comments (9)
Very nice! With regard to medfilt
then, would you be comfortable making a PR on 0.3.2
branch? You have been making a lot of contributions and it may be nice to count toward that via commits if you would like. Should be pretty easy!
from pyfor.
Its a function of kernel size, but counter intuitively it increases with kernel size. I would have anticipated the opposite.
import cProfile, pstats
cp = cProfile.Profile()
from DeepForest import Lidar
lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")
cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 21)
cp.disable()
203631 function calls (201530 primitive calls) in 3.758 seconds
Ordered by: cumulative time, call count
List reduced from 1184 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 3.760 3.760 cloud.py:304(chm)
1 0.000 0.000 3.542 3.542 rasterizer.py:244(pit_filter)
1 0.000 0.000 2.728 2.728 signaltools.py:854(medfilt)
1 2.728 2.728 2.728 2.728 {built-in method scipy.signal.sigtools._order_filterND}
Versus
from DeepForest import Lidar
lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")
cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 3)
cp.disable()
203631 function calls (201530 primitive calls) in 1.117 seconds
Ordered by: cumulative time, call count
List reduced from 1184 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.120 1.120 cloud.py:304(chm)
1 0.000 0.000 0.903 0.903 rasterizer.py:244(pit_filter)
162/1 0.003 0.000 0.815 0.815 <frozen importlib._bootstrap>:966(_find_and_load)
162/1 0.001 0.000 0.815 0.815 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
153/1 0.002 0.000 0.815 0.815 <frozen importlib._bootstrap>:651(_load_unlocked)
123/1 0.001 0.000 0.815 0.815 <frozen importlib._bootstrap_external>:672(exec_module)
219/1 0.000 0.000 0.815 0.815 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
336/1 0.036 0.000 0.815 0.815 {built-in method builtins.exec}
1 0.047 0.047 0.815 0.815 __init__.py:308(<module>)
from pyfor.
Interesting find! This is something I have run into as well (not the kernel size differences per-se, but the performance in general of .chm
with interpolation). My best guess without getting into the details is the griddata
call in tandem with the "nearest" interpolation setting:
Lines 101 to 102 in d2bb376
I dug down into the weeds a few months ago, and griddata
turns out to be a rather complex guy. I think even at one point a delaunay triagulation is constructed:
Here is my suggestion for getting started for now:
You can easily rip out the array of your chm before you do any computations.
chm = lidar_tile.chm(0.1)
array = chm.array
Once you have this array you are free to do with it as you please. I have not looked into alternatives for filling nans or filtering pits outside of what the defaults are (although I don't think the median filter is the bottleneck here). I am sure there are a few papers out there, and a long-term plan I have is to write a few of these nan/pit filters in cython
that are a bit more efficient than griddata
, but my priority these days is finishing off the collection
module.
If you come up with something you are happy with, reset the attribute
chm.array = your_filtered_array
Another thing to try is setting the interp_method
to "linear"
but make sure to check the results. I think it runs just a bit faster.
from pyfor.
I think i'm one to something? Might be the wrong function here.
Line 251 in d2bb376
import scipy as sp
import numpy as np
from scipy.signal import medfilt2d
X = np.random.random((2000, 2000)) # sample 2D array
import cProfile, pstats
cp = cProfile.Profile()
cp.enable()
a = sp.signal.medfilt(X,3)
#b = medfilt2d(X,[3,3])
cp.disable()
stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
27 function calls in 2.687 seconds
Ordered by: cumulative time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.687 2.687 signaltools.py:854(medfilt)
1 2.686 2.686 2.686 2.686 {built-in method scipy.signal.sigtools._order_filterND}
1 0.000 0.000 0.000 0.000 fromnumeric.py:3163(product)
1 0.000 0.000 0.000 0.000 fromnumeric.py:2478(prod)
1 0.000 0.000 0.000 0.000 fromnumeric.py:64(_wrapreduction)
1 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1 0.000 0.000 0.000 0.000 fromnumeric.py:404(repeat)
1 0.000 0.000 0.000 0.000 fromnumeric.py:49(_wrapfunc)
1 0.000 0.000 0.000 0.000 fromnumeric.py:36(_wrapit)
1 0.000 0.000 0.000 0.000 shape_base.py:11(atleast_1d)
import scipy as sp
import numpy as np
from scipy.signal import medfilt2d
X = np.random.random((2000, 2000)) # sample 2D array
import cProfile, pstats
cp = cProfile.Profile()
cp.enable()
#a = sp.signal.medfilt(X,3)
b = medfilt2d(X,[3,3])
cp.disable()
stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
7 function calls in 0.487 seconds
Ordered by: cumulative time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.487 0.487 signaltools.py:1131(medfilt2d)
1 0.486 0.486 0.486 0.486 {built-in method scipy.signal.sigtools._medfilt2d}
2 0.000 0.000 0.000 0.000 numeric.py:433(asarray)
2 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
5x bump using medfilt2d versus medfilt. I think because assumptions of a square window?
from pyfor.
I'm not the first here. https://gist.github.com/f0k/2f8402e4dfb6974bfcf1
from pyfor.
Done. Looking at those performance stats. I really don't see any calls to griddata, to me it 100% looks like the median filtering.
from pyfor.
I think this is due to the tile size, in your example you use SJER_002.laz
. If I recall correctly this is relatively small "plot-level" tile. The griddata
bottleneck occurs for larger, landscape-level tiles (1km x 1km for example).
from pyfor.
I take it back, looking at the full stack. There is definitely ongoing performance challenges with the .griddata, taking a peek into it.
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
(9999, 9999)
1053282 function calls (1050067 primitive calls) in 4618.410 seconds
Ordered by: cumulative time, call count
List reduced from 1106 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.129 0.129 4618.410 4618.410 Lidar.py:67(load_lidar)
2 0.104 0.052 4521.924 2260.962 rasterizer.py:78(interpolate)
1 0.160 0.160 4521.662 4521.662 cloud.py:304(chm)
3 0.256 0.085 4501.981 1500.660 ndgriddata.py:88(griddata)
3 4356.559 1452.186 4357.653 1452.551 ndgriddata.py:58(__init__)
3 2.839 0.946 144.072 48.024 ndgriddata.py:67(__call__)
3 138.584 46.195 138.585 46.195 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
1 0.044 0.044 51.709 51.709 cloud.py:68(__init__)
```
from pyfor.
There it is. Ben, I have good time to work on pyfor today, I will be around on gitter if you need quick feedback. I plan on tackling the median filter test first, and then move on to 0.3.2
tasks.
from pyfor.
Related Issues (20)
- Cryptic error when clipping a particular polygon and point set HOT 2
- Travis fails on write .laz
- Subversive pyproj error for Windows conda environment HOT 4
- Unclear argument parameter name in `retile_raster`
- Point cloud colors HOT 2
- `KrausPfeifer1998` classify fails HOT 1
- Add heightbreak for all metrics
- Eliminate `ResourceWarning`s from testing suite HOT 1
- `create_index` should be multithreaded by default
- Scanning files for a collection should be generalized to `.las/z` files
- Point gridding can be simplified
- Migrate to `conda-forge` testing
- Warning for `force_extent` bbox
- Docstring for `array_to_raster` is out of date HOT 1
- Make minimal area-based approach example
- Add ability to pass additional arguments through par_apply HOT 2
- Area-Based Metrics dodument is missing
- missing else branch for tolerance parameter in KrausPfeifer1998 HOT 2
- Conda forge gdal issue HOT 7
- Failed rasterization HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyfor.