I think this is due to the tile size, in your example you use <code class="notranslate

Profile Canopy Height model about pyfor HOT 9 CLOSED

bw4sz commented on June 14, 2024

Profile Canopy Height model

from pyfor.

Comments (9)

brycefrank commented on June 14, 2024 1

Very nice! With regard to medfilt then, would you be comfortable making a PR on 0.3.2 branch? You have been making a lot of contributions and it may be nice to count toward that via commits if you would like. Should be pretty easy!

from pyfor.

bw4sz commented on June 14, 2024

Its a function of kernel size, but counter intuitively it increases with kernel size. I would have anticipated the opposite.

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 21)
cp.disable()

         203631 function calls (201530 primitive calls) in 3.758 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.760    3.760 cloud.py:304(chm)
        1    0.000    0.000    3.542    3.542 rasterizer.py:244(pit_filter)
        1    0.000    0.000    2.728    2.728 signaltools.py:854(medfilt)
        1    2.728    2.728    2.728    2.728 {built-in method scipy.signal.sigtools._order_filterND}

Versus

from DeepForest import Lidar

lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 3)
cp.disable()

         203631 function calls (201530 primitive calls) in 1.117 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.120    1.120 cloud.py:304(chm)
        1    0.000    0.000    0.903    0.903 rasterizer.py:244(pit_filter)
    162/1    0.003    0.000    0.815    0.815 <frozen importlib._bootstrap>:966(_find_and_load)
    162/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
    153/1    0.002    0.000    0.815    0.815 <frozen importlib._bootstrap>:651(_load_unlocked)
    123/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap_external>:672(exec_module)
    219/1    0.000    0.000    0.815    0.815 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    336/1    0.036    0.000    0.815    0.815 {built-in method builtins.exec}
        1    0.047    0.047    0.815    0.815 __init__.py:308(<module>)

from pyfor.

brycefrank commented on June 14, 2024

Interesting find! This is something I have run into as well (not the kernel size differences per-se, but the performance in general of .chm with interpolation). My best guess without getting into the details is the griddata call in tandem with the "nearest" interpolation setting:

pyfor/pyfor/rasterizer.py

Lines 101 to 102 in d2bb376

    
           # TODO generally a slow approach 
        
           interp_grid = griddata(points, values, (X, Y), method=interp_method).T

I dug down into the weeds a few months ago, and griddata turns out to be a rather complex guy. I think even at one point a delaunay triagulation is constructed:

https://stackoverflow.com/questions/20915502/speedup-scipy-griddata-for-multiple-interpolations-between-two-irregular-grids

Here is my suggestion for getting started for now:

You can easily rip out the array of your chm before you do any computations.

chm = lidar_tile.chm(0.1)
array = chm.array

Once you have this array you are free to do with it as you please. I have not looked into alternatives for filling nans or filtering pits outside of what the defaults are (although I don't think the median filter is the bottleneck here). I am sure there are a few papers out there, and a long-term plan I have is to write a few of these nan/pit filters in cython that are a bit more efficient than griddata, but my priority these days is finishing off the collection module.

If you come up with something you are happy with, reset the attribute

chm.array  = your_filtered_array

Another thing to try is setting the interp_method to "linear" but make sure to check the results. I think it runs just a bit faster.

from pyfor.

bw4sz commented on June 14, 2024

I think i'm one to something? Might be the wrong function here.

pyfor/pyfor/rasterizer.py

Line 251 in d2bb376

from scipy.signal import medfilt

import scipy as sp
import numpy as np
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000)) # sample 2D array

import cProfile, pstats
cp = cProfile.Profile()

cp.enable()
a = sp.signal.medfilt(X,3)
#b = medfilt2d(X,[3,3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)

         27 function calls in 2.687 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.687    2.687 signaltools.py:854(medfilt)
        1    2.686    2.686    2.686    2.686 {built-in method scipy.signal.sigtools._order_filterND}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:3163(product)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:2478(prod)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:64(_wrapreduction)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:404(repeat)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:49(_wrapfunc)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:36(_wrapit)
        1    0.000    0.000    0.000    0.000 shape_base.py:11(atleast_1d)

import scipy as sp
import numpy as np
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000)) # sample 2D array

import cProfile, pstats
cp = cProfile.Profile()

cp.enable()
#a = sp.signal.medfilt(X,3)
b = medfilt2d(X,[3,3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)

         7 function calls in 0.487 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.487    0.487 signaltools.py:1131(medfilt2d)
        1    0.486    0.486    0.486    0.486 {built-in method scipy.signal.sigtools._medfilt2d}
        2    0.000    0.000    0.000    0.000 numeric.py:433(asarray)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

5x bump using medfilt2d versus medfilt. I think because assumptions of a square window?

from pyfor.

bw4sz commented on June 14, 2024

I'm not the first here. https://gist.github.com/f0k/2f8402e4dfb6974bfcf1

from pyfor.

bw4sz commented on June 14, 2024

Done. Looking at those performance stats. I really don't see any calls to griddata, to me it 100% looks like the median filtering.

from pyfor.

brycefrank commented on June 14, 2024

I think this is due to the tile size, in your example you use SJER_002.laz. If I recall correctly this is relatively small "plot-level" tile. The griddata bottleneck occurs for larger, landscape-level tiles (1km x 1km for example).

from pyfor.

bw4sz commented on June 14, 2024

I take it back, looking at the full stack. There is definitely ongoing performance challenges with the .griddata, taking a peek into it.

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

(9999, 9999)
         1053282 function calls (1050067 primitive calls) in 4618.410 seconds

   Ordered by: cumulative time, call count
   List reduced from 1106 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.129    0.129 4618.410 4618.410 Lidar.py:67(load_lidar)
        2    0.104    0.052 4521.924 2260.962 rasterizer.py:78(interpolate)
        1    0.160    0.160 4521.662 4521.662 cloud.py:304(chm)
        3    0.256    0.085 4501.981 1500.660 ndgriddata.py:88(griddata)
        3 4356.559 1452.186 4357.653 1452.551 ndgriddata.py:58(__init__)
        3    2.839    0.946  144.072   48.024 ndgriddata.py:67(__call__)
        3  138.584   46.195  138.585   46.195 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
        1    0.044    0.044   51.709   51.709 cloud.py:68(__init__)
```

from pyfor.

brycefrank commented on June 14, 2024

There it is. Ben, I have good time to work on pyfor today, I will be around on gitter if you need quick feedback. I plan on tackling the median filter test first, and then move on to 0.3.2 tasks.

from pyfor.

Profile Canopy Height model about pyfor HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	# TODO generally a slow approach
	interp_grid = griddata(points, values, (X, Y), method=interp_method).T