Giter Site home page Giter Site logo

Profile Canopy Height model about pyfor HOT 9 CLOSED

bw4sz avatar bw4sz commented on June 14, 2024
Profile Canopy Height model

from pyfor.

Comments (9)

brycefrank avatar brycefrank commented on June 14, 2024 1

Very nice! With regard to medfilt then, would you be comfortable making a PR on 0.3.2 branch? You have been making a lot of contributions and it may be nice to count toward that via commits if you would like. Should be pretty easy!

from pyfor.

bw4sz avatar bw4sz commented on June 14, 2024

Its a function of kernel size, but counter intuitively it increases with kernel size. I would have anticipated the opposite.

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 21)
cp.disable()
         203631 function calls (201530 primitive calls) in 3.758 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.760    3.760 cloud.py:304(chm)
        1    0.000    0.000    3.542    3.542 rasterizer.py:244(pit_filter)
        1    0.000    0.000    2.728    2.728 signaltools.py:854(medfilt)
        1    2.728    2.728    2.728    2.728 {built-in method scipy.signal.sigtools._order_filterND}

Versus

from DeepForest import Lidar

lidar_tile=Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size = 0.1 , interp_method = "nearest", pit_filter = "median", kernel_size = 3)
cp.disable()
         203631 function calls (201530 primitive calls) in 1.117 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.120    1.120 cloud.py:304(chm)
        1    0.000    0.000    0.903    0.903 rasterizer.py:244(pit_filter)
    162/1    0.003    0.000    0.815    0.815 <frozen importlib._bootstrap>:966(_find_and_load)
    162/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
    153/1    0.002    0.000    0.815    0.815 <frozen importlib._bootstrap>:651(_load_unlocked)
    123/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap_external>:672(exec_module)
    219/1    0.000    0.000    0.815    0.815 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    336/1    0.036    0.000    0.815    0.815 {built-in method builtins.exec}
        1    0.047    0.047    0.815    0.815 __init__.py:308(<module>)

from pyfor.

brycefrank avatar brycefrank commented on June 14, 2024

Interesting find! This is something I have run into as well (not the kernel size differences per-se, but the performance in general of .chm with interpolation). My best guess without getting into the details is the griddata call in tandem with the "nearest" interpolation setting:

pyfor/pyfor/rasterizer.py

Lines 101 to 102 in d2bb376

# TODO generally a slow approach
interp_grid = griddata(points, values, (X, Y), method=interp_method).T

I dug down into the weeds a few months ago, and griddata turns out to be a rather complex guy. I think even at one point a delaunay triagulation is constructed:

https://stackoverflow.com/questions/20915502/speedup-scipy-griddata-for-multiple-interpolations-between-two-irregular-grids

Here is my suggestion for getting started for now:

You can easily rip out the array of your chm before you do any computations.

chm = lidar_tile.chm(0.1)
array = chm.array

Once you have this array you are free to do with it as you please. I have not looked into alternatives for filling nans or filtering pits outside of what the defaults are (although I don't think the median filter is the bottleneck here). I am sure there are a few papers out there, and a long-term plan I have is to write a few of these nan/pit filters in cython that are a bit more efficient than griddata, but my priority these days is finishing off the collection module.

If you come up with something you are happy with, reset the attribute

chm.array  = your_filtered_array

Another thing to try is setting the interp_method to "linear" but make sure to check the results. I think it runs just a bit faster.

from pyfor.

bw4sz avatar bw4sz commented on June 14, 2024

I think i'm one to something? Might be the wrong function here.

from scipy.signal import medfilt

import scipy as sp
import numpy as np
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000)) # sample 2D array

import cProfile, pstats
cp = cProfile.Profile()

cp.enable()
a = sp.signal.medfilt(X,3)
#b = medfilt2d(X,[3,3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
         27 function calls in 2.687 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.687    2.687 signaltools.py:854(medfilt)
        1    2.686    2.686    2.686    2.686 {built-in method scipy.signal.sigtools._order_filterND}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:3163(product)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:2478(prod)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:64(_wrapreduction)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:404(repeat)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:49(_wrapfunc)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:36(_wrapit)
        1    0.000    0.000    0.000    0.000 shape_base.py:11(atleast_1d)
import scipy as sp
import numpy as np
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000)) # sample 2D array

import cProfile, pstats
cp = cProfile.Profile()

cp.enable()
#a = sp.signal.medfilt(X,3)
b = medfilt2d(X,[3,3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
         7 function calls in 0.487 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.487    0.487 signaltools.py:1131(medfilt2d)
        1    0.486    0.486    0.486    0.486 {built-in method scipy.signal.sigtools._medfilt2d}
        2    0.000    0.000    0.000    0.000 numeric.py:433(asarray)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

5x bump using medfilt2d versus medfilt. I think because assumptions of a square window?

from pyfor.

bw4sz avatar bw4sz commented on June 14, 2024

I'm not the first here. https://gist.github.com/f0k/2f8402e4dfb6974bfcf1

from pyfor.

bw4sz avatar bw4sz commented on June 14, 2024

Done. Looking at those performance stats. I really don't see any calls to griddata, to me it 100% looks like the median filtering.

from pyfor.

brycefrank avatar brycefrank commented on June 14, 2024

I think this is due to the tile size, in your example you use SJER_002.laz. If I recall correctly this is relatively small "plot-level" tile. The griddata bottleneck occurs for larger, landscape-level tiles (1km x 1km for example).

from pyfor.

bw4sz avatar bw4sz commented on June 14, 2024

I take it back, looking at the full stack. There is definitely ongoing performance challenges with the .griddata, taking a peek into it.

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

(9999, 9999)
         1053282 function calls (1050067 primitive calls) in 4618.410 seconds

   Ordered by: cumulative time, call count
   List reduced from 1106 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.129    0.129 4618.410 4618.410 Lidar.py:67(load_lidar)
        2    0.104    0.052 4521.924 2260.962 rasterizer.py:78(interpolate)
        1    0.160    0.160 4521.662 4521.662 cloud.py:304(chm)
        3    0.256    0.085 4501.981 1500.660 ndgriddata.py:88(griddata)
        3 4356.559 1452.186 4357.653 1452.551 ndgriddata.py:58(__init__)
        3    2.839    0.946  144.072   48.024 ndgriddata.py:67(__call__)
        3  138.584   46.195  138.585   46.195 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
        1    0.044    0.044   51.709   51.709 cloud.py:68(__init__)
```

from pyfor.

brycefrank avatar brycefrank commented on June 14, 2024

There it is. Ben, I have good time to work on pyfor today, I will be around on gitter if you need quick feedback. I plan on tackling the median filter test first, and then move on to 0.3.2 tasks.

from pyfor.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.