
A Python package for locating the dune toe on cross-shore beach profile transects.

License: MIT License


pybeach's Introduction

As of 2021, this repo is not actively maintained. I hope to make time for it in the future, as there is lots of room for improvement. PRs are welcome any time.

pybeach: A Python package for locating the dune toe on cross-shore beach profile transects.

Background

pybeach is a Python package for identifying dune toes on 2D beach profile transects. It includes the following methods:

Machine learning;
Maximum curvature (Stockdon et al, 2007);
Relative relief (Wernette et al, 2016); and,
Perpendicular distance.

In addition, pybeach contains methods for identifying the shoreline position and dune crest position on 2D beach profile transects. See the pybeach paper for more details about pybeach.
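
The idea behind the perpendicular distance method can be illustrated with a short sketch. This is a generic illustration and not pybeach's implementation: the function name, the use of the crest and shoreline as the end points of the reference line, and the land-to-sea ordering of the profile are all assumptions.

```python
import numpy as np


def toe_by_perpendicular_distance(x, z, crest_idx, shore_idx):
    """Hypothetical helper: index of the point lying farthest below the
    straight line that joins the dune crest to the shoreline."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    xs = x[crest_idx:shore_idx + 1]
    zs = z[crest_idx:shore_idx + 1]
    dx = xs[-1] - xs[0]
    dz = zs[-1] - zs[0]
    # Signed perpendicular distance to the line; positive below the line
    dist = (dz * (xs - xs[0]) - dx * (zs - zs[0])) / np.hypot(dx, dz)
    return crest_idx + int(np.argmax(dist))


# Toy profile: steep dune face from x=0 to x=2, then a gently sloping beach
x = np.arange(11)
z = np.concatenate(([5.0, 3.0], np.linspace(1.0, 0.0, 9)))
toe_idx = toe_by_perpendicular_distance(x, z, crest_idx=0, shore_idx=10)
print(toe_idx, x[toe_idx], z[toe_idx])
```

The break in slope at the base of the dune face is where the profile sags farthest below the crest-to-shoreline line, which is why this point is taken as the toe.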

Installation

pip install pybeach

Usage

import numpy as np
from pybeach.beach import Profile

# example data
x = np.arange(0, 80, 0.5)
z = np.concatenate((np.linspace(4, 5, 40),
                    np.linspace(5, 2, 10),
                    np.linspace(2, 0, 91)[1:],
                    np.linspace(0, -1, 20)))

# instantiate
p = Profile(x, z)

# predict dune toe, dune crest, shoreline location
toe_ml, prob_ml = p.predict_dunetoe_ml('wave_embayed_clf')  # predict toe using machine learning model
toe_mc = p.predict_dunetoe_mc()    # predict toe using maximum curvature method (Stockdon et al, 2007)
toe_rr = p.predict_dunetoe_rr()    # predict toe using relative relief method (Wernette et al, 2016)
toe_pd = p.predict_dunetoe_pd()    # predict toe using perpendicular distance method
crest = p.predict_dunecrest()      # predict dune crest
shoreline = p.predict_shoreline()  # predict shoreline

See the example notebook for more details.

Documentation

Read the pybeach documentation here.

Dependencies

A list of pybeach dependencies can be found in pyproject.toml. Currently, pybeach depends on the following:

python = "^3.7"
numpy = "1.17.2"
scipy = "1.3.1"
pandas = "0.25.1"
scikit-learn = "0.21.2"
joblib = "0.13.2"

Questions, Comments, Suggestions

Do you have a question that needs answering? Have you found an issue with the code and need to get it fixed? Or perhaps you're looking to contribute to the code and have ideas for how it could be improved. In all cases, please see the Issues page.

References

Stockdon, H. F., Sallenger Jr, A. H., Holman, R. A., & Howd, P. A. (2007). A simple model for the spatially-variable coastal response to hurricanes. Marine Geology, 238, 1-20. https://doi.org/10.1016/j.margeo.2006.11.004

Wernette, P., Houser, C., & Bishop, M. P. (2016). An automated approach for extracting Barrier Island morphology from digital elevation models. Geomorphology, 262, 1-7. https://doi.org/10.1016/j.geomorph.2016.02.024


pybeach's Issues

JOSS review by Sherwood issue 3

The example notebook https://github.com/TomasBeuzen/pybeach/blob/master/example/example.ipynb
refers to pydune instead of pybeach. Unfortunately, there is also a package out there on GitLab called pydune. Installation of pybeach using pip install worked fine, and this example worked as indicated after I added the magic command %matplotlib inline. I notice scikit-learn is required...I already had that in my environment, but I don't think the documentation for pybeach indicates that it is required.

Suggestion: Improve ML performance by only loading model file once

I've run into a performance issue when trying to predict dune toes for thousands of profiles using the machine learning approach. I expected this to be quite fast since everything is built on numpy. An example of the code I'm trying to run is:

# List of x and y coordinates for a number of profiles
x_s = [(0,1, 2, 3, ...), (0, 1, 2, 3,...), ...]
y_s = [(0.1, 0.2, 0.3,...), (0.1, 0.2, 0.3,...), ...]

# Iterating over the profile coordinates is not as fast as it should be
for x, z in zip(x_s, y_s):
  p = Profile(x, z)
  toe_ml, prob_ml = p.predict_dunetoe_ml('wave_embayed_clf')

I found the problem to be here:

pybeach/pybeach/beach.py

Lines 151 to 155 in 10d9cf2

    
# Load the random forest classifier
try:
    clf = cs.load_classifier(clf_name)
except FileNotFoundError:
    raise FileNotFoundError(f'no classifier named {clf_name} found in classifier folder.')

Basically, every time we predict with the ML method on a new profile, we reload the model file, which is an I/O bottleneck when everything else is numpy based. It would be better to load the model file once and then predict all profiles with it. I think that making a new class for each predictor might be a better way to go. Something like this as a usage example:

# For machine learning:
ml_predictor = MLPredictor(model='mixed_clf')  # load model file once

# Now can reuse the predictor without reloading the model file
profile1_x_toe_ml = ml_predictor.predict(x=profile1_x, z=profile1_z)
profile2_x_toe_ml = ml_predictor.predict(x=profile2_x, z=profile2_z)

# You'd also have a new class for each other method
rr_predictor = RRPredictor(window_size=21)
profile1_x_toe_rr = rr_predictor.predict(x=profile1_x, z=profile1_z)

I'm happy to put in a pull request for this, but it'd involve refactoring and rearranging the code. Just wanted your thoughts @TomasBeuzen if you're happy with this, or there could be any other alternatives to improve performance?

JOSS review by Chris Sherwood issue 2

Unfortunately, PyBeach is also being used by a Python coding conference: https://2020.pybeach.org/. Not sure whether this is worth resolving or not.

Modification to compute relative relief

I noticed that relative relief is currently computed along a single transect, which is different from how relative relief was initially proposed and affects its delineation ability. Relative relief should be computed over a given area, instead of along a transect. To compute it along a transect is neglecting potentially important information in the DEM cells adjacent to the transect itself. Once relative relief has been computed using a planimetric 2D window (ideally across 3 different spatial scales), then it is reasonable to apply thresholding to determine the location of dune features. While the distinction between computing relative relief within a 2D area versus computing relative relief along a transect may seem trivial, the results are significantly different, with the transect-based computation typically being an underestimate of dune feature locations.
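
The difference described above can be seen on a toy example. The sketch below assumes the common definition of relative relief, (z - z_min) / (z_max - z_min) within a moving window, and applies it once along a transect and once over a 2D window of the surrounding DEM. The function name is hypothetical, a single window size is used rather than several spatial scales, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relative_relief(z, window):
    """Relative relief in a centred window, for a 1D transect or a 2D DEM."""
    z = np.asarray(z, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd")
    padded = np.pad(z, window // 2, mode="edge")
    windows = sliding_window_view(padded, (window,) * z.ndim)
    window_axes = tuple(range(z.ndim, 2 * z.ndim))
    zmin = windows.min(axis=window_axes)
    zmax = windows.max(axis=window_axes)
    relief = zmax - zmin
    out = np.full(z.shape, np.nan)  # flat windows stay NaN
    np.divide(z - zmin, relief, out=out, where=relief > 0)
    return out


# Toy DEM: the transect is row 2; one high cell sits next to it in row 1
dem = np.zeros((5, 5))
dem[2] = np.arange(5.0)
dem[1, 2] = 10.0

rr_transect = relative_relief(dem[2], window=3)  # 1D, along the transect only
rr_area = relative_relief(dem, window=3)[2]      # 2D, sampled on the same row
print(rr_transect[2], rr_area[2])
```

At the middle of the transect the 1D window only sees elevations 1 to 3, while the 2D window also sees the adjacent high cell, so the two values differ.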

Improvements for v0.2.1

vectorize list comprehensions: while loops were a nice way to get pybeach started, I'd like to vectorize as many operations as I can to improve code efficiency
dump input type checking to utils.py: input type checking is bloating the pybeach modules. I'd like to create a new utils.py module to hold input type checking functions.
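
As an illustration of the first item, the sketch below replaces a list comprehension with an equivalent vectorized numpy expression. It is a generic example and not taken from the pybeach code base; the slope calculation was chosen only because it is easy to check.

```python
import numpy as np

x = np.arange(0, 5, 0.5)
z = 4.0 - 0.2 * x ** 2

# List-comprehension version: slope between consecutive profile points
slopes_loop = [(z[i + 1] - z[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

# Vectorized version: same result without a Python-level loop
slopes_vec = np.diff(z) / np.diff(x)

print(np.allclose(slopes_loop, slopes_vec))
```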

Paper edits for JOSS

This issue summarizes a few topical editor requests for the content of the paper under review at JOSS.

@TomasBeuzen can you

Provide a very short definition of "beach" and "dune" in the context of shore profile geometry. I request this to make the summary more understandable for a diverse, non-specialist audience.
Specify you are using the Random Forest classifier in the summary rather than referring to the more general "Machine Learning". I think this may be as simple as stating "Machine learning using Random Forest classification- discussed further below." in your enumerated list of supported methods.
When you are discussing support for custom ML models from user supplied data, can you be more specific about which ML methods are supported. Random forest with other data? Anything SciKit-learn supports?
Your paper nicely describes your extensive work to compare the performance of the alternative methods provided in pybeach. However, this is lost a bit in the pybeach section. I'd recommend that you add a new section heading called something like "Performance Assessment" just before the sentences that start with "For each dataset described above, the true location of the dune toe..."

JOSS review by Sherwood issue 4

The short paper is very readable and seems to cover all of the points required by JOSS. The examples showing performance of the various methods were nice, but I was left wondering "How is the true dune toe identified in the dataset?" A mention of how "true" dune toes are defined and determined from field observations or profiles would help.

When creating classifier, docstring is incorrect regarding required shape of z

Hi Tom,

Small issue here, but I've noticed that when using pybeach.support.classifier_support.create_classifier to create a classifier from multiple profiles, the shape of z needs to be (n,m) rather than (m,n) as stated in the docstring. This can be seen by running the following in an interpreter:

import numpy as np
from numpy import matlib
from pybeach.support import classifier_support as cs
from scipy.interpolate import interp1d

# Create dummy profile #
x = np.arange(0,10)
z = [9.8,10,9,8,7,6,5.8,5.6,5.4,5.2]

# Create dummy x and z data (with 100 of the same profile) based on dummy profile #
xi = np.arange(0,100,0.1)
f = interp1d(x,z,fill_value=np.nan,bounds_error=False);zi = f(xi)
zi = np.transpose(matlib.repmat(zi,100,1))
toes = np.tile(50,100)
print('x shape:'+str(np.shape(xi)),'z shape:'+str(np.shape(zi)),'toes shape:'+str(np.shape(toes)))

# Try to create the classifier #
clf = cs.create_classifier(xi, zi, toes, window=2, min_buffer=40, max_buffer=200)
# Note IndexError about boolean index not matching indexed array #
clf = cs.create_classifier(xi, np.transpose(zi), toes, window=2, min_buffer=40, max_buffer=200)
# Note that this worked with transposed z #

I suppose just changing the docstring here would be the way to go. Thanks for putting together such a helpful and easy-to-use package!
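
Until the docstring is corrected, a small guard can make the expected orientation explicit before calling create_classifier. This is a sketch based only on the behaviour reported above, where z works when it has one row per profile and one column per x value. The helper name is hypothetical and not part of pybeach.

```python
import numpy as np


def as_profile_rows(x, z):
    """Hypothetical helper: return z with shape (n_profiles, len(x))."""
    x = np.asarray(x)
    z = np.asarray(z, dtype=float)
    if z.ndim != 2:
        raise ValueError("z must be 2-dimensional")
    if z.shape[1] == x.size:
        return z      # already one row per profile
    if z.shape[0] == x.size:
        return z.T    # one column per profile: transpose
    raise ValueError(f"no axis of z with shape {z.shape} matches len(x) = {x.size}")


xi = np.arange(1000) * 0.1
zi = np.zeros((1000, 100))        # 100 profiles stored as columns
print(as_profile_rows(xi, zi).shape)
```

If z happens to be square, the helper cannot tell the two layouts apart and returns it unchanged, so the orientation still has to be checked by hand in that case.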

JOSS review by Sherwood - Final issue

I have completed my review of this repo. Except for the missing link to the documentation, everything looks great. I have downloaded, installed, and tested the code, and read through many of the code and documentation files. The author has done a great job...the code is clean, the documentation thorough, and the software provides a valuable tool for those of us who work with beach morphology data. I plan to use the code for analysis of my data, and if I run into any questions, I will raise additional issues. I recommend this software and the accompanying paper be published with minor revisions to address the issues raised here. Thanks for the opportunity to review this code.

Discussion regarding profile classification for crest detection

Hello there,

My colleagues and I work for a coastal observatory, and I wrote some Python code to process LIDAR data. Our method is similar to the RR (relative relief) method provided in your package, yet differs on some points.
I recently found your paper and this repo, which was a nice discovery, as you provide some interesting thoughts on using ML.

Though I did struggle a bit to make an editable install (I'm not used to poetry) and to make use of my data (opposite profile direction, profile point numbering not starting at 0, failures to detect a dune toe depending on the profile, etc.), the results are very interesting! We are thinking of creating our own classifier based on our data, but I'm unsure whether we have enough training data at the moment (~300 profiles vs. the ~1500 that you used to create your models).

While the dune toe detection results are satisfying so far, we still have trouble detecting the correct crest, whether we take a geomorphological approach or a risk management approach, and depending on the profile typology.

For example, sometimes the highest z value lies too far on the land side, whereas the highest peak sometimes corresponds to a tiny peak on the beach side.

So here is the main question: did you try to apply your ML method (using a random forest classifier) to detect the dune crest as well?
Similarly, we observe that crest detection is sensitive to profile morphology (e.g. reflective vs. dissipative profiles). Did you try, or do you know of, any approach that classifies profiles before detecting the crest/toe?
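
The data-preparation problems mentioned in this issue (opposite profile direction, x not starting at 0) can be handled before constructing a Profile. The sketch below is one possible approach under two assumptions that are not confirmed here: that profiles should run from land to sea, as in the README example where z falls from 4 to -1, and that the landward end of a profile is the higher one. The function name is hypothetical.

```python
import numpy as np


def prepare_profile(x, z):
    """Hypothetical helper: return (x, z) with x starting at 0 and
    increasing from the landward end to the seaward end."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    order = np.argsort(x)
    x, z = x[order], z[order]
    if z[0] < z[-1]:
        # Assumed sea-to-land ordering: mirror the profile
        x = x[-1] - x[::-1]
        z = z[::-1]
    else:
        x = x - x[0]
    return x, z


# Profile recorded from sea to land, with x starting at 10
x_raw = np.array([10.0, 11.0, 12.0, 13.0])
z_raw = np.array([0.0, 1.0, 3.0, 4.0])
x_new, z_new = prepare_profile(x_raw, z_raw)
print(x_new, z_new)
```

The elevation test is a crude heuristic. Where the survey direction is known from metadata, using that information is more reliable than inferring it from z.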

JOSS review by Sherwood - Issue 1

Some of the links in the README.md file are broken. Specifically, 404 errors from the links to an example notebook and the documentation.

How to extract elevation of toes?

Hello! Great program, I've been using it with my own terrestrial LiDAR data and it's working great. I am trying to plot the elevations of many dune toes and how they vary over years of data, but I cannot isolate the elevation value of the toe_ml variable: it only outputs the position along the profile where the toe is located, not an elevation. I am aware this is a coding issue rather than something specific to pybeach, but any help would be greatly appreciated! I'm just looking to isolate the z value (elevation) of toe_ml.

I am using a csv file, not .pkl, which might be adding to the issue... Here's my code:

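import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pybeach.beach import Profile

input_data_file = "D:/Output tables/2018_LiDAR.csv"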
profile = pd.read_csv(input_data_file)
#x = np.arange(len(profile))
x = profile['FIRST_DIST'].to_numpy()
z = profile['FIRST_Z'].to_numpy()
p = Profile(x, z)
#plt.plot(x, z, '-k')

# instantiate
p = Profile(x, z)

# predict dune toe, dune crest, shoreline location

toe_ml, prob_ml = p.predict_dunetoe_ml('wave_embayed_clf') # predict toe using machine learning model
toe_mc = p.predict_dunetoe_mc() # predict toe using maximum curvature method (Stockdon et al, 2007)
toe_rr = p.predict_dunetoe_rr() # predict toe using relative relief method (Wernette et al, 2016)
toe_pd = p.predict_dunetoe_pd() # predict toe using perpendicular distance method
crest = p.predict_dunecrest() # predict dune crest
shoreline = p.predict_shoreline() # predict shoreline

fig, axes = plt.subplots(1, 1, figsize=(7, 5))
toe = [toe_ml, toe_mc, toe_rr, toe_pd]
labels = ['Machine learning toe', 'Maximum curvature toe', 'Relative relief toe', 'Perpendicular distance toe']
colors = ['tomato', 'cornflowerblue', 'gold', 'limegreen']
axes.plot(x, z, '-k')
axes.fill_between([70, 100], [0, 0], y2=-1, color='lightskyblue', alpha=0.5)
axes.fill_between(x, z, y2=-1, color='cornsilk', alpha=1)
axes.axvspan(-10, -9, color='tomato', alpha = 0.6, label='ML Toe probability') # legend placeholder

for i, itoe in enumerate(toe):
    axes.plot(x[itoe], z[itoe],
              'o', color=colors[i], ms=12, mec='k', label=labels[i])

axes.plot(x[crest], z[crest], 'v', color='k', ms=12, mec='k', label='Crest')
axes.plot(x[shoreline], z[shoreline], '^', color='k', ms=12, mec='k', label='Shoreline')
#axes.set_xlim(200, 400)
#axes.set_ylim(0, 6)
#plt.set_title('Example profile')
axes.set_xlabel('Cross-shore distance (ft)')
axes.set_ylabel('Elevation (ft)')
axes.grid()
axes.legend(loc='lower left')

df = pd.DataFrame({'MAE': [np.absolute(toe-toe_ml).mean(),
                           np.absolute(toe-toe_mc).mean(),
                           np.absolute(toe-toe_rr).mean(),
                           np.absolute(toe-toe_pd).mean()],
                   'RMSE': [np.sqrt(np.square(toe-toe_ml).mean()),
                            np.sqrt(np.square(toe-toe_mc).mean()),
                            np.sqrt(np.square(toe-toe_rr).mean()),
                            np.sqrt(np.square(toe-toe_pd).mean())]},
                  index=['ML', 'MC', 'RR', 'PD']).round(2)

print(toe_ml)

[43]
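
The printed value appears to be an index into the profile arrays rather than a coordinate, which is how the plotting code above already uses it (x[itoe], z[itoe]). Under that assumption, the elevation can be read by indexing z with the prediction. The sketch below uses the README example profile and a hard-coded array standing in for the output of predict_dunetoe_ml, so it runs without pybeach installed.

```python
import numpy as np

# Example profile from the README
x = np.arange(0, 80, 0.5)
z = np.concatenate((np.linspace(4, 5, 40),
                    np.linspace(5, 2, 10),
                    np.linspace(2, 0, 91)[1:],
                    np.linspace(0, -1, 20)))

# Stand-in for the array returned by p.predict_dunetoe_ml(...)
toe_ml = np.array([43])

toe_x = x[toe_ml]   # cross-shore position of the toe
toe_z = z[toe_ml]   # elevation of the toe
print(toe_x, toe_z)
```

For many years of data, the same indexing can be repeated per file and the resulting elevations collected in a list or DataFrame for plotting.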
