ceff-tech / ffc_api_client

An R client for the online Functional Flows Calculator API

Home Page: https://ceff-tech.github.io/ffc_api_client

R 100.00%

environmental-flows functional-flows

ffc_api_client's Introduction

Simple Functional Flows Calculator API client

This package is designed to:

Process data through the online functional flows calculator
Transform that data and return plots of the Dimensionless Reference Hydrograph (DRH) as well as boxplots showing the observed versus predicted percentile values for each metric.
Have shortcut functions that handle all of this, while exposing the internals so you can access useful intermediate products, such as the functional flows calculator results as an R dataframe, in case you need to do more complex analysis.

It is meant to be used with simply a gage ID, or with a timeseries dataframe of flows along with either a stream segment COMID or longitude and latitude (it will look up the COMID for you). See Setup and Examples below for more.

Documentation and Examples
Setup
Change Log

Full Documentation and Examples

We have moved all documentation and examples to our documentation website. A PDF manual is also available.

Setup

If you don't already have devtools installed, run install.packages('devtools') in your R console, or install the package any way you prefer.
Install this package with devtools::install_github('ceff-tech/ffc_api_client/ffcAPIClient'). If you get an error on this installation step, make sure you are using the latest version of the devtools package.
Now we need to retrieve your token. In Firefox or Chrome, log into https://eflows.ucdavis.edu. Once logged in, make sure you are on your user profile page at https://eflows.ucdavis.edu/profile and then press F12 on your keyboard to bring up the Inspector, then switch to the Console tab.
In the console, type localStorage.getItem('ff_jwt') - you may need to type it in yourself instead of pasting (or follow Firefox's instructions to enable pasting - it will tell you how after you try to paste). Hit Enter to send the command.
Your browser will place text on the line below the command you typed - this is your "token". Save this value and copy it to your clipboard and we'll use it below. This value should stay private - if other people knew the value, they could use it to access your account on eflows.ucdavis.edu!

That's it. You can now run data through the online FFC using this package and process the results.

Change Log

Version 0.9.8.3

[Bugfix] get_predicted_flow_metrics() no longer returns observed metrics, specifically returns only modeled or inferred.

Version 0.9.8.2

[Bugfix] Log messages weren't going to the screen when running CEFF steps with an output folder. They once again go to both the screen and the output file. Known issue: FFC percentiles printed to the console don't come out correctly (due to our logging method). Use the output CSV instead in the meantime.

Version 0.9.8.1

[Bugfix] Internal stream class data occasionally had multiple records, which would cause bad parameters to be sent to the FFC and a failure in the package. Now checks if there's more than one stream class record and uses the first one.

Version 0.9.8.0

[Enhancement] The package now checks to make sure it received a valid COMID from web services, which helps when the web service is down. It prints a warning if the lookup failed.
[Enhancement] The package now checks to make sure it obtained at least 24 predicted metric records. If it received some records but fewer than 24, it prints a warning. If it received no records, it prints an error about the COMID.
[Enhancement] Added a new $get_comid_online (default TRUE) flag to FFCProcessor objects. The web service that we use to look up COMIDs for gages has been spotty recently, and we returned the option to do a lookup locally with spatial data. It will still download a large amount of spatial data for NHD segments the first time it runs with the flag set to FALSE, but then future lookups will be much faster. It uses the nhdR package, which is not included as a package requirement, and instead is installed only if you set ffc$get_comid_online = FALSE.

Version 0.9.7.5

[Update] Included updated data for peak flow metric predictions in the package data. To support this, we are temporarily changing the FFCProcessor object to use offline metrics instead of data from the TNC API. If you wish to use the API again, set ffc$predicted_percentiles_online <- TRUE. We will revert it to using online data in the future once the API is updated.

Version 0.9.7.4

[Enhancement] New code that fills NA values in the 10th percentile column of predicted metrics from the TNC API if the 25th percentile column is 0. A warning will be raised if NA values are found in the 10th percentile column. Can be turned off by setting ffc$predicted_percentiles_fill_na_p10 to FALSE.
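The fill rule described for this version can be sketched in base R. This is an illustration only, not the package's code: the function name fill_na_p10, the column names p10 and p25, and the choice of 0 as the fill value are all assumptions (the reasoning being that, for non-negative flows, the 10th percentile cannot exceed a 25th percentile of 0).

```r
# Hypothetical helper: fill missing p10 values only where p25 is exactly 0.
fill_na_p10 <- function(predicted, fill_value = 0) {
  missing_p10 <- is.na(predicted$p10)
  if (any(missing_p10)) {
    warning("NA values found in the p10 column of predicted metrics")
  }
  fillable <- missing_p10 & !is.na(predicted$p25) & predicted$p25 == 0
  predicted$p10[fillable] <- fill_value
  predicted
}

# Made-up values for three metrics named elsewhere on this page.
predicted <- data.frame(
  metric = c("FA_Mag", "FA_Dur", "SP_ROC"),
  p10 = c(NA, 1.2, NA),
  p25 = c(0, 2.5, 3.1)
)
filled <- suppressWarnings(fill_na_p10(predicted))
```

Only the first row is filled here; the third row keeps its NA because its p25 is not 0.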

Version 0.9.7.3

[Bugfix] Previously, providing a timeseries with a date field that was not named "date" (case sensitive) would fail when building the data frame to send to the FFC. This has been fixed, and any date field name should be usable, so long as it is provided as a parameter, or set on the ffc object before running ffc$set_up.

Version 0.9.7.2

[Update] Fixes for new versions of the eflows API launched recently. Everyone using this package will need to upgrade to this version or newer to keep using the package.

Version 0.9.7.1

Added function clean_account to remove all current runs from the online FFC. Helps with broken accounts after recent FFC update. To use, just call clean_account(token) after setting your token value into the variable named token

Version 0.9.7.0

[Change] Data sent to the FFC is now filtered according to rules determined by the CEFF Tech Team: by default, water years are dropped if they have more than 7 missing days or more than 2 consecutive missing days. See the documentation for the function filter_timeseries for more info. It applies to both automatically retrieved gage data and user-provided timeseries. It can be disabled by setting timeseries_enable_filtering to FALSE on the FFCProcessor object, but if you need to keep all water years, you're much better off filling the gaps in your data yourself.
[Enhancement] ffcAPIClient now logs both to the console and to a log file in the output folder, including the version of the package and any log messages (some R warnings and errors may not show up in the file yet)
[Change] The package now stops running if you have fewer than 10 years of data after the filters implemented in this version have run. It warns you, but still runs if you have fewer than 15 years of data. These apply whether the data source is gage data or a provided timeseries. To change this behavior, change the values of fail_years_data and warn_years_data on the FFCProcessor object
[Enhancement] You can now pass minimum and maximum dates for gage data retrieval. Set the values of gage_start_date and gage_end_date on the FFCProcessor object.
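As a rough illustration of the water year filtering rule from this version, here is a base R sketch. It is not the package's filter_timeseries implementation. It assumes that missing days appear as rows with an NA flow (rather than as absent rows), that the columns are named date and flow, and that a water year starts on October 1 and is labeled by the calendar year in which it ends. The function names are hypothetical.

```r
# Label each date with its water year (Oct 1 start, named for the ending year).
water_year <- function(dates) {
  lt <- as.POSIXlt(dates)
  lt$year + 1900L + ifelse(lt$mon >= 9L, 1L, 0L)
}

# Drop water years with too many missing days or too long a run of missing days.
filter_water_years <- function(df, max_missing = 7, max_consecutive = 2) {
  df <- df[order(df$date), ]
  wy <- water_year(df$date)
  keep <- vapply(split(is.na(df$flow), wy), function(missing) {
    runs <- rle(missing)
    longest_gap <- max(c(0L, runs$lengths[runs$values]))
    sum(missing) <= max_missing && longest_gap <= max_consecutive
  }, logical(1))
  df[wy %in% as.integer(names(keep)[keep]), ]
}

# Two water years of made-up data; the first has a 3-day gap and is dropped.
dates <- seq(as.Date("2000-10-01"), as.Date("2002-09-30"), by = "day")
flows <- data.frame(date = dates, flow = rep(10, length(dates)))
flows$flow[10:12] <- NA
filtered <- filter_water_years(flows)
```

After filtering, the number of distinct water years left could be compared against the fail_years_data and warn_years_data thresholds described above.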

Version 0.9.6.9

[Bugfix] Fixed issue preventing export of R6 Classes - FFCProcessor and USGSGage should now be available for use as the documentation describes

Version 0.9.6.8

[Enhancement] Code available for three steps of CEFF process - working on documentation for it still

Version 0.9.6.7

[Bugfix] Adjusted to change in FFM API from flows.codefornature.org
[Enhancement] Code available to support first step of CEFF process

Version 0.9.6.6

[Enhancement] New function force_consistent_naming sets option to convert peak magnitude metrics to use same name format as other magnitude metrics. Peak_2 becomes Peak_Mag_2, etc. Defaults to off to remain aligned with CEFF, but if you need metric names to follow a pattern, that will help. See documentation for usage.
[Change] Predicted flow metrics now use a character instead of a factor in the metric column.
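The renaming pattern described for this version (Peak_2 becomes Peak_Mag_2) amounts to a single regular expression substitution. The sketch below only illustrates the pattern; it is not the package's force_consistent_naming code, and the helper name and the exact form of the other peak metric names (such as Peak_Dur_2) are assumptions.

```r
# Hypothetical helper: insert "Mag" into peak magnitude names only.
rename_peak_magnitudes <- function(metric_names) {
  sub("^Peak_([0-9]+)$", "Peak_Mag_\\1", metric_names)
}

renamed <- rename_peak_magnitudes(c("Peak_2", "Peak_10", "Peak_Dur_2", "FA_Mag"))
# "Peak_Mag_2" "Peak_Mag_10" "Peak_Dur_2" "FA_Mag"
```

Anchoring the pattern at both ends leaves names that already carry a component, such as duration or frequency metrics, untouched.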

Version 0.9.6.5

[Enhancement] ffc_results dataframe now filters out non-metrics (things starting with __ or ending with _Julian)
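A filter like the one described for this version could look roughly like the following. It is a sketch under the stated rule (drop names starting with __ or ending with _Julian); the helper name is hypothetical, and __internal_example is a made-up column name used only for illustration.

```r
# Hypothetical helper: TRUE for columns that are not flagged as non-metrics.
is_flow_metric <- function(column_names) {
  !grepl("^__|_Julian$", column_names)
}

cols <- c("Year", "DS_Tim_Julian", "DS_Dur_WS", "DS_Tim", "__internal_example")
kept <- cols[is_flow_metric(cols)]
# "Year" "DS_Dur_WS" "DS_Tim"
```

Note that this rule alone keeps the Year column, so any identifier columns would need separate handling.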

Version 0.9.6.4

[Bugfix] Handled a condition where the predicted flow metric API returns duplicate values for some metrics

Version 0.9.6.3

[Bugfix] Fixed an error where predicted Spring Duration metrics came through as SP_Du

Version 0.9.6.2

[Bugfix] Fixed error preventing evaluate_alteration from running with warnings about date_format_string.

Version 0.9.6.1

[Enhancement] Updated plotting to facet each metric with a free Y axis so that all boxplots can be clearly seen. Other minor enhancements to plotting, like titles and X axis labels as well.

Version 0.9.6.0

[Enhancement] evaluate_alteration family of functions now also returns predicted_wyt_percentiles in addition to the predicted_percentiles. The WYT form includes a wyt column that includes the water year type of the prediction
[Change] Using TNC's online API to pull predicted flow metrics instead of internal data by default
[Change] Under the hood, the code behaves differently - most processing is now being handled in the FFCProcessor class, but more will be moved there

Version 0.9.5.8

[Change] Updated the rules used to determine alteration to match new rules for CEFF Appendix F - specifically, we now always check that >=50% of observations are within the i80r before declaring something unaltered.
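The 50% check mentioned for this version can be illustrated with a short base R sketch. It assumes that the i80r is the range between the predicted 10th and 90th percentiles, and it covers only this one check, not the full set of CEFF Appendix F rules. The function name is hypothetical.

```r
# Hypothetical helper: TRUE when at least half of the observed values fall
# inside [p10, p90]; NA when there is nothing to compare.
half_within_i80r <- function(observed, p10, p90) {
  observed <- observed[!is.na(observed)]
  if (length(observed) == 0 || is.na(p10) || is.na(p90)) {
    return(NA)
  }
  mean(observed >= p10 & observed <= p90) >= 0.5
}

half_within_i80r(c(1, 5, 6, 20), p10 = 4, p90 = 10)  # TRUE: 2 of 4 inside
half_within_i80r(c(1, 2, 3, 5), p10 = 4, p90 = 10)   # FALSE: 1 of 4 inside
```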

Version 0.9.5.7

[Enhancement] evaluate_alteration now supports parameter to control plotting (similar to evaluate_gage_alteration) and documentation has been added for the function.

Version 0.9.5.6

[Enhancement] Added the ability to pull predicted metrics from TNC's predicted metrics API instead of from internal data. To use it, set online=TRUE when calling get_predicted_flow_metrics. It includes one small difference - in the source field, values marked as obs in the offline data show up as inferred.

Version 0.9.5.5

[Change] No longer look up gage COMIDs by default due to error-prone nature of lookup near stream junctions. Use force_comid_lookup parameter to evaluate_gage_alteration to enable previous lookup behavior.
[Enhancement] Added an automatic lookup that corrects bad data from COMID lookups and returns the correct COMID. Only used for the Jones Bar gage right now, but the structure is in place if others are found.

Version 0.9.5.4

[Breaking Change] Where found, column names have been fully lowercased for consistency, including Metric -> metric and COMID -> comid
[Breaking Change] Parameter com_id to get_predicted_flow_metrics was renamed comid
[Enhancement] observed percentiles and predictions both include a result_type field, with observed FFC results containing the value "observed" and prediction percentiles fields containing the value "predicted" to allow for merging of the data frames in some contexts
[Enhancement] evaluate_gage_alteration now attaches a field gage_id to predicted_percentiles, observed_percentiles, and alteration data frames.
[Enhancement] observed percentiles now include a comid field to allow for merging and accessing the comid in other contexts
[Enhancement] evaluate_alteration and evaluate_gage_alteration now includes a fifth key alteration with the assessed flow alteration scores in a data frame

Version 0.9.5.3

[Breaking Change] Results from evaluate_alteration and evaluate_gage_alteration now use the list key ffc_percentiles instead of simply percentiles to be clear that the percentiles are from the observed FFC results.
[Change] Changed quantile processing type to the default of type 7 so that observed FFC data are processed into percentiles the same way that the predicted flow metrics were calculated to minimize resulting error.
[Enhancement] evaluate_alteration and evaluate_gage_alteration now includes a fourth key predicted_percentiles with the predicted flow metric percentile values so they don't need to be looked up separately.

Version 0.9.5.2

[Enhancement] Added annual parameter to assess_alteration that runs a year over year analysis.

Version 0.9.5.1

[Enhancement] New parameter plot (boolean) to evaluate_gage_alteration controls whether the function produces plots or not

Version 0.9.5

[Enhancement] New assess_alteration function returns a data frame with alteration results - documentation forthcoming.
[Change] Warning when can't determine stream segment's hydrogeomorphic type has been downgraded to a print statement.

Version 0.9.4.2

[Breaking Change] The client now detects and sends the appropriate parameters to the FFC online for the stream class that it detects based on the COMID. If you are using low-level functions such as get_ffc_results_for_df, then you must provide an argument comid - check the documentation for which functions need it. Further, process_data now requires that the stream parameters be provided to it. I recommend moving to something like get_ffc_results_for_df as process_data may soon be moved to be internal only.

Version 0.9.4.1

[Enhancement] New code to support sending the correct stream class parameters to the FFC - includes the ability to identify stream classes by COMID, but not yet send the parameters
[Change] Data loading code made more generic, and potentially faster - multiple calls to get predicted flow metrics should not result in reloading the dataset.

Version 0.9.4

[Breaking Change] List item $ffc_results_df returned from evaluate_alteration functions changed to $ffc_results for consistency with FFCProcessor object and allowing for more flexibility in the future.
[Enhancement] Basic alteration assessment capabilities included. These require more testing before use.
[Documentation] Reworking documentation to make best workflows clearer

Version 0.9.3

[Enhancement] Can now provide a time format string to evaluate_alteration - it will use that to read the values in the time field and reformat them to send to the FFC as needed.
[Bugfix] FFC results no longer fail to transform if one flow metric is entirely NULL

ffc_api_client's People

ryanpeek yesicaleo donnydhkim

ffc_api_client's Issues

Boxplots for predicted FA_Dur, SP_ROC, all Peak_Dur, all Peak_Fre are split into two

Boxplots for predicted FA_Dur, SP_ROC, all Peak_Dur, all Peak_Fre are split into two. Please plot only one boxplot for those predicted metrics

Missing flow_metrics

The flow metrics table is missing unless people use library(ffcAPIClient) beforehand, which makes this code unusable in other packages. Not huge, but we need to:

Expand the docs to include the call to library
See if we can figure out how to make that not required. Post to D-RUG was ignored, so maybe try a post to StackOverflow?

Include number of observations as output in observed percentiles table

Ted wanted to know the number of observations for each metric that led to the calculated percentiles. These differ from the number of data points available, since they depend on the calculations coming out of the FFC. This should be relatively trivial to add into the percentile calculation code, but we'd then need to make sure to drop it anywhere we don't need it but do need to merge with the predicted percentiles. It could also come out as a separate table. Here's the code we used to calculate it as a one-off, in case that's useful - it'd be good to switch the for loop to an apply, but I had trouble since the structure of the ffc_results data meant that getting the name of each metric was not straightforward outside of a loop:

observation_number_func <- function(element){
  comid = element$predicted_percentiles$comid[1]
  gage_id = element$predicted_percentiles$gage_id[1]
  metrics <- as.data.frame(element$ffc_results)
  metric_results = data.frame("comid" = comid, "gage_id" = gage_id, "metric" = NA, num_observations = NA)
  for(column in colnames(metrics)){
    metric_results <- rbind(metric_results, data.frame("comid" = comid, "gage_id" = gage_id, "metric" = column, "num_observations" = sum(!is.na(metrics[column]))))
  }
  metric_results <- metric_results[!is.na(metric_results$metric), ]
  metric_results <- dplyr::filter(metric_results, !grepl("_Julian", metric))
  metric_results <- dplyr::filter(metric_results, !grepl("X__", metric))
  return(metric_results)
}

results <- lapply(usgs_ffc_alt, observation_number_func)
df_results <- do.call("rbind", results)
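On the wish to replace the for loop with an apply: one possible approach is vapply over the columns of the data frame, which returns a named vector and so keeps each metric name alongside its count. This is only a sketch; the function name is hypothetical, and it assumes ffc_results converts cleanly with as.data.frame.

```r
# Hypothetical loop-free version: count non-NA values per metric column.
count_observations <- function(ffc_results) {
  metrics <- as.data.frame(ffc_results)
  counts <- vapply(metrics, function(column) sum(!is.na(column)), integer(1))
  out <- data.frame(
    metric = names(counts),
    num_observations = unname(counts),
    stringsAsFactors = FALSE
  )
  # Same exclusions as the one-off code above.
  out[!grepl("_Julian|X__", out$metric), ]
}

# Made-up results table for illustration.
example_results <- data.frame(
  FA_Mag = c(1, NA, 3),
  DS_Tim_Julian = c(250, 260, 270),
  X__internal = c(NA, NA, 1)
)
observation_counts <- count_observations(example_results)
```

The comid and gage_id columns from the original function could be added to the result afterwards, since they are constant for each element.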

Suppress NA warnings when converting FFC data to data frame

Currently getting a whole stream of warnings as we convert to a data frame. These are safe and I tried to provide parameters to acknowledge that I knew it was adding NAs, but it didn't get rid of warnings.

Plot outputs as facets, not all on same graph

We should facet plot outputs so they can all appear in the same image by component, but don't need to have the same axes between metrics. Make sure to add code that frees the axis to have its own scale - it's an argument to facet_wrap.

Convert the FFC API code to be an R6 object

The existing functions can keep working by using the class under the hood, while the R6 object itself can also be used directly to make multiple requests.

Add warning when data frame has a high percentage of NA values

We're occasionally encountering gages where most of the FFC results are NULL/NA. We should go through the processing pipeline but warn the user to inspect the data extra closely when a high proportion of the data frame is NA.
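A check along these lines might look like the sketch below. The function name and the 50% default threshold are assumptions chosen for illustration; the issue does not say what counts as a high proportion.

```r
# Hypothetical check: warn when the share of NA cells exceeds a threshold.
check_na_proportion <- function(ffc_results, threshold = 0.5) {
  proportion_na <- mean(is.na(as.data.frame(ffc_results)))
  if (is.finite(proportion_na) && proportion_na > threshold) {
    warning(sprintf(
      "%.0f%% of FFC result values are NA; inspect the data closely",
      100 * proportion_na
    ))
  }
  invisible(proportion_na)
}

# Made-up table where 5 of 6 values are missing.
sparse_results <- data.frame(FA_Mag = c(NA, NA, 1), FA_Dur = c(NA, NA, NA))
sparse_share <- suppressWarnings(check_na_proportion(sparse_results))
```

Because the function only warns and returns the proportion invisibly, the rest of the processing pipeline can continue as the issue suggests.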

Consider Bulk Processing Code

Consider something to handle batching requests to the FFC online. Not sure what the inputs would look like though - maybe this is just a vignette or a sample rather than core code.

Enable token storage so that people only have to set the token once

Ryan has some code for using an environment to store tokens between sessions - that way people can set the token once and use it again later. Should be fine for this instance. Maybe make a parameter to store permanently or only for the session, that defaults to permanently.
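One way this could work is sketched below. It is not Ryan's code, and the function names, the persist parameter, and the default file location are all hypothetical. The sketch defaults to session-only storage, the opposite of what the issue proposes, because writing a private token to disk in plain text deserves an explicit opt-in; flipping the default is a one-word change.

```r
# Private environment that holds the token for the current R session.
.token_env <- new.env(parent = emptyenv())

# Hypothetical default location for a persisted token.
default_token_path <- function() file.path(path.expand("~"), ".ffc_token")

remember_token <- function(token, persist = FALSE, path = default_token_path()) {
  assign("token", token, envir = .token_env)
  if (persist) {
    writeLines(token, path)
    Sys.chmod(path, mode = "0600")  # owner-only where the platform supports it
  }
  invisible(token)
}

recall_token <- function(path = default_token_path()) {
  if (exists("token", envir = .token_env, inherits = FALSE)) {
    return(get("token", envir = .token_env))
  }
  if (file.exists(path)) {
    token <- readLines(path, n = 1, warn = FALSE)
    assign("token", token, envir = .token_env)
    return(token)
  }
  stop("No token stored; call remember_token() first")
}
```

A later session would call recall_token(), which falls back to the file when the session environment is empty.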

Small README update to make clear that the "Easy Mode" does plotting

Make it clearer what happens when you run the "Easy Mode" examples and what plots come out where.

FA_Dur missing from reference percentiles WYT

When using evaluate_alteration() with user uploaded timeseries, the reference percentiles by WYT dataframe is missing FA_Dur

Make sure spring duration component observed vs predicted plot on same plot

Looks like the spring duration component plots observed and predicted values are on different plots

Suppress Warnings for rbindlist

I'm following documentation and what I'm doing shouldn't trigger a warning, but it does. So we'll suppress them just for that call in convert_season_to_df (I think it's that call - check whole file).

evaluate_alteration() does not work for certain COMIDs

I get the following error message for several of the COMIDs where stream gages are located, using the attached timeseries data (also for COMIDs 17608051 and 17611425 for example)

flowdata_COMID17574729.txt

test_results <- ffcAPIClient::evaluate_alteration(timeseries_df = test_df , token = get_token(), latitude = lat,longitude = long)
[1] "Using default date format string of %m/%d/%Y"
[1] "Using date format string %m/%d/%Y"
[1] "COMID 17574729 is of stream class PGR - sending parameters to FFC online for that stream class. This may produce different results than if you run data through the FFC yourself using their default parameters."
Error in if (median >= predictions[["p10"]] && median <= predictions[["p90"]]) { :
missing value where TRUE/FALSE needed
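This error is what R raises when the condition of an if statement evaluates to NA, which happens here if the median or either predicted percentile is missing. A guard along the following lines would avoid the failure. It is a sketch only: the function name is hypothetical, and returning NA for undetermined cases is an assumption about the desired behavior.

```r
# Hypothetical guard: return NA instead of failing when any input is missing.
median_in_range <- function(median_value, predictions) {
  bounds <- c(predictions[["p10"]], predictions[["p90"]])
  if (anyNA(c(median_value, bounds))) {
    return(NA)
  }
  median_value >= bounds[1] && median_value <= bounds[2]
}

median_in_range(5, list(p10 = 1, p90 = 10))   # TRUE
median_in_range(5, list(p10 = NA, p90 = 10))  # NA rather than an error
```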

Add Month/Day Field for metrics

Add columns that show month/day for each timing metric based on the water year day column.
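A conversion like this can be done with date arithmetic against a reference water year. The sketch below assumes that the water year starts on October 1 and uses a non-leap reference year; whether the FFC counts October 1 as day 0 or day 1 is not stated here, so the first_day parameter (hypothetical, like the function name) makes that explicit.

```r
# Hypothetical helper: convert a water year day into a "month/day" string.
water_year_day_to_month_day <- function(water_year_day, first_day = 1) {
  reference_start <- as.Date("2001-10-01")  # start of a non-leap water year
  format(reference_start + (water_year_day - first_day), "%m/%d")
}

month_days <- water_year_day_to_month_day(c(1, 93, 365))
# "10/01" "01/01" "09/30"
```

Passing first_day = 0 shifts every result one day later, which is the adjustment needed if the timing metrics turn out to be zero-based.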

Create results dataframe that allows for annual alteration assessment for each metric

One of our project managers in the Shasta asked me if I could determine if flow alteration was better after 2009 when they started a fall flow program. I think this will be a common request as people want to be able to track the results of their actions. I am envisioning a new dataframe with the following fields:
COMID
Gage_ID
Year
Functional Flow Metric Code
Observed Value
Water Year Type (specific to that COMID)
p10 (specific to that water year type)
p25 (specific to that water year type)
p50 (specific to that water year type)
p75 (specific to that water year type)
p90 (specific to that water year type)
Percentile group of observed (e.g., "less than P10", "between P75 and P90")
Altered status p25-p75 (e.g., 1 if less than p25 or greater than p75, 0 otherwise)
Altered status p10-p90 (e.g., 1 if less than p10 or greater than p90, 0 otherwise)
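The last three fields in this list can be derived from the observed value and the five percentiles with vectorized comparisons. The sketch below is one possible reading of the request: the function name, the output column names, and the handling of values that fall exactly on a percentile are assumptions.

```r
# Hypothetical helper: percentile group and altered flags for each row.
annual_status <- function(observed, p10, p25, p50, p75, p90) {
  group <- ifelse(observed < p10, "less than P10",
    ifelse(observed < p25, "between P10 and P25",
      ifelse(observed < p50, "between P25 and P50",
        ifelse(observed < p75, "between P50 and P75",
          ifelse(observed <= p90, "between P75 and P90", "greater than P90")))))
  data.frame(
    percentile_group = group,
    altered_p25_p75 = as.integer(observed < p25 | observed > p75),
    altered_p10_p90 = as.integer(observed < p10 | observed > p90),
    stringsAsFactors = FALSE
  )
}

# Three made-up observations checked against the same percentiles.
status <- annual_status(
  observed = c(5, 35, 45),
  p10 = 10, p25 = 20, p50 = 30, p75 = 40, p90 = 50
)
```

Since every argument is a vector, the percentile columns can differ row by row, which suits percentiles that vary by water year type.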

One key piece of information for this dataframe will be the look up of water year type by COMID. Ted has already generated this table here: https://protect-us.mimecast.com/s/rHNlCDkZxRIQDXjjTW0hoy?domain=drive.google.com

Add full metric name and units to plots for easy interpretation

The box plots are very helpful, but a little hard to interpret and share with stakeholders. Consider changing the name from the flow metric code to the flow metric name (e.g., "Fall Pulse Magnitude" instead of "FA_Mag"). Also, please consider adding the units for each plot (e.g, "Cubic Feet per Second" or "Water Year Day") to the y-axis of each plot. I have attached a file that provides the name and units for each metric code.
Functional Flow Metrics List and Definitions.xlsx

Create function to create dimensionless observed hydrograph from observed gage or model data

Step 5 of CEFF (assess alteration) asks the user to compare DRHs to the dimensionless observed hydrograph (DOH). It would be great to have a function that plots the DOH from an observed flow timeseries, as in Figure 9 of CEFF, which compares the DRH from a reference gage (left) to the DOH (right).

add "Mag" in front of all peak magnitude metric names

I think we talked about this, but just to make sure we don't forget, could we change the metric names so that the peak metrics have the word "Mag" in them? So for example, instead of Peak_10, it'd be Peak_Mag_10.

evaluate_alteration(), evaluate_gage_alteration(), and get_predicted_flow_metrics() not working

Hi Nick, last week these functions (evaluate_alteration(), evaluate_gage_alteration(), and get_predicted_flow_metrics()) were working fine. Now they no longer work (with same data/script) and give the following error after saving the DRH step:
[1] "Saving DRH to..."
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match

I uninstalled and reinstalled the ffc api but still getting the same error. Did something change in the code? Thanks!

Boxplot Output Enhancements

Boxplots need titles, axes, etc. Standard graph stuff

Also, do we want to return the plot objects for modification to the caller?

For evaluate_alteration() only output FFMs for ffc_results and ffc_percentiles

For evaluate_alteration() only output FFMs for ffc_results and ffc_percentiles. Currently, function outputs additional metrics (for example, julian day metrics and peak timing metrics). Can you also sort output tables by metric alphabetically? That way user can have direct comparison of predicted vs observed metrics/percentiles.

layer 'NHDFlowline' not found in component 'NHDSnapshot' when getting gage COMID

Ryan reports the following when the gage is trying to determine its COMID:

ffcAPIClient::evaluate_gage_alteration(gageNo, get_token())
Re-downloading http://www.horizon-systems.com/NHDPlusData/NHDPlusV21/Data/NHDPlusCA/NHDPlusV21_CA_18_NHDSnapshot_05.7z
  |======================================================| 100%
A local copy of http://www.horizon-systems.com/NHDPlusData/NHDPlusV21/Data/NHDPlusCA/NHDPlusV21_CA_18_NHDSnapshot_05.7z already exists on disk
Error in FUN(X[[i]], ...) : 
  layer 'NHDFlowline' not found in component 'NHDSnapshot'

I can't reproduce, even manually trying to remove the NHDFlowline layer - I'd thought it was triggered by an incomplete download, but the current version of the code forces redownload and extraction if it doesn't find the right number of items in the NHD package the first time through. Could be Mac-specific or something else. Ryan is probably going to reimplement the function that gets the COMID for a gage using nhdPlusTools instead of nhdR, which should hopefully resolve this issue.

Add tests!

Return WYT percentiles

Want all water year types by default returned

No need to calculate FFC by WYT - people can do that themselves by filtering to the correct water years.

We'll return the normal predicted_percentiles as we currently do, but also return predicted_wyt_percentiles separately. We'll need to slightly tweak the code that pulls it to return both.

get_predicted_flow_metrics still broken

the flow metrics data isn't available to functions within the package until we call library(ffcAPIClient) - this was first raised in the pull request to change the data over to this form, but I thought it'd been resolved. This is a critical issue to fix.

Add function to assess level of alteration based on Ted's rules

Ted will send sample code? Nick has images of some of the rules, but need a quick discussion of exactly which comparisons need to occur.

Add/store 'metrics reference' table in package

It'd be nice if the package stored the table that has the longer description of the metric names somewhere. While using the results from the package, I've found that I have to keep joining those metric names to the outputs afterward in order to make tables that are readable (DS_Tim might not be super useful in a table, but the text describing it as dry season timing is useful).

Get COMID for gage

Need this to support workflows in #5. Gage data comes with lat/lon - we'll pull it for the gage into the object when we first request data, then use nhdR (maybe?) to get the COMID for the gage.

Make assess_alteration test data local

Make a set of offline test data - we'll want it so the tests don't mysteriously break if the data changes in the future. Save the derived outputs as Rdata, then use those to run the tests.

Paired boxplots of FFC vs predicted data

Once we have the percentiles, we can plot the FFC vs predicted as boxplots against each other.

Translate FFC data to use same metric names as predictions/rest of eflows data

Need to do some translation to pull the metrics for the FFC - we can do rows for metrics, columns for days, then that will let us collapse the columns into percentiles (or alternatively we can do the reverse and just need to transpose, since metrics as columns, days as rows makes more sense)

Error: length of 'dimnames' [2] not equal to array extent

From Slack

Ryan:
just found something weird…don’t have time to troubleshoot cuz boarding plane shortly but try this and explain why it’s breaking if you can?

test_ff <- ffcAPIClient::get_ffc_results_for_usgs_gage("10257549")
test_ff <- suppressWarnings(ffcAPIClient::get_results_as_df(test_ff))

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent

Calculate percentiles from FFC results

Once we have a data frame of the raw data, we can make percentiles for each metric as opposed to the raw values.
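A minimal sketch of that calculation follows, assuming a results table with one row per year, a Year column, and one numeric column per metric. It uses quantile type 7 (R's default), which the change log for version 0.9.5.3 says matches how the predicted metrics were calculated. The function name and output column names are hypothetical.

```r
# Hypothetical helper: one row of percentiles per metric column.
metric_percentiles <- function(ffc_results) {
  metric_cols <- setdiff(names(ffc_results), "Year")
  rows <- lapply(metric_cols, function(metric) {
    q <- quantile(ffc_results[[metric]],
      probs = c(0.10, 0.25, 0.50, 0.75, 0.90),
      type = 7, na.rm = TRUE, names = FALSE
    )
    data.frame(
      metric = metric,
      p10 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p90 = q[5],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up results for a single metric over eleven years.
example_results <- data.frame(Year = as.character(1951:1961), FA_Mag = 1:11)
percentiles <- metric_percentiles(example_results)
```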

Data in wrong columns?

Alyssa Obester
I think the observed data in the plot above might be mislabeled or be pulling something else. The observed two year flood shouldn't be larger than the 10 year flood. I'm looking at the FFC output for that data and the numbers are different (peak_10 is 20770, peak_20 is 16740, peak_50 is 7080)

Nick Santos
it should print out a table of the percentiles into the console when it's run - is that the output you're looking at, or did you run it through online? Just trying to figure out if it's that the plot doesn't match the data the R code shows, or if it's not right anywhere in the R code and my error would be earlier

Alyssa Obester
the numbers that I gave you above were from when I ran it with the eflows website. I'm looking at the percentiles from the r package now and weirdly the peak_10 is the same as above but the peak_50 and peak_20 are different numbers.

Nick Santos
Ok. Hmmm. I'll look into that. Could you share the actual values with me? And how were you calculating the percentiles based on the online data? Just trying to think of all the places these could end up different

Alyssa Obester
yeah - the FFMs summary tab here has the percentiles. the peak magnitude percentile values are all the same though because they were calculated by ranking all the peaks

the gage ID is 11336000

FFC_output_McConnell_webFFC_dec11.xlsx

Date standardization code

The FFC requires dates in a specific format. We can standardize to that format for people if they provide us with the format their dates are currently in.
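Such a standardization step could be as small as the sketch below. The log output quoted elsewhere on this page mentions a default date format string of %m/%d/%Y, so that is used as the target format here, but treating it as the format the FFC requires is an assumption. The function name is hypothetical.

```r
# Hypothetical helper: parse dates with the caller's format, then re-emit them.
standardize_dates <- function(dates, input_format, output_format = "%m/%d/%Y") {
  parsed <- as.Date(as.character(dates), format = input_format)
  if (anyNA(parsed)) {
    stop("Some dates did not match the format ", input_format)
  }
  format(parsed, output_format)
}

standardized <- standardize_dates(c("2020-01-31", "2020-02-01"), "%Y-%m-%d")
# "01/31/2020" "02/01/2020"
```

Stopping on unparseable values surfaces a wrong format string early instead of sending NA dates onward.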

annual comparison notes

The "annual" comparison for alteration needs some changes. Here is the discussion with Ryan:

Ryan 2:01 PM
Hey Nick just now trying to play around with getting the annual FFM alteration status to go. I have a question…because it’s unclear what/how the data should be entered currently. In the help, under the annual argument, the help states it needs a TRUE/FALSE “indicating whether to run a year over year analysis. If TRUE, then the parameter percentiles changes and should be a data frame with only two columns - the first is still metric, but the second is just value representing the current year’s value for the metric”.
However, the dataframe returned by the evaluate_gage_alteration has columns that are: Year, each metric…
2:02

# A tibble: 6 x 39
  Year  DS_Tim_Julian DS_Dur_WS DS_Tim `__summer_no_fl…
  <chr>         <dbl>     <dbl>  <dbl>            <dbl>
1 1951            254        96    345               75
2 1952            164       151    255              151
3 1953            199       155    290              155

2:03
I can transpose or reconfigure this to have metric and value as stated, but we still need a year column correct? Ideally if we feed a dataframe with year, metric, and value, we could return the alteration dataframe exactly as is but it would then include a year column for each respective year for that gage/comid.
2:06
this isn’t urgent, but I realized I’m not sure how to get this to run in its current form:

tst <- ffcAPIClient::evaluate_gage_alteration(gage_id = 10255810, 
                                              token = ffctoken,
                                              comid = 22595619)
annual_tst <- ffcAPIClient::assess_alteration(
  percentiles = tst$ffc_results, # this is the part that needs clarifying...
  predictions = tst$predicted_percentiles,
  ffc_values = tst$ffc_results, 
  comid = tst$alteration %>% distinct(comid) %>% as.integer(),
  annual = TRUE)

Nick Santos 2:28 PM
Hey Ryan, sorry, had to eat something or my brain was going to break
2:29
So, I think this is mostly a function of me, for some reason, being unable to understand what's needed completely. You ever have those scenarios where you talk about something with someone, and it makes perfect sense while they're describing it, you take notes, and then the moment the conversation is over, some critical part of the structure is missing? That's happening to me here (not sure why) - The thing you're describing to me seems simple, and I keep getting it slightly wrong
2:29
So, it sounds like you want to take the FFC results DF and run it through rather than the percentiles DF
2:29
is that correct?
2:31
Almost like that example should become:

# A tibble: 6 x 39
  Year  DS_Tim_Julian DS_Dur_WS DS_Tim `__summer_no_fl…
  <chr>         <dbl>     <dbl>  <dbl>            <dbl>
1 1951          FALSE      TRUE   FALSE            TRUE
2 1952          FALSE     FALSE    TRUE           FALSE
3 1953          FALSE     FALSE    TRUE           FALSE

for whether it's altered, based on how each value fits into each metric's predicted percentiles?

Ryan 4:03 PM
yes….but instead of TRUE/FALSE perhaps just using the -1/0/1 codes? but that’s exactly it. Feed FFC results DF (so metrics that are calculated for each year of data) to the predicted percentiles and see if they fall inside the 20/80 percentile range.
4:03
and no worries…this isn’t easy to figure out all the details for on the fly! I really appreciate all you’ve been doing.

Nick Santos 4:09 PM
OK, and just to clarify, theoretically, there wouldn't be many years of data - that's why you'd be doing the annual, right? It's because there's too little data for reliable percentiles?
4:09
And yes on using the codes instead - forgot that's what it's doing now
4:10
I should be able to make that tweak without too much trouble

Ryan 4:14 PM
there could be many years…actually hopefully there will be…although I suppose I could filter to just the years I want first.
4:16
but by comparing flow year to same biological sampling year, or flow year to a lagged sample year, we may be able to take a look at things like drought impacts, etc. In particular, which metrics are responsive to this annual view and which are not. There will be lots of noise for sure, and for some it may not be able to calculate, but this is a fairly narrow use case (I hope).
4:17
basically taking one single metric calculated in one single year and seeing if it falls in the predicted percentiles, but dataframe style

Nick Santos 4:17 PM
ok, gotcha - so realistically, you're probably more likely to be replacing predicted percentiles with a reference year, and it isn't always about the amount of data
4:17
(I know you've explained this to me, so sorry!)

Ryan 4:18 PM
not sure what you mean by replacing with year….?
4:18
also probably doing bad job explaining. 🙂

Nick Santos 4:19 PM
Sorry, with a reference year's values, rather than actual predicted percentile values?
4:19
Nah, definitely not you - I'm tying my brain in knots, and I swear it's made sense each time you've explained it, then it just....flies away

Ryan 4:23 PM
ah yes I see…yea I think the function is pretty much there, but we only need to feed it:

annual_tst <- ffcAPIClient::assess_alteration(
  predictions = "predicted_percentiles",
  ffc_values = "ffc_results",
  comid = "comid",
  annual = TRUE)
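
A minimal sketch of the annual comparison discussed in this thread, assuming the FFC results have been reshaped to long form (year, metric, value) and the predicted percentiles carry one lower and one upper bound per metric. The helper name `annual_range_check`, the column names `lower` and `upper`, and the meaning given here to the -1/0/1 codes (below, inside, above the range) are illustrative assumptions, not the package's API or its existing code definitions.

```r
# Hypothetical helper, not part of ffcAPIClient.
# ffc_long:  data frame with columns year, metric, value (one row per year and metric)
# predicted: data frame with columns metric, lower, upper (assumed bound columns)
annual_range_check <- function(ffc_long, predicted,
                               lower_col = "lower", upper_col = "upper") {
  bounds <- predicted[, c("metric", lower_col, upper_col)]
  # Inner join: metrics without predicted bounds are dropped.
  merged <- merge(ffc_long, bounds, by = "metric")
  # Assumed coding: -1 below the range, 0 inside it, 1 above it; NA values stay NA.
  merged$status <- ifelse(merged$value < merged[[lower_col]], -1L,
                          ifelse(merged$value > merged[[upper_col]], 1L, 0L))
  out <- merged[order(merged$metric, merged$year),
                c("year", "metric", "value", "status")]
  rownames(out) <- NULL
  out
}

# Made-up values for illustration only.
example_ffc <- data.frame(year = c(1951, 1952, 1953),
                          metric = "DS_Tim",
                          value = c(100, 250, 320))
example_pred <- data.frame(metric = "DS_Tim", lower = 200, upper = 300)
annual_range_check(example_ffc, example_pred)
```

Keeping the output in long form means the year column survives the comparison, so the result can be joined to sampling years afterwards.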

Package documentation and examples

The package has some documentation, but needs quite a bit more. Might want to wait to finalize this until we actually have a set of workflows that will be stable APIs and maybe have a class that handles the work. But maybe not - the risk of the R6 class is that items are modified in-place, so we'd need to be careful that the new code doesn't break just from moving it to a class. What we could do is just manage the data together with the class and use it to chain everything together (but then we can just also keep using convenience functions).

add gageID/comid to plot_output filename

Add some code to add the gageID to the filename for plot outputs. Currently it defaults to a metric/component and then "_" but nothing else (e.g., "DS_.png", "SP_.png"). This would help when dealing with multiple gages at once or iterating. I can mess with this later.
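
One possible shape for this, as a sketch only: a small helper that prepends an identifier to the component name when one is supplied. The function name `build_plot_filename` and its arguments are hypothetical; the package's plotting code may assemble filenames differently.

```r
# Hypothetical helper, not part of ffcAPIClient.
# component: metric/component label used in the current filenames (e.g. "DS")
# id:        optional gage ID or COMID to prepend
build_plot_filename <- function(component, id = NULL, ext = "png") {
  # c() drops NULL, so the ID is only included when it is supplied.
  parts <- c(if (!is.null(id)) as.character(id), component)
  paste0(paste(parts, collapse = "_"), ".", ext)
}

build_plot_filename("DS", id = "10255810")
build_plot_filename("SP")
```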

Continuation of Package Development

@ryanpeek @alyssaobester @kklausmeyer, @kristaniguchi

I'm leaving CWS on April 30th and won't be able to officially support the package beyond that. While I will always be happy to provide any handoff information I can, I'm trying to make sure that if this becomes a tool that's important for any CEFF users, there are people who can make bugfixes, and potentially even enhancements. I know Ryan knows his way around this codebase since he has been a part of its development, but he is also highly loaded up with work for the foreseeable future. Kris, do you know if there is anyone at SCCWRP, whether you or someone else, who might be interested in contributing bugfixes, etc. to this package in the future? Kirk, is this something that would belong with your unit at TNC? I'll do my best to leave it in a maintainable state, and hope to make some changes in the future to write it up for publication, but am not yet sure what time I'll be able to dedicate to it. Just wanted to get this discussion started so that we can figure out where future stewardship of the package belongs. Thanks!

Make family of functions that matches CEFF guidance

Notes from our call today - I'll split these out into separate items soon, but wanted to file it immediately so it's visible.

add source column to observed percentiles
add comid or gage_id column to ffc_results df
batch shortcut that returns a big dataframe with everything.
add assess_alteration to evaluate_alteration functions
Add documentation for evaluate_alteration
change Metric columns to "metric"
Create three functions that mirror the steps in the document:

Explore the FFC results and metrics:

generate_functional_flow_results
DRH plot
observed percentiles DF + CSV output
Set of plots with only observed data

Explore predicted metrics (ecological flow criteria)

explore_ecological_flow_criteria
csv + data frame for predicted
set of plots that show only predicted

Set of plots that

assess_alteration + boxplots that shows both (like now)
Store data in a named environment for the run between these functions.

Potential new names for shortcuts
assess_alteration
assess_gage_alteration
evaluate_fc_alteration or evaluate_flow_criteria_alteration.
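
As a rough illustration of two of the items above (adding a gage_id column to the results and a batch shortcut that returns one big data frame), here is a sketch in base R. The name `batch_results` and the `fetch_fun` argument are hypothetical; `fetch_fun` stands in for whatever function returns a results data frame for one gage.

```r
# Hypothetical batch helper, not part of ffcAPIClient.
# gage_ids:  vector of gage IDs
# fetch_fun: function(id) returning a data frame of results for one gage
batch_results <- function(gage_ids, fetch_fun) {
  per_gage <- lapply(gage_ids, function(id) {
    df <- fetch_fun(id)
    # Tag every row with its gage so rows stay traceable after stacking.
    df$gage_id <- rep(id, nrow(df))
    df
  })
  do.call(rbind, per_gage)
}

# Stand-in fetch function with made-up values, for illustration only.
fake_fetch <- function(id) {
  data.frame(metric = c("DS_Tim", "DS_Dur_WS"), value = c(1, 2))
}
batch_results(c("10255810", "11111111"), fake_fetch)
```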

add argument to turn off plotting in evaluate_alteration function

just a note for batch processing, we probably need to turn off the plotting option. Default can be that plotting occurs, but should add argument for plotting=FALSE to allow just the data to get pulled in...and then maybe option to run plotting function separately? I can work on this but wanted to remember.
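
A sketch of how such a switch might look, using a stand-in function rather than the real `evaluate_alteration` code. The function `evaluate_sketch` and the argument name `plot_results` are assumptions for illustration.

```r
# Hypothetical stand-in for an evaluate_* shortcut, not the package's code.
# flows: data frame with columns date and flow
evaluate_sketch <- function(flows, plot_results = TRUE) {
  summary_df <- data.frame(n_days = nrow(flows),
                           mean_flow = mean(flows$flow, na.rm = TRUE))
  # Plotting stays on by default but can be skipped for batch runs.
  if (plot_results) {
    plot(flows$date, flows$flow, type = "l", xlab = "Date", ylab = "Flow")
  }
  summary_df
}

# Made-up flows; batch use with plotting turned off.
flows <- data.frame(date = as.Date("2020-01-01") + 0:9, flow = 1:10)
batch <- lapply(list(gage_a = flows, gage_b = flows),
                evaluate_sketch, plot_results = FALSE)
```

Because the function returns the data either way, a separate plotting call can be run later on only the gages of interest.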

Do not plot Peak_Tim metrics boxplots

Peak timing metrics are no longer included as functional flow metrics. Do not create boxplots for them.
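
A minimal sketch of filtering these out before plotting, assuming the data frame passed to the boxplot code has a `metric` column and that peak timing metrics share the `Peak_Tim` prefix. The helper name and the sample metric names other than `DS_Tim` are illustrative assumptions.

```r
# Hypothetical helper, not part of ffcAPIClient.
# Removes rows whose metric name starts with the given prefix.
drop_metrics_by_prefix <- function(df, prefix = "Peak_Tim", metric_col = "metric") {
  keep <- !startsWith(as.character(df[[metric_col]]), prefix)
  df[keep, , drop = FALSE]
}

# Made-up rows for illustration only.
metrics <- data.frame(metric = c("Peak_Tim_10", "Peak_Dur_10", "DS_Tim"),
                      value = c(1, 2, 3))
drop_metrics_by_prefix(metrics)
```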

Change Use of "DRH" to "Dimensionless Hydrograph" or similar

Alyssa Obester 2:51 PM
just got clarification on DOHs vs DRHs. DOHs are dimensionless hydrographs for the observed data (so what you currently have the package doing), and a DRH is a dimensionless hydrograph for the reference gages
2:52
I guess we just need to be more consistent about how we refer to the DOHs/DRHs/DHs

Nick Santos 3:06 PM
oh, interesting, so I guess we should probably just change references to DRH to be something like dimensionless_hydrograph?

Alyssa Obester 3:06 PM
yeah

Drop Peak_20 and Peak_50 metrics

They aren't used, aren't shown on the website, and we don't have predicted metrics. They just confuse the Peak metrics.

Make some convenience functions that make a full process chaining together the pieces.

(Eg: start with gage ID and move all the way through, start with data frame and COMID and move all the way through).

Parameters sent to FFC can change based on stream class

We're using a general set of parameters off the website - they have different parameters based on stream class. We need to figure out if we should use different parameters by stream class (probably not), or if using the general parameters is sufficient. We could also try not sending parameters, but then if they change defaults our outputs could change without warning and without knowing what's different. Maybe good to keep control, but provide a method to change params??

returning warning and message or NA instead of "less than 10 years of data...try again"

evaluate_gage_alteration() (and associated functions) currently return an error with the message "Less than 10 years of data...try again" if the gage has insufficient data. This is helpful, but it can also be an issue for batch processing, as it is a stop error and thus requires the user to wrap code in some sort of error-handling function (e.g., I'm using possibly(get_percentiles, NA_real_), but there are others too, like safely() and try()). If possible, it would be preferable to show the warning message in the user console but return an actual NA value to the user environment. Not sure of the best way to do this, but I can look.
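
One way this could work, sketched in base R with `tryCatch`: convert the error into a warning and return NA. The wrapper name `na_on_error` and the stand-in function below are hypothetical and only imitate the behavior described above.

```r
# Hypothetical wrapper, not part of ffcAPIClient.
# Returns a version of `f` that warns and returns `otherwise` instead of stopping.
na_on_error <- function(f, otherwise = NA_real_) {
  function(...) {
    tryCatch(
      f(...),
      error = function(e) {
        # Surface the original message in the console, but keep the batch running.
        warning(conditionMessage(e), call. = FALSE)
        otherwise
      }
    )
  }
}

# Stand-in for a function that stops on short records (illustration only).
fake_get_percentiles <- function(n_years) {
  if (n_years < 10) stop("Less than 10 years of data...try again")
  n_years
}

safe_get_percentiles <- na_on_error(fake_get_percentiles)
results <- suppressWarnings(lapply(c(5, 12), safe_get_percentiles))
```

This is similar in spirit to `purrr::possibly()`, except that the message is re-emitted as a warning rather than discarded.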

Model fall frequency metric (percent of years where flushing flow occurs)

Not sure how we output this to people, but we can make it available? Maybe we print it out for now and store it as a value somewhere.

Do we want to compare it to a local reference gage? We don't have the modeled data for this, so we could grab it from local reference gage. (Bigger lift to put together separate

Wait to create this until after call - might not need it.

Set up ability to send different params based on streamclass

We may want to send different parameters to the FFC based on the streamclass

Need to be able to get streamclass of COMID
Need to check if it's a suspect streamclass (then what do we do?)
Need to look up parameters for streamclass (via https://github.com/ceff-tech/eflow-client/blob/master/src/constants/params.js)
Need to format those params to send to FFC.
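
The lookup step could be sketched as a general parameter list plus per-class overrides, falling back to the general set when the stream class is missing or unknown. Everything named here (`params_for_streamclass`, the parameter names, the class label, and all values) is a placeholder; the real parameter names and values would come from the params.js file linked above.

```r
# Placeholder parameter sets; names and values are NOT the real FFC parameters.
general_params <- list(param_a = 1, param_b = 2)
class_overrides <- list(
  class_x = list(param_b = 5)  # hypothetical stream class with one override
)

# Hypothetical lookup helper, not part of ffcAPIClient.
params_for_streamclass <- function(streamclass,
                                   general = general_params,
                                   overrides = class_overrides) {
  # Fall back to the general set for missing or unrecognized classes.
  if (is.null(streamclass) || is.na(streamclass) ||
      !(streamclass %in% names(overrides))) {
    return(general)
  }
  # Class-specific values replace the general ones; everything else is kept.
  utils::modifyList(general, overrides[[streamclass]])
}

params_for_streamclass("class_x")
params_for_streamclass("unknown_class")
```

Keeping an explicit general set in the client means a change in the server's defaults would not silently change outputs, which matches the concern raised in the related issue.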