bcgov / wqbc

An R package for water quality thresholds and index calculation for British Columbia

Home Page: http://bcgov.github.io/wqbc/

License: Apache License 2.0

R 99.72% TeX 0.28%
r-package rstats env r data-science

wqbc's Introduction

wqbc

Lifecycle: Dormant

Overview

The wqbc R package facilitates cleaning and tidying water quality data and calculating water quality thresholds for British Columbia. Previously it also calculated the CCME Water Quality Index but that functionality has been moved to the wqindex package.

wqbc was written by B.C. Ministry of Environment and Poisson Consulting team members.

Usage

For more information please see the vignette. In your R session, you can type vignette("wqbc") to see the vignette. Please note that this vignette is currently out of date as it includes information on calculating the Water Quality Index (which has been moved to its own package wqindex).

Install

To install and load the latest version of wqbc:

# install.packages("remotes") # if remotes is not installed
remotes::install_github("bcgov/wqbc")
library(wqbc)

Project Status

This package is under development. The user is responsible for checking all variables and limits that they use.

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

How to Contribute

If you would like to contribute to the package, please see our CONTRIBUTING guidelines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

Copyright 2015 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at 

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

This repository is maintained by Environmental Reporting BC. Click here for a complete list of our repositories on GitHub.

wqbc's People

Contributors

ateucher, aylapear, colinpmillar, heathergranger, joethorley, karharker, repo-mountie[bot], robynirvine, sebdalgarno, stephhazlitt, wenwangcode


wqbc's Issues

write vignette

I know it's early, since the bulk of the code has yet to be written, but can you begin writing a vignette on the package that will be useful to users and that we can also consider submitting to the Journal of Statistical Software or the R Journal? I'm not sure if markdown or latex would make more sense. More info at http://r-pkgs.had.co.nz/vignettes.html.

trend analysis function

Hi Colin

When you get a chance could you write a function

test_trends <- function (x, scale = "Year", by = NULL)

The function will take a data.frame x which must have the columns WQI (0 to 100) and Year (or whichever single column is specified by the argument scale). If either is missing it should throw an informative error.
It then tests for a trend in WQI where Year (or whatever scale specifies) is the temporal axis. If there is more than one WQI value for a year it should throw an informative error. The only exception is when columns are named in by: for example, if by = c("Lake", "Site") then it should test for a trend in each Site within each Lake by Year (again there should be no more than one WQI value for each year within a group). If a column is named in by then it must be in x or an informative error should be thrown.

It should return an object of class wq_trend which inherits from data.frame. It should include:

  • any columns listed in the argument by
  • any columns which do not vary within each trend test (this allows things like Lat and Lon to be passed through for plotting particular sites)
  • the columns From and To, which give the first and last Year for each trend test
  • the column Values, which gives the actual number of WQI values in the test
  • the columns Trend (the positive or negative net rate of change per scale unit) and Significance (the p-value)

Let me know if you think of anything else.

Check out the notes on Coding in the wiki for more general information (I will add these shortly)

Let me know if any questions
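
As a starting point for discussion, here is a minimal sketch of what test_trends could look like in base R. It is not the package's implementation. It assumes the Theil-Sen slope (median of pairwise slopes) for Trend and a Kendall rank correlation test against time for Significance, neither of which has been agreed on in this thread. It makes no correction for autocorrelation and does not yet pass through columns that are constant within a group.

```r
# Sketch only: Theil-Sen slope and Kendall test are assumptions, not agreed choices.
test_trends <- function(x, scale = "Year", by = NULL) {
  stopifnot(is.data.frame(x))
  missing_cols <- setdiff(c("WQI", scale, by), names(x))
  if (length(missing_cols) > 0) {
    stop("x is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # One group per combination of the by columns (a single group if by is NULL).
  groups <- if (is.null(by)) list(x) else split(x, x[by], drop = TRUE)
  results <- lapply(groups, function(g) {
    time <- g[[scale]]
    wqi <- g$WQI
    if (anyDuplicated(time) > 0) {
      stop("more than one WQI value for the same ", scale, " within a group")
    }
    if (length(wqi) < 2) {
      stop("at least two WQI values are needed to test for a trend")
    }
    # Theil-Sen slope: the median of all pairwise slopes.
    pairs <- utils::combn(length(wqi), 2)
    slopes <- (wqi[pairs[2, ]] - wqi[pairs[1, ]]) /
      (time[pairs[2, ]] - time[pairs[1, ]])
    # Kendall rank correlation of WQI against time (the Mann-Kendall test).
    p_value <- suppressWarnings(
      stats::cor.test(time, wqi, method = "kendall")$p.value
    )
    stats_df <- data.frame(From = min(time), To = max(time),
                           Values = length(wqi),
                           Trend = stats::median(slopes),
                           Significance = p_value)
    if (is.null(by)) stats_df else cbind(g[1, by, drop = FALSE], stats_df)
  })
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  class(out) <- c("wq_trend", "data.frame")
  out
}
```

Calling test_trends(x, by = c("Lake", "Site")) would then return one row per Site within each Lake.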

Some more headings for threshold table

Type: metal, nutrient, physical
MinValues: 1, 10
Metric: mean, max
DaySpan: 0, 1, 7, 28, 365

Also at least three types of uses: aquatic life, drinking water, irrigation

decide on trend test

Is the Mann-Kendall Trend Test, with methods to account for autocorrelation, appropriate or should we use something else? We definitely want something that makes minimal parametric assumptions, is robust and quick, and is ideally implemented in a bomber package... I think avoid Bayesian for sure.

confirm bootstrap confidence interval

I've implemented bootstrap confidence intervals as described in Method 1 of El-Shaarawi (2010) which is now available in the data-raw folder. I have used the code to calculate the CIs for the same example data set which El-Shaarawi also uses. Although the upper 95% CI is the same, the lower 95% CI is 80 whereas I get 87. I have implemented the method in various ways and always get the same result. I believe the difference occurs because El-Shaarawi implements the method in wide format, which necessitates the inclusion of NAs for certain variables, which then introduces the possibility of losing variables (if only NAs are selected). I consider this possibility to be undesirable because 1) it can result in replicates for which WQI cannot be calculated, i.e. NAs for all variables, and 2) it breaks the assumed independence between the variables because the presence of a second variable with more frequent samples will change the sampling distribution of the original variable simply by virtue of inclusion of NAs. The long-format method also has the advantage of being quicker to implement since variables with no values outside the limits don't need to be resampled (this is not the case with the wide-format method). And it is also generalizable to multiple values for one variable on the same date.

In the case of no missing values then the implemented methods should give the same result as Method 1 of El-Shaarawi. Can you

  • provide a test dataset with no missing values for which the El-Shaarawi Method 1 has been applied so that I can test whether the wqbc implementation provides the same result?
  • assess my reasoning and decide which method you would like the wqbc package to implement. To avoid confusion and ambiguity I think we should implement a single method.
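
To make the long-format idea concrete, here is a rough sketch of resampling within each variable, so that no NAs are ever introduced. It is not the code in data-raw and not El-Shaarawi's Method 1. The function name boot_failed_pct, the column names Variable, Value and UpperLimit, and the statistic (percentage of values above the limit, used as a simple stand-in for the WQI) are all assumptions for illustration.

```r
# Sketch only: the statistic is a stand-in for the WQI, not the index itself.
boot_failed_pct <- function(x, n_rep = 1000) {
  stopifnot(all(c("Variable", "Value", "UpperLimit") %in% names(x)))
  failed_pct <- function(d) 100 * mean(d$Value > d$UpperLimit)
  by_var <- split(x, x$Variable)
  reps <- replicate(n_rep, {
    # Resample rows with replacement separately within each variable,
    # so every variable keeps its own sample size in every replicate.
    resampled <- lapply(by_var, function(d) {
      d[sample.int(nrow(d), replace = TRUE), , drop = FALSE]
    })
    failed_pct(do.call(rbind, resampled))
  })
  # Point estimate plus a 95% percentile interval from the replicates.
  c(estimate = failed_pct(x), stats::quantile(reps, c(0.025, 0.975)))
}
```

Because each variable is resampled on its own rows, adding a second variable with more frequent samples does not change the sampling distribution of the first.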

Missing Items in guideline

So I finished writing the guideline which contains most of the items in this website (http://www2.gov.bc.ca/gov/topic.page?id=044DD64C7E24415D83D07430964113C9). As discussed with Joe, I skipped some complicated ones. So the items that I skipped are:

  • Microbiological Indicators – bacteria
  • Oxygen – dissolved, as it is the same as Dissolved Oxygen
  • Temperature
  • pH (added ? in its comment column)
  • Turbidity
  • Ammonia
  • Xylene

Please let me know if I need to add any of them on guideline. Thx!

min number of variables and samples for WQI?

In the CCME users manual it states that "The calculation of the CCME WQI requires that at least four variables, sampled a minimum of four times be used". I think this makes sense and we should put it as a constraint. Do you agree @ateucher ?
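
If we do add the constraint, a check along these lines might work. This is only a sketch: the function name check_wqi_minimum and the Variable column are assumptions, and it reads the manual's wording as "at least four variables, each sampled at least four times".

```r
# Sketch only: hypothetical helper, not part of wqbc.
check_wqi_minimum <- function(x, min_variables = 4, min_samples = 4) {
  stopifnot(is.data.frame(x), "Variable" %in% names(x))
  # Count the samples per variable and keep those with enough samples.
  counts <- table(x$Variable)
  enough <- names(counts)[counts >= min_samples]
  if (length(enough) < min_variables) {
    stop("WQI requires at least ", min_variables,
         " variables each sampled at least ", min_samples,
         " times; found ", length(enough))
  }
  invisible(TRUE)
}
```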

Allow Specification of Custom Limits

Let the user pass in custom water quality parameter limits, including where limits may be seasonal, site-specific, or dependent on other parameters.

handling N/A for guideline

In some tables in the pdf_bc, there are some cells which are written as None proposed, No criteria set, Not applicable, No guideline set, etc. I did not include these in the guideline as I do not think they are necessary. However, please do let me know if I need to add them back. Thanks!

Add New Variables with New Units

E Coli 0147
Coliform Fecal 0450
Coliform Total 0451
Turbidity 0015 - NTU
Color Apparent 0001
Color True 0002
Temperature 0013 - C

Variables in variables.csv should match those in guidelines.csv

Hi Wendy

Good work!

Can you update the variables.csv table so that it reflects the Variables in guidelines.csv, i.e. remove variables that are not in guidelines.csv and add those that are. The two need to match on the Variable column because we merge the two tables together. Also if you could look up codes for all the variables and add them to the variables.csv table that would be great.

Thanks

Joe
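
A quick way to see which variables are out of step is to compare the two Variable columns in both directions. The helper below is a sketch with a hypothetical name (compare_variables); it only assumes that both tables have a Variable column, as described above.

```r
# Sketch only: hypothetical helper for checking the two tables before merging.
compare_variables <- function(variables, guidelines) {
  stopifnot("Variable" %in% names(variables), "Variable" %in% names(guidelines))
  list(
    # Candidates to remove from variables.csv
    only_in_variables = setdiff(variables$Variable, guidelines$Variable),
    # Candidates to add to variables.csv
    only_in_guidelines = setdiff(guidelines$Variable, variables$Variable)
  )
}

# Example usage, assuming both files are in the working directory:
# compare_variables(read.csv("variables.csv"), read.csv("guidelines.csv"))
```

Both elements should be empty once the tables match.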

What to do with zero values when lower limit

When there is a lower limit and a value is zero the equation to get the excursion is

(lowerlimit / 0) - 1

which is Inf - 1!

Currently wqbc::calc_limits throws an error if this situation arises. I suggest that we modify the error so that it is informative and leave it to the user to fix their values because it is difficult to know how much above zero a value should be. What do you think?
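
For illustration, the informative error could look something like the sketch below. The function name calc_lower_excursion is hypothetical and this is not the code inside wqbc::calc_limits; it only shows the lower-limit excursion formula above with an explicit check for zeros.

```r
# Sketch only: hypothetical helper, not the body of wqbc::calc_limits.
calc_lower_excursion <- function(value, lower_limit) {
  stopifnot(is.numeric(value), is.numeric(lower_limit))
  if (any(value == 0, na.rm = TRUE)) {
    stop("value contains zeros, so the excursion (lower limit / value) - 1 ",
         "is undefined; please replace or remove zero values before ",
         "calculating limits")
  }
  # Values at or above the lower limit meet the limit and have no excursion.
  ifelse(value < lower_limit, lower_limit / value - 1, 0)
}
```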

review help files

I'm assigning to Stephanie because I can only assign to one person but everyone should feel free to review the help files and edit to improve readability, information content etc if obvious need

Delete stale branches

@joethorley do you mind if I delete all branches now except master? They are all well behind, and none have any commits that are ahead of master.

threshold for Copper

Hi Joe, so I was trying to fix the threshold guideline for Copper. In the row of "Copper, Freshwater", I saw a "? g/L" for its threshold. I am just wondering if you were thinking maybe it should be g/L instead of ug/L. If that's the case, table 9 in pdf_bc shows ug/L at the end of the equation, while g/L is referring to the hardness of CaCO3. Please let me know if I am right. Thanks!

Some standard plots

eg:

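library(ggplot2)
data(fraser)

# Keep two variables and treat large negative sentinel values as missing
sub <- fraser[fraser$Variable %in% c("SELENIUM TOTAL", "CADMIUM TOTAL"), ]
sub$Value[sub$Value < -999] <- NA

# guideline_value is a placeholder: a column holding each variable's
# guideline must be added to sub before this will run
ggplot(sub, aes(x = SiteID, y = Value)) +
  geom_boxplot() +
  facet_wrap(~ Variable, scales = "free_y") +
  geom_hline(aes(yintercept = guideline_value))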

Implement function `lookup_limits`

Allows lookup of limits for a system given pH, hardness etc.
Useful for testing and also as a general reference.

lookup_limits <- function (ph = NULL, caco3 = NULL , chloride = NULL, mehg = NULL, term = "long") {
}

If a limit depends on one of the four conditions and that condition is NULL then the function returns NA for that limit.

Implement by generating a dummy dataset and running it through calc_limits, which may need modification to return NAs.

Message order

  1. If both Variable and Code are present, choose Code [and call convert_ems_data]
  2. Then delete rows with missing values in the by columns
  3. Then delete negative values
  4. Then look up variables and delete unrecognized ones
  5. Then look up units and delete unrecognized ones
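
The order above could be sketched roughly as follows. The function clean_wq_data, its arguments known_variables and known_units, and the column names Variable, Value and Units are assumptions for illustration only. Step 1 (choosing Code and calling convert_ems_data) is left out because it depends on the package's own lookup tables.

```r
# Sketch only: hypothetical function showing steps 2 to 5 and their messages.
clean_wq_data <- function(x, by = NULL, known_variables, known_units) {
  drop_rows <- function(x, keep, what) {
    if (any(!keep)) message("deleted ", sum(!keep), " row(s) with ", what)
    x[keep, , drop = FALSE]
  }
  # Step 2: rows with missing values in the by columns
  if (!is.null(by)) {
    x <- drop_rows(x, stats::complete.cases(x[by]), "missing by values")
  }
  # Step 3: negative values (missing values are left for later handling)
  x <- drop_rows(x, is.na(x$Value) | x$Value >= 0, "negative values")
  # Step 4: unrecognized variables
  x <- drop_rows(x, x$Variable %in% known_variables, "unrecognized variables")
  # Step 5: unrecognized units
  x <- drop_rows(x, x$Units %in% known_units, "unrecognized units")
  x
}
```

Each step only reports a message when it actually deletes rows, so the messages appear in the same order as the list.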

Confusion on the Average column in guideline

For the average column, I am not quite sure what it is referring to. I was guessing the average is either maximum or mean depending on the heavy metal. Please let me know if I am wrong. Thanks!

category color palette

I've come up with a preliminary color palette function for the different water quality index categories

get_category_colors <- function () {
  c(Excellent = "green", Good = "blue", Fair = "yellow", Marginal = "brown", Poor = "red")
}

I think the general color order is valid but it would be nice to use more pastelly colors.

Stephanie - do you have any interest in choosing the colors for the categories since you are likely going to view them more than most people? If not we can easily do among ourselves.

Add start up message

To the best of the package creator's knowledge the limits correspond to the approved Provincial thresholds on the date of release but wqbc comes with ABSOLUTELY NO WARRANTY.
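
One way to show this, sketched here rather than taken from the package, is the standard .onAttach hook with packageStartupMessage, which lets users silence it with suppressPackageStartupMessages(). Placing it in a file such as R/zzz.R is a common convention and an assumption here.

```r
# Sketch only: would live in the package source, e.g. R/zzz.R (assumed location).
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "To the best of the package creator's knowledge the limits correspond ",
    "to the approved Provincial thresholds on the date of release but wqbc ",
    "comes with ABSOLUTELY NO WARRANTY."
  )
}
```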

Prepare for transfer to bcgov

I'm going to start working on doing a bit of cleanup to transfer the repo to the bcgov organization, in the prep-for-transfer branch.

  • Add CONTRIBUTING.md
  • Update README.md to bcgov standards
  • add Apache license
  • add Apache header to source code files
  • clean up data-raw folder
  • Add appropriate open data license files to data-raw
  • Remove EC bootstrap code
  • Update DESCRIPTION with license and set AT as maintainer
  • Check limits with new publication
