
doi-usgs / ncdfgeom


NetCDF-CF Geometry and Timeseries Tools for R: https://code.usgs.gov/water/ncdfgeom

Home Page: https://doi-usgs.github.io/ncdfgeom/

Languages: R 100.00%
Topics: netcdf, geometry, timeseries, timeseries-data, cf-conventions, r

ncdfgeom's Introduction

NetCDF-CF Geometry and Timeseries Tools for R


ncdfgeom reads and writes geometry data (points, lines, and polygons), attributes of those geometries, and time series associated with the geometries in a standards-compliant way.

It implements the NetCDF-CF Spatial Geometries specification and the timeSeries feature type of the Discrete Sampling Geometry NetCDF-CF specification.

Visit the pkgdown site for a complete overview of the package.

Given that this package is fairly new and in active development, please test it out and consider submitting issues and/or contributions!

Installation

ncdfgeom is available via CRAN.

install.packages("ncdfgeom")

For the latest development version:

install.packages("remotes")
remotes::install_github("DOI-USGS/ncdfgeom")

Contributing

First, thanks for considering a contribution! I hope to make this package a community-created resource for us all to gain from, and I won't be able to do that without your help!

  1. Contributions should be thoroughly tested with testthat (see the test sketch after this list).
  2. Code style should attempt to follow the tidyverse style guide.
  3. Please attempt to describe what you want to do prior to contributing by submitting an issue.
  4. Please follow the typical github fork - pull-request workflow.
  5. Make sure you use roxygen and run `R CMD check` before contributing. More on this front as the package matures.
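
A minimal sketch of what a contributed test might look like, using a sample shapefile bundled with sf and the package's write_geometry()/read_geometry() pair; adapt names to the feature you are actually testing:

# tests/testthat/test-my-feature.R
test_that("geometry round-trips through NetCDF", {
  huc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))
  nc_file <- tempfile(fileext = ".nc")

  ncdfgeom::write_geometry(nc_file = nc_file, geom_data = huc)  # write polygons
  round_trip <- ncdfgeom::read_geometry(nc_file)                # read them back

  expect_equal(nrow(round_trip), nrow(huc))
})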

Disclaimer

This software is in the public domain because it contains materials that originally came from the U.S. Geological Survey, an agency of the United States Department of the Interior. For more information, see the official USGS copyright policy.

Although this software program has been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.

This software is provided "AS IS."

ncdfgeom's People

Contributors

dblodgett-usgs, edzer


ncdfgeom's Issues

archived from CRAN

See https://cran.r-project.org/web/packages/ncdfgeom/index.html; is there anything I can do to help restore it? Is this because of checking with `_R_CHECK_DEPENDS_ONLY_=true`? I see:

  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Complete output:
  > library(testthat)
  > test_check("ncdfgeom")
  Loading required package: ncdfgeom
  ****Support Package****
  This package is a USGS-R Support package. 
  see: https://owi.usgs.gov/R/packages.html#support
  Error in library(ncdf4) : there is no package called 'ncdf4'
  Calls: test_check ... source_dir -> lapply -> FUN -> eval -> eval -> library
  Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘geometry.Rmd’ using ‘UTF-8’... OK
  ‘ncdfgeom.Rmd’ using ‘UTF-8’... OK
  ‘timeseries.Rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 1 ERROR
See
  ‘/tmp/ncdfgeom.Rcheck/00check.log’
for details.

which may need a skip_if_not_installed("ncdf4") at the top of the test.
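
A sketch of that fix, placed at the top of the affected test (testthat then skips cleanly instead of erroring when the suggested package is missing):

# tests/testthat/test-read-write.R
test_that("timeseries round-trips", {
  skip_if_not_installed("ncdf4")  # skip rather than error when ncdf4 is absent
  # ... existing test body ...
})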

create.nc gives warning with current RNetCDF pkg

Warning message:
In create.nc(nc_file, large = TRUE) :
  Argument 'large' is deprecated; please specify 'format' instead

From within write_timeseries_dsg

I think this requires an arg update to deal with changes in the RNetCDF pkg.
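
A likely fix, assuming the old large = TRUE call only needs the 64-bit-offset format that the argument used to select (the right format value should be confirmed against what write_timeseries_dsg actually requires):

library(RNetCDF)

nc_file <- tempfile(fileext = ".nc")

# Old call that now warns:
#   nc <- create.nc(nc_file, large = TRUE)

# Non-deprecated equivalent; "offset64" is the 64-bit-offset format
# that large = TRUE previously selected.
nc <- create.nc(nc_file, format = "offset64")
close.nc(nc)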

Units... deal with them.

Units are explicit in NetCDF but not so much in shapefile dbfs or dataframes in general. Figure out a solution to passing them in and returning them from NetCDF files.
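
One possible direction, sketched with the units package (column names here are hypothetical, and this is not something the package does yet): carry units on data-frame columns on the R side and translate them to and from the NetCDF units attribute.

library(units)

prcp <- data.frame(site = c("a", "b"), precip = c(1.2, 0.8))

# Attach a unit to the column before writing...
prcp$precip <- set_units(prcp$precip, "in", mode = "standard")

# ...and recover the unit string to store as the NetCDF units attribute.
deparse_unit(prcp$precip)
#> [1] "in"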

Switch ncmeta and RNetCDF

Lots of chances to clean up the package by eliminating use of ncdf4. An immediate bug fix would be where ncdf4 treats coordinate variables differently from regular variables and they don't end up in the vars list. Availability of OPeNDAP on Windows is another perk. Generally, we should try to get away from specific NetCDF calls by using ncmeta too.
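
A sketch of the kind of ncmeta-based inspection that could replace direct ncdf4 calls (assuming a local NetCDF file path; ncmeta lists coordinate variables alongside regular variables):

library(ncmeta)

nc_file <- "climdiv_prcp.nc"  # any local NetCDF file

nc_vars(nc_file)  # all variables, including coordinate variables
nc_dims(nc_file)  # dimensions
nc_atts(nc_file)  # global and per-variable attributes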

Function to add geometries to an existing netcdf file.

Starting here: https://github.com/dblodgett-usgs/NCDFSG/blob/master/R/addGeomData.R

  • Add error handling.
  • Handle instance dimension configuration / determination.
  • Allow optional configuration of the hole and multi-part break values.
  • Note the distinction between 'ind' and 'coord'.
  • Add checks for non-lat/lon-like coordinates and plan how to implement additional projections.
  • Move the conventions addition out to a wrapper function.

For testing, rework the coordinates-test-all function so it can be run for all tests (use all() on TRUE/FALSE results rather than a ton of assertions).

Combine with addPointsData.R https://github.com/dblodgett-usgs/NCDFSG/blob/master/R/addPointsData.R

getting started

The "getting started" vignette gives me the following warnings:

> climdiv_centroids <- climdiv_poly %>%
+   st_transform(5070) %>% # Albers Equal Area
+   st_set_agr("constant") %>%
+   st_centroid() %>%
+   st_transform(4269) %>% #NAD83 Lat/Lon
+   st_coordinates() %>%
+   as.data.frame()
> 
> nc_file <- "climdiv_prcp.nc"
> 
> prcp_dates <- prcp_data$date
> prcp_data <- select(prcp_data, -date)
Error in -x : invalid argument to unary operator
> prcp_meta <- list(name = "climdiv_prcp_inches", 
+                   long_name = "Estimated Monthly Precipitation (Inches)")
> 
> write_timeseries_dsg(nc_file = nc_file, 
+                      instance_names = climdiv_poly$CLIMDIV, 
+                      lats = climdiv_centroids$Y, 
+                      lons = climdiv_centroids$X, 
+                      times = prcp_dates, 
+                      data = prcp_data, 
+                      data_unit = rep("inches", (ncol(prcp_data) - 1)), 
+                      data_prec = "float", 
+                      data_metadata = prcp_meta, 
+                      attributes = list(title = "Demonstation of ncdfgeom"), 
+                      add_to_existing = FALSE) -> nc_file
Error in as.POSIXct.default(times) : 
  do not know how to convert 'times' to class “POSIXct”
> 
> climdiv_poly <- st_sf(st_cast(climdiv_poly, "MULTIPOLYGON"))
> 
> write_geometry(nc_file = "climdiv_prcp.nc", 
+                geom_data = climdiv_poly,
+                variables = "climdiv_prcp_inches") -> nc_file
Error : NetCDF: Variable not found
Error : NetCDF: Variable not found

and the resulting netcdf file's ncdump does not correspond to the one in the vignette:

netcdf climdiv_prcp {
dimensions:
    instance = 344 ;
    char = 30 ;
    node = 3169875 ;
    part = 1636 ;
variables:
    char CLIMDIV(instance, char) ;
        CLIMDIV:units = "unknown" ;
        CLIMDIV:missing_value = "" ;
        CLIMDIV:grid_mapping = "grid_mapping" ;
        CLIMDIV:geometry = "geometry_container" ;
    char CLIMDIV_NAME(instance, char) ;
        CLIMDIV_NAME:units = "unknown" ;
        CLIMDIV_NAME:missing_value = "" ;
        CLIMDIV_NAME:grid_mapping = "grid_mapping" ;
        CLIMDIV_NAME:geometry = "geometry_container" ;
    double x_nodes(node) ;
        x_nodes:units = "degrees_east" ;
        x_nodes:axis = "X" ;
    double y_nodes(node) ;
        y_nodes:units = "degrees_north" ;
        y_nodes:axis = "Y" ;
    int geometry_container ;
        geometry_container:node_coordinates = "x_nodes y_nodes" ;
        geometry_container:geometry_type = "polygon" ;
        geometry_container:node_count = "node_count" ;
        geometry_container:part_node_count = "part_node_count" ;
... etc

Follow-up issues:

> prcp_data <- read_timeseries_dsg("climdiv_prcp.nc")
Error in read_timeseries_dsg("climdiv_prcp.nc") : 
  A timeseries id variable was not found in the file.
In addition: Warning message:
In read_timeseries_dsg("climdiv_prcp.nc") :
  File does not advertise use of the CF timeseries featureType, unexpected behavior may result.

but

climdiv_poly <- read_geometry("climdiv_prcp.nc")

OK.

Update docs on write_timeseries_dsg to ignore overwrite if add_to_existing is TRUE

It seems overwrite would be ignored if add_to_existing is TRUE, but the doc says it will error if the file exists.

Perhaps the doc is right and it will error if there is a file regardless of whether you want to add to it. Or perhaps this isn't the way it behaves and the docs should let you know it is ignored if you are adding to an existing file.

Reconcile how ncdfgeom::calculate_area_intersection_weights() calculates weights with gdptools

The major difference between the two methods is what goes in the denominator when calculating weights; in the case of ncdfgeom it is source geometry area and in the case of gdptools it is target geometry area.

This caused a bit of confusion, which would be nice to fix by:

  • either adding a parameter to the function that allows the user to pick which sort of weight they want to calculate or
  • explaining in the documentation that you can flip between the two with a simple multiplication: `gdptools(w) = ncdfgeom(w) * Ax / Ay` (see the sketch below).
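
A minimal sketch of the two conventions computed directly with sf (hypothetical source polygons x and target polygon y, not the package's own implementation), showing the Ax/Ay conversion between them:

library(sf)

nc <- st_transform(read_sf(system.file("shape/nc.shp", package = "sf")), 5070)
x  <- nc[1:3, "NAME"]                                  # source polygons
y  <- st_sf(id = 1L, geometry = st_union(nc[2:4, ]))   # target polygon

i  <- st_intersection(st_set_agr(x, "constant"), st_set_agr(y, "constant"))

Ai <- st_area(i)                         # intersection areas
Ax <- st_area(x)[match(i$NAME, x$NAME)]  # source-polygon areas
Ay <- st_area(y)[i$id]                   # target-polygon areas

w_ncdfgeom <- Ai / Ax  # denominator = source area (ncdfgeom convention)
w_gdptools <- Ai / Ay  # denominator = target area (gdptools convention)

all.equal(as.numeric(w_gdptools), as.numeric(w_ncdfgeom * Ax / Ay))
#> [1] TRUE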

NOTICE: upcoming default branch name change

The master branch of this repository will soon be renamed from master to main, as part of a coordinated change across the USGS-R and USGS-VIZLAB organizations. This is part of a broader effort across the git community to use more inclusive language. For instance, git, GitHub, and GitLab have all changed or are in the process of changing their default branch name.

We will make this change early in the week of February 6, 2022. The purpose of this issue is to give notification of the change and provide information on how to make it go smoothly.

If you wish to make the change yourself rather than wait for us to do it, it can either be done manually or through some convenience functions in the usethis package.

  • From your local version of the repository, change the remote branch name using either of the methods detailed below: usethis or the manual method.
  • Verify that the local branches are changed as well (if using usethis) or change them yourselves (if using the manual method).
  • If you have collaborators on this repository, let them know that they will need to change their forked/local repos either by running usethis::git_default_branch_rediscover() or by steps 2 and 3 of the manual method. Point them to this issue to facilitate the process!
  • Search within your repository for "master" so that you can change references (e.g. URLs pointing to specific commit points of files) to point to "main" instead.
  • When you are done, feel free to close this issue!

Using usethis

Note: usethis must be version 2.1.2 or higher

  1. Navigate to your project's working directory.
  2. Double-check that you have git credentials set up for HTTPS by running usethis::gh_token_help(). If you have not yet set up git credentials for HTTPS, you can do so by creating a GitHub PAT and using gitcreds::gitcreds_set() to register it with git. The PAT must have at least "repo" scope.
  3. Rename default branches locally and on all remote repositories at once with usethis::git_default_branch_rename(). For more details see here.
  4. Verify that the work was successful by running usethis::git_default_branch().

Manual Method

  1. Go to <your repository> -> Settings -> Branches and edit the default branch from master to main.
  2. All members should update their local settings so that new repositories created locally will have the correct default branch: git config --global init.defaultBranch main.
  3. All members must update their local settings to match the change to this repository. They can either do this with usethis::git_default_branch_rediscover() (see above) or else run the following:
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a

Interface with stars

It would be good to interface this package with stars, as it has wider support for raster data cubes and their integration with the vector data cubes handled here, and it links to the GDAL world.

Revisit ragged array writing and reading

The function we have is at least tested, but it needs to be revisited in light of other work on the package.

  1. Use package env vars instead of hard coded strings.
  2. Update inputs and documentation to be parallel with other functions.
  3. Write needed functionality in checker function to get all the details out of a file.
  4. Create a reader function.

@param `instance_names`, if `numeric`, triggers `var.put.nc()` error

While testing an implementation of write_timeseries_dsg, I initially provided a numeric vector of grid cell ids as the instance_names parameter, since the documentation notes that the instance_names vector can be `character` or `numeric`. However, this triggered the following error upon execution:

Error in var.put.nc(nc, pkg.env$dsg_timeseries_id, instance_names) : 
  length(count) == ndims is not TRUE

That pointed me to this line, so I dug into the addition of the dsg_timeseries_id variable. When it is added, its type is set as "NC_CHAR", which triggers this code block in add_var(). That block determines the maximum length of the elements in the data vector (the instance_names vector), which is then used to define a new dimension (char_dim). Then, when the dsg_timeseries_id variable is defined with a call to var.def.nc, it is defined as having 2 dimensions (char_dim and the original dim specified in the call to add_var() by the user, which is the already defined instance_dim_name dimension). Two dimensions are defined in preparation for later adding the data to that type = "NC_CHAR" variable, because when character variables are added with var.put.nc, they have an implied second dimension:

When writing to NC_CHAR variables, character variables have an implied dimension corresponding to the string length. This implied dimension must be defined explicitly as the fastest-varying dimension of the NC_CHAR variable

Up to this point, providing a `numeric` vector as the instance_names vector doesn't trigger any errors. But when that data vector is later added to the dsg_timeseries_id variable with var.put.nc, I'm guessing it prompts the `length(count) == ndims is not TRUE` error because numeric data do not have a second implied dimension when writing to a NetCDF variable.

If I change my instance_names vector to `character`, the call to write_timeseries_dsg() executes without errors and returns the expected NetCDF file.
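
A minimal workaround until the numeric case is handled, using the argument set shown in the getting-started issue above and made-up ids and data: coerce the ids to character before calling the function.

cell_ids <- c(101, 102, 103)  # numeric grid-cell ids (hypothetical)

fake_data <- as.data.frame(matrix(rnorm(9), ncol = 3,
                                  dimnames = list(NULL, as.character(cell_ids))))

# Coercing to character lets the NC_CHAR id variable get its implied
# string-length dimension, avoiding the length(count) == ndims error.
ncdfgeom::write_timeseries_dsg(
  nc_file        = tempfile(fileext = ".nc"),
  instance_names = as.character(cell_ids),
  lats           = c(40.1, 40.2, 40.3),
  lons           = c(-105.1, -105.2, -105.3),
  times          = as.POSIXct("2020-01-01", tz = "UTC") + 0:2 * 86400,
  data           = fake_data,
  data_unit      = rep("mm", 3),
  data_prec      = "float",
  data_metadata  = list(name = "demo_var", long_name = "Demo variable")
)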

Add ability to append additional variables.

Given an additional variable, it would be nice to be able to append the new variable to an existing file.

I could see there being one function that initializes an empty file with stations and all the station metadata (the point stuff), and that optionally takes the inputs to a second standalone function that adds data variables to an existing file.

The second function would have to do a bit of verification on an existing file, but it wouldn't be too bad. The number of time steps and the number of stations would need to be verified. Can use the standard name for the station identifier and time to figure out which variable is which.

Add `ncdfgeom` class attribute to list created by functions.

In order to move forward on #65, it will be helpful to have methods for the ncdfgeom list as an object.

This object may be temporary in the package as there is a chance that ncdfgeom can just use stars objects directly rather than working with the old list format. For now, just adding a class to the list will be backward compatible and allow things to move forward.
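
A minimal sketch of the backward-compatible approach described here (as_ncdfgeom is a hypothetical helper name):

# Tag the existing list with a class so S3 methods can dispatch on it;
# the list contents stay exactly as before, so old code keeps working.
as_ncdfgeom <- function(x) {
  class(x) <- c("ncdfgeom", class(x))
  x
}

print.ncdfgeom <- function(x, ...) {
  cat("ncdfgeom list with elements:", paste(names(x), collapse = ", "), "\n")
  invisible(x)
}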

Use RNetCDF's time functions.

Handling of time is clumsy right now -- should be using RNetCDF's time functions. Consider contributing some of that to ncmeta.
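
A sketch of what that could look like with RNetCDF's udunits-based helpers (illustrative units and values, not the package's current code):

library(RNetCDF)

time_units <- "days since 1970-01-01 00:00:00"

# NetCDF numeric time -> calendar time (POSIXct)
utcal.nc(time_units, c(0, 31, 365), type = "c")

# Calendar time -> NetCDF numeric time
utinvcal.nc(time_units, as.POSIXct("1970-02-01", tz = "UTC"))
#> [1] 31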

Improve handling of time

Probably for discussion, but the time handling could be better. I've spent a lot of time reading and converting time in netcdf over here: https://github.com/jjvanderwal/climates/ Search the repo for 'chron' and 'julian'; ugh, so annoying.

I like the current 'just pass in a nice POSIX vector', but I think we could take it a step further and give optional control over the date origin / step with some of the smarts I developed for that climates package.

Convert to sf

This is going to be a bit of a project, but should probably be done all in one go.

Should do a bit of analysis and planning. If there is a chance of breaking the conversion out with some conversion from sf to sp and vice versa along the way, it would probably be worth doing. Maybe just in the tests as a stepping stone.
