nawendt / gribr

GRIB interface for R using ECMWF ecCodes

License: BSD 3-Clause "New" or "Revised" License

R 52.46% C 40.34% Shell 2.32% M4 4.88%
linux macos grib grib-api ecmwf eccodes r meteorology atmospheric-science c

gribr's Introduction

gribr

GRIB interface for R using the ECMWF ecCodes package.

Purpose

Easily read GRIB data into common R data structures.

Status

gribr contains most of the functionality found in the ecCodes library. Several functions exist for reading GRIB messages and extracting them. Functions are also available to help with projecting the data. All of the documentation has been written and some basic examples exist to get you started.

NOTE FOR v1.2.3+: Prior to version 2.19.0 of ecCodes, there was a bug where file handles were not closed after being opened when creating a grib index. This led to potential errors when using grib_select on multiple files. Due to this bug, gribr now requires at least ecCodes 2.19.0 beginning with v1.2.3.
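
A quick way to check an existing ecCodes installation against this requirement is sketched below. It assumes the codes_info tool that ships with ecCodes is on your PATH and that its -v flag prints the bare version number. It also assumes a sort that supports -V (GNU coreutils and recent macOS do). The helper name version_at_least is made up for this example.

```shell
#!/bin/sh
# Minimum ecCodes version stated in the note above.
required="2.19.0"

# Hypothetical helper: succeeds when version $1 is at least version $2.
# sort -V compares dotted versions numerically, so 2.9.2 sorts before 2.19.0.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: codes_info is installed with ecCodes and -v prints only the version.
if command -v codes_info >/dev/null 2>&1; then
    installed="$(codes_info -v)"
    if version_at_least "$installed" "$required"; then
        echo "ecCodes $installed is new enough for gribr"
    else
        echo "ecCodes $installed is older than $required; upgrade before installing gribr"
    fi
else
    echo "codes_info not found; ecCodes may not be installed or not on PATH"
fi
```

A plain string comparison would rank 2.9.2 above 2.19.0, which is why the sketch uses a version-aware sort.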

Installation

Using conda (Recommended)

gribr can now be installed with conda using the conda-forge channel. This is the preferred method as it simplifies things greatly and makes it possible to set up a clean environment to work in. If you use RStudio, it can be pointed to the R binary in this environment along with its associated library. To install, just run the following:

conda install -c conda-forge r-gribr
Using Package Managers or Source
  1. Prerequisites
  • gribr depends on the ECMWF ecCodes package (>=2.19.0) and the proj4 R package
  • ecCodes can be easily installed via a package manager on Linux (apt-get, yum, etc.) or macOS (port, fink, brew). Since some repositories carry versions that are too old, you may have to install from source.
    • To install from source, download ecCodes here
  2. Set up environment (if necessary)
  • ecCodes installed in a system location: The libraries/headers should be found by the linker/compiler without any additional environment settings.
  • ecCodes installed in a non-system/user location: When ecCodes is installed in a non-system location (i.e., a path that is not in the ld search paths), there are some extra steps that need to be taken depending on your use case:
    • R: Prior to running R CMD INSTALL, set the ECCODES_LIBS and ECCODES_CPPFLAGS environment variables to the ecCodes library and header directories, respectively. The configure script should pick up on these variables and build the PKG_LIBS and PKG_CPPFLAGS for gribr appropriately.

    • RStudio: RStudio does not grab the environment variables that you set on the terminal or in your shell login scripts. Instead, use .Renviron. If you do not have this file, create it in your home directory (this is where R searches for it by default). Setting ECCODES_LIBS and ECCODES_CPPFLAGS there will cause RStudio to load them into the session and allow the configure script to use them.

  • NOTE: When linking to ecCodes from a non-system location, be sure to include the ecCodes library directory (the one given in ECCODES_LIBS) in your LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH on macOS) environment variable. Otherwise, R will not be able to load the gribr shared object, because it needs to know where the linked ecCodes functions are. I have found that RStudio is quite picky that the LD_LIBRARY_PATH is set in a login script. If you have trouble with this in RStudio, make sure it is set at login. One option to avoid setting/altering LD_LIBRARY_PATH is to use -Wl,-rpath=$ECCODES_LIBS in the compiler flags. I have had some luck with this using gcc.
  3. Install gribr
  • From the command line:
R CMD INSTALL /path/to/gribr --configure-vars="ECCODES_CPPFLAGS=-I/path/to/eccodes/include ECCODES_LIBS=-L/path/to/eccodes/lib"

To install the tests, just add --install-tests.

  • From within R or RStudio:
install.packages("path/to/cloned/gribr/repo", repos = NULL, configure.vars = c("ECCODES_LIBS=-L/path/to/eccodes/lib", "ECCODES_CPPFLAGS=-I/path/to/eccodes/include"))
devtools::install_github("nawendt/gribr", configure.args="ECCODES_LIBS=-L/path/to/eccodes/lib ECCODES_CPPFLAGS=-I/path/to/eccodes/include")

Installing the tests through these methods requires adding INSTALL_opts = "--install-tests".
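
The environment setup and command-line install for a non-system ecCodes can be sketched as one shell script. This is only an illustration under stated assumptions: ECCODES_PREFIX and GRIBR_SRC are hypothetical locations that you would replace with your own, and the flag values follow the -L/-I form used in the install commands above. The build step is guarded so the script does nothing harmful when R, the gribr source, or eccodes.h is missing.

```shell
#!/bin/sh
# Hypothetical locations: adjust both to your system.
ECCODES_PREFIX="${ECCODES_PREFIX:-$HOME/opt/eccodes}"   # where ecCodes was installed
GRIBR_SRC="${GRIBR_SRC:-$HOME/src/gribr}"               # cloned gribr repository

# Flags in the same form as the install commands above.
ECCODES_LIBS="-L${ECCODES_PREFIX}/lib"
ECCODES_CPPFLAGS="-I${ECCODES_PREFIX}/include"
export ECCODES_LIBS ECCODES_CPPFLAGS

# Let the dynamic loader find the ecCodes library at run time
# (use DYLD_LIBRARY_PATH instead on macOS).
LD_LIBRARY_PATH="${ECCODES_PREFIX}/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# Lines to copy into ~/.Renviron so RStudio sessions see the same values.
printf 'ECCODES_LIBS=%s\nECCODES_CPPFLAGS=%s\n' "$ECCODES_LIBS" "$ECCODES_CPPFLAGS"

# Only attempt the build when all the pieces are actually present.
if command -v R >/dev/null 2>&1 \
    && [ -d "$GRIBR_SRC" ] \
    && [ -f "${ECCODES_PREFIX}/include/eccodes.h" ]; then
    R CMD INSTALL "$GRIBR_SRC" \
        --configure-vars="ECCODES_CPPFLAGS=${ECCODES_CPPFLAGS} ECCODES_LIBS=${ECCODES_LIBS}"
else
    echo "skipping build: R, the gribr source, or eccodes.h was not found" >&2
fi
```

Checking for eccodes.h before building catches the most common failure (a wrong include path) before the compiler does.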

Windows Install Options

While a native Windows ecCodes library that will work with R is not available, there are ways to run gribr on Windows. The first option is to use the Windows Subsystem for Linux. Simply follow the instructions to install the latest R as you would on any other Linux system. Any other necessary libraries should be available on the package manager. Build gribr as usual and enjoy using it on Windows.

The second option is to use gribr via Cygwin. A Cygwin build of R is available through their package manager. Make sure to also install the dev versions of the libraries that you will need. This will likely include libraries that the R package proj4 depends on. You will have to install ecCodes from source for this. From there, install gribr with R CMD INSTALL, taking care to include all the correct configure-vars. If you build ecCodes with PNG or JPEG support, you'll need to add the appropriate linker flags.

Contributing

You are welcome to contribute to this project.

Current Needs: While all improvements are welcome, here are a few specific needs:

  • Build ecCodes on Windows with MinGW

Contact

Nathan Wendt

gribr's Issues

grib_context_full_defs_path not supported until GRIB-API 1.13.0

Due to a bug that caused R to crash when the GRIB definition files were not accessible to the GRIB-API, a fix was introduced in d8c8380. This fix utilized an API function that was not originally exposed called grib_context_full_defs_path. However, this means the minimum GRIB-API version that can be supported is now 1.13.0 as the function first appears in that release. Currently, 1.10.0 is listed.

Memory management for batch processing a large number of grib files

@nawendt Thank you for this project.

I would like your opinion when working with a large number of grib files (1000 files or more). When using a code like this:

myfiles <- list.files(mypath)

for (myfile in myfiles) {
  g <- grib_open(myfile)
  # ... operations that utilize grib_list(), grib_get_message(), etc. ...
  grib_close(g)
}

memory management in R becomes a problem. Although grib_close() is used, each iteration reserves a significant amount of memory which never gets freed leading to total memory exhaustion (24GB of RAM are not enough for processing ~1000 grib files, each one about 8kb). I tried using gc() and rm() for specific temporary variables after each for-loop iteration, but nothing changes. The only way to really free reserved memory is to end the R session.

Have you ever faced such problems, and what would you propose as a workaround?
Thank you in advance.

Create plotting examples

Some examples of plotting data would be useful. This could be done as part of a vignette, basic script, or even an R Notebook. Putting together some wrapper functions for plotting may be a nice, natural extension of this also.

Problem with non-unique filename in loop

Thank you for this valuable R package!

I found a possible bug related to the use of non-unique filenames.

Please consider the following example script:

library(gribr)
library(curl)
library(R.utils)

NOTUNIQUEFILENAME <- 'NOTUNIQUEFILENAME.grb'
if(file.exists(NOTUNIQUEFILENAME)){
	unlink(NOTUNIQUEFILENAME)
}

# download test data
dl_testdata <- function(){
	# download the 00 UTC for today from opendata.dwd.de
	datestr = format(Sys.time(), "%Y%m%d")

	url.ta <- paste0('https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_',datestr,'00_000_T_2M.grib2.bz2')
	filename.ta <- 'T_2M.grb.bz2'
	curl_fetch_disk(url.ta, filename.ta)
	bunzip2(filename.ta)

	url.td <- paste0('https://opendata.dwd.de/weather/nwp/icon/grib/00/td_2m/icon_global_icosahedral_single-level_',datestr,'00_000_TD_2M.grib2.bz2')
	filename.td <- 'TD_2M.grb.bz2'
	curl_fetch_disk(url.td, filename.td)
	bunzip2(filename.td)
}

# load forecast
get_fc <- function(varname){

	filename = paste0(varname,'.grb')
	print(filename)
	file.copy(filename, NOTUNIQUEFILENAME)
	gribHandle <- grib_open(NOTUNIQUEFILENAME)
	print(gribHandle)
	print(grib_list(gribHandle))
	#print(paste('varname', varname))
	gribRecord <- grib_select(gribHandle, list(shortName = varname))
	grib_close(gribHandle)
	#print(gribHandle)
	unlink(filename)
	unlink(NOTUNIQUEFILENAME)

	return()

}

dl_testdata()

for(var in c('T_2M','TD_2M')){
	get_fc(var)
}

The first call of get_fc works fine, in the second call, the following error is thrown:

Error in grib_select(gribHandle, list(shortName = varname)) : 
  gribr: no messages matched
Calls: get_fc -> grib_select
Execution halted

This error was hard to understand, since grib_list(gribHandle) shows that the variable TD_2M is found in the grib file.
A colleague, however, found that the problem may be related to grib_select.

Therefore, we looked at the system calls using strace -e trace=open,close,read Rscript gribr-bug_example.R

During the first iteration, the call of grib_select opens two new file descriptors on the grib file.

open(".../NOTUNIQUEFILENAME.grb", O_RDONLY) = 10
open("/.../NOTUNIQUEFILENAME.grb", O_RDONLY) = 11

These file descriptors are not closed later on (e.g. by grib_close). In the second iteration, another file descriptor is opened. Some of the data, however, is still read from the file descriptor that was opened during the first iteration and still points to the grib file from that iteration.

open("/.../NOTUNIQUEFILENAME.grb", O_RDONLY) = 12
read(11, "GRIB\377\377\0\2\0\0\0\0\0Z\0\251\0\0\0\25\1\0N\0\377\23\1\1\7\345\1\10"..., 8192) = 8192
read(11, "z:\27:\35:\377:E:\237:\2009\2539\0369c9M9\221:(9\3319\366:\n9"..., 5890048) = 5890048
read(11, "\31\224\34\224\25\224\253\224\222\224\245\224\262\224\245\224a\224\210\224}\224\251\224\305\224\270\224\251\224g

Maybe a simple f_close is missing in src/grib_select.c?

Devtools Install Instruction in README.md

Currently, the devtools install instructions are as follows:

devtools::install_github("nawendt/gribr", args = "--configure-args='ECCODES_LIBS=-L/path/to/eccodes/lib ECCODES_CPPFLAGS=-I/path/to/eccodes/include'")

For me, this causes a problematic argument error. The following worked on my machine, and seems to be the correct syntax for configure args in devtools:

devtools::install_github("nawendt/gribr", configure.args="ECCODES_LIBS=-L/path/to/eccodes/lib ECCODES_CPPFLAGS=-I/path/to/eccodes/include")

I would also recommend including INSTALL_opts = "--install-tests" so users can check the install against the included tests.

R version 4.3.1, devtools version 2.4.5

Use CMake instead of autotools

CMake is becoming more and more utilized for building software. Work should be done to explore whether using CMake will make a source build of gribr easier for the end-user.

Error from grib_get_message() with certain files

Hello @nawendt

I'm an R-tool developer working at BSC-CNS (Barcelona Supercomputing Center). We're trying to use gribr to load GRIB files, but we found some problems.

We downloaded some files, system5 and era5 monthly data, from ECMWF MARS archive. For era5 ones, the gribr functions work well. But for system5 ones, we got errors from grib_get_message().

library(gribr)
g <- grib_open("/path/to/file/tas_20001101.grb")
gm <- grib_get_message(g, 1)

ECCODES ERROR : Wrong size for hourOfEndOfOverallTimeInterval it contains 1 values
Error in grib_get_message(g, 1) : gribr: unable to get long array
GRIB ERROR Passed array is too small

We tried to load the data with pygrib and it works, so we wonder if it is a problem in gribr. I put the file in the link in case it's needed: https://drive.google.com/file/d/1ANMuGn7Y9vXU_UKD4Zp-w8r0ZoxjFgol/view?usp=share_link

Kind regards,
An-Chi

Scale term incorrect when calculating semiminor axis proj4 parameter

grib_proj4str contains an incorrect portion of code that determines the semiminor axis proj4 parameter. The problem occurs when the shapeOfTheEarth GRIB key is set to oblate spheroid. In this case, the semiminor axis parameter was never calculated. The semimajor axis parameter was erroneously in its place.

Create gribr conda package

It would be very helpful to make gribr available on conda. Installation would be much easier for users.

File descriptor leak

An email from a user indicated that they were hitting the Linux-imposed, per-process, open file descriptor limit when looping through a large number of GRIB files. This was occurring even though the grib_close function was being called, appropriately, within the loop.
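
To see how close a process is to that limit, the limit and the current descriptor count can be inspected from a shell. This is a minimal sketch that assumes a Linux system with /proc mounted. It inspects the shell itself through $$; to watch an R session you would substitute that session's PID.

```shell
#!/bin/sh
# Soft limit on open file descriptors for this shell and its children.
fd_limit="$(ulimit -n)"
echo "open file descriptor limit: $fd_limit"

# Count descriptors currently open in this shell (Linux /proc only).
# Replace $$ with the PID of the R process you want to inspect.
if [ -d "/proc/$$/fd" ]; then
    fd_open="$(ls "/proc/$$/fd" | wc -l)"
    echo "descriptors currently open: $fd_open"
else
    echo "/proc/$$/fd not available on this system"
fi
```

A count that keeps growing across loop iterations, even though grib_close is called, points to a leak of the kind described here.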

Missing documentation for users installing from github

Roxygen-produced documentation was originally left out of the repository. As a result, users installing the package from GitHub would not get the documentation at all. The documentation should be removed from .gitignore

typeOfLevel required

I'm trying to use gribr to read a GRIB2 file

g <- grib_open(file = "rec_JULES_BRAMS05km_2016052800_2016052800.grib2")
cube <- grib_cube(gribObj = g, shortName = 'HGTprs')

The missing input is typeOfLevel; however, I don't know the value of that key in the GRIB file. Is there a way to find it?

Also, when I use grib_select, I get another error:

cube <- grib_select(gribObj =  g, keyPairs = list(shortName = 'HGTprs'))
Error in grib_select(gribObj = g, keyPairs = list(shortName = "HGTprs")) : 
  gribr: no messages matched

Thanks

Memory management with multiple messages

Hi @nawendt ,

thanks a lot for this nice package, very useful!

I used it on ERA5 data and noticed when running grib_get_message with multiple messages, for example

gm_multi <- grib_get_message(g, c(1, 2, 3))

that the memory usage associated with the R process increases significantly, by orders of magnitude beyond the actual size of the gm_multi object. This does not happen when running the function iteratively on single messages, e.g.

gm_multi <- plyr::llply(c(1, 2, 3), function(idx) gribr::grib_get_message(g, idx))

Removing all objects and running garbage collector in R doesn't really decrease the memory use associated with the process.

I am not very familiar with C nor with the associated eccodes functions, but noticed in gribr_grib_get_message that codes_handle_delete(h) is only called when messagesLength = 1. Could that be one reason for the memory not being deallocated in C?

Thanks,
Odran

Missing directory

Hello, I have been trying to download the package (gribr) and I keep getting this message

"C:/rtools40/mingw64/bin/"gcc -I"C:/Users/x1/d11/R/R-401.3/include" -DNDEBUG -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c grib_api_version.c -o grib_api_version.o
In file included from grib_api_version.c:4:
gribr.h:1:10: fatal error: eccodes.h: No such file or directory

leading to a failure in the download. What could be the issue?

Improve test coverage

More needs to be done to ensure that the output from R matches that of a correctly built ecCodes library.

readme `install_github()` typo?

Thanks for the package! I just got it installed and I believe I noticed a typo in the readme installation instructions:

You've written

devtools::install_github("nawendt/gribr", configure.args="ECCODES_LIBS=-L/path/to/eccodes/lib ECCODES_CPPFLAGS=-I/path/to/eccodes/include")

when I think it should be (like in the install.packages() version):

devtools::install_github("nawendt/gribr",configure.args=c("ECCODES_LIBS=-L/path/to/eccodes/lib", "ECCODES_CPPFLAGS=-I/path/to/eccodes/include"))
