
R package to search and access data made available through the Swedish biodiversity data infrastructure SBDI

Home Page: https://biodiversitydata-se.github.io/SBDI4R/

License: GNU Affero General Public License v3.0


sbdi4r's Introduction

SBDI4R

License: GPL v3 Lifecycle: experimental R-CMD-check

SBDI4R is deprecated; we suggest you use either sbdi4r2 (soon available) or galah instead. Both packages provide an improved interface to ALA data, while providing the same core functionality as SBDI4R. For an introduction to galah, visit the GitHub page.

R functionality for the SBDI data portal

The Swedish Biodiversity Data Infrastructure (SBDI) provides tools that enable users of biodiversity information to find, access, combine and visualize data on Swedish plants and animals. The R package SBDI4R makes a subset of these tools, plus some extensions (previously found in Analysportalen.se), available for use directly within R.

SBDI4R enables the R community to directly access data and resources hosted by SBDI. Our goal is to enable observations of species to be queried and output in a range of standard formats. The tool is built on the Atlas of Living Australia's ALA4R package, which provides similar services for the ALA. Like the NBN4R package, SBDI4R wraps ALA4R functions but redirects requests to the SBDI web servers. SBDI, NBN and ALA all share similar Application Programming Interface (API) web services.

Usage examples are available in the package vignette here, or via (in R): vignette("SBDI4R"). If you have any questions, please get in touch with us via the support center.

Installing SBDI4R

Windows

In R, install the development version from GitHub:

install.packages("remotes")
library(remotes)
install_github("AtlasOfLivingAustralia/ALA4R")
install_github("biodiversitydata-se/SBDI4R")

If you see an error about a missing package, you will need to install it manually, e.g.:

install.packages(c("stringr","sf"))

and then install_github("biodiversitydata-se/SBDI4R") again.

If you see an error like "ERROR: lazy loading failed for package 'SBDI4R'", it may be because you are trying to install to a network location. Try installing to a local location instead: first create the local directory you want to use, then specify it when installing and, later, when loading the package:

install_github("biodiversitydata-se/SBDI4R", lib = "C:/pathname/MyLibrary")
library(SBDI4R, lib.loc = "C:/pathname/MyLibrary")

If you wish to use the data.table package for potentially faster loading of data matrices (optional), also do:

install.packages("data.table")

Mac

Follow the instructions for Windows.

If you see an error about a failure to set default locale, you will need to manually set this:

system('defaults write org.R-project.R force.LANG en_US.UTF-8')

and restart R.

More information can be found on the CRAN R for Mac page.

Linux

First, ensure that libcurl is installed on your system --- e.g. on Ubuntu, open a terminal and do:

sudo apt-get install libcurl4-openssl-dev

or install libcurl4-openssl-dev via the Software Centre.

Then, in R, install the development version from GitHub:

install.packages("remotes")
library(remotes)
install_github("AtlasOfLivingAustralia/ALA4R")
install_github("biodiversitydata-se/SBDI4R")

If you see an error about a missing package, you will need to install it manually, e.g.:

install.packages(c("stringr","fp"))

and then try installing SBDI4R again.

If you wish to use the data.table package for potentially faster loading of data matrices (optional), also do:

install.packages("data.table")

Using SBDI4R

The SBDI4R package must be loaded for each new R session with library(SBDI4R), or, if you installed to a custom location, with library(SBDI4R, lib.loc = "C:/pathname/MyLibrary").

Customizing SBDI4R

Various aspects of the SBDI4R package can be customized.

Caching

SBDI4R can cache most results to local files. This means that if the same code is run multiple times, the second and subsequent iterations will be faster. This will also reduce load on the web servers. By default, this caching is session-based, meaning that the local files are stored in a temporary directory that is automatically deleted when the R session is ended. This behaviour can be altered so that caching is permanent, by setting the caching directory to a non-temporary location. For example, under Windows, use something like:

sbdi_config(cache_directory = file.path("c:","mydata","sbdi_cache")) ## Windows

or for Linux:

sbdi_config(cache_directory = "~/mydata/sbdi_cache") ## Linux

Note that this directory must exist (you need to create it yourself).

All results will be stored in that cache directory and will be used from one session to the next. They won't be re-downloaded from the server unless the user specifically deletes those files or changes the caching setting to "refresh".

If you change the cache_directory to a permanent location, you may wish to add something like this to your .Rprofile file, so that it happens automatically each time the SBDI4R package is loaded:

setHook(packageEvent("SBDI4R", "onLoad"), 
        function(...) sbdi_config(cache_directory=file.path("~","mydata","sbdi_cache")))

Caching can also be turned off entirely by:

sbdi_config(caching="off")

or set to "refresh", meaning that the cached results will re-downloaded from the SBDI servers and the cache updated. (This will happen for as long as caching is set to "refresh" — so you may wish to switch back to normal "on" caching behaviour once you have updated your cache with the data you are working on).

E-mail address

Each download request to SBDI servers is also accompanied by an "e-mail address" string that identifies the user making the request. You will need to provide an email address registered with the SBDI. You can create an account here. Once an email is registered with the SBDI, it should be stored in the config:

sbdi_config(email="[email protected]")

Alternatively, you can provide this e-mail address as a parameter directly in each call to occurrences().
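For example (a minimal sketch; the address below is a placeholder, use your registered e-mail):

occurrences(taxon = "Callitriche cophocarpa",
            email = "your.email@example.com",  ## placeholder, replace with your registered address
            download_reason_id = 10)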

User-agent string

Each request to SBDI servers is accompanied by a "user-agent" string that identifies the software making the request. This is standard behaviour, used by web browsers as well. The user-agent identifies requests coming from SBDI4R users, helping SBDI to adapt and enhance the services it provides. By default, the SBDI4R user-agent string is set to "SBDI4R" plus the SBDI4R version number (e.g. "SBDI4R 1.0").

No other personal identification information is sent. You can see all configuration settings, including the user-agent string that is being used, with the command:

sbdi_config()

Debugging

If things aren't working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting the verbose configuration option:

sbdi_config(verbose=TRUE)

Setting the download reason

SBDI requires that you provide a reason when downloading occurrence data (via the SBDI4R occurrences() function). You can provide this as a parameter directly to each call of occurrences(), or you can set it once per session using:

sbdi_config(download_reason_id=your_reason_id)

(See sbdi_reasons() for valid download reasons, e.g. download_reason_id=10 for "testing", or 7 for "ecological research", 8 for "systematic research/taxonomy", 3 for "education")

Other options

If you make a request that returns an empty result set (e.g. an un-matched name), by default you will simply get an empty data structure returned to you without any special notification. If you would like to be warned about empty result sets, you can use:

sbdi_config(warn_on_empty=TRUE)

See examples of how to use the package in the vignette.

sbdi4r's People

Contributors

aleruete, deboraarlt


sbdi4r's Issues

search occurrences for a time period (from start time to end time)

Current possibilities to filter by time when searching/downloading occurrences: the fq argument can specify, e.g., years: fq=c("year:2010 OR year:2019") returns occurrences for 2010 and 2019. But to get all data for a period, e.g. 2010-2019, I would have to list every year, and it gets worse if I want data from, say, 1 May 2011 to 31 December 2017. With the current package functionality I cannot easily search for a time period from a start date (or start year) to an end date (or end year); I can only download the data and filter afterwards. Would limiting the search to a time period become easier with facet search?
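A possible workaround, assuming the fq argument is passed through to the underlying SOLR index so that range syntax works; the field name occurrence_date (and that the index exposes full dates) is an assumption, check sbdi_fields(fields_type = "occurrence"):

x <- occurrences(taxon = "Callitriche cophocarpa",
                 fq = "year:[2010 TO 2019]",  ## all records 2010-2019, if range syntax is accepted
                 email = "your.email@example.com",
                 download_reason_id = 10)

## A date range, if the index exposes a full date field (field name is an assumption):
x2 <- occurrences(taxon = "Callitriche cophocarpa",
                  fq = "occurrence_date:[2011-05-01T00:00:00Z TO 2017-12-31T23:59:59Z]",
                  email = "your.email@example.com",
                  download_reason_id = 10)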

search occurrences for a polygon

Polygons can be specified via the wkt argument. Provide a simple example of how to obtain and use a WKT for 1) a Swedish administrative boundary, e.g. a county or municipality, and 2) a user's own polygon, e.g. from a GeoJSON file or shapefile.
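A minimal sketch of building a WKT string with the sf package, assuming you have a boundary as a shapefile or GeoJSON; the file name is hypothetical:

library(sf)
boundary <- st_read("my_municipality.shp")               ## or a GeoJSON file
boundary <- st_transform(boundary, 4326)                 ## the WKT should be in WGS84 lon/lat
wkt <- st_as_text(st_union(st_geometry(boundary)))       ## single polygon as WKT text
x <- occurrences(taxon = "family:Fabaceae", wkt = wkt,
                 email = "your.email@example.com", download_reason_id = 10)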

Search for redlisted species (NBN)

Using an indexed species list (as the Swedish lists are not implemented in Bioatlas yet). Can we provide an example with NBN data and a list?

"HTTP status code 404 received" with occurrences()

I cannot download SBDI data via the occurrences() function. Other functions such as search_fulltext() and taxinfo_download() reach the servers, but occurrences() does not. Changing the email doesn't help. The same error has persisted for several days, and it is not possible to check at biodiversitydata.se whether the server is down.

occurrences(taxon="Callitriche cophocarpa", email="[email protected]", download_reason_id=10)
Caching https://logger.biodiversitydata.se/service/logger/reasons to file C:\Users\larwe75\AppData\Local\Temp\Rtmp0ycfIr/9661a42ce1390d5ae52c01eac576ad77
-> GET /service/logger/reasons HTTP/1.1
-> Host: logger.biodiversitydata.se
-> User-Agent: SBDI4R 1.0.0
-> Accept-Encoding: deflate, gzip
-> Accept: application/json, text/xml, application/xml, */*
->
<- HTTP/1.1 404 Not Found
<- Server: nginx/1.21.1
<- Date: Wed, 11 Aug 2021 09:22:59 GMT
<- Content-Length: 0
<- Connection: keep-alive
<- Strict-Transport-Security: max-age=31536000
<- Referrer-Policy: strict-origin-when-cross-origin
<- X-Content-Type-Options: nosniff
<-
Error in check_status_code(status_code, on_redirect = on_redirect, on_client_error = on_client_error, :
HTTP status code 404 received.
Either there was an error with your
request or in the SBDI4R package, or the servers are down. If this problem persists please notify the SBDI4R maintainers
by lodging an issue at SBDI4R github repo https://github.com/biodiversitydata-se/SBDI4R/issues

sbdi_config()
$caching
[1] "on"
$cache_directory
[1] "C:\Users\larwe\AppData\Local\Temp\Rtmp0ycfIr"
$user_agent
[1] "SBDI4R 1.0.0"
$download_reason_id
[1] 7
$verbose
[1] FALSE
$warn_on_empty
[1] FALSE
$text_encoding
[1] "UTF-8"
$email
[1] "[removed]@liu.se"

sbdi4r Scatterplot tool

There is no specific scatterplot-tool function in ALA4R, but the equivalent can be done after data retrieval with ALA4R and standard R packages. Should we build a scatterplot-tool function, or just provide an example workflow?

Also add/show possibilities to plot occurrences (e.g. frequencies) against one or more environmental variables (the current scatterplot tool only plots presences against two or more variables), as in Example 4 at https://atlasoflivingaustralia.github.io/ALA4R/articles/ALA4R.html for species richness.
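A minimal workflow sketch rather than a package function, assuming the result holds a data frame in x$data as in ALA4R and that the requested layers come back as columns named after their ids (both assumptions); the layer ids are taken from another issue in this repo and serve only as examples:

x <- occurrences(taxon = "family:Fabaceae",
                 extra = c("el10007", "el10011"),        ## two environmental layers, as examples
                 email = "your.email@example.com", download_reason_id = 10)
pairs(na.omit(x$data[, c("el10007", "el10011")]))        ## scatterplot of presences against the two layers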

Time series search and analysis

is part of #18

Enable time period and time units as a priori search criteria. As a user, I want to be able to retrieve records for a specified period (from date to date), or for a specified time window recurring over years (e.g. 5 June to 25 June every year between 2000 and 2015), and to get tables or plots of occurrences per specified time unit (e.g. year, week).
In the current ALA4R, time periods can mainly be filtered and time series plotted ad hoc: the occurrences_s3() function can take 'temporal' to specify the temporal unit for which to keep unique records (e.g. year, month, yearmonth or full date). Provide functionality (extending the occurrences() function) for a priori selection of time.
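Until a priori selection exists, a post-hoc sketch, assuming the downloaded data frame has an eventDate column (the column name is an assumption, check names(x$data)):

x <- occurrences(taxon = "Callitriche cophocarpa",
                 email = "your.email@example.com", download_reason_id = 10)
occ <- x$data
occ$eventDate <- as.Date(occ$eventDate)
period <- occ[!is.na(occ$eventDate) &
              occ$eventDate >= as.Date("2011-05-01") &
              occ$eventDate <= as.Date("2017-12-31"), ]
table(format(period$eventDate, "%Y"))                    ## record counts per year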

Add license

Add a relevant permissive license if applicable.

fields for search

Originally the argument "extra" requires that "Field names can be passed as full names (e.g. "Radiation - lowest period (Bio22)") rather than id ("el871")."
However, the environmental layers in the bioatlas, e.g. in sbdi_fields(fields_type = "occurrence"), have no name or description other than elXXXX.

Grid-based data aggregation and map views

We want to provide functionality to (a) sort observations into grid cells and (b) aggregate observations over spatial grids: assign observations to grid cells, be able to retrieve as output a table with all occurrences and a column for the associated grid cell, together with aggregated results (number of observations and/or number of species per cell), and provide grid-based species observation maps. Points can be assigned to grid cells using the points-to-grid tool (in the Spatial Portal), but that tool only provides the result as a table, with no map view.
We can either provide a set of grid sizes to choose from (see below for the layers provided) and/or let users define the grid-cell size.

a weird search

A search within a polygon lying entirely within Sweden

wkt <- "POLYGON((14.94 58.88, 14.94 59.69, 18.92 59.69, 18.92 58.88, 14.94 58.88))"
env_layers <- c("el10007","el10011")
x <- occurrences(taxon="family:Fabaceae", wkt=wkt, qa="none", download_reason_id="testing", extra=env_layers)

returns observations in e.g. Alaska... The observation is from the LUND collection, and it is not within the search polygon either.
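A post-hoc check with sf to drop the stray records, assuming the download has longitude/latitude columns (the column names are assumptions, check names(x$data)):

library(sf)
poly <- st_as_sfc(wkt, crs = 4326)
pts  <- st_as_sf(x$data, coords = c("longitude", "latitude"), crs = 4326)
x_clean <- x$data[lengths(st_within(pts, poly)) > 0, ]   ## keep only records inside the polygon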

search for species observations using coordinate accuracy

is part of #18

Current ALA4R has no such a priori selection (one can only specify exact field values of the form "indexedfield:value", not "larger than" or "smaller than"), so the user has to filter afterwards.
The user wants to select data a priori by a minimum accuracy of the coordinates.

Field for coordinate accuracy needs to be included in the SOLR index, current bioatlas SOLR scheme: https://github.com/bioatlas/ala-docker/blob/develop/solr7/mycores/biocache/conf/schema.xml
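Until the field is indexed and supported a priori, a post-hoc sketch; the column name coordinateUncertaintyInMetres is an assumption, check names(x$data):

x <- occurrences(taxon = "Callitriche cophocarpa",
                 email = "your.email@example.com", download_reason_id = 10)
occ <- x$data
accurate <- occ[!is.na(occ$coordinateUncertaintyInMetres) &
                occ$coordinateUncertaintyInMetres <= 100, ]   ## keep records accurate to 100 m or better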

sbdi4r species list by traits (species attributes & facts)

function specific for Swedish data:

We want R functions dynamically creating species lists from Artfakta information (via Artfakta APIs) for use in bioatlas.
We need to find out how we can use Artfakta APIs.

Artfakta = species facts i.e. information linked to species including:
Swedish occurrence, landscape type, biotope, substrate, some traits, Red list information (categories)

Users want to be able to search for records of all species classed according to, e.g., a specific substrate. There are two ways to do this: 1) using species lists indexed in the LA portal (predefined, e.g. the Red list; users can also import their own lists), or 2) dynamically creating a list in R using the Artfakta APIs.

We want to create a user-defined species list based on species-facts information so we can search for records of the species on that list. In order for the user to select a class or category, they need to be presented with the values to choose from.
Note that the current occurrences() in ALA4R does not support lists, only searches for a single taxon. But there are API web services that allow using species lists, so it should be possible to add this to our wrapper package.

More information on API Species Information in attached pdf (in Swedish only):
API-dokumentation_SpeciesDataService_1.1.pdf

selecting species observations: facet search & filtering

We want to be able to search for species observations using flexible a priori (before download) filter on field values.

Similar to other filters: the ALA4R occurrences() function allows filtering by exact field value but not by more flexible rules, and not by facets. In occurrences() one can specify an optional 'fq' string with filters to apply to the original query, of the form "indexedfield:value" (presumably this only works for exact matches, with no "larger than" or "smaller than" or other more flexible rules). This way the user can only filter after data selection.
Provide a solution to select a priori according to rules applied to field values, and to select according to multiple facets.
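A sketch of what is possible today by combining several exact-match filters in fq (of the form "indexedfield:value" described above); the field names are examples to check against sbdi_fields(fields_type = "occurrence"):

x <- occurrences(taxon = "family:Fabaceae",
                 fq = c("year:2019", "basis_of_record:HumanObservation"),
                 email = "your.email@example.com", download_reason_id = 10)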

Fields that we want to use need to be included in the SOLR index, current bioatlas SOLR scheme: https://github.com/bioatlas/ala-docker/blob/develop/solr7/mycores/biocache/conf/schema.xml

ALA guide on Faceting and Filtering

ALA4R occurrence() function: https://rdrr.io/cran/ALA4R/man/occurrences.html

sbdi4r workflows

Provide workflows for working with sbdi4r and other packages, e.g. for data cleaning, taxonomy (dyntaxa), data review (birds), recorder/recording metrics (RecorderMetrics), analysis (sparta), etc.

Functionality beyond the transfer:

In the future we want to add or point to additional functionality:

Help users assess whether data are fit for their purpose: e.g. estimating and visualising sampling effort (as a minimum: the number of observers, the number of unique field visits, and the species-list length per field visit).

Provide workflows for estimating variables/proxies for biodiversity assessments: e.g. Essential Biodiversity Variables (EBVs), the Red List Index, SEBI 2020 indicators, a species richness index, the Shannon-Wiener index, etc.

This is currently addressed here: https://github.com/biodiversitydata-se/biodiversity-analysis-tools

aggregate observations

Aggregate observations as the total number of observations per grid cell, the number of species per grid cell, and the number of observations per species per grid cell; add lines of code showing how to save the table with the aggregated data.

Do it using BIRDS
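In the meantime, a minimal sketch with sf, assuming previously downloaded records with longitude/latitude and scientificName columns in x$data (the column names are assumptions); the BIRDS package offers a more complete workflow:

library(sf)
occ  <- x$data                                            ## previously downloaded records
pts  <- st_as_sf(occ, coords = c("longitude", "latitude"), crs = 4326)
grid <- st_make_grid(pts, cellsize = 0.1)                 ## square 0.1-degree cells over the data extent
occ$cell <- vapply(st_intersects(pts, grid),
                   function(i) i[1], integer(1))          ## grid-cell id for each record
n_obs     <- tapply(occ$cell, occ$cell, length)           ## observations per cell
n_species <- tapply(occ$scientificName, occ$cell,
                    function(s) length(unique(s)))        ## species per cell
agg <- data.frame(cell = as.integer(names(n_obs)),
                  n_obs = as.vector(n_obs),
                  n_species = as.vector(n_species))
write.csv(agg, "occurrences_per_gridcell.csv", row.names = FALSE)   ## save the aggregated table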

fields = "all" in occurrence()

Describe the bug
The arguments fields and extra are not working as expected

To Reproduce
calling

occurrences(taxon="sommarlånke",
                 fq = "data_resource_uid:dr5",
                 fields = "all", #or extra = "all",
                 email="[email protected]",
                 download_reason_id=10)

returns
HTTP status code 414 received. Either there was an error with your request or servers are down
"URL length may be longer than is allowed by the server"

Expected behavior
a table with all available columns
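A possible workaround until the bug is fixed: request a named subset of fields so the request URL stays short. This assumes sbdi_fields() returns a data frame with a name column, as its ALA4R counterpart does; pick the fields you actually need rather than the first 20 shown here:

flds <- sbdi_fields(fields_type = "occurrence")$name[1:20]   ## example subset only
x <- occurrences(taxon = "sommarlånke",
                 fq = "data_resource_uid:dr5",
                 fields = flds,
                 email = "your.email@example.com",
                 download_reason_id = 10)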

common names

At some point somebody asked for searching by common names.

This search, for example:

x <- occurrences(taxon = "golden plover",
                 download_reason_id = 10,
                 qa = sbdi_fields("assertions")$name)

returns data, and even adds the commonName column, which is otherwise not included.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.