
trias-project / trias


R package with functionality for TrIAS and LIFE RIPARIAS

Home Page: https://trias-project.github.io/trias

License: MIT License

Languages: R 100.0%
Topics: r, r-package, rstats, oscibio, invasive-species

trias's Introduction

trias


Trias is an R package providing functionality for the Tracking Invasive Alien Species (TrIAS) and LIFE RIPARIAS projects.

To get started, see the package website: https://trias-project.github.io/trias

Installation

You can install the development version of trias from GitHub with:

# install.packages("devtools")
devtools::install_github("trias-project/trias")

Meta

  • We welcome contributions including bug reports.
  • License: MIT
  • Get citation information for trias in R by running citation("trias").
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

trias's People

Contributors

actions-user, damianooldoni, mvarewyck, peterdesmet, pietrh, sanderdevisscher, soriadelva, stijnvanhoey, yasmine-verzelen


Forkers

mielhostens

trias's Issues

Specify type columns occurrence data beforehand

Importing (big) occurrence downloads in R means constantly fighting parsing failures. This happens because some fields contain only NAs in the first rows, sometimes hundreds of thousands of them.
One trick is to increase the number of rows R uses to guess the type (parameter guess_max in the read_delim() function). However, if the number of rows with NA is very high, the parsing failures have to be solved by defining the types you expect. Doing this every time for each file is time-consuming. My idea is to write the type specification for each occurrence data field; in my experience there are 237 of them in occurrence downloads. A few days ago I already made a list of almost 90 fields and put them together in a gist: https://gist.github.com/damianooldoni/01da78e5e55617798804db1804434754. I know, it's boring (very boring!), but it saves time in the future.
@peterdesmet: What do you think about putting it in the trias package?
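To illustrate the idea, a minimal sketch with readr; the handful of fields shown here is illustrative, the real specification would cover all 237:

library(readr)

# Predefining column types avoids parsing failures caused by long runs
# of NA at the top of the file
occ_col_types <- cols(
  .default = col_character(),
  taxonKey = col_double(),
  decimalLatitude = col_double(),
  decimalLongitude = col_double(),
  coordinateUncertaintyInMeters = col_double(),
  year = col_integer()
)
occ <- read_tsv("occurrence.txt", col_types = occ_col_types)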

Issues returned as codes by name_usage()

The latest version of name_usage() solves the problem of returning more rows than taxa (see issue ropensci/rgbif#324). However, the issues field doesn't contain the issue names, but codes that are hard to interpret without the lookup table returned by rgbif::name_issues().
My proposal is to use the pipes in my gist damianooldoni/get_df_with_issues in our get_taxa.Rmd pipeline to substitute the codes with the corresponding issue names.
@peterdesmet: what do you think? Should we code this step as a trias function? I don't think so, at least for the moment.
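For reference, a rough sketch of that substitution (assuming the lookup table has columns code and issue, and that the issues field is a comma-separated string of codes):

library(dplyr)
library(purrr)

issue_lookup <- rgbif::name_issues()
taxa <- taxa %>%
  mutate(issues = map_chr(
    strsplit(issues, ","),
    ~ paste(issue_lookup$issue[match(.x, issue_lookup$code)], collapse = ",")
  ))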

Check correctness GAM output

I am writing unit tests for the in-development function apply_gam, based on the function defined in the modelling pipeline, which in turn was developed from a function by @ToonVanDaele.

@ToonVanDaele: could you please check the basic GAM on this dummy data?

# dummy data
df_gam <- data.frame(
  taxonKey = rep(2224970, 19),
  canonicalName = rep("Palaemon macrodactylus", 19),
  year = seq(2001, 2019),
  n_observations = c(1, 5, 8, 12, 18, 23, 30, 40, 60, 40, 20, 10, 1, 
                     3, 10, 20, 35, 50, 80),
  stringsAsFactors = FALSE
)

# define evaluation year(s)
evaluation_year <- 2018

# apply function without any baseline correction
basic_gam <- apply_gam(df = df_gam,
                       y_var = "n_observations",
                       eval_years = evaluation_year)

I get these values for the minimal guaranteed growth, i.e. model$family$linkinv(lower):

> basic_gam$output$growth
 [1] 1.7777714 1.6977409 1.6088981 1.4940603 1.3330764 1.1393911 0.9689625 0.8451249
 [9] 0.7635014 0.7126579 0.6854127 0.6902043 0.7421756 0.8565297 1.0434188 1.2814552
[17] 1.5424851 1.8369261 2.1775717

which I find strange given the output graph:
[output graph omitted: GAM fit with observations and emergence-status dots]

I would expect a negative first derivative when not emerging (green dots), and a negative lower CI value for it as well. @ToonVanDaele: what do you think? Do you find it strange as well?
If yes, could you please check the results using your code?
If not, could you explain why it is not strange?

Better cartographic image of climate zones at risk of invasion

As proposed by @timadriaens.

Having discussed with @SanderDevisscher and looked at the results of the bulbul climate matching, we notice that the maps produced are neither very clear nor useful for giving a risk assessor an idea of the areas at risk. The reason is that the % overlap (perc_climate) is calculated on the entire global dataset, and many records do not "hit" a climate classification that occurs in the risk assessment area (Europe). Consequently, when making the legend, the 'suitability' is extremely low overall.

To better visualize the areas at risk in Europe, we propose to calculate the % overlap using only the records that fall within a climate classification occurring in Europe. Maybe we could also think about an automated way of reclassification (highly suitable, moderately suitable, low suitability).

=> I'll add this "subset" as a new type of map output based on the code below, the cm output and the single species maps
https://github.com/inbo/riparias-prep/blob/5c3039199c6104a30adf0a382651e16fe2c202b3/scripts/bul_bul_cm.Rmd#L89-L156

This output has to be NULL when maps == FALSE and/or when missing(region).

n_totaal, and consequently perc_climate, needs to be recalculated to take only observations in climate zones present in the region of interest into account.

the new outputs shall be:

  • region_cm: the recalculated cm (a rough sketch of the recalculation follows below)
  • region_single_species_maps: the visualized region_cm with a bbox around the region_shape
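A rough sketch of the recalculation (the names of the region objects and the climate-zone column are hypothetical; n_totaal and perc_climate follow the existing cm output):

library(dplyr)

# Keep only observations in climate zones that occur in the region,
# then recompute totals and overlap percentages per taxon
region_zones <- unique(region_climate$classification)  # hypothetical object/column
region_cm <- cm %>%
  filter(classification %in% region_zones) %>%
  group_by(taxonKey) %>%
  mutate(
    n_totaal = sum(n),
    perc_climate = n / n_totaal
  ) %>%
  ungroup()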

Number of introduced taxa in graphs is way larger than expected

While tackling #54, I found that the cumulative plot shows more than 8000 alien taxa. This is not correct: it is due to the fact that we tidied the data on all description fields, which was not the case when we first developed the graph functions indicator_introduction_year() and indicator_total_year().

I am solving it now together with #54.

Add vignette about climate match function

From #77: about the entire workflow behind this function, I really suggest writing a vignette about it. I mean, it would be a pity if the schema you drew in #77 (comment) got "lost" in a PR. I can help by providing the yml structure and templates.

Errors while running climate matching

This line of code is the probable cause of the error below:

if(length(taxon_key) == 1 & is.na(taxon_key)){

Error in if (length(taxon_key) == 1 & is.na(taxon_key)) { :
the condition has length > 1
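The condition breaks because is.na(taxon_key) is vectorised: with a multi-element taxon_key, if() receives a logical vector. Using the short-circuiting && avoids this, since the is.na() check is only evaluated when the length really is 1:

if (length(taxon_key) == 1 && is.na(taxon_key)) {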

This is another error I get:

Error:
! Can't subset .data outside of a data mask context.

The error only occurs when maps == TRUE.
This is the rlang::last_error() printout:

Backtrace:

  1. trias::climate_match(...)
  2. leaflet::addCircleMarkers(...)
  3. leaflet::invokeMethod(...)
  4. leaflet::evalFormula(list(...), data)
  5. leaflet evalAll(list)
  6. base::lapply(x, evalAll)
  7. leaflet FUN(X[[i]], ...)
  8. leaflet:::resolveFormula(x, data)
  9. [ base::eval(...) ] with 1 more call
    Run rlang::last_trace() to see the full context.

and the last_trace:

  1. ├─trias::climate_match(...)
  2. │ └─... %>% ... at trias/R/climate_match.R:491:4
  3. ├─leaflet::addPolygons(., data = sea, fillColor = "#e0e0e0", weight = 0.5)
  4. │ ├─... %>% expandLimitsBbox(pgons)
  5. │ └─leaflet::invokeMethod(...)
  6. │ └─leaflet::dispatch(...)
  7. ├─leaflet::expandLimitsBbox(., pgons)
  8. │ └─leaflet::expandLimits(map, bbox[2, ], bbox[1, ])
  9. ├─leaflet::addLayersControl(., baseGroups = ~.data$acceptedScientificName)
  10. │ ├─leaflet::invokeMethod(...)
  11. │ │ └─crosstalk::is.SharedData(data)
  12. │ └─leaflet::getMapData(map)
  13. ├─leaflet::addLegend(...)
  14. │ ├─leaflet::invokeMethod(map, data, "addLegend", legend)
  15. │ │ └─crosstalk::is.SharedData(data)
  16. │ └─leaflet::getMapData(map)
  17. ├─leaflet::addLegend(...)
  18. │ ├─leaflet::invokeMethod(map, data, "addLegend", legend)
  19. │ │ └─crosstalk::is.SharedData(data)
  20. │ └─leaflet::getMapData(map)
  21. ├─leaflet::addCircleMarkers(...)
  22. │ ├─... %>% expandLimits(pts$lat, pts$lng)
  23. │ └─leaflet::invokeMethod(...)
  24. │ └─leaflet::evalFormula(list(...), data)
  25. │ └─leaflet evalAll(list)
  26. │ └─base::lapply(x, evalAll)
  27. │ └─leaflet FUN(X[[i]], ...)
  28. │ └─leaflet:::resolveFormula(x, data)
  29. │ └─base::eval(f[[2]], metaData(data), environment(f))
  30. │ └─base::eval(f[[2]], metaData(data), environment(f))
  31. │ ├─acceptedScientificName
  32. │ └─rlang:::$.rlang_fake_data_pronoun(.data, acceptedScientificName) #=> suspected cause of error
  33. │ └─rlang:::stop_fake_data_subset(call)
  34. │ └─rlang::abort(...)
  35. └─leaflet::expandLimits(., pts$lat, pts$lng)
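The last frames point at the cause: the .data pronoun only exists inside tidyverse data-masking verbs, while leaflet resolves its formulas with base eval(), where subsetting the fake .data pronoun aborts. A hedged fix would be to drop the pronoun and reference the column directly in the leaflet formulas of climate_match(), e.g.:

# instead of baseGroups = ~.data$acceptedScientificName
leaflet::addLayersControl(map, baseGroups = ~acceptedScientificName)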

How verify_taxa() handles taxa which don't need manual verification

Four options:

  1. verify_taxa() doesn't accept taxa with bb_taxonomicStatus equal to ACCEPTED or DOUBTFUL and throws an error. In other words, the function doesn't accept taxa which don't need verification.
  2. verify_taxa() throws a warning saying: n taxa do not need verification. They will be removed. Taxon keys: followed by a list of the related taxon keys (sketched below).
  3. verify_taxa() doesn't throw any warning. It just filters these taxa out.
  4. verify_taxa() also adds these taxa to the output, with verification_key = bb_taxonKey.
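For option 2, the warning could look like this (a sketch; the exact wording is up for discussion):

warning(sprintf(
  "%d taxa do not need verification. They will be removed. Taxon keys: %s",
  length(keys_to_remove), paste(keys_to_remove, collapse = ", ")
))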

I would like to ask you, @peterdesmet: which option do you prefer?
I implemented option 2 in order to be flexible but at the same time informative.
Let me know your opinion. Thanks.

problems with installation of trias package

While trying to install the trias package, I got this error message in return:

ERROR: dependencies 'dplyr', 'readr', 'rgbif', 'rlang', 'tibble', 'tidyr', 'tidyselect' are not available for package 'trias' removing 'C:/R/Library/trias' In R CMD INSTALL

I used this code to install the package:

devtools::install_github("trias-project/trias")
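A likely workaround is to install the missing dependencies from CRAN first, then retry:

install.packages(c("dplyr", "readr", "rgbif", "rlang", "tibble", "tidyr", "tidyselect"))
devtools::install_github("trias-project/trias")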

verify_taxa: error

When running this:

verification <- trias::verify_taxa(taxa, verification)

I get:

Check input dataframes...DONE.
Column verificationKey already exists. It will be overwritten.
Assign verificationKey to taxa which don't need verification...DONE.
Find new synonyms...DONE.
Find new unmatched taxa...DONE.
Update backbone scientific names...DONE.
Update backbone accepted names...DONE.
Retrieve backbone info about accepted taxa for synonyms...DONE.
Detect outdated data...DONE.
Check verification keys...
Quitting from lines 466-467 (unified-checklist.Rmd) 
Error: Column names `7820753`, `2287073`, `2702816`, `9393876`, `2702820`, and 53 more must not be duplicated.
Use .name_repair to specify repair.
Backtrace:
     █
  1. ├─rmarkdown::render_site(encoding = "UTF-8")
  2. │ └─generator$render(...)
  3. │   ├─xfun::in_dir(...)
  4. │   └─bookdown:::render_book_script(output_format, envir, quiet)
  5. │     └─bookdown::render_book(...)
  6. │       └─bookdown:::render_cur_session(...)
  7. │         └─rmarkdown::render(main, output_format, ..., clean = clean, envir = envir)
  8. │           └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
  9. │             └─knitr:::process_file(text, output)
 10. │               ├─base::withCallingHandlers(...)
 11. │               ├─knitr:::process_group(group)
 12. │               └─knitr:::process_group.block(group)
 13. │                 └─knitr:::call_block(x)
 14. │                   └─knitr:::block_exec(params)
 15. │                     ├─knitr:::in_dir(...)
 16. │                     └─knitr:::evaluate(...)
 17. │                       └─evaluate::evaluate(...)
 18. │                         └─evaluate:::evaluate_call(...)
 19. │                           ├─evaluate:::timing_fn(...)
 20. │                           ├─base:::handle(...)
 21. │                           ├─base::withCallingHandlers(...)
 22. │                           ├─base::withVisible(eval(expr, envir, enclos))
 23. │                           └─base::eval(expr, envir, enclos)
 24. │                             └─base::eval(expr, envir, enclos)
 25. └─trias::verify_taxa(taxa, verification)
 26.   └─trias::gbif_verify_keys(verification_keys)
 27.     └─purrr::map_df(gbif_info, ~is.character(.) == FALSE)
 28.       └─dplyr::bind_rows(res, .id = .id)
 29.         ├─tibble::as_tibble(dots)
 30.         └─tibble:::as_tibble.list(dots)
 31.           └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 32.             └─tibble:::set_repaired_names(x, repair_hint = TRUE, .name_repair)
 33.               ├─rlang::set_names(...)
 34.               └─tibble:::repaired_names(...)
 35.                 ├─tibble:::subclass_name_repair_errors(...)
 36.                 │ └─base::withCallingHandlers(...)
 37.                 └─vctrs::vec_as_names(...)
 38.                   └─(function () ...
 39.                     └─vctrs:::validate_unique(names = names, arg = arg)
 40.                       └─vctrs:::stop_names_must_be_unique(names, arg)
 41.                         └─vctrs:::stop_names(...)
 42.                           └─vctrs:::stop_vctrs(class = c(class, "vctrs_error_names"), ...)
Warning messages:
1: `progress_estimated()` was deprecated in dplyr 1.0.0. 
2: `progress_estimated()` was deprecated in dplyr 1.0.0. 
Execution halted

Exited with status 1.
Session info ───────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Brussels             
 date     2021-03-26

Packages ───────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                              
 assertable    0.2.8   2021-01-27 [1] CRAN (R 4.0.2)                      
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)                      
 backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.2)                      
 bookdown      0.21    2020-10-13 [1] CRAN (R 4.0.2)                      
 broom         0.7.5   2021-02-19 [1] CRAN (R 4.0.2)                      
 cachem        1.0.4   2021-02-13 [1] CRAN (R 4.0.2)                      
 callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.2)                      
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.0.2)                      
 cli           2.3.1   2021-02-23 [1] CRAN (R 4.0.2)                      
 clisymbols    1.2.0   2017-05-21 [1] CRAN (R 4.0.2)                      
 colorspace    2.0-0   2020-11-11 [1] CRAN (R 4.0.2)                      
 conditionz    0.1.0   2019-04-24 [1] CRAN (R 4.0.2)                      
 cowplot       1.1.1   2020-12-30 [1] CRAN (R 4.0.2)                      
 crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)                      
 crul          1.1.0   2021-02-15 [1] CRAN (R 4.0.2)                      
 curl          4.3     2019-12-02 [1] CRAN (R 4.0.1)                      
 data.table    1.14.0  2021-02-21 [1] CRAN (R 4.0.2)                      
 DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.2)                      
 dbplyr        2.1.0   2021-02-03 [1] CRAN (R 4.0.2)                      
 desc          1.3.0   2021-03-05 [1] CRAN (R 4.0.2)                      
 devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.2)                      
 digest      * 0.6.27  2020-10-24 [1] CRAN (R 4.0.2)                      
 dplyr       * 1.0.5   2021-03-05 [1] CRAN (R 4.0.2)                      
 egg           0.4.5   2019-07-13 [1] CRAN (R 4.0.2)                      
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)                      
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.1)                      
 fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)                      
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.2)                      
 forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.0.2)                      
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                      
 generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.2)                      
 ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.2)                      
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                      
 gratia        0.5.1   2021-01-24 [1] CRAN (R 4.0.2)                      
 gridExtra     2.3     2017-09-09 [1] CRAN (R 4.0.2)                      
 gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.2)                      
 haven         2.3.1   2020-06-01 [1] CRAN (R 4.0.2)                      
 here        * 1.0.1   2020-12-13 [1] CRAN (R 4.0.2)                      
 hms           1.0.0   2021-01-13 [1] CRAN (R 4.0.2)                      
 htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)                      
 htmlwidgets   1.5.3   2020-12-10 [1] CRAN (R 4.0.2)                      
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.0.2)                      
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                      
 janitor     * 2.1.0   2021-01-05 [1] CRAN (R 4.0.2)                      
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.2)                      
 knitr         1.31    2021-01-27 [1] CRAN (R 4.0.2)                      
 lattice       0.20-41 2020-04-02 [1] CRAN (R 4.0.2)                      
 lazyeval      0.2.2   2019-03-15 [1] CRAN (R 4.0.2)                      
 lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)                      
 lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.0.2)                      
 magrittr    * 2.0.1   2020-11-17 [1] CRAN (R 4.0.2)                      
 Matrix        1.3-2   2021-01-06 [1] CRAN (R 4.0.2)                      
 memoise       2.0.0   2021-01-26 [1] CRAN (R 4.0.2)                      
 mgcv          1.8-34  2021-02-16 [1] CRAN (R 4.0.2)                      
 modelr        0.1.8   2020-05-19 [1] CRAN (R 4.0.2)                      
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.2)                      
 mvnfast       0.2.5.1 2020-10-14 [1] CRAN (R 4.0.2)                      
 nlme          3.1-152 2021-02-04 [1] CRAN (R 4.0.2)                      
 oai           0.3.0   2019-09-07 [1] CRAN (R 4.0.2)                      
 openxlsx    * 4.2.3   2020-10-27 [1] CRAN (R 4.0.2)                      
 patchwork     1.1.1   2020-12-17 [1] CRAN (R 4.0.2)                      
 pillar        1.5.1   2021-03-05 [1] CRAN (R 4.0.2)                      
 pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.0.2)                      
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)                      
 pkgload       1.2.0   2021-02-23 [1] CRAN (R 4.0.2)                      
 plotly        4.9.3   2021-01-10 [1] CRAN (R 4.0.2)                      
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.0.2)                      
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.2)                      
 processx      3.5.0   2021-03-23 [1] CRAN (R 4.0.2)                      
 progress    * 1.2.2   2019-05-16 [1] CRAN (R 4.0.2)                      
 ps            1.6.0   2021-02-28 [1] CRAN (R 4.0.2)                      
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.2)                      
 R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)                      
 Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)                      
 readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.2)                      
 readxl        1.3.1   2019-03-13 [1] CRAN (R 4.0.2)                      
 remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)                      
 reprex        1.0.0   2021-01-27 [1] CRAN (R 4.0.2)                      
 reshape2      1.4.4   2020-04-09 [1] CRAN (R 4.0.2)                      
 rgbif       * 3.5.2   2021-01-27 [1] CRAN (R 4.0.2)                      
 rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.2)                      
 rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.2)                      
 rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.2)                      
 rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.2)                      
 rvest         1.0.0   2021-03-09 [1] CRAN (R 4.0.2)                      
 scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.2)                      
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)                      
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 4.0.2)                      
 stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                      
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.0.2)                      
 testthat      3.0.2   2021-02-14 [1] CRAN (R 4.0.2)                      
 tibble      * 3.1.0   2021-02-25 [1] CRAN (R 4.0.2)                      
 tidylog     * 1.0.2   2020-07-03 [1] CRAN (R 4.0.2)                      
 tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.0.2)                      
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)                      
 tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 4.0.2)                      
 trias       * 1.5.0   2021-03-25 [1] Github (trias-project/trias@82d1b61)
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 4.0.2)                      
 urltools      1.7.3   2019-04-14 [1] CRAN (R 4.0.2)                      
 usethis       2.0.1   2021-02-10 [1] CRAN (R 4.0.2)                      
 utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.2)                      
 uuid          0.1-4   2020-02-26 [1] CRAN (R 4.0.2)                      
 vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.2)                      
 viridisLite   0.3.0   2018-02-01 [1] CRAN (R 4.0.1)                      
 wellknown     0.7.2   2021-01-07 [1] CRAN (R 4.0.2)                      
 whisker       0.4     2019-08-28 [1] CRAN (R 4.0.2)                      
 withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.2)                      
 wk            0.4.1   2021-03-16 [1] CRAN (R 4.0.2)                      
 xfun          0.22    2021-03-11 [1] CRAN (R 4.0.2)                      
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.2)                      
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)                      
 zip           2.1.1   2020-08-27 [1] CRAN (R 4.0.2)                      

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Verify taxa: split info$outdated_taxa in two df

I would like to list the outdated synonyms and the outdated unmatched taxa separately:

verification$info$outdated_taxa %>% filter(is.na(bb_key)) %>% nrow()          # outdated unmatched taxa
verification$info$outdated_taxa %>% filter(!is.na(bb_acceptedKey)) %>% nrow() # outdated synonyms

That however returns an error if verification$info$outdated_taxa is NULL (often the case). I would therefore prefer two dataframes, outdated_unmatched_taxa and outdated_synonyms:

verification$info$outdated_unmatched_taxa %>% nrow()
verification$info$outdated_synonyms %>% nrow()

Always return plot for apply_gam

Currently the returned plot is NULL when the emergence status cannot be assessed in trias::apply_gam().
For the alien species application they would like to still return a basic plot (observations only), but with a message that the emergence status cannot be assessed. Currently I've implemented a hacky solution on our side, but it might be nice if this could be incorporated into the trias package directly.

library(tibble)

df_gam <- tibble(
  taxonKey = rep(3003709, 24),
  canonicalName = rep("Rosa glauca", 24),
  year = seq(1995, 2018),
  obs = c(
    1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
    1, 3, 3, 8, 10
  ),
  cobs = rep(0, 24)
)

# apply GAM to obs without baseline as covariate
tmpResult <- apply_gam(df_gam,
  y_var = "obs",
  eval_years = 2018,
  taxon_key = 3003709,
  name = "Rosa glauca",
  baseline_var = "cobs",
  verbose = TRUE
)
tmpResult$plot

# Hacky solution to create the plot anyhow
library(ggplot2)

df <- tmpResult$output
df$lcl <- 10^11 # gam failed
trias:::plot_ribbon_em(df_plot = df, ptitle = "") +
  annotate("text",
    y = max(df$obs), x = max(df$year), hjust = 1, vjust = 1,
    label = "The emergence status \ncannot be assessed.", colour = "red"
  )

error produced by indicator_total_year()

In using the function indicator_total_year(), I experienced the following error:

Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent

After looking into the source code at https://github.com/trias-project/trias/blob/master/R/indicator_total_year.R and debugging, it seems that there might be a typo at line 93:

if (nrow(filter(df, is.na(first_observed)) > 0)) {

should be adjusted to

if (nrow(filter(df, is.na(first_observed))) > 0) {

(i.e., move the closing bracket so that > 0 compares the row count, instead of being applied inside nrow())

Related to inbo/reporting-rshiny-grofwildjacht#146

Could this be fixed?

Add function spread_with_duplicates

Create a function from functionality developed for the pipeline to spread a dataframe with duplicate entries:

taxon_key  type  description
1          A     R
1          B     S
1          C     T
1          C     X

With:

df %>% spread_with_duplicates(type, description)

To:

taxon_key  A  B  C
1          R  S  T
1          R  S  X

Note that the regular dplyr spread:

df %>% spread(type, description)

Would throw an error because of the duplicate entries.
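A possible implementation sketch, pinned to the example above (column names taxon_key, type, description, A, B, C hardcoded) and using tidyr::pivot_wider(), spread()'s successor: number the duplicates per key, pivot, then fill the gaps the duplicate rows leave behind.

library(dplyr)
library(tidyr)

df %>%
  group_by(taxon_key, type) %>%
  mutate(.dup = row_number()) %>%  # 1st, 2nd, ... occurrence of each type
  ungroup() %>%
  pivot_wider(names_from = type, values_from = description) %>%
  group_by(taxon_key) %>%
  fill(A, B, C, .direction = "down") %>%  # repeat values onto duplicate rows
  ungroup() %>%
  select(-.dup)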


Might be good to suggest this functionality to dplyr first? /cc @stijnvanhoey @damianooldoni

How to manage changes of synonym in verify_taxa()

While running the pipeline for the unified checklist I encountered this situation.
I have a taxon in the verification table (verified_taxa) from the Manual of Alien Plants: Oxalis stricta L. (key = 141266323), added on 2018-08-01. At that time it was recognized by the GBIF Backbone as Oxalis stricta L. (9823072) and linked by a synonym relation to Oxalis corniculata L. (8427624). However, I now get as input a data.frame checklist_taxa where the same taxon is linked to a new GBIF Backbone key (2891666) and a new synonym, Oxalis dillenii (2891677). Notice that the GBIF Backbone scientific name didn't change.

@peterdesmet: is it possible? I find it slightly strange that two GBIF Backbone keys (9823072 and 2891666) share the same scientific name. Are they not the same taxon? I notice that the most recent key has the (deprecated) nubKey value while the old one doesn't. As implemented so far, verify_taxa() doesn't explicitly take changes of synonym relations into account, so a second row is added linking to the new synonym. Is that ok? And what should happen with the old synonym relation? Should it be labelled as an unused taxon?

Order of the taxa returned by verify_taxa()

As already decided, the new implementation of verify_taxa(taxa, verified_taxa) updates and returns the two input dfs.
To do this, some join and row-binding operations are performed, so the order of the taxa is typically not the same as in the input dfs. I split the issue in two:

Order of taxa in taxa

I propose to return the taxa in the same order as in the input df, for better comparison.

Order of taxa in verified_taxa

I would put outdated taxa (taxa that no longer need to be checked) at the very end. Should we order them by date, the newest ones first? Maybe a detail, but still...

Update gbif_has_distribution() to allow NOT filters

@damianooldoni, see trias-project/unified-checklist#37: we want to update the gbif_has_distribution filter used in the unified checklist so that we can filter:

trias::gbif_has_distribution(
  taxon_key = x,
  country = "BE",
  establishmentMeans = c("INTRODUCED", "NATURALISED", "INVASIVE", "ASSISTED COLONISATION"),
  status != c("ABSENT", "EXCLUDED", "DOUBTFUL")  # desired NOT filter; not valid R syntax yet
)

I.e., where the status filter does NOT include ABSENT, EXCLUDED, or DOUBTFUL. I am not sure how to pass such a parameter. It would be cool if we could pass a dplyr selection, but I'm not sure how to do that.

Maybe we should drop the use of the gbif_has_distribution() function, get all distributions, and then use dplyr filters to select those taxa that match what we need.
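A sketch of that alternative (the data = "distribution" accessor and the column names of the returned data.frame are assumptions):

library(dplyr)

dist <- rgbif::name_usage(key = x, data = "distribution")$data
dist %>%
  filter(
    country == "BE",
    establishmentMeans %in% c("INTRODUCED", "NATURALISED", "INVASIVE", "ASSISTED COLONISATION"),
    !status %in% c("ABSENT", "EXCLUDED", "DOUBTFUL")
  ) %>%
  nrow() > 0  # TRUE if at least one matching distribution remains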

Refactor climate match function: use help-subfunctions

From #77:
The climate match function is way too long. Splitting it into subfunctions will help readability and debugging: future-you, the maintainer (that's me 😮) and anyone else trying to understand the workflow will be grateful. Notice that these subfunctions can be written in the same R file, just appended below the "main" function. For example, the leaflet part is clearly something you can put in a subfunction, isn't it?

What should the ancillary df duplicates_taxa contain?

In the old version of verify_taxa() the df output duplicates_taxa contained taxa from different checklists with the same scientific name.
In the new verify_taxa(), I propose to use duplicates_taxa to collect all taxa which point to the same bb_key - bb_acceptedKey pair.

Example. The taxa below would be in duplicates_taxa:

taxonKey  scientificName  bb_key  bb_acceptedKey
1         A               10      15
2         B               10      15
3         C               10      15

Drawback of this implementation: taxa without a match to the GBIF Backbone would not be included. So the following two taxa, which are very likely the same species, would not be returned in duplicates_taxa:

taxonKey  scientificName  bb_key  bb_acceptedKey
4         E               NA      NA
5         E               NA      NA

I can live with that 😄 @peterdesmet: you too? It is actually a question about the meaning we want to give to the expression "duplicate taxa". As we now use a key triplet (taxonKey - bb_key - bb_acceptedKey) to identify unique taxa instead of names, I think using bb_key - bb_acceptedKey seems the best option.

Enable custom axis labs for indicators

I've been trying to translate the axis labels of the graphs produced by the function indicator_introduction_year.R.
But since the function exports the graph as a "Large egg" object, I've been unable to do it.
If the axis labels could be passed as function arguments, with English as the default, I and others would be able to translate the graphs into any language required.

This could be useful for other indicators as well.
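The requested interface could look like this (parameter names are hypothetical):

indicator_introduction_year(
  df,
  x_lab = "Jaar van introductie",        # default: "Year of introduction"
  y_lab = "Aantal geïntroduceerde taxa"  # default: "Number of introduced taxa"
)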

How to make has_distribution work for data frames

I hoped I could do this:

checklist_taxa %>%
  select(key) %>%
  head() %>%
  rowwise() %>%
  mutate(distribution = has_distribution(
    key,
    countryCode = "BE",
    establishmentMeans = "INTRODUCED",
    status = "PRESENT"
  ))

Where I basically call has_distribution() for each row. But I get:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Strings must match column names. Unknown columns: country, establishmentMeans, status.
In addition: Warning message:
In has_distribution(key, countryCode = "BE", establishmentMeans = "INTRODUCED",  :
  countryCode renamed to country

So it seems the function was called (otherwise I wouldn't get the Warning message), but dplyr interprets the function parameters as df columns?

Note: this functionality works for other functions (with and without rowwise()):

checklist_taxa %>%
  select(key) %>%
  head() %>%
#  rowwise() %>%
  mutate(distribution = lubridate::parse_date_time(key, orders = "dmy"))

It doesn't return any useful data, but orders is considered a parameter, and not a column.

@stijnvanhoey @damianooldoni Any idea how to enable dplyr compatibility for has_distribution?

My actual code can be found here
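One thing worth testing (uncertain whether it helps, since the error may originate inside has_distribution() itself rather than in dplyr's data mask): apply the function per element with purrr instead of rowwise():

library(dplyr)
library(purrr)

checklist_taxa %>%
  select(key) %>%
  head() %>%
  mutate(distribution = map_lgl(
    key,
    ~ has_distribution(.x, countryCode = "BE",
                       establishmentMeans = "INTRODUCED",
                       status = "PRESENT")
  ))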

Error when filtering data for indicator_total_year

When making the natuurindicatoren we want to use only the data from the Flemish region, i.e. apply this filter: data <- data %>% filter(locality == "Flemish Region") before using the indicator_total_year() function from the trias@add_functions_checklist_indicators branch.

This however prompts an error message.

Please fix this.

Write tests for indicator_* functions

Write basic unit tests for the indicator_introduction_year() and indicator_total_year() functions:

  • test inputs
  • test outputs

Testing the outputs might not really be possible, as these functions return plots.
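A possible starting point for the input tests with testthat (assuming the functions validate their df argument; plot outputs could later be covered with snapshot-based tools such as vdiffr):

library(testthat)

test_that("indicator functions require a data.frame", {
  expect_error(indicator_introduction_year(df = "not a data.frame"))
  expect_error(indicator_total_year(df = 3))
})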

Verify taxa: return correct taxa df if input already contains verificationKey

If the input taxa already contains a verificationKey, then the function works but returns a taxa df with more (duplicated) columns:

# ... with 6,097 more rows, and 21 more variables:
#   bb_kingdom <chr>, bb_rank <chr>, bb_taxonomicStatus <chr>,
#   bb_acceptedKey <dbl>, bb_acceptedName <chr>,
#   verificationKey.x.x <chr>, verificationKey.x <lgl>,
#   taxonID <chr>, nameType <chr>, issues <chr>,
#   validDistribution <lgl>, bb_species <chr>, bb_genus <chr>,
#   bb_family <chr>, bb_order <chr>, bb_class <chr>,
#   bb_phylum <chr>, bb_speciesKey <dbl>,
#   verificationKey.y <lgl>, verificationKey.y.y <chr>,
#   verificationKey <chr>

Ideally, the df is returned exactly like the input taxa, with the verificationKey column updated in place or appended at the end.
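A plausible fix inside verify_taxa() (not verified against the code): drop any pre-existing verificationKey before the joins, so the .x/.y suffixes are never generated:

taxa <- dplyr::select(taxa, -dplyr::any_of("verificationKey"))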

verify_taxa: also verify genus and infraspecific taxa

Update columns

  • bb_acceptedKey → bb_suggestedKey
  • bb_acceptedName → bb_suggestedName
  • bb_acceptedKingdom → bb_suggestedKingdom
  • bb_acceptedRank → bb_suggestedRank
  • bb_acceptedTaxonomicStatus → bb_suggestedTaxonomicStatus

Processing

Not in backbone

Same as before

  1. Leave suggested_ fields empty

Genera

New

  1. Leave suggested_ fields empty
  2. If need be, verifier can add multiple species keys to verifiedKey

Infraspecific ranks

New

  1. For all taxonomicStatus!!
  2. Lookup info via speciesKey*
  3. Populate suggested_ fields with species parent
  • Note: unfortunately not all SYNONYM or even ACCEPTED taxa have this, e.g. https://api.gbif.org/v1/species/7707872 (maybe due to NAME_PARENT_MISMATCH). We will have to manually add the correct species keys in verificationKey for those.

Synonyms species

Same as before

  1. Lookup info via acceptedKey
  2. Populate suggested_ fields with the accepted taxon

Accepted species

Same as before

Add option "origin" to get_taxa()

Problem described at ropensci/rgbif#288

If tackled at rgbif (preferred)

  1. Add option origin to get_taxa()
  2. Directly transfer to name_usage() to filter

If not tackled at rgbif

  1. Add option origin to get_taxa()
  2. Get results
  3. Filter the data on origin
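A sketch of step 3 of the fallback (the .data/.env pronouns disambiguate the column from the parameter):

# taxa: data.frame retrieved from GBIF; origin: the new function parameter
taxa <- dplyr::filter(taxa, .data$origin %in% .env$origin)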

Expand climate matching function with single species maps

The idea is to expand the function from #73 with single_species_maps as a new item in the function's return list.
These maps should be made as a sublist per taxon key, each containing a leaflet with all scenarios as base groups.

So the final output of this expansion should be callable as output$single_species_maps$taxonkey_1, as sketched below.
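A rough sketch of how this could be built (the per-taxon list of scenario layers is a hypothetical intermediate object):

library(leaflet)
library(purrr)

single_species_maps <- map(taxon_keys, function(key) {
  layers <- scenario_layers[[as.character(key)]]  # hypothetical: one spatial layer per scenario
  m <- leaflet()
  for (scenario in names(layers)) {
    m <- addPolygons(m, data = layers[[scenario]], group = scenario)
  }
  addLayersControl(m, baseGroups = names(layers))
})
names(single_species_maps) <- paste0("taxonkey_", taxon_keys)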

Function has_distribution()

has_distribution <- function(taxon_key, ...) 

Parameters

  • taxon_key: single taxon key (numeric or character)

Optional parameters

  • These GBIF distribution properties and their API synonyms. They have to be passed as single values or as vectors:
    • countryCode = country
    • occurrenceStatus = status
    • establishmentMeans
  • If any property passed to the function does not exist: assert an error
  • Most of the distribution parameters can take many inputs, which are treated as 'OR' (e.g., a or b or c)

Example:

has_distribution(134086855,
  countryCode = c("BE", "NL", "LU"),
  establishmentMeans = "INTRODUCED",
  status = c("PRESENT", "DOUBTFUL")
)

Return

  • A logical (TRUE or FALSE)

From the example above, return TRUE if the taxon 134086855 has at least one distribution with:

  1. BE, NL OR LU in field countryCode AND
  2. INTRODUCED in field establishmentMeans AND
  3. PRESENT OR DOUBTFUL in field occurrenceStatus

Documentation

  • Document all of the above as succinctly as possible with roxygen
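Putting this together, a compact sketch of the described behaviour. The call that fetches the distributions and the field names of the returned data.frame are assumptions based on the GBIF distribution schema:

has_distribution <- function(taxon_key, ...) {
  filters <- list(...)
  # Map the API synonyms onto the distribution field names
  names(filters)[names(filters) == "countryCode"] <- "country"
  names(filters)[names(filters) == "occurrenceStatus"] <- "status"
  # Fetch the distributions (accessor assumed)
  dist <- rgbif::name_usage(key = taxon_key, data = "distribution")$data
  if (is.null(dist) || nrow(dist) == 0) return(FALSE)
  # Assert error if any passed property does not exist
  assertthat::assert_that(all(names(filters) %in% names(dist)))
  # OR within a property, AND across properties
  keep <- rep(TRUE, nrow(dist))
  for (property in names(filters)) {
    keep <- keep & dist[[property]] %in% filters[[property]]
  }
  any(keep)
}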

Rename functions

I would consider renaming the functions:

  • get_taxa() → get_gbif_taxa()
  • has_distribution() → has_gbif_distribution()

As we might have non GBIF related functions. For easier grouping, we could also consider:

  • gbif_get_taxa()
  • gbif_has_distribution()

@damianooldoni @stijnvanhoey thoughts?

Drop input_country from update_download_list + parse DOI

The gbif download list currently has a field input_country. However, there are more filters that we might set than countries, e.g. not accepting occurrences with issues or searching within a specific date range. Because:

  1. We can't capture all of this in the file
  2. Just listing countries gives a false impression of the actual filters
  3. The download page itself lists all filters

I would drop countries from the function as a parameter and the output. I would also drop the column from the tsv file.

Note 1: simply not providing the parameter is not an option, as it is required.

Note 2: the specific taxa list we used is still very useful though, so input_checklist should be kept.


To provide a better link to the download page, I would parse the gbif_download_doi within the function and prepend it with https://doi.org/ (so 10.15468/dl.6cljf9 becomes https://doi.org/10.15468/dl.6cljf9). This only needs to be done for new lines (there is no need to check this for all previous downloads).
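The parsing itself is a simple prefix operation, e.g.:

# Prepend the DOI resolver for newly added rows only
gbif_download_doi <- paste0("https://doi.org/", gbif_download_doi)
# "10.15468/dl.6cljf9" becomes "https://doi.org/10.15468/dl.6cljf9"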

Failed to install 'trias' from GitHub

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes        

I'm having problems installing the trias package. Any idea what I should do?

The error message is...

package ‘rlang’ successfully unpacked and MD5 sums checked
Error: Failed to install 'trias' from GitHub:
  (converted from warning) cannot remove prior installation of package ‘rlang’
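This usually means the already-loaded rlang blocks its own upgrade. A workaround worth trying (hedged, not verified here): restart R without loading any packages, update rlang on its own, then retry:

install.packages("rlang")
devtools::install_github("trias-project/trias")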

Error when using Trias to create new Indicators

When rendering the new IAS pathways indicator for the INBO website (see #34), I get the following error message:

Error: package or namespace load failed for 'trias':
 object 'all_of' is not exported by 'namespace:dplyr'

I think this indicates that the renv environment used by the indicators has an earlier version of dplyr (0.8.5), which does not include the all_of() function. Can someone look into the minimum dplyr version required to use this function, and maybe add it to the DESCRIPTION file?
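If dplyr 1.0.0 turns out to be the minimum required version (all_of() is available from the dplyr 1.0 series onward), the fix would be a versioned entry in DESCRIPTION:

Imports:
    dplyr (>= 1.0.0)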

Write functions for pathways graphs

After a video meeting with @timadriaens, I will write the following functions:

  1. visualize_pathways_level1(df, category = NULL): returns a bar graph. X: pathway_level1, Y: number of introduced taxa. As for the get_table_pathways() function producing tables, we can specify the category: one of c("Plantae", "Animalia", "Fungi", "Chromista", "Archaea", "Bacteria", "Protozoa", "Viruses", "incertae sedis", "Chordata", "Not Chordata")
  2. visualize_pathways_level2(df, pathway_level1, category = NULL): returns a bar graph. X: pathway_level2, Y: number of introduced taxa. pathway_level1 is one of c("contaminant", "escape", "release", "corridor", "natural_dispersal", "unaided", "stowaway", "unknown")
  3. visualize_pathways_year_level1(df, bin = 10, cut_off = 1950, category = NULL): returns a line plot. X: year, grouped in bins of width bin (default 10 years) starting from cut_off. All taxa introduced before the cut_off year (default 1950) are counted together. Y: number of introduced taxa
  4. visualize_pathways_year_level2(df, pathway_level1, bin = 10, cut_off = 1950, category = NULL): returns a line plot as in 3. pathway_level1 must be specified as in 2.
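For reference, typical calls would look like:

visualize_pathways_level1(df, category = "Plantae")
visualize_pathways_level2(df, pathway_level1 = "escape", category = "Animalia")
visualize_pathways_year_level1(df, bin = 10, cut_off = 1950)
visualize_pathways_year_level2(df, pathway_level1 = "contaminant", bin = 5)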

Reminders:

  • Set a warning if taxa are removed due to missing info about the year of introduction (similar to the indicator_*_year() functions)
  • Write unit tests
  • Check whether flipping the bar plots, as suggested by @peterdesmet in trias-project/indicators#75 (comment), helps readability.
