
trias-project / trias


R package with functionality for TrIAS and LIFE RIPARIAS

Home Page: https://trias-project.github.io/trias

License: MIT License

Languages: R 100.0%
Topics: r, r-package, rstats, oscibio, invasive-species

trias's Introduction

trias


Trias is an R package providing functionality for the Tracking Invasive Alien Species (TrIAS) and LIFE RIPARIAS projects.

To get started, see the package website: https://trias-project.github.io/trias

Installation

You can install the development version of trias from GitHub with:

# install.packages("devtools")
devtools::install_github("trias-project/trias")

Meta

  • We welcome contributions including bug reports.
  • License: MIT
  • Get citation information for trias in R by running citation("trias").
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

trias's People

Contributors

actions-user, damianooldoni, mvarewyck, peterdesmet, pietrh, sanderdevisscher, soriadelva, stijnvanhoey, yasmine-verzelen


Forkers

mielhostens

trias's Issues

Specify type columns occurrence data beforehand

Importing (big) occurrence downloads in R means constantly fighting parsing failures. This happens because some fields contain only NAs in the first rows, sometimes hundreds of thousands of them.
One trick is to increase the number of rows R uses to guess the type (parameter guess_max in the read_delim() function). However, if the number of rows with NA is very high, the parsing failures have to be solved by defining the types you expect. Doing this every time for each file is time-consuming. My idea is to write the type specification for each occurrence data field; in my experience there are 237 of them in occurrence downloads. A few days ago I already made a list of almost 90 fields and put them together in a gist: https://gist.github.com/damianooldoni/01da78e5e55617798804db1804434754. I know, it's boring (very boring!), but it saves time in the future.
@peterdesmet: What do you think about putting it in the trias package?
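To illustrate the idea, a minimal sketch with readr; the handful of fields shown here is illustrative, the real specification would cover all 237:

library(readr)

# Predefining column types avoids parsing failures caused by long runs
# of NA at the top of the file
occ_col_types <- cols(
  .default = col_character(),
  taxonKey = col_double(),
  decimalLatitude = col_double(),
  decimalLongitude = col_double(),
  coordinateUncertaintyInMeters = col_double(),
  year = col_integer()
)
occ <- read_tsv("occurrence.txt", col_types = occ_col_types)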

Issues returned as codes by name_usage()

The latest version of name_usage() solves the problem of returning more rows than taxa (see issue ropensci/rgbif#324). However, the issues field doesn't contain the issue names, but codes that are hard to interpret without the lookup table returned by rgbif::name_issues().
My proposal is to use the pipes in my gist damianooldoni/get_df_with_issues in our get_taxa.Rmd pipeline to substitute the codes with the corresponding issue names.
@peterdesmet: what do you think? Should we code this step as a trias function? I don't think so, at least for the moment.
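For reference, a rough sketch of that substitution (assuming the lookup table has columns code and issue, and that the issues field is a comma-separated string of codes):

library(dplyr)
library(purrr)

issue_lookup <- rgbif::name_issues()
taxa <- taxa %>%
  mutate(issues = map_chr(
    strsplit(issues, ","),
    ~ paste(issue_lookup$issue[match(.x, issue_lookup$code)], collapse = ",")
  ))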

Check correctness GAM output

I am writing unit tests for the in-development function apply_gam, based on the function defined in the modelling pipeline, which in turn was developed from a function by @ToonVanDaele.

@ToonVanDaele: could you please check the basic GAM on this dummy data?

# dummy data
df_gam <- data.frame(
  taxonKey = rep(2224970, 19),
  canonicalName = rep("Palaemon macrodactylus", 19),
  year = seq(2001, 2019),
  n_observations = c(1, 5, 8, 12, 18, 23, 30, 40, 60, 40, 20, 10, 1, 
                     3, 10, 20, 35, 50, 80),
  stringsAsFactors = FALSE
)

# define evaluation year(s)
evaluation_year <- 2018

# apply function without any baseline correction
basic_gam <- apply_gam(df = df_gam,
                       y_var = "n_observations",
                       eval_years = evaluation_year)

I get these values for the minimal guaranteed growth, i.e. model$family$linkinv(lower):

> basic_gam$output$growth
 [1] 1.7777714 1.6977409 1.6088981 1.4940603 1.3330764 1.1393911 0.9689625 0.8451249
 [9] 0.7635014 0.7126579 0.6854127 0.6902043 0.7421756 0.8565297 1.0434188 1.2814552
[17] 1.5424851 1.8369261 2.1775717

which I find strange given the output graph:
[output graph omitted: GAM fit with observations and emergence-status dots]

I would expect a negative first derivative when not emerging (green dots), and a negative lower CI value for it as well. @ToonVanDaele: what do you think? Do you find it strange as well?
If yes, could you please check the results using your code?
If not, could you explain why it is not strange?

Better cartographic image of climate zones at risk of invasion

As proposed by @timadriaens.

Having discussed with @SanderDevisscher and looked at the results of the bulbul climate matching, we notice that the maps produced are neither very clear nor useful for giving a risk assessor an idea of the areas at risk. The reason is that the % overlap (perc_climate) is calculated on the entire global dataset, and many records do not "hit" a climate classification that occurs in the risk assessment area (Europe). Consequently, when making the legend, the 'suitability' is extremely low overall.

To better visualize the areas at risk in Europe, we propose to calculate the % overlap using only the records that fall within a climate classification occurring in Europe. Maybe we could also think about an automated way of reclassification (highly suitable, moderately suitable, low suitability).

=> I'll add this "subset" as a new type of map output based on the code below, the cm output and the single species maps
https://github.com/inbo/riparias-prep/blob/5c3039199c6104a30adf0a382651e16fe2c202b3/scripts/bul_bul_cm.Rmd#L89-L156

This output has to be NULL when maps == FALSE and/or when missing(region).

n_totaal, and consequently perc_climate, needs to be recalculated to take only observations in climate zones present in the region of interest into account.

the new outputs shall be:

  • region_cm: the recalculated cm (a rough sketch of the recalculation follows below)
  • region_single_species_maps: the visualized region_cm with a bbox around the region_shape
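A rough sketch of the recalculation (the names of the region objects and the climate-zone column are hypothetical; n_totaal and perc_climate follow the existing cm output):

library(dplyr)

# Keep only observations in climate zones that occur in the region,
# then recompute totals and overlap percentages per taxon
region_zones <- unique(region_climate$classification)  # hypothetical object/column
region_cm <- cm %>%
  filter(classification %in% region_zones) %>%
  group_by(taxonKey) %>%
  mutate(
    n_totaal = sum(n),
    perc_climate = n / n_totaal
  ) %>%
  ungroup()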

Number of introduced taxa in graphs is way larger than expected

While tackling #54, I found that the cumulative plot shows more than 8000 alien taxa. This is not correct: it is due to the fact that we tidied the data on all description fields, which was not the case when we first developed the graph functions indicator_introduction_year() and indicator_total_year().

I am solving it now together with #54.

Add vignette about climate match function

From #77: about the entire workflow behind this function, I really suggest writing a vignette about it. I mean, it would be a pity if the schema you drew in #77 (comment) got "lost" in a PR. I can help by providing the yml structure and templates.

Errors while running climate matching

This line of code is the probable cause of the error below:

if(length(taxon_key) == 1 & is.na(taxon_key)){

Error in if (length(taxon_key) == 1 & is.na(taxon_key)) { :
the condition has length > 1
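The condition breaks because is.na(taxon_key) is vectorised: with a multi-element taxon_key, if() receives a logical vector. Using the short-circuiting && avoids this, since the is.na() check is only evaluated when the length really is 1:

if (length(taxon_key) == 1 && is.na(taxon_key)) {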

This is another error I get:

Error:
! Can't subset .data outside of a data mask context.

The error only occurs when maps == TRUE.
This is the rlang::last_error() printout:

Backtrace:

  1. trias::climate_match(...)
  2. leaflet::addCircleMarkers(...)
  3. leaflet::invokeMethod(...)
  4. leaflet::evalFormula(list(...), data)
  5. leaflet evalAll(list)
  6. base::lapply(x, evalAll)
  7. leaflet FUN(X[[i]], ...)
  8. leaflet:::resolveFormula(x, data)
  9. [ base::eval(...) ] with 1 more call
    Run rlang::last_trace() to see the full context.

and the last_trace:

  1. ├─trias::climate_match(...)
  2. │ └─... %>% ... at trias/R/climate_match.R:491:4
  3. ├─leaflet::addPolygons(., data = sea, fillColor = "#e0e0e0", weight = 0.5)
  4. │ ├─... %>% expandLimitsBbox(pgons)
  5. │ └─leaflet::invokeMethod(...)
  6. │ └─leaflet::dispatch(...)
  7. ├─leaflet::expandLimitsBbox(., pgons)
  8. │ └─leaflet::expandLimits(map, bbox[2, ], bbox[1, ])
  9. ├─leaflet::addLayersControl(., baseGroups = ~.data$acceptedScientificName)
  10. │ ├─leaflet::invokeMethod(...)
  11. │ │ └─crosstalk::is.SharedData(data)
  12. │ └─leaflet::getMapData(map)
  13. ├─leaflet::addLegend(...)
  14. │ ├─leaflet::invokeMethod(map, data, "addLegend", legend)
  15. │ │ └─crosstalk::is.SharedData(data)
  16. │ └─leaflet::getMapData(map)
  17. ├─leaflet::addLegend(...)
  18. │ ├─leaflet::invokeMethod(map, data, "addLegend", legend)
  19. │ │ └─crosstalk::is.SharedData(data)
  20. │ └─leaflet::getMapData(map)
  21. ├─leaflet::addCircleMarkers(...)
  22. │ ├─... %>% expandLimits(pts$lat, pts$lng)
  23. │ └─leaflet::invokeMethod(...)
  24. │ └─leaflet::evalFormula(list(...), data)
  25. │ └─leaflet evalAll(list)
  26. │ └─base::lapply(x, evalAll)
  27. │ └─leaflet FUN(X[[i]], ...)
  28. │ └─leaflet:::resolveFormula(x, data)
  29. │ └─base::eval(f[[2]], metaData(data), environment(f))
  30. │ └─base::eval(f[[2]], metaData(data), environment(f))
  31. │ ├─acceptedScientificName
  32. │ └─rlang:::$.rlang_fake_data_pronoun(.data, acceptedScientificName) #=> suspected cause of error
  33. │ └─rlang:::stop_fake_data_subset(call)
  34. │ └─rlang::abort(...)
  35. └─leaflet::expandLimits(., pts$lat, pts$lng)
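The last frames point at the cause: the .data pronoun only exists inside tidyverse data-masking verbs, while leaflet resolves its formulas with base eval(), where subsetting the fake .data pronoun aborts. A hedged fix would be to drop the pronoun and reference the column directly in the leaflet formulas of climate_match(), e.g.:

# instead of baseGroups = ~.data$acceptedScientificName
leaflet::addLayersControl(map, baseGroups = ~acceptedScientificName)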

How verify_taxa() handles taxa which don't need manual verification

Four options:

  1. verify_taxa() doesn't accept taxa with bb_taxonomicStatus equal to ACCEPTED or DOUBTFUL and throws an error. In other words, the function doesn't accept taxa which don't need verification.
  2. verify_taxa() throws a warning saying: n taxa do not need verification. They will be removed. Taxon keys: followed by a list of the related taxon keys (sketched below).
  3. verify_taxa() doesn't throw any warning. It just filters these taxa out.
  4. verify_taxa() also adds these taxa to the output, with verification_key = bb_taxonKey.
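For option 2, the warning could look like this (a sketch; the exact wording is up for discussion):

warning(sprintf(
  "%d taxa do not need verification. They will be removed. Taxon keys: %s",
  length(keys_to_remove), paste(keys_to_remove, collapse = ", ")
))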

I would like to ask you, @peterdesmet: which option do you prefer?
I implemented option 2 in order to be flexible but at the same time informative.
Let me know your opinion. Thanks.

problems with installation of trias package

While trying to install the trias package, I got this error message in return:

ERROR: dependencies 'dplyr', 'readr', 'rgbif', 'rlang', 'tibble', 'tidyr', 'tidyselect' are not available for package 'trias' removing 'C:/R/Library/trias' In R CMD INSTALL

I used this code to install the package:

devtools::install_github("trias-project/trias")
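A likely workaround is to install the missing dependencies from CRAN first, then retry:

install.packages(c("dplyr", "readr", "rgbif", "rlang", "tibble", "tidyr", "tidyselect"))
devtools::install_github("trias-project/trias")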

verify_taxa: error

When running this:

verification <- trias::verify_taxa(taxa, verification)

I get:

Check input dataframes...DONE.
Column verificationKey already exists. It will be overwritten.
Assign verificationKey to taxa which don't need verification...DONE.
Find new synonyms...DONE.
Find new unmatched taxa...DONE.
Update backbone scientific names...DONE.
Update backbone accepted names...DONE.
Retrieve backbone info about accepted taxa for synonyms...DONE.
Detect outdated data...DONE.
Check verification keys...
Quitting from lines 466-467 (unified-checklist.Rmd) 
Error: Column names `7820753`, `2287073`, `2702816`, `9393876`, `2702820`, and 53 more must not be duplicated.
Use .name_repair to specify repair.
Backtrace:
     █
  1. ├─rmarkdown::render_site(encoding = "UTF-8")
  2. │ └─generator$render(...)
  3. │   ├─xfun::in_dir(...)
  4. │   └─bookdown:::render_book_script(output_format, envir, quiet)
  5. │     └─bookdown::render_book(...)
  6. │       └─bookdown:::render_cur_session(...)
  7. │         └─rmarkdown::render(main, output_format, ..., clean = clean, envir = envir)
  8. │           └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
  9. │             └─knitr:::process_file(text, output)
 10. │               ├─base::withCallingHandlers(...)
 11. │               ├─knitr:::process_group(group)
 12. │               └─knitr:::process_group.block(group)
 13. │                 └─knitr:::call_block(x)
 14. │                   └─knitr:::block_exec(params)
 15. │                     ├─knitr:::in_dir(...)
 16. │                     └─knitr:::evaluate(...)
 17. │                       └─evaluate::evaluate(...)
 18. │                         └─evaluate:::evaluate_call(...)
 19. │                           ├─evaluate:::timing_fn(...)
 20. │                           ├─base:::handle(...)
 21. │                           ├─base::withCallingHandlers(...)
 22. │                           ├─base::withVisible(eval(expr, envir, enclos))
 23. │                           └─base::eval(expr, envir, enclos)
 24. │                             └─base::eval(expr, envir, enclos)
 25. └─trias::verify_taxa(taxa, verification)
 26.   └─trias::gbif_verify_keys(verification_keys)
 27.     └─purrr::map_df(gbif_info, ~is.character(.) == FALSE)
 28.       └─dplyr::bind_rows(res, .id = .id)
 29.         ├─tibble::as_tibble(dots)
 30.         └─tibble:::as_tibble.list(dots)
 31.           └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 32.             └─tibble:::set_repaired_names(x, repair_hint = TRUE, .name_repair)
 33.               ├─rlang::set_names(...)
 34.               └─tibble:::repaired_names(...)
 35.                 ├─tibble:::subclass_name_repair_errors(...)
 36.                 │ └─base::withCallingHandlers(...)
 37.                 └─vctrs::vec_as_names(...)
 38.                   └─(function () ...
 39.                     └─vctrs:::validate_unique(names = names, arg = arg)
 40.                       └─vctrs:::stop_names_must_be_unique(names, arg)
 41.                         └─vctrs:::stop_names(...)
 42.                           └─vctrs:::stop_vctrs(class = c(class, "vctrs_error_names"), ...)
Warning messages:
1: `progress_estimated()` was deprecated in dplyr 1.0.0. 
2: `progress_estimated()` was deprecated in dplyr 1.0.0. 
Execution halted

Exited with status 1.
Session info ───────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       macOS Mojave 10.14.6        
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Brussels             
 date     2021-03-26

Packages ───────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                              
 assertable    0.2.8   2021-01-27 [1] CRAN (R 4.0.2)                      
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.2)                      
 backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.2)                      
 bookdown      0.21    2020-10-13 [1] CRAN (R 4.0.2)                      
 broom         0.7.5   2021-02-19 [1] CRAN (R 4.0.2)                      
 cachem        1.0.4   2021-02-13 [1] CRAN (R 4.0.2)                      
 callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.2)                      
 cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.0.2)                      
 cli           2.3.1   2021-02-23 [1] CRAN (R 4.0.2)                      
 clisymbols    1.2.0   2017-05-21 [1] CRAN (R 4.0.2)                      
 colorspace    2.0-0   2020-11-11 [1] CRAN (R 4.0.2)                      
 conditionz    0.1.0   2019-04-24 [1] CRAN (R 4.0.2)                      
 cowplot       1.1.1   2020-12-30 [1] CRAN (R 4.0.2)                      
 crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)                      
 crul          1.1.0   2021-02-15 [1] CRAN (R 4.0.2)                      
 curl          4.3     2019-12-02 [1] CRAN (R 4.0.1)                      
 data.table    1.14.0  2021-02-21 [1] CRAN (R 4.0.2)                      
 DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.2)                      
 dbplyr        2.1.0   2021-02-03 [1] CRAN (R 4.0.2)                      
 desc          1.3.0   2021-03-05 [1] CRAN (R 4.0.2)                      
 devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.2)                      
 digest      * 0.6.27  2020-10-24 [1] CRAN (R 4.0.2)                      
 dplyr       * 1.0.5   2021-03-05 [1] CRAN (R 4.0.2)                      
 egg           0.4.5   2019-07-13 [1] CRAN (R 4.0.2)                      
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.2)                      
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.1)                      
 fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)                      
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.2)                      
 forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.0.2)                      
 fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)                      
 generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.2)                      
 ggplot2     * 3.3.3   2020-12-30 [1] CRAN (R 4.0.2)                      
 glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                      
 gratia        0.5.1   2021-01-24 [1] CRAN (R 4.0.2)                      
 gridExtra     2.3     2017-09-09 [1] CRAN (R 4.0.2)                      
 gtable        0.3.0   2019-03-25 [1] CRAN (R 4.0.2)                      
 haven         2.3.1   2020-06-01 [1] CRAN (R 4.0.2)                      
 here        * 1.0.1   2020-12-13 [1] CRAN (R 4.0.2)                      
 hms           1.0.0   2021-01-13 [1] CRAN (R 4.0.2)                      
 htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)                      
 htmlwidgets   1.5.3   2020-12-10 [1] CRAN (R 4.0.2)                      
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.0.2)                      
 httr          1.4.2   2020-07-20 [1] CRAN (R 4.0.2)                      
 janitor     * 2.1.0   2021-01-05 [1] CRAN (R 4.0.2)                      
 jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.2)                      
 knitr         1.31    2021-01-27 [1] CRAN (R 4.0.2)                      
 lattice       0.20-41 2020-04-02 [1] CRAN (R 4.0.2)                      
 lazyeval      0.2.2   2019-03-15 [1] CRAN (R 4.0.2)                      
 lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)                      
 lubridate     1.7.10  2021-02-26 [1] CRAN (R 4.0.2)                      
 magrittr    * 2.0.1   2020-11-17 [1] CRAN (R 4.0.2)                      
 Matrix        1.3-2   2021-01-06 [1] CRAN (R 4.0.2)                      
 memoise       2.0.0   2021-01-26 [1] CRAN (R 4.0.2)                      
 mgcv          1.8-34  2021-02-16 [1] CRAN (R 4.0.2)                      
 modelr        0.1.8   2020-05-19 [1] CRAN (R 4.0.2)                      
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.0.2)                      
 mvnfast       0.2.5.1 2020-10-14 [1] CRAN (R 4.0.2)                      
 nlme          3.1-152 2021-02-04 [1] CRAN (R 4.0.2)                      
 oai           0.3.0   2019-09-07 [1] CRAN (R 4.0.2)                      
 openxlsx    * 4.2.3   2020-10-27 [1] CRAN (R 4.0.2)                      
 patchwork     1.1.1   2020-12-17 [1] CRAN (R 4.0.2)                      
 pillar        1.5.1   2021-03-05 [1] CRAN (R 4.0.2)                      
 pkgbuild      1.2.0   2020-12-15 [1] CRAN (R 4.0.2)                      
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.2)                      
 pkgload       1.2.0   2021-02-23 [1] CRAN (R 4.0.2)                      
 plotly        4.9.3   2021-01-10 [1] CRAN (R 4.0.2)                      
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.0.2)                      
 prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.2)                      
 processx      3.5.0   2021-03-23 [1] CRAN (R 4.0.2)                      
 progress    * 1.2.2   2019-05-16 [1] CRAN (R 4.0.2)                      
 ps            1.6.0   2021-02-28 [1] CRAN (R 4.0.2)                      
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.0.2)                      
 R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)                      
 Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)                      
 readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.2)                      
 readxl        1.3.1   2019-03-13 [1] CRAN (R 4.0.2)                      
 remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)                      
 reprex        1.0.0   2021-01-27 [1] CRAN (R 4.0.2)                      
 reshape2      1.4.4   2020-04-09 [1] CRAN (R 4.0.2)                      
 rgbif       * 3.5.2   2021-01-27 [1] CRAN (R 4.0.2)                      
 rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.2)                      
 rmarkdown     2.7     2021-02-19 [1] CRAN (R 4.0.2)                      
 rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.2)                      
 rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.2)                      
 rvest         1.0.0   2021-03-09 [1] CRAN (R 4.0.2)                      
 scales        1.1.1   2020-05-11 [1] CRAN (R 4.0.2)                      
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)                      
 snakecase     0.11.0  2019-05-25 [1] CRAN (R 4.0.2)                      
 stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                      
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.0.2)                      
 testthat      3.0.2   2021-02-14 [1] CRAN (R 4.0.2)                      
 tibble      * 3.1.0   2021-02-25 [1] CRAN (R 4.0.2)                      
 tidylog     * 1.0.2   2020-07-03 [1] CRAN (R 4.0.2)                      
 tidyr       * 1.1.3   2021-03-03 [1] CRAN (R 4.0.2)                      
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.2)                      
 tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 4.0.2)                      
 trias       * 1.5.0   2021-03-25 [1] Github (trias-project/trias@82d1b61)
 triebeard     0.3.0   2016-08-04 [1] CRAN (R 4.0.2)                      
 urltools      1.7.3   2019-04-14 [1] CRAN (R 4.0.2)                      
 usethis       2.0.1   2021-02-10 [1] CRAN (R 4.0.2)                      
 utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.2)                      
 uuid          0.1-4   2020-02-26 [1] CRAN (R 4.0.2)                      
 vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.2)                      
 viridisLite   0.3.0   2018-02-01 [1] CRAN (R 4.0.1)                      
 wellknown     0.7.2   2021-01-07 [1] CRAN (R 4.0.2)                      
 whisker       0.4     2019-08-28 [1] CRAN (R 4.0.2)                      
 withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.2)                      
 wk            0.4.1   2021-03-16 [1] CRAN (R 4.0.2)                      
 xfun          0.22    2021-03-11 [1] CRAN (R 4.0.2)                      
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.0.2)                      
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.2)                      
 zip           2.1.1   2020-08-27 [1] CRAN (R 4.0.2)                      

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Verify taxa: split info$outdated_taxa in two df

I would like to list the outdated synonyms and the outdated unmatched taxa separately:

verification$info$outdated_taxa %>% filter(is.na(bb_key)) %>% nrow()          # outdated unmatched taxa
verification$info$outdated_taxa %>% filter(!is.na(bb_acceptedKey)) %>% nrow() # outdated synonyms

That however returns an error if verification$info$outdated_taxa is NULL (often the case). I would therefore prefer two dataframes, outdated_unmatched_taxa and outdated_synonyms:

verification$info$outdated_unmatched_taxa %>% nrow()
verification$info$outdated_synonyms %>% nrow()

Always return plot for apply_gam

Currently the returned plot is NULL when the emergence status cannot be assessed in trias::apply_gam().
For the alien species application they would like to still return a basic plot (observations only), but with a message that the emergence status cannot be assessed. Currently I've implemented a hacky solution on our side, but it might be nice if this could be incorporated into the trias package directly.

library(tibble)

df_gam <- tibble(
  taxonKey = rep(3003709, 24),
  canonicalName = rep("Rosa glauca", 24),
  year = seq(1995, 2018),
  obs = c(
    1, 1, 0, 0, 0, 2, 0, 0, 1, 3, 1, 2, 0, 5, 0, 5, 4, 2, 1,
    1, 3, 3, 8, 10
  ),
  cobs = rep(0, 24)
)

# apply GAM to obs without baseline as covariate
tmpResult <- apply_gam(df_gam,
  y_var = "obs",
  eval_years = 2018,
  taxon_key = 3003709,
  name = "Rosa glauca",
  baseline_var = "cobs",
  verbose = TRUE
)
tmpResult$plot

# Hacky solution to create the plot anyhow
library(ggplot2)

df <- tmpResult$output
df$lcl <- 10^11 # gam failed
trias:::plot_ribbon_em(df_plot = df, ptitle = "") +
  annotate("text",
    y = max(df$obs), x = max(df$year), hjust = 1, vjust = 1,
    label = "The emergence status \ncannot be assessed.", colour = "red"
  )

error produced by indicator_total_year()

In using the function indicator_total_year(), I experienced the following error:

Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent

After looking into the source code at https://github.com/trias-project/trias/blob/master/R/indicator_total_year.R and debugging, it seems that there might be a typo at line 93:

if (nrow(filter(df, is.na(first_observed)) > 0)) {

should be adjusted to

if (nrow(filter(df, is.na(first_observed))) > 0) {

(i.e., move the closing bracket so that > 0 compares the row count, instead of being applied inside nrow())

Related to inbo/reporting-rshiny-grofwildjacht#146

Could this be fixed?

Add function spread_with_duplicates

Create a function from functionality developed for the pipeline to spread a dataframe with duplicate entries:

taxon_key  type  description
1          A     R
1          B     S
1          C     T
1          C     X

With:

df %>% spread_with_duplicates(type, description)

To:

taxon_key  A  B  C
1          R  S  T
1          R  S  X

Note that the regular dplyr spread:

df %>% spread(type, description)

Would throw an error because of the duplicate entries.
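A possible implementation sketch, pinned to the example above (column names taxon_key, type, description, A, B, C hardcoded) and using tidyr::pivot_wider(), spread()'s successor: number the duplicates per key, pivot, then fill the gaps the duplicate rows leave behind.

library(dplyr)
library(tidyr)

df %>%
  group_by(taxon_key, type) %>%
  mutate(.dup = row_number()) %>%  # 1st, 2nd, ... occurrence of each type
  ungroup() %>%
  pivot_wider(names_from = type, values_from = description) %>%
  group_by(taxon_key) %>%
  fill(A, B, C, .direction = "down") %>%  # repeat values onto duplicate rows
  ungroup() %>%
  select(-.dup)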


Might be good to suggest this functionality to dplyr first? /cc @stijnvanhoey @damianooldoni

How to manage changes of synonym in verify_taxa()

While running the pipeline for the unified checklist I encountered this situation.
I have a taxon in the verification table (verified_taxa) from the Manual of Alien Plants: Oxalis stricta L. (key = 141266323), added on 2018-08-01. At that time it was recognized by the GBIF Backbone as Oxalis stricta L. (9823072) and linked by a synonym relation to Oxalis corniculata L. (8427624). However, I now get as input a data.frame checklist_taxa where the same taxon is linked to a new GBIF Backbone key (2891666) and a new synonym, Oxalis dillenii (2891677). Notice that the GBIF Backbone scientific name didn't change.

@peterdesmet: is it possible? I find it slightly strange that two GBIF Backbone keys (9823072 and 2891666) share the same scientific name. Are they not the same taxon? I notice that the most recent key has the (deprecated) nubKey value while the old one doesn't. As implemented so far, verify_taxa() doesn't explicitly take changes of synonym relations into account, so a second row is added linking to the new synonym. Is that ok? And what should happen with the old synonym relation? Should it be labelled as an unused taxon?

Order of the taxa returned by verify_taxa()

As already decided, the new implementation of verify_taxa(taxa, verified_taxa) updates and returns the two input dfs.
To do this, some join and row-binding operations are performed, so the order of the taxa is typically not the same as in the input dfs. I split the issue in two:

Order of taxa in taxa

I propose to return the taxa in the same order as in the input df, for better comparison.

Order of taxa in verified_taxa

I would put outdated taxa (taxa that no longer need to be checked) at the very end. Should we order them by date, the newest ones first? Maybe a detail, but still...

Update gbif_has_distribution() to allow NOT filters

@damianooldoni, see trias-project/unified-checklist#37: we want to update the gbif_has_distribution filter used in the unified checklist so that we can filter:

trias::gbif_has_distribution(
  taxon_key = x,
  country = "BE",
  establishmentMeans = c("INTRODUCED", "NATURALISED", "INVASIVE", "ASSISTED COLONISATION"),
  status != c("ABSENT", "EXCLUDED", "DOUBTFUL")  # desired NOT filter; not valid R syntax yet
)

I.e., where the status filter does NOT include ABSENT, EXCLUDED, or DOUBTFUL. I am not sure how to pass such a parameter. It would be cool if we could pass a dplyr selection, but I'm not sure how to do that.

Maybe we should drop the use of the gbif_has_distribution() function, get all distributions, and then use dplyr filters to select those taxa that match what we need.
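A sketch of that alternative (the data = "distribution" accessor and the column names of the returned data.frame are assumptions):

library(dplyr)

dist <- rgbif::name_usage(key = x, data = "distribution")$data
dist %>%
  filter(
    country == "BE",
    establishmentMeans %in% c("INTRODUCED", "NATURALISED", "INVASIVE", "ASSISTED COLONISATION"),
    !status %in% c("ABSENT", "EXCLUDED", "DOUBTFUL")
  ) %>%
  nrow() > 0  # TRUE if at least one matching distribution remains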

Refactor climate match function: use help-subfunctions

From #77:
The climate match function is way too long. Splitting it into subfunctions will help readability and debugging: future-you, the maintainer (that's me 😮) and anyone else trying to understand the workflow will be grateful. Notice that these subfunctions can be written in the same R file, just appended below the "main" function. For example, the leaflet part is clearly something you can put in a subfunction, isn't it?

What should the ancillary df duplicates_taxa contain?

In the old version of verify_taxa() the df output duplicates_taxa contained taxa from different checklists with the same scientific name.
In the new verify_taxa(), I propose to use duplicates_taxa to collect all taxa which point to the same bb_key - bb_acceptedKey pair.

Example. The taxa below would be in duplicates_taxa:

taxonKey  scientificName  bb_key  bb_acceptedKey
1         A               10      15
2         B               10      15
3         C               10      15

Drawback of this implementation: taxa without a match to the GBIF Backbone would not be included. So the following two taxa, which are very likely the same species, would not be returned in duplicates_taxa:

taxonKey  scientificName  bb_key  bb_acceptedKey
4         E               NA      NA
5         E               NA      NA

I can live with that 😄 @peterdesmet: you too? It is actually a question about the meaning we want to give to the expression "duplicate taxa". As we now use a key triplet (taxonKey - bb_key - bb_acceptedKey) to identify unique taxa instead of names, I think using bb_key - bb_acceptedKey seems the best option.

Enable custom axis labs for indicators

I've been trying to translate the axis labels of the graphs produced by the function indicator_introduction_year.R.
But since the function exports the graph as a "Large egg" object, I've been unable to do it.
If the axis labels could be passed as function arguments, with English as the default, I and others would be able to translate the graphs into any language required.

This could be useful for other indicators as well.
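The requested interface could look like this (parameter names are hypothetical):

indicator_introduction_year(
  df,
  x_lab = "Jaar van introductie",        # default: "Year of introduction"
  y_lab = "Aantal geïntroduceerde taxa"  # default: "Number of introduced taxa"
)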

How to make has_distribution work for data frames

I hoped I could do this:

checklist_taxa %>%
  select(key) %>%
  head() %>%
  rowwise() %>%
  mutate(distribution = has_distribution(
    key,
    countryCode = "BE",
    establishmentMeans = "INTRODUCED",
    status = "PRESENT"
  ))

Where I basically call has_distribution() for each row. But I get:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Strings must match column names. Unknown columns: country, establishmentMeans, status.
In addition: Warning message:
In has_distribution(key, countryCode = "BE", establishmentMeans = "INTRODUCED",  :
  countryCode renamed to country

So it seems the function was called (otherwise I wouldn't get the Warning message), but dplyr interprets the function parameters as df columns?

Note: this functionality works for other functions (with and without rowwise()):

checklist_taxa %>%
  select(key) %>%
  head() %>%
#  rowwise() %>%
  mutate(distribution = lubridate::parse_date_time(key, orders = "dmy"))

It doesn't return any useful data, but orders is considered a parameter, and not a column.

@stijnvanhoey @damianooldoni Any idea how to enable dplyr compatibility for has_distribution?

My actual code can be found here
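One thing worth testing (uncertain whether it helps, since the error may originate inside has_distribution() itself rather than in dplyr's data mask): apply the function per element with purrr instead of rowwise():

library(dplyr)
library(purrr)

checklist_taxa %>%
  select(key) %>%
  head() %>%
  mutate(distribution = map_lgl(
    key,
    ~ has_distribution(.x, countryCode = "BE",
                       establishmentMeans = "INTRODUCED",
                       status = "PRESENT")
  ))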

Error when filtering data for indicator_total_year

When making the natuurindicatoren we want to use only the data from the Flemish region, i.e. apply this filter: data <- data %>% filter(locality == "Flemish Region") before using the indicator_total_year() function from the trias@add_functions_checklist_indicators branch.

This however prompts an error message.

Please fix this.

Write tests for indicator_* functions

Write basic unit tests for the indicator_introduction_year() and indicator_total_year() functions:

  • test inputs
  • test outputs

Testing the outputs might not really be possible, as these functions return plots.
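A possible starting point for the input tests with testthat (assuming the functions validate their df argument; plot outputs could later be covered with snapshot-based tools such as vdiffr):

library(testthat)

test_that("indicator functions require a data.frame", {
  expect_error(indicator_introduction_year(df = "not a data.frame"))
  expect_error(indicator_total_year(df = 3))
})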

Verify taxa: return correct taxa df if input already contains verificationKey

If the input taxa already contains a verificationKey, then the function works but returns a taxa df with more (duplicated) columns:

# ... with 6,097 more rows, and 21 more variables:
#   bb_kingdom <chr>, bb_rank <chr>, bb_taxonomicStatus <chr>,
#   bb_acceptedKey <dbl>, bb_acceptedName <chr>,
#   verificationKey.x.x <chr>, verificationKey.x <lgl>,
#   taxonID <chr>, nameType <chr>, issues <chr>,
#   validDistribution <lgl>, bb_species <chr>, bb_genus <chr>,
#   bb_family <chr>, bb_order <chr>, bb_class <chr>,
#   bb_phylum <chr>, bb_speciesKey <dbl>,
#   verificationKey.y <lgl>, verificationKey.y.y <chr>,
#   verificationKey <chr>

Ideally, the df is returned exactly like the input taxa, with the verificationKey column updated in place or appended at the end.
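A plausible fix inside verify_taxa() (not verified against the code): drop any pre-existing verificationKey before the joins, so the .x/.y suffixes are never generated:

taxa <- dplyr::select(taxa, -dplyr::any_of("verificationKey"))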

verify_taxa: also verify genus and infraspecific taxa

Update columns

  • bb_acceptedKey → bb_suggestedKey
  • bb_acceptedName → bb_suggestedName
  • bb_acceptedKingdom → bb_suggestedKingdom
  • bb_acceptedRank → bb_suggestedRank
  • bb_acceptedTaxonomicStatus → bb_suggestedTaxonomicStatus

Processing

Not in backbone

Same as before

  1. Leave suggested_ fields empty

Genera

New

  1. Leave suggested_ fields empty
  2. If need be, verifier can add multiple species keys to verifiedKey

Infraspecific ranks

New

  1. For all taxonomicStatus!!
  2. Lookup info via speciesKey*
  3. Populate suggested_ fields with species parent
  • Note: unfortunately not all SYNONYM or even ACCEPTED taxa have this, e.g. https://api.gbif.org/v1/species/7707872 (maybe due to NAME_PARENT_MISMATCH). We will have to manually add the correct species keys in verificationKey for those.

Synonyms species

Same as before

  1. Lookup info via acceptedKey
  2. Populate suggested_ fields with the accepted taxon

Accepted species

Same as before

Add option "origin" to get_taxa()

Problem described at ropensci/rgbif#288

If tackled at rgbif (preferred)

  1. Add option origin to get_taxa()
  2. Directly transfer to name_usage() to filter

If not tackled at rgbif

  1. Add option origin to get_taxa()
  2. Get results
  3. Filter the data on origin
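A sketch of step 3 of the fallback (the .data/.env pronouns disambiguate the column from the parameter):

# taxa: data.frame retrieved from GBIF; origin: the new function parameter
taxa <- dplyr::filter(taxa, .data$origin %in% .env$origin)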

Expand climate matching function with single species maps

The idea is to expand the function from #73 with single_species_maps as a new item in the function's return list.
These maps should be made as a sublist per taxon key, each containing a leaflet with all scenarios as base groups.

So the final output of this expansion should be callable as output$single_species_maps$taxonkey_1, as sketched below.
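A rough sketch of how this could be built (the per-taxon list of scenario layers is a hypothetical intermediate object):

library(leaflet)
library(purrr)

single_species_maps <- map(taxon_keys, function(key) {
  layers <- scenario_layers[[as.character(key)]]  # hypothetical: one spatial layer per scenario
  m <- leaflet()
  for (scenario in names(layers)) {
    m <- addPolygons(m, data = layers[[scenario]], group = scenario)
  }
  addLayersControl(m, baseGroups = names(layers))
})
names(single_species_maps) <- paste0("taxonkey_", taxon_keys)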

Function has_distribution()

has_distribution <- function(taxon_key, ...) 

Parameters

  • taxon_key: single taxon key (numeric or character)

Optional parameters

  • These GBIF distribution properties and their API synonyms. They have to be passed as single values or as vectors:
    • countryCode = country
    • occurrenceStatus = status
    • establishmentMeans
  • If any property passed to the function does not exist: assert an error
  • Most of the distribution parameters can take many inputs, which are treated as 'OR' (e.g., a or b or c)

Example:

has_distribution(134086855,
  countryCode = c("BE", "NL", "LU"),
  establishmentMeans = "INTRODUCED",
  status = c("PRESENT", "DOUBTFUL")
)

Return

  • A logical (TRUE or FALSE)

From the example above, return TRUE if the taxon 134086855 has at least one distribution with:

  1. BE, NL OR LU in field countryCode AND
  2. INTRODUCED in field establishmentMeans AND
  3. PRESENT OR DOUBTFUL in field occurrenceStatus

Documentation

  • Document all of the above as succinctly as possible with roxygen
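Putting this together, a compact sketch of the described behaviour. The call that fetches the distributions and the field names of the returned data.frame are assumptions based on the GBIF distribution schema:

has_distribution <- function(taxon_key, ...) {
  filters <- list(...)
  # Map the API synonyms onto the distribution field names
  names(filters)[names(filters) == "countryCode"] <- "country"
  names(filters)[names(filters) == "occurrenceStatus"] <- "status"
  # Fetch the distributions (accessor assumed)
  dist <- rgbif::name_usage(key = taxon_key, data = "distribution")$data
  if (is.null(dist) || nrow(dist) == 0) return(FALSE)
  # Assert error if any passed property does not exist
  assertthat::assert_that(all(names(filters) %in% names(dist)))
  # OR within a property, AND across properties
  keep <- rep(TRUE, nrow(dist))
  for (property in names(filters)) {
    keep <- keep & dist[[property]] %in% filters[[property]]
  }
  any(keep)
}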

Rename functions

I would consider renaming the functions:

  • get_taxa() → get_gbif_taxa()
  • has_distribution() → has_gbif_distribution()

As we might have non GBIF related functions. For easier grouping, we could also consider:

  • gbif_get_taxa()
  • gbif_has_distribution()

@damianooldoni @stijnvanhoey thoughts?

Drop input_country from update_download_list + parse DOI

The gbif download list currently has a field input_country. However, there are more filters that we might set than countries, e.g. not accepting occurrences with issues or searching within a specific date range. Because:

  1. We can't capture all of this in the file
  2. Just listing countries gives a false impression of the actual filters
  3. The download page itself lists all filters

I would drop countries from the function as a parameter and the output. I would also drop the column from the tsv file.

Note 1: simply not providing the parameter is not an option, as it is required.

Note 2: the specific taxa list we used is still very useful though, so input_checklist should be kept.


To provide a better link to the download page, I would parse the gbif_download_doi within the function and prepend it with https://doi.org/ (so 10.15468/dl.6cljf9 becomes https://doi.org/10.15468/dl.6cljf9). This only needs to be done for new lines (there is no need to check this for all previous downloads).
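The parsing itself is a simple prefix operation, e.g.:

# Prepend the DOI resolver for newly added rows only
gbif_download_doi <- paste0("https://doi.org/", gbif_download_doi)
# "10.15468/dl.6cljf9" becomes "https://doi.org/10.15468/dl.6cljf9"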

Failed to install 'trias' from GitHub

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes        

I'm having problems installing the trias package. Any idea what I should do?

The error message is...

package ‘rlang’ successfully unpacked and MD5 sums checked
Error: Failed to install 'trias' from GitHub:
  (converted from warning) cannot remove prior installation of package ‘rlang’
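This usually means the already-loaded rlang blocks its own upgrade. A workaround worth trying (hedged, not verified here): restart R without loading any packages, update rlang on its own, then retry:

install.packages("rlang")
devtools::install_github("trias-project/trias")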

Error when using Trias to create new Indicators

When rendering the new IAS pathways indicator for the INBO website (see #34), I get the following error message:

Error: package or namespace load failed for 'trias':
 object 'all_of' is not exported by 'namespace:dplyr'

I think this indicates that the renv environment used by the indicators has an earlier version of dplyr (0.8.5), which does not include the all_of() function. Can someone look into the minimum dplyr version required to use this function, and maybe add it to the DESCRIPTION file?
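If dplyr 1.0.0 turns out to be the minimum required version (all_of() is available from the dplyr 1.0 series onward), the fix would be a versioned entry in DESCRIPTION:

Imports:
    dplyr (>= 1.0.0)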

Write functions for pathways graphs

After a video meeting with @timadriaens, I will write the following functions:

  1. visualize_pathways_level1(df, category = NULL): returns a bar graph. X: pathway_level1, Y: number of introduced taxa. As for the get_table_pathways() function producing tables, we can specify the category: one of c("Plantae", "Animalia", "Fungi", "Chromista", "Archaea", "Bacteria", "Protozoa", "Viruses", "incertae sedis", "Chordata", "Not Chordata")
  2. visualize_pathways_level2(df, pathway_level1, category = NULL): returns a bar graph. X: pathway_level2, Y: number of introduced taxa. pathway_level1 is one of c("contaminant", "escape", "release", "corridor", "natural_dispersal", "unaided", "stowaway", "unknown")
  3. visualize_pathways_year_level1(df, bin = 10, cut_off = 1950, category = NULL): returns a line plot. X: year, grouped in bins of width bin (default 10 years) starting from cut_off. All taxa introduced before the cut_off year (default 1950) are counted together. Y: number of introduced taxa
  4. visualize_pathways_year_level2(df, pathway_level1, bin = 10, cut_off = 1950, category = NULL): returns a line plot as in 3. pathway_level1 must be specified as in 2.
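For reference, typical calls would look like:

visualize_pathways_level1(df, category = "Plantae")
visualize_pathways_level2(df, pathway_level1 = "escape", category = "Animalia")
visualize_pathways_year_level1(df, bin = 10, cut_off = 1950)
visualize_pathways_year_level2(df, pathway_level1 = "contaminant", bin = 5)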

Reminders:

  • Set a warning if taxa are removed due to missing info about the year of introduction (similar to the indicator_*_year() functions)
  • Write unit tests
  • Check whether flipping the bar plots, as suggested by @peterdesmet in trias-project/indicators#75 (comment), helps readability.
