This package has been archived. The former README is now in README-not.<
ropensci-archive / finch
:warning: ARCHIVED :warning: Read Darwin Core Archive files
License: Other
Hi,
Links to DWC-A files shared with the integrated publishing toolkit usually look like this:
http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12
However, dwca_read() uses the provided url to get the name of the file where the data will be stored:
basename("http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12")
[1] "archive.do?r=redlist_2013_taxons&v=3.12"
This doesn't work well as R can't find the path to the zip file later on:
Error in unzip(writepath, exdir = dirpath) :
cannot open file '/Users/gustavo/archive.do?r=lista_especies_flora_brasil&v=393.71/resourcerelationship.txt': Not a directory
I guess this could be solved by changing the basename to something else when the url isn't a direct link to a zip file.
Thanks!
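One way to approach this, sketched in base R: drop the query string before calling basename(), and fall back to a sanitised name when the result is not a zip file. The helper name safe_archive_name() is hypothetical and is not part of finch.

```r
# Hypothetical helper (not part of finch): derive a usable local file name
# from a URL that may or may not point directly at a zip file.
safe_archive_name <- function(url) {
  # Drop any query string or fragment before taking the basename.
  name <- basename(sub("[?#].*$", "", url))
  if (!grepl("\\.zip$", name, ignore.case = TRUE)) {
    # Not a direct link to a zip file: build a name from the whole URL,
    # replacing every run of non-alphanumeric characters with "_".
    stem <- gsub("[^A-Za-z0-9]+", "_", sub("^https?://", "", url))
    name <- paste0(stem, ".zip")
  }
  name
}

safe_archive_name("http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12")
```

The resulting name contains no "?" or "/" characters, so it can be used both as a file name and as the stem of the directory the archive is unzipped into.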
Related to #9. We need to figure out the option that will work in as many cases as possible.
Users' machines vary in RAM, so the choice should be left to users, but helping to prevent the session from crashing would be good.
Maybe warn with a prompt if the dataset is over a certain size? Maybe that's too much.
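A minimal sketch of the size check floated above, in base R. The function name check_archive_size() and the 500 MB default are assumptions made for illustration; nothing here is existing finch behaviour.

```r
# Hypothetical helper (not part of finch): warn before reading a large archive.
# Returns TRUE when the file is under the threshold and FALSE (with a warning)
# when it is over, so the caller can decide whether to continue.
check_archive_size <- function(path, max_mb = 500) {
  size_mb <- file.size(path) / 1024^2
  if (is.na(size_mb)) stop("File not found: ", path)
  if (size_mb > max_mb) {
    warning(sprintf("%s is %.1f MB, which is over the %.1f MB threshold",
                    basename(path), size_mb, max_mb))
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

Returning a flag rather than stopping leaves the decision with the user, which matches the point that machines differ in RAM. Note that the size of a zip file understates the memory needed once the data is uncompressed and parsed.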
Hi, I'm having trouble parsing a DarwinCore file, eml.xml, using the finch package with the code below:
file <- simple_read("eml.xml")
This brings up the error message "Error: no parser for eml". The file eml.xml is in the current file path.
Thanks a lot for any suggestions as to what's going wrong.
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] finch_0.1.0 traits_0.2.0 data.table_1.10.4 vegan_2.4-3
[5] lattice_0.20-35 permute_0.9-4 reshape2_1.4.2 rgeos_0.3-23
[9] plyr_1.8.4 bindrcpp_0.2 purrr_0.2.2.2 robis_0.1.8
[13] wellknown_0.1.0 stringr_1.2.0 stringi_1.1.5 dplyr_0.7.2
[17] jsonlite_1.5 httr_1.2.1
loaded via a namespace (and not attached):
[1] taxize_0.8.9 htmltools_0.3.6 yaml_2.1.14 mgcv_1.8-17
[5] base64enc_0.1-3 rlang_0.1.1 glue_1.1.1 sp_1.2-5
[9] uuid_0.1-2 foreach_1.4.3 bindr_0.1 rvest_0.3.2
[13] htmlwidgets_0.9 codetools_0.2-15 evaluate_0.10.1 knitr_1.16
[17] httpuv_1.3.5 crosstalk_1.0.0 curl_2.7 parallel_3.4.1
[21] Rcpp_0.12.12 xtable_1.8-2 readr_1.1.1 backports_1.1.0
[25] leaflet_1.1.0 mime_0.5 hms_0.3 digest_0.6.12
[29] shiny_1.0.3 grid_3.4.1 rprojroot_1.2 tools_3.4.1
[33] magrittr_1.5 tibble_1.3.3 EML_1.0.3 crul_0.3.8
[37] bold_0.5.0 cluster_2.0.6 ape_4.1 pkgconfig_2.0.1
[41] MASS_7.3-47 Matrix_1.2-10 xml2_1.1.1 iterators_1.0.8
[45] reshape_0.8.7 assertthat_0.2.0 rmarkdown_1.6 R6_2.2.2
[49] nlme_3.1-131 compiler_3.4.1
--
> devtools::session_info()
Session info ----------------------------------------------------------
setting  value
version  R version 3.4.1 (2017-06-30)
system   x86_64, mingw32
ui       RStudio (1.0.143)
language (EN)
collate  English_United States.1252
tz       Europe/Paris
date     2017-07-18

Packages --------------------------------------------------------------
package    * version date       source
base       * 3.4.1   2017-06-30 local
compiler     3.4.1   2017-06-30 local
data.table   1.10.4  2017-02-01 CRAN (R 3.4.0)
datasets   * 3.4.1   2017-06-30 local
devtools   * 1.13.2  2017-06-02 CRAN (R 3.4.1)
digest       0.6.12  2017-01-27 CRAN (R 3.4.1)
EML          1.0.3   2017-05-01 CRAN (R 3.4.1)
finch      * 0.1.0   2016-12-23 CRAN (R 3.4.1)
graphics   * 3.4.1   2017-06-30 local
grDevices  * 3.4.1   2017-06-30 local
memoise      1.1.0   2017-04-21 CRAN (R 3.4.1)
methods    * 3.4.1   2017-06-30 local
plyr         1.8.4   2016-06-08 CRAN (R 3.4.1)
rappdirs     0.3.1   2016-03-28 CRAN (R 3.4.1)
Rcpp         0.12.12 2017-07-15 CRAN (R 3.4.1)
stats      * 3.4.1   2017-06-30 local
tools        3.4.1   2017-06-30 local
utils      * 3.4.1   2017-06-30 local
uuid         0.1-2   2015-07-28 CRAN (R 3.4.0)
withr        1.0.2   2016-06-20 CRAN (R 3.4.1)
xml2         1.1.1   2017-01-24 CRAN (R 3.4.1)
--
I get the following error:
file <- "http://ipt.vliz.be/eurobis/archive.do?r=nbn_ga000467&v=1.1"
out <- dwca_read(file, read = TRUE)
File in cache
Error in if (grepl("<|>", x)) { : argument is of length zero
but the following works fine:
file <- "dwca-nbn_ga000467-v1.1.zip"
out <- dwca_read(file, read = TRUE)
For example, https://travis-ci.org/ropensci/finch#L1526 reports: Error : object ‘eml_read’ is not exported by 'namespace:EML'
Check which functions EML exports and make fixes.
After a year of working with GBIF data in R and repeatedly running into problems importing occurrence text files correctly, I ended up writing a gist collecting most of the column types I had problems with: type_GBIF_occurrence_fields.R. I discussed with colleagues whether it would be useful to put it in our project package. But, as suggested here trias-project/trias#25 (comment), why not pitch it to the authors of finch? 👍
The typical issue when opening such files is that some DwC fields (columns) are NA for thousands of rows before the first real value appears. This causes parsing failures because R assigns type logical to these fields (columns). My first solution was to increase the value of the guess_max parameter, but this is infeasible for big files, and it is only a workaround.
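A minimal sketch of the idea, assuming the goal is to declare types for known Darwin Core terms up front instead of guessing them from the first rows. It uses base R's read.delim() rather than the reader finch uses, and the names gbif_col_classes and read_occurrence() are hypothetical; the three terms are only an illustrative subset, not the contents of the gist.

```r
# Illustrative subset of Darwin Core terms and the type each should be read as.
gbif_col_classes <- c(
  occurrenceID    = "character",
  recordNumber    = "character",
  individualCount = "integer"
)

# Hypothetical helper (not part of finch): read a tab-delimited occurrence
# file with explicit types for the known terms that are present.
read_occurrence <- function(path, col_classes = gbif_col_classes) {
  # Read one row just to learn which columns the file contains.
  header <- names(utils::read.delim(path, nrows = 1, check.names = FALSE))
  # NA means "let R guess"; known terms get their declared type.
  classes <- rep(NA_character_, length(header))
  hit <- match(names(col_classes), header, nomatch = 0L)
  classes[hit[hit > 0L]] <- col_classes[hit > 0L]
  utils::read.delim(path, colClasses = classes, quote = "",
                    na.strings = c("", "NA"), check.names = FALSE,
                    stringsAsFactors = FALSE)
}
```

Because the types are fixed before any data is read, a column that is empty for its first thousand rows is still read as character or integer, and no guess_max-style tuning is needed.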
I am looking into validating Darwin Core "taxon" class data in R, and want to make sure I don't reinvent the wheel. I see {finch} has a validator function, but it apparently just passes the zip off to https://tools.gbif.org/dwca-validator/ then returns only some very basic statistics (number of records) and a URL to the gbif dwca validator results, where all the juicy info is.
It would be great if the validation results were actually output as a list or (even better) dataframe.
Do the developers of finch have any plans for this sort of functionality or know of any other existing packages that do such a thing? Thanks!
I set up a GBIF query for all records in the genus "Quercus". This file is fairly large, and that may be part of the problem. I ran the function dwca_read on a locally downloaded version of the file that I have linked to below. I also verified that the examples included in the help file of dwca_read worked, and with those datasets I was able to load the occurrence.txt data.frame.
library(finch)
file <- "http://api.gbif.org/v1/occurrence/download/request/0020631-151016162008034.zip"
out <- dwca_read(file, read = TRUE)
#Read 0.0% of 1027888 rowss
out$data$occurrence.txt
#data frame with 0 columns and 0 rows
I tried to install the package but ran into some difficulties. This is the error I get; I think it might have something to do with my R version.
install.packages("finch")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/hanie/OneDrive/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/finch_0.4.0.zip'
Content type 'application/zip' length 302409 bytes (295 KB)
downloaded 295 KB
Also tried
install_github("ropensci/finch")
Error in install_github("ropensci/finch") :
could not find function "install_github"
Can you help please?
This package needs more documentation! Help out the community by contributing a vignette. If you don't know what a vignette is, check out http://r-pkgs.had.co.nz/vignettes.html for an introduction.
If you aren't sure how to contribute on GitHub, check out https://github.com/ropensci/finch/blob/master/.github/CONTRIBUTING.md
Keep in mind our code of conduct: https://github.com/ropensci/finch/blob/master/CONDUCT.md
Hi!
The ... arguments are not being passed on to fread in dwca_read. I've fixed this in my local repo, but it's probably easier for you to do it yourself here. Accents in some files aren't displayed correctly unless I pass encoding = "UTF-8" in dwca_read.
Thanks!
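The general pattern for forwarding extra arguments looks roughly like this. It is a sketch only: read_dwca_table() is a hypothetical wrapper, and it calls base R's read.delim() in place of fread so that it runs without extra packages.

```r
# Hypothetical wrapper (not finch's code): every argument supplied through
# `...` is handed on unchanged to the underlying reader.
read_dwca_table <- function(path, ...) {
  utils::read.delim(path, quote = "", stringsAsFactors = FALSE, ...)
}

# Write a tiny tab-delimited file and read it back, limiting the rows
# through an argument that only the underlying reader knows about.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\ta", "2\tb"), tmp)
first_row <- read_dwca_table(tmp, nrows = 1)
```

With the dots forwarded, a caller can pass reader options such as an encoding without the wrapper having to name each one.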
These all show check problems on the Debian check systems caused by
attempts to write to the user library to which all packages get
installed before checking (and which now is remounted read-only for
checking).
Having package code which is run as part of the checks and attempts to
write to the user library violates the CRAN Policy's
Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory (or during installation in the location
pointed to by TMPDIR: and such usage should be cleaned up).
Hence, please update your package(s) as quickly as possible to no longer
(attempt to) write to the user library (including, of course, the
location where the package itself was installed to).
Hi again,
I noticed that when using finch to download a bunch of DWC-A files my Sys.getenv("HOME") would get cluttered with them. It seems to me that the best places to write those files to would be either tempdir() or the working path if persistency is desired.
Thanks,
Gustavo.
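A sketch of that suggestion in base R, assuming a session-only location under tempdir() by default and a folder in the working directory when persistence is requested. The function name dwca_cache_dir() and the folder name "dwca_cache" are made up for illustration.

```r
# Hypothetical helper (not part of finch): choose where downloaded archives go.
dwca_cache_dir <- function(persistent = FALSE) {
  # tempdir() is cleaned up when the R session ends; the working directory
  # is used only when the caller explicitly asks for files to persist.
  base <- if (persistent) getwd() else tempdir()
  dir <- file.path(base, "dwca_cache")
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  dir
}

cache <- dwca_cache_dir()
```

Defaulting to tempdir() keeps the home directory uncluttered, and writing outside it becomes an explicit choice by the user.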
A few example files I got from https://code.google.com/p/darwincore/source/browse/trunk/2013-10-22/examples/xml/?r=1638 start with <dwr:DarwinRecordSet at the top. simple() doesn't parse these yet; it actually fails.
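To see which kind of file is being handed to a parser, the root element name can be read before dispatching. This is a rough base R sketch using regular expressions, not a real XML parser, and xml_root_name() is a hypothetical helper rather than a finch function.

```r
# Hypothetical helper (not part of finch): return the qualified name of the
# first element in an XML file, e.g. "dwr:DarwinRecordSet".
xml_root_name <- function(path) {
  txt <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # Remove the XML declaration, processing instructions and comments.
  txt <- gsub("(?s)<\\?.*?\\?>", "", txt, perl = TRUE)
  txt <- gsub("(?s)<!--.*?-->", "", txt, perl = TRUE)
  # The first tag that starts with a letter or underscore is the root element
  # (a DOCTYPE starts with "<!" and is therefore skipped).
  m <- regmatches(txt, regexpr("<[A-Za-z_][^\\s/>]*", txt, perl = TRUE))
  if (length(m) == 0) NA_character_ else sub("^<", "", m)
}
```

A caller could compare the returned name against the set of roots it knows how to parse and give a clearer message than a generic failure when the root is something else.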