finch's Introduction

Project Status: Abandoned

This package has been archived. The former README is now in README-not.<

finch's People

Contributors

cboettig, gustavobio, jeroen, maelle, sckott


finch's Issues

Reading files from the web

Hi,

Links to DWC-A files shared with the integrated publishing toolkit usually look like this:

http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12

However, dwca_read() uses the provided url to get the name of the file where the data will be stored:

> basename("http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12")
[1] "archive.do?r=redlist_2013_taxons&v=3.12"

This doesn't work well as R can't find the path to the zip file later on:

Error in unzip(writepath, exdir = dirpath) :
cannot open file '/Users/gustavo/archive.do?r=lista_especies_flora_brasil&v=393.71/resourcerelationship.txt': Not a directory

I guess this could be solved by changing the basename to something else when the url isn't a direct link to a zip file.

Thanks!
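
A minimal base R sketch of the suggestion above: strip the query string before calling basename(), and fall back to a name built from the whole URL when the result is not a .zip file. The helper name safe_archive_name() is hypothetical and is not part of finch.

```r
# Hypothetical helper (not part of finch): derive a filesystem-safe
# archive name from a URL that may not point directly at a zip file.
safe_archive_name <- function(url) {
  # Drop any query string or fragment before taking the basename
  path <- sub("[?#].*$", "", url)
  name <- basename(path)
  if (!grepl("\\.zip$", name, ignore.case = TRUE)) {
    # Not a direct link to a zip: build a name from the full URL,
    # replacing every run of non-alphanumeric characters with "_"
    stem <- gsub("[^A-Za-z0-9]+", "_", sub("^https?://", "", url))
    name <- paste0(stem, ".zip")
  }
  name
}

ipt_url <- "http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12"
ipt_name <- safe_archive_name(ipt_url)
```

Because the fallback name keeps the query parameters (in sanitised form), two different resources or versions served by the same archive.do endpoint should not collide on disk.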

Test on larger files

Related to #9: need to figure out the best option that will work in as many cases as possible.

What to do about big datasets

Users' machines vary in RAM, so the decision should be left to the user, but helping to prevent the session from crashing would be good.

Maybe warn with a prompt if the dataset is over a certain size? That may be too much.
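
A rough sketch of the size check floated above, in base R. The helper name check_file_size() and the 500 MB default are assumptions made for illustration; neither comes from finch.

```r
# Hypothetical helper (not part of finch): warn when a file is larger
# than a threshold before it is read into memory. The 500 MB default
# is an arbitrary illustration.
check_file_size <- function(path, max_mb = 500) {
  size_mb <- file.size(path) / 1024^2
  if (is.na(size_mb)) {
    stop("File not found: ", path, call. = FALSE)
  }
  ok <- size_mb <= max_mb
  if (!ok) {
    warning(
      sprintf("%s is %.1f MB, above the %.1f MB threshold; reading it may exhaust memory.",
              basename(path), size_mb, max_mb),
      call. = FALSE
    )
  }
  # TRUE when the file is within the threshold, FALSE otherwise
  invisible(ok)
}
```

A caller could branch on the return value, for example reading only when it is TRUE or asking for confirmation in an interactive() session. For a zipped archive, the uncompressed sizes listed by utils::unzip(path, list = TRUE) are probably a better guide than the size of the zip itself.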

Error parsing .xml DwC file with finch::simple_read

Hi, I'm having trouble parsing a DarwinCore file, eml.xml, using the finch package with the code below:
file <- simple_read("eml.xml")

This brings up the error message "Error: no parser for eml". The file eml.xml is in the current file path.

Thanks a lot for any suggestions as to what's going wrong.

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] finch_0.1.0       traits_0.2.0      data.table_1.10.4 vegan_2.4-3      
 [5] lattice_0.20-35   permute_0.9-4     reshape2_1.4.2    rgeos_0.3-23     
 [9] plyr_1.8.4        bindrcpp_0.2      purrr_0.2.2.2     robis_0.1.8      
[13] wellknown_0.1.0   stringr_1.2.0     stringi_1.1.5     dplyr_0.7.2      
[17] jsonlite_1.5      httr_1.2.1       

loaded via a namespace (and not attached):
 [1] taxize_0.8.9     htmltools_0.3.6  yaml_2.1.14      mgcv_1.8-17     
 [5] base64enc_0.1-3  rlang_0.1.1      glue_1.1.1       sp_1.2-5        
 [9] uuid_0.1-2       foreach_1.4.3    bindr_0.1        rvest_0.3.2     
[13] htmlwidgets_0.9  codetools_0.2-15 evaluate_0.10.1  knitr_1.16      
[17] httpuv_1.3.5     crosstalk_1.0.0  curl_2.7         parallel_3.4.1  
[21] Rcpp_0.12.12     xtable_1.8-2     readr_1.1.1      backports_1.1.0 
[25] leaflet_1.1.0    mime_0.5         hms_0.3          digest_0.6.12   
[29] shiny_1.0.3      grid_3.4.1       rprojroot_1.2    tools_3.4.1     
[33] magrittr_1.5     tibble_1.3.3     EML_1.0.3        crul_0.3.8      
[37] bold_0.5.0       cluster_2.0.6    ape_4.1          pkgconfig_2.0.1 
[41] MASS_7.3-47      Matrix_1.2-10    xml2_1.1.1       iterators_1.0.8 
[45] reshape_0.8.7    assertthat_0.2.0 rmarkdown_1.6    R6_2.2.2        
[49] nlme_3.1-131     compiler_3.4.1  
Error importing DwC file from URL

> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.1 (2017-06-30)
 system   x86_64, mingw32
 ui       RStudio (1.0.143)
 language (EN)
 collate  English_United States.1252
 tz       Europe/Paris
 date     2017-07-18

Packages ----------------------------------------------------------------------
 package    * version date       source
 base       * 3.4.1   2017-06-30 local
 compiler     3.4.1   2017-06-30 local
 data.table   1.10.4  2017-02-01 CRAN (R 3.4.0)
 datasets   * 3.4.1   2017-06-30 local
 devtools   * 1.13.2  2017-06-02 CRAN (R 3.4.1)
 digest       0.6.12  2017-01-27 CRAN (R 3.4.1)
 EML          1.0.3   2017-05-01 CRAN (R 3.4.1)
 finch      * 0.1.0   2016-12-23 CRAN (R 3.4.1)
 graphics   * 3.4.1   2017-06-30 local
 grDevices  * 3.4.1   2017-06-30 local
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.1)
 methods    * 3.4.1   2017-06-30 local
 plyr         1.8.4   2016-06-08 CRAN (R 3.4.1)
 rappdirs     0.3.1   2016-03-28 CRAN (R 3.4.1)
 Rcpp         0.12.12 2017-07-15 CRAN (R 3.4.1)
 stats      * 3.4.1   2017-06-30 local
 tools        3.4.1   2017-06-30 local
 utils      * 3.4.1   2017-06-30 local
 uuid         0.1-2   2015-07-28 CRAN (R 3.4.0)
 withr        1.0.2   2016-06-20 CRAN (R 3.4.1)
 xml2         1.1.1   2017-01-24 CRAN (R 3.4.1)

I get the following error:

file <- "http://ipt.vliz.be/eurobis/archive.do?r=nbn_ga000467&v=1.1"
out <- dwca_read(file, read = TRUE)
File in cache
Error in if (grepl("<|>", x)) { : argument is of length zero

but the following works fine:

file <- "dwca-nbn_ga000467-v1.1.zip"
out <- dwca_read(file, read = TRUE)

imports fixes

  • remove plyr
  • remove rappdirs
  • importFrom hoardr
  • importFrom EML
  • importFrom digest

Parsing occurrence text files in DwC archive

After a year of working with GBIF data in R and repeatedly running into problems importing occurrence text files correctly, I ended up writing a gist collecting most of the column types that caused trouble: type_GBIF_occurrence_fields.R. I discussed with colleagues whether to put it in our project package. But, as suggested in trias-project/trias#25 (comment), why not pitch it to the authors of finch? 👍
The typical issue when opening such files is that some DwC fields (columns) are NA for thousands of rows before the first real value appears. This causes parsing failures because R assigns type logical to these fields. My first solution was to increase the value of the guess_max parameter, but that is unfeasible for big files, and it is only a workaround.
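
One way to sidestep type guessing altogether, sketched here with base R's read.delim() rather than the reader the post refers to: read every column as character, then convert only the columns whose type is known. The tiny file below is made-up illustration data; the column names are ordinary Darwin Core terms and are not taken from the gist.

```r
# Made-up occurrence file: recordNumber is empty until the last row,
# the pattern that trips up type guessing on real GBIF downloads.
occ_file <- tempfile(fileext = ".txt")
writeLines(c(
  "gbifID\tdecimalLatitude\trecordNumber",
  "1\t50.1\t",
  "2\t50.2\t",
  "3\t50.3\tAB-12"
), occ_file)

# Read every column as character so nothing is guessed as logical;
# empty fields become NA
occ <- utils::read.delim(
  occ_file,
  colClasses = "character",
  na.strings = "",
  quote = ""
)

# Convert only the columns whose type is known in advance
occ$decimalLatitude <- as.numeric(occ$decimalLatitude)
```

The same idea carries over to readers that accept an explicit column-type specification: a per-field type list, like the one the gist collects, replaces guessing, so the size of the file no longer matters.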

Any possibility of more detailed output from dwca_validate()?

I am looking into validating Darwin Core "taxon" class data in R, and want to make sure I don't reinvent the wheel. I see {finch} has a validator function, but it apparently just passes the zip off to https://tools.gbif.org/dwca-validator/ then returns only some very basic statistics (number of records) and a URL to the gbif dwca validator results, where all the juicy info is.

It would be great if the validation results were actually output as a list or (even better) dataframe.

Do the developers of finch have any plans for this sort of functionality or know of any other existing packages that do such a thing? Thanks!

dwca_read() returns blank occurrence.txt data.frame despite this file not being blank

I set up a GBIF query for all records in the genus "Quercus". The file is fairly large, which may be part of the problem. I ran dwca_read on a locally downloaded version of the file linked below. I also verified that the examples in the dwca_read help file worked, and with those datasets I was able to load the occurrence.txt data.frame.

library(finch)

file <- "http://api.gbif.org/v1/occurrence/download/request/0020631-151016162008034.zip"
out <- dwca_read(file, read = TRUE)
#Read 0.0% of 1027888 rowss
out$data$occurrence.txt
#data frame with 0 columns and 0 rows

issue with installing the finch

I tried to install the package but ran into some difficulties. This is the error I get; I think it might have something to do with my R version?

install.packages("finch")
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:

https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/hanie/OneDrive/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/finch_0.4.0.zip'
Content type 'application/zip' length 302409 bytes (295 KB)
downloaded 295 KB

Also tried

install_github("ropensci/finch")
Error in install_github("ropensci/finch") :
could not find function "install_github"
Can you help please?

... not actually being passed on to fread in dwca_read

Hi!

The ... arguments are not being passed on to fread in dwca_read. I've fixed this in my local repo, but it's probably easier for you to do it yourself here. Accents in some files aren't displayed correctly unless I pass encoding = "UTF-8" in dwca_read.

Thanks!
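
For illustration, the general pattern for forwarding ... from a wrapper to the underlying reader looks roughly like this. The wrapper read_occurrences() is hypothetical, and base R's read.delim() stands in for fread so the sketch needs no extra packages; this is not the actual dwca_read code.

```r
# Hypothetical wrapper (not the finch implementation): every extra
# argument supplied by the caller is forwarded to the reader via `...`.
read_occurrences <- function(path, ...) {
  utils::read.delim(path, ...)
}

# Made-up one-column file containing an accented name, written as UTF-8 bytes
acc_file <- tempfile(fileext = ".txt")
writeLines(
  c("scientificName", "Arauc\u00e1ria angustifolia", "Quercus robur"),
  acc_file,
  useBytes = TRUE
)

# Both `encoding` and `nrows` reach read.delim() through `...`
first_row <- read_occurrences(acc_file, encoding = "UTF-8", nrows = 1)
```

If ... is accepted by the wrapper but left out of the inner call, arguments such as encoding are silently dropped, which matches the symptom described above.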

from CRAN: don't write to user library

These all show check problems on the Debian check systems caused by
attempts to write to the user library to which all packages get
installed before checking (and which now is remounted read-only for
checking).

Having package code which is run as part of the checks and attempts to
write to the user library violates the CRAN Policy's

Packages should not write in the user’s home filespace (including
clipboards), nor anywhere else on the file system apart from the R
session’s temporary directory (or during installation in the location
pointed to by TMPDIR: and such usage should be cleaned up).

Hence, please update your package(s) as quickly as possible to no longer
(attempt to) write to the user library (including, of course, the
location where the package itself was installed to).

Write path

Hi again,

I noticed that when using finch to download a bunch of DWC-A files, my Sys.getenv("HOME") would get cluttered with them. It seems to me that the best places to write those files would be either tempdir() or the working directory if persistence is desired.

Thanks,

Gustavo.
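
A small sketch of that suggestion: default to the session's temporary directory and write elsewhere only when the caller explicitly asks for persistence. The function name archive_path() and its arguments are hypothetical, not finch's API.

```r
# Hypothetical helper (not part of finch): choose where a downloaded
# archive is written. By default nothing lands outside tempdir().
archive_path <- function(filename, persist = FALSE, dir = getwd()) {
  base_dir <- if (persist) dir else tempdir()
  file.path(base_dir, filename)
}

# Default: the file goes to the R session's temporary directory
tmp_target <- archive_path("dwca-example.zip")

# Opt-in persistence: the file goes to a directory chosen by the caller
kept_target <- archive_path("dwca-example.zip", persist = TRUE, dir = getwd())
```

Defaulting to tempdir() also fits the CRAN policy quoted in the previous issue, since nothing is written to the home directory or the package library unless the user requests it.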
