
atom4r's Introduction

Hi there 👋

  • I'm a Senior Geospatial Information Systems engineer and R Lead Expert for the Food and Agriculture Organization of the United Nations (UN-FAO), Fisheries & Aquaculture Division, and I also work as a freelance consultant/developer
  • 🗺️ I'm interested in geospatial data sharing, software and standards, in particular ISO/OGC standards
  • I participate in various community development & standardization activities in support of Fisheries data interoperability at fdiwg
  • I actively participate in the R spatial community, especially on handling data and metadata with international standards, and on automating geospatial data workflows with OGC web services in support of implementing the FAIR data principles
  • I'm lead developer of the OpenFairViewer ISO/OGC metadata driven web viewer in support of open geospatial science and FAIR data principles

atom4r's Issues

DCElement classes not listed when namespace not loaded

This reprex demonstrates:

dcentry <- atom4R::DCEntry$new()
#> Loading Atom XML schemas...
#> Loading DCMI vocabularies...
dcentry$setId("my-dc-entry")
dcentry$addDCDate(Sys.time())
#> Error : $ operator is invalid for atomic vectors
#> Error in clazz$new: $ operator is invalid for atomic vectors

Created on 2022-02-22 by the reprex package (v2.0.1)

The problem arises here:

atom4R/R/DCElement.R

Lines 62 to 63 in df577bc

DCElement$getDCClasses = function(extended = FALSE, pretty = FALSE){
list_of_classes <- unlist(sapply(search(), ls))

If the namespace is not loaded, none of the classes are found, and getDCClasses returns an empty list. A simple solution would be to insert something like:

if (!isNamespaceLoaded("atom4R")) {
  attachNamespace("atom4R")
}

That has undesirable side-effects like issuing packageStartupMessages, so maybe you can think of a better solution?
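
One possible alternative, sketched here under assumptions rather than taken from the atom4R code base, is to combine the names found on the search path with the exports of the package namespace, which can be loaded without being attached. The helper name list_candidate_names is hypothetical, and the base package tools stands in for atom4R so that the snippet runs anywhere.

```r
# Hypothetical helper (not part of atom4R): collect object names from the
# attached environments plus the exports of `pkg`, loading its namespace
# without attaching it.
list_candidate_names <- function(pkg) {
  attached <- unlist(lapply(search(), ls))
  exported <- character(0)
  if (requireNamespace(pkg, quietly = TRUE)) {
    exported <- getNamespaceExports(pkg)
  }
  unique(c(attached, exported))
}

# 'tools' ships with R but is not attached by default; it stands in for atom4R
candidates <- list_candidate_names("tools")
"file_ext" %in% candidates
```

Loading a namespace does not run .onAttach hooks, which is where startup messages are usually emitted, so this may avoid the side effect mentioned above. Any .onLoad hook still runs.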

Support new setter methods for list of elements

library(atom4R)
packageVersion ("atom4R")
#> [1] '0.2.9999'
d <- atom4R::DCEntry$new ()
d$addDCCreator (c ("a", "b"))
d$validate ()
#> Error in if (Encoding(out) != "UTF-8") out <- iconv(out, to = "UTF-8"): the condition has length > 1

Created on 2022-05-18 by the reprex package (v2.0.1)

Obviously it's been coded with the assumption that a Creator entry is a single string, but confirming that needs to be part of the validation step. (And actually the Creator term permits Creator as an Agent and by extension Agent Class, so multiple entries should be permitted.)
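
The error arises because a vector of length two reaches an if() condition. A minimal sketch of a vectorised version of the check quoted in the error message follows. to_utf8 is a hypothetical helper name, not an atom4R function, and whether several creators should end up as one element or several is left open here.

```r
# Hypothetical helper: convert only those elements not already marked as UTF-8.
# Works for character vectors of any length, unlike `if (Encoding(out) != ...)`.
to_utf8 <- function(out) {
  out <- as.character(out)
  needs_conversion <- Encoding(out) != "UTF-8"
  out[needs_conversion] <- iconv(out[needs_conversion], to = "UTF-8")
  out
}

to_utf8(c("a", "b"))
```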

Include local xsd schema to avoid internet resource unavailability

Following the CRAN notice below, xml.xsd will be bundled locally to avoid availability issues and to make schema loading faster.

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_atom4R.html

Please correct before 2022-09-01 to safely retain your package on CRAN.

It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'

This needs correction whether or not the resource recovers.

The CRAN Team
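
The "fail gracefully" requirement can be illustrated with a small sketch: try the remote resource first, and on any error emit an informative message and fall back to a bundled copy. The function name load_resource and both of its arguments are assumptions made for illustration, not the way atom4R loads its schemas.

```r
# Hypothetical helper: read a resource, falling back to a local copy
# with an informative message instead of raising an error.
load_resource <- function(remote, local_copy) {
  tryCatch(
    suppressWarnings(readLines(remote, warn = FALSE)),
    error = function(e) {
      message("Resource '", remote, "' is not available; using local copy.")
      readLines(local_copy, warn = FALSE)
    }
  )
}

# Demonstration with a local file and a path that does not exist
local_copy <- tempfile(fileext = ".xsd")
writeLines("<schema/>", local_copy)
missing_resource <- file.path(tempdir(), "does-not-exist.xsd")
content <- load_resource(missing_resource, local_copy)
```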

Issue with XML package in Github r-lib v2 actions

The XML package is considered obsolete and currently causing widespread build failures on GitHub actions, including R CMD Check on ubuntu, and test coverage which by default runs on the failing environment. That means that no packages which depend on {atom4R} can be tested on GitHub. It would be good for future-proofing to replace XML with xml2.

bug in "updated" timestamp

No matter what I do, all DCEntry objects always have the same time stamp. The first timestamp in this reprex is when I did the system install of the package, and the second produces the timestamp when load_all() is called. Even then, subsequent calls after the second of those repeat that same timestamp, and do not ever set it to the expected value of Sys.time().

library (atom4R)
dcmi <- atom4R::DCEntry$new ()
dcmi$updated
#> [1] "2022-05-19 11:45:51 CEST"
Sys.time () - dcmi$updated
#> Time difference of 4.596361 hours

path <- "/<path>/<to>/<local>/atom4R"
devtools::load_all (path, export_all = FALSE)
#> ℹ Loading atom4R
dcmi <- atom4R::DCEntry$new ()
dcmi$updated
#> [1] "2022-05-19 16:21:39 CEST"
Sys.time () - dcmi$updated
#> Time difference of 1.471567 secs

Created on 2022-05-19 by the reprex package (v2.0.1)
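
The report does not state the cause. One pattern that would produce this behaviour, offered purely as an assumption, is a default value computed once when the class is defined (for an installed package, at install time) rather than each time an object is created. The sketch below uses plain closures instead of R6, and both constructor names are hypothetical.

```r
# Assumed cause, for illustration only: the timestamp is evaluated once,
# when the constructor is defined, and then reused by every object.
new_entry_static <- local({
  stamp <- Sys.time()
  function() list(updated = stamp)
})

# Expected behaviour: the timestamp is evaluated at each construction.
new_entry_dynamic <- function() {
  list(updated = Sys.time())
}

a <- new_entry_static()
Sys.sleep(0.1)
b <- new_entry_static()
identical(a$updated, b$updated)   # same stamp for both objects

c1 <- new_entry_dynamic()
Sys.sleep(0.1)
c2 <- new_entry_dynamic()
c2$updated > c1$updated           # later object has a later stamp
```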

validate method fails after 'setUpdated' method called

Context: To test any package which uses atom4R, any attributes of any objects which depend on uncontrollable variables, such as the current time or timezone of test environments, must be standardised. It is therefore necessary to be able to set a standard "updated" time using the setUpdated method. Doing so, however, leads to this issue:

library (atom4R)
packageVersion ("atom4R")
#> [1] '0.3.1'

m <- DCEntry$new ()
m$updated
#> [1] "2022-10-06 14:59:56 CEST"
class (m$updated)
#> [1] "POSIXct" "POSIXt"
m$setUpdated (as.POSIXct ("2022-01-01 00:00:01", tz = "CEST"))
m$updated
#> [1] "2022-01-01 00:00:01 CEST"
class (m$updated)
#> [1] "POSIXct" "POSIXt"

At that stage, all looks good, and the updated field has merely changed internally. But then:

m$validate ()
#> [atom4R][WARN] Element '{http://www.w3.org/2005/Atom}updated', attribute 'tzone': The attribute 'tzone' is not allowed at line 2. 
#> [atom4R][WARN] Object 'DCEntry' is INVALID according to Atom XML schemas!
#> [1] FALSE

And that happens because of the code in the 'validate' method:

self <- m
schemaNamespaceId <- self$namespace$id
xml <- self$encode(addNS = TRUE, validate = FALSE, strict = strict)
print (xml)
#> <atom:entry xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
#>   <atom:updated tzone="CEST">2022-01-01T00:00:01+00:00</atom:updated>
#>   <!--Creation date/time: 2022-10-06T14:59:56-->
#>   <!--Atom XML generated by atom4R R package - Version 0.3-1-->
#>   <!--Atom XML compliance: NOT TESTED-->
#>   <!--atom4R R package information:  Contact: Emmanuel Blondel [email protected]   URL: https://github.com/eblondel/atom4R BugReports: https://github.com/eblondel/atom4R-->
#> </atom:entry>
xsd <- getAtomSchemas()
if(is(xml, "XMLInternalNode")) xml <- XML::xmlDoc(xml)
report <- XML::xmlSchemaValidate(xsd, xml)
report$errors[[1]]$msg
#> [1] "Element '{http://www.w3.org/2005/Atom}updated', attribute 'tzone': The attribute 'tzone' is not allowed.\n"
isValid <- report$status == 0 # FALSE

Created on 2022-10-06 with reprex v2.0.2

So atom4R inserts a tzone="CEST" attribute into the XML, which renders it invalid. Note that before setUpdated() is called, the tzone attribute is still present but empty (tzone=""), and in that case xmlSchemaValidate() does not complain.
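
A sketch of one way to avoid the stray attribute, assuming (as the output above suggests) that attributes of the R value are copied onto the XML element: convert the POSIXct value to a plain ISO 8601 string before encoding. atom_timestamp is a hypothetical helper, not an atom4R function.

```r
# Hypothetical helper: format a POSIXct value as a UTC ISO 8601 string.
# The result is a plain character vector that carries no 'tzone' attribute.
atom_timestamp <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  format(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

x <- as.POSIXct("2022-01-01 00:00:01", tz = "UTC")
attr(x, "tzone")        # the attribute that would otherwise be serialised
stamp <- atom_timestamp(x)
attributes(stamp)       # nothing left to serialise
```

Formatting in UTC also keeps the printed offset consistent with the instant it describes, independent of the timezone of the test environment.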

CRAN reports package startup messages (related to rdflib) cannot be suppressed

The messages below cannot be suppressed by CRAN checks.
This issue relates to code in rdflib that should be made less verbose. Code has already been contributed by @mpadge at ropensci/rdflib@649cb31 in ropensci/rdflib#45, but it is not yet released on CRAN, hence we still get the messages below.

Version: 0.3-3
Check: startup messages can be suppressed
Result: NOTE
    Rows: 98 Columns: 7
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (5): s, comment, isDefinedBy, label, description
    lgl (1): memberOf
    date (1): issued
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    Rows: 12 Columns: 7
    ── Column specification ────────────────────────────────────────────────────────
    Delimiter: ","
    chr (6): s, comment, isDefinedBy, label, memberOf, description
    date (1): issued
    
    ℹ Use `spec()` to retrieve the full column specification for this data.
    ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    
    It looks like this package (or a package it requires) has a startup
    message which cannot be suppressed: see ?packageStartupMessage.
Flavors: [r-devel-linux-x86_64-debian-clang](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/atom4R-00check.html), [r-devel-linux-x86_64-debian-gcc](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-gcc/atom4R-00check.html), [r-devel-linux-x86_64-fedora-clang](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-clang/atom4R-00check.html), [r-devel-linux-x86_64-fedora-gcc](https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-fedora-gcc/atom4R-00check.html)

Issue with registerAtomSchema example

As reported by CRAN

Version: 0.3-2
Check: examples
Result: ERROR
Running examples in ‘atom4R-Ex.R’ failed
The error most likely occurred in:

> ### Name: registerAtomSchema
> ### Title: registerAtomSchema
> ### Aliases: registerAtomSchema
>
> ### ** Examples
>
> registerAtomSchema(xsdFile = "https://jvndb.jvn.jp/schema/atom.xsd")
Error in curl::curl_fetch_memory(url, handle = handle) :
 error:0A00018A:SSL routines::dh key too small
Calls: registerAtomSchema ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted

Flavors: r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc

Switch DC vocabularies from web access to local files (in package)

A security layer was added on the Dublin Core website that prevents parsing Dublin Core terms directly:

These resources are queried indirectly through the rdflib / redland packages, which now leads to a 403 status code (forbidden).

An alert was raised by the CRAN team asking for a fix before 2022-02-16 to safely retain atom4R on CRAN.

For the time being, these resources will be put as local files in inst/extdata to solve the issue (for atom4R and for dependencies such as geoflow).
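
Files placed under inst/extdata are installed at the top level of the package and can be located with system.file(). A minimal sketch follows; local_resource is a hypothetical helper, and the DESCRIPTION file of the base package stats stands in for a vocabulary file, since the actual file names are not given here.

```r
# Hypothetical helper: resolve a file shipped with an installed package and
# fail with an informative error when it is missing.
local_resource <- function(pkg, ...) {
  path <- system.file(..., package = pkg)
  if (!nzchar(path)) {
    stop("File not found in package '", pkg, "'")
  }
  path
}

# Stand-in example; for atom4R the call would look like
# local_resource("atom4R", "extdata", "<file name>")
path <- local_resource("stats", "DESCRIPTION")
file.exists(path)
```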

Missing DCMI Elements

Not all DCMI elements are enabled in DCEntry. For my own purposes, i particularly need to use the following, currently missing, terms:

  • hasPart
  • hasVersion
  • isPartOf
  • isReferencedBy
  • isReplacedBy
  • isRequiredBy
  • isVersionOf

Any plans to implement these currently missing terms?

Issue with Keyring >= 1.2.0 - system password requested - Set keyring 'env' backend by default

With keyring package version 1.2.0, we get the message "The 'system' keyring does not exist, enter a keyring password to create it" when instantiating the AtomPubClient. This relates to the change in r-lib/keyring#95, where the file backend is now set by default on Linux (env is kept by default on other operating systems).
For the time being, atom4R will keep using the env backend for storing AtomPub client users/passwords. This could be refined later depending on needs.

addDCElement is too slow

I've got code that loops over a list of metadata terms which can't be known in advance and does something like this, acting on a DCEntry object called dcmi:

    for (i in seq (nrow (terms))) {
        dc_fn <- # code to match `addDC...` fn from `atom4R`
        value <- data [[terms$value [i]]]
        do.call (dcmi [[dc_fn]], list (value))
    }

The problem is this is the slowest bit of my code by far, because all of the addDCElement() calls take so long. This in turn seems to be because of these lines:

list_of_classes <- list_of_classes[sapply(list_of_classes, function(x){
clazz <- try(eval(parse(text=x)),silent=TRUE)
if(is(clazz, "try-error")) clazz <- try(eval(parse(text=paste0("atomR::",x))),silent=TRUE)
r6Predicate <- class(clazz)[1]=="R6ClassGenerator"
if(!r6Predicate) return(FALSE)
atomObjPredicate <- FALSE
superclazz <- clazz
while(!atomObjPredicate && !is.null(superclazz)){
clazz_fields <- names(superclazz)
if(!is.null(clazz_fields)) if(length(clazz_fields)>0){
if("get_inherit" %in% clazz_fields){
superclazz <- superclazz$get_inherit()
atom4RPredicate <- FALSE
if("parent_env" %in% clazz_fields) atom4RPredicate <- environmentName(superclazz$parent_env)=="atom4R"
atomObjPredicate <- superclazz$classname == classname && atom4RPredicate
}else{
break
}
}
}
return(atomObjPredicate)
})]

For a single term, that can end up looping around 300 times, each time doing a full eval(parse()) call. Would it not be possible to restructure that entire call as its own reference function that does not accept classname as a dynamic parameter, but instead does a single trawl over everything and stores c(classname, atom4RPredicate) in a single reference table? That call could easily be memoised for instant recall, and the list_of_classes for a single classname could then also be extracted instantly. Given my estimate of 300 or so loops of that function for each term, times an unknown number of terms going into code like the above, it should be possible to achieve a speed-up here of O(1000).
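
The proposal can be sketched as follows: run the expensive scan once, cache the resulting table in an environment, and answer each classname query with a cheap lookup. All names here (class_cache, scan_all_classes, get_class_table, classes_for) are hypothetical, and the scan is a stand-in that returns a fixed table instead of evaluating every object on the search path.

```r
# Cache shared by the functions below; 'scans' counts how often the
# expensive step actually runs.
class_cache <- new.env(parent = emptyenv())
class_cache$scans <- 0L

# Stand-in for the single trawl over all candidate classes.
scan_all_classes <- function() {
  class_cache$scans <- class_cache$scans + 1L
  data.frame(
    class  = c("DCCreator", "DCDate", "OtherClass"),
    parent = c("DCElement", "DCElement", "OtherParent"),
    stringsAsFactors = FALSE
  )
}

# Memoised accessor: builds the reference table on first use only.
get_class_table <- function() {
  if (is.null(class_cache$table)) {
    class_cache$table <- scan_all_classes()
  }
  class_cache$table
}

# Cheap lookup replacing the per-call scan.
classes_for <- function(classname) {
  tab <- get_class_table()
  tab$class[tab$parent == classname]
}

classes_for("DCElement")
classes_for("DCElement")
class_cache$scans   # the scan ran only once
```

A cache of this kind would need to be invalidated if new classes can be registered after the first lookup.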

Fetch vocabulary data by default

Vocabulary data (such as the list of DC terms) is fetched each time we create an object of a DCElement subclass, by running an RDF SPARQL query, in order to check that the term is valid. Data should be fetched by default, once, when the vocabulary is set up. This will help improve performance when setting DC terms. Cf #15 and #18

Accessors/getters for DCElement objects

In the absence of any accessors for DCElement objects, I would expect the only available method of simply naming them to return the actual objects. This leads to the following unexpected behaviour:

library(atom4R)
packageVersion ("atom4R")
#> [1] '0.2.9999'

a0 <- DCAvailable$new (value = FALSE)
ls (a0) # okay
#>  ... expected output

d <- atom4R::DCEntry$new ()
d$addDCAvailable ("01-01-0001")
d$validate ()
#> [atom4R][INFO] Object 'DCEntry' is VALID according to Atom XML schemas!
#> [1] TRUE
a1 <- d$available
ls (a1)
#> Error in list2env(list(<environment>), NULL, <environment>): names(x) must be a character vector of the same length as x

Created on 2022-05-18 by the reprex package (v2.0.1)

Happens because of this line:

self[[term]][[length(self[[term]])+1]] <- elem

Everything is embedded within a list, and trying to access elements as $<element> does not behave as expected. It would arguably be safer to provide accessor methods to (1) enable easy and systematic access; and (2) avoid unexpected behaviour like this.
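
The nesting and a possible accessor can be sketched with base R only. Here a plain list stands in for the DCEntry object and an environment stands in for a DCElement, and get_dc_element is a hypothetical accessor name, not an existing atom4R method.

```r
# Stand-ins: 'entry' mimics a DCEntry, 'elem' mimics a DCElement object.
entry <- list(available = list())
elem <- new.env()
elem$value <- "01-01-0001"

# Same append pattern as the line quoted above
entry$available[[length(entry$available) + 1]] <- elem

# The field holds a list of elements, not the element itself
is.list(entry$available)

# Hypothetical accessor returning the i-th element stored under a term
get_dc_element <- function(entry, term, i = 1L) {
  elements <- entry[[term]]
  if (i > length(elements)) {
    stop("No element ", i, " for term '", term, "'")
  }
  elements[[i]]
}

ls(get_dc_element(entry, "available"))
```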

Curl error while publishing into dataverse

Hi Emmanuel,
I did some other test publishing into data.inrae.fr from the rstudio server Gedeop. I thought it came from my colleague's configuration but it's more about mine! Indeed, we did a test with another colleague's account and he got the same error :

1  atom4R      TRUE     0.1
[atom4R][WARN] SwordDataverseClient - Token is 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
[atom4R][INFO] SwordDataverseClient - GET - Sword Dataverse service document at 'https://data.inrae.fr/dvn/api/data-deposit/v1.1/swordv2/service-document'
Error in curl::curl_fetch_memory(url, handle = handle) :
  error:141A318A:SSL routines:tls_process_ske_dhe:dh key too small
In addition: Warning messages:
1: In default_backend_auto() :
  Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
  Selecting ‘env’ backend. Secrets are stored in environment variables
3: In default_backend_auto() :
  Selecting ‘env’ backend. Secrets are stored in environment variables
[atom4R][ERROR] SwordDataverseClient - Error while retrieving SWORD service document
 
Error in super$initialize(url = sword_api_url, version = "2", token = token,  :
  Error while retrieving SWORD service document

On my rstudio session, I don't get this error, whatever the token used. So it seems that I'm the only one able to publish ...

Thanks for your help!
