opensdmx / rsdmx

Tools for reading SDMX data and metadata in R

Home Page: https://github.com/opensdmx/rsdmx/wiki

R 100.00%
r sdmx read readsdmx api web-services sdmx-provider sdmx-format sdmx-standards dsd

rsdmx's People

Contributors

bobsan16, cderv, dmenne, eblondel, expersso, matthieustigler, yjaques

rsdmx's Issues

Parsing issues with Belgian data

National Bank of Belgium is migrating towards SDMX (in the spirit of OECD) and I'm trying to fetch data from their database. Here are two sample links:

library(rsdmx)

url <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+1_00_0_0_0_00+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"
url1 <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"

Notice that the only difference is the +1_00_0_0_0_00 code present in the first URL. While readSDMX(url1) works fine, readSDMX(url) outputs a parsing error:

Opening and ending tag mismatch: br line 1 and body
Opening and ending tag mismatch: br line 1 and html
Premature end of data in tag body line 1
Premature end of data in tag html line 1
<XMLParserErrorList: 1: Opening and ending tag mismatch: br line 1 and body
2: Opening and ending tag mismatch: br line 1 and html
3: Premature end of data in tag body line 1
4: Premature end of data in tag html line 1

This issue seems to recur with the NBB database. Long links don't seem to parse, but cutting them up into shorter ones seems to do the job. I can't discern any pattern in which ones work and which don't. However, note that if you export an XML file from the web application and read it with rsdmx locally, it works fine.
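
One way to apply the "cut them up" workaround is to split the codes of the first key dimension into smaller groups and build one request per group. The sketch below only builds the URLs; `build_nbb_urls` and `chunk_size` are hypothetical names, and the URL layout is simply copied from the two sample links above rather than from any NBB documentation.

```r
# Hypothetical helper: split the codes of the first dimension into chunks
# and build one GetData URL per chunk (layout copied from the sample links).
build_nbb_urls <- function(codes, chunk_size = 3,
                           freq = "M",
                           start = "2014-07", end = "2014-08") {
  base <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013"
  # Assign each code to a chunk of at most `chunk_size` codes
  groups <- split(codes, ceiling(seq_along(codes) / chunk_size))
  vapply(groups, function(g) {
    key <- paste0(paste(g, collapse = "+"), ".", freq)
    sprintf("%s/%s/all?startTime=%s&endTime=%s", base, key, start, end)
  }, character(1), USE.NAMES = FALSE)
}

codes <- c("99_9_9_9_99", "HEALTH", "XEFUN0",
           "1_00_0_0_0_00", "2_00_0_0_0_00", "3_00_0_0_0_00")
urls <- build_nbb_urls(codes)
print(urls)

# Each URL could then be read separately and the results combined, e.g.
# parts <- lapply(urls, function(u) as.data.frame(rsdmx::readSDMX(u)))
```

Whether shorter requests actually avoid the parsing error is only what the report above suggests; the chunk size is a guess to be tuned.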

Issue with nchar - Errors with new R-devel

R CMD check on rsdmx no longer succeeds because of the change to nchar() in R-devel. This needs to be fixed as soon as possible to make rsdmx compatible with R-devel. Replacing nchar(x) with nchar(x, "w") should be enough and is backward compatible.
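
The issue does not say which part of the nchar() change is involved. Assuming it is the handling of missing strings (an assumption on my part), the difference between the default type and type "w" (width) can be seen in a minimal example:

```r
# Missing strings are where the `type` argument makes a difference
x <- c("sdmx", NA_character_)

chars  <- nchar(x)       # default type = "chars"
widths <- nchar(x, "w")  # "w" partially matches type = "width"

print(chars)
print(widths)
```

In R 3.3.0 and later, the default leaves the missing string as NA, while the width of a missing string is reported as 2, which matches the older behaviour the package relied on.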

How to implement SDMX 2.1

rsdmx currently supports SDMX 2.0 only. Being able to parse SDMX 2.1 would be needed to connect to the REST service at Eurostat as it is only available for SDMX2.1. This raises several questions:

  • Are there any reasons not to implement SDMX 2.1?
  • Is it sensible to have both versions of the standard (and possible future updates) available in the package?
  • How could parsing SDMX 2.1 be made possible without changing the existing interface?

How do you feel about this?

SDMX dataset - absence data support

Some series might not have observations, which introduces the notion of absence data. This should be supported for both SDMX GenericData and CompactData parsers.

Enable SDMX-ML SAX (Simple API for XML) parsing method

When SDMX-ML documents become large, ending up with two copies of the data in memory at once, i.e. the tree and the R data structure (resulting from as.data.frame in rsdmx), could lead to memory issues.
In this case, an event-driven or SAX (Simple API for XML) style parser could be investigated, not as a replacement but as a complementary approach to reading SDMX-ML files. With the XML package, event-driven parsing is possible using xmlEventParse.
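
As a rough illustration of the event-driven style, the sketch below collects observations from a tiny, made-up document with the XML package's xmlEventParse, without building a tree. The element and attribute names are simplified stand-ins (real SDMX-ML elements are namespaced), and `obs_collector` is a hypothetical helper, not part of rsdmx.

```r
library(XML)

# A tiny, made-up document standing in for a large SDMX-ML file
doc <- '<DataSet>
  <Series FREQ="M">
    <Obs TIME_PERIOD="2014-07" OBS_VALUE="1.5"/>
    <Obs TIME_PERIOD="2014-08" OBS_VALUE="1.7"/>
  </Series>
</DataSet>'

# Closure that accumulates observations as the parser emits events,
# so the XML tree is never held in memory.
obs_collector <- function() {
  periods <- character(0)
  values <- numeric(0)
  list(
    startElement = function(name, attrs, ...) {
      if (name == "Obs") {
        periods <<- c(periods, attrs[["TIME_PERIOD"]])
        values <<- c(values, as.numeric(attrs[["OBS_VALUE"]]))
      }
    },
    result = function() {
      data.frame(TIME_PERIOD = periods, OBS_VALUE = values,
                 stringsAsFactors = FALSE)
    }
  )
}

h <- obs_collector()
invisible(xmlEventParse(doc, handlers = list(startElement = h$startElement),
                        asText = TRUE))
df <- h$result()
print(df)
```

Growing vectors with c() is fine for a sketch; a real reader for large files would preallocate or collect in chunks.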

Under this activity, an enhancement of readSDMX should allow reading directly from a URL (for SDMX web-services that do not require an explicit user-agent). Combined with the SAX approach, this is intended to provide a powerful SDMX reader that parses SDMX data from the web and minimizes R memory use by not storing the SDMX-ML XML content, which is ideal for R web-services that use rsdmx.

This activity is listed as a priority enhancement for which rsdmx is seeking funding. See https://github.com/opensdmx/rsdmx/wiki#8-fundings.


Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

Investigate codelist encoding issues

Parsing codelists can lead to encoding issues, especially when codes are labelled in different languages. This needs to be investigated by type of resource (local vs. remote) and later by method (XPath vs. SAX), since the encoding is handled differently in each case.



as.data.frame method fails with multiple attributes per observation

Here's an XML query link constructed by following the guidelines of ECB:

Alternatively, one can use this query

Both queries work fine to get the data. However, the as.data.frame method works correctly only in the second case. In the first case, each observation is repeated, so the first data frame has twice as many rows as the second. Indeed, during conversion R emits a number of warnings, e.g.

1: In [<-.data.frame(*tmp*, , i, value = list("A", NA)) :
provided 2 variables to replace 1 variables

On the other hand,

works just fine. Comparing the two XML files suggests that there may be an issue when an observation carries multiple attributes.
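
To illustrate the kind of situation described, here is a minimal sketch of building a data frame from observations that carry different numbers of attributes, by padding each one to the union of attribute names. The input data and attribute names are made up, and this is not how rsdmx is implemented.

```r
# Hypothetical parsed observations: each is a named character vector,
# and the second one carries an extra attribute.
obs <- list(
  c(obsTime = "2014-01", obsValue = "1.2", OBS_STATUS = "A"),
  c(obsTime = "2014-02", obsValue = "1.3", OBS_STATUS = "A", OBS_CONF = "F")
)

# Union of all attribute names, in order of first appearance
cols <- unique(unlist(lapply(obs, names)))

# Pad every observation to the full set of columns, filling gaps with NA
rows <- lapply(obs, function(o) {
  padded <- setNames(rep(NA_character_, length(cols)), cols)
  padded[names(o)] <- o
  as.data.frame(as.list(padded), stringsAsFactors = FALSE)
})

df <- do.call(rbind, rows)
print(df)
```

Each observation yields exactly one row, whatever the number of attributes it has.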

Oxygenize R documentation

Assess whether the R documentation could be "oxygenized" using Roxygen or Roxygen2, so that documentation is written as comments within the R files and the Rd files are generated automatically.

Support for SDMX-ML writer

I am creating this ticket to discuss adding a writeSDMX function to the package.

SDMX-ML remains a statistical data exchange format. It would make sense for people who use R to process and analyze data to be able to export their analysis results to SDMX-ML, so they can share their results in a standard way. Any suggestions from users are very welcome!



Support for concepts / concept schemes

This ticket will add new classes SDMXConcept, SDMXConceptScheme, SDMXConcepts and related methods. The class SDMXConcepts will later be used to feed an SDMX DSD object as indicated in #22, and will come with a method implementing the generic as.data.frame.

Support for DataStructureDefinition (DSD)

This issue creates a reference number for the creation of a DSD parser which can be used for naming an issue branch for the topic.
I'd suggest making this a function that creates a list of data frames, one for every codelist contained in the DSD file, where each data frame has a code column and a description column.
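
The suggested return shape could look like the following sketch. The input `parsed` is a made-up stand-in for whatever the DSD parser would extract; only the output structure (a named list of data frames with code and description columns) comes from the suggestion above.

```r
# Hypothetical, already-parsed codelists: codelist id -> named vector
# (names are codes, values are descriptions).
parsed <- list(
  CL_FREQ = c(A = "Annual", Q = "Quarterly", M = "Monthly"),
  CL_GEO  = c(BE = "Belgium", FR = "France")
)

# One data frame per codelist, each with a code and a description column
codelists <- lapply(parsed, function(cl) {
  data.frame(code = names(cl), description = unname(cl),
             stringsAsFactors = FALSE)
})

str(codelists)
```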

At a later iteration this object can be assigned to a slot of an S4 object representing the DSD (could be named SDMXdsd, inheriting from the SDMX class) which in itself could then be used to create a class that compounds the header information, the codelists and the actual data as dataframe. However, these thoughts should probably be detailed in another issue.

Issue in parsing serie & obs attributes

Take the following valid query:
https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01

When running:

library('rsdmx')
url<-'https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01'
sdmx<-readSDMX(url)
df <- as.data.frame(sdmx)

The following error is raised

Error in structure(attrsValues, .Names = serieAttrsNames) : 
  'names' attribute [5] must be the same length as the vector [4]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Warning when querying large datasets at Eurostat

Take this query for data at Eurostat:

It returns a message with code 413 indicating that the query is too large and provides a link to a zip file which contains the desired XML file. In this case, it's of size 4.7 Mb.

It would be useful to be warned whenever this is the case, and possibly to have the link to the zip file printed in the console output. As of now, rsdmx parses the message correctly, but the user is not warned. Only when you get a NULL data frame do you start investigating the problem.

Improve SDMX version 1.0 support

I just found this package and gave it a try by changing one of the examples to download data from federal reserve. However, I got an error and not sure whether this is due to the data or the package. Could you please take a quick look? Thanks!

myUrl <- "http://www.federalreserve.gov/datadownload/Output.aspx?rel=G17&series=13577f97794ad0f0b6b3e593c7cad8a1&lastObs=&from=01/01/2006&to=12/31/2014&filetype=sdmx&label=include&layout=seriescolumn"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset) 

Error in sender$name[[xmlGetAttr(x, "xml:lang")]] <- xmlValue(x) :
wrong args for environment subassignment

Include serie attributes

At present, only the seriesKey values are appended to the dataset data.frame. Additional series attributes should be included.

Testing rsdmx with Canada Statistics

Testing rsdmx with Canada Statistics. Examples are provided here.

These tests aim to support a request sent on the rsdmx mailing list, and to identify and fix potential issues in the code.

Note: The case of Canada Statistics represents a useful use case for rsdmx, as it shows that not all data providers necessarily handle an SDMX web-service API, and that many SDMX resources may come as downloaded files, hence the added value of rsdmx to enable reading SDMX local files.

Include dataset Observation attributes

Eurostat data is marked up with what they call "flags", which are attributes for each data item (numerical value). They indicate, for example, whether a value is estimated (flag e), provisional (flag p), etc. In the SDMX files they are coded as value attributes. An example has been included in the manual of readSDMX in #16 (p flags for all values).

A great feature would be if these flags could be made available for plotting (e.g. marking flagged values etc.)

I think it will be difficult implementing this in a standard dataframe as there might be more than one value variables. The returned data would probably need to be more complex (S3, S4). This would also allow to include some Meta information (last update etc.) and make it also available for plotting.

Do you have any thoughts/remarks on this?

Add S4 class for structure type & methods

An S4 class will be added for the abstract SDMX Structure, with a method to get the appropriate type. Later, this type is intended to be activated as a valid SDMXType; this will come with a first structure subtype parser.

Support for SDMX 2.1 DataStructureDefinition (DSD)

I believe ECB has implemented changes to the way they serve data with SDMX-ML documents in the last few days, at least their documentation page is now visually different. For example, try this link for obtaining DSD data:

It's a valid query. I can't be sure but I believe readSDMX() function parsed the above link without problems before. It seems like ECB implemented additional elements as readSDMX() fails currently with OrganisationSchemes. One also has mes and str abbreviated instead of message and structure. I ain't very familiar with XML document structure, so I am just wondering whether they're still following the full standard. I don't see any changes for parsing raw data since, for example,

still returns the usual stuff and is correctly read with readSDMX.

Support for UtilityData type

Hi, I got error and warning running the following. Could you please take a look? Thanks a lot!

library("rsdmx")
myUrl <- "http://markets.newyorkfed.org/api/pd/get/SBN2013/timeseries/PDPOSTIPS-G11.sdmx.xml"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset) 

Error messages:

Error in validObject(.Object) : invalid class “SDMXType” object: FALSE
In addition: Warning message:
In validityMethod(object) : Unknown SDMXType UtilityDataType

Investigate readSDMX with https SDMX web-resources

There is an issue with applying readSDMX on https SDMX web-resources.
Example from ECB:

dataURL <- "https://sdw-wsrest.ecb.europa.eu/service/data/DD/M.SE.BSI_STF.RO.4F_N"
sdmx <- readSDMX(dataURL)

with the following error:

<SSL_CACERT in function (type, msg, asError = TRUE) { if (!is.character(type)) { i = match(type, CURLcodeValues) typeName = if (is.na(i)) character() else names(CURLcodeValues)[i] } typeName = gsub("^CURLE_", "", typeName) fun = (if (asError) stop else warning) fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))}(60L, "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed", TRUE): SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>
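
The error above comes from RCurl failing to verify the server certificate. A commonly used workaround, sketched here under the assumption that RCurl is installed and ships its CA bundle at `CurlSSL/cacert.pem`, is to keep verification on and point libcurl at that bundle. `ssl_curl_options` is a hypothetical helper, and how rsdmx itself should pass these options is left open.

```r
# Build RCurl options that keep certificate verification on but point
# libcurl at the CA bundle shipped with RCurl (if that package is installed).
ssl_curl_options <- function() {
  bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")
  opts <- list(ssl.verifypeer = TRUE)
  # system.file() returns "" when the file or package is not found
  if (nzchar(bundle)) opts$cainfo <- bundle
  opts
}

opts <- ssl_curl_options()
str(opts)

# Possible usage (requires RCurl and network access):
# xml_text <- RCurl::getURL(dataURL, .opts = opts)
```

The downloaded text could then be written to a temporary file and read locally, since reading local files is reported to work elsewhere in these issues.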

User interface to codelists for data selection

To retrieve data from SDMX web services (e.g. Eurostat REST, FAO REST etc) it would be helpful to have some user interface to create the appropriate url to request as resource. At a basic level this could be a function that parses a DSD into a list of Codelists and an overview of the datastructure. This would allow the user to create the appropriate url for a data request.

At a very sophisticated level this function could fire up a dialog that allows a menu based selection. Right now this is certainly beyond the manpower/scope of the package, especially given that there is no funding for development yet. Nevertheless it might be reasonable to have that kind of vision to guide the development of the general architecture of the interface.

Whether this feature should be part of rsdmx depends on the generality of DSDs across different institutions. If the DSD structure at Eurostat is significantly different from that at OECD, it would make more sense to create further packages, specific to each institution, that require rsdmx for general parsing. Similar to the faostat package, these packages could also include convenience functions to create something like country factsheets.

Add as.data.frame.SDMXDataSet to Namespace?

Shouldn't the function as.data.frame.SDMXDataSet be added to the NAMESPACE under export so that the function definition can be invoked by typing the function name? This can still be done with rsdmx:::as.data.frame.SDMXDataSet, which is somewhat clumsy. Or is there a general rule for when not to export functions? I checked some other packages and found that this is not handled consistently.
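
For context, S3 methods are normally reached through their generic rather than by their full name, which is why many packages register them instead of exporting them. The toy class below (`toyDataSet`, a made-up name) shows the dispatch; the NAMESPACE line in the comment is the usual registration directive, not a statement about what rsdmx currently does.

```r
# A toy S3 class standing in for a dataset object
toy <- structure(list(period = c("2014-07", "2014-08"), value = c(1.5, 1.7)),
                 class = "toyDataSet")

# S3 method: the name follows the generic.class convention
as.data.frame.toyDataSet <- function(x, ...) {
  data.frame(period = x$period, value = x$value, stringsAsFactors = FALSE)
}

# Users call the generic; dispatch finds the method
df <- as.data.frame(toy)
print(df)

# In a package NAMESPACE, the method is typically registered rather than
# exported by name:
#   S3method(as.data.frame, toyDataSet)
```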

Support for codelists

This includes the creation of S4 models for the SDMX Code, Codelist, and Codelists types, plus a method implementing the generic as.data.frame for codelists.

rsdmx does not work with ECB website

Dear rsdmx developers,

Thank you for your work on rsdmx, it looks very promising. However, I was not able to use data from ECB as dataframes. You will find below an example and the error message:

library("rsdmx")
url<-"http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=118.DD.M.SE.BSI_STF.RO.4F_N&type=sdmx"
sdmx<-readSDMX(url)
df<-as.data.frame(sdmx)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces, :
error evaluating xpath expression //ns:Series

Maybe I did not use rsdmx correctly, so do not hesitate to correct me. Otherwise, it would be great if you could provide a way to use ECB data.

Thanks,
Bertrand

Support for data provider-oriented readers

As you know, the rsdmx strategy is, for now, to provide low-level functionality to read SDMX data and metadata, focusing first on the data format rather than on SDMX web-service architectures. This has several benefits, mostly related to flexibility. Among them:

  • we can read SDMX local files (shared by colleagues, or downloaded after some manual search and discovery on the web)
  • we don't depend on any hardcoded web-services endpoint to retrieve data.

However, when reading SDMX data from the web, it would be practical for users to have utility functions that avoid copying the entire data request URL. These functions would improve usability by building the request and calling the readSDMX function.
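
Such a utility could be as small as the sketch below. `build_data_url` and its arguments are hypothetical, and the URL pattern is only inferred from the ECB query quoted in an earlier issue on this page, not from any provider specification.

```r
# Hypothetical helper: assemble a REST data URL of the form
# <endpoint>/data/<flow>/<key>?startPeriod=...
build_data_url <- function(endpoint, flow, key, start = NULL) {
  url <- paste(endpoint, "data", flow, key, sep = "/")
  if (!is.null(start)) url <- paste0(url, "?startPeriod=", start)
  url
}

url <- build_data_url(
  endpoint = "https://sdw-wsrest.ecb.europa.eu/service",
  flow = "QSA",
  key = "Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T",
  start = "2014-01-01"
)
print(url)

# The result could then be handed to the reader:
# sdmx <- rsdmx::readSDMX(url)
```

Keeping the builder separate from readSDMX preserves the low-level, endpoint-agnostic design described above.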

Any feedback from users would be very welcome. Ideas or contributions are also welcome!

Distinguish classes for generic vs. compact data

At present, both generic and compact data are handled in a single SDMXDataSet class. This will be split into two classes, SDMXGenericData and SDMXCompactData, to align with the SDMX spec message types. Later, a superclass SDMXData might be modeled if required.
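
A minimal S4 sketch of that class layout follows. The class names come from the proposal above; the `xml_text` slot and the `messageType` generic are placeholders invented for illustration and do not reflect rsdmx's actual slots or methods.

```r
library(methods)

# Virtual superclass; the `xml_text` slot is a placeholder
setClass("SDMXData", representation("VIRTUAL", xml_text = "character"))
setClass("SDMXGenericData", contains = "SDMXData")
setClass("SDMXCompactData", contains = "SDMXData")

# A generic with one method per message type
setGeneric("messageType", function(x) standardGeneric("messageType"))
setMethod("messageType", "SDMXGenericData", function(x) "generic")
setMethod("messageType", "SDMXCompactData", function(x) "compact")

g <- new("SDMXGenericData", xml_text = "<GenericData/>")
k <- new("SDMXCompactData", xml_text = "<CompactData/>")

print(c(messageType(g), messageType(k)))
print(is(g, "SDMXData"))
```

With a shared superclass, code that works for both message types can dispatch on SDMXData, while format-specific parsing dispatches on the subclasses.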
