opensdmx / rsdmx Goto Github PK

Tools for reading SDMX data and metadata in R

Home Page: https://github.com/opensdmx/rsdmx/wiki

R 100.00%

r sdmx read readsdmx api web-services sdmx-provider sdmx-format sdmx-standards dsd

rsdmx's Issues

improve CompactData as.data.frame method

Investigate enhancement of performance for as.data.frame - data

Investigation is required to improve the performance of the as.data.frame methods applied to data, especially for the GenericData format. Good test cases include large SDMX datasets from Canada Statistics. This ticket will report on the performance gain.

Make SDMXType aligned with SDMX format types

The type.SDMXType method should be aligned with the SDMX format types,
e.g. GenericData, CompactData, MessageGroup, etc.

Control xml validity of getURL result

If the SDMX web request fails, getURL returns HTML content instead of the expected XML response. Most of these errors occurred with OECD data.
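A minimal sketch of such a check, assuming the response body is already available as a single character string. The helper name `looks_like_xml` is hypothetical and not part of rsdmx or RCurl; it only inspects the start of the content and does not validate the XML.

```r
# Hypothetical helper: cheap sanity check on a response body before parsing.
# It rejects HTML error pages and anything that does not start with a tag.
looks_like_xml <- function(content) {
  if (!is.character(content) || length(content) != 1L || is.na(content)) {
    return(FALSE)
  }
  # Only the beginning of the document is needed to tell HTML from XML.
  head <- tolower(substr(trimws(content), 1, 200))
  is_html <- grepl("<!doctype html|<html", head)
  starts_with_tag <- grepl("^(<\\?xml|<[a-z])", head)
  starts_with_tag && !is_html
}

ok_xml  <- looks_like_xml('<?xml version="1.0"?><message:GenericData/>')
ok_html <- looks_like_xml("<html><body>Error<br></body></html>")
```

A reader function could call this before parsing and stop with a clear message instead of surfacing raw XML parser errors.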

Add R documentation for existing codes

Parsing issues with Belgian data

National Bank of Belgium is migrating towards SDMX (in the spirit of OECD) and I'm trying to fetch data from their database. Here are two sample links:

library(rsdmx)

url <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+1_00_0_0_0_00+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"
url1 <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"

Notice that the only difference is +1_00_0_0_0_00 available in the first url. While readSDMX(url1) works fine, readSDMX(url) outputs a parsing error:

Opening and ending tag mismatch: br line 1 and body
Opening and ending tag mismatch: br line 1 and html
Premature end of data in tag body line 1
Premature end of data in tag html line 1
<XMLParserErrorList: 1: Opening and ending tag mismatch: br line 1 and body
2: Opening and ending tag mismatch: br line 1 and html
3: Premature end of data in tag body line 1
4: Premature end of data in tag html line 1

This issue seems to recur with the NBB database. Long links don't seem to parse, but if you cut them up, that seems to do the job. I can't discern any pattern for which ones work and which don't. However, note that if you export an XML file from the web application and read it with rsdmx locally, it works fine.

Issue with nchar - Errors with new R-devel

R CMD check rsdmx is no longer successful because of the change to nchar() in R-devel. This issue needs to be fixed ASAP to make rsdmx compatible with R-devel. Replacing nchar(x) with nchar(x, "w") should be enough and is backward compatible.
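The issue does not say which nchar() change is meant. Assuming it is the handling of missing strings that arrived with the keepNA argument in R 3.3.0, the difference between the two calls can be seen as follows:

```r
x <- c("abc", NA_character_)

# Default type ("chars"): since R 3.3.0 a missing string yields NA.
n_chars <- nchar(x)

# type = "w" (width): a missing string still counts as the two printed
# characters "NA", which matches the behaviour of older R versions.
n_width <- nchar(x, "w")
```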

Create specific S4 class for MessageGroup

A specific S4 model will be set up to handle SDMX MessageGroup. This needs to be addressed after #25, in order to make rsdmx work with OECD data.

How to implement SDMX 2.1

rsdmx currently supports SDMX 2.0 only. Being able to parse SDMX 2.1 would be needed to connect to the REST service at Eurostat, as it is only available for SDMX 2.1. This raises several questions:

Are there any reasons not to implement SDMX 2.1?
Is it sensible to have both versions of the standard (and possible future updates) available in the package?
How could parsing SDMX 2.1 be made possible without changing the existing interface?

How do you feel about this?

Set-up a R Unit testing approach & apply it to existing codes

All existing code should be covered with unit tests. New code will need to be implemented with unit tests.

Enable readSDMX to read SDMX urls directly (with user-agent)

As suggested by @Tungurahua, httpheader could be added in the getURL call from RCurl.

Before moving forward, we should see exactly how and where to use it (I'm wondering if we might use it by default, instead of creating a new argument for readSDMX).

SDMX dataset - absence data support

Some series might not have observations, which introduces the notion of absence data. This should be supported for both SDMX GenericData and CompactData parsers.
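One possible reading of this requirement, sketched with plain lists rather than rsdmx objects: a series without observations still contributes one row, with NA in the observation columns, so it does not silently disappear from the result. The function `series_to_df` and the input layout are hypothetical.

```r
# Hypothetical input: each series is a list with a key and parallel
# vectors of observation times and values (possibly empty).
series <- list(
  list(key = "A.X", obsTime = c("2013", "2014"), obsValue = c(1.1, 1.2)),
  list(key = "A.Y", obsTime = character(0), obsValue = numeric(0))
)

series_to_df <- function(series) {
  rows <- lapply(series, function(s) {
    if (length(s$obsTime) == 0L) {
      # Series without observations: keep the key, mark the rest as absent.
      data.frame(key = s$key, obsTime = NA_character_, obsValue = NA_real_,
                 stringsAsFactors = FALSE)
    } else {
      data.frame(key = s$key, obsTime = s$obsTime, obsValue = s$obsValue,
                 stringsAsFactors = FALSE)
    }
  })
  do.call(rbind, rows)
}

df <- series_to_df(series)
```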

Enable SDMX-ML SAX (Simple API for XML) parsing method

When SDMX-ML documents become large, holding two copies of the data in memory at once, i.e. the XML tree and the R data structure (resulting from as.data.frame in rsdmx), can lead to memory issues.
In this case, an event-driven or SAX (Simple API for XML) style parser could be investigated, not as a replacement but as a complementary approach to reading SDMX-ML files. With the XML package, event-driven parsing is possible using xmlEventParse.

Under this activity, an enhancement of readSDMX should allow reading directly from a URL (for SDMX web services that do not require an explicit user-agent). Combined with the SAX approach, this is intended to provide a powerful SDMX reader that parses SDMX data from the web and minimizes R memory use by not storing the SDMX-ML XML content, which is ideal for R web services that process data with rsdmx.

This activity is listed as a priority enhancement for which rsdmx is seeking funding. See https://github.com/opensdmx/rsdmx/wiki#8-fundings

Support for key families / data structure

Make reusable code to get namespaces

Some code used to get namespaces for parsing SDMX-ML documents will be made reusable through separate functions.

Improve xml namespaces resolving

Investigate generic coercing of obsTime into date format

As highlighted on the mailing list, there is a need for a generic coercion of dataset obsTime values into a date format.
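A minimal sketch of what such a coercion could look like, assuming obsTime values arrive as strings in one of four common shapes (yearly, monthly, quarterly, daily) and that each period maps to its first day. The helper `obs_time_to_date` is hypothetical; other period formats are returned as NA rather than guessed.

```r
# Hypothetical helper: map common time period strings to the first day
# of the period; anything unrecognised becomes NA.
obs_time_to_date <- function(x) {
  x <- as.character(x)
  iso <- rep(NA_character_, length(x))

  annual    <- grepl("^[0-9]{4}$", x)
  monthly   <- grepl("^[0-9]{4}-[0-9]{2}$", x)
  quarterly <- grepl("^[0-9]{4}-Q[1-4]$", x)
  daily     <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)

  iso[annual]  <- sprintf("%s-01-01", x[annual])
  iso[monthly] <- sprintf("%s-01", x[monthly])
  # Quarter n starts in month 3 * (n - 1) + 1.
  q <- as.integer(substr(x[quarterly], 7, 7))
  iso[quarterly] <- sprintf("%s-%02d-01", substr(x[quarterly], 1, 4),
                            (q - 1L) * 3L + 1L)
  iso[daily] <- x[daily]

  as.Date(iso, format = "%Y-%m-%d")
}

dates <- obs_time_to_date(c("2014", "2014-07", "2014-Q3", "2014-07-15", "n/a"))
```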

Investigate codelist encoding issues

Parsing codelists can lead to encoding issues, especially when codes are labelled in several languages. This needs to be investigated by type of resource (local vs. remote) and later by parsing method (XPath vs. SAX), since encoding is handled differently in each case.

as.data.frame method fails with multiple attributes per observation

Here's an XML query link constructed by following the guidelines of ECB:

http://sdw-wsrest.ecb.europa.eu/service/data/FM/M.U2.EUR.DS.EI.DJEURST.HSTA

Alternatively, one can use this query

http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=FM.M.U2.EUR.DS.EI.DJEURST.HSTA&type=sdmx

Both queries work fine to get the data. However, the as.data.frame method works correctly only in the second case. In the first case, each observation is repeated, so the first data frame has twice as many rows as the second one. Indeed, during conversion R emits a number of warnings, e.g.

1: In `[<-.data.frame`(`*tmp*`, , i, value = list("A", NA)) :
provided 2 variables to replace 1 variables

On the other hand,

http://sdw-wsrest.ecb.europa.eu/service/data/EXR/M.USD.EUR.SP00.A

works just fine. Comparing the two XML files suggests that there may be an issue when dealing with multiple attributes per observation.
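One way to keep a single row per observation when observations carry different sets of attributes is to align every observation on the union of attribute names and fill the gaps with NA. The sketch below uses named character vectors as a stand-in for parsed observations; `obs_to_df` and the attribute names are illustrative, and this is not the fix that was applied in rsdmx.

```r
# Hypothetical input: one named character vector per observation.
obs_list <- list(
  c(obsTime = "2014-01", obsValue = "1.2", OBS_STATUS = "A", OBS_CONF = "F"),
  c(obsTime = "2014-02", obsValue = "1.3", OBS_STATUS = "A")
)

obs_to_df <- function(obs_list) {
  # Union of all attribute names seen across observations.
  cols <- unique(unlist(lapply(obs_list, names)))
  rows <- lapply(obs_list, function(o) {
    row <- rep(NA_character_, length(cols))
    names(row) <- cols
    row[names(o)] <- o          # fill only the attributes that are present
    as.data.frame(as.list(row), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

df <- obs_to_df(obs_list)
```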

Oxygenize R documentation

Assess whether the R documentation could be generated with Roxygen or Roxygen2, where documentation is written as comments within the R files and the Rd files are generated automatically.

Missing control in type.SDMXType

An error occurs when applying the type.SDMXType function to documents whose tags do not have a namespace prefix.

Support for SDMX-ML writer

I am creating this ticket to discuss adding a writeSDMX function to the package.

SDMX-ML remains a statistical data exchange format. It would make sense for people using R to process and analyze data to be able to export their analysis results in SDMX-ML format, so they can share their results in a standard way. Any suggestions from users are very welcome!

Submit rsdmx 0.3 to CRAN

Missing control in version.SDMXSchema

Enable SDMX 1.0 and minor tests

Support for concepts / concept schemes

This ticket will add the new classes SDMXConcept, SDMXConceptScheme and SDMXConcepts, with related methods. The class SDMXConcepts will later be used to feed an SDMX DSD object as indicated in #22, and will come with a method implementing the generic as.data.frame.

OECD example?

Hello,

I have seen your past example with FAO SDMX data. Is the package ready to work with other websites, like the OECD, which seems to have a wonderful but undocumented SDMX API?

Support for DataStructureDefinition (DSD)

This issue creates a reference number for the creation of a DSD parser which can be used for naming an issue branch for the topic.
I'd suggest making this a function that creates a list of dataframes, one for every codelist contained in the DSD file, wherein each dataframe has a code and a description column.

At a later iteration this object can be assigned to a slot of an S4 object representing the DSD (could be named SDMXdsd, inheriting from the SDMX class) which in itself could then be used to create a class that compounds the header information, the codelists and the actual data as dataframe. However, these thoughts should probably be detailed in another issue.

Issue in parsing serie & obs attributes

Take the following valid query:
https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01

When running:

library('rsdmx')
url<-'https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01'
sdmx<-readSDMX(url)
df <- as.data.frame(sdmx)

The following error is raised

Error in structure(attrsValues, .Names = serieAttrsNames) : 
  'names' attribute [5] must be the same length as the vector [4]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Support for SDMX CompactData

Need to add reading support for SDMX CompactData

Warning when querying large datasets at Eurostat

Take this query for data at Eurostat:

http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/ei_bsbu_m/..SA..M/?startPeriod=2000-01&endPeriod=2015-03

It returns a message with code 413 indicating that the query is too large and provides a link to a zip file which contains the desirable XML file. In this case, it's of size 4.7 Mb.

It could be useful to get a warning whenever this is the case, and possibly to have the link to the zip file printed in the console output. As of now, rsdmx parses the message correctly, but the user is not warned. Only when you get a NULL dataframe do you start investigating the problem.

Improve SDMX version 1.0 support

I just found this package and gave it a try by changing one of the examples to download data from federal reserve. However, I got an error and not sure whether this is due to the data or the package. Could you please take a quick look? Thanks!

myUrl <- "http://www.federalreserve.gov/datadownload/Output.aspx?rel=G17&series=13577f97794ad0f0b6b3e593c7cad8a1&lastObs=&from=01/01/2006&to=12/31/2014&filetype=sdmx&label=include&layout=seriescolumn"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)

Error in sender$name[[xmlGetAttr(x, "xml:lang")]] <- xmlValue(x) :
wrong args for environment subassignment

Include serie attributes

Currently, only the seriesKey values are appended to the dataset data.frame. Additional series attributes should be included.

S4 class - HeaderType

Model the SDMX-ML HeaderType with a S4 class

Testing rsdmx with Canada Statistics

Testing rsdmx with Canada Statistics. Examples are provided here.

These tests aim to support a request sent on the rsdmx mailing list, and to identify and fix potential issues in the code.

Note: The case of Canada Statistics represents a useful use case for rsdmx, as it shows that not all data providers necessarily handle an SDMX web-service API, and that many SDMX resources may come as downloaded files, hence the added value of rsdmx to enable reading SDMX local files.

Include dataset Observation attributes

Eurostat data is marked up with what they call "flags", which are attributes for each data item (numerical values). They indicate, for example, whether a value is estimated (flag e), provisional (flag p), etc. In the SDMX files they are coded as value attributes. An example has been included in the manual of readSDMX in #16 (p flags for all values).

A great feature would be if these flags could be made available for plotting (e.g. marking flagged values etc.)

I think it will be difficult to implement this in a standard dataframe, as there might be more than one value variable. The returned data would probably need to be more complex (S3, S4). This would also allow including some meta information (last update etc.) and making it available for plotting.

Do you have any thoughts/remarks on this?

Add S4 class for structure type & methods

An S4 class will be added for the abstract SDMX Structure, with a method to get the appropriate type. Later, this type is intended to be activated as a valid SDMXType; this will come with a first structure subtype parser.

Support for SDMX 2.1 DataStructureDefinition (DSD)

I believe ECB has implemented changes to the way they serve data with SDMX-ML documents in the last few days, at least their documentation page is now visually different. For example, try this link for obtaining DSD data:

http://sdw-wsrest.ecb.europa.eu/service/datastructure/ECB/ECB_EXR1/1.0?references=children

It's a valid query. I can't be sure but I believe readSDMX() function parsed the above link without problems before. It seems like ECB implemented additional elements as readSDMX() fails currently with OrganisationSchemes. One also has mes and str abbreviated instead of message and structure. I ain't very familiar with XML document structure, so I am just wondering whether they're still following the full standard. I don't see any changes for parsing raw data since, for example,

https://sdw-wsrest.ecb.europa.eu/service/data/EXR/M.USD.EUR.SP00.A

still returns the usual stuff and is correctly read with readSDMX.

Support for UtilityData type

Hi, I got error and warning running the following. Could you please take a look? Thanks a lot!

library("rsdmx")
myUrl <- "http://markets.newyorkfed.org/api/pd/get/SBN2013/timeseries/PDPOSTIPS-G11.sdmx.xml"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)

Error messages:

Error in validObject(.Object) : invalid class “SDMXType” object: FALSE
In addition: Warning message:
In validityMethod(object) : Unknown SDMXType UtilityDataType

Investigate readSDMX with https SDMX web-resources

There is an issue with applying readSDMX on https SDMX web-resources.
Example from ECB:

dataURL <- "https://sdw-wsrest.ecb.europa.eu/service/data/DD/M.SE.BSI_STF.RO.4F_N"
sdmx <- readSDMX(dataURL)

with the following error:

<SSL_CACERT in function (type, msg, asError = TRUE) { if (!is.character(type)) { i = match(type, CURLcodeValues) typeName = if (is.na(i)) character() else names(CURLcodeValues)[i] } typeName = gsub("^CURLE_", "", typeName) fun = (if (asError) stop else warning) fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))}(60L, "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed", TRUE): SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>

Improve performance of as.data.frame method for SDMX datasets

User interface to codelists for data selection

To retrieve data from SDMX web services (e.g. Eurostat REST, FAO REST etc) it would be helpful to have some user interface to create the appropriate url to request as resource. At a basic level this could be a function that parses a DSD into a list of Codelists and an overview of the datastructure. This would allow the user to create the appropriate url for a data request.

At a very sophisticated level this function could fire up a dialog that allows a menu based selection. Right now this is certainly beyond the manpower/scope of the package, especially given that there is no funding for development yet. Nevertheless it might be reasonable to have that kind of vision to guide the development of the general architecture of the interface.

Whether this feature should be part of rsdmx depends on the generality of DSDs across different institutions. If the DSD structure at Eurostat is significantly different from the one at OECD, it would make more sense to create further packages specific to each institution that require rsdmx for general parsing. Similar to the faostat package, these packages could also include convenience functions to create something like country factsheets.

Add as.data.frame.SDMXDataSet to Namespace?

Shouldn't the function as.data.frame.SDMXDataSet be added to the NAMESPACE under export so that the function definition can be invoked by typing the function name? This can still be done with rsdmx:::as.data.frame.SDMXDataSet, which is somewhat clumsy. Or is there a general rule for when not to export functions? I checked some other packages and found that this is not handled consistently.
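For context on the export question: R packages can register an S3 method in NAMESPACE with an S3method() directive instead of export(), in which case users reach the method through the generic rather than by its full name. The toy class below is hypothetical and only illustrates that dispatch; it says nothing about how rsdmx declares its own methods.

```r
# Toy S3 class standing in for a parsed dataset (hypothetical, not an rsdmx class).
toy <- structure(list(values = c(1.5, 2.5, 3.5)), class = "toyDataSet")

# S3 method for the as.data.frame generic. In a package, the NAMESPACE
# directive S3method(as.data.frame, toyDataSet) would register it.
as.data.frame.toyDataSet <- function(x, ...) {
  data.frame(obsValue = x$values)
}

# Users call the generic; dispatch picks the method from the object's class.
df <- as.data.frame(toy)
```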

Error in as.data.frame method for series with a single attribute

Reading datasets with series having a single attribute seems to cause problems with the following error:
Error in `rownames<-`(x, value) : attempt to set 'rownames' on an object with no dimensions

Example of serie: http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/UN_DEN/AUT+BEL+CZE+DNK+EST+FIN+FRA+DEU+GRC+IRL+ITA+LUX+NLD+POL+PRT+SVK+SVN+ESP+SWE+GBR/OECD?startTime=1995&endTime=2012

To be fixed ASAP.

Support for codelists

This includes the creation of S4 models for the SDMX Code, Codelist, and Codelists types, plus a method implementing the generic as.data.frame applied to codelists.

Error with VERSION.SDMXSchema when multiple SDMX namespaces

When multiple SDMX namespaces are specified in addition to the "SDMX message schemaLocation", the method VERSION.SDMXSchema doesn't work.

SDMXDataSet as.data.frame function does not work with data having no NS prefix

SDMXDataSet as.data.frame function does not work with data having no NS prefix.

rsdmx does not work with ECB website

Dear rsdmx developers,

Thank you for your work on rsdmx, it looks very promising. However, I was not able to read data from the ECB into dataframes. Below you will find an example and the error message:

library("rsdmx")
url<-"http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=118.DD.M.SE.BSI_STF.RO.4F_N&type=sdmx"
sdmx<-readSDMX(url)
df<-as.data.frame(sdmx)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces, :
error evaluating xpath expression //ns:Series

Maybe I did not use rsdmx correctly, so do not hesitate to correct me. Otherwise, it would be great if you could provide a way to use ECB data.

Thanks,
Bertrand

Support for data provider-oriented readers

As you know, the current rsdmx strategy is to provide low-level functionality to read SDMX data and metadata, focusing first on the data format rather than on SDMX web-service architectures. This has several benefits, which mostly relate to flexibility. Among them:

we can read SDMX local files (shared by colleagues, or downloaded after some manual search and discovery on the web)
we don't depend on any hardcoded web-service endpoint to retrieve data.

However, when reading SDMX data from the web, it would be particularly practical for users to have some utility functions that spare them from copying the entire data request URL. These functions would improve usability by building the requests and calling the readSDMX function.

Any feedback from users would be very welcome. Ideas or contributions are also welcome!
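As a starting point for discussion, a request builder could look like the sketch below. It assumes a REST layout of the form base/data/flow/key, with dimensions separated by "." and alternative codes within a dimension joined by "+", as in the ECB and Eurostat query URLs quoted in other issues. The function name `build_sdmx_url` and its arguments are hypothetical; providers with a different URL scheme would need their own builder.

```r
# Hypothetical helper: assemble a data request URL from its parts.
# dims is a list with one character vector of codes per dimension;
# an empty vector leaves that dimension unconstrained.
build_sdmx_url <- function(base_url, flow, dims, start = NULL, end = NULL) {
  key <- paste(
    vapply(dims, function(codes) paste(codes, collapse = "+"),
           character(1), USE.NAMES = FALSE),
    collapse = "."
  )
  url <- paste(sub("/+$", "", base_url), "data", flow, key, sep = "/")

  # NULL entries are dropped by c(), so only supplied periods are added.
  params <- c(startPeriod = start, endPeriod = end)
  if (length(params) > 0L) {
    url <- paste0(url, "?", paste(names(params), params, sep = "=", collapse = "&"))
  }
  url
}

exr_url <- build_sdmx_url(
  "https://sdw-wsrest.ecb.europa.eu/service", "EXR",
  list("M", "USD", "EUR", "SP00", "A")
)
```

The resulting string could then be passed to readSDMX, which keeps the low-level reader independent of any particular endpoint.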

Distinguish classes for generic vs. compact data

Currently, both generic and compact data are handled in a single SDMXDataSet class. This will be split into two different classes, SDMXGenericData and SDMXCompactData, to align with the SDMX spec message types. Later, a superclass SDMXData might be modeled if required.
