OpenSDMX
opensdmx / rsdmx
Tools for reading SDMX data and metadata in R
Home Page: https://github.com/opensdmx/rsdmx/wiki
Investigation is required to improve the performance of the as.data.frame methods applied to data, especially in the GenericData format. Good test cases include large SDMX datasets from Statistics Canada. This ticket will report on the performance gains.
The type.SDMXType method should be aligned with the SDMX format types, e.g. GenericData, CompactData, MessageGroup, etc.
If the SDMX web request fails, getURL returns HTML content instead of the expected XML response. Most of these errors occurred with OECD data.
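One possible guard is sketched below, assuming the response body has already been fetched as a single string (for instance with getURL). Both `looks_like_html` and `check_sdmx_response` are hypothetical helpers written for illustration; they are not part of rsdmx.

```r
# Hypothetical helper (not part of rsdmx): guess whether a response body is an
# HTML page rather than an SDMX-ML (XML) document.
looks_like_html <- function(body) {
  # Drop leading whitespace and an optional XML declaration before testing.
  head <- sub("^\\s*(<\\?xml[^>]*\\?>)?\\s*", "", body, perl = TRUE)
  grepl("^<(!DOCTYPE\\s+html|html[\\s>])", head, ignore.case = TRUE, perl = TRUE)
}

# Hypothetical wrapper: fail early with a readable message instead of letting
# the XML parser report tag mismatches on an HTML error page.
check_sdmx_response <- function(body) {
  if (looks_like_html(body)) {
    stop("The server returned an HTML page instead of SDMX-ML; the request probably failed.")
  }
  body
}
```

Calling `check_sdmx_response()` on the fetched text before parsing would turn a confusing parser error into an explicit message about the failed request.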
National Bank of Belgium is migrating towards SDMX (in the spirit of OECD) and I'm trying to fetch data from their database. Here are two sample links:
library(rsdmx)
url <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+1_00_0_0_0_00+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"
url1 <- "http://stat.nbb.be/RestSDMX/sdmx.ashx/GetData/NICP2013/99_9_9_9_99+HEALTH+XEFUN0+2_00_0_0_0_00+3_00_0_0_0_00.M/all?startTime=2014-07&endTime=2014-08"
Notice that the only difference is +1_00_0_0_0_00, which appears only in the first URL. While readSDMX(url1) works fine, readSDMX(url) outputs a parsing error:
Opening and ending tag mismatch: br line 1 and body
Opening and ending tag mismatch: br line 1 and html
Premature end of data in tag body line 1
Premature end of data in tag html line 1
<XMLParserErrorList: 1: Opening and ending tag mismatch: br line 1 and body
2: Opening and ending tag mismatch: br line 1 and html
3: Premature end of data in tag body line 1
4: Premature end of data in tag html line 1
This issue seems to be rather recurrent with the NBB database. Long links don't seem to parse, but cutting them up seems to do the job. I can't discern any pattern for which ones work and which don't. However, note that if you export an XML file from the web application and read it with rsdmx locally, it works fine.
R CMD check rsdmx is no longer successful because of the change to nchar() in R-devel. This issue needs to be fixed ASAP to make rsdmx R-devel compatible. Replacing nchar(x) by nchar(x, "w") should be enough and backward compatible.
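The difference shows up on missing strings. The short example below assumes that the R-devel change in question is the one that makes nchar() return NA for a missing string under the default type ("chars"); the vector `x` is made up for illustration.

```r
x <- c("GenericData", NA_character_)

# With type = "width" (abbreviated "w"), a missing string is counted as 2,
# the width of the printed "NA".
width_counts <- nchar(x, "w")

# With the default type = "chars", recent R versions return NA for the
# missing string instead.
char_counts <- nchar(x)
```

Code that compares the result of nchar() with a number, for example in an `if` condition, keeps working with the "width" variant because it never yields NA here.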
A specific S4 model will be set up for SDMX MessageGroup, that can handle. This needs to be addressed after #25, in order to make rsdmx work on OECD data.
rsdmx currently supports SDMX 2.0 only. Being able to parse SDMX 2.1 is needed to connect to the REST service at Eurostat, as it is only available for SDMX 2.1. This raises several questions:
How do you feel about this?
All existing code should be covered by unit tests. New code will need to be implemented with unit tests.
As suggested by @Tungurahua, httpheader could be added in the getURL call from RCurl.
Before moving forward, we should see exactly how and where to use it (I'm wondering if we might use it by default, instead of creating a new argument for readSDMX).
Some series might not have observations, which introduces the notion of absent data. This should be supported by both the SDMX GenericData and CompactData parsers.
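One way to represent such a series in the resulting data frame is to keep its key and fill the observation columns with NA. The sketch below assumes the series have already been extracted into plain R lists; the structure of `series` and the helper `series_to_df` are illustrative and do not reflect the rsdmx internals.

```r
# Toy series, assumed to be already extracted from an SDMX-ML document.
# The second series has a key but no observations.
series <- list(
  list(key = "A.BE",
       obs = data.frame(obsTime = c("2013", "2014"), obsValue = c(1.5, 2.5),
                        stringsAsFactors = FALSE)),
  list(key = "A.FR",
       obs = data.frame(obsTime = character(0), obsValue = numeric(0),
                        stringsAsFactors = FALSE))
)

# Hypothetical helper: an empty series becomes a single row of NA values,
# so that its key is not silently dropped from the result.
series_to_df <- function(s) {
  obs <- s$obs
  if (nrow(obs) == 0) {
    obs <- data.frame(obsTime = NA_character_, obsValue = NA_real_,
                      stringsAsFactors = FALSE)
  }
  data.frame(seriesKey = s$key, obs, stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(series, series_to_df))
```

Whether an empty series should appear as an NA row or be omitted is a design choice; the NA row keeps the information that the series exists.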
When SDMX-ML documents become large, ending up with 2 copies of the data in memory at once, i.e. the tree and the R data structure (resulting from as.data.frame in rsdmx), could lead to memory issues.
In this case, an event-driven or SAX (Simple API for XML) style parser could be investigated, not as a replacement but as a complementary approach to reading SDMX-ML files. With the XML package, event-driven parsing is possible using xmlEventParser.
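As a rough illustration of the event-driven approach, the sketch below collects observations through a startElement handler without building the document tree. The XML fragment is a made-up, prefix-free stand-in for a CompactData-like document, and the attribute names TIME_PERIOD and OBS_VALUE are assumptions about its layout.

```r
library(XML)

# A tiny made-up fragment standing in for a large SDMX-ML file.
sdmx_text <- paste0(
  '<DataSet>',
  '<Series FREQ="A">',
  '<Obs TIME_PERIOD="2013" OBS_VALUE="1.5"/>',
  '<Obs TIME_PERIOD="2014" OBS_VALUE="2.5"/>',
  '</Series>',
  '</DataSet>'
)

times <- character(0)
values <- numeric(0)

handlers <- list(
  # Called once per opening tag; attrs is a named character vector.
  startElement = function(name, attrs, ...) {
    if (name == "Obs") {
      times <<- c(times, attrs[["TIME_PERIOD"]])
      values <<- c(values, as.numeric(attrs[["OBS_VALUE"]]))
    }
  }
)

# Stream through the document; only the collected vectors stay in memory.
invisible(xmlEventParser(sdmx_text, handlers = handlers, asText = TRUE))

obs <- data.frame(obsTime = times, obsValue = values, stringsAsFactors = FALSE)
```

Growing the vectors with c() keeps the example short; for genuinely large files, preallocating or collecting chunks in a list would be preferable.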
Under this activity, an enhancement of readSDMX should allow reading directly from a URL (for SDMX web services that do not require an explicit user-agent). Combined with the SAX approach, it is intended to provide a powerful SDMX reader that parses SDMX data from the web and minimizes R memory use by not storing the SDMX-ML XML content, which is ideal for R web services using rsdmx.
This activity is listed as a priority enhancement for which rsdmx is seeking funding. See https://github.com/opensdmx/rsdmx/wiki#8-fundings
Some code used to get namespaces for parsing SDMX-ML documents will be made reusable through separate functions.
As highlighted on the mailing list, there is a need for a generic coercion of the dataset obsTime into a date format.
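A minimal sketch of such a coercion is given below. It assumes obsTime values arrive as strings in one of four common period notations (year, year-month, year-quarter, full date) and maps each to the first day of the period; `obs_time_to_date` is a hypothetical helper, and other notations are returned as NA.

```r
# Hypothetical helper (not part of rsdmx): convert common SDMX time period
# strings to Date, using the first day of the period.
obs_time_to_date <- function(x) {
  x <- as.character(x)
  iso <- rep(NA_character_, length(x))

  yearly    <- grepl("^[0-9]{4}$", x)
  monthly   <- grepl("^[0-9]{4}-[0-9]{2}$", x)
  quarterly <- grepl("^[0-9]{4}-Q[1-4]$", x)
  daily     <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)

  iso[yearly]  <- paste0(x[yearly], "-01-01")
  iso[monthly] <- paste0(x[monthly], "-01")
  iso[daily]   <- x[daily]

  # Quarter n starts in month (n - 1) * 3 + 1.
  quarter <- as.integer(substr(x[quarterly], 7, 7))
  iso[quarterly] <- sprintf("%s-%02d-01", substr(x[quarterly], 1, 4),
                            (quarter - 1L) * 3L + 1L)

  as.Date(iso, format = "%Y-%m-%d")
}

dates <- obs_time_to_date(c("2014", "2014-07", "2014-Q2", "2014-07-15", "n/a"))
```

Mapping a period to its first day is only one convention; the end of the period would be an equally defensible choice.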
Parsing codelists can lead to encoding issues, especially when codes are labelled in different languages. This needs to be investigated by type of resource (local vs. remote) and then by method (XPath vs. SAX), since the encoding is handled differently in each case.
Here's an XML query link constructed by following the guidelines of ECB:
Alternatively, one can use this query
Both queries work fine to get the data. However, the as.data.frame method works correctly only in the second case. In the first case, each observation is repeated, so the first data frame has twice the number of rows of the second one. Indeed, during conversion R emits a bunch of warnings, e.g.
1: In `[<-.data.frame`(`*tmp*`, , i, value = list("A", NA)) :
  provided 2 variables to replace 1 variables
On the other hand,
works just fine. Comparing both XML files suggests that there may be an issue when dealing with multiple attributes per observation.
To assess whether the R documentation could be generated using Roxygen or Roxygen2, where documentation is added as comments within the R files and the Rd files are generated automatically.
An error occurs when trying to apply the type.SDMXType function to documents that do not have a namespace as tag prefix.
I create this ticket to discuss adding a writeSDMX function to the package.
SDMX-ML remains a statistical data exchange format. It would make sense for people using R to process and analyze data to be able to export their analysis results in SDMX-ML format, so they can share their results in a standard way. Any suggestion from users is very welcome!
This ticket will add new classes SDMXConcept, SDMXConceptScheme, SDMXConcepts and related methods. The class SDMXConcepts will later be used to feed an SDMX DSD object as indicated in #22, and will come with a method implementing the generic as.data.frame.
Hello,
I have seen your past example with FAO SDMX data. Is the package ready to work with other websites, like the OECD, which seems to have a wonderful but undocumented SDMX API?
This issue creates a reference number for the creation of a DSD parser which can be used for naming an issue branch for the topic.
I'd suggest making this a function that creates a list of data frames, one for every codelist contained in the DSD file, wherein each data frame has a code and a description column.
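The suggested output could look like the sketch below, which uses XPath queries from the XML package. The document is a made-up, prefix-free simplification of a DSD; the element names (CodeList, Code, Description) and the attributes id and value are assumptions about the layout, and only the first description of each code is kept.

```r
library(XML)

# Made-up, simplified DSD fragment with two codelists.
dsd_text <- paste0(
  '<Structure><CodeLists>',
  '<CodeList id="CL_FREQ">',
  '<Code value="A"><Description>Annual</Description></Code>',
  '<Code value="M"><Description>Monthly</Description></Code>',
  '</CodeList>',
  '<CodeList id="CL_UNIT">',
  '<Code value="EUR"><Description>Euro</Description></Code>',
  '</CodeList>',
  '</CodeLists></Structure>'
)

doc <- xmlParse(dsd_text, asText = TRUE)

# local-name() ignores namespace prefixes, which vary between providers.
codelist_nodes <- getNodeSet(doc, "//*[local-name()='CodeList']")

codelists <- lapply(codelist_nodes, function(cl) {
  codes <- getNodeSet(cl, "./*[local-name()='Code']")
  data.frame(
    code = vapply(codes, xmlGetAttr, character(1), name = "value"),
    description = vapply(codes, function(code) {
      # Keep the first description only (e.g. the first language listed).
      xmlValue(getNodeSet(code, "./*[local-name()='Description']")[[1]])
    }, character(1)),
    stringsAsFactors = FALSE
  )
})
names(codelists) <- vapply(codelist_nodes, xmlGetAttr, character(1), name = "id")
```

With this structure, `codelists$CL_FREQ` returns the data frame for one codelist, and the names of the list give an overview of the codelists found in the file.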
At a later iteration this object can be assigned to a slot of an S4 object representing the DSD (could be named SDMXdsd, inheriting from the SDMX class), which in itself could then be used to create a class that compounds the header information, the codelists and the actual data as a data frame. However, these thoughts should probably be detailed in another issue.
Take the following valid query:
https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01
When running:
library('rsdmx')
url<-'https://sdw-wsrest.ecb.europa.eu/service/data/QSA/Q.N.ES.W0..S1.N.N.B9F.F._Z._Z.XDC._T.S.V.N._T?startPeriod=2014-01-01'
sdmx<-readSDMX(url)
df <- as.data.frame(sdmx)
The following error is raised
Error in structure(attrsValues, .Names = serieAttrsNames) :
'names' attribute [5] must be the same length as the vector [4]
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Need to add reading support for SDMX CompactData
Take this query for data at Eurostat:
It returns a message with code 413 indicating that the query is too large, and provides a link to a zip file which contains the desired XML file. In this case, it's of size 4.7 Mb.
It would be useful to be warned whenever this is the case, and possibly to have the link to the zip file printed in the console output. As of now, rsdmx parses the message correctly, but the user is not warned. Only when you get a NULL data frame do you start investigating the problem.
I just found this package and gave it a try by changing one of the examples to download data from federal reserve. However, I got an error and not sure whether this is due to the data or the package. Could you please take a quick look? Thanks!
myUrl <- "http://www.federalreserve.gov/datadownload/Output.aspx?rel=G17&series=13577f97794ad0f0b6b3e593c7cad8a1&lastObs=&from=01/01/2006&to=12/31/2014&filetype=sdmx&label=include&layout=seriescolumn"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
Error in sender$name[[xmlGetAttr(x, "xml:lang")]] <- xmlValue(x) :
wrong args for environment subassignment
At present, only the seriesKey values are appended to the dataset data.frame. Additional series attributes should be included.
Model the SDMX-ML HeaderType with a S4 class
Testing rsdmx with Statistics Canada. Examples are provided here.
These tests aim to provide support for a request sent on the rsdmx mailing list, and to identify and fix potential issues in the code.
Note: The case of Statistics Canada represents a useful use case for rsdmx, as it shows that not all data providers necessarily offer an SDMX web-service API, and that many SDMX resources may come as downloaded files, hence the added value of rsdmx in enabling the reading of local SDMX files.
Eurostat data is marked up with what they call "flags", which are attributes for each data item (numerical values). They indicate whether a value is estimated (flag e), provisional (flag p), etc. In the SDMX files they are coded as value attributes. An example has been included in the manual of readSDMX in #16 (p flags for all values).
A great feature would be if these flags could be made available for plotting (e.g. marking flagged values etc.)
I think it will be difficult to implement this in a standard data frame, as there might be more than one value variable. The returned data would probably need to be more complex (S3, S4). This would also make it possible to include some meta information (last update etc.) and make it available for plotting as well.
Do you have any thoughts/remarks on this?
An S4 class will be added for the abstract SDMX Structure, with a method to get the appropriate type. Later, this type is intended to be activated as a valid SDMXType; this will come with a first structure subtype parser.
I believe ECB has implemented changes to the way they serve data with SDMX-ML documents in the last few days; at least their documentation page is now visually different. For example, try this link for obtaining DSD data:
It's a valid query. I can't be sure, but I believe the readSDMX() function parsed the above link without problems before. It seems like ECB implemented additional elements, as readSDMX() currently fails with OrganisationSchemes. One also has mes and str abbreviated instead of message and structure. I ain't very familiar with XML document structure, so I am just wondering whether they're still following the full standard. I don't see any changes for parsing raw data since, for example,
still returns the usual stuff and is correctly read with readSDMX.
Hi, I got error and warning running the following. Could you please take a look? Thanks a lot!
library("rsdmx")
myUrl <- "http://markets.newyorkfed.org/api/pd/get/SBN2013/timeseries/PDPOSTIPS-G11.sdmx.xml"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
Error messages:
Error in validObject(.Object) : invalid class “SDMXType” object: FALSE
In addition: Warning message:
In validityMethod(object) : Unknown SDMXType UtilityDataType
There is an issue with applying readSDMX to https SDMX web resources.
Example from ECB:
dataURL <- "https://sdw-wsrest.ecb.europa.eu/service/data/DD/M.SE.BSI_STF.RO.4F_N"
sdmx <- readSDMX(dataURL)
with the following error:
<SSL_CACERT in function (type, msg, asError = TRUE) { if (!is.character(type)) { i = match(type, CURLcodeValues) typeName = if (is.na(i)) character() else names(CURLcodeValues)[i] } typeName = gsub("^CURLE_", "", typeName) fun = (if (asError) stop else warning) fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))}(60L, "SSL certificate problem, verify that the CA cert is OK. Details:\nerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed", TRUE): SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>
To retrieve data from SDMX web services (e.g. Eurostat REST, FAO REST etc) it would be helpful to have some user interface to create the appropriate url to request as resource. At a basic level this could be a function that parses a DSD into a list of Codelists and an overview of the datastructure. This would allow the user to create the appropriate url for a data request.
At a very sophisticated level this function could fire up a dialog that allows a menu based selection. Right now this is certainly beyond the manpower/scope of the package, especially given that there is no funding for development yet. Nevertheless it might be reasonable to have that kind of vision to guide the development of the general architecture of the interface.
Whether this feature should be part of rsdmx depends on the generality of DSDs across different institutions. If the DSD structure is significantly different at Eurostat than at OECD, it would make more sense to create further packages specific to each institution that require rsdmx for general parsing. Similar to the available faostat package, these packages could also include convenience functions to create something like country factsheets or similar.
Shouldn't the function as.data.frame.SDMXDataSet be added to the NAMESPACE under export so that the function definition can be invoked by typing the function name? This can still be done with rsdmx:::as.data.frame.SDMXDataSet, which is somewhat clumsy. Or is there a general rule on when not to export functions? I checked some other packages and found that this is not handled consistently.
Reading datasets with series having a single attribute seems to cause problems with the following error:
Error in `rownames<-`(x, value) : attempt to set 'rownames' on an object with no dimensions
To be fixed ASAP.
This includes the creation of S4 models for the SDMX Code, Codelist, and Codelists types, plus a method implementing the generic as.data.frame applied to codelists.
When multiple SDMX namespaces are specified in addition to the "SDMX message schemaLocation", the method VERSION.SDMXSchema doesn't work.
The SDMXDataSet as.data.frame function does not work with data having no NS prefix.
Dear rsdmx developers,
Thank you for your work on rsdmx, it looks very promising. However, I was not able to use data from ECB as dataframes. You will find below an example and the error message:
library("rsdmx")
url<-"http://sdw.ecb.europa.eu/quickviewexport.do?SERIES_KEY=118.DD.M.SE.BSI_STF.RO.4F_N&type=sdmx"
sdmx<-readSDMX(url)
df<-as.data.frame(sdmx)
XPath error : Undefined namespace prefix
XPath error : Invalid expression
Error in xpathApply.XMLInternalDocument(doc, path, fun, ..., namespaces = namespaces, :
error evaluating xpath expression //ns:Series
Maybe I did not use rsdmx correctly, so do not hesitate to correct me. Otherwise, it would be great if you could provide a way to use ECB data.
Thanks,
Bertrand
As you know, the rsdmx strategy is currently to provide low-level functionality to read SDMX data and metadata, focusing first on the data format rather than on SDMX web-service architectures. This has several benefits, which mostly deal with flexibility. Some of them are that:
However, when reading SDMX data from the web, it would be particularly practical for users to have some utility functions that avoid having to copy the entire data request URL. These functions would improve usability by building the requests and calling the readSDMX function.
Any feedback from users would be very welcome. Ideas or contributions are also welcome!
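As a starting point for discussion, a request builder could be as small as the sketch below. `build_sdmx_data_url` is a hypothetical helper, not an rsdmx function, and the ".../data/<flow>/<key>?<params>" pattern is assumed from the ECB URLs quoted in other issues; other providers may use a different layout, and no URL encoding is applied.

```r
# Hypothetical helper (not part of rsdmx): assemble an SDMX REST data URL
# from its parts, following the ".../data/<flow>/<key>?<params>" pattern.
build_sdmx_data_url <- function(base_url, flow, key, params = list()) {
  # Strip trailing slashes so the pieces join with exactly one "/".
  url <- paste(sub("/+$", "", base_url), "data", flow, key, sep = "/")
  if (length(params) > 0) {
    query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
    url <- paste0(url, "?", query)
  }
  url
}

url <- build_sdmx_data_url(
  base_url = "https://sdw-wsrest.ecb.europa.eu/service",
  flow = "DD",
  key = "M.SE.BSI_STF.RO.4F_N",
  params = list(startPeriod = "2014-01-01")
)

# The result could then be passed on to the reader (needs network access):
# sdmx <- rsdmx::readSDMX(url)
```

Keeping the builder separate from readSDMX would preserve the low-level reader as it is, while provider-specific wrappers supply the base URL.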
At present, both generic and compact data are handled in a single SDMXDataSet class. This will be split into 2 different classes, SDMXGenericData and SDMXCompactData, to align with the SDMX spec message types. Later, a superclass SDMXData might be modeled if required.
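The split could be modeled along the lines of the sketch below. The class names come from the description above, while the slot `xmlObj` and the generic `dataFormat` are purely illustrative and are not taken from the rsdmx source.

```r
library(methods)

# Virtual superclass: it cannot be instantiated, it only groups the two formats.
# The slot name xmlObj is an assumption made for this sketch.
setClass("SDMXData", representation("VIRTUAL", xmlObj = "ANY"))
setClass("SDMXGenericData", contains = "SDMXData")
setClass("SDMXCompactData", contains = "SDMXData")

# Illustrative generic with one method per format, standing in for
# format-specific behaviour such as as.data.frame.
setGeneric("dataFormat", function(obj) standardGeneric("dataFormat"))
setMethod("dataFormat", "SDMXGenericData", function(obj) "generic")
setMethod("dataFormat", "SDMXCompactData", function(obj) "compact")

x <- new("SDMXCompactData", xmlObj = NULL)
fmt <- dataFormat(x)
```

Each format gets its own method, while code that only needs "some SDMX data" can test `is(x, "SDMXData")`.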