Comments (10)
@ghawkins-ott No need to apologize. What you need (code labels instead of values) is supported by rsdmx in a very easy way, by associating the corresponding DSD (data structure definition) with the dataset. In the case of SDMX files downloaded manually (without a proper SDMX web-service), which is the case for Canada Statistics, there is one line of code to write to associate the DSD with the dataset, using the function setDSD. The code below should save you some time:
require(rsdmx)
#read DSD
dsd <- readSDMX("Structure_99-010-X2011027.xml", isURL=FALSE)
#read dataset
data <- readSDMX("Generic_99-010-X2011027.xml", isURL = FALSE)
#associate the DSD to the dataset
data <- setDSD(data, dsd)
#because you associated the DSD, you can now apply labels = TRUE
df <- as.data.frame(data, labels = TRUE)
Hope this helps
from rsdmx.
The Canada Statistics portal provides a download facility in SDMX-ML format. The download is a zip file containing two XML files, one representing the GenericData set, the other giving the DataStructure.
Two (minor) issues were identified with the current code:
- Datasets are correctly converted into R sdmx objects, but an error is raised when using the as.data.frame method to convert the dataset into a data.frame (very minor bug, to be fixed ASAP)
- An error is raised when trying to convert the DataStructure XML file into an R sdmx object
from rsdmx.
Hello Emmanuel;
I installed the package with update.packages("rsdmx")
I ran the code that you suggested:
library(rsdmx)
setwd("location of file")
sdmx <- readSDMX("GenericAbPop.xml", isURL = FALSE)
I did this the first time from within RStudio and it completely locked up the system. I thought this was an anomaly because I was running an emerge -auv @world (Gentoo, HP DV6 quad-core laptop) at the same time and had possibly overloaded the system. So I ran it again in RStudio without a mess of other processes, and the system locked up again.
So, I rebooted and ran from the command line without any windowing running (normally run KDE 4.14.3) and left it overnight. Came back this morning and it had returned to the command prompt and output the error message "Killed". How do I turn on more comprehensive error messaging?
GenericAbPop.xml is 6.8GB and StructureAbPop.xml is 123.6KiB
Jan
from rsdmx.
Hello, about the bugs I highlighted above: I've solved the first one (it was a minor bug), but I still need to commit the fix to the code repository (this is still on a voluntary basis on my side, so I need to do it after work). With this, reading the data as a data.frame will be operational.
Once it is OK, I will share an example.
After that I will look closely at the datastructure issue.
That said, the datasets provided by Canada Statistics are big files. Parsing them logically takes a lot of time (although there is still room to improve performance), and above all a lot of memory. On this point, rsdmx currently relies on XPath to read the XML file, which means the whole XML tree is loaded in R, and memory use roughly doubles once you convert to a data.frame.
I've investigated an important enhancement here where rsdmx would support a SAX parser, so that the XML would not be loaded into R, while still maintaining the object-oriented rsdmx model, mapped to the SDMX standard model.
Having this SAX method would be especially useful for reading big datasets (avoiding memory issues). This enhancement is in preparation, but requires sponsoring / funding given the amount of work. See #36
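To illustrate the difference between the two parsing styles discussed here, below is a minimal sketch of event-based (SAX) parsing in R. It does not use rsdmx at all: it assumes the xmlEventParse function from the XML package, and a tiny made-up SDMX-like file written to a temporary location. The element names and the count_obs helper are illustrative assumptions only, not how rsdmx does or would implement it.

```r
library(XML)

# Tiny stand-in for a generic SDMX-ML dataset (made up for illustration)
xml_file <- tempfile(fileext = ".xml")
writeLines(c(
  '<DataSet>',
  '  <Series>',
  '    <Obs><ObsValue value="1.5"/></Obs>',
  '    <Obs><ObsValue value="2.5"/></Obs>',
  '  </Series>',
  '</DataSet>'
), xml_file)

# Hypothetical helper: count observations without building the XML tree.
# The parser calls startElement once per opening tag, so only the counter
# is kept in memory, whatever the size of the file.
count_obs <- function(path) {
  n <- 0L
  handlers <- list(
    startElement = function(name, attrs, ...) {
      # match "Obs" with or without a namespace prefix such as "generic:"
      if (grepl("(^|:)Obs$", name)) n <<- n + 1L
    }
  )
  xmlEventParse(path, handlers = handlers)
  n
}

n_obs <- count_obs(xml_file)
print(n_obs)
```

The point of the design is that nothing but the running state (here a single counter) lives in R while the file is read, which is why this style suits multi-gigabyte files better than loading the full tree.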
By the way, I will also test against huge datasets.
from rsdmx.
@nordicgnome I've pushed the first bug fix (dealing with the dataset).
For testing, you will need to install rsdmx from CRAN. Please follow the instructions in the wiki.
The sample code is as follows:
require(rsdmx)
sdmx <- readSDMX("myfile.xml", isURL = FALSE)
sdmx.df <- as.data.frame(sdmx)
I've tested it on a smaller dataset (a file of ~50 MB): it works, but it takes about 20 min for a dataset of more than 127,000 records. I will open a separate ticket to investigate performance gains (processing time). Canada Statistics datasets will be a good test case.
Once I have a little more time, I will look into the 2nd fix. Anyway, your feedback is welcome.
from rsdmx.
@nordicgnome the 2nd minor bug has been fixed. DataStructureDefinition files from Canada Statistics are now properly read in R. For an example, you can follow the one provided in the wiki, except that you will need to use isURL = FALSE in readSDMX.
Note that following these fixes, I've opened 2 tickets that I will investigate further: one dealing with codelist content & encoding (see #48) and, as mentioned above, one about as.data.frame performance (see #49).
Your feedback is welcome,
from rsdmx.
I'm having an issue with DataStructures and I'm not sure what is going on. I'm using the most current version of rsdmx (0.5-10) and I can't read the StatsCan structure data.
I'm downloading this file: http://www12.statcan.gc.ca/nhs-enm/2011/dp-pd/dt-td/OpenDataDownload.cfm?PID=105470
Then dropping it into my RStudio Server and following the instructions on the wiki (e.g. sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)). When I then try to read that into a data.frame, I get the following error:
Error in as.data.frame.default(sdmx) :
cannot coerce class "structure("SDMXDataStructureDefinition", package = "rsdmx")" to a data.frame
Thoughts? I can read the data file but I don't get any of the codes mapped in that case.
from rsdmx.
@ghawkins-ott an SDMX DataStructureDefinition can't be read as a data.frame because it's a complex object (meaning it includes several subparts, such as codelists and concepts, that can then be individually read as data.frames).
To extract codelists and concepts from the DSD and read them as data.frames, you can look at the DSD example in https://github.com/opensdmx/rsdmx/wiki#sdmx-datastructuredefinition-dsd
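Along the lines of that wiki example, a minimal sketch of pulling codelists and concepts out of a DSD might look like the following. It assumes the structure file has already been downloaded locally; the file name is the one from the example at the top of this thread, and the dsd_tables helper is hypothetical, simply picking the first codelist for illustration.

```r
library(rsdmx)

# Hypothetical helper: read a local DSD and return its parts as data.frames
dsd_tables <- function(dsd_file) {
  dsd <- readSDMX(dsd_file, isURL = FALSE)

  # codelists live in the "codelists" slot of the DSD
  cls <- slot(dsd, "codelists")
  codelist_ids <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))

  list(
    codelist_ids = codelist_ids,
    # first codelist only, as an example
    first_codelist = as.data.frame(cls, codelistId = codelist_ids[1]),
    concepts = as.data.frame(slot(dsd, "concepts"))
  )
}

dsd_file <- "Structure_99-010-X2011027.xml"  # assumed local path
if (file.exists(dsd_file)) {
  parts <- dsd_tables(dsd_file)
  print(parts$codelist_ids)
  print(head(parts$first_codelist))
}
```

Listing the codelist ids first makes it easier to find which codelist matches a given column of the dataset before converting it.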
from rsdmx.
@eblondel Thank you! I see the codelists now. I am still struggling with the concept of how to apply them to the data file. For example, I'd like a data frame that would display the code value, (eg: "Female" instead of "2" in the Sex column)...
Sorry, I'm fairly new to this.
from rsdmx.
@eblondel Perfect, thank you so much!
from rsdmx.