
Comments (10)

eblondel commented on June 15, 2024

@ghawkins-ott No need to apologize. What you need (code labels instead of code values) is supported by rsdmx in a very easy way: you associate the corresponding DSD (data structure definition) with the dataset. For SDMX files downloaded manually (without a proper SDMX web service), which is the case for Statistics Canada, there is one extra line of code to write to associate the DSD with the dataset, using the function setDSD. The code below should save you some time:

library(rsdmx)

# read the DSD
dsd <- readSDMX("Structure_99-010-X2011027.xml", isURL = FALSE)

# read the dataset
data <- readSDMX("Generic_99-010-X2011027.xml", isURL = FALSE)

# associate the DSD with the dataset
data <- setDSD(data, dsd)

# because the DSD is now associated, you can apply labels = TRUE
df <- as.data.frame(data, labels = TRUE)
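
To illustrate what the label mapping amounts to, here is a minimal base-R sketch. It is not how rsdmx implements labels = TRUE internally; the data frames obs and cl_sex and their column names are made up for illustration, standing in for a parsed dataset and one codelist taken from the DSD.

```r
# Hypothetical stand-in for a parsed dataset: the Sex column holds codes.
obs <- data.frame(Sex = c("1", "2", "2"),
                  obsValue = c(10, 12, 7),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for one codelist: code id and human-readable label.
cl_sex <- data.frame(id = c("1", "2"),
                     label = c("Male", "Female"),
                     stringsAsFactors = FALSE)

# Look up each code in the codelist and attach the matching label.
obs$Sex_label <- cl_sex$label[match(obs$Sex, cl_sex$id)]
print(obs)
```

Codes that are missing from the codelist would come out as NA in this sketch, which makes gaps in a codelist easy to spot.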

Hope this helps

from rsdmx.

eblondel commented on June 15, 2024

The Statistics Canada portal provides a download facility in SDMX-ML format. The download is a zip file containing two XML files: one holding the GenericData dataset, the other the DataStructure.
Two (minor) issues were identified with the current code:

  • Datasets are correctly converted into R sdmx objects, but an error is raised when using the as.data.frame method to convert the dataset into a data.frame (very minor bug, to be fixed ASAP)
  • An error is raised when trying to convert the DataStructure XML file into an R sdmx object


nordicgnome commented on June 15, 2024

Hello Emmanuel;

I installed the package with update.packages("rsdmx")
I ran the code that you suggested:
library(rsdmx)
setwd("path/to/files")  # location of the file
sdmx <- readSDMX("GenericAbPop.xml", isURL = FALSE)
I did this the first time from within RStudio and it completely locked up the system. I thought this was an anomaly because I was running an emerge -auv @world (Gentoo, HP DV6 quad-core laptop) at the same time, and I might have overloaded the system. So I ran it again in RStudio without a mess of other processes, and the system locked up again.

So, I rebooted and ran from the command line without any windowing running (normally run KDE 4.14.3) and left it overnight. Came back this morning and it had returned to the command prompt and output the error message "Killed". How do I turn on more comprehensive error messaging?
GenericAbPop.xml is 6.8GB and StructureAbPop.xml is 123.6KiB

Jan


eblondel commented on June 15, 2024

Hello, about the bugs I highlighted above: I've solved the first one (it was a minor bug), but I still need to commit the fix to the code repository (this is still on a voluntary basis on my side, so I need to do it after work). With this fix, reading the data as a data.frame will work.
Once it is OK, I will share an example.

After that, I will look closely at the DataStructure issue.

That said, the datasets provided by Statistics Canada are big files. Parsing such a document logically takes a lot of time (and there is still room to improve performance), but above all it requires memory. On this aspect, rsdmx currently relies on XPath to read the XML file, which means that the whole XML tree is loaded into R, and memory use then doubles once you convert it to a data.frame.
I've investigated an important enhancement here: rsdmx would support a SAX parser, so that the XML tree would not be loaded into R, while still maintaining the object-oriented rsdmx model mapped to the SDMX standard model.
This SAX method would be especially needed for reading big datasets (to avoid memory issues). The enhancement is in preparation, but requires sponsoring / funding given the amount of work. See #36

By the way, I will also test against huge datasets.
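
To make the difference concrete, here is a minimal sketch of event-based (SAX) parsing with xmlEventParse from the XML package. It is only an illustration under assumptions, not the planned rsdmx implementation: the inline document and its un-namespaced Obs elements are hypothetical stand-ins for a real SDMX-ML file.

```r
library(XML)

# Tiny hypothetical document; real SDMX-ML uses namespaced elements.
doc <- '<DataSet>
  <Series>
    <Obs value="1"/><Obs value="2"/><Obs value="3"/>
  </Series>
</DataSet>'

n_obs <- 0   # number of observations seen so far
total <- 0   # running sum of their values

# The handler is called once per opening tag as the parser streams through
# the document, so no XML tree is ever built in memory.
handlers <- list(
  startElement = function(name, attrs) {
    if (name == "Obs") {
      n_obs <<- n_obs + 1
      total <<- total + as.numeric(attrs[["value"]])
    }
  }
)

invisible(xmlEventParse(doc, handlers = handlers, asText = TRUE,
                        useTagName = FALSE, addContext = FALSE))

cat("observations:", n_obs, "sum:", total, "\n")
```

Only the two counters are kept in memory here, whereas an XPath-based approach needs the whole tree first. For a file on disk, the file path would be passed instead of the string, with asText = FALSE.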


eblondel commented on June 15, 2024

@nordicgnome I've pushed the first bug fix (dealing with the dataset).
For testing, you will need to install rsdmx from CRAN. Please follow the instructions in the wiki.

The sample code is as follows:

require(rsdmx)
sdmx <- readSDMX("myfile.xml", isURL = FALSE)
sdmx.df <- as.data.frame(sdmx)

I've tested it on a smaller dataset (a file of ~50 MB): it works, but it takes about 20 minutes for a dataset of more than 127,000 records. I will open a separate ticket to investigate performance gains (processing time). Statistics Canada datasets will be a good test case.

Once I have a bit more time, I will look into the second fix. In the meantime, your feedback is welcome.


eblondel commented on June 15, 2024

@nordicgnome the second minor bug has been fixed. DataStructureDefinition files from Statistics Canada are now properly read in R. For an example, you can follow the one provided in the wiki, except that you will need to use isURL = FALSE in readSDMX.

Note that following these fixes, I've opened 2 tickets that I will investigate further: one dealing with codelist content and encoding (see #48) and, as mentioned above, one about as.data.frame performance (see #49).

Your feedback is welcome.


ghawkins-ott commented on June 15, 2024

I'm having an issue with DataStructures and I'm not sure what is going on. I'm using the most current version of rsdmx (0.5-10) and I can't read the StatsCan structure data.

I'm downloading this file: http://www12.statcan.gc.ca/nhs-enm/2011/dp-pd/dt-td/OpenDataDownload.cfm?PID=105470

Then I'm dropping it into my RStudio Server and following the instructions on the wiki (e.g. sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)). When I then try to read that into a data.frame, I get the following error:

Error in as.data.frame.default(sdmx) :
cannot coerce class "structure("SDMXDataStructureDefinition", package = "rsdmx")" to a data.frame

Thoughts? I can read the data file but I don't get any of the codes mapped in that case.


eblondel commented on June 15, 2024

@ghawkins-ott an SDMX DataStructureDefinition can't be read as a data.frame because it's a complex object (meaning it includes several subparts, such as codelists and concepts, that can then be read individually as data.frames).

To extract codelists and concepts from the DSD and read them as data.frames, you can look at the DSD example at https://github.com/opensdmx/rsdmx/wiki#sdmx-datastructuredefinition-dsd


ghawkins-ott commented on June 15, 2024

@eblondel Thank you! I see the codelists now. I am still struggling with the concept of how to apply them to the data file. For example, I'd like a data frame that displays the code label (e.g. "Female" instead of "2" in the Sex column)...

Sorry, I'm fairly new to this.


ghawkins-ott commented on June 15, 2024

@eblondel Perfect, thank you so much!
