mountainmath / cmhc
Wrapper for hack into CMHC data
License: Other
Eventually there should be a clear interface for selecting different data in different ways. It will take some experimenting with the CMHC calls to figure out the best way to structure the interface.
It seems that the Halifax data is now missing. Perhaps the underlying structure has been changed?
cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Historical Time Periods", geo_uid = 205, year = 2020)
and
cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Survey Zones", geo_uid = 205)
Both give "No data available."
The given survey, series, dimension, and breakdown all appear in list_cmhc_tables(), and the SGC code (205) is the one given by get_cmhc_geography(level = "MET").
This is the case for all calls to geo_uid 205 that I looked at (across all 31 tables that have both Historical Time Periods and Survey Zones as available breakdowns).
I'll try to look into the get_cmhc function to see what is happening.
Hello!
When getting data through get_cmhc for different years, the name of what seems to be the same survey zone can differ over time. Here's an example:
plateau <- lapply(2015:2016, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Plateau", out$`Survey Zones`)]
})
print(unique(do.call(c, plateau)))
Output: [1] "Plateau Mont-Royal" "Plateau-Mont-Royal"
The name of le Plateau in Montreal changes over time. Up to and including 2015 there was no hyphen; from 2016 on, the hyphen appears. I believe this is the same zone, but is there any way to be sure? The description of the get_cmhc_geography function states that the geographic data corresponds to an extract from 2017 and won't necessarily match regions from other years.
Could a year argument be added to the get_cmhc_geography function, letting us match names to spatial polygons for every individual year? Year over year, we could then match the actual zones rather than names whose spelling might differ (in the hypothetical case that this is indeed the same survey zone).
Here is another example of names differing in the data, and a zone disappearing in some years:
st_lin <- lapply(2016:2021, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Saint-Lin", out$`Survey Zones`)]
})
print(st_lin)
Output:
[[1]]
character(0)
[[2]]
[1] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[3] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[5] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[7] "Saint-Lin\u0096Laurentides V"
[[3]]
character(0)
[[4]]
character(0)
[[5]]
character(0)
[[6]]
[1] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[4] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[7] "Saint-Lin-Laurentides V"
Maybe the zone is just named differently in some years?
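If the difference is only cosmetic, comparing normalized names would catch both examples. The `\u0096` in the 2017 output looks like an en dash stored as Windows-1252 (byte 0x96) and decoded as Latin-1, where U+0096 is a control character rather than a dash; that is an inference from the printed escape, not something confirmed against the CMHC response. A minimal base-R sketch, with a hypothetical helper `normalize_zone_name()` that is not part of cmhc, which turns that character, genuine en dashes, and whitespace into a single hyphen and lowercases the result:

```r
# Hypothetical helper (not part of the cmhc package): build a comparison key
# for survey zone names so that cosmetic variants map to the same string.
normalize_zone_name <- function(x) {
  x <- trimws(x)
  # U+0096 is what a Windows-1252 en dash becomes when read as Latin-1;
  # treat it, and a genuine en dash (U+2013), as an ordinary hyphen.
  x <- gsub("\u0096", "-", x, fixed = TRUE)
  x <- gsub("\u2013", "-", x, fixed = TRUE)
  # Collapse any run of whitespace and/or hyphens into a single hyphen.
  x <- gsub("[[:space:]-]+", "-", x)
  tolower(x)
}

normalize_zone_name(c("Plateau Mont-Royal", "Plateau-Mont-Royal"))
# [1] "plateau-mont-royal" "plateau-mont-royal"
normalize_zone_name(c("Saint-Lin\u0096Laurentides V", "Saint-Lin-Laurentides V"))
# [1] "saint-lin-laurentides-v" "saint-lin-laurentides-v"
```

Equal keys only show that two names agree up to punctuation and case; they don't prove the zone boundaries are unchanged between years.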
I think getting the survey zone geography for every year, if at all possible, would be the best way to fix these mismatched names. These zones also have a METZONE_UID in the output of get_cmhc_geography, which would help match a zone in the data to its spatial zone, if that code were also in the output of get_cmhc. But having seen the content of the httr::POST call, I understand that the table only has a name to identify the zone, and, as stated, this name isn't constant over the years.
I understand CMHC data isn't super easy to work with! From your experience working with it, do you see a way to solve this problem? The only options I can think of are either getting spatial polygons of the zones for every year (very reliable) or merging years of data by name using the closest string match (less reliable).
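For the closest-string-match option, base R already ships an edit-distance function, `utils::adist()`. A minimal sketch, with a hypothetical helper `match_closest_zone()` and an arbitrary threshold `max_dist = 2` (both assumptions for illustration, not part of cmhc), that proposes a match only when names differ by a few characters and otherwise returns `NA`, so a zone that is missing in some years, like Saint-Lin above, isn't forced onto an unrelated zone:

```r
# Hypothetical helper (not part of cmhc): for each name in `from`, return the
# closest name in `to` by Levenshtein distance (utils::adist), or NA when even
# the best candidate is more than `max_dist` edits away.
match_closest_zone <- function(from, to, max_dist = 2) {
  d <- utils::adist(from, to)              # length(from) x length(to) matrix
  best <- apply(d, 1, which.min)           # nearest column for each row
  best_dist <- d[cbind(seq_along(from), best)]
  out <- to[best]
  out[best_dist > max_dist] <- NA_character_
  out
}

# Toy vectors built from names in this thread, not a real CMHC extract.
match_closest_zone(
  from = c("Plateau Mont-Royal", "Saint-Lin-Laurentides V"),
  to   = c("Plateau-Mont-Royal")
)
# [1] "Plateau-Mont-Royal" NA
```

The threshold is the weak point: too low and a renamed zone goes unmatched, too high and two neighbouring zones with similar names can be paired, so proposed matches would still need a manual check.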
Thanks!
The function get_cmhc_geography appears to use different column names for the MET code depending on the level of geography selected.
At the ZONE level the column is called MET_CODE, while at the MET level it is called METCODE.
The latter looks like it might be hardcoded in the internal function census_to_cmhc_geocode, while the former may come from the gdb files.
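Until both levels use the same name, a small user-side shim can rename the column before joining. A minimal base-R sketch with a hypothetical helper `harmonize_met_code()` (not part of cmhc) and toy data frames with placeholder values; it renames `MET_CODE` to `METCODE` only when `METCODE` is absent, so MET-level output passes through unchanged. It assumes the geography objects can be renamed like ordinary data frames, which is untested here on sf objects:

```r
# Hypothetical helper (not part of cmhc): give ZONE-level geographies the same
# METCODE column name that MET-level geographies use, so they can be joined.
harmonize_met_code <- function(df) {
  nm <- names(df)
  if ("MET_CODE" %in% nm && !("METCODE" %in% nm)) {
    names(df)[nm == "MET_CODE"] <- "METCODE"
  }
  df
}

# Toy stand-ins with placeholder values, not real CMHC codes.
zone_level <- data.frame(METZONE_UID = c(1, 2), MET_CODE = c("A", "A"))
met_level  <- data.frame(METCODE = "A")
names(harmonize_met_code(zone_level))
# [1] "METZONE_UID" "METCODE"
merge(harmonize_met_code(zone_level), met_level, by = "METCODE")
```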
library(tidyverse)
library(cmhc)
get_cmhc_geography("ZONE") %>% select(starts_with("MET") )
get_cmhc_geography("MET") %>% select(starts_with("MET") )
There seems to have been a change in how the data portal handles the data internally: to access up-to-date data, one needs to specify the MetId in the POST parameters. In the current development version, v0.2.6, this is now fudged in for regions within CMAs, but there seems to be an internal MetId for regions outside of CMAs that I will have to get from CMHC.
There should be a lookup table for region codes to make it easier to pull data for different regions. Right now the region codes are hard-coded into the parameters.
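A minimal sketch of such a lookup, using only the two codes that appear in this thread (205 for Halifax and 24462 for the Montreal calls). The table, the helper `lookup_geo_uid()`, and storing codes as character are all assumptions for illustration, not part of cmhc:

```r
# Hypothetical lookup table (not part of cmhc): region names mapped to the
# geo_uid values used in the get_cmhc() calls in this thread.
region_codes <- data.frame(
  name    = c("Halifax", "Montreal"),
  geo_uid = c("205", "24462"),
  stringsAsFactors = FALSE
)

# Return the geo_uid for each requested region, failing loudly on unknown names.
lookup_geo_uid <- function(region, table = region_codes) {
  uid <- table$geo_uid[match(region, table$name)]
  if (anyNA(uid)) {
    stop("Unknown region(s): ", paste(region[is.na(uid)], collapse = ", "))
  }
  uid
}

lookup_geo_uid("Montreal")
# [1] "24462"
```

A call could then pass `geo_uid = lookup_geo_uid("Montreal")` to `cmhc::get_cmhc()`, assuming it accepts character codes as the `geo_uid = "24462"` call further down suggests.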
Thanks a lot for this package! I look forward to using it for much easier access to CMHC data.
While I can access data using cmhc::get_cmhc, I cannot access geographies using cmhc::get_cmhc_geography. It looks like the endpoint for the AWS bucket isn't right:
> cmhc::set_cmhc_cache_path("~/cmhc_cache", install = TRUE, overwrite = TRUE)
Your original .Renviron will be backed up and stored in your R HOME directory if needed.
Your cache path has been stored in your .Renviron and can be accessed by Sys.getenv("CMHC_CACHE_PATH").
[1] "~/cmhc_cache"
> cmhc::get_cmhc_geography(level = "ZONE")
Downloading geographies, this may take a minute...
List of 6
$ Code : chr "PermanentRedirect"
$ Message : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
$ Endpoint : chr "mountainmath.s3.amazonaws.com"
$ Bucket : chr "mountainmath"
$ RequestId: chr "DEVEV28Z0WV4E5FB"
$ HostId : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
- attr(*, "headers")=List of 7
..$ x-amz-bucket-region: chr "ca-central-1"
..$ x-amz-request-id : chr "DEVEV28Z0WV4E5FB"
..$ x-amz-id-2 : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
..$ content-type : chr "application/xml"
..$ transfer-encoding : chr "chunked"
..$ date : chr "Thu, 06 Oct 2022 17:26:01 GMT"
..$ server : chr "AmazonS3"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
- attr(*, "class")= chr "aws_error"
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose) :
Moved Permanently (HTTP 301).
In addition: Warning message:
In dir.create(file.path(base_directory)) :
'C:\Users\maxim\OneDrive - McGill University\Documents\cmhc_cache' already exists
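The response body names the endpoint `mountainmath.s3.amazonaws.com`, and the `x-amz-bucket-region` header says the bucket lives in `ca-central-1`. A 301 `PermanentRedirect` from S3 usually means the request did not go to the endpoint S3 expects for that bucket, so addressing it through the regional endpoint may be what is needed; how cmhc builds its request isn't shown here. A minimal sketch of the virtual-hosted-style regional URL format, where the helper `s3_regional_url()` and the object key are hypothetical:

```r
# Hypothetical helper: build a virtual-hosted-style S3 URL that points at the
# bucket's own region instead of the global s3.amazonaws.com endpoint.
s3_regional_url <- function(bucket, region, key) {
  sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

# Bucket and region come from the error above; the key is a placeholder.
s3_regional_url("mountainmath", "ca-central-1", "example-key")
# [1] "https://mountainmath.s3.ca-central-1.amazonaws.com/example-key"
```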
Let me know if you need any more information from my end.
Thank you!
Hello again!
When getting vacancy rates by the Rent Ranges dimension, the Rent Ranges column gets mangled into what I believe are unwanted duplicate rows and unreadable values.
cmhc::get_cmhc(survey = "Rms",
               series = "Vacancy Rate",
               dimension = "Rent Ranges",
               breakdown = "Survey Zones",
               geo_uid = "24462")
New names:
• `"$1` -> `"$1...6`
• `"$1` -> `"$1...10`
• `"$1` -> `"$1...14`
# A tibble: 390 × 7
`Survey Zones` `Rent Ranges` Value Quality Censu…¹ Survey Series
<chr> <fct> <dbl> <fct> <chr> <chr> <chr>
1 Downtown Montréal/Îles-des-Soeurs "Less Than $750" NA NA 2016 Rms Vacan…
2 Downtown Montréal/Îles-des-Soeurs "$750 - $999" 5.1 Fair (Use … 2016 Rms Vacan…
3 Downtown Montréal/Îles-des-Soeurs "\"$1...6" NA NA 2016 Rms Vacan…
4 Downtown Montréal/Îles-des-Soeurs "000 - $1" NA NA 2016 Rms Vacan…
5 Downtown Montréal/Îles-des-Soeurs "249\"" 4.6 Good 2016 Rms Vacan…
6 Downtown Montréal/Îles-des-Soeurs "\"$1...10" 7.9 NA 2016 Rms Vacan…
7 Downtown Montréal/Îles-des-Soeurs "250 - $1" NA NA 2016 Rms Vacan…
8 Downtown Montréal/Îles-des-Soeurs "499\"" NA NA 2016 Rms Vacan…
9 Downtown Montréal/Îles-des-Soeurs "\"$1...14" 6.3 NA 2016 Rms Vacan…
10 Downtown Montréal/Îles-des-Soeurs "500 +\"" NA NA 2016 Rms Vacan…
# … with 380 more rows, and abbreviated variable name ¹`Census geography`
# ℹ Use `print(n = ...)` to see more rows
Warning message:
Problem while computing `Value = parse_numeric(.data$Value)`.
ℹ NAs introduced by coercion
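The mangled labels look like each rent range being split at its commas: `"$1` + `000 - $1` + `249"` reassemble to `"$1,000 - $1,249"`, and the `New names` message is the name-repair step de-duplicating the repeated `"$1` pieces. That reading is inferred from the output above, not checked against the package internals. A minimal base-R sketch of the difference between a naive comma split and a quote-aware parse, on a hypothetical header line reconstructed from those fragments:

```r
# Hypothetical header line: rent range labels that contain commas are quoted,
# as in a standard CSV file.
header <- 'Survey Zones,Less Than $750,$750 - $999,"$1,000 - $1,249","$1,250 - $1,499","$1,500 +"'

# Naive split: every comma is treated as a separator, so the quoted labels
# are torn into pieces like "\"$1", "000 - $1" and "249\"".
naive <- strsplit(header, ",", fixed = TRUE)[[1]]
length(naive)
# [1] 11

# Quote-aware parsing: commas inside double quotes stay part of the field.
parsed <- scan(text = header, what = character(), sep = ",", quote = "\"", quiet = TRUE)
length(parsed)
# [1] 6
parsed[4]
# [1] "$1,000 - $1,249"
```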
Let me know if any more information is needed,
Thanks again!