
cmhc's People

Contributors: bdbmax, daniel-simeone, dshkol, mountainmath


cmhc's Issues

More modular parameters

Eventually there should be a clear interface for selecting different data in different ways. It will take some experimenting with the CMHC calls to figure out the best way to structure the interface.

Halifax data missing?

It seems that the Halifax data is now missing. Perhaps the underlying structure has been changed?

cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Historical Time Periods", geo_uid = 205, year = 2020)
and
cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate", dimension = "Bedroom Type", breakdown = "Survey Zones", geo_uid = 205)

Both give "No data available."

The given survey, series, dimension and breakdown all appear in list_cmhc_tables(), and the SGC code (205) is the one returned by get_cmhc_geography(level = "MET").

This is the case for all calls to geo_uid 205 that I looked at (across all 31 tables that have both Historical Time Periods and Survey Zones as available breakdowns).

I'll try to look into the get_cmhc function to see what is happening.

Survey Zones naming over time

Hello!

When retrieving data with get_cmhc for different years, the name of what seems to be the same survey zone can change over time. Here's an example:

plateau <- lapply(2015:2016, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Plateau", out$`Survey Zones`)]
})

print(unique(do.call(c, plateau)))

Output: [1] "Plateau Mont-Royal" "Plateau-Mont-Royal"

The name for le Plateau in Montreal changes over time: up to and including 2015 there is no hyphen, and from 2016 onward the hyphen appears. I believe this is the same zone, but is there any way to be sure? The description of the get_cmhc_geography function states that the geographic data corresponds to a 2017 extract and won't necessarily match regions from other years.
Could a year argument be added to the get_cmhc_geography function, so that names can be matched to spatial polygons for each individual year? Then, year over year, we could match the actual zones rather than relying on name strings that may change (assuming this is indeed the same survey zone).

Here is another example of names differing in the data, and a zone disappearing in some years:

st_lin <- lapply(2016:2021, \(yr) {
  out <- cmhc::get_cmhc(survey = "Rms",
                        series = "Vacancy Rate",
                        dimension = "Rent Ranges",
                        breakdown = "Survey Zones",
                        geo_uid = 24462,
                        year = yr)
  out$`Survey Zones`[grepl("^Saint-Lin", out$`Survey Zones`)]
})

print(st_lin)

Output: 
[[1]]
character(0)

[[2]]
[1] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[3] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[5] "Saint-Lin\u0096Laurentides V" "Saint-Lin\u0096Laurentides V"
[7] "Saint-Lin\u0096Laurentides V"

[[3]]
character(0)

[[4]]
character(0)

[[5]]
character(0)

[[6]]
[1] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[4] "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V" "Saint-Lin-Laurentides V"
[7] "Saint-Lin-Laurentides V"

Maybe the zone just has a different naming in some years?
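As a stopgap while per-year geographies are unavailable, names can be normalized before comparing them. The `\u0096` character looks like a Windows-1252 en dash (byte 0x96) that was decoded as Latin-1, so it is reasonable to treat it as a dash. A minimal base-R sketch; `normalize_zone_name()` is a hypothetical helper, not part of cmhc, and it assumes spaces, hyphens and dashes are the only cosmetic differences. It will not catch other renamings, and it does nothing about the years in which a zone is missing altogether:

```r
# Hypothetical helper, not part of the cmhc package: normalize survey zone
# names so that purely cosmetic differences between years compare equal.
normalize_zone_name <- function(x) {
  # U+0096 appears to be a Windows-1252 en dash (byte 0x96) decoded as
  # Latin-1; map it, the real en dash and the em dash to a plain hyphen
  x <- gsub("[\u0096\u2013\u2014]", "-", x)
  # Collapse any run of whitespace and/or hyphens into a single space
  x <- gsub("[[:space:]-]+", " ", x)
  trimws(x)
}

normalize_zone_name(c("Plateau Mont-Royal", "Plateau-Mont-Royal"))
# both elements become "Plateau Mont Royal"

normalize_zone_name(c("Saint-Lin\u0096Laurentides V", "Saint-Lin-Laurentides V"))
# both elements become "Saint Lin Laurentides V"
```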

I think retrieving the survey zone geography for every year, if at all possible, would be the best way to fix these mismatched names. These zones also have a METZONE_UID in the output of get_cmhc_geography, which would help identify which spatial zone a row of data belongs to, if that code were also included in the output of get_cmhc. However, having looked at the content of the httr::POST call, I understand that the table only identifies the zone by name, and, as noted above, that name isn't constant across years.

I understand CMHC data isn't easy to work with! From your experience with it, do you see a way to solve this problem? The only options I can think of are either getting spatial polygons of the zones for every year (very reliable), or merging years of data by name using the closest string match (less reliable).
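The closest-string-match option can be sketched with base R's `utils::adist()`, which computes generalized Levenshtein (edit) distances. `match_zone_names()` and its `max_dist` cutoff are hypothetical choices, not part of cmhc. Names further apart than the cutoff are left unmatched (`NA`) rather than forced onto the nearest candidate, which limits, but does not remove, the risk of pairing two different zones:

```r
# Hypothetical helper, not part of the cmhc package: pair each zone name in
# `from` with the closest name in `to` by edit distance (utils::adist).
match_zone_names <- function(from, to, max_dist = 3) {
  d <- utils::adist(from, to)                  # length(from) x length(to)
  best <- apply(d, 1, which.min)               # closest candidate per row
  best_d <- d[cbind(seq_along(from), best)]    # distance to that candidate
  data.frame(
    from = from,
    # leave weak matches unpaired instead of forcing them onto a neighbour
    to = ifelse(best_d <= max_dist, to[best], NA_character_),
    distance = best_d,
    stringsAsFactors = FALSE
  )
}

older <- c("Plateau Mont-Royal", "Saint-Lin\u0096Laurentides V")
newer <- c("Saint-Lin-Laurentides V", "Plateau-Mont-Royal")
match_zone_names(older, newer)
```

Each name here differs from its counterpart by a single character, so both pairs are matched at distance 1.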

Thanks!

Column name variability - MET_CODE versus METCODE

The function get_cmhc_geography appears to use different column names for the metropolitan code depending on the geography level selected.
At the ZONE level the column is called MET_CODE, while at the MET level it is called METCODE.
The latter looks like it might be hard-coded in the internal function census_to_cmhc_geocode, while the former may come from the gdb files.

library(tidyverse)
library(cmhc)
get_cmhc_geography("ZONE") %>% select(starts_with("MET"))
get_cmhc_geography("MET") %>% select(starts_with("MET"))
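Until the names are made consistent in the package, a caller-side workaround is to rename one of them. A minimal sketch, assuming the two columns hold the same kind of code and only the name differs; `harmonize_met_code()` is a hypothetical helper, not part of cmhc, and the data frames below are dummy stand-ins for the real `get_cmhc_geography()` output. It is assumed, not verified, that renaming a non-geometry column this way also works on the sf objects the function returns:

```r
# Hypothetical workaround, not part of the cmhc package: rename MET_CODE to
# METCODE so both geography levels expose the same column name.
harmonize_met_code <- function(df) {
  names(df)[names(df) == "MET_CODE"] <- "METCODE"
  df
}

# Dummy stand-ins for get_cmhc_geography("ZONE") and get_cmhc_geography("MET")
zone <- data.frame(MET_CODE = "placeholder", stringsAsFactors = FALSE)
met  <- data.frame(METCODE = "placeholder", stringsAsFactors = FALSE)

names(harmonize_met_code(zone))  # "METCODE"
names(harmonize_met_code(met))   # "METCODE" (unchanged)
```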


MetId is now required

There seems to have been a change in how the data portal handles data internally: to access up-to-date data, one now needs to specify the MetId in the POST parameters. In the current development version (v0.2.6) this is fudged in for regions within CMAs, but regions outside of CMAs seem to use a separate internal MetId that I will have to get from CMHC.

Region codes

Should have a lookup table for region codes to make it easier to pull data for different regions. Right now the region codes are hard-coded into the parameters.
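Such a lookup table could start as a simple named vector. A minimal sketch; `cmhc_region_codes` and `region_code()` are hypothetical names, not part of cmhc, and only the two codes quoted elsewhere on this page are included (205 for Halifax, 24462 for Montreal). A real table would need to be populated from CMHC's own geography data:

```r
# Hypothetical lookup table, not part of the cmhc package. Only the codes
# that appear in these issues are filled in.
cmhc_region_codes <- c(
  "Halifax"  = 205,
  "Montreal" = 24462
)

# Translate region names into codes, failing loudly on unknown names
region_code <- function(name) {
  code <- unname(cmhc_region_codes[name])
  if (anyNA(code)) {
    stop("Unknown region: ", paste(name[is.na(code)], collapse = ", "))
  }
  code
}

region_code("Halifax")  # 205

# Possible usage with the call from the Halifax issue (requires network):
# cmhc::get_cmhc(survey = "Rms", series = "Vacancy Rate",
#                dimension = "Bedroom Type", breakdown = "Survey Zones",
#                geo_uid = region_code("Halifax"))
```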

Inaccessible geographies

Thanks a lot for this package, I look forward to using it for much easier access to CMHC data!

While I can access data using cmhc::get_cmhc, I cannot access geographies using cmhc::get_cmhc_geography. It looks like the endpoint to the AWS bucket isn't right:

> cmhc::set_cmhc_cache_path("~/cmhc_cache", install = TRUE, overwrite = TRUE)
Your original .Renviron will be backed up and stored in your R HOME directory if needed.
Your cache path has been stored in your .Renviron and can be accessed by Sys.getenv("CMHC_CACHE_PATH").
[1] "~/cmhc_cache"
> cmhc::get_cmhc_geography(level = "ZONE")
Downloading geographies, this may take a minute...
List of 6
 $ Code     : chr "PermanentRedirect"
 $ Message  : chr "The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future "| __truncated__
 $ Endpoint : chr "mountainmath.s3.amazonaws.com"
 $ Bucket   : chr "mountainmath"
 $ RequestId: chr "DEVEV28Z0WV4E5FB"
 $ HostId   : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
 - attr(*, "headers")=List of 7
  ..$ x-amz-bucket-region: chr "ca-central-1"
  ..$ x-amz-request-id   : chr "DEVEV28Z0WV4E5FB"
  ..$ x-amz-id-2         : chr "Wpn6zIAqge9ld7WR4zrY1cNmhGwV9tTfvVVw16Kxk6FVrY0AyYmfDrl+7xbYbTnbbeyZC7KTAG0="
  ..$ content-type       : chr "application/xml"
  ..$ transfer-encoding  : chr "chunked"
  ..$ date               : chr "Thu, 06 Oct 2022 17:26:01 GMT"
  ..$ server             : chr "AmazonS3"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 - attr(*, "class")= chr "aws_error"
NULL
Error in parse_aws_s3_response(r, Sig, verbose = verbose) : 
  Moved Permanently (HTTP 301).
In addition: Warning message:
In dir.create(file.path(base_directory)) :
  'C:\Users\maxim\OneDrive - McGill University\Documents\cmhc_cache' already exists

Let me know if there's any more information from my end you'd need me to share.

Thank you!
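The 301 response reports that the `mountainmath` bucket lives in `ca-central-1` and must be addressed through that region's endpoint rather than the generic one. Assuming the download goes through the aws.s3 package (the `aws_error` class and `parse_aws_s3_response()` in the output suggest so), passing `region = "ca-central-1"` or setting the `AWS_DEFAULT_REGION` environment variable should point requests at the right endpoint. For reference, a region-specific, virtual-hosted-style S3 URL has the form built below; `s3_regional_url()` is a hypothetical helper, and the object key is a placeholder because the key the package actually requests is not shown here:

```r
# Hypothetical helper: build a region-specific, virtual-hosted-style S3 URL.
# Bucket and region are taken from the error output; the key is a placeholder.
s3_regional_url <- function(bucket, key, region) {
  sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, region, key)
}

s3_regional_url("mountainmath", "path/to/object", "ca-central-1")
# "https://mountainmath.s3.ca-central-1.amazonaws.com/path/to/object"
```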

Rent ranges dimension

Hello again!

When getting vacancy rates for the Rent Ranges dimension, the Rent Ranges column gets mangled into what I believe are unwanted duplicate rows and unreadable values.

cmhc::get_cmhc(survey = "Rms",
               series = "Vacancy Rate",
               dimension = "Rent Ranges",
               breakdown = "Survey Zones",
               geo_uid = "24462")
New names:                                                                                                                      
• `"$1` -> `"$1...6`
• `"$1` -> `"$1...10`
• `"$1` -> `"$1...14`
# A tibble: 390 × 7
   `Survey Zones`                    `Rent Ranges`    Value Quality     Censu…¹ Survey Series
   <chr>                             <fct>            <dbl> <fct>       <chr>   <chr>  <chr> 
 1 Downtown Montréal/Îles-des-Soeurs "Less Than $750"  NA   NA          2016    Rms    Vacan…
 2 Downtown Montréal/Îles-des-Soeurs "$750 - $999"      5.1 Fair (Use … 2016    Rms    Vacan…
 3 Downtown Montréal/Îles-des-Soeurs "\"$1...6"        NA   NA          2016    Rms    Vacan…
 4 Downtown Montréal/Îles-des-Soeurs "000 - $1"        NA   NA          2016    Rms    Vacan…
 5 Downtown Montréal/Îles-des-Soeurs "249\""            4.6 Good        2016    Rms    Vacan…
 6 Downtown Montréal/Îles-des-Soeurs "\"$1...10"        7.9 NA          2016    Rms    Vacan…
 7 Downtown Montréal/Îles-des-Soeurs "250 - $1"        NA   NA          2016    Rms    Vacan…
 8 Downtown Montréal/Îles-des-Soeurs "499\""           NA   NA          2016    Rms    Vacan…
 9 Downtown Montréal/Îles-des-Soeurs "\"$1...14"        6.3 NA          2016    Rms    Vacan…
10 Downtown Montréal/Îles-des-Soeurs "500 +\""         NA   NA          2016    Rms    Vacan…
# … with 380 more rows, and abbreviated variable name ¹​`Census geography`
# ℹ Use `print(n = ...)` to see more rows
Warning message:
Problem while computing `Value = parse_numeric(.data$Value)`.
ℹ NAs introduced by coercion 

Let me know if any more information is needed,
Thanks again!
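The garbled values in the output above (`"$1`, `000 - $1`, `249"`) look consistent with a label such as "$1,000 - $1,249" being split on its thousands separator, that is, the comma inside a quoted CSV field being treated as a delimiter. Assuming that is the cause (not confirmed here), the difference between naive comma splitting and a quote-aware parser can be shown in base R; the row below is a simplified stand-in, not actual CMHC output:

```r
# A simplified CSV row whose rent range label contains thousands separators
line <- '"Downtown","$1,000 - $1,249",4.6'

# Naive splitting on every comma breaks the label into three pieces,
# much like the "$1 / 000 - $1 / 249" rows in the output above
naive <- strsplit(line, ",", fixed = TRUE)[[1]]
length(naive)  # 5 fields instead of 3

# A quote-aware parser keeps the label intact
parsed <- utils::read.csv(text = line, header = FALSE, stringsAsFactors = FALSE)
parsed$V2  # "$1,000 - $1,249"
```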
