Comments (6)
Also, is there any documentation for calculate_heat_values.R? It's a bit opaque, and I'm really just curious to see the underlying math driving the calculation. I imagine it's some sort of inverse-distance logic, though I'm curious what metric you used (Euclidean, Manhattan, whatever) and how that's justified. The report provided is very impressive, though "local intensity" leaves a lot to the imagination.
from food-inspections-evaluation.
From my understanding of the code, there is a file called calculate_heat_values.R in the Functions folder. It appears to use kernel density estimation with a grid resolution of .01.
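On the distance-metric question: a grid-based Gaussian KDE in the style of `MASS::kde2d` multiplies two one-dimensional Gaussian kernels. When both axes share the same bandwidth h, the product equals exp(-d^2 / (2h^2)) / (2*pi*h^2), where d is the Euclidean distance, so each event's weight decays smoothly with squared Euclidean distance rather than as a literal inverse distance. The sketch below is a hypothetical helper (`kde_grid`), not the project's calculate_heat_values.R. It assumes `h` holds the kernel standard deviations and uses `step` for the grid spacing.

```r
# Minimal 2-D Gaussian kernel density estimate on a regular grid.
# Hypothetical helper for illustration; not the project's calculate_heat_values.R.
kde_grid <- function(x, y, h, step = 0.01,
                     lims = c(range(x), range(y))) {
  stopifnot(length(x) == length(y), length(h) == 2, all(h > 0))
  gx <- seq(lims[1], lims[2], by = step)   # grid points along x
  gy <- seq(lims[3], lims[4], by = step)   # grid points along y
  n <- length(x)
  # Scaled offsets between every grid point and every observation
  ax <- outer(gx, x, "-") / h[1]           # length(gx) x n
  ay <- outer(gy, y, "-") / h[2]           # length(gy) x n
  # Product Gaussian kernel: z[i, j] = sum_k phi(ax[i, k]) * phi(ay[j, k])
  z <- tcrossprod(dnorm(ax), dnorm(ay)) / (n * h[1] * h[2])
  list(x = gx, y = gy, z = z)
}

set.seed(1)
x <- rnorm(200); y <- rnorm(200)
k <- kde_grid(x, y, h = c(0.5, 0.5), step = 0.05, lims = c(-6, 6, -6, 6))
sum(k$z) * 0.05^2   # Riemann sum over the grid; should be very close to 1
```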
Anticipating this question, I actually already have a comparison between the KDE function we use and the standard KDE in the MASS package: `.\CODE\not used\kde_comparison`
Ironically, I was initially annoyed that Allstate had created a new KDE function. I believe my exact thoughts were "Why not use the KDE in MASS? Show-offs." Upon examination, I saw that their code is only a slight modification of the function in MASS, but their version is more computationally efficient because it skips the parts we don't need. My annoyance quickly turned into appreciation, and I kept the comparison for other people's future reference.
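As an illustration of this kind of check (a sketch, not the contents of the `kde_comparison` file), a stripped-down copy of the `kde2d` computation can be tested against `MASS::kde2d` on simulated data. The helper name `kde2d_lite` is hypothetical, and it omits kde2d's input validation. Both functions divide `h` by 4 before building the kernels, so the same `h` is passed to each.

```r
library(MASS)  # ships with R as a recommended package

# Stripped-down re-implementation of the kde2d computation (no input checks).
# Hypothetical helper for illustration only.
kde2d_lite <- function(x, y, h, n = 25, lims = c(range(x), range(y))) {
  gx <- seq.int(lims[1], lims[2], length.out = n)
  gy <- seq.int(lims[3], lims[4], length.out = n)
  h <- h / 4                               # same rescaling kde2d applies
  ax <- outer(gx, x, "-") / h[1]
  ay <- outer(gy, y, "-") / h[2]
  z <- tcrossprod(dnorm(ax), dnorm(ay)) / (length(x) * h[1] * h[2])
  list(x = gx, y = gy, z = z)
}

set.seed(42)
x <- rnorm(500); y <- rnorm(500)
h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
ref  <- kde2d(x, y, h = h, n = 50)
mine <- kde2d_lite(x, y, h = h, n = 50)
all.equal(ref$z, mine$z)   # TRUE when the two agree to numerical tolerance
```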
This heatmap function is a great example of a limitation of the "evaluation" nature of this project. It's very specific to this project, and even the density estimates are hard-coded. (You can blame me for this shortsightedness!) It would be better to pull these functions out into a much more generic package that calculates scores for arbitrary data (probably inspections). Then it would be nice to see that package applied to this evaluation. This is high on my personal wishlist.
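One possible shape for such a generic scorer, sketched under assumptions: instead of a fixed grid, evaluate the density of arbitrary "event" points (crimes, complaints, etc.) directly at arbitrary "target" points (e.g. establishments due for inspection). `heat_score` is a hypothetical name, not an existing package function. It uses the same product Gaussian kernel as `kde2d`, with `h` given as kernel standard deviations.

```r
# Hypothetical generic scorer: Gaussian KDE of event locations,
# evaluated at target locations rather than on a grid.
heat_score <- function(event_x, event_y, target_x, target_y, h) {
  stopifnot(length(event_x) == length(event_y),
            length(target_x) == length(target_y),
            length(h) == 2, all(h > 0))
  kx <- dnorm(outer(target_x, event_x, "-") / h[1])  # targets x events
  ky <- dnorm(outer(target_y, event_y, "-") / h[2])
  # Elementwise product = product Gaussian kernel for each (target, event) pair
  rowSums(kx * ky) / (length(event_x) * h[1] * h[2])
}

set.seed(5)
ex <- rnorm(300, sd = 0.3); ey <- rnorm(300, sd = 0.3)   # events clustered near (0, 0)
heat_score(ex, ey, target_x = c(0, 5), target_y = c(0, 5), h = c(0.2, 0.2))
```

Scoring targets directly avoids interpolating from a grid, at the cost of one targets-by-events matrix per axis.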
On Thu, Jul 30, 2015 at 7:49 AM, Rajiv Shah wrote:
> From my understanding of the code, there is a function called calculate_heat_values.R in the Functions folder. It appears to use a kernel density estimation with a grid of .01.
Thanks for the detail, and yeah, I know from experience how tough it is to think about generalizing when you're in the thick of just getting it right for the immediate task at hand, i.e. Chicago! Do you know offhand how the KDE bandwidth was selected?
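```r
# Default: one normal-reference bandwidth per axis when h is not supplied
h <- if (missing(h))
  c(bandwidth.nrd(x), bandwidth.nrd(y))
# Rescale the bandwidths by a factor of 4
h <- h / 4
```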
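For reference, these lines match the default bandwidth logic in `MASS::kde2d` (assuming the project's copy follows MASS here). `bandwidth.nrd` is a normal-reference rule, 4 * 1.06 * min(sd, IQR/1.34) * n^(-1/5), so the `/ 4` step cancels the factor of 4 and leaves each Gaussian kernel with standard deviation 1.06 * min(sd, IQR/1.34) * n^(-1/5), computed separately for each axis. The sketch below writes the formula out by hand (`nrd_by_hand` is a hypothetical name) and checks it against MASS.

```r
library(MASS)

# Normal-reference bandwidth written out by hand (hypothetical helper)
nrd_by_hand <- function(x) {
  iqr_scale <- unname(diff(quantile(x, c(0.25, 0.75)))) / 1.34  # robust spread estimate
  4 * 1.06 * min(sd(x), iqr_scale) * length(x)^(-1/5)
}

set.seed(7)
x <- rnorm(1000)
all.equal(nrd_by_hand(x), bandwidth.nrd(x))   # should be TRUE
# Effective kernel standard deviation after kde2d's h / 4 step
bandwidth.nrd(x) / 4
```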
Mostly just curious about that. And do you know how much computational efficiency is gained from that modification? Is it material in generating the data to run the model? We're talking about an upper bound of 337k observations for the crime dataset, so is run time really that much of an issue?
More broadly, though, it does seem that best practice, as we look to apply these sorts of city analytics projects to more than a single city, would be to use a generic package (ideally on CRAN) so it's easier to redeploy.
Cheers,
PA
Out of all the scripts the heatmap calculations are by far the most time intensive. For example last night it took 668 seconds to run the heat map script, and the next longest time was the business download which only took 142 seconds. I have not benchmarked how much time the alternative KDE calculation saves.
There is a discussion of the original kde2d function from MASS on page 131 (it really starts on page 126) of the accompanying book, Modern Applied Statistics with S by W.N. Venables and B.D. Ripley. I have the 4th edition, so the page numbers may vary depending on which one you're using. I'm no expert on kernel density bandwidth selection, but I trust whatever Venables and Ripley have to say about it.
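For anyone who wants to compare bandwidth rules beyond the normal-reference default, base R's stats package provides several one-dimensional selectors. They return the kernel standard deviation, so this sketch multiplies them by 4 before passing them to `kde2d` (which divides `h` by 4). The bimodal toy data and the choice of selectors are assumptions for illustration. Per-axis selection ignores any correlation between the coordinates.

```r
library(MASS)

set.seed(3)
x <- c(rnorm(300, 0, 1), rnorm(200, 4, 0.5))   # deliberately bimodal toy data
y <- rnorm(500)

# stats::bw.* return the kernel sd; multiply by 4 to match kde2d's h scale
selectors <- list(
  nrd  = function(v) 4 * bw.nrd(v),                 # normal reference (same as bandwidth.nrd)
  nrd0 = function(v) 4 * bw.nrd0(v),                # Silverman's rule of thumb
  SJ   = function(v) 4 * bw.SJ(v, method = "ste")   # Sheather-Jones plug-in
)

fits <- lapply(selectors, function(sel) kde2d(x, y, h = c(sel(x), sel(y)), n = 60))
sapply(fits, function(f) max(f$z))   # peak density changes with the bandwidth rule
```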
BTW, I do think it would be good to test the effectiveness of different assumptions in the density estimation; I'm not sure how much of this was done at Allstate.
I was thinking it would be good to test:
- More variables
- Different lengths of time
- Different density distances
- Different kernels (maybe)
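One way to start on the last three items, sketched on simulated data: score a set of targets under several kernels, bandwidth multipliers, and look-back windows, then check how much the target ranking moves relative to a baseline. Everything here is an assumption for illustration: the simulated events, the `age_days` column, the multipliers, and the windows. "More variables" is not covered. A real test would feed each variant into the model and compare its evaluation metrics rather than just rank correlations.

```r
# Hypothetical sensitivity sweep over density-estimation assumptions.
kernels <- list(
  gaussian     = function(u) dnorm(u),
  epanechnikov = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
)
# Note: the same h means sd for the Gaussian but support radius for Epanechnikov.

# Product-kernel density of events evaluated at target points
score_at <- function(ex, ey, tx, ty, h, kern) {
  kx <- kern(outer(tx, ex, "-") / h[1])
  ky <- kern(outer(ty, ey, "-") / h[2])
  rowSums(kx * ky) / (length(ex) * h[1] * h[2])
}

set.seed(11)
events <- data.frame(x = rnorm(2000), y = rnorm(2000),
                     age_days = sample(0:365, 2000, replace = TRUE))
targets <- data.frame(x = runif(50, -2, 2), y = runif(50, -2, 2))
base_h <- c(MASS::bandwidth.nrd(events$x), MASS::bandwidth.nrd(events$y)) / 4

grid <- expand.grid(kernel = names(kernels), h_mult = c(0.5, 1, 2),
                    window_days = c(30, 90, 365), stringsAsFactors = FALSE)

scores <- lapply(seq_len(nrow(grid)), function(i) {
  g <- grid[i, ]
  ev <- events[events$age_days <= g$window_days, ]   # look-back window
  score_at(ev$x, ev$y, targets$x, targets$y,
           h = base_h * g$h_mult, kern = kernels[[g$kernel]])
})

# Rank agreement of each setting with the baseline (Gaussian, 1x, 365 days)
base <- which(grid$kernel == "gaussian" & grid$h_mult == 1 & grid$window_days == 365)
grid$spearman <- sapply(scores, function(s) cor(s, scores[[base]], method = "spearman"))
grid
```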
Related Issues (20)
- Updating download scripts / data cache to be in sync with the model code
- Update 00_Startup.R
- Split violation matrix calculation
- In `GenerateOtherLicenseInfo` guard against case with too few categories
- Split "create model data" step and fix inspector data
- Train/test data includes schools, hospitals, and other facility types
- violations matrix
- Predictions API
- Cannot find Inspection_Date problem
- Inspections are cyclic; how does prioritizing them help?
- Would you mind adding a license to the code?
- violations_dat.Rds does not have filtered inspections, but all inspects
- bad characters in inspectors data (trivial)
- Website header area is off-center
- Update download steps to use RSocrata from CRAN
- Refactor `eval_model` and integrate evaluation function more deeply with `30_glmnet_model.R`
- Social media data as a predictor?
- Source of weather data?
- Report Metric Development relies on datTest which is created in CODE/31