
featurehoods

neighborhood

redefine neighborhoods by home features


project description

A four-part problem using my company's real estate listings data and history for San Francisco and the East Bay, described below. I intend to complete parts 1 and 2; parts 3 and 4 are extra credit in case I blow through the first two quickly.

1. Clustering (after preprocessing with feature extraction or selection)

Define 'neighborhoods' as clusters of homes with similar prices and similar features. I'm not sure yet whether geographic proximity should be a factor, so I'll probably try it both with and without. If clustering turns out to be intractable or unilluminating, I can convert the problem to classification, using as labels the lat/long areas in which users conduct their home searches. Either reduce dimensions or use user data to select features (e.g., users are often set on a particular bedroom count).
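A minimal sketch of the clustering step, assuming each home is a dict with hypothetical keys `price`, `sqft`, `beds`, `baths`, `lat` and `lon` (not the real listing schema). It z-scores the features so price doesn't swamp the distance metric, then runs plain k-means (Lloyd's algorithm) with a deterministic farthest-first initialization. The `use_location` flag covers the with/without-geography comparison. It is stdlib-only for portability; a library such as scikit-learn would be the natural choice in practice.

```python
import math

# Hypothetical column names; swap in the real listing fields.
HOME_FEATURES = ["price", "sqft", "beds", "baths"]
GEO_FEATURES = ["lat", "lon"]


def standardize(rows):
    """Z-score each column so large-scale features (price) don't dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-first initialization: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until assignments stop changing."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_homes(homes, k, use_location=False):
    """Cluster homes on price/features, optionally adding lat/long as dimensions."""
    cols = HOME_FEATURES + (GEO_FEATURES if use_location else [])
    rows = [[float(h[c]) for c in cols] for h in homes]
    labels, _ = kmeans(standardize(rows), k)
    return labels
```

Comparing `cluster_homes(homes, k)` with `cluster_homes(homes, k, use_location=True)` shows how much geography reshapes the clusters. The number of clusters `k` still has to be chosen separately, for example from an elbow plot of within-cluster distances.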

2. Text-mine descriptors to label the clusters

Text-mine listing descriptions and/or geo-tagged social media posts to describe the clusters. I'm fascinated by how apps like Yelp, Glassdoor, Amazon, and LinkedIn are able to pull out the key point or insight in customer reviews, and would like to replicate that, e.g. "Oh, what we think of as Nob Hill should have its western boundary extended if we think in terms of home price affordability." Options: extract n-grams from listing descriptions, identify the sentences with the highest cumulative tf-idf across their words (the most distinctive sentences), or find which words are tied to topic sentences.
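A minimal sketch of the cumulative tf-idf option, assuming listing descriptions have already been concatenated into one text per cluster (the `cluster_texts` dict is hypothetical). Treating each cluster as one document means words that appear in every cluster get an idf of zero, so a sentence scores high only if it uses vocabulary distinctive to its cluster. The tokenizer and sentence splitter are deliberately naive regexes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_sentences(cluster_texts, top_n=1):
    """For each cluster, return the sentences whose words have the highest
    cumulative tf-idf. Each cluster's text is one 'document', so a word shared
    by every cluster gets idf = log(1) = 0 and contributes nothing."""
    n_docs = len(cluster_texts)
    doc_freq = Counter()
    for text in cluster_texts.values():
        doc_freq.update(set(tokenize(text)))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}

    result = {}
    for label, text in cluster_texts.items():
        tf = Counter(tokenize(text))
        total = sum(tf.values()) or 1  # guard against an empty cluster text
        tfidf = {w: (count / total) * idf[w] for w, count in tf.items()}
        # Sum tf-idf over the distinct words in each sentence.
        scored = [(sum(tfidf.get(w, 0.0) for w in set(tokenize(s))), s)
                  for s in split_sentences(text)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[label] = [s for _, s in scored[:top_n]]
    return result
```

Summing favors long sentences; dividing each score by the sentence's word count is a common alternative if the top picks turn out to be run-ons.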

3. Use APIs

Use restaurant and bar data as another dimension in determining the clusters, and/or as the labels in a classification exercise instead. I mainly want opening hours and type of establishment (luxury, dive, etc.). I could pull this from the Yelp, Google Maps, or OpenTable APIs if available, or just use our company-purchased restaurant map layers if that becomes too gnarly.
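One way to turn venue data into extra clustering dimensions, sketched under assumptions: each venue is a dict with hypothetical keys `lat`, `lon`, `category` and `closes_hour`, regardless of which source (Yelp, Google Maps, OpenTable, or the map layers) it came from. For each home it counts nearby venues per category and how many stay open late, using haversine great-circle distance. The 0.8 km radius and 23:00 cutoff are arbitrary placeholders.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def venue_features(home, venues, radius_km=0.8, categories=("luxury", "dive"), late_hour=23):
    """Per-home venue features: count of nearby venues in each category, plus how
    many nearby venues close at or after `late_hour`.

    Venue dicts use hypothetical keys 'lat', 'lon', 'category' and 'closes_hour'
    (24-hour clock; values past 24 mean after midnight, e.g. 26 = 2 a.m.).
    """
    nearby = [v for v in venues
              if haversine_km(home["lat"], home["lon"], v["lat"], v["lon"]) <= radius_km]
    counts = Counter(v["category"] for v in nearby)
    features = {f"n_{c}": counts.get(c, 0) for c in categories}
    features["n_open_late"] = sum(1 for v in nearby if v["closes_hour"] >= late_hour)
    return features
```

The resulting counts can be appended to each home's feature vector before standardizing, or aggregated per cluster to serve as labels in the classification variant.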

Also, use the Google Elevation API to enrich the data with altitude and identify hills.
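A sketch of the elevation enrichment, assuming the Elevation API's JSON endpoint, which takes pipe-separated `lat,lng` pairs in `locations` plus an API `key` and returns a `results` list with an `elevation` (metres) per point. The hill rule is a hypothetical heuristic: flag a home if it sits at least `min_rise_m` above the median elevation of its `k` nearest neighbours. Neighbours are found by raw lat/long distance, which is a rough approximation that is only acceptable over small areas.

```python
import json
import statistics
import urllib.parse
import urllib.request

ELEVATION_URL = "https://maps.googleapis.com/maps/api/elevation/json"


def elevation_request_url(coords, api_key):
    """Build one request URL for a batch of (lat, lon) pairs."""
    locations = "|".join(f"{lat},{lon}" for lat, lon in coords)
    return ELEVATION_URL + "?" + urllib.parse.urlencode({"locations": locations, "key": api_key})


def fetch_elevations(coords, api_key):
    """Call the API and return elevations in the same order as `coords`."""
    with urllib.request.urlopen(elevation_request_url(coords, api_key)) as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        raise RuntimeError(f"Elevation API error: {payload.get('status')}")
    return [r["elevation"] for r in payload["results"]]


def flag_hills(coords, elevations, k=5, min_rise_m=30.0):
    """Hypothetical heuristic: True where a home rises at least `min_rise_m`
    above the median elevation of its k nearest neighbours."""
    flags = []
    for i, (lat, lon) in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: (coords[j][0] - lat) ** 2 + (coords[j][1] - lon) ** 2)
        neighbor_elev = [elevations[j] for j in others[:k]]
        rise = elevations[i] - statistics.median(neighbor_elev) if neighbor_elev else 0.0
        flags.append(rise >= min_rise_m)
    return flags
```

Batching many homes per request keeps the number of calls down, though the API caps how many locations a single URL can carry, so large datasets need to be chunked.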

4. Predict future gentrification and pockets of home price growth

Understand how the clusters evolve, especially with regard to gentrification. If my approach to part 1 didn't produce recommendations or a similarity distance metric (e.g., because I used geographic proximity), then also build one, e.g. "if you like homes in the Nob Hill cluster, you'll also like this random neighborhood in San Jose."
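A sketch of both halves of part 4, under stated assumptions. `similar_clusters` ranks clusters by Euclidean distance between their mean (standardized, location-free) feature profiles, which is what allows a San Jose cluster to match Nob Hill. `median_growth_by_cluster` uses the ratio of 2014 sale price to 2009 assessed value as a crude growth signal. This assumes the two datasets have been joined per home under the hypothetical keys shown. Assessed values can lag market values considerably, so the ratio is only a rough proxy.

```python
import math
import statistics


def cluster_profiles(rows, labels, features):
    """Mean of each (already standardized) feature per cluster label."""
    profiles = {}
    for lab in sorted(set(labels)):
        members = [r for r, row_lab in zip(rows, labels) if row_lab == lab]
        profiles[lab] = [statistics.mean(m[f] for m in members) for f in features]
    return profiles


def similar_clusters(profiles, target, top_n=3):
    """Clusters whose feature profiles are closest (Euclidean) to `target`'s.
    Location is deliberately left out of the profiles."""
    target_vec = profiles[target]
    ranked = sorted((math.dist(target_vec, vec), lab)
                    for lab, vec in profiles.items() if lab != target)
    return [lab for _, lab in ranked[:top_n]]


def median_growth_by_cluster(homes, labels):
    """Median ratio of 2014 sale price to 2009 assessed value per cluster.
    Keys 'sale_price_2014' and 'assessed_2009' are hypothetical."""
    ratios = {}
    for home, lab in zip(homes, labels):
        if home.get("assessed_2009"):  # skip missing or zero assessments
            ratios.setdefault(lab, []).append(home["sale_price_2014"] / home["assessed_2009"])
    return {lab: statistics.median(r) for lab, r in ratios.items()}
```

Clusters with unusually high growth ratios relative to similar-profile clusters are candidate gentrification pockets worth a closer look.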


data

All single-family residences, condos, and townhouses sold in the 9-county SF Bay Area between May 1, 2014 and Aug 31, 2014

All homes in the 9-county SF Bay Area that were assessed for property taxes at some point in 2009

contributors

selwyth
