This project forked from nickyuan/yelper_recommendation_system

Yelper recommendation system

Home Page: https://chuansun76.com/2016/09/25/yelper-a-collaborative-filtering-based-recommendation-system/

Yelper: A Collaborative Filtering Based Recommendation System

Chuan Sun

[chuansun76 at gmail dot com]

[twitter.com/sundeepblue]

Blog: https://chuansun76.com/2016/09/25/yelper-a-collaborative-filtering-based-recommendation-system/

This README describes the major components of "Yelper", a business recommendation system built mainly in Python on the Spark framework.

Below are some features of "Yelper":

  • Dividing the original business data by city allows fine-tuned, customized recommendations
  • Matrix factorization based recommendation using Spark MLlib
  • User-business graph analysis using Spark GraphX in Scala
  • Real-time user request handling using Spark Streaming and Apache Kafka
  • User-business graph visualization using D3 and the graph-tool library
  • A functional web server that recommends highly rated businesses to users

Now let me introduce in detail how to reproduce everything!

1. Preprocessing

(1) Convert all user ids and business ids to integers. This made subsequent graph building a lot easier.

(2) Split the entire business data into smaller subsets by city. This yields 9 major cities:

  • us_charlotte
  • us_lasvegas
  • us_madison
  • us_phoenix
  • us_pittsburgh
  • us_urbana_champaign
  • canada_montreal
  • germany_karlsruhe
  • uk_edinburgh
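The two preprocessing steps above can be sketched in plain Python. This is only an illustration under assumptions, not the code in ./rating_data_utils.py: the function names, the (user, business, stars) tuple layout, and the sample ids are all hypothetical, and the real pipeline runs on Spark.

```python
from collections import defaultdict

def build_id_map(raw_ids):
    """Assign consecutive integers to string ids in first-seen order."""
    id_map = {}
    for raw in raw_ids:
        if raw not in id_map:
            id_map[raw] = len(id_map)
    return id_map

def split_ratings_by_city(ratings, business_city):
    """Group (user, business, stars) triples by the city of the business."""
    user_map = build_id_map(u for u, _, _ in ratings)
    business_map = build_id_map(b for _, b, _ in ratings)
    by_city = defaultdict(list)
    for user, business, stars in ratings:
        city = business_city[business]
        # Store integer ids so later graph building can use them directly.
        by_city[city].append((user_map[user], business_map[business], stars))
    return dict(by_city)

# Hypothetical sample data standing in for the Yelp string ids.
ratings = [
    ("u_abc", "b_x1", 5.0),
    ("u_def", "b_x1", 3.0),
    ("u_abc", "b_y2", 4.0),
]
business_city = {"b_x1": "us_madison", "b_y2": "us_charlotte"}

city_ratings = split_ratings_by_city(ratings, business_city)
print(city_ratings["us_madison"])  # [(0, 0, 5.0), (1, 0, 3.0)]
```

Building the id maps once over the whole dataset, before splitting, keeps a given user's integer id the same in every city subset.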

All necessary utility functions can be found here: ./rating_data_utils.py

Run this command:

$ spark-submit ./parse_ratingdata_for_major_cities.py

2. Network analysis for user-business graph

Extract connected components using Spark GraphX

Since there is no Python support for GraphX, I wrote the code in Scala. Note that the Scala code has to be built with "sbt". Make sure the GraphX library is properly configured in the file "./spark_graphx_analysis/config.sbt".

Source code: "./spark_graphx_analysis/src/main/scala/YelpUserBusinessGraphAnalysis.scala"

Below are the commands to run the graph analysis:

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/spark_graphx_analysis
  • $ sbt package
  • $ spark-submit --master local --class "YelpUserBusinessGraphAnalysis" target/scala-2.11/simple-project_2.11-1.0.jar

The output is saved to "businessid_to_indegree.csv".
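For readers who do not want to set up sbt, the two quantities this step deals with (connected components and the per-business in-degree suggested by the output file name) can be illustrated in plain Python. This is a small single-machine sketch, not the GraphX code; it assumes edges are (user_id, business_id) integer pairs, and the function names are hypothetical.

```python
from collections import Counter

def business_indegree(edges):
    """Count how many distinct users point at each business."""
    return Counter(business for _, business in set(edges))

def connected_components(edges):
    """Union-find over user and business nodes; returns a list of node sets."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for user, business in edges:
        # Prefix nodes so user id 1 and business id 1 stay distinct.
        root_u, root_b = find(("u", user)), find(("b", business))
        if root_u != root_b:
            parent[root_u] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Hypothetical toy graph: users 1-3, businesses 10-12.
edges = [(1, 10), (2, 10), (2, 11), (3, 12)]
indegree = business_indegree(edges)
components = connected_components(edges)
print(indegree[10], len(components))  # 2 2
```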

3. Build MF-based recommendation models for 9 major cities

Run this command to train an MF-based model for each major city:

$ python mf_based_recommendation_trainer.py
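To make the idea behind matrix factorization concrete, here is a minimal pure-Python sketch. It is not the trainer script and it does not use Spark MLlib: it swaps in plain stochastic gradient descent on a toy rating list, and the function names, hyperparameters, and sample ratings are all assumptions chosen for illustration.

```python
import random

def predict(P, Q, user, business):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[business]))

def train_mf(ratings, rank=2, epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and business factors Q with regularized SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    businesses = sorted({b for _, b, _ in ratings})
    P = {u: [rng.random() for _ in range(rank)] for u in users}
    Q = {b: [rng.random() for _ in range(rank)] for b in businesses}
    for _ in range(epochs):
        for u, b, r in ratings:
            err = r - predict(P, Q, u, b)
            for k in range(rank):
                pu, qb = P[u][k], Q[b][k]
                P[u][k] += lr * (err * qb - reg * pu)
                Q[b][k] += lr * (err * pu - reg * qb)
    return P, Q

def rmse(P, Q, ratings):
    """Root mean squared error over the given ratings."""
    total = sum((r - predict(P, Q, u, b)) ** 2 for u, b, r in ratings)
    return (total / len(ratings)) ** 0.5

# Hypothetical (user_id, business_id, stars) triples for one city.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 2.0),
]

P0, Q0 = train_mf(ratings, epochs=0)  # untrained baseline
P, Q = train_mf(ratings)
print(rmse(P0, Q0, ratings), rmse(P, Q, ratings))
```

Once the factors are learned, the model can score a pair that was never rated, such as `predict(P, Q, 0, 2)`, which is the basis for recommending unseen businesses.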

4. Build real-time user request handler using Spark Streaming and Apache Kafka

The purpose here is to simulate continuous user request handling.

STEP 0: Start Zookeeper and Kafka server

Note that the default ZooKeeper port is 2181, not 9092 (9092 is the default port of the Kafka broker). Also, the ZooKeeper server and the Kafka server should be started in two separate terminals.

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/kafka/kafka_2.11-0.10.0.1
  • $ bin/zookeeper-server-start.sh config/zookeeper.properties
  • $ bin/kafka-server-start.sh config/server.properties

STEP 1: Create Kafka topic

  • $ bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic user-request-topic --partitions 1 --replication-factor 1

STEP 2: Launch Spark Streaming

Note that this command should also be run in a new terminal. Use port 2181 and use this topic: "user-request-topic"

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone
  • $ spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 ./handle_user_requests_streaming.py

STEP 3: Produce user requests (TBD)

Note, do not specify port in KafkaProducer()!

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone
  • $ python ./user_requests_producer.py
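Since this step is still marked TBD, the following is only a sketch of what a request producer could look like, not the contents of user_requests_producer.py. The JSON message format and all function and class names are hypothetical. A small stand-in producer is used so the example runs without a broker; the comment shows where a kafka-python `KafkaProducer` would go.

```python
import json

TOPIC = "user-request-topic"  # topic created in STEP 1

def encode_request(user_id, city):
    """Serialize one request as UTF-8 JSON bytes (hypothetical format)."""
    return json.dumps({"user_id": user_id, "city": city}).encode("utf-8")

def send_requests(producer, requests, topic=TOPIC):
    """Send (user_id, city) pairs via any object exposing send(topic, value)."""
    for user_id, city in requests:
        producer.send(topic, encode_request(user_id, city))
    return len(requests)

class RecordingProducer:
    """Stand-in for a Kafka producer so the sketch runs without a broker."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

producer = RecordingProducer()
# With kafka-python installed and a broker running, one would instead use:
#   from kafka import KafkaProducer
#   producer = KafkaProducer()
count = send_requests(
    producer,
    [(10081786, "us_charlotte"), (10033545, "us_madison")],
)
print(count, producer.sent[0][0])
```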

5. Build user-business graph for D3 visualization

The purpose here is to generate a .js file containing all the nodes and edges of the user-business graph, so that the web server can load it and visualize the graph.

See file:

./build_nodes_and_edges_js_for_d3_visualization.py
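As a rough illustration of the idea, the sketch below turns a list of (user_id, business_id) edges into JavaScript source that defines a node/link object in the shape D3 force layouts commonly consume. The variable name `graph`, the `id`/`group` fields, and the function name are assumptions, not the format the actual script emits.

```python
import json

def build_graph_js(edges, variable_name="graph"):
    """Return JavaScript source that assigns a D3-style node/link object."""
    users = sorted({u for u, _ in edges})
    businesses = sorted({b for _, b in edges})
    # Prefix ids so a user and a business with the same integer stay distinct.
    nodes = [{"id": "u%d" % u, "group": "user"} for u in users]
    nodes += [{"id": "b%d" % b, "group": "business"} for b in businesses]
    links = [{"source": "u%d" % u, "target": "b%d" % b} for u, b in edges]
    payload = json.dumps({"nodes": nodes, "links": links}, indent=2)
    return "var %s = %s;\n" % (variable_name, payload)

# Hypothetical toy edges; the real input is the per-city rating data.
js_source = build_graph_js([(1, 10), (2, 10), (2, 11)])
print(js_source.splitlines()[0])  # var graph = {
```

Writing `js_source` to a file under the web server's static directory lets a page load it with a plain script tag, with no extra request for JSON.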

6. Finally, how to run the local web server to get recommendations

This web server was built using:

  • Spark
  • Flask
  • cherrypy
  • Python paste

How to launch the web server?

  • $ cd /Users/sundeepblue/Bootcamp/allweek/week9/capstone/webserver
  • $ unset PYSPARK_DRIVER_PYTHON
  • $ spark-submit server.py

Sample web server URLs

recommendation for user '10081786' in city 'us_charlotte':

recommendation for user '10033545' in city 'us_madison':

users-businesses social network in Madison, USA:

The JavaScript file "./webserver/static/data/generated_nodes_and_edges_from_json_us_madison.js" was generated programmatically by the Python script "./build_nodes_and_edges_js_for_d3_visualization.py".

The most important files for the server

  • ./webserver/app.py (contains the code that interacts with Spark, recommends businesses, etc.)
  • ./webserver/server.py (shows how to make Spark work with Flask)
  • ./templates/map.html (contains the Google Maps API calls)

7. Future work

  • More graph analysis
    • Graph PageRank analysis using GraphX
    • Community discovery (similar to Facebook social network)
  • Improve recommendation
    • Content-based recommendation
    • Clustering all businesses
    • Extract objects from business photos using a convolutional neural network
  • Code
    • Redirect the Spark execution log to a text file
    • Add try/except blocks to handle potential exceptions in the code
    • Add more comments
    • Add test cases for critical logic
  • Google Maps based web page
    • Fine-tune the web page to support more features
