Analyse-Yelp-Dataset-with-Spark-Parquet-Format-on-Azure-Databricks

Analysis and Visualisation of Yelp Dataset using Apache Spark | Elasticsearch | Kibana

Introduction

Most businesses seek reviews of their goods and services in one way or another. Reviews are one of the most basic ways for a business to improve its efficiency and, in turn, its bottom line. Collecting reviews is only part of the problem: the ability to extract and visualize analytics from review data is critical to business success.

In this Apache Spark project, we will use the Yelp review dataset to analyze businesses and reviews over a period of time. We may spot potential gaps in service delivery or see how businesses thrive in different scenarios.

Beyond processing this data, we will ingest the final output of our data processing into Elasticsearch and use Kibana, the visualization tool in the ELK stack, to build various kinds of ad-hoc reports from the data.

Goal

The goal of this Spark project is to analyze business reviews from the Yelp dataset and ingest the final output of the data processing into Elasticsearch. We also use the visualisation tool in the ELK stack to visualize various kinds of ad-hoc reports from the data.

Architecture

image

image

• Import data from the Yelp dataset into a relational database (MySQL)

Please go through the file get_yelp_in_mysql_databaset.txt for more information

• Ingesting data from the relational database (MySQL) using Sqoop into Hadoop HDFS

Please go through the file yelp_mysql_sqoop_commands.txt for more information

• Ingesting data from the relational database directly into Spark

• Processing relational data in Spark

• Ingesting processed data into Elasticsearch

• Visualizing review analytics using Kibana
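
The ingestion steps above depend on a handful of connection settings. The sketch below collects them in one place. Every host, port, database name, credential, and index name is a placeholder assumed for illustration, not a value taken from this project. The option keys are the standard ones for Spark's JDBC source (url, dbtable, user, password, driver) and for elasticsearch-hadoop (es.nodes, es.port, es.index.auto.create).

```scala
object YelpPipelineSettings {
  // Hypothetical MySQL connection values; replace with those of your environment.
  val jdbcOptions: Map[String, String] = Map(
    "url"      -> "jdbc:mysql://localhost:3306/yelp", // assumed host and database name
    "dbtable"  -> "business",                         // assumed table name
    "user"     -> "yelp_user",                        // placeholder credentials
    "password" -> "change-me",
    "driver"   -> "com.mysql.jdbc.Driver"             // MySQL Connector/J 5.x driver class
  )

  // Hypothetical Elasticsearch settings, using elasticsearch-hadoop key names.
  val esSettings: Map[String, String] = Map(
    "es.nodes"             -> "localhost",
    "es.port"              -> "9200",
    "es.index.auto.create" -> "true"
  )

  // Inside spark-shell (Spark 1.4 or later), with the MySQL JDBC driver and the
  // elasticsearch-spark package on the classpath, the maps could be used
  // roughly like this:
  //
  //   import org.elasticsearch.spark.sql._
  //   val business = sqlContext.read.format("jdbc").options(jdbcOptions).load()
  //   business.saveToEs("yelp/business", esSettings) // "yelp/business" is an assumed index/type
}
```

Keeping the settings in plain maps means the same values can be reused for each table that is read from MySQL and each index that is written to Elasticsearch.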

Technology stack

image

Area | Technology
Dataset | Yelp
Relational Database | MySQL
Big Data Ingestion Tool | Hadoop (Sqoop)
Distributed File System | Hadoop (HDFS)
Cluster Computing Framework | Apache Spark (Scala)
Search and Analytics Engine | Elasticsearch

Yelp schema

image

Of everything in the schema, we will focus on the parts listed below:

Business

category

hours

Review

Use Cases considered for Visualisation

Top 10 Business Categories

Yelp Business Map

Business distribution by state

Average rating of business over time

Top rated businesses

User sign up trend
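
Several of these use cases reduce to a group-and-count or a group-and-average. The following is a minimal sketch in plain Scala collections, so that it runs without a cluster. The case classes and field names are simplified assumptions rather than the actual Yelp schema, and in the project itself the same shape would presumably be expressed with Spark's groupBy and aggregation functions.

```scala
object YelpUseCases {
  // Hypothetical, simplified records; the real Yelp tables have more columns.
  case class Business(id: String, name: String, state: String)
  case class Category(businessId: String, category: String) // one row per business/category pair
  case class Review(businessId: String, stars: Int, date: String) // date assumed as "yyyy-MM-dd"

  // Top N business categories: count rows per category, highest count first.
  // Ties are broken alphabetically so the result is deterministic.
  def topCategories(categories: Seq[Category], n: Int): Seq[(String, Int)] =
    categories
      .groupBy(c => c.category)
      .map { case (category, rows) => (category, rows.size) }
      .toSeq
      .sortBy { case (category, count) => (-count, category) }
      .take(n)

  // Business distribution by state: number of businesses in each state.
  def businessesByState(businesses: Seq[Business]): Map[String, Int] =
    businesses
      .groupBy(b => b.state)
      .map { case (state, rows) => (state, rows.size) }

  // Average rating over time: mean stars per calendar year, ordered by year.
  def averageStarsByYear(reviews: Seq[Review]): Seq[(String, Double)] =
    reviews
      .groupBy(r => r.date.take(4)) // the first four characters are the year
      .map { case (year, rows) => (year, rows.map(_.stars).sum.toDouble / rows.size) }
      .toSeq
      .sortBy { case (year, _) => year }
}
```

For example, YelpUseCases.topCategories(categories, 10) returns the ten most frequent categories as (name, count) pairs, which is the shape a Kibana bar chart of top categories would need.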

Configuring Environment

Install the Cloudera QuickStart VM

Install the ELK stack

Then configure the Scala runtime on the Cloudera QuickStart VM

Watch the video below for more information

https://www.youtube.com/watch?v=SFJsuo2XISs

Execution Instructions

Launch the Spark Shell

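spark-shell --packages org.elasticsearch:elasticsearch-spark-13_2.10:6.1.1 --conf spark.es.index.auto.create=true --conf spark.es.nodes=<elasticsearch-ip>:<port>

Replace <elasticsearch-ip> and <port> with the IP address and port of your Elasticsearch node. There must be no space after the equals sign.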

Visualisation Screenshots

After ingesting the processed data into Elasticsearch

Yelp User sign up trend

image

Business distribution by state

image

Yelp review

image

Top 10 Business Categories

image

image

Dashboard

image
