Wine Review Analysis

Team Pura Pythonic Vida

Contact Information

William Ponton: LinkedIn

Email: @gorbulus

REPL: @gorbulus

Github: gorbulus

Project Description

Welcome to Team Pura Vida's Wine Review Analysis for the RMOTR Data Science Coursework.

Started: 2.17.19

Completed: 3.17.19

Project Vision Statement:

This project explores a dataset I found on Kaggle with Python, using standard Data Science skills to clean, analyze, visualize, and interpret the data and to draw scientific meaning from the patterns in it.

Inspiration

Since I have spent the majority of my career in the FoodService and FoodService Equipment Industry, I wanted this project to be related to a culinary dataset of some kind. The Wine Review dataset caught my attention because I have always been suspicious of how directly the concepts of quality and price affect the consumer's mind. Usually "you get what you pay for" is a safe bet, but sometimes a more expensive commodity is not the "best" or preferred choice for consumers. This is especially evident in the Food Industry, where epicurean prices are tied to a subjective palate. I want to analyze the Wine Reviews and see if the more expensive wines are always the highest rated ones by consumers.

I think that this dataset offers some great opportunities for text-related predictive models. My overall goal in the future is to use another version of this dataset (with three additional columns) to do some predictive analysis on the text-based Description column. Ultimately I am interested in building a bot that could produce a convincing Wine Review. If anyone has any ideas, breakthroughs, or other interesting insights/models, please post them! Feel free to fork as well. I am open to constructive feedback and tips.

Project goals

  1. Import and inspect the dataset using pandas.

  2. Analyze the dataset using pandas and numpy.

  3. Create visualizations using matplotlib and seaborn.

  4. Interpret meanings from the data using the Scientific Method ("Data Science!").
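As a rough illustration of goals 1 and 2, here is a minimal sketch of loading and inspecting the reviews with pandas. The lowercase column names, the unnamed index column, and the helper names `load_reviews` and `inspect` are assumptions for this example, and the sample rows are made up so the snippet runs without the real CSV.

```python
import io

import pandas as pd

# Tiny stand-in for winemag-data_first150k.csv (made-up rows, not real reviews).
# Assumption: the real file has an unnamed first column holding a row index.
SAMPLE_CSV = """,country,points,price,variety
0,US,96,235.0,Cabernet Sauvignon
1,Spain,96,110.0,Tinta de Toro
2,US,91,,Chardonnay
3,France,88,15.0,Chardonnay
"""


def load_reviews(source):
    """Read the reviews CSV, using the first (unnamed) column as the index."""
    return pd.read_csv(source, index_col=0)


def inspect(df):
    """Summarize row count, column dtypes, and missing values per column."""
    return {
        "rows": len(df),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


reviews_df = load_reviews(io.StringIO(SAMPLE_CSV))
summary = inspect(reviews_df)
print(summary["rows"])
print(summary["missing"])
```

With the real file, `load_reviews("winemag-data_first150k.csv")` would take the place of the in-memory sample.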

Project Stack

Python has rich Data Science functionality that has been driven by teams of scientists and engineers trying to solve scientific and engineering problems. Python's object-oriented design, ease of syntax, and available libraries make it the industry standard for Data Analysis. A 2016 study done by O'Reilly shows that Python is now dominant over R throughout the Data Science community, favoring Python 3.6 over the soon-to-be-retired Python 2.7.

Python has become the fastest-growing programming language of 2019, and remains the industry standard for modeling and analysis in the scientific and engineering industries. The Scientific Python Stack is the collection of technologies that makes Python so powerful for data analysis and statistical prediction.

To get everything running in this project, use pip install -r requirements.txt

Let's take a quick tour of the Scientific Python (SciPy) stack I used for the Wine Review Analysis:

Language

  • Python 3.6 (replacing legacy Python 2.7 in 2020)
  • Cython (a compiled superset of Python for writing fast C extensions; parts of pandas and scikit-learn are built with it)

Scientific & Numeric Power

  • SciPy
  • NumPy
  • scikit-learn

Interactive Environment

  • Anaconda (Python distribution)
  • IPython Notebooks
  • GitHub (version control)
  • RMOTR Notebooks

Data Science Libraries

  • Analysis tools
    • NumPy
    • Pandas
    • Cython
  • Visualization tools
    • Matplotlib
    • Seaborn
    • Bokeh

Dataset Overview

I used Kaggle's filtering and search to look through food-related datasets and found the Wine Review dataset. I was looking for a medium-sized CSV file between 50 MB and 1 GB. I wanted something that would take some processing without turning into a pure data-wrangling project, and that would lend itself to visualizations with the seaborn library.

The data was scraped from Wine Enthusiast Magazine during the week of June 15th, 2017.

The dataset contains 150,930 Wine Reviews in a single CSV file of about 50 MB:

winemag-data_first150k.csv contains 10 columns and 150k+ rows of Wine Reviews scraped from WineEnthusiast during June of 2017.

Each record in the dataset represents a single wine review from an online user of Wine Enthusiast Magazine.

The following is a brief summary of the 10 different columns of data included in winemag-data_first150k.csv:

Data Columns

  1. Country - The country of origin of the wine.

  2. Description - The description of the wine's flavor profile.

  3. Designation - The vineyard where the wine's grapes are sourced.

  4. Points - The number of points Wine Enthusiast Magazine rated the wine on a scale of 1-100.

  5. Price - The cost for a single bottle of the wine.

  6. Province - The province or state that the wine is from.

  7. Region 1 - The wine growing area in a province or state (for example, Napa Valley in California).

  8. Region 2 - (Optional) A more specific region in a wine growing area (for example, Rutherford inside Napa Valley).

  9. Variety - The type of grapes used to make the wine (for example, Pinot Noir).

  10. Winery - The winery that made the wine.
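The column list above can double as a schema check before any analysis. The sketch below is one possible cleaning step, not necessarily what the notebook does: it assumes the CSV uses lowercase, underscore-separated column names, and the choice to drop duplicate reviews and rows without a price or points value is my own assumption. The helper names and demo rows are hypothetical.

```python
import pandas as pd

# Assumed column names in the CSV (the list above shows them in title case).
EXPECTED_COLUMNS = [
    "country", "description", "designation", "points", "price",
    "province", "region_1", "region_2", "variety", "winery",
]


def clean_reviews(df):
    """Validate the schema, then drop duplicate reviews and rows missing price or points."""
    missing_cols = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing columns: {missing_cols}")
    # Treat two rows with the same description and winery as the same review.
    cleaned = df.drop_duplicates(subset=["description", "winery"])
    cleaned = cleaned.dropna(subset=["price", "points"])
    return cleaned.reset_index(drop=True)


def make_row(**values):
    """Build a made-up demo row with every expected column present."""
    row = {c: None for c in EXPECTED_COLUMNS}
    row.update(values)
    return row


demo = pd.DataFrame([
    make_row(country="US", description="Ripe cherry.", points=90, price=15.0, winery="A"),
    make_row(country="US", description="Ripe cherry.", points=90, price=15.0, winery="A"),
    make_row(country="Italy", description="Crisp citrus.", points=87, price=None, winery="B"),
    make_row(country="France", description="Earthy.", points=92, price=40.0, winery="C"),
])
cleaned = clean_reviews(demo)
print(len(cleaned))
```

Dropping rows without a price is convenient for price-versus-points comparisons, but it discards otherwise valid reviews, so a text-focused analysis might keep them.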

Analysis

Check out the WineReviewAnalysis Notebook for the analysis.

Dependencies

To get everything running, use pip install -r requirements.txt

Results

After cleaning and inspecting the Wine Reviews dataset, we used numerical and statistical analysis to create visualizations from the dataset. Based on focused plots of point distributions, jointplots, and heatmaps, the analysis suggests that the best-value wines among the 150,930 reviews share the following traits:

  • Made in California
  • A Chardonnay, Pinot Grigio, or Cabernet Sauvignon
  • 12.00 - 18.00 USD per bottle
  • Highly likely to score 87.5 points or more
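One way to sanity-check a "best value" profile like this is to measure how often wines inside the price band reach the points threshold. The following is a minimal sketch under assumptions: the function name `value_band_summary`, the lowercase column names, and the made-up demo rows are illustrative and are not taken from the notebook.

```python
import pandas as pd


def value_band_summary(df, low=12.0, high=18.0, min_points=87.5):
    """Return the share of in-band wines meeting min_points and their variety counts."""
    # between() is inclusive on both ends by default.
    in_band = df[df["price"].between(low, high)]
    meets_threshold = in_band["points"] >= min_points
    share = meets_threshold.mean()
    top_varieties = in_band.loc[meets_threshold, "variety"].value_counts()
    return share, top_varieties


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "province": ["California", "California", "California", "Oregon", "California"],
    "variety": ["Chardonnay", "Cabernet Sauvignon", "Chardonnay", "Pinot Noir", "Chardonnay"],
    "price": [14.0, 17.0, 16.0, 45.0, 12.0],
    "points": [89, 88, 86, 93, 90],
})

share, top_varieties = value_band_summary(demo)
print(share)
print(top_varieties)
```

Filtering the frame by province first (for example `df[df["province"] == "California"]`) would narrow the same summary to a single region.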

Conclusions

  • California is well known for its wine-producing industry and its agricultural capabilities.

  • Overall, wines in the 10.00 - 20.00 USD range frequently have better ratings than more expensive wines.

  • This could be due to the price point of these wines, or the fact that most consumers drink expensive wines less frequently or only for special occasions.

  • It makes sense for the wine producers to focus on the market demand for their products and target their resources towards the taste of the public.

  • There is no correlation between price and quality when comparing the majority of commercial wines.
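A claim about price and quality can be checked numerically as well as visually. The sketch below computes Pearson and Spearman correlations between price and points. It is an illustrative check under assumptions (lowercase column names, a hypothetical helper name, made-up demo data), not the method used in the notebook. Spearman is computed here as the Pearson correlation of the ranks, which is its definition.

```python
import pandas as pd


def price_points_correlation(df):
    """Return Pearson and Spearman correlations between price and points."""
    pairs = df[["price", "points"]].dropna()
    pearson = pairs["price"].corr(pairs["points"])
    # Spearman correlation is the Pearson correlation of the ranked values.
    spearman = pairs["price"].rank().corr(pairs["points"].rank())
    return {"pearson": pearson, "spearman": spearman, "n": len(pairs)}


# Made-up data: points rise with price, but far from linearly.
demo = pd.DataFrame({
    "price": [10.0, 20.0, 40.0, 400.0, None],
    "points": [85, 86, 87, 88, 90],
})

result = price_points_correlation(demo)
print(result)
```

Comparing the two numbers is useful because a few very expensive bottles can pull the Pearson value down even when higher-priced wines still tend to score higher by rank.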

Contact Information

What did you think about the Wine Review Analysis?

Cheers.
