Final project for the Machine Learning subject, using the Yelp dataset to define a business case and create a predictive model.

Yelp Stars Prediction

Project

This is the final project for the Machine Learning subject of the CUNEF Master's in Data Science. The objective of this project is to study the Yelp dataset, find business cases, and create predictive models.

Business Case

The business objective is to predict, from the information that businesses register on the platform and from their characteristics, whether the average score they receive from users will be high (>= 4 stars) or low (< 4 stars).
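As an illustration of the target described above, the following minimal sketch turns an average score into a binary label. The function name, the `stars` field, and the sample records are assumptions made for this example, not code taken from the notebooks.

```python
def label_high_rating(stars, threshold=4.0):
    """Return 1 when the average score is high (>= threshold), else 0."""
    return int(stars >= threshold)


# Hypothetical records; the field names are illustrative only.
businesses = [
    {"name": "Cafe A", "stars": 4.5},
    {"name": "Diner B", "stars": 3.5},
    {"name": "Bar C", "stars": 4.0},
]

labels = [label_high_rating(b["stars"]) for b in businesses]
print(labels)  # [1, 0, 1]
```

A business with exactly 4 stars falls in the high class, matching the `>= 4` rule.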

To achieve this, the data was first preprocessed: the JSON files were cleaned and exported as parquet. This was followed by an exploratory analysis of the data, the creation of pipelines that treat the selected variables according to their data type, and the testing of different models.
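The JSON to parquet step could look roughly like the sketch below. It assumes the raw files hold one JSON object per line, and the helper names `read_json_lines` and `export_parquet` are hypothetical. The parquet export relies on pandas with a parquet engine such as pyarrow being installed.

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def export_parquet(records, out_path):
    """Write the records to a parquet file (needs pandas and a parquet engine)."""
    import pandas as pd  # imported lazily so the parser above works without it

    pd.DataFrame(records).to_parquet(out_path, index=False)
```

Keeping the parser separate from the export makes it easy to inspect or filter the records before they are written to data/processed.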

Finally, we calculated the local and global explainability of the model to obtain the importance of the variables. We also created a graph with a business case applicable to the data set used in our model.

How to run the Project?

To run the project, install the environment by running the following command in the shell:

pip3 install -r requirements.txt

Then download the Yelp dataset, extract it, and move the extracted files to the data/raw folder.
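Before opening the notebooks, it may help to confirm that the extracted files landed in the right place. The helper below is a hypothetical sketch that is not part of the repository. It assumes the extracted dataset files use the `.json` extension.

```python
from pathlib import Path


def list_raw_files(raw_dir="data/raw"):
    """Return the sorted names of the JSON files found in the raw data folder."""
    folder = Path(raw_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Missing folder: {folder}")
    return sorted(p.name for p in folder.glob("*.json"))


# Example call from the repository root:
# print(list_raw_files())
```

An empty list means the folder exists but the dataset has not been moved into it yet.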

What did we use?

  • Python 3.9.13

  • Visual Studio Code

  • Jupyter Notebook

  • NetworkX

Index

  1. Data Preprocessing

  2. EDA

  3. Feature Engineering

  4. Models

    • Base Model (Dummy Model)
    • Logistic Regression Lasso
    • Random Forest
    • LightGBM
    • Support Vector Machine
    • XGBoost
  5. Model Selection

  6. Interpretability

  7. Graph
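The sketch below shows one way the preprocessing pipelines and the first two models in the index could fit together with scikit-learn. The data is synthetic and the column names `review_count` and `state` are assumptions chosen for the example. It is not the code used in the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with one numeric and one categorical column (hypothetical names).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "review_count": rng.integers(1, 500, size=n),
    "state": rng.choice(["PA", "FL", "TN"], size=n),
})
# Synthetic binary target loosely tied to review_count so there is signal to learn.
y = (X["review_count"] + rng.normal(0, 50, size=n) > 250).astype(int)

# Treat each column according to its data type.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["review_count"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["state"]),
])

models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "lasso_logit": LogisticRegression(penalty="l1", solver="liblinear"),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit every model behind the same preprocessing and record test accuracy.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)

print(scores)
```

The dummy model gives the accuracy obtained by always predicting the majority class, which is the floor any other model in the list should beat. Accuracy is used here only for brevity. The choice of metric for the real project is left to the model selection notebook.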

Content of the repository

  • data:

    • raw: Documents downloaded from the source of the dataset.

    • processed: Processed data dictionary and processed data.

    • maps: Map used to display the star ratings by state.

    • graphs: Folder where the graph will be exported.

  • images: Pictures used in the different notebooks.

  • html: Notebooks exported as html files.

  • notebooks: Notebooks of the project and .py files with functions.

  • models: Pickles of the different models.

  • env: Requirements of the environment.

Authors

Victor Viloria Vázquez

Antonio Nogués Podadera

Project Link:

https://github.com/ComputingVictor/Yelp_Stars
