maralb83 / machine_learning_capstone_project


Capstone project for Udacity's Machine Learning Engineer Nanodegree: predicting deal probability of online advertisements using heterogeneous data types (numerical, categorical, text, and image).

Jupyter Notebook 100.00%

machine_learning_capstone_project's Introduction

README

Capstone Project

Mario Albuquerque

May 31st, 2018

Summary

The subject of the project, which was done individually, was taken from a Kaggle competition named "Avito Demand Prediction Challenge".

The problem is one of determining the demand for an online advertisement given heterogeneous data types (categorical, numerical, text, and image).

The solution took two approaches: a supervised classification model that flags likely and unlikely deals, and a supervised regression model that generates a deal probability forecast.
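To make the difference between the two approaches concrete, the sketch below derives both targets from the same list of deal probabilities. This README does not say how the likely/unlikely labels were defined, so the 0.5 threshold and the function names `flag_likely_deal` and `build_targets` are hypothetical and for illustration only.

```python
def flag_likely_deal(deal_probability, threshold=0.5):
    """Classification view: collapse a probability into a likely/unlikely flag.

    The threshold is a hypothetical choice, not taken from the project.
    """
    if not 0.0 <= deal_probability <= 1.0:
        raise ValueError("deal probability must lie in [0, 1]")
    return deal_probability >= threshold


def build_targets(deal_probabilities, threshold=0.5):
    """Return (classification_target, regression_target) for the same ads."""
    # Regression view: the model predicts the probability itself.
    regression_target = list(deal_probabilities)
    # Classification view: the model predicts a 0/1 flag derived from it.
    classification_target = [
        int(flag_likely_deal(p, threshold)) for p in regression_target
    ]
    return classification_target, regression_target
```

For example, `build_targets([0.0, 0.2, 0.5, 0.9])` yields the flags `[0, 0, 1, 1]` alongside the unchanged probabilities.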

Data Source

The project used data provided through the Kaggle competition "Avito Demand Prediction Challenge". File locations are assumed to be relative to the root folder where the Jupyter Notebook files are located. There are two data files to be extracted:

train.csv.zip, which contains a file named train.csv with a total of 1,503,424 ads (around 931,000 KB). This is the main data source with the ads. This file should be in the folder "./Data/".
train_jpg.zip, which contains the images corresponding to the ads in the train.csv dataset. There are a total of 1,390,836 images in the zipped folder, totaling around 52,000,000 KB. Note that not all ads in the train.csv dataset have an image. This file should be unzipped into the folder "./Data/Images/".
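Before running the notebooks, it can help to confirm that the files are where they are expected. The sketch below is a minimal check of the layout described above. It assumes that train.csv has been extracted to "./Data/train.csv" and that the images carry a ".jpg" extension somewhere under "./Data/Images/". Neither detail is stated explicitly in this README, and the helper name `check_data_layout` is hypothetical.

```python
from pathlib import Path


def check_data_layout(root="."):
    """Report whether the data files described above are in place.

    Assumptions (not confirmed by the README): train.csv is extracted to
    <root>/Data/train.csv, and images are .jpg files under <root>/Data/Images/.
    """
    root = Path(root)
    csv_path = root / "Data" / "train.csv"
    images_dir = root / "Data" / "Images"

    # rglob also finds images if the archive unpacked into a nested folder.
    n_images = 0
    if images_dir.is_dir():
        n_images = sum(1 for _ in images_dir.rglob("*.jpg"))

    return {
        "train_csv": csv_path.is_file(),
        "images_dir": images_dir.is_dir(),
        "n_images": n_images,
    }


if __name__ == "__main__":
    print(check_data_layout("."))
```

If the full archive was unzipped, the image count should be close to the 1,390,836 images mentioned above.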

Python version and package requirements

This project was done with Python 3.5.3 and needs the following packages (outside of the Python Standard Library):

pandas 0.22.0
numpy 1.11.3
textblob 0.15.1
pillow 5.0.0
matplotlib 1.5.1
nltk 3.2.4
keras 2.1.4
opencv 3.2.0
ipython 6.1.0
scikit-learn 0.19.1
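The pinned versions above can be compared against an existing environment with a short script. The sketch below is one possible way to do that, not part of the project. It uses `importlib.metadata`, which requires Python 3.8 or later (newer than the 3.5.3 used here). The names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical, and the distribution name for OpenCV depends on how it was installed (for example "opencv-python" on PyPI), so that entry may need adjusting.

```python
from importlib import metadata

# Versions as listed in this README. The "opencv" key is an assumption:
# the installed distribution may be registered under a different name.
PINNED = {
    "pandas": "0.22.0",
    "numpy": "1.11.3",
    "textblob": "0.15.1",
    "pillow": "5.0.0",
    "matplotlib": "1.5.1",
    "nltk": "3.2.4",
    "keras": "2.1.4",
    "opencv": "3.2.0",
    "ipython": "6.1.0",
    "scikit-learn": "0.19.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches().items()):
        print(f"{name}: wanted {wanted}, found {found}")
```

The `lookup` parameter exists so the comparison can be exercised without the packages installed; an empty result means every listed package matches its pinned version.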

Jupyter Notebook Python Files

The implementation of the project was done through three Jupyter Notebooks:

EDA.ipynb: Exploratory data analysis.
Feature Engineering.ipynb: Feature engineering.
Model Development.ipynb: Model development and evaluation.
