maralb83 / machine_learning_capstone_project

Capstone project for Udacity's Machine Learning Engineer Nanodegree: predicting deal probability of online advertisements using heterogeneous data types (numerical, categorical, text, and image).

Capstone Project

Mario Albuquerque

May 31st, 2018

Summary

This individual project is based on the Kaggle competition "Avito Demand Prediction Challenge".

The problem is to predict the demand for an online advertisement from heterogeneous data types (categorical, numerical, text, and image).

The solution took two approaches: a supervised classification model that flagged likely and unlikely deals, and a supervised regression model that forecast the deal probability.
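The two framings differ mainly in the target. As a minimal sketch, assuming the continuous target is a deal probability in [0, 1], the classification label can be derived by thresholding it, while the regression forecast is scored directly against it. The cut-off `DEAL_THRESHOLD = 0.5`, the helper names, and the choice of RMSE as the regression score are illustrative assumptions, not taken from the project's notebooks.

```python
import math

# Hypothetical cut-off: the README does not state how "likely" and
# "unlikely" deals were separated for the classification model.
DEAL_THRESHOLD = 0.5


def to_deal_label(deal_probability, threshold=DEAL_THRESHOLD):
    """Classification target: 1 = likely deal, 0 = unlikely deal."""
    return [1 if p >= threshold else 0 for p in deal_probability]


def clip_probability(preds):
    """Keep regression forecasts inside [0, 1] so they stay valid probabilities."""
    return [min(1.0, max(0.0, p)) for p in preds]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and forecast deal probabilities."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


y = [0.0, 0.8, 0.3, 1.0]
print(to_deal_label(y))  # [0, 1, 0, 1]
print(rmse(y, clip_probability([0.1, 0.7, 0.5, 1.2])))
```

Clipping before scoring matters because an unconstrained regressor can forecast values outside [0, 1], which are not meaningful as probabilities.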

Data Source

The project used data provided through the Kaggle competition "Avito Demand Prediction Challenge". The folder paths below are relative to the root folder that contains the Jupyter Notebook files. There are two data files to extract:

  • train.csv.zip, which contains a file named train.csv with 1,503,424 ads, totaling around 931,000 KB. This is the main data source for the ads. train.csv should be placed in the folder "./Data/".

  • train_jpg.zip, which contains the images for the ads in train.csv: 1,390,836 images totaling around 52,000,000 KB. Not every ad in train.csv has an image. This archive should be unzipped into the folder "./Data/Images/".
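Because not every ad has an image, it helps to check which ads actually have a picture on disk before building image features. The sketch below assumes (these details are not stated in this README) that train.csv has an `image` column holding an image id, blank or NaN when the ad has no picture, and that each image was extracted as `<image id>.jpg` directly inside "./Data/Images/". `image_path` and `has_image` are hypothetical helper names.

```python
import math
from pathlib import Path

# Assumed layout: every image extracted as "<image id>.jpg" directly inside
# ./Data/Images/. Adjust IMAGE_DIR or the ".jpg" suffix if yours differs.
IMAGE_DIR = Path("./Data/Images")


def image_path(image_id, image_dir=IMAGE_DIR):
    """Return the expected file path for an ad's image, or None if it has no image id."""
    if image_id is None:
        return None
    # pandas reads empty cells as float NaN.
    if isinstance(image_id, float) and math.isnan(image_id):
        return None
    image_id = str(image_id).strip()
    if not image_id:
        return None
    return Path(image_dir) / (image_id + ".jpg")


def has_image(image_ids, image_dir=IMAGE_DIR):
    """For each ad, flag whether its image file actually exists on disk."""
    flags = []
    for image_id in image_ids:
        path = image_path(image_id, image_dir)
        flags.append(path is not None and path.is_file())
    return flags
```

With pandas, a usage along the lines of `train = pd.read_csv("./Data/train.csv")` followed by `train["has_image"] = has_image(train["image"])` would add the flag as a column, again assuming the `image` column name.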

Python version and package requirements

This project was done with Python 3.5.3 and needs the following packages (outside of the Python Standard Library):

  • pandas 0.22.0
  • numpy 1.11.3
  • textblob 0.15.1
  • pillow 5.0.0
  • matplotlib 1.5.1
  • nltk 3.2.4
  • keras 2.1.4
  • opencv 3.2.0
  • ipython 6.1.0
  • scikit-learn 0.19.1
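Several of these packages are imported under a different name than the one they are installed with (for example pillow is imported as `PIL`, opencv as `cv2`, and scikit-learn as `sklearn`). A minimal sketch for confirming that they can all be found, assuming the standard import names below, uses `importlib.util.find_spec`, which is available in Python 3.5. It only checks that each package is present, not that the pinned versions above are installed; `REQUIRED_MODULES` and `missing_packages` are hypothetical names.

```python
import importlib.util

# Package name as listed above -> top-level module name used in `import`.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "textblob": "textblob",
    "pillow": "PIL",
    "matplotlib": "matplotlib",
    "nltk": "nltk",
    "keras": "keras",
    "opencv": "cv2",
    "ipython": "IPython",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return, sorted, the packages whose top-level module cannot be located."""
    # find_spec locates a module without importing it.
    return sorted(name for name, module in required.items()
                  if importlib.util.find_spec(module) is None)


missing = missing_packages()
print("Missing: " + ", ".join(missing) if missing else "All required packages found.")
```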

Jupyter Notebook Files

The project is implemented in three Jupyter Notebooks:

  • EDA.ipynb: Exploratory data analysis.

  • Feature Engineering.ipynb: Feature engineering.

  • Model Development.ipynb: Model development and evaluation.
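To reproduce the workflow end to end without opening each notebook, they can be executed from the command line with `jupyter nbconvert --to notebook --execute`. The sketch below assumes the notebooks are meant to run in the order listed, each relying on outputs of the previous one; the README does not state this, and `build_commands` and `run_all` are hypothetical names. `--ExecutePreprocessor.timeout=-1` disables nbconvert's per-cell timeout, which long-running steps over 1.5 million ads and 1.4 million images would likely exceed.

```python
import subprocess

# Assumed execution order: the order in which the notebooks are listed above.
NOTEBOOKS = ["EDA.ipynb", "Feature Engineering.ipynb", "Model Development.ipynb"]


def build_commands(notebooks=NOTEBOOKS):
    """Build one `jupyter nbconvert` invocation per notebook."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--ExecutePreprocessor.timeout=-1", notebook]
        for notebook in notebooks
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute the notebooks one after another, stopping at the first failure."""
    for command in build_commands(notebooks):
        # check=True raises CalledProcessError if a notebook fails.
        subprocess.run(command, check=True)
```

Calling `run_all()` from the project's root folder writes each executed copy next to its source as `<name>.nbconvert.ipynb`, leaving the original notebooks untouched. Passing arguments as a list rather than a shell string avoids quoting problems with the spaces in "Feature Engineering.ipynb" and "Model Development.ipynb".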