
arvato_capstone_project

Udacity - Machine Learning Engineer Nanodegree Program

Project Overview

In this project, we analyze demographic data for customers of a mail-order sales company in Germany and compare it with demographic data for the general population. Exploratory data analysis is used to understand and clean the data. Unsupervised learning techniques are then used for customer segmentation, identifying the parts of the population that best describe the company's core customer base. Finally, we apply what we've learned to a third dataset containing demographic information on the targets of a marketing campaign for the company, and use a supervised model to predict which individuals are most likely to become customers.
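The segmentation step can be sketched as follows. This is a minimal illustration, not the project's actual code. It makes several assumptions:

- The data has already been cleaned, numerically encoded and imputed, because PCA and k-means cannot handle missing values.
- The number of components and clusters are arbitrary placeholders.
- `cluster_representation` is a hypothetical helper.

The idea is to fit scaling, PCA and k-means on the general population only. Customers are then assigned to the same clusters, and each cluster's share among customers is compared with its share in the population.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_representation(population, customers, n_components=5, n_clusters=4, seed=0):
    """Return, per cluster, (share among customers) / (share in population)."""
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
    )
    # Fit scaling, PCA and k-means on the general population only ...
    pop_labels = model.fit_predict(population)
    # ... then map customers into the same reduced space and clusters.
    cust_labels = model.predict(customers)
    pop_share = np.bincount(pop_labels, minlength=n_clusters) / len(pop_labels)
    cust_share = np.bincount(cust_labels, minlength=n_clusters) / len(cust_labels)
    return cust_share / pop_share


# Synthetic stand-in data: customers drawn from a shifted region of feature space.
rng = np.random.default_rng(42)
population = rng.normal(size=(2000, 10))
customers = rng.normal(loc=1.5, size=(300, 10))
ratios = cluster_representation(population, customers)
print(np.round(ratios, 2))
```

A ratio above 1 marks a cluster that is over-represented among customers relative to the general population, i.e. a candidate core segment. A ratio below 1 marks a segment the company reaches less often.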

Software and Libraries

This project uses Python 3 and is designed to be run in Jupyter Notebook. Using the Anaconda distribution to install Python is highly recommended, since it includes Jupyter Notebook and most of the libraries listed below; any that are missing can be installed with conda or pip. The following libraries are used in this project:

  • NumPy
  • pandas
  • Sklearn / scikit-learn
  • Matplotlib (for data visualization)
  • Seaborn (for data visualization)
  • joblib (used to save sklearn models)
  • missingno (to analyze missing data)
  • pyarrow (to save to parquet format)
  • tqdm (to show progress bars)
  • imblearn (for resampling imbalanced data)
  • xgboost (for the supervised model)
  • hyperopt (for hyperparameter tuning using Bayesian optimization)
  • functions.py (various utility functions used throughout the project)
  • other functions from scikit-learn
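The three notebooks run one after another, so fitted objects can be saved with joblib at the end of one notebook and reloaded in the next instead of being refit. Below is a minimal sketch using a toy PCA pipeline and a hypothetical file name; the project's actual saved models and paths may differ.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small fitted pipeline standing in for a model trained in an earlier notebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
pipe = make_pipeline(StandardScaler(), PCA(n_components=3)).fit(X)

# Persist it to disk (hypothetical file name) ...
path = os.path.join(tempfile.mkdtemp(), "pca_pipeline.joblib")
joblib.dump(pipe, path)

# ... and reload it later; the reloaded pipeline gives the same transformation.
reloaded = joblib.load(path)
same_output = np.allclose(pipe.transform(X), reloaded.transform(X))
print(same_output)
```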

How the project is organized

There are three Jupyter notebooks, which should be run in order:

  1. Data Cleaning and Transformation.ipynb
  2. Customer Segmentation.ipynb
  3. Supervised Learning Model.ipynb

The notebooks expect the following data files:

  • Udacity_AZDIAS_052018.csv
  • Udacity_CUSTOMERS_052018.csv
  • Udacity_MAILOUT_052018_TEST.csv
  • Udacity_MAILOUT_052018_TRAIN.csv
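Before running the notebooks, it can help to check that these files are in place. The sketch below makes several assumptions:

- The files sit together in one directory.
- They are semicolon-delimited. This is an assumption; adjust `sep` if your copies differ.
- `missing_data_files` and `load_dataset` are hypothetical helpers, not functions from functions.py.

```python
from pathlib import Path

import pandas as pd

EXPECTED_FILES = [
    "Udacity_AZDIAS_052018.csv",
    "Udacity_CUSTOMERS_052018.csv",
    "Udacity_MAILOUT_052018_TEST.csv",
    "Udacity_MAILOUT_052018_TRAIN.csv",
]


def missing_data_files(data_dir):
    """Return the expected data files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


def load_dataset(path, sep=";"):
    """Load one dataset; the ';' separator is an assumption about the file format."""
    # low_memory=False lets pandas infer each column's type from the whole file,
    # which avoids DtypeWarning on columns mixing numbers and string codes.
    return pd.read_csv(path, sep=sep, low_memory=False)
```

For example, `missing_data_files("data")` returns an empty list once all four files are in a `data` directory.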

The data is the property of Bertelsmann Arvato Analytics and it is not included in the repository.

The notebooks also expect AZDIAS_Feature_Summary.csv to be present. This file was constructed for this project, contains only metadata, and is included in the repository.
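One common use of such a metadata file is to record, for each attribute, which codes mean "missing or unknown", so they can be turned into NaN before analysis. The layout of AZDIAS_Feature_Summary.csv is not described here, so the sketch below rests on assumptions:

- The file has an `attribute` column.
- It has a `missing_or_unknown` column holding list-like strings such as `[-1,0]` or `[-1,X]`.
- The helper names and the toy data are hypothetical.

```python
import pandas as pd


def parse_codes(text):
    """Turn a string like '[-1,0]' or '[-1,X]' into [-1, 0] or [-1, 'X']."""
    codes = []
    for item in text.strip("[]").split(","):
        item = item.strip()
        if not item:
            continue  # '[]' means the attribute has no missing codes
        try:
            codes.append(int(item))
        except ValueError:
            codes.append(item)  # keep non-numeric codes such as 'X' as strings
    return codes


def apply_missing_codes(df, feature_summary):
    """Return a copy of df with each attribute's missing/unknown codes set to NaN."""
    df = df.copy()
    pairs = zip(feature_summary["attribute"], feature_summary["missing_or_unknown"])
    for attribute, raw in pairs:
        if attribute not in df.columns:
            continue  # metadata may describe columns absent from this dataset
        codes = parse_codes(raw)
        if codes:
            # mask() replaces values where the condition holds with NaN
            df[attribute] = df[attribute].mask(df[attribute].isin(codes))
    return df


# Toy example (hypothetical values).
summary = pd.DataFrame({
    "attribute": ["AGER_TYP", "CAMEO_DEUG_2015", "NOT_IN_DATA"],
    "missing_or_unknown": ["[-1,0]", "[-1,X]", "[]"],
})
data = pd.DataFrame({"AGER_TYP": [-1, 0, 2], "CAMEO_DEUG_2015": ["X", "4", "5"]})
cleaned = apply_missing_codes(data, summary)
print(cleaned)
```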

For the Kaggle competition, the submit.csv file is included in the notebooks folder.
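A submission file like this holds one predicted probability per individual in the test set. The sketch below illustrates the final steps of the supervised notebook under these assumptions:

- scikit-learn's GradientBoostingClassifier stands in for the project's xgboost model.
- Random synthetic data replaces the mailout files.
- ROC AUC is the validation metric.
- `LNR`/`RESPONSE` are the submission column names.
- `make_submission` is a hypothetical helper.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_submission(model, X_test, lnr, path):
    """Write one row per individual: its ID and the predicted probability of a response.

    The LNR/RESPONSE column names are an assumption about the competition format.
    """
    proba = model.predict_proba(X_test)[:, 1]
    submission = pd.DataFrame({"LNR": lnr, "RESPONSE": proba})
    submission.to_csv(path, index=False)
    return submission


# Synthetic stand-in data: a rare positive class loosely tied to the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 2.0).astype(int)

# Stratify so the few positives appear in both the training and validation splits.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# ROC AUC scores how well positives are ranked above negatives, without a threshold.
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation ROC AUC: {val_auc:.3f}")

# Score a synthetic "test" set and write the submission to a temporary location.
X_test = rng.normal(size=(200, 5))
lnr = np.arange(1, 201)  # hypothetical IDs
path = os.path.join(tempfile.mkdtemp(), "submit.csv")
submission = make_submission(model, X_test, lnr, path)
```

Submitting probabilities rather than hard 0/1 labels keeps the ranking information that a ranking metric such as ROC AUC rewards. This matters when responders are a small fraction of the mailout.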


Contributors

hazemhamada
