20190226_Identify-Customer-Segments

[Udacity Data Scientist Project] Identify Customer Segments of Arvato using k-means clustering

This project is part of the Udacity Data Science Nanodegree. AZ Direct and Arvato Finance Solution supplied the customer dataset. The goal of the project is to identify the segments of the population that are most likely to purchase the company's products. I used k-means clustering to group the general population into clusters, then used those clusters to see which of them make up the company's main customer base. Before applying the machine learning methods, I assessed and cleaned the data to convert it into a usable form.
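The comparison described above can be illustrated with a minimal sketch. It is not the code from the project notebook: the data is synthetic, the choice of three clusters and two features is arbitrary, and the helper `cluster_shares` is a hypothetical name introduced here. The cleaning steps are omitted. The sketch only assumes scikit-learn's `StandardScaler` and `KMeans`, fitted on the general population and then reused on the customers so that both datasets share the same clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two datasets (hypothetical data, not Arvato's):
# the "general population" is drawn evenly from three groups, while the
# "customers" come mostly from the third group.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
population = np.vstack([rng.normal(c, 0.5, size=(300, 2)) for c in centers])
customers = np.vstack([
    rng.normal(centers[2], 0.5, size=(160, 2)),
    rng.normal(centers[0], 0.5, size=(40, 2)),
])

# Fit scaling and clustering on the general population only, then reuse
# the fitted objects on the customer data.
scaler = StandardScaler().fit(population)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pop_labels = kmeans.fit_predict(scaler.transform(population))
cust_labels = kmeans.predict(scaler.transform(customers))


def cluster_shares(labels, n_clusters):
    """Return the fraction of rows assigned to each cluster."""
    return np.bincount(labels, minlength=n_clusters) / len(labels)


pop_share = cluster_shares(pop_labels, 3)
cust_share = cluster_shares(cust_labels, 3)

# A ratio above 1 marks a cluster that is overrepresented among customers.
ratio = cust_share / pop_share
for k in range(3):
    print(f"cluster {k}: population {pop_share[k]:.2f}, "
          f"customers {cust_share[k]:.2f}, ratio {ratio[k]:.2f}")
```

Clusters with a ratio well above 1 are candidates for the company's core customer base, while clusters with a ratio well below 1 describe people who are unlikely to be customers. With real data, the number of clusters would have to be chosen by inspection, for example by comparing the within-cluster distance across several values of k.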

Data

Udacity_AZDIAS_Subset.csv: Demographics data for the general population of Germany; 891211 persons (rows) x 85 features (columns).
Udacity_CUSTOMERS_Subset.csv: Demographics data for customers of a mail-order company; 191652 persons (rows) x 85 features (columns).
Data_Dictionary.md: Detailed information file about the features in the provided datasets.
AZDIAS_Feature_Summary.csv: Summary of feature attributes for demographics data; 85 features (rows) x 4 columns.

Under the terms and conditions, I am not allowed to publish the data provided by Arvato Financial Services. However, the code and the analysis that I performed are available in the HTML file Identify_Customer_Segments.html.
