
faraazarsath / project_customer-segmentation_e_commerce


Customer Segmentation of E commerce purchase database

Jupyter Notebook 100.00%
kmeans-clustering latent-semantic-analysis natural-language-processing tfidf-vectorizer

project_customer-segmentation_e_commerce's Introduction

Project_Customer-segmentation_E_commerce

Project Customer Segmentation

Problem Statement

For an e-commerce platform, it is important to profile customers by dividing the customer base into groups according to their needs and expectations. Such grouping helps us design dedicated marketing strategies and recommend products to different user bases. In this project, we analyze the content of an e-commerce database that lists purchases made by ~4000 customers over a period of one year (1/12/2010 to 9/12/2011). Based on this analysis, we would like to develop models that group the 4000 customers into different buckets. Such a model must take into account the similarity between the products purchased by users (i.e. a user might purchase 2 different products which are very similar to each other), the spending patterns of a user, their meta information, etc.

Minimum Requirements

The end objective of the participant is to come up with customer segmentations that take into account all the information that is presented in the dataset. The participant is expected to use NLP techniques to find similarity between the products.

Project Summary:

In this project I used NLP (Natural Language Processing) techniques to extract frequent words from the 'Description' column of purchased products, then clustered the descriptions as text documents using k-means to obtain Product Categories.

For feature extraction I used TfidfVectorizer, which uses an in-memory vocabulary (a Python dict) to map the most frequent words to feature indices and hence compute a sparse word occurrence frequency matrix. The word frequencies are then reweighted using the Inverse Document Frequency (IDF) vector collected feature-wise over the corpus. I then applied dimensionality reduction using LSA (Latent Semantic Analysis) to the extracted features before fitting k-means.
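The steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample descriptions, the function name `cluster_descriptions`, and all parameter values (`n_components`, `n_clusters`, `seed`, the English stop-word list, the `Normalizer` step) are assumptions chosen for a tiny toy corpus.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical product descriptions standing in for the 'Description' column.
descriptions = [
    "WHITE HANGING HEART T-LIGHT HOLDER",
    "RED HANGING HEART T-LIGHT HOLDER",
    "GLASS STAR FROSTED T-LIGHT HOLDER",
    "SET OF 3 CAKE TINS PANTRY DESIGN",
    "SET OF 6 CAKE CASES VINTAGE",
    "VINTAGE CAKE STAND 3 TIER",
    "JUMBO BAG RED RETROSPOT",
    "JUMBO BAG PINK POLKADOT",
    "LUNCH BAG RED RETROSPOT",
]


def cluster_descriptions(texts, n_components=3, n_clusters=3, seed=0):
    """TF-IDF -> LSA (truncated SVD) -> k-means. Parameter values are assumptions."""
    # Sparse TF-IDF matrix: one row per description, one column per vocabulary word.
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(texts)

    # LSA: truncated SVD on the TF-IDF matrix, followed by row normalization
    # so that Euclidean k-means behaves similarly to cosine similarity.
    lsa = make_pipeline(
        TruncatedSVD(n_components=n_components, random_state=seed),
        Normalizer(copy=False),
    )
    X_lsa = lsa.fit_transform(X)

    # k-means on the reduced features; each label is a candidate product category.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X_lsa)
    return X, X_lsa, labels


X, X_lsa, labels = cluster_descriptions(descriptions)
for text, label in zip(descriptions, labels):
    print(label, text)
```

On a real dataset the number of LSA components and the number of clusters would need tuning, for example by inspecting the top terms per cluster or a silhouette score.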

Finally, using the Product Categories, I clustered customers based on their purchases and spending patterns.
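One possible way to build per-customer features from the product categories and cluster them is sketched below. The column names (`CustomerID`, `ProductCategory`, `Amount`), the toy data, the choice of features (spend per category, total spend, purchase count), and the helper names are all assumptions for illustration; the notebook may use different features.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical line items: one row per purchase, already tagged with a product category.
purchases = pd.DataFrame({
    "CustomerID":      [1,    1,   2,    2,    3,   3,   4,     4],
    "ProductCategory": [0,    1,   0,    0,    1,   2,   2,     2],
    "Amount":          [10.0, 5.0, 20.0, 30.0, 7.5, 2.5, 100.0, 200.0],
})


def customer_features(df):
    """Aggregate purchases into one feature row per customer."""
    # Spend per product category, with 0 where a customer bought nothing.
    spend = df.pivot_table(
        index="CustomerID",
        columns="ProductCategory",
        values="Amount",
        aggfunc="sum",
        fill_value=0.0,
    )
    spend.columns = [f"cat_{c}" for c in spend.columns]
    # Overall spending pattern: total spend and number of purchase lines.
    spend["total_spend"] = spend.sum(axis=1)
    spend["n_purchases"] = df.groupby("CustomerID").size()
    return spend


def cluster_customers(features, n_clusters=2, seed=0):
    """Standardize the features and assign each customer to a k-means cluster."""
    scaled = StandardScaler().fit_transform(features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return pd.Series(km.fit_predict(scaled), index=features.index, name="cluster")


features = customer_features(purchases)
clusters = cluster_customers(features)
print(features.join(clusters))
```

Standardizing before k-means keeps a large-valued column such as total spend from dominating the distance computation.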

EDA

Top 10 Countries by Sales

Top 15 Product Descriptions by Sales

Product Category:

Product Categories word cloud

Monthly Sales per category

Monthly Sales - Product Category

Customer Cluster :

Customer Cluster count

Customer Cluster

Customer Cluster based on Product Category sales -

Out of a total of 4.35 million in sales in the United Kingdom:

  • Customers in cluster 1 contribute 2.15 million in sales.
  • Customers in cluster 2 contribute 0.87 million in sales.
  • Customers in cluster 3 contribute 0.78 million in sales.
  • Customers in cluster 0 contribute the least, 0.53 million in sales.
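The figures above can be turned into shares of United Kingdom sales with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list; the names `cluster_sales` and `cluster_shares` are illustrative. Note that the four listed figures add up to about 4.33 million rather than 4.35 million; the source does not explain the gap, which may simply be rounding.

```python
# Figures quoted above, in millions of sales (United Kingdom only).
total_sales = 4.35
cluster_sales = {1: 2.15, 2: 0.87, 3: 0.78, 0: 0.53}


def cluster_shares(sales, total):
    """Return each cluster's fraction of the stated total."""
    return {cluster: value / total for cluster, value in sales.items()}


shares = cluster_shares(cluster_sales, total_sales)

# Print clusters from largest to smallest contribution.
for cluster, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"cluster {cluster}: {share:.1%} of UK sales")

# Sum of the listed figures, for comparison with the stated total.
listed_total = sum(cluster_sales.values())
print(f"sum of listed clusters: {listed_total:.2f} million")
```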
