Giter Site home page Giter Site logo

logistic_regression_tmall's Introduction

Background

Tmall.com (simplified Chinese: 天猫; pinyin: Tiānmāo), formerly Taobao Mall, is a Chinese-language website for business-to-consumer (B2C) online retail, operated in China by Alibaba Group.

It is a platform for Chinese people and international businesses to sell brand-name goods to consumers. In the last few years, it has opened its features to brands, not only for online sales but also for developing brand awareness.

ANT credit pay is a consumer credit product launched by ANT Financial, which can be used in Alipay consumption, and has similar functions to credit cards.

ANT credit pay will provide users with credit ranging from RMB 1,000 to RMB 50,000 according to their consumption levels and credit scores.

Analysis goal

Based on user data and consumer behavior data

  • build a classification model and perform logistic regression.
  • predict which customer group has a higer probability of using coupons.

1 Data analysis

1.1 Index explanation

  • id
  • age
  • job
  • marriage
  • default: if there is a breach of contact
  • returned: if there is returning the goods
  • loan: pay with ANT Credit pay
  • coupon_used_in_last_6month
  • coupon_used_in_last_month
  • coupon_ind: if using coupon in the purchase

1.2 Clean data

  • change categorical variables: default, returned and loan to numercical varaibles, rename to coupon1
  • concat coupon and coupon1
coupon = pd.concat([coupon, coupon1], axis = 1)
  • remove 'ID', 'default', 'default_no', 'returned', 'returned_no', 'loan' and 'loan_no' columns
  • rename coupon_ind to flag

2 Univariate analysis

  • 2.1 Observe balance of 'flag' samples 0 and 1
    • In the binary classification problem, the proportion of 0 and 1 should be balanced, and in actual situations it should not be less than 0.05, otherwise it will affect the prediction of the model.
    • The proportions of 0 and 1 in this data set are both higher than 0.05, so the distribution is reasonable.
  • 2.2 Observe the mean value
  • 2.3 Visualization

returned_goods-view

    • Compared with customers who have not returned goods, customers who return goods are less likely to use coupons.

marriage-view

    • Married customers are slightly more likely to use coupons than unmarried and divorced customers.
    • Married people have a higher probability of not using coupons than unmarried people.

job-view

    • Customers whose job title are management, technician, blue-collar are more likely to use coupon.

distplot-view

    • The data shows that the customer cluster with a higher probability of using coupons is 40 years old.
    • It is found that age > 60 years old has fewer extreme values, but they affect the overall data distribution, and these needs to be excluded from the scope of analysis.

pie-view

    • The 20-40 year old customer group has the highest probability of using coupon
    • 18, 32, and 48 years old are the average ages with the highest probability of using coupon among the three age groups.

3 Revlevant and visualization

corr-view heatmap-view

    • flag has strong pairwise positive relations with coupon_used_in_last_month and age.
    • flag has strong pairwise negative relations with coupon_used_in_last6_month and returned_yes.

4 Establish and evaluate logistic regression model

4.1 Model result

  • when coupon_used_in_last_month increases from 0 to 1,the probability increase from not using coupon to using coupon is e^0.3867,that is 1.46 times.
  • when returned_yes from 1 to 0,the probability increase from using coupon to not using coupon is 0.9590 times.
  • when loan_yes from 1 to 0,the probability increase from using coupon to not using coupon is 0.5622 times.
  • Therefore, from the perspective of probability, customers who have used coupons last month, customers who have not returned goods, and customers who have not paid with ANT credit pay will have a higher probability of using coupon again.

4.2 Model evaluation

Compute accuracy score

Score Training set score Test set score
0 0.880424 0.885071

4.3 Model refinement

Score Training set score Test set score
Before 0.880424 0.885071
After 0.880989 0.885861

5 Summary

  • The data shows that 18-95 years old customers are more likely to use coupons, and the customer group with a higher probability of using coupons is 40 years old.
  • The 20-40 years old customer group has the highest probability of using coupon.
  • 18, 32, and 48 are the average ages with the highest probability of using coupon among < 20, < 40, < 60 age groups.

logistic_regression_tmall's People

Contributors

zjw-92 avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.