Statistical Analysis: A/B Testing and Machine Learning

This data analytics project applies A/B testing and machine learning methods to the results of a website experiment.

Project Objective

The objective is to apply A/B testing and machine learning techniques to determine which version of a web page users respond to better and to uncover relationships between variables.

Methods Used

Inferential Statistics
Data Visualization
Probability Calculation
A/B Testing
Machine Learning

Model Used

Statistical Analysis:
- A/B Testing
Machine Learning:
- Logistic Regression

Technologies and Packages Used

Python, Jupyter Notebook
NumPy, matplotlib.pyplot
pandas, statsmodels.api

Project Description

Motivation:

A/B testing is a common practice in data analysis and data science. This project aims to analyze the results of an A/B test conducted by an e-commerce website to assist in decision-making regarding the implementation of a new page or the retention of the old page.

Data and Scope:

The dataset consists of approximately 300,000 records with 5 variables, which are used initially in the A/B test. Additional factors are incorporated for the Logistic Regression analysis. The dataset is well-organized and needs no cleaning beyond the duplicate check described below. Calculating probabilities is a critical step in building a robust model.

Methodology Approach:

Calculated Probability:

Compute the proportion of users converted.
Split the treatment group into users who did and did not receive the new version.
Check for duplicates and remove them.
Calculate probabilities for each assigned group.
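
The steps above can be sketched in pandas as follows. This is a minimal illustration on a tiny synthetic table, not the project's actual notebook code; the column names (user_id, group, landing_page, converted) and the rule for handling mismatched rows are assumptions.

```python
import pandas as pd

# Tiny synthetic stand-in for the real dataset; the column names
# (user_id, group, landing_page, converted) are assumptions.
df = pd.DataFrame({
    "user_id":      [1, 2, 3, 4, 5, 5, 6, 7],
    "group":        ["control", "treatment", "control", "treatment",
                     "treatment", "treatment", "control", "treatment"],
    "landing_page": ["old_page", "new_page", "old_page", "new_page",
                     "new_page", "new_page", "old_page", "old_page"],
    "converted":    [0, 1, 1, 0, 1, 1, 0, 0],
})

# Overall proportion of converted users.
overall_rate = df["converted"].mean()

# Keep only rows where group and page agree (treatment <-> new_page).
aligned = (df["group"] == "treatment") == (df["landing_page"] == "new_page")
df2 = df[aligned]

# Drop repeated user_ids, keeping the first occurrence.
df2 = df2.drop_duplicates(subset="user_id", keep="first")

# Conversion probability per group, and probability of receiving the new page.
rate_by_group = df2.groupby("group")["converted"].mean()
p_new_page = (df2["landing_page"] == "new_page").mean()

print(overall_rate, rate_by_group.to_dict(), p_new_page)
```

Dropping rows where the group and the page disagree is one possible choice; whether such rows should be dropped or investigated depends on how the experiment was logged.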

A/B Testing:

Determine required probabilities for A/B testing.
Visualize results with histogram plots.
Compute p-value and interpret results.
Apply A/B testing with statistical packages to determine testing outcomes.
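
A hedged sketch of how the p-value might be obtained, first by simulating the sampling distribution under the null hypothesis and then with a two-proportion z-test from statsmodels. The counts below are made up for illustration and are not the project's results; the one-sided alternative (new page converts better) is also an assumption about how the hypotheses were stated.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for illustration only, not the project's results.
n_old, conv_old = 10_000, 1_200   # control group / old page
n_new, conv_new = 10_000, 1_190   # treatment group / new page

obs_diff = conv_new / n_new - conv_old / n_old

# Under H0 both pages share one conversion rate, estimated from the pooled data.
p_pooled = (conv_old + conv_new) / (n_old + n_new)

# Simulate the sampling distribution of (p_new - p_old) under H0.
rng = np.random.default_rng(42)
sim_new = rng.binomial(n_new, p_pooled, size=10_000) / n_new
sim_old = rng.binomial(n_old, p_pooled, size=10_000) / n_old
sim_diffs = sim_new - sim_old

# One-sided p-value: share of simulated differences at least as large as observed.
p_value_sim = (sim_diffs >= obs_diff).mean()

# Same question asked with a two-proportion z-test (H1: p_new > p_old).
z_stat, p_value_z = proportions_ztest(
    count=[conv_new, conv_old], nobs=[n_new, n_old], alternative="larger"
)

print(f"simulated p-value: {p_value_sim:.3f}, z-test p-value: {p_value_z:.3f}")
```

The array sim_diffs can be passed to matplotlib.pyplot.hist, with a vertical line at obs_diff, to produce the histogram mentioned above. The two p-values should agree closely; a large gap between them usually points to a mismatch in how the alternative hypothesis was specified.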

Modeling Approach:

Use Logistic Regression, since the response variable is categorical (converted or not converted).
Summarize the Logistic Regression model.
Merge with countries data and summarize the model again.
Test with additional factors and interpret results.

Conclusion:

Based on the calculated probabilities, over half of the users landed on the new_page, so both pages received comparable traffic. The conversion rate is approximately 11.96%, which is acceptable in a business context. The "control" and "treatment" groups also convert at similar rates, around 12% each, suggesting comparable group sizes and conversion rates. The experimental setup therefore appears sound.

In the A/B testing results, the p-value is approximately 9.6%, which exceeds the 5% significance level (the accepted Type I error rate). Consequently, I fail to reject the null hypothesis (H0): there is no statistically significant evidence in favor of the new page. The control group's observed conversion proportion also exceeds that of the treatment group, so sticking with the old version is advisable. Moreover, since users were about equally likely to receive the new_page and the old_page, no assignment bias is evident.

Regarding the modeling approach, Logistic Regression is a fitting method because the response variable is categorical. Deriving new variables from 'timestamp' could strengthen the results. Specifically, time-of-day categories such as 'morning', 'afternoon', and 'evening', as well as 'weekday' versus 'weekend', could be added to the model. However, this introduces complexity and requires checking whether the added variables depend on one another. If they do, incorporating interaction or higher-order terms could improve predictive accuracy; otherwise, the current results remain reliable.
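
As a rough sketch of the time-based variables suggested here, the following derives a part-of-day category and a weekend flag from a timestamp column. The timestamp format and the hour cut points are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical timestamps; the real column format is an assumption.
df = pd.DataFrame({
    "timestamp": ["2017-01-21 08:15:00", "2017-01-23 14:30:00",
                  "2017-01-22 21:05:00"],
})
ts = pd.to_datetime(df["timestamp"])

# Bucket the hour of day; the cut points (12 and 18) are arbitrary choices.
df["part_of_day"] = pd.cut(
    ts.dt.hour,
    bins=[0, 12, 18, 24],
    right=False,
    labels=["morning", "afternoon", "evening"],
)

# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
df["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)

# Dummy variables ready to be added to the regression design matrix.
features = pd.get_dummies(df[["part_of_day"]], drop_first=True, dtype=int)
features["is_weekend"] = df["is_weekend"]

print(features)
```

With drop_first=True, 'morning' becomes the baseline category, so the remaining dummies are interpreted relative to morning visits.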
