In this project, I analyzed the interactions that users have with articles on the IBM Watson Studio platform and recommended new articles they are likely to enjoy.
The project has five steps:
Before diving into the details of the recommendation system, we explore the data and ask questions about what we are working with.
Find the most popular articles based on the number of user interactions. Since none of the articles has a rating, it is reasonable to assume that the articles with the most interactions are the most popular.
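As a minimal sketch of this rank-based idea, the snippet below counts interaction rows per article and returns the most frequent ones. The toy data, the column names `user_id` and `article_id`, and the helper `get_top_articles` are assumptions made for illustration, not names taken from the project's code.

```python
import pandas as pd

# Toy interaction log (made up); one row per user-article interaction.
# The column names `user_id` and `article_id` are assumed.
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 3, 3, 3, 4],
    "article_id": ["a", "b", "a", "c", "a", "b", "d", "a"],
})

def get_top_articles(df, n=3):
    """Return the n article ids with the most interaction rows (hypothetical helper)."""
    counts = df["article_id"].value_counts()  # sorted from most to least frequent
    return counts.head(n).index.tolist()

top_two = get_top_articles(interactions, n=2)
print(top_two)
```

Because this ranking ignores who the user is, it can serve as a fallback for new users with no interaction history.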
To provide better recommendations, we can look for users who are similar in terms of the items they have interacted with. Items that one user has interacted with can then be recommended to similar users who have not seen them yet.
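One way to sketch this user-user approach is to build a binary user-item matrix and measure similarity as the number of articles two users share. The data, the column names, and the helper `recommend_for_user` below are illustrative assumptions; the project may define similarity and break ties differently.

```python
import pandas as pd

# Toy interaction log (made up); column names are assumed.
interactions = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2, 3, 3, 4],
    "article_id": ["a", "b", "a", "b", "c", "a", "d", "e"],
})

# Binary user-item matrix: 1 if the user interacted with the article at all.
user_item = (
    interactions.drop_duplicates()
    .assign(seen=1)
    .pivot(index="user_id", columns="article_id", values="seen")
    .fillna(0)
    .astype(int)
)

def recommend_for_user(user_id, user_item, n=2):
    """Recommend unseen articles, taking the most similar users first (hypothetical helper)."""
    # The dot product of two binary rows is the number of shared articles.
    similarity = user_item.dot(user_item.loc[user_id]).drop(user_id)
    neighbours = similarity[similarity > 0].sort_values(ascending=False, kind="stable").index
    seen = set(user_item.columns[user_item.loc[user_id] == 1])
    recs = []
    for other in neighbours:
        for article in user_item.columns[user_item.loc[other] == 1]:
            if article not in seen and article not in recs:
                recs.append(article)
    return recs[:n]

recs_user_1 = recommend_for_user(1, user_item)
print(recs_user_1)
```

In this toy data, a user who shares no articles with anyone gets an empty list, which is where a popularity-based fallback would be useful.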
We might be able to use NLP to develop a content-based recommendation system. (This is not required to complete this project.)
Finally, we can take a machine learning approach to building the recommendation system. We can build a matrix decomposition of the user-item interactions. Using this decomposition, we can get an idea of how well we can predict new articles an individual might interact with.
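The decomposition can be illustrated with a singular value decomposition (SVD) of a small binary user-item matrix, keeping only the top `k` latent features. This is a sketch under assumptions: the matrix values are made up, the helper names are hypothetical, and accuracy is measured on the same matrix rather than on a held-out test set.

```python
import numpy as np

# Toy binary user-item matrix (rows: users, columns: articles); values are made up.
user_item = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

# Thin SVD: user_item == u @ diag(s) @ vt
u, s, vt = np.linalg.svd(user_item, full_matrices=False)

def reconstruct(k):
    """Rebuild the matrix from the top-k latent features."""
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

def accuracy(k):
    """Share of entries recovered after rounding the rank-k reconstruction."""
    predicted = np.around(reconstruct(k))
    return float(np.mean(predicted == user_item))

# Accuracy for every possible number of latent features.
accuracies = {k: accuracy(k) for k in range(1, len(s) + 1)}
print(accuracies)
```

To estimate how well unseen interactions are predicted, the same idea would be applied to a training matrix and scored on users and articles that also appear in the test set.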
To improve the model, we can run an A/B test to verify what the training and testing data suggest. First, we randomly divide the users into two groups. In one group, the articles that users see are recommended by the recommendation system. In the other group, the articles that users see are randomly selected from all the articles. Then, we determine how large a sample the experiment needs. Finally, we use the experimental data to run a hypothesis test. The null hypothesis is that recommendations from the recommendation system do not raise the proportion of articles that users interact with. We can then use a parametric or non-parametric test to estimate the p-value. If the p-value is less than 0.05, we reject the null hypothesis and conclude that the recommendation system is an improvement over how users currently find articles.
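As one possible parametric test for this setup, the sketch below runs a one-sided two-proportion z-test using only the standard library. The function name and the counts are hypothetical; no such experiment data exists in this repository.

```python
from math import sqrt, erfc

def one_sided_two_proportion_z_test(hits_ctrl, n_ctrl, hits_treat, n_treat):
    """Return (z, p) for H1: the treatment proportion is higher than the control proportion."""
    p_ctrl = hits_ctrl / n_ctrl
    p_treat = hits_treat / n_treat
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (hits_ctrl + hits_treat) / (n_ctrl + n_treat)
    se = sqrt(pooled * (1 - pooled) * (1 / n_ctrl + 1 / n_treat))
    z = (p_treat - p_ctrl) / se
    # Upper-tail probability of a standard normal distribution.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value

# Made-up counts: articles interacted with out of articles shown, per group.
z, p_value = one_sided_two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis at the 0.05 level.")
```

The normal approximation is reasonable only when both groups are large; with small samples, an exact or permutation test would be a safer choice.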
The code contained in this repository was written in Python 3, and requires the following Python packages: pandas, numpy, matplotlib, seaborn.
This app was developed as part of the Udacity Data Scientist Nanodegree.