shaikhalid / book-recommendation-system-using-pyspark

This project forked from laxmivanam/book-recommendation-system-using-pyspark

Book-recommendation-system-using-Pyspark

The book recommendation system is based on item-based collaborative filtering. The script is written in PySpark and runs on Spark's built-in cluster manager. It recommends books that are similar to each other, based on users' ratings and the strength of those ratings.

This is based on the idea that "users who liked this item also liked …".

The steps are as follows:

  1. It takes a book and finds the users who liked that book.
  2. It then finds the other books those users liked and forms pairs of books that were read by the same user.
  3. It then measures the similarity of their ratings across all the users who read both books.
  4. It takes an item and outputs other items as recommendations, sorted by strength of similarity.
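The pairing in steps 1 and 2 can be sketched in plain Python. This is a minimal illustration rather than the project's PySpark code: the function name `build_book_pairs` and the sample ratings are hypothetical, and a Spark version would typically express the same idea with a group-by-user followed by pair generation.

```python
from collections import defaultdict
from itertools import combinations


def build_book_pairs(ratings):
    """Group ratings by user, then pair up the books each user rated.

    ratings: iterable of (user_id, isbn, rating) tuples.
    Returns {(isbn_a, isbn_b): [(rating_a, rating_b), ...]} with isbn_a < isbn_b,
    one rating pair per user who rated both books.
    """
    by_user = defaultdict(dict)
    for user_id, isbn, rating in ratings:
        by_user[user_id][isbn] = rating  # a repeated rating overwrites the earlier one

    pairs = defaultdict(list)
    for books in by_user.values():
        # Sorting the ISBNs gives every pair one canonical ordering.
        for isbn_a, isbn_b in combinations(sorted(books), 2):
            pairs[(isbn_a, isbn_b)].append((books[isbn_a], books[isbn_b]))
    return dict(pairs)


# Made-up sample data, for illustration only.
sample = [
    ("u1", "A", 8), ("u1", "B", 7),
    ("u2", "A", 6), ("u2", "B", 5), ("u2", "C", 9),
    ("u3", "B", 10), ("u3", "C", 4),
]
pairs = build_book_pairs(sample)
```

Each value in `pairs` holds the co-ratings for one pair of books, which is the input a similarity metric needs for step 3.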

Metric used: cosine similarity. It measures how similar two non-zero vectors of ratings are, and the result is used as the similarity score between two books.
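As a rough sketch of the metric, the function below computes cosine similarity over the ratings two books received from the same users. The name `cosine_similarity` and the choice to return the co-rater count alongside the score are assumptions made for illustration, not taken from the project's script.

```python
from math import sqrt


def cosine_similarity(rating_pairs):
    """Cosine similarity between two books' rating vectors.

    rating_pairs: list of (rating_a, rating_b), one entry per user who rated both.
    Returns (score, num_co_raters); the score is 0.0 if either vector is all zeros.
    """
    sum_aa = sum_bb = sum_ab = 0.0
    for rating_a, rating_b in rating_pairs:
        sum_aa += rating_a * rating_a
        sum_bb += rating_b * rating_b
        sum_ab += rating_a * rating_b

    denominator = sqrt(sum_aa) * sqrt(sum_bb)
    score = sum_ab / denominator if denominator else 0.0
    return score, len(rating_pairs)


# Two users rated both books: (8, 7) and (6, 5).
score, co_raters = cosine_similarity([(8, 7), (6, 5)])
```

Because ratings are non-negative, the score falls between 0 and 1, with values near 1 meaning the two books were rated in nearly the same proportions by their shared readers.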

Future enhancements:

  • Adjust the thresholds for the number of co-raters and the minimum score
  • The quality of the similarities can be improved with different similarity metrics (Pearson correlation coefficient, Jaccard similarity, etc.)
  • Invent a new similarity metric that takes the number of co-raters into consideration.
  • Take the author of the books into consideration to boost the scores.
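
To make the first and third ideas concrete, here is one possible sketch of threshold filtering combined with a co-rater-aware score. The function name `rank_similar`, the default thresholds, and the shrinkage weighting `n / (n + shrinkage)` are all assumptions chosen for illustration. Shrinkage is a common damping technique, not something the project implements.

```python
def rank_similar(scores, min_co_raters=2, min_score=0.5, shrinkage=0.0):
    """Filter and rank candidate books for one target book.

    scores: {other_isbn: (similarity, num_co_raters)}.
    shrinkage: hypothetical damping constant; larger values pull scores that
               rest on few co-raters toward zero. 0.0 leaves scores unchanged.
    Returns [(other_isbn, adjusted_score)] sorted from most to least similar.
    """
    ranked = []
    for isbn, (similarity, n) in scores.items():
        if n == 0 or n < min_co_raters:
            continue  # too few shared readers to trust the score
        adjusted = similarity * n / (n + shrinkage)
        if adjusted >= min_score:
            ranked.append((isbn, adjusted))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up similarity scores for illustration.
candidates = {"B": (0.99, 50), "C": (1.0, 1), "D": (0.95, 3), "E": (0.4, 100)}
plain = rank_similar(candidates)
damped = rank_similar(candidates, shrinkage=10.0)
```

With the defaults, book C is dropped for having a single co-rater and book E for its low score. With `shrinkage=10.0`, book D also falls below the minimum score because only three users rated both books.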

Dataset:

  • BX-Books: Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is obtained from Amazon Web Services.
  • BX-Book-Ratings: Contains the book rating information. Ratings are either explicit, expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit, expressed by 0.
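
A minimal sketch of reading the ratings file follows. It assumes the file is semicolon-delimited with quoted fields and a header row such as "User-ID";"ISBN";"Book-Rating"; verify this against the actual download. The function name `parse_ratings` and the sample rows are hypothetical.

```python
import csv
import io


def parse_ratings(lines, explicit_only=True):
    """Yield (user_id, isbn, rating) tuples from BX-Book-Ratings style lines.

    lines: any iterable of text lines (an open file or io.StringIO).
    explicit_only: when True, skip implicit ratings, which are stored as 0.
    """
    reader = csv.reader(lines, delimiter=";", quotechar='"')
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed lines
        user_id, isbn, raw_rating = row
        try:
            rating = int(raw_rating)
        except ValueError:
            continue  # skip rows whose rating is not a number
        if explicit_only and rating == 0:
            continue
        yield user_id, isbn.strip(), rating


# Made-up rows in the assumed format, for illustration only.
sample_file = io.StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"1001";"ISBN-A";"0"\n'
    '"1002";"ISBN-B";"5"\n'
    '"1003";"ISBN-C";"3"\n'
)
rows = list(parse_ratings(sample_file))
```

Dropping the implicit 0 ratings before computing similarities keeps them from being read as the lowest possible score. Whether to keep them as a separate "read but not rated" signal is a design choice left open here.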

This is based on the publicly available Book-Crossing dataset, pulled from: http://www2.informatik.uni-freiburg.de/~cziegler/BX/

Keywords: collaborative filtering, recommendation systems, PySpark, cache, persist, broadcast variables, cosine similarity, command-line arguments, etc.

Contributors: laxmivanam
