aninditamishra / resque-stack-overflow-question-recommendation-system

This project forked from akhilamangipudi/resque-stack-overflow-question-recommendation-system

ResQue-Stack-Overflow-Question-Recommendation-System

Stack Overflow Questions Recommendation System

Steps followed to generate Question Recommendations:

Pre-processing:

The Posts.xml file, which holds all the question and answer information, is parsed into Questions.csv and Answers.csv. Run pre-processing/data_preprocessing.py; the CSV files are generated in the csv_files directory.
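
As a rough illustration of this step (not the repository's actual script), the sketch below streams rows out of a Posts.xml-style file and splits them by `PostTypeId`, which in the Stack Exchange data dump is `1` for questions and `2` for answers. The function name `split_posts` and the CSV column choices are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

QUESTION, ANSWER = "1", "2"  # PostTypeId values used by the Stack Exchange dump


def split_posts(xml_source, questions_out, answers_out):
    """Stream <row> elements and write questions and answers to separate CSVs."""
    q_writer = csv.writer(questions_out)
    a_writer = csv.writer(answers_out)
    q_writer.writerow(["Id", "Title", "Body", "Tags"])  # hypothetical column layout
    a_writer.writerow(["Id", "ParentId", "Body"])       # hypothetical column layout
    counts = {"questions": 0, "answers": 0}
    for _event, row in ET.iterparse(xml_source, events=("end",)):
        if row.tag == "row":
            post_type = row.get("PostTypeId")
            if post_type == QUESTION:
                q_writer.writerow([row.get("Id"), row.get("Title", ""),
                                   row.get("Body", ""), row.get("Tags", "")])
                counts["questions"] += 1
            elif post_type == ANSWER:
                a_writer.writerow([row.get("Id"), row.get("ParentId", ""),
                                   row.get("Body", "")])
                counts["answers"] += 1
        row.clear()  # release the element's contents once it has been handled
    return counts


# Tiny in-memory stand-in for Posts.xml
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Title="How to sort a list?" Body="..." Tags="python" />'
    b'<row Id="2" PostTypeId="2" ParentId="1" Body="Use sorted()." />'
    b'</posts>'
)
questions_csv, answers_csv = io.StringIO(), io.StringIO()
print(split_posts(sample, questions_csv, answers_csv))
```

Streaming with `iterparse` avoids loading the whole dump into memory, which matters because the real Posts.xml is large.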

Create embeddings:

The question body, title and tag information is taken from the CSV files, and word embeddings are created for each of them. Run embeddings/create_embeddings.py; the question_title, question_body, question_tag and answer_body embeddings are created in the embeddings folder.
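
The README does not say which embedding model is used or how word vectors are pooled into one vector per field. One common approach, shown here purely as an assumption, is to average the vectors of the words that appear in the text. The names `tokenize`, `average_embedding` and the toy vocabulary are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def average_embedding(text, word_vectors, dim):
    """Average the vectors of known words; return a zero vector if none are known."""
    vectors = [word_vectors[t] for t in tokenize(text) if t in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy 2-dimensional vocabulary standing in for a trained embedding model
toy_vectors = {"python": [1.0, 0.0], "list": [0.0, 1.0], "sort": [1.0, 1.0]}
title_vec = average_embedding("Sort a Python list", toy_vectors, dim=2)
print(title_vec)
```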

Clustering:

For the feature vector representation, three combinations were tried: title only, title + body, and title + body + tags. The final word embedding for a question is built from whichever combination is chosen. All the questions are then clustered using Affinity Propagation. Run clustering/clustering.py; the cluster centers and the cluster labels are stored in the clustering directory.

Question Recommendations:

  1. Given a question, find the cluster it belongs to.
  2. Find the 5 clusters nearest to that cluster.
  3. Assemble all the questions from the nearby clusters and the given cluster.
  4. Compute cosine similarity and rank the assembled questions by their similarity to the given question.
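
The four steps above can be sketched as follows. This is a minimal illustration, not the code in models/questions_embeddings_cosine.py. It assumes that "nearby" clusters are the ones whose centers have the highest cosine similarity to the query's cluster center, which the README does not specify. The function `recommend` and its parameters are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query_vec, centers, labels, vectors, n_nearby=5, top_k=10, exclude=None):
    """centers[c]: center of cluster c; labels[i] and vectors[i]: cluster and embedding of question i."""
    cluster_ids = range(len(centers))
    # Step 1: the query's cluster is the one with the most similar center
    own = max(cluster_ids, key=lambda c: cosine(query_vec, centers[c]))
    # Step 2: the n_nearby other clusters whose centers are closest to it
    others = sorted((c for c in cluster_ids if c != own),
                    key=lambda c: cosine(centers[own], centers[c]), reverse=True)
    selected = {own, *others[:n_nearby]}
    # Step 3: gather every question from the selected clusters (minus the query itself)
    candidates = [i for i, label in enumerate(labels)
                  if label in selected and i != exclude]
    # Step 4: rank the candidates by cosine similarity to the query
    candidates.sort(key=lambda i: cosine(query_vec, vectors[i]), reverse=True)
    return candidates[:top_k]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
centers = [[1.0, 0.0], [0.0, 1.0]]
print(recommend(vectors[0], centers, labels, vectors, n_nearby=1, top_k=2, exclude=0))
```

Restricting the cosine ranking to a handful of clusters keeps the comparison count far below a full scan over every question.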

Additionally, answer information was used to generate question recommendations. The hypothesis was: if two answers are similar, then their questions are also similar. The first 5 recommendations from the question information and the first 5 from the answer information were interleaved to form the final ranked list.
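
A possible way to interleave the two lists is sketched below. The ordering (question-based first within each pair) and the dropping of duplicates are assumptions; the README only states that the first 5 of each list are interleaved. The function name `interleave` is hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for the shorter list running out


def interleave(question_based, answer_based, per_source=5):
    """Alternate items from both ranked lists, skipping ids already emitted."""
    merged, seen = [], set()
    pairs = zip_longest(question_based[:per_source], answer_based[:per_source],
                        fillvalue=_MISSING)
    for pair in pairs:
        for candidate in pair:
            if candidate is not _MISSING and candidate not in seen:
                seen.add(candidate)
                merged.append(candidate)
    return merged


print(interleave([1, 2, 3], [2, 4, 5], per_source=3))
```

Under these assumptions the merged list can be shorter than 10 when both sources recommend the same question.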

Run models/questions_embeddings_cosine.py to generate ques_recommendations.dat, which lists, for each question, the 10 questions most similar to it.

Hybrid Recommendations:

To capture rare terms as well as semantic relations, a combination of tf-idf and word embeddings was used. Run models/hybrid_recommendations.py to generate hybrid recommendations for a given question.
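
The README does not describe how the tf-idf and embedding signals are combined. The sketch below assumes one simple option, a linear blend of the two cosine similarities controlled by a weight `alpha`. The smoothed idf formula, the helper names and `alpha` itself are all assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build sparse tf-idf vectors (dicts) using raw counts and a smoothed idf."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0
           for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()}
            for tokens in token_lists]


def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def hybrid_score(tfidf_sim, embedding_sim, alpha=0.5):
    """Blend lexical and semantic similarity; alpha weights the tf-idf part."""
    return alpha * tfidf_sim + (1 - alpha) * embedding_sim


docs = [["python", "list", "sort"], ["python", "list", "sort"], ["java", "thread"]]
vecs = tfidf_vectors(docs)
lexical = sparse_cosine(vecs[0], vecs[2])   # no shared terms
semantic = 0.6                              # placeholder embedding similarity
print(hybrid_score(lexical, semantic, alpha=0.5))
```

A rare term gets a high idf weight, so a shared rare term raises the tf-idf part of the score even when the averaged embeddings of two questions look alike for other reasons.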

Evaluation:

  1. Read PostLinks.xml and group the related question ids and answer ids for each question.
  2. Evaluate the model by taking a weighted average of two counts: the number of questions from the relatedID list (from the step above) that appear in the list of recommendations, and the number of questions from the relatedID list that appear in the top half of the list of recommendations.
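
The two evaluation steps can be sketched as below. `PostId` and `RelatedPostId` are the attribute names used in the Stack Exchange PostLinks.xml dump. The weights are not given in the README, so `w_full` and `w_top` are placeholder assumptions, as are the function names; the grouping of answer ids is left out because the README does not explain it.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def related_ids(postlinks_source):
    """Map each PostId to the set of RelatedPostId values linked from it."""
    related = defaultdict(set)
    for _event, row in ET.iterparse(postlinks_source, events=("end",)):
        if row.tag == "row":
            related[row.get("PostId")].add(row.get("RelatedPostId"))
        row.clear()
    return related


def evaluate(recommended, related, w_full=0.5, w_top=0.5):
    """Weighted average of related hits in the full list and in its top half."""
    related = set(related)
    top_half = recommended[: len(recommended) // 2]
    hits_full = sum(1 for q in recommended if q in related)
    hits_top = sum(1 for q in top_half if q in related)
    return (w_full * hits_full + w_top * hits_top) / (w_full + w_top)


sample = io.BytesIO(
    b'<postlinks>'
    b'<row Id="1" PostId="10" RelatedPostId="11" LinkTypeId="1" />'
    b'<row Id="2" PostId="10" RelatedPostId="13" LinkTypeId="3" />'
    b'</postlinks>'
)
links = related_ids(sample)
print(evaluate(["20", "11", "12", "13"], links["10"]))
```

Counting top-half hits separately rewards a model that ranks the known related questions higher, not just one that includes them somewhere in the list.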

Contributors

akhilamangipudi, aninditamishra, vkhanna92, woohuu