
tbsraja / quora_questionpairs


Build a classification model to predict whether questions asked on Quora are duplicates of existing questions.

Jupyter Notebook 100.00%
deep-learning exploratory-data-analysis machine-learning natural-language-processing lstm-networks


Quora Question Pair Similarity

Quora Question Pairs - Kaggle

Problem statement

  • Predict whether a question asked on Quora is a duplicate of a question that has already been asked.
  • This could be used to instantly provide answers to questions that have already been answered.

Mapping the real-world problem to an ML problem

  • Binary classification problem: for a given pair of questions, we predict whether they are duplicates or not.

Business objectives

  • The cost of a misclassification can be high (e.g., customer disappointment if non-duplicate questions are classified as duplicates).
  • Provide the probability that a pair of questions is a duplicate, so that a custom decision threshold can be chosen.
  • No strict latency requirements (a prediction can take a few seconds).
  • Interpretability is partially important (it is not needed by the customer).
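Because the model is expected to output a probability rather than a hard label, the decision threshold can be tuned after training. The minimal sketch below assumes a model has already produced a duplicate probability for each pair; the helper `apply_threshold` and the example probabilities are hypothetical. Raising the threshold above 0.5 labels fewer pairs as duplicates, which reduces the costly case of calling non-duplicates duplicates at the expense of missing some true duplicates.

```python
from typing import List


def apply_threshold(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Convert predicted duplicate probabilities into 0/1 labels (hypothetical helper).

    A pair is labelled a duplicate (1) only if its probability is at least `threshold`.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up probabilities for four question pairs.
probs = [0.10, 0.55, 0.72, 0.95]
print(apply_threshold(probs))       # → [0, 1, 1, 1]
print(apply_threshold(probs, 0.8))  # stricter threshold → [0, 0, 0, 1]
```

In practice the threshold would be chosen on held-out data, for example by inspecting the confusion matrix at several candidate values.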

Performance metric

  • Log loss (binary cross-entropy)
  • Binary confusion matrix
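Both metrics can be computed with a few lines of plain Python. The sketch below is an illustration, not the notebook's code: log loss is the mean of −[y·log(p) + (1 − y)·log(1 − p)] over all pairs, with probabilities clipped by a small `eps` (an assumed value) to avoid log(0); the confusion matrix counts actual versus predicted labels. Libraries such as scikit-learn provide equivalent `log_loss` and `confusion_matrix` functions.

```python
import math
from typing import List


def log_loss(y_true: List[int], y_prob: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> List[List[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are the actual class, columns the predicted class."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Made-up labels and predicted probabilities for five pairs.
y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.7, 0.8, 0.3, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
print(log_loss(y_true, y_prob))
print(confusion_matrix(y_true, y_pred))  # → [[1, 1], [1, 2]]
```

Log loss rewards well-calibrated probabilities, which matches the objective of exposing a probability rather than a hard label; the confusion matrix shows how many non-duplicates were wrongly flagged as duplicates (the FP cell).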

Overview of the Data set

  • The data is in train.csv
  • train.csv contains 5 columns: qid1, qid2, question1, question2, is_duplicate
  • Size of train.csv: 60 MB
  • Number of rows in train.csv: 404,290

Description of columns:

  • qid1 : unique ID of question number 1
  • qid2 : unique ID of question number 2
  • question1 : Textual content of question1
  • question2 : Textual content of question2
  • is_duplicate : target variable
      • 0 : the pair of questions is not a duplicate
      • 1 : the pair of questions is a duplicate
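A minimal sketch of loading the pairs with the standard `csv` module while checking the columns described above. The function `load_pairs` and the two sample rows are hypothetical; the function only checks that the listed columns are present, so any extra columns in the file are tolerated. Empty question text is mapped to an empty string rather than rejected.

```python
import csv
import io
from typing import Dict, List, TextIO

# Columns listed in the README; extra columns in the file are ignored.
REQUIRED_COLUMNS = {"qid1", "qid2", "question1", "question2", "is_duplicate"}


def load_pairs(f: TextIO) -> List[Dict[str, object]]:
    """Read question pairs from a CSV stream and validate the target labels."""
    reader = csv.DictReader(f)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        label = int(row["is_duplicate"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r}")
        rows.append({
            "qid1": int(row["qid1"]),
            "qid2": int(row["qid2"]),
            "question1": row["question1"] or "",  # tolerate missing text
            "question2": row["question2"] or "",
            "is_duplicate": label,
        })
    return rows


# Two made-up rows standing in for train.csv.
sample = io.StringIO(
    "qid1,qid2,question1,question2,is_duplicate\n"
    '1,2,"How do I learn Python?","What is the best way to learn Python?",1\n'
    '3,4,"What is AI?","How tall is Mount Everest?",0\n'
)
pairs = load_pairs(sample)
print(len(pairs), pairs[0]["is_duplicate"])  # → 2 1
```

For the real file, the same function can be called as `with open("train.csv", newline="", encoding="utf-8") as f: pairs = load_pairs(f)`; at 60 MB the whole file fits comfortably in memory.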

Train-test split: we build the train and test sets by randomly splitting the data in a 70:30 ratio.
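A 70:30 random split can be sketched as a seeded shuffle followed by slicing. The helper `train_test_split` below is a hypothetical stand-in (scikit-learn's `sklearn.model_selection.train_test_split` with `test_size=0.3` does the same job), and the seed value 42 is an arbitrary assumption used only to make the split reproducible.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_ratio: float = 0.3,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows with a fixed seed and return (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be in (0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: global state untouched
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(10))
print(len(train), len(test))  # → 7 3
```

Applied to the 404,290 rows of train.csv, this yields 283,003 training pairs and 121,287 test pairs.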
