idanmoradarthas / quora-questions-pairs-app

In this research I'd like to use BERT with the huggingface PyTorch library to fine-tune a model that performs best at question pair classification. The app is built using Streamlit.

License: MIT License

Python 3.77% Jupyter Notebook 96.23%
bert-model huggingface huggingface-transformer huggingface-transformers huggingface-library bert pytorch pytorch-tutorial python-3 python3 streamlit-webapp quora-question-pairs quora-question-pair quora quora-questions nlp nlp-machine-learning natural-language-processing

Quora-Questions-Pairs-App

This research is based on the tutorial BERT Fine-Tuning Tutorial with PyTorch.

Under the training-bert folder you can find a Jupyter notebook. There I show how I fine-tuned the base uncased BERT model to solve the classification problem of duplicate questions from the Quora website.

Introduction

In this research I'd like to use BERT with the huggingface PyTorch library to fine-tune a model that performs best at question pair classification. The app is built using Streamlit.

So firstly let's talk about the model and the dataset:

Bert

Bidirectional Encoder Representations from Transformers (BERT) was pretrained and released by Google in late 2018 (see original model code here) for NLP (Natural Language Processing) tasks. BERT was originally created by Jacob Devlin and colleagues and was pre-trained on two corpora: BookCorpus and English Wikipedia.

BERT consists of 12 Transformer encoder layers (or 24 for BERT large). If you stack Transformer decoder layers instead, you get a GPT-style model, which is used to generate sentences.

You can find more information in these videos:

Transformer Neural Networks - EXPLAINED! (Attention is all you need)

BERT Neural Network - EXPLAINED!
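For question pair classification, BERT-style models usually receive both questions as one sequence, separated by special tokens and marked with segment ids. The following is a minimal sketch of that layout only. The function name `build_pair_input` is hypothetical, and plain whitespace splitting stands in for BERT's real WordPiece tokenizer, so the tokens will not match what the huggingface tokenizer produces.

```python
# Illustrative only: whitespace splitting is an assumption that stands in
# for BERT's WordPiece tokenizer; build_pair_input is a hypothetical helper.
def build_pair_input(question1, question2):
    """Lay out two questions as one BERT-style sequence with segment ids."""
    tokens_a = question1.lower().split()
    tokens_b = question2.lower().split()
    # [CLS] opens the sequence; [SEP] closes each question.
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment id 0 covers [CLS], the first question and its [SEP];
    # segment id 1 covers the second question and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(
    "How do I learn Python?",
    "What is the best way to learn Python?",
)
```

In practice the huggingface tokenizer builds this layout itself when it is given two texts, so a helper like this is only useful for understanding the input format.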

Quora Question Pairs Dataset

Quora is a question-and-answer website where questions are asked, answered, followed, and edited by Internet users, either factually or in the form of opinions. Quora was co-founded by former Facebook employees Adam D'Angelo and Charlie Cheever in June 2009. The website was made available to the public for the first time on June 21, 2010. Today the website is available in many languages.

Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions. Multiple questions with the same intent can cause seekers to spend more time finding the best answer to their question, and make writers feel they need to answer multiple versions of the same question.

The goal is to predict which of the provided pairs of questions contain two questions with the same meaning. The ground truth is the set of labels supplied by human experts. The dataset itself can be downloaded from Kaggle: here.
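As a rough sketch of what the data looks like, the snippet below parses question pairs with the standard library. It assumes the column names of the Kaggle train.csv file (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`), which this README does not list. The two sample rows are made up for illustration, and `load_pairs` is a hypothetical helper, not code from the notebook.

```python
import csv
import io

# Made-up rows for illustration; the column names are assumed to match
# the Kaggle train.csv file.
SAMPLE_CSV = """id,qid1,qid2,question1,question2,is_duplicate
0,1,2,How do I learn Python?,What is the best way to learn Python?,1
1,3,4,What is BERT?,How old is the Eiffel Tower?,0
"""


def load_pairs(handle):
    """Return (question1, question2, label) tuples from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [
        (row["question1"], row["question2"], int(row["is_duplicate"]))
        for row in reader
    ]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
# Share of pairs labeled as duplicates in the sample.
duplicate_share = sum(label for _, _, label in pairs) / len(pairs)
```

To read the real file, the same function could be called with `open("train.csv", newline="", encoding="utf-8")` in place of the in-memory sample, assuming the file sits in the working directory.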

Application

How to use it?

See the following video:

Instructions video

Install

Clone the repo:

git clone https://github.com/idanmoradarthas/Quora-Questions-Pairs-App.git
cd Quora-Questions-Pairs-App

Go to the training folder, install the requirements, and run the notebook to create the model:

cd training-bert
pip install -r requirements.txt
jupyter notebook

Install the requirements in the main folder:

cd ..
pip install -r requirements.txt

Run Streamlit:

streamlit run app.py

quora-questions-pairs-app's Issues

Get finetuned model

Hi, This is not an issue, but more of a request.

I am currently working on sentence similarity classification. If possible, can you please provide your fine-tune model location for me to predict?

Also, I tried a few kinds of stuff using pre-trained models, but I am getting fixed prediction results.

Thank you!!
