BertQA - Attention on Steroids

License: Apache License 2.0


Developers: Ankit Chadha ([email protected]), Rewa Sood ([email protected])


This repository is based on Hugging Face's PyTorch BERT implementation.

This was done as a class project for CS224n: Natural Language Processing with Deep Learning (Stanford, Winter 2019). At the time of submission, we were #1 on the class's SQuAD leaderboard.



Abstract


In this work, we extend Bidirectional Encoder Representations from Transformers (BERT) with an emphasis on directed coattention to obtain improved F1 performance on the SQuAD 2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with more focused context-to-query and query-to-context attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. The base BERT architecture with no SQuAD 2.0-specific finetuning produces an F1 of 74. We found that coattention significantly improves the no-answer F1 by 4 points while causing a loss in the has-answer F1 score of the same amount. After adding skip connections, the no-answer F1 improved further without an additional loss in has-answer F1. The addition of localized feature extraction to attention produced the best results, with an overall dev F1 of 77.03, due to a marked improvement in the has-answer F1 score. We applied our findings to the large BERT model, which contains twice as many layers, and further used our own augmented version of the SQuAD 2.0 dataset created by back translation. Finally, we performed hyperparameter tuning and ensembled our best models for a final F1/EM of 82.148/79.239 (Attention on Steroids, PCE Test Leaderboard).
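The directed coattention idea described above can be illustrated with a minimal PyTorch sketch. This is not the repository's actual implementation; the layer names, dimensions, and the use of `nn.MultiheadAttention` are assumptions made for illustration. It shows the three ingredients from the abstract: context-to-query and query-to-context attention, skip connections with layer normalization, and a 1-D convolution for localized feature extraction.

```python
import torch
import torch.nn as nn

class DirectedCoattention(nn.Module):
    """Illustrative sketch (hypothetical names/sizes) of a directed
    co-attention block: C2Q and Q2C attention, skip connections,
    and convolution-based localized feature extraction."""

    def __init__(self, hidden=64, heads=4):
        super().__init__()
        self.c2q = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.q2c = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm_c = nn.LayerNorm(hidden)
        self.norm_q = nn.LayerNorm(hidden)
        # 1-D convolution over the context sequence adds local features.
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)

    def forward(self, context, query):
        # context: (B, Tc, H), query: (B, Tq, H)
        c_att, _ = self.c2q(context, query, query)    # context attends to query
        q_att, _ = self.q2c(query, context, context)  # query attends to context
        c = self.norm_c(context + c_att)              # skip connection + norm
        q = self.norm_q(query + q_att)
        # Conv1d expects (B, H, T), so transpose around the convolution.
        c = c + self.conv(c.transpose(1, 2)).transpose(1, 2)
        return c, q

block = DirectedCoattention()
ctx = torch.randn(2, 20, 64)   # batch of 2 contexts, 20 tokens each
qry = torch.randn(2, 8, 64)    # batch of 2 queries, 8 tokens each
c_out, q_out = block(ctx, qry)
print(c_out.shape, q_out.shape)
```

In the actual model, blocks like this would sit on top of the BERT encoder outputs; see the paper for the real layer configuration.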

Neural Architecture


Here is an overview of our network architecture, BERTQA.

Dataset (SQuAD 2.Q)


We use an augmented version of the SQuAD 2.0 dataset, built using back translation. You can download the dataset here.

To read more on the process of back translation, you can refer to this resource.
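The augmentation works by round-tripping each question through a pivot language, so the model sees paraphrased variants of the training questions. The sketch below illustrates the idea only: `translate` is a hypothetical stub standing in for a real machine-translation model or service, and the tiny lookup table exists just to make the example self-contained.

```python
# Conceptual sketch of back-translation augmentation.
# translate() is a hypothetical stub; a real pipeline would call an MT model.
def translate(text, src, tgt):
    # Stand-in lookup simulating an English<->French MT system.
    lookup = {
        ("en", "fr"): {"Where was Beyonce born?": "Où est née Beyoncé ?"},
        ("fr", "en"): {"Où est née Beyoncé ?": "Where was Beyonce born ?"},
    }
    return lookup[(src, tgt)].get(text, text)

def back_translate(question, pivot="fr"):
    # English -> pivot language -> English yields a paraphrase of the question.
    return translate(translate(question, "en", pivot), pivot, "en")

print(back_translate("Where was Beyonce born?"))  # Where was Beyonce born ?
```

Applying this to every question in SQuAD 2.0 roughly doubles the question pool while keeping the original context paragraphs and answer spans.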

Command Lines


This repository includes bash scripts with the optimal hyperparameters for which our network was tuned.

1. Sanity check
# Launch a debug run on one example from the SQuAD 2.0 training set (the Beyonce paragraph).
examples/rundbg.sh

2. Train on SQuAD 2.Q
# Fine-tunes BERT layers on SQuAD 2.Q and trains the additional directed co-attention layers.
run_bertqa_expt.sh

3. Train on SQuAD 2.0
# Fine-tunes BERT embedding layers on SQuAD 2.0 and trains the additional directed co-attention layers.
examples/run_bertqa.sh

Refer to the paper for more details on the chosen hyperparameters.
