Covid-19 Browser: Browse Scientific Articles about Covid-19 & SARS-CoV-2 with SciBERT-NLI

Covid-19 Browser is an interactive, experimental tool that leverages a state-of-the-art language model to search for relevant content inside the COVID-19 Open Research Dataset (CORD-19), recently published by the White House and its research partners. The dataset contains over 44,000 scholarly articles about COVID-19, SARS-CoV-2 and related coronaviruses.

The model used to perform the search is SciBERT-NLI, a version of AllenAI's SciBERT fine-tuned on the SNLI and MultiNLI natural language inference tasks using the sentence-transformers library, which produces universal sentence embeddings as proposed in the InferSent paper by Conneau et al. The embeddings are then used to perform semantic search over CORD-19.
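At its core, the search encodes both the query and the paper abstracts into fixed-size vectors and ranks the abstracts by cosine similarity. Below is a minimal sketch of the idea, assuming the fine-tuned model can be loaded by name through sentence-transformers; the project's own scripts may differ in the details:

from sentence_transformers import SentenceTransformer
import numpy as np

# Assumption: the fine-tuned model is published on the HuggingFace hub under this name.
model = SentenceTransformer("gsarti/scibert-nli")

abstracts = [
    "We report the genome sequence of a novel coronavirus isolated from ...",
    "Hydroxychloroquine was evaluated in a small cohort of patients with ...",
]
query = "genome sequencing of SARS-CoV-2"

# Encode corpus and query into dense vectors.
corpus_emb = np.asarray(model.encode(abstracts))
query_emb = np.asarray(model.encode([query]))[0]

# Rank abstracts by cosine similarity to the query.
scores = corpus_emb @ query_emb / (np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(query_emb))
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {abstracts[idx][:60]}")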

Currently supported operations are:

  • Browse paper abstracts with interactive queries.

  • Reproduce SciBERT-NLI training results.

Setup

Python 3.6 or higher is required to run the code. First, install the required libraries with pip:

pip install -r requirements.txt
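If you prefer to keep dependencies isolated, you can create a virtual environment first and run the pip command from within it, for example:

python3 -m venv .venv
source .venv/bin/activate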

Using the Browser

First of all, download the model from the HuggingFace cloud repository.

python scripts/download_scibert.py

Second, download the data from the Kaggle challenge page and place it in the data folder.
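If you have the Kaggle CLI installed and configured, a command along the following lines may work (the dataset slug is an assumption; verify it on the challenge page):

kaggle datasets download -d allen-institute-for-ai/CORD-19-research-challenge -p data --unzip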

Finally, simply run:

python scripts/interactive_search.py

to enter the interactive demo. Using a GPU is recommended, since creating the embeddings for the entire corpus can otherwise be time-consuming. Both the corpus and the embeddings are cached on disk after the first execution of the script.
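The caching behaves roughly like the sketch below; the cache path, file format and placeholder corpus are assumptions for illustration, not the script's actual implementation:

import os
import numpy as np
from sentence_transformers import SentenceTransformer

EMB_CACHE = "data/corpus_embeddings.npy"  # hypothetical cache location

model = SentenceTransformer("gsarti/scibert-nli")
abstracts = ["abstract one ...", "abstract two ..."]  # in the real script, the full CORD-19 corpus

if os.path.exists(EMB_CACHE):
    corpus_emb = np.load(EMB_CACHE)  # reuse embeddings from a previous run
else:
    corpus_emb = np.asarray(model.encode(abstracts, show_progress_bar=True))
    np.save(EMB_CACHE, corpus_emb)   # cache for subsequent runs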

Use the interactive demo as follows:

Demo GIF

Reproducing the SciBERT-NLI Training

First, download AllenAI's SciBERT.

python scripts/download_scibert.py --model scibert

Second, download the NLI datasets used for training and the STS dataset used for testing.

python scripts/get_finetuning_data.py

Finally, run the finetuning script.

python scripts/finetune_nli.py

The model is evaluated against the test portion of the Semantic Textual Similarity (STS) benchmark at the end of training. The default parameters are the ones used to train gsarti/scibert-nli.
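For reference, NLI fine-tuning with sentence-transformers generally follows the recipe sketched below. This is not the repository's exact script: the SciBERT checkpoint name, label mapping and hyperparameters are assumptions, and the toy examples stand in for the real SNLI/MultiNLI and STS data.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Start from a SciBERT checkpoint and add pooling to obtain sentence embeddings.
word_emb = models.Transformer("allenai/scibert_scivocab_uncased")  # assumed checkpoint
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_emb, pooling])

# NLI pairs with integer labels, e.g. 0=contradiction, 1=entailment, 2=neutral (toy data).
train_examples = [
    InputExample(texts=["A virus infects the cells.", "The cells are infected."], label=1),
    InputExample(texts=["A virus infects the cells.", "Nothing is infected."], label=0),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# STS pairs with similarity scores in [0, 1] evaluate embedding quality after training.
sts_examples = [
    InputExample(texts=["A man plays the guitar.", "Someone plays an instrument."], label=0.8),
]
evaluator = EmbeddingSimilarityEvaluator.from_input_examples(sts_examples, name="sts-test")

model.fit(
    train_objectives=[(train_loader, train_loss)],
    evaluator=evaluator,
    epochs=1,
    warmup_steps=100,
)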
