Hover

This repo contains the source code of the baseline models described in the following paper:

  • "HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification" in Findings of EMNLP, 2020. (paper).

The basic code structure was adapted from Transformers.

0. Preparation

Dependencies

  • PyTorch 1.4.0/1.6.0
  • See requirements.txt.

Data

  • Run download_data.sh to download the HoVer dataset.

1. Document Retrieval

TF-IDF Pre-retrieval

We provide the top-100 Wikipedia articles retrieved by running DrQA on the HoVer dataset. They are already downloaded to data/hover/tfidf_retrieved.

Training Neural-based Document Retrieval Model

  • Prepare the data by running:
python prepare_data_for_doc_retrieval.py --data_split=dev --doc_retrieve_range=20
python prepare_data_for_doc_retrieval.py --data_split=train --doc_retrieve_range=20

This adds the top-20 TF-IDF retrieved documents to the data as candidates for the subsequent neural document retrieval stage.
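
As a rough illustration of what this preparation step amounts to, the sketch below attaches the top-k TF-IDF hits to each claim record. The helper `attach_candidates` and the field names `uid`, `claim`, and `doc_candidates` are hypothetical and are not taken from prepare_data_for_doc_retrieval.py; only the idea of truncating the ranked list to the value of --doc_retrieve_range comes from the text above.

```python
from typing import Dict, List


def attach_candidates(
    examples: List[dict],
    tfidf_ranked: Dict[str, List[str]],
    doc_retrieve_range: int = 20,
) -> List[dict]:
    """Return copies of the examples with the top-k TF-IDF titles attached.

    All field names are assumptions made for illustration only.
    """
    prepared = []
    for example in examples:
        # Ranked titles for this claim, best match first (empty if missing).
        ranked_titles = tfidf_ranked.get(example["uid"], [])
        prepared.append(
            {**example, "doc_candidates": ranked_titles[:doc_retrieve_range]}
        )
    return prepared


# Toy data: one claim with 100 ranked (made-up) article titles.
examples = [{"uid": "c1", "claim": "A toy claim."}]
tfidf_ranked = {"c1": [f"Article {i}" for i in range(100)]}
prepared = attach_candidates(examples, tfidf_ranked, doc_retrieve_range=20)
```

The input records are copied rather than modified, so the same TF-IDF ranking can be reused with a different range.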

  • Run ./train_scripts/train_doc_retrieval.sh. The model checkpoints are saved in out/hover/exp1.0/doc_retrieval.

Evaluating Neural-based Document Retrieval Model

  • Run the evaluation:
./eval_scripts/eval_doc_retrieval_on_dev.sh
./eval_scripts/eval_doc_retrieval_on_train.sh

This evaluates the model on both the training and dev sets, because both sets of predictions are needed to construct the training and dev data for sentence selection.

2. Sentence Selection

Training Sentence-selection Model

  • First, start the Stanford CoreNLP server in the background. We use CoreNLP to split documents into sentences:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000
  • Prepare the data by running:
python prepare_data_for_sent_retrieval.py --data_split=dev --sent_retrieve_range=5
python prepare_data_for_sent_retrieval.py --data_split=train --sent_retrieve_range=5

This adds the sentences from the top-5 retrieved documents as candidates for the subsequent sentence selection stage.
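
The sketch below shows one way sentence candidates could be built from a running CoreNLP server. It is not the repo's implementation: prepare_data_for_sent_retrieval.py may talk to the server differently, and the helper names, the `(title, text)` document format, and the candidate tuple layout are all assumptions. It relies only on the server's HTTP interface (text in the POST body, a `properties` query parameter) and on the token character offsets in its JSON output.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple


def sentences_from_annotation(text: str, annotation: dict) -> List[str]:
    """Recover sentence strings from the character offsets of their tokens."""
    sentences = []
    for sentence in annotation["sentences"]:
        tokens = sentence["tokens"]
        start = tokens[0]["characterOffsetBegin"]
        end = tokens[-1]["characterOffsetEnd"]
        sentences.append(text[start:end])
    return sentences


def split_sentences(text: str, server_url: str = "http://localhost:9000") -> List[str]:
    """Ask a running CoreNLP server (port 9000 above) to split `text`."""
    properties = {"annotators": "tokenize,ssplit", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(properties)})
    request = urllib.request.Request(
        f"{server_url}/?{query}", data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        annotation = json.loads(response.read().decode("utf-8"))
    return sentences_from_annotation(text, annotation)


def build_sentence_candidates(
    ranked_docs: List[Tuple[str, str]],
    sent_retrieve_range: int = 5,
    splitter: Callable[[str], List[str]] = split_sentences,
) -> List[Tuple[str, int, str]]:
    """Collect (title, sentence index, sentence) from the top-ranked documents."""
    candidates = []
    for title, text in ranked_docs[:sent_retrieve_range]:
        for sent_id, sentence in enumerate(splitter(text)):
            candidates.append((title, sent_id, sentence))
    return candidates


# Offline demonstration with a hand-written annotation (no server needed).
text = "HoVer is a dataset. It has claims."
annotation = {
    "sentences": [
        {"tokens": [
            {"characterOffsetBegin": 0, "characterOffsetEnd": 5},
            {"characterOffsetBegin": 18, "characterOffsetEnd": 19},
        ]},
        {"tokens": [
            {"characterOffsetBegin": 20, "characterOffsetEnd": 22},
            {"characterOffsetBegin": 33, "characterOffsetEnd": 34},
        ]},
    ]
}
demo_sentences = sentences_from_annotation(text, annotation)
```

Passing the splitter as a parameter keeps the candidate-building logic testable without a live server; with the server started as shown above, the default `split_sentences` can be used directly.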

  • Run ./train_scripts/train_sent_retrieval.sh. The model checkpoints are saved in out/hover/exp1.0/sent_retrieval.

Evaluating Sentence-selection Model

  • Run the evaluation:
./eval_scripts/eval_sent_retrieval_on_dev.sh
./eval_scripts/eval_sent_retrieval_on_train.sh

This evaluates the model on both the training and dev sets, because both sets of predictions are needed to construct the training and dev data for claim verification.

3. Claim Verification

Training Claim-verification Model

  • Prepare the data by running:
python prepare_data_for_claim_verification.py --data_split=dev
python prepare_data_for_claim_verification.py --data_split=train
  • Run ./train_scripts/train_claim_verification.sh. The model checkpoints are saved in out/hover/exp1.0/claim_verification.

Evaluating Claim-verification Model

  • Run the evaluation:
./eval_scripts/eval_claim_verification_on_dev.sh
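
The evaluation script's own output is not described here. As a minimal sketch of how label accuracy could be computed from dev-set predictions, assuming predictions and gold labels are available as parallel lists of strings (the label strings below are placeholders chosen for illustration):

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold_labels: Sequence[str]) -> float:
    """Fraction of claims whose predicted label matches the gold label."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Toy example: three of the four predictions match the gold labels.
gold = ["SUPPORTED", "NOT_SUPPORTED", "SUPPORTED", "NOT_SUPPORTED"]
pred = ["SUPPORTED", "SUPPORTED", "SUPPORTED", "NOT_SUPPORTED"]
score = accuracy(pred, gold)
```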

Citation

@inproceedings{jiang2020hover,
  title={{HoVer}: A Dataset for Many-Hop Fact Extraction And Claim Verification},
  author={Yichen Jiang and Shikha Bordia and Zheng Zhong and Charles Dognin and Maneesh Singh and Mohit Bansal},
  booktitle={Findings of the Conference on Empirical Methods in Natural Language Processing ({EMNLP})},
  year={2020}
}
