FEVER

The code in this repository does three things:

  • (1) Automatically downloads all the data from the FEVER Challenge.
  • (2) Creates databases for accessing the Wikipedia pages and claims.
  • (3) Creates databases with the tokenized text and claims.

The advantages of this approach are:

  • (1) All text in the databases is stored in a common UTF format. The raw data from the Wikipedia pages and claims come in different UTF formats, so the database creation process converts them to a single one.
  • (2) The spaCy package is used for tokenization. It takes context into consideration, which makes it better than tokenization with e.g. NLTK. This is costly, so the tokenized text is stored directly and can be retrieved quickly.
  • (3) The databases are created with end-to-end checks to ensure that they are constructed fully. If an error occurs during construction and the database is opened again, an error is raised automatically and the database cannot be accessed.
  • (4) A database can be kept either in RAM or on disk, chosen at initialisation. The database classes are called in the same way to retrieve data in both cases, which makes the database structure and the code independent of the storage method.
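
The points about a common text format, construction checks, and RAM versus disk storage can be illustrated with a minimal sketch. It is not the repository's code: the class `TextDatabase`, its table names, and the `complete` flag are hypothetical, and it assumes that "common UTF format" means Unicode NFC normalization with SQLite's default UTF-8 storage.

```python
import sqlite3
import unicodedata


class TextDatabase:
    """Hypothetical key-value text store (illustration only)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the database in RAM; a file path stores it on disk.
        # Every other method is identical for both storage choices.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS texts (id TEXT PRIMARY KEY, body TEXT)")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")

    def build(self, items):
        # Record that construction has started but not finished.
        self.conn.execute("INSERT OR REPLACE INTO meta VALUES ('complete', '0')")
        self.conn.commit()
        for key, text in items:
            # Bring every input to one Unicode normal form before storing it.
            normalized = unicodedata.normalize("NFC", text)
            self.conn.execute(
                "INSERT OR REPLACE INTO texts VALUES (?, ?)", (key, normalized))
        # Only reached if every item was processed without an exception.
        self.conn.execute("INSERT OR REPLACE INTO meta VALUES ('complete', '1')")
        self.conn.commit()

    def get(self, key):
        # Refuse to serve data from a database that was never completed.
        row = self.conn.execute(
            "SELECT value FROM meta WHERE key = 'complete'").fetchone()
        if row is None or row[0] != "1":
            raise RuntimeError("database was not fully constructed")
        found = self.conn.execute(
            "SELECT body FROM texts WHERE id = ?", (key,)).fetchone()
        return None if found is None else found[0]
```

Calling `TextDatabase()` gives an in-memory store, while `TextDatabase("wiki.db")` (a made-up file name) would give an on-disk one. The retrieval call `get(key)` is the same in both cases.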

computer settings

  • python3
  • laptop

setup (only once):

description: this script sets up the folder structure and downloads the Wikipedia and claim datasets

$ bash setup.sh

description: install all required packages into the virtual environment

$ pip3 install -r requirements.txt

startup

description: this script sets the paths to the different directories in the folder. It must be sourced every time a new command prompt is started.

$ source set_paths.sh
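
A script that depends on these paths can fail early with a clear message when `set_paths.sh` has not been sourced. The sketch below assumes the paths are exported as environment variables, which this README does not confirm. The helper `require_dir` and the variable name `FEVER_DATA_DIR` are hypothetical.

```python
import os
from pathlib import Path


def require_dir(var_name, environ=os.environ):
    """Return the directory stored in an environment variable, or fail loudly."""
    value = environ.get(var_name)
    if not value:
        # Hypothetical check: point the user back to the startup step.
        raise RuntimeError(
            f"{var_name} is not set; run 'source set_paths.sh' first")
    return Path(value)


# Example with a made-up variable name (an assumption, not from the repository):
# data_dir = require_dir("FEVER_DATA_DIR")
```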

train databases

description: these scripts create the Wikipedia and claim databases, followed by their n-gram counterparts

$ cd _10_scripts/_10_document_retrieval
$ python3 wiki_database.py
$ python3 claim_database.py
$ python3 wiki_database_n_grams.py
$ python3 claim_database_n_grams.py
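
The idea behind storing tokenized text and n-grams can be sketched as follows. This is an illustration under assumptions, not the contents of the scripts above. `tokenize` is a whitespace stand-in for spaCy so that the example runs without extra packages, and `store_tokens`, `load_tokens`, and `n_grams` are hypothetical helpers.

```python
import json
import sqlite3


def tokenize(text):
    # Stand-in tokenizer (lowercase whitespace split); the README uses spaCy.
    return text.lower().split()


def n_grams(tokens, n):
    # All runs of n consecutive tokens, in order of appearance.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def store_tokens(conn, key, text):
    # Tokenize once and keep the result, so later reads skip the costly step.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens (id TEXT PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT OR REPLACE INTO tokens VALUES (?, ?)",
        (key, json.dumps(tokenize(text))))
    conn.commit()


def load_tokens(conn, key):
    # Read the stored token list back; None if the key is unknown.
    row = conn.execute(
        "SELECT body FROM tokens WHERE id = ?", (key,)).fetchone()
    return None if row is None else json.loads(row[0])
```

With a connection from `sqlite3.connect(":memory:")`, `store_tokens(conn, "c1", "The sky is blue")` followed by `n_grams(load_tokens(conn, "c1"), 2)` yields the three bigrams of the claim without tokenizing it again.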

tutorial

description: run the jupyter notebook tutorial.ipynb to train and investigate the databases

$ jupyter notebook

contributors

  • bartmelman
