AAAI-24 CFEVER: A Chinese Fact Extraction and VERification Dataset

Home Page: https://ikmlab.github.io/CFEVER

License: Apache License 2.0

aaai2024 fact-checking fact-extraction fact-verification fever nlp

CFEVER-data

Introduction to CFEVER

This repository contains the dataset for our AAAI 2024 paper, "CFEVER: A Chinese Fact Extraction and VERification Dataset". The paper is available at https://ojs.aaai.org/index.php/AAAI/article/view/29825.

Leaderboard website

Please visit https://ikmlab.github.io/CFEVER to view the CFEVER leaderboard.

Repository structure

CFEVER-data
├── data
│   ├── dev.jsonl # CFEVER development set
│   ├── test.jsonl # CFEVER test set without labels and evidence
│   └── train.jsonl # CFEVER training set
├── LICENSE
├── README.md
└── sample_submission.jsonl # sample submission file of the test set

Getting started

  • Clone this repository and extract the Wikipedia pages:
git clone https://github.com/IKMLab/CFEVER-data.git
cd CFEVER-data
unzip wiki-pages.zip
  • This produces a folder named wiki-pages containing 24 JSONL files. Each file contains 50,000 processed Wikipedia pages.
    • In each JSONL file, each line is a JSON object representing one Wikipedia page, with the following fields:
      • id: the Wikipedia page name
      • text: the processed text of the Wikipedia article
      • lines: the processed text of the Wikipedia article including the sentence numbers
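
The page files can be read line by line with the Python standard library. The following is a minimal sketch, assuming the archive was extracted to a folder named wiki-pages in the current directory. The helper names iter_wiki_pages and build_page_index are hypothetical and not part of this repository; only the id, text, and lines fields come from the description above.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_wiki_pages(folder: str = "wiki-pages") -> Iterator[dict]:
    """Yield one page dict (id, text, lines) per non-empty JSONL line."""
    for path in sorted(Path(folder).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines, if any
                    yield json.loads(line)


def build_page_index(folder: str = "wiki-pages") -> Dict[str, dict]:
    """Map each page id to its record. Note: this holds every page in memory."""
    return {page["id"]: page for page in iter_wiki_pages(folder)}
```

Iterating with iter_wiki_pages keeps memory use low, while build_page_index trades memory for fast lookup of a page by its id, which is convenient when resolving [page_id, line_number] evidence pairs.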

Evaluation

Submission

  • For each claim in the test set, the prediction file must include three required fields:
    • id
    • predicted_label
    • predicted_evidence
  • The id field is already included in the test set. Please do not change the order of the claims.
  • The predicted_label should be one of SUPPORTS, REFUTES, or NOT ENOUGH INFO.
  • The predicted_evidence should be a list of evidence sentences, where each evidence sentence is represented by a list of [page_id, line_number]. For example:
# One evidence sentence for the claim
{
    "id": 1,
    "predicted_label": "REFUTES",
    "predicted_evidence": [
        ["page_id_2", 2]
    ]
}
# Two evidence sentences for the claim
{
    "id": 1,
    "predicted_label": "SUPPORTS",
    "predicted_evidence": [
        ["page_id_1", 1],
        ["page_id_2", 2]
    ]
}
# The claim cannot be verified
{
    "id": 1,
    "predicted_label": "NOT ENOUGH INFO",
    "predicted_evidence": null
}
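
A prediction file in this shape can be written with Python's json module, which serialises None as JSON null. This is a minimal sketch rather than official tooling: write_submission and VALID_LABELS are hypothetical names introduced for illustration, and the uppercase label spelling is taken from the examples above and assumed to be case-sensitive.

```python
import json
from typing import Iterable, List, Optional, Tuple

# Label strings as spelled in the examples above (assumed case-sensitive).
VALID_LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}

# A list of (page_id, line_number) pairs, or None when no evidence is given.
Evidence = Optional[List[Tuple[str, int]]]


def write_submission(
    predictions: Iterable[Tuple[int, str, Evidence]], path: str
) -> None:
    """Write one JSON object per line, keeping the order of `predictions`."""
    with open(path, "w", encoding="utf-8") as f:
        for claim_id, label, evidence in predictions:
            if label not in VALID_LABELS:
                raise ValueError(f"unexpected label: {label!r}")
            record = {
                "id": claim_id,
                "predicted_label": label,
                # Pairs become JSON arrays; None becomes null.
                "predicted_evidence": (
                    None if evidence is None else [list(e) for e in evidence]
                ),
            }
            # ensure_ascii=False keeps Chinese page names readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Passing the predictions in the same order as the claims in test.jsonl keeps the output aligned with the test set, since the function writes records in the order it receives them.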

Reference

If you find our work useful, please cite our paper.

@article{Lin_Lin_Yeh_Li_Hu_Hsu_Lee_Kao_2024,
    title = {CFEVER: A Chinese Fact Extraction and VERification Dataset},
    author = {Lin, Ying-Jia and Lin, Chun-Yi and Yeh, Chia-Jen and Li, Yi-Ting and Hu, Yun-Yu and Hsu, Chih-Hao and Lee, Mei-Feng and Kao, Hung-Yu},
    doi = {10.1609/aaai.v38i17.29825},
    journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
    month = {Mar.},
    number = {17},
    pages = {18626-18634},
    url = {https://ojs.aaai.org/index.php/AAAI/article/view/29825},
    volume = {38},
    year = {2024},
    bdsk-url-1 = {https://ojs.aaai.org/index.php/AAAI/article/view/29825},
    bdsk-url-2 = {https://doi.org/10.1609/aaai.v38i17.29825}
}
