NeuBAROCO

Datasets and scripts for the ACL2024 Findings paper: "Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset".

Contents

Datasets

NLI (Natural Language Inference) Task Format

File

data/NeuBAROCO_NLI.tsv

Description

Column Name     Description
ID              problem ID
ORIGINAL_ID     (INTERNAL) original problem ID
premises_ja     two premises in Japanese
hypothesis_ja   one hypothesis in Japanese
premises_en     two premises in English
hypothesis_en   one hypothesis in English
gold            correct answer: the relationship of the hypothesis to the premises (entailment, contradiction, neutral)
mood            the form of each premise and conclusion (three letters composed of A, E, I and O)
inference-type  type of logical inference (syllogism, propositional)
content-type    classification based on belief congruency (symbolic, congruent, incongruent)
conversion      associated with conversion error (yes, no)
atmosphere      associated with atmosphere effect (yes, no)
  • See our paper for details on content-type, inference-type, conversion, and atmosphere.
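
As a rough illustration of working with the columns above, the following sketch reads a tab-separated file and tallies gold labels per content-type. It assumes the file has a header row whose names match the table and that the default quoting rules of Python's csv module suit the data; neither is confirmed here. The helper names `read_tsv` and `gold_counts` and the sample rows are made up for this example and are not part of the repository.

```python
import csv
import io
from collections import Counter


def read_tsv(handle):
    """Parse a tab-separated stream with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def gold_counts(rows, column="content-type"):
    """Count how often each (column value, gold label) pair occurs."""
    return Counter((row[column], row["gold"]) for row in rows)


# Made-up rows standing in for the real file; only three columns are shown.
sample = io.StringIO(
    "ID\tgold\tcontent-type\n"
    "1\tentailment\tsymbolic\n"
    "2\tneutral\tincongruent\n"
    "3\tentailment\tsymbolic\n"
)
rows = read_tsv(sample)
counts = gold_counts(rows)
for (content_type, gold), n in sorted(counts.items()):
    print(content_type, gold, n)

# For the real data (path taken from this README), something like:
# with open("data/NeuBAROCO_NLI.tsv", encoding="utf-8", newline="") as f:
#     rows = read_tsv(f)
```

Passing `column="conversion"` or `column="atmosphere"` would group by those annotations instead.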

Multiple-Choice Task Format

File

data/NeuBAROCO_MC.tsv

Description

Column Name      Description
ID               problem ID
premises_ja      two premises in Japanese
hypothesis_ja_1  hypothesis 1 in Japanese
hypothesis_ja_2  hypothesis 2 in Japanese
hypothesis_ja_3  hypothesis 3 in Japanese
hypothesis_ja_4  hypothesis 4 in Japanese
hypothesis_ja_5  hypothesis 5 in Japanese
premises_en1     two premises in English
hypothesis_en_1  hypothesis 1 in English
hypothesis_en_2  hypothesis 2 in English
hypothesis_en_3  hypothesis 3 in English
hypothesis_en_4  hypothesis 4 in English
hypothesis_en_5  hypothesis 5 in English
gold             correct answer (1-5)
content-type     classification based on belief congruency (symbolic, contentual, congruent, incongruent)
mood             the form of each premise and conclusion (three letters composed of A, E, I and O)
figure           code for the order in which each term appears (1-4)
  • NOTE: One of the five hypotheses is "none of them".
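
To show how the five hypothesis columns and the 1-5 gold answer fit together, here is a minimal sketch that turns one row into a numbered-options prompt and checks a predicted option. The prompt wording is invented for illustration and is not the prompt used in the paper or in the repository scripts. `build_prompt` and `is_correct` are hypothetical helpers, the sample row is made up, and the premises column name is passed explicitly because it should be checked against the actual file header.

```python
def build_prompt(row, lang="en", premises_column="premises_en1"):
    """Format one multiple-choice row as a numbered-options prompt (illustrative only)."""
    # Collect hypothesis_<lang>_1 .. hypothesis_<lang>_5 in order.
    options = [row[f"hypothesis_{lang}_{i}"] for i in range(1, 6)]
    lines = [f"Premises: {row[premises_column]}", "Options:"]
    lines += [f"{i}. {text}" for i, text in enumerate(options, start=1)]
    lines.append("Answer with a number from 1 to 5.")
    return "\n".join(lines)


def is_correct(row, predicted):
    """Compare a predicted option number with the gold column (stored as text)."""
    return int(row["gold"]) == predicted


# Made-up row, not taken from the dataset.
row = {
    "premises_en1": "All A are B. All B are C.",
    "hypothesis_en_1": "All A are C.",
    "hypothesis_en_2": "No A are C.",
    "hypothesis_en_3": "Some A are not C.",
    "hypothesis_en_4": "No C are A.",
    "hypothesis_en_5": "None of them.",
    "gold": "1",
}
prompt = build_prompt(row)
print(prompt)
print(is_correct(row, 1))
```

For Japanese, the call would be along the lines of `build_prompt(row, lang="ja", premises_column="premises_ja")`.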

Data used in the NALOMA2023 experiments

File

data/naloma2023/NeuBAROCO_NALOMA.tsv

Running scripts

Setup

git clone https://github.com/kmineshima/NeuBAROCO
cd NeuBAROCO
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Set API keys

export OPENAI_API_KEY=<YOUR_KEY>  # For OpenAI API
export HUGGINGFACE_API_KEY=<YOUR_KEY>  # For HuggingFace Inference Endpoints API
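
Before starting a long evaluation run, it can help to confirm that the variables above are visible to Python. The sketch below is a generic check and not part of the repository; `missing_keys` is a hypothetical helper, and which keys are actually needed presumably depends on the models being evaluated.

```python
import os


def missing_keys(env, required=("OPENAI_API_KEY", "HUGGINGFACE_API_KEY")):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_keys(os.environ)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All listed API keys are set.")
```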

Evaluation

ACL2024 experiments

Basic usage

python -m scripts.experiments.acl2024 --help

NLI Task

Example:

python -m scripts.experiments.acl2024 nli --test_n=all --lang en ja --model gpt-3.5-turbo-1106 gpt-4-0613

Multiple-Choice Task

Example:

python -m scripts.experiments.acl2024 choice5 --test_n=all --lang en ja --model gpt-3.5-turbo-1106 gpt-4-0613

Citation

If you use this data in any published research, please cite the following:

@inproceedings{ozeki-etal-2024-exploring,
    title = "Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the {N}eu{BAROCO} Dataset",
    author = "Ozeki, Kentaro  and
      Ando, Risako  and
      Morishita, Takanobu  and
      Abe, Hirohiko  and
      Mineshima, Koji  and
      Okada, Mitsuhiro",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.950",
    pages = "16063--16077",
}
