

InFoBench

Citation

@article{qin2024infobench,
      title={InFoBench: Evaluating Instruction Following Ability in Large Language Models}, 
      author={Yiwei Qin and Kaiqiang Song and Yebowen Hu and Wenlin Yao and Sangwoo Cho and Xiaoyang Wang and Xuansheng Wu and Fei Liu and Pengfei Liu and Dong Yu},
      year={2024},
      eprint={2401.03601},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Evaluation with InFoBench

Step 1: Dataset Usage

You can download the dataset directly with the Hugging Face datasets library:

from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")

Step 2: Generating the Response

Write the model's responses to model/output.json, one JSON object per line. Each object should contain all the fields of the corresponding input entry, plus a new field named output that holds the generated response.
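A minimal sketch of producing this file, assuming generate_fn is a hypothetical placeholder for your own model call (it is not part of InFoBench). Each input entry is copied unchanged and the response is added under output, one JSON object per line:

```python
import json
import os
from typing import Callable, Iterable, Mapping


def write_outputs(entries: Iterable[Mapping],
                  generate_fn: Callable[[Mapping], str],
                  path: str = "model/output.json") -> int:
    """Write one JSON object per line: the input fields plus an "output" field.

    generate_fn is a hypothetical stand-in for your model; it receives the
    full input entry and must return the generated text.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            record = dict(entry)                   # keep every input field unchanged
            record["output"] = generate_fn(entry)  # add the model's response
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Iterating a split of the dataset loaded in Step 1 yields plain dicts, so it can be passed as entries directly. The split name is not stated here, so check it with print(dataset) rather than assuming one.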

We recommend greedy decoding so that the results are not affected by sampling randomness.
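Greedy decoding picks the single highest-scoring token at every step, so the same prompt always yields the same continuation. The toy sketch below illustrates that determinism; next_token_scores is a hypothetical stand-in for a model's scoring function, not an InFoBench API:

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token_scores: Callable[[List[int]], Sequence[float]],
                  prompt: List[int],
                  max_new_tokens: int,
                  eos_id: int) -> List[int]:
    """Return the generated token ids (prompt excluded) using greedy decoding."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Argmax over the vocabulary; max() keeps the first maximum, so ties
        # resolve to the lowest token id and the result stays deterministic.
        best = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens[len(prompt):]
```

With Hugging Face transformers, the equivalent is calling model.generate(..., do_sample=False) with the default num_beams=1.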

Step 3: Evaluation

Evaluate the LLM's outputs against the decomposed questions. This research uses GPT-4-0314 as the evaluator by default:

python evaluation.py \
  --api_key <OPENAI KEY> \
  --eval_model gpt-4-0314 \
  --input model/output.json \
  --output_dir evaluation/ \
  --temperature 0
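If you prefer not to paste the key into scripts or your shell history, a small wrapper can read it from an environment variable and build the same invocation. The variable name OPENAI_API_KEY is an assumption of this sketch; the flags are exactly those shown above:

```python
import os
import sys
from typing import List


def build_eval_command(input_path: str,
                       output_dir: str,
                       eval_model: str = "gpt-4-0314",
                       api_key_env: str = "OPENAI_API_KEY") -> List[str]:
    """Build the evaluation.py argument list, taking the key from the environment.

    api_key_env is an assumed variable name; evaluation.py itself only
    accepts the key through --api_key.
    """
    api_key = os.environ.get(api_key_env)
    if not api_key:
        raise RuntimeError(f"set {api_key_env} before running the evaluation")
    return [sys.executable, "evaluation.py",  # same interpreter as this wrapper
            "--api_key", api_key,
            "--eval_model", eval_model,
            "--input", input_path,
            "--output_dir", output_dir,
            "--temperature", "0"]
```

Run it with subprocess.run(build_eval_command("model/output.json", "evaluation/"), check=True). The key is still passed to evaluation.py as a command-line argument, so it remains visible to other local processes while the script runs.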

Each entry in the evaluation output gains an "eval" key of type List[bool], holding the "Yes" (true) or "No" (false) answer to each decomposed question. The final evaluation file is saved in JSON format under <output_dir>/<eval_model>/.
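Turning the per-question judgments into a single score is not covered above. A minimal sketch, assuming the score of interest is the fraction of all decomposed questions answered "Yes", pooled across entries. This appears to match the Decomposed Requirements Following Ratio (DRFR) described in the paper, but check the paper before relying on it. Since the exact file name under <output_dir>/<eval_model>/ is not given, the loader takes a path and accepts either a JSON array or one object per line:

```python
import json
from typing import Iterable, List, Mapping


def load_entries(path: str) -> List[dict]:
    """Read a JSON array, a single JSON object, or JSON Lines (one object per line)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Several top-level objects (or an empty file): treat it as JSON Lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def yes_ratio(entries: Iterable[Mapping]) -> float:
    """Fraction of all decomposed questions judged "Yes", pooled across entries."""
    yes = total = 0
    for entry in entries:
        answers = entry["eval"]  # List[bool], one value per decomposed question
        # Anything other than True (e.g. False or a missing judgment) counts as "No".
        yes += sum(1 for a in answers if a is True)
        total += len(answers)
    return yes / total if total else 0.0
```

Pooling over questions, rather than averaging per-entry ratios, gives instructions with more decomposed questions proportionally more weight.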


infobench's Issues

There are some spelling mistakes in the prompt. Are they intentional or accidental?

SYS_MSG ="Based on the provided Input (if any) and Generated Text, answer the ensuing Ouestions with either a YES or NOchoice. Your selection should be based on your judgment as well as the following rules:\n\n- YES: Select 'YES' if the generated text entirely fulfills the condition specified in the question. Howevernote that even minor inaccuracies exclude the text from receiving a 'YES' rating. As an illustration. consider aquestion that asks. "Does each sentence in the generated text use a second person?” If even one sentence doesnot use the second person, the answer should NOT be 'YES'. To qualify for a 'YES' rating, the generated textmust be entirely accurate and relevant to the question\n\n- NO: Opt for 'NO' if the generated text fails to meet the question's requirements or provides no informationthat could be utilized to answer the question. For instance, if the question asks. "Is the second sentence irthe generated text a compound sentence?" and the generated text only has one sentence. it offers no relevantinformation to answer the question. Consequently, the answer should be 'NO'.'''"

There are some spelling mistakes, such as "Ouestions" and "Howevernote". Are they intentional or accidental?
