Benchmarking Large Language Models for News Summarization

This repository contains the data release for the paper Benchmarking Large Language Models for News Summarization.

Likert evaluation data

likert_evaluation_results.jsonl contains the results of the Likert evaluation reported in Table 2 of the paper. Despite the .jsonl extension, the file should be loaded as a single JSON document; it parses to a List[Dict]. Each dictionary contains the following keys:

  • model: the model name
  • article: the text of the article
  • summary: the summary generated by the model
  • dataset: the dataset name, either "cnndm" or "xsum"
  • faithfulness: the faithfulness score given by the annotator. The score is binary (0 or 1).
  • coherence: the coherence score given by the annotator. The score is from 1 to 5.
  • relevance: the relevance score given by the annotator. The score is from 1 to 5.
  • annotation_id: the annotation id.
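A minimal sketch of loading the Likert file and averaging the three scores per model and dataset. It follows the description above (one JSON document holding a list of dicts); the line-by-line fallback is an assumption added in case a copy of the file turns out to be true JSON Lines. The helper names parse_records, load_records, and mean_scores are hypothetical and not part of the release.

```python
import json
from collections import defaultdict
from statistics import mean

METRICS = ("faithfulness", "coherence", "relevance")


def parse_records(text):
    """Parse the file contents into a list of dicts.

    Tries a single JSON document first, as the README describes. The
    per-line fallback is an assumption, not documented behaviour.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_records(path):
    """Read a file from disk and parse it with parse_records."""
    with open(path, encoding="utf-8") as f:
        return parse_records(f.read())


def mean_scores(records):
    """Average each metric over all annotations for a (model, dataset) pair."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["model"], rec["dataset"])].append(rec)
    return {
        key: {m: mean(float(r[m]) for r in rows) for m in METRICS}
        for key, rows in grouped.items()
    }
```

For example, mean_scores(load_records("likert_evaluation_results.jsonl")) returns a dictionary keyed by (model, dataset) tuples, assuming the file sits in the working directory.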

Pairwise evaluation data

pairwise_evaluation_results.jsonl contains the results of the pairwise evaluation reported in Figure 5 of the paper. Like the Likert file, it should be loaded as a single JSON document and parses to a List[Dict]. Each dictionary contains the following keys:

  • article_id: unique identifier for the article
  • writer_id: unique identifier for the writer of the writer summary
  • evaluator_id: unique identifier for the evaluator for the pairwise comparison
  • article_text: the text of the article
  • writer_summary: the summary written by the writer
  • text-davinci-002_summary: the summary generated by the model text-davinci-002
  • overall_writer_better: whether the writer summary is better overall than the model summary. The value is one of True, False, or Equally Good.
  • informative_writer_better: whether the writer summary is more informative than the model summary. The value is one of True, False, or Equally Good.
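A minimal sketch of tallying the pairwise judgments. The README does not say whether True and False are stored as JSON booleans or as strings, so the code normalizes every value with str(); that normalization and the helper names load_records and tally_preferences are assumptions made for illustration.

```python
import json
from collections import Counter


def load_records(path):
    """Load the file as a single JSON document (a list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tally_preferences(records, field="overall_writer_better"):
    """Count how often each judgment appears for the given field.

    Values are converted with str() so that the boolean True and the
    string "True" fall into the same bucket (an assumption about storage).
    """
    return Counter(str(rec[field]).strip() for rec in records)
```

Calling tally_preferences(records, "informative_writer_better") gives the same breakdown for the informativeness judgment.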

All freelance writer summaries

Because we did not evaluate all summaries written by the freelance writers, we release a separate file with all the summaries. writer_summaries.jsonl is a List[Dict]; each dictionary contains the following keys:

  • article_id: unique identifier for the article
  • article: the text of the article
  • summary: the summary written by the freelance writer
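A minimal sketch of grouping the writer summaries by article. It assumes an article_id may appear in more than one record (the README does not say how many summaries each article has) and that the file loads as a single JSON list like the other two; load_records and summaries_by_article are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_records(path):
    """Load the file as a single JSON document (a list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summaries_by_article(records):
    """Map each article_id to the list of summaries written for it."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["article_id"]].append(rec["summary"])
    return dict(grouped)
```

The result can be used to check how many summaries exist for each article, for instance with {k: len(v) for k, v in summaries_by_article(records).items()}.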

Authors and citation

This work is done by:

If you find this data useful, please cite the following paper:

@misc{https://doi.org/10.48550/arxiv.2301.13848,
  url = {https://arxiv.org/abs/2301.13848},
  author = {Zhang, Tianyi and Ladhak, Faisal and Durmus, Esin and Liang, Percy and McKeown, Kathleen and Hashimoto, Tatsunori B.},
  title = {Benchmarking Large Language Models for News Summarization},
  publisher = {arXiv},
  year = {2023},
}
