Giter Site home page Giter Site logo

uphill's Introduction

Evaluating Large Language Models for Health-related Queries with Presuppositions

Data Contents

Our data is hosted here. The data is built on top of the PubHealth and Monant Medical Misinformation datasets.

  • UPHILL.csv contains 9725 health-related queries with 5 different levels of presupposition. Each claim has a veracity associated with it. Corresponding to each query, we share model responses, and entailment predictions for agreement between model responses and the claim within the query.
  • queries.csv contains 9725 queries, which could be used to evaluate other models.

Statistics

Veracity # Claims # Queries with Presupposition/Claim Example
True 766 5 Probiotics help with Irritable Bowel Syndrome
False 854 5 Vegetarians are unaffected by COVID
Mixture 159 5 A home pregnancy test can detect testicular cancer
Fabricated 166 5 A liquid-only diet is beneficial in managing depression

Data Format

Fields Type Description
claim_id string an identification for the claim
claim string the claim
claim_veracity string the veracity of the claim (true, false, mixture or fabricated)
veracity_explanation string explanation for why the claim is true or false and so on.
subjects string subjects that the claim covers, e.g., health care, public health, heart disease etc.
date_published string date when the claim was published on the source (i.e. fact-checking website)
source_db string source of the claim of the dataset i.e. pubhealth, monant or fabricated
fact_checkers string people who fact checked the claim and assigned its veracity
main_text string if the claim is a news headline, this fields contains the article corresponding to it
sources string source of the claim, these could be fact-checking or news review websites
presupposition_level string level of presupposition of the query i.e. Neutral, Mild Presupposition, Unequivocal Presupposition, Writing Request, Writing Demand
query_with_presupposition string query with the presupposed claim
conversational_model string conversational model whose response is evaluated for the given query i.e. text-davinci-002 (InstructGPT), gpt-3.5-turbo (ChatGPT), gpt-4 (GPT4), bing-copilot (Bing Copilot)
response_num int conversational model is queries 5 times for each query. This field indicates the n^th^ response from the model given the query
query_response_id string identification for the response given the query
model_response string model response corresponding to the query
entailment_prediction string agree: if model response supports the claim; disagree: if model response contradicts the claim; neutral: if neither supports nor contradicts
prediction_reasoning string reasoning for the entailment prediction (agree, disagree or neutral), as generated by the entailment model

Evaluation Script

Evaluation script can be found in src/evaluation.py. It uses gpt-3.5-turbo for entailment prediction. Please paste your OpenAI API key in the file src/OPENAI_API_KEY to run it.

Install the dependencies:

pip install openai

The dataset to be evaluated should be a .csv file and must contain the following columns:

  • claim: the claim
  • claim_veracity: veracity of the claim
  • presupposition_level: level of presupposition
  • model_response: response of the model to be evaluated (i.e. the model's generation in response to the presupposed query with presupposition_level containing the claim)

Example:

python evaluation.py -input_file responses.csv -output_file eval -response_col model_response

Running the above would generate a file eval.csv with additional columns entailment_prediction and prediction_reasoning. It will print the factual accuracy and consistency over all claims, and across different levels of presuppositions and veracities.

Other scripts

Script to generate queries with presuppositions, given claims is in src/generate_queries.py. The input file -file must contain the claims in column claim, for which queries have to be generated.

Usage:

python generate_queries.py -file claims.csv -claims_col claim -outfile queries

Citation

If you find our codebase and dataset beneficial, please cite our work:

@misc{kaur2023evaluating,
      title={Evaluating Large Language Models for Health-related Queries with Presuppositions}, 
      author={Navreet Kaur and Monojit Choudhury and Danish Pruthi},
      year={2023},
      eprint={2312.08800},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Please also make sure to credit and cite the creators of PubHealth and Monant Medical Misinformation datasets, the dataset that we build ours upon:

@misc{kotonya2020explainable,
      title={Explainable Automated Fact-Checking for Public Health Claims}, 
      author={Neema Kotonya and Francesca Toni},
      year={2020},
      eprint={2010.09926},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{srba2022monant,
  title={Monant medical misinformation dataset: Mapping articles to fact-checked claims},
  author={Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={2949--2959},
  year={2022}
}

Contact

If you have any questions, please email navreetkaur[at]iisc.ac.in. Thanks!

uphill's People

Contributors

navreeetkaur avatar

Stargazers

Dr. Artem Shelmanov avatar

Watchers

Kostas Georgiou avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.