Giter Site home page Giter Site logo

sinamohseni / ml-interpretability-evaluation-benchmark Goto Github PK

View Code? Open in Web Editor NEW
29.0 3.0 0.0 2.03 GB

This is a benchmark to evaluate machine learning local explanaitons quality generated from any explainer for text and image data

License: MIT License

ml-interpretability-evaluation-benchmark's Introduction

ML-Interpretability-Evaluation-Benchmark

This repository is a benchmark for quantitative evaluation of model saliency/attention map explanations from any interpretability technique. We propose a human-attention baseline for image and text domains using multi-layer human-attention masks aggregated from multiple human annotators.

Main Idea

In order for people to be able to trust and take advantage of the results of advanced machine learning solutions for real-world problems, people need to be able to understand the machine rationale for the given output. The proposed human-grounded benchmark enables fast, replicable, and objective execution of evaluation experiments for saliency explanations. In this paper, we demonstrate this benchmark's utility for quantitative evaluation of model explanations and compare it with the binary feature mask ground truth (e.g., object segmentation mask) and human judgment rating evaluations. Our study results reveal the efficiency of threshold-agnostic (e.g., mean absolute error) evaluation metrics with human-attention baseline compared to previous methods with binary ground truth masks. Our experiments also reveal different user biases in their subjective rating of explanations.

Human-Attention Mask

We present an evaluation benchmark to evaluate saliency explanations for text and image domains. The human-attention baseline in this benchmark is generated from 10 unique user annotations for each image and text samples. Here are some image examples to compare single-layer segmentation mask and multi-layer human-attention mask:

examples

Similarly, words and phrases are weighted based on annotators' selection for each text article. Human-attention masks for text samples are in JSON format files. This table shows the number of samples with human-attention ground truth from four publicly available image and text datasets:

Benchmark Structure

  • Image

    • human_attention_mask: Human-attention mask for objects in form of a weighted mask.
    • human_attention_overlay: Heatmap visualization of user weighted masks over the original image.
    • object_segmentation_mask: Binary target objects' segmentation mask for each image.
  • Text

    • org_documents: Original documents from IMDB movie review and 20newsgroup datasets.
    • user_evaluation: Human grounded attention masks for article in form of weighted words.

Citation

More details and experiment results in this paper: https://arxiv.org/pdf/1801.05075.pdf

@article{mohseni2020benchmark,
  title={Quantitative Evaluation of Machine Learning Explanations: A Human-Grounded Benchmark},
  author={Mohseni, Sina and Block, Jeremy E and Ragan, Eric D},
  journal={arXiv preprint arXiv:1801.05075},
  year={2020}
}

ml-interpretability-evaluation-benchmark's People

Contributors

sinamohseni avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

ml-interpretability-evaluation-benchmark's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.