
Awesome Explanatory Supervision

Overview of literature on learning from supervision on the model's explanations. A .bib file of the papers below can be downloaded here.

Warning: permanent WIP.

Did we miss a relevant paper? Please submit a new entry in the following format:

- **An Artificially-intelligent Means to Escape Discreetly from the Departmental Holiday Party; guide for the socially awkward**
  Eve Armstrong; arXiv 2020 [paper](https://arxiv.org/abs/2003.14169)
  `Notes: it is a joke;  a pretty good joke actually.`

Tutorials

  • Tutorial on Explanations in Interactive Machine Learning at AAAI-22 website Notes: includes recording.

Approaches that supervise the model's explanations.
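
Most entries below share the same basic recipe: in addition to the usual label loss, a term is added that compares the model's explanation (saliency map, attention weights, rationale) to human annotations of what is and is not relevant. As a rough, hypothetical illustration of this pattern — not the method of any specific paper below; the function name, `irrelevant_mask`, and `lam` are made up — an input-gradient variant in PyTorch:

```python
import torch
import torch.nn.functional as F

def explanation_supervised_loss(model, x, y, irrelevant_mask, lam=1.0):
    """x: (B, D) inputs, y: (B,) labels, irrelevant_mask: (B, D) binary
    mask with 1 where a feature should NOT influence the prediction."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Gradient-based explanation: sensitivity of the log-probabilities
    # to each input feature.
    saliency = torch.autograd.grad(F.log_softmax(logits, dim=-1).sum(), x,
                                   create_graph=True)[0]
    # Penalize attribution mass on features the annotator ruled out.
    wrong_reason = (irrelevant_mask * saliency).pow(2).sum(dim=1).mean()
    return task_loss + lam * wrong_reason
```

Other entries below supervise attention weights or extracted rationales instead of input gradients, but the shape of the objective is similar.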

  • Rationalizing Neural Predictions Tao Lei, Regina Barzilay, Tommi Jaakkola; EMNLP 2016 paper code Notes: they learn an "explanation module" for text classification from explanatory supervision, namely rationales.

  • Right for the right reasons: training differentiable models by constraining their explanations Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez; IJCAI 2017 paper code

  • e-SNLI: natural language inference with natural language explanations Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom; NeurIPS 2018 paper code

  • Tell me where to look: Guided attention inference network Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu; CVPR 2018 paper

  • Learning credible models Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, and Jenna Wiens; KDD 2018 paper code

  • Not Using the Car to See the Sidewalk--Quantifying and Controlling the Effects of Context in Classification and Segmentation Rakshith Shetty, Bernt Schiele, Mario Fritz; CVPR 2019 paper Notes: not exactly about explanations, learns from ground-truth object annotations.

  • Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, and Devi Parikh; ICCV 2019 pdf

  • Learning credible deep neural networks with rationale regularization Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu; ICDM 2019 paper

  • Deriving Machine Attention from Human Rationales Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay; ACL 2019 paper code

  • TED: Teaching AI to explain its decisions Michael Hind, Dennis Wei, Murray Campbell, Noel Codella, Amit Dhurandhar, Aleksandra Mojsilović, Karthikeyan Ramamurthy, Kush Varshney; AIES 2019 paper

  • Saliency Learning: Teaching the Model Where to Pay Attention Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli; NAACL 2019 paper

  • Do Human Rationales Improve Machine Explanations? Julia Strout, Ye Zhang, Raymond Mooney; ACL Workshop BlackboxNLP 2019 paper

  • CARE: Class attention to regions of lesion for classification on imbalanced data Jiaxin Zhuang, Jiabin Cai, Ruixuan Wang, Jianguo Zhang, Weishi Zheng; International Conference on Medical Imaging with Deep Learning 2019 paper

  • GradMask: Reduce Overfitting by Regularizing Saliency Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; International Conference on Medical Imaging with Deep Learning 2019 paper

  • Learning Global Transparent Models Consistent with Local Contrastive Explanations Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, Amit Dhurandhar; NeurIPS 2020 paper

  • Model Agnostic Multilevel Explanations Karthikeyan Natesan Ramamurthy, Bhanukiran Vinzamuri, Yunfeng Zhang, Amit Dhurandhar; NeurIPS 2020 paper Notes: implicitly learns to generalize across multiple local explanations.

  • Interpretations are useful: penalizing explanations to align neural networks with prior knowledge Laura Rieger, Chandan Singh, William Murdoch, Bin Yu; ICML 2020 paper code

  • Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph Gonzalez, Marcus Rohrbach; ICLR 2020 paper code Notes: uses saliency guided replay for continual learning.

  • Learning to Faithfully Rationalize by Construction Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron Wallace. ACL 2020 paper code

  • Reflective-Net: Learning from Explanations Johannes Schneider, Michalis Vlachos; arXiv 2020 paper

  • Learning Interpretable Concept-based Models with Human Feedback Isaac Lage, Finale Doshi-Velez; arXiv 2020 paper Notes: incrementally acquires side-information about per-concept feature dependencies; side-information is per-concept, not per-instance.

  • Improving performance of deep learning models with axiomatic attribution priors and expected gradients Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; Nature Machine Intelligence 2019 paper preprint code

  • GLocalX - From Local to Global Explanations of Black Box AI Models Mattia Setzu, Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti; Artificial Intelligence 2021 page code Notes: converts a set of local explanations to a global explanation / white-box model.

  • IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography Alina Barnett, Fides Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Lo, Cynthia Rudin; Nature Machine Intelligence 2021 paper code

  • Debiasing Concept-based Explanations with Causal Analysis Mohammad Taha Bahadori, and David E. Heckerman; ICLR 2021 paper

  • Teaching with Commentaries Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, and Geoffrey Hinton; ICLR 2021 paper code

  • Saliency is a possible red herring when diagnosing poor generalization Joseph Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; ICLR 2021 paper code

  • Towards Robust Classification Model by Counterfactual and Invariant Data Generation Chun-Hao Chang, George Alexandru Adam, Anna Goldenberg; CVPR 2021 paper code

  • Global Explanations with Decision Rules: a Co-learning Approach Géraldin Nanfack, Paul Temple, Benoît Frénay; UAI 2021 paper code

  • Explain and Predict, and then Predict Again Zijian Zhang, Koustav Rudra, Avishek Anand; WSDM 2021 paper code

  • Explanation-Based Human Debugging of NLP Models: A Survey Piyawat Lertvittayakumjorn, Francesca Toni; arXiv 2021 paper

  • When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data Peter Hase, Mohit Bansal; arXiv 2021 paper code

  • Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience George Chrysostomou, Nikolaos Aletras; arXiv 2021 paper code

  • Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates Xiaochuang Han, Yulia Tsvetkov; arXiv 2021 paper code

  • Saliency Guided Experience Packing for Replay in Continual Learning Gobinda Saha, Kaushik Roy; arXiv 2021 paper Notes: leverages saliency for experience replay in continual learning.

  • What to Learn, and How: Toward Effective Learning from Rationales Samuel Carton, Surya Kanoria, Chenhao Tan; arXiv 2021 paper

  • Supervising Model Attention with Human Explanations for Robust Natural Language Inference Joe Stacey, Yonatan Belinkov, Marek Rei; AAAI 2022 paper code

  • Finding and removing Clever Hans: Using explanation methods to debug and improve deep models Christopher Anders, Leander Weber, David Neumann, Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin; Information Fusion 2022 paper code code

  • Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features Haohan Wang, Zeyi Huang, Hanlin Zhang, Eric P. Xing; arXiv 2022 paper

  • A survey on improving NLP models with human explanations Mareike Hartmann, Daniel Sonntag; arXiv 2022 paper

  • VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives Zhuofan Ying, Peter Hase, and Mohit Bansal; arXiv 2022 paper code

  • Identifying Spurious Correlations and Correcting them with an Explanation-based Learning Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; arXiv 2022 paper


Approaches that combine supervision on the explanations with interactive machine learning:
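
In most of these systems the user is shown a prediction together with its explanation, corrects the explanation, and the correction is folded back into training, for instance as counterexamples. A small, hypothetical sketch of that augmentation step (all names are made up; actual CAIPI-style pipelines differ in the details):

```python
import torch

def counterexamples_from_feedback(x, y, irrelevant_mask, n_samples=5):
    """x: (D,) instance, y: its label, irrelevant_mask: (D,) binary mask
    with 1 where the annotator said the feature must not matter."""
    augmented = []
    for _ in range(n_samples):
        noise = torch.randn_like(x)
        # Randomize only the flagged features while keeping the label:
        # this tells the learner that those features carry no class signal.
        x_new = torch.where(irrelevant_mask.bool(), noise, x)
        augmented.append((x_new, y))
    return augmented
```

The augmented pairs are then added to the labeled pool for the next round of (active) learning.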

  • Principles of Explanatory Debugging to Personalize Interactive Machine Learning Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Simone Stumpf; IUI 2015 paper

  • Explanatory Interactive Machine Learning Stefano Teso, Kristian Kersting; AIES 2019 paper code Notes: introduces explanatory interactive learning, focuses on active learning setup.

  • Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets Stefano Teso; IAL Workshop 2019. paper code Notes: explanatory active learning with self-explainable neural networks.

  • Making deep neural networks right for the right scientific reasons by interacting with their explanations Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting; Nature Machine Intelligence 2020 paper code Notes: introduces end-to-end explanatory interactive learning, fixes clever Hans deep neural nets.

  • Embedding Human Knowledge into Deep Neural Network via Attention Map Masahiro Mitsuhara, Hiroshi Fukui, Yusuke Sakashita, Takanori Ogata, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi; arXiv 2019 paper

  • One explanation does not fit all Kacper Sokol, Peter Flach; Künstliche Intelligenz 2020 paper

  • FIND: Human-in-the-loop Debugging Deep Text Classifiers Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni; EMNLP 2020 paper

  • Human-driven FOL explanations of deep learning Gabriele Ciravegna, Francesco Giannini, Marco Gori, Marco Maggini, Stefano Melacci; IJCAI 2020 paper Notes: first-order logic.

  • Cost-effective Interactive Attention Learning with Neural Attention Process Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang; ICML 2020 paper code Notes: attention, interaction

  • Soliciting human-in-the-loop user feedback for interactive machine learning reduces user trust and impressions of model accuracy Donald Honeycutt, Mahsan Nourani, Eric Ragan; AAAI Conference on Human Computation and Crowdsourcing 2020 paper

  • ALICE: Active Learning with Contrastive Natural Language Explanations Weixin Liang, James Zou, Zhou Yu; EMNLP 2020 paper

  • Machine Guides, Human Supervises: Interactive Learning with Global Explanations Teodora Popordanoska, Mohit Kumar, Stefano Teso; arXiv 2020 paper code Notes: introduces narrative bias and explanatory guided learning, focuses on human-initiated interaction and global explanations.

  • Teaching an Active Learner with Contrastive Examples Chaoqi Wang, Adish Singla, Yuxin Chen. NeurIPS 2021. paper

  • Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations Wolfgang Stammer, Patrick Schramowski, and Kristian Kersting; CVPR 2021 paper code Notes: first-order logic, attention.

  • Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function Xiaoting Shao, Arseny Skryagin, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting; AAAI 2021 paper

  • User Driven Model Adjustment via Boolean Rule Explanations Elizabeth Daly, Massimiliano Mattetti, Öznur Alkan, Rahul Nair; AAAI 2021 paper

  • Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers Bhavya Ghai, Vera Liao, Yunfeng Zhang, Rachel Bellamy, Klaus Mueller. Proc. ACM Hum.-Comput. Interact. 2021 paper

  • Bandits for Learning to Explain from Explanations Freya Behrens, Stefano Teso, Davide Mottin; XAI Workshop 2021 paper code Notes: preliminary.

  • HILDIF: Interactive Debugging of NLI Models Using Influence Functions Hugo Zylberajch, Piyawat Lertvittayakumjorn, Francesca Toni; InterNLP Workshop 2021 paper code

  • Refining Neural Networks with Compositional Explanations Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren; arXiv 2021 paper code

  • Interactive Label Cleaning with Example-based Explanations Stefano Teso, Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini; NeurIPS 2021 paper code

  • Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan; AAAI 2022 paper

  • Toward a Unified Framework for Debugging Gray-box Models Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini, Stefano Teso; AAAI-22 Workshop on Interactive Machine Learning paper

  • Active Learning by Acquiring Contrastive Examples Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras; EMNLP 2021 paper code

  • Finding and Fixing Spurious Patterns with Explanations Gregory Plumb, Marco Tulio Ribeiro, Ameet Talwalkar; arXiv 2021 paper

  • Interactively Generating Explanations for Transformer Language Models Patrick Schramowski, Felix Friedrich, Christopher Tauchmann, Kristian Kersting; arXiv 2021 paper

  • Interaction with Explanations in the XAINES Project Mareike Hartmann, Ivana Kruijff-Korbayová, Daniel Sonntag; arXiv 2021 paper

  • A Rationale-Centric Framework for Human-in-the-loop Machine Learning Jinghui Lu, Linyi Yang, Brian Mac Namee, Yue Zhang; ACL 2022 paper code

  • A Typology to Explore and Guide Explanatory Interactive Machine Learning Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting; arXiv 2022 paper

  • CAIPI in Practice: Towards Explainable Interactive Medical Image Classification Emanuel Slany, Yannik Ott, Stephan Scheele, Jan Paulus, Ute Schmid; arXiv 2022 paper

  • Leveraging Explanations in Interactive Machine Learning: An Overview Stefano Teso, Öznur Alkan, Wolfgang Stammer, Elizabeth Daly; arXiv 2022 paper

  • Impact of Feedback Type on Explanatory Interactive Learning Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; ISMIS 2022 paper


Approaches that leverage explanations in (human-in-the-loop) reinforcement learning:

  • Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; Human And Machine in-the-Loop Evaluation and Learning Strategies paper

  • Learning from explanations and demonstrations: A pilot study Silvia Tulli, Sebastian Wallkötter, Ana Paiva, Francisco Melo, Mohamed Chetouani; Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence 2020 paper

  • Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; NeurIPS 2021 pdf


Approaches that study how informative explanations are when used as a supervision signal:

  • Model reconstruction from model explanations Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt; FAccT 2019 paper

  • Evaluating Explanations: How much do explanations from the teacher aid students? Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, and William W. Cohen; arXiv 2020 paper Notes: defines importance of different kinds of explanations by measuring their impact when used as supervision.


Approaches that regularize the model's explanations in an unsupervised manner, often for improved interpretability.
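
Here no human annotation of the explanation is used; the explanation is simply encouraged to be small, smooth, or self-consistent. A minimal, hypothetical sketch of the simplest variant — an input-gradient (double backpropagation) penalty — with made-up names and weight `lam`:

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    # Penalize the input gradient everywhere, not only on annotator-flagged
    # regions as in the supervised section above.
    reg = grads.pow(2).flatten(start_dim=1).sum(dim=1).mean()
    return task_loss + lam * reg
```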

  • Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients Andrew Ross and Finale Doshi-Velez. AAAI 2018 paper

  • Towards robust interpretability with self-explaining neural networks David Alvarez-Melis, Tommi Jaakkola; NeurIPS 2018 paper

  • Beyond sparsity: Tree regularization of deep models for interpretability Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2018 paper

  • Regional tree regularization for interpretability in deep neural networks Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2020 paper

  • Regularizing black-box models for improved interpretability Gregory Plumb, Maruan Al-Shedivat, Ángel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar; NeurIPS 2020 paper

  • Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram; CVPR 2020 paper code

  • Trustworthy convolutional neural networks: A gradient penalized-based approach Nicholas Halliwell, Freddy Lecue; arXiv 2020 paper

  • Explainable Models with Consistent Interpretations Vipin Pillai, Hamed Pirsiavash; AAAI 2021 paper code

  • Explanation Consistency Training: Facilitating Consistency-based Semi-supervised Learning with Interpretability Tao Han, Wei-Wei Tu, Yu-Feng Li; AAAI 2021 paper

  • Improving Deep Learning Interpretability by Saliency Guided Training Aya Abdelsalam Ismail, Hector Corrada Bravo, Soheil Feizi; NeurIPS 2021 paper code

  • Generating Deep Networks Explanations with Robust Attribution Alignment Guohang Zeng, Yousef Kowsar, Sarah Erfani, James Bailey; ACML 2021 paper


Machine teaching with explanations:

  • Interpretable Machine Teaching via Feature Feedback Shihan Su, Yuxin Chen, Oisin Mac Aodha, Pietro Perona, Yisong Yue; Workshop on Teaching Machines, Robots, and Humans 2017 paper

  • Teaching Categories to Human Learners with Visual Explanations Oisin Mac Aodha, Shihan Su, Yuxin Chen, Pietro Perona, Yisong Yue; CVPR 2018 paper Notes: this is *inverse* teaching, i.e., machine teaches human.


Applications of explanation-guided training:

  • Improving a neural network model by explanation-guided training for glioma classification based on MRI data Frantisek Sefcik, Wanda Benesova; arXiv 2021 paper Notes: based on layer-wise relevance propagation.

Explanation-based learning, which focuses on logic-based formalisms and learning strategies:

  • Explanation-based generalization: A unifying view Tom Mitchell, Richard Keller, Smadar Kedar-Cabelli; MLJ 1986 paper

  • Explanation-based learning: An alternative view Gerald DeJong, Raymond Mooney; MLJ 1986 paper

  • Explanation-based learning: A survey of programs and perspectives Thomas Ellman; ACM Computing Surveys 1989 paper

  • Probabilistic explanation based learning Angelika Kimmig, Luc De Raedt, Hannu Toivonen; ECML 2007 paper

Injecting invariances / feature constraints into models:

  • Tangent Prop - A formalism for specifying selected invariances in an adaptive network Patrice Simard, Bernard Victorri, Yann Le Cun, John Denker; NeurIPS 1992 paper Notes: injects invariances into a neural net by regularizing its gradient; precursor to learning from gradient-based explanations (see the sketch after this list).

  • Training invariant support vector machines Dennis DeCoste, Bernhard Schölkopf; MLJ 2002 paper

  • The constrained weight space SVM: learning with ranked features Kevin Small, Byron Wallace, Carla Brodley, Thomas Trikalinos; ICML 2011 paper
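
The Tangent Prop idea mentioned above can be sketched as follows: the directional derivative of the network output along the tangent of a known invariance transformation is driven towards zero. A hypothetical PyTorch fragment (the `tangent` argument, e.g. a finite difference of a slightly rotated input, and all other names are illustrative):

```python
import torch
from torch.autograd.functional import jvp

def tangent_prop_penalty(model, x, tangent):
    """x: (B, D) inputs; tangent: (B, D) direction along which the
    output should stay constant (tangent of an invariance transform)."""
    # Jacobian-vector product = directional derivative of the model
    # output along the given tangent direction.
    _, directional_derivative = jvp(model, (x,), (tangent,), create_graph=True)
    return directional_derivative.pow(2).mean()
```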

Dual label-feature feedback:

  • Active learning with feedback on features and instances Hema Raghavan, Omid Madani, Rosie Jones; JMLR 2006 paper

  • An interactive algorithm for asking and incorporating feature feedback into support vector machines Hema Raghavan, James Allan; ACM SIGIR 2007 paper

  • Learning from labeled features using generalized expectation criteria Gregory Druck, Gideon Mann, Andrew McCallum; ACM SIGIR 2008 paper

  • Active learning by labeling features Gregory Druck, Burr Settles, Andrew McCallum; EMNLP 2009 paper

  • A unified approach to active dual supervision for labeling features and examples Josh Attenberg, Prem Melville, Foster Provost; ECML-PKDD 2010 paper

  • Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances Burr Settles; EMNLP 2011 paper

  • Learning from discriminative feature feedback Sanjoy Dasgupta, Akansha Dey, Nicholas Roberts, Sivan Sabato; NeurIPS 2018 paper

  • Robust Learning from Discriminative Feature Feedback Sanjoy Dasgupta, Sivan Sabato; AISTATS 2020 paper

  • Practical Benefits of Feature Feedback Under Distribution Shift Anurag Katakkar, Weiqin Wang, Clay Yoo, Zachary Lipton, Divyansh Kaushik; arXiv 2021 paper

Learning from rationales:

  • Using “annotator rationales” to improve machine learning for text categorization Omar Zaidan, Jason Eisner, Christine Piatko; NAACL 2007 paper

  • Modeling annotators: A generative approach to learning from annotator rationales Omar Zaidan, Jason Eisner; EMNLP 2008 paper

  • Active learning with rationales for text classification Manali Sharma, Di Zhuang, Mustafa Bilgic; NAACL 2015 paper

Counterfactual augmentation:

  • Learning The Difference That Makes A Difference With Counterfactually-Augmented Data Divyansh Kaushik, Eduard Hovy, Zachary Lipton; ICLR 2019 paper code

  • Explaining the Efficacy of Counterfactually Augmented Data Divyansh Kaushik, Amrith Setlur, Eduard H. Hovy, Zachary Lipton; ICLR 2021. paper code

  • An Investigation of the (In)effectiveness of Counterfactually-augmented Data Nitish Joshi, He He; arXiv 2021 paper

Critiquing in recommenders:

  • Critiquing-based recommenders: survey and emerging trends Li Chen, Pearl Pu; User Modeling and User-Adapted Interaction 2012 paper

  • Coactive critiquing: Elicitation of preferences and features Stefano Teso, Paolo Dragone, Andrea Passerini; AAAI 2017 paper

Gray-box models:

  • Concept bottleneck models Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang; ICML 2020 paper
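
Concept bottleneck models make the intermediate explanation part of the architecture: the input is first mapped to human-interpretable concept predictions, and the label is predicted from those concepts alone, so both stages can be supervised and the concepts can be intervened on at test time. A minimal, hypothetical sketch (layer sizes and names are illustrative):

```python
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.concept_net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, n_concepts))
        # The label head sees only the predicted concepts.
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = self.concept_net(x).sigmoid()  # trained with concept labels
        logits = self.label_net(concepts)         # trained with class labels
        return concepts, logits
```

At training time the concept predictions are supervised with concept annotations and the logits with class labels; at test time a user can overwrite individual concepts and observe the effect on the prediction.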

A selection of general resources on Explainable AI focusing on overviews, surveys, societal implications, and critiques:

  • Survey and critique of techniques for extracting rules from trained artificial neural networks Robert Andrews, Joachim Diederich, Alan B. Tickle; Knowledge-based systems 1995 page

  • Toward harnessing user feedback for machine learning Simone Stumpf, Vidya Rajaram, Lida Li, Margaret Burnett, Thomas Dietterich, Erin Sullivan, Russell Drummond, Jonathan Herlocker; IUI 2007 paper

  • The Mythos of Model Interpretability Zachary Lipton; CACM 2016 paper

  • A survey of methods for explaining black box models Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi; ACM Computing Surveys 2018 paper

  • Sanity checks for saliency maps Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim; NeurIPS 2018 paper code

  • Explanation in Artificial Intelligence: Insights from the Social Sciences Tim Miller; Artificial Intelligence, 2019 paper

  • Unmasking clever hans predictors and assessing what machines really learn Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Nature Communications 2019 paper

  • Interpretation of neural networks is fragile Amirata Ghorbani, Abubakar Abid, James Zou; AAAI 2019 paper

  • A Benchmark for Interpretability Methods in Deep Neural Networks Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim; NeurIPS 2019 paper code

  • Is Attention Interpretable? Sofia Serrano, Noah A. Smith; ACL 2019 paper

  • Attention is not Explanation Sarthak Jain, and Byron C. Wallace; ACL 2019 paper

  • Attention is not not Explanation Sarah Wiegreffe, and Yuval Pinter; EMNLP-IJCNLP 2019 paper

  • The (un)reliability of saliency methods Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim; Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 2019 paper

  • Explanations can be manipulated and geometry is to blame Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel; NeurIPS 2019 paper

  • Fooling Neural Network Interpretations via Adversarial Model Manipulation Juyeon Heo, Sunghwan Joo, and Taesup Moon; NeurIPS 2019 paper

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead Cynthia Rudin; Nature Machine Intelligence 2019 page

  • The Principles and Limits of Algorithm-in-the-loop Decision Making Ben Green, Yiling Chen; PACM HCI 2019 paper

  • Shortcut learning in deep neural networks Robert Geirhos, Jorn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann; Nature Machine Intelligence 2020 page

  • When Explanations Lie: Why Many Modified BP Attributions Fail Leon Sixt, Maximilian Granz, Tim Landgraf. ICML 2020 paper

  • The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? Jasmijn Bastings, Katja Filippova; Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020 paper

  • Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models Christopher Grimsley, Elijah Mayfield, Julia Bursten; Language Resources and Evaluation Conference 2020 paper

  • AI for radiographic COVID-19 detection selects shortcuts over signal Alex DeGrave, Joseph Janizek, Su-In Lee; Nature Machine Intelligence 2021 paper code

  • How Well do Feature Visualizations Support Causal Understanding of CNN Activations? Roland Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas Wallis, Wieland Brendel; arXiv 2021 paper

  • Post hoc explanations may be ineffective for detecting unknown spurious correlation Julius Adebayo, Michael Muelly, Harold Abelson, and Been Kim; ICLR 2022 paper code


Related Lists


Not Yet Sorted

  • Multimodal explanations: Justifying decisions and pointing to the evidence Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach; CVPR 2018 paper

  • Learning Deep Attribution Priors Based On Prior Knowledge Ethan Weinberger, Joseph Janizek, Su-In Lee; NeurIPS 2020 paper


TODO

  • Crawl & reference work on NLP.

Comments

This list is directly inspired by all the awesome awesome lists out there!
