
Awesome Explanatory Supervision

Overview of literature on learning from supervision on the model's explanations. A .bib file of the papers below can be downloaded here.

Warning: permanent WIP.

Did we miss a relevant paper? Please submit a new entry in the following format:

- **An Artificially-intelligent Means to Escape Discreetly from the Departmental Holiday Party; guide for the socially awkward**
  Eve Armstrong; arXiv 2020 [paper](https://arxiv.org/abs/2003.14169)
  `Notes: it is a joke;  a pretty good joke actually.`

Tutorials

  • Tutorial on Explanations in Interactive Machine Learning at AAAI-22 website Notes: includes recording.

Approaches that supervise the model's explanations.
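
Most entries below share the same basic recipe: in addition to the usual label loss, a term is added that compares the model's explanation (saliency map, attention weights, rationale) to human annotations of what is and is not relevant. As a rough, hypothetical illustration of this pattern — not the method of any specific paper below; the function name, `irrelevant_mask`, and `lam` are made up — an input-gradient variant in PyTorch:

```python
import torch
import torch.nn.functional as F

def explanation_supervised_loss(model, x, y, irrelevant_mask, lam=1.0):
    """x: (B, D) inputs, y: (B,) labels, irrelevant_mask: (B, D) binary
    mask with 1 where a feature should NOT influence the prediction."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    # Gradient-based explanation: sensitivity of the log-probabilities
    # to each input feature.
    saliency = torch.autograd.grad(F.log_softmax(logits, dim=-1).sum(), x,
                                   create_graph=True)[0]
    # Penalize attribution mass on features the annotator ruled out.
    wrong_reason = (irrelevant_mask * saliency).pow(2).sum(dim=1).mean()
    return task_loss + lam * wrong_reason
```

Other entries below supervise attention weights or extracted rationales instead of input gradients, but the shape of the objective is similar.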

  • Rationalizing Neural Predictions Tao Lei, Regina Barzilay, Tommi Jaakkola; EMNLP 2016 paper code Notes: they learn an "explanation module" for text classification from explanatory supervision, namely rationales.

  • Right for the right reasons: training differentiable models by constraining their explanations Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez; IJCAI 2017 paper code

  • e-SNLI: natural language inference with natural language explanations Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom; NeurIPS 2018 paper code

  • Tell me where to look: Guided attention inference network Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu; CVPR 2018 paper

  • Learning credible models Jiaxuan Wang, Jeeheh Oh, Haozhu Wang, and Jenna Wiens; KDD 2018 paper code

  • Not Using the Car to See the Sidewalk--Quantifying and Controlling the Effects of Context in Classification and Segmentation Rakshith Shetty, Bernt Schiele, Mario Fritz; CVPR 2019 paper Notes: not exactly about explanations, learns from ground-truth object annotations.

  • Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, and Devi Parikh; ICCV 2019 pdf

  • Learning credible deep neural networks with rationale regularization Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu; ICDM 2019 paper

  • Deriving Machine Attention from Human Rationales Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay; ACL 2019 paper code

  • TED: Teaching AI to explain its decisions Michael Hind, Dennis Wei, Murray Campbell, Noel Codella, Amit Dhurandhar, Aleksandra Mojsilović, Karthikeyan Ramamurthy, Kush Varshney; AIES 2019 paper

  • Saliency Learning: Teaching the Model Where to Pay Attention Reza Ghaeini, Xiaoli Fern, Hamed Shahbazi, Prasad Tadepalli; NAACL 2019 paper

  • Do Human Rationales Improve Machine Explanations? Julia Strout, Ye Zhang, Raymond Mooney; ACL Workshop BlackboxNLP 2019 paper

  • CARE: Class attention to regions of lesion for classification on imbalanced data Jiaxin Zhuang, Jiabin Cai, Ruixuan Wang, Jianguo Zhang, Weishi Zheng; International Conference on Medical Imaging with Deep Learning 2019 paper

  • GradMask: Reduce Overfitting by Regularizing Saliency Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; International Conference on Medical Imaging with Deep Learning 2019 paper

  • Learning Global Transparent Models Consistent with Local Contrastive Explanations Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, Amit Dhurandhar; NeurIPS 2020 paper

  • Model Agnostic Multilevel Explanations Karthikeyan Natesan Ramamurthy, Bhanukiran Vinzamuri, Yunfeng Zhang, Amit Dhurandhar; NeurIPS 2020 paper Notes: implicitly learns to generalize across multiple local explanations.

  • Interpretations are useful: penalizing explanations to align neural networks with prior knowledge Laura Rieger, Chandan Singh, William Murdoch, Bin Yu; ICML 2020 paper code

  • Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting Sayna Ebrahimi, Suzanne Petryk, Akash Gokul, William Gan, Joseph Gonzalez, Marcus Rohrbach; ICLR 2020 paper code Notes: uses saliency guided replay for continual learning.

  • Learning to Faithfully Rationalize by Construction Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron Wallace. ACL 2020 paper code

  • Reflective-Net: Learning from Explanations Johannes Schneider, Michalis Vlachos; arXiv 2020 paper

  • Learning Interpretable Concept-based Models with Human Feedback Isaac Lage, Finale Doshi-Velez; arXiv 2020 paper Notes: incrementally acquires side-information about per-concept feature dependencies; side-information is per-concept, not per-instance.

  • Improving performance of deep learning models with axiomatic attribution priors and expected gradients Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; Nature Machine Intelligence 2019 paper preprint code

  • GLocalX - From Local to Global Explanations of Black Box AI Models Mattia Setzu, Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti; Artificial Intelligence 2021 page code Notes: converts a set of local explanations to a global explanation / white-box model.

  • IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography Alina Barnett, Fides Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Lo, Cynthia Rudin; Nature Machine Intelligence 2021 paper code

  • Debiasing Concept-based Explanations with Causal Analysis Mohammad Taha Bahadori, and David E. Heckerman; ICLR 2021 paper

  • Teaching with Commentaries Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, and Geoffrey Hinton; ICLR 2021 paper code

  • Saliency is a possible red herring when diagnosing poor generalization Joseph Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen; ICLR 2021 paper code

  • Towards Robust Classification Model by Counterfactual and Invariant Data Generation Chun-Hao Chang, George Alexandru Adam, Anna Goldenberg; CVPR 2021 paper code

  • Global Explanations with Decision Rules: a Co-learning Approach Géraldin Nanfack, Paul Temple, Benoît Frénay; UAI 2021 paper code

  • Explain and Predict, and then Predict Again Zijian Zhang, Koustav Rudra, Avishek Anand; WSDM 2021 paper code

  • Explanation-Based Human Debugging of NLP Models: A Survey Piyawat Lertvittayakumjorn, Francesca Toni; arXiv 2021 paper

  • When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data Peter Hase, Mohit Bansal; arXiv 2021 paper code

  • Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience George Chrysostomou, Nikolaos Aletras; arXiv 2021 paper code

  • Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates Xiaochuang Han, Yulia Tsvetkov; arXiv 2021 paper code

  • Saliency Guided Experience Packing for Replay in Continual Learning Gobinda Saha, Kaushik Roy; arXiv 2021 paper Notes: leverages saliency for experience replay in continual learning.

  • What to Learn, and How: Toward Effective Learning from Rationales Samuel Carton, Surya Kanoria, Chenhao Tan; arXiv 2021 paper

  • Supervising Model Attention with Human Explanations for Robust Natural Language Inference Joe Stacey, Yonatan Belinkov, Marek Rei; AAAI 2022 paper code

  • Finding and removing Clever Hans: Using explanation methods to debug and improve deep models Christopher Anders, Leander Weber, David Neumann, Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin; Information Fusion 2022 paper code code

  • Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features Haohan Wang, Zeyi Huang, Hanlin Zhang, Eric P. Xing; arXiv 2022 paper

  • A survey on improving NLP models with human explanations Mareike Hartmann, Daniel Sonntag; arXiv 2022 paper

  • VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives Zhuofan Ying, Peter Hase, and Mohit Bansal; arXiv 2022 paper code

  • Identifying Spurious Correlations and Correcting them with an Explanation-based Learning Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; arXiv 2022 paper


Approaches that combine supervision on the explanations with interactive machine learning:
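
In most of these systems the user is shown a prediction together with its explanation, corrects the explanation, and the correction is folded back into training, for instance as counterexamples. A small, hypothetical sketch of that augmentation step (all names are made up; actual CAIPI-style pipelines differ in the details):

```python
import torch

def counterexamples_from_feedback(x, y, irrelevant_mask, n_samples=5):
    """x: (D,) instance, y: its label, irrelevant_mask: (D,) binary mask
    with 1 where the annotator said the feature must not matter."""
    augmented = []
    for _ in range(n_samples):
        noise = torch.randn_like(x)
        # Randomize only the flagged features while keeping the label:
        # this tells the learner that those features carry no class signal.
        x_new = torch.where(irrelevant_mask.bool(), noise, x)
        augmented.append((x_new, y))
    return augmented
```

The augmented pairs are then added to the labeled pool for the next round of (active) learning.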

  • Principles of Explanatory Debugging to Personalize Interactive Machine Learning Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Simone Stumpf; IUI 2015 paper

  • Explanatory Interactive Machine Learning Stefano Teso, Kristian Kersting; AIES 2019 paper code Notes: introduces explanatory interactive learning, focuses on active learning setup.

  • Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets Stefano Teso; IAL Workshop 2019. paper code Notes: explanatory active learning with self-explainable neural networks.

  • Making deep neural networks right for the right scientific reasons by interacting with their explanations Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting; Nature Machine Intelligence 2020 paper code Notes: introduces end-to-end explanatory interactive learning, fixes clever Hans deep neural nets.

  • Embedding Human Knowledge into Deep Neural Network via Attention Map Masahiro Mitsuhara, Hiroshi Fukui, Yusuke Sakashita, Takanori Ogata, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi; arXiv 2019 paper

  • One explanation does not fit all Kacper Sokol, Peter Flach; Künstliche Intelligenz 2020 paper

  • FIND: Human-in-the-loop Debugging Deep Text Classifiers Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni; EMNLP 2020 paper

  • Human-driven FOL explanations of deep learning Gabriele Ciravegna, Francesco Giannini, Marco Gori, Marco Maggini, Stefano Melacci; IJCAI 2020 paper Notes: first-order logic.

  • Cost-effective Interactive Attention Learning with Neural Attention Process Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang; ICML 2020 paper code Notes: attention, interaction

  • Soliciting human-in-the-loop user feedback for interactive machine learning reduces user trust and impressions of model accuracy Donald Honeycutt, Mahsan Nourani, Eric Ragan; AAAI Conference on Human Computation and Crowdsourcing 2020 paper

  • ALICE: Active Learning with Contrastive Natural Language Explanations Weixin Liang, James Zou, Zhou Yu; EMNLP 2020 paper

  • Machine Guides, Human Supervises: Interactive Learning with Global Explanations Teodora Popordanoska, Mohit Kumar, Stefano Teso; arXiv 2020 paper code Notes: introduces narrative bias and explanatory guided learning, focuses on human-initiated interaction and global explanations.

  • Teaching an Active Learner with Contrastive Examples Chaoqi Wang, Adish Singla, Yuxin Chen. NeurIPS 2021. paper

  • Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations Wolfgang Stammer, Patrick Schramowski, and Kristian Kersting; CVPR 2021 paper code Notes: first-order logic, attention.

  • Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function Xiaoting Shao, Arseny Skryagin, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting; AAAI 2021 paper

  • User Driven Model Adjustment via Boolean Rule Explanations Elizabeth Daly, Massimiliano Mattetti, Öznur Alkan, Rahul Nair; AAAI 2021 paper

  • Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers Bhavya Ghai, Vera Liao, Yunfeng Zhang, Rachel Bellamy, Klaus Mueller. Proc. ACM Hum.-Comput. Interact. 2021 paper

  • Bandits for Learning to Explain from Explanations Freya Behrens, Stefano Teso, Davide Mottin; XAI Workshop 2021 paper code Notes: preliminary.

  • HILDIF: Interactive Debugging of NLI Models Using Influence Functions Hugo Zylberajch, Piyawat Lertvittayakumjorn, Francesca Toni; InterNLP Workshop 2021 paper code

  • Refining Neural Networks with Compositional Explanations Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren; arXiv 2021 paper code

  • Interactive Label Cleaning with Example-based Explanations Stefano Teso, Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini; NeurIPS 2021 paper code

  • Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan; AAAI 2022 paper

  • Toward a Unified Framework for Debugging Gray-box Models Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini, Stefano Teso; AAAI-22 Workshop on Interactive Machine Learning paper

  • Active Learning by Acquiring Contrastive Examples Katerina Margatina, Giorgos Vernikos, Loïc Barrault, Nikolaos Aletras; EMNLP 2021 paper code

  • Finding and Fixing Spurious Patterns with Explanations Gregory Plumb, Marco Tulio Ribeiro, Ameet Talwalkar; arXiv 2021 paper

  • Interactively Generating Explanations for Transformer Language Models Patrick Schramowski, Felix Friedrich, Christopher Tauchmann, Kristian Kersting; arXiv 2021 paper

  • Interaction with Explanations in the XAINES Project Mareike Hartmann, Ivana Kruijff-Korbayová, Daniel Sonntag; arXiv 2021 paper

  • A Rationale-Centric Framework for Human-in-the-loop Machine Learning Jinghui Lu, Linyi Yang, Brian Mac Namee, Yue Zhang; ACL 2022 paper code

  • A Typology to Explore and Guide Explanatory Interactive Machine Learning Felix Friedrich, Wolfgang Stammer, Patrick Schramowski, Kristian Kersting; arXiv 2022 paper

  • CAIPI in Practice: Towards Explainable Interactive Medical Image Classification Emanuel Slany, Yannik Ott, Stephan Scheele, Jan Paulus, Ute Schmid; arXiv 2022 paper

  • Leveraging Explanations in Interactive Machine Learning: An Overview Stefano Teso, Öznur Alkan, Wolfgang Stammer, Elizabeth Daly; arXiv 2022 paper

  • Impact of Feedback Type on Explanatory Interactive Learning Misgina Tsighe Hagos, Kathleen Curran, Brian Mac Namee; ISMIS 2022 paper


Approaches that leverage explanations in (human-in-the-loop) reinforcement learning:

  • Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; Human And Machine in-the-Loop Evaluation and Learning Strategies paper

  • Learning from explanations and demonstrations: A pilot study Silvia Tulli, Sebastian Wallkötter, Ana Paiva, Francisco Melo, Mohamed Chetouani; Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence 2020 paper

  • Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; NeurIPS 2021 pdf


Approaches that study how informative explanations are when used as a supervision signal:

  • Model reconstruction from model explanations Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt; FAccT 2019 paper

  • Evaluating Explanations: How much do explanations from the teacher aid students? Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, and William W. Cohen; arXiv 2020 paper Notes: defines importance of different kinds of explanations by measuring their impact when used as supervision.


Approaches that regularize the model's explanations in an unsupervised manner, often for improved interpretability.
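
Here no human annotation of the explanation is used; the explanation is simply encouraged to be small, smooth, or self-consistent. A minimal, hypothetical sketch of the simplest variant — an input-gradient (double backpropagation) penalty — with made-up names and weight `lam`:

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)
    grads = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    # Penalize the input gradient everywhere, not only on annotator-flagged
    # regions as in the supervised section above.
    reg = grads.pow(2).flatten(start_dim=1).sum(dim=1).mean()
    return task_loss + lam * reg
```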

  • Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients Andrew Ross and Finale Doshi-Velez. AAAI 2018 paper

  • Towards robust interpretability with self-explaining neural networks David Alvarez-Melis, Tommi Jaakkola; NeurIPS 2018 paper

  • Beyond sparsity: Tree regularization of deep models for interpretability Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2018 paper

  • Regional tree regularization for interpretability in deep neural networks Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2020 paper

  • Regularizing black-box models for improved interpretability Gregory Plumb, Maruan Al-Shedivat, Ángel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar; NeurIPS 2020 paper

  • Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram; CVPR 2020 paper code

  • Trustworthy convolutional neural networks: A gradient penalized-based approach Nicholas Halliwell, Freddy Lecue; arXiv 2020 paper

  • Explainable Models with Consistent Interpretations Vipin Pillai, Hamed Pirsiavash; AAAI 2021 paper code

  • Explanation Consistency Training: Facilitating Consistency-based Semi-supervised Learning with Interpretability Tao Han, Wei-Wei Tu, Yu-Feng Li; AAAI 2021 paper

  • Improving Deep Learning Interpretability by Saliency Guided Training Aya Abdelsalam Ismail, Hector Corrada Bravo, Soheil Feizi; NeurIPS 2021 paper code

  • Generating Deep Networks Explanations with Robust Attribution Alignment Guohang Zeng, Yousef Kowsar, Sarah Erfani, James Bailey; ACML 2021 paper


Machine teaching with explanations:

  • Interpretable Machine Teaching via Feature Feedback Shihan Su, Yuxin Chen, Oisin Mac Aodha, Pietro Perona, Yisong Yue; Workshop on Teaching Machines, Robots, and Humans 2017 paper

  • Teaching Categories to Human Learners with Visual Explanations Oisin Mac Aodha, Shihan Su, Yuxin Chen, Pietro Perona, Yisong Yue; CVPR 2018 paper Notes: this is *inverse* teaching, i.e., machine teaches human.


Applications of explanation-guided training:

  • Improving a neural network model by explanation-guided training for glioma classification based on MRI data Frantisek Sefcik, Wanda Benesova; arXiv 2021 paper Notes: based on layer-wise relevance propagation.

Explanation-based learning, which focuses on logic-based formalisms and learning strategies:

  • Explanation-based generalization: A unifying view Tom Mitchell, Richard Keller, Smadar Kedar-Cabelli; MLJ 1986 paper

  • Explanation-based learning: An alternative view Gerald DeJong, Raymond Mooney; MLJ 1986 paper

  • Explanation-based learning: A survey of programs and perspectives Thomas Ellman; ACM Computing Surveys 1989 paper

  • Probabilistic explanation based learning Angelika Kimmig, Luc De Raedt, Hannu Toivonen; ECML 2007 paper

Injecting invariances / feature constraints into models:

  • Tangent Prop - A formalism for specifying selected invariances in an adaptive network Patrice Simard, Bernard Victorri, Yann Le Cun, John Denker; NeurIPS 1992 paper Notes: injects invariances into a neural net by regularizing its gradient; precursor to learning from gradient-based explanations (see the sketch after this list).

  • Training invariant support vector machines Dennis DeCoste, Bernhard Schölkopf; MLJ 2002 paper

  • The constrained weight space SVM: learning with ranked features Kevin Small, Byron Wallace, Carla Brodley, Thomas Trikalinos; ICML 2011 paper
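
The Tangent Prop idea mentioned above can be sketched as follows: the directional derivative of the network output along the tangent of a known invariance transformation is driven towards zero. A hypothetical PyTorch fragment (the `tangent` argument, e.g. a finite difference of a slightly rotated input, and all other names are illustrative):

```python
import torch
from torch.autograd.functional import jvp

def tangent_prop_penalty(model, x, tangent):
    """x: (B, D) inputs; tangent: (B, D) direction along which the
    output should stay constant (tangent of an invariance transform)."""
    # Jacobian-vector product = directional derivative of the model
    # output along the given tangent direction.
    _, directional_derivative = jvp(model, (x,), (tangent,), create_graph=True)
    return directional_derivative.pow(2).mean()
```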

Dual label-feature feedback:

  • Active learning with feedback on features and instances Hema Raghavan, Omid Madani, Rosie Jones; JMLR 2006 paper

  • An interactive algorithm for asking and incorporating feature feedback into support vector machines Hema Raghavan, James Allan; ACM SIGIR 2007 paper

  • Learning from labeled features using generalized expectation criteria Gregory Druck, Gideon Mann, Andrew McCallum; ACM SIGIR 2008 paper

  • Active learning by labeling features Gregory Druck, Burr Settles, Andrew McCallum; EMNLP 2009 paper

  • A unified approach to active dual supervision for labeling features and examples Josh Attenberg, Prem Melville, Foster Provost; ECML-PKDD 2010 paper

  • Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances Burr Settles; EMNLP 2011 paper

  • Learning from discriminative feature feedback Sanjoy Dasgupta, Akansha Dey, Nicholas Roberts, Sivan Sabato; NeurIPS 2018 paper

  • Robust Learning from Discriminative Feature Feedback Sanjoy Dasgupta, Sivan Sabato; AISTATS 2020 paper

  • Practical Benefits of Feature Feedback Under Distribution Shift Anurag Katakkar, Weiqin Wang, Clay Yoo, Zachary Lipton, Divyansh Kaushik; arXiv 2021 paper

Learning from rationales:

  • Using “annotator rationales” to improve machine learning for text categorization Omar Zaidan, Jason Eisner, Christine Piatko; NAACL 2007 paper

  • Modeling annotators: A generative approach to learning from annotator rationales Omar Zaidan, Jason Eisner; EMNLP 2008 paper

  • Active learning with rationales for text classification Manali Sharma, Di Zhuang, Mustafa Bilgic; NAACL 2015 paper

Counterfactual augmentation:

  • Learning The Difference That Makes A Difference With Counterfactually-Augmented Data Divyansh Kaushik, Eduard Hovy, Zachary Lipton; ICLR 2019 paper code

  • Explaining the Efficacy of Counterfactually Augmented Data Divyansh Kaushik, Amrith Setlur, Eduard H. Hovy, Zachary Lipton; ICLR 2021. paper code

  • An Investigation of the (In)effectiveness of Counterfactually-augmented Data Nitish Joshi, He He; arXiv 2021 paper

Critiquing in recommenders:

  • Critiquing-based recommenders: survey and emerging trends Li Chen, Pearl Pu; User Modeling and User-Adapted Interaction 2012 paper

  • Coactive critiquing: Elicitation of preferences and features Stefano Teso, Paolo Dragone, Andrea Passerini; AAAI 2017 paper

Gray-box models:

  • Concept bottleneck models Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang; ICML 2020 paper
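
Concept bottleneck models make the intermediate explanation part of the architecture: the input is first mapped to human-interpretable concept predictions, and the label is predicted from those concepts alone, so both stages can be supervised and the concepts can be intervened on at test time. A minimal, hypothetical sketch (layer sizes and names are illustrative):

```python
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    def __init__(self, in_dim, n_concepts, n_classes):
        super().__init__()
        self.concept_net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                         nn.Linear(64, n_concepts))
        # The label head sees only the predicted concepts.
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = self.concept_net(x).sigmoid()  # trained with concept labels
        logits = self.label_net(concepts)         # trained with class labels
        return concepts, logits
```

At training time the concept predictions are supervised with concept annotations and the logits with class labels; at test time a user can overwrite individual concepts and observe the effect on the prediction.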

A selection of general resources on Explainable AI focusing on overviews, surveys, societal implications, and critiques:

  • Survey and critique of techniques for extracting rules from trained artificial neural networks Robert Andrews, Joachim Diederich, Alan B. Tickle; Knowledge-based systems 1995 page

  • Toward harnessing user feedback for machine learning Simone Stumpf, Vidya Rajaram, Lida Li, Margaret Burnett, Thomas Dietterich, Erin Sullivan, Russell Drummond, Jonathan Herlocker; IUI 2007 paper

  • The Mythos of Model Interpretability Zachary Lipton; CACM 2016 paper

  • A survey of methods for explaining black box models Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi; ACM Computing Surveys 2018 paper

  • Sanity checks for saliency maps Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim; NeurIPS 2018 paper code

  • Explanation in Artificial Intelligence: Insights from the Social Sciences Tim Miller; Artificial Intelligence, 2019 paper

  • Unmasking clever hans predictors and assessing what machines really learn Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Nature Communications 2019 paper

  • Interpretation of neural networks is fragile Amirata Ghorbani, Abubakar Abid, James Zou; AAAI 2019 paper

  • A Benchmark for Interpretability Methods in Deep Neural Networks Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim; NeurIPS 2019 paper code

  • Is Attention Interpretable? Sofia Serrano, Noah A. Smith; ACL 2019 paper

  • Attention is not Explanation Sarthak Jain, and Byron C. Wallace; ACL 2019 paper

  • Attention is not not Explanation Sarah Wiegreffe, and Yuval Pinter; EMNLP-IJCNLP 2019 paper

  • The (un)reliability of saliency methods Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim; Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 2019 paper

  • Explanations can be manipulated and geometry is to blame Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel; NeurIPS 2019 paper

  • Fooling Neural Network Interpretations via Adversarial Model Manipulation Juyeon Heo, Sunghwan Joo, and Taesup Moon; NeurIPS 2019 paper

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead Cynthia Rudin; Nature Machine Intelligence 2019 page

  • The Principles and Limits of Algorithm-in-the-loop Decision Making Ben Green, Yiling Chen; PACM HCI 2019 paper

  • Shortcut learning in deep neural networks Robert Geirhos, Jorn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann; Nature Machine Intelligence 2020 page

  • When Explanations Lie: Why Many Modified BP Attributions Fail Leon Sixt, Maximilian Granz, Tim Landgraf. ICML 2020 paper

  • The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? Jasmijn Bastings, Katja Filippova; Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020 paper

  • Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models Christopher Grimsley, Elijah Mayfield, Julia Bursten; Language Resources and Evaluation Conference 2020 paper

  • AI for radiographic COVID-19 detection selects shortcuts over signal Alex DeGrave, Joseph Janizek, Su-In Lee; Nature Machine Intelligence 2021 paper code

  • How Well do Feature Visualizations Support Causal Understanding of CNN Activations? Roland Zimmermann, Judy Borowski, Robert Geirhos, Matthias Bethge, Thomas Wallis, Wieland Brendel; arXiv 2021 paper

  • Post hoc explanations may be ineffective for detecting unknown spurious correlation Julius Adebayo, Michael Muelly, Harold Abelson, and Been Kim; ICLR 2022 paper code


Related Lists


Not Yet Sorted

  • Multimodal explanations: Justifying decisions and pointing to the evidence Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach; CVPR 2018 paper

  • Learning Deep Attribution Priors Based On Prior Knowledge Ethan Weinberger, Joseph Janizek, Su-In Lee; NeurIPS 2020 paper


TODO

  • Crawl & reference work on NLP.

Comments

This list is directly inspired by all the awesome awesome lists out there!
