Data Science Principles for Interpretable and Explainable AI
Pub. online: 18 September 2024
Type: Data Science Reviews
Open Access
Received: 17 May 2024
Accepted: 22 August 2024
Published: 18 September 2024
Abstract
Society’s capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed before their potential impacts are fully understood. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field. We first introduce precise vocabulary for discussing interpretability, such as the distinction between glass box and explainable models. We then explore connections to classical statistical and design principles, such as parsimony and the gulfs of interaction. Basic explainability techniques, including learned embeddings, integrated gradients, and concept bottlenecks, are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive data-driven systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at https://go.wisc.edu/3k1ewe.
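To give a concrete sense of one technique named above, the sketch below approximates integrated gradients for a toy differentiable model: each attribution is the input-minus-baseline difference scaled by the average gradient along the straight-line path from baseline to input. This is a minimal illustration only, not the paper's case study; the quadratic score function f, the finite-difference gradient, and all variable names are assumptions introduced here.

import numpy as np

def f(x):
    # Toy prediction function (assumed for illustration): a quadratic score of the inputs.
    return float((x ** 2).sum())

def grad_f(x, eps=1e-6):
    # Central-difference approximation to the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=50):
    # Riemann-sum approximation of the path integral of gradients from the
    # baseline to the input, scaled elementwise by (x - baseline).
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 2.0, -0.5])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
print(attr)        # per-feature attributions
print(attr.sum())  # approximately f(x) - f(baseline)

The final print illustrates the completeness property: the attributions approximately sum to the difference in model output between the input and the baseline.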
Supplementary material
Code to reproduce our simulation experiment can be found at https://go.wisc.edu/v623lq.