Abstract

Significance: The recent surge in interpretability research has led to confusion on numerous fronts. In particular, it is unclear what it means to be interpretable and how to select, evaluate, or even discuss methods for producing interpretations of machine-learning models. We aim to clarify these concerns by defining interpretable machine learning and constructing a unifying framework for existing methods which highlights the underappreciated role played by human audiences. Within this framework, methods are organized into 2 classes: model-based and post hoc. To provide guidance in selecting and evaluating interpretation methods, we introduce 3 desiderata: predictive accuracy, descriptive accuracy, and relevancy. Using our framework, we review existing work, grounded in real-world studies which exemplify our desiderata, and suggest directions for future work.
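
As a concrete illustration of the two classes named above, the following minimal Python sketch (using scikit-learn; not code from the paper) contrasts a model-based interpretation, reading the coefficients of a sparse linear model, with a post hoc interpretation, computing permutation importances for an already-fit black-box model. The dataset, model choices, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (assumptions, not the authors' code): model-based vs. post hoc interpretation.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model-based: the fitted model is interpretable by construction;
# its sparse coefficients (many exactly zero) are the interpretation.
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
print("Model-based (Lasso coefficients):")
for name, coef in zip(X.columns, lasso.coef_):
    print(f"  {name}: {coef:.2f}")

# Post hoc: interpret a black-box model after it has been fit,
# here by measuring how much shuffling each feature degrades test performance.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
print("Post hoc (permutation importance of a random forest):")
for name, imp in zip(X.columns, result.importances_mean):
    print(f"  {name}: {imp:.3f}")
```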

Keywords

Interpretability, Computer science, Artificial intelligence, Categorization, Machine learning, Context (archaeology), Interpretation (philosophy), Modularity (biology), Vocabulary, Data science, Focus (optics)

Publication Info

Year: 2019
Type: article
Volume: 116
Issue: 44
Pages: 22071-22080
Citations: 1865
Access: Closed

Citation Metrics

OpenAlex citations: 1865
Influential citations: 51

Cite This

William J. Murdoch, Chandan Singh, Karl Kumbier, et al. (2019). Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44), 22071-22080. https://doi.org/10.1073/pnas.1900654116

Identifiers

DOI: 10.1073/pnas.1900654116
PMID: 31619572
PMCID: PMC6825274
arXiv: 1901.04592

Data Quality

Data completeness: 88%