Abstract
We aim to produce predictive models that are not only accurate, but are also\ninterpretable to human experts. Our models are decision lists, which consist of\na series of if...then... statements (e.g., if high blood pressure, then stroke)\nthat discretize a high-dimensional, multivariate feature space into a series of\nsimple, readily interpretable decision statements. We introduce a generative\nmodel called Bayesian Rule Lists that yields a posterior distribution over\npossible decision lists. It employs a novel prior structure to encourage\nsparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy\non par with the current top algorithms for prediction in machine learning. Our\nmethod is motivated by recent developments in personalized medicine, and can be\nused to produce highly accurate and interpretable medical scoring systems. We\ndemonstrate this by producing an alternative to the CHADS$_2$ score, actively\nused in clinical practice for estimating the risk of stroke in patients that\nhave atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more\naccurate.\n
Keywords
Affiliated Institutions
Related Publications
Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)
Boosting is one of the most important recent developments in\nclassification methodology. Boosting works by sequentially applying a\nclassification algorithm to reweighted versi...
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
In an industrial maintenance context, degradation diagnosis is the problem of determining the current level of degradation of operating machines based on measurements. With the ...
Path Aggregation Network for Instance Segmentation
The way that information propagates in neural networks is of great importance. In this paper, we propose Path Aggregation Network (PANet) aiming at boosting information flow in ...
A general approach for developing system‐specific functions to score protein–ligand docked complexes using support vector inductive logic programming
Abstract Despite the increased recent use of protein–ligand and protein–protein docking in the drug discovery process due to the increases in computational power, the difficulty...
IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies
Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to fin...
Publication Info
- Year
- 2015
- Type
- article
- Volume
- 9
- Issue
- 3
- Citations
- 743
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1214/15-aoas848
- arXiv
- 1511.01644