Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)

Leo Breiman

doi:10.1214/ss/1009213726

Abstract

There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

Keywords

Computer science, Data set, Field (mathematics), Statistical model, Set (abstract data type), Range (aeronautics), Data science, Statistical theory, Econometrics, Data mining, Machine learning, Artificial intelligence, Statistics, Mathematics

Related Publications

Statistical modeling: The two cultures

Leo Breiman

Abstract. There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The ...

2001 1341 citations

Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion)

Aki Vehtari, Andrew Gelman, Daniel Simpson +2 more

Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this...

2020 Bayesian Analysis 1245 citations

A multivariate technique for multiply imputing missing values using a sequence of regression models

Trivellore E. Raghunathan, James M. Lepkowski, John Van Hoewyk +1 more

This article describes and evaluates a procedure for imputing missing values for a relatively complex data structure when the data are missing at random. The imputations are obt...

2001 Survey methodology 1994 citations

Arcing classifier (with discussion and a rejoinder by the author)

Leo Breiman

Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective is bag...

1998 The Annals of Statistics 1088 citations

Maximum Likelihood Estimation and Model Selection in Contingency Tables with Missing Data

Camil Fuchs

Abstract In many studies the values of one or more variables are missing for subsets of the original sample. This article focuses on the problem of obtaining maximum likelihood ...

1982 Journal of the American Statistical A... 146 citations

Publication Info

Year: 2001
Type: article
Volume: 16
Issue: 3
Citations: 4037
Access: Closed

Citation Metrics

4037

OpenAlex

Cite This

APA Style

                            
Leo Breiman (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3). https://doi.org/10.1214/ss/1009213726

Identifiers

DOI: 10.1214/ss/1009213726