Abstract

The problem of selecting the best subset or subsets of independent variables in a multiple linear regression analysis is twofold. The first, and most important, problem is the development of a criterion for choosing between two contending subsets. If the number of independent variables is large, applying such a criterion to all possible subsets may not be economically feasible, so the second problem is reducing the computational effort. This paper addresses the second question, using the C_p statistic of Mallows as the basic criterion for comparing two regressions. A procedure is developed that indicates 'good' regressions with a minimum of computation.
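For reference, the C_p statistic is usually written C_p = RSS_p / s^2 - (n - 2p), where RSS_p is the residual sum of squares of a subset model with p parameters (intercept included), s^2 is the residual mean square of the full model with all r candidate variables, and n is the number of observations. The sketch below is not the Hocking-Leslie procedure. It is the brute-force baseline that the abstract calls potentially infeasible: it fits all 2^r subsets and ranks them by C_p. The helper names (`solve`, `rss`, `mallows_cp_all_subsets`) are hypothetical, and the fit uses plain normal equations to keep the example dependency-free.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the columns in cols."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y for the chosen design matrix.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def mallows_cp_all_subsets(X, y):
    """Return (Cp, subset) for every subset of the r candidate variables, smallest Cp first."""
    n, r = len(X), len(X[0])
    s2 = rss(X, y, range(r)) / (n - r - 1)  # full-model residual mean square
    results = []
    for size in range(r + 1):
        for cols in combinations(range(r), size):
            p = size + 1  # number of parameters, counting the intercept
            cp = rss(X, y, cols) / s2 - (n - 2 * p)
            results.append((cp, cols))
    return sorted(results)
```

By construction the full model always has C_p = r + 1. Subsets whose C_p is small and close to their own p are the 'good' regressions. The double loop visits every one of the 2^r subsets, which is exactly the cost a reduction procedure like the paper's is meant to avoid.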

Keywords

Selection (genetic algorithm), Statistics, Regression, Regression analysis, Computer science, Mathematics, Machine learning

Related Publications

The Choice of Variables in Multiple Regression

Summary This paper is concerned with the analysis of data from a multiple regression of a single variable, y, on a set of independent variables, x 1,x 2,...,xr. It is argued tha...

1968 Journal of the Royal Statistical Soci... 234 citations

Publication Info

Year: 1967
Type: article
Volume: 9
Issue: 4
Pages: 531-531
Citations: 77
Access: Closed

Cite This

R. R. Hocking, R. N. Leslie (1967). Selection of the Best Subset in Regression Analysis. Technometrics, 9(4), 531-531. https://doi.org/10.2307/1266192

Identifiers

DOI
10.2307/1266192