Abstract
Recent reviews have dealt with the question of which variables to select and which to discard in multiple regression problems. Lindley (1968) emphasized that the method employed in any analysis should be related to the use intended for the finally fitted regression. In the report by Beale et al. (1967), the emphasis is on selecting the best subset for any specified number of retained independent variables. Here we will be concerned with pointing out the advantages of the variable selection scheme in which independent variables are successively discarded, one at a time, from the original full set. While these advantages are not unknown to workers in this field, they are not widely appreciated by the statistical community in general. For the purposes of this demonstration it is assumed that we are in the nonsingular case, so that the number of observations exceeds the number of regressor variables.

Let us begin by considering economy of effort. Suppose that we were using a step-up regression procedure, ignoring for the while its theoretical deficiencies (to be discussed later). We should then first fit k simple regressions, one for each of the k regressor variables considered, selecting the single most significant individual regressor variable. Having made this selection, we would proceed with k - 1 additional fits to determine which of the remaining variables, in conjunction with the first selected, yielded the greatest reduction in residual variation. This process is continued so as to provide a successive selection and ordering of variables. We may even require the ordering of all k variables, leaving for later decision what critical juncture is to be employed in determining which of the k variables to retain and which to reject; if we do so, we shall have made a total of k(k + 1)/2 fits, albeit fits that may have differed greatly in their degree of complexity.

A complete stepdown regression procedure, however, requires but k fits, as will now be indicated. Suppose we have done a multiple regression on all k variables and wish to consider the k possible multiple regressions on all sets of k - 1 variables, that is, where one variable has been deleted. The results for these k possible multiple regressions are implicit in the initial k-variable regression, provided we have secured the inverse matrix, or at least its diagonal, necessary for testing the significance of the fitted partial regression coefficients. The case …
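To make the fit-counting concrete, here is a minimal sketch of the complete stepdown procedure in Python/NumPy; the language, the function name stepdown_order, and the loop structure are illustrative choices, not the paper's own code. Each stage fits one regression on the surviving variables, reads the partial t-statistics off the diagonal of (X'X)^{-1}, and discards the least significant variable, so ordering all k variables costs only k fits.

```python
import numpy as np

def stepdown_order(X, y):
    """Order the columns of X from first-discarded to last-retained.

    One regression is fitted per stage; the single-deletion comparisons
    come free from the diagonal of the inverse matrix, so the whole
    ordering takes k fits rather than the k(k + 1)/2 of a step-up search.
    """
    remaining = list(range(X.shape[1]))
    order = []
    while len(remaining) > 1:
        Xs = X[:, remaining]                            # one fit per stage
        XtX_inv = np.linalg.inv(Xs.T @ Xs)
        b = XtX_inv @ Xs.T @ y                          # least-squares coefficients
        resid = y - Xs @ b
        s2 = resid @ resid / (len(y) - len(remaining))  # residual variance
        t = b / np.sqrt(s2 * np.diag(XtX_inv))          # partial t-statistics
        weakest = int(np.argmin(np.abs(t)))             # least significant variable
        order.append(remaining.pop(weakest))
    return order + remaining
```

The sketch assumes X already carries an intercept column (or that y and the columns of X have been centered) and that we are in the nonsingular case described above. Each pass of the loop is one of the k fits counted in the abstract; the k - 1 possible single-deletion regressions at each stage are never fitted explicitly, since their significance tests are implicit in the inverse diagonal.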
Related Publications
Selection of Variables for Fitting Equations to Data
Selecting a suitable equation to represent a set of multifactor data that was collected for other purposes in a plant, pilot-plant, or laboratory can be troublesome. If there ar...
A Comparison of Least Squares and Latent Root Regression Estimators
Multicollinearity among the columns of regressor variables is known to cause severe distortion of the least squares estimates of the parameters in a multiple linear regression ...
On the misuse of residuals in ecology: regression of residuals vs. multiple regression
Residuals from linear regressions are used frequently in statistical analysis, often for the purpose of controlling for unwanted effects in multivariable datasets. This paper ...
Inadmissibility of Maximum Likelihood Estimators in Some Multiple Regression Problems with Three or More Independent Variables
Consider a multiple regression problem in which the dependent variable and (3 or more) independent variables have a joint normal distribution. This problem was investigated some...
The Choice of Variables in Multiple Regression
Summary: This paper is concerned with the analysis of data from a multiple regression of a single variable, y, on a set of independent variables, x1, x2, ..., xr. It is argued tha...
Publication Info
- Year: 1970
- Type: article
- Volume: 12
- Issue: 3
- Pages: 621-625
- Citations: 261
- Access: Closed
Identifiers
- DOI: 10.1080/00401706.1970.10488701