Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables

Abstract

The use of automated subset search algorithms is reviewed and issues concerning model selection and selection criteria are discussed. In addition, a Monte Carlo study is reported which presents data regarding the frequency with which authentic and noise variables are selected by automated subset algorithms. In particular, the effects of the correlation between predictor variables, the number of candidate predictor variables, the size of the sample, and the level of significance for entry and deletion of variables were studied for three automated subset algorithms: BACKWARD ELIMINATION, FORWARD SELECTION, and STEPWISE. Results indicated that: (1) the degree of correlation between the predictor variables affected the frequency with which authentic predictor variables found their way into the final model; (2) the number of candidate predictor variables affected the number of noise variables that gained entry to the model; (3) the size of the sample was of little practical importance in determining the number of authentic variables contained in the final model; and (4) the population multiple coefficient of determination could be faithfully estimated by adopting a statistic that is adjusted by the total number of candidate predictor variables rather than the number of variables in the final model.
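The kind of Monte Carlo experiment described above can be illustrated with a minimal sketch. It is not the authors' design: the abstract does not report the simulation settings, so the sample size, number of candidates, correlation, effect size, and replication count below are hypothetical. Two further assumptions are made. A fixed F-to-enter threshold (`f_enter`) stands in for a significance level for entry, and `adjusted_r2` uses the usual shrinkage formula with the number of predictors `k` supplied by the caller, so that passing the total number of candidate predictors mirrors result (4).

```python
import random

def forward_select(X, y, f_enter=4.0):
    """Forward selection over predictor columns X (one list per variable).

    f_enter is an assumed fixed F-to-enter cutoff, used here in place of
    a significance level for entry.
    """
    n = len(y)
    dot = lambda a, b: sum(s * t for s, t in zip(a, b))
    center = lambda v: [e - sum(v) / len(v) for e in v]
    r = center(y)                      # centering absorbs the intercept
    cols = {j: center(c) for j, c in enumerate(X)}
    tss = dot(r, r)
    rss, selected = tss, []
    while cols and rss > 1e-12 * tss:
        # Drop in residual sum of squares from adding each remaining candidate.
        gains = {}
        for j, c in cols.items():
            cc = dot(c, c)
            gains[j] = dot(c, r) ** 2 / cc if cc > 1e-12 else 0.0
        best = max(gains, key=gains.get)
        df_resid = n - 1 - (len(selected) + 1)
        if df_resid <= 0:
            break
        new_rss = max(rss - gains[best], 1e-12 * tss)
        if gains[best] / (new_rss / df_resid) < f_enter:
            break
        q = cols.pop(best)
        qq = dot(q, q)
        # Residualize the response and the remaining candidates on the entrant.
        b = dot(q, r) / qq
        r = [ri - b * qi for ri, qi in zip(r, q)]
        for j in list(cols):
            g = dot(q, cols[j]) / qq
            cols[j] = [ci - g * qi for ci, qi in zip(cols[j], q)]
        selected.append(best)
        rss = new_rss
    r2 = 1.0 - rss / tss if tss > 0 else 0.0
    return selected, r2

def adjusted_r2(r2, n, k):
    """Usual shrinkage adjustment; pass k = total candidates to mirror result (4)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def simulate(n=60, p=12, n_authentic=3, rho=0.3, beta=0.5, reps=200, seed=0):
    """Mean number of authentic and noise variables selected per replication.

    All default values are hypothetical. The first n_authentic columns are
    the authentic predictors; the rest are noise.
    """
    rng = random.Random(seed)
    auth_total = noise_total = 0
    for _ in range(reps):
        X = [[] for _ in range(p)]
        y = []
        for _ in range(n):
            z = rng.gauss(0, 1)        # shared factor: pairwise correlation rho
            row = [rho ** 0.5 * z + (1 - rho) ** 0.5 * rng.gauss(0, 1)
                   for _ in range(p)]
            for j in range(p):
                X[j].append(row[j])
            y.append(beta * sum(row[:n_authentic]) + rng.gauss(0, 1))
        selected, _ = forward_select(X, y)
        auth_total += sum(j < n_authentic for j in selected)
        noise_total += sum(j >= n_authentic for j in selected)
    return auth_total / reps, noise_total / reps
```

Calling `simulate(p=12)` and `simulate(p=24)` with the other arguments held fixed gives a rough view of how the number of candidate predictors relates to noise-variable entry under these assumed settings; varying `rho` or `n` probes the other factors named in the abstract.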

Keywords

Statistics, Selection (genetic algorithm), Mathematics, Monte Carlo method, Statistic, Feature selection, Sample size determination, Noise (video), Sample (material), Population, Algorithm, Computer science, Artificial intelligence

Affiliated Institutions

University of Manitoba, Canada

Publication Info

Year: 1992
Type: article
Volume: 45
Issue: 2
Pages: 265-282
Citations: 760
Access: Closed

Cite This

APA Style

Shelley Derksen, H. J. Keselman (1992). Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology, 45(2), 265-282. https://doi.org/10.1111/j.2044-8317.1992.tb00992.x

Identifiers

DOI: 10.1111/j.2044-8317.1992.tb00992.x