Abstract
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
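To make the setting concrete, below is a minimal sketch of the scheme the abstract describes: value iteration in which the lookup table is replaced by a fitted function approximator. This is not the paper's experimental setup or the Grow-Support algorithm; the chain MDP, the linear feature basis, and all names are illustrative assumptions.

```python
# Fitted value iteration on a toy chain MDP: each sweep computes exact
# one-step Bellman targets, then replaces the value table with a
# least-squares linear fit to those targets. This fit step is where
# generalization (and, in less benign cases, divergence) enters.
import numpy as np

n_states = 5                     # deterministic chain; rightmost state is the goal
gamma = 0.9                      # discount factor
states = np.arange(n_states)

def bellman_backup(V):
    """One Bellman backup: step right toward the goal, reward -1 per move."""
    V_new = np.empty_like(V)
    for s in states:
        if s == n_states - 1:    # absorbing goal state
            V_new[s] = 0.0
        else:
            V_new[s] = -1.0 + gamma * V[s + 1]
    return V_new

# Feature matrix for the approximator: values constrained to be affine
# in the state index (an assumed, deliberately coarse basis).
Phi = np.column_stack([np.ones(n_states), states])

V = np.zeros(n_states)
for _ in range(50):
    targets = bellman_backup(V)                       # exact backup targets
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None) # fit approximator
    V = Phi @ w                                       # fitted values replace the table

print(V)  # compare with the exact values -(1 - gamma**k) / (1 - gamma),
          # where k is the distance from state s to the goal
```

Because the fitted values, not the exact backups, are fed into the next sweep, approximation error can compound across iterations; the paper's point is that this feedback loop can drive the value estimates, and hence the greedy policy, arbitrarily far from correct even in simple domains.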
Publication Info
- Year: 1994
- Type: article
- Volume: 7
- Pages: 369-376
- Citations: 506
- Access: Closed