Abstract

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
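As a concrete illustration of the scheme the abstract critiques, the sketch below runs value iteration on a toy chain MDP with the lookup table replaced by a fitted linear approximator. The MDP, the feature map, and all names are assumptions made up for illustration; this is the generic fitted-value-iteration setup, not the paper's Grow-Support algorithm.

```python
import numpy as np

# Toy deterministic chain MDP: states 0..n-1, actions move left/right,
# reward 1 for first reaching the rightmost (absorbing) state.
n_states, gamma = 10, 0.9
states = np.arange(n_states)
actions = (-1, +1)

def step(s, a):
    if s == n_states - 1:              # absorbing goal state
        return s, 0.0
    s2 = min(max(s + a, 0), n_states - 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

def features(s):
    # Hypothetical feature map: quadratic in the normalized state.
    x = s / (n_states - 1)
    return np.array([1.0, x, x * x])

w = np.zeros(3)                        # weights of the linear approximator
V = lambda s: features(s) @ w          # V replaces the DP lookup table

for sweep in range(50):
    # Bellman backup at every state, bootstrapping from the current fit...
    targets = np.array([
        max(r + gamma * V(s2) for s2, r in (step(s, a) for a in actions))
        for s in states
    ])
    # ...then refit the approximator to the backed-up values. This fitting
    # step is where the convergence guarantees of tabular DP are lost.
    X = np.stack([features(s) for s in states])
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)

print(np.round([V(s) for s in states], 3))
```

In this toy chain the least-squares fit happens to behave, but the fitting step is a projection that need not be a contraction, which is why the combination can diverge, or imply an entirely wrong policy, on other equally benign problems.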

Keywords

Reinforcement learning, Generalization, Bellman equation, Computer science, Curse of dimensionality, Dynamic programming, Function approximation, Convergence, Divergence, Function, Domain, Mathematical optimization, Artificial intelligence, Markov decision process, Artificial neural network, Algorithm, Mathematics, Markov process

Related Publications

Multivariate Smoothing Spline Functions

Given data $z_i = g(t_i) + \varepsilon_i, 1 \leqq i \leqq n$, where $g$ is the unknown function, the $t_i$ are known $d$-dimensional variables in a domain $\Omega$, and the $\v...

1984 SIAM Journal on Numerical Analysis 109 citations
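The snippet above describes the standard smoothing-spline regression setting. As a minimal sketch, assuming made-up one-dimensional data and SciPy's UnivariateSpline (the cited paper treats $d$-dimensional $t_i$), an estimate of the unknown $g$ can be recovered like so:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy samples of an unknown smooth function g: z_i = g(t_i) + eps_i.
# The true g and the noise level are invented for this illustration.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 50))
z = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, t.size)

# s trades off fidelity to the data against smoothness of the estimate;
# s=0 would interpolate the noise exactly.
g_hat = UnivariateSpline(t, z, s=t.size * 0.2 ** 2)

print(np.round(g_hat(np.linspace(0.0, 1.0, 5)), 3))
```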

Network In Network

Abstract: We propose a novel deep network structure called Network In Network (NIN) to enhance model discriminability for local patches within the receptive field. The conventional con...

2014 arXiv (Cornell University) 1037 citations
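NIN's central construction, the "mlpconv" layer, replaces the linear convolutional filter with a small per-patch MLP, which in practice can be expressed as a convolution followed by 1x1 convolutions. The sketch below is one such reading in PyTorch; the channel counts and layer depth are arbitrary choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MLPConv(nn.Module):
    """A convolution followed by 1x1 convolutions, i.e. a tiny MLP
    applied at every spatial location of the receptive field."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),  # per-pixel MLP layer
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

x = torch.randn(1, 3, 32, 32)
print(MLPConv(3, 16, 3)(x).shape)  # -> torch.Size([1, 16, 32, 32])
```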

Publication Info

Year: 1994
Type: article
Volume: 7
Pages: 369-376
Citations: 506
Access: Closed

Citation Metrics

506 citations (OpenAlex)

Cite This

Justin A. Boyan, Andrew Moore (1994). Generalization in Reinforcement Learning: Safely Approximating the Value Function. Advances in Neural Information Processing Systems, 7, 369-376.