Abstract
A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximator such as a neural net. Although this has been successful in the domain of backgammon, there is no guarantee of convergence. In this paper, we show that the combination of dynamic programming and function approximation is not robust, and in even very benign cases, may produce an entirely wrong policy. We then introduce Grow-Support, a new algorithm which is safe from divergence yet can still reap the benefits of successful generalization.
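To make the setting concrete, below is a minimal sketch of the scheme the abstract describes: value iteration in which the lookup table is replaced by a fitted function approximator. This is not the paper's experimental setup or the Grow-Support algorithm; the chain MDP, the linear feature basis, and all names are illustrative assumptions.

```python
# Fitted value iteration on a toy chain MDP: each sweep computes exact
# one-step Bellman targets, then replaces the value table with a
# least-squares linear fit to those targets. This fit step is where
# generalization (and, in less benign cases, divergence) enters.
import numpy as np

n_states = 5                     # deterministic chain; rightmost state is the goal
gamma = 0.9                      # discount factor
states = np.arange(n_states)

def bellman_backup(V):
    """One Bellman backup: step right toward the goal, reward -1 per move."""
    V_new = np.empty_like(V)
    for s in states:
        if s == n_states - 1:    # absorbing goal state
            V_new[s] = 0.0
        else:
            V_new[s] = -1.0 + gamma * V[s + 1]
    return V_new

# Feature matrix for the approximator: values constrained to be affine
# in the state index (an assumed, deliberately coarse basis).
Phi = np.column_stack([np.ones(n_states), states])

V = np.zeros(n_states)
for _ in range(50):
    targets = bellman_backup(V)                       # exact backup targets
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None) # fit approximator
    V = Phi @ w                                       # fitted values replace the table

print(V)  # compare with the exact values -(1 - gamma**k) / (1 - gamma),
          # where k is the distance from state s to the goal
```

Because the fitted values, not the exact backups, are fed into the next sweep, approximation error can compound across iterations; the paper's point is that this feedback loop can drive the value estimates, and hence the greedy policy, arbitrarily far from correct even in simple domains.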
Publication Info
- Year: 1994
- Type: article
- Volume: 7
- Pages: 369-376
- Citations: 506
- Access: Closed