Abstract
An efficient scheme is presented for a learning control problem of finite Markov chains with unknown dynamics, i.e. with unknown transition probabilities. The scheme is designed to optimize the asymptotic system performance and to be easy to apply to models with relatively many states and decisions. In this scheme, a control policy is determined each time by maximizing a simple performance criterion that explicitly incorporates a tradeoff between estimation of the unknown probabilities and control of the system. The policy can be determined easily even for large models, since the maximization can be greatly simplified by use of the policy-iteration method. It is proven that, with a suitable choice of control parameter values, this scheme becomes ε-optimal as well as optimal, in the sense that a relative frequency coefficient of making optimal decisions tends to the maximum.
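The policy-iteration step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's scheme: the abstract does not state the performance criterion or the estimation/control tradeoff term, so the sketch assumes a plain discounted-reward criterion applied to empirical transition estimates, with no exploration term. The function names (`estimate_transitions`, `evaluate_policy`, `policy_iteration`), the discount factor `gamma`, and the toy counts and rewards are all hypothetical.

```python
from typing import List, Tuple

Counts = List[List[List[int]]]    # counts[s][a][t]: observed s -> t transitions under action a
Probs = List[List[List[float]]]   # probs[s][a][t]: estimated transition probabilities
Rewards = List[List[float]]       # rewards[s][a]: assumed known one-step reward


def estimate_transitions(counts: Counts) -> Probs:
    """Empirical transition frequencies; unvisited (state, action) pairs fall back to uniform."""
    n = len(counts)
    probs: Probs = []
    for s in range(n):
        row = []
        for a_counts in counts[s]:
            total = sum(a_counts)
            if total == 0:
                row.append([1.0 / n] * n)
            else:
                row.append([c / total for c in a_counts])
        probs.append(row)
    return probs


def evaluate_policy(policy: List[int], probs: Probs, rewards: Rewards,
                    gamma: float, sweeps: int = 500) -> List[float]:
    """Approximate the discounted value of a fixed policy by repeated backups."""
    n = len(probs)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [rewards[s][policy[s]]
             + gamma * sum(p * v[t] for t, p in enumerate(probs[s][policy[s]]))
             for s in range(n)]
    return v


def policy_iteration(probs: Probs, rewards: Rewards,
                     gamma: float = 0.9) -> Tuple[List[int], List[float]]:
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    n = len(probs)
    policy = [0] * n
    while True:
        v = evaluate_policy(policy, probs, rewards, gamma)
        new_policy = []
        for s in range(n):
            q = [rewards[s][a] + gamma * sum(p * v[t] for t, p in enumerate(probs[s][a]))
                 for a in range(len(probs[s]))]
            best = max(range(len(q)), key=lambda a: q[a])
            # Keep the current action on (near-)ties so the loop cannot oscillate.
            if q[policy[s]] >= q[best] - 1e-12:
                best = policy[s]
            new_policy.append(best)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Toy two-state, two-action model (hypothetical data).
counts = [[[10, 0], [0, 10]],   # state 0: action 0 stays, action 1 moves to state 1
          [[0, 10], [10, 0]]]   # state 1: action 0 stays, action 1 moves to state 0
rewards = [[0.0, 0.0], [1.0, 0.0]]  # only staying in state 1 pays a reward

probs = estimate_transitions(counts)
policy, values = policy_iteration(probs, rewards, gamma=0.9)
print(policy)  # [1, 0]: move to state 1, then stay there
```

In a learning-control loop, the counts would be updated after every observed transition and the policy recomputed from the new estimates. A scheme of the kind the abstract describes would additionally bias the criterion toward poorly estimated transitions, which this sketch omits.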
Related Publications
Decentralized learning in finite Markov chains
The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralize...
Stochastic Petri net representation of discrete event simulations
In the context of discrete event simulation, the marking of a stochastic Petri net (SPN) corresponds to the state of the underlying stochastic process of the simulation and the ...
A unifying maximum-likelihood view of cumulant and polyspectral measures for non-Gaussian signal classification and estimation
Classification and estimation of non-Gaussian signals observed in additive Gaussian noise of unknown covariance are addressed using cumulants or polyspectra. By integrating idea...
The performance of adaptive window flow controls in a dynamic load environment
The behavior of window flow control schemes that adapt to changing network conditions is studied. A dynamic window scheme, which adjusts the window size based on explicit networ...
A tutorial on hidden Markov models and selected applications in speech recognition
This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of i...
Publication Info
- Year: 1988
- Type: article
- Volume: 18
- Issue: 5
- Pages: 677-684
- Citations: 34
- Access: Closed
Identifiers
- DOI: 10.1109/21.21595