Abstract

An efficient scheme is presented for the learning control of finite Markov chains with unknown dynamics, i.e., with unknown transition probabilities. The scheme is designed to optimize asymptotic system performance and to be easily applicable to models with relatively many states and decisions. At each step a control policy is determined by maximizing a simple performance criterion that explicitly incorporates a trade-off between estimation of the unknown probabilities and control of the system. The policy determination remains tractable even for large models, since the maximization can be greatly simplified by the policy-iteration method. It is proven that, with a suitable choice of control parameter values, the scheme becomes ε-optimal as well as optimal, in the sense that the relative frequency coefficient of making optimal decisions tends to its maximum.
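The abstract's idea — run policy iteration on estimated transition probabilities while rewarding decisions that also improve the estimates — can be illustrated with a minimal sketch. Note this is not the paper's exact criterion: the smoothing, the `beta / sqrt(n)` exploration bonus, and the function names here are illustrative assumptions.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95):
    """Standard policy iteration for a finite MDP.
    P: (A, S, S) transition probabilities; r: (A, S) rewards."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S)]   # (S, S) rows under current policy
        r_pi = r[policy, np.arange(S)]   # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy action per state.
        q = r + gamma * P @ v            # (A, S)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

def exploratory_policy(counts, r, beta=0.5, gamma=0.95):
    """Policy from estimated transitions plus an exploration bonus that
    shrinks as (action, state) visit counts grow -- a hypothetical
    estimation/control trade-off, not the authors' exact criterion.
    counts: (A, S, S) observed transition counts."""
    A, S, _ = counts.shape
    n_as = counts.sum(axis=2)                        # visits to (a, s)
    P_hat = (counts + 1) / (n_as + S)[:, :, None]    # Laplace-smoothed estimate
    bonus = beta / np.sqrt(n_as + 1)                 # estimation incentive
    return policy_iteration(P_hat, r + bonus, gamma)
```

As visit counts grow, `P_hat` converges to the true probabilities and the bonus vanishes, so the computed policy approaches the one that policy iteration would return under the true model.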

Keywords

Markov chain, Maximization, Scheme (mathematics), Mathematical optimization, Control (management), Optimal control, Computer science, Markov decision process, Iterative learning control, Simple (philosophy), Mathematics, Markov process, Applied mathematics, Artificial intelligence, Statistics, Machine learning

Publication Info

Year
1988
Type
article
Volume
18
Issue
5
Pages
677-684
Citations
34
Access
Closed

Citation Metrics

34 (OpenAlex)

Cite This

Mitsuo Satō, K. Abe, Hiroshi Takeda (1988). Learning control of finite Markov chains with an explicit trade-off between estimation and control. IEEE Transactions on Systems, Man, and Cybernetics, 18(5), 677-684. https://doi.org/10.1109/21.21595

Identifiers

DOI
10.1109/21.21595