Abstract

The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains. A new result on convergence in identical payoff games with a unique equilibrium point is also presented.
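The abstract does not spell out the learning scheme in detail. The sketch below is a rough illustration, not the authors' algorithm: it assumes one linear reward-inaction style learning automaton per state, with the average reward accumulated by the chain between successive visits to that state used as the automaton's feedback signal. The chain, reward values, step size, and feedback normalization are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action chain (illustrative values only).
# P[s, a] is the transition distribution out of state s under action a;
# R[s, a] is the one-step reward, assumed to lie in [0, 1].
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.6, 0.4], [0.1, 0.9]]])
R = np.array([[0.9, 0.2],
              [0.4, 1.0]])

n_states, n_actions = R.shape
step = 0.01                                   # "sufficiently small" learning step

# One learning automaton per state: a probability vector over its actions.
p = np.full((n_states, n_actions), 1.0 / n_actions)

last_visit = np.zeros(n_states, dtype=int)    # time of each automaton's last activation
last_action = np.zeros(n_states, dtype=int)   # action it chose then
cum_reward = np.zeros(n_states)               # reward accrued since that activation
visited = np.zeros(n_states, dtype=bool)

s, total, T = 0, 0.0, 100_000
for t in range(T):
    if visited[s]:
        # Feedback: average reward per step since this automaton last acted
        # (clipped to [0, 1]; an assumed normalization, not taken from the paper).
        beta = np.clip(cum_reward[s] / max(t - last_visit[s], 1), 0.0, 1.0)
        onehot = np.eye(n_actions)[last_action[s]]
        # Reward-inaction style update: shift probability toward the action
        # chosen at the last activation, in proportion to the feedback.
        p[s] += step * beta * (onehot - p[s])
    # Activate the automaton: sample a new action and reset its accumulator.
    a = rng.choice(n_actions, p=p[s])
    last_visit[s], last_action[s] = t, a
    cum_reward[s], visited[s] = 0.0, True
    r = R[s, a]
    cum_reward += r                           # every automaton accumulates the reward
    total += r
    s = rng.choice(n_states, p=P[s, a])       # chain moves to the next state

print("average reward per step:", total / T)
print("action probabilities:\n", p.round(3))
```

With a sufficiently small step size, the action probabilities at each state should concentrate on the actions of the policy that maximizes the long-run expected reward per step, in line with the convergence result stated in the abstract.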

Keywords

Stochastic game, Markov chain, Markov decision process, Convergence (economics), Computer science, Mathematical optimization, Markov process, Examples of Markov chains, Ergodic theory, Q-learning, Mathematical economics, Markov kernel, Variable-order Markov model, Markov model, Reinforcement learning, Mathematics, Artificial intelligence, Machine learning, Economics

Publication Info

Year: 1986
Type: article
Volume: 31
Issue: 6
Pages: 519-526
Citations: 100
Access: Closed

Citation Metrics

100 citations (source: OpenAlex)

Cite This

Richard M. Wheeler, Kumpati S. Narendra (1986). Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31(6), 519-526. https://doi.org/10.1109/tac.1986.1104342

Identifiers

DOI
10.1109/tac.1986.1104342