Learning control of finite Markov chains with an explicit trade-off between estimation and control
An efficient scheme is presented for the learning control of finite Markov chains with unknown dynamics, i.e., unknown transition probabilities. The scheme is designed...