Abstract

In this paper, we introduce PILCO, a practical, data-efficient, model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference, and policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
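The loop the abstract describes — fit a probabilistic dynamics model, evaluate the policy in closed form by propagating state uncertainty through the model, then follow the policy gradient — can be sketched on a toy one-dimensional problem. Everything below is an illustrative assumption, not the paper's implementation: a linear-Gaussian model stands in for the Gaussian-process dynamics model, and finite differences stand in for the analytic gradients.

```python
import numpy as np

# Toy sketch of model-based policy search in the spirit of the abstract.
# Assumptions: a linear-Gaussian model replaces the paper's Gaussian
# process, and finite differences replace its analytic policy gradients.

rng = np.random.default_rng(0)

def true_dynamics(x, u):
    """Unknown system the agent interacts with."""
    return 0.9 * x + 0.5 * u + 0.05 * rng.standard_normal()

def rollout(gain, n=30):
    """Collect (x, u, x') triples with the current linear policy."""
    xs, us, xns = [], [], []
    x = 1.0
    for _ in range(n):
        u = gain * x + 0.3 * rng.standard_normal()  # exploration noise
        xn = true_dynamics(x, u)
        xs.append(x); us.append(u); xns.append(xn)
        x = xn
    return np.array(xs), np.array(us), np.array(xns)

def fit_model(xs, us, xns):
    """Probabilistic dynamics model x' ~ N(a*x + b*u, s2), by least squares."""
    A = np.stack([xs, us], axis=1)
    theta, *_ = np.linalg.lstsq(A, xns, rcond=None)
    s2 = np.var(xns - A @ theta) + 1e-6  # residual variance = model uncertainty
    return theta, s2

def expected_cost(gain, theta, s2, horizon=20):
    """Closed-form policy evaluation: propagate the state mean and variance
    through the learned model, summing the expected quadratic cost
    E[x^2] = m^2 + v, so model uncertainty enters the objective."""
    a, b = theta
    m, v, cost = 1.0, 0.0, 0.0
    for _ in range(horizon):
        c = a + b * gain                 # closed-loop coefficient
        m, v = c * m, c * c * v + s2     # exact for a linear-Gaussian model
        cost += m * m + v
    return cost

gain = 0.0
for trial in range(5):                   # learning from scratch in a few trials
    data = rollout(gain)
    theta, s2 = fit_model(*data)
    for _ in range(50):                  # policy improvement on the model
        eps = 1e-4
        g = (expected_cost(gain + eps, theta, s2)
             - expected_cost(gain - eps, theta, s2)) / (2 * eps)
        gain -= 0.05 * g

print(f"learned gain: {gain:.2f}")       # roughly cancels the dynamics
```

Because the stand-in model is linear-Gaussian, the mean/variance propagation above is exact; with a Gaussian-process model the paper instead performs approximate inference (moment matching) to keep the long-term predictions Gaussian.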

Keywords

Computer science, Reinforcement learning, Inference, Probabilistic logic, Key (lock), Artificial intelligence, Machine learning, Policy learning, Scratch, Data modeling, Control (management), Approximate inference

Publication Info

Year: 2011
Type: article
Pages: 465-472
Citations: 1076
Access: Closed

Citation Metrics

1076 (OpenAlex)

Cite This

Marc Peter Deisenroth, Carl Edward Rasmussen (2011). PILCO: A Model-Based and Data-Efficient Approach to Policy Search. Scientific Repository (Petra Christian University), 465-472.