Abstract

We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error consisting of symmetric policies, and a non-stationary, broken-symmetry pseudo-solution with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.

1. Introduction

Within the next few years, we expect electronic commerce to be an important multi-agent domain in which reinforcement learning will find numerous applications. One such application is automated dynamic pricing...
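To make the setting concrete, the sketch below shows two ε-greedy Q-learning pricebots competing on a discrete price grid, where each agent's state is the opponent's most recent price and the cheaper seller captures the market (a tie splits it). This is an illustrative assumption-laden toy, not the authors' exact model: the price grid, unit cost, learning rate, discount factor, and exploration rate are all invented for the example. Because each agent's environment contains the other learner, the temporal-difference (Bellman) error need not vanish, which is the effect the abstract calls pseudo-convergence.

```python
# Minimal sketch (not the paper's exact model): two Q-learning "pricebots"
# repeatedly price a commodity. State = opponent's last price index;
# action = own price index. All numeric parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

PRICES = np.linspace(0.1, 1.0, 10)   # assumed discrete price grid
COST = 0.0                           # assumed unit cost
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.05   # learning rate, discount, exploration

n = len(PRICES)
# Q[agent][opponent_price_index, own_price_index]
Q = [np.zeros((n, n)), np.zeros((n, n))]

def profit(own, other):
    """Cheaper seller takes the whole market; a tie splits it evenly."""
    if PRICES[own] < PRICES[other]:
        return PRICES[own] - COST
    if own == other:
        return 0.5 * (PRICES[own] - COST)
    return 0.0

state = [int(rng.integers(n)), int(rng.integers(n))]  # opponent's last price
for t in range(50_000):
    # Each agent picks a price epsilon-greedily given the opponent's last price.
    acts = []
    for i in range(2):
        if rng.random() < EPS:
            acts.append(int(rng.integers(n)))
        else:
            acts.append(int(np.argmax(Q[i][state[i]])))
    # Standard one-step Q-update for each agent; the opponent's new price
    # becomes the next state, so each agent faces a moving target.
    for i in range(2):
        r = profit(acts[i], acts[1 - i])
        next_state = acts[1 - i]
        td_target = r + GAMMA * Q[i][next_state].max()
        Q[i][state[i], acts[i]] += ALPHA * (td_target - Q[i][state[i], acts[i]])
        state[i] = next_state
```

In such a sketch one can track the running average of |td_target − Q(s, a)| for each agent: a symmetric stationary fixed point would drive this Bellman error toward zero, whereas a persistent asymmetric price-cutting cycle would leave it small but non-zero, which is the qualitative distinction the abstract draws.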

Keywords

Artificial intelligence, Mathematics, Computer science

Publication Info

Year: 2000
Type: article
Pages: 463-470
Citations: 24
Access: Closed

Citation Metrics

OpenAlex: 24

Cite This

Jeffrey O. Kephart, Gerald Tesauro (2000). Pseudo-convergent Q-Learning by Competitive Pricebots. pp. 463-470.