Abstract
We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error, consisting of symmetric policies, and a non-stationary, broken-symmetry pseudosolution with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.

1. Introduction

Within the next few years, we expect electronic commerce to be an important multi-agent domain in which reinforcement learning will find numerous applications. One such application is automated dynamic pricing...
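The two-pricebot setup described in the abstract can be made concrete with a small simulation. The sketch below is an illustration only, not the paper's model: the price grid, unit cost, lowest-price-wins demand rule, greedy opponent-reply successor state, and learning constants are all assumptions introduced for the example. The `max_bellman_error` check at the end is one way to probe numerically whether a pair of Q-tables has reached an exact fixed point (zero Bellman error) or only an approximate one with a small residual, which is the kind of distinction the abstract draws.

```python
import numpy as np

# A minimal sketch of two competing Q-learning "pricebots" in a discretized
# pricing game. All model details (price grid, unit cost, lowest-price-wins
# demand, greedy opponent-reply successor state, learning constants) are
# illustrative assumptions, not the paper's exact model.

PRICES = np.linspace(0.1, 1.0, 10)   # discrete price grid (assumed)
N = len(PRICES)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1    # discount, learning rate, exploration
COST = 0.05                          # unit production cost (assumed)
rng = np.random.default_rng(0)

def profit(p_self, p_other):
    """Lower-priced seller captures a unit-demand market; ties split it."""
    if p_self < p_other:
        return p_self - COST
    if p_self == p_other:
        return 0.5 * (p_self - COST)
    return 0.0

# Q[i][s, a]: value to agent i of posting price index a while the opponent's
# currently posted price index is s.
Q = [np.zeros((N, N)) for _ in range(2)]
price_idx = [int(rng.integers(N)), int(rng.integers(N))]  # current posted prices

for t in range(200_000):
    i = t % 2                            # agents take turns repricing
    s = price_idx[1 - i]                 # state = opponent's current price
    if rng.random() < EPS:               # epsilon-greedy price choice
        a = int(rng.integers(N))
    else:
        a = int(np.argmax(Q[i][s]))
    r = profit(PRICES[a], PRICES[s])
    price_idx[i] = a
    # Assume the opponent replies greedily to our new price; that reply is
    # the successor state used in the one-step Q backup.
    s_next = int(np.argmax(Q[1 - i][a]))
    Q[i][s, a] += ALPHA * (r + GAMMA * Q[i][s_next].max() - Q[i][s, a])

def max_bellman_error(i):
    """Largest one-step Bellman residual of agent i's Q-table, under the
    same greedy-opponent-reply assumption used during learning."""
    err = 0.0
    for s in range(N):
        for a in range(N):
            s_next = int(np.argmax(Q[1 - i][a]))
            target = profit(PRICES[a], PRICES[s]) + GAMMA * Q[i][s_next].max()
            err = max(err, abs(target - Q[i][s, a]))
    return err

print("max Bellman error:", max_bellman_error(0), max_bellman_error(1))
```

Because the two agents are identical, comparing their final Q-tables (and the residuals above) gives a rough, simulation-level view of whether they have settled into a common symmetric policy or into distinct, asymmetric ones.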
Related Publications
Experiments with a new boosting algorithm
In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learni...
Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V. Denardo. https://doi.org/10.1137/1009030
Publication Info
- Year: 2000
- Type: article
- Pages: 463-470
- Citations: 24
- Access: Closed