Abstract
We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error, consisting of symmetric policies, and a non-stationary, broken-symmetry pseudosolution with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We calculate analytically the form of both solutions, and map out numerically the conditions under which each occurs. We suggest that this observed behavior will also be found more generally in other studies of multi-agent Q-learning, and discuss implications and directions for future research.

1. Introduction

Within the next few years, we expect electronic commerce to be an important multi-agent domain in which reinforcement learning will find numerous applications. One such application is automated dynamic pricing...
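The two-pricebot setup described in the abstract can be made concrete with a small simulation. The sketch below is an illustration only, not the paper's model: the price grid, unit cost, lowest-price-wins demand rule, greedy opponent-reply successor state, and learning constants are all assumptions introduced for the example. The `max_bellman_error` check at the end is one way to probe numerically whether a pair of Q-tables has reached an exact fixed point (zero Bellman error) or only an approximate one with a small residual, which is the kind of distinction the abstract draws.

```python
import numpy as np

# A minimal sketch of two competing Q-learning "pricebots" in a discretized
# pricing game. All model details (price grid, unit cost, lowest-price-wins
# demand, greedy opponent-reply successor state, learning constants) are
# illustrative assumptions, not the paper's exact model.

PRICES = np.linspace(0.1, 1.0, 10)   # discrete price grid (assumed)
N = len(PRICES)
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1    # discount, learning rate, exploration
COST = 0.05                          # unit production cost (assumed)
rng = np.random.default_rng(0)

def profit(p_self, p_other):
    """Lower-priced seller captures a unit-demand market; ties split it."""
    if p_self < p_other:
        return p_self - COST
    if p_self == p_other:
        return 0.5 * (p_self - COST)
    return 0.0

# Q[i][s, a]: value to agent i of posting price index a while the opponent's
# currently posted price index is s.
Q = [np.zeros((N, N)) for _ in range(2)]
price_idx = [int(rng.integers(N)), int(rng.integers(N))]  # current posted prices

for t in range(200_000):
    i = t % 2                            # agents take turns repricing
    s = price_idx[1 - i]                 # state = opponent's current price
    if rng.random() < EPS:               # epsilon-greedy price choice
        a = int(rng.integers(N))
    else:
        a = int(np.argmax(Q[i][s]))
    r = profit(PRICES[a], PRICES[s])
    price_idx[i] = a
    # Assume the opponent replies greedily to our new price; that reply is
    # the successor state used in the one-step Q backup.
    s_next = int(np.argmax(Q[1 - i][a]))
    Q[i][s, a] += ALPHA * (r + GAMMA * Q[i][s_next].max() - Q[i][s, a])

def max_bellman_error(i):
    """Largest one-step Bellman residual of agent i's Q-table, under the
    same greedy-opponent-reply assumption used during learning."""
    err = 0.0
    for s in range(N):
        for a in range(N):
            s_next = int(np.argmax(Q[1 - i][a]))
            target = profit(PRICES[a], PRICES[s]) + GAMMA * Q[i][s_next].max()
            err = max(err, abs(target - Q[i][s, a]))
    return err

print("max Bellman error:", max_bellman_error(0), max_bellman_error(1))
```

Because the two agents are identical, comparing their final Q-tables (and the residuals above) gives a rough, simulation-level view of whether they have settled into a common symmetric policy or into distinct, asymmetric ones.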
Related Publications
Experiments with a new boosting algorithm
In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learni...
Contraction Mappings in the Theory Underlying Dynamic Programming
Eric V. Denardo. https://doi.org/10.1137/1009030
Publication Info
- Year: 2000
- Type: article
- Pages: 463-470
- Citations: 24
- Access: Closed