Jonathan Baxter; Andrew Tridgell; Lex Weaver

doi:10.1023/a:1007634325138

Abstract

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. For reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
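To make the idea of combining TD(λ) with game-tree search concrete, here is a minimal sketch of a TD(λ)-style weight update applied to the leaves of the principal variations rather than to the positions actually played. It assumes a linear evaluation J(x, w) = w · φ(x), and it assumes the search has already been run: the caller supplies, for each position x_t of a game, the feature vector φ(x_t^l) of the principal-variation leaf found from x_t. The temporal differences are d_t = J(x_{t+1}^l, w) − J(x_t^l, w), and each leaf gradient is weighted by the sum over j ≥ t of λ^(j−t) d_j. The function name `tdleaf_update`, the parameters `alpha`, `lam` and `outcome`, and the linear evaluation are illustrative assumptions, not KnightCap's code, and the sketch omits any refinements the paper may apply.

```python
from typing import List, Optional, Sequence


def tdleaf_update(
    w: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    alpha: float = 0.01,
    lam: float = 0.7,
    outcome: Optional[float] = None,
) -> List[float]:
    """Hypothetical TDLEAF(lambda)-style update after one game.

    w             -- weights of an assumed linear evaluation J(x, w) = w . phi(x)
    leaf_features -- row t is phi(x_t^l), the features of the principal-variation
                     leaf returned by searching from the t-th position (search not shown)
    outcome       -- if given, replaces the evaluation of the final position,
                     as in standard TD(lambda) with a terminal reward
    """
    # Evaluate every principal-variation leaf with the current weights.
    values = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in leaf_features]
    if outcome is not None:
        values[-1] = outcome

    # Temporal differences d_t between successive leaf evaluations.
    d = [values[t + 1] - values[t] for t in range(len(values) - 1)]

    # sums[t] = sum_{j >= t} lam^(j - t) * d_j, accumulated from the end of the game.
    sums = [0.0] * len(d)
    running = 0.0
    for t in reversed(range(len(d))):
        running = d[t] + lam * running
        sums[t] = running

    # For a linear J, the gradient with respect to w at leaf t is phi(x_t^l).
    new_w = [float(wi) for wi in w]
    for t, s in enumerate(sums):
        for i, f in enumerate(leaf_features[t]):
            new_w[i] += alpha * s * f
    return new_w
```

For example, `tdleaf_update([0.0, 0.0], [[1, 0], [0, 1], [1, 1]], alpha=0.1, lam=0.5, outcome=1.0)` credits the final win mostly to the last leaf and, discounted by λ, to the earlier one. Keeping the search outside the function reflects the abstract's point: the learning rule itself is ordinary TD(λ); what changes is which positions it is applied to.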

Keywords

Computer science; Artificial intelligence; Domain (mathematical analysis); The Internet; Variation (astronomy); Function (biology); Machine learning; Mathematics; World Wide Web

Affiliated Institutions

Australian National University AU

Publication Info

Year: 2000
Type: article
Volume: 40
Issue: 3
Pages: 243-263
Citations: 132
Access: Closed

Cite This

APA Style

Baxter, J., Tridgell, A., & Weaver, L. (2000). Machine Learning, 40(3), 243-263. https://doi.org/10.1023/a:1007634325138
