2000
Machine Learning
132 citations
Abstract
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
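The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: a TD(λ)-style weight update applied to the evaluations of the leaf positions that game-tree search selects, rather than to the positions actually played. It is not the authors' implementation. The function name `tdleaf_update`, the linear evaluation J(x, w) = w · φ(x), the step size `alpha`, and the treatment of the game outcome as the final target are all assumptions made for illustration; the search that produces the leaf feature vectors is not shown.

```python
from typing import List, Sequence


def tdleaf_update(
    leaf_features: Sequence[Sequence[float]],
    weights: Sequence[float],
    final_outcome: float,
    alpha: float = 0.01,
    lam: float = 0.7,
) -> List[float]:
    """Return updated evaluation weights after one game (illustrative only).

    leaf_features[t] is a hypothetical feature vector for the
    principal-variation leaf found by searching from the t-th position.
    The evaluation is assumed linear: J(x, w) = sum_i w_i * phi_i(x).
    """
    n = len(leaf_features)
    # Evaluate every search leaf with the current weights.
    values = [sum(w * f for w, f in zip(weights, phi)) for phi in leaf_features]

    # Temporal differences between successive leaf evaluations; the last
    # difference compares the final evaluation with the game outcome.
    targets = values[1:] + [final_outcome]
    deltas = [targets[t] - values[t] for t in range(n)]

    new_weights = list(weights)
    for t in range(n):
        # Lambda-discounted sum of the temporal differences from step t on.
        discounted = sum(lam ** (j - t) * deltas[j] for j in range(t, n))
        for i, f in enumerate(leaf_features[t]):
            # For a linear evaluation, dJ/dw_i is simply the feature value.
            new_weights[i] += alpha * discounted * f
    return new_weights
```

With `lam=1.0` every position is credited with the full difference between its evaluation and the final outcome; with `lam=0.0` each position is only pulled toward the evaluation of the next one. For example, `tdleaf_update([[1.0], [1.0]], [0.0], 1.0, alpha=0.5, lam=0.0)` moves the single weight halfway toward the outcome.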
Keywords
Computer science; Artificial intelligence; Domain (mathematical analysis); The Internet; Variation (astronomy); Function (biology); Machine learning; Mathematics; World Wide Web
Related Publications
TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play
TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement le...
Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level hu...
Temporal difference learning and TD-Gammon
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, che...
Move Evaluation in Go Using Deep Convolutional Neural Networks
The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function. In this paper we investigate whether dee...
GameFlow
Although player enjoyment is central to computer games, there is currently no accepted model of player enjoyment in games. There are many heuristics in the literature, based on ...
Publication Info
- Year: 2000
- Type: article
- Volume: 40
- Issue: 3
- Pages: 243-263
- Citations: 132
- Access: Closed
Cite This
Jonathan Baxter, Andrew Tridgell, Lex Weaver (2000). Machine Learning, 40(3), 243-263. https://doi.org/10.1023/a:1007634325138
Identifiers
- DOI: 10.1023/a:1007634325138