Abstract

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present experiments in which our chess program "KnightCap" used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
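The abstract only names TDLEAF(λ) as a variation on TD(λ) for use with game-tree search. The core idea is that the temporal differences are taken between the evaluations of the leaf nodes of the principal variations returned by the search, rather than between the evaluations of the positions actually reached. The following is a minimal sketch under several assumptions that are not taken from this page: a linear evaluation squashed into (-1, 1) with `tanh`, principal-variation leaf features supplied by the caller (the search itself is not shown), and an optional game result that replaces the final leaf value. All names (`squashed_eval`, `tdleaf_update`, `beta`, `outcome`) are hypothetical, and this is not KnightCap's code.

```python
import math
from typing import List, Optional, Sequence


def squashed_eval(features: Sequence[float], weights: Sequence[float], beta: float) -> float:
    """Hypothetical evaluation: tanh of a linear score, so values lie in (-1, 1)."""
    return math.tanh(beta * sum(f * w for f, w in zip(features, weights)))


def tdleaf_update(
    leaf_features: List[List[float]],  # assumed: features of the PV leaf for positions x_1..x_N
    weights: List[float],
    alpha: float = 0.01,  # learning rate
    lam: float = 0.7,  # the lambda in TD(lambda)
    beta: float = 0.1,  # assumed squashing scale
    outcome: Optional[float] = None,  # assumed: game result in [-1, 1] replacing the last value
) -> List[float]:
    """Sketch of one TDLEAF(lambda)-style weight update over a finished game."""
    n = len(leaf_features)
    # Evaluate the principal-variation leaf of every position, not the position itself.
    values = [squashed_eval(f, weights, beta) for f in leaf_features]
    if outcome is not None:
        values[-1] = outcome
    # Temporal differences between successive leaf evaluations: d_t = r_{t+1} - r_t.
    diffs = [values[t + 1] - values[t] for t in range(n - 1)]
    new_weights = list(weights)
    for t in range(n - 1):
        # Lambda-discounted sum of this and all later temporal differences.
        discounted = sum(lam ** (j - t) * diffs[j] for j in range(t, n - 1))
        # Gradient of tanh(beta * w . phi) w.r.t. w is beta * (1 - r^2) * phi.
        scale = beta * (1.0 - values[t] ** 2)
        for i, phi in enumerate(leaf_features[t]):
            new_weights[i] += alpha * scale * phi * discounted
    return new_weights


if __name__ == "__main__":
    # Three positions with two features each; zero weights and a win (outcome 1.0).
    print(tdleaf_update([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0],
                        alpha=0.5, lam=0.5, beta=1.0, outcome=1.0))
```

With zero initial weights every leaf evaluates to 0, so only the final difference (the game result) is nonzero. It is credited in full to the last position's features and discounted by λ for the earlier ones, giving `[0.25, 0.5]` in the example above.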

Publication Info

Year: 2000
Type: article
Volume: 40
Issue: 3
Pages: 243-263
Citations: 132
Access: Closed

Citation

Jonathan Baxter, Andrew Tridgell, Lex Weaver (2000). Machine Learning, 40(3), 243-263. https://doi.org/10.1023/a:1007634325138

Identifiers

DOI: 10.1023/a:1007634325138