TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play

1994 Neural Computation 783 citations

Abstract

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a “raw” description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.
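The TD(λ) update named in the abstract can be illustrated on a much smaller problem. The sketch below is not TD-Gammon: it assumes a five-state random walk, a linear value function with one weight per state in place of a neural network, and hypothetical values for the step size `alpha` and the trace decay `lam`. It only shows how value estimates that start from random weights are pulled toward later predictions and the final outcome.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.02, lam=0.7, seed=0):
    """Estimate state values on a toy random walk with TD(lambda).

    Stepping off the right end yields outcome 1, off the left end outcome 0.
    All names and parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Random initial weights, one per non-terminal state (one-hot features).
    w = [rng.random() for _ in range(n_states)]

    for _ in range(episodes):
        traces = [0.0] * n_states      # eligibility traces, reset each episode
        s = n_states // 2              # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target, done = 0.0, True          # left terminal: outcome 0
            elif s_next >= n_states:
                target, done = 1.0, True          # right terminal: outcome 1
            else:
                target, done = w[s_next], False   # bootstrap from next prediction

            delta = target - w[s]                 # temporal-difference error

            # Decay all traces by lambda (no discounting), then mark the current state.
            traces = [lam * e for e in traces]
            traces[s] += 1.0

            # Credit the TD error to recently visited states.
            for i in range(n_states):
                w[i] += alpha * delta * traces[i]

            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

For this walk the true values are 1/6, 2/6, ..., 5/6 from left to right, so the learned weights should rise from left to right and sit near 0.5 in the middle state. The exact numbers depend on the seed and step size.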

Keywords

Reinforcement learning
Set (abstract data type)
Representation (politics)
Computer science
Artificial intelligence
Artificial neural network

Related Publications

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our ...

2000 Machine Learning 132 citations

Temporal difference learning and TD-Gammon

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, che...

1995 Communications of the ACM 1457 citations

Publication Info

Year: 1994
Type: article
Volume: 6
Issue: 2
Pages: 215-219
Citations: 783
Access: Closed

Citation Metrics

783 (OpenAlex)

Cite This

Gerald Tesauro (1994). TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation, 6(2), 215-219. https://doi.org/10.1162/neco.1994.6.2.215

Identifiers

DOI: 10.1162/neco.1994.6.2.215