Abstract

TD(λ) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD(λ) has been shown to minimise the squared error between the approximate value of each state and the true value. However, as far as policy is concerned, it is error in the relative ordering of states that is critical, rather than error in the state values themselves. We illustrate this point, both in simple two-state and three-state systems in which TD(λ), starting from an optimal policy, converges to a sub-optimal policy, and also in backgammon. We then present a modified form of TD(λ), called STD(λ), in which function approximators are trained with respect to relative state values on binary decision problems. A theoretical analysis, including a proof of monotonic policy improvement for STD(λ) in the context of the two-state system, is presented, along with a comparison with Bertsekas' differential training method [1]. This is followed by successful demonstrations of STD(λ) on the two-state system and a variation on the well-known acrobot problem.
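For readers who want to see the baseline the abstract contrasts against, below is a minimal sketch of standard TD(λ) with a linear value approximator and accumulating eligibility traces; it trains on absolute state values (driving down the squared TD error), which is exactly the objective the paper argues is mismatched with policy quality. The function name, the (state, reward, next_state, done) interface, and the toy usage are illustrative assumptions, not code from the paper, and STD(λ)'s relative-value training is not reproduced here.

```python
import numpy as np

def td_lambda_linear(trajectories, phi, n_features, alpha=0.1, gamma=1.0, lam=0.9):
    """Linear TD(lambda) with accumulating eligibility traces.

    trajectories: iterable of episodes, each a list of (state, reward, next_state, done)
    phi:          feature map, state -> np.ndarray of length n_features
    Returns the learned weight vector w, so that V(s) is approximated by w . phi(s).
    """
    w = np.zeros(n_features)
    for episode in trajectories:
        e = np.zeros(n_features)                 # eligibility trace
        for s, r, s_next, done in episode:
            v = w @ phi(s)
            v_next = 0.0 if done else w @ phi(s_next)
            delta = r + gamma * v_next - v       # temporal-difference error
            e = gamma * lam * e + phi(s)         # accumulate the trace
            w = w + alpha * delta * e            # move V(s) toward the lambda-return
    return w

# Toy usage (hypothetical): two states with one-hot features.
if __name__ == "__main__":
    phi = lambda s: np.eye(2)[s]
    episodes = [[(0, 0.0, 1, False), (1, 1.0, 1, True)] for _ in range(200)]
    print(td_lambda_linear(episodes, phi, n_features=2))
```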

Keywords

Reinforcement learning, Monotonic function, State (computer science), Q-learning, Bellman equation, Computer science, Approximation error, State space, Function approximation, Mathematics, Mathematical optimization, Control theory, Artificial intelligence, Algorithm, Statistics, Artificial neural network, Control

Related Publications

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our ...

Machine Learning, 2000. 132 citations.

Publication Info

Year: 2025
Type: article
Citations: 7
Access: Closed

Citation Metrics

7 citations (source: OpenAlex)

Cite This

Lex Weaver, Jonathan Baxter (2025). Reinforcement Learning From State and Temporal Differences. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2512.08855

Identifiers

DOI: 10.48550/arxiv.2512.08855