Actor–critic networks with analogue memristors mimicking reward-based learning

2025 · Nature Machine Intelligence · 1 citation

Abstract

Advancements in memristive devices have given rise to a new generation of specialized hardware for bio-inspired computing. However, most of these implementations draw only partial inspiration from the architecture and functionalities of the mammalian brain. Moreover, the use of memristive hardware is typically restricted to specific elements within the learning algorithm, leaving computationally expensive operations to be executed in software. Here we demonstrate reinforcement learning through an actor–critic temporal difference algorithm implemented on analogue memristors, mirroring the principles of reward-based learning in a neural network architecture similar to the one found in biology. Memristors are used as multipurpose elements within the learning algorithm: they act as synaptic weights that are trained online, they calculate the weight updates associated with the temporal difference error directly in hardware and they determine the actions to navigate the environment. Owing to these features, weight training can take place entirely in memory, eliminating data movement. We test our framework on two navigation tasks—the T-maze and the Morris water maze—using analogue memristors based on the valence change memory effect. Our approach represents the first step towards fully in-memory and online neuromorphic computing engines based on bio-inspired learning schemes.
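To make the algorithm class concrete, the following is a minimal software sketch of tabular actor–critic temporal-difference learning on a toy T-maze (start at the stem, choose left or right at the junction, with reward for one arm only). This is an illustrative sketch of the general method named in the abstract, not the paper's memristor-based implementation; all state encodings, learning rates and reward values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 2              # 0: maze stem, 1: junction (assumed encoding)
N_ACTIONS = 2             # 0: left arm, 1: right arm
ALPHA_V, ALPHA_PI = 0.1, 0.1   # critic / actor learning rates (assumed)
GAMMA = 0.9               # discount factor (assumed)

V = np.zeros(N_STATES)                     # critic: state-value estimates
theta = np.zeros((N_STATES, N_ACTIONS))    # actor: action preferences


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def run_episode():
    """One pass through the toy T-maze; returns the reward obtained."""
    s = 0
    # Step 1: deterministic move from the stem to the junction, no reward.
    s_next, r = 1, 0.0
    delta = r + GAMMA * V[s_next] - V[s]   # temporal-difference (TD) error
    V[s] += ALPHA_V * delta                # critic update
    s = s_next
    # Step 2: pick an arm at the junction; only "right" is rewarded here.
    probs = softmax(theta[s])
    a = rng.choice(N_ACTIONS, p=probs)
    r = 1.0 if a == 1 else 0.0
    delta = r - V[s]                       # terminal state: no bootstrap term
    V[s] += ALPHA_V * delta
    # Actor update: the same TD error modulates the policy-gradient step,
    # playing the role of the reward-prediction signal.
    grad = -probs
    grad[a] += 1.0
    theta[s] += ALPHA_PI * delta * grad
    return r


for _ in range(500):
    run_episode()

print(softmax(theta[1]))   # action probabilities at the junction after training
```

After training, the policy at the junction strongly prefers the rewarded arm and the critic's value for the junction approaches the expected reward. In the paper's hardware version, the roles of `V`, `theta` and the TD-error-driven updates are carried out by analogue memristor conductances rather than software arrays.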

Publication Info

Year: 2025
Type: Article
Citations: 1
Access: Closed

Citation Metrics

OpenAlex: 1

Cite This

Kevin Portner, Till Zellweger, Flavio Martinelli et al. (2025). Actor–critic networks with analogue memristors mimicking reward-based learning. Nature Machine Intelligence. https://doi.org/10.1038/s42256-025-01149-w

Identifiers

DOI: 10.1038/s42256-025-01149-w