Actor–critic networks with analogue memristors mimicking reward-based learning

2025 · Nature Machine Intelligence · 1 citation

Abstract

Advancements in memristive devices have given rise to a new generation of specialized hardware for bio-inspired computing. However, most of these implementations draw only partial inspiration from the architecture and functionalities of the mammalian brain. Moreover, the use of memristive hardware is typically restricted to specific elements within the learning algorithm, leaving computationally expensive operations to be executed in software. Here we demonstrate reinforcement learning through an actor–critic temporal difference algorithm implemented on analogue memristors, mirroring the principles of reward-based learning in a neural network architecture similar to the one found in biology. Memristors are used as multipurpose elements within the learning algorithm: they act as synaptic weights that are trained online, they calculate the weight updates associated with the temporal difference error directly in hardware and they determine the actions to navigate the environment. Owing to these features, weight training can take place entirely in memory, eliminating data movement. We test our framework on two navigation tasks—the T-maze and the Morris water maze—using analogue memristors based on the valence change memory effect. Our approach represents the first step towards fully in-memory and online neuromorphic computing engines based on bio-inspired learning schemes.
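To make the algorithm class concrete, the following is a minimal software sketch of tabular actor–critic temporal-difference learning on a toy T-maze (start at the stem, choose left or right at the junction, with reward for one arm only). This is an illustrative sketch of the general method named in the abstract, not the paper's memristor-based implementation; all state encodings, learning rates and reward values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 2              # 0: maze stem, 1: junction (assumed encoding)
N_ACTIONS = 2             # 0: left arm, 1: right arm
ALPHA_V, ALPHA_PI = 0.1, 0.1   # critic / actor learning rates (assumed)
GAMMA = 0.9               # discount factor (assumed)

V = np.zeros(N_STATES)                     # critic: state-value estimates
theta = np.zeros((N_STATES, N_ACTIONS))    # actor: action preferences


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def run_episode():
    """One pass through the toy T-maze; returns the reward obtained."""
    s = 0
    # Step 1: deterministic move from the stem to the junction, no reward.
    s_next, r = 1, 0.0
    delta = r + GAMMA * V[s_next] - V[s]   # temporal-difference (TD) error
    V[s] += ALPHA_V * delta                # critic update
    s = s_next
    # Step 2: pick an arm at the junction; only "right" is rewarded here.
    probs = softmax(theta[s])
    a = rng.choice(N_ACTIONS, p=probs)
    r = 1.0 if a == 1 else 0.0
    delta = r - V[s]                       # terminal state: no bootstrap term
    V[s] += ALPHA_V * delta
    # Actor update: the same TD error modulates the policy-gradient step,
    # playing the role of the reward-prediction signal.
    grad = -probs
    grad[a] += 1.0
    theta[s] += ALPHA_PI * delta * grad
    return r


for _ in range(500):
    run_episode()

print(softmax(theta[1]))   # action probabilities at the junction after training
```

After training, the policy at the junction strongly prefers the rewarded arm and the critic's value for the junction approaches the expected reward. In the paper's hardware version, the roles of `V`, `theta` and the TD-error-driven updates are carried out by analogue memristor conductances rather than software arrays.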

Publication Info

Year: 2025
Type: Article
Citations: 1
Access: Closed

Citation Metrics

OpenAlex: 1

Cite This

Kevin Portner, Till Zellweger, Flavio Martinelli et al. (2025). Actor–critic networks with analogue memristors mimicking reward-based learning. Nature Machine Intelligence. https://doi.org/10.1038/s42256-025-01149-w

Identifiers

DOI: 10.1038/s42256-025-01149-w