Use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/54351


Title: Reinforcement learning with spiking neural networks
Author(s): CHEVTCHENKO, Sergio Fernandovitch
Keywords: Reinforcement learning; STDP; Spiking neural networks; FEAST; ODESA
Issue date: 15-Aug-2023
Publisher: Universidade Federal de Pernambuco
Citation: CHEVTCHENKO, Sérgio Fernandovitch. Reinforcement learning with spiking neural networks. 2023. Thesis (PhD in Computer Science) – Universidade Federal de Pernambuco, Recife, 2023.
Abstract: Artificial intelligence systems have made impressive progress in recent years, but they still lag behind simple biological brains in terms of control capabilities and power consumption. Spiking neural networks (SNNs) seek to emulate the energy efficiency, learning speed, and temporal processing of biological brains. However, in the context of reinforcement learning (RL), SNNs still fall short of traditional neural networks. The primary aim of this work is to bridge the performance gap between spiking models and powerful deep RL (DRL) algorithms on specific tasks. To this end, we have proposed new architectures that have been compared, both in terms of learning speed and final accuracy, to DRL algorithms and classical tabular RL approaches. This thesis consists of three stages. The initial stage presents a simple spiking model that addresses the scalability limitations of related models in terms of the state space. The model is evaluated on two classical RL problems: grid-world and acrobot. The results suggest that the proposed spiking model is comparable to both tabular and DRL algorithms, while maintaining an advantage in terms of complexity over the DRL algorithm. In the second stage, we further explore the proposed model by combining it with a binary feature extraction network. A binary convolutional neural network (CNN) is pre-trained on a set of naturalistic RGB images, and a separate set of images is used as observations in a modified grid-world task. We present improvements in architecture and dynamics to address this more challenging task with image observations. As before, the model is experimentally compared to state-of-the-art DRL algorithms. Additionally, we provide supplementary experiments to present a more detailed view of the connectivity and plasticity between different layers of the network. The third stage of this thesis presents a novel neuromorphic architecture for solving RL problems with real-valued observations.
The proposed model incorporates feature extraction layers, with the addition of temporal difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. Our model consistently outperforms the tabular approach and successfully discovers stable control policies in the mountain car, cart-pole, and acrobot environments. Although the proposed model does not outperform PPO in terms of optimal performance, it offers an appealing trade-off in terms of computational and hardware implementation requirements: the model does not require an external memory buffer or global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. We conclude by highlighting the limitations of our approach and suggest promising directions for future research.
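The update scheme the abstract describes — online synaptic changes driven by a local eligibility trace and a single broadcast TD-error scalar, with no replay buffer and no global gradients — can be illustrated with a minimal sketch. This is not the thesis implementation; the population sizes, decay constants, and learning rate below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PRE, N_POST = 8, 4   # assumed pre- and post-synaptic population sizes
GAMMA = 0.95           # TD discount factor
LAMBDA = 0.9           # eligibility-trace decay
LR = 0.1               # learning rate

w = rng.normal(0.0, 0.1, size=(N_POST, N_PRE))  # synaptic weights
trace = np.zeros_like(w)                        # per-synapse eligibility traces

def step(pre_spikes, post_spikes, reward, v, v_next):
    """One online update: local pre/post coincidence fills the trace,
    and a broadcast scalar TD error modulates every eligible synapse."""
    global trace, w
    # Local rule: coincident pre- and post-synaptic spikes mark synapses
    # as eligible; the trace decays geometrically between events.
    trace = GAMMA * LAMBDA * trace + np.outer(post_spikes, pre_spikes)
    # Broadcast TD error: a single scalar, no global gradient computation
    # and no external memory buffer.
    td_error = reward + GAMMA * v_next - v
    w += LR * td_error * trace
    return td_error

# One illustrative step with random binary spike vectors and a small reward.
pre = (rng.random(N_PRE) < 0.3).astype(float)
post = (rng.random(N_POST) < 0.3).astype(float)
delta = step(pre, post, reward=1.0, v=0.2, v_next=0.0)
```

The key property, shared with the architecture described above, is locality: each synapse needs only its own trace and the broadcast TD error, which makes the rule amenable to neuromorphic hardware.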
URI: https://repositorio.ufpe.br/handle/123456789/54351
Appears in collections: Doctoral Theses - Computer Science

Files associated with this item:
File: TESE Sergio Fernandovitch Chevtchenko.pdf (10.96 MB, Adobe PDF)


This file is protected by copyright



This item is licensed under a Creative Commons License