Use este identificador para citar ou linkar para este item:
https://repositorio.ufpe.br/handle/123456789/46630
Compartilhe esta página
Título: | DyLam : a dynamic reward weighting method for reinforcement learning policy gradient algorithms |
Autor(es): | MACHADO, Mateus Gonçalves |
Palavras-chave: | Engenharia da computação; Aprendizagem |
Data do documento: | 7-Jun-2022 |
Editor: | Universidade Federal de Pernambuco |
Citação: | MACHADO, Mateus Gonçalves. DyLam: a dynamic reward weighting method for reinforcement learning policy gradient algorithms. 2022. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2022. |
Abstract: | Reinforcement Learning (RL) is an emergent subfield of Machine Learning in which an agent interacts with an environment and leverages their experiences to learn, by trial and error, which actions are the most appropriate for each state. At each step the agent receives a positive or negative reward signal, which is the main feedback used for learning. RL finds applications in many areas, such as robotics, stock exchange, and even in cooling systems, presenting superhuman performance in learning to play board games (Chess and Go) and video games (Atari Games, Dota2, and StarCraft2). However, RL methods still struggle in environments with sparse rewards. For example, an agent may receive very few goal score rewards in a soccer game. Thus, it is hard to associate rewards (goals) with actions. Researchers frequently introduce multiple intermediary rewards to help learning and circumvent this problem. However, adequately combining multiple rewards to compose the unique reward signal used by the RL methods frequently is not an easy task. This work aims to solve this specific problem by introducing DyLam. It extends existing policy gradient methods by decomposing the reward function used in the environment and dynamically weighting each component as a function of the agent’s performance on the associated task. We prove the convergence of the proposed method and show empirically that it overcomes competitor methods in the environments evaluated in terms of learning speed and, in some cases, the final performance. |
URI: | https://repositorio.ufpe.br/handle/123456789/46630 |
Aparece nas coleções: | Dissertações de Mestrado - Ciência da Computação |
Arquivos associados a este item:
Arquivo | Descrição | Tamanho | Formato | |
---|---|---|---|---|
DISSERTAÇÃO Mateus Gonçalves Machado.pdf | 7,09 MB | Adobe PDF | ![]() Visualizar/Abrir |
Este arquivo é protegido por direitos autorais |
Este item está licenciada sob uma Licença Creative Commons