Please use this identifier to cite or link to this item:
https://repositorio.ufpe.br/handle/123456789/46630
Title: | DyLam : a dynamic reward weighting method for reinforcement learning policy gradient algorithms |
Authors: | MACHADO, Mateus Gonçalves |
Keywords: | Computer engineering; Learning |
Issue Date: | 7-Jun-2022 |
Publisher: | Universidade Federal de Pernambuco |
Citation: | MACHADO, Mateus Gonçalves. DyLam: a dynamic reward weighting method for reinforcement learning policy gradient algorithms. 2022. Master's dissertation (Computer Science), Universidade Federal de Pernambuco, Recife, 2022. |
Abstract: | Reinforcement Learning (RL) is an emerging subfield of Machine Learning in which an agent interacts with an environment and leverages its experience to learn, by trial and error, which actions are most appropriate in each state. At each step, the agent receives a positive or negative reward signal, which is the main feedback used for learning. RL has found applications in many areas, such as robotics, stock trading, and even cooling systems, and has achieved superhuman performance in board games (Chess and Go) and video games (Atari games, Dota 2, and StarCraft II). However, RL methods still struggle in environments with sparse rewards. In a soccer game, for example, an agent may receive very few goal-scoring rewards, which makes it hard to associate rewards (goals) with the actions that produced them. To circumvent this problem, researchers frequently introduce multiple intermediate rewards to aid learning. However, adequately combining these rewards into the single reward signal used by RL methods is frequently not an easy task. This work addresses this specific problem by introducing DyLam, which extends existing policy gradient methods by decomposing the environment's reward function and dynamically weighting each component as a function of the agent's performance on the associated task. We prove the convergence of the proposed method and show empirically that, in the environments evaluated, it outperforms competing methods in learning speed and, in some cases, in final performance. |
URI: | https://repositorio.ufpe.br/handle/123456789/46630 |
Appears in Collections: | Master's Dissertations - Computer Science |
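The abstract describes DyLam only at a high level: the reward is decomposed into components, and each component is weighted according to how well the agent performs on the associated task. The following minimal sketch illustrates that general idea and is not the dissertation's formula. It assumes known per-component return bounds (`r_min`, `r_max`), a moving-average window, and a weighting rule in which components with less normalized progress receive larger weights. The class `DynamicRewardWeighter` and its methods are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import List, Sequence


class DynamicRewardWeighter:
    """Hypothetical dynamic weighting of decomposed reward components.

    Assumption (not taken from the dissertation): each component's weight is
    proportional to 1 - normalized progress, where progress is the moving
    average of that component's episode return scaled into [0, 1] by
    assumed bounds r_min/r_max.
    """

    def __init__(self, r_min: Sequence[float], r_max: Sequence[float], window: int = 10):
        if len(r_min) != len(r_max):
            raise ValueError("r_min and r_max must have the same length")
        self.r_min = list(r_min)
        self.r_max = list(r_max)
        # Recent per-component episode returns used for the moving average.
        self.history: deque = deque(maxlen=window)
        n = len(self.r_min)
        self.weights: List[float] = [1.0 / n] * n  # start with uniform weights

    def update(self, episode_returns: Sequence[float]) -> List[float]:
        """Record one episode's per-component returns and recompute weights."""
        self.history.append(list(episode_returns))
        n = len(self.r_min)
        raw = []
        for i in range(n):
            mean_i = sum(ep[i] for ep in self.history) / len(self.history)
            span = self.r_max[i] - self.r_min[i]
            progress = (mean_i - self.r_min[i]) / span
            progress = min(max(progress, 0.0), 1.0)  # clip to [0, 1]
            raw.append(1.0 - progress)  # less progress -> larger weight
        total = sum(raw)
        # If every component is already at its upper bound, fall back to uniform.
        self.weights = [w / total for w in raw] if total > 0 else [1.0 / n] * n
        return self.weights

    def scalarize(self, reward_vector: Sequence[float]) -> float:
        """Combine a per-step reward vector into one scalar reward."""
        return sum(w * r for w, r in zip(self.weights, reward_vector))


if __name__ == "__main__":
    weighter = DynamicRewardWeighter(r_min=[0.0, 0.0], r_max=[10.0, 1.0])
    # Component 0 is nearly solved (8 of 10); component 1 is not (0.1 of 1).
    print(weighter.update([8.0, 0.1]))
    print(weighter.scalarize([1.0, 0.5]))
```

In a policy gradient training loop, `update` would be called once per episode with the per-component returns, and `scalarize` would turn each step's reward vector into the scalar reward used to compute returns. Because the weights sum to one, the scale of the combined reward stays bounded while emphasis shifts toward the components the agent has not yet mastered.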
Files in This Item:
File | Description | Size | Format
---|---|---|---
DISSERTAÇÃO Mateus Gonçalves Machado.pdf | | 7.09 MB | Adobe PDF
This item is protected by original copyright.
This item is licensed under a Creative Commons License