Open Access System for Information Sharing

Conference

Title: Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

Authors: Kim, Jangwon; Kim, Hangyeol; Kang, Jiwook; Baek, Jongchan; Han, Soohee

Abstract: We present a novel actor-critic algorithm for environments with delayed feedback that addresses the state-space explosion problem of conventional approaches. Conventional approaches use an augmented state constructed from the last observed state and the actions executed since that state was observed. Using the augmented state space, the correct Markov decision process for delayed environments can be constructed; however, the state space explodes as the number of delayed timesteps increases, which leads to slow convergence. Our proposed algorithm, called Belief-Projection-Based Q-learning (BPQL), addresses the state-space explosion problem by evaluating critic values on inputs whose size equals that of the original state space rather than that of the augmented one. We compare BPQL with traditional approaches on continuous control tasks and demonstrate that it significantly outperforms the other algorithms in terms of asymptotic performance and sample efficiency. We also show that BPQL can solve long-delayed environments, which conventional approaches are unable to do.
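To make the "conventional" augmented state concrete, the sketch below builds it for a fixed delay of d steps as (s_{t-d}, a_{t-d}, ..., a_{t-1}) and shows why its size grows with d: for continuous spaces the input dimension is dim(S) + d * dim(A), and for finite spaces the number of augmented states grows as |S| * |A|^d. This is a minimal illustration of the baseline construction only, not the authors' BPQL implementation. The class `DelayedAugmentedState`, the helper `augmented_dim`, and the choice to pad the action buffer with zero actions at reset are all hypothetical assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Sequence


class DelayedAugmentedState:
    """Hypothetical helper (not from the paper): maintains the augmented state
    (s_{t-d}, a_{t-d}, ..., a_{t-1}) used by conventional delayed-RL methods."""

    def __init__(self, delay: int, action_dim: int) -> None:
        self.delay = delay
        self.action_dim = action_dim
        # maxlen=delay makes the deque drop the oldest action automatically.
        self.actions: Deque[List[float]] = deque(maxlen=delay)
        self.last_state: List[float] = []

    def reset(self, initial_state: Sequence[float]) -> List[float]:
        self.last_state = list(initial_state)
        self.actions.clear()
        # Assumption: pad with zero actions so the vector has a fixed size from step 0.
        for _ in range(self.delay):
            self.actions.append([0.0] * self.action_dim)
        return self.augmented()

    def step(self, action: Sequence[float], observed_state: Sequence[float]) -> List[float]:
        # The agent has just executed `action`; the environment now reveals
        # the (delayed) state s_{t+1-d}, which becomes the last observed state.
        self.actions.append(list(action))
        self.last_state = list(observed_state)
        return self.augmented()

    def augmented(self) -> List[float]:
        # Concatenate the last observed state with the pending actions, oldest first.
        flat_actions = [x for a in self.actions for x in a]
        return self.last_state + flat_actions


def augmented_dim(state_dim: int, action_dim: int, delay: int) -> int:
    """Input size of a critic that consumes the augmented state."""
    return state_dim + delay * action_dim


if __name__ == "__main__":
    aug = DelayedAugmentedState(delay=2, action_dim=1)
    print(aug.reset([1.0, 2.0]))          # [1.0, 2.0, 0.0, 0.0]
    print(aug.step([0.5], [3.0, 4.0]))    # [3.0, 4.0, 0.0, 0.5]
    print(augmented_dim(17, 6, 10))       # 77
```

For example, a hypothetical task with a 17-dimensional state and a 6-dimensional action under a 10-step delay already requires a 77-dimensional critic input, compared with 17 for a critic defined on the original state space, which is the size the abstract says BPQL's critic uses.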