Adaptation for changing stochastic environments through online POMDP policy learning

abstract

  • Computing optimal or approximate policies for partially observable Markov decision processes (POMDPs) is a difficult task. When, in addition, the characteristics of the environment change over time, the problem is further compounded: a policy that was computed offline may stop being useful after sufficient changes to the environment have occurred. We present an online algorithm for incrementally improving POMDP policies that is strongly motivated by the Heuristic Search Value Iteration (HSVI) approach, locally improving the current value function after every action execution (a minimal sketch of this local-update step appears below). Our algorithm adapts naturally to slow changes in the environment, without the need to model those changes explicitly. In an initial empirical evaluation, our algorithm shows a marked improvement over other online POMDP algorithms.
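  As an illustration only, not the paper's implementation: the sketch below shows what an HSVI-style local value update at the agent's current belief might look like for a discrete POMDP, performed once after each executed action. The model arrays `T` (transitions), `O` (observations), `R` (rewards), and the discount `gamma`, along with the rounding-based belief keying, are all assumptions introduced here for the example.

```python
# A minimal sketch (assumed interfaces, not the authors' code) of a local,
# point-based value update at the current belief -- the kind of per-step
# improvement the abstract describes.

import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: belief after taking action a and observing o.
    T[a] is an |S|x|S| transition matrix, O[a] an |S|x|Z| observation matrix."""
    b_next = O[a][:, o] * (T[a].T @ b)
    return b_next / b_next.sum()

def local_backup(b, V, T, O, R, gamma):
    """One Bellman backup of the stored value function V at belief b.
    V is a dict from rounded belief tuples to value estimates; unseen
    beliefs default to 0.0 (a crude placeholder initialization)."""
    def value(bp):
        return V.get(tuple(np.round(bp, 3)), 0.0)

    q = []
    for a in range(len(T)):
        expected_r = float(b @ R[a])          # immediate reward E[R | b, a]
        future = 0.0
        for o in range(O[a].shape[1]):
            p_o = float(b @ (T[a] @ O[a][:, o]))   # Pr(o | b, a)
            if p_o > 1e-12:
                future += p_o * value(belief_update(b, a, o, T, O))
        q.append(expected_r + gamma * future)

    V[tuple(np.round(b, 3))] = max(q)   # locally improve V at b
    return int(np.argmax(q))            # greedy action w.r.t. updated V
```

  Because each backup touches only the belief the agent actually visits, the stored values track the environment as it drifts, which is the intuition behind adapting to slow changes without modeling them explicitly.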

publication date

  • January 1, 2005