Can Eren Sezener: Computing the Value of Computation for Planning

BCCN Berlin / TU Berlin

An intelligent agent performs actions in order to achieve its goals. Such actions can be externally directed, such as opening a door, or internally directed, such as writing data to a memory location or strengthening a synaptic connection. Some internal actions, which we refer to as computations, can help the agent choose better external actions. Because actions and computations may draw upon the same resources, such as time and energy, the agent needs to decide between acting and computing. This decision can be made by assigning values not only to actions but to computations as well.
In an environment that provides rewards depending on an agent's behavior, an action's value is typically defined as the expected sum of long-term rewards following the action (itself a complex quantity that depends on what the agent goes on to do after the action in question). Defining the value of a computation is less straightforward, however, because computations are valuable only in a higher-order way, through the alteration of actions.
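The idea that a computation is valuable only insofar as it changes the chosen action can be illustrated with a minimal sketch. It is not the static or dynamic value defined in the thesis; it is a generic one-step (myopic) estimate in a simplified bandit setting, and every name in it (`myopic_value_of_computation`, `means`, `stds`, `noise_std`) is hypothetical. It assumes Gaussian beliefs over action values and a computation that returns one noisy sample of a single action's value.

```python
import random


def myopic_value_of_computation(means, stds, arm, noise_std, n_samples=20000, seed=0):
    """Monte Carlo estimate of the expected gain from one noisy evaluation of `arm`.

    Illustrative assumptions (not taken from the thesis): the belief about each
    action value is Gaussian with the given mean and standard deviation, the
    computation yields one sample corrupted by Gaussian noise (noise_std > 0),
    and there are at least two actions.
    """
    rng = random.Random(seed)
    current_best = max(means)  # value of acting now, without computing
    best_other = max(m for i, m in enumerate(means) if i != arm)

    prior_var = stds[arm] ** 2
    gain = prior_var / (prior_var + noise_std ** 2)  # Gaussian update weight
    predictive_std = (prior_var + noise_std ** 2) ** 0.5

    total = 0.0
    for _ in range(n_samples):
        # Sample a possible outcome of the computation from its predictive distribution.
        obs = rng.gauss(means[arm], predictive_std)
        posterior_mean = means[arm] + gain * (obs - means[arm])
        # After computing, the agent picks whichever action then looks best.
        total += max(posterior_mean, best_other)

    # Expected best value after the computation minus best value before it.
    return total / n_samples - current_best


if __name__ == "__main__":
    # Two equally promising, uncertain actions: computing can change the choice.
    print(myopic_value_of_computation([0.0, 0.0], [1.0, 1.0], arm=0, noise_std=1.0))
    # A belief with no uncertainty: the computation cannot change the choice.
    print(myopic_value_of_computation([1.0, 0.0], [0.0, 1.0], arm=0, noise_std=1.0))
```

In the first call the estimate is positive, because the sample may reveal that one action is better than the other. In the second it is zero, because no outcome of the computation can alter the belief about that action or the action chosen.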
This thesis offers a principled way of computing the value of a computation in a planning setting formalized as a Markov decision process. We present two definitions of computation values: static and dynamic. They address two extreme cases of the computation budget: affording the calculation of either one or infinitely many steps in the future. We show that these values have desirable properties, such as time consistency and asymptotic convergence.
We also propose methods for efficiently computing or approximating the static and dynamic computation values, and we describe a sense in which the policies that greedily maximize these values can be optimal. Finally, we use these principles to construct Monte Carlo tree search algorithms that outperform the state of the art, finding higher-quality actions given the same simulation resources.

Additional Information

Master thesis defence in the International Master Program Computational Neuroscience

Organized by

Klaus Obermayer / Robert Martin
