Mehdi Keramati, UCL, London, UK
A spectrum from habitual to goal-directed responding by optimising the depth of planning
Animals and computers alike enjoy two different methods for making choices in complex, multi-step environments. One is a prospective, goal-directed decision process that relies on mental simulation of the environment; the other is a retrospective, habitual process that caches the returns garnered when those choices were made in the past. Artificial systems combine the two by simulating the environment up to some depth and then using cached habitual values as proxies for the consequences beyond that depth. Here we present a model for computing the optimal depth of forward planning in such a normative plan-until-habit system. We suggest that a speed-accuracy tradeoff controls the depth of planning: a deeper search yields a more accurate evaluation, but at the cost of slower decision-making.
Furthermore, we use a novel task and provide the first evidence that human subjects use this plan-until-habit strategy. Supporting the theoretical predictions, increasing time pressure led to shallower goal-directed planning. Our experimental results imply that the dichotomy of habitual and goal-directed responding only captures two extremes of a spectrum where the depth of planning can take on intermediate values.
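As an illustration only, the plan-until-habit idea described in the abstract can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example and is not taken from the talk: the toy deterministic environment MODEL, the cached values CACHED_Q, and the function names q_value and choose are all hypothetical. The sketch simulates the environment forward for a fixed number of steps and then substitutes cached habitual values for whatever lies beyond that horizon; it does not model how the depth itself is optimised.

```python
from typing import Dict, Tuple

State = str
Action = str
ACTIONS: Tuple[Action, ...] = ("L", "R")

# Hypothetical deterministic model: (state, action) -> (next_state, reward).
MODEL: Dict[Tuple[State, Action], Tuple[State, float]] = {
    ("s0", "L"): ("s1", 0.0),
    ("s0", "R"): ("s2", 1.0),
    ("s1", "L"): ("s3", 5.0),
    ("s1", "R"): ("s4", 0.0),
    ("s2", "L"): ("s5", 0.0),
    ("s2", "R"): ("s6", 1.0),
}

# Hypothetical cached (habitual) action values, deliberately inaccurate at s0.
CACHED_Q: Dict[Tuple[State, Action], float] = {
    ("s0", "L"): 1.0, ("s0", "R"): 2.0,
    ("s1", "L"): 4.0, ("s1", "R"): 0.0,
    ("s2", "L"): 0.0, ("s2", "R"): 1.0,
}


def q_value(state: State, action: Action, depth: int) -> float:
    """Value of taking `action` in `state`, planning `depth` steps ahead."""
    if depth == 0:
        # Planning horizon reached: fall back on the cached habitual value.
        return CACHED_Q.get((state, action), 0.0)
    if (state, action) not in MODEL:
        # Terminal state: nothing further to simulate.
        return 0.0
    next_state, reward = MODEL[(state, action)]
    # Simulate one step, then evaluate the successor with one less step of planning.
    future = max(q_value(next_state, a, depth - 1) for a in ACTIONS)
    return reward + future


def choose(state: State, depth: int) -> Action:
    """Pick the action with the highest plan-until-habit value."""
    return max(ACTIONS, key=lambda a: q_value(state, a, depth))


if __name__ == "__main__":
    for depth in range(3):
        print(depth, choose("s0", depth), q_value("s0", "L", depth), q_value("s0", "R", depth))
```

In this toy example, a depth of 0 corresponds to purely habitual responding and a depth covering the whole tree to purely goal-directed responding, with intermediate depths in between. With the values assumed here, the purely habitual choice at s0 differs from the choice made after planning one or more steps ahead.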
Additional Information
Part of the Colloquium of the GRK "Sensory Computation in Neural Systems"
Organized by
Alexander Genauck / Andreas Heinz