Reinforcement learning (RL) models have long promised to
unify computational, psychological and neural accounts of appetitively
conditioned behavior. However, the bulk of data on animal conditioning comes
from free-operant experiments measuring how hard animals will work for
reinforcement. Existing RL models are silent about these tasks because they
lack any notion of vigor. They thus fail to address
the simple observation that hungrier animals will work harder for food, as well
as stranger facts, such as their sometimes greater productivity even when
working for irrelevant outcomes like water. Here, we develop an RL framework
for free-operant behavior, suggesting that subjects choose how vigorously to
perform selected actions by optimally balancing the costs and benefits of quick
responding. Motivational states such as hunger shift these factors, skewing the
tradeoff. This accounts normatively for the effects of motivation on
productivity, as well as many other classic findings. Finally, we suggest that
tonic dopamine may be involved in the computation linking motivational state to
optimal responding, thereby explaining the complex vigor-related effects of
pharmacological manipulations of dopamine.
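
A minimal sketch of the cost-benefit tradeoff described above (the symbols and
functional forms here are illustrative assumptions, not notation from the
abstract): suppose each response emitted with latency $\tau$ incurs a vigor
cost $C_v/\tau$ that grows as responding gets quicker, while every second
spent responding forgoes reward at the average rate $\rho$, an opportunity
cost of $\rho\,\tau$. Minimizing the total cost per response then fixes the
optimal latency:

\[
  \frac{d}{d\tau}\!\left(\frac{C_v}{\tau} + \rho\,\tau\right)
  = -\frac{C_v}{\tau^{2}} + \rho = 0
  \qquad\Longrightarrow\qquad
  \tau^{*} = \sqrt{C_v/\rho}.
\]

Under these assumptions, anything that raises the average reward rate $\rho$
(for instance, hunger increasing the utility of food) shrinks $\tau^{*}$,
yielding faster, more vigorous responding; and if tonic dopamine signals
$\rho$, dopaminergic drugs would be expected to shift $\tau^{*}$ accordingly.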