Rationale: Dopamine neurotransmission has long been known to exert a
powerful influence over the vigor, strength, or rate of responding.
However, there exists no clear understanding of the computational
foundation for this effect; predominant accounts of dopamine's
computational function focus on a role for phasic dopamine in controlling
the discrete selection between different actions, and have nothing to say
about response vigor, or indeed the free operant tasks in which it is
typically measured. Objectives: We seek to accommodate free operant
behavioral tasks within the realm of models of optimal control and thereby
capture how dopaminergic and motivational manipulations affect response
vigor. Methods: We construct an average reward reinforcement learning
model in which subjects choose both which action to perform and the
latency with which to perform it. Optimal control balances the costs
of acting quickly against the benefits of obtaining reward earlier,
and thereby determines an optimal response latency. Results: In this framework,
the long-run average rate of reward plays a key role as an
opportunity cost, and mediates motivational influences on rates
and vigor of responding. We review evidence suggesting that the
average reward rate is reported by tonic levels of dopamine,
putatively in the nucleus accumbens. Conclusions: Our extension of
reinforcement learning models to free operant tasks unites psychologically
and computationally inspired ideas about the role of tonic dopamine in
striatum, explaining from a normative point of view why higher levels of
dopamine might be associated with more vigorous
responding.
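The trade-off described in the Methods can be sketched in a few lines. This is a minimal illustration, not the paper's full model: the hyperbolic vigor cost (`vigor_cost / tau`) and the linear opportunity cost (`avg_reward_rate * tau`) are assumed functional forms chosen so that the optimum has a closed form.

```python
import math

def optimal_latency(vigor_cost, avg_reward_rate):
    """Latency tau minimizing vigor_cost / tau + avg_reward_rate * tau.

    Assumed functional forms: a hyperbolic cost of responding quickly,
    plus a linear opportunity cost of time scaled by the long-run
    average reward rate. Setting the derivative to zero gives
    tau* = sqrt(vigor_cost / avg_reward_rate).
    """
    return math.sqrt(vigor_cost / avg_reward_rate)

# A higher average reward rate makes time more expensive, so the
# optimal latency shrinks -- i.e., responding becomes more vigorous:
slow = optimal_latency(vigor_cost=1.0, avg_reward_rate=0.5)
fast = optimal_latency(vigor_cost=1.0, avg_reward_rate=2.0)
assert fast < slow
```

Under these assumptions, the inverse square-root dependence of latency on the average reward rate captures, in miniature, why a higher opportunity cost of time (putatively signaled by tonic dopamine) should be associated with more vigorous responding.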