Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors
Reinforcement learning is a fundamental process by which organisms learn to achieve goals through their interactions with the environment. Using evolutionary computation techniques, we evolve (near-)optimal neuronal learning rules in a simple neural network model of reinforcement learning in bumblebees foraging for nectar. The resulting neural networks exhibit efficient reinforcement learning, allowing the bees to respond rapidly to changes in reward contingencies. The evolved synaptic plasticity dynamics give rise to varying exploration/exploitation levels and to the well-documented choice strategies of risk aversion and probability matching. Risk-averse behavior evolves even in a riskless environment and, in contrast to existing theories in economics and game theory, is shown to be a direct consequence of optimal reinforcement learning, without requiring additional assumptions such as a non-linear subjective utility function. Our results are corroborated by a rigorous mathematical analysis, and their robustness in real-world situations is supported by experiments with a mobile robot. We thus provide a biologically founded, parsimonious, and novel explanation for risk aversion and probability matching.
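The evolved network and plasticity rules themselves are not reproduced here, but the core claim, that risk aversion and probability matching can emerge from simple reinforcement learning alone, can be illustrated with a minimal sketch. The Python simulation below is our own construction, not the authors' model: a hypothetical `forage` function implements a delta-rule learner with softmax action selection, where the assumed parameters `alpha` (learning rate) and `beta` (exploitation level) stand in for the evolved plasticity dynamics.

```python
import math
import random

def forage(p_reward, reward_size, alpha=0.3, beta=4.0,
           trials=5000, seed=0):
    """Delta-rule forager choosing between two flower types.

    p_reward[i]   : probability that flower type i yields nectar
    reward_size[i]: nectar volume when it does
    alpha         : learning rate of the delta rule
    beta          : softmax inverse temperature (exploitation level)
    Returns the fraction of visits paid to flower type 0.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]                      # internal reward estimates
    visits = [0, 0]
    for _ in range(trials):
        # Softmax choice over the current estimates; higher beta
        # means stronger exploitation of the better-looking flower.
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        c = 0 if rng.random() < p0 else 1
        r = reward_size[c] if rng.random() < p_reward[c] else 0.0
        q[c] += alpha * (r - q[c])      # delta-rule update
        visits[c] += 1
    return visits[0] / trials

# Risk aversion: a constant flower vs. a variable flower with the
# SAME mean reward (0.5).  The learner typically ends up visiting
# the constant flower more than half the time.
print(forage(p_reward=[1.0, 0.5], reward_size=[0.5, 1.0]))

# Probability matching: equal reward sizes, unequal reward
# probabilities.  Visit frequencies roughly track the probabilities.
print(forage(p_reward=[0.7, 0.3], reward_size=[1.0, 1.0], beta=2.0))
```

Running this sketch typically shows risk aversion arising without any utility function: a low estimate of the variable flower suppresses the very visits that would correct it, so the learner lingers on the constant flower. In the second experiment, visit frequencies roughly match the reward probabilities rather than maximizing on the richer flower.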