Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors
Reinforcement learning is a fundamental process by which organisms learn to achieve goals from their interactions with the environment. Using Evolutionary Computation techniques, we evolve (near-)optimal neuronal learning rules in a simple neural network model of reinforcement learning in bumblebees foraging for nectar. The resulting neural networks exhibit
efficient reinforcement learning, allowing the bees to respond rapidly to
changes in reward contingencies. The evolved synaptic plasticity dynamics give
rise to varying exploration/exploitation levels and to the well-documented
choice strategies of risk aversion and probability matching. Risk-averse behavior evolves even in a riskless environment and, in contrast to existing theories in economics and game theory, is shown to be a direct consequence of optimal reinforcement learning, without requiring additional assumptions such as the existence of a non-linear subjective utility function.
Our results are corroborated by a rigorous mathematical analysis, and their robustness in real-world situations is supported by experiments with a mobile robot. Thus we provide a biologically founded, parsimonious, and novel explanation for risk aversion and probability matching.
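
To make the risk-aversion claim concrete, the following sketch shows how sample-based value learning alone can favor a constant reward source over an equally rewarding but variable one. It is not the paper's evolved network: the delta-rule learner, the softmax choice rule, and all parameter values are illustrative assumptions.

import math
import random

def foraging_bout(alpha=0.3, beta=4.0, trials=500, seed=0):
    """Delta-rule learner with softmax choice between two flower types
    of equal mean reward: 'constant' always pays 0.5 units of nectar,
    'variable' pays 1.0 with probability 0.5 and 0 otherwise."""
    rng = random.Random(seed)
    value = {"constant": 0.5, "variable": 0.5}   # initial value estimates
    visits = {"constant": 0, "variable": 0}
    for _ in range(trials):
        # Softmax: probability of visiting the constant flower grows
        # with the difference between the two value estimates.
        p_constant = 1.0 / (1.0 + math.exp(
            -beta * (value["constant"] - value["variable"])))
        choice = "constant" if rng.random() < p_constant else "variable"
        reward = 0.5 if choice == "constant" else (
            1.0 if rng.random() < 0.5 else 0.0)
        # Delta rule: only the visited flower's estimate is updated, so
        # a low estimate of the variable flower persists (that flower is
        # rarely re-sampled), while a high estimate is quickly corrected
        # downward -- an asymmetry that biases choice toward the
        # constant flower.
        value[choice] += alpha * (reward - value[choice])
        visits[choice] += 1
    return visits

print(foraging_bout())  # typically a clear majority of visits to 'constant'

Under these assumed parameters the learner visits the constant flower on most trials even though both flowers have the same mean payoff, illustrating the abstract's point that risk aversion can arise from the learning dynamics themselves, with no non-linear subjective utility function.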