Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors
Reinforcement learning is a fundamental process by which organisms learn to achieve goals from their interactions with the environment. Using Evolutionary Computation techniques, we evolve (near-)optimal neuronal learning rules in a simple neural network model of reinforcement learning in bumblebees foraging for nectar. The resulting neural networks exhibit
efficient reinforcement learning, allowing the bees to respond rapidly to
changes in reward contingencies. The evolved synaptic plasticity dynamics give
rise to varying exploration/exploitation levels and to the well-documented
choice strategies of risk aversion and probability matching. Risk-averse behavior evolves even in a riskless environment and, in contrast to existing theories in economics and game theory, is shown to be a direct consequence of optimal reinforcement learning, without requiring additional assumptions such as the existence of a non-linear subjective utility function.
Our results are corroborated by a rigorous mathematical analysis, and their robustness in real-world situations is supported by experiments with a mobile robot. Thus we provide a biologically founded, parsimonious, and novel explanation for risk aversion and probability matching.
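
To make the risk-aversion claim concrete, the following sketch shows how sample-based value learning alone can favor a constant reward source over an equally rewarding but variable one. It is not the paper's evolved network: the delta-rule learner, the softmax choice rule, and all parameter values are illustrative assumptions.

import math
import random

def foraging_bout(alpha=0.3, beta=4.0, trials=500, seed=0):
    """Delta-rule learner with softmax choice between two flower types
    of equal mean reward: 'constant' always pays 0.5 units of nectar,
    'variable' pays 1.0 with probability 0.5 and 0 otherwise."""
    rng = random.Random(seed)
    value = {"constant": 0.5, "variable": 0.5}   # initial value estimates
    visits = {"constant": 0, "variable": 0}
    for _ in range(trials):
        # Softmax: probability of visiting the constant flower grows
        # with the difference between the two value estimates.
        p_constant = 1.0 / (1.0 + math.exp(
            -beta * (value["constant"] - value["variable"])))
        choice = "constant" if rng.random() < p_constant else "variable"
        reward = 0.5 if choice == "constant" else (
            1.0 if rng.random() < 0.5 else 0.0)
        # Delta rule: only the visited flower's estimate is updated, so
        # a low estimate of the variable flower persists (that flower is
        # rarely re-sampled), while a high estimate is quickly corrected
        # downward -- an asymmetry that biases choice toward the
        # constant flower.
        value[choice] += alpha * (reward - value[choice])
        visits[choice] += 1
    return visits

print(foraging_bout())  # typically a clear majority of visits to 'constant'

Under these assumed parameters the learner visits the constant flower on most trials even though both flowers have the same mean payoff, illustrating the abstract's point that risk aversion can arise from the learning dynamics themselves, with no non-linear subjective utility function.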