Signals, noise and information
(A course for PhD students at Rockefeller University, Fall 2008)
(Last update 30 November 2008)
Professor William Bialek (wbialek@rockefeller.edu)
Teaching Assistant: Stefano Di Talia (ditalis@mail.rockefeller.edu)
Much
of biological function is about the flow and processing of information.
Examples range from bacteria to brains, with many stops in between. All
of these many different biological systems, however, are constrained by common
physical principles. For example, using only a limited number of
molecules to transmit signals means that cells will experience some irreducible
noise related to the random behavior of the individual molecules. There is a
long history of experiments on signals and noise in biological systems, and in
recent years these experiments have expanded to encompass a wider variety of
biological processes, including the regulation of gene expression.
Remarkably, many biological systems seem to operate near the relevant
physical limits to their signaling performance, and this may give us a
glimpse of the "design principles" which select the structure of these systems
from the wide range of possibilities. In this course we'll explore the underlying
physical principles, the associated mathematical tools, and (in some detail)
the connection of these theoretical ideas to quantitative experiments.
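To make the notion of an irreducible noise floor concrete: if a signal is carried by an average of N molecules, and the molecules behave independently, the counts obey Poisson statistics and the fractional noise falls only as 1/sqrt(N). Since the course will use MATLAB for its simulations, here is a minimal sketch of this counting noise; the numbers are illustrative, not taken from any particular system.

    % Counting noise for a signal carried by N molecules (illustrative).
    % For independent molecules the counts are Poisson, so the fractional
    % noise std(n)/mean(n) falls only as 1/sqrt(N).
    Nmean = [10 100 1000];           % mean molecule counts to compare
    T = 2000;                        % independent trials per condition
    for k = 1:length(Nmean)
        counts = zeros(T,1);
        for t = 1:T
            % Poisson sample: count unit-rate arrivals in a window of
            % length Nmean(k), using exponential waiting times.
            n = 0; s = -log(rand);
            while s < Nmean(k)
                n = n + 1;
                s = s - log(rand);
            end
            counts(t) = n;
        end
        fprintf('N = %4d: fractional noise %.3f, 1/sqrt(N) = %.3f\n', ...
            Nmean(k), std(counts)/mean(counts), 1/sqrt(Nmean(k)));
    end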
Note: This will be taught in the style of
physics courses, with regularly assigned problem sets. Problems will
range from analytic calculations to small simulations to the analysis of real
data. I hope to provide a clear view of rather advanced ideas,
which is easier if the students have some background. In the first offering
of the course, however, I will try to adjust the mathematical level (including
the problems) to match the students' experience. I also hope that
students will bring their knowledge of different biological systems to the
discussion, stimulating everyone to think about how the theoretical ideas can
cut across traditional subdivisions of biology. Please feel free to contact me with any questions, or to
discuss your background and interests.
Organizational meeting: Thursday, 4 September, 11:30 AM in
Smith Hall Annex Room B29
Lectures:
Tuesdays 10 AM 'til noon, starting on 7 October, continuing through 2
December; wrapup on Thursday, 11 December. Meetings in Smith Hall Annex Room B29. No lecture on 11 November.
Discussion sessions for problem sets: Thursdays 10 AM 'til noon, starting on
9 October, continuing through 4 December. Meetings in Smith Hall Annex Room
B29.
For the last three weeks, we will focus on
information theory (as described below).
Problems will be embedded in a set of lecture notes. Here is the
first installment, and here is the second; still more
to come.
For the sixth problem set,
we focus on the ideas of noise and estimation as illustrated by the problem of
bacterial chemotaxis. Most of the effort goes into understanding the nature
of dynamics and noise in diffusion, filling in the background to the
classical discussion by Berg and Purcell.
The fifth problem set will come in two parts, and
carry us through until 18 November.
For the first part, try these two problems. For the second part
you'll need data, which you will find here; you should
download the file rather than opening it.
For the fourth problem
set, I would like you to look at some real data on the responses of rod
photoreceptor cells to single photons, and explore for yourself some of the
issues about decision making that we discussed in class.
For the third problem set, you should read Hopfield's original paper about kinetic
proofreading, and answer several
questions. There will be a
discussion session on 16 Oct, as usual; please turn in your answers to the
questions on 21 Oct.
The second problem set will be due on 14 Oct, although we may discover that
things spill over into the following week. Let's see how it goes. There will
be a discussion session on 9 Oct.
The first problem set will be due on 1 Oct. An introduction to MATLAB will be
given on Th 11 Sep, 10 AM 'til noon in the Computer Center; this is only for
students with no previous exposure to MATLAB. A discussion session will be
held on Th 25 Sep, 10 AM 'til noon in Smith Hall Annex Room B29.
What follows is a description of the topics I
would like to cover. As I get to
know the students, and get some sense for the right level of the course, I will
add links to a fuller outline, hopefully with some lecture notes and references
as well. You will also find links
to problem sets as they become available.
Part One: The central dogma, revisited (7 & 14 Oct)
Genetic information is encoded by the identity of
molecular components (bases) in a polymer (DNA), and the transmission of this
information depends upon the selection of complementary components out of a
soup of possibilities. We know
that, as enzymes catalyze these reactions, there is some rate at which the
�wrong� components are chosen; more generally, any reaction mechanism involves
some unproductive or incorrect side branches. On a macroscopic scale we can measure the rate at which the
wrong reactions occur, but on the scale of single molecules—which is what
matters for the life of the cell!—these rates translate into probabilities of error:
the probability that the wrong base will be inserted into the DNA or
mRNA sequence, that the wrong tRNA is charged with a particular amino acid, or
that the wrong amino acid will be incorporated into a protein.
By way of introduction to the course, we will
discuss the probabilities of error in these very special biosynthetic
reactions, and explore the origins and consequences of these errors. Hopefully this relatively
familiar biological context will provide a good setting in which to illustrate
more abstract definitions of signals, noise and information. We'll use the
connections between error probabilities and free energies to remind ourselves
of some important concepts from physics, and we'll see how the nominal limits
to precision set by
thermodynamics can be evaded, but only at the cost of dissipating energy
(kinetic proofreading). Once cells
implement these active mechanisms for error correction, they have to make
choices about how to trade between the energetic cost of accuracy and the
functional cost of errors. In
important cases, such as protein synthesis, the error rates also depend on
other parameters that cells can control, such as the relative abundances of the
different tRNA species. These
observations suggest that, at a system level, cells may have strategies for
optimizing their performance in the face of errors. We'll try to formulate these strategies and explore how such
abstract principles can be tested experimentally.
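In schematic form, following Hopfield's argument: at thermal equilibrium the error fraction f in discriminating between correct and incorrect substrates is bounded by their free energy difference \Delta G,

    f_{eq} \simeq \exp( -\Delta G / k_B T ),

while one round of kinetic proofreading, paid for by nucleotide hydrolysis, can square this factor,

    f_{proof} \simeq f_{eq}^2 = \exp( -2\Delta G / k_B T ).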
Part Two: Making decisions (21 & 28 Oct)
As you must have noticed by now, an important
part of life is making decisions.
Cells make decisions about whether to divide, people make decisions
about the identity (or attractiveness) of faces seen across the room, and there
are many other examples. In all
these cases, organisms face the problem that the available data are not
necessarily reliable indicators of the correct decision. Indeed, the signals that can be
collected typically are only probabilistically related to the decision, and so
there is some chance of error. The
problem faced by the organism thus is to process the incoming data in ways that
minimize this probability of error, or more generally to minimize the cost associated
with these errors.
There is a common mathematical framework for
thinking about this wide range of decision problems. This approach focuses our attention on a variety of
questions. First, what limits the
reliability of the data? Are there
fundamental reasons (such as the random behavior of individual molecules) for
the "noisiness" of the available signals, or is this noise itself something
that the organism could reduce?
Next, given a characterization of the signals and noise, what computations
should the organism do in order to maximize the reliability of its
decisions? How can it combine data
from multiple sources, or integrate evidence over time, to increase its
reliability? What processing is
needed to ensure that decisions are invariant to irrelevant variables? Finally, how can these abstract
operations be implemented in real biological hardware, either at the level of
biochemical circuits within single cells or in the neural circuits of our
brains? Can we identify aspects of
this circuitry that serve to optimize decision making, perhaps in surprising
ways? In this part of the course
we'll address these questions, using examples that span the full range from
single cells to cognition.
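As a concrete instance of this framework, consider the simplest case: choosing between two alternatives on the basis of a single noisy observation. With Gaussian noise the optimal likelihood-ratio rule reduces to a threshold, and the minimum probability of error is set by how far apart the signals are in units of the noise. A minimal MATLAB sketch, with made-up numbers:

    % Two-alternative decision from one noisy observation (hypothetical
    % numbers). Observations are Gaussian with different means under the
    % two hypotheses; the likelihood-ratio rule is a simple threshold.
    mu1 = 0; mu2 = 1; sigma = 0.7;   % assumed means and noise level
    T = 1e5;                         % trials per hypothesis
    x1 = mu1 + sigma*randn(T,1);     % data when hypothesis 1 is true
    x2 = mu2 + sigma*randn(T,1);     % data when hypothesis 2 is true
    theta = (mu1 + mu2)/2;           % optimal threshold for equal priors
    Perr = 0.5*mean(x1 > theta) + 0.5*mean(x2 < theta);
    % The minimum error depends only on d' = |mu2 - mu1|/sigma:
    d = abs(mu2 - mu1)/sigma;
    fprintf('simulated error %.4f, theoretical optimum %.4f\n', ...
        Perr, 0.5*erfc(d/(2*sqrt(2))));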
Part Three: Making estimates (and acting on them) (4 & 18 Nov)
A glass begins to fall from the table, and you
reach to catch it. This simple,
seemingly reflexive act requires a great deal of computation. You use your visual system to estimate
the trajectory of the falling object; you transform this estimate into commands
to move your arm and hand into position to intercept the trajectory; once you
make contact your somatosensory system provides data about the weight and
texture of the object; and you transform these new data into commands for the
strength of your grip and the lift force required to carry the glass gently
back to the table. At the same time, cells in your immune system are sensing
the concentration of various chemicals, estimating the direction in which they
should crawl to find the foreign invaders. In each case, estimates need to be accurate or precise in
order to allow the organism (or the single cell) to function efficiently, and
it seems that we can define a notion of optimal estimation, corresponding to
maximally efficient performance.
Parallel to our discussion of decision making,
many different estimation problems can be given a unified mathematical
formulation, and again, even setting up the problem in such mathematical terms
raises several basic questions. What is the nature of the noise source that limits the
precision of estimation? How can
the available data be processed to minimize the impact of this noise and
maximize precision? In the context
of estimation, however, we will see that it is much more critical to understand
how to combine the immediately relevant data (e.g., from the touch receptors in
our fingertips) with prior expectations (e.g., about the weight of
objects). We'll even
see examples where the same incoming data should be processed in qualitatively
different ways depending on the nature of our expectations. This is true for problems that we
(humans) solve at the level of neural computation, and also for problems that
are solved at the level of biochemical circuits in single cells, although the
latter case has been less well explored.
To the usual questions about how different processing strategies are
implemented in biological hardware, we thus have to add questions about how
organisms acquire and represent their prior expectations. This leads us from thinking about
estimation to aspects of adaptation, learning and evolution. Once again we'll try to address all of
these questions across a broad range of biological systems, hopefully with
input from the students about their own favorite systems.
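To illustrate the role of prior expectations, here is a minimal MATLAB sketch of the classic Gaussian case: when both the prior and the measurement noise are Gaussian, the optimal estimate is a weighted average of the data and the prior mean, weighted by their reliabilities. All numbers are hypothetical.

    % Estimation with a prior (hypothetical numbers). The true value x is
    % drawn from a Gaussian prior; the measurement is y = x + noise. The
    % posterior mean estimate mixes the data with the prior mean.
    mu0 = 200; s0 = 50;              % prior mean and spread (e.g., grams)
    sn = 80;                         % measurement noise, std deviation
    T = 1e5;
    x = mu0 + s0*randn(T,1);         % true values drawn from the prior
    y = x + sn*randn(T,1);           % noisy measurements
    w = s0^2/(s0^2 + sn^2);          % weight given to the data
    xhat = w*y + (1-w)*mu0;          % optimal (posterior mean) estimate
    fprintf('rms error: with prior %.1f, data alone %.1f\n', ...
        sqrt(mean((xhat - x).^2)), sqrt(mean((y - x).^2)));

Note that as the measurement noise sn grows, the weight w shifts toward the prior; this is the sense in which the same data should be processed differently depending on our expectations.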
Part Four: Transmitting information (25 Nov, 2 & 11 Dec)
Beyond decisions or estimates, we sometimes speak
more vaguely of "information" in biological systems. Thus, the streams of
action potentials along the axons of the optic nerve provide information about
the visual world, and cells in a developing embryo acquire positional
information that determines their fate in the spatial structure of the
organism. Perhaps surprisingly,
these vague and abstract words about "information" can be made precise. In 1948, Shannon proved that there is
only one way to measure information quantitatively if we want this measure to
obey some simple and plausible conditions. Remarkably, this information measure
is essentially the same as the entropy that we know from thermodynamics and
statistical mechanics. Further, entropy also answers the practical question of
how much space we need to use in writing down a description of the signals or
states that we observe. This
leads to a notion of efficiency, much like the more prosaic ideas of efficiency
in thermodynamics.
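For reference, Shannon's measure for a signal that takes values labeled i with probabilities p_i is the entropy

    H = -\sum_i p_i \log_2 p_i  (bits),

and the information that an output Y provides about an input X is the mutual information

    I(X;Y) = \sum_{x,y} p(x,y) \log_2 [ p(x,y) / ( p(x) p(y) ) ].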
In this section of the course, we'll look at the
foundations of Shannon's information theory, discuss the connection between
these abstract ideas and more familiar measures of utility or fitness in a
biological context, and then see how information transmission can actually be measured
in particular biological systems ranging from bacteria to brains. The transmission of information
is limited, as with decisions and estimates, by fundamental physical
constraints, and we will test the idea that biological systems operate near
these limits, squeezing as much relevant information as they can out of limited
resources. More deeply, we will explore the idea that aspects of the functional
mechanisms in biological systems can be predicted from the need to optimize
information transmission in this sense.
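To give a flavor of what measuring information transmission involves in practice, here is a minimal MATLAB sketch on synthetic data: discretize the input and output into bins, build the joint histogram, and plug the empirical probabilities into Shannon's formula. A real analysis would also have to correct the bias of this naive plug-in estimate.

    % Estimating mutual information from data (synthetic example).
    T = 1e5;
    x = randn(T,1);                  % "input" signal
    y = x + 0.5*randn(T,1);          % "output" = input plus Gaussian noise
    nb = 30;                         % number of bins along each axis
    ix = min(floor((x - min(x))/(max(x) - min(x))*nb) + 1, nb);
    iy = min(floor((y - min(y))/(max(y) - min(y))*nb) + 1, nb);
    pxy = accumarray([ix iy], 1, [nb nb])/T;   % joint distribution
    px = sum(pxy,2); py = sum(pxy,1);          % marginal distributions
    pind = px*py;                              % prediction if independent
    nz = pxy > 0;
    Ihat = sum(pxy(nz).*log2(pxy(nz)./pind(nz)));
    % For this Gaussian channel the exact answer is (1/2)*log2(1 + S/N):
    fprintf('estimated I = %.2f bits, exact I = %.2f bits\n', ...
        Ihat, 0.5*log2(1 + 1/0.25));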