Signals, noise and information

 

(A course for PhD students at Rockefeller University, Fall 2008)

(Last update 30 November 2008)

Professor William Bialek (wbialek@rockefeller.edu)

Teaching Assistant: Stefano Di Talia (ditalis@mail.rockefeller.edu)

 

 

Much of biological function is about the flow and processing of information. Examples range from bacteria to brains, with many stops in between. All of these diverse biological systems, however, are constrained by common physical principles. For example, using only a limited number of molecules to transmit signals means that cells will experience some irreducible noise related to the random behavior of the individual molecules. There is a long history of experiments on signals and noise in biological systems, and in recent years these experiments have expanded to encompass a wider variety of biological processes, including the regulation of gene expression. Remarkably, many biological systems seem to operate near the relevant physical limits to their signaling performance, and this may give us a glimpse of the "design principles" which select the structure of these systems from the wide range of possibilities. In this course we'll explore the underlying physical principles, the associated mathematical tools, and (in some detail) the connection of these theoretical ideas to quantitative experiments.
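
As a concrete example of such a limit (a standard Poisson-counting argument, not tied to any particular system): if a signal is carried by N independent molecules, the count fluctuates from trial to trial with standard deviation \sqrt{N}, so the fractional noise is

    \[ \frac{\delta N}{N} \sim \frac{1}{\sqrt{N}} , \]

and a cell that reads out a concentration through roughly 100 occupied receptors faces roughly 10% noise that no downstream machinery can remove.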

 

Note:  This will be taught in the style of physics courses, with regularly assigned problem sets.  Problems will range from analytic calculations to small simulations to the analysis of real data.    I hope to provide a clear view of rather advanced ideas, which is easier if the students have some background.  In the first offering of the course, however, I will try to adjust the mathematical level (including the problems) to match the students' experience.  I also hope that students will bring their knowledge of different biological systems to the discussion, stimulating everyone to think about how the theoretical ideas can cut across traditional subdivisions of biology.  Please feel free to contact me with any questions, or to discuss your background and interests.

 

Organizational meeting:  Thursday, 4 September, 11:30 AM in Smith Hall Annex Room B29

 

Lectures:  Tuesdays 10 AM until noon, starting on 7 October, continuing through 2 December; wrap-up on Thursday, 11 December.  Meetings in Smith Hall Annex Room B29.  No lecture on 11 November.

 

Discussion sessions for problem sets:  Thursdays 10 AM until noon, starting on 9 October, continuing through 4 December.  Meetings in Smith Hall Annex Room B29.

 

For the last three weeks, we will focus on information theory (as described below).  Problems will be embedded in a set of lecture notes.  Here is the first installment, and here is the second; still more to come.

 

For the sixth problem set, we focus on the ideas of noise and estimation as illustrated by the problem of bacterial chemotaxis.  Most of the effort goes into understanding the nature of dynamics and noise in diffusion, filling in the background to the classic discussion by Berg and Purcell.
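
For orientation, the limit that the Berg and Purcell argument leads to (quoted here without derivation, in notation chosen just for this note): a sensor of linear size a, averaging for a time T, cannot determine a concentration c of molecules with diffusion constant D to better than a fractional accuracy of roughly

    \[ \frac{\delta c}{c} \sim \frac{1}{\sqrt{D\,a\,c\,T}} . \]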

 

The fifth problem set will come in two parts, and will carry us through 18 November.  For the first part, try these two problems.  For the second part you'll need data, which you will find here; you should download the file rather than opening it.

 

For the fourth problem set, I would like you to look at some real data on the responses of rod photoreceptor cells to single photons, and explore for yourself some of the issues about decision making that we discussed in class. 

 

For the third problem set, you should read Hopfield's original paper about kinetic proofreading, and answer several questions.  There will be a discussion session on 16 Oct, as usual; please turn in your answers to the questions on 21 Oct.

 

Second problem set will be due on 14 Oct, although we may discover that things spill over into the following week.  Let's see how it goes.  There will be a discussion session on 9 Oct.

 

First problem set will be due on 1 Oct.  An introduction to MATLAB will be given on Th 11 Sep, 10 AM until noon in the Computer Center; this is only for students with no previous exposure to MATLAB.  A discussion session will be held on Th 25 Sep, 10 AM until noon in Smith Hall Annex Room B29.

 

What follows is a description of the topics I would like to cover.  As I get to know the students, and get some sense for the right level of the course, I will add links to a fuller outline, hopefully with some lecture notes and references as well.  You will also find links to problem sets as they become available.

 

Part One:  The central dogma, revisited (7 & 14 Oct)

 

Genetic information is encoded by the identity of molecular components (bases) in a polymer (DNA), and the transmission of this information depends upon the selection of complementary components out of a soup of possibilities.  We know that, as enzymes catalyze these reactions, there is some rate at which the "wrong" components are chosen; more generally, any reaction mechanism involves some unproductive or incorrect side branches.  On a macroscopic scale we can measure the rate at which the wrong reactions occur, but on the scale of single molecules (which is what matters for the life of the cell!) these rates translate into probabilities of error:  the probability that the wrong base will be inserted into the DNA or mRNA sequence, that the wrong tRNA will be charged with a particular amino acid, or that the wrong amino acid will be incorporated into a protein.

 

By way of introduction to the course, we will discuss the probabilities of error in these very special biosynthetic reactions, and explore the origins and consequences of these errors.  Hopefully this relatively familiar biological context will provide a good setting in which to illustrate more abstract definitions of signals, noise and information.  We'll use the connections between error probabilities and free energies to remind ourselves of some important concepts from physics, and we'll see how the nominal limits to precision set by thermodynamics can be evaded, but only at the cost of dissipating energy (kinetic proofreading).  Once cells implement these active mechanisms for error correction, they have to make choices about how to trade between the energetic cost of accuracy and the functional cost of errors.  In important cases, such as protein synthesis, the error rates also depend on other parameters that cells can control, such as the relative abundances of the different tRNA species.  These observations suggest that, at a system level, cells may have strategies for optimizing their performance in the face of errors.  We'll try to formulate these strategies and explore how such abstract principles can be tested experimentally.
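
To make the thermodynamic limit concrete (a textbook version of the argument, with symbols chosen just for this sketch): if right and wrong substrates differ by a binding free energy \Delta G, then any mechanism operating at equilibrium can do no better than an error fraction

    \[ f \approx e^{-\Delta G / k_B T} , \]

while a single Hopfield-style proofreading step, driven by nucleotide hydrolysis, gives the enzyme a second chance to discard wrong complexes and lets the error fraction approach f^2, at the price of the free energy dissipated in hydrolysis.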

 

 

Part Two:  Making decisions (21 & 28 Oct)

 

As you must have noticed by now, an important part of life is making decisions.  Cells make decisions about whether to divide, people make decisions about the identity (or attractiveness) of faces seen across the room, and there are many other examples.  In all these cases, organisms face the problem that the available data are not necessarily reliable indicators of the correct decision.  Indeed, the signals that can be collected typically are only probabilistically related to the decision, and so there is some chance of error.  The problem faced by the organism thus is to process the incoming data in ways that minimize this probability of error, or more generally to minimize the cost associated with these errors.   

 

There is a common mathematical framework for thinking about this wide range of decision problems.  This approach focuses our attention on a variety of questions.  First, what limits the reliability of the data?  Are there fundamental reasons (such as the random behavior of individual molecules) for the "noisiness" of the available signals, or is this noise itself something that the organism could reduce?  Next, given a characterization of the signals and noise, what computations should the organism do in order to maximize the reliability of its decisions?  How can it combine data from multiple sources, or integrate evidence over time, to increase its reliability?  What processing is needed to ensure that decisions are invariant to irrelevant variables?  Finally, how can these abstract operations be implemented in real biological hardware, either at the level of biochemical circuits within single cells or in the neural circuits of our brains?  Can we identify aspects of this circuitry that serve to optimize decision making, perhaps in surprising ways?  In this part of the course we'll address these questions, using examples that span the full range from single cells to cognition.
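
As a small illustration of this framework (a sketch, not one of the assigned problems; the means, noise level, and equal priors below are invented for the example), the following MATLAB script simulates a two-alternative decision with Gaussian noise and compares the error rate of the optimal threshold rule to the theoretical minimum:

    % Two-alternative decision with Gaussian noise (illustrative numbers only).
    mu0 = 0; mu1 = 1; sigma = 0.5;  % signal means under the two hypotheses; noise std
    N = 1e5;                        % simulated trials per hypothesis
    x0 = mu0 + sigma*randn(N,1);    % observations when hypothesis 0 is true
    x1 = mu1 + sigma*randn(N,1);    % observations when hypothesis 1 is true
    theta = (mu0 + mu1)/2;          % likelihood-ratio threshold for equal priors and costs
    Perr = 0.5*mean(x0 > theta) + 0.5*mean(x1 < theta);  % simulated error probability
    Pmin = 0.5*erfc((mu1 - mu0)/(2*sqrt(2)*sigma));      % theoretical minimum error
    fprintf('simulated %.4f vs theoretical %.4f\n', Perr, Pmin);

The moral is that once the signals and noise are characterized, the smallest achievable error probability is fixed; no downstream processing can outperform the likelihood-ratio test.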

 

 

Part Three:  Making estimates (and acting on them) (4 & 18 Nov)

 

A glass begins to fall from the table, and you reach to catch it.  This simple, seemingly reflexive act requires a great deal of computation.  You use your visual system to estimate the trajectory of the falling object; you transform this estimate into commands to move your arm and hand into position to intercept the trajectory; once you make contact your somatosensory system provides data about the weight and texture of the object; and you transform these new data into commands for the strength of your grip and the lift force required to carry the glass gently back to the table. At the same time, cells in your immune system are sensing the concentration of various chemicals, estimating the direction in which they should crawl to find the foreign invaders.  In each case, estimates need to be accurate or precise in order to allow the organism (or the single cell) to function efficiently, and it seems that we can define a notion of optimal estimation, corresponding to maximally efficient performance.

 

Parallel to our discussion of decision making, many different estimation problems can be given a unified mathematical formulation, and again, even setting up the problem in such mathematical terms raises several basic questions.  What is the nature of the noise source that limits the precision of estimation?  How can the available data be processed to minimize the impact of this noise and maximize precision?  In the context of estimation, however, we will see that it is much more critical to understand how to combine the immediately relevant data (e.g., from the touch receptors in our fingertips) with prior expectations (e.g., about the weight of objects).  We'll even see examples where the same incoming data should be processed in qualitatively different ways depending on the nature of our expectations.  This is true for problems that we (humans) solve at the level of neural computation, and also for problems that are solved at the level of biochemical circuits in single cells, although the latter case has been less well explored.  To the usual questions about how different processing strategies are implemented in biological hardware, we thus have to add questions about how organisms acquire and represent their prior expectations.  This leads us from thinking about estimation to aspects of adaptation, learning and evolution.  Once again we'll try to address all of these questions across a broad range of biological systems, hopefully with input from the students about their own favorite systems.
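
To see how prior expectations enter (a standard Gaussian example, with symbols defined just for this note): if a quantity x is drawn from a Gaussian prior with mean x_0 and variance \sigma_0^2, and we observe y = x + noise, with the noise having variance \sigma_n^2, then the estimate that minimizes the mean-square error is the precision-weighted average

    \[ \hat{x} = \frac{\sigma_n^2\, x_0 + \sigma_0^2\, y}{\sigma_0^2 + \sigma_n^2} , \]

so noisy observations are pulled toward the prior mean, and the same observation y should be interpreted differently depending on what we expect.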

 

 

Part Four:  Transmitting information (25 Nov, 2 & 11 Dec)

 

Beyond decisions or estimates, we sometimes speak more vaguely of "information" in biological systems. Thus, the streams of action potentials along the axons of the optic nerve provide information about the visual world, and cells in a developing embryo acquire positional information that determines their fate in the spatial structure of the organism.  Perhaps surprisingly, these vague and abstract words about "information" can be made precise.  In 1948, Shannon proved that there is only one way to measure information quantitatively if we want this measure to obey some simple and plausible conditions.  Remarkably, this information measure is essentially the same as the entropy that we know from thermodynamics and statistical mechanics.  Further, entropy also answers the practical question of how much space we need to use in writing down a description of the signals or states that we observe.  This leads to a notion of efficiency, much like the more prosaic ideas of efficiency in thermodynamics.
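
For reference, Shannon's measure (stated here without the uniqueness argument): the entropy of a discrete distribution with probabilities p_i is

    \[ S = -\sum_i p_i \log_2 p_i \]

bits, and this same quantity is the minimum average number of binary symbols per sample needed to write down signals drawn from that distribution.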

 

In this section of the course, we'll look at the foundations of Shannon's information theory, discuss the connection between these abstract ideas and more familiar measures of utility or fitness in a biological context, and then see how information transmission can actually be measured in particular biological systems ranging from bacteria to brains.  The transmission of information is limited, as with decisions and estimates, by fundamental physical constraints, and we will test the idea that biological systems operate near these limits, squeezing as much relevant information as they can out of limited resources.  More deeply, we will explore the idea that aspects of the functional mechanisms in biological systems can be predicted from the need to optimize information transmission in this sense.
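
The natural quantity to estimate in such measurements is the mutual information between input and output, which in the discrete case reads

    \[ I(X;Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} , \]

the average reduction in our uncertainty about the input once the output is known; physical constraints such as limited molecule counts or limited spike rates bound this quantity from above.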