Learning, complexity and relevant information
William Bialek
Papers are in chronological order, with the most recent at the bottom. Bracketed numbers refer to a full list of publications.
[60.] Field theories for learning probability distributions. W Bialek, CG Callan & SP Strong, Phys Rev Lett 77, 4693-4697 (1996).
[80.] Occam factors and model-independent Bayesian learning of continuous distributions. I Nemenman & W Bialek, Phys Rev E 65, 026137 (2002).
[69.] The information bottleneck method. N Tishby, FC Pereira & W Bialek, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999).
When Shannon developed information theory, he left open the problem of assigning relevance to a signal. Here we showed that if we observe one signal but are interested in another, the statistical associations between the two signals define what is relevant, and one can selectively compress the observed signal to "squeeze out" the relevant bits by formulating the efficient representation of relevant information as an optimization principle. Crucially, this formulation requires no assumptions about what it means for signals to be similar; indeed, the signals need not even live in a metric space. There are deep connections to clustering (especially to the statistical mechanics formulation in which separate clusters emerge through a series of phase transitions), and many different problems, from signal processing to learning, can be cast into this unified information theoretic framework. We believe this is a fundamentally new and principled approach to a wide variety of problems, of interest both as models of the problems solved by the brain and as practical problems in their own right.
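For concreteness, a compact sketch of the optimization principle, following the formulation in ref [69]: given the joint distribution p(x, y) of an observed signal X and a relevant signal Y, one seeks a compressed representation T of X by minimizing

\min_{p(t|x)} \mathcal{L}[p(t|x)] = I(X;T) - \beta\, I(T;Y),

where the Lagrange multiplier \beta sets the trade-off between compression and preservation of relevant information. The self-consistent solutions take the form

p(t|x) \propto p(t)\, \exp\!\left[ -\beta\, D_{\mathrm{KL}}\!\left( p(y|x) \,\|\, p(y|t) \right) \right],

so that the Kullback-Leibler divergence emerges as the natural distortion measure rather than being assumed at the outset.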
[76.] Predictability, complexity and learning. W Bialek, I Nemenman & N Tishby, Neural Comp 13, 2409-2463 (2001).
[79.] Complexity through nonextensivity. W Bialek, I Nemenman & N Tishby, Physica A 302, 89-99 (2001).
We have reached an understanding of the connections between learning and complexity, unified by the idea of predictive information, which is equivalent to the subextensive component of the entropy. The results provide a conclusive answer to the long-standing problem of how to characterize the complexity of time series, and serve to unify ideas from different areas of physics and computer science. In particular, we can classify data streams by their complexity, and if there is something to be learned from a data stream, this classification corresponds to measures of the complexity of the model that can be learned. From a technical point of view, it was essential to have a calculable example in the regime where the models to be learned cannot be described by a finite number of parameters, and in related work we showed how these nonparametric learning problems can be given a field theoretic formulation [60, 80]. Perhaps the most interesting direction to grow out of this work is the possibility of directly measuring the complexity of the models used by humans and other animals as they learn about the world.
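Schematically, in the notation of ref [76]: if S(T) is the entropy of observations in a window of duration T, the extensive and subextensive components separate as

S(T) = \mathcal{S}_0\, T + S_1(T), \qquad \lim_{T \to \infty} S_1(T)/T = 0,

and the predictive information, the mutual information between a past segment of duration T and the entire future,

I_{\mathrm{pred}}(T) = \lim_{T' \to \infty} \left[ S(T) + S(T') - S(T + T') \right],

is carried entirely by the subextensive part, I_{\mathrm{pred}}(T) \to S_1(T). For a data stream described by a model with K parameters, I_{\mathrm{pred}}(T) grows as (K/2) \log T, so the divergent part of the predictive information measures the complexity of the learnable model. In the field theoretic formulation of refs [60, 80], sketched roughly, the unknown distribution is written as Q(x) \propto e^{-\phi(x)} and a smoothness prior P[\phi] \propto \exp\!\left[ -(\ell/2) \int dx\, (\partial_x \phi)^2 \right] is placed on the field, so that nonparametric learning maps onto a calculable problem in one-dimensional field theory.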
[84.] Thinking about the brain. W Bialek, in Physics of Biomolecules and Cells: Les Houches Session LXXV, H Flyvbjerg, F Jülicher, P Ormos & F David, eds, pp 485-577 (EDP Sciences, Les Ulis; Springer-Verlag, Berlin, 2002).
[83.] Entropy and inference, revisited. I Nemenman, F Shafee & W Bialek, in Advances in Neural Information Processing 14, TG Dietterich, S Becker & Z Ghahramani, eds, pp 471-478 (MIT Press, Cambridge, 2002).
[101.] Entropy and information in neural spike trains: Progress on the sampling problem. I Nemenman, W Bialek & R de Ruyter van Steveninck, physics/0306063.
[95.] Ambiguous model learning made unambiguous with 1/f priors. GS Atwal & W Bialek, to appear in Advances in Neural Information Processing 16 (MIT Press, Cambridge, 2004).
[96.] Optimal manifold representation of data: An information theoretic perspective. D Chigirev & W Bialek, to appear in Advances in Neural Information Processing 16 (MIT Press, Cambridge, 2004).