Algorithmic Randomness

Illustration by Marius Watz

Hutter’s work is much too theoretical to visualize, so here’s a pseudo-random composition created by a computer program.

Marcus Hutter is a professor in the Research School of Computer Science at the Australian National University in Canberra. He’s most notable for his work in the field of artificial intelligence theory, which is like philosophy with way more math. Here he explains how algorithmic randomness is essential to the construction of AI.

Algorithmic randomness is generally accepted as the best, or at least the default, notion of randomness. There are several equal definitions of algorithmic randomness, and one is the following: You take all the tests you can perform for randomness on a computer, and if a sequence of data passes all of these tests, then you say the sequence is algorithmically random, or Martin-Löf random. There’s a much simpler definition, and that is if a sequence cannot be compressed by the ultimate best theoretical compressor—the so-called Kolmogorov complexity—then it is Kolmogorov random. The second definition makes intuitive sense, because if you assume that a sequence is compressible, it can be shortened. That means you have extracted some regularity, and hence it can’t be random. And these two definitions are actually equivalent.

The short answer to why randomness is important in artificial intelligence theory is that an intelligent system can distinguish between a useful signal and random noise, and therefore it is important to study randomness, but that’s not the primary reason. If a string is algorithmically random, there is no structure, and there’s nothing in it from which the agent could learn. On the other hand, if the string is compressible, an intelligent system should exploit this structure. So compressing a string is closely related to finding regularities in the string, which means understanding the string and being able to build models of it. And the agent can use these models for predictions—we all know it’s important to predict at least some aspect of the future in order to act intelligently. The process goes from compression to regularity to building models to prediction, and then to intelligent actions.

The theory of algorithmic randomness helps with the problem of induction indirectly via compressibility. Let’s say you have a sequence of weather data and want to predict whether or not it will rain tomorrow. What you usually do is try to build sophisticated climate models based on historic data—that’s all we have, we can observe our world and then build models from it. And this building of models is the inductive step. But how do you build models? People do it by hand, somehow, but how should a computer do it? You can program it to do something similar to what you do, but if you push it to the extreme you end up asking, philosophically or generically, what is model building? And the answer is: Model building is finding short descriptions of your data.

If you want to build a general-purpose AI system, then you need a general-purpose induction step or model-learning step. The idea is to find short programs of the data and use these programs as models for prediction. And that’s completely generic; you don’t have to make any assumptions about your data. Whatever your data is, use a compressor to find a short program, and use this program for making predictions.

One beautiful thing about algorithmic randomness and Solomonoff’s theory of induction is that you don’t have to worry about separating noise from meaningful data. For instance, if you have a sequence that is 10101010… but a few of the bits are distorted due to noise, Solomonoff will predict that the future will also be 10101010… with a little bit of noise, without explicitly separating the noise from the data.

In practice, you have to approximate this theory. At the moment, we haven’t gotten that far, but we have built a single agent that is able to learn from scratch without being informed of what it’s actually supposed to do. It learns by itself to behave properly. It learns just from experience to play tic-tac-toe; if you interface it with Pac-Man it learns to play that; if you interface it with a very simple form of poker it learns to play poker. It isn’t programmed to know the rules of the games; it learns just by playing them and receiving occasional feedback for winning and losing. Hopefully in the next ten years we can go from these toy games to more realistic games or other problems.