Abstract
I will discuss two main topics in this lecture. Firstly, the Universal Distribution and some of its properties: its accuracy, its incomputability, its subjectivity. Secondly, I'm going to tell how to use this distribution to create very intelligent machines. Many years ago-in 1960-I discovered what we now call the Universal Probability Distribution. It is the probability distribution on all possible output strings of a universal computer with random input. It seemed to solve all kinds of prediction problems and resolve serious difficulties in the foundations of Bayesian statistics. Suppose we have a string, x, and we want to know its universal probability with respect to machine M. There will be many inputs to machine M that will give x as output. Say si is the ith such input. If si is of length L(si) bits, the probability that a random binary input would be si is just 2-L(si). To get the probability that x will be produced by any of its programs, we sum the probabilities of all of them to get PM(x), the probability assigned to x by the universal distribution, using machine M as reference: PM(x) = *S2-L(si) (1). It is easy to use this distribution for prediction: if x is a binary string, then the probability that 1 will be the next symbol of x is just PM(x1)/(PM(x0) + P M(x1)). Five years later, in 1965, Kolmogorov, not yet having read my paper, independently discovered 'Kolmogorov Complexity'. The Kolmogorov Complexity of a string of symbols, x, is the length of the shortest program for a reference universal computer that produces x as output. It is closely related to the Universal Distribution. If K is the Kolmogorov Complexity of x then 2-K is an approximation to the probability of x obtained by the universal distribution. This is easy to see, since the shortest program for x will give the most weight of all of the terms in Equation (1).