Control and single-series prediction
A derivation of the mathematical equations of prediction had been accomplished in a limited sense some years before Wiener’s work on cybernetics. In 1931 Wiener had collaborated with an Austrian-born U.S. mathematician, Eberhard Hopf, to solve what is now called the Wiener-Hopf integral equation, an equation that had first been suggested in a study of the structure of stars but later recurred in many contexts, including electrical-communication theory, and was seen to involve an extrapolation of continuously distributed numerical values. During World War II, gun- and aircraft-control problems stimulated further research in extrapolation, and Wiener composed a purely mathematical treatise, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, which was published in 1949. As early as 1939, a note on extrapolation by a Russian mathematician, A.N. Kolmogorov, had appeared in the French journal Comptes Rendus. Although the Wiener-Hopf work was concerned exclusively with astronomy and was done without the guiding influence of computers, it was recognized during World War II that high-speed computations could take input information from a moving object and, through prediction or extrapolation, provide output data to correct its path. This recognition was the seed of the concept of the guided missile and of radar-controlled aircraft. Weather prediction also became possible, as did computerized research on brain waves, whose traces on the electroencephalograph offered another physical realization of a predictable time series. The mathematics necessary for a complete understanding of prediction included the concept of a stochastic process, as described in the article probability theory.
The Wiener and Kolmogorov research on extrapolation of time series became known as single-series prediction and owed much to the studies (1938) of a Swedish mathematician named Herman Wold, whose work was predicated on the assumption that, if X_1, X_2, X_3, · · · are successive values of a series identified with discrete points in time t = 1, t = 2, t = 3, · · ·, then the successive values are not entirely unrelated (for if they were, there would be no way for an algorithm or an automaton to generate information about later members of the sequence—that is, to predict). It was assumed, in anticipation of a structure that frequently occurs in nature, that a transformation T relates members of the series by successively transforming an underlying space of points ω according to a rule: the kth member of the time series is a function of an initial point ω that has migrated in the underlying space, X_k = X(T^kω). It was also assumed that, if a set of points {ω} constituted a region (of sufficient simplicity, called “measurable”) in the space, then its volume would not be changed when the set was transformed under the influence of T. The last assumption had, in fact, been proved by a French mathematician, Joseph Liouville, a century earlier for a wide class of physical processes whose behaviour is correctly described by the so-called Hamiltonian equations. These clearly stated assumptions of Wiener and Kolmogorov, referred to as the stationarity of the time series, were supplemented with the idea (the linearity restriction) that a solution S_k(ω) for the predicted value of the series, displaced in time k steps into the future, should be restricted to a linear combination of present and past values of the series.
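The setup can be made concrete with a small numerical sketch. The transformation and observable below are illustrative choices, not anything taken from Wold, Wiener, or Kolmogorov: the underlying space is the unit interval, T shifts every point by a fixed irrational amount (a transformation that leaves the length of any measurable set unchanged), and the series is read off as X_k = X(T^kω) for a fixed function X.

```python
import numpy as np

# A minimal illustration of a stationary series generated by a
# measure-preserving transformation.  The underlying space is the unit
# interval [0, 1); T(w) = (w + alpha) mod 1 shifts points and leaves the
# length ("volume") of any measurable set unchanged.

alpha = np.sqrt(2) - 1              # a fixed irrational shift

def T(w):
    """One application of the measure-preserving transformation."""
    return (w + alpha) % 1.0

def X(w):
    """A fixed observable on the underlying space."""
    return np.cos(2 * np.pi * w)

def time_series(w0, n):
    """Generate X_1, ..., X_n with X_k = X(T^k w0)."""
    w, values = w0, []
    for _ in range(n):
        w = T(w)
        values.append(X(w))
    return np.array(values)

print(time_series(w0=0.3, n=10))
```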
With the help of one other mathematical assumption, it was then possible to solve the single-series prediction problem by specifying an algorithm that would determine the coefficients in the linear combination for S_k(ω), in which k is a positive integer. It was possible also to solve for the error of prediction—that is, a measure of the discrepancy between the predicted value and the true value of the series that would occur at time k in the future. This meant that, for a variety of circumstances, such as the prediction of atmospheric pressure measured at one weather station or the prediction of a single parameter in the position specification of a particle (such as a particle of smoke) moving according to the laws of diffusion, an automaton could be designed that could sense and predict the chance behaviour of a sufficiently simple component of its environment.
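In modern discrete-time terms, the coefficients of such a linear combination can be estimated directly from an observed history. The sketch below fits them by ordinary least squares; it is a rough stand-in for, not a reproduction of, the Wiener–Kolmogorov algorithm, and every name and parameter choice in it is illustrative.

```python
import numpy as np

# A discrete-time stand-in for single-series linear prediction (a
# least-squares fit, not Wiener's spectral-factorization procedure):
# predict X_{t+k} as a linear combination of the p most recent values.

def fit_linear_predictor(x, p, k=1):
    """Least-squares coefficients a_1..a_p so that
    X_{t+k} ~ a_1*X_t + a_2*X_{t-1} + ... + a_p*X_{t-p+1}."""
    rows, targets = [], []
    for t in range(p - 1, len(x) - k):
        rows.append(x[t - p + 1:t + 1][::-1])   # present and past values
        targets.append(x[t + k])                # value k steps ahead
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coeffs

def predict(x, coeffs):
    """Prediction from the end of the series (horizon encoded in coeffs)."""
    p = len(coeffs)
    recent = np.asarray(x[-p:])[::-1]
    return float(coeffs @ recent)

# Example: a noisy oscillation standing in for, say, pressure readings.
rng = np.random.default_rng(0)
t = np.arange(500)
x = np.sin(0.2 * t) + 0.1 * rng.standard_normal(len(t))

coeffs = fit_linear_predictor(x, p=5, k=1)
print("one-step prediction:", predict(x, coeffs))
```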
Multiple-prediction theory
Generalizations of the above limited accomplishments are tantalizing to mathematicians. If animals, and humans in particular, are viewed, even in part, as automata with varying degrees of accomplishment and success that depend on their abilities to cope with their environment, then human beings could be better understood and their potentialities could be further realized by exploring a generalized version of an automaton’s ability to predict. Success in generalizations of this kind has already been achieved under the heading of what is called multiple-prediction theory. A reference to the problem of multiple prediction without a complete solution was made as early as 1941 by a Russian mathematician, V. Zasuhin. The first major step forward, after Zasuhin, was taken by Wiener in 1955 under the title “On the Factorization of Matrices.” Many significant results soon followed.
If multiple-prediction theory is identified with part of automata theory (which is not always done), it is possible to consider the construction of a computing machine, or automaton, capable of sensing many interdependent elements of its environment at once and, from a long history of such data, of predicting a future that is a function of the same interdependent elements. Multiple prediction is the most general approach to the study of the automaton and its environment in the sense that it is a formulation of prediction free of the linearity restriction mentioned earlier with reference to single series. To express a future point S_k(ω), for example, as a linear function of the present and past values of a series as well as the first derivatives, or rates of change, of those present and past values is to perform a double prediction, or prediction based on the two time series X_1, X_2, X_3, · · · ; X′_1, X′_2, X′_3, · · ·, in which primes indicate derivatives with respect to time. Such double prediction is a first step toward nonlinear prediction.
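A hedged sketch of the double-prediction idea in discrete time follows. The regressors are the recent values of the series together with a finite-difference stand-in for its derivatives, and the fit is again ordinary least squares rather than the matrix-factorization machinery of the 1955 work; all names and constants are illustrative.

```python
import numpy as np

# Double prediction in discrete form: the predictor uses both the series
# X and a crude finite-difference version of its rate of change X' as
# inputs to a single linear least-squares fit.

def fit_double_predictor(x, p, k=1):
    """Least-squares predictor of X_{t+k} from the p latest values of X
    and the p latest values of its first difference."""
    dx = np.diff(x, prepend=x[0])               # stand-in for X'
    rows, targets = [], []
    for t in range(p - 1, len(x) - k):
        rows.append(np.concatenate([x[t - p + 1:t + 1][::-1],
                                    dx[t - p + 1:t + 1][::-1]]))
        targets.append(x[t + k])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coeffs

rng = np.random.default_rng(1)
t = np.arange(400)
x = np.sin(0.15 * t) + 0.05 * rng.standard_normal(len(t))
coeffs = fit_double_predictor(x, p=4, k=1)
print("number of coefficients (values + derivatives):", len(coeffs))
```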
Automata with unreliable components
In 1956, with the continuing development of faster and more complex computing machines, a realistic study of component misfiring in computers was made. Von Neumann recognized that there was a discrepancy between the theory of automata and the practice of building and operating computing machines because the theory did not take into account the realistic probability of component failure. The number of component parts of a modern all-purpose digital computer was in the mid-20th century already being counted in millions. If a component performing the logical disjunction (A or B) misfired, the total output of a complex operation could be incorrect. The basic problem was then one of probability: whether, given a positive number δ and a logical operation to be performed, a corresponding automaton could be constructed from given organs to perform the desired operation and commit an error in the output with probability less than or equal to δ. Affirmative results have been obtained for this problem by mimicking the redundant structure of parallel channels of communication that is frequently found in nature—i.e., rather than having a single line convey a pulse of information, a bundle of lines in parallel is interpreted as conveying a pulse if a sufficient number of members in the bundle do so. Von Neumann was able to show that with this redundancy technique (multiplexing) “the number of lines deviating from the correctly functioning majorities of their bundles” could with sufficiently high probability be kept below a critical level.
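The arithmetic behind multiplexing can be suggested with a short calculation in a simplified model, not von Neumann’s full construction: if each line in a bundle of n fails independently with probability ε, the majority reading is wrong only when more than half the lines fail, and that probability falls off rapidly as n grows.

```python
from math import comb

# A back-of-the-envelope sketch of the multiplexing idea: replace one
# unreliable line by a bundle of n lines and read the bundle by majority.
# If each line independently carries the wrong value with probability
# eps, the bundle is wrong only when a majority of its lines are wrong.

def majority_error(n, eps):
    """Probability that more than half of n independent lines fail."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 9, 49, 199):
    print(f"bundle of {n:3d} lines, per-line error 0.05 "
          f"-> bundle error {majority_error(n, 0.05):.2e}")
```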
Automata with random elements
The term algorithm has been defined to mean a rule of calculation that guides an intelligent being or a logical mechanism to arrive at numerical or symbolic results. As discussed above under Neural nets and automata, a formalization of the intuitive idea of algorithm has led to what is now called an automaton. Thus, a feature of an automaton is the predictability, or logical certainty, that successive operations on the same input data will produce the same output. If random numbers (results due to chance) are provided as a substitute for the usual input data, the combination of input data and automaton is no longer completely predictable. The unpredictable results that can be obtained with such uncertain input are not, however, without practical application. The method of combining the operation of a computer with the intentional injection of random data is called the “Monte Carlo method” of calculation and in certain instances (such as the numerical integration of functions in many dimensions) has been found to be more efficient in arriving at correct answers than purely deterministic methods.
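A minimal sketch of the Monte Carlo idea, with an integrand chosen only because its exact value is known, shows the method at work on a ten-dimensional integral.

```python
import numpy as np

# Estimate a ten-dimensional integral by averaging the integrand at
# random points.  The integrand and sample size are illustrative choices.

rng = np.random.default_rng(42)
dim, n_samples = 10, 100_000

# Integrate f(x) = prod(cos(pi*x_i/2)) over the unit cube [0,1]^10.
# Each factor integrates to 2/pi, so the exact value is (2/pi)^10.
points = rng.random((n_samples, dim))
values = np.prod(np.cos(np.pi * points / 2), axis=1)

estimate = values.mean()
std_error = values.std(ddof=1) / np.sqrt(n_samples)
exact = (2 / np.pi) ** dim

print(f"Monte Carlo estimate: {estimate:.5f} +/- {std_error:.5f}")
print(f"Exact value:          {exact:.5f}")
```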
Quite apart from the questions of efficiency that might bear upon the addition of an element in a computing machine (automaton) that could produce numbers due to chance, the purely logical question has been asked: Is there anything that can be done by a machine with a random element that cannot be done by a deterministic machine? A number of questions of this type have been investigated, but the first clear enunciation and study of such a question was accomplished in 1956 by the U.S. engineer Claude E. Shannon and others. If the random element in the machine is to produce a sequence of digits 0 and 1 in a random order so that the probability is p for a digit 1 to occur, then (assuming that the number p is, itself, obtainable from a Turing machine as a computable number) the machine can do nothing new, so to speak, as compared to the unmodified Turing machine. This result is precisely expressed in the language of automata theory by saying that the sets enumerated by the automaton with random elements can be enumerated also by the unmodified automaton. The computability of p, however, is critical and is necessary for the result. It is also important to emphasize, in order to distinguish this result from what follows, that the computability of p is under discussion, not the computability of the sequence of random digits.
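The role played by the computability of p can be suggested, though this is not the construction used by Shannon and his collaborators, by the classical device of producing a biased random bit from fair coin flips: the machine compares the flips, read as a binary fraction, against the binary digits of p, which it can supply on demand precisely when p is computable. The helper names below are hypothetical.

```python
import random

# An illustrative aside (not the 1956 construction): if the digits of p
# can be computed on demand, a machine with a fair coin can emit bits
# that are 1 with probability p by comparing the coin flips, read as a
# binary fraction, against the binary expansion of p.

def nth_binary_digit_of(p, n):
    """n-th binary digit of p in (0, 1); stands in for a digit-computing
    routine, which exists precisely when p is a computable number."""
    return int(p * 2**n) % 2

def biased_bit(p, flip=lambda: random.randint(0, 1)):
    """Return 1 with probability p using fair coin flips."""
    n = 1
    while True:
        coin = flip()
        digit = nth_binary_digit_of(p, n)
        if coin < digit:
            return 1          # coin fraction fell below p
        if coin > digit:
            return 0          # coin fraction fell above p
        n += 1                # tie so far; examine the next digit

p = 0.3
freq = sum(biased_bit(p) for _ in range(100_000)) / 100_000
print(f"empirical frequency of 1s: {freq:.3f} (target {p})")
```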