While information theory has been most helpful in the design of more efficient telecommunication systems, it has also motivated linguistic studies of the relative frequencies of words, the length of words, and the speed of reading.
The best-known formula for studying relative word frequencies was proposed by the American linguist George Zipf in Selected Studies of the Principle of Relative Frequency in Language (1932). Zipf’s Law states that the relative frequency of a word is inversely proportional to its rank. That is, the second most frequent word is used only half as often as the most frequent word, and the 100th most frequent word is used only one hundredth as often as the most frequent word.
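Stated as a formula, with f(r) denoting the relative frequency of the word of rank r, Zipf’s Law says

$$f(r) \propto \frac{1}{r}, \qquad \text{so that } f(2) \approx \tfrac{1}{2}f(1) \text{ and } f(100) \approx \tfrac{1}{100}f(1),$$

where f(1) is the frequency of the most common word.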
Consistent with the encoding ideas discussed earlier, the most frequently used words tend to be the shortest. It is uncertain how much of this phenomenon is due to a “principle of least effort,” but using the shortest sequences for the most common words certainly promotes greater communication efficiency.
Information theory provides a means for measuring the redundancy, or efficiency, of symbolic representation within a given language. For example, if English letters occurred with equal frequency (ignoring the distinction between uppercase and lowercase letters), the entropy of an average sample of English text would be log2(26), or approximately 4.7 bits per character. The table Relative frequencies of characters in English text shows an entropy of 4.08 bits per character. Even this figure overestimates the true entropy of English, because treating letters as independent overstates the probability of combinations such as “qa.” Studies of sequences of eight characters of English text have arrived at a figure of about 2.35 bits per character for the average entropy of English. Because this is only half the 4.7 value, it is said that English has a relative entropy of 50 percent and a redundancy of 50 percent.
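In terms of entropy measured in bits per character, the figures above combine as follows:

$$H_{\text{uniform}} = \log_2 26 \approx 4.7, \qquad H_{\text{8-character studies}} \approx 2.35,$$

$$\text{relative entropy} = \frac{2.35}{4.7} \approx 50\ \text{percent}, \qquad \text{redundancy} = 1 - \frac{2.35}{4.7} \approx 50\ \text{percent}.$$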
A redundancy of 50 percent means that roughly half the letters in a sentence could be omitted and the message would still be reconstructable. The question of redundancy is of great interest to crossword puzzle creators. If redundancy were 0 percent, so that every sequence of characters were a word, then there would be no difficulty in constructing a crossword puzzle, because any character sequence the designer wanted to use would be acceptable. As redundancy increases, so does the difficulty of creating a crossword puzzle. Shannon showed that a redundancy of 50 percent is the upper limit for constructing two-dimensional crossword puzzles and that 33 percent is the upper limit for constructing three-dimensional crossword puzzles.
Shannon also observed that when longer sequences, such as paragraphs, chapters, and whole books, are considered, the entropy per character decreases and English becomes even more predictable. Taking such longer sequences into account, he concluded that the entropy of English is approximately one bit per character, which implies that nearly all of a long text could be reconstructed from a random sample of only 20 to 25 percent of its characters.
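The 20 to 25 percent figure follows directly from these entropy estimates:

$$\frac{1\ \text{bit per character}}{4.7\ \text{bits per character}} \approx 0.21,$$

so a long passage of English carries only about one-fifth as much information as the same number of characters chosen at random.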
Various studies have attempted to determine an information processing rate for human beings. Some studies have concentrated on the problem of determining a reading rate. Such studies have shown that the reading rate appears to be independent of language; that is, people process about the same number of bits per second whether they are reading English or Chinese. Although Chinese characters require more bits for their representation than English letters (there are about 10,000 common Chinese characters, compared with 26 English letters), they also convey more information per character. Thus, on balance, reading rates are comparable.
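A rough calculation, which treats every symbol in each writing system as equally likely (a simplification, since actual frequencies are far from uniform), shows why a single Chinese character carries more bits than a single English letter:

$$\log_2 10{,}000 \approx 13.3\ \text{bits per character}, \qquad \log_2 26 \approx 4.7\ \text{bits per letter}.$$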
Algorithmic information theory
In the 1960s the American mathematician Gregory Chaitin, the Russian mathematician Andrey Kolmogorov, and the American engineer Raymond Solomonoff began to formulate and publish an objective measure of the intrinsic complexity of a message. Chaitin, a research scientist at IBM, developed the largest body of work and polished the ideas into a formal theory known as algorithmic information theory (AIT). The algorithmic in AIT comes from defining the complexity of a message as the length of the shortest algorithm, or step-by-step procedure, for its reproduction.
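The intuition can be illustrated with a short sketch. This is a loose analogy rather than Chaitin’s formal machinery: true algorithmic complexity cannot be computed exactly, but the length of a compressed file gives a crude upper bound on how short a “recipe” for reproducing a message can be.

```python
# Rough illustration of the idea behind algorithmic information theory (AIT):
# a highly patterned message admits a much shorter description than a
# random-looking one.  True algorithmic complexity is not computable, so this
# sketch uses zlib's compressed length as a crude, computable upper bound.
import random
import zlib

patterned = b"01" * 5_000                 # 10,000 bytes generated by an obvious rule
random.seed(0)
random_looking = bytes(random.getrandbits(8) for _ in range(10_000))  # 10,000 patternless bytes

for label, message in (("patterned", patterned), ("random-looking", random_looking)):
    compressed = len(zlib.compress(message, 9))
    print(f"{label}: {len(message)} bytes -> {compressed} bytes compressed")

# The patterned message compresses to a few dozen bytes (in effect the rule
# "repeat '01' five thousand times"), while the random-looking message barely
# changes in size: it has no appreciably shorter description than itself.
```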
Physiology
Almost as soon as Shannon’s papers on the mathematical theory of communication were published in the 1940s, people began to consider the question of how messages are handled inside human beings. After all, the nervous system is, above all else, a channel for the transmission of information, and the brain is, among other things, an information processing and messaging centre. Because nerve signals generally consist of pulses of electrical energy, the nervous system appears to be an example of discrete communication over a noisy channel. Thus, both physiology and information theory are involved in studying the nervous system.
Many researchers (being human) expected that the human brain would show a tremendous information processing capability. Interestingly enough, when researchers sought to measure information processing capabilities during “intelligent” or “conscious” activities, such as reading or piano playing, they came up with a maximum capability of less than 50 bits per second. For example, a typical reading rate of 300 words per minute works out to 5 words per second. Assuming an average of 5 characters per word and roughly 2 bits per character yields the aforementioned rate of 50 bits per second. The exact figure depends on the assumptions made and varies with the individual and the task being performed. It is known, however, that the senses gather some 11 million bits per second from the environment.
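The arithmetic behind the 50-bits-per-second reading estimate is simply

$$\frac{300\ \text{words}}{60\ \text{seconds}} \times 5\ \frac{\text{characters}}{\text{word}} \times 2\ \frac{\text{bits}}{\text{character}} = 50\ \text{bits per second}.$$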
The table Information transmission rates of the senses shows how much information is processed by each of the five senses. This table immediately directs attention to the problem of determining what is happening to all this data. In other words, the human body sends 11 million bits per second to the brain for processing, yet the conscious mind seems to be able to process only 50 bits per second.
| sensory system | bits per second |
| --- | --- |
| eyes | 10,000,000 |
| skin | 1,000,000 |
| ears | 100,000 |
| smell | 100,000 |
| taste | 1,000 |
It appears that a tremendous amount of compression is taking place if 11 million bits are being reduced to less than 50. Note that the discrepancy between the amount of information being transmitted and the amount of information being processed is so large that any inaccuracy in the measurements is insignificant.
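Taking the figures at face value, the implied compression factor is on the order of

$$\frac{11{,}000{,}000\ \text{bits per second}}{50\ \text{bits per second}} \approx 220{,}000.$$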
Two more problems suggest themselves when thinking about this immense amount of compression. First is the problem of determining how long it takes to do the compression, and second is the problem of determining where the processing power is found for doing this much compression.
The solution to the first problem is suggested by the approximately half-second delay between the instant that the senses receive a stimulus and the instant that the mind is conscious of a sensation. (To compensate for this delay, the body has a reflex system that can respond in less than one-tenth of a second, before the mind is conscious of the stimulus.) This half-second delay seems to be the time required for processing and compressing sensory input.
The solution to the second problem is suggested by the approximately 100 billion cells of the brain, each with connections to thousands of other brain cells. Equipped with this many processors, the brain might be capable of executing as many as 100 billion operations per second, a truly impressive number.
It is often assumed that consciousness is the dominant feature of the brain. The brief observations above suggest a rather different picture. It now appears that the vast majority of processing is accomplished outside conscious notice and that most of the body’s activities take place outside direct conscious control. This suggests that practice and habit are important because they train circuits in the brain to carry out some actions “automatically,” without conscious interference. Even such a “simple” activity as walking is best done without interference from consciousness, which does not have enough information processing capability to keep up with the demands of this task.
The brain also seems to have separate mechanisms for short-term and long-term memory. Based on psychologist George Miller’s paper “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information” (1956), it appears that short-term memory can only store between five and nine pieces of information to which it has been exposed only briefly. Note that this does not mean between five and nine bits, but rather five to nine chunks of information. Obviously, long-term memory has a greater capacity, but it is not clear exactly how the brain stores information or what limits may exist. Some scientists hope that information theory may yet afford further insights into how the brain functions.
Physics
The term entropy was originally introduced by the German physicist Rudolf Clausius in his work on thermodynamics in the 19th century. Clausius invented the word so that it would be as close as possible to the word energy. In certain formulations of statistical mechanics a formula for entropy is derived that looks confusingly similar to the formula for entropy derived by Shannon.
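Written side by side, the resemblance is evident (here k_B is Boltzmann’s constant and p_i is the probability of the i-th state or symbol):

$$S = -k_B \sum_i p_i \ln p_i \quad \text{(statistical mechanics)}, \qquad H = -\sum_i p_i \log_2 p_i \quad \text{(Shannon)}.$$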
There are various intersections between information theory and thermodynamics. One of Shannon’s key contributions was his analysis of how to handle noise in communication systems. Noise is an inescapable feature of the universe. Much of the noise that occurs in communication systems is random noise, often called thermal noise, generated by heat in electrical circuits. While thermal noise can be reduced, it can never be completely eliminated. Another source of noise is the homogeneous cosmic background radiation, believed to be a remnant from the creation of the universe. Shannon’s work permits the minimal energy cost of sending a bit of information through such noise to be calculated.
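A commonly quoted form of this minimum, with k_B Boltzmann’s constant and T the absolute temperature of the noise, is

$$E_{\min} \approx k_B T \ln 2 \ \text{ per bit of information}.$$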
Another problem addressed by information theory was dreamed up by the Scottish physicist James Clerk Maxwell in 1871. Maxwell created a “thought experiment” that apparently violates the second law of thermodynamics. This law basically states that all isolated systems, in the absence of an input of energy, relentlessly decay, or tend toward disorder. Maxwell began by postulating two gas-filled vessels at equal temperatures, connected by a valve. (Temperature can be defined as a measure of the average speed of gas molecules, keeping in mind that individual molecules can travel at widely varying speeds.) Maxwell then described a mythical creature, now known as Maxwell’s demon, that is able to open and close the valve rapidly so as to allow only fast-moving molecules to pass in one direction and only slow-moving molecules to pass in the other direction. Alternatively, Maxwell envisioned his demon allowing molecules to pass through in only one direction. In either case, whether the outcome is a “hot” and a “cold” vessel or a “full” and an “empty” vessel, the apparent result is two vessels that, with no input of energy from an external source, constitute a more orderly isolated system, thus violating the second law of thermodynamics.
Information theory provides one way of exorcising Maxwell’s demon. In particular, it shows that the demon needs information in order to select molecules for the two vessels but that the transmission of information requires energy. Once the energy requirement for collecting information is included in the calculations, it can be seen that there is no violation of the second law of thermodynamics.
George Markowsky