1 Introduction
1.1 HISTORICAL DEVELOPMENT AND ASPECTS OF PROBABILITY THEORY
By considering the past the future can be cast.
Probability theory was developed for the purpose of predicting the future on the basis of some knowledge of the present and the past. In this text the amount of information known about a system is used to derive its probabilistic properties. Every time one asserts that a particular event has a certain chance of occurring, one is trying to predict the future. Indeed, predicting the future goes way back in history to the earliest civilizations. Ancient rulers had soothsayers and other fortunetellers. National leaders continue the practice to this day. Of course, soothsayers are called pollsters and economists in the 21st century. As will be shown, how much is known about a system will play an important role in the mathematical models that will be used to foretell the future.
Technology is the application of the mathematical models of nature developed by humans. All these mathematical models of nature, such as Newton’s classical mechanics, Maxwell’s electromagnetic theory, Einstein’s general theory of relativity, chemistry, quantum mechanics, quantum electrodynamics, etc., are valid only for limited ranges of natural phenomena. To this day, no general field theory has been developed to completely explain all of nature. Perhaps this is not possible. Therefore, no natural phenomenon can be predicted, nor can a machine be designed, with absolute certainty. Only love, beliefs, and, especially, hate can be expressed with absolute certainty.1
For example, a lever, as shown in Figure 1.1, consists of atoms that probably do not instantaneously follow its average motion. This stretches the more or less elastic bonds between atoms and sets the atoms vibrating. The vibration diverts some of the energy applied to tilting the lever. The atoms consist of electrons and nuclei, the nuclei consist of protons and neutrons, etc. Each particle does not instantaneously follow the average motion of the lever. Thus, the classical mechanics model of nature describing the motion of an idealized rigid lever is only an approximation. Because most of these effects are small, the classical mechanics model of nature is quite useful.
In the 14th century, William of Ockham* stated in Latin “Pluralitas non est ponenda sine necessitate,” which is commonly rendered as “Entities should not be multiplied unnecessarily.” This is known as Occam’s razor. It has generally been interpreted as follows: if several theories are postulated to explain a certain physical phenomenon, the simplest one is probably correct. It is also used to eliminate unnecessary additions to scientific theories that have nothing to do with the physical phenomenon. In this text, too, this principle is used to select the simplest and most correct models.
FIGURE 1.1 A lever consists of atoms that do not instantaneously follow its average motion. This stretches the more or less elastic bonds between atoms. It sets the atoms vibrating, which diverts some of the energy applied to tilting the lever. Thus, the classical mechanics model of nature describing the motion of an idealized rigid lever is only an approximation.
Ancient soothsayers and astrologers tried empirically to deduce correlations between various natural phenomena, such as the appearance of certain constellations in the sky, lines on people’s palms, or patterns on tea leaves, and various human events. They would notice that whenever a particular constellation appeared in the sky, the king would be victorious in battle. Most of these correlations were, of course, just accidental. Based on these observations, they would counsel the king when to go to war. Perhaps, if they were wrong, the king would not survive to come back and chop their heads off. The problem with the ancient fortunetellers was that they used too few events to obtain their correlations. Suppose the king was successful in battle thrice under a certain constellation. The ancient soothsayers would have thought of this as a very good correlation on which future predictions could be based. Actually, the probability of the king being successful in battle thrice under the same heavenly sign is not that small, even though the constellations probably did not have anything to do with the king’s fortunes. Indeed, the probability of success is of the order of ⅛: if each battle is an even chance, three successes in a row occur with probability ½ × ½ × ½ = ⅛. This kind of foretelling survives to this day in the practice of astrology.
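The arithmetic behind the ⅛ figure can be checked directly. The sketch below assumes that each battle is an independent even chance, and it also asks how likely it is that at least one of several heavenly signs shows a perfect record purely by accident. The count of 12 signs and the function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def prob_all_wins(n_battles, p_win=Fraction(1, 2)):
    """Probability of winning every one of n_battles independent battles."""
    return p_win ** n_battles

def prob_some_sign_perfect(n_signs, n_battles, p_win=Fraction(1, 2)):
    """Probability that at least one of n_signs shows a perfect record by chance."""
    p_perfect = prob_all_wins(n_battles, p_win)
    return 1 - (1 - p_perfect) ** n_signs

three_wins = prob_all_wins(3)
twelve_signs = prob_some_sign_perfect(12, 3)  # 12 signs is an assumed number

print(three_wins)            # 1/8
print(float(twelve_signs))   # chance that some sign looks "reliable" by accident
```

With only three battles per sign, an accidental perfect record under some sign is more likely than not, which is why so few events make a poor basis for prediction.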
For example, a solid contains of the order of 10²⁴ atoms per cm³. The atoms execute small random motions about their equilibrium positions. The internal heat energy of the solid depends on these motions. Because of the very large number of atoms, one is able to predict the internal heat energy of the solid with a large degree of certainty. Thermodynamics, a science that has nothing whatsoever to do with dynamics, is based on the statistics of a very large number of particles. Thermodynamic parameters such as heat energy, pressure, entropy, etc., are average values of the motion of these particles. Other large stochastic systems such as words occurring in a text, the arrangements of nucleotides in DNA molecules, or the values of stocks on the stock market can also be described by various macroscopic parameters. Indeed, a system is often described by various average “macroscopic” parameters rather than by the probabilities of individual events occurring.
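Why a very large number of atoms leads to near certainty can be illustrated with a toy model. The model is an assumption of this sketch, not the derivation used later in the text: each of N atoms independently contributes 0 or 1 unit of energy with equal probability. The total then has mean N/2 and standard deviation √N/2, so its relative spread is 1/√N.

```python
import math

def relative_fluctuation(n_atoms):
    """Standard deviation divided by mean for a sum of n_atoms
    independent contributions that are 0 or 1 with probability 1/2."""
    mean = n_atoms / 2
    std = math.sqrt(n_atoms) / 2
    return std / mean

for n in (10**2, 10**6, 10**12, 10**24):
    print(f"N = {n:.0e}: relative spread ~ {relative_fluctuation(n):.1e}")
```

For a hundred atoms the total is uncertain by about ten percent; for a number of atoms of the order found in a solid, the relative spread is far too small to observe.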
FIGURE 1.2 There is a small but finite probability that the thermal motion of most of the atoms in a chair will simultaneously be in the upward direction, causing the chair to jump into the air.
There is an exceedingly small but finite probability that the majority of atoms in a chair may at some instant of time simultaneously move in an upward direction and the chair will spontaneously jump into the air, as shown in Figure 1.2. As discussed in Chapter 9, there is a practical problem in which such a small probability is important in current technology.
The probability of any given event occurring, such as a particular number of people voting for some candidate in an election, is one way of analyzing a set of events. Another way is to compare the events of the random system being analyzed with the events of a random set whose properties are very familiar, such as a set of tossed coins. For example, a candidate receiving exactly 32,768 votes is as random as a set of 15 coins, and the same candidate receiving exactly 524,288 votes is as random as a set of 19 coins (note that 2¹⁵ = 32,768 and 2¹⁹ = 524,288); that is, the candidate has a randomness of 15 coins to receive 32,768 votes and a randomness of 19 coins to receive 524,288 votes. The randomness values are just as good as the probabilities in describing the probabilistic properties of a system. One can calculate the average value of the randomness associated with each event. It is known as the average randomness, or entropy. This is a very useful quantity, as will be shown in the text.
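The coin comparison can be made concrete under one assumption, which is suggested by the powers of two in the example (2¹⁵ = 32,768 and 2¹⁹ = 524,288) but is not spelled out above: an event of probability p has a randomness of log₂(1/p) coins, the number of fair coins for which one specific arrangement is equally probable. The function names are hypothetical. A minimal sketch:

```python
import math

def randomness_in_coins(p):
    """Number of fair coins whose one specific arrangement has probability p."""
    return math.log2(1 / p)

def average_randomness(probabilities):
    """Probability-weighted mean of the randomness of each event (entropy)."""
    return sum(p * randomness_in_coins(p) for p in probabilities if p > 0)

print(randomness_in_coins(1 / 32768))    # 15.0
print(randomness_in_coins(1 / 524288))   # 19.0
print(average_randomness([0.5, 0.5]))    # 1.0, a single fair coin
print(average_randomness([0.25] * 4))    # 2.0, four equally likely events
```

Events of zero probability are skipped in the average, since they never occur and contribute nothing to it.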
There is an average randomness associated with every system that has a large number of components, be they electrons, molecules, ants, reindeer, or people. For example, in a large organization such as a large corporation or a government, it is impossible to reduce the “waste” associated with the average randomness of the activities of people below a certain minimum value. Politicians devote considerable time to this futile task. This is similar to the activity of alchemists in the 16th century trying to make gold out of lead.
FIGURE 1.3 The mathematical model used to describe the motion of a mechanical clock can, without difficulty, be made to run either forward or backward. Indeed, clocks in barber shops that are meant to be viewed in a mirror run backward. However, the probability that the scent molecules escaping from a perfume bottle will simultaneously assemble back into the bottle is exceedingly small.
It has been experimentally observed that the world is becoming more random as time progresses. The very fact that a unique direction of time is observed might itself be a consequence of the increasing randomness of nature. Most mathematical models of physical systems work equally well for time that runs forward as for time that runs backward. Indeed, classical mechanics, a mathematical model that describes macroscopic phenomena that are readily observable by humans without the aid of any devices, works equally well for forward- or reverse-progressing time. For example, a mathematical model of a mechanical clock can, without difficulty, be made to run either forward or backward. Indeed, clocks in barber shops that are meant to be viewed in a mirror appear to us to run backward (see Figure 1.3). The real clock, its surroundings, and observers, which consist of a very large number of atomic-scale particles, progress monotonically in time.
However, the development in time of probabilistic systems seems to exhibit an arrow of time. A simple example of this is that perfume molecules escape from a perfume bottle, but are never observed to suddenly assemble and go back into the perfume bottle (see Figure 1.3). Time evolution of probabilistic systems will also be studied in this text. Time-dependent stochastic systems that at any instant of time can have any of a number of different values are known as random processes.
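The perfume example can be imitated with a toy simulation. The sketch below uses an Ehrenfest-style two-compartment model, which is a stand-in chosen here for illustration and not a model taken from this text: the bottle and the room are treated as two equal compartments, and at each step one randomly chosen molecule moves to the other side. The molecule count, step count, and seed are arbitrary assumptions.

```python
import random

def simulate_perfume(n_molecules=1000, n_steps=20000, seed=0):
    """Ehrenfest-style toy model: at each step one randomly chosen molecule
    switches sides. Returns the number of molecules in the bottle over time."""
    rng = random.Random(seed)
    in_bottle = [True] * n_molecules      # every molecule starts inside the bottle
    count = n_molecules
    history = [count]
    for _ in range(n_steps):
        i = rng.randrange(n_molecules)
        in_bottle[i] = not in_bottle[i]   # the chosen molecule crosses over
        count += 1 if in_bottle[i] else -1
        history.append(count)
    return history

history = simulate_perfume()
print(history[0], history[-1])  # starts full, ends near half
```

The rule for a single molecule is the same in both directions, yet the count drifts toward half and, for a large number of molecules, a return to the fully assembled state is not seen in any practical run.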
It is interesting to note that information and randomness (or entropy) are described by similar mathematical expressions. This mathematical model will be used, for example, to discuss the information transmission capabilities of languages and analyze the information content in nucleotide strings in genes.
At the time of this writing, the complete genetic information, known as the genome, of many animals, including humans, has been deciphered. In living organisms the genome information is stored in very large nucleotide strings. This information is now also stored in information storage media developed by humans. It can be stored in printed form, or as a recording on a compact disk, tape, or other storage medium. Perhaps in the distant future, when the technology to do this is available, a 21st century human can be reconstructed from this information.
FIGURE 1.4 Nucleotide in a DNA (deoxyribonucleic acid) molecule.
The genetic code is encoded using just four nucleotide molecules: cytosine, guanine, adenine, and thymine. These are designated with the letters C, G, A, and T. The human genome is estimated to contain about 3 billion pairs of nucleotide molecules that are arranged in 20,000–25,000 genes. The nucleotide pairs are arranged end to end to form the DNA (deoxyribonucleic acid) molecule, which contains all of the genetic information. It is schematically shown in Figure 1.4. This order of the nucleotides spells out the exact instructions required to create a particular organism with its own unique traits. DNA from all organisms is made up of the same chemical and physical components.
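A rough upper bound on the storage needed for this information follows from the four-letter alphabet. The sketch below assumes that every position is independent and that the four nucleotides are equally likely, which gives the maximum of log₂ 4 = 2 bits per position; real sequences carry less. It counts one symbol per pair, since the partner in each pair is fixed by complementarity. The function names are hypothetical.

```python
import math

NUCLEOTIDES = "CGAT"

def max_bits_per_nucleotide(alphabet=NUCLEOTIDES):
    """Upper bound on information per symbol: log2 of the alphabet size."""
    return math.log2(len(alphabet))

def genome_capacity_bytes(n_positions, alphabet=NUCLEOTIDES):
    """Storage needed if every position is independent and equally likely."""
    return n_positions * max_bits_per_nucleotide(alphabet) / 8

size = genome_capacity_bytes(3_000_000_000)  # about 3 billion pairs
print(f"{size / 1e6:.0f} MB")                # 750 MB
```

Under these assumptions the whole sequence is comparable in size to the contents of a single compact disk.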
Neither the atoms nor the nucleotide molecules in which the information to construct a living being is encoded are living matter. There probably is a minimum amount of information required for the encoded information to represent a living being. Thus, the difference between living and nonliving matter is information.
Information transmitted through a noisy channel is also a probabilistic system. The optimum rates of transmitting information through noisy channels will be calculated. A simple example of a transmission system with a noisy channel is illustrated in Figure 1.5.
1.2 DISCUSSION OF THE MATERIAL IN THIS TEXT
In Chapter 2 and the first part of Chapter 3, a small number of basic principles are developed. These are used in the rest of the text to develop various useful examples of probability theory. Two further basic concepts are introduced in Chapters 6 and 7: the concept of macroscopic parameters and average values is introduced in Chapter 6, and the concept of randomness is introduced in Chapter 7. These, too, are used in the subsequent text to develop various useful examples. The discussion of th...