Probability is the study of randomness. It has a mathematical aspect and a philosophical aspect. The mathematical aspect, described in Chapter 9, "Probability: Axioms and Fundaments," starts with a collection of axioms (assumptions) and derives consequences that are true for anything that satisfies the axioms. The philosophical aspect, described in this chapter, connects the mathematical theory with the world--it says what "probability" means when we make statements like "the probability that a fair coin lands heads is 50%." The current chapter also reviews some of the mathematical concepts required for Chapter 9, in particular, set theory. Set theory provides a foundation for all of mathematics. The language of probability is much the same as the language of set theory. Logical statements can be interpreted as statements about sets. This chapter also reviews elementary logic and its connection to set theory.
What does random mean? In ordinary speech, we use random to denote things that are unpredictable. In this course, we distinguish between random and haphazard. Events that are random are not perfectly predictable, but they have long-term regularities that we can describe and quantify using probability. In contrast, haphazard events do not necessarily have long-term regularities. Moreover, we limit the term probability to events for which we can specify all possible outcomes.
How a fair coin lands when it is tossed vigorously is a canonical example of a random event. One cannot predict perfectly whether the coin will land heads or tails; however, in repeated tosses, the fraction of times the coin lands heads will tend to settle down to a limit of 50%. The outcome of an individual toss is not perfectly predictable, but the long-term average behavior is predictable. Thus it is reasonable to consider the outcome of tossing a fair coin to be random.
What does it mean to say that a fair coin toss has a 50% chance of landing heads? There are several standard answers, called Theories of Probability.
Theories of Probability assign meaning to statements like "the probability that A occurs is p%." Theories of probability connect the mathematics of probability to the real world. As you might expect, bridging mathematics to reality is not so easy--the philosophical problems are deep, and it is hard to be consistent without being circular. We shall examine three theories of probability. The one I find most satisfactory is the Frequency Theory, although the others have their places too. In all three theories, probability is on a scale of 0% to 100%. "Probability" and "chance" are synonymous.
This theory of probability is the oldest. It originated in the study of games of chance, such as dice games and card games. In the Theory of Equally Likely Outcomes, probability has to do with symmetries and the indistinguishability of outcomes. If a given experiment or trial has n possible outcomes among which Nature should show no preference, they are equally likely. The probability of each outcome is then 100%/n. The key to applying this theory is to reason from symmetries in the situation or other considerations that no particular outcome should occur in preference to another. For example, if a coin is balanced well, there is no reason for it to land heads in preference to tails when it is tossed vigorously, so according to the Theory of Equally Likely Outcomes, the probability that the coin lands heads is equal to the probability that the coin lands tails, and both are 100%/2 = 50%. (This ignores the essentially impossible outcomes that the coin does not land at all or lands balanced on its edge.) Similarly, if a die is fair (properly balanced) the chance that when it is rolled vigorously it lands with the side with one spot on top (the chance that the die shows one spot) is the same as the chance that it shows two spots or three spots or four spots or five spots or six spots: 100%/6, about 16.7%. In the Theory of Equally Likely Outcomes, probabilities are between 0% and 100%. If an event consists of more than one possible outcome, the chance of the event is the number of ways it can occur, divided by the total number of things that could occur. For example, the chance that a die lands showing an even number of spots is the number of ways it could land showing an even number of spots (3, namely, landing showing 2, 4, or 6 spots), divided by the total number of things that could occur (6, namely, landing showing 1, 2, 3, 4, 5, or 6 spots).
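The counting rule in this paragraph can be sketched in a few lines of Python. This is only an illustration: the helper name chance is hypothetical, and the code assumes that every outcome in the list is equally likely.

```python
from fractions import Fraction

def chance(event, outcomes):
    """Chance of an event under the Theory of Equally Likely Outcomes:
    (number of outcomes in the event) / (total number of outcomes)."""
    outcomes = set(outcomes)
    favorable = outcomes & set(event)   # outcomes that make the event occur
    return Fraction(len(favorable), len(outcomes))

die = range(1, 7)        # number of spots a six-sided die can show
even = {2, 4, 6}         # the event "an even number of spots"

p_even = chance(even, die)
p_one = chance({1}, die)
print(p_even, p_one)     # exact fractions for the two events
```

Using exact fractions rather than floating-point numbers keeps the result in the form "number of ways divided by number of possibilities."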
In the Frequency Theory of Probability, probability is the limit of the relative frequency with which an event occurs in repeated (independent--more about this in Chapter 9, Probability: Axioms and Fundaments) trials. Relative frequencies are always between 0% (the event essentially never happens) and 100% (the event essentially always happens). According to the Frequency Theory of Probability, what it means to say that "the probability that A occurs is p%" is that if you repeat the experiment over and over again, independently and under essentially identical conditions, the percentage of the time that A occurs will converge to p. For example, under the Frequency Theory, to say that the chance that a coin lands heads is 50% means that if you toss the coin over and over again, independently, the ratio of the number of times the coin lands heads to the total number of tosses approaches a limiting value of 50% as the number of tosses grows. Because the ratio of heads to tosses is always between 0% and 100%, when the probability exists it must be between 0% and 100%.
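A computer simulation can illustrate, though not prove, the idea of a relative frequency settling down. The sketch below assumes that Python's pseudorandom number generator is an adequate stand-in for a fair coin; the function name and the seed are arbitrary choices made for this illustration.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)                 # fixed seed: repeatable runs
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The fraction of heads for increasingly long sequences of tosses.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

A finite simulation can only suggest convergence; the limit itself is a statement about infinitely many tosses.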
In the Subjective Theory of Probability, probability measures the speaker's "degree of belief" that the event will occur, on a scale of 0% (complete disbelief that the event will happen) to 100% (certainty that the event will happen). According to the Subjective Theory, what it means for me to say that "the probability that A occurs is 2/3" is that I believe that A will happen twice as strongly as I believe that A will not happen. The Subjective Theory is particularly useful in assigning meaning to the probability of events that in principle can occur only once. For example, how might one assign meaning to a statement like "there is a 25% chance of an earthquake on the San Andreas fault with magnitude 8 or larger before 2050?" (See Freedman and Stark, 2001, for more discussion of earthquake probabilities.) It is very hard to use either the Theory of Equally Likely Outcomes or the Frequency Theory to make sense of the assertion. Can you think of other examples?
These three theories of probability assign different meanings to the statement "the chance that A occurs is p%." Each theory has situations in which it is most natural, and each theory has shortcomings. This book uses the Frequency Theory primarily.
The theory of Equally Likely Outcomes. Even when there are only a few possible outcomes, it is not always clear whether they should be deemed equally likely. For example, consider tossing two coins at the same time. The possible outcomes could be {two heads, not two heads} or {two heads, one head and one tail, two tails} or {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}. The last of these assigns the same probabilities the Frequency Theory does. For instance, if the equally likely outcomes are taken to be {two heads, head on coin 1 and tail on coin 2, tail on coin 1 and head on coin 2, two tails}, both the Theory of Equally Likely Outcomes and the Frequency Theory would say that the chance of two heads is 25%. If one is using probability to bet on games of chance, long-term relative frequencies--which the Frequency Theory contemplates--are perhaps the most important consideration, because they determine how much one wins or loses in the long run. It seems rather artificial to introduce a distinction between two otherwise identical coins in order to make the probabilities calculated using the Theory of Equally Likely Outcomes agree with the probabilities calculated using the Frequency Theory. Perhaps a more serious limitation of the Theory of Equally Likely Outcomes is that many situations do not have natural symmetries to exploit to decide which outcomes are equally likely. For example, what is the chance that a thumbtack lands with its point up when it is tossed vigorously? What is the chance that a die that has been "loaded" (modified to be unbalanced) lands showing one spot? Neither of these problems has a natural symmetry from which to argue that the outcomes are equally likely. Moreover, in many situations there are an infinite number of possible outcomes; dividing 100% by infinity yields zero.
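The two-coin example can be checked by enumeration. The sketch below assumes the coins are fair and land independently, so that the four outcomes that distinguish coin 1 from coin 2 are equally likely; it then collapses them into the three-outcome space to show that those three outcomes do not get equal chances.

```python
from fractions import Fraction
from itertools import product

# Assumption: the four ordered outcomes (coin 1, coin 2) are equally likely.
ordered = list(product("HT", repeat=2))
weight = Fraction(1, len(ordered))

# Chance of two heads in the four-outcome space.
p_two_heads = sum(weight for o in ordered if o == ("H", "H"))

# Collapse to {two heads, one head and one tail, two tails}
# by counting the number of heads in each ordered outcome.
by_heads = {}
for o in ordered:
    k = o.count("H")
    by_heads[k] = by_heads.get(k, 0) + weight

print(p_two_heads)   # chance of two heads
print(by_heads)      # chance of each possible number of heads
```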
The Frequency Theory. The Frequency Theory requires an assumption about how the world works: The relative frequency with which an event occurs in repeated trials is assumed to converge to a limit. What is a limit? In the case of coin tossing, the theory says that for any positive number a, no matter how small, there is some number M, which can depend on a, such that
| (#heads in n tosses)/n - 50% | < a
whenever the number of tosses n > M. Not all sequences of heads and tails satisfy this assumption. For example, suppose the first toss gives a head. The relative frequency of heads is then 100%. Suppose the next 3 tosses give tails. The relative frequency of heads is then 25%. Suppose the next 100 tosses give heads. The relative frequency of heads is then over 97%. Suppose the next 5000 tosses give tails. The relative frequency of heads is then about 2%. If we continue in this way, with ever longer runs of heads and of tails, the relative frequency of heads never approaches a limit.
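The running fractions in this example are easy to compute directly. The sketch below uses the run lengths given in the text (1 head, 3 tails, 100 heads, 5000 tails) and records the relative frequency of heads at the end of each run; the variable names are chosen for this illustration.

```python
# (face, length of run), in the order described in the text
runs = [("H", 1), ("T", 3), ("H", 100), ("T", 5000)]

heads = 0
tosses = 0
frequencies = []          # relative frequency of heads after each run
for face, length in runs:
    tosses += length
    if face == "H":
        heads += length
    frequencies.append(heads / tosses)
    print(f"after {tosses} tosses: {100 * heads / tosses:.2f}% heads")
```

The fraction swings between values near 100% and values near 0% instead of settling down.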
The Empirical Law of Averages says this never happens: The world works in such a way that the relative frequency with which a random event occurs in repeated trials always settles down to a limit. This "law" is an assumption about how the world works. It is not a mathematical fact, and it is not an observation because no one can continue tossing coins forever to see whether the relative frequency of heads starts to vary again after, say, 100,000,000,000,000 tosses. The Empirical Law of Averages is essential to the Frequency Theory. The second limitation of the Frequency Theory is that many events to which we might like to assign probabilities are not the outcomes of repeatable experiments. For example, what is the probability that the universe will end in a "big crunch?" What is the probability that my 2010 tax return will be audited? What is the probability that in 2007 more online textbooks than paper textbooks will be sold? What is the probability that the Dow Jones Industrial Average reaches 20,000 before the year 2020? Can you think of other examples?
The Subjective Theory. The principal shortcoming of the Subjective Theory is that colloquially we think of probability as being a property of an event in the external (objective) world, not merely a reflection of our state of mind. When I say "this thumbtack has probability 66% of landing point up when I toss it," you probably think I am talking about the tack, not about my state of mind with respect to the tack. Similarly, under the Subjective Theory, you and I can disagree about the probability of an event and both be correct, which seems unsatisfactory in many scientific settings.
There are a variety of technical difficulties in the Subjective Theory regarding how to measure the probability of an event. One possible resolution is to study the bets you are willing to take. Would you be indifferent between a bet that a coin lands heads and a bet with the same stakes that it lands tails? If so, some theorists would conclude that your subjective probability that the coin lands heads is 50%. Some factors can complicate this approach. For example, even though you know that buying a lottery ticket is almost certainly throwing your money away, you might buy a ticket anyway, reasoning that you would not particularly miss the $1 cost of the ticket, while you would definitely notice winning $20,000,000. In this scenario, the probability of winning is less of an issue than the possibility of winning.
Here is another example: I will bet you $1,000,000 against $500 that there will not be a nuclear bomb dropped on Berkeley, California, by the year 2020. Even if I am confident that a nuclear bomb will be dropped, if it is dropped, I won't have to pay off the lost bet (I live in Berkeley), but if it is not dropped, I could use the $500 you would owe me.
Another problem with the Subjective Theory has to do with scientific method. Some philosophers of science maintain that unless an hypothesis can, in principle, be shown to be false, it is not scientific. An hypothesis that in principle can be disproved is called falsifiable. In the Frequency Theory, one can collect evidence against the statement that "the probability that A occurs is p%" by repeating an experiment over and over and looking at the fraction of times the event A occurs. In the Subjective Theory, evidence against the hypothesis that "the probability that A occurs is p%" is found by psychological testing to see whether the individual making the statement is telling the truth and is internally consistent in his assignments of probability. Running the real-world experiment over would not be relevant.
The following exercises check whether you understand the differences among the Theories of Probability. The examples illustrate using the Fundamental Rule of Counting and the Theories of Probability in more complex settings.
In Lotto, you pick 6 numbers between 1 and 53; 6 numbers are drawn "at random" in such a way that every subset of 6 of the numbers {1, 2, … , 53} is equally likely. To win the jackpot, all 6 of the numbers drawn must match the numbers you picked. The total number of combinations of 53 numbers taken 6 at a time is 53C6 = 22,957,480. Only one of those combinations matches yours exactly, so the chance of winning the jackpot is 1/22,957,480, about 0.0000044%.
There is a payoff of $5 if you match 3 of the 6 numbers. To match 3, the drawing has to result in 3 of your 6 numbers, and in 3 of the 47 numbers you did not pick. You can think of this as a sequence of two experiments: drawing 3 numbers from among your 6 (which can result in 6C3 = 20 possible outcomes), and drawing 3 numbers from the 47 remaining numbers that are not among your 6 (which can result in 47C3 = 16,215 possible outcomes). Because the number of possibilities in the second does not depend on what happened in the first, we can apply the fundamental rule of counting to conclude that the total number of ways of drawing 3 of the 6 numbers you picked, and 3 of the 47 you did not pick, is 6C3×47C3. The chance the drawing matches exactly 3 of your 6 numbers is therefore
6C3×47C3/53C6 = 20×16,215/22,957,480 = 1.41%.
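The Lotto arithmetic can be reproduced with Python's math.comb, which computes nCk. The sketch below also tabulates the number of drawings that match exactly k of your numbers for every k; that tabulation is an addition for illustration and is not part of the example above.

```python
from math import comb

total = comb(53, 6)                       # number of possible drawings
p_jackpot = 1 / total                     # all 6 numbers match
ways_match3 = comb(6, 3) * comb(47, 3)    # 3 of your 6, 3 of the other 47
p_match3 = ways_match3 / total

# Number of drawings that match exactly k of your 6 numbers, k = 0, ..., 6.
ways_by_matches = {k: comb(6, k) * comb(47, 6 - k) for k in range(7)}

print(total, ways_match3)
print(f"jackpot: {100 * p_jackpot:.7f}%   match 3: {100 * p_match3:.2f}%")
```

Every drawing matches exactly some number of your picks, so the counts in ways_by_matches must add up to the total number of drawings, which is a useful check on the counting.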
What is the chance of being dealt exactly one pair (two of a kind) in a 5 card hand from a well shuffled deck of cards?
Solution. How many distinct 5-card hands can one form from a deck of 52 cards? In a hand of cards, the order in which you receive the cards does not matter. The number of 5-card hands is the number of ways of picking a set of 5 things from 52 things, which is 52C5 = 52!/(5! 47!). Assuming that the deck is well-shuffled, every 5-card hand is equally likely, so the chance of being dealt exactly one pair is the number of hands that contain exactly one pair, divided by the total number of hands (which we just found).
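To specify a hand containing one pair, we need to specify what face the pair is of, the two suits of the cards in the pair, and the three remaining cards. Because we can think of this as a sequence of trials, we can invoke the Fundamental Rule of Counting to find the total number of hands that contain exactly one pair by multiplying the number of possible outcomes of each of these trials. There are 13 possible faces for the pair; 4C2 = 6 ways to choose the two suits of the cards in the pair; 12C3 = 220 ways to choose the faces of the three remaining cards, which must differ from the face of the pair and from one another; and 4×4×4 = 64 ways to choose the suits of those three cards.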
By the Fundamental Rule of Counting, the total number of distinct 5-card hands that contain exactly one pair is
13×6×220×64 = 1,098,240.
The probability of being dealt one of those hands from a well-shuffled deck is thus
1,098,240/52C5 = 42.3%.
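The count of one-pair hands can be checked numerically. The sketch below reads the four factors 13, 6, 220, and 64 as the face of the pair, the suits of the pair, the faces of the other three cards, and the suits of those three cards; the variable names are chosen for this illustration.

```python
from math import comb

hands = comb(52, 5)               # number of distinct 5-card hands

one_pair = (comb(13, 1)           # face of the pair
            * comb(4, 2)          # suits of the two cards in the pair
            * comb(12, 3)         # three different faces for the other cards
            * 4 ** 3)             # a suit for each of those three cards

p_one_pair = one_pair / hands
print(one_pair, hands, f"{100 * p_one_pair:.1f}%")
```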
What is the chance of being dealt two pair in a 5-card hand from a well-shuffled deck?
Solution. We need to specify which two faces the pairs are of, the suits of those faces, and the remaining (fifth) card. There are 13C2 ways of choosing the two faces. For each of those choices, there are 4C2 ways of specifying the suits of each of those faces individually (so there are 4C2 × 4C2 ways of specifying the suits of the two cards of the two faces). There are 52 - 4 - 4 = 44 cards in the deck that we could use for the fifth card, without making the hand be a full house, so there are 44C1 = 44 ways of choosing the fifth card. By the fundamental rule of counting, there are
13C2 × 4C2 × 4C2 × 44C1
distinct five-card hands that comprise two pair. If the deck is well shuffled, each of the 52C5 five-card hands is equally likely, so the chance of being dealt two pair is
13C2 × 4C2 × 4C2 × 44C1 / (52C5).
What is the chance of being dealt a royal flush from a well-shuffled deck?
Solution. A royal flush is {10, J, Q, K, A} of the same suit. The only thing unspecified is the suit. There are 4 ways of picking a suit from the four there are (4C1), so there are 4 ways of getting a royal flush. Again, assuming the deck is well-shuffled, the chance of a royal flush is 4/(52C5).
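The expressions for two pair and for a royal flush were left as formulas; the following sketch evaluates them with math.comb. The variable names are chosen for this illustration.

```python
from math import comb

hands = comb(52, 5)                    # number of distinct 5-card hands

two_pair = (comb(13, 2)                # the two faces of the pairs
            * comb(4, 2) * comb(4, 2)  # suits for each pair
            * comb(44, 1))             # fifth card, of neither face

royal_flush = comb(4, 1)               # only the suit is unspecified

p_two_pair = two_pair / hands
p_royal_flush = royal_flush / hands
print(two_pair, f"{100 * p_two_pair:.2f}%")
print(royal_flush, f"{100 * p_royal_flush:.6f}%")
```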
The following exercise checks that you can calculate the chance of a given poker hand.
A random experiment or random trial is basically any situation whose outcome is not perfectly predictable, but for which we can specify all possible outcomes, and that shows long-term regularities. For example, when we toss a coin, we do not know how it will land, but it certainly must land heads, tails, on its edge, or not land at all. There is no other possibility. The set of all possible outcomes of a random experiment is called the outcome space. The letter S will denote outcome space. We are free to choose the outcome space to correspond to what we deem relevant for the experiment, as long as it is essentially inevitable that the random experiment will result in some outcome in the outcome space. For example, the outcome space we just described was {heads, tails, edge, doesn't land}. It might be adequate for our purposes for the outcome space to be {heads, not heads}.
Often we shall tailor outcome spaces for specific problems. Here is an example: Imagine a box containing tickets that are indistinguishable except that each has written upon it a unique number between 1 and the number of tickets, n. We shake the box, draw a ticket from the box without looking, and record the number written on the ticket we happened to draw. The natural outcome space of this experiment is the set of numbers {1, 2, … , n}. However, suppose we are interested only in whether the number on the ticket we draw is even. The outcome space then could be reduced to {even number on ticket, odd number on ticket}, or coded even more abstractly as {0, 1}, where the outcome is the number of even-numbered tickets drawn.
An event is a subset of outcome space: a collection of outcomes in the outcome space. For example, in the experiment of drawing a numbered ticket from the box, suppose there are 10 tickets in all, and that we choose the outcome space to be the numbers {1, 2, 3, … , 9, 10}. Then "we draw the number 1" is the event {1}, and "we draw an even number" is the event {2, 4, 6, 8, 10}, both of which are subsets of the set of possible outcomes (the outcome space).
Two events are said to be disjoint or mutually exclusive if the occurrence of one is incompatible with the occurrence of the other; that is, if they have no outcome in common. This is equivalent to the definition of disjoint sets (reviewed later in this chapter), viewing events as sets. One event implies another if it is a subset of the other. Probability calculations involve working with sets. Chapter 9, "Probability: Axioms and Fundaments," presupposes familiarity with set theory, so we review elementary set theory here.
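Because events are sets of outcomes, Python's built-in set type can model them directly. The sketch below uses the box of 10 tickets from the text; the event draw_small is a hypothetical extra event introduced only to illustrate "implies."

```python
S = set(range(1, 11))                        # outcome space: tickets 1..10

draw_one = {1}                               # "we draw the number 1"
draw_even = {n for n in S if n % 2 == 0}     # "we draw an even number"
draw_small = {1, 2, 3}                       # hypothetical: "3 or less"

# Events are subsets of the outcome space.
all_are_events = all(e <= S for e in (draw_one, draw_even, draw_small))

# Disjoint (mutually exclusive): no outcome in common.
one_and_even_disjoint = draw_one.isdisjoint(draw_even)
small_and_even_disjoint = draw_small.isdisjoint(draw_even)

# One event implies another if it is a subset of the other.
one_implies_small = draw_one <= draw_small

print(all_are_events, one_and_even_disjoint,
      small_and_even_disjoint, one_implies_small)
```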
A set is a collection of things, without regard to their order. There are several simple operations on sets that yield other sets, and there are names for special relationships between sets. This section reviews the definitions and operations, and how to manipulate them. These definitions and operations are part of the branch of Mathematics called Set Theory. Translating word problems into the language of set theory is crucial in solving probability problems. Venn diagrams provide a way to visualize sets and relationships among sets. The following table gives some of the definitions used in Set Theory.
Venn diagrams represent sets and the relationships among sets pictorially. A Venn diagram represents the universe S by a two-dimensional region (usually a rectangle or a circle). Subsets of S are represented by sub-regions. The complement of the set A is represented by everything in the region that represents S that is not in the sub-region that represents A. The overlap of the sub-region that represents the set A with the sub-region that represents the set B represents the intersection AB, and so on. Shading or highlighting is used in Venn diagrams to draw attention to special relationships or subsets. The interactive figure that accompanies this section contains a Venn diagram that displays the universe S and two subsets of S, A and B. The figure has check boxes on the right side; checking a box highlights the corresponding set.
You can click and drag A and B to change the amount by which they intersect. Scrollbars at the bottom of the figure let you change the sizes of A and B. Usually, Venn diagrams do not have a scale, but because this figure is intended to represent the outcome space S, it represents the probability of an event by the area of the event. Probability is never greater than 100%, so the area of S in the diagram is 100%. The area of the subsets of S represents their probability, so the areas of A and of B are between 0% and 100%; these are denoted P(A) and P(B) at the bottom of the figure. The areas of the events AB and A∪B are listed in the figure as P(AB) and P(A or B), respectively.
The following exercises check your ability to translate word problems into the language of set theory.
This section reviews elementary logic--the calculus of combining statements that can be true or false--and the connection between logic and set theory. The fundamental elements of logic are propositions--statements that can be either true or false. A proposition is like a variable that can take two values, the value "true" and the value "false." Logical operators combine propositions to make other propositions, following rules that are outlined in this section. In this section, we shall use lowercase italic letters like p, q, and r to stand for propositions, the letter T to stand for true, and the letter F to stand for false. We also use the letter T to stand for a proposition that is always true, and the letter F to stand for a proposition that is always false. The logical operators we review are NOT, OR, AND, XOR, IMPLIES, and IFF. We also review some simple identities for logical operators, and the order of operations for evaluating compound propositions.
Suppose we have two propositions, p and q. The propositions are equal or logically equivalent if they always have the same truth value. That is, if p is true whenever q is true, and vice versa, and if p is false whenever q is false, and vice versa. If p and q are logically equivalent, we write p = q.
The simplest logical operation is negation. Negation operates on a single proposition. The logical negation of the proposition p is NOT p. This is sometimes called the inverse of p. If p is a proposition, so is NOT p: NOT p is true when p is false, and NOT p is false when p is true. Another way to state this relation is (NOT T) = F, and (NOT F) = T. Logical negation is like a negative sign in arithmetic (a negative sign, not a minus sign, which operates on a pair of numbers).
The logical operation OR is an operation on two propositions that results in another proposition: the proposition (p OR q) is true if p is true or if q is true (or both). We can say what OR does by saying what value it takes for each of the four combinations of true and false its arguments can take:
(T OR T) = T, (T OR F) = T, (F OR T) = T, (F OR F) = F.
This can be summarized in a truth table, which displays in tabular form the value of (p OR q) for each combination of values of p and q:
(p OR q)     q = T   q = F
   p = T       T       T
   p = F       T       F
The margins of the table show the truth values of p and q individually; the body of the table gives the corresponding truth values of (p OR q). For example, the entry corresponding to p being true and q being true is T, because (T OR T) = T. The logical operator OR is analogous to addition in arithmetic.
The proposition (p AND q) is true if both p is true and q is true; it is false if either p is false or q is false (or both): (T AND T) = T, (T AND F) = F, (F AND T) = F, (F AND F) = F. Here is the corresponding truth table:
(p AND q)    q = T   q = F
   p = T       T       F
   p = F       F       F
The logical operator AND is analogous to multiplication in arithmetic. All the remaining logical operations can be defined in terms of NOT, OR, and AND.
For example, another operation on a pair of propositions is exclusive or, written XOR. The proposition (p XOR q) is true if exactly one of the propositions is true; otherwise, it is false. Thus (T XOR T) = F, (T XOR F) = T, (F XOR T) = T, and (F XOR F) = F. The proposition (p XOR q) is logically equivalent to the proposition
( (p OR q) AND NOT (p AND q) ).
Here is the truth table for (p XOR q):
(p XOR q)    q = T   q = F
   p = T       F       T
   p = F       T       F
The following identities are useful:
The last identity says that both (p AND p) and (p OR p) are logically equivalent to p: If p is true, so is (p AND p) and so is (p OR p). If p is false, so is (p AND p) and so is (p OR p).
The operation of negation takes precedence over all other operations, so, for example, (p OR NOT q) is interpreted as (p OR (NOT q)).
The proposition (p IMPLIES q), also written (IF p THEN q) and (p --> q), is true if p is false, if q is true, or both. The proposition (p IMPLIES q) is logically equivalent to ( (NOT p) OR q). Here is the truth table for (p IMPLIES q):
(p IMPLIES q)   q = T   q = F
      p = T       T       F
      p = F       T       T
In logic, the proposition (p IMPLIES q) is always true if p is false, which some people find counter-intuitive. In fact, that (F IMPLIES T) and (F IMPLIES F) are both true is a matter of definition, but it does not disagree with common usage: Think of (p IMPLIES q) as the assertion (IF p THEN q), that is, "if p is true, then q is also true." This assertion says nothing about the truth of q when p is false, only that if p is true, q must also be true. Therefore, when p is false, the assertion cannot be incorrect. If p is true, q must also be true, or the assertion is incorrect.
Finally, the proposition (p IF AND ONLY IF q), also written (p IFF q) and (p <-->q), is true if both p and q are true, or if both p and q are false; otherwise, the proposition is false. That is, (p IFF q) is logically equivalent to
( (p AND q) OR ( (NOT p) AND (NOT q) ) ).
p IFF q is also equivalent to
( (p IMPLIES q) AND (q IMPLIES p) ).
Thus (T IFF T) = T, (T IFF F) = F, (F IFF T) = F, and (F IFF F) = T. Here is the truth table for (p IFF q):
(p IFF q)    q = T   q = F
   p = T       T       F
   p = F       F       T
Recall that two propositions are equal (or logically equivalent), p = q, if they always have the same value, that is, if (p IFF q) is always true. Here are some useful identities that combine NOT with AND and OR:
The logical operations AND and OR behave much like multiplication and addition, respectively. They are associative, distributive, and commute with themselves (but not each other). Here are the associative relations: if p, q, and r are propositions, then both of the following are true:
These are much like the arithmetic identities (a×b)×c = a×(b×c) = a×b×c and (a+b)+c = a+(b+c) = a+b+c. AND and OR also commute with themselves (but not with each other) as follows:
Those relations are like the arithmetic identities a×b = b×a and a+b = b+a. Moreover, AND and OR satisfy distributive relationships:
Those relationships are like the arithmetic identity a×(b+c) = a×b + a×c.
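Identities like these can be verified by brute force, because there are only finitely many assignments of true and false to the propositions involved. The sketch below assumes the standard forms of the associative, commutative, and distributive laws; the helper always_equal is a hypothetical name introduced for this illustration.

```python
from itertools import product

def always_equal(f, g, n):
    """True if f and g agree for every assignment of T/F to n propositions."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=n))

identities = {
    "AND is associative": always_equal(
        lambda p, q, r: (p and q) and r, lambda p, q, r: p and (q and r), 3),
    "OR is associative": always_equal(
        lambda p, q, r: (p or q) or r, lambda p, q, r: p or (q or r), 3),
    "AND is commutative": always_equal(
        lambda p, q: p and q, lambda p, q: q and p, 2),
    "OR is commutative": always_equal(
        lambda p, q: p or q, lambda p, q: q or p, 2),
    "AND distributes over OR": always_equal(
        lambda p, q, r: p and (q or r),
        lambda p, q, r: (p and q) or (p and r), 3),
    "OR distributes over AND": always_equal(
        lambda p, q, r: p or (q and r),
        lambda p, q, r: (p or q) and (p or r), 3),
}

for name, holds in identities.items():
    print(name, holds)
```

The last identity has no counterpart in ordinary arithmetic: a+(b×c) generally differs from (a+b)×(a+c), so the analogy with addition and multiplication is not perfect.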
The converse of the proposition (p IMPLIES q) is the proposition (q IMPLIES p). The contrapositive of the proposition (p IMPLIES q) is ( (NOT q) IMPLIES (NOT p) ). The proposition (p IMPLIES q) is logically equivalent to its contrapositive, which we can prove as follows, using the identities above:
(p IMPLIES q) = ( (NOT p) OR q )
= ( (NOT p) OR (NOT (NOT q)) )
= ( (NOT (NOT q)) OR (NOT p) )
= ( (NOT q) IMPLIES (NOT p) ).
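The same conclusion can be checked by plugging in all four combinations of truth values. The sketch below defines IMPLIES as ( (NOT p) OR q ), as in the text, and also checks the converse, which is not logically equivalent to the original proposition.

```python
from itertools import product

def implies(p, q):
    """(p IMPLIES q), defined as ((NOT p) OR q)."""
    return (not p) or q

pairs = list(product([True, False], repeat=2))

# (p IMPLIES q) versus its contrapositive ((NOT q) IMPLIES (NOT p)).
contrapositive_equivalent = all(
    implies(p, q) == implies(not q, not p) for p, q in pairs)

# (p IMPLIES q) versus its converse (q IMPLIES p).
converse_equivalent = all(
    implies(p, q) == implies(q, p) for p, q in pairs)

print(contrapositive_equivalent, converse_equivalent)
```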
There are at least two strategies to find a truth table for complicated combinations of propositions: simply plug in all combinations of values of true and false for the propositions it is built from, or try to simplify the proposition using the identities presented previously.
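The first strategy, plugging in every combination of values, is mechanical enough to automate. The following sketch builds truth tables for the operators reviewed in this section, using the definitions in terms of NOT, OR, and AND given earlier; the names OPERATORS and truth_table are hypothetical, chosen for this illustration.

```python
from itertools import product

# Each operator is written in terms of NOT, OR, and AND.
OPERATORS = {
    "OR": lambda p, q: p or q,
    "AND": lambda p, q: p and q,
    "XOR": lambda p, q: (p or q) and not (p and q),
    "IMPLIES": lambda p, q: (not p) or q,
    "IFF": lambda p, q: (p and q) or ((not p) and (not q)),
}

def truth_table(op):
    """Map each pair of truth values (p, q) to the value of op(p, q)."""
    return {(p, q): op(p, q) for p, q in product([True, False], repeat=2)}

def letter(value):
    return "T" if value else "F"

for name, op in OPERATORS.items():
    for (p, q), value in truth_table(op).items():
        print(f"({letter(p)} {name} {letter(q)}) = {letter(value)}")
```

Two compound propositions are logically equivalent exactly when their truth tables are equal, so comparing the dictionaries returned by truth_table is one way to test an identity.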
The following exercises test your ability to find truth tables for compound propositions built from the propositions p and q.
There is an intimate connection between logical operations and set operations: Every logical operation can be represented as an operation on sets by thinking of propositions as subsets of the outcome space S. The subset corresponding to p is the collection of outcomes for which p is true. Suppose we have two propositions, p and q. Let P be the subset of S corresponding to p, and let Q be the subset of S corresponding to q. The subset corresponding to (NOT p) is the complement of the subset corresponding to p, Pc. The subset corresponding to the proposition (p OR q) is the union of the set corresponding to p and the set corresponding to q, (P ∪ Q). The subset of S corresponding to the proposition (p AND q) is the intersection of the set corresponding to p and the set corresponding to q, PQ. The subset of S corresponding to (p XOR q) is (PQc ∪ PcQ). The set corresponding to the proposition (p IMPLIES q) is (Pc ∪ Q). If P is a subset of Q, then
(Pc ∪ Q) ⊇ (Qc ∪ Q) = S,
because then Pc contains Qc; since (Pc ∪ Q) is itself a subset of S, it follows that (Pc ∪ Q) = S. Thus if P is a subset of Q, (p IMPLIES q) is always true. The set corresponding to the proposition (p IFF q) is (PQ ∪ (PcQc)). If P = Q, then
(PQ ∪ PcQc) = (P ∪ Pc) = S,
so in that case, (p IFF q) is always true.
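The correspondence between logical operations and set operations can be checked on a small example. The sketch below assumes an outcome space of tickets numbered 1 to 10 and two illustrative propositions, "the number is even" and "the number exceeds 5"; these choices are hypothetical and serve only to make the check concrete.

```python
S = set(range(1, 11))               # outcome space: tickets 1..10

def p(n):                           # proposition p: "the number is even"
    return n % 2 == 0

def q(n):                           # proposition q: "the number exceeds 5"
    return n > 5

def truth_set(prop):
    """The subset of S for which the proposition is true."""
    return {n for n in S if prop(n)}

P, Q = truth_set(p), truth_set(q)
Pc, Qc = S - P, S - Q               # complements relative to S

checks = {
    "NOT": truth_set(lambda n: not p(n)) == Pc,
    "OR": truth_set(lambda n: p(n) or q(n)) == P | Q,
    "AND": truth_set(lambda n: p(n) and q(n)) == P & Q,
    "XOR": truth_set(lambda n: p(n) != q(n)) == (P & Qc) | (Pc & Q),
    "IMPLIES": truth_set(lambda n: (not p(n)) or q(n)) == Pc | Q,
    "IFF": truth_set(lambda n: p(n) == q(n)) == (P & Q) | (Pc & Qc),
}
print(checks)
```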
Theories of Probability assign meaning to probability statements about the world. The Theory of Equally Likely Outcomes says that if an experiment must result in one of n outcomes, and there is no reason Nature should prefer one of the outcomes to another, then the probability of each outcome is 100%/n. The Frequency Theory says that the probability of an event is the limit of the relative frequency with which the event occurs in repeated trials under essentially identical conditions. The Subjective Theory says that probability is a measure of strength of belief on a scale of 0% to 100%.
Outcome space S is the set of all things that could possibly happen in an experiment. The empty set is {}. Events are subsets of S. Set operations turn sets into other sets. The complement of a set A, Ac, is the set of elements of S that are not in A. The set A is a subset of the set B if every element of A is also an element of B. The intersection of the set A and the set B, AB, is the set of things that are in both A and B. The union of the set A and the set B, A∪B, contains every element of A, every element of B, and nothing else. A and B are disjoint or mutually exclusive if they have no elements in common, that is, if AB = {}. The collection {A1, A2, A3, … } exhausts A if every element of A is in at least one of the sets in the collection; that is, if A = A1 ∪ A2 ∪ A3∪ … . The collection {A1, A2, A3, … } is disjoint if AiAj = {} when i ≠ j. The collection {A1, A2, A3, … } partitions A if every element of A is in exactly one of the sets in the collection; that is, if the collection is disjoint and exhausts A.
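The definitions of "exhausts," "disjoint," and "partitions" translate almost word for word into code. The following is a minimal sketch for finite collections of Python sets; the function names are hypothetical, chosen to mirror the terms above.

```python
from itertools import combinations

def exhausts(collection, A):
    """True if the union of the sets in the collection equals A."""
    return set().union(*collection) == set(A)

def is_disjoint(collection):
    """True if no two different sets in the collection share an element."""
    return all(a.isdisjoint(b) for a, b in combinations(collection, 2))

def partitions(collection, A):
    """True if the collection is disjoint and exhausts A."""
    return is_disjoint(collection) and exhausts(collection, A)

A = {1, 2, 3, 4, 5}
print(partitions([{1, 2}, {3}, {4, 5}], A))     # disjoint and exhaustive
print(partitions([{1, 2, 3}, {3, 4, 5}], A))    # exhaustive, not disjoint
print(partitions([{1, 2}, {4, 5}], A))          # disjoint, not exhaustive
```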
Set theory is tied closely to logic. A proposition p is a statement that can be true (T) or false (F). Logical operations turn propositions into other propositions; examples include NOT, OR, XOR, AND, IMPLIES, IFF. They operate as shown in the following table:
p   q   NOT p   p OR q   p XOR q   p AND q   p IMPLIES q   p IFF q
T   T     F       T         F         T           T           T
T   F     F       T         T         F           F           F
F   T     T       T         T         F           T           F
F   F     T       F         F         F           T           T
The logical operations satisfy associative, commutative, and distributive laws. Logical propositions can be thought of as events: The proposition is true if and only if the event occurs. Then logical NOT becomes the set complement, logical AND becomes the set intersection, logical OR becomes the set union, and the rest of the associations follow from these three. Set theory forms the basis of the mathematical study of probability, which we begin in Chapter 9, "Probability: Axioms and Fundaments."