Friday, August 16, 2013

“If A, then probably C” entails “Probably, if A then C”


The fact that “If A, then probably C” entails “Probably, if A then C” is useful in a lot of philosophical discussions. Take for example this argument:
  1. If atheism is true, then objective morality does not exist.
  2. Objective morality does exist.
  3. Therefore, atheism is false.
In Does Objective Morality Exist If God Does Not Exist? I basically argued that if atheism is in fact true, objective morality probably doesn’t exist, in which case premise (1) is probably true. I thus used an “If A, then probably C” claim to show that a “Probably, if A then C” claim is true.

I would’ve thought “If A, then probably C” entailing “Probably, if A then C” would be uncontroversial even among internet atheists, but in discussing an argument against the possibility of an infinite past, someone I dialogued with on Facebook claimed the following:
If A, then probably C

does not entail

Probably, If A then C.
If you encounter an internet atheist (or anyone else) who disputes that “If A, then probably C” entails “Probably, if A then C” you can point them to this article which features a mathematical proof demonstrating that “If A, then probably C” does indeed entail “Probably, if A then C.” Fortunately the proof requires nothing more difficult than high school (or middle school) mathematics. Don’t worry if your math is a bit rusty; I’ll give a crash course in some basic probability and set theory.

The General Idea

First, an explanation of what “If A, then C” means exactly. The “If A, then C” material conditional says it is not the case that A is true and C is false (this is often good enough for philosophical arguments, since in a true material conditional, when A is true, C is true as well—because a true material conditional prohibits C from being false when A is true).
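As a quick illustration, here is a minimal Python sketch of the material conditional as just described. The function name `material_conditional` is only an illustrative choice; the logic is simply “not (A and not-C).”

```python
from itertools import product

def material_conditional(a: bool, c: bool) -> bool:
    """'If A, then C' read as: it is not the case that A is true and C is false."""
    return not (a and not c)

# Truth table over all four combinations of A and C.
table = {
    (a, c): material_conditional(a, c)
    for a, c in product([True, False], repeat=2)
}

for (a, c), value in table.items():
    print(f"A={a!s:<5} C={c!s:<5} 'if A then C' is {value}")
```

The only row that comes out false is the one where A is true and C is false, which is exactly the case the material conditional prohibits.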

The general idea is that “Given A, C is probably true” implies it is probably not the case that A is true and C is false. While I think the general idea is somewhat intuitively obvious, this article will mathematically prove the general idea to be true.

Mathematical Background

If you’re already savvy in math (particularly with some basic set algebra and probability theory) feel free to skip this section and go straight to the proof. Otherwise I’ll introduce some basic math stuff so that folks who aren’t quite so math savvy can follow along.

Set Operations

To illustrate some set operations, suppose our “universe” consists entirely of natural numbers 1 through 9. Now let A, B, and C be the following:
A = {1, 5, 9}
B = {1, 5, 7, 8}
C = {2, 3}

(element of)
1 ∈ A
For any set S, x ∈ S means that x is an element of S.

(not an element of)
1 ∉ C
For any set S, x ∉ S means that x is not an element of S.

(intersection)
A ∩ B = {1, 5}
Given sets S and T, S ∩ T contains all the elements x such that x ∈ S and x ∈ T.

(union)
A ∪ B = {1, 5, 7, 8, 9}
Given sets S and T, S ∪ T contains all the elements x such that x ∈ S or x ∈ T.

(empty set)
A ∩ C = ∅
The empty set is a set that doesn’t contain any members.

(universal set)
A ∪ A’ = ξ
B ∩ ξ = B
ξ is basically “everything” in whatever universe the sets are “talking about,” e.g. if we’re dealing with sets of lowercase letters, like {a, e, i, o, u}, the universal set would be the entire lowercase alphabet. Sometimes the universal set is depicted as U or S.

(complement of S)
B’ = {2, 3, 4, 6, 9}
The complement of set S, denoted as S̄ or S^C or S’ or −S (among other variants), is the set of all elements x such that x ∉ S and x ∈ ξ.

Remember that the union (∪) uses the “inclusive-or,” so A ∪ B also includes the elements that are in both A and B.

One notable thing is how similar some set operations are to propositional logic:

Set Theory | Rough Equivalent in Logic
A ∪ B | A ∨ B (“A or B”)
A ∩ B | A ∧ B (“A and B”)
A’, −A | ¬A (“not-A”), alternatively, ~A and −A

So for example, x ∈ (A ∪ B) means that x ∈ A or x ∈ B.
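If you’d like to check the set examples above by computer, here is a small Python sketch using the built-in `set` type. The helper name `complement` and the variable `universe` are illustrative choices standing in for S’ and ξ.

```python
# The "universe" of natural numbers 1 through 9 and the example sets.
universe = set(range(1, 10))
A = {1, 5, 9}
B = {1, 5, 7, 8}
C = {2, 3}

def complement(s: set) -> set:
    """All elements of the universe that are not in s."""
    return universe - s

print(1 in A)           # element of
print(1 not in C)       # not an element of
print(A & B)            # intersection
print(A | B)            # union (inclusive-or)
print(A & C == set())   # the intersection of A and C is the empty set
print(complement(B))    # complement of B

# The parallel with logic: x is in (A union B) exactly when x is in A or x is in B.
union_matches_or = all((x in (A | B)) == (x in A or x in B) for x in universe)
print(union_matches_or)
```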

Set Algebra

There are certain equality rules with sets involving stuff like unions and complements. Here’s a sample of some algebraic set rules:

Commutative laws: A ∪ B = B ∪ A  |  A ∩ B = B ∩ A
Identity laws: A ∩ ξ = A  |  A ∪ ∅ = A  |  A ∩ ∅ = ∅
Complement laws: A ∪ A’ = ξ  |  A ∩ A’ = ∅  |  (A’)’ = A
Distributive laws: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
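These laws can be spot-checked by brute force. The sketch below assumes a small four-element universe (chosen only to keep the search quick) and tests every law listed above against every combination of subsets. This is a sanity check on one universe, not a proof.

```python
from itertools import combinations

universe = frozenset({1, 2, 3, 4})  # small illustrative universe (an assumption)
empty = frozenset()

def all_subsets(s):
    """Return every subset of s as a frozenset."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def complement(s):
    """All elements of the universe that are not in s."""
    return universe - s

def laws_hold() -> bool:
    subsets = all_subsets(universe)
    for a in subsets:
        # Identity laws
        if (a & universe) != a or (a | empty) != a or (a & empty) != empty:
            return False
        # Complement laws
        if (a | complement(a)) != universe or (a & complement(a)) != empty:
            return False
        if complement(complement(a)) != a:
            return False
        for b in subsets:
            # Commutative laws
            if (a | b) != (b | a) or (a & b) != (b & a):
                return False
            for c in subsets:
                # Distributive laws
                if (a | (b & c)) != ((a | b) & (a | c)):
                    return False
                if (a & (b | c)) != ((a & b) | (a & c)):
                    return False
    return True

print(laws_hold())
```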

Probability Symbolism

Probability often uses the language of set theory to symbolize the probabilities of certain events happening. Here a set denotes an event, like getting three or higher when rolling a die, where an event is a set of one or more outcomes. So for example if we let F represent the event of “rolling a four or higher” for a six-sided die, the set of outcomes would look like this:
F = {4, 5, 6}
If we let T be the event of “getting a 3,” T would look like this:
T = {3}
F ∪ T symbolizes all the outcomes that are in F or T, which in this case is rolling a 3 or higher. Pr(F ∪ T) denotes the probability that the outcome will be a member of set F or T. Some basic probability symbolism:

Pr(A) = The probability of A being true; e.g. Pr(A) = 0.5 means “The probability of A being true is 50%.”
Pr(A|B) = The probability of A being true given that B is true. For example:
Pr(I am wet|It is raining) = 0.8
This means “The probability that I am wet given that it is raining is 80%.”
Pr(¬A) = The probability of A being false (¬A is read as “not-A”); e.g. Pr(¬A) = 0.5 means “The probability of A being false is 50%.”
Pr(B ∪ C) = The probability that B or C (or both) are true.
Pr(B ∩ C) = The probability that B and C are both true.
Pr(A|B ∩ C) = The probability of A given that both B and C are true.
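To make the symbolism concrete, here is a minimal Python sketch for a fair six-sided die, with events as sets of outcomes. The function `pr` and its `given` parameter are illustrative names, and the sketch assumes every outcome is equally likely and that `given` is not empty. Note that in Python `F | T` is the set union F ∪ T, not the conditional bar of Pr(A|B).

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # fair six-sided die

def pr(event, given=None):
    """Pr(event), or Pr(event | given) when `given` is supplied (assumed non-empty)."""
    sample = outcomes if given is None else outcomes & given
    return Fraction(len(event & sample), len(sample))

F = {4, 5, 6}  # rolling a four or higher
T = {3}        # getting a 3

print(pr(F))                        # Pr(F)
print(pr(outcomes - F))             # Pr(not-F)
print(pr(F | T))                    # Pr(F ∪ T): rolling a 3 or higher
print(pr(F & T))                    # Pr(F ∩ T): no outcome is in both
print(pr(F, given={3, 4, 5, 6}))    # Pr(F | rolled a 3 or higher)
```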

Some alternate forms:

One Version | Alternate Forms
Pr(A) | P(A)
Pr(¬A) | Pr(~A), Pr(−A), Pr(A^C)
Pr(B ∪ C) | Pr(B ∨ C)
Pr(B ∩ C) | Pr(B ∧ C), Pr(B&C)

The alternate forms can be combined, e.g. an alternate form of Pr(H|E) is P(H/E).

Probability Rules

In addition to the mathematical symbolism, there are also a number of mathematical rules regarding probability. When events A and B have no outcomes in common, i.e. when A ∩ B = ∅, events A and B are said to be mutually exclusive or disjoint. For example, “rolling a two or lower” and “rolling a five or higher” are mutually exclusive events for rolling a six-sided die. Two events are said to be independent of each other if the outcome of one does not affect the outcome of the other, e.g. rolling a 6 the first time and rolling a 5 the second time for a six-sided die. Because I think it makes things clearer in what I’ll do later in this article, I’ll use the symbolism ¬A to denote “not-A” rather than A’. With that in mind:

Rule Name: Rule
Addition rule: Pr(A ∪ B) = Pr(A) + Pr(B) when A ∩ B = ∅
General addition rule: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B), regardless of whether A ∩ B = ∅
(note that when A ∩ B = ∅, Pr(A ∩ B) = 0)
Complement rule: Pr(¬A) = 1 − Pr(A)
Multiplication rule: Pr(A ∩ B) = Pr(A) × Pr(B) when A and B are independent
General multiplication rule: Pr(A ∩ B) = Pr(A) × Pr(B|A), regardless of whether A and B are independent
(Pr(B|A) = Pr(B) when A and B are independent)
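Each rule can be checked on the die examples mentioned above. The following Python sketch assumes fair dice and uses illustrative names (`pr`, `low`, `high`, `four_up`, `even`, and so on); it checks particular cases rather than proving the rules in general.

```python
from fractions import Fraction
from itertools import product

def pr(event, space, given=None):
    """Pr(event) on a space of equally likely outcomes; conditional if `given` is supplied."""
    sample = space if given is None else space & given
    return Fraction(len(event & sample), len(sample))

die = frozenset(range(1, 7))
low, high = {1, 2}, {5, 6}            # mutually exclusive events
four_up, even = {4, 5, 6}, {2, 4, 6}  # overlapping events

addition_ok = pr(low | high, die) == pr(low, die) + pr(high, die)
general_addition_ok = (
    pr(four_up | even, die)
    == pr(four_up, die) + pr(even, die) - pr(four_up & even, die)
)
complement_ok = pr(die - even, die) == 1 - pr(even, die)
general_multiplication_ok = (
    pr(four_up & even, die) == pr(four_up, die) * pr(even, die, given=four_up)
)

# Independence needs two rolls: the sample space is all 36 ordered pairs.
two_rolls = frozenset(product(range(1, 7), repeat=2))
first_is_6 = {roll for roll in two_rolls if roll[0] == 6}
second_is_5 = {roll for roll in two_rolls if roll[1] == 5}
multiplication_ok = (
    pr(first_is_6 & second_is_5, two_rolls)
    == pr(first_is_6, two_rolls) * pr(second_is_5, two_rolls)
)

print(addition_ok, general_addition_ok, complement_ok,
      multiplication_ok, general_multiplication_ok)
```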

Notice that because of the general multiplication rule (and a bit of simple algebra), this is also true for any events A and B:
Pr(B|A) = Pr(A ∩ B) / Pr(A)
And that’s pretty much all the math background you’ll need to follow along.

The Proof

For this to work I’ll break the proof into separate steps. In math and logic, a lemma is a claim that is proved to demonstrate something else later in a proof. For this proof I’ll be using several lemmas.

Recall that the “If A, then C” material conditional means it is not the case that A is true and C is false. Thus the probability that the material conditional is true can be mathematically depicted as this:
Pr(¬(A ∩ ¬C)) = 1 − Pr(A ∩ ¬C)
Which means “The probability of it not being the case that A and ¬C are both true.”

By “If A, then probably C” I mean “Given A, C is probably true,” which in turn means that Pr(C|A) is high. So to show that high Pr(C|A) entails a high Pr(¬(A ∩ ¬C)), I want to prove the following:
Pr(C|A) ≤ 1 − Pr(A ∩ ¬C)
Or equivalently:
Pr(¬(A ∩ ¬C)) ≥ Pr(C|A)
Lemma (1): (A ∩ C) and (A ∩ ¬C) are disjoint (mutually exclusive). We can show that no element in the universe can be a member of both (A ∩ C) and (A ∩ ¬C). Let x be an arbitrary element and let’s suppose x is a member of both (A ∩ C) and (A ∩ ¬C). With a bit of math logic, we show that there can’t be any x such that x ∈ (A ∩ C) and x ∈ (A ∩ ¬C) by assuming there is such an x and deriving an impossibility, like so:
  1. x ∈ (A ∩ C) and x ∈ (A ∩ ¬C)
  2. (x ∈ A and x ∈ C) and (x ∈ A and x ∈ ¬C), from (1) and definition of ∩
  3. x ∈ A and x ∈ C and x ∈ A and x ∈ ¬C, from (2)
  4. x ∈ C and x ∈ ¬C, from (3)
Of course, it’s impossible for there to be an element that is a member of a set and its complement, since (C ∩ ¬C) = ∅. Thus (A ∩ C) and (A ∩ ¬C) are disjoint, i.e. (A ∩ C) ∩ (A ∩ ¬C) = ∅.

With this in mind, let ξ be the universal set.
A ∩ ξ = A
⇔ A ∩ (C ∪ ¬C) = A
⇔ (A ∩ C) ∪ (A ∩ ¬C) = A
Lemma (2): Since (A ∩ C) and (A ∩ ¬C) are mutually exclusive and their union is A, by the rules of probability:
Pr(A ∩ C) + Pr(A ∩ ¬C) = Pr(A)
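Here is a small numerical illustration of both lemmas in Python. The four-outcome universe and its weights are hypothetical values picked only for the example (deliberately not equally likely), and `pr` is an illustrative helper name.

```python
from fractions import Fraction

# Hypothetical probabilities for a four-outcome universe (they sum to 1).
weights = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
universe = set(weights)

def pr(event):
    """Probability of an event: the sum of the weights of its outcomes."""
    return sum((weights[o] for o in event), Fraction(0))

A = {1, 2, 3}
C = {2, 4}
not_C = universe - C

a_and_c = A & C          # A ∩ C
a_and_not_c = A & not_C  # A ∩ ¬C

# Lemma (1): the two pieces are disjoint, and together they make up A.
print(a_and_c & a_and_not_c == set())
print(a_and_c | a_and_not_c == A)

# Lemma (2): their probabilities add up to Pr(A).
print(pr(a_and_c) + pr(a_and_not_c) == pr(A))
```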
With those two lemmas in mind, consider this statement:
Pr(C|A) = Pr(C ∩ A) / Pr(A)
Now we swap Pr(A) for Pr(A ∩ C) + Pr(A ∩ ¬C), and this is a legitimate move thanks to the equality proved in lemma (2):
Pr(C|A) = Pr(C ∩ A) / [Pr(A ∩ C) + Pr(A ∩ ¬C)]

⇔  Pr(C|A) = Pr(A ∩ C) / [Pr(A ∩ C) + Pr(A ∩ ¬C)]
Just to make this easier to read, let’s have x represent Pr(A ∩ C) like so:
Pr(C|A) = x / [x + Pr(A ∩ ¬C)]
Given some value for Pr(A ∩ ¬C), what is the highest Pr(C|A) possible? One hint is this: given some Pr(A ∩ ¬C), when x goes to zero, so does Pr(C|A); a smaller x means a smaller Pr(C|A).[1] So to get the highest Pr(C|A) value given some Pr(A ∩ ¬C), we want x to be as big as possible. Now since this is true:
Pr(A ∩ C) + Pr(A ∩ ¬C) = Pr(A) ≤ 1

    ⇔ Pr(A ∩ C) + Pr(A ∩ ¬C) ≤ 1

    ⇔ Pr(A ∩ C) ≤ 1 − Pr(A ∩ ¬C)
The highest Pr(A ∩ C) (and thus x) can be is 1 − Pr(A ∩ ¬C). So substituting the maximum value for x to obtain an upper limit for Pr(C|A) gives us this:
Pr(C|A) ≤ [1 − Pr(A ∩ ¬C)] / [1 − Pr(A ∩ ¬C) + Pr(A ∩ ¬C)]

⇔  Pr(C|A) ≤ [1 − Pr(A ∩ ¬C)] / [1 + [−Pr(A ∩ ¬C)] + Pr(A ∩ ¬C)]

⇔  Pr(C|A) ≤ [1 − Pr(A ∩ ¬C)] / (1 + 0)

⇔  Pr(C|A) ≤ [1 − Pr(A ∩ ¬C)] / 1

⇔  Pr(C|A) ≤  1 − Pr(A ∩ ¬C)

⇔  Pr(C|A) ≤  Pr(¬(A ∩ ¬C))

⇔  Pr(¬(A ∩ ¬C)) ≥  Pr(C|A)
This means that Pr(¬(A ∩ ¬C)) must be at least as great as Pr(C|A), which means Pr(C|A) being high entails Pr(¬(A ∩ ¬C)) being high. This in turn means “If A, then probably C” entails “Probably, if A then C.”
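As a sanity check on the inequality (not a substitute for the proof), the Python sketch below tries every pair of events A and C on a small universe and looks for a counterexample. The universe and its weights are hypothetical values chosen for illustration, and pairs where Pr(A) = 0 are skipped because Pr(C|A) is undefined there.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical probabilities for a four-outcome universe (they sum to 1).
weights = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
universe = frozenset(weights)

def pr(event):
    """Probability of an event: the sum of the weights of its outcomes."""
    return sum((weights[o] for o in event), Fraction(0))

def all_events():
    """Every subset of the universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

violations = []
checked = 0
for A in all_events():
    if pr(A) == 0:
        continue  # Pr(C|A) is undefined when Pr(A) = 0
    for C in all_events():
        conditional = pr(A & C) / pr(A)  # Pr(C|A)
        material = 1 - pr(A - C)         # Pr(¬(A ∩ ¬C)); A - C is A ∩ ¬C
        checked += 1
        if conditional > material:
            violations.append((A, C))

print(f"pairs checked: {checked}, violations found: {len(violations)}")
```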

In response, one could attack the relationship between “If A, then probably C” and Pr(C|A). But of course, Pr(C|A) is the probability of C given A. So Pr(C|A) being high means that given A, C is probably true. “If A, then probably C” is saying that given A, C is probably true. Hence, “If A, then probably C” entails a high Pr(C|A), which entails “Probably, if A then C.”

[1] We can prove this more rigorously with some simple calculus. Since Pr(A ∩ ¬C) is constant, we can replace Pr(A ∩ ¬C) in the expression below with k (to symbolize a constant) and take the derivative:
x / [x + Pr(A ∩ ¬C)]
The calculus would thus be this using the quotient rule:
d/dx [x / (x + k)]
  = [(x + k)(1) − (x)(1)] / (x + k)²
  = (x + k − x) / (x² + 2kx + k²)
  = k / (x² + 2kx + k²)
Since both x and k symbolize possible probability values, they must each be in the interval [0, 1]. One can see that for all k > 0, all values of x ∈ [0,1] produce a positive value in the derivative above. The rate of change is therefore always positive throughout the x ∈ [0,1] interval, so the largest value in the [0,1] interval will be when x = 1. When k = 0, the derivative is zero for all x > 0, so the value is constant (not changing). What about when x goes to 0? Obviously we can’t just plug in 0 for x when k = 0, but we can take the limit:
lim (x→0) 0 / (x² + 2(0)x + 0²)

 ⇔  lim (x→0) 0 / x²

We can then use L’Hôpital’s rule a couple times:

 ⇒  lim (x→0) 0 / (2x)

 ⇒  lim (x→0) 0 / 2
   = 0
The slope is thus still zero (the value is not changing) for all x ∈ [0,1] when k = 0, and the maximum value will still be found at x = 1.
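The same behavior can be checked numerically. This Python sketch evaluates x / (x + k) and the derivative k / (x + k)² on a grid of exact fractions; the grid spacing of 1/20 and the names `ratio` and `derivative` are arbitrary illustrative choices.

```python
from fractions import Fraction

def ratio(x, k):
    """The expression x / (x + k) from the footnote."""
    return x / (x + k)

def derivative(x, k):
    """Its derivative with respect to x, from the quotient rule: k / (x + k)^2."""
    return k / (x + k) ** 2

grid = [Fraction(i, 20) for i in range(21)]  # 0, 1/20, ..., 1
positive_ks = [k for k in grid if k > 0]

# For k > 0 the expression strictly increases with x and the derivative is positive.
increasing = all(
    ratio(x1, k) < ratio(x2, k)
    for k in positive_ks
    for x1, x2 in zip(grid, grid[1:])
)
derivative_positive = all(derivative(x, k) > 0 for k in positive_ks for x in grid)

# For k = 0 the expression equals 1 at every x > 0, so it is not changing.
constant_when_k_zero = all(ratio(x, Fraction(0)) == 1 for x in grid if x > 0)

print(increasing, derivative_positive, constant_when_k_zero)
```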


  1. There is a much simpler way to do this

    1. Ack! Use unicode man! ;-)

      Reformatting it:

      1) P(A|B) ≤ 1

      2) P(A|B)(1-P(A)) ≤ (1-P(A))

      3) P(A|B) ≤ P(A|B)P(A)+1-P(A)

      4) P(A|B) ≤ P(A|B)P(A)+1-P(A)

      5) P(A|B) ≤ 1-(P(A)-P(A∩B))

      I'm only a math minor, but it seems there might be a problem between (4) and (5). Note that while this is true:

      P(A|B)P(B) = P(A∩B)

      The following is not true in general:

      P(A|B)P(A) = P(A∩B)

      Example: There is a sack containing balls numbered 1, 2, 4, and 6. Let A represent the outcome that I have drawn an even-numbered ball, so in this case P(A) = 0.75. Let B represent the outcome that I have drawn a ball numbered less than 4. Then P(A|B) represents the probability that I have drawn an even-numbered ball given that I have drawn a ball numbered less than 4, and P(A|B) = 0.5. Yet the following equation is not true:

      P(A|B)P(A) = P(A∩B)

      For P(A∩B) = 25% yet P(A|B)P(A) = 37.5%
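The numbers in the sack example can be recomputed by enumeration. This is a minimal Python sketch assuming each of the four balls is equally likely to be drawn; `pr` and its `given` parameter are illustrative names.

```python
from fractions import Fraction

balls = {1, 2, 4, 6}  # each ball assumed equally likely to be drawn

def pr(event, given=None):
    """P(event), or P(event | given) when `given` is supplied (assumed non-empty)."""
    sample = balls if given is None else balls & given
    return Fraction(len(event & sample), len(sample))

A = {b for b in balls if b % 2 == 0}  # even-numbered ball
B = {b for b in balls if b < 4}       # ball numbered less than 4

p_a = pr(A)
p_b = pr(B)
p_a_given_b = pr(A, given=B)
p_a_and_b = pr(A & B)

print("P(A|B)P(B) == P(A∩B):", p_a_given_b * p_b == p_a_and_b)
print("P(A|B)P(A) == P(A∩B):", p_a_given_b * p_a == p_a_and_b)
```

Multiplying P(A|B) by P(B) recovers P(A∩B), while multiplying by P(A) does not.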