|Image: portrait of Thomas Bayes, public domain|
Essentially, we can now use any valid equation in probability theory as a rule of logic. We saw a useful example in the last post: P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A). This captures the intuitive idea that if A is likely to lead to B, and B is likely to lead to C, then A is likely to lead to C. But it also does more - it tells us precisely how to calculate the probability for our conclusion, while simultaneously sharpening, guiding, and correcting our thinking. In particular, it tells us that in the second step, it's not enough that B is likely to lead to C; what's required is that B together with A (that is, BA) is likely to lead to C. (Incidentally, this is why a blind person is not likely to get traffic tickets, even though a blind person is likely to be a bad driver, and bad drivers are likely to get tickets.)
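To make the blind-driver remark concrete, here is a small Python sketch of that equation. Every number in it is made up purely for illustration (none of them come from the earlier post), and the function name total_probability is my own. Read A as "the person is blind", B as "the person is a bad driver", and C as "the person gets traffic tickets".

```python
def total_probability(p_c_given_ba, p_b_given_a, p_c_given_not_ba):
    """P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A)."""
    p_not_b_given_a = 1.0 - p_b_given_a
    return p_c_given_ba * p_b_given_a + p_c_given_not_ba * p_not_b_given_a


# Hypothetical, illustrative values only.
p_bad_given_blind = 0.99              # P(B|A): a blind person is likely a bad driver
p_ticket_given_bad = 0.6              # P(C|B): bad drivers in general get tickets
p_ticket_given_bad_and_blind = 0.001  # P(C|BA): blind people rarely drive at all
p_ticket_given_good_and_blind = 0.001 # P(C|~BA)

# The tempting shortcut chains P(B|A) with P(C|B) and ignores A in the second step.
naive = p_ticket_given_bad * p_bad_given_blind

# The valid equation conditions on BA, not just on B.
correct = total_probability(
    p_ticket_given_bad_and_blind,
    p_bad_given_blind,
    p_ticket_given_good_and_blind,
)

print(f"naive chain: {naive:.3f}")
print(f"P(C|A):      {correct:.3f}")
```

With these invented inputs the shortcut suggests tickets are likely, while the actual equation says they are rare. The gap comes entirely from using P(C|BA) in place of P(C|B).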
In this post, I will introduce another such equation in probability theory:
P(A|B) = P(B|A)/P(B) * P(A)
This is Bayes' Theorem. Named after Reverend Thomas Bayes, this unassuming little equation, which can be derived immediately from the definition that P(A|B) = P(AB)/P(B), is so important that its use is nearly synonymous with Bayesian logic, and its interpretation is the logical basis for the scientific method. At its heart, this equation tells you how to update your beliefs based on the evidence. To see how that works, set A = "hypothesis", and B = "observation" in the formula. The equation then becomes:
P(hypothesis|observation) = P(observation|hypothesis)/P(observation) * P(hypothesis)
Each factor can then be translated into words as:
P(hypothesis): probability that the hypothesis is true, before considering the observation.
P(hypothesis|observation): probability that the hypothesis is true, after considering the observation.
P(observation|hypothesis): probability for the observation, as predicted by the hypothesis.
P(observation): probability for the observation, averaged over the predictions from every hypothesis.
This equation tells us how we should update our opinion on a hypothesis after we make a relevant observation. That is, it tells us how to go from P(hypothesis) to P(hypothesis|observation). It says that a hypothesis becomes more likely to be true if it's able to predict an observation better than the "average" hypothesis: the bigger the ratio of P(observation|hypothesis)/P(observation), the more likely the hypothesis becomes. Conversely it becomes less likely to be true if it could not beat the "average" hypothesis in its predictions. In short, it says that an observation counts as evidence for the hypothesis that better predicted it. We already intuitively knew this to be true - but Bayes' theorem states it in a mathematically rigorous fashion, and allows us to put firm numbers to some of these factors.
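As a rough sketch of that rule in Python, the function below applies the formula directly. The name bayes_update and the numbers fed into it are my own illustrative assumptions, chosen only to show one hypothesis that beats the "average" prediction and one that does not.

```python
def bayes_update(prior, likelihood, evidence):
    """Return P(hypothesis|observation).

    prior      = P(hypothesis)
    likelihood = P(observation|hypothesis)
    evidence   = P(observation)
    """
    if evidence <= 0:
        raise ValueError("P(observation) must be positive")
    return likelihood / evidence * prior


prior = 0.3  # hypothetical starting belief

# The hypothesis predicted the observation twice as well as average (ratio 2.0).
raised = bayes_update(prior, likelihood=0.8, evidence=0.4)

# The hypothesis predicted the observation half as well as average (ratio 0.5).
lowered = bayes_update(prior, likelihood=0.2, evidence=0.4)

print(f"ratio 2.0: {prior} -> {raised:.2f}")
print(f"ratio 0.5: {prior} -> {lowered:.2f}")
```

A ratio above 1 moves the probability up from the prior, and a ratio below 1 moves it down, which is all that "counts as evidence for the hypothesis that better predicted it" means here.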
Let's consider an example: Alice and Bob go out on a date. Bob liked Alice and wants to ask her to a second date, but he's not sure how she'll respond. So he hypothesizes two possible outcomes: Alice will say "yes" to a second date, or she will say "no". Based on all the information he has - how Alice acted before and during the date, how they communicated afterwards, etc. - he thinks that there's a 50-50 chance between Alice saying "yes" or "no". That is to say:
P(Alice will say "yes") = P(Alice will say "no") = 0.5
For the sake of simplicity, we will not consider other possibilities, such as Alice saying some form of "maybe". These two "yes" and "no" will serve as our complete set of possible hypotheses.
While Bob is agonizing over this second date, he runs into Carol, who is a mutual friend of Alice and Bob. She tells Bob, "Alice absolutely loved it last night! She can't wait to go out with you again!" Carol's affirmation serves as evidence that Alice will say "yes" to a second date. We already knew this intuitively: Carol's affirmation is obviously good news for Bob. But Bayes' theorem allows us to calculate the probability explicitly from some starting probabilities. To see this, we need to evaluate two probability values: P(Carol's affirmation|Alice will say "yes"), and P(Carol's affirmation|Alice will say "no").
What value should we assign to P(Carol's affirmation|Alice will say "yes")? That is, if Alice would say "yes" to a second date, what is the probability that Carol would have given Bob her affirmation? Not particularly high. After all, Carol could have simply forgotten to mention Alice's reaction, or Alice and Carol might not have had a chance to discuss the first date, or Alice could have had a terrible time but still be willing to give Bob a second chance. All these are ways that the "yes" hypothesis might not lead to Carol's affirmation. Taking these things into account, let's say that P(Carol's affirmation|Alice will say "yes") = 0.2.
What about P(Carol's affirmation|Alice will say "no")? This is the probability that Carol would still communicate her affirmation to Bob, even though Alice would say "no" to a second date. Now, it could be that Alice hated her first date with Bob, but Carol deliberately lied to him. Or maybe Carol simply wanted to encourage Bob even though she didn't really know how Alice felt. Or maybe Alice really did enjoy her time with Bob, but she'll be suddenly struck by amnesia before Bob asks her out again. But assuming that Alice and Carol are honest people, and that nothing particularly strange happens, it's very unlikely that Carol gives Bob her affirmation if Alice is going to say "no". So let's say that P(Carol's affirmation|Alice will say "no") = 0.02.
Now, what about P(Carol's affirmation)? This is the last factor we need to apply Bayes' theorem. This is the probability that Carol gives Bob her affirmation, averaged over both the "yes" and "no" hypotheses, with each hypothesis weighted by its own probability. Since there's a 50-50 chance that Alice will say "yes" or "no", this is simply the equally weighted average of the two probabilities mentioned above: 0.5*0.2 + 0.5*0.02 = 0.11. With more hypotheses or unequal starting probabilities this step can get complicated, but because of the 50-50 chance for our two hypotheses, it is mercifully short in this simple example. So P(Carol's affirmation)=0.11.
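The averaging step can be sketched in Python as a weighted sum over the hypotheses. The function name evidence and the dictionary layout are my own choices; the 0.5, 0.2, and 0.02 values are the ones from the example, while the 70-30 split at the end is a hypothetical variation added only to show the weighting at work.

```python
def evidence(priors, likelihoods):
    """P(observation) = sum over hypotheses of P(hypothesis) * P(observation|hypothesis)."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("the hypotheses must be a complete set: priors must sum to 1")
    return sum(priors[h] * likelihoods[h] for h in priors)


# Values from the Alice-and-Bob example.
priors = {"yes": 0.5, "no": 0.5}
likelihoods = {"yes": 0.2, "no": 0.02}  # P(Carol's affirmation | each hypothesis)

p_affirmation = evidence(priors, likelihoods)
print(f"P(Carol's affirmation) = {p_affirmation:.2f}")

# Hypothetical variation: if Bob had started at 70-30 instead of 50-50,
# the same sum applies, just with unequal weights.
p_affirmation_70_30 = evidence({"yes": 0.7, "no": 0.3}, likelihoods)
print(f"with a 70-30 start     = {p_affirmation_70_30:.3f}")
```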
This now gives Bob enough information to compute P(Alice will say "yes"|Carol's affirmation). That is, given that Carol told Bob that Alice wants to go out again, what is the probability that Alice will answer "yes" to a second date? According to Bayes' theorem:
P(Alice will say "yes"|Carol's affirmation) =
P(Carol's affirmation|Alice will say "yes")/P(Carol's affirmation) * P(Alice will say "yes") =
0.2/0.11 * 0.5 = 0.909090... = 10/11
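For anyone who wants to check the arithmetic, here is a short Python sketch of the same calculation. It uses exact fractions so that 10/11 comes out exactly; the variable names are mine, and the inputs are the values assigned in the example.

```python
from fractions import Fraction

# Starting probabilities from the example.
prior_yes = Fraction(1, 2)
prior_no = Fraction(1, 2)

# P(Carol's affirmation | hypothesis): 0.2 and 0.02.
like_yes = Fraction(1, 5)
like_no = Fraction(1, 50)

# P(Carol's affirmation), averaged over both hypotheses.
p_affirmation = prior_yes * like_yes + prior_no * like_no

# Bayes' theorem, applied to each hypothesis.
post_yes = like_yes / p_affirmation * prior_yes
post_no = like_no / p_affirmation * prior_no

print(post_yes)            # 10/11
print(float(post_yes))
print(post_yes + post_no)  # 1
```

Applying the same formula to the "no" hypothesis gives 1/11, so the two updated probabilities still add up to 1, as they must for a complete set of hypotheses.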
Carol's affirmation, once considered as evidence, has pulled the probability from 50% to 91%. That is, if Bob thought before that there was only a 50% chance that Alice would agree to a second date, he should now think that there is a 91% chance. That is what evidence does: it pulls the probability for a hypothesis in one direction or another. A strong piece of evidence might pull it all the way from 0.1% to 99.9%, whereas a weak piece of evidence might only pull it from 50% to 60%. An opposing piece of evidence will pull the probability in the other direction, as in a tug-of-war. This is why we commonly speak of "weighing" the evidence. This exemplifies how Bayesian reasoning corresponds to the common sense we use in everyday life, except that it's mathematically precise.
Note that Bayes' theorem, as with all Bayesian reasoning, compels you to accept its conclusions: you cannot simply say "I don't buy this argument" or "I don't find this convincing". If you accept its premises, you must accept its conclusion: otherwise you're violating the rules of mathematical logic.
But where do the premises come from? How did I decide, for example, that P(Alice will say "yes")=0.5, or P(Carol's affirmation|Alice will say "no")=0.02? Well, for the sake of this problem, I just made up some reasonable values. In real life, computing these values would be far more difficult than the example problem itself. For instance, to calculate P(Alice will say "yes"), Bob would have to consider all the relevant background information he has. This would include how Alice interacted with him during the first date, his knowledge of human mating behaviors (it's a good sign if she laughs at your jokes, it's bad if she calls the cops on you, etc.), and any other relevant information. Based on this total information, he would calculate how often a woman like Alice would agree to a second date, and that would be his P(Alice will say "yes"). That's why this probability can be thought of as a personal, subjective degree of belief: because nobody else has the exact set of background information that Bob has.
What about calculating P(Carol's affirmation|Alice will say "no")? This is the probability that Carol will convey Alice's approval to Bob, even though Alice will say "no" to the second date. This number might be obtained through some sociological studies, by asking questions like "How often do women tell their friends that they enjoyed a date even if they didn't?" The nature of the relationship between Alice, Bob, and Carol also needs to be taken into account, along with their personalities. Are Alice and Carol very close friends? Is Carol generally reliable, or is she prone to hyperbole? The value of P(Carol's affirmation|Alice will say "no") is all this information condensed into a single number.
You may be disappointed that these probabilities are not simple to calculate. This is often the case in real-life scenarios. It turns out that humans and human relationships cannot be reduced down to a simple calculation, even with Bayes' theorem. Real life is complicated: this should not surprise anyone. Often, the relevant starting probabilities can only be guessed at from intuition. Being able to do that well is a large part of what it means to be a reasonable, logical person in the real world.
So those are the strengths and weaknesses of Bayes' theorem. On the one hand, it provides a firm, computationally exact way of updating your beliefs based on the evidence. On the other hand, the probabilities needed to perform the calculations can be difficult or impossible to assign. In particular, the assignment of prior probabilities - the degree of belief in the hypothesis before considering the observation - is a famously contentious issue within Bayesian reasoning, and there is no way to assign these numbers that's been established as being correct. This was the value of P(Alice will say "yes") in our example above, and I have described it as a personal, subjective probability based on the unique set of background information that a person has. This gives us a ballpark number that we can immediately use, but its imprecise and subjective nature, combined with the human capacity for self-deception, is a cause for concern.
Is Bayesian reasoning still useful in light of these weaknesses? Definitely. It is still far more applicable than propositional logic, and it still tells us, in a mathematically precise way, how to logically incorporate evidence into our beliefs. And often, when there is enough evidence, the specific values of these questionable probabilities turn out to be irrelevant. This is why we often look for overwhelming evidence, beyond any reasonable doubt, before we decide to take action based on a hypothesis. So while Bayes' theorem cannot tell us everything (nothing in this world can), it is a very useful tool for sharpening our thinking and processing evidence to update our beliefs.
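The claim that enough evidence makes the starting probability irrelevant can be illustrated with a small Python sketch. It rests on an assumption that is not part of the example above: that Bob receives several independent observations, each as strong as Carol's affirmation (0.2 under "yes", 0.02 under "no"). The function names and the three starting probabilities are likewise hypothetical.

```python
def update(prior_yes, like_yes, like_no):
    """One application of Bayes' theorem for a yes/no pair of hypotheses."""
    p_obs = prior_yes * like_yes + (1.0 - prior_yes) * like_no
    return like_yes / p_obs * prior_yes


def after_n_observations(prior_yes, n, like_yes=0.2, like_no=0.02):
    """Apply the same update n times, assuming independent observations."""
    p = prior_yes
    for _ in range(n):
        p = update(p, like_yes, like_no)
    return p


# Three people who start from very different beliefs (hypothetical values).
starting_points = (0.01, 0.5, 0.9)

after_one = {p: after_n_observations(p, 1) for p in starting_points}
after_five = {p: after_n_observations(p, 5) for p in starting_points}

for p in starting_points:
    print(f"start {p:>4}: after 1 -> {after_one[p]:.4f}, after 5 -> {after_five[p]:.4f}")
```

After a single observation the three people still disagree widely, but after five such observations all of them end up above 99%, whatever they believed at the start. That is the sense in which overwhelming evidence washes out a questionable starting probability.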
In my next post, I will re-cast Bayes' theorem into a different mathematical equation - the odds form - which eases some of the difficulties of Bayesian reasoning. I will use this new form to discuss more examples in Bayesian reasoning.