Basic Bayesian reasoning: a better way to think (Part 4)

Have you read the last several posts? In those posts we began the tale of Alice and Bob, a pair of murder suspects who recently started dating one another. Through their sordid tale, we'll examine Bayesian reasoning, the scientific method, and the so-called fallacy of "affirming the consequent".

Alice and Bob are going through a rough patch in their relationship. One day, Alice accuses Bob of infidelity, and they have this conversation:
Alice:
You spent the night at Carol's house last weekend! You're cheating on me with her! 
Bob:
What?! How do you figure that? I'm innocent! 
Alice:
If you're cheating on me with her, it makes perfect sense that you'd spend the night at her house! 
Bob:
Ha! You're "affirming the consequent". You've started from "if [cheating], then [night at Carol's house]", then concluded that "if [night at Carol's house], then [cheating]". This is a logical fallacy, and your argument is invalid. Cheating on you is not the only possible explanation for me spending the night at Carol's. There are other, perfectly innocent explanations - like the fact that Carol threw a party that ran late, and a bunch of us just crashed at her place for the night rather than risk driving home tired and drunk.
Now, let's pause the conversation for the moment and assess the situation. So far, Bob's logic follows the example given on the Wikipedia page for "affirming the consequent". And he certainly seems right - "affirming the consequent" is a fallacy in propositional logic, and Alice can't necessarily conclude that Bob cheated on her just because he spent a night at Carol's house. So, is Alice committing a logical fallacy? Is Bob therefore innocent? Let's continue and see:
Alice:
That party happened last weekend, when Carol knew I would be out of town! That makes perfect sense if Carol plotted to have you come without me! 
Bob:
That's ridiculous. You're still just "affirming the consequent". You've started from "if [Carol plotting], then [party on that weekend]", then concluded that "if [party on that weekend], then [Carol plotting]". I told you before, this is a logical fallacy. There are many other explanations for why the party might have happened on that weekend. 
Alice:
Whenever I ask Carol whether she's seeing anyone, she avoids the question! 
Bob:
That's more flawed reasoning. Do I have to explain it to you again? There are other possible explanations why Carol avoids that topic with you. That doesn't mean I'm cheating on you with her.
Alice:
But she was always very forthcoming about her dating life before. What would make her so reluctant to talk about it now? 
Bob:
I don't know. I suggest you ask her. Many innocent explanations are possible. You're still "affirming the consequent". You've started with a hypothesis - that I'm cheating on you with Carol - and then produced observations that fit with that hypothesis, then used those observations to justify the original hypothesis. That's like saying "The Bible is true because God wrote it, and it says that God exists. Therefore God exists". That kind of silly, circular reasoning is what happens when you "affirm the consequent", and you're using this logical fallacy over and over again to try to say that I've cheated on you.
Well, Bob certainly seems to be right. Alice can't, and shouldn't, conclude that Bob is cheating on her just because Carol won't talk about her dating life, or because the party happened on a certain weekend. After all, "affirming the consequent" is a logical fallacy, isn't it?
Alice:
Someone at the party saw you go into Carol's bedroom with her. 
Bob:
You're still "affirming the consequent", and it invalidates your conclusion. Your logic is flawed. There are perfectly innocent reasons to go into someone's bedroom. 
Alice:
With two bottles of wine. And you closed the door afterwards. 
Bob:
Still just "affirming the consequent". What, we're not allowed to drink wine at a party? We're not allowed to close the door when the music is loud out in the living room?
Alice:
This was at 10 pm, well before your normal bedtime, well before you'd normally be tired. 
Bob:
It was a crazy party. It wore me out fast. Why do you keep "affirming the consequent"? Don't you see that you're still just starting with the idea that I'm cheating on you, then using that idea to interpret the events so that they justify it? That's circular reasoning. You're saying that just because this is how things would play out IF I were cheating on you, I MUST be cheating on you. It's a logical fallacy, as I've said many times, and you keep using it.
Alice:
And you didn't come out from the bedroom until the next day. 
Bob:
Like I said, we got tired and decided to just sleep off the party rather than risk driving home drunk and exhausted.
Okay... hmm... I mean, "affirming the consequent" is still a logical fallacy, right? It's got a Wikipedia page and everything. How could Alice be right when she's committing this fallacy over and over? I mean... if you heard that your significant other went into a bedroom with someone else, along with two bottles of wine, and stayed behind closed doors until the next day, you wouldn't jump to any conclusions, right? Because you're a logical thinker and you don't want to commit a fallacy?
Alice:
My source from the party also tells me that it looked like you and Carol were making out before you went into her room.
Bob:
You know that eyewitnesses are unreliable. The living room was dark and your "source" was probably drunk as well. Or maybe your "source" is lying to break us up for his or her own ends. There are lots of possibilities. You can't conclude that I cheated on you from this, and by bringing it up to say that I did, you're still just "affirming the consequent".
Alice:
I also have this shopping receipt, dated the day before the party, for things that Carol bought. She purchased scented candles, those wine bottles we mentioned, and "sexy" lingerie. 
Bob:
What Carol does with her money and what lingerie she wears is none of my business. You're still using flawed logic, by starting from the idea that I'm cheating on you, to explain what Carol bought, then using that explanation to justify your initial assumption. 
Alice:
She also bought condoms. 
Bob:
I didn't know, I don't care, and it's not relevant. There are many reasons that Carol would buy condoms that have nothing to do with me cheating on you. 
Alice:
I found condoms of the same brand in Carol's trash dumpster after the party. They were used. 
Bob:
Are you crazy?! That's disgusting! Completely apart from your gross dumpster-diving, that doesn't prove anything. Those could have come from anywhere, thrown away by anyone. You're still trying to make everything fit into your preconceived notion that I've cheated on you. That's "affirming the consequent"! It's a logical fallacy! You're just repeating this fallacy over and over again!
Alice:
I had them tested at the lab. The DNA on the outside is a decently good match with samples from Carol's hair. 
Bob:
DNA matching is imperfect. There are thousands of people in this city who would also be a "good match" with that DNA sample. Even if it were an "excellent" match, there would still be lots of people who fit that criterion. Any one of them could have used the condom. And even if it WAS Carol, you can't conclude from that alone that I've cheated on you with her. That would be "affirming the consequent"! 
Alice:
And the DNA on the inside is an excellent match with you. 
Bob:
LOGICAL FALLACY! Over and over again! Your reasoning is invalid! You're trying to go from "if A then B", to "if B then A"! That's circular logic! It's "affirming the consequent"! I did not cheat on you with Carol!
If you still believe Bob, then I have a bridge to sell you. The weight of evidence is overwhelming at this point: Bob almost certainly did cheat on Alice with Carol.

But what about "affirming the consequent"? Isn't Bob right that it's a logical fallacy? Isn't Alice's argument based entirely on using it over and over? What does Bayesian reasoning say about all this?

Now, Bayesian reasoning mirrors human common sense. It will never lead to a result that "normal" reasoning says is impossible. As I mentioned earlier, you don't actually need formal training to use it in your daily life, because its rules are just the rules of good thinking, refined to mathematical precision. However, because it is more precise and more powerful than propositional logic, Bayesian reasoning can sometimes lead to results that surprise someone who's only versed in propositional logic. "Affirming the consequent" is one such result.

In Bayesian logic, "affirming the consequent" is allowed in a mathematically precise way. You CAN relate "if A then B" to "if B then A". In Bayesian terms, where we assign probability values - P(A), P(B), P(A|B), et cetera - to all statements, "if A then B" can be expressed as P(B|A), and "if B then A" becomes P(A|B). And these two probabilities are directly related to one another, as Bayes' theorem plainly states:

P(A|B) = P(B|A)  * P(A)/P(B)

Essentially, all else being equal, the two factors grow together. As P(B|A) gets bigger, so does P(A|B). As B becomes better explained by A, A becomes more likely given B. The more strongly the consequences of a hypothesis are affirmed, the more likely the hypothesis is to be true. As more events around Carol's party are explained by Bob cheating on Alice, it becomes more certain, based on these events, that Bob cheated. So each event - each instance of "affirming the consequent" - actually strengthens the hypothesis that Bob cheated on Alice with Carol. Far from dooming Alice's hypothesis, each of these supposed "logical fallacies" actually serves as evidence for her accusation.
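To see this numerically, here's a minimal Python sketch (my own illustration, not a standard routine). It assumes there are just two possibilities, A and not-A, so that P(B) can be expanded as P(B|A)P(A) + P(B|~A)P(~A). The prior of 1% and the 5% chance of B happening without A are made-up numbers. Holding those two fixed, raising P(B|A) raises P(A|B).

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) from Bayes' theorem, assuming A and not-A are the only options."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical numbers: P(A) = 1%, and B shows up 5% of the time even without A.
for p_b_given_a in (0.1, 0.5, 0.9):
    print(f"P(B|A) = {p_b_given_a:.1f}  ->  P(A|B) = {posterior(p_b_given_a, 0.01, 0.05):.3f}")
```

Notice that even with P(B|A) = 0.9, P(A|B) only reaches about 15% with these made-up numbers: a single affirmed consequent strengthens A, but doesn't settle the matter.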

That's right: "affirming the consequent" does not invalidate its conclusion, instead it actually serves as evidence FOR that conclusion.

It is the very fact that Alice used "affirming the consequent" OVER AND OVER that made her case so strong. It's crucial to note that if she had made only one such argument, even if that argument were the one about the DNA on the condom, her case would have been weak, and she would have been wrong to reach her conclusion. But with each instance of "affirming the consequent" - each time Alice successfully showed that the events around Carol's party fit with Bob cheating on her - her case grew stronger. Therefore, "affirming the consequent" is a "logical fallacy" only insofar as it's not being used enough.

So if you see someone say that Bill Gates must own Fort Knox because he's rich, you can legitimately say that this is flawed reasoning, and call him out on "affirming the consequent". In this case, you'd be using that term as a proper logical fallacy and saying that this person's conclusion is invalid. But suppose this person repeated similar arguments over and over - suppose he showed that Bill Gates was part of a secret cabal that controlled the U.S. government, that Gates had regularly been inside Fort Knox, that mysterious changes in his net worth matched perfectly with mysterious changes in the amount of gold in Fort Knox, and that a highly ranked government official had anonymously said, "Bill Gates owns Fort Knox." Then we might be getting somewhere. Each of these things could, by itself, be dismissed by citing "affirming the consequent", but together, each instance of "affirming the consequent" counts as evidence, and they add up to a strong case.

So "affirming the consequent" can both serve as evidence and be a mistake. But, in Bayesian terms, how can you tell when it's a mistake? What is the genuine blunder in logic when that happens? As a mistake, "affirming the consequent" is the act of coming to the conclusion without enough evidence. It's coming to the conclusion without affirming enough consequents. Or more properly, it's concluding that P(hypothesis|evidence) is high when P(evidence|hypothesis) is not yet large enough to compensate for a small P(hypothesis)/P(evidence). The solution to this issue, in part, is not to stop "affirming the consequent", but to do it more - to look for more evidence.

The reason that propositional logic doesn't, and can't, follow this reasoning is that it cannot distinguish between probability values of 1% and 99%. In propositional logic, a statement can only be true, false, or undecided. But "affirming the consequent" works in Bayesian reasoning by moving the probability value: it might start at 1% (very unlikely to be true), then slide to 20% (unlikely to be true), then to 70% (likely to be true), and then to 99% (very likely to be true) as you affirm more consequents, over and over. Propositional logic sees all of this and says "all I recognize are undecided statements", and since 99% is not 100%, it will not let you say that the conclusion is true. This is why "affirming the consequent" is always a logical fallacy in propositional logic. But this says more about the limits of propositional logic than about true rationality.
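Here's a small Python sketch of that sliding probability. It rests on assumptions I'm adding for illustration: the pieces of evidence are independent of one another given each hypothesis, and each one is 5 times more likely if the hypothesis is true than if it's false (the 5 is an arbitrary, hypothetical number). Under those assumptions, the odds form of Bayes' theorem from Part 3 simply multiplies the odds by 5 at each step.

```python
def update_probability(prior_prob, likelihood_ratios):
    """Yield the hypothesis's probability after each successive piece of evidence."""
    odds = prior_prob / (1 - prior_prob)  # convert probability to odds
    for ratio in likelihood_ratios:
        odds *= ratio                     # posterior odds = prior odds * likelihood ratio
        yield odds / (1 + odds)           # convert odds back to probability


# Hypothetical: start at 1%, then affirm six independent consequents, each with ratio 5.
for step, prob in enumerate(update_probability(0.01, [5] * 6), start=1):
    print(f"after {step} affirmed consequent(s): {prob:.1%}")
```

With these made-up numbers, the probability climbs from 1% to about 20% after two affirmed consequents, and past 99% after six. No single step gets you there; the repetition does.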

How do you know when you've affirmed enough consequents? How many times do you have to "affirm the consequent" to be sure of your conclusion? Due to the difficulties of using Bayes' theorem in a real-world context, it may be hard or impossible to get actual numbers. But you have to at least walk through the equations to get even a rough answer to the question.

In particular, when you work through the equations, it turns out that the most effective kind of evidence is the kind that your hypothesis affirms but a rival hypothesis does not. "Affirming the consequent" is better than not affirming. Circular reasoning is better than contradictory reasoning. This is the essence of the odds form of Bayes' theorem, which shows the importance of comparing the hypotheses against one another. It has many important applications:
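A minimal Python sketch of that comparison, with made-up probabilities: evidence that both hypotheses predict equally well leaves the odds where they were, evidence that only your hypothesis predicts moves them a lot, and evidence that your hypothesis forbids sends them to zero.

```python
def posterior_odds(prior_odds, p_obs_given_mine, p_obs_given_rival):
    """Odds of my hypothesis against a rival, after one observation."""
    likelihood_ratio = p_obs_given_mine / p_obs_given_rival
    return prior_odds * likelihood_ratio


even = 1.0  # hypothetical starting point: the two hypotheses are equally likely
print(posterior_odds(even, 0.9, 0.9))  # both predict it equally: odds unchanged
print(posterior_odds(even, 0.9, 0.1))  # only mine predicts it: odds rise about 9-fold
print(posterior_odds(even, 0.0, 0.5))  # mine forbids it: odds fall to zero
```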

One such application is the scientific method. Bayesian reasoning is the logical framework that underlies the scientific method. Science, in part, relies on "affirming the consequent". Experimental verification of theoretical predictions serves as evidence for that theory. On the flip side, theories are falsified based on experiments as well. Both sides of that statement are expressed together in the odds form of Bayes' theorem. Between two competing theories or hypotheses, "affirming the consequent" is better than not affirming, and circular reasoning is better than contradictory reasoning.

Bayesian reasoning is also at the heart of presuppositional apologetics, which starts with the idea that the God of the Bible - who is the basis for all rational thought - exists. It then "affirms the consequent" by verifying that the world does indeed bear the image of its Creator. Rival non-Christian worldviews cannot make the same affirmation, and therefore must borrow from the Christian worldview even in attacking it, thereby contradicting themselves. Of course, its critics have said that this approach is invalid because it "affirms the consequent", but I hope you now know better.

This reasoning is also the logical foundation for my blog here. I start with this fundamental postulate: God as revealed in Jesus Christ. I then "prove" that God exists by demonstrating that this postulate generates the universe - that is, by affirming the consequent.

This Bayesian reasoning is also the logical framework for my series of posts on how science itself - its axioms and long-term traits and properties - serves as strong evidence for Christianity. Because a hypothesis should be measured against its rivals, I said that science is evidence for Christianity and against atheism. Of course, critics of the series accused me of "affirming the consequent" over and over again. By now, you should recognize this as the mark of a strong argument, one with a great deal of evidence behind it. After all, "affirming the consequent" is a hallmark of science itself.

In all these areas, beware those who only cry "fallacy!", who will not state their own hypothesis or test it against yours, who only want to tear down arguments instead of building them. They pretend that their ignorance is strength, because they think that knowing nothing means they never have to affirm any consequents. They do not realize that this is actually the mark of profound weakness, and that such know-nothing hypotheses can only survive by parasitically attaching themselves to more established theories. But you should actively seek to find, build, critique, and refine your hypothesis. Rejecting a hypothesis is never an end in itself, but a step towards a better hypothesis. Remember that the devil comes to steal and to kill and to destroy. But it is God who creates.

We can now conclude by answering the questions I raised at the end of my last post. Yes, Bayesian reasoning allows for "affirming the consequent", and this actually serves as evidence FOR your conclusion. There is still a sense where "affirming the consequent" is a fallacy, which happens when you give a hypothesis too much credit based on a single instance of "affirming the consequent". But this only means that you haven't affirmed enough consequents. To escape this fallacy, you need to affirm more consequents with your hypothesis, while comparing it with its rival hypothesis. "Affirming the consequent" is a fallacy in propositional logic, but that's more indicative of propositional logic's inflexible limits rather than a reflection of actual rationality. In fact, "affirming the consequent" forms half of Bayes' theorem in odds form, which is the logical basis for the scientific method, presuppositional apologetics, and this very blog and the theories I put forth in it.


You may next want to read:
What is "evidence"? What counts as evidence for a certain position?
Science as evidence for Christianity (Summary and Conclusion)
"Proving" God's existence
Another post, from the table of contents

Basic Bayesian reasoning: a better way to think (Part 3)

In my last post, I introduced Bayes' theorem:

P(hypothesis|observation) = P(observation|hypothesis)/P(observation) * P(hypothesis)

Now, this is a powerful equation that tells us how to use observed evidence to update our beliefs about a hypothesis. But as I mentioned, it has two difficulties with its use: first, the probability prior to the observation - P(hypothesis) - is famously difficult to compute in a clear, objective manner, and it changes based on the background information that each person has. For these reasons it's often said to be a personal, subjective probability, reflecting a particular person's degree of belief based on his or her unique set of background information.

And second, things get even worse for P(observation): this is the probability of making the observation, averaged over the complete set of competing hypotheses. Because this is an average over the complete set, weighted by each hypothesis's prior probability, we have to know the P(hypothesis) value for every competing hypothesis. But as we said in the previous paragraph, computing even one of these values is difficult. If that weren't hard enough, in real-life situations we may not even be able to enumerate the complete set of competing hypotheses. And then, even if we somehow got through all these difficulties, we would still have to calculate P(observation|hypothesis) for each of these hypotheses, which itself is no trivial task, and then take their weighted average across all the hypotheses. This step often requires more computation than the rest of Bayes' theorem put together, even for well-defined problems with fixed values for all other probabilities.
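To make the bookkeeping concrete, here's a toy Python sketch with three hypothetical hypotheses and invented numbers. Computing P(observation) needs every prior and every likelihood, and the priors have to cover all the possibilities - exactly the requirements that are so hard to meet in real life.

```python
# Hypothetical competing hypotheses: name -> (prior P(h), likelihood P(observation|h)).
HYPOTHESES = {
    "A": (0.5, 0.30),
    "B": (0.3, 0.10),
    "C": (0.2, 0.05),
}


def p_observation(hyps):
    """Average P(observation|h) over all hypotheses, weighted by each prior P(h)."""
    total_prior = sum(prior for prior, _ in hyps.values())
    if abs(total_prior - 1.0) > 1e-9:
        raise ValueError("priors must cover the complete set of hypotheses")
    return sum(prior * likelihood for prior, likelihood in hyps.values())


def posterior(name, hyps):
    """P(h|observation) = P(observation|h) / P(observation) * P(h)."""
    prior, likelihood = hyps[name]
    return likelihood / p_observation(hyps) * prior


for name in HYPOTHESES:
    print(name, round(posterior(name, HYPOTHESES), 3))
```

Leave out hypothesis C (or simply fail to think of it) and the priors no longer sum to 1, so P(observation), and every posterior built on it, can't be computed correctly.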

For these reasons I often like to use Bayes' theorem in odds form: simply write down the equations for two different hypotheses and divide one by the other, and you get:

P(hypothesis A|observation)/P(hypothesis B|observation) =
P(hypothesis A)/P(hypothesis B) * P(observation|hypothesis A)/P(observation|hypothesis B)

This can be summarized as "posterior odds = prior odds * likelihood ratio (of the observation being made from each hypothesis)", where:

P(hypothesis A|observation)/P(hypothesis B|observation) = posterior odds,
P(hypothesis A)/P(hypothesis B) = prior odds,
P(observation|hypothesis A)/P(observation|hypothesis B) = likelihood ratio.

 Let's go through an example: say you're investigating a murder. You think that Alice is twice as likely to be guilty compared to Bob - this is your prior odds. You then observe fingerprints on the murder weapon that are 3000 times more likely to have come from Alice than from Bob - this is the likelihood ratio. You multiply these ratios to calculate your new opinion, the posterior odds: Alice is now 6000 times more likely to be guilty than Bob. Posterior odds is prior odds times likelihood ratio.
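The same arithmetic as a tiny Python sketch, using exact fractions so the ratios stay clean. The only inputs are the two ratios from the example.

```python
from fractions import Fraction


def update_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


prior = Fraction(2, 1)            # Alice is twice as likely to be guilty as Bob
fingerprints = Fraction(3000, 1)  # prints are 3000x likelier to be Alice's than Bob's
print(update_odds(prior, fingerprints))  # 6000
```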

This is still Bayes' theorem, just in a different algebraic form. The intuition captured by this equation is the same: an observation counts as evidence towards the hypothesis that better predicts, anticipates, explains, or agrees with that observation. But notice that in this form, P(observation) - which was difficult or impossible to calculate - has been cancelled out. Also, P(hypothesis) - another troublesome number - only appears in a ratio of two competing hypotheses, which I think is a more reasonable way to think of it: it's easier to say how much more likely one hypothesis is than another, instead of assigning absolute probabilities to both of them. In short, this form makes the math easier, and allows you to think of just two hypotheses at a time, rather than having to account for the complete set of competing hypotheses all at once. You don't have to worry about Carol and her fingerprints for the time being in the above murder investigation example.

Let's go through a couple more examples:

Say that your friend claims that he has a trick coin: he says it lands "heads" all the time, rather than the 50% of the time that you'd normally expect. You're somewhat skeptical, and based on his general trustworthiness and the previous similar claims he's made, you think the odds are only 1:4 that this is a 100% "heads" coin, versus it being a normal coin. This is your P(always heads)/P(normal), the prior odds.

When you express your skepticism, your friend says, "well then, let me just show you!" and flips the coin. It lands "heads". "See!" says your friend. "I told you it'll always land heads!" Now, obviously a single flip doesn't prove anything. But it certainly is evidence - not very strong evidence, but some evidence. Since the coin will land "heads" 100% of the time if your friend is right, but only 50% of the time if it's a normal coin, their ratio - the likelihood ratio - is 100%:50%, or 2:1.

Now, according to the odds form of Bayes' theorem, posterior odds is prior odds times likelihood ratio. 1:4 * 2:1 = 1:2, so you should now believe that the odds are 1:2 that this is a trick coin like your friend claimed, versus it being a normal coin. You're still skeptical of the claim, but you're now less skeptical.

Noting your remaining skepticism, your friend then flips the coin again. "Ha, another heads!" he says as he calls out the result. Now, to calculate your new opinion, simply repeat the calculation above, with the previous answer - the old posterior odds of 1:2 - serving as the new prior odds. The likelihood ratio remains 2:1. Posterior odds is prior odds times likelihood ratio, so our new posterior odds is 1:2*2:1 = 1:1. You should now be completely uncertain as to whether this coin in fact is a trick coin. You say to your friend, "well, you may have something there".

"Okay, fine then." says your friend. "Let's flip this thing ten more times." And behold, it comes up "heads" all ten times. Your posterior odds get multiplied by 2:1 for each of the ten flips, and it's now 1:1 * (2:1)^10 = 1024:1. You should now believe that the chance of this being an "always heads" coin is 1024 times greater than it being a normal coin. If you're willing to consider "normal" and "always heads" as the complete set of competing hypotheses, this would give you over 99.9% certainty that your friend is right that this coin will always land heads.

"Wow, amazing." you tell your friend, as you're now pretty much convinced. "I've never actually seen one of these before", you say, as you idly grab the coin and flip it again, fully expecting it to land "heads" once more. But this time, it lands "tails".

What now? The likelihood ratio for the coin to land "tails" - P(tails|always heads)/P(tails|normal) - is 0%:50%, or 0:1. Our new posterior odds is 1024:1 * 0:1 = 0:1. There is now absolutely no chance that this coin is one that will land heads 100% of the time. But at the same time, it also seems unlikely that it's just a normal coin, given that it landed "heads" 12 times in a row just before this. A new possibility suggests itself: that this coin has something like a 90% chance of landing heads.
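Here's the whole flipping session so far as a Python sketch, using exact fractions. The only inputs are the ones from the story: prior odds of 1:4, a likelihood ratio of 2:1 for each "heads", and 0:1 for the "tails". The last line converts the 1024:1 odds into a probability, which (as noted above) is only valid if "normal" and "always heads" are treated as the only two hypotheses.

```python
from fractions import Fraction

# Likelihood ratios P(result | always heads) / P(result | normal coin)
LIKELIHOOD_RATIO = {
    "H": Fraction(1) / Fraction(1, 2),  # 100% : 50% = 2:1
    "T": Fraction(0) / Fraction(1, 2),  # 0% : 50% = 0:1
}


def odds_history(prior_odds, flips):
    """Return the posterior odds after each flip; each posterior becomes the next prior."""
    odds, history = prior_odds, []
    for flip in flips:
        odds *= LIKELIHOOD_RATIO[flip]
        history.append(odds)
    return history


flips = "HH" + "H" * 10 + "T"  # two heads, ten more heads, then one tails
history = odds_history(Fraction(1, 4), flips)
print(history[0], history[1], history[11], history[12])  # 1/2 1 1024 0
before_tails = history[11]
print(f"{float(before_tails / (1 + before_tails)):.4%}")  # probability, two-hypothesis case
```

A single zero likelihood ratio wipes out any accumulated odds, no matter how large, which is why one "tails" is enough to rule out the "exactly 100%" hypothesis.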

This illustrates one of the major advantages of the odds form of Bayes' theorem. Before this, you hadn't even considered that the chance for this coin to land "heads" was anything other than 50% or 100%. All of the other hypotheses - such as the coin landing "heads" 90% or 80% or 20% of the time - you had ignored. And yet, even without considering the complete set of competing hypotheses, you were still able to carry out valid calculations and make statistical inferences, reaching sound conclusions.

You both stare at the coin that landed "tails". You ask your friend, "What just happened?" He replies, "well, the magician I bought it from said that it would always land heads. And it seemed to be working fine up 'til now. Maybe he just meant that it'll land heads most of the time?" Being naturally suspicious, you respond, "Looks like he lied to you then. He probably just sold you a normal coin".  But your friend comes back with, "C'mon, you know that's not fair. Human language doesn't work like that. It's imprecise by its very nature. When someone says 'always' in casual conversation, they don't necessarily mean '100.000000...% of the time' with an infinite number of significant figures. Even 'normal' coins don't land heads exactly 50.000000...% of the time". Struck by your friend's rare moment of lucid articulation, you become temporarily speechless. "Besides", your friend continues, "the magician might have said that the coin 'nearly always lands heads'. I don't remember exactly".

With this new insight, you realize that you had assigned your priors to the wrong hypotheses at the beginning of the problem. Instead of the hypotheses that the coin lands "heads" exactly 100% of the time or exactly 50% of the time, you should have used 'close to 100% of the time' and 'close to 50% of the time'. Keeping the odds at P(close to 100%)/P(close to 50%) = 1:4 as before, and interpreting "close to" as a flat distribution within 2% of the given value, we get that the likelihood ratio for the coin landing "heads" is P(heads|close to 100)/P(heads|close to 50) = 99%:50% = 1.98:1, and for the coin landing "tails" is P(tails|close to 100)/P(tails|close to 50) = 1%:50% = 0.02:1. Then the posterior odds after 12 heads and 1 tails is given by prior odds times likelihood ratio, and it is roughly:

1:4 * (1.98:1)^12 * 0.02:1 = 18.15:1

(This is an approximation, made by assuming that the probability distribution can be thought of as being entirely focused at the center of their interval. The actual value, 16.97:1, can be obtained by a straightforward integration over the probability distributions, but that calculation lies beyond the scope of this introductory post.)
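For readers who'd like to check both numbers anyway, here's a Python sketch. The "flat distribution" averages are computed with a simple midpoint-rule numerical integration (my choice of method, not necessarily how the original figure was obtained), over the intervals [98%, 100%] and [48%, 52%] described above.

```python
def avg_likelihood(lo, hi, heads, tails, steps=10_000):
    """Mean of p**heads * (1 - p)**tails for p spread evenly over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * width
        total += p ** heads * (1 - p) ** tails
    return total / steps


prior_odds = 1 / 4
heads, tails = 12, 1

# Approximation from the text: all probability concentrated at each interval's centre.
approx = prior_odds * (0.99 / 0.50) ** heads * (0.01 / 0.50) ** tails

# Flat distributions over [0.98, 1.00] ("close to 100%") and [0.48, 0.52] ("close to 50%").
integrated = prior_odds * avg_likelihood(0.98, 1.00, heads, tails) / avg_likelihood(0.48, 0.52, heads, tails)

print(f"approximate {approx:.2f}:1, integrated {integrated:.2f}:1")  # approximate 18.15:1, integrated 16.97:1
```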

So you don't have to abandon the "close to 100%" hypothesis along with the "exactly 100%" hypothesis. The odds are still about 18:1 in favor of the coin landing "heads" more than 98% of the time, against it being a "normal" coin - enough for you to be reasonably confident in believing as your friend does.

This illustrates again the advantages of using the odds form. Firstly, we again didn't have to consider other probability values for the coin landing "heads", such as 75%. We were still able to come to a reasonable conclusion without having to specify the complete set of competing hypotheses, and their probability distribution. Secondly, we were able to completely switch the class of hypotheses under consideration, without losing consistency. If we had stuck to the original form of Bayes' theorem, then we would have had to specify our prior probabilities for P(heads exactly 100% of the time) and P(heads exactly 50% of the time). To maintain our 1:4 ratio, we would assign them as 20% and 80%, taking up all 100% of our probability, because we were not thinking about other possibilities. But then, upon realizing our mistake, we would have no choice but to contradict our previous priors, and assign P(heads close to 100% of the time) and P(close to 50% of the time) some values, while going back and admitting that the chances of the coin giving exactly 50% or 100% "heads" are nearly zero. This is a problem created entirely by being unaware of the complete set of competing hypotheses.

But with the odds form, we don't have to have complete awareness. All the conclusions that we came to are still perfectly consistent with the data: there is zero chance for the coin to land "heads" exactly 100% of the time, yet it is much more likely that the "heads" probability is close to 100% than it being a normal coin. Our two sets of priors do not contradict each other either: it's quite reasonable for our prior odds to be 1:4 in both cases, because we have not specified how much of the total probability they take up. In general, I feel that it's easier to say how likely two hypotheses are relative to one another, rather than specifying the absolute probability value for a hypothesis.

I hope this convinces you of the virtues of the odds form of Bayes' theorem. This is how I use Bayes' theorem in everyday situations to sharpen my thinking: I didn't know if this one movie was going to be any good (prior odds), but upon its recommendation from a friend (likelihood ratio), I revised my opinion and am now more likely to see it (posterior odds). I didn't know whether Argentina or Germany was more likely to win the World Cup (prior odds), but after watching Germany slaughter Brazil (likelihood ratio), I now consider Germany more likely than Argentina to win the World Cup (posterior odds). And so on and so forth. Posterior odds is prior odds times likelihood ratio.

Let's consider a couple of last examples:

I don't know if Bill Gates owns Fort Knox (prior odds). But I know that he's rich, and he's more likely to be rich if he owns Fort Knox than if he does not (likelihood ratio). Therefore, given that Bill Gates is rich, he's more likely to own Fort Knox (posterior odds).

Does that reasoning sound suspicious? It should. I took it straight from the Wikipedia page on "affirming the consequent", which is a logical fallacy. But the structure of the above argument is correct according to Bayes' theorem. It follows the same structure as all of my other examples. So, has Bayesian reasoning led to a logical fallacy? Oh no! What shall we do?

Hold that thought, while we consider our last example:

I don't know whether Einstein's theory of general relativity or Newton's theory of gravity is correct (prior odds). But upon considering the experimental evidence of the bending of starlight observed during the 1919 solar eclipse (likelihood ratio), I now consider general relativity much more likely to be correct than Newtonian gravity (posterior odds).

You should recognize that as the event that actually "proved" general relativity to the public, and the epitome of the scientific method at work: hypotheses are judged according to their agreement with experimental observations. But this is nothing more than just straightforward Bayesian reasoning, following the same structure as all of my other examples. So, it turns out that Bayesian reasoning underlies the scientific method, by providing the logical framework for it.

What are we to make of these last two examples? Does Bayesian reasoning allow for affirming the consequent? But isn't that a logical fallacy? And doesn't Bayesian reasoning also underlie the scientific method? Does that mean that science follows a logically flawed system? What are we to make of this?

I will address these issues in my next post.



Basic Bayesian reasoning: a better way to think (Part 2)

Image: portrait of Thomas Bayes, public domain
In my previous post, I explained that instead of thinking of logical statements as only being "true" or "false", we should assign probability values for their chance of being true. This is the fundamental tenet of Bayesian reasoning. This allows us to employ the entire mathematical field of probability theory in our thinking and expands the rules of logic far beyond their limited forms in propositional logic.

Essentially, we can now use any valid equation in probability theory as a rule of logic. We saw a useful example in the last post: P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A). This captures the intuitive idea that if A is likely to lead to B, and B is likely to lead to C, then A is likely to lead to C. But it also does more - it tells us precisely how to calculate the probability for our conclusion, while simultaneously sharpening, guiding, and correcting our thinking. In particular, it tells us that in the second step, it's not enough that B is likely to lead to C; instead, B and A together (written BA) must be likely to lead to C. (Incidentally, this is why a blind person is not likely to get traffic tickets, even though a blind person is likely to be a bad driver, and bad drivers are likely to get tickets.)
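To see the blind-driver remark in numbers, here is a small Python sketch of the formula above. Every probability in it is a hypothetical value I picked for illustration, not data. The only point is that plugging in P(C|B) for bad drivers in general, instead of P(C|BA) for blind bad drivers, gives a badly wrong answer.

```python
def chain_probability(p_c_given_ba, p_b_given_a, p_c_given_not_b_a):
    """P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A), with P(~B|A) = 1 - P(B|A)."""
    return p_c_given_ba * p_b_given_a + p_c_given_not_b_a * (1 - p_b_given_a)


# Hypothetical numbers, for illustration only:
# A = "this person is blind", B = "bad driver", C = "gets traffic tickets".
p_b_given_a = 0.99        # a blind person is very likely a bad driver
p_c_given_b = 0.5         # bad drivers in general often get tickets
p_c_given_ba = 0.001      # but a blind bad driver almost never drives at all
p_c_given_not_b_a = 0.001

correct = chain_probability(p_c_given_ba, p_b_given_a, p_c_given_not_b_a)
# The mistaken chain: using P(C|B) where the formula requires P(C|BA).
naive = chain_probability(p_c_given_b, p_b_given_a, p_c_given_not_b_a)
print(f"correct P(C|A) = {correct:.4f}, naive = {naive:.4f}")
```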

In this post, I will introduce another such equation in probability theory:

P(A|B) = P(B|A)/P(B) * P(A)

This is Bayes' Theorem. Named after Reverend Thomas Bayes, this unassuming little equation, which can be derived immediately from the definition that P(A|B) = P(AB)/P(B), is so important that its use is nearly synonymous with Bayesian logic, and its interpretation is the logical basis for the scientific method. At its heart, this equation tells you how to update your beliefs based on the evidence. To see how that works, set A = "hypothesis", and B = "observation" in the formula. The equation then becomes:

P(hypothesis|observation) = P(observation|hypothesis)/P(observation) * P(hypothesis)

Each factor can then be translated into words as:

P(hypothesis): probability that the hypothesis is true, before considering the observation.
P(hypothesis|observation): probability that the hypothesis is true, after considering the observation.
P(observation|hypothesis): probability for the observation, as predicted by the hypothesis.
P(observation): probability for the observation, averaged over the predictions from every hypothesis, with each hypothesis weighted by its own probability.

This equation tells us how we should update our opinion on a hypothesis after we make a relevant observation. That is, it tells us how to go from P(hypothesis) to P(hypothesis|observation). It says that a hypothesis becomes more likely to be true if it predicts the observation better than the "average" hypothesis: the bigger the ratio P(observation|hypothesis)/P(observation), the more likely the hypothesis becomes. Conversely, it becomes less likely to be true if it fails to beat the "average" hypothesis in its predictions. In short, an observation counts as evidence for whichever hypothesis better predicted it. We already intuitively knew this to be true - but Bayes' theorem states it in a mathematically rigorous fashion, and allows us to put firm numbers on some of these factors.
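For readers who want to check the claim that Bayes' theorem follows immediately from the definition P(A|B) = P(AB)/P(B), here is the two-line derivation. It applies that definition once in each direction, using P(AB) = P(BA), and then divides by P(B), assuming P(B) > 0:

```latex
\begin{aligned}
P(AB) &= P(A|B)\,P(B) = P(B|A)\,P(A) \\
\Longrightarrow\quad P(A|B) &= \frac{P(B|A)}{P(B)}\,P(A)
\end{aligned}
```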

Let's consider an example: Alice and Bob go out on a date. Bob likes Alice and wants to ask her out on a second date, but he's not sure how she'll respond. So he considers two hypotheses: Alice will say "yes" to a second date, or she will say "no". Based on all the information he has - how Alice acted before and during the date, how they communicated afterwards, etc. - he thinks there's a 50-50 chance that Alice will say "yes" rather than "no". That is to say:

P(Alice will say "yes") = P(Alice will say "no") = 0.5

For the sake of simplicity, we will not consider other possibilities, such as Alice saying some form of "maybe". These two hypotheses, "yes" and "no", will serve as our complete set of possible hypotheses.

While Bob is agonizing over this second date, he runs into Carol, a mutual friend of Alice and Bob. She tells Bob, "Alice absolutely loved it last night! She can't wait to go out with you again!" Carol's affirmation serves as evidence that Alice will say "yes" to a second date. We already knew this intuitively: Carol's affirmation is obviously good news for Bob. But Bayes' theorem allows us to calculate the updated probability explicitly from some starting probabilities. To do this, we need to evaluate two probability values: P(Carol's affirmation|Alice will say "yes"), and P(Carol's affirmation|Alice will say "no").

What value should we assign to P(Carol's affirmation|Alice will say "yes")? That is, if Alice would say "yes" to a second date, what is the probability that Carol would have given Bob her affirmation? Not particularly high - after all, Carol could have simply forgotten to mention Alice's reaction, or Alice and Carol might not have had a chance to discuss the first date, or Alice could have had a terrible time but still be willing to give Bob a second chance. All these are ways that the "yes" hypothesis might not lead to Carol's affirmation. Taking these things into account, let's say that P(Carol's affirmation|Alice will say "yes") = 0.2.

What about P(Carol's affirmation|Alice will say "no")? This is the probability that Carol would still give Bob her affirmation, even though Alice would say "no" to a second date. Now, it could be that Alice hated her first date with Bob, but Carol deliberately lied to him. Or maybe Carol simply wanted to encourage Bob even though she didn't really know how Alice felt. Or Alice really did enjoy her time with Bob, but she will suddenly be struck by amnesia before Bob asks her out again. But assuming that Alice and Carol are honest people, and that nothing particularly strange happens, it's very unlikely that Carol gives Bob her affirmation if Alice is going to say "no". So let's say that P(Carol's affirmation|Alice will say "no") = 0.02.

Now, what about P(Carol's affirmation)? This is the last factor we need to apply Bayes' theorem. This is the probability that Carol gives Bob her affirmation, averaged over both the "yes" and "no" hypotheses. Since there's a 50-50 chance that Alice will say "yes" or "no", this is simply the average of the two probabilities mentioned above: 0.5*0.2 + 0.5*0.02 = 0.11. This step can get complicated, but because of the 50-50 chance for our two hypotheses, it is mercifully short in this simple example. So P(Carol's affirmation)=0.11.
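Here is that averaging step as a small Python sketch, under the post's assumption that "yes" and "no" are the only two hypotheses. The function and variable names are my own. The second, lopsided set of priors is hypothetical. It is only there to show what happens when the hypotheses are not 50-50: each prediction is weighted by its hypothesis's probability instead of being simply averaged.

```python
from fractions import Fraction


def probability_of_observation(priors, likelihoods):
    """Law of total probability: P(obs) = sum over h of P(obs|h) * P(h).

    Both arguments are dicts keyed by hypothesis; the hypotheses are
    assumed to be mutually exclusive and to cover every possibility.
    """
    return sum(likelihoods[h] * priors[h] for h in priors)


likelihoods = {"yes": Fraction(2, 10), "no": Fraction(2, 100)}

even = {"yes": Fraction(1, 2), "no": Fraction(1, 2)}
print(probability_of_observation(even, likelihoods))  # 11/100

# Hypothetical lopsided priors, just to show the weighting at work.
lopsided = {"yes": Fraction(4, 5), "no": Fraction(1, 5)}
print(probability_of_observation(lopsided, likelihoods))  # 41/250
```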

This now gives Bob enough information to compute P(Alice will say "yes"|Carol's affirmation). That is, given that Carol told Bob that Alice wants to go out again, what is the probability that Alice will answer "yes" to a second date? According to Bayes' theorem:

P(Alice will say "yes"|Carol's affirmation) =
P(Carol's affirmation|Alice will say "yes")/P(Carol's affirmation) * P(Alice will say "yes") =
0.2/0.11 * 0.5 = 0.909090... = 10/11
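Here is the same calculation as a short Python sketch, again assuming "yes" and "no" are the only hypotheses. The helper name `bayes_posterior` is hypothetical. It computes P(observation) as the probability-weighted average described above, then applies Bayes' theorem. Fractions keep the answer exact.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods, hypothesis):
    """Bayes' theorem: P(h|obs) = P(obs|h) / P(obs) * P(h).

    P(obs) is the probability-weighted average of every hypothesis's prediction.
    """
    p_obs = sum(likelihoods[h] * priors[h] for h in priors)
    return likelihoods[hypothesis] / p_obs * priors[hypothesis]


priors = {"yes": Fraction(1, 2), "no": Fraction(1, 2)}
likelihoods = {"yes": Fraction(2, 10), "no": Fraction(2, 100)}

p_yes = bayes_posterior(priors, likelihoods, "yes")
print(p_yes)                   # 10/11
print(round(float(p_yes), 3))  # 0.909
```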

Carol's affirmation, upon considering it as evidence, has pulled the probability from 50% to 91%. That is, if Bob thought before that there was only a 50% chance that Alice will agree to a second date, he should now think that there is a 91% chance. That is what evidence does: it pulls the probability for a hypothesis in one direction or another. A strong piece of evidence might pull it all the way from 0.1% to 99.9%, whereas a weak piece of evidence might only pull it from 50% to 60%. An opposing piece of evidence will pull the probability in the other direction, as in a tug-of-war. This is why we commonly speak of "weighing" the evidence. This exemplifies how Bayesian reasoning corresponds to the common sense we use in everyday life, except that it's mathematically precise.
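The tug-of-war can be sketched in a few lines of Python. The first update uses Carol's affirmation from this example. The second uses a hypothetical opposing observation, invented for illustration, that is ten times more likely under "no" than under "yes". Treating the two observations as independent given each hypothesis is also an assumption.

```python
from fractions import Fraction


def update(p_yes, p_obs_given_yes, p_obs_given_no):
    """One Bayes update for two exhaustive hypotheses, "yes" and "no"."""
    p_obs = p_obs_given_yes * p_yes + p_obs_given_no * (1 - p_yes)
    return p_obs_given_yes * p_yes / p_obs


p = Fraction(1, 2)
# Carol's affirmation pulls the probability up.
p = update(p, Fraction(2, 10), Fraction(2, 100))
after_carol = p
# A hypothetical opposing observation pulls it back down.
p = update(p, Fraction(5, 100), Fraction(50, 100))
print(after_carol, p)  # 10/11 1/2
```

The opposing observation pulls the probability all the way back to 50%, because it is exactly as strong as Carol's affirmation but points the other way.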

So there is a 10 out of 11 probability, or about a 91% chance, that Alice will say "yes" to Bob's request for a second date. Things are looking good for Bob! Of course, there is the remaining 1/11 probability that Alice will say "no". Bob will have to live with that chance of rejection. That's the nature of Bayesian reasoning - you can't ever be 100% certain, but you can be certain enough to act. Bob should definitely ask Alice out again.

Note that Bayes' theorem, as with all Bayesian reasoning, compels you to accept its conclusions: you cannot simply say "I don't buy this argument" or "I don't find this convincing". If you accept its premises, you must accept its conclusion; otherwise you're violating the rules of mathematical logic.

But where do the premises come from? How did I assign, for example, that P(Alice will say "yes")=0.5, or P(Carol's affirmation|Alice will say "no")=0.02? Well, for the sake of this problem, I just made up some reasonable values. In real life, computing these values would be far more difficult than the example problem itself. For instance, to calculate P(Alice will say "yes"), Bob would have to consider all the relevant background information he has. This would include how Alice interacted with him during the first date, his knowledge of human mating behaviors (it's a good sign if she laughs at your jokes, it's bad if she calls the cops on you, etc.), and any other relevant information. Based on this total information, he would calculate how often a woman like Alice would agree to a second date, and that would be his P(Alice will say "yes"). That's why this probability can be thought of as a personal, subjective degree of belief: because nobody else has the exact set of background information that Bob has.

What about calculating P(Carol's affirmation|Alice will say "no")? This is the probability that Carol will convey Alice's approval to Bob, even though Alice will say "no" to the second date. This number might be obtained through some sociological studies, by asking questions like "How often do women tell their friends that they enjoyed a date even if they didn't?" The nature of the relationship between Alice, Bob, and Carol also needs to be taken into account, along with their personalities. Are Alice and Carol very close friends? Is Carol generally reliable, or is she prone to hyperbole? The value of P(Carol's affirmation|Alice will say "no") is all this information condensed into a single number.

You may be disappointed that these probabilities are not simple to calculate. This is often the case in real-life scenarios. It turns out that humans and human relationships cannot be reduced down to a simple calculation, even with Bayes' theorem. Real life is complicated: this should not surprise anyone. Often, the relevant starting probabilities can only be guessed at from intuition. Being able to do that well is a large part of what it means to be a reasonable, logical person in the real world.

So those are the strengths and weaknesses of Bayes' theorem. On the one hand, it provides a firm, computationally exact way of updating your beliefs based on the evidence. On the other hand, the probabilities needed to perform the calculations can be difficult or impossible to assign. In particular, the assignment of prior probabilities - the degree of belief in the hypothesis before considering the observation - is a famously contentious issue within Bayesian reasoning, and no way of assigning these numbers has been established as correct. This was the value of P(Alice will say "yes") in our example above, and I have described it as a personal, subjective probability based on the unique set of background information that a person has. This gives us a ballpark number that we can immediately use, but its imprecise and subjective nature, combined with the human capacity for self-deception, is a cause for concern.

Is Bayesian reasoning still useful in light of these weaknesses? Definitely. It is still far more applicable than propositional logic, and it still tells us, in a mathematically precise way, how to logically incorporate evidence into our beliefs. And often, when there is enough evidence, the specific values of these questionable probabilities turn out to be irrelevant. This is why we often look for overwhelming evidence, beyond any reasonable doubt, before we decide to take action based on a hypothesis. So while Bayes' theorem cannot tell us everything (nothing in this world can), it is a very useful tool for sharpening our thinking and processing evidence to update our beliefs.
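Here is a rough Python illustration of how enough evidence can make the prior nearly irrelevant. The numbers are hypothetical: three very different priors (10%, 50%, 90%), followed by five observations. Each observation is assumed to be independent given the hypothesis, and each is ten times more likely if the hypothesis is true (0.5 versus 0.05).

```python
def update(p_h, p_obs_given_h, p_obs_given_not_h):
    """One application of Bayes' theorem for a hypothesis h and its negation."""
    p_obs = p_obs_given_h * p_h + p_obs_given_not_h * (1 - p_h)
    return p_obs_given_h * p_h / p_obs


results = {}
for prior in (0.1, 0.5, 0.9):
    p = prior
    for _ in range(5):  # five independent observations, each favoring h 10:1
        p = update(p, 0.5, 0.05)
    results[prior] = p
    print(f"prior {prior:.0%} -> posterior {p:.4%}")
```

All three starting points end up above 99.9%. This is the sense in which overwhelming evidence makes the exact prior matter little.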

In my next post, I will re-cast Bayes' theorem into a different mathematical equation - the odds form - which eases some of the difficulties of Bayesian reasoning. I will use this new form to discuss more examples in Bayesian reasoning.



Basic Bayesian reasoning: a better way to think (Part 1)

Image: by me. Feel free to use, just link back to this post.
What is Bayesian inference? I've already mentioned it in several of my previous posts, and I'm sure to bring it up again in the future. I obviously think it's important. Why?

Bayesian inference is the mathematical extension of propositional logic using probability theory. It is superior to deductive propositional logic, which is what many people think of when they hear the word "logic". In fact it includes the rules of propositional logic as special cases of its more powerful and general rules. It is the logical framework that underlies the scientific method, and it encompasses a great deal of what it means to be a rational, logical, scientific individual. As with "normal", propositional logic, you don't necessarily have to be formally trained to use it in your daily life, but knowing its basics will greatly clarify your thoughts and sharpen your rational thinking skills. The intent of this post is to provide an introduction to this important topic.

Let's study an easy problem in propositional logic as a review and a starting point for Bayesian logic. You should have learned in middle or high school that if A implies B, and B implies C, then A implies C. With some symbols, it becomes "if (A → B) and (B → C), then (A → C)". In an example with words, it might look like "If Socrates was a human, and all humans are mortal, then Socrates was mortal". This is well and good. This is a fine way of thinking, and it is worth learning.

However, when we examine the world around us, this rule is severely restricted in its applicability. Consider the following: "If Socrates was a human, and all humans have ten fingers, then Socrates had ten fingers". Is this sound? Can we conclude that Socrates necessarily had ten fingers? Well, no. The second premise - "all humans have ten fingers" - is not strictly true. Certainly most humans do, but not all. So we cannot conclude that Socrates had ten fingers. For that matter, we're not completely 100% sure that Socrates was human either.

"What's wrong with that?" you ask. "Hasn't logic brought us to a correct conclusion, that Socrates might not have had ten fingers?" True. But that's a very weak conclusion. Someone who was basing an argument on the possibility that Socrates didn't have ten fingers would need some additional evidence. I mean, until now I had implicitly assumed that Socrates had ten fingers, and I don't think I was being particularly irrational. Isn't there some way to conclude that "Socrates probably had ten fingers"? Maybe with a rule like "If A is likely to lead to B, and B is likely to lead to C, then A is likely to lead to C"? Doesn't that seem like a pretty logical conclusion?

Of course, "If A is likely to lead to B, and B is likely to lead to C, then A is likely to lead to C" is not a valid argument in propositional logic, and you can certainly find examples where A is true while C is not. For instance, "A blind person is likely to be a poor driver. Poor drivers are likely to get traffic tickets. Therefore, a blind person is likely to get traffic tickets" seems to be an incorrect chain of reasoning. But how could we be sure that it's not just an instance of bad luck, that this is just one of the cases where that probabilistic statement, "likely", just didn't pan out?

So we can't come to any firm conclusions about the "likely" rule in logic, although it seems to make sense sometimes. At any rate we can't use rigid propositional logic with such statements. But this is an enormous restriction, because there is nothing we know in the physical world with absolute certainty. Every instrument of measurement - including your own eyes and hands - is subject to errors and uncertainty. Even if you double check and verify, that only reduces the uncertainty to very low levels, without ever completely eliminating it. How can we reason in such cases - that is, in any real world scenarios where we are perpetually plagued by uncertainty?

In Bayesian reasoning, these uncertainties are built into its foundations. The truth of a statement is not represented by just "true" and "false", but by a continuous numerical probability value between 0 and 1. So, for instance, the statement "It will rain tomorrow" might get a probability value of 0.1, representing a 10% chance of rain. A statement like "I will still be alive tomorrow" might get a value like 0.999999, as I will almost certainly not die today. "1" and "0" would respectively correspond to absolute certainty in the truth or falsehood of a statement, but as I said they cannot be used in statements about the physical world. Instead we use numbers like 0.5 to represent the certainty that the coin will land heads, or 0.65 to represent the certainty you feel that you're going to marry that girl.

But isn't any given statement ultimately either true or false? Perhaps, but we are not God. We're ignorant of many things. But we still need to reason, even in our uncertainties. Giving a numerical, probabilistic truth value to a statement allows Bayesian reasoning to mirror the human mind much more closely than propositional logic. In essence, you can treat the numerical value you give to a statement as your personal degree of subjective certainty that the statement is true, given the information that you have.

But isn't this all very probabilistic, subjective, and uncertain? In one sense, yes. And that is a strength of Bayesian reasoning, because that's an actual limitation of the human mind. In representing the truth in this way we are only accurately representing how the truth actually exists in our minds. If we actually cannot be certain, then it's appropriate that our logical system actually represents that uncertainty.

But in another sense, this probabilistic thinking is completely rigorous and unyielding. By assigning a probabilistic value to the truth, you can use all of the mathematical tools of probability theory to process them, and their conclusions are mathematically certain. Bayesian reasoning makes very definitive statements about what these probability values must be, and how they must change in light of new evidence. I said earlier that Bayesian reasoning is an extension of logic using math, and that is exactly as rigorous and compelling as it sounds.

To finish this post, let me give you an extended example to illustrate both the rigor and flexibility of Bayesian reasoning, and its superiority over propositional logic. We will address the question about Socrates and his fingers. There will be some math ahead, but nothing you can't understand at a high school level.

First, let me introduce some notation:

P(X) is the probability that you assign to statement X being true. So, if statement X is "I will roll a 1 on this die", you might assign P(X)=1/6. But if you happened to know that the die was loaded, you might assign P(X)=1/2 instead.

P(X|Y) is the probability that X is true, given that Y is already known to be true. So, if X is "It will rain tomorrow" and Y is "It will be cloudy tomorrow", then P(X) might only be 0.1, whereas P(X|Y) would be larger, perhaps something like 0.3. That is, there is only a 10% chance of rain tomorrow, but if we know that it will be cloudy, the chance of rain increases to 30%. Notice that the probabilities change depending on what additional relevant information is known.

This P(X|Y) notation is a little bit awkward, as it's written backwards from the more intuitive "if Y, then X" way of thinking. But unfortunately it's the standard notation. By definition, P(X|Y) = P(XY)/P(Y), where P(XY) is the probability that both X and Y are true.

~X is the negation of X. It is the statement that "X is false". By the rules of probability, P(~X)+P(X)=1, because X must be either true or false. Likewise, P(~X|Y)+P(X|Y)=1, and P(~XY)+P(XY)=P(Y).
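To make the definition P(X|Y) = P(XY)/P(Y) concrete, here is a Python sketch using the rain/cloud example. The joint probabilities are hypothetical, chosen only so that they reproduce the numbers above: P(rain) = 0.1 and P(rain|cloudy) = 0.3. The sketch also checks that P(~X|Y) + P(X|Y) = 1.

```python
from fractions import Fraction

# Hypothetical joint probabilities for X = "it will rain tomorrow" and
# Y = "it will be cloudy tomorrow", keyed by (X is true, Y is true).
joint = {
    (True, True): Fraction("0.09"),    # rain and cloudy
    (True, False): Fraction("0.01"),   # rain but not cloudy
    (False, True): Fraction("0.21"),   # cloudy but no rain
    (False, False): Fraction("0.69"),  # neither
}


def prob(event):
    """Add up the joint probabilities of every outcome (x, y) where event(x, y) holds."""
    return sum(p for (x, y), p in joint.items() if event(x, y))


p_x = prob(lambda x, y: x)                               # P(X)
p_y = prob(lambda x, y: y)                               # P(Y)
p_xy = prob(lambda x, y: x and y)                        # P(XY)
p_x_given_y = p_xy / p_y                                 # P(X|Y) = P(XY)/P(Y)
p_not_x_given_y = prob(lambda x, y: not x and y) / p_y   # P(~X|Y)

print(p_x, p_x_given_y)                # 1/10 3/10
print(p_x_given_y + p_not_x_given_y)   # 1
```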

You can translate a statement in propositional logic into a statement in Bayesian, probabilistic representation, simply by setting certain probabilities to 1 or 0. For instance, "X implies Y", which would be written as "X → Y" in propositional logic, would be written as "P(Y|X)=1" in terms of probabilities.

Now back to Socrates. Let the relevant statements be represented as follows:

A: "This person is Socrates"
B: "This person is a human"
C: "This person has ten fingers"

Given these statements, we can translate the premises as follows:

"Socrates was human": P(B|A)
"Humans have ten fingers": P(C|B)

Now, let's show that we can duplicate the results of propositional logic simply by setting the probabilities to 1. If P(B|A) = P(C|B) = 1, then by the definitions given earlier, P(BA) = P(A) and P(CB) = P(B), therefore P(~BA) = P(~CB) = 0. Since C~BA and ~C~BA are both special cases of ~BA, and ~CBA is a special case of ~CB, it follows that P(C~BA) = P(~C~BA) = P(~CBA) = 0. But P(C|A) = [ P(CBA)+P(C~BA) ] / [ P(CBA)+P(C~BA)+P(~CBA)+P(~C~BA) ], which reduces to P(CBA)/P(CBA) = 1 after eliminating all the zero terms. That is to say, if P(B|A) = P(C|B) = 1, then P(C|A) = 1. Or, translating back into words, "If Socrates is human, and humans have ten fingers, then Socrates has ten fingers".

Don't worry too much if you got lost in the notation in the above paragraph. The important point is that Bayesian reasoning reduces to propositional logic in the special cases where the probability values are set to 1 or 0. Bayesian reasoning thereby completely encompasses and supersedes propositional logic, as general relativity supersedes Newtonian gravity.
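If you'd rather check this special case numerically than follow the algebra, here is a Python sketch. It builds random joint distributions over A, B, and C in which every outcome with A true and B false, or with B true and C false, has probability zero, so that P(B|A) = P(C|B) = 1. It then confirms that P(C|A) comes out as exactly 1. This is a sanity check on many random examples, not a proof, and the function names are my own.

```python
import random
from fractions import Fraction
from itertools import product


def random_joint_with_implications(rng):
    """Random joint distribution over (A, B, C) with P(B|A) = P(C|B) = 1.

    Outcomes where A is true but B is false, or B is true but C is false,
    get probability 0; every other outcome gets a random positive weight.
    """
    weights = {}
    for a, b, c in product([True, False], repeat=3):
        allowed = not (a and not b) and not (b and not c)
        weights[(a, b, c)] = Fraction(rng.randint(1, 100)) if allowed else Fraction(0)
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


def conditional(joint, event, given):
    """P(event | given) = P(event and given) / P(given)."""
    p_given = sum(p for o, p in joint.items() if given(*o))
    p_both = sum(p for o, p in joint.items() if event(*o) and given(*o))
    return p_both / p_given


rng = random.Random(0)
for _ in range(200):
    joint = random_joint_with_implications(rng)
    assert conditional(joint, lambda a, b, c: b, lambda a, b, c: a) == 1  # P(B|A) = 1
    assert conditional(joint, lambda a, b, c: c, lambda a, b, c: b) == 1  # P(C|B) = 1
    assert conditional(joint, lambda a, b, c: c, lambda a, b, c: a) == 1  # P(C|A) = 1
print("P(C|A) = 1 in every trial")
```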

What if the probabilities are not 100%? This is the real-life problem of dealing with uncertainties. What is the actual value of P(B|A), the probability that Socrates was human? Might he not have been an alien, or an angel? As ridiculous as these possibilities seem, they ruin our complete certainty and make propositional logic flounder. What about P(C|B) - the probability that a human has ten fingers? It's certainly not 100%. And what can we conclude about P(C|A) - the probability that Socrates had ten fingers?

To tackle this question, we need to consider the following formula for P(C|A), which can be derived from straightforward application of the rules and definitions mentioned earlier. The fact that this formula exists - that we can actually derive it and use it to perform exact calculations - is one of the compelling fruits of the Bayesian way of thinking. Here it is:

P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A)

Let's say that Socrates has a P(B|A)=0.999 999 chance of being human, and that, given that he was human, he has a P(C|BA)=0.998 chance of having ten fingers. This means that P(~B|A) = 0.000 001 is the chance that Socrates was not human. The last factor we need to know, P(C|~BA), is the probability that a non-human Socrates had ten fingers. This is nearly impossible to estimate, as we'd have to consider all the different things Socrates could have been - alien, angel, a demon in disguise, etc. But it will turn out not to matter much for our final result. Let's just assign P(C|~BA)=0.1. Plugging in the numbers and calculating, we get that P(C|A) = 0.997999102. That is to say, Socrates almost certainly had ten fingers.
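Here is the Socrates calculation as a short Python sketch, using the same probabilities as above. It also tries the two most extreme values for the hard-to-estimate P(C|~BA), 0 and 1, to show why that factor barely matters. With P(~B|A) only 0.000 001, it can shift the answer by at most one part in a million.

```python
def chain_probability(p_c_given_ba, p_b_given_a, p_c_given_not_b_a):
    """P(C|A) = P(C|BA)P(B|A) + P(C|~BA)P(~B|A), with P(~B|A) = 1 - P(B|A)."""
    return p_c_given_ba * p_b_given_a + p_c_given_not_b_a * (1 - p_b_given_a)


p = chain_probability(p_c_given_ba=0.998, p_b_given_a=0.999999, p_c_given_not_b_a=0.1)
print(f"{p:.9f}")  # 0.997999102

# The poorly known P(C|~BA) barely matters: try its extreme values 0 and 1.
low = chain_probability(0.998, 0.999999, 0.0)
high = chain_probability(0.998, 0.999999, 1.0)
print(f"{low:.6f} to {high:.6f}")
```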

If you want extra practice, you can also try using the formula on the "blind man getting traffic tickets" scenario above, and see why a blind man is not likely to get traffic tickets. Or you can wait until next week's post to get the answer.

But again, don't worry too much about the details of numerical calculation. The important point is that Bayesian reasoning provides an exact formula for calculating the probability of a conclusion, even when the premises were also only probabilities - which is always the case in the physical universe. Furthermore, the conclusions drawn this way are compelling, because they are mathematical results. If you accept the probabilities of the premises, then you must accept the conclusion. This is the same compelling force which is at work in propositional reasoning. The premises lead to inescapable conclusions.

I hope that this example demonstrates to you the usefulness of the Bayesian way of reasoning. It can be actually applied to situations with uncertain premises, which is really nearly all situations. It is completely rigorous in that a correct Bayesian argument forces you to accept its conclusions if you accept its premises. Yet it's also flexible in assigning probabilities to reflect your current, subjective, personal degree of belief in the truthfulness of a statement. It duplicates propositional logic as its special cases, and in its full form it's more general and more powerful than propositional logic. There are other advantages I have not yet touched on, such as its ability to naturally explain inductive reasoning and Occam's razor, and how it serves as the framework for the scientific method. On the whole, it encompasses a great deal of what it means to be a logical, rational, and scientific thinker.

In my next post, I will discuss a particularly important formula in this probabilistic way of thinking, one that is nearly synonymous with Bayesian reasoning - Bayes' theorem.



How to make a fractal: version 2.0


