## Blog pages

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 37)

Now, what kind of data do we have to determine the shape parameter?

We have the historical data, of course. We have some number of people who are said to have been resurrected in some sense, and each of these people has some amount of evidence associated with their resurrection claim.

We essentially want to "fit" these evidence data into a generalized Pareto distribution, and read off the shape parameter. However, this will be somewhat tricky. We do not have the complete data for all 1e9 reportable deaths throughout human history. We can reasonably assume that the vast majority of them would have essentially zero evidence for a resurrection, but the complete data set would be pretty much impossible to obtain. We don't even have the complete data set just for the outliers - cases like Apollonius or Zalmoxis, where there is a distinctly non-zero level of evidence for a resurrection. Furthermore, our estimates of the level of evidence for each case are rather imprecise. All this means that the usual "fit a curve through some kind of x-y scatterplot" approach would not work very well.

However, given that we already know we'll be fitting a generalized Pareto distribution, this is not necessary. We're just looking for the shape parameter, and for that, we merely need to count the number of outliers near the maximum value. Consider the following graph:

This is the same graph as before, in the sense that it just shows the generalized Pareto distribution, scaled so that the probability of x > 1 is 1e-9. Once again, this means that the maximum evidence from 1e9 reportable deaths is likely to appear around x = 1.

However, we now want to focus on how to fit the data. And since the data will have x values less than the maximum, this graph focuses on the region to the left of the x = 1 line, instead of the tail to the right.

In particular, note the vast differences in the area under the curve for different shape parameters. The shaded regions represent the probability of finding an "outlier" - a non-Christian resurrection report with at least 20% of the evidence of the maximum report. For instance, the reports of the resurrection of Puhua or Apollonius would be considered an "outlier".

So, let's look at the green curve, with a shape parameter of 20, and a tiny area under the curve. If this were the skeptic's distribution, you'd expect essentially no other outliers. The maximum value would stand by itself, with no other outliers coming anywhere near its value.

Similarly, if the shape parameter is 2, you'd expect perhaps one outlier out of 1e9 samples - one other resurrection report would have at least 20% of the evidence of the maximum.

Lastly, if the shape parameter is 0.2, you'd expect many, many outliers. The probability distribution grows very rapidly as it goes backward from x = 1, and therefore you expect to find many other resurrection reports with a similar level of evidence as the maximum.

So by counting the number of outliers, we can make a determination about the shape parameter.
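To make the counting argument concrete, here is a minimal Python sketch. It assumes the standard generalized Pareto survival function with location 0, S(x) = (1 + ξx/σ)^(-1/ξ), where a larger shape ξ means a heavier tail, and chooses the scale σ (as in the graphs) so that P(x > 1) = 1e-9. It then computes the expected number of outliers, meaning samples with 0.2 < x < 1, among 1e9 samples. The function names are hypothetical; this only illustrates the counting idea and is not the program the series will eventually spec out.

```python
import math

N_SAMPLES = 1e9    # reportable deaths in human history (from the text)
OUTLIER_CUT = 0.2  # "at least 20% of the evidence of the maximum report"


def scale_for_record(shape, n=N_SAMPLES):
    """Scale sigma such that P(x > 1) = 1/n for a GPD with location 0.

    Solves (1 + shape / sigma) ** (-1 / shape) = 1 / n for sigma.
    expm1 keeps precision for small shapes; shapes above ~34 overflow.
    """
    return shape / math.expm1(shape * math.log(n))


def gpd_survival(x, shape, scale):
    """P(X > x) for a generalized Pareto distribution with location 0 and shape > 0."""
    return math.exp(-math.log1p(shape * x / scale) / shape)


def expected_outliers(shape, n=N_SAMPLES, cut=OUTLIER_CUT):
    """Expected number of the n samples landing between `cut` and 1."""
    scale = scale_for_record(shape, n)
    p_outlier = gpd_survival(cut, shape, scale) - gpd_survival(1.0, shape, scale)
    return n * p_outlier


for shape in (20.0, 2.0, 0.2):
    print(f"shape {shape:>4}: about {expected_outliers(shape):.3g} expected outliers")
```

Under these assumptions, the shape parameters 20, 2, and 0.2 give roughly 0.08, 1.2, and 2,300 expected outliers respectively, which matches the qualitative reading of the shaded areas: essentially none, about one, and many.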

But... wait a minute. Having more outliers is associated with smaller shape parameters? But didn't smaller shape parameters correspond to a faster-decaying function, and therefore a lower probability for the "skeptic's distribution" generating a Jesus-level of evidence? Wouldn't this lead to the "skeptic's distribution" being less able to explain the evidence for Jesus's resurrection, and therefore make the resurrection more likely?

Are we saying that having MORE non-Christian resurrection reports (like Apollonius or Zalmoxis) makes Jesus's resurrection MORE likely?

That is precisely what we are saying. The following analogy may help explain how this could be.

Alice accuses Bob of theft. Bob is known to have come into sudden possession of \$100,000. He is also known to be a gambler. He claims that his sudden fortune came from a lucky night at the card table, but Alice believes that he stole the money - she claims that \$100,000 is far too large a sum for Bob to have naturally won through gambling.

Carol takes on this investigation. She looks into Bob's past gambling history, to see whether it's realistic for him to have won \$100,000 in a single night. She finds that, among Bob's past verifiable winnings, there were two nights where Bob won \$5,000 and \$3,000. These are his most remarkable winnings on record, and Carol cannot find any other instances where he won more than \$1,000 in a single night.

Carol concludes that she does not really have enough information. It could be that Bob plays a card game with an erratic payout scheme, where winning 20 or 30 times more money is not that unusual. Maybe it has some kind of "let it ride" or "double or nothing" mechanism which makes such returns plausible. Or maybe Bob himself is an erratic gambler, and decided to bet a lot more money that one night to win the \$100,000. Based on all this, Carol decides to be skeptical of Alice's claim that Bob stole the money. Her own "skeptic's distribution" for how much money Bob can win does not decay quickly enough. There are relatively few outliers near his maximum winnings of \$5,000, and this suggests that the distribution decays very slowly - meaning that \$5,000 cannot be established as a limit to what Bob can win. His theoretical winnings could stretch quite far into higher values, making it impossible to rule out a \$100,000 win.

But then, Carol has a breakthrough in her investigation. She finds extensive, previously undiscovered records of Bob's gambling winnings, and they show that Bob has won more than \$1,000 on dozens of nights. The most he has won is still \$5,000, but he has also regularly won thousands of dollars in a single night.

Carol takes this new information into account, and adjusts her "skeptic's distribution" for how much Bob can win in a single night. Clearly, Bob's winnings are not erratic; he regularly wins up to about \$5,000. But this also establishes, with the weight of those repeated winnings, that this is close to the likely upper limit for what he can win in one night.

Carol therefore decides to believe Alice. Her "skeptic's distribution" cannot explain how Bob would naturally win \$100,000 in a single night, because it goes against his established pattern of regularly winning up to \$5,000. She pursues the case further, and eventually convicts Bob of theft.

This is not just a story; it can be mathematically established, and it will be in future posts. For now, this story just provides the intuitive backing for the mathematical results to come.

So, having more non-Christian reports of a resurrection, with their pathetically low levels of evidence behind them, only makes Jesus's resurrection more likely. When skeptics say "don't you know there are numerous other Jesus-like stories of someone dying and resurrecting?", they are only kicking against the goads. The more such cases they come up with, the more firmly it is established that Jesus really did rise from the dead.

The next post will bring the last several posts together, to fully spec out the program which will compute the complete "skeptic's distribution", from which we can calculate its chances of predicting a Jesus-level of evidence for a resurrection.


### Bayesian evaluation for the likelihood of Christ's resurrection (Part 36)

We've decided on a power law as the general form of the "skeptic's distribution".

The details of the distribution near zero will not particularly matter. We're more concerned about how rapidly it decays at very large values. This allows us quite a bit of leeway in choosing the specific form of the power law distribution, as they all decay similarly as we move along the tail off to the right.

For this reason, I've chosen the generalized Pareto distribution as the specific form of the "skeptic's distribution", guided chiefly by the straightforward interpretation of its parameters. But the choice here will not affect any conclusions. Any other power law distribution would give the same results.

The generalized Pareto distribution is characterized by three parameters: location, scale, and shape. The location parameter determines where the distribution starts. It's where the probability density of the distribution is the largest. As the vast majority of humans have zero evidence suggesting that they rose from the dead, the location parameter should obviously be set at zero.

The scale parameter is irrelevant; it only controls how far the distribution stretches multiplicatively in the horizontal direction, and can be arbitrarily changed by changing the unit of evidence we use. As we'll consider all the evidence for the resurrection reports relative to one another (for example, as a fraction of the amount of evidence for Christ's resurrection), the value for the amount of evidence in some specific units never enters the picture. So we'll just set this parameter to 1, or to whatever is convenient for visualization, and forget about it.

The shape parameter is the interesting one. It's what we really care about. It effectively determines the power in the power law, and controls how quickly the function decays as the amount of evidence increases.

For example, this is what the tail end of the distribution looks like with various shape parameters:

In each case, the distribution has been scaled so that the total probability to the right of the grey line (at x=1) is 1e-9. Essentially, x = 1 is where you would expect the maximum value out of 1e9 samples to appear, corresponding to the level of evidence for the resurrection of a figure like Apollonius or Aristeas.

Note the different rates of decay. With the shape parameter at 0.2, the probability density drops to practically zero as we move to larger x values. There is essentially nothing left by the time we've moved to x = 24, even if we integrate out to infinity. Therefore, if this were the final form of the "skeptic's distribution", the probability of generating a Jesus-level of evidence for a resurrection would be essentially zero.

However, with the shape parameter at 2, we see that the decay rate is much slower, and there is a good amount of probability even out at x = 24 and beyond. If this were the "skeptic's distribution", it would have a good chance of generating a Jesus-level of evidence for a resurrection, even if that level were 24 times higher than the runner-up.

A shape parameter of 20 decays more slowly still. It's hardly decaying at all by the time it reaches x = 24. In fact, it decays so slowly that the blue curve with the shape parameter of 2 will eventually move below it. If this were the "skeptic's distribution", it would have a non-negligible chance of generating an event at x values much higher than 24.
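The tail comparison can be checked numerically with a minimal Python sketch. It assumes the standard generalized Pareto parametrization (location 0, survival function (1 + ξx/σ)^(-1/ξ), larger ξ meaning a heavier tail, which matches the behavior described above) and picks the scale so that P(x > 1) = 1e-9, as in the graph. It then reports the conditional probability P(X > 24 | X > 1), that is, how likely a result beyond the usual record is to reach a Jesus-level of evidence. The helper names are hypothetical.

```python
import math

N_SAMPLES = 1e9   # reportable deaths; P(x > 1) is set to 1 / N_SAMPLES
JESUS_X = 24.0    # Jesus-level evidence, 24 times the runner-up (from the text)


def scale_for_record(shape, n=N_SAMPLES):
    """Scale sigma making P(x > 1) = 1/n for a location-0 GPD (valid for 0 < shape < ~34)."""
    return shape / math.expm1(shape * math.log(n))


def gpd_survival(x, shape, scale):
    """P(X > x) = (1 + shape * x / scale) ** (-1 / shape), computed in log space."""
    return math.exp(-math.log1p(shape * x / scale) / shape)


def tail_ratio(shape, x=JESUS_X):
    """Conditional probability P(X > x | X > 1) = P(X > x) / P(X > 1)."""
    scale = scale_for_record(shape)
    return gpd_survival(x, shape, scale) / gpd_survival(1.0, shape, scale)


for shape in (0.2, 2.0, 20.0):
    print(f"shape {shape:>4}: P(x > 24) / P(x > 1) = {tail_ratio(shape):.3g}")
```

Under these assumptions the ratio is on the order of 1e-7 for a shape of 0.2, about 0.2 for a shape of 2, and about 0.85 for a shape of 20: essentially nothing left, a good amount, and hardly any decay at all.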

So, it all comes down to the shape parameter. But how shall we decide on its value? Why, by choosing the one that best fits the data according to Bayes' theorem, of course.

We will outline this procedure in the next post.


### Bayesian evaluation for the likelihood of Christ's resurrection (Part 35)

Recall that we're constructing a "skeptic's distribution" - the probability distribution of generating a resurrection report with a certain level of evidence. We will construct it from historical, empirical data. This allows us to bypass the mess of trying to compute everything from first principles, and ensures that this is the correct distribution - a skeptic cannot reject it without rejecting history or empiricism.

So, what form should this "skeptic's distribution" take?

How about using a normal distribution? Well, that would be plainly ridiculous. The data we have thus far has Jesus (very conservatively) having 24 times more evidence for his resurrection than anyone else in history. Our goal is to get the probability for something like this happening.

But if we used a normal distribution for the "skeptic's distribution", this could essentially never happen. Recall that human heights roughly follow a normal distribution. Then, our problem would be analogous to looking for someone 24 times taller than anyone else in history - that is, someone well over 200 feet tall. The probability for something like that is essentially zero. So if we chose the normal distribution, we'd essentially be dooming the skeptic's case from the start.

The same is true for an exponential distribution. An exponential distribution decreases in its probability value by a constant factor for each unit of increase in its domain. As the domain is "level of evidence" in our problem, this means that each piece of evidence would multiply the probability values. That is to say, we'd be treating each piece of evidence independently. And we already saw that even with a reasonable degree of dependence factored in, the probability values already reached numbers like 1e-54, again dooming the skeptic's case.

This is a testament to how quickly these distributions decay as they extend to the right. Their right tails are so "stubby" that the maximum values of their samples are strongly restricted, and getting something 24 times greater than that maximum is essentially impossible. Picking any such distribution would fail to take into account the dependence of the evidence, and would unfairly doom the skeptic's case from the start.

Rather, we need a distribution with a "long tail" - something that has a chance for a new high record to beat the previous record by factors like 24. Something that decays slowly enough that its probability values remain non-negligible as we move further to the right. The distribution should still be realistic and have some justification for being selected, but we want to give the skeptic the best chance.
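A rough numerical version of this comparison, as a minimal Python sketch: each model is calibrated so that the record out of 1e9 samples sits at x = 1 (that is, P(X > 1) = 1e-9), and we compute P(X > 24) / P(X > 1), the chance that a result beyond the record also beats it by a factor of 24. The zero-mean normal (real heights have a non-zero mean), the exponential rate, and the power-law exponent alpha = 0.5 are illustrative assumptions, not values taken from the series.

```python
import math
from statistics import NormalDist

N_SAMPLES = 1e9  # reportable deaths; each model is calibrated so P(X > 1) = 1 / N_SAMPLES
FACTOR = 24.0    # how far past the previous record we want to reach (from the text)


def exponential_ratio(n=N_SAMPLES, k=FACTOR):
    """P(X > k) / P(X > 1) for S(x) = exp(-lam * x), with lam = ln(n) so S(1) = 1/n."""
    lam = math.log(n)
    return math.exp(-lam * (k - 1.0))


def normal_ratio(n=N_SAMPLES, k=FACTOR):
    """Same ratio for a zero-mean normal whose scale puts P(X > 1) at 1/n."""
    z1 = NormalDist().inv_cdf(1.0 - 1.0 / n)  # x = 1 sits about 6 standard deviations out

    def upper_tail(z):
        # P(Z > z) for a standard normal
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    return upper_tail(k * z1) / upper_tail(z1)


def power_law_ratio(alpha, k=FACTOR):
    """Same ratio for a pure power-law tail S(x) proportional to x ** (-alpha)."""
    return k ** (-alpha)  # the calibration constant cancels in the ratio


print("normal:     ", normal_ratio())
print("exponential:", exponential_ratio())
print("power law:  ", power_law_ratio(alpha=0.5))
```

Under these assumptions the normal ratio underflows to 0.0 in double precision, the exponential ratio comes out around 1e-207, and the power-law ratio is 24 ** -0.5, about 0.2. Only the long-tailed model leaves the skeptic a realistic chance.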

Taking all that into account, I have chosen a power law function for the "skeptic's distribution". This should not be a surprise - indeed anyone familiar with the statistics of human behavior might have guessed it from just the histogram we're trying to fit:

What makes a power law particularly appropriate? Well, for one, power laws are the quintessential long-tailed distribution. They have one of the longest possible tails, and are fully "capable of black swan behavior", according to Wikipedia. They can easily have tails so long that the overall distribution has an undefined (that is, infinite) mean. In fact, power laws, as mathematical functions, can decay so slowly that they cannot serve as probability density functions at all, because the area under their curves diverges. One can hardly ask for a more slowly decaying function than that. So this gives the skeptic the best chance at naturally generating a Jesus-level of resurrection evidence.

There exist distributions that decay even more slowly than a power law, but they're rare, obscure, and have no relation to what we're doing. By contrast, power law distributions are ubiquitous in human behavior. They form the basis for the well-known Pareto principle, and they capture the "dependency of evidence" factor we're currently trying to model.

For example, the distribution of income among people follows a power law. A few people, out at the long tail, have a great deal of wealth, because the rich get richer - that is, because how rich you get depends on how rich you already are.

The size of cities also follows a power law. There are a few very large cities out at the long tail, because your chances of moving to a city depend on the number of people who already live there.

The number of links to a website follows a power law. There are a few very popular websites out at the long tail, which have a lot of links to them. This is because a site's chances of getting a link depend on its popularity - that is, on the number of links it already has.

Don't let the specificity of these examples fool you. There are many, many more. Power law distributions are, as I said, ubiquitous in human behavior. They will frequently come up when one human behavior depends on the same kind of behavior, either by others or by the same person.

So it is entirely appropriate that we use a power law to model the level of evidence for a resurrection report. There will be relatively few reports out at the long tail, like the "resurrection" of Apollonius or Krishna. In the context of things like conspiracy theories, this is because the chances of generating an additional piece of evidence depend on how much evidence a report already has.

So there are excellent external reasons and examples to expect that the "skeptic's distribution" will follow a power law. Furthermore, power laws give one of the best possible chances for the skeptic's case, having a very "long tail" and allowing for a "black swan" event like the level of evidence for Jesus's resurrection.

We will proceed to nail down the specifics of this power law distribution in the next post.


### Bayesian evaluation for the likelihood of Christ's resurrection (Part 34)

Can we quantitatively tackle things like conspiracy theories? What do we do about the interdependency of evidence? One can already imagine the objections to any such attempt. Every assumption would be questioned, and every ridiculous possibility brought up demanding a full numerical treatment. Even if a traditional conspiracy were to be fully debunked in a numerical argument, a skeptic would just weasel the argument to be about a "groupthink induced by religious fervor" instead, and when that got debunked, they would just move on to "have you considered aliens?" Indeed, such weaseling is often the point of bringing up things like conspiracy theories in the first place: not to actually advocate for them, but to make the calculation appear intractable.

But I did say in my last post that I would approach this problem quantitatively - and that's exactly what I'm going to do. Furthermore, my argument will take EVERYTHING into account - government conspiracies, religious groupthink, practical jokes by aliens, everything. Every single possibility for every conceivable degree of evidence dependence will be fully considered.

In addition, empirical evidence will be the foundation of my whole argument. That is, in fact, the key that makes it totally comprehensive. Do you remember the following graph?

That is the level of empirical evidence that history has actually recorded for the resurrection of various individuals. It's a partial histogram - note the differing number of people with different amounts of evidence for their resurrection. This suggests a probability distribution.

Of course, the graph above isn't the complete record of everyone - it's a small sampling of some people who have the most evidence for their resurrection. But if we had a complete record, we could get a very accurate model for their underlying probability distribution. What would that probability distribution represent?

If we exclude Jesus and the other Christian resurrection reports, the probability distribution we get would be the EXACT model that an empirical skeptic of Christianity MUST use, in predicting the likelihood of a resurrection report. Essentially, the idea is that we can calculate the probability of getting a certain level of evidence for a resurrection, based on how frequently similar reports have come up in history.

Note that, because the raw data is gathered from empirical reports collected in history, this automatically takes things like conspiracy theories into account. The possible interdependency of the evidence is fully included in this model. So you think that a great deal of evidence can be built up through a conspiracy, because the evidence doesn't have to be independent? The distribution includes all such evidence-manufacturing conspiracies that actually existed in history. You want to switch your argument to a religious mass delusion instead? The results of all such mass delusions are also included, at the level of evidence that they actually generated in history.

How about something that has probably never happened in history at all, like an alien resurrecting someone as a joke? Even these possibilities are included, through at least two mechanisms. For one, there is a great multitude of such unlikely scenarios - and at least one of them might have actually occurred in history, even if any specific one of them was unlikely. So they would have recorded their evidence in aggregate. And secondly, even if such unlikely scenarios never occurred, they can still be accounted for in the modeling of the probability distribution from the samples we actually have. As an analogy, if you were to model people's heights by sampling a thousand people, you would still deduce that human heights follow a roughly normal distribution, and could thereby figure out that there may be someone out there who's at least 7 feet tall, even if such a person was not in your sample.

So you see, this method does in fact take everything into account. It does generate the exact model that an empirical skeptic of Christianity must use. That's the great thing about arguing from empirical, historical records. You can bypass all the difficult and controversial calculations about the probabilities of conspiracies, or the precise degree of dependence among the evidence. All of that automatically gets incorporated into the historical data at their actually correct historical values, and all we have to do is to read off the final result.

Once we have this "skeptic's distribution", the rest of the calculation is fairly straightforward. We can calculate the probability of generating a Jesus-level of resurrection evidence, from the point of view of the skeptic's hypothesis. From a Christian perspective, this probability is of order unity, so the Bayes' factor essentially becomes the reciprocal of the skeptic's probability. We then simply see if this Bayes' factor is enough to overcome the low prior probability for a resurrection.

Now, I will use 1e-11 as the prior probability for a resurrection. This is higher than the 1e-22 that I had used earlier - but recall that I only got 1e-22 by simply giving away an additional factor of 1e11 for no reason. I can't afford to just give that away when faced with something as insidious as conspiracy theories. 1e-11 is still smaller than anything that a skeptic can demand based on empiricism, and the rest of the argument will be constructed so that even the minimum Bayes' factor will exceed 1e11. The actual Bayes' factor may exceed 1e22 by the end - I just can't be as precise in my calculations at that level.
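The arithmetic behind "the minimum Bayes' factor will exceed 1e11" follows from the odds form of Bayes' theorem: posterior odds = prior odds × Bayes' factor. Here is a minimal Python sketch with the 1e-11 prior from the text; the function name and the sample Bayes' factors are illustrative only.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior P(H | E) from a prior P(H) and Bayes factor P(E | H) / P(E | not H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


PRIOR = 1e-11  # prior probability for a resurrection (from the text)

for bayes_factor in (1e9, 1e11, 1e13, 1e22):
    print(f"Bayes factor {bayes_factor:.0e}: posterior = {posterior_probability(PRIOR, bayes_factor):.6g}")
```

A Bayes' factor of exactly 1e11 brings the posterior to about 0.5; each further factor of 10 multiplies the posterior odds by 10, so 1e13 already gives about 0.99.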

We will start constructing this "skeptic's distribution" in the next post.


### Bayesian evaluation for the likelihood of Christ's resurrection (Part 33)

So the diversity of the individuals involved in Christ's resurrection testimonies already makes a high degree of interdependence unlikely. One could hardly find a less likely group of people to enter into a world-spanning conspiracy. You would expect disparate parts of such a group to be constantly at odds with each other, destroying the conspiracy almost immediately.

In fact, that's pretty much what happened: the disparate parts of the group were constantly at odds with each other - and yet, the "conspiracy" was preserved.

There were hints of confusion and division even before Jesus's crucifixion, in things like the disciples arguing about who will be the greatest, or who will be sitting by Jesus's side when he establishes his kingdom. Peter even berated Jesus for announcing his upcoming death, and there seems to have been a general confusion about the nature of the movement - are they going to lead an uprising against Rome? Do they need to be armed?

After Christ's ascension, very early in the book of Acts, there was a conflict between the Greek-speaking members and the other Jewish members of the Church, concerning the equitable distribution of food to the widows. This was a big enough deal that the Church instituted a whole new tier of leadership - the first deacons - to address the issue. And yet, the central tenet of the "conspiracy" - the resurrection - was unchanged.

Soon thereafter, an intense persecution befell the church. Several key members were killed, and the church was scattered across the known world. Gentiles were evangelised around this time as well (Cornelius, the Ethiopian eunuch) - which in itself caused no small controversy. All of this further fragmented an already very diverse church. The problem was so bad that various evangelists regularly encountered people with very incomplete knowledge about Jesus. There was a group who did not know about the baptism of the Holy Spirit, and Apollos had to have his knowledge completed by Priscilla and Aquila. Still others were only attracted to the power associated with the name of Jesus and wanted to misuse it outright, like Simon Magus and the seven sons of Sceva. And despite all this persecution, fragmentation, and confusion, the "conspiracy" held together.

In the middle of all this, Paul - already mentioned as one of the early persecutors of the church - miraculously converted to Christianity, and became one of its foremost evangelists, to the point of becoming one of the named witnesses in 1 Corinthians 15. He then got embroiled in the central controversy of the early Christian Church: how to handle Gentile believers. This controversy got so heated that Paul once had to publicly rebuke Peter for his stance, and James wrote his epistle with a vastly different emphasis from Paul on what it means to truly be a "believer". In other words, this controversy set all three of the named witnesses in 1 Corinthians 15 against one another, to some degree. And yet, the "conspiracy" endured.

And that's not the end of the divisions of the early church - a number of outright heretical groups had to be condemned. Paul pronounced an anathema on a group proclaiming "a different gospel", and John pointed out certain "antichrists" at large in the world, and also named the works of the Nicolaitans as the objects of Jesus's hate. And despite all this division, the "conspiracy" remained.

Again, what kind of conspiracy does this? What conspiracy kills off its leader, fragments itself into dozens of different pieces, bitterly fights itself on internal controversies, condemns some parts of itself, and still survives? And all for what purpose? Persecution, controversy, and death? That is all that any insider might have hoped to receive by adhering to their conspiracy. As Paul himself says in 1 Corinthians 15: "If in Christ we have hope in this life only, we are of all people most to be pitied."

If the "conspiracy" is that Jesus really did rise from the dead, and that this was the central truth that held early Christianity together, despite all of its divisions - then all this makes sense. But if you want all this to be the result of some made-up story, then you have to postulate a completely ridiculous conspiracy - one where the leaders somehow concocted the greatest and most effective lie the world had ever seen, despite being an inept, fractious group of people with little control over their followers. Or, you can instead postulate a truly vast conspiracy, one which planned for all this persecution and division and infighting from the beginning. You can postulate whatever you'd like. That's the whole appeal of conspiracy theories. But in the end, the prior probability for any conspiracy you postulate will be absolutely minuscule.

The next post will begin the process of putting all this on a quantitative footing.


### Bayesian evaluation for the likelihood of Christ's resurrection (Part 32)

So, could the resurrection testimonies really have a near-total dependency among them? Could they have been generated by a conspiracy of some sort? There are a multitude of reasons to think they were not.

First, there is the story of the apostle Paul - one of the named witnesses in 1 Corinthians 15, and someone who started out as a zealous persecutor of Christianity. He is then supernaturally converted by literally seeing the light on the road to Damascus, and becomes Christianity's most effective evangelist. How many conspiracies have something like that in their narrative?

Now, the conspiracy theorist can still say "that was obviously a part of the plan! You've been taken in by their trap!" After all, that's precisely what a conspiracy theory is designed to do. But as I said before, while this allows them to "explain" the apostle Paul by keeping the Bayes' factor for his testimony to around 1, it is still a significant blow against a conspiracy theory. For a conspiracy that has planned for such a conversion is far less likely a priori than one that has not. In postulating a more vast, deep, and comprehensive conspiracy, the theorist has postulated a less likely one.

In fact, Paul's conversion is so unlikely that it's probably enough by itself to debunk the most common types of conspiracy theories. Ascribing Paul's actions to a conspiracy is like planning to punch a stranger in the streets in hopes of making money from the ensuing lawsuit, or asking a politician to concede an already won election to their bitter political rival solely out of respect and goodwill. Human beings just don't work that way.

But we're just getting started. Let's look at James - the biological brother of Jesus, and another of the named witnesses in 1 Corinthians 15. Consider his relationship with the rest of the early Christian movement. Earlier in Jesus's ministry, there is good reason to think that there were some strained relationships between Jesus's family and his disciples. And yet, after the resurrection James is considered one of the chief disciples, and is named as a witness to Christ's resurrection. Could this have been the result of a conspiracy?

Unlikely. Conspiracies are, I think, either family affairs or professional affairs. The relatively rare cases where it involves both are cases where the family member was already in the professional circle from the beginning (e.g. a family of politicians or bankers). I haven't really heard of a conspiracy which starts in a professional setting, then spreads to encompass estranged family members, as key players. Of course, you can postulate that James was in on the plan from the beginning - but you're again just postulating a bigger, and therefore a less likely, conspiracy.

So there is already a great deal of independence for Paul and James from the rest of Jesus's disciples, who included Peter. Thus our three named witnesses in 1 Corinthians 15 are quite unlikely to be dependent, and therefore their testimonies are unlikely to be the product of a conspiracy.

But the independence of the witnesses doesn't stop here. The twelve disciples may be thought of as a fairly interdependent group - after all, they were twelve Jewish males who all followed one leader. But looking into their background reveals a good amount of diversity. Some of them were fishermen - but their number also included, at a minimum, a tax collector (working for Rome) and a zealot (revolutionaries working against Rome). It's not easy to come up with three groups that would have gotten along less with each other than a tax collector, a zealot, and a regular Jewish worker, like a fisherman. Could a conspiracy rise from such a group? It's not impossible, but it's also not likely.

The diversity is further magnified in the earliest converts to Christianity, at the Pentecost. According to Acts 2, these people were from all over the known world. Many of them did not even consider Hebrew, Aramaic, or Greek to be their native tongue. Again, could a conspiracy spread out so quickly to such a diverse group, as the very first people to be taken in? It must have been a very flexible and compelling conspiracy indeed - and therefore a very unlikely one.

Lastly, on the point of diversity, there are of course the women. They go unmentioned in the public declarations of 1 Corinthians 15, because women were not considered reliable witnesses in 1st-century Jewish society. Yet they are featured prominently in the actual narrative in all of the gospels - as the group that did not abandon Jesus at the cross, and as the first witnesses to the risen Christ. What kind of conspiracy does this? Why have the first witnesses to the resurrection be a class of people that society considers unreliable? Why include them in the story at all, if you're not going to publicly mention them among the chief witnesses?

If it's all true, then all this makes sense. But as a conspiracy theory, each one is a mystery. One can construct a conspiracy theory that fits all this, but such a conspiracy would be a rare one indeed, and highly unlikely a priori.

We will go over more reasons against conspiracies next week.

You may next want to read:
A common mistake in Bayesian reasoning
How to determine the specific purpose of the universe

### Some vague, unoriginal thoughts about the election

[...]
Surely the nations are like a drop in a bucket;
they are regarded as dust on the scales;
he weighs the islands as though they were fine dust.
Lebanon is not sufficient for altar fires,
nor its animals enough for burnt offerings.
Before him all the nations are as nothing;
they are regarded by him as worthless
and less than nothing.
[...]
Do you not know?
Have you not heard?
Has it not been told you from the beginning?
Have you not understood since the earth was founded?
He sits enthroned above the circle of the earth,
and its people are like grasshoppers.
He stretches out the heavens like a canopy,
and spreads them out like a tent to live in.
He brings princes to naught
and reduces the rulers of this world to nothing.
No sooner are they planted,
no sooner are they sown,
no sooner do they take root in the ground,
than he blows on them and they wither,
and a whirlwind sweeps them away like chaff.
[...]
Why do you complain, Jacob?
Why do you say, Israel,
“My way is hidden from the Lord;
my cause is disregarded by my God”?
Do you not know?
Have you not heard?
The Lord is the everlasting God,
the Creator of the ends of the earth.
He will not grow tired or weary,
and his understanding no one can fathom.
He gives strength to the weary
and increases the power of the weak.
Even youths grow tired and weary,
and young men stumble and fall;
but those who hope in the Lord
will renew their strength.
They will soar on wings like eagles;
they will run and not grow weary,
they will walk and not be faint.

[...]
Do not put your trust in princes,
in human beings, who cannot save.
When their spirit departs, they return to the ground;
on that very day their plans come to nothing.

Blessed are those whose help is the God of Jacob,
whose hope is in the Lord their God.
He is the Maker of heaven and earth,
the sea, and everything in them—
he remains faithful forever.
He upholds the cause of the oppressed
and gives food to the hungry.
The Lord sets prisoners free,
the Lord gives sight to the blind,
the Lord lifts up those who are bowed down,
the Lord loves the righteous.
The Lord watches over the foreigner
and sustains the fatherless and the widow,
but he frustrates the ways of the wicked.

The Lord reigns forever,
your God, O Zion, for all generations.
Praise the Lord.

You may next want to read:
Human laws, natural laws, and the Fourth of July
Religious freedom and religious accommodations

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 31)

Let us examine this general class of theories, which postulate a near-total interdependence in the evidence against them. What kind of theories are they? What are their properties? Is it fair to characterize them as "crackpot" theories?

Now, note that such theories require a conspiracy of some kind, almost by definition. Near-total interdependence means that what appeared to be many pieces of evidence was really just controlled by a singular false entity, which manufactured all the other pieces of evidence. Whether this source was a group of disciples or an elite Roman secret society or some space aliens or whatnot doesn't particularly matter - all such theories share the following traits.

The first thing to note about such theories is that they have very low prior probabilities to begin with. Indeed, among those skeptical of Christ's resurrection, a theory of this type is almost never their first choice. Few people want to be labeled a conspiracy theorist, after all. The skeptics want the resurrection testimonies to have been produced "naturally". They'll invoke known social phenomena such as myth generation over a long time, or religious fervor or delusion. They want such ordinary explanations to be a plausible way to generate the resurrection testimonies. Of course, what we've demonstrated thus far is that such explanations are in fact not plausible - that they face a Bayes' factor of more than 1e54.

Maybe some people will say that they'd rather be a conspiracy theorist than believe in the resurrection. But even so, such people only say this as a backup, while still trying to argue for a more ordinary explanation.

So, conspiracy theories and other similar hypotheses have low prior probabilities, even in the minds of skeptics. This is appropriate, as conspiracies are in fact very rare.

Secondly, these 'near-total interdependence of evidence' theories are designed to ignore the evidence. They are chosen precisely because they allow their adherents to say "but that's exactly what they want you to think!" to any evidence you bring against them. It's important to note that this is not an accidental, fortuitous property of these theories. 'Near-total interdependence of evidence' is the defining feature of such theories, and it's precisely that feature which allows them to dismiss all the evidence which would weigh against more likely theories.

Taken together, the above two facts mean that such theories cannot really hope to win the day. They start with a low prior, so they need evidence to raise that probability - but they are designed mostly to ignore evidence.

Note that, when a conspiracy theorist ignores evidence by saying "that's exactly what they want you to think!", this doesn't actually help the theory. It merely turns a piece of evidence against the theory into no evidence. Yes, the conspiracy theory has "explained" the evidence, but only about as well as the rival theory. The Bayes' factor therefore stays around 1, meaning nothing has changed on that front, and the probability for the conspiracy theory remains at its low prior value.
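To make the "Bayes' factor stays around 1" point concrete, here is a minimal Python sketch of the odds form of Bayes' theorem. The prior odds of 1e-6 and the factor of 0.01 are hypothetical numbers chosen only for illustration, and the function name `posterior_odds` is my own:

```python
import math

def posterior_odds(prior_odds: float, bayes_factors: list[float]) -> float:
    """Odds form of Bayes' theorem: multiply the prior odds by each Bayes' factor.

    Assumes the pieces of evidence are independent, so their factors multiply.
    """
    return prior_odds * math.prod(bayes_factors)

# Hypothetical prior odds of a conspiracy theory versus its rival (illustration only).
prior = 1e-6

# Evidence the rival theory explains 100x better than the conspiracy does:
print(posterior_odds(prior, [0.01]))

# The same evidence after "that's exactly what they want you to think!":
# both theories now explain it about equally well, so the factor is ~1.
print(posterior_odds(prior, [1.0]))
```

Dismissing a piece of evidence moves its factor from something small up to roughly 1, but that only stops the loss; it never pushes the odds above the starting prior.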

But such evidence does still hurt the conspiracy theory, because the prior probability itself is now lower. A greater conspiracy that explains more - one that is more vast, has planted more evidence, and has covered it up better - is a priori less likely to have come about than a lesser conspiracy. So a piece of evidence that the conspiracy has to dismiss does still hurt the theory. The hope of the conspiracy theorist is that this harm to the prior probability will be smaller than the exponential rate of harm that fully independent pieces of evidence would normally cause.
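A rough way to see the two rates of harm side by side is to track the log10 odds as pieces of evidence pile up. This sketch assumes, purely for illustration, that each independent piece of evidence would carry a Bayes' factor of 0.1 against the conspiracy, and that each piece the conspiracy must instead absorb halves its prior; none of these numbers are estimates from this series, and the function names are hypothetical:

```python
import math

def log10_odds_if_independent(log10_prior: float, n: int, bf_each: float) -> float:
    """Each independent piece of evidence multiplies the odds by bf_each."""
    return log10_prior + n * math.log10(bf_each)

def log10_odds_if_absorbed(log10_prior: float, n: int, prior_penalty_each: float) -> float:
    """The conspiracy "explains" each piece (Bayes' factor ~1), but must grow to
    do so, shrinking its prior by prior_penalty_each per piece."""
    return log10_prior + n * math.log10(prior_penalty_each)

# Hypothetical numbers, for illustration only:
LOG10_PRIOR = -6     # assumed starting log10 odds of the conspiracy
BF_EACH = 0.1        # assumed factor if a piece of evidence counted fully against it
PENALTY_EACH = 0.5   # assumed prior shrinkage per piece the conspiracy must absorb

for n in (0, 5, 10, 20):
    independent = log10_odds_if_independent(LOG10_PRIOR, n, BF_EACH)
    absorbed = log10_odds_if_absorbed(LOG10_PRIOR, n, PENALTY_EACH)
    print(f"n={n:2d}: independent {independent:7.2f}, absorbed {absorbed:7.2f}")
```

Under these assumed numbers the absorbing strategy loses about 0.3 in log10 odds per piece rather than 1, so it declines more slowly, but it still only declines.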

So the most such a theory can realistically hope for is a kind of non-total loss, where they lose less quickly and hope to say "at least it's not impossible!" at the end.

Now, there are very particular kinds of evidence that do help them - the ones that specifically demonstrate a conspiracy. Something like a document from a secret meeting that lays out the nefarious master plan would work. But, of course, for the vast majority of these theories, such evidence does not exist.

So, given all these traits - given that they are highly unlikely theories that are designed to ignore the evidence, with little chance at any positive evidence for them - I think it's fair to call them crackpot theories.

In fact, in the specific case of Christ's resurrection, the situation is even worse for these theories - for there are many factors within the resurrection testimonies that are highly effective in working against them. We will examine these in the next post.

You may next want to read:
"Simon, son of John, do you love me?"
The limits of science as evidence for Christianity

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 30)

Let us recall our purpose in collecting these non-Christian stories about a "resurrection": we wanted to verify our Bayes' factor for the evidence of Christ's resurrection. My claim is that it's at least 1e54.

The first part of our plan was to find the non-Christian resurrection story with the most evidence behind it. If we make the naturalistic assumption about these stories, we can then say that this level of evidence corresponds approximately to a Bayes' factor of 1e9. By virtue of having the most evidence, such a resurrection story would have narrowed the field down to itself - one case - out of the approximately 1e9 reportable deaths in history.

As it turned out, the "resurrections" of Krishna and Aristeas had the most evidence behind them, amounting to roughly 1/24th of the evidence for Christ's resurrection. According to our plan, this must be assigned a Bayes' factor of roughly 1e9. Then 24 times that amount of evidence would correspond to raising the Bayes' factor to the 24th power - meaning that the evidence for Christ's resurrection has a Bayes' factor of... 1e216.
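The arithmetic is simplest in log space, where raising a Bayes' factor to a power becomes a multiplication. A minimal sketch, taking the 1e9 calibration and the 24x evidence ratio from the text as given (the variable names are my own):

```python
# From the text: the best-attested non-Christian story (about 1/24th of the
# evidence) is calibrated to a Bayes' factor of ~1e9, i.e. one case out of
# ~1e9 reportable deaths.
LOG10_BF_BEST_RIVAL = 9
EVIDENCE_RATIO = 24  # Christ's resurrection has ~24x that evidence

# 24 times the evidence means raising the Bayes' factor to the 24th power,
# which in log10 space is simply multiplying the exponent by 24.
log10_bf_christ = EVIDENCE_RATIO * LOG10_BF_BEST_RIVAL
print(f"Bayes' factor ~ 1e{log10_bf_christ}")  # Bayes' factor ~ 1e216
```

Working with exponents rather than the raw numbers also keeps such calculations clear of floating-point overflow if the factors grow even larger.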

So yes, that does verify that the Bayes' factor is "at least 1e54". It furthermore demonstrates how much of an underestimate that value is. Recall that, in a slightly different context, I mentioned that the full odds for the resurrection would be far in excess of 1e100, and that our values for the Bayes' factors were drastic underestimates. All that is verified by this completely different methodology, of comparing with non-Christian resurrection stories.

But that's not all. This comparison also provides yet another layer of verification, in that it allows us to check the Bayes' factor of 1e8 for a disciple's testimony about Christ's resurrection. You see, among the non-Christian resurrection stories we've seen, there was not a single case of a person making an earnest, insistent testimony about someone rising from the dead. That says something about the strength and rarity of such testimonies. Granted, we have not investigated every existing non-Christian resurrection story - but if such a testimony really has a Bayes' factor of 1e8, there should be about ten such testimonies (1e9 reportable deaths divided by 1e8) for us to find. The fact that we have not found a single one puts a lower bound of roughly 1e8 on the Bayes' factor. As usual, some nitpicking is possible, depending on whether you think there are hundreds or thousands of non-Christian resurrection stories. But it's unlikely for any of that to change the value of 1e8 by more than a couple of orders of magnitude. So our estimate of the strength of the disciples' testimony has now also been verified.
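One way to make the "about ten such testimonies" reasoning precise is to treat naturalistic occurrences of such testimony as rare, independent events, so that the number found follows a Poisson distribution. That modeling choice is my assumption, not part of the original argument, and the sketch ignores the fact that not every story has been surveyed:

```python
import math

N_DEATHS = 1e9  # approximate number of reportable deaths in history (from the text)

def expected_testimonies(bayes_factor: float) -> float:
    """Expected number of naturalistic cases reaching this evidence level,
    assuming such a testimony arises with probability ~1/bayes_factor per death."""
    return N_DEATHS / bayes_factor

def prob_none_found(bayes_factor: float) -> float:
    """Poisson probability of finding zero such cases (assumed model)."""
    return math.exp(-expected_testimonies(bayes_factor))

for bf in (1e6, 1e7, 1e8, 1e9, 1e10):
    print(f"BF={bf:.0e}: expected {expected_testimonies(bf):g}, "
          f"P(none found) = {prob_none_found(bf):.2g}")
```

For a Bayes' factor of 1e8 the expected count is 10 and the chance of finding none is about e^-10, roughly 5e-5; factors much below 1e8 make an empty search essentially impossible, which is why the empty search acts as a lower bound.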

We can now be very confident that Jesus rose from the dead. Our previous calculation which first gave us this confidence has now been verified in multiple ways, using completely different methodologies - by double-checking with the historical background of non-Christian resurrection stories. Everything checks out, and all the numbers are in harmony.

But... all this has been computed under the assumption that there isn't any extreme dependence in the disciples' testimonies. We've already accounted for "normal" dependence, like ordinary social pressure or group conformity. But we have not yet accounted for the possibility that the entire set of testimony about Jesus's resurrection might have been engineered to be in agreement by some unknown force. That is to say, we've been discounting crackpot theories - like a conspiracy by the disciples to steal Jesus's body, or an alien mind-controlling all the witnesses to the resurrection.

Ignoring such theories is fine and good, as long as both sides of the debate agree to dismiss them. Most doubters of the resurrection do not subscribe to these extreme theories, so carrying out our calculations in this way up to this point was still productive. However, they now face a double-checked Bayes' factor exceeding 1e54 for the resurrection. This makes the posterior probability against the resurrection so tiny that the small prior probability assigned to crackpot theories now seems much larger in comparison. Someone set on disbelief can no longer ignore these theories. Indeed, they have no other choice: they must fully embrace these crackpot theories.
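The way a huge Bayes' factor promotes crackpot theories can be seen with three competing hypotheses. In this sketch only the 1e54 factor comes from the text; the priors are hypothetical, and treating a crackpot theory as fitting the evidence about as well as the resurrection does (relative likelihood ~1) is an assumption based on its "near-total interdependence" design:

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Turn unnormalized posterior weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Likelihood of the evidence relative to the resurrection hypothesis (set to 1).
# The 1e54 factor is from the text; the crackpot value of ~1 is an assumption.
relative_likelihood = {
    "resurrection": 1.0,
    "ordinary explanation": 1e-54,
    "crackpot theory": 1.0,
}

# Hypothetical priors, for illustration only.
prior = {
    "resurrection": 1e-20,
    "ordinary explanation": 1.0,
    "crackpot theory": 1e-30,
}

posterior = normalize({h: prior[h] * relative_likelihood[h] for h in prior})
for h, p in posterior.items():
    print(f"{h}: {p:.3g}")
```

With these assumed numbers the crackpot theory starts with a prior 1e30 times smaller than the ordinary explanation, yet ends with a posterior about 1e24 times larger, so it becomes the only live alternative to the resurrection.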

We will begin to address such theories starting next week.

You may next want to read:
On becoming a good person
Human laws, natural laws, and the Fourth of July

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 29)

So, let us summarize these non-Christian accounts of a resurrection. For each supposedly "resurrected" person, the following table shows the level of evidence associated with their resurrection account, expressed as a fraction of the evidence we have for Christ's resurrection:

| Name of the person | The level of evidence |
|---|---|
| Apollonius of Tyana | 1/30th |
| Zalmoxis | 1/60th |
| Aristeas | 1/24th |
| Mithra | 0 |
| Osiris | 1/60th |
| Dionysus | 1/60th |
| Krishna | 1/24th |
| Bodhidharma | 1/60th |
| Puhua | 1/60th |
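The counts per evidence level can be tallied directly from the table. Here is a minimal Python sketch that prints a text histogram, using exact fractions so that levels like 1/60 and 1/30 are not blurred by floating-point rounding; the variable names are my own:

```python
from collections import Counter
from fractions import Fraction

# Evidence levels from the table, as fractions of the evidence for Christ's resurrection.
evidence = {
    "Apollonius of Tyana": Fraction(1, 30),
    "Zalmoxis": Fraction(1, 60),
    "Aristeas": Fraction(1, 24),
    "Mithra": Fraction(0),
    "Osiris": Fraction(1, 60),
    "Dionysus": Fraction(1, 60),
    "Krishna": Fraction(1, 24),
    "Bodhidharma": Fraction(1, 60),
    "Puhua": Fraction(1, 60),
}

# Count how many accounts fall at each evidence level.
counts = Counter(evidence.values())
for level in sorted(counts):
    print(f"{str(level):>5}: {'#' * counts[level]}")
```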

Here's what this looks like in a histogram:

What does all this tell us? Quite a bit - we'll discuss that starting next week.

You may next want to read:
How should we interpret the Bible? Look at it as scientific data.
Why are there so few Christians among scientists? (part 2)

### Questions and answers: my career change to data science

I've been working as a data scientist for some time now. This is a change for me. My background is in physics, and my previous work was mostly in education, teaching in the STEM fields. So this is an exciting period in my life. Career changes often are, and data science in particular is an exciting field, having been called "the sexiest job of the 21st century" - a description that I'll not complain about.

All this excitement has led to people asking a number of questions about my job - in fact, I recently ran into an old friend who was contemplating a similar kind of career change, and I told her that I loved my new job. She then asked me the following questions:

1) Why do you love being a data scientist? Why does the job fit you like a glove?
2) What do you do day-to-day? What's a typical week like?
3) What were the challenges starting out? What are they now, some time later?
4) What are the best and worst parts of the job?
5) What are the top 3 actions you took to successfully land a job in data science?
6) If you could give career advice to yourself from a couple of years ago, or to a fellow scientist or mathematician now, what would it be?
7) Anything else I'm missing?

1) Why do you love being a data scientist? Why does the job fit you like a glove?

Doing data science is like all the fun parts of doing physics, without any of the drudgery. Iteration times are much faster. If I'm curious about something, I can just look in the data, instead of having to go get the liquid nitrogen, and wait for the vacuum system to pump down, and for the coarse approach to finish, etc. I can typically produce something meaningful in hours or days, instead of weeks or months.

I never was very good in the lab, with all the physical equipment. I was better in a more theoretical, abstracted setting. I felt more at home when sitting in front of a computer, or thinking about a math problem on a whiteboard, rather than working with physical lab equipment. Collecting physical data is hard - it's what ends up taking all the time and effort in physics. But in data science, there's already so much data out there that much more of my time could be spent on exploring, processing, analyzing, and acting on that data. I can do things like that for hours or days on end. So a data science job fits well with my talents and temperament.

The lengthy and time-consuming nature of physical data collection was another part of science that I didn't particularly like. It was hard to stay motivated - I can work hard on a problem for an extended period of time, but if there was no real sign of progress after a few days of focused effort, that would be very discouraging to me. This happened to me frequently in physics, and it probably ultimately prevented me from doing well in grad school.

By contrast, data science projects generally give me some real return on my investment within hours of putting in the effort. In addition, working in industry as opposed to academia means that the results of my work impact the real world much more quickly. This higher rate of reward for my efforts helps me stay motivated and is much more suited for my personality.

On a more big-picture level, I like the idea that my primary job is to think soundly about data. That, in the abstract, is a profound activity, closely connected to the depth of the nature of the universe. In that sense I still feel that I am trying to "know the thoughts of God", in much the same way that I felt when I was exploring the "big questions" of physics.

2) What do you do day-to-day? What's a typical week like?

I usually do some programming, pull data with SQL, make graphs to visualize the data, attend meetings, etc. These pieces get put together to answer a data question that's relevant to my employer, like "How effective is this feature in our business in serving our customers?" or "Which of our potential customers would benefit the most from which of our offerings?" I usually work on answering several of these questions per week, with some of the larger questions taking several weeks to answer completely.

There is also an element of teaching and education, as part of the role of a data scientist is helping others understand what the data is saying and what actions we can take in response to it. This is great for me, as it scratches my itch for teaching and personal interactions - more things that I couldn't often get while sitting in the basement in a physics lab.

3) What were the challenges starting out? What are they now, some time later?

Coming from a background in physics, there were some new things I had to pick up. Obviously, I had to ramp up on the programming end. I had picked up a smattering of obscure, little-used programming languages throughout my physics career, but I finally learned more useful languages like Python or JavaScript some time before switching to data science. Apart from general programming, there are also specific sets of modules used by data scientists, and I had to become proficient in those as well. I also picked up some statistics.

In all these things my physics background served me well. A great deal of learning to do data science was just learning to implement the things that I could already think of to do, in a specific development environment. But the foundation of the necessary abstract concepts and the mathematical tools were already in place.

Now that I've been in the field for a while, I'm still challenged to continually improve and add to my skill set. I see more of what's out there now - there's so much to branch out to that it's hard to say what I should learn next. There's also the other, more nebulous, non-technical questions that I have to answer, like "how can I maximize the impact that my work will have in my company?" or "how should I split my time between exploring new ideas and doing what others have asked of me?" I didn't have much time to contemplate these questions when I first started, but I find that they're becoming more important now.

4) What are the best and worst parts of the job?

My favorite part of the job is probably something that I've mentioned above - the idea that I'm sharpening my data interpretation skills, that my job is basically to think soundly about data.

It's also great to see my ideas make a difference in the real world very quickly. People are having a different experience right now because of what I've done. My co-workers come to me for help with the data, and my recommendations determine how they do their job and how our customers interact with our product.

I also like the regular, but flexible, hours for my job. I don't have to worry about being "on-call" like some software engineers do, where they have to get up and do work at 3am on a weekend when the website totally breaks or something. I work during normal working hours, but I can easily switch things around, like coming in late one morning and staying late another day.

There really aren't that many bad parts. I guess I sometimes end up waiting for the computer to process things, which you sometimes have to do when you're looking at billions of rows of data. Sometimes the work can get a little boring, when I have to repeat a similar, routine analysis. Some of my work ends up not having an impact, either because other people ignore it or because the conclusions end up not being actionable - and that's annoying for the same reason that making a difference is satisfying. But overall, the good definitely outweighs the bad.

5) What are the top 3 actions you took to successfully land a job in data science?

First, I learned a great deal of programming on my own. I picked up Python and JavaScript, because the school I was working at asked me to teach these, and because I wanted to do more with my blog. Some of these pre-career change projects can be found on the other posts on this blog. The programming skills, combined with the quantitative mindset I already had from my science background, positioned me well for a data science job.

Second, I applied to a data science boot camp. Insight is a well-known program, as is The Data Incubator. There are many others. Seriously, if you're wondering about a career in data science, apply to one of these programs now. Just the application process taught me a ton of things, ranging from "what does a data scientist do?" to "how do I scrape a webpage?" to "what specific tools do I need to become proficient in?" It will answer a lot of the questions that any potential future data scientist would have.

Third, I applied to jobs. Usual job seeker advice here - be tenacious, send to multiple employers, continue to practice interviewing, etc. If you've learned your craft well, you're tremendously valuable to potential employers - so believe in yourself and keep at it.

6) If you could give career advice to yourself from a couple of years ago, or to a fellow scientist or mathematician now, what would it be?

I'm happy with how things turned out - and I see now that I was already on this trajectory from 2 years ago - so I'd tell myself to keep going. As for anyone with a science or mathematics background contemplating a career change, I'd say go for it. At least, try applying to one of those boot camps.

I'd also say this, on a more "big picture" level: I think that computing, and data science in particular, is the pre-eminent field of our age, in our current moment in human history. I first got into physics because of people like Einstein or things like space travel or nuclear power - but these are very much 20th century endeavors. I still love physics, and there are certainly still some interesting activities there - but computing and data are where all the really exciting things are now happening.

7) Anything else I'm missing?

I can't think of anything really "missing" from the questions above - so here's a summary instead. I love my new job. I love my new career, considered as its own academic field. It has a number of varied characteristics that all fit my traits and needs exceedingly well: the required talents and skills, the reward schedule, the mix of group and solitary activities, the working hours, the connection with all other fields of study, etc. Of course, every job is bound to have some downsides, but I'm very content overall - I couldn't expect much more from a new career. If anyone is thinking about a career in data science, I'd encourage them to at least give it a serious pursuit.

You may next want to read:
How to make a fractal
Basic Bayesian reasoning: a better way to think (Part 1)

### Questions and answers: my career change to data science (Part 2)

(continued from the previous post)

4) What are the best and worst parts of the job?

My favorite part of the job is probably something that I've mentioned above - the idea that I'm sharpening my data interpretation skills, that my job is basically to think soundly about data.

It's also great to see my ideas make a difference in the real world very quickly. People are having a different experience right now because of what I've done. My co-workers come to me for help with the data, and my recommendations determine how they do their job and how our customers interact with our product.

I also like the regular, but flexible, hours for my job. I don't have to worry about being "on-call" like some software engineers do, where they have to get up and do work at 3am on a weekend when the website totally breaks or something. I work during normal working hours, but I can easily switch things around, like coming in late one morning and staying late another day.

There really aren't that many bad parts. I guess I sometimes end up waiting for the computer to process things, which you sometimes have to do when you're looking at billions of rows of data. Sometimes the work can get a little boring, when I have to repeat a similar, routine analysis. Some of my work ends up not having an impact, either because other people ignore it or because the conclusions end up not being actionable - and that's annoying for the same reason that making a difference is satisfying. But overall, the good definitely outweighs the bad.

5) What are the top 3 actions you took to successfully land a job in data science?

First, I learned a great deal of programming on my own. I picked up Python and JavaScript, because the school I was working at asked me to teach these, and because I wanted to do more with my blog. Some of these pre-career change projects can be found on the other posts on this blog. The programming skills, combined with the quantitative mindset I already had from my science background, positioned me well for a data science job.

Second, I applied to a data science boot camp. Insight is a well-known program, as is The Data Incubator. There are many others. Seriously, if you're wondering about a career in data science, apply to one of these programs now. Just the application process taught me a ton of things, ranging from "what does a data scientist do?" to "how do I scrape a webpage?" to "what specific tools do I need to become proficient in?" It will answer a lot of the questions that any potential future data scientist would have.

Third, I applied to jobs. Usual job seeker advice here - be tenacious, send to multiple employers, keep practicing interviewing, etc. If you've learned your craft well, you're tremendously valuable to potential employers - so believe in yourself and keep at it.

6) If you could give career advice to yourself from a couple of years ago, or to a fellow scientist or mathematician now, what would it be?

I'm happy with how things turned out - and I see now that I was already on this trajectory from 2 years ago - so I'd tell myself to keep going. As for anyone with a science or mathematics background contemplating a career change, I'd say go for it. At least, try applying to one of those boot camps.

I'd also say this, on a more "big picture" level: I think that computing, and data science in particular, is the pre-eminent field of our age, in our current moment in human history. I first got into physics because of people like Einstein or things like space travel or nuclear power - but these are very much 20th century endeavors. I still love physics, and there are certainly still some interesting activities there - but computing and data are where all the really exciting things are now happening.

7) Anything else I'm missing?

I can't think of anything really "missing" - so here's a summary instead. I love my new job, and my new career, considered as its own academic field. It has a number of varied characteristics that all fit my traits and needs exceedingly well: the required talents and skills, the reward schedule, the mix of group and solitary activities, the working hours, the connection with all other fields of study, etc. Of course, every job is bound to have some downsides, but I'm very content overall - I couldn't expect much more from a new career. If anyone is thinking about a career in data science, I'd encourage them to at least give it a serious pursuit.

(consolidated in the next post)

You may next want to read:
Sherlock Bayes, logical detective: a murder mystery game
The want of a mate

### Questions and answers: my career change to data science (Part 1)

I've been working as a data scientist for some time now. My background is in physics, and my previous work was mostly in education, teaching in the STEM fields. This is an exciting period in my life. Career changes often are, and data science in particular has been called "the sexiest job of the 21st century" - a description that I'll not complain about.

All this excitement has led to people asking a number of questions about my job - in fact, I recently ran into an old friend who was contemplating a similar kind of career change, and I told her that I loved my new job. She then asked me the following questions:

1) Why do you love being a data scientist? Why does the job fit you like a glove?
2) What do you do day-to-day? What's a typical week like?
3) What were the challenges starting out? What are they now, some time later?
4) What are the best and worst parts of the job?
5) What are the top 3 actions you took to successfully land a job in data science?
6) If you could give career advice to yourself from a couple of years ago, or to a fellow scientist or mathematician now, what would it be?
7) Anything else I'm missing?

1) Why do you love being a data scientist? Why does the job fit you like a glove?

Doing data science is like all the fun parts of doing physics, without any of the drudgery. Iteration times are much faster. If I'm curious about something, I can just look in the data, instead of having to go get the liquid nitrogen, and wait for the vacuum system to pump down, and for the coarse approach to finish, etc. I can typically produce something meaningful in hours or days, instead of weeks or months.

I never was very good in the lab, with all the physical equipment. I was better in a more theoretical, abstracted setting. I felt more at home when sitting in front of a computer, or thinking about a math problem on a whiteboard, rather than working with physical lab equipment. Collecting physical data is hard - it's what ended up taking all the time and effort in physics. But in data science, there's already so much data out there that much more of my time could be spent on exploring, processing, analyzing, and acting on that data. I can do things like that for hours or days on end. So a data science job fits well with my talents and temperament.

The lengthy and time-consuming nature of physical data collection was another part of science that I didn't particularly like. It was hard to stay motivated - I can work hard on a problem for an extended period of time, but if there was no real sign of progress after a few days of focused effort, that would be very discouraging to me. This happened to me frequently in physics, and it probably ultimately prevented me from doing well in grad school.

In contrast, data science projects generally give me some real return on my investment within hours of putting in the effort. In addition, working in industry as opposed to academia means that the results of my work impact the real world much more quickly. This higher rate of reward for my efforts helps me stay motivated and is much more suited for my personality.

On a bigger-picture level, I like the idea that my primary job is to think soundly about data. That, in the abstract, is a profound activity, closely connected to the deep nature of the universe. In that sense I still feel that I am trying to "know the thoughts of God", in much the same way that I felt when I was exploring the "big questions" of physics.

2) What do you do day-to-day? What's a typical week like?

I usually do some programming, some SQL data pulls, visualize data by making graphs, attend meetings, etc. These pieces get put together to answer a data question that's relevant to my employer, like "How effective is this feature in our business in serving our customers?" or "Which of our potential customers would benefit the most from which of our offerings?" I usually work on answering several of these questions per week, with some of the larger questions taking several weeks to answer completely.

There is also an element of teaching and education, as part of the role of a data scientist is helping others understand what the data is saying and what actions we can take in response to it. This is great for me, as it scratches my itch for teaching and personal interactions - more things that I couldn't often get while sitting in the basement in a physics lab.

3) What were the challenges starting out? What are they now, some time later?

Coming from a background in physics, I had some new things to pick up. Obviously, I had to ramp up on the programming end. I had picked up a smattering of obscure, little-used programming languages throughout my physics career, but I finally learned more useful languages like Python and JavaScript some time before switching to data science. Apart from general programming, there are also specific sets of modules used by data scientists, and I had to become proficient in those as well. I also picked up some statistics.

In all these things, my physics background served me well. A great deal of the work was just learning to implement, in a specific development environment, things that I could already think of doing. The foundation of abstract concepts and mathematical tools was already in place.

Now that I've been in the field for a while, I'm still challenged to continually improve and add to my skill set. I see more of what's out there now - there's so much to branch out to that it's hard to say what I should learn next. There are also other, more nebulous, non-technical questions that I have to answer, like "how can I maximize the impact that my work will have in my company?" or "how should I split my time between exploring new ideas and doing what others have asked of me?" I didn't have much time to contemplate these questions when I first started, but I find that they're becoming more important now.

(to be continued in the next post)

You may next want to read:
Basic Bayesian reasoning: a better way to think (Part 1)

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 28)

(Continued from the previous post)

#### Miscellaneous thoughts

Here are a few more assorted thoughts:

I still think that you're too afraid of large odds. For example, my gut feeling is that 99.99% is far too small a limit on how certain we can be in history. I mean, we can make meaningful, almost empirical statements about everyone who has ever existed, which is on the order of 1e11 people. That should tell you that odds of something like 1e11 are not extraordinary. In fact, if we are quite certain of our statement about everyone who has lived, we can add a few more orders of magnitude. I still mostly stand by my statement that a probability is not "too large" unless its log is "too large", although I have adjusted this "too large" value downwards somewhat. Or perhaps a probability is not "too large" unless you can no longer meaningfully get at it empirically.

On a related note, you can see that using the skeptical prior value of 1e-22 for the resurrection is not just showmanship. Clearly, a prior value of 1e-11 is justified, in that Christianity is essentially claiming that Jesus was unique in all of world history. You'll probably want another 2 orders of magnitude on that to future-proof the claim against possible increases in the human population. You'll then want another 4 orders to at least reach the 99.99% mark for "certain within the limits of history". That already brings us to 1e17 as the necessary Bayes' factor, which, incidentally, seems to be about what you come up with in your own summary. I'm just requiring another 5 orders of magnitude on top of that to really put the nail in the coffin, and to cover any unseen contingencies. You can call that "just showmanship" if you'd like, but it serves the honest purpose for which I have constructed it in my series: a prior smaller than any that a skeptic can justifiably ask for, which can still be overcome by the evidence.
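Since multiplying odds is the same as adding exponents, the build-up to the 1e-22 prior can be tallied in a few lines. This is a minimal sketch; the step labels are my own shorthand for the reasons given above, and the exponents are the ones stated in the text.

```python
# Each step multiplies the required odds by a power of ten,
# so the bookkeeping is just a running sum of log10 exponents.
steps = [
    ("Jesus unique in all of world history (~1e11 people)", 11),
    ("future-proofing against population growth", 2),
    ("reaching 99.99%, 'certain within the limits of history'", 4),
    ("extra margin for unseen contingencies", 5),
]

running = 0
for label, exponent in steps:
    running += exponent
    print(f"{label:<60} -> 1e{running}")

needed_before_margin = sum(e for _, e in steps[:3])  # the 1e17 Bayes' factor
skeptical_prior_exponent = -running                  # the 1e-22 prior
```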

There are also a few things to say about conspiracies. For example, I think that at sufficiently large N values, conspiracies scale worse than independent testimonies do. Everyone telling the truth scales exponentially, as you said, but a conspiracy has an additional factor: everyone's story has to match everyone else's, and everyone has to get along with everyone else. That means there's an N! factor against conspiracies. This is why we rightly consider any hypothesis involving "a vast conspiracy" to be a crackpot theory. Fortunately, we don't have to worry about this kind of calculation, given that we have the historical data.
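To see why an N! factor eventually dominates any exponential, here is a small sketch that finds where N! first overtakes c^N. The per-person factor `c` is a hypothetical stand-in for exponential scaling, not a value from the discussion; the point is only that a crossover exists for every fixed `c`.

```python
import math

def crossover_n(c: int) -> int:
    """Smallest N for which N! exceeds c**N.

    c is a hypothetical per-person factor standing in for exponential
    scaling; N! stands in for the everyone-must-match penalty.
    Since N!/c**N grows without bound, this loop always terminates.
    """
    n = 1
    while math.factorial(n) <= c ** n:
        n += 1
    return n

for c in (2, 10, 100):
    print(f"c = {c}: N! overtakes c**N at N = {crossover_n(c)}")
```

Larger `c` pushes the crossover out, but never removes it, which is the sense in which conspiracies scale worse at large N.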

Lastly, I'm in total agreement with you that we should follow Truth wherever it leads. So I'm very glad for this discussion.

#### Summary of the issues

Here's a summary of the issues you brought up, and where I stand on them:

I agree that numbers like 1e300, 1e100, or even 1e54 are too large to be final, overall probability values, as they're beyond the limit of how certain we can be. I'm also backing away from my final value of 1e32 as the probability for the resurrection, which thankfully I did not state too forcefully or too often.

I still think 1e54 or 1e44 can be used as Bayes' factors, given that they are the factors between two specific hypotheses and not the factors for the whole "true" or "false" hypothesis in aggregate. They should not be thrown out just for being too large.

I still think that 1e8 is a good estimate for the Bayes' factor for a human testimony; in fact, our discussion here has considerably strengthened my confidence in it. I can perhaps be talked down to 1e7, or to 1e6 in certain circumstances, but as a rough, order-of-magnitude estimate, 1e8 is a perfectly serviceable value.

I agree that the "license plate effect" is real, and it has a number of fascinating and important implications. But it does not really affect my calculated Bayes' factor of 1e54. Instead, its main function is to increase the Bayes' factor for a human testimony when it applies. This is what allows us to do things like believing that the Gospels, as a whole, are reliable.

Many of the testimonies for the resurrection can be considered independent, until they successfully knock out all "reasonable" priors, leaving only things like conspiracy theories as the leftovers. I agree that at this point, the possibilities for strongly correlated testimonies must be considered.

I agree that the odds for the resurrection are still very high even after considering things like conspiracy theories.

#### Farewell

Thank you again for taking the time to read my series and replying to me, Aron. Your reply was intellectually stimulating, and very useful!

(The series on the resurrection will continue on with this post.)

You may next want to read:
The principle of least awesomeness
The dialogue between two aliens who found a book on Earth

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 27)

(Continued from the previous post)

#### Planning out the stages of the whole argument

So, all that gives us the Bayes' factor of 1e54. Now, as I've said, I'm okay with its large magnitude, with the stipulation that this is for a certain model evaluated under some well-justified degree of independence. Given its large value, Jesus would clearly have risen from the dead under these conditions. I also think it's clear that almost no amount of partial dependence, such as ordinary social pressures, can nullify this value. So I do think that this large Bayes' factor is useful in these broad cases.

But as you've pointed out, we now need to worry about low-probability hypotheses that would wipe out that Bayes' factor by asserting a near-total dependence of the evidence. That is to say, we need to consider hypotheses which are specifically constructed to allow for ignoring the evidence, like conspiracy theories. I had initially planned to simply dismiss things like conspiracy theories at the end, but this conversation with you has convinced me that I need to address them.

So here's my plan. I'll group the spectrum of prior possibilities for the "no resurrection" hypothesis into the following regimes, in order of decreasing probability:

1. Largely independent testimonies
2. Partially dependent testimonies
3. Near-total dependency in testimonies (conspiracy theories, other theories specifically designed to allow for ignoring the evidence)
4. Epistemological obliteration (brain in a vat, multiverse, all just a dream, you can never know anything, possibilities you can't even think of, etc.)

I do plan on just dismissing that last one at the end. Possibilities 1 and 2 are taken care of by the enormous Bayes' factor of 1e54. So now we must start considering possibility 3.

#### Future of the series

I think I can argue convincingly against hypotheses like conspiracy theories in the upcoming posts of my series. In particular, the part of the series on resurrection stories from non-Christian sources is not just a "Christianity is better than these others" compilation. It's a way to double-check the Bayes' factor, and to provide protection against hypotheses like a conspiracy theory. If, in fact, a conspiracy (or alien interference, or malicious spirits, or whatever) can produce results like Christianity, then over the course of world history you would expect it to have done so at least once before, or at least to have come somewhat close. But empirically, no such results exist. None even come remotely close. Christianity is a distinct outlier.

You said, in attempting to estimate a Bayes' factor from historical data, that "For the kinds of skeptical reasons I stated above, it would be hard to get this much above 10^11 by itself since then we run out of the ability to check how many potential parallels there are." But you can check not only how many parallels there are, but how close they come. Assuming independence, you can even put precise numbers on the Bayes' factor involved by measuring the degree to which Christianity is an outlier. Even without independence, you can definitively say that the Bayes' factor was at a minimum around 10^11, and likely a good deal larger.
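One standard way to turn "zero parallels in N chances" into a number is the statistical "rule of three": if an event never occurs in N independent trials, a 95% upper bound on its per-trial rate is about 3/N. This is my illustration, not a method from the series, and N = 1e11 (roughly everyone who has ever lived) is an assumption.

```python
import math

def rate_upper_bound(n_trials: float, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound on a per-trial rate after zero
    events in n_trials independent trials, i.e. solve (1 - p)**n = alpha.
    expm1 keeps precision when p is tiny."""
    return -math.expm1(math.log(alpha) / n_trials)

n = 1e11  # assumed number of people who have ever lived
p = rate_upper_bound(n)
print(f"95% upper bound on the rate: {p:.2e} (about 3/N)")
print(f"order-of-magnitude odds against such a mechanism: ~{1 / p:.1e}")
```

The bound comes out a factor of about 3 below 1/N, which is consistent with the "at a minimum around 10^11" order-of-magnitude claim; measuring how close the parallels come, rather than just counting them, is what could push it higher.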

That's the great thing about arguing from empirical, historical records. You can bypass all the calculations about the probability of conspiracies, or exactly what kind of dependence the testimonies might have had, or whether it was some other kind of hypothesis that generated this dependence. All of that automatically gets incorporated into the historical data at its actual historical values, and all we have to do is read off the final result.

Anyway, that's my plan for the future of the series. I'll look at the historical comparisons, then use them to argue that the Bayes' factor of 1e54 is about correct assuming mostly independent testimonies. Furthermore, the historical comparisons allow us to say that Jesus very likely rose from the dead even after all the other hypotheses, such as a conspiracy, are taken into account. That would probably be a good time to reiterate that the 1e54 was for a specific model, and that the final probability value would be less extreme in reality, but still plenty high.

If you look at how I write my blog, you'll see that I generally make a final compilation post at the end of a long, multi-part series, where I clean things up and maintain it as the final link. There, I'll probably rearrange the material so that the issues you brought up are resolved from the beginning. I do want that final post to be good, and clear of sloppiness and error.

So if you would share your thoughts on the future of the series, I would be grateful.

(To be continued in the next post)

You may next want to read:
Interpreting Genesis 1 by looking through John 1
For Christmas: the Incarnation

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 26)

(Continued from the previous post)

#### The "license plate effect", and its applicability to my calculations

Now, I acknowledge that your "license plate effect" is in fact real - that 1e8 can be split between the "license plate effect" and the remaining "human honesty factor". But for the examples that I provided, I disagree that the split between these factors is as extreme as 1e6 to 1e2. I mean, if you've won the lottery, there generally aren't 1e6 other things that you could have chosen to lie about that would be equally interesting.

But more importantly, the exact split doesn't matter. The examples from which I calculated that 1e8 value were all specifically chosen to be similar to the disciples' testimony for Christ's resurrection. They are all special, interesting, positive claims - indeed, in each case they're probably among the most interesting things that the person could talk about. So, as long as this similarity holds, my full value - both the "license plate effect" and the "human honesty factor" - is applicable to the disciples' testimony. As Bayes' rule says, the posterior odds equal the prior odds times the Bayes' factor. And that Bayes' factor doesn't care whether certain parts of it came from certain effects. The entire Bayes' factor applies, as long as it was calculated for a similar situation. This is why I'm not too concerned about, say, your counterexample of "[naively slapping] 8 orders of magnitude on a 1:1 odds proposition": that does not correspond to the scenario that the disciples faced.
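In the odds form of Bayes' rule the factor is a single multiplier, so however it is split into sub-effects, the posterior comes out the same. A minimal sketch; the prior odds of 1e-6 are a hypothetical value chosen only for illustration.

```python
import math

def posterior_odds(prior_odds: float, *bayes_factors: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds x Bayes' factor(s)."""
    return prior_odds * math.prod(bayes_factors)

prior = 1e-6  # hypothetical prior odds for a special, interesting claim

whole = posterior_odds(prior, 1e8)          # factor applied in one piece
split = posterior_odds(prior, 1e6, 1e2)     # 'license plate' x 'honesty' split
other = posterior_odds(prior, 1e4, 1e4)     # any other split of the same 1e8

print(whole, split, other)
```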

Now, this "license plate effect" and the "human honesty factor" do interact in interesting ways in cases of multiple testimonies. I'm not sure if this is what you were getting at when you said that "if I'm right about the 8 = 6 + 2 split, you can only discount that 6 once". But I believe something like that can happen, depending on the degree of dependence between the multiple testimonies.

This is how it would work out (using your numbers): if Peter testifies that Jesus rose from the dead, then you should give his testimony the full 1e8 = 1e6 * 1e2 Bayes' factor - exactly as it worked out in my numerous examples. But if you then turn to John and ask "hey, is Peter telling the truth?" and John answers "yes", John should only get the 1e2, because he has now been put in a different kind of situation than Peter - one more akin to answering a 1:1 odds proposition.

But if you then have someone completely random burst into the room afterwards - let's say Paul - who says "guys! Jesus rose from the dead!", then that testimony should again get the full 1e8 Bayes' factor, because it was made independently from the other two.

So this again turns into a question of independence. Fortunately, I do think there is a strong case to be made for independence among the three named witnesses I used - Peter, James, and Paul. I mean, you yourself gave a 1e6 factor to Paul's conversion because it was so unexpected that an enemy of Christianity would make such a drastic turnaround after an encounter with the risen Christ. That's a factor you gave on top of the fact that people claimed to have seen the risen Christ, as an expression of the strong anti-correlation (beyond mere independence) you'd expect between Paul's testimony and the others'. That was an additional factor on top of what's in my calculations, and it amply covers any possible dependence between Peter and James. As for the remaining testimonies in 1 Corinthians 15, I did severely discount them to account for dependence. So I'm quite comfortable with my 1e54 value, with the aforementioned caveat that we're not yet considering things like conspiracy theories.

#### The main effect of the "license plate effect"

Now, as I said, I disagree with your 1e6:1e2 split for the "license plate effect", and with the implication that this effect mainly serves to weaken the witness testimonies. I think that its more important function, by far, is to immensely strengthen the testimonies to which it applies. It works to "cancel out any amount of low prior probability", in your words - or to empower the testimony with a Bayes' factor of something like 1e120, as in my example of recording a chess game.
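For a sense of where a number like 1e120 comes from, here is Claude Shannon's well-known back-of-the-envelope count of chess games: about 1e3 possibilities per pair of moves over a typical 40 move pairs. This is an assumption standing in for the original chess example, whose derivation isn't reproduced here; a testimony that matches the actual recorded game singles out one sequence among roughly that many.

```python
# Shannon-style estimate (assumed, not the series' own derivation):
# ~1e3 possibilities per pair of moves, ~40 move pairs per game.
OPTIONS_PER_MOVE_PAIR = 10**3
MOVE_PAIRS = 40

sequences = OPTIONS_PER_MOVE_PAIR ** MOVE_PAIRS  # exact integer
exponent = len(str(sequences)) - 1                # log10 of an exact power of ten
print(f"distinct game records: ~1e{exponent}")
```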

So I'm very glad you mentioned the effect, because it was somewhat foggy in my own mind, and it allows us to do things like justify the remainder of the stories in the Gospel accounts. These other stories just get filed under "more details" once the resurrection is accepted, whereas there's no way to cover the prior on all those stories with just a Bayes' factor of 1e8.

#### The Bayes' factor for a human testimony

But at the end of this discussion on the strength of a single testimony, I still pretty much stand by my 1e8 number. I think 1e7 is also quite reasonable, and it can drop to something like 1e6 in circumstances where the possibility of lying is distinctly real (e.g. claiming to an aid agency that a loved one died on 9/11, or when calculating how many times you've been lied to). Maybe I could be convinced to use 1e6 as a lower bound, instead of 1e8 as a likely value. In any case it would not really affect my series, as I've said that a human testimony is "within a couple of orders of magnitude of my answer", and even the lower value is large enough to overwhelm any possible prior against the resurrection.

As one more confirmation of that 1e8 number, take a look at this video - it shows a woman's reaction to an acquaintance claiming to have won the lottery. Now, did that woman seem like a gullible idiot to you? I didn't feel that way. She starts off quite skeptical, but not dismissively skeptical. You can then see the man's sincerity working on her. Her degree of belief is clearly fairly close to even odds right before the numbers are confirmed. I think her overall reaction is pretty rational. Now, there are some small differences between the video and my examples. For instance, she knows that there's a winner out there, and the man making the claim is already an acquaintance - but on the other hand, this result is achieved with little effort on the man's part, taking only minutes of insistence. The man being an acquaintance also reduces the "license plate effect". On the whole, you can see her mind being pulled through a Bayes' factor of something like 1e6 within mere minutes, in good accord with rationality, in a situation pretty similar to what I described in my examples. So 1e8 for something like the disciples' testimony about the resurrection is quite reasonable, and remains the best value to use.

(To be continued in the next post)

You may next want to read:
The intellect trap

### Bayesian evaluation for the likelihood of Christ's resurrection (Part 25)

And now, time for a short interlude in this series.

I've been in communication with Aron Wall of Undivided Looking. It's a great blog that people should check out, and it covers much of the same subject matter as my blog. I've asked him for feedback on my series, and he graciously replied with a very lengthy post. There are some things in my series that he disagreed with, some things in his reply that I disagreed with, and some things that came up that merited further discussion - so I thought it best to reply in a few of my own posts.

#### Greetings

Aron,

Thanks for taking the time to read over my series and post your reply. It's been very helpful. Your reply was in fact a great deal more than what I had anticipated, so I thought it appropriate to respond in a few of my own posts.

#### Things I learned

Your explanation for exceedingly small probabilities being unreasonable, due to the need to take even crackpot theories into account at such extremes, makes sense. I gladly abandon my suggestion, in the comments of your blog, that scientific laws or the existence of the Roman empire can be asserted with 1e100+ odds. I was actually pretty uncomfortable with saying those things, but I couldn't put my finger on why I felt that way at the time - so this is in fact a relief.

In my own series, I have not strongly asserted that the probability I obtained (1e32) is definite. So I will take your suggestion not to take this value seriously, and will instead use it to demonstrate that Jesus plainly rose from the dead if we make a set of likely assumptions. I will handle the other, very unlikely cases separately.

Fortunately, I'm still in the middle of my series, so I can add in things like this without too much trouble. For that reason, I'm glad we're having this discussion now, rather than after the end of the series.

#### Probabilities and Bayes' factors

As for a great deal of your other comments - I think there's some confusion between a probability and a Bayes' factor. Both of us have used words like these somewhat imprecisely, and clarifying our usage will resolve a number of things. So:

That 1e54 factor I cited is a Bayes' factor, not a probability. It is a ratio of in-model probabilities - and each model may just be a simple, computable hypothesis (e.g. "a random shuffle") or a complex aggregate thereof ("all the different possible ways that a deck of cards may be shuffled"). As such, I more or less stand by that number and am not bothered at all by its large magnitude - with the caveat that the value is for a specific model, and not for the resurrection hypothesis in the aggregate. Essentially, I'm saying that I'm ignoring crackpot theories for the time being (conspiracy theories, malicious spirits playing jokes on us, etc.). As a Bayes' factor, that 1e54 should not be thrown out simply because it is too large, although it is unreasonable to use it to calculate a final, aggregated probability. If I understood your point correctly, this should not be a problem.
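The deck-of-cards models make a good illustration of why a huge Bayes' factor between two specific models is not alarming in itself. This sketch assumes a hypothetical observation, a deck that comes out in perfect factory order, and compares a "random shuffle" model against a "never shuffled" model.

```python
import math

# Hypothetical observation: the deck is in perfect factory order.
p_given_random_shuffle = 1 / math.factorial(52)  # every ordering equally likely
p_given_no_shuffle = 1.0                         # an unshuffled deck is always in order

# Bayes' factor = ratio of the in-model probabilities of the observation
bayes_factor = p_given_no_shuffle / p_given_random_shuffle
print(f"Bayes' factor ~ {bayes_factor:.2e}")
```

A factor near 1e68 between these two models is perfectly reasonable, even though nobody should be 1e68-to-1 certain of the final, aggregate conclusion about the deck.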

The five sigma probability of 1e-6, although I called it a probability, should also really be interpreted as a Bayes' factor. The words get in the way here, because in null hypothesis significance testing, that five sigma value is a probability. But in the Bayesian framework it should really be a Bayes' factor between the null hypothesis and the "whatever result you got" hypothesis.
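For reference, the five sigma figure comes from the tail of a standard normal distribution. The sketch below computes it with the standard library; the exact one-sided value is about 2.9e-7 (roughly 1 in 3.5 million), which the text rounds to an order of 1e-6.

```python
import math

def one_sided_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p5 = one_sided_tail(5)
print(f"5 sigma, one-sided: {p5:.2e} (about 1 in {1 / p5:,.0f})")
print(f"5 sigma, two-sided: {2 * p5:.2e}")
```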

#### 1e8 as the Bayes' factor for a human testimony

Furthermore, the 1e8 factor for the strength of a human testimony is also a Bayes' factor, not a probability. It is NOT saying that people lie only 1 out of 1e8 times, which would be clearly absurd as you pointed out. Rather, it's saying that your prior odds should be adjusted by that much based on a human testimony.

This difference resolves your counterexample of "how many times in my life have I been lied to?" Yes, there have been 1e5 situations where someone was tempted to lie, and you've maybe been lied to around 1e3 times. That works out to a lying rate of 1e-2, or posterior odds of truth to lie of 1e2.

But each of these lies probably had a typical prior odds of 1e-3 ("I got into a car accident", "I got locked out of my house", "I went to Harvard", etc). You yourself have said that one in a million events happen all the time, so a prior odds of 1e-3 is not at all out of the ordinary. This is especially the case when someone is making a positive assertion that something happened ("I once bowled a perfect game"), rather than trying to deny something or get out of something ("I have to wash my hair").

So, that's prior odds of 1e-3 and posterior odds of 1e2, which gives us a Bayes' factor of 1e5. The other 3 orders of magnitude can be made up by adding the "earnest, sincere" condition, and by the fact that we're specifically talking about scenarios where people are tempted to lie to you. So the value of 1e8 as the Bayes' factor for a human testimony is in good accord with your lying example.
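The same arithmetic, written out by rearranging posterior odds = prior odds x Bayes' factor. The counts are the rough figures from the example above; treating the remaining 3 orders of magnitude as a flat additive exponent follows the text's own accounting.

```python
import math

tempted = 1e5   # situations where someone was tempted to lie
lied = 1e3      # times actually lied to
posterior_odds = (tempted - lied) / lied  # truth : lie, about 1e2
prior_odds = 1e-3                         # typical prior odds of the claimed event

# Solve the odds form of Bayes' rule for the factor
bayes_factor = posterior_odds / prior_odds
base_exponent = round(math.log10(bayes_factor))  # about 1e5

extra_orders = 3  # 'earnest, sincere' condition plus tempted-to-lie selection
print(f"implied factor ~1e{base_exponent}; with the extra orders ~1e{base_exponent + extra_orders}")
```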

That 1e8 number also still bears out through your license plate example. Would you doubt someone who's earnestly, sincerely claiming to have a license plate like 6DVL666? If the Bayes' factor for a human testimony is really about 1e8, you should have some doubt, but not a very strong doubt, that this person was telling the truth - this seems to be about the right level of skepticism required here. I don't think, for example, you'd find 10,000 or even 1000 liars to a single truth-teller in this scenario.

I'm quite sure of this because all the examples I gave in my series - winning the lottery, being struck by lightning, having a PhD in physics from Harvard, having a loved one killed in 9/11, etc. - all involve cases like the 6DVL666 license plate, and not like the 4ZIW623 license plate. They're all special, interesting, positive claims that someone might want to choose to lie about, where all the improbability is in the main claim and not in the details - and I still get values around 1e8. So in every example that I have examined or you have brought up, this value seems to hold up.
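To put a rough number on the "improbability in the details" of a plate like 4ZIW623, this sketch counts plates in the format suggested by the two examples. The format (one digit, three letters, three digits, every combination allowed) is an assumption inferred from 6DVL666 and 4ZIW623, not an official rule.

```python
import re
import string

# Assumed plate format: digit, three uppercase letters, three digits.
PLATE_PATTERN = re.compile(r"[0-9][A-Z]{3}[0-9]{3}")

DIGITS = len(string.digits)               # 10
LETTERS = len(string.ascii_uppercase)     # 26
n_plates = DIGITS * LETTERS**3 * DIGITS**3

def matches_format(plate: str) -> bool:
    """True if the plate fits the assumed format."""
    return PLATE_PATTERN.fullmatch(plate) is not None

print(f"plates in this format: {n_plates:,}")
print(matches_format("6DVL666"), matches_format("4ZIW623"))
```

Under this assumption any one specific plate has prior odds of roughly 1 in 1.8e8, yet we readily believe someone who reports an ordinary plate like 4ZIW623, which is the "license plate effect" at work.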

(To be continued in the next post)

You may next want to read:
Orthodoxy vs. living out the Gospel: which is more important?
Adam and Eve were historical persons. Who were they? (Part 1)