## Blog pages

### The two envelopes problem and its solution

A job I was looking at had a requirement that read: "Inability to stop thinking about the two envelopes problem unless you’ve truly come to peace with an explanation you can communicate to us". So I thought I'd post my explanation for the problem.

The setup to the problem goes like this:
You have two indistinguishable envelopes in front of you. They both contain money, but one envelope has twice as much money as the other.
You get to choose one of the envelopes to keep. Since the envelopes are indistinguishable, you have a 1/2 chance of having chosen the one with more money.
But now, after you've picked an envelope but before your choice becomes finalized, you are given the opportunity to switch to the other envelope. Should you make the switch?
Now, one sensible and easy reply is to say that you shouldn't bother. The envelopes are indistinguishable and you have no idea which one contains more money. Your chances of getting the bigger payout remain 50-50 regardless of your choice.

But now, a wild statistician appears, and makes the following argument:
"Let's say, for the sake of argument, that the envelope you have now contains \$20. Then the other envelope might contain \$40, or \$10. Since these two possibilities are equally likely, your expectation value after switching would be half of their sum (0.5*\$40 + 0.5*\$10), or \$25. That's 25% more than the \$20 you have now.
But if we think about this more, the initial choice of \$20 actually doesn't matter. You can make the same argument for any possible value of money in your envelope. You'll always gain 25% more on average by switching. So, even without knowing the amount of money in your envelope now, you should switch."
Impressed by the wild statistician's use of numbers and such, and figuring that even if he's wrong you would at worst break even, you decide to make the switch. But then, as you're about to finalize your decision and take the new envelope home, the statistician repeats exactly the same argument, word for word. "Let's say, for the sake of argument..." He's now urging you to switch BACK to your original envelope. After all, the two envelopes are indistinguishable. If there is a rational reason to switch the first time, the same reason must equally apply for switching the second time. But at this point, it becomes obvious that if you continued to listen to the wild statistician, you would do nothing but switch the two envelopes for all eternity.

That can't possibly be the right choice. Now, here is the real two envelopes problem: something must be wrong with the wild statistician's argument - but what exactly is the nature of his error?

The solution to the problem goes as follows:

If we start by assuming there's \$20 in your envelope, it is NOT equally likely that the other envelope contains \$40 or \$10. This is where the wild statistician goes wrong. In general, given a value x in your current envelope, it is NOT equally likely for the other envelope to contain 2x or x/2.

Before we get more mathematical, let's examine the problem intuitively, by grounding it in a solid example. Say that you're on a television game show, and you're playing this two envelopes game. You know that American TV game shows typically give prizes from hundreds to tens of thousands of dollars. Now, if the host of the show lets you know that your envelope contains \$50, should you switch? I certainly would. I know that, given the typical payout of TV shows, the two envelopes were more likely set up to contain \$100 and \$50 rather than \$50 and \$25. The two probabilities are NOT EQUAL.

On the other hand, imagine that you're a high school statistics student, and your teacher is playing this two envelopes game with you for a class lesson. Your envelope contains the same \$50 as in the previous example. Should you make the switch? No way. You seriously think your teacher put \$100 in the other envelope to give to a high school student, for a single lesson? If your teacher has 5 statistics classes, he stands to lose up to \$500 on that one lesson - likely far exceeding his pay for the day. It is much more likely that your teacher chose \$50 and \$25 for the values rather than \$100 and \$50. Again, the two probabilities are NOT EQUAL.

Now, if the two probabilities were equal, then the wild statistician would be right, and you should switch. And you should continue to do so as long as the probabilities remained equal. But the problem described by that situation is not the two envelopes problem. It's actually a 50-50 bet where if you win, you double your money, but if you lose, you only lose half your money (compare that to most casino games, where you lose your entire bet). If you find a game like that, you should continue playing it for a very long time.
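To see why such a game would be worth playing, note that each round multiplies your money by 2 or by 0.5 with equal probability, for an expected multiplier of 0.5*2 + 0.5*0.5 = 1.25 per round. A small sketch, computing the exact expectation by enumeration (the round count is just an illustrative choice; note that the expectation is driven by rare huge outcomes, while the typical outcome roughly breaks even):

```python
from math import comb

def expected_multiplier(n_rounds):
    """Exact expected wealth multiplier after n rounds of a fair
    double-or-half bet: each round multiplies wealth by 2 or by 0.5
    with probability 1/2 each."""
    total = 0.0
    for k in range(n_rounds + 1):  # k = number of "double" rounds
        prob = comb(n_rounds, k) * 0.5 ** n_rounds
        total += prob * (2.0 ** k) * (0.5 ** (n_rounds - k))
    return total

# Each round is worth 1.25x in expectation, so n rounds are worth 1.25^n.
print(expected_multiplier(10))  # ≈ 9.31, i.e. 1.25**10
```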

But for the two envelopes problem, the chances of either doubling or halving your money are generally not equal. This will be true for ANY reasonable probability distribution of possible values of money in the envelopes. "Reasonable" here means that the probability distribution must sum to one, and that it must have a finite expectation value. Consider any of the following probability distributions (or any other reasonable distribution you wish to think up) for the money in the envelopes:
The orange line is the probability distribution for the smaller amount of money in one of the envelopes. The green line is the probability distribution for double that value, in the other envelope - it's been stretched horizontally by 2 to represent the doubling, and compressed vertically by 0.5 to keep the probability normalized. You see that the two probabilities are equal (where the lines cross) only for very rare, special amounts of money. In general, if you see a small amount of money in your envelope, you're more likely to have the "smaller" of the two envelopes, and if you see lots of money, you're more likely to have the "greater" of the two. You should be able to understand this intuitively, in conjunction with the game show / statistics teacher examples given above.
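This can be made numeric. The sketch below assumes, purely for illustration, an exponential prior f(x) = e^(-x) for the smaller amount; then g(x) = 0.5 f(0.5x), and the two densities cross only at the single point where e^(-x) = 0.5 e^(-x/2), i.e. x = 2 ln 2:

```python
import math

def f(x):
    """Illustrative prior for the smaller amount: exponential with rate 1."""
    return math.exp(-x)

def g(x):
    """Density of the larger amount (double the smaller): stretched
    horizontally by 2, compressed vertically by 0.5 to stay normalized."""
    return 0.5 * f(0.5 * x)

def p_smaller_given_x(x):
    """Probability that your envelope is the smaller one, given it holds x."""
    return 0.5 * f(x) / (0.5 * f(x) + 0.5 * g(x))

crossing = 2 * math.log(2)            # the only point where f and g are equal
print(p_smaller_given_x(crossing))    # exactly 0.5 at the crossing
print(p_smaller_given_x(0.5))         # small x: probably the smaller envelope
print(p_smaller_given_x(5.0))         # large x: probably the larger envelope
```

The two conditional probabilities are equal only at the crossing point; everywhere else, the amount you observe carries information about which envelope you hold.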

Whether you should switch or not depends on the expectation value of the money in the envelopes. If the amount in the "smaller" envelope is A, then the amount in the "greater" envelope would be 2A, and the expectation value for choosing them with 50-50 chance would simply be 3A/2. Since the envelopes are indistinguishable, this is in fact the expectation value of choosing either one, so it doesn't matter which one you choose. This is nothing more than the original, simple argument presented at the very beginning.
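This symmetry is easy to check by simulation. The prior on the smaller amount below (uniform on [10, 100], so its mean is 55 and the expectation of either envelope is 3(55)/2 = 82.5) is an arbitrary illustrative choice:

```python
import random

random.seed(0)
n_trials = 200_000
kept, switched = 0.0, 0.0
for _ in range(n_trials):
    a = random.uniform(10, 100)   # smaller amount, drawn from an arbitrary prior
    envelopes = [a, 2 * a]
    pick = random.randrange(2)    # choose an envelope at random
    kept += envelopes[pick]
    switched += envelopes[1 - pick]

# Both averages estimate 3A/2 (= 82.5 here): switching gains nothing on average.
print(kept / n_trials, switched / n_trials)
```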

However, what if the wild statistician insists on putting the problem in terms of expected gain conditioned on the different possible values of the money in your current envelope? This is how his original flawed argument was framed. It's an overly complicated way of thinking about the problem, but shouldn't we also be able to come to the correct solution this way?

We can. (Beware, calculus ahead) Let:

x = amount of money in your current envelope,
f(x) = probability distribution of the money in the "lesser" envelope, and
g(x) = probability distribution of the money in the "greater" envelope.

Then f(x) can be completely general, but g(x) = 0.5 f(0.5x) due to the stretch/compression transformations. Also, the overall distribution for the amount in your current envelope, given that you chose one of the two envelopes with equal chance, is:

p(x) = 0.5( f(x) + g(x) ).

Then, the expectation value for switching is given by the following integral:

Expectation value for switching = ∫ E(x) p(x) dx

Where E(x) is the expectation value of switching when the money in your current envelope is x. This is given by:

E(x) = x * p("smaller" envelope|x) - 0.5x * p("greater" envelope|x)

That is to say, upon switching, you'll gain x if you currently have the "smaller" envelope, but lose 0.5x if you currently have the "greater" envelope. Furthermore, the p("smaller" envelope|x) and p("greater" envelope|x) values can easily be calculated by the definition of conditional probability as follows,

p("smaller" envelope|x) = 0.5 f(x) / p(x),
p("greater" envelope|x) = 0.5 g(x) / p(x)

noting that the numerator corresponds to getting a specific envelope AND a specific x value.

Putting this all together, we get:

Expectation value for switching = ∫ E(x) p(x) dx =

∫ (x * 0.5 f(x)/p(x) - 0.5x * 0.5 g(x)/p(x)) p(x) dx = 0.5 ∫ (x f(x) - 0.5x g(x)) dx =
0.5 ( ∫ x f(x) dx - ∫ 0.5x g(x) dx )

However,

∫ 0.5x g(x) dx = ∫ 0.5x 0.5 f(0.5x) dx = ∫ 0.5x f(0.5x) 0.5 dx = ∫ u f(u) du = ∫ x f(x) dx

Where we used a u-substitution and took advantage of the fact that the integral goes from 0 to infinity in the last two steps. Therefore:

Expectation value for switching = ∫ E(x) p(x) dx = 0.5 ( ∫ x f(x) dx - ∫ x f(x) dx ) = 0.5 * 0 = 0

So there is no expected gain or loss from switching, which is the same conclusion we reached at the very beginning.
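As a sanity check, the integral can also be evaluated numerically for a concrete choice of f, say the same illustrative exponential f(x) = e^(-x); a simple midpoint Riemann sum over [0, 50] suffices, since the tails are negligible:

```python
import math

f = lambda x: math.exp(-x)        # illustrative prior on the smaller amount
g = lambda x: 0.5 * f(0.5 * x)    # induced density of the larger amount

# Midpoint Riemann sum of 0.5 * integral of (x f(x) - 0.5x g(x)) dx over [0, 50]
dx = 1e-3
total = 0.0
x = dx / 2
while x < 50:
    total += 0.5 * (x * f(x) - 0.5 * x * g(x)) * dx
    x += dx

print(total)  # ≈ 0: no expected gain or loss from switching
```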

You may next want to read:
The intellect trap
Basic Bayesian reasoning: a better way to think (Part 1)
A common mistake in Bayesian reasoning

1. Consider two possible solutions:

1) Assume that your envelope contains \$A, and that the other is equally likely to contain either \$2A, or \$A/2. So the expected value of the other is (\$2A)/2+(\$A/2)/2 = \$5A/4.

2) Assume that the two envelopes combined contain \$T, and it is equally likely that yours contains \$T/3 and the other contains \$2T/3, or yours contains \$2T/3 and the other contains \$T/3. The expected value of yours is (\$T/3)/2+(\$2T/3)/2 = \$T/2, and expected value of the other is (\$2T/3)/2+(\$T/3)/2 = \$T/2.

The math in these two solutions is identical, yet they get different answers. So at least one (to be completely unbiased) must be wrong.

Since the only differences are in the assumptions made to set up the math, one must make a bad assumption. The assumptions about the values (\$A and either \$2A or \$A/2, and split between \$T/3 and \$2T/3) are trivially correct. The only other assumption is "equally likely." It must be wrong in at least one of the solutions.

Your reasons for claiming it is wrong in the first solution sound reasonable, but because they are based on subjectivity, they are not provable. There is an argument that is provable. Money comes in integer multiples of some base unit; call it \$1 here. If A=1, it must be the smaller value; that is, the other envelope contains \$2 with 100% probability. Even if you could arrange distributions where \$2A and \$A/2 are equally likely at other values, this counterexample proves the assumption to be incorrect in general. So solution #1 is wrong.

But the assumption of equal likelihood is trivially correct in solution #2.

1. I'm afraid you're trivializing the problem.

Essentially, your reasoning boils down to "solution 1 could be right, or solution 2 could be right. Since they disagree, and since solution 2 is right, solution 1 must be wrong." But this is not actually addressing the two envelopes problem, it's just restating it. From the beginning, we already know that solution 1 is wrong. We already know that solution 2 is right. The problem starts by saying so. The challenge is to explain why.

I've seen some other people try similar things - for instance, by stating that since the wild statistician's argument could be easily reversed to argue that you shouldn't switch, it leads to a contradiction, where you both should and should not switch - and therefore he must be wrong. Again, this is just restating the problem; we already know that his reasoning leads to a contradiction. The problem is to explain what exactly causes it to be wrong.

The same goes for your argument that you can't have half of one dollar. It attempts to trivialize the problem based on a technicality. It's easy to think of a workaround, where the amount of money is infinitely subdivisible (or close enough to the point where it doesn't matter). Other such trivialization attempts include saying "well, there's no way that two envelopes can be actually completely indistinguishable, especially if they actually contain different things", or "a uniform probability over the integers is non-normalizable and therefore impossible". Again, the issue at the heart of the problem is to explain the exact nature of the wild statistician's error, not to catch him on a technicality.

I furthermore don't understand what you mean when you say that my solution is based on subjectivity. My solution is completely general - it works for ABSOLUTELY ANY possible probability distribution of money in the envelope, and I have demonstrated that this is in fact the case. After this identification of the exact nature of the wild statistician's error, and showing why he cannot be right, I furthermore go on to explain how he should handle the problem, if he were to stick to his approach but fix his mistake. That is how you know I've solved the problem - not only did I identify his mistake, but I've understood it so thoroughly that I've also explained how to correct it.

In short, you've done well to understand the problem. You've even gone a little bit further and said that the problem is with the "equally likely" statement. But now, it's time to actually understand the solution.

2. I thought I was pretty clear in saying that because different answers were produced, at least one, and maybe both, solutions are wrong. The point was to stop focusing on what one believes to be right based on superficial thought, and focus instead on what could be wrong based on better scrutiny. Also note that this addresses your thesis, which is not the solution, per se, but the comparison of competing solutions. (Also, I apologize for any bad typing - I had elbow surgery last week and am typing with only my left hand.)

People develop "blind spots" in simple (meaning few possibilities, not trivial) probability problems, that make them overlook the most basic assumptions they make. For example, in the famous Monty Hall Problem, they overlook that the host had to choose a door to open. It is the different set of choices he had, depending on where the prize is, that drive the solution. Initially picking the right door leaves him two choices, while picking a wrong one leaves only one. Once he reveals his choice, the odds are thus (1/2):1 that you are in the first situation.
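The Monty Hall claim in the comment above is easy to verify by simulation. The sketch below collapses the host's behavior into its consequence (switching wins exactly when the initial pick was wrong, because the host always opens a non-picked, non-prize door):

```python
import random

random.seed(2)
n_trials = 100_000
stay_wins = switch_wins = 0
for _ in range(n_trials):
    prize = random.randrange(3)   # door hiding the prize
    pick = random.randrange(3)    # contestant's initial pick
    # The host opens a non-picked, non-prize door, so switching wins
    # if and only if the initial pick was wrong.
    stay_wins += (pick == prize)
    switch_wins += (pick != prize)

print(stay_wins / n_trials, switch_wins / n_trials)  # ≈ 1/3 vs 2/3
```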

In this problem, the math in both solutions is inarguably correct, given the cases each reduces the problem to, and the probabilities assigned to those cases.

In both solutions, the cases, while different from each other, are also inarguably correct. Probability is an unusual field in math, because the problem setup is just as dependent on the rules of the field as the problem itself. All you need do is partition the problem into a set of non-overlapping, but complete, events. Both solutions clearly do that.

The last step is to assign probabilities to each case. This is where the blind spot comes in. You can't just assert "each is equally likely," you have to prove it. The blind spot is that "I chose the lesser envelope" is not the same thing as "the specific value A is the lesser value." The partition in the latter case does not have to be equally likely.

My point is that my solution 2 is the best way to compare the solution that leads to the paradox to one that doesn't. Not that it is the most rigorously correct. I identified (as you agreed to in your last paragraph) where the error must be, showed that solution 1 is wrong in the discrete case, and why solution 2 does not make that error. This is enough to discredit solution 1, and to satisfy your original thesis.

You are right, that it fails in the continuous case. (BTW, your approach to this was subjective because it was based on the subjective assessment of {\$25,\$50} vs. {\$50,\$100} on a game show, the subjective assessment of what a High School teacher could afford, and what a "reasonable probability distribution" would be in the continuous case). Still, it points the way.

Let A and B be random variables representing the values in the two envelopes, before you pick. Each needs a probability density function, f(a) and g(b). But we must assume f(x)=g(x), since the envelopes are indistinguishable.

They aren't independent. We must have f(x)=f(x/2)/2+f(2x)/2. It isn't hard for a mathematician to show that f(x)=constant over [0,inf) is the only solution, and that is not a valid probability function. Hence, solution 1 can never be right.

3. Did you not see that my game show / high school student examples were only a motivating scenario to guide your thinking? Or that I've explicitly generalized that example to cover ANY possible probability distribution? Or that I've directly performed the expectation value integration over the a COMPLETELY GENERAL probability distribution function to fix the mistake and arrive at the correct answer?

It's like you've read a third of the way into the post, stopped, then made your comments. No wonder you prefer your restatement of the problem to my solution.

2. You had simply asserted that "If we start by assuming there's \$20 in your envelope, it is NOT equally likely that the other envelope contains \$40 or \$10." And you never proved it (keep reading). You supported this assertion with subjective claims which, because of the subjectivity, a casual reader might doubt.

My "motivating scenario" was only to prove that it is true for that casual reader. That was all I was trying to do; avoid the mere assertion of this truth, and demonstrate it instead. Not to address what I suspected to be, and turns out to be, an incorrect (again, keep reading) analysis that I admit I had only skimmed, but was willing to let pass.

Yes, incorrect. It may surprise you to know that there are distributions where, if you assume a value x in your envelope, the expected value in the other envelope is greater. That's why I suspected an error. Look it up on Wikipedia. The reason it is not a paradox is because the expected value of x, before you assume its value, is infinite. So the question "Should I switch?" asked before you know x, is comparing two infinite values and cannot be answered. Asked after, it theoretically allows the envelope to contain more money than what exists.
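The distribution alluded to here can be made explicit. In the standard example (usually attributed to Broome; it is the one discussed in the Wikipedia article the comment references), the envelope pair is (2^n, 2^(n+1)) with probability (1/3)(2/3)^n for n = 0, 1, 2, .... A short check of its properties:

```python
def p_pair(n):
    """Probability that the envelopes hold (2**n, 2**(n+1))."""
    return (1 / 3) * (2 / 3) ** n

# For an observed amount x = 2**k with k >= 1, you hold either the
# smaller member of pair k or the larger member of pair k-1.
def p_smaller_given_k(k):
    return p_pair(k) / (p_pair(k) + p_pair(k - 1))

def expected_other_over_x(k):
    """E[other envelope] / x, given you observe x = 2**k (k >= 1)."""
    p = p_smaller_given_k(k)
    return p * 2 + (1 - p) * 0.5

for k in range(1, 5):
    print(k, p_smaller_given_k(k), expected_other_over_x(k))
# p_smaller is always 0.4, so E[other] = 1.1x for every observed x >= 2:
# switching always looks favorable. This is only possible because the
# expectation of the amount in either envelope is infinite.
```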

The incorrect part is the "distribution" functions g(x) = 0.5 f(0.5x). Not true with discrete distributions, which really are what you should use. Since g describes the "greater" value, and f the "lesser," g(x)=f(0.5x). With continuous, it is true that the probability DENSITY functions satisfy your equations. But then your analysis is just the continuous version of the discrete analysis in my solution #2.

1. You said:
"It may surprise you to know that there are distributions where, if you assume a value x in your envelope, the expected value in the other envelope is greater."

This does not in fact surprise me, because I've already covered it in the original post, when I said:
"Now, if the host of the show lets you know that your envelope contains \$50, should you switch? I certainly would. I know that, given the typical payout of TV shows, the two envelopes were more likely set up to contain \$100 and \$50 rather than \$50 and \$25."

Furthermore, when you say:
"The reason it is not a paradox is because the expected value of x, before you assume its value, is infinite."

This is also already accounted for in the original post, when I said:
""Reasonable" here means that the probability distribution must sum to one, and that it must have a finite expectation value."

We can address the other flaws in your solution later. But first, you need to actually read and understand what I write. You've clearly not read or not understood the parts of the original post that I cited above. Without this first basic skill, you'll not be able to understand the rest of the arguments necessary for our discussion.

2. Well, it’s obvious you aren’t going to actually read what I say for what is said. Maybe I just didn’t say it completely enough – but that makes things too long. For example, that distribution I mentioned applies to *ANY* x in the distribution, not a specific x based on the subjective assessment of typical values you describe.

You have clearly not read or understood the main point I’ve been making: that regardless of whether your later analysis is correct, it requires more analysis than can be reasonably provided in the context you yourself provided: a job interview.

Mine can.

3. I've now cited multiple instances of where you completely misunderstood what I've written, whereas everything you've said thus far is merely a poor parroting of my thoughts.

I don't think you're really trying anymore - but if you are, let's try again to focus on the parts of the original post that you still seem to be unable to read.

Do you see where I said, "Before we get more mathematical, let's examine the problem intuitively, by grounding it in a solid example"? My examples were merely the preliminaries to the actual math. The specific values of X provided there do not matter for the actual argument.

Do you see where I said, "This will be true for ANY reasonable probability distribution of possible values of money in the envelopes. "Reasonable" here means that the probability distribution must sum to one, and that it must have a finite expectation value", then provided multiple examples of this working out with actual functions? My solution not only works with any value of X, but also with any reasonable possible distribution over X. It is a complete superset of your objection about "a specific x based on the subjective assessment".

Do you see where I said, "We can. (Beware, calculus ahead) Let:
x = amount of money in your current envelope,
f(x) = probability distribution of the money in the "lesser" envelope, and
g(x) = probability distribution of the money in the "greater" envelope", then went on to show how to do the problem correctly starting from the completely general functions f(x) and g(x)? There is no subjectivity here; my solution works for any conceivable distribution of money in the envelopes.

You, on the other hand, have only responded with continued, repetitive misunderstanding and an aping of the charges I've made against you. But let's keep this simple. Have you, or have you not, read and understood the very specific parts of the original post that I've cited above? This is a simple yes/no question.

3. ‘Do you see where I said, "Before we get more mathematical, let's examine the problem intuitively, by grounding it in a solid example"?’

Yes. Did you see where I pointed out that this “solid example” was subjective, meaning that it is not as “solid” as you seem to think? If I’m guilty of anything, it is only that I didn’t make it clear in that reply that I wasn’t talking about your more explicit solution that used calculus. I did try to clarify that in my second reply – did you notice that I said “You are right, that it fails in the continuous case” and identified what I was talking about as “the subjective assessment of {\$25,\$50} vs. {\$50,\$100} on a game show” ?

Did you notice that THE MAIN POINT of my original comment was to provide a solid example in place of the one you gave? And how I have repeatedly pointed out that it was enough to satisfy the original thesis, of communicating to a job interviewer why there is no paradox?

Have you noticed that, since the problem uses currency as a metric, this problem can only use a discrete probability problem? And so that “solid example” I provided is actually mathematical proof that the two probability distributions cannot be equal? Have you noticed that extending the problem to a continuous distribution IS JUST AS “UNREASONABLE” as a discrete one where the expectation is infinite, since the gain can be infinitesimal?

Have you noticed that I pointed out that you can’t treat “the lesser amount” and “the greater amount” as independent random variables? Whether or not the result is the same, AND I FREELY ADMIT that I haven’t examined your analysis in depth (because there is no need to), you do not make this distinction in your math.

Finally, have you noticed that in my original reply I provided an explanation where it does not matter if the distribution is discrete, or continuous; finite, or infinite; reasonable, or unreasonable (which is why I have no desire to study your analysis)? How, if the total amount has the distribution t(x) (discrete or continuous), then the second random variable you need is just the choice? And that it is discrete, it is independent of anything else, and it has the equiprobable values {Low,High}? And so the expected gain by switching is (2x/3-x/3)*P(Low)+(x/3-2x/3)*P(High)=0?

Which result – yours with f(x) and g(x) and a long explanation, or mine with t(x) and a short one, do you think a job interviewer would be more impressed by?

1. (Part 1/2)
You said:
"Did you see where I pointed out that this “solid example” was subjective, meaning that it is not as “solid” as you seem to think?"

That is the point of an example: to be specific, concrete, using particulars. This example then motivates an argument that's completely general, which holds for any reasonable distribution. An example is supposed to be specific, or in your words, "subjective". That's the whole point. It appears that you've confused an example with an argument. You do at least understand that my ARGUMENT is completely general?

You said:
"Did you notice that THE MAIN POINT of my original comment was to provide a solid example in place of the one you gave? And how I have repeatedly pointed out that it was enough to satisfy the original thesis"

Yes, and I've already responded to it, in my first reply to you, when I said:
"I'm afraid you're trivializing the problem.
Essentially, your reasoning boils down to "solution 1 could be right, or solution 2 could be right. Since they disagree, and since solution 2 is right, solution 1 must be wrong." But this is not actually addressing the two envelopes problem, it's just restating it. From the beginning, we already know that solution 1 is wrong. We already know that solution 2 is right. The problem starts by saying so. The challenge is to explain why."

You said:
"Have you noticed that, since the problem uses currency as a metric, this problem can only use a discrete probability problem? And so that “solid example” I provided is actually mathematical proof that the two probability distributions cannot be equal? Have you noticed that extending the problem to a continuous distribution IS JUST AS “UNREASONABLE” as a discrete one where the expectation is infinite, since the gain can be infinitesimal?"

I have not yet touched on this much because there were so many other misunderstandings, but since you think that this is so important, I'll go ahead and address it. There are multiple ways to see that my solution is strictly superior to your discrete distribution solution.

1. As I touched on before, this is hardly important. You can always subdivide money to the point where it doesn't really matter. Financial transactions, for example, are often tracked to the fourth decimal place. If your argument relies on the difference of \$0.00005, it's not a good argument.

2. Since real numbers are a superset of the integers, it is trivial to model a discrete solution using a continuous solution (just use Dirac delta functions), whereas there is no unique method for going the other way. The continuous solution includes all possible discrete solutions, whereas your discrete solution falls flat on its face if we just allow for trivial variations on the problem, like "half of the other envelope, rounding down".

3. You have not thought through the implications of your solution at all. Yes, the probability distribution cannot be equal in the discrete case, because, in your words, "If A=1, it must be the smaller value; that is, the other envelope contains \$2 with 100% probability". But actually think about what this means: if you know that the other envelope contains \$2, then what ought you to do? Far from demonstrating the wrongness of the wild statistician, you have actually strengthened his case. Your argument in fact works exactly in the opposite direction of the way you think it does.

You said:
"Have you noticed that I pointed out that you can’t treat “the lesser amount” and “the greater amount” as independent random variables?"

And when did I do this? I in fact explicitly made one depend on the other. Did you read and understand the part where I wrote "Then g(x) = 0.5 f(0.5x) due to the stretch/compression transformations" in the original article?

2. (Part 2/2)
You said:
"Finally, have you noticed that in my original reply I provided an explanation where it does not matter if the distribution is discrete, or continuous; finite, or infinite; reasonable, or unreasonable (which is why I have no desire to study your analysis)? How, if the total amount has the distribution t(x) (discrete or continuous), then the second random variable you need is just the choice?"

This has already been addressed: the reason that your explanation doesn't consider these things is because it trivialized the problem. You're just falling back on the fact that "the second random variable you need is just the choice" between two indistinguishable envelopes - something we've already known from the beginning. You're essentially saying, "well, since the envelopes are indistinguishable, it shouldn't matter". You're not providing a solution; you're just restating the problem.

Imagine a software engineering interview, where your interviewer asks, "okay, what actually happens when you search for something on Google?" and you answer, "you get the search results, obviously. What, isn't that a completely correct answer?". Your answer is like that.

You said:
"Which result – yours with f(x) and g(x) and a long explanation, or mine with t(x) and a short one, do you think a job interviewer would be more impressed by?"

The one that actually answers the question.

Lastly, you said:
"I wasn’t talking about your more explicit solution that used calculus", and
"I FREELY ADMIT that I haven’t examined your analysis in depth" because you felt that "there is no need to", and
"I have no desire to study your analysis"

So, you've repeatedly admitted that you haven't actually read and understood my analysis - the very subject of the post that you're replying to. I strongly urge you to go back and actually read and understand it. It's rude to run your mouth off in reply to a post that you did not read or understand. Once you do understand it, you'll immediately see that it completely subsumes all your lines of thought.

To help you understand this, I have two exercises for you. I think they'll do you good.

1. I mentioned that a continuous distribution can always be used to model a discrete distribution using Dirac delta functions. Explicitly outline the procedure. That is, starting with a discrete distribution d(x) where x can only take on discrete values, explain how to obtain a c(x) where x can take continuous values, in such a way that d(x) and c(x) behave identically in all statistical calculations.

2. I mentioned the case of restricting yourself to only discrete values, and what happens when you find the lowest amount in your envelope (say, \$1). Now, put yourself in the shoes of the wild statistician, and construct his argument for him. What would he say? How could this discretization only strengthen his incorrect argument? And what is the correct reply to him that would actually fix that mistake?

Do these exercises, and you'll be closer to understanding my solution. I may just write the solution to these problems myself in a follow-up post, because I think a lot of people are having the same difficulties you're having. So try to do the exercises before then.

4. In your original reply, you said: “Your explanation … trivialized the problem.” You are right about one thing only here: my approach is more trivial than yours. But that could mean that you have over-complicated it, while mine is sufficient. If this is the case – as I keep asserting it is – then there is no need to go through your exercises. But I will go through a simplified version of them. On my terms.

You then added: “Essentially, your reasoning boils down to ‘solution 1 could be right, or solution 2 could be right. Since they disagree, and since solution 2 is right, solution 1 must be wrong.’” I also submit to you that this is all you have done, just with more complexity. AND IT REDUCES TO MINE.

I mentioned several times that you treat f(x) and g(x) as independent distributions, when they are not. You still haven’t addressed that (you did point out how to transform one into the other, but not why you can integrate over both with the same variable of integration). What you keep skipping over is that this is just a complication of my second approach, which you have not addressed. And it is the exercise I will present, although I have already said as much: The entire second part of your analysis can be replaced with:

Let T be a random variable representing the total amount in the two envelopes. Let f(t) be its distribution function – its range, its reasonableness, and whether it is continuous or discrete-via-Dirac-delta are completely irrelevant. Let C be a random variable representing your choice – it is discrete, with range c={Low,High}. The only assumptions needed are that C is independent of T and that Pr(c=Low)=Pr(c=High)=1/2. I hope you will agree with that. Then, if you switch, you gain t/3 if c=Low, and –t/3 if c=High. The overly-complex form of the integral you use for the expected gain, written in my system, is:

E(gain) = ∫ [gain(c=Low)·Pr(c=Low) + gain(c=High)·Pr(c=High)] · f(t) dt

= ∫ [(t/3)·(1/2) + (–t/3)·(1/2)] · f(t) dt
= 0.

There is no need to actually integrate this. There is no need to make any assumptions about what the distribution of t is. There is no need to make any assumptions about its range. Or the expected value of t. Or whether it is reasonable, as opposed to a “wild statistician’s” imagination. There is no need to address how to handle possible dependence when you separate the one random variable into two. In fact, there is no need to go through any of the points you keep bringing up.
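For what it's worth, the claim can also be checked by simulation; here is a minimal sketch (the uniform distribution for T is an arbitrary stand-in, since the derivation says any distribution gives the same answer):

```python
import random

def simulate_switch_gain(trials=100_000, seed=0):
    """Estimate the expected gain from always switching envelopes.

    T, the total in the two envelopes, is drawn from an arbitrary
    distribution (uniform here, but that choice is irrelevant).
    C, which envelope you initially hold, is independent of T and 50/50.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = rng.uniform(10, 1000)      # any distribution of totals works
        low, high = t / 3, 2 * t / 3   # the two envelope amounts
        if rng.random() < 0.5:         # holding the low envelope:
            total += high - low        # switching gains t/3
        else:                          # holding the high envelope:
            total += low - high        # switching loses t/3
    return total / trials

print(simulate_switch_gain())  # a value near 0
```

The average gain hovers around zero for any distribution of totals you plug in, which is exactly what the zero integrand predicts.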

The expectation is zero. Trivially. Why are you arguing against this?

1. (Part 1 of 2)
You said:
"The expectation is zero. Trivially. Why are you arguing against this?"

I'm not arguing against that. Have you still not understood that the point isn't to know that the wild statistician is wrong, but to figure out exactly what's wrong with his argument? As I said in the original post:
"Now, here is the real two envelopes problem: something must be wrong with the wild statistician's argument - but what exactly is the nature of his error?"

You said:
"But that could mean that you have over-complicated it, while mine is sufficient."

No, it is not. We already know that the wild statistician is wrong. The whole point of the problem is to figure out exactly what's wrong with his argument. Remember, the wild statistician starts by complicating a simple problem. My approach is also complicated, because it actually follows his approach in order to identify his error. Your approach completely ignores the wild statistician's argument, and is therefore overly simplistic - it forgets that the whole point of the problem is to figure out exactly what's wrong with his argument.

You said:
"But I will go through a simplified version of them. On my terms."

This is exactly like how you ignore the wild statistician's argument and give an answer "on [your] terms", and thereby miss the point of the problem, which is to figure out exactly what's wrong with his argument. Do your assignments; they will help you understand where you've made your mistakes, rather than reinforcing them through more errors.

You said:
"I also submit to you that this is all you have done, just with more complexity. AND IT REDUCES TO MINE."

No, it does not. My argument actually follows the wild statistician, plumbs the depth of his thoughts, and in there finds the kernel of his error and corrects it, thereby solving the problem. That is the point of the problem: to figure out exactly what's wrong with his argument. Your shallow approach simply ignores this. This shallowness has already led you to make several mistakes, which you would understand if you actually did the exercises I assigned to you.

You did say that it's the "equally likely" assumption that's incorrect. But you argue about it in a completely harebrained way that actually goes against the point you're trying to make, then rely on a technicality of a discrete distribution for your case. Again, you would understand all this if you had only just done the exercises I assigned you. Given the multiple mistakes you make here, it seems likely that you only identified the "equally likely" assumption as the wrong one due to random chance, or simply by copying what I've already said in the original post. In addition, your integral approach then completely ignores what little bit of correct approach you somehow managed to find here. It goes off on a completely separate calculation, forgetting that the whole point of the problem is to figure out exactly what's wrong with the wild statistician's argument.

My approach, on the other hand, starts with an example that gives an intuitive understanding of the wild statistician's error ("Say that you're on a television game show"). It then gives a qualitative, intuitive answer to correct his mistake. ("In general, if you see a small amount of money in your envelope, you're more likely to have the "smaller" of the two envelopes, and if you see lots of money, you're more likely to have the "greater" of the two"). This then forms the basis of the calculus-based part of my answer.

2. (Part 2 of 2)
The wild statistician bases his argument on what you should do upon seeing \$20 in the envelope. That is, upon a conditional probability. You completely ignore this, forgetting that the whole point of the problem is to figure out exactly what's wrong with his argument. I address it directly, by saying:

"Also, the overall distribution for the amount in your current envelope, given that you chose one of the two envelopes with equal chance, is:

p(x) = 0.5( f(x) + g(x) )."

and

"Furthermore, the p("smaller" envelope|x) and p("greater" envelope|x) values can easily be calculated by the definition of conditional probability as follows:

p("smaller" envelope|x) = f(x)/p(x),
p("greater" envelope|x) = g(x)/p(x)"

So, I actually trace through the wild statistician's argument, explicitly identify the wild statistician's error, and by giving the correct form of this conditional probability, grounded in its very definition, I show that it cannot in general be equal to 0.5. I then go on to show that with this correction, doing the integral yields the answer we knew to be correct all along. You do none of this.
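To make the corrected conditional probability concrete, here is a sketch with a made-up discrete prior (the dollar amounts are hypothetical, chosen only for illustration); it enumerates the joint distribution and applies the definition p(smaller|x) = p(smaller and x) / p(x):

```python
from fractions import Fraction

# Hypothetical prior: the smaller envelope holds $10, $20, or $40,
# each with probability 1/3, so the pairs are {10,20}, {20,40}, {40,80}.
pairs = [(10, 20), (20, 40), (40, 80)]
prior = Fraction(1, 3)

p_x = {}              # marginal probability of seeing amount x
p_smaller_and_x = {}  # joint probability: holding the smaller envelope AND seeing x
for lo, hi in pairs:
    for amount, is_smaller in ((lo, True), (hi, False)):
        p_x[amount] = p_x.get(amount, Fraction(0)) + prior * Fraction(1, 2)
        if is_smaller:
            p_smaller_and_x[amount] = (
                p_smaller_and_x.get(amount, Fraction(0)) + prior * Fraction(1, 2)
            )

for amount in sorted(p_x):
    cond = p_smaller_and_x.get(amount, Fraction(0)) / p_x[amount]
    print(f"p(smaller | x = ${amount}) = {cond}")
```

The smallest observable amount is certainly the smaller envelope and the largest certainly the greater, so the conditional probability cannot equal 0.5 for every x.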

So, to summarize:
Your approach ignores the whole point of the problem, which is to figure out exactly what's wrong with the wild statistician's argument.
My approach directly identifies the wild statistician's mistake, identifies why it's a mistake, corrects it, and shows that it leads back to the answer we know to be correct.
Your approach doesn't follow the wild statistician's argument, forgetting that the whole point of the problem is to figure out exactly what's wrong with that argument.
My approach actually follows along with the wild statistician's argument for the sake of identifying his error. That's why it becomes complicated; because the wild statistician starts by complicating a simple problem.
Your integration approach never addresses how the wild statistician made his argument based on a conditional probability of your current envelope containing a certain amount of money, despite the fact that this is where he makes his mistake, and that pointing this out is the whole point of the problem.
My approach identifies this step and provides the correct form of this probability, thus actually answering the question.
Your approach has only led to wrong conclusions and confused, disjointed thinking: you would know this if you did your homework.
My approach is unified, has corrected the wild statistician's error, and gets us back to the answer we knew was right all along.

In short, I actually solve the problem. You merely restate it and pretend that it's enough. All this is why your approach is utterly inadequate, and incomparable to mine.

Do you understand now?

Imagine a software engineering interview, where your interviewer asks, "okay, what actually happens when you search for something on Google?" and you answer, "you get the search results, obviously. What, isn't that a completely correct answer?". Your answer is like that.

Like that shallow "you get the search results" answer, your approach doesn't actually solve the problem. Remember, the whole point of the problem is to ... well, you get the point.

I hope.

5. "Have you still not understood that the point isn't to know that the wild statistician is wrong, but to figure out exactly what's wrong with his argument?" And have you not figured out that I did just that in my first post? That it isn't necessary to beat that horse until there is nothing left that is recognizable as equine, if all you want to know is that it is dead?

There are two approaches you mention: (A) knowing THAT the "wild statistician" is wrong, and (B) knowing EXACTLY what the error was that made him wrong. But what you try to address is different: (C) how to make his error "exactly correct" and maybe provide a corrected version of his solution. That is not what you claim is the point.

There is a fourth approach, (D) providing an alternate correct solution; but this does not address the issue you claim is the point. Without (B), all it does is suggest that there are two solutions that seem to be right. This is the paradox we are trying to dispel. With (B), it shows that the error is significant. (B) alone doesn't prove the answer is wrong. Accomplishing (D) with it does.

The "wild statistician" said "Since these two possibilities are equally likely..." This obviously is a statement that he justified to himself only superficially. Showing that it must be wrong, AS I DID even if it was only a corner case of any distribution, is sufficient for approach (B). Yes, I said (B), not (A) as you claim. It is the only possible error, and it is wrong, so (B) is satisfied. Every other part of his solution is 100%, undeniably correct. As is my solution #2 (aimed at approach (D)). So I addressed the point you claim needs to be addressed.

There is no need to address (C). First, we don't have enough information to do it, which is why you limited yourself to a few "reasonable" distributions. But that does not prove that an "unreasonable" one (btw, summing to one is not part of being "reasonable," it is the definition of "distribution") can't have the property "these two possibilities are equally likely."

What you call a "qualitative, intuitive answer" is still just a subjective opinion. So even though your answer is right, you failed to find what is "exactly correct" about it. And didn't even prove what was "exactly wrong" with the wild statistician's approach. You pointed out where you thought the error was, and found a " qualitative, intuitive" example of how it could be wrong, but you didn't prove it. I did, very simply. Any way you could put money into these envelopes has a minimum value, and so there is at least one point where the wild statistician must be wrong.

Do I also need to keep pointing out that the rest of your post is aimed at approach (D), and does not (again, IMHO) do it as well as my much simpler approach? And that it makes a claim that is just as superficial as the wild statistician's, when it uses the same variable (x) in two inter-dependent probability distribution functions?

This is why I never studied your approach (D) in depth, and have no intention to. Not because I believe the result is wrong - I don't - but because (1) there is an error in it, even if it is just in the labels you apply, and (2) there is a much simpler way to accomplish the exact same thing.

If you use just one random variable to represent value, there are no issues like the ones I have raised, whether or not they affect the result. And the function you integrate over IS IDENTICALLY ZERO everywhere you integrate it. Which is what I did. I "solved the problem" to use your words. Formally, completely, correctly, and much more simply than your approach. So again I ask, what part of the conclusions in this paragraph do you dispute? How is it inferior, in any way, to yours?

1. I think you're done here. You're at the stage where you ignore my replies and pretend that your claims still have validity. I could go through each of your paragraphs again, citing specific lines from our previous posts again, and point out your errors in them again, but most of what I'd say about them again has already been said in one of my previous replies.

I would actually be willing to go through with all that with you again, but first, you have to demonstrate that this will actually be worth it. So far, you've only demonstrated an obstinate refusal to address any of my points, whereas I've answered your objections by pointing out - quoting, in fact - specific places where you were wrong and I was right.

To demonstrate that you're capable of rational discourse, simply do the exercises I assigned to you before:

1. I mentioned that a continuous distribution can always be used to model a discrete distribution using Dirac delta functions. Explicitly outline the procedure. That is, starting with a discrete distribution d(x), where x can only take on discrete values, explain how to obtain a c(x), where x can take continuous values, in such a way that d(x) and c(x) behave identically in all statistical calculations.

2. I mentioned the case of restricting yourself to only discrete values, and what happens when you find the lowest amount in your envelope (say, \$1). Now, put yourself in the shoes of the wild statistician, and construct his argument for him. What would he say? How could this discretization only strengthen his incorrect argument? And what is the correct reply to him that would actually fix that mistake?

Doing these will demonstrate that you are mentally capable of following my arguments and willing to acknowledge your mistakes.

I'm going to add one more problem to your homework, as it concerns a difficulty that I see you repeatedly running into:

3. Imagine a software engineering interview, where your interviewer asks, "okay, what actually happens when you search for something on Google?" and you answer, "you get the search results, obviously. What, isn't that a completely correct answer?". What is wrong with this answer? What would a completely correct answer look like? (you don't actually have to provide the full answer, which would obviously be too difficult. Just describe what it would cover) And what would a mediocre answer look like?

Do your homework, and I'll gladly go over your previous post again, paragraph by paragraph, and point out where I've answered you before.

6. Part 1

"You're at the stage where you ignore my replies and pretend that your claims still have validity." And from where I sit, that is where you have been in this entire non-discussion.

"I could … point out your errors … again." You don’t seem to understand that you have not pointed out a single error, only where you think my approach is inferior to yours. All you have said is things like "you restated the problem," or "you trivialize the problem based on a technicality," and then deferred any actual commentary to “later.”

I have a master’s degree in applied mathematics, so I do not need you to assign me "homework" to prove that I know as much as you think you do. I have pointed out actual errors in your work – and I admit they are mostly superficial – which shows I know more. Things like "'Reasonable' here means that the probability distribution must sum to one," when that is the definition of a distribution; and calling f(x) and g(x) distributions for what you imply are two different random variables ("lesser" and "greater") when they are values derived from the only random variable that you identify, x = the current value. Then you ignore the fact that what you actually did was the same thing I did – one random variable that defines both "lesser" and "greater" but not "current." You just did it in a far too complicated way.

Maybe, as your homework, you should try to explain why a continuous model of a discrete probability distribution is just that – a model. A way of looking at one thing in the paradigm of another. Then explain what makes them different – start with the probability of any specific value being zero in a true continuous distribution. Or explain how a job interviewer in the field of Mathematics might react differently to a response than one in Computer Science. And then argue for which field this problem is better suited.

Through all of this, it is you who has “demonstrated an obstinate refusal” to address any of my points. You have not discussed any of them, you have summarily dismissed them as unworthy of your consideration. So again, I repeat the salient points:

7. Part 2:

1) All of the math presented here – by me when I presented the two solutions, by the wild statistician, and by you – is essentially correct. Yours makes superficial errors in how it uses terms, but I never said it ended up being wrong.
2) The only significant error is the assumption that, once you call the value in your envelope "x," it is equally likely that the other envelope will have x/2 or 2*x. (See further explanation below.)
3) This can be trivially demonstrated to be an error, since a discrete distribution must have a minimum value. Realistically, it should have a maximum value also, but that is not provable.
4) Supposing a “wild statistician” who makes the subjective assessment that you are in the middle of this range is irrelevant, because the assumption may be true there. Similarly, finding distributions that you call "reasonable" is irrelevant. For example, if the envelopes could be {\$10,\$20}, {\$20,\$40}, … {\$320,\$640} with equal probability, then when you see anything between \$20 and \$320 it is indeed equally likely that the other envelope contains half, or twice, as much. So if you allow the distribution to be an open question, it is only at the endpoints that the chances must be unequal.
5) A continuous distribution that is not a model of a discrete one (through Dirac delta functions) is irrelevant to the problem as stated. Using such a model for a discrete case does not change the fact that it is discrete.
6) The correct way to answer the question “should you switch” is to use one random variable to describe what the values are, and another to describe whether you have the high or the low one. You do this, even though you phrase it another way and refuse to admit it.
7) But you severely over-complicate the issue. The distribution of the "value" random variable is completely irrelevant, as is calculus. Even if you allow the distribution to be continuous. What you did was find that the value of the expectation is zero *after* the integration. What I did was find that the value you sum, or integrate, over is zero everywhere. What you did requires deriving some properties of the distribution. Mine does not.

In item #2, the error is the misapplication of the Principle of Indifference. It’s what says that, if I tell you I roll an N-sided die, you should assume that each side has a 1/N probability, even if I don’t say so. More formally, if N possibilities exist that are functionally equivalent based on the information you have, you should assign each the same probability. The trick is defining what “functionally equivalent” means. It isn’t true for the “other” value if you assume a value for x. I would have loved to discuss this with you, but you won’t discuss anything with me.
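The {\$10,\$20} … {\$320,\$640} example in point 4 can be verified by enumeration; a short sketch (the pair probabilities follow the setup described there):

```python
from collections import defaultdict
from fractions import Fraction

# Pairs {10,20}, {20,40}, ..., {320,640}, each with probability 1/6;
# within a pair, each envelope is seen with probability 1/2.
pairs = [(10 * 2**k, 20 * 2**k) for k in range(6)]

weight = defaultdict(Fraction)  # weight[(seen, other)] = probability of that outcome
for lo, hi in pairs:
    weight[(lo, hi)] += Fraction(1, 12)
    weight[(hi, lo)] += Fraction(1, 12)

for seen in sorted({s for s, _ in weight}):
    p_double = weight[(seen, 2 * seen)]
    p_half = weight[(seen, seen // 2)]
    print(seen, "P(other is double) =", p_double / (p_double + p_half))
```

Between \$20 and \$320 the probability is exactly 1/2, as claimed; only at the endpoints \$10 and \$640 is it forced to 1 or 0.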

1. I realize now that the assignments I gave you were too much for you; they are beyond your skills and maturity level. For that, I apologize - I should have realized this earlier, but I wanted to give you the benefit of the doubt.

So instead, I will give you another exercise. It'll take several steps to complete, but I promise you that:

1) It'll take only the simplest algebra and the most basic of probability theory to do these problems. I believe that you are in fact mentally capable of solving them. Please don't disappoint me.

2) The enormity of your error will be fully revealed by the end. Your thoughts will be clearly shown to be wrong, just as surely as "0 = 1" is wrong.

3) Completing this series of exercises will answer, or at least lead in the direction of answering, all the other questions we've brought up so far.

So, are you ready? You have nothing to fear as long as you're willing to face your mistakes and grow up. Here's the first part of the exercises:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

8. I really didn't expect you to address any point I raised, but I had to try. Since you like exercises, I'll leave you with some simple ones:

1) Let X and Y be two continuous, independent random variables on 0<=x<=1, 0<=y<=1, with probability density functions f(x) and g(y). Let h(x,y) be a continuous function that is finite when 0<=x<=1, 0<=y<=1. Write an expression for the expected value of h.

2) How many integrations does it use?

3) How many does your "expectation for switching" use?

4) What does that say about the number of random variables you are using?

9. As a Christmas present, and trying desperately to demonstrate good will, let me show you the correct way to do what you tried to do (and really what you accomplished, but with sloppy technique). But I’m going to leave one part more generic than you did, to isolate the differences between our approaches.

Let X be a random variable that determines the amounts of money in the game. In probability, we usually use an upper case letter, like X, to represent a random variable. It does not have a specific value; it is the abstract concept representing a quantity that varies randomly. We use the equivalent lower case letter to represent a value of that variable in an instance of the experiment; in this case, x. Whether X is continuous or discrete[see note 1] isn’t particularly relevant to this analysis, even though the properties of the two have significant differences that can be relevant to other analyses.

Then, let the value of the “lesser” envelope be defined to be l(x), and of the “greater” to be g(x).

Similarly, let C be the (discrete) random variable representing your initial choice. I’ll use c=-1 to mean you chose the lesser envelope, and c=1 for the greater.

C’s distribution function is Pr(c=-1) = Pr(c=1) = 1/2. Assume the probability density function for X is f(x).

Using these two random variables, an expression for the change in value when you switch is (c*l(x) – c*g(x)). The expected value needs to “accumulate” all possibilities over these two random variables. The discrete one is a summation and the continuous one is an integration:

E(switch) = ∫ sum[(c*l(x) – c*g(x))*Pr(c)]*f(x)*dx

Probably because you thought l(x) and g(x) represented two different random variables, and not two values derived from single random variable, you reversed the order of the integration and the summation. That let you isolate the "different" random variables in the integration. While not incorrect, you did far more work than was needed – and required an assumption I don’t see the need for (“we used a u-substitution and took advantage of the fact that the integral goes from 0 to infinity”).

What you got, correcting a typo, was ∫ x f(x) dx - ∫ 0.5x g(x) dx. This essentially means the contribution of the “lesser” envelope to the expectation, minus the contribution of the “greater” envelope.

All that is unnecessary if you do the summation first. It evaluates to:

[(-1)*l(x) – (-1)*g(x)]/2 + [(1)*l(x) – (1)*g(x)]/2
= [-l(x)+g(x) + l(x)-g(x)]/2
= 0.

Making the integration, the variable change, and the entire distribution question moot.

BTW, your “x” was the lesser amount, l(x)=x, and g(x)=2x. Again, these are not two different random variables, they are values derived separately from one. My “x” was the total amount, l(x)=x/3, and g(x)=2x/3. You need to re-derive your answer if these functions are not linear; mine works for any set of functions defining what is in the envelopes, and for any distributions.
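The do-the-summation-first point can be checked mechanically; a tiny sketch (the l, g pairs are examples, including a deliberately nonlinear one):

```python
# For any functions l and g and any x, the inner summation over the
# choice variable c in {-1, +1} cancels before any integration:
#   sum_c (c*l(x) - c*g(x)) * Pr(c)  with  Pr(c) = 1/2

def inner_sum(l, g, x):
    return sum(c * (l(x) - g(x)) * 0.5 for c in (-1, 1))

examples = [
    (lambda x: x / 3, lambda x: 2 * x / 3),  # x = total amount (my labeling)
    (lambda x: x, lambda x: 2 * x),          # x = lesser amount (your labeling)
    (lambda x: x**2, lambda x: x + 7),       # arbitrary nonlinear functions
]
for l, g in examples:
    assert all(inner_sum(l, g, x) == 0 for x in (0.5, 1, 20, 320))
print("the integrand is identically zero")
```

Since the inner sum vanishes at every x, whatever density f(x) multiplies it, the integral is zero without any assumptions about the distribution.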

+++++

Note 1: And modeled as continuous using Dirac delta functions.
Note 2: One of the differences between continuous and discrete random variables is that the distribution function for a discrete one gives the actual probability of each possible value. Conventionally, we use the notation Pr(x=value). But with a true continuous random variable, no value has a non-zero probability. Instead, we use what is called a probability density function, typically called f(x). The probability is then defined over an interval, not at a value, as Pr(x0<=x<=x1) = ∫ f(x) dx, for x = x0 to x1.
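The density-versus-probability distinction in that note can be made concrete with a sketch (uniform density on [0,1], probabilities recovered by a simple midpoint-rule integration):

```python
def f(x):
    """Uniform probability density on [0, 1]: height 1 inside, 0 outside."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def prob(x0, x1, steps=10_000):
    """Pr(x0 <= X <= x1), approximated by midpoint-rule integration of f."""
    dx = (x1 - x0) / steps
    return sum(f(x0 + (i + 0.5) * dx) for i in range(steps)) * dx

print(f(0.5))            # the density at 0.5 is 1.0 ...
print(prob(0.5, 0.5))    # ... but the probability of exactly 0.5 is 0.0
print(prob(0.25, 0.75))  # the interval [0.25, 0.75] has probability about 0.5
```

A density value at a point is not a probability; only its integral over an interval is.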

1. Although your exercises are beside the point, I will do them to demonstrate goodwill, and in the hopes that you will get back on topic.

1. ∫∫ f(x) g(y) h(x,y) dx dy
2. Two
3. One. Obviously, I simplified the double integral down to a single one, because that part is so simple. This is like someone writing that 1 + 1 + 1 = 3, instead of writing 1 + 1 + 1 = (1 + 1) + 1 = 2 + 1 = 3.
4. That I started with two random variables, then simplified the resulting expression down to one.

I would also like to thank you for pointing out the typos. They've been corrected now (I had dropped some dx's in integrals). I do correct my posts when I find mistakes in them, and acknowledge these corrections. If you find any more such mistakes, please bring them to my attention.

Now, back on topic: as to the rest of your post, you are totally wrong whenever you talk about my approach. You've misunderstood it completely. Your entire second post is repetitious and irrelevant. In short, your thoughts are currently a subset of my thoughts.

Again, the most direct way to see this is to do the exercise I assigned you. As I said before, the enormity of your error will be made completely clear, and this result will answer all the other frivolous things you keep bringing up. Again, the first part of the exercise is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

10. "Obviously, I simplified the double integral down to a single one, because that part is so simple." If it is so simple, then show me the structure you gave for #1 is reduced to what you use, “∫ (x * f(x)/p(x) - 0.5x * g(x)/p(x)) p(x) dx”

The reason you can't is that the second random variable you use is accumulated by a summation over the initial choice of lesser/greater.

And I have understood every part of your posts - you are just unwilling to address mine, so you ignore my points. As an example, I point to how you ignored the fact that your f(x) and g(x) are probability density functions, not probability distribution functions as you call them. But they are density functions of a transformation of the same random variable, not of different ones. This is a basic point, and it is a harder one than the trivial question you keep asking.

It seems obvious now that you have no interest in other people’s thoughts or ideas, even if they conflict only mildly with your own.

1. You are, as ever, wrong about most things you say.

We've both been accusing each other of ignoring the other's points. Who that accusation is true for is simply a matter of reading what's been written before. For instance, the very post you're replying to has four of my answers to the problems you issued to me, whereas you did not even address the one very elementary problem I assigned to you. Again, that problem was:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

I still need an answer from you on this problem, which I've been asking for for a long time now.

If you look further back, the record only becomes worse. For instance, you still haven't answered how to turn a discrete model into a continuous model, which is in fact the answer to your new question this time about how to derive the expression I used from a double integral. Now, I understand that it was a mistake to ask that "discrete to continuous" question to you, as it was beyond your ability. That's why you're currently confused about how I derived the expression I used.

But you know what? I'm not concerned about things that are beyond you right now. I just want you to answer a simple question, one that I think you're capable of. Again, that problem is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

Again, I promise you that answering the sequence of questions that start with that will make it absolutely clear how wrong you are. You will be shown to be surely wrong, just as certain as 0 = 1 is wrong. You will not have to understand anything about all this calculus or the Dirac functions that seem to perplex you so much. It will be as clear as 0 = 1.

Why are you not answering the question? What are you so afraid of? Do you feel that my question is beneath you, with your oh-so-precious math degree? I assure you that it's not. When you make a mistake, you must go and fix that mistake, regardless how elementary that mistake is. This is as true for a preschooler as it is for Albert Einstein. If your mistake consists in thinking that 0 = 1, then you must go back and re-learn the difference between those two numbers.

Are you afraid that you'll turn out to have been wrong, or that you'll have made a complete fool of yourself? Unfortunately, nothing that either of us can do now can change that. The only thing that you can do now to move ahead is to recognize your mistake and learn from it. That is why answering the question is so important.

Are you afraid that I'll be mean to you, or that I'll belittle you or humiliate you if you acknowledge your error? I promise you that I won't. You will have my genuine praise and admiration when you do. I know that that's not an easy thing to do for someone like yourself. All the reprobation you're experiencing from me is due to your insistence in continuing in your error, which I feel I must rightly condemn. It will vanish as soon as you turn back.

Again, the problem is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

11. “You are, as ever, wrong about most things you say.” I have not been wrong about anything – you just define your own opinion as right, and either ignore the facts I present, or treat them as opinions which must be wrong. From the top:

“I'm afraid you're trivializing the problem.” It is a trivial problem, but people develop “blind spots” about simple (few cases, not necessarily easy) probability problems that conflict with their intuition.

“This is not actually addressing the two envelopes problem, it's just restating it.” (1) I didn’t “state” anything, I summarized two conflicting solutions. Even if I did, it wasn’t a “restatement” because you didn’t state it, you linked to an article that includes many different problems:

1) Before I open an envelope, should I switch?
a. Why do people get different answers?
b. What is correct?
c. Why is the other one incorrect?
2) After I open an envelope, so I know one value, should I switch?
a. Why do people get different answers?
b. What is correct?
c. Why is the other one incorrect?
3) What is the impact of making it a (true) continuous distribution?

Your only statement was “come to peace with an explanation you can communicate to us.” The implication of your analysis was that you wanted to “come to peace” with 1c, arriving at 1b along the way. I did exactly that – it is you who were not “at peace” with how I did so. There was nothing wrong with how I did it, you just didn’t like it. Any interviewer worth hiring me would have recognized it.

Yes, it was a fairly trivial error – but it was an error, and a provable one. A fact you have not acknowledged. You tried to accomplish the same thing, but did so subjectively only. You merely asserted “it is NOT equally likely that the other envelope contains \$40 or \$10.” At first, you addressed the question for specific values of X (addressing problem 2, not 1). Then with specific distributions. You did say “the chances of either doubling or halving your money are generally not equal,” BUT THIS IS AN ASSERTION, NOT A PROOF THAT THEY ARE NOT EQUAL. I proved that there must be cases where they are not equal, so the assertion must be true. AND I DID IT MUCH MORE SIMPLY THAN YOU DID.

You did point out that if “the amount of money is infinitely subdivisible” my proof doesn’t work. I realize that, and said as much. But in an interview, if this point gets raised, I can show the equivalent argument for a continuous distribution. You just dismissed my correct argument, and implied there could not be an equivalent argument for a continuous distribution. This is what I mean by ignoring my arguments.

And I frankly find your “homework” insulting, since I have shown you over and over that I know quite a lot about probability. More than you do, it seems, so you have no right to test me. In my opinion, you are just using it as an excuse to ignore any point I make.

I skipped your correct answers to my questions, because I wanted to address only why one was wrong. “The answer to your new question this time about how to derive the expression I used from a double integral.” You don’t seem to understand what a double integral is, or what a problem in two random variables is. Even after I explained it once. So here is a lesson. A random variable is a measure of a quantity that can vary unpredictably. If you have two, you need to use a joint density function f(x,y). If the RV’s are independent, you can separate them into f(x,y)=g(x)*h(y). Either way, to get the distribution function you must integrate over BOTH variables.

This isn’t what you did. You did have two random variables, but they are not the ones you claimed. You used one variable for both values, and one to indicate which value you have. So in “p(x) = 0.5( f(x) + g(x) )”, the “0.5” is the probability term for the second variable, and the two terms represent the summation over that variable. Your f(x) and g(x) are just transformations of the only distribution for the first. What you called “separating the double integral” was merely applying the property (A+B)=(A)+(B).
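The decomposition being described, one continuous random variable plus a fair coin for which envelope you hold, can be sketched concretely. In this minimal example (the choice of a uniform lesser amount on [1, 2) is hypothetical, purely for illustration), the chosen envelope's density is the 0.5-weighted mixture of the lesser and greater densities, and it integrates to 1:

```python
# Hypothetical example: the lesser amount L is uniform on [1, 2),
# so the greater amount 2L is uniform on [2, 4) with density 1/2.
def f(x):   # density of the lesser envelope's value
    return 1.0 if 1 <= x < 2 else 0.0

def g(x):   # density of the greater envelope's value (2L)
    return 0.5 if 2 <= x < 4 else 0.0

def p(x):   # density of the value in a randomly chosen envelope
    return 0.5 * (f(x) + g(x))

# Midpoint Riemann sum check that p integrates to 1 over [0, 5].
n = 100_000
step = 5 / n
total = sum(p((k + 0.5) * step) for k in range(n)) * step
print(round(total, 4))  # -> 1.0
```

The 0.5 weights here are the summation over the discrete choice variable; the integral itself runs over only the one continuous variable.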

1. Note 1 from previous post:

Let F(x) be the distribution function (the integral from 0 to x of the density function f(t) dt) of the value in your chosen envelope.

Define A(i) = F((2^(i+1))*x0)-F((2^i)*x0), for any x0 and i>=0. That is, the probability of being in a range that is as "wide" as its lower bound.

Choose any x0 where A(0) is non-zero.

The equal probability assumption means that A(i) = A(i-1)/2 + A(i+1)/2 for i>0.

If that is true, then A(i) = A(0)/(i+1) + i*A(i+1)/(i+1). (Proof by induction not included).

This makes F(inf)-F(x0) = A(0)+A(1)+A(2)+... >= A(0)*(1+1/2+1/3+…).

Since the summation diverges, F can’t be a proper distribution function. Proof by contradiction.
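This contradiction can also be checked numerically. In the sketch below (the starting value A(0)=0.1 and the candidate values for A(1) are arbitrary choices of mine), the equal-probability recurrence A(i) = A(i-1)/2 + A(i+1)/2 is rewritten as A(i+1) = 2*A(i) - A(i-1), and every choice of A(1) runs into an impossibility:

```python
# Numeric sketch of the contradiction: the equal-probability assumption
# forces A(i+1) = 2*A(i) - A(i-1), i.e. an arithmetic sequence. Whatever
# A(1) we try, either some A(i) goes negative (impossible for a
# probability) or the partial sums blow past 1 (also impossible).
def outcome(a0, a1, max_terms=10_000):
    prev, cur, total = a0, a1, a0 + a1
    for _ in range(max_terms):
        prev, cur = cur, 2 * cur - prev
        if cur < 0:
            return "negative probability"
        total += cur
        if total > 1:
            return "total probability exceeds 1"
    return "undecided"

for a1 in [0.05, 0.1, 0.2]:  # hypothetical values for A(1)
    print(a1, "->", outcome(0.1, a1))
```

A decreasing choice of A(1) eventually turns negative, and a flat or increasing choice makes the total mass diverge; either way no proper distribution function exists.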

+++++

I use these explanations in lieu of your insulting questions, to show that I do indeed understand quite a lot about probability.

2. (Part 1/2)

So, let's get the peripheral things out of the way first.

1. Your summaries are boring, useless and irrelevant. It's all been said before. You think you understand my solution when you don't, and you say that you do. This is all very repetitive at this point; the only thing to do now is to demonstrate, with a clear, simple, incontrovertible example, that you in fact did not understand my solution. That is what I've been trying to walk you through with my exercise, and what you've been desperately running away from this whole time.

2. Your new solution in your second post is really not bad at all. I'm actually quite proud of you. If you had presented this solution in your first post, I would say that it's a different approach to the problem that at least gives some good insights. If I were an interviewer, I would certainly give you a passing grade for that answer. I mean, I can nitpick it quite a bit, but the fundamental idea is good. (Don't you hate those people that nitpick the minor details, then misunderstand and dismiss the whole thing?) It does many, many things right over your original answer:

A. Like my solution, it recognizes that using a continuous distribution doesn't automatically make an answer "bad" or "unreasonable". It may even demonstrate an appreciation for the fact that an argument over a continuous distribution is strictly superior to an argument over a discrete one.

B. Like my solution, it actually follows the wild statistician and addresses his arguments (at least for a little bit) by framing the argument in terms of the amount of money in the envelope, instead of bringing in an out-of-nowhere variable t for the total amount of money in both envelopes. So, it gets closer to actually addressing what was wrong with the wild statistician's argument.

C. Like my solution, it's quite general, and it actually works. Unlike your original answer, it doesn't make the horrendous mistake of trying to hinge the main argument on a technicality of a discrete distribution, that ended up actually strengthening the wild statistician's argument. This solution actually proves the wild statistician wrong.

So, after going back and forth in over a dozen posts with me, you've finally produced something worthy, a solution that finally has some of the desirable qualities of my solution. I do wish you could have done it sooner, but still, seriously, good job.

3. I would say that your aforementioned solution takes care of at least some of the concepts in the set of 3 exercises earlier (discrete to continuous, why the "minimum value" argument was wrong, and what it means to give a nontrivial answer). Again, good job. However, your repeated confusion about how I got my expressions from a double integral tells me that you really don't know anything about dirac delta functions. This is, again, not the most important point right now, so I'd like to shelve it for the moment, but if this is really the one thing that really bothers you and keeps you from answering my questions, I can explain it to you. I'll have to explain it for the readers anyway. You just have to promise that you'll actually answer my questions afterwards.

3. (Part 2/2)

Okay, now that all that is out of the way, on to the continuing main point:

I've set a problem before you for many posts now. It is this:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

This is an important problem, because it will lead to demonstrating that you have fundamentally, irrevocably misunderstood my solution. You say that I've made some mistakes? Well, even if you're right, by your own admission they're minor, and don't really matter much. But my claim is that you have utterly, monstrously misunderstood my solution, and I'm trying to show that to you. This is why this is more important than pretty much anything you have to say.

You say that you find it insulting that I ask you these questions, when you have your precious degree and know oh so much about probability? Well, you shouldn't. Like I said before, you must go back and fix your mistake, at the level of that mistake, regardless of who you are and what you know. If your mistake is thinking that 0 = 1, then you must relearn what 0 and 1 are. And if your degree prevents you from doing that, your degree is not worth the paper it's printed on. If your knowledge prevents you from doing that, your knowledge is worse than worthless. You're better off as an ignorant high-school dropout.

That is why you must answer these questions. But, I understand that you feel insulted. I know that people like you can feel that way when you feel so sure that you're right and someone else says that you've made a mistake. I sympathize. So, let's try something a little bit different: if you feel that the questions are insulting, how about if I answer them instead? If I stoop down to your level and answer the questions that you find insulting instead of you, then you should be okay with that, right? After all, I would be the one answering the "insulting" questions. And then, because I still need participation from you, YOU get to tell me if I'm right or not!

So, shall we give this a try?

The problem is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

f(x) = 1

Tell me, am I right?

12. (1 of 2)
1. “Your summaries are boring, useless and irrelevant.” My summaries carefully outline what is right about the two conflicting solutions, and prove the one and only part that is wrong about the incorrect one, thereby satisfying the requirement. Ad hominem attacks aside, they are simpler, and at the same time more rigorous, than yours. And yes, I do understand your argument – I just think mine supersedes it based on its simplicity. If you think there is a point I don’t understand, point it out and I will explain what is right, or wrong, about it.

2. “I'm actually quite proud of you. If you had presented this solution in your first post, I would say that it's a different approach to the problem that at least gives some good insights.” Condescension aside, it is a verbose treatment that shouldn’t be necessary. I included it just as a demonstration for you, since you think continuous distributions require a different approach. It really is necessary only if there is a chance of \$0 (discrete case) or an arbitrarily small amount (continuous case) in both envelopes. If there is a minimum value Xmin>0, then my original argument extends to F(2*Xmin)-F(Xmin), since it is greater than zero but F(Xmin)-F(Xmin/2)=0. I just didn’t want you to dismiss this trivial, and equivalent, statement like you did my original one.

Had you responded with this level of acceptance to my original problem – that is, not calling my solution a technicality when (A) it was a correct proof, (B) your objection was a technicality, that (C) is easily and trivially addressed, we would not be in this position now.

“Don't you hate those people that nitpick the minor details, then misunderstand and dismiss the whole thing?” Yep. That’s what riled me.
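To make the minimum-value point concrete, here is a toy discrete example (the dollar amounts are hypothetical, chosen only for illustration): if the lesser envelope holds 1, 2, or 4 dollars with equal probability, then conditional on holding the smallest possible value the other envelope can only be larger, and conditional on the largest it can only be smaller:

```python
from fractions import Fraction as F

# Toy distribution (hypothetical, for illustration only): the lesser
# envelope holds 1, 2, or 4 dollars with equal probability, so the
# possible (lesser, greater) pairs are (1, 2), (2, 4), (4, 8).
pairs = [(1, 2), (2, 4), (4, 8)]

# Joint probability of (value in your envelope, role of your envelope):
# each pair has probability 1/3, and each role probability 1/2.
joint = {}
for lo, hi in pairs:
    joint[(lo, "lesser")] = F(1, 3) * F(1, 2)
    joint[(hi, "greater")] = F(1, 3) * F(1, 2)

def p_other_is_double(x):
    """P(the other envelope holds 2x, given your envelope holds x)."""
    lesser = joint.get((x, "lesser"), F(0))
    greater = joint.get((x, "greater"), F(0))
    return lesser / (lesser + greater)

print(p_other_is_double(1))  # minimum value: other must be larger -> 1
print(p_other_is_double(2))  # interior value: genuinely 50/50 -> 1/2
print(p_other_is_double(8))  # maximum value: other must be smaller -> 0
```

Only at interior values does the 50/50 premise hold; at the extremes it fails completely, which is the substance of the minimum (and maximum) argument.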

3. “However, your repeated confusion about how I got my expressions from a double integral tells me that you really don't know anything about dirac delta functions.” Dirac delta functions are irrelevant since a discrete problem can be handled with discrete arguments. Specifically, the fact that there must be a minimum, as in my original post. This is not a mere technicality; it is proof positive that the assumption of equal probability cannot be true. Something you continue to ignore, implying denial of the fact.

“However, your repeated confusion about how I got my expressions from a double integral tells me that you really don't know anything about dirac delta functions.” And when you repeatedly fail to recognize that a double integral is one where you integrate over two variables, that is ∫∫ f(x,y) dx dy, and continue to resort to completely irrelevant arguments about Dirac delta functions (which you do not use), it tells me that either you don’t understand the difference, or are still ignoring my points. You feel that you must be right. Since I disagree, I must be wrong.

AGAIN: what you did was use two random variables, but they are not the ones you imply. One was discrete, and one was continuous. (Which I have acknowledged can model a discrete one through Dirac delta functions. This admission is the only thing I need to say about Dirac delta functions. So, that’s all I said about them.) The discrete random variable is C={lesser, greater} with probability distribution {0.5,0.5}. You accumulate its probability by a summation, not an integration. It is the expectation OVER C of the gain: gain(C=lesser)/2+gain(C=greater)/2. You then perform a SINGLE INTEGRATION over the continuous random variable, where I don’t think you even understand what it is that you did.

Here is what you did: your single continuous random variable is X=the value in your envelope. By calling it different RVs, you did it incorrectly, but with the right result in the end. (Note that RVs are conventionally denoted with upper case; then, a simple variable, often an integration variable, represents the possible values and is denoted with lower case. The fact that you do not recognize this is about as damaging as not recognizing that the uniform continuous density function over [x0,x1) is f(x)=1/(x1-x0).) You then derived a value for the functions gain(X=x|C=lesser) and gain(X=x|C=greater). But you keep calling the separation of two terms in a single integration a “double integration,” when it is not. Since your two integrations turn out to be equal, the expectation is zero.

The reason this is a bad analysis is that the integration step is superfluous, as is any need to identify whether the probability distribution is discrete, continuous, or discrete-modeled-as-continuous. Because the summation I listed two paragraphs ago is identically zero, regardless of x.

13. A re-presentation of my answer to the interviewer:

Hi, I’m Monty Hall, and I’m holding two heavy boxes. A random (but non-zero – that wouldn’t be any fun!) amount of molten gold (\$1,000 per ounce) was poured into one box, and exactly twice as much was poured into the second. Then, the two were topped up to exactly one pound with worthless sand. You get to pick one box; but after you finalize your pick, I have three people here with some advice for you.

Contestant: I’ll pick box A.

Wild statistician: Let’s say, for the sake of argument, that Box A has 1 ounce of gold. Then the other box might contain 2 ounces, or half an ounce. Since these two possibilities are equally likely, your expectation value after switching would be half of their sum (0.5*\$2,000 + 0.5*\$500), or \$1,250. That's 25% more than the \$1,000 we assumed you have now.

You: But if I switch to Box B, whatever amount we assume is in there, the same argument says Box A has 25% more. And then again, and again. Pretty soon that is more than one pound of gold.

But I see your error – you assumed, without proof, that Box B was equally likely to contain 2 ounces, or half an ounce. That can be true for some amounts of gold in the boxes, but not all. For example, if Box A contains more than 8 ounces, then Box B can’t contain twice as much. Similarly, since “but non-zero” means there is a minimum amount Gmin, then if Box A has between Gmin and 2*Gmin, Box B can’t have half as much. These arguments wouldn’t apply if it was possible there was no gold in either box (that wouldn’t be any fun!) and there was no upper bound (get serious), but I’m sure the argument could be extended to those cases anyway.

JeffJo: Instead, assume both boxes together contain T ounces of gold. And while it may seem reasonable to do so, I don’t need to place any restrictions on T. Box A is either the lesser box with T/3 ounces, or the greater one with 2T/3. Since these two possibilities are equally likely, your current expectation value is half of their sum (0.5*T/3+0.5*2*T/3) = T/6+T/3 = T/2. If you switch, you technically should switch the order of this summation, but addition is commutative so the result is the same.

Contestant: But didn’t you make the same assumption?

JeffJo: No, WS’s assumption was tied to specific values – mine was only about whether you had the lesser, or greater, value. WS's specific error was a mis-application of the Principle of Indifference. It says that options which are "indifferent" are equally likely. "Lesser" and "greater" are indifferent, but "1/2 ounce" and "2 ounces" may not be, as you pointed out.

NaClhv: But you need to consider the greater and lesser values to be different random variables, symbolically derive separate distributions for them, and integrate over their entire ranges to find the expected value of switching!

JeffJo: No, you really do not. The expectation value when you switch is 0.5*(2T/3-T/3) + 0.5*(T/3-2T/3). This is true no matter what T can be, how it is distributed, if it is continuous or discrete, or whether you know the value or are just treating it as a simple variable. And it is zero, no matter what T is.
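The closing claim in the dialogue, that the expected gain from switching is zero no matter what T is, can be checked exactly, without assuming anything about the distribution of T. A minimal sketch using exact rational arithmetic:

```python
from fractions import Fraction as F

def expected_switch_gain(T):
    # You hold either the lesser amount (T/3) or the greater (2T/3),
    # each with probability 1/2; switching swaps the two amounts.
    return F(1, 2) * (2 * T / 3 - T / 3) + F(1, 2) * (T / 3 - 2 * T / 3)

# Exact check for a handful of totals: the gain is identically zero,
# so no distribution over T ever needs to be specified.
for T in [F(3), F(30), F(1, 7), F(12345, 11)]:
    assert expected_switch_gain(T) == 0
print("expected gain is 0 for every T tested")
```

The two terms cancel algebraically before any integration, which is the whole point of framing the argument in terms of T.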

1. (Part 1/2)

Again, getting the peripheral issues out of the way first:

"it is a verbose treatment that shouldn’t be necessary. I included it just as a demonstration for you, since you think continuous distributions require a different approach."

No, your second solution is really fundamentally, deeply better than your first, deeply flawed approach, for the reasons I enumerated earlier. I agree that it's needlessly verbose, but like I said, I don't want to nitpick. The fundamental idea behind it is a vast improvement. Don't you see that you've achieved a deeper level of insight into the structure of the problem? That you've actually (at least in part) addressed the wild statistician's flaw instead of actually arguing for him by fueling the reason to switch with your initial "if you have \$1, the other one obviously must contain \$2" argument?

I really hope you can see this - I'd hate to think that the only worthy thing you've done this whole time turned out to be just a broken clock being right twice a day.

As for "since you think continuous distributions require a different approach". Your new approach really is different. It provides an argument that works based on the structure of the entire distribution function instead of giving a single (misguided) counterexample. And on the issue of discrete vs. continuous as a whole, you got it exactly backwards. You see, you were the one that said that "extending the problem to a continuous distribution IS [...] “UNREASONABLE”", and I was the one arguing that "[t]he continuous solution includes all possible discrete solutions". And now you'll say that you understood me and that I'm wrong and I'll try to point out to you where we said each of these things before and you'll ignore them and ... You see? This is why this whole debate is basically going nowhere.

On all that stuff about dirac delta functions: if you promise that you'll actually address my main point immediately after I demonstrate how to get my answer from a double integral, I'll go ahead and do it. But I think your confusion here is intricately tied to you misunderstanding my solution, so it's better to address that first. Otherwise, you're likely to just get things twisted around again, and you already owe me several answers.

2. (Part 2/2)

Now, on to the main point:

You've disastrously misunderstood my solution, from the very beginning. You're utterly wrong whenever you write about your understanding of it. You only dig yourself deeper into the hole every time you try to talk about my solution - as in the latest example of your "re-presentation of [your] answer to the interviewer".

And I can demonstrate this, using only simple, basic algebra and fundamental probability theory. I've tried to lead you step by step through this demonstration. This was the one single thing I've been asking of you for many, many posts, always emphasizing how important it was. But you've always ignored it. I've now addressed numerous peripheral issues you've brought up, and yet you've only ever run away from the one thing that I've asked you to do.

You said that "If you think there is a point I don’t understand, point it out and I will explain what is right, or wrong, about it". I'm sorry, but this will not work. Throughout this whole discussion you've shown an extraordinary predilection for evading, twisting, ignoring, and misunderstanding my points. Your repeated past offenses have put it beyond your right to ask this of me. That is why we're going to do this step by step, using only the most basic algebra and probability theory, with confirmation from both of us at each step, with no room for misunderstanding, confusion, or escape. You have nothing to fear from this if you feel that you haven't made a mistake, or if you're willing to face your mistakes honestly.

Your answer to my question has long been overdue. But given that it's already overdue, now is the perfect time to start on it. You've finally, finally given a solution that, while not attaining to full marks, is at least passing. I'm willing to overlook for now that you seem not to understand why this solution is good while your first one was not - that's no small matter, but frankly, the main point of you misunderstanding my solution is more important. Given that I'm willing to overlook this matter, there are now no other outstanding issues you've brought up.

So, here is the state of this main point right now:

The problem is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

f(x) = 1

Tell me, am I right?

You already know that "the uniform continuous density function over [x0,x1) is f(x)=1/(x1-x0)", so all you have to do is just apply that formula, then acknowledge your agreement with me.
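For the record, the quoted formula settles the question mechanically; a two-line sketch:

```python
def uniform_pdf(x, x0, x1):
    # density of the continuous uniform distribution on [x0, x1)
    return 1 / (x1 - x0) if x0 <= x < x1 else 0.0

print(uniform_pdf(0.5, 0, 1))  # -> 1.0
```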

14. “You said, about your second, new solution: "it is a verbose treatment that shouldn’t be necessary. I included it just as a demonstration for you, since you think continuous distributions require a different approach." No, your second solution is really fundamentally, deeply better than your first, deeply flawed approach, for the reasons I enumerated earlier.”

It is a verbose treatment that I included because I thought it would appeal to you, based on your proclivity to continuous distributions. A much better one, based on my original (which is rigorous and sufficient to the task), is (and this can be extended to continuous based on the hint I’ll drop; discrete is easier):

1) In order to have x dollars (continuous case: X s.t. x0<=x<2x0) the pair of values must be (low,high) = (x/2,x) or (x,2x). (Continuous case: F(x0)-F(x0/2) = F(2x0)-F(x0).)
2) The assumption of equal probability, based on x dollars in the chosen envelope, is true if and only if Pr(x/2,x) = Pr(x,2x) for every x in the distribution's range.
3) This is a contradiction for several reasons. First, it implies that if x is in the range, x/2 and 2x must also be, which is COMPLETELY unreasonable because it implies the values can be arbitrarily close to zero and arbitrarily large. If you think you can allow that, you can’t create a non-zero distribution for either the discrete, or continuous, cases. And please note that I said distribution, not density.

“Don't you see that you've achieved a deeper level of insight into the structure of the problem?” No. The only insight necessary is that your Wild Statistician applied the Principle of Indifference without checking to see if its prerequisites applied. Since my very simple demonstration shows that they cannot, his solution is trivially invalid.

“That you've actually (at least in part) addressed the wild statistician's flaw instead of actually arguing for him by fueling the reason to switch with your initial "if you have \$1, the other one obviously must contain \$2" argument?” ????? And you completely ignore the fact that the point was to find a flaw in his argument, and show that there was not a flaw in the argument that says switching can’t matter.

Yes, in that case you should switch, but that was not the full argument. There is a corresponding problem at the maximum end – it’s just harder to convince nitpickers that there must be a maximum.

“… an argument that works based on the structure of the entire distribution function instead of giving a single (misguided) counterexample.” Again, the example was there to find an error in one argument. You just won’t admit that proving the W.S. is wrong is all that is necessary. So you focus on this technicality, which is easily remedied.

“On all that stuff about dirac delta functions: if you promise that you'll actually address my main point immediately after I demonstrate how to get my answer from a double integral, I'll go ahead and do it.” Why don’t you try addressing my description of what you did first? Then you’ll see that you have no double integral over the two random variables you claimed. And that if you actually show a valid modeling of a discrete distribution, its random variable will be the one that I said you treated as discrete.

“But I think your confusion here is intricately tied to you misunderstanding my solution,…”

I completely understand what you wrote. If there is any misunderstanding, you don’t understand what you wrote. But you are still ignoring the fact that it is way too complicated. Using a better choice of Random Variables, the function you integrate is identically zero. This renders any understanding of what you did that is different than what you wrote irrelevant.

15. The following, except as noted, applies to any variation of the Two Envelope Problem before you look in an envelope. This includes the discrete and continuous cases (and discrete-modeled-as-continuous), cases with minimum and maximum values, and cases where the values can be arbitrarily large or small.

1) Let [x0,x1) be a range of values with a non-zero probability Q, and x00.
4) The Wild Statistician’s assumption is proven impossible, by contradiction.
5) In the classic Two Envelope problem, currency is placed in the envelopes so the distribution is discrete, and has a minimum. Being discrete isn’t significant, but the minimum is: With it, (3a) is sufficient to demonstrate the contradiction that leads to the conclusion in (4).
6) My argument about my original “Solution 1” is sufficient to prove that WS’s solution isn’t correct.
a. It didn’t address whether his conclusion was right or wrong, just the argument.
b. If you also accept a maximum, the trivial parallel proves his conclusion is wrong.
c. If you don’t accept either the minimum or maximum, the simple extension of the argument in (3) is required to prove him wrong.
d. Being discrete, continuous, or discrete-modeled-as-continuous is irrelevant.
7) NaClhv said “The chances of either doubling or halving your money are generally not equal. This will be true for ANY reasonable probability distribution of possible values of money in the envelopes.” NaClhv did not prove this must be true, NaClhv merely showed examples based on assuming various distributions (or values) first, and then showing it was not true for them. So as far as NaClhv showed, WS could be correct.
a. I proved it, well enough for the first response to an interviewer.
b. If the interviewer raises technicalities like Nachlv did, there are easy replies.
8) “However, what if the wild statistician insists on putting the problem in terms of expected gain conditioned on the different possible values of the money in your current envelope?” I addressed this completely.
a. Let T be the total amount.
b. The lesser amount is T/3, and the greater amount is 2T/3.
c. The expected gain when you switch is (T/3)*Pr(C=low)+(-T/3)*Pr(C=high) = 0.
d. This is true for any distribution of T, making any argument about distributions completely irrelevant. Again.
e. This is true for any value of T, before you integrate over it and its distribution. So even if you look in an envelope, and deduce something about T, the expectation is still zero.
9) NaClhv presented a 300+ word argument that showed that the expectation over all values of X, a random variable related to T, was zero.
a. It used incorrect terminology and sloppy math, but is correct in the end.
b. It did NOT show that the expectation for any value of X was zero.
c. Whether or not you accept it as “correct,” it is inferior to the argument in (8) which is shorter, simpler and more general.

Which points do you disagree with, and why?

1. Well, that was a disappointing reply. You still ignored the one single question that I've been asking you to look at for many, many posts, and put it beyond any reasonable doubt that the new solution you brought up - the one worthy thing that you did this whole time - was just a broken clock being right twice a day.

In fact, your reply exemplifies exactly what I talked about in my last post: your singular bent, from the very beginning of this discussion, towards evading, twisting, ignoring, and misunderstanding my points. You've ignored the one single question that I've asked you to look at - the one that I said was important above all your drivel - and gone on to only multiply your errors.

That question is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

I even made it as easy for you as possible, by providing the answer and only asking you to confirm it. I said:

f(x) = 1

Tell me, am I right?

This question is so important because it begins a series of simple, straightforward, and uncontroversial questions that clearly demonstrate that you are wrong. You are wrong as surely and as clearly as the statement 0 = 1 is wrong. And all your bluster will come to nothing, being shown for the worthless drivel that it is. That is why this question is so important - the only thing that matters right now.

Look, I've engaged with you in many ways throughout this discussion. I've pointed out the many mistakes you've made, specifically quoting my correct approaches in my original article. I've done the problems you've asked me to do. I even corrected some typos in my original article based on your feedback, and when you changed your argument to something better, I appropriately complimented you on those signs of progress - although it's clear now this progress was only accidental. And yet, you will not engage me on this one, single question.

Remember when we were accusing each other of ignoring one another's points? I guess that's cleared up now.

There is nothing to be afraid of. Just answer the question. I know you can do it. Neither of us can change the fact that you're wrong, or that you've made a fool of yourself, or that you're unworthy of your supposed degree or knowledge, or that you've only dug the hole deeper with the recent posts. But the one thing you can do - to move forward by recognizing and admitting your mistake - lies through answering that question. Will you continue to refuse to answer it? Will you engage with me at all?

16. “You still ignored the one single question that I've been asking you to look at for many, many posts.” I have answered your trivial and insulting question, in a more general form than you asked it. You just missed it – probably because you did not read it. I have also demonstrated far better knowledge than it requires.

“I've pointed out the many mistakes you've made.” You have not pointed out a single one. Saying things like “this trivializes the problem” is an unsupported statement of opinion. If you think otherwise, please quote what I said that is in error, and explicitly point out the error. I even tried to make it easy for you by itemizing the points, but you use your insulting test as a way to avoid identification of the errors you claim exist.

+++++

When you said "f(x) = probability distribution of the money in the "lesser" envelope," you use f(x) as a density function. Not a distribution.

When you said "g(x) = probability distribution of the money in the "greater" envelope," and how you use it later, you are using one random variable but giving it two density functions. What you are trying to do, as I have said repeatedly, is represent two functions of a single random variable. That requires just the one random variable at this point.

The second random variable you use is the choice. It is discrete, with two possible values {low,high} and distribution (0.5, 0.5). You do not represent it as a random variable, though. What you call p(x) is really the joint distribution-and-density function of your two random variables.

As I keep saying, the result ends up correct, but is an overly-complex way of accomplishing what you want. There is a much simpler way, but you use excuses to avoid addressing it.

Now, what do you disagree with in my clearly stated, and clearly supported list?

17. Egad, maybe it wasn't as clear as I thought - because two steps got deleted, and one corrupted somehow. They were there, but I didn't keep a copy.

1) Let x0<x1<2x0, and [x0,x1) be a range of values with a non-zero probability Q= Pr(x0<=x<x1).
2) WS's assumption is that Pr(x0/2<=x<x1/2) = Q = Pr(2x0<=x<2x1). This can be extended endlessly to Pr(x0*2^n<=x<x1*2^n) for any n, positive or negative.
a. If the range of possible x's has a minimum, this requires non-zero probabilities below that minimum.
b. If the range of possible x's has a maximum, this requires non-zero probabilities above that maximum.
c. If the range goes from 0 to infinity, then Q must be zero.
3) All three cases violate the assumptions.
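To make case (c) concrete, here is a minimal numerical sketch (my own illustration, not part of the reconstructed steps): N disjoint bands of equal probability Q claim total mass N*Q, which must never exceed 1, and since N is unbounded that forces Q = 0.

```python
# Sketch of case (c): equal probability Q on infinitely many
# disjoint bands [x0*2^n, x1*2^n) is impossible unless Q = 0.
def total_probability(Q, n_bands):
    """Total mass claimed by n_bands disjoint bands of probability Q each."""
    return n_bands * Q

Q = 0.01  # any fixed positive Q, chosen purely for illustration
assert total_probability(Q, 99) <= 1   # no contradiction yet...
assert total_probability(Q, 101) > 1   # ...but impossible already here

# Since n_bands can grow without bound, requiring n_bands * Q <= 1 for
# all n_bands forces Q = 0, contradicting Q being non-zero.
```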

1. I've found a minor error in the original post that's unrelated to our ongoing discussion. I've corrected it.

Originally, my calculations for the conditional probabilities read:

"""
p("smaller" envelope|x) = f(x)/p(x),
p("greater" envelope|x) = g(x)/p(x)
"""

Here, I accidentally dropped the prior probabilities for each envelope being chosen before taking x into account. So each of those should actually have a factor of 0.5 attached to it. Those lines have now been corrected to:

"""
p("smaller" envelope|x) = 0.5 f(x) / p(x),
p("greater" envelope|x) = 0.5 g(x) / p(x)

noting that the numerator corresponds to getting a specific envelope AND a specific x value.
"""

I then carry those factors of 0.5 through the rest of the calculations, but they don't do anything - they just get factored to the outside of everything and get multiplied by zero at the very last step.
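For what it's worth, the corrected formulas can be sanity-checked numerically. The sketch below uses an illustrative prior of my own choosing (the post doesn't fix one): the smaller amount S is uniform on [10, 30), so f(x) = 1/20 there, and the larger amount 2S has density g(x) = 0.5 f(x/2) on [20, 60).

```python
import random

def f(x):  # density of the smaller amount (uniform on [10, 30))
    return 1 / 20 if 10 <= x < 30 else 0.0

def g(x):  # density of the larger amount, i.e. of 2S
    return 0.5 * f(x / 2)

def p(x):  # density of the amount in your envelope
    return 0.5 * f(x) + 0.5 * g(x)

x = 25.0
p_small = 0.5 * f(x) / p(x)  # corrected formula, with the 0.5 prior
p_large = 0.5 * g(x) / p(x)
print(p_small, p_large)      # about 2/3 and 1/3; they sum to 1

# Monte Carlo cross-check: simulate the setup and condition on the
# observed amount landing near x = 25.
random.seed(0)
hits = small_hits = 0
for _ in range(200_000):
    s = random.uniform(10, 30)
    picked_small = random.random() < 0.5
    observed = s if picked_small else 2 * s
    if 24.5 <= observed < 25.5:
        hits += 1
        small_hits += picked_small
print(small_hits / hits)     # close to p_small
```

Note that at x = 25 both envelopes are possible, and this (made-up) prior gives 2/3 vs. 1/3 rather than the wild statistician's assumed 50-50: the conditional probabilities depend on the prior.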

It's important to note that this has no bearing on our ongoing discussion. The mistake amounts to something hardly more than a typo. I still stand by everything I've said in the comments. I don't think it should change anything you've said, either.

But, since this does involve changing the text that we've been talking about (albeit in an insignificant way), I felt that I needed to point out that I've changed it, apologize for any inconvenience this may cause (which I believe to be "no inconvenience" - but let me know), give you a chance to re-evaluate the text, and revise anything you've said in the comments thus far with no negative consequences.

You may also be interested to know that I found this mistake while doing the problem you assigned me, of deriving the expression I use starting from a double integral, using dirac delta functions. So, a genuine "thank you" for that - You helped me catch this unrelated error. And, as I said before, I'll be glad to share that completed derivation once the question that I've been asking you gets answered.

I strongly urge you to use this opportunity to take a new look at my argument and really understand it. I hope that the fact that I found this mistake, while doing the exercise you asked of me, then pointed out the mistake to you myself, without changing my main argument in the slightest, or taking back anything I've said in the comments, gives you some insight into my commitment to mathematical integrity and the strength of my solution. Everything I said in the comments is for real. I am not trying to evade your points or ask merely insulting questions. I really do want you to answer the one question I keep bringing up.

18. "I strongly urge you to use this opportunity to take a new look at my argument and really understand it." And I urge you to do the same.

Specifically, if one random variable represents the value in both envelopes - as your "x" does, in an overly complicated way that you don't describe well - and the other represents the choice of low vs. high - as yours does, but you don't describe as a random variable at all - then integration is irrelevant. This is true whether you use a discrete distribution, as the original problem suggests; or a truly continuous one, without infinite values for the density; or a dirac-delta based continuous model of a distribution that includes discrete elements.

WHY? BECAUSE THE EXPRESSION YOU WOULD INTEGRATE IS IDENTICALLY ZERO FOR EVERY VALUE OF X.

This isn't hard to understand. But you do have to try.

19. 1) Suppose X is a uniformly-distributed continuous random variable defined on the range x0<=x<x1. Let f(x) be its probability density function. What is the significance of the expression ∫ A(x)*f(x)*dx, integrated from x=x0 to x=x1?

2) Even if you can't figure out what f(x) is in question 1, do you even care that f(x)=1/(x1-x0) when you evaluate ∫ [A(x)-B(x)]*f(x)*dx if A(x)=B(x)? Do you need to know anything at all about f(x)?

From here on, say T is a random variable representing the total amount in the two envelopes. Then, S(t)=t/3 is a function representing the amount in the smaller envelope, and G(t)=2t/3 is a function representing the amount in the greater envelope. Further, A(t)=G(t)-S(t)=+t/3 is a function representing the gain when you switch from the smaller envelope, and B(t)=S(t)-G(t)=-t/3 is a function representing the gain when you switch from the greater envelope. Finally, let C be a discrete random variable representing which envelope you initially chose, so c={smaller, greater} with distribution {1/2,1/2}.

3A) Why do I use both upper case and lower case letters?

3B) What is different about how you handle C and T, since C must be discrete, and T could, in theory, be continuous or discrete?

3C) What is the significance of questions #1 and #2 to the expected value of a switch?

1. I take it from your posts that you are, in fact, okay with my changes? That like me, you stand by everything you've said? That you won't come back later and say "Oh, of course that changes everything! Of course I didn't mean for my comments to apply now, after you've made those changes!"

Again, although these changes actually don't affect our discussion at all, I'm extending an offer for you to revise anything you've said with no negative consequences, as I feel that it's only fair. This will expire at the beginning of my next post. I strongly urge you to take advantage of it. Really, the best way for this discussion to end is for you to say "Well, now that your mistake has made me take a second look at your argument, I now realize that it is in fact a completely general argument which subsumes my entire first argument. I was misinterpreting it completely." And then, for me to reply "No problem. I'm still glad we had this discussion, because thanks to you I caught that minor mistake and fixed it."

And, while that offer is operational, I'm going to refrain from asking you my one question for the time being. It will be back again in my next post, unless you've actually understood my solution by then. But, in hopes that you will actually try to understand my solution, I will instead spend this post in trying to address your points.

I clearly define all the terms before I begin using them, by saying:

"""
x = amount of money in your current envelope,
f(x) = probability distribution of the money in the "lesser" envelope, and
g(x) = probability distribution of the money in the "greater" envelope.
"""

They do not use your notation, but all that should be perfectly clear. In your notation,
f(x) would be the probability density function over l, and g(x) would be the probability density function over s. I've also explained earlier that I've collapsed the full integral down to a single one, which can easily be done using dirac delta functions. Doing it this way makes the argument simpler, more intuitive, and focuses it on the actually interesting part - the wild statistician's actual error.

1) That is the expression for the expected value of A.
2) No, you don't care about the specific form of f(x). The integrand is zero everywhere, so the entire expression is trivially zero.

Note here, between 2 and 3, that all this business with t is YOUR argument, not mine. It is the argument that I dismissed as being trivial - the argument that we already knew from the very beginning, because it was essentially already a part of the problem at the start. Also remember that my accusation against you this whole time is that you don't understand MY solution, not that you don't understand your own argument.

3A) Because upper case letters represent random variables in the abstract, without specifying a specific value to them, or a function that generates the same. Whereas lower case letters represent a specific instance of such a random variable.
3B) I essentially start my problem as already having integrated over c using dirac delta functions.
3C) If A(x) - B(x) represents some quantity you're interested in, the expectation value of that quantity would be zero because the integrand is zero everywhere.
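To spell that out, here is a minimal numeric sketch of the cancellation, reusing the S, G, A, B definitions from the comment above (the values of t are arbitrary, since no distribution for T was specified):

```python
# The integrand of the expected switching gain, averaged over the
# choice. For every t it is identically zero, so integrating it
# against any density f(t) gives zero - the form of f never matters.
def S(t): return t / 3        # amount in the smaller envelope
def G(t): return 2 * t / 3    # amount in the greater envelope
def A(t): return G(t) - S(t)  # gain from switching off the smaller
def B(t): return S(t) - G(t)  # gain from switching off the greater

for t in (3.0, 30.0, 299.0):  # arbitrary illustrative totals
    integrand = 0.5 * A(t) + 0.5 * B(t)
    print(t, integrand)       # 0.0 for every t
```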

I hope those answers were to your satisfaction. Now, my claim is still that you have completely, utterly, disastrously misunderstood my solution. Again, the best possible end to this discussion for you is to take my offer of changing your mind with no repercussions, acknowledge and fix your misunderstanding, and learn something. Your other options are... worse.

20. “I take it from your posts that you are, in fact, okay with my changes?” Haven’t looked at them because, for the most part, they are irrelevant (see my last post) and probably still have careless errors, which you can’t seem to understand and don’t seem to care that you don’t. Like:

“x = amount of money in your current envelope,
f(x) = probability distribution of the money in the "lesser" envelope, and
g(x) = probability distribution of the money in the "greater" envelope.”

It is not just “my notation” that you get wrong, it is the definition of the terms. A probability distribution function is an actual probability. The probability for a single value x of a true continuous random variable (as opposed to a discrete one you model with dirac delta functions) is zero. The probability distribution is F(x)= ∫ f(t) dt from t=-inf to x.

“They do not use your notation, but all that should be perfectly clear.” And I have said they end up being correct, despite these careless errors.

“I've also explained earlier that I've collapsed the full integral down to a single one, …” but not how. Like almost all of what you say, you present a de facto conclusion without any support. This one would be pretty difficult to support, since that isn’t what you did. The second integration is really a summation, since your second random variable is the discrete one representing the choice. The two you call different ones are FUNCTIONS of a single random variable, and you integrate over it exactly ONCE. There are two terms in it that you add, but you have a single integral.

“Note here, between 2 and 3, that all this business with t is YOUR argument, not mine.” Right. And it is precisely why we don’t need yours, since the integration becomes irrelevant.

“It is the argument that I dismissed as being trivial” but never addressed that it is trivial because the problem, when viewed correctly, is that trivial.

“… the argument that we already knew from the very beginning, because it was essentially already a part of the problem at the start.” Where?

I’ll look at your “changes” tomorrow. Maybe. But it’s hard to see why I should, when you won’t look at mine except to call it “trivial,” and use lame excuses like "you haven't provided an answer that is found on the first page of any probability textbook" when you continue to get things wrong from the second, and all subsequent, pages.

1. As for my "mistakes", even if they are really "mistakes", you yourself said that they're minor and don't really matter. I mean - "probability distribution function" vs. "probability density function"? Really? I think you and I both know that this is not about a preference for one notation vs. another. My claim has always been that you've completely misunderstood my solution, making any minor mistakes I might have made completely irrelevant in comparison.

And, since you keep bringing it up -
https://en.wikipedia.org/wiki/Probability_distribution_function
https://en.wikipedia.org/wiki/Probability_distribution
You see that there is flexibility in the terminology, because math is not about silly things like this.

Please do look at the change, and look at my solution again. Seriously, that "I don't see why I should" attitude is terrible. It's rude. Remember, my claim is that you have utterly and disastrously misunderstood my solution. Then, to come in here with that misunderstanding, and to say "I don't see why [you] should" look at my work, while pushing your shallow argument which is entirely subsumed in mine, is rude, arrogant, immature, and ignorant.

Now, I'm going to again ask the question that I wanted answered from you, as I said I would. It is this:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

By now, you should understand that this question is for real. I'm not trying to insult you by asking something so simple. I'm not trolling you or just delaying things or trying to distract the discussion from other things. I've demonstrated my earnestness by my actions throughout this long discussion - answering your questions, acknowledging my own mistakes, etc. This question really does begin a sequence that will clearly demonstrate that you've misunderstood my solution, and points the way to how to understand it better. I realize that you find it insulting because it's so easy. But that is necessary: as I said previously, if you make a mistake in thinking that 0 = 1, you must go back and re-learn what 0 and 1 are. It is also intentional: because you continually evade, ignore, and misunderstand my solution and my points, I intentionally designed the simplest way to show that misunderstanding. We will walk through each step together, affirm that it is indeed correct, so that when we come to the conclusion, there will be no chance of misunderstanding, ignoring, or escape.
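(For reference, the arithmetic behind the question is nothing more than 1/(1-0); a two-line sketch:)

```python
# The uniform density on [a, b] is the constant 1/(b - a) inside the
# interval and 0 outside, so on [0, 1] it equals 1, including at 0.5.
def uniform_pdf(x, a=0.0, b=1.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(uniform_pdf(0.5))  # 1.0
```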

21. "I mean - "probability distribution function" vs. "probability density function"? Really?" Knowing these terms is more significant than knowing a uniform density function is 1/(x1-x0). I keep trying to get you to ignore such questions by admitting to you that the names aren't important, just like that insulting question isn't important. But you keep using that insulting question, and others, as an excuse to ignore what I say.

"And, since you keep bring it up..." Did you see where they said "However, this use is not standard among probabilists and statisticians" ? Or notice that my point wasn't that your usage was not understood, but that it could be interpreted as not understanding the difference? The kind of point YOU keep raising with your insulting question?

Now, you should look at https://en.wikipedia.org/wiki/Conditional_probability_distribution#Continuous_distributions . The conditional density you keep writing as a function of a single random variable should be written as a function of two. This is why "p("smaller" envelope|x) = 0.5 f(x) / p(x)" is still an incorrect expression, even if the coefficient is now correct (I haven't bothered to check).

Let the smaller amount be the random variable S (values called s), the greater be G (and g), and the value in your envelope be X (and x). I don't know how to do a subscript here, so I'll call the densities fS(s), fG(g), and fX(x). And there is another we need - the joint density fSX(s,x).

These are highly related random variables. But the correct way to do what you are trying is to discover that relationship, which you haven't. The correct way to write the expression for what you called p is fS(s|X=x)=fSX(s,x)/fX(x). IT NEEDS THE JOINT DISTRIBUTION OF TWO DIFFERENT RANDOM VARIABLES. To average it you need a double integral, over ds and dx SEPARATELY. YOU DON'T DO THIS.

The (single) integration that you end up with is 0.5 ( ∫ x f(x) dx - ∫ x f(x) dx ). What you continue to ignore is that there can be many ways to get there. One is the complicated, technically-incorrect-but-may-end-up-right way you used. Another is the integral I presented in question #2 a few posts ago, ∫ [A(t)-B(t)] f(t) dt where A(t)=B(t). Did you not notice a similarity between yours and mine? THEY ARE ESSENTIALLY THE SAME THING, with the exception of the coefficient you changed which turns out to be irrelevant because it gets multiplied by zero.

You keep dismissing mine as trivialization. It isn't - the result is the exact same result you try to get, without the complex machinations you use to transform the easy random variable T=total into X=your envelope, and then misrepresent the other random variables and claim you are using two when you aren't.

You just can't admit that a trivial-seeming analysis is sufficient for a trivial problem. And that the only thing you need to communicate to your interviewer is why the problem is trivial: (1) The answer that says you should switch contains an error. (2) The trivial one you dismiss does not, AND IT IS EASILY SHOWN TO BE CORRECT IF YOU DON'T DISMISS IT BASED ON ITS SIMPLICITY. (3) Its answer is that the expected gain, integrated over only the "choice" random variable, is zero. So actual integration over the "value" random variable is irrelevant, since it must be zero also.

22. BTW, I have answered your insulting question three times now. Just in a more general form than you requested. You even acknowledged it. I have resisted the temptation to ask you if you realize that 1/(1-0)=1. Should I do so before letting any kind of discussion continue, since without such a statement it appears you don't understand simple arithmetic? This is the exact kind of thing you keep doing.

1. You said:
"""
BTW, I have answered your insulting question three times now. Just in a more general form than you requested. You even acknowledged it. I have resisted the temptation to ask you if you realize that 1/(1-0)=1. Should I do so before letting any kind of discussion continue, since without such a statement it appears you don't understand simple arithmetic? This is the exact kind of thing you keep doing.
"""

What I need from you is a confirmation and your willingness to follow me, to actually engage with me. You see the kind of answer you just gave above? This kind of non-answer that does everything possible to not simply say "yes, the answer is 1"? Your tendency to do this is exactly the reason that you need to walk through this step-by-step process. Because, at the end of the process when you'll be shown to be utterly wrong, you need to be able to say, "yes, I was wrong", instead of "Well, you're just ignoring my points that show that you're just confused".

Again, this is not a test of your math ability. I chose these questions specifically because I already thought you could answer them. If it is a test at all, it is a test of your ability and willingness to follow another person's argument.

If you feel insulted by this, go ahead and feel free to insult me back, but be aware that you'd only be digging yourself in deeper. But if it would actually get you to answer the question, feel free to say "Of course I agree that the answer is 1, you idiot who can't even do basic arithmetic! How stupid are you?". Such an answer would actually be less rude than your perpetual evasion and bluster. Remember, I'm not asking these questions to you to see if you know them. Rather, I need your participation, your willingness to follow someone else's argument.

Seriously, if you aren't willing to engage with me on this one single thing, despite the fact that I have already engaged with you numerous times, you aren't worthy to call yourself any kind of mathematician. I mean, imagine if you did this in an in-person discussion. Imagine that we're sitting in a math classroom, discussing a problem, and I said "Okay, you're wrong, and I can show you that you're wrong if you just follow my argument. You see this equation here?" and you replied by saying "I'm insulted by your suggestion that I'm blind. Is it not manifestly clear that my eyes work just fine? Are you the kind of idiot who has to insult their opponent's vision because they're losing the argument?". Would that not be rude? Is that how someone who claims to have a degree in math comports himself? Your diploma is ashamed of you.

The rest of your replies are filled with error, as usual. But that is irrelevant. The question I ask will settle that. So, let's try this again:

The question is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

f(x) = 1

Tell me, am I right?

All I need from you is "Yes, you are right that f(x) = 1". You can even choose to answer "Of course I agree that the answer is 1, you idiot who can't even do basic arithmetic! How stupid are you?", although I would advise against it.

23. (1 of 2 or 3 - let's see what I can fit)

“What I need from you is a confirmation and your willingness to follow me, to actually engage with me.” You describe two different things here. I am not going to “follow you,” but I have been trying from the beginning to “engage with you.”

The difference is that I feel I know just as much, or more, about probability in general, and this problem in particular, than you do. I feel I have demonstrated this knowledge amply. I do not need to, and am not willing to, be led by the nose in this discussion, as you keep trying to do. I want it to be a two-way endeavor. Do you? I ask, because you have not shown any willingness to engage with me. Only to lead me, and I do not need to be led.

Toward those stated goals, I have absolutely no intentions of providing a direct answer to your insulting question which you know I can answer. The one whose only purpose at this point is to make me subservient to you in a one-sided discussion. I thought this would have been extremely obvious by now.

I also have no intention of insulting you, as you have insulted me. But I would have left this discussion long ago if I didn’t want a proper, two-way discussion.

24. (2 of 3)

Starting points I want to emphasize:

1) You did not make a problem statement. You linked to Wikipedia. Its setup begins by describing what you call the Wild Statistician’s solution, and then simply asserting “However, this violates common sense.” The implication is that it is common sense that the two envelopes are equivalent to you, so which you pick shouldn’t matter.
a. The WS solution is demonstrably correct, except for one unsupported assumption (see (3d), (6), and (7) below).
2) Later – not as part of the “problem setup” as you implied in your very first reply to me – it provides a different, and very long-winded solution that, like yours, ends up being correct. It is almost impossible to follow.
3) I provided simple explanations for both solutions. I did not “just restate” the problem as you claimed; I did restate the solution described in (1) above, but I solved the part described by (2) in a different way.
a. The purpose of restating (1) was to identify the error.
b. Yes, this solution is trivial. That’s part of my original point. It is also demonstrably correct. There is one correctly applied assumption: that when you know nothing about either envelope, you have equal chances of having either.
c. The difference between (1a) and (3b) is that (1a) assumes you know something about one envelope. The value.
4) There is nothing wrong with a solution to (2) being trivial, if the solution is correct in all ways. I’ll point out that all you did was provide a different solution for the same thing.
a. Yours contained some marginally questionable steps, but ended up being right in the end.
b. But it ended up being right because, based on what you described as a trivialization, your integration was essentially ∫ k*Z(x)*f(x) dx, where Z(x) is identically zero for all x in the range of integration. So if you made a mistake in, for example, the constant k, you’d still get the right answer.
c. You have admitted to such a mistake.
5) One big question here, is which solution that parallels (2) was a better explanation: Mine, which is trivial and explained in two undeniably correct sentences; or yours, which contained many errors in nomenclature, one of which resulted in an error that did not produce the wrong result only because of a reason explained by mine.
a. In other words, is the “trivialization” you use as an excuse to ignore any attempt to discuss my solution, actually a better reason to do the same to yours?

1. (3 of 3)

6) Proving that the WS’s assumption was wrong should be one half of the explanation communicated to the interviewer (a correct, however trivial, answer to (2) is the other half).
a. I proved it, but only for a special case.
b. You objected, calling it a technicality.
c. I have asserted that the special case is the best representation of the original problem.
d. You did not prove the assumption was wrong; you merely listed examples of distributions where it was. You did not show that the actual distribution had to be one of them.
7) Your objections in (6b) are also technicalities; technicalities I was prepared to discuss. The WS’s assumption is equivalent to saying that, regardless of the form the distribution takes, the probability of the pair of values {S,G} being in the ranges 2^(n-1)*x0<=s<2^(n)*x0 and 2^(n)*x0<=g<2^(n+1)*x0 (do you know why I don’t use <= in both places?) have to be the same regardless of -inf<n<inf. So one of three contradictions follow from the WS’s assumption:
a. If there is a claimed lower limit to your x, it means there is non-zero probability for x’s below it. This is the “technicality” I used, but only because it is the easiest to demonstrate.
b. If there is a claimed upper limit to your x, it means there is non-zero probability for x’s above it.
c. If there is no lower or upper limit, the range of x is infinite and (please look at the general form I provided for your insulting question, since this is the only place it is relevant to this discussion) the probability is 0 for any range of x’s.
8) The explanation in (7) applies to discrete, continuous, and discrete-modeled-as-continuous distributions.
a. It is unnecessary to model discrete distributions as continuous. Sure, you can, but it is not necessary. So describing how is irrelevant.

Now, I am not interested in hearing a confirmation that a statement is right, since that is a leader-follower kind of non-discussion. I want to know what you disagree with, and why.

2. Enough. Your worthless posts have benighted my blog for far too long. I'm taking administrative action.

You are a liar. When I linked you to the two Wikipedia articles showing that there is flexibility among the terms "probability density function", "probability distribution function", and "probability distribution", you replied and said:

"""
Did you see where they said "However, this use is not standard among probabilists and statisticians" ?
"""

when in fact nothing like those words appear anywhere in the two articles I linked. For proof, here are the links to the two articles as they appeared when you made your lying statement:

https://en.wikipedia.org/w/index.php?title=Probability_distribution&oldid=696690763
https://en.wikipedia.org/w/index.php?title=Probability_distribution_function&oldid=667630013

Read those pages. Search them (usually control-f or command-f in most systems) for "However, this use is not standard among probabilists and statisticians", or any meaningful fragment thereof. "this use" returns zero results. "not standard" returns zero results. "among" returns zero results. "probabilists" and "statisticians" both return zero results. Manually reading the articles returns zero results. In fact, one of the articles actually expresses the opposite sentiment, stressing the ambiguity of these terms and the need to take author preference into account.

Meaning, you did not read or quote these articles. You literally just made something up, for the explicit purpose of deception, because you knew that you were wrong and didn't want to admit it. You are a liar.

You are a hypocrite. Since the time I asked you to answer my one question, I have:

Corrected typos in my original post based on your feedback;
Acknowledged your accomplishment when you changed your argument to be based on a continuous distribution, and discussed it at length with you, even though that just turned out to be a broken clock being right twice a day;
Voluntarily pointed out an error that I found in my original work, although you did not know of it;

In comparison, in all that time you have not even answered the one single simple question that I asked you to answer, one which I emphasized was of supreme importance in this discussion. Then, in face of this overwhelming difference in our willingness to engage with each other, after you flat out refused to ever answer my question, you then turn around and accuse me of not wanting a two-way endeavor, and not showing any willingness to engage with you? You are a hypocrite.

3. You are a spammer, and you declared it yourself, in saying:

"""
I have absolutely no intentions of providing a direct answer to your insulting question
"""

Again, that one simple question was the only thing I asked you to do to engage with me, while I was doing everything I documented above to engage with you. Do you know what makes a spammer a spammer? Say that you came in here pushing penis enlargement pills. If I then asked you something about human sexuality in general, and you actually replied appropriately, then you would have been off-topic at the beginning, but not a spammer. No, what makes spam that ugly blight in online discussions is that the spammer is only interested in spamming their own agenda, while ignoring everything else. That's why it is pointless to engage with a spammer, and you have yourself declared that it is pointless to engage with you. You are a spammer.

You are a troll. I have, throughout this discussion, been completely earnest with you. I corrected you as long as I thought that was still effective. I complimented you when you did something right. I thanked you and apologized to you as the occasions warranted. And I reproved you when your actions were contemptible. On the other hand, you have willfully practiced deception. You started the discussion without having read the original post completely, by your own admission. You never intended to engage with me, while claiming the opposite. You've practiced well-known dishonorable debate techniques (parroting my arguments back to me, shotgunning the discussion with errors). You were never going to have any kind of reasonable discussion with me. You are a troll.

4. I have standards for the comments people post on my blog. For the sake of my readers, and because on the internet there are bound to be some number of fools with mouths too large for their brains, I have to occasionally delete comments or block users. For example, spam posts are deleted summarily. Given that you are a proven liar, a demonstrated hypocrite, a declared spammer, and a confirmed troll, I should do the same to your posts.

However, because I would like for you to learn something from all this, I'm going to give you one last chance to engage with me. I am going to ask you my one question again, and I want you to answer it. But, to force you to stay on topic, I am going to tell you the mathematically correct answer. You are to reply to me with that answer EXACTLY, with no changes or additions. Any other replies will be summarily deleted.

This will continue for several questions. Each time, there will be a simple math question using only the simplest algebra and the most basic probability theory. I will tell you the mathematically correct answer, and you are to reply with that correct answer exactly. This will go on until the final question (probably around 5-10 questions in), whose mathematically correct answer will be "Yes, I was wrong."

So, again: reply exactly with the given correct answer. Anything else will be summarily deleted. If the correct answer is not given after some extended period of time, I will consider you to have abandoned the exercise.

You may be wondering "why would I ever follow along with this plan?" Firstly and most importantly, because it'll do you good. You cannot change the fact that you were wrong, or that you have made a complete fool of yourself. But you can still acknowledge those mistakes, which will be helpful in turning yourself around.

Secondly, I will take pity on you in my future posts if you complete this exercise. Because of the mess you made in this comment section, I will have to write a lot of things to clean it all up. Regardless of whether you choose to complete the exercise, I will eventually have to write out the sequence of questions in this exercise and their answers, showing exactly why you are utterly wrong. I will also have to clarify my comment policy, answer some of the other mathematical questions that have come up (reducing the double integral to a single integral using Dirac delta functions, the three exercises I assigned you before, etc.), and write a follow-up post on how some people can go so wrong when they try to answer the two envelopes problem.

If you complete the exercise, I will take special care to avoid directly referring to you in all these future writings, and allow you to delete all of your previous posts. I will not replace them directly with the backup copies I have of our discussion - instead, I will use these backups only for the texts of your posts, with your handle redacted. So your errors will be disassociated from your handle.

If you have any attachment to your handle, and value whatever online reputation it may have, I hope that will serve as a motivation for you to complete the exercise.

So, shall we have one last go at this?

The question is:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

My answer is that f(x) = 1. Tell me, am I right?

Yes, you are right that f(x) = 1

25. It has been well over a month since I asked JeffJo for a reply. It saddens me to say that I must consider him to have abandoned this thread.

For the sake of my readers' understanding, and to clean up after JeffJo, I will now post the correct answers to the questions that came up during this discussion.

I had been asking JeffJo one question, telling him that it was the first of a short series of questions that would lead to a clear demonstration of his error, using only the simplest algebra and the most basic probability theory. The following is that series of questions, in its entirety.

Question 1:

Let f(x) be a uniform probability density function over the real numbers in the interval [0,1]. What is the value of this function at x = 0.5?

My answer is that f(x) = 1. Tell me, am I right?

Yes, you are right that f(x) = 1.

Question 2:

Now, let g(x) = 0.5 f(0.5x), as I stated in the original article. Given the same f(x) as in the previous question, what is the value of g(x) at x = 0.5?

g(0.5) = 0.5 f(0.5 * 0.5) = 0.5 f(0.25)
= 0.5 * 1 = 0.5

Question 3:

Now using the same g(x) and f(x) as above, what is the value of x * f(x) - 0.5x * g(x) at x = 0.5?

x * f(x) - 0.5x * g(x)
= 0.5 * 1 - 0.5 * 0.5 * 0.5
= 0.5 - 0.125 = 0.375

Question 4:

Now, is that expression in the previous question (that is, x * f(x) - 0.5x * g(x)) the expression that I'm integrating in my original article?

Answer: Yes, that is the expression that is being integrated in the original article.

Question 5:

And that expression - the integrand - is 0.375 at x = 0.5?

Answer: Yes, the integrand is 0.375 at x = 0.5

Question 6:

Is 0.375 = 0?

Answer: No, 0.375 is not equal to zero.

Question 7:

So, were you wrong when you repeatedly insisted that my integral reduces to yours because its integrand was zero for all x values?
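For any reader who wants to verify the arithmetic of this question series, here is a minimal Python sketch. The function names are mine; the definitions of f and g are the ones from the original article.

```python
def f(x):
    """Uniform probability density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def g(x):
    """g(x) = 0.5 * f(0.5 * x), as defined in the original article."""
    return 0.5 * f(0.5 * x)

def integrand(x):
    """The expression integrated in the original article."""
    return x * f(x) - 0.5 * x * g(x)

print(f(0.5))               # Question 1: 1.0
print(g(0.5))               # Question 2: 0.5
print(integrand(0.5))       # Questions 3-5: 0.375
print(integrand(0.5) == 0)  # Question 6: False
```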

Now, did I actually expect JeffJo to go through with this exercise? Of course not - you can hardly expect someone who couldn't even say "Yes, f(x) = 1" to ever say "Yes, I was wrong". Even if he somehow managed to answer the first question, he would have had some excuse, distraction, evasion, or lie to avoid saying "yes, I was wrong" - even though he IS wrong, just as surely as 0.375 = 0 is wrong.

1. What exactly he would have said for himself about this error is unknown, but what he has actually said about this integrand is all over the above thread. He really and truly believed that the integrand in my initial article was zero for all values of x. The following are just a sample of the things he said - I swear I couldn't have set him up better even if I were sockpuppeting him.

JeffJo:
"""And the function you integrate over IS IDENTICALLY ZERO everywhere you integrate it. Which is what I did. I "solved the problem" to use your words."""

This seems clear enough, doesn't it? He thinks that the integrand in my integral is always zero. He does the wrong problem - a much simpler problem - and thinks that he solved my problem. That is what it means to trivialize a problem. But maybe I misunderstood him somehow?

JeffJo:
"""What you did was find that the value of the expectation is zero *after* the integration. What I did was find that the value you sum, or integrate, over is zero everywhere."""

Hm, it certainly looks like I understood him correctly. He really does think that my integral is an integral over the zero function.

The wild statistician's mistake is, in fact, somewhat related to JeffJo's. JeffJo thinks that the integral is simple because its integrand is zero everywhere. The wild statistician thinks that it must be positive because the expectation value is 5/4 x everywhere. Both are wrong. Both are trivializing a more complicated integral, which only becomes zero after the integration.

JeffJo:
"""What you got, correcting a typo, was ∫ x f(x) dx - ∫ 0.5x g(x) dx. This essentially means the contribution of the “lesser” envelope to the expectation, minus the contribution of the “greater” envelope.

All that is unnecessary if you do the summation first. It evaluates to:

[(-1)*l(x) – (-1)*g(x)]/2 + [(1)*l(x) – (1)*g(x)]/2
= [-l(x)+g(x) + l(x)-g(x)]/2
= 0.

Making the integration, the variable change, and the entire distribution question moot."""

Yeah - I really don't think I'm misunderstanding him. He really thinks that my integrand is zero.

JeffJo:
"""9) Nachlv presented a 300+ word argument that showed that the expectation over all values of X, a random variable related to T, was zero.
(...)
b. It did NOT show that the expectation for any value of X was zero."""

That would be because the expectation is in fact NOT zero for any value of x.

The really notable thing about this error is that at the beginning, he actually understood the issue correctly. He understood that knowing the value of x changes the expectation value of switching envelopes. He in fact accused ME of not understanding it. But when confronted with a clear quote from my original article showing that I had taken exactly this into account, he retreated to his trivialized approach to the problem, and then insisted that my approach must reduce down to his. That is the origin of this error. So he forgot something he genuinely knew, and made a new, monstrous error on the same issue, all to avoid saying that he was mistaken.

JeffJo:
"""WHY? BECAUSE THE EXPRESSION YOU WOULD INTEGRATE IS IDENTICALLY ZERO FOR EVERY VALUE OF X."""

Yup, definitely understood him there. No room for misunderstanding.

JeffJo:
"""your integration was essentially ∫ k*Z(x)*f(x) dx, where Z(x) is identically zero for all x in the range of integration."""

And one more for good measure, in his very last series of posts.

So, my series of questions does exactly what I said it would do - using only the most elementary math, it clearly demonstrates the enormity of JeffJo's error, which is wrong as surely as 0.375 = 0 is wrong.

But, I did also say that this would lead to answering all of his other objections. So, let's move on to those points.

2. Note that my integral integrates the expectation value at each x against the probability distribution over x. JeffJo repeatedly insisted that I had not done this properly, and that I needed to do a double integral. I answered that I had already integrated over the "choice" variable to reduce that double integral to a single integral. But JeffJo continued to insist that I had not done this.

"""1. I mentioned that a continuous distribution can always be used to model a discrete distribution using dirac delta functions. Explicitly outline the procedure. That is, starting with a discrete distribution d(x) where x can only take on discrete values, explain how to obtain a c(x) where x can take continuous values, in such a way that d(x) and c(x) behaves identically in all statistical calculations."""

Let d(x) = p1 for x = x1, p2 for x = x2, p3 for x = x3, and so on, for as many values of x as you'd care to specify, up to a countably infinite number. Then, using Dirac delta functions, the probability density function c(x) over a continuous x is given by c(x) = p1 δ(x - x1) + p2 δ(x - x2) + p3 δ(x - x3) + ..., where δ(x) is the Dirac delta function.
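As a sketch of how this construction behaves, here is a symbolic check using SymPy. The specific probabilities 3/10 and 7/10 are hypothetical numbers of my choosing, not from the discussion.

```python
from sympy import DiracDelta, Rational, integrate, oo, symbols

x = symbols('x', real=True)

# Hypothetical discrete distribution: P(X = 1) = 3/10, P(X = 2) = 7/10
p1, p2 = Rational(3, 10), Rational(7, 10)

# Continuous model: c(x) = p1 * delta(x - 1) + p2 * delta(x - 2)
c = p1 * DiracDelta(x - 1) + p2 * DiracDelta(x - 2)

# c(x) behaves like d(x) in statistical calculations:
total = integrate(c, (x, -oo, oo))     # total probability
mean = integrate(x * c, (x, -oo, oo))  # expectation value
print(total)  # 1
print(mean)   # 17/10, matching 3/10 * 1 + 7/10 * 2
```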

Now, to tackle the double integral problem, we have to extend this to two dimensions. If I have p(x,y), a two-dimensional probability density function over the x and y variables, but y is actually a discrete variable that I want to model using continuous values, how should I do so using Dirac delta functions? Like so:

p(x,y) = p1 δ(y-y1) py1(x) + p2 δ(y-y2) py2(x) + p3 δ(y-y3) py3(x) ...

Where, as before, p1, p2, p3, etc. are the probabilities of y taking on the values y1, y2, y3, etc., respectively. py1(x), py2(x), and py3(x) are then the conditional probability density functions for x, given that y has taken on y1, y2, or y3, respectively.

For the two envelopes problem, if we let y = -1 for the "we've chosen the smaller envelope" case and y = +1 for the "we've chosen the greater envelope" case, then p1 and p2 are both 0.5, py1(x) = f(x), and py2(x) = g(x). As in the main article, x is the amount of money in the current envelope. Putting this all together, we get:

p(x,y) = 0.5 δ(y+1) f(x) + 0.5 δ(y-1) g(x)

Now, we want to integrate the expectation value over this probability density function. The expected gain from switching is +x if we have the smaller envelope (the other envelope holds 2x), but -x/2 if we have the larger envelope (the other holds x/2). Or,

e(x,y) = (x for y = -1, -x/2 for y = 1)

Now, let's integrate over the choice variable in the double integral, to reduce it down to a single variable:

E = ∫∫ e(x,y) p(x,y) dx dy
= ∫∫ (x for y = -1, -x/2 for y = 1) (0.5 δ(y+1) f(x) + 0.5 δ(y-1) g(x)) dx dy
Now, remembering that the Dirac delta function is zero wherever its argument is not zero:
= ∫∫ 0.5 δ(y+1) f(x) * x + 0.5 δ(y-1) g(x) * (-x/2) dx dy
Then, using the property that a Dirac delta function integrates to 1:
= ∫ 0.5 f(x) * x + 0.5 g(x) * (-x/2) dx
= 0.5 ∫ x * f(x) - 0.5x * g(x) dx
Which is, of course, the exact expression that I evaluated in my original article. So, I did in fact do that very thing which JeffJo said I couldn't do: starting with a double integral and ending with my expression. I trust that the reader will be able to find ample places in the above discussion where JeffJo did, in fact, make that false claim.
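For readers who would like to see this expression concretely, here is a minimal numerical sketch in Python, assuming the same uniform f(x) on [0, 1] from the question series (so g(x) = 0.5 f(0.5x) is supported on [0, 2]). The function names and the midpoint-rule grid are mine, not from the article. The point: the integrand is nonzero almost everywhere, yet the integral still comes out to zero.

```python
def f(x):
    """Uniform probability density on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def g(x):
    """Density of the amount in the greater envelope: 0.5 * f(0.5 * x)."""
    return 0.5 * f(0.5 * x)

def integrand(x):
    """x * f(x) - 0.5 * x * g(x), the expression under the integral."""
    return x * f(x) - 0.5 * x * g(x)

# Midpoint-rule integration over [0, 2]; both densities vanish outside.
n = 200_000
h = 2.0 / n
E = 0.5 * sum(integrand((i + 0.5) * h) * h for i in range(n))

print(abs(E) < 1e-6)   # True: the expectation of switching is zero
print(integrand(0.5))  # 0.375: but the integrand is certainly not zero
```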

3. Now, let's move on to the second question I posed to JeffJo in my original set of 3:

"""2. I mentioned the case of restricting yourself to only discrete values, and what happens when you find the lowest amount in your envelope (say, \$1). Now, put yourself in the shoes of the wild statistician, and construct his argument for him. What would he say? How could this discretization only strengthen his incorrect argument? And what is the correct reply to him that would actually fix that mistake?"""

JeffJo actually got this one eventually. After I brought it up with him repeatedly, near the end of our discussion he finally understood that if you have \$1, and you know that the other envelope can't have \$0.50 because \$1 is the minimum value, then it must contain \$2, and you must therefore switch. So, the wild statistician would say, "So, you've come up with ONE counterexample to my argument, where your envelope contains \$1. But in that lone case, you must absolutely switch. So if the envelope contains \$1, you must switch, and if it contains more, then you must still switch according to my original argument. You've only strengthened my argument." The following is where JeffJo finally realizes this:

JeffJo:
"""????? And you completely ignore the fact that the point was to find a flaw in his argument, and show that there was not a flaw in the argument that says switching can’t matter.

Yes, in that case you should switch, but that was not full argument. There is a corresponding problem at the maximum end – it’s just harder to convince nitpickers that there must be a maximum"""

And here, we have a good example of how he would have reacted if I had gone through the series of questions on 'is the integrand zero' with him. Note the combination of diversion, evasion, and lies that he employs. In particular, when he says "Yes, in that case you should switch, but that was not full argument", that is a complete lie. When I first brought up that second question, we were still discussing his solution in his very first post. The question of what to do at the maximum end did not get addressed until near the end of the thread, when he did actually say some worthwhile things, if only accidentally. The fact that he did NOT address the maximum end on his original post can easily be seen by scrolling up to it.

Also related to this problem is my claim that my solution completely subsumes his initial attempt - that his thoughts were a subset of my thoughts. So, whereas JeffJo completely misunderstood my solution and couldn't even correctly present his initial "minimum value" solution, my solution included his solution from the very beginning, in the original post. All I have to do is set f(x) = δ(x-1) in my integral to completely encapsulate it.
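As a sketch of that encapsulation, here is a symbolic check with SymPy, assuming f(x) = δ(x-1). The variable names are mine; the integral is the one from the original article.

```python
from sympy import DiracDelta, Rational, integrate, oo, symbols

x = symbols('x', real=True)

# f(x) = delta(x - 1): the smaller envelope always contains $1.
# Then g(x) = 0.5 * f(0.5x) = 0.5 * delta(0.5x - 1) = delta(x - 2),
# using the scaling identity delta(a*x - b) = delta(x - b/a) / |a|.
f = DiracDelta(x - 1)
g = DiracDelta(x - 2)

# E = 0.5 * integral of (x f(x) - 0.5 x g(x)), as in the original article
E = Rational(1, 2) * integrate(x * f - Rational(1, 2) * x * g, (x, -oo, oo))
print(E)  # 0: the gain at x = 1 and the loss at x = 2 cancel exactly
```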

4. And that, finally, leads us to the last of my three questions.

"""3. Imagine a software engineering interview, where your interviewer asks, "okay, what actually happens when you search for something on Google?" and you answer, "you get the search results, obviously. What, isn't that a completely correct answer?". What is wrong with this answer? What would a completely correct answer look like? (you don't actually have to provide the full answer, which would obviously be too difficult. Just describe what it would cover) And what would a mediocre answer look like?"""

That "you get the search results, obviously" answer is right in all the wrong ways. It doesn't actually answer the question, and it shuts down further discussion since the answer is 'obvious' and the potential hire is arrogant about it. It displays no insight or understanding of the actual process we're interested in. It tries to only get the 'right answer' in a shallow, technical sense. It then pretends that that's enough.

The full answer would address things like the directing of web traffic when you make a search request from Google, and the details of Google's search algorithm.

5. With that, I believe that we've now cleaned up well enough after JeffJo's mess. All the major questions that have been raised have been answered, and any other lingering questions could be tackled by the reader with the information provided. Still, if anything is not yet clear, please feel free to comment and ask.

I also mentioned that I would have to clarify my comment policy - I don't think I'll be writing down explicit rules yet, but this thread should serve as a good example for things that I do or do not find acceptable in my comments section.

I also said that I'd write a follow-up post on some of the pitfalls surrounding the two envelopes problem. I'll put that on the "ideas for future posts" list in my head, which I hope to get to some day, though I may not actually manage it. But I will at least change something in my original article to prevent some of the pitfalls that JeffJo fell into. I've added "Then f(x) can be completely general, but g(x) = 0.5 f(0.5x)... "

At the end of all this, I would still like to thank JeffJo, because this discussion did still lead to catching some typo-like mistakes in the original article. I sincerely hope that he comes to a better understanding of this problem and a better approach to problem-solving in general. And I hope that you, my readers, have come to a clear understanding of all the questions that were brought up in the thread above. Please, feel free to add any comments and questions you may still have.

26. It can be taken as an axiom that both envelopes are equally preferable, and the probability of any given pair of amounts is then defined by that requirement. If one envelope has \$20, then the other envelope has \$10 with probability 2/3, or \$40 with probability 1/3.

So the solution cannot be described in terms of actual amounts of money, because these probabilities cannot be summed into a proper, finite distribution.
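A quick check of the arithmetic behind this claim (a sketch I am adding; the 2/3 and 1/3 values are from the comment above): under that probability assignment, the expected gain from switching is zero for every amount in hand.

```python
# If the current envelope holds x, and the other holds x/2 with
# probability 2/3 or 2x with probability 1/3, the expected gain
# from switching vanishes for every x.

def expected_gain(x):
    """Expected gain from switching under the 2/3 vs 1/3 assignment."""
    return (2/3) * (x/2 - x) + (1/3) * (2*x - x)

for amount in (20.0, 1.0, 137.5):
    print(amount, expected_gain(amount))  # gain is 0.0 in every case
```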