Two Envelopes, Part 2
The Concept of “Conditional Expectation”
We return to the Two-Envelope Problem, this time using the concept of random variables. Suppose $X$ and $Y$ are two discrete random variables with joint density $f_{XY}$. Then $$P[Y=y\mid X=x]=\frac{P[Y=y\ \text{and}\ X=x]}{P[X=x]}=\frac{f_{XY}(x,y)}{f_X(x)},$$ where $f_X$ is the marginal density of $X$, assumed to be non-zero. For any $x$ such that $f_X(x)\neq 0$ we may define the conditional random variable $Y\mid x$ with the density $$f_{Y\mid x}(y):=\frac{f_{XY}(x,y)}{f_X(x)}.$$ The expected value of $Y\mid x$ is then given, as usual, by $$\text{E}[Y\mid x]=\sum_y y\cdot f_{Y\mid x}(y)$$ if the sum on the right has finitely many terms or, in the case of an infinite series, if the series converges absolutely. The quantity $\text{E}[Y\mid x]$ is also known as the conditional expectation of $Y$ given $X=x$.
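To make the definitions concrete, here is a minimal numerical sketch (the two-dice example and all function names are my own, not from the text): we tabulate a joint density $f_{XY}$ for $X$ = first throw and $Y$ = sum of two fair dice, then compute $f_{Y\mid x}$ and $\text{E}[Y\mid x]$ exactly with rational arithmetic.

```python
from fractions import Fraction

# Hypothetical example: throw two fair dice, let X be the first throw
# and Y the sum of both throws. f_XY(x, y) = 1/36 whenever y - x is a
# possible value of the second throw.
f_XY = {(x, x + d): Fraction(1, 36) for x in range(1, 7) for d in range(1, 7)}

def marginal_X(x):
    """f_X(x) = sum over y of f_XY(x, y)."""
    return sum(p for (x_, _), p in f_XY.items() if x_ == x)

def f_Y_given_x(y, x):
    """f_{Y|x}(y) = f_XY(x, y) / f_X(x), defined whenever f_X(x) != 0."""
    return f_XY.get((x, y), Fraction(0)) / marginal_X(x)

def E_Y_given_x(x):
    """E[Y | x] = sum over y of y * f_{Y|x}(y)."""
    ys = {y for (_, y) in f_XY}
    return sum(y * f_Y_given_x(y, x) for y in ys)

print(E_Y_given_x(4))   # 15/2: the first throw is 4, the second averages 7/2
```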
For any $x$, $\text{E}[Y\mid x]$ has a fixed value. But each value of $x$ occurs with probability $f_X(x)$. Thus we may view $\text{E}[Y\mid x]$ itself as a random variable (a function of the outcome of $X$) and calculate its expectation, i.e., $$\text{E}[\text{E}[Y\mid x]]=\sum_x \text{E}[Y\mid x]\cdot f_X(x).$$
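Continuing the two-dice sketch above (it reuses `f_XY`, `marginal_X` and `E_Y_given_x` defined there), the following lines evaluate $\text{E}[\text{E}[Y\mid x]]$ by this formula and compare it with $\text{E}[Y]$ computed directly; this also previews the Law of Total Expectation from the exercises below.

```python
# E[Y | x] takes the value E_Y_given_x(x) with probability f_X(x), so its
# expectation is the weighted sum from the formula above ...
iterated = sum(E_Y_given_x(x) * marginal_X(x) for x in range(1, 7))
# ... and it agrees with E[Y] computed directly from the joint density:
direct = sum(y * p for (_, y), p in f_XY.items())
print(iterated, direct)   # both are 7
```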
Some Exercises
Try to answer the following questions:
- Verify that $f_{Y\mid x}$, as defined above, is indeed the density of a random variable.
- Use the formula for the conditional expectation to solve the following problem: “You throw a die until you get a 6. What is the expected number of throws (including the throw giving the 6), conditioned on the event that all throws gave even numbers?” (This problem was published on Gil Kalai’s blog. Feel free to read through the discussions and find the answer in the blog comments, but also do the formal calculation using $\text{E}[Y\mid x]$ for suitably defined random variables $Y$ and $X$; a Monte Carlo sanity check is sketched after this list.)
- Prove the Law of Total Expectation: Suppose that $\text{E}[Y]$ exists. Then $$\text{E}[Y]=\text{E}[\text{E}[Y\mid x]].$$You will need to interchange the summations with respect to $x$ and $y$. For finite sums this is no problem, but for infinite sums, apply the Discrete Dominated Convergence Theorem (see below). Make sure you describe exactly how the theorem is applied to allow you to interchange summations.
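For the die-throwing exercise above, a short Monte Carlo run is a handy sanity check of the formal calculation. This sketch (the helper names and parameters are my own invention, not part of the exercise) rejects a sequence as soon as an odd number appears:

```python
import random

def throws_until_six(rng):
    """Roll a fair die until a 6 appears; report the number of throws
    and whether every throw (including the final 6) was even."""
    n = 0
    while True:
        n += 1
        roll = rng.randint(1, 6)
        if roll == 6:
            return n, True
        if roll % 2 == 1:          # an odd throw violates the condition
            return n, False
        # roll was 2 or 4: keep throwing

def conditional_mean(samples=10**6, seed=0):
    rng = random.Random(seed)
    total = count = 0
    for _ in range(samples):
        n, all_even = throws_until_six(rng)
        if all_even:               # condition on "all throws gave even numbers"
            total += n
            count += 1
    return total / count

print(conditional_mean())          # compare with your formal answer
```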
With this background on conditional expectation, watch the full video to see how the “rigorous” two-envelope problem is resolved. The first 10 minutes were already featured in our initial discussion of the problem, so feel free to skip them.
This video was published by the user “Formant” on YouTube.
Discrete Dominated Convergence Theorem
Let $(\varphi_n)$ be a sequence of functions $\varphi_n\colon\mathbb{N}\to\mathbb{R}$ such that $\lvert\varphi_n\rvert\le g$ for some function $g\colon\mathbb{N}\to\mathbb{R}$. Suppose that the pointwise limit $$\lim_{n\to\infty}\varphi_n(k)=:\varphi(k)$$ exists for any $k\in\mathbb{N}$ and that $\sum\limits_{k=0}^\infty g(k)<\infty$. Then
$$\lim_{n\to\infty}\sum_{k=0}^\infty \varphi_n(k)=\sum_{k=0}^\infty \varphi(k).$$
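To see the theorem in action numerically, here is a small sketch with an example of my own choosing: $\varphi_n(k)=2^{-k}\cdot\frac{n}{n+k}$ converges pointwise to $\varphi(k)=2^{-k}$ and is dominated by $g(k)=2^{-k}$ with $\sum_{k=0}^\infty g(k)=2<\infty$, so the theorem predicts $\sum_{k=0}^\infty\varphi_n(k)\to 2$:

```python
# phi_n(k) = 2**(-k) * n/(n+k) is dominated by g(k) = 2**(-k), sum g(k) = 2.
def truncated_sum(phi, terms=200):
    """Approximate the series; 2**(-k) decays so fast that 200 terms suffice."""
    return sum(phi(k) for k in range(terms))

for n in (1, 10, 100, 1000, 10000):
    print(n, truncated_sum(lambda k, n=n: 2 ** (-k) * n / (n + k)))
# The printed sums increase toward truncated_sum(lambda k: 2 ** (-k)) == 2.0,
# exactly as the theorem predicts.
```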
Proof.
Let $(\varphi_n)$, $g$ and $\varphi$ be given as in the theorem. We want to prove that for every $\varepsilon>0$ there exists an $N\in\mathbb{N}$ such that
$$\sum_{k=0}^\infty \varphi_n(k)>\sum_{k=0}^\infty \varphi(k)-\varepsilon\tag{1}$$for all $n>N$.
If we can show this inequality for every sequence satisfying the hypotheses of the theorem, then applying it to $(-\varphi_n)$, which is dominated by the same $g$ and converges pointwise to $-\varphi$, we obtain
$$\sum_{k=0}^\infty \varphi_n(k)<\sum_{k=0}^\infty \varphi(k)+\varepsilon\tag{2}$$for all $n>N$.
Putting (1) and (2) together, we will have shown that for every $\varepsilon>0$ there exists an $N\in\mathbb{N}$ such that for all $n>N$
$$\biggl| \sum_{k=0}^\infty \varphi_n(k)-\sum_{k=0}^\infty \varphi(k)\biggr|<\varepsilon.$$
This is just the statement of the theorem and we are done.
To prove (1), we want to be able to assume that $\varphi_n(k)\ge 0$. This is where the dominating function $g$ comes in: $\varphi_n(k)\to\varphi(k)$ if and only if $\varphi_n(k)+g(k)\to\varphi(k)+g(k)$. Since $\lvert\varphi_n(k)\rvert\le g(k)$, we have $\varphi_n(k)+g(k)\ge 0$ and, in the limit, $\varphi(k)+g(k)\ge 0$. Because $\sum_{k=0}^\infty g(k)$ is finite, we may add it to both sides of (1) and obtain the equivalent inequality
$$\sum_{k=0}^\infty\underbrace{(\varphi_n(k)+g(k))}_{\ge 0}>\sum_{k=0}^\infty\underbrace{(\varphi(k)+g(k))}_{\ge 0}-\varepsilon$$for all $n>N$.
So, due to the existence of the dominating function, we can henceforth assume that $\varphi_n(k)\ge 0$ and $\varphi(k)\ge 0$ and return to proving (1).
Given $\varepsilon>0$, we can choose $N_1\in\mathbb{N}$ such that
$$\sum_{k=0}^{N_1} \varphi(k)>\sum_{k=0}^\infty \varphi(k)-\frac{\varepsilon}{2}$$
(this uses that $\varphi(k)\ge 0$, so the partial sums increase monotonically to $\sum_{k=0}^\infty \varphi(k)$, which is finite since $\varphi\le g$). Next, choose $N_2\in\mathbb{N}$ such that for all $n>N_2$ $$\varphi_n(k)>\varphi(k)-\frac{\varepsilon}{2(N_1+1)}$$for all $k=0,\ldots,N_1$.
(It is not possible in general to find one $N_2$ that works for infinitely many $k$ at once, but pointwise convergence provides such an $N_2$ for each individual $k$, and we simply take the largest of the finitely many values, which then works for all $k=0,\ldots,N_1$.) Then we put everything together:
$$\sum_{k=0}^\infty \varphi_n(k)\ \ge\ \sum_{k=0}^{N_1} \varphi_n(k)\ >\ \sum_{k=0}^{N_1} \varphi(k)-\frac{\varepsilon}{2}\ >\ \sum_{k=0}^\infty \varphi(k)-\varepsilon$$for all $n>N_2$. Here the first inequality uses $\varphi_n(k)\ge 0$, the second sums the estimate from the choice of $N_2$ over the $N_1+1$ indices $k=0,\ldots,N_1$, and the third is the choice of $N_1$. Taking $N=N_2$ establishes (1).
This completes the proof.