Convergence in Probability
Even in computer science, we often accept the convergence of probabilities and distributions based on experimental results.
However, to gain a deeper understanding of statistics, we introduce formal definitions to clarify what convergence means in
statistical terms.
Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable, all defined on a common sample space.
We say that \(X_n\) converges in probability to \(X\), denoted by
\[
X_n \xrightarrow{P} X,
\]
if \(\forall \, \epsilon > 0\),
\[
\lim_{n \to \infty} P [| X_n - X | \geq \epsilon ] = 0
\]
or equivalently,
\[
\lim_{n \to \infty} P [| X_n - X | < \epsilon ] = 1.
\]
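Before the theory, here is a minimal simulation sketch of this definition (my own illustration; the distribution, sample sizes, and tolerance are chosen arbitrarily). By the weak law of large numbers, the sample mean \(\bar{X}_n\) of iid Uniform(0, 1) draws converges in probability to \(\mu = 1/2\), so the estimated probability \(P[|\bar{X}_n - \mu| \geq \epsilon]\) should shrink toward \(0\) as \(n\) grows.
```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.02      # the epsilon in the definition
reps = 1_000    # Monte Carlo replications per sample size

for n in [10, 100, 1_000, 10_000]:
    # reps independent sample means, each computed from n Uniform(0, 1) draws
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    # empirical estimate of P[|X_bar_n - 1/2| >= eps]
    prob = np.mean(np.abs(means - 0.5) >= eps)
    print(f"n = {n:6d}   P[|mean - 0.5| >= {eps}] ~ {prob:.3f}")
```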
Theorem 1:
Suppose \(X_n \xrightarrow{P} a\) where \(a\) is a constant, and the real function \(f\) is continuous at \(a\). Then
\[
f(X_n) \xrightarrow{P} f(a).
\]
Proof:
Let \(\epsilon > 0\). Since \(f\) is continuous at \(a\), \(\, \exists \delta > 0 \) such that
\[
|x - a | < \delta \Longrightarrow |f(x) - f(a)| < \epsilon.
\]
Taking the contrapositive,
\[
|f(x) - f(a)| \geq \epsilon \Longrightarrow |x - a | \geq \delta.
\]
Substituting \(X_n\) for \(x\), we obtain
\[
P[|f(X_n) - f(a)| \geq \epsilon] \leq P [| X_n - a| \geq \delta ].
\]
Since \(X_n \xrightarrow{P} a\), the right-hand side converges to \(0\) as \(n \to \infty\); hence the left-hand side does too, and \(f(X_n) \xrightarrow{P} f(a)\).
In general, if \(X_n \xrightarrow{P} X,\) and \(f\) is a continuous function, then
\[
f(X_n) \xrightarrow{P} f(X).
\]
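For example, since the sample mean of a random sample with mean \(\mu\) satisfies \(\bar{X}_n \xrightarrow{P} \mu\) (the weak law of large numbers), taking \(f(x) = x^2\) gives \(\bar{X}_n^2 \xrightarrow{P} \mu^2\), and if \(\mu \neq 0\), taking \(f(x) = 1/x\) (continuous at \(\mu\)) gives \(1/\bar{X}_n \xrightarrow{P} 1/\mu\).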
Convergence in Distribution
Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable. Let \(F_{X_n}\) and
\(F_X\) be the cdfs of \(X_n\) and \(X\) respectively.
Let \(C(F_X)\) denote the set of all points where \(F_X\) is continuous.
We say that \(X_n\) converges in distribution to \(X\), denoted by
\[
X_n \xrightarrow{D} X,
\]
if \(\forall \, x \in C(F_{X})\),
\[
\lim_{n \to \infty} F_{X_n} (x) = F_X (x).
\]
Often the distribution of \(X\) is called the asymptotic (limiting) distribution of the sequence of random
variables \(\{X_n\}\).
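For a concrete example (a standard one, added here for illustration): let \(X_1, \ldots, X_n\) be iid Uniform(0, 1) and \(Y_n = n(1 - \max_i X_i)\). For \(0 < x < n\),
\[
F_{Y_n}(x) = P[Y_n \leq x] = 1 - P\Big[\max_i X_i < 1 - \frac{x}{n}\Big] = 1 - \Big(1 - \frac{x}{n}\Big)^n \longrightarrow 1 - e^{-x},
\]
so \(Y_n \xrightarrow{D} Y\) where \(Y \sim \text{Exponential}(1)\); the Exponential(1) distribution is the asymptotic distribution of \(\{Y_n\}\).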
Convergence in distribution does not mean that \(X_n\) gets close to \(X\) in probability. For example, if \(X \sim N(0, 1)\) and \(X_n = -X\) for every \(n\), then \(X_n \xrightarrow{D} X\) because \(-X \sim N(0, 1)\), yet \(P[|X_n - X| \geq \epsilon] = P[2|X| \geq \epsilon]\) does not tend to \(0\). However, the following theorem gives us a connection
between the two concepts.
Theorem 2:
If \(X_n\) converges to \(X\) in probability, then \(X_n\) converges to \(X\) in distribution.
Proof:
Suppose \(X_n \xrightarrow{P} X\), and let \(x\) be a point of continuity of \(F_X\).
\(\forall \, \epsilon > 0\),
\[
\begin{align*}
F_{X_n}(x) &= P[X_n \leq x] \\\\
&= P[\{X_n \leq x\} \cap \{|X_n - X| < \epsilon\}] + P [\{X_n \leq x\} \cap \{|X_n - X| \geq \epsilon\}] \\\\
&\leq P[X \leq x + \epsilon] + P[|X_n - X| \geq \epsilon]
\end{align*}
\]
The last inequality holds because \(\{X_n \leq x\} \cap \{|X_n - X| < \epsilon\} \subseteq \{X \leq x + \epsilon\}\). Since \(X_n \xrightarrow{P} X\), we know \(P[|X_n - X| \geq \epsilon] \to 0\), which gives an upper bound:
\[
\limsup_{n \to \infty} F_{X_n}(x) \leq F_{X}(x + \epsilon). \tag{1}
\]
Similarly, since \(\{X_n > x\} \cap \{|X_n - X| < \epsilon\} \subseteq \{X > x - \epsilon\}\), we get a lower bound:
\[
P[X_n > x ] \leq P[X > x - \epsilon] + P[|X_n - X| \geq \epsilon]
\]
\[
\Longrightarrow \liminf_{n \to \infty} F_{X_n}(x) \geq F_X (x - \epsilon). \tag{2}
\]
Combining (1) and (2), we obtain
\[
F_X(x - \epsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X (x + \epsilon).
\]
Since \(x\) is a point of continuity of \(F_X\), letting \(\epsilon \to 0\) sends both outer bounds to \(F_X(x)\), so the limit exists and
\[
\lim_{n \to \infty} F_{X_n}(x) = F_X (x).
\]
Therefore
\[
X_n \xrightarrow{D} X.
\]
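Note: the converse of Theorem 2 is false in general, as the \(X_n = -X\) example above shows. It does hold, however, when the limit is a constant: if \(X_n \xrightarrow{D} b\) for a constant \(b\) (a degenerate limiting distribution), then \(X_n \xrightarrow{P} b\).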
Now, we can see that the central limit theorem (CLT) is a statement about convergence in distribution.
(See Normal Distribution)
Moment Generating Function (mgf)
Let \(X\) be a random variable such that for some \(h > 0\), the expectation of \(e^{tX}\) exists for \(t \in (-h, h)\).
The moment generating function (mgf) of \(X\) is given by
\[
M(t) = \mathbb{E } (e^{tX}), \qquad t \in (-h, h).
\]
Note: the expectation must exist in an open interval about \(0\); existence of \(\mathbb{E}(e^{tX})\) at \(t = 0\) alone (where it always equals \(1\)) is not enough.
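For example (a standard illustration): if \(X \sim \text{Exponential}(\lambda)\), then
\[
M(t) = \mathbb{E}(e^{tX}) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x} \, dx = \frac{\lambda}{\lambda - t}, \qquad t < \lambda,
\]
so the mgf exists on the open interval \((-\lambda, \lambda)\), and \(M'(0) = 1/\lambda = \mathbb{E}(X)\). By contrast, a Cauchy random variable has \(\mathbb{E}(e^{tX}) = \infty\) for every \(t \neq 0\), so it has no mgf.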
We now revisit the central limit theorem and prove it under one additional assumption: the existence of the mgf.
Theorem 3: Central Limit Theorem (CLT)
Let \(X_1, X_2, \cdots, X_n\) be a random sample from a distribution that has mean \(\mu\) and variance \(\sigma^2 >0\).
Then the random variable
\[
Y_n = \frac{\sum X_i - n\mu}{\sigma \sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}
\]
converges in distribution to a random variable that has a standard normal distribution:
\[
Y_n \xrightarrow{D} N(0, 1).
\]
Proof:
Assume that the mgf \(M(t) = \mathbb{E } (e^{tX})\) exists for \(t \in (-h, h)\).
The mgf for \(X-\mu\) is:
\[
m(t) = \mathbb{E }[e^{t(X-\mu)}] = e^{-\mu t} M(t), \qquad t \in (-h, h)
\]
and satisfies that
\(m(0) = 1\), \(\quad m'(0) = \mathbb{E}(X -\mu) = 0\), and \(m''(0) = \mathbb{E}[(X-\mu)^2] = \sigma^2\).
By Taylor's theorem, there exists \(\xi\) between \(0\) and \(t\) such that
\[
\begin{align*}
m(t) &= m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} \\\\
&= m(0) + m'(0)t + \frac{\sigma^2 t^2}{2} - \frac{\sigma^2 t^2}{2} + \frac{m''(\xi)t^2}{2} \\\\
&= 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2} \tag{3}
\end{align*}
\]
Now consider the mgf of the standardized sum \(Y_n\). Since the \(X_i\) are independent and identically distributed,
\[
\begin{align*}
M(t ; n) &= \mathbb{E }\Big[\exp \Big(t \frac{\sum X_i - n \mu}{\sigma \sqrt{n}}\Big)\Big] \\\\
&= \Big\{\mathbb{E } \Big[\exp \Big(t \frac{X - \mu}{\sigma \sqrt{n}}\Big)\Big]\Big\}^n \\\\
&= \Big[m \Big(\frac{t}{\sigma \sqrt{n}}\Big)\Big]^n
\end{align*}
\]
where \(\frac{t}{\sigma \sqrt{n}} \in (-h, h) \).
Replacing \(t\) by \(\frac{t}{\sigma \sqrt{n}}\) in equation (3):
\[
m \Big(\frac{t}{\sigma \sqrt{n}}\Big) = 1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2}
\]
where \(\xi\) lies between \(0\) and \(\frac{t}{\sigma \sqrt{n}}\), and \(t \in (-h\sigma \sqrt{n}, h \sigma \sqrt{n})\).
Thus,
\[
M(t ; n) = \Big\{1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2} \Big\}^n .
\]
Since \(m''(t)\) is continuous at \(t = 0\) and \(\xi \to 0\) as \(n \to \infty\),
\[
\lim_{n \to \infty}[m''(\xi) - \sigma^2] = 0.
\]
To pass to the limit, we use the following extension of the familiar fact
\[
\lim_{n \to \infty} \Big(1 + \frac{x}{n} \Big)^n = e^x :
\]
if \(c_n \to c\), then \(\big(1 + \frac{c_n}{n}\big)^n \to e^c\). Here
\[
M(t ; n) = \Big(1 + \frac{c_n}{n}\Big)^n \quad \text{with} \quad c_n = \frac{t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2\sigma^2} \longrightarrow \frac{t^2}{2}.
\]
Thus, for all \(t \in \mathbb{R}\),
\[
\lim_{n \to \infty} M(t ; n) = e^{\frac{t^2}{2}}.
\]
This is the mgf of the standard normal distribution \(N(0, 1)\).
Therefore, the random variable \(Y_n = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}\) has an asymptotic
standard normal distribution:
\[
Y_n = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \xrightarrow{D} N(0, 1).
\]
\]
Note: without standardizing by \(\sigma\), we can say that
\[
\sqrt{n}(\bar{X}_n-\mu) \xrightarrow{D} N(0, \sigma^2).
\]
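The theorem is easy to check by simulation. The sketch below (my own illustration, not part of the original notes) standardizes sums of iid Exponential(1) draws, for which \(\mu = \sigma = 1\), and compares the empirical cdf of \(Y_n\) with the standard normal cdf \(\Phi\) at a few points.
```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
reps = 50_000            # Monte Carlo replications
mu, sigma = 1.0, 1.0     # mean and sd of the Exponential(1) distribution

for n in [2, 10, 50, 200]:
    sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
    y = (sums - n * mu) / (sigma * np.sqrt(n))   # the standardized Y_n
    for x in (-1.0, 0.0, 1.0):
        emp = np.mean(y <= x)                    # empirical F_{Y_n}(x)
        print(f"n={n:4d}  x={x:+.1f}  F_Yn(x) ~ {emp:.3f}   Phi(x) = {norm.cdf(x):.3f}")
```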
Note: We can get the mgf of \(N(0, 1)\) as follows:
\[
\begin{align*}
M(t) &= \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \\\\
&= \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 -2tx)} \frac{1}{\sqrt{2\pi}} \, dx \\\\
&= \int_{-\infty}^{\infty} e^{-\frac{1}{2}\{(x-t)^2 -t^2\}} \frac{1}{\sqrt{2\pi}} \, dx \\\\
&= e^{\frac{t^2}{2}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-t)^2}{2}} \, dx \\\\
&= e^{\frac{t^2}{2}} \cdot 1
\end{align*}
\]
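The last integral equals \(1\) because the integrand is the \(N(t, 1)\) density. The same completing-the-square computation applied to a general \(N(\mu, \sigma^2)\) density gives \(M(t) = e^{\mu t + \sigma^2 t^2/2}\), which reduces to \(e^{t^2/2}\) when \(\mu = 0\) and \(\sigma^2 = 1\).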