Convergence in Probability
Even in computer science, we often accept the convergence of probabilities and distributions based on experimental results.
However, to gain a deeper understanding of statistics, we introduce formal definitions to clarify what convergence means in
statistical terms.
Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable, all defined on a common sample space.
We say that \(X_n\) converges in probability to \(X\), denoted by
\[
X_n \xrightarrow{P} X,
\]
if \(\forall \, \epsilon > 0\),
\[
\lim_{n \to \infty} P [| X_n - X | \geq \epsilon ] = 0
\]
or equivalently,
\[
\lim_{n \to \infty} P [| X_n - X | < \epsilon ] = 1.
\]
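Before the theory, here is a minimal simulation sketch of this definition (my own illustration; the distribution, sample sizes, and tolerance are chosen arbitrarily). By the weak law of large numbers, the sample mean \(\bar{X}_n\) of iid Uniform(0, 1) draws converges in probability to \(\mu = 1/2\), so the estimated probability \(P[|\bar{X}_n - \mu| \geq \epsilon]\) should shrink toward \(0\) as \(n\) grows.
```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.02      # the epsilon in the definition
reps = 1_000    # Monte Carlo replications per sample size

for n in [10, 100, 1_000, 10_000]:
    # reps independent sample means, each computed from n Uniform(0, 1) draws
    means = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    # empirical estimate of P[|X_bar_n - 1/2| >= eps]
    prob = np.mean(np.abs(means - 0.5) >= eps)
    print(f"n = {n:6d}   P[|mean - 0.5| >= {eps}] ~ {prob:.3f}")
```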
Theorem 1:
Suppose \(X_n \xrightarrow{P} a\) where \(a\) is a constant, and the real function \(f\) is continuous at \(a\). Then
\[
f(X_n) \xrightarrow{P} f(a).
\]
Proof:
Let \(\epsilon > 0\). Since \(f\) is continuous at \(a\), \(\, \exists \delta > 0 \) such that
\[
|x - a | < \delta \Longrightarrow |f(x) - f(a)| < \epsilon.
\]
Taking the contrapositive,
\[
|f(x) - f(a)| \geq \epsilon \Longrightarrow |x - a | \geq \delta.
\]
Substituting \(X_n\) for \(x\), we obtain
\[
P[|f(X_n) - f(a)| \geq \epsilon] \leq P [| X_n - a| \geq \delta ].
\]
Since \(X_n \xrightarrow{P} a\), the right-hand side converges to \(0\) as \(n \to \infty\); hence the left-hand side does too, and \(f(X_n) \xrightarrow{P} f(a)\).
In general, if \(X_n \xrightarrow{P} X,\) and \(f\) is a continuous function, then
\[
f(X_n) \xrightarrow{P} f(X).
\]
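For example, since the sample mean of a random sample with mean \(\mu\) satisfies \(\bar{X}_n \xrightarrow{P} \mu\) (the weak law of large numbers), taking \(f(x) = x^2\) gives \(\bar{X}_n^2 \xrightarrow{P} \mu^2\), and if \(\mu \neq 0\), taking \(f(x) = 1/x\) (continuous at \(\mu\)) gives \(1/\bar{X}_n \xrightarrow{P} 1/\mu\).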
Convergence in Distribution
Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable. Let \(F_{X_n}\) and
\(F_X\) be the cdfs of \(X_n\) and \(X\) respectively.
Let \(C(F_X)\) denote the set of all points where \(F_X\) is continuous.
We say that \(X_n\) converges in distribution to \(X\), denoted by
\[
X_n \xrightarrow{D} X,
\]
if \(\forall \, x \in C(F_{X})\),
\[
\lim_{n \to \infty} F_{X_n} (x) = F_X (x).
\]
Often the distribution of \(X\) is called the asymptotic (limiting) distribution of the sequence of random
variables \(\{X_n\}\).
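For a concrete example (a standard one, added here for illustration): let \(X_1, \ldots, X_n\) be iid Uniform(0, 1) and \(Y_n = n(1 - \max_i X_i)\). For \(0 < x < n\),
\[
F_{Y_n}(x) = P[Y_n \leq x] = 1 - P\Big[\max_i X_i < 1 - \frac{x}{n}\Big] = 1 - \Big(1 - \frac{x}{n}\Big)^n \longrightarrow 1 - e^{-x},
\]
so \(Y_n \xrightarrow{D} Y\) where \(Y \sim \text{Exponential}(1)\); the Exponential(1) distribution is the asymptotic distribution of \(\{Y_n\}\).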
Convergence in distribution does not mean that \(X_n\) gets close to \(X\) in probability. For example, if \(X \sim N(0, 1)\) and \(X_n = -X\) for every \(n\), then \(X_n \xrightarrow{D} X\) because \(-X \sim N(0, 1)\), yet \(P[|X_n - X| \geq \epsilon] = P[2|X| \geq \epsilon]\) does not tend to \(0\). However, the following theorem gives us a connection
between the two concepts.
Theorem 2:
If \(X_n\) converges to \(X\) in probability, then \(X_n\) converges to \(X\) in distribution.
Proof:
Suppose \(X_n \xrightarrow{P} X\), and let \(x\) be a point of continuity of \(F_X\).
\(\forall \, \epsilon > 0\),
\[
\begin{align*}
F_{X_n}(x) &= P[X_n \leq x] \\\\
&= P[\{X_n \leq x\} \cap \{|X_n - X| < \epsilon\}] + P [\{X_n \leq x\} \cap \{|X_n - X| \geq \epsilon\}] \\\\
&\leq P[X \leq x + \epsilon] + P[|X_n - X| \geq \epsilon]
\end{align*}
\]
The last inequality holds because \(\{X_n \leq x\} \cap \{|X_n - X| < \epsilon\} \subseteq \{X \leq x + \epsilon\}\). Since \(X_n \xrightarrow{P} X\), we know \(P[|X_n - X| \geq \epsilon] \to 0\), which gives an upper bound:
\[
\limsup_{n \to \infty} F_{X_n}(x) \leq F_{X}(x + \epsilon). \tag{1}
\]
Similarly, since \(\{X_n > x\} \cap \{|X_n - X| < \epsilon\} \subseteq \{X > x - \epsilon\}\), we get a lower bound:
\[
P[X_n > x ] \leq P[X > x - \epsilon] + P[|X_n - X| \geq \epsilon]
\]
\[
\Longrightarrow \liminf_{n \to \infty} F_{X_n}(x) \geq F_X (x - \epsilon). \tag{2}
\]
Combining (1) and (2), we obtain
\[
F_X(x - \epsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X (x + \epsilon).
\]
Since \(x\) is a point of continuity of \(F_X\), letting \(\epsilon \to 0\) sends both outer bounds to \(F_X(x)\), so the limit exists and
\[
\lim_{n \to \infty} F_{X_n}(x) = F_X (x).
\]
Therefore
\[
X_n \xrightarrow{D} X.
\]
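Note: the converse of Theorem 2 is false in general, as the \(X_n = -X\) example above shows. It does hold, however, when the limit is a constant: if \(X_n \xrightarrow{D} b\) for a constant \(b\) (a degenerate limiting distribution), then \(X_n \xrightarrow{P} b\).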
Now, we can see that the central limit theorem (CLT) is a statement about convergence in distribution.
(See Normal Distribution)
Moment Generating Function (mgf)
Let \(X\) be a random variable such that for some \(h > 0\), the expectation of \(e^{tX}\) exists for \(t \in (-h, h)\).
The moment generating function (mgf) of \(X\) is given by
\[
M(t) = \mathbb{E } (e^{tX}), \qquad t \in (-h, h).
\]
Note: the expectation must exist in an open interval about \(0\); existence of \(\mathbb{E}(e^{tX})\) at \(t = 0\) alone (where it always equals \(1\)) is not enough.
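For example (a standard illustration): if \(X \sim \text{Exponential}(\lambda)\), then
\[
M(t) = \mathbb{E}(e^{tX}) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x} \, dx = \frac{\lambda}{\lambda - t}, \qquad t < \lambda,
\]
so the mgf exists on the open interval \((-\lambda, \lambda)\), and \(M'(0) = 1/\lambda = \mathbb{E}(X)\). By contrast, a Cauchy random variable has \(\mathbb{E}(e^{tX}) = \infty\) for every \(t \neq 0\), so it has no mgf.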
We now revisit the central limit theorem and prove it under one additional assumption: the existence of the mgf.
Theorem 3: Central Limit Theorem (CLT)
Let \(X_1, X_2, \cdots, X_n\) be a random sample from a distribution that has mean \(\mu\) and variance \(\sigma^2 >0\).
Then the random variable
\[
Y_n = \frac{\sum X_i - n\mu}{\sigma \sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}
\]
converges in distribution to a random variable that has a standard normal distribution:
\[
Y_n \xrightarrow{D} N(0, 1).
\]
Proof:
Assume that the mgf \(M(t) = \mathbb{E } (e^{tX})\) exists for \(t \in (-h, h)\).
The mgf for \(X-\mu\) is:
\[
m(t) = \mathbb{E }[e^{t(X-\mu)}] = e^{-\mu t} M(t), \qquad t \in (-h, h)
\]
and satisfies that
\(m(0) = 1\), \(\quad m'(0) = \mathbb{E}(X -\mu) = 0\), and \(m''(0) = \mathbb{E}[(X-\mu)^2] = \sigma^2\).
By Taylor's theorem, there exists \(\xi\) between \(0\) and \(t\) such that
\[
\begin{align*}
m(t) &= m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} \\\\
&= m(0) + m'(0)t + \frac{\sigma^2 t^2}{2} - \frac{\sigma^2 t^2}{2} + \frac{m''(\xi)t^2}{2} \\\\
&= 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2} \tag{3}
\end{align*}
\]
Now consider the mgf of the standardized sum \(Y_n\). Since the \(X_i\) are independent and identically distributed,
\[
\begin{align*}
M(t ; n) &= \mathbb{E }\Big[\exp \Big(t \frac{\sum X_i - n \mu}{\sigma \sqrt{n}}\Big)\Big] \\\\
&= \Big\{\mathbb{E } \Big[\exp \Big(t \frac{X - \mu}{\sigma \sqrt{n}}\Big)\Big]\Big\}^n \\\\
&= \Big[m \Big(\frac{t}{\sigma \sqrt{n}}\Big)\Big]^n
\end{align*}
\]
where \(\frac{t}{\sigma \sqrt{n}} \in (-h, h) \).
Replacing \(t\) by \(\frac{t}{\sigma \sqrt{n}}\) in equation (3):
\[
m \Big(\frac{t}{\sigma \sqrt{n}}\Big) = 1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2}
\]
where \(\xi\) lies between \(0\) and \(\frac{t}{\sigma \sqrt{n}}\), and \(t \in (-h\sigma \sqrt{n}, h \sigma \sqrt{n})\).
Thus,
\[
M(t ; n) = \Big\{1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2} \Big\}^n .
\]
Since \(m''(t)\) is continuous at \(t = 0\) and \(\xi \to 0\) as \(n \to \infty\),
\[
\lim_{n \to \infty}[m''(\xi) - \sigma^2] = 0.
\]
To pass to the limit, we use the following extension of the familiar fact
\[
\lim_{n \to \infty} \Big(1 + \frac{x}{n} \Big)^n = e^x :
\]
if \(c_n \to c\), then \(\big(1 + \frac{c_n}{n}\big)^n \to e^c\). Here
\[
M(t ; n) = \Big(1 + \frac{c_n}{n}\Big)^n \quad \text{with} \quad c_n = \frac{t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2\sigma^2} \longrightarrow \frac{t^2}{2}.
\]
Thus, for all \(t \in \mathbb{R}\),
\[
\lim_{n \to \infty} M(t ; n) = e^{\frac{t^2}{2}}.
\]
This is the mgf of the standard normal distribution \(N(0, 1)\).
Therefore, the random variable \(Y_n = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}\) has an asymptotic
standard normal distribution:
\[
Y_n = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \xrightarrow{D} N(0, 1).
\]
\]
Note: without standardizing by \(\sigma\), we can say that
\[
\sqrt{n}(\bar{X}_n-\mu) \xrightarrow{D} N(0, \sigma^2).
\]
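The theorem is easy to check by simulation. The sketch below (my own illustration, not part of the original notes) standardizes sums of iid Exponential(1) draws, for which \(\mu = \sigma = 1\), and compares the empirical cdf of \(Y_n\) with the standard normal cdf \(\Phi\) at a few points.
```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
reps = 50_000            # Monte Carlo replications
mu, sigma = 1.0, 1.0     # mean and sd of the Exponential(1) distribution

for n in [2, 10, 50, 200]:
    sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
    y = (sums - n * mu) / (sigma * np.sqrt(n))   # the standardized Y_n
    for x in (-1.0, 0.0, 1.0):
        emp = np.mean(y <= x)                    # empirical F_{Y_n}(x)
        print(f"n={n:4d}  x={x:+.1f}  F_Yn(x) ~ {emp:.3f}   Phi(x) = {norm.cdf(x):.3f}")
```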
Note: We can get the mgf of \(N(0, 1)\) as follows:
\[
\begin{align*}
M(t) &= \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \\\\
&= \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 -2tx)} \frac{1}{\sqrt{2\pi}} \, dx \\\\
&= \int_{-\infty}^{\infty} e^{-\frac{1}{2}\{(x-t)^2 -t^2\}} \frac{1}{\sqrt{2\pi}} \, dx \\\\
&= e^{\frac{t^2}{2}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-t)^2}{2}} \, dx \\\\
&= e^{\frac{t^2}{2}} \cdot 1
\end{align*}
\]
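The last integral equals \(1\) because the integrand is the \(N(t, 1)\) density. The same completing-the-square computation applied to a general \(N(\mu, \sigma^2)\) density gives \(M(t) = e^{\mu t + \sigma^2 t^2/2}\), which reduces to \(e^{t^2/2}\) when \(\mu = 0\) and \(\sigma^2 = 1\).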