Convergence

Convergence in Probability · Convergence in Distribution · Moment Generating Function (mgf)

Convergence in Probability

In computer science, we often accept the convergence of probabilities and distributions based on experimental evidence. To gain a deeper understanding of statistics, however, we introduce formal definitions that make precise what convergence means in statistical terms.

Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable, all defined on the same sample space. \(X_n\) converges in probability to \(X\), denoted by \[ X_n \xrightarrow{P} X, \] if \(\quad \forall \epsilon > 0\), \[ \lim_{n \to \infty} P [| X_n - X | \geq \epsilon ] = 0, \] or equivalently, \[ \lim_{n \to \infty} P [| X_n - X | < \epsilon ] = 1. \]
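As an informal illustration (not part of the formal development), the probability \(P[|X_n - X| \geq \epsilon]\) can be estimated by simulation. The sketch below assumes a specific example: \(X_n\) is the sample mean of \(n\) Bernoulli\((0.5)\) draws and \(X\) is the constant \(0.5\), so the weak law of large numbers gives \(X_n \xrightarrow{P} 0.5\).

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_deviation(n, eps=0.05, trials=10_000, p=0.5):
    """Estimate P[|X_n - p| >= eps], where X_n is the mean of n Bernoulli(p) draws."""
    x_n = rng.binomial(n, p, size=trials) / n   # `trials` independent realizations of X_n
    return np.mean(np.abs(x_n - p) >= eps)

for n in [10, 100, 1_000, 10_000]:
    print(n, prob_deviation(n))   # the estimates shrink toward 0 as n grows
```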

Theorem 1: Suppose \(X_n \xrightarrow{P} a\) where \(a\) is a constant, and the real function \(f\) is continuous at \(a\). Then \[ f(X_n) \xrightarrow{P} f(a). \]
Proof: Let \(\epsilon > 0\). Since \(f\) is continuous at \(a\), \(\, \exists \delta > 0 \) such that \[ |x - a | < \delta \Longrightarrow |f(x) - f(a)| < \epsilon. \] Taking the contrapositive, \[ |f(x) - f(a)| \geq \epsilon \Longrightarrow |x - a | \geq \delta. \] Substituting \(X_n\) for \(x\), we obtain \[ P[|f(X_n) - f(a)| \geq \epsilon] \leq P [| X_n - a| \geq \delta ]. \] Since \(X_n \xrightarrow{P} a\), the right-hand side tends to \(0\) as \(n \to \infty\), and hence so does the left-hand side. Therefore \(f(X_n) \xrightarrow{P} f(a)\).

In general, if \(X_n \xrightarrow{P} X,\) and \(f\) is a continuous function, then \[ f(X_n) \xrightarrow{P} f(X). \]
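A quick numerical sketch of this continuity property, under assumed choices \(f(x) = x^2\) and \(X_n\) the sample mean of \(n\) Uniform\((0,1)\) draws (so \(a = 0.5\) and \(f(a) = 0.25\)):

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_deviation_f(n, eps=0.02, trials=10_000):
    """Estimate P[|f(X_n) - f(a)| >= eps] for f(x) = x**2, with X_n the mean of n Uniform(0,1) draws and a = 0.5."""
    x_n = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
    return np.mean(np.abs(x_n**2 - 0.25) >= eps)

for n in [10, 100, 1_000]:
    print(n, prob_deviation_f(n))   # tends to 0, consistent with f(X_n) ->P f(a)
```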

Convergence in Distribution

Let \(\{X_n\}\) be a sequence of random variables and let \(X\) be a random variable. Let \(F_{X_n}\) and \(F_X\) be the cdfs of \(X_n\) and \(X\) respectively.
Let \(C(F_X)\) denote the set of all points where \(F_X\) is continuous.
\(X_n\) converges in distribution to \(X\), denoted by \[ X_n \xrightarrow{D} X, \] if \(\quad \forall x \in C(F_{X})\), \[ \lim_{n \to \infty} F_{X_n} (x) = F_X (x). \] The distribution of \(X\) is often called the asymptotic (limiting) distribution of the sequence \(\{X_n\}\).
In this case, \(X_n\) does NOT always get close to \(X\) in probability; a standard counterexample follows. However, the theorem after it gives us a connection between the two concepts.
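Counterexample (added here for illustration): let \(X \sim N(0,1)\) and set \(X_n = -X\) for every \(n\). By the symmetry of the standard normal distribution, \(F_{X_n} = F_X\) for all \(n\), so trivially \[ X_n \xrightarrow{D} X. \] However, for any \(0 < \epsilon\), \[ P[|X_n - X| \geq \epsilon] = P[2|X| \geq \epsilon] \] is a positive constant that does not depend on \(n\), so \(X_n\) does not converge to \(X\) in probability.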

Theorem 2: If \(X_n\) converges to \(X\) in probability, then \(X_n\) converges to \(X\) in distribution.
Proof: Suppose \(X_n \xrightarrow{P} X\), and let \(x\) be a point of continuity of \(F_{X}\).
\(\forall \, \epsilon > 0\), \[ \begin{align*} F_{X_n}(x) &= P[X_n \leq x] \\\\ &= P[\{X_n \leq x\} \cap \{|X_n - X| < \epsilon\}] + P [\{X_n \leq x\} \cap \{|X_n - X| \geq \epsilon\}] \\\\ &\leq P[X \leq x + \epsilon] + P[|X_n - X| \geq \epsilon] \end{align*} \] Since \(X_n \xrightarrow{P} X\), we know \(P[|X_n - X| \geq \epsilon] \to 0\). This yields an upper bound: \[ \limsup_{n \to \infty} F_{X_n}(x) \leq F_{X}(x + \epsilon) \tag{1}. \] Similarly, we can get a lower bound: \[ P[X_n > x ] \leq P[X > x - \epsilon] + P[|X_n - X| \geq \epsilon] \] \[ \Longrightarrow \liminf_{n \to \infty} F_{X_n}(x) \geq F_X (x - \epsilon) \tag{2}. \] Combining (1) and (2), we obtain \[ F_X(x - \epsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X (x + \epsilon). \] Letting \(\epsilon \to 0\) and using the fact that \(x\) is a point of continuity of \(F_{X}\), we have \[ \lim_{n \to \infty} F_{X_n}(x) = F_X (x). \]
Therefore \[ X_n \xrightarrow{D} X. \]
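As an informal check of this theorem (a simulation sketch under assumed distributions, not part of the proof): take \(X \sim N(0,1)\) and \(X_n = X + Z_n / \sqrt{n}\) with \(Z_n \sim N(0,1)\) independent of \(X\), so that \(X_n \xrightarrow{P} X\). The empirical cdf of \(X_n\) should then approach \(F_X = \Phi\) at every point.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
ts = np.array([-1.0, 0.0, 1.0])   # evaluation points (all continuity points of Phi)

for n in [1, 10, 100, 1_000]:
    x = rng.standard_normal(100_000)                         # draws of X
    x_n = x + rng.standard_normal(100_000) / np.sqrt(n)      # X_n = X + Z_n / sqrt(n), so X_n ->P X
    ecdf = [(x_n <= t).mean() for t in ts]                   # empirical estimate of F_{X_n}(t)
    print(n, np.round(ecdf, 3), np.round(norm.cdf(ts), 3))   # approaches Phi(t) as n grows
```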
Now, we can see that the central limit theorem (CLT) is a statement about convergence in distribution. (See Normal Distribution.)

Moment Generating Function (mgf)

Let \(X\) be a random variable such that, for some \(h > 0\), the expectation of \(e^{tX}\) exists for \(t \in (-h, h)\). The moment generating function (mgf) of \(X\) is given by \[ M(t) = \mathbb{E } (e^{tX}), \qquad t \in (-h, h). \] Note: the expectation must exist on an open interval about \(0\).
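The definition can be sanity-checked numerically by Monte Carlo. The sketch below assumes \(X \sim \text{Exponential}(1)\), whose mgf is \(M(t) = 1/(1-t)\) for \(t < 1\) (so any \(h < 1\) works):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)   # draws of X ~ Exponential(1)

for t in [-0.5, 0.0, 0.5]:
    mc = np.exp(t * x).mean()      # Monte Carlo estimate of E[e^{tX}]
    exact = 1.0 / (1.0 - t)        # closed-form mgf of Exponential(1), valid for t < 1
    print(t, round(mc, 4), round(exact, 4))
```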

We now revisit the central limit theorem and prove it under the additional assumption that the mgf exists.

Theorem 3: Central Limit Theorem (CLT). Let \(X_1, X_2, \cdots, X_n\) be a random sample from a distribution that has mean \(\mu\) and variance \(\sigma^2 >0\). Then the random variable \[ Y_n = \frac{\sum X_i - n\mu}{\sigma \sqrt{n}} = \frac{\sqrt{n}(\bar{X_n}-\mu)}{\sigma} \] converges in distribution to a random variable that has a standard normal distribution: \[ Y_n \xrightarrow{D} N(0, 1). \]
Proof: Assume that the mgf \(M(t) = \mathbb{E } (e^{tX})\) exists for \(t \in (-h, h)\).
The mgf for \(X-\mu\) is \[ m(t) = \mathbb{E }[e^{t(X-\mu)}] = e^{-\mu t} M(t), \qquad t \in (-h, h), \] and it satisfies \(m(0) = 1\), \(\quad m'(0) = \mathbb{E }(X -\mu) = 0\), and \(m''(0) = \mathbb{E }[(X-\mu)^2] = \sigma^2\).
By Taylor's theorem, there exists \(\xi\) between \(0\) and \(t\) such that \[ \begin{align*} m(t) &= m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} \\\\ &= 1 + \frac{m''(\xi)t^2}{2} \qquad (\text{since } m(0) = 1 \text{ and } m'(0) = 0) \\\\ &= 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2} \tag{3} \end{align*} \] Now consider the mgf of the normalized sum. Since the \(X_i\) are independent and identically distributed, \[ \begin{align*} M(t ; n) &= \mathbb{E }\Big[\exp \Big(t \frac{\sum X_i - n \mu}{\sigma \sqrt{n}}\Big)\Big] \\\\ &= \Big\{\mathbb{E } \Big[\exp \Big(t \frac{X - \mu}{\sigma \sqrt{n}}\Big)\Big]\Big\}^n \\\\ &= \Big[m \Big(\frac{t}{\sigma \sqrt{n}}\Big)\Big]^n, \end{align*} \] provided \(\frac{t}{\sigma \sqrt{n}} \in (-h, h) \).

Replacing \(t\) by \(\frac{t}{\sigma \sqrt{n}}\) in equation (3): \[ m \Big(\frac{t}{\sigma \sqrt{n}}\Big) = 1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2}, \] where \(\xi\) lies between \(0\) and \(\frac{t}{\sigma \sqrt{n}}\), and \(t \in (-h\sigma \sqrt{n}, h \sigma \sqrt{n})\).

Thus, \[ M(t ; n) = \Big\{1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2} \Big\}^n . \] Since \(m''(t)\) is continuous at \(t = 0\) and \(\xi \to 0\) as \(n \to \infty\), \[ \lim_{n \to \infty}[m''(\xi) - \sigma^2] = 0. \] We now use the fact that \[ \lim_{n \to \infty} \Big(1 + \frac{c_n}{n} \Big)^n = e^{c} \quad \text{whenever } c_n \to c; \] here \(c_n = \frac{t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2\sigma^2} \to \frac{t^2}{2}\). Thus, for each fixed \(t \in \mathbb{R}\) (note that \(\frac{t}{\sigma \sqrt{n}}\) eventually lies in \((-h, h)\)), \[ \lim_{n \to \infty} M(t ; n) = e^{\frac{t^2}{2}}. \] This is the mgf of the standard normal distribution \(N(0, 1)\). Therefore, the random variable \(Y_n = \frac{\sqrt{n}(\bar{X_n}-\mu)}{\sigma}\) has an asymptotic standard normal distribution: \[ Y_n = \frac{\sqrt{n}(\bar{X_n}-\mu)}{\sigma} \xrightarrow{D} N(0, 1). \]
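The key limit \( [m(t/(\sigma\sqrt{n}))]^n \to e^{t^2/2} \) can be verified numerically for a concrete distribution. The sketch below assumes \(X \sim \text{Exponential}(1)\), so \(\mu = \sigma = 1\) and the centered mgf has the closed form \(m(t) = e^{-t}/(1-t)\) for \(t < 1\):

```python
import numpy as np

def m_centered_exp(t):
    """mgf of X - mu for X ~ Exponential(1): E[e^{t(X-1)}] = e^{-t} / (1 - t), valid for t < 1."""
    return np.exp(-t) / (1.0 - t)

t = 1.5
for n in [10, 100, 1_000, 10_000]:
    m_n = m_centered_exp(t / np.sqrt(n)) ** n             # [m(t / (sigma sqrt(n)))]^n with sigma = 1
    print(n, round(m_n, 5), round(np.exp(t**2 / 2), 5))   # approaches e^{t^2/2}
```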
Note: Without dividing by \(\sigma\), the statement reads \[ \sqrt{n}(\bar{X_n}-\mu) \xrightarrow{D} N(0, \sigma^2). \]
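The theorem itself can also be checked by simulation. A minimal sketch, assuming \(X_i \sim \text{Exponential}(1)\) (so \(\mu = \sigma = 1\)), compares the empirical cdf of \(Y_n\) with the standard normal cdf \(\Phi\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, sigma = 1.0, 1.0              # mean and standard deviation of Exponential(1)
ts = np.array([-1.0, 0.0, 1.0])   # evaluation points for the cdfs

for n in [5, 50, 500]:
    samples = rng.exponential(scale=1.0, size=(20_000, n))
    y_n = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # Y_n = sqrt(n)(Xbar_n - mu) / sigma
    ecdf = [(y_n <= t).mean() for t in ts]
    print(n, np.round(ecdf, 3), np.round(norm.cdf(ts), 3))   # empirical cdf vs Phi
```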
Note: We can get the mgf of \(N(0, 1)\) as follows: \[ \begin{align*} M(t) &= \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \\\\ &= \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x^2 -2tx)} \frac{1}{\sqrt{2\pi}} \, dx \\\\ &= \int_{-\infty}^{\infty} e^{-\frac{1}{2}\{(x-t)^2 -t^2\}} \frac{1}{\sqrt{2\pi}} \, dx \\\\ &= e^{\frac{t^2}{2}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-t)^2}{2}} \, dx \\\\ &= e^{\frac{t^2}{2}} \cdot 1, \end{align*} \] where the remaining integral equals \(1\) because the integrand is the pdf of \(N(t, 1)\).
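As a quick numerical cross-check of this integral (a sketch using numerical quadrature, not part of the derivation):

```python
import numpy as np
from scipy.integrate import quad

def mgf_std_normal(t):
    """Numerically integrate e^{tx} times the standard normal pdf over the real line."""
    integrand = lambda x: np.exp(t * x) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

for t in [-1.0, 0.5, 2.0]:
    print(t, round(mgf_std_normal(t), 6), round(np.exp(t**2 / 2), 6))   # the two columns should agree
```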