Measure Theory with Probability

Probability Space

In the previous section, we found out that some functions cannot be integrable using either the Riemann or improper Riemann integration. To handle such cases, we need to introduce measure theory and the Lebesgue integralation. Since our focus is on applied mathematics, particularly in the context of statistics and machine learning, we will concentrate on probability-related measure theory. While we'll avoid covering every foundational topic and formal mathematical detail, our goal is to build a solid understanding that leads to the introduction of the Lebesgue integration.

First, we need a formal definition of probalistic model:

A probability space is a triple \(\Omega, \mathcal{F}, \mathbb{P}\) where

\(\Omega\) is the sample space:
\(\mathcal{F}\) is a \(\sigma\)-algebra:
\(\mathbb{P}\) is a probability measure:

Sample Space

The sample space \(\Omega\) can be finite, countable, or uncountable. An element of \(\Omega\) is denoted by \(\omega\), and is called an elementary outcomes.

For example, if our experiment consists of an infinite number of consecutive rolls of a die, the sample space is the set: \[ \Omega = \{1, 2, 3, 4, 5, 6\}^{\infty} \] and an elementary outcome is an infinite sequence such as: \[ \omega = \{1, 1, 4, 3, 1, 5, \cdots\}. \]
A simpler case of the probability space can be a discrete probability space. In this case, the sample space is finite, or countable: \[ \Omega = \{\omega_1, \omega_2, \cdots \}. \] and \(\sigma\)-algebra is the set of all subsets of \(\Omega\). Then the probability measure assigns a number in the set \([0, 1]\) to every subset of \(\Omega\). It is defined in terms of the probabilities \(\mathbb{P}(\{\omega\})\) of the elementary outcomes and satisfies \[ \forall A \subset \Omega, \quad P(A) = \sum_{\omega \in A}\mathbb{P}(\{\omega\}) \] and \[ \sum_{\omega \in \Omega}\mathbb{P}(\{\omega\}) = 1. \] Note: We will use \(\mathbb{P}(\omega)\) instead of \(\mathbb{P}(\{\omega\})\) and \(\mathbb{P}(\omega_i)\) will be denoted by \(p_i\).

\(\sigma\)-algebra (\(\sigma\)-field)

Ideally, we wish to specify the probability \(\mathbb{P} (A)\) of "every" subset of \(\Omega\). However, it is too complicated, especially, in the case where \(\Omega\) is uncountable. So, we assign probabilities to only a partial collection of subsets of \(\Omega\). The sets in this collection are to be thought of as the “nice” and "interesting" subsets of \(\Omega\). Formally, we define the collection as follows:

\(\sigma\)-algebra, \(\mathcal{F}\) is a collection of subsets of \(\Omega\) with the following properties:

\(\mathcal{\emptyset} \in \mathcal{F}\).
If \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\).
If \(\forall i \in \mathbb{N}, A_i \in \mathcal{F}\), then \(\bigcup_{i =1}^{\infty} A_i \in \mathcal{F}\).

An event \(A\) is called an \(\mathcal{F}\)-measurable set, or simply a measurable set. The pair \((\Omega, \mathcal{F})\) is called a measurable space.

Given collection \(\mathcal{C}\) of subsets of \(\Omega\), we want to define \(\sigma\)-algebra \(\mathcal{F}\) as the intersection of "all" \(\sigma\)-algebras that contains \(\mathcal{C}\) becuase we only need the smallest \(\sigma\)-algebra containing \(\mathcal{C}\). In this manner, \(\mathcal{F}\) is said to be the \(\sigma\)-algebra generated by \(\mathcal{C}\), and is denoted by \(\sigma(\mathcal{C})\).

Probability Measure

A collection of sets \(A_{\alpha} \subset \Omega\) where \(\alpha\) ranges over some index set is mutually exclusive or that the sets are disjoint if \(A_{\alpha} \cap A_{\alpha'} = \emptyset \) whenever \(\alpha \neq \alpha'\). Also, the sets \(A_{\alpha} \subset \Omega\) are called collectively exhaustive if \(\, \cup_{\alpha} A_{\alpha} = \Omega\).

A measure is a function \[ \mu: \mathcal{F} \to [0, \infty] \] which assigns a nonnegative extended real number \(\mu(A)\) to every set \(A \in \mathcal{F}\), and which satisfies the following two conditions:

\(\mu(\emptyset) = 0\)
Countable additivity (\(\sigma\)-additivity):

A probability measure is a measure \(\mathbb{P}\) with the additional property \(\mathbb{P}(\Omega) = 1\). In this case, the triple \((\Omega, \mathcal{F}, \mathbb{P})\) is called a probability space.

The countable additivity implies that probabilities (and more generally, measures) behave like the notions of volume: the volume of a countable union of disjoint sets is the sum of their individual volumes. Indeed, a measure is to be understood as some generalized notion of a volume.

Finite Additivity

Now, replacing countable unions and sums by finite ones, we get the following definitions:

An algebra (or, a field) is a collection \(\mathcal{F}_0\) of subsets of \(\Omega\) with the following properties:

\(\emptyset \in \mathcal{F}\).
If \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\).
If \(A, B \in \mathcal{F}\), then \(A \cup B \in \mathcal{F}\).

A function \(\mathbb{P}: \mathcal{F}_0 \to [0, 1]\) is said to be finitely additive if \[ \begin{align*} &A, B \in \mathcal{F}_0 , \quad A \cap B = \emptyset \\\\ &\Longrightarrow \mathbb{P}(A \cap B) = \mathbb{P}(A) + \mathbb{P}(B). \end{align*} \] Note: This is strictly weaker than the countable additivity property of probability measures.

Caratheodory's Extension Theorem

Theorem 1: Caratheodory's Extension Theorem Let \(\mathcal{F}_0\) be an algebra of subsets of a smaple space \(\Omega\), and let \(\mathcal{F} = \sigma(\mathcal{F}_0)\) be the \(\sigma\)-algebra that it generates.
Suppose that \(\mathbb{P}_0 : \mathcal{F}_0 \to [0, 1]\) that satisfies \(\mathbb{P}_0(\Omega) = 1\) as well as countable additivity on \(\mathcal{F}_0\).
Then, \(\mathbb{P}_0\) can be extended uniquely to a probability measure on \((\Omega, \mathcal{F})\). That is, there exists a unique probability measure \(\mathbb{P}\) on \((\Omega, \mathcal{F})\) such that \[ \forall A \in \mathcal{F}_0, \quad \mathbb{P}(A) = \mathbb{P}_0(A). \]

Lebesgue measure

The uniform distribution on the interval \([0, 1]\) assigns probability \(b - a\) to every interval \([a, b] \subset [0, 1]\). We want to define the appropriate \(\sigma\)-algebra and the probability measure on the sample space \(\Omega = [0, 1]\), but first, we consider the smaple space: \[ \Omega' = (0, 1]. \] Consider the collection \(\mathcal{C}\) of all intervals \([a, b]\) contained in \((0, 1]\) and let \(\mathcal{F}\) be the \(\sigma\)-algebra generated by \(\mathcal{C}\). This is called the Borel \(\sigma\)-algebra and is denoted by \(\mathcal{B}\). Every set that belongs to this \(\sigma\)-algebra is called the Borel (measurable) set. Any set that can be formed by starting with intervals \([a, b]\) is a Borel set. For example, the set of rational numbers in \((0, 1]\), and its complement, the set of irrational numbers in \((0, 1]\) are Borel sets.

Since defining a probability measure for all Borel sets is too complicated, we start with a "smaller" collection, \(\mathcal{F}_0 \subset (0, 1]\). We let \(\mathcal{F}_0\) consist of the empty set and all sets that are finite unions of intervals of the form \((a, b]\). For example, \[ A \in \mathcal{F}_0 \Longrightarrow A = (a_1, b_1] \cup (a_2, b_2] \cdots \cup (a_n, b_n], \] where \(0 \leq a_1 < b_1 \leq a_2 < b_2 \leq \cdots \leq a_n < b_n \leq 1, \quad n \in \mathbb{N}\). Also, we define: \[ \mathbb{P}_0(A) = (b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n) \] which is \(\sigma\)-additive on \(\mathcal{F}_0\).

We can now apply the Caratheodory's extension theorem, and conclude that there exists a probability measure \(\mathbb{P}\) defined on the entire Borel \(\sigma\)-algebra \(\mathcal{B}\), that agrees with \(\mathbb{P}_0\) on \(\mathcal{F}_0\). We call this measure the Lebesgue or uniform measure. In particular, \[ \forall \, (a, b] \subset (0, 1], \quad \mathbb{P}((a, b]) = b - a. \]
Finally, by adding \(\{0\}\) to the sample space \(\Omega'\), and assigning zero probability to it, we obtain the uniform distribution model with sample space \(\Omega = [0, 1]\). (We only need to check the measurability of \(\{0\}\), and it is also a Borel set.)

Revisit our Problem: Remember, in the previous section, we encountered the following problem:
Consider the Dirichlet function: \[ f(x)= \begin{cases} 1 &\text{if \(x \in \mathbb{Q}\)} \\ 0 &\text{if \(x \in \mathbb{R} \setminus \mathbb{Q}\)} \end{cases} \] compute: \[ \int_0^1 f(x)dx \]
Now, we can see the interval \([0, 1]\) as the sample space \(\Omega\). The smallest \(\sigma\)-algebra that contains every interval \([a, b] \subset [0, 1]\) is the Borel \(\sigma\)-algebra. Moreover, the Dirichlet function is measurable with respect to the Borel \(\sigma\)-algebra because both the set of rationals: \(\mathbb{Q} \cap [0, 1]\) and irrationals \([0, 1] \setminus \mathbb{Q}\) are Borel sets. Therefore, in short, the Dirichlet function is Lebesgue measurable.

We are getting closer to the Lebesgue integration, but we would like to learn more about Lebesgue measure.

Consider the sample space \(\Omega = \mathbb{R}\). As usual, we define a \(\sigma\)-algebra of subsets of \(\mathbb{R}\). Let \(\mathcal{C}\) be the collection of all intervals of the form \([a, b]\) and let \(\mathcal{B} = \sigma(\mathcal{C})\) be the \(\sigma\)-algebra that it generates.
Let \(\mathbb{P}_n\) be the uniform measure on \((n, n+1]\). Given a set \(A \subset \mathbb{R}\), we decompose it into countably many pieces, each piece contained in some interval \((n, n+1]\), and define its “length” \(\mu(A)\) using countable additivity as follows: \[ \mu(A) = \sum_{n = -\infty}^{\infty} \mathbb{P}_n \left(A \cap (n, n+1] \right). \] Since \(A \cap (n, n+1]\) is a measurable subset of \((n, n+1]\), \(\mathbb{P}_n (A \cap (n, n+1]) \geq 0\). Thus, nonnegativity is hold: \[ \mu(A) \in [0, \infty]. \] Also, \(\emptyset \cap (n, n+1] = \emptyset\) and \(\mathbb{P}_n ( \emptyset) = 0\), thus \[\mu(\emptyset) = 0. \] Now, we need to check countable additivity. Let \(\{A_i\}_{i=1}^{\infty}\) be a countable collection of pairwise disjoint sets in \(\mathcal{B}\). For each fixed \(n\), the set \(\left\{A_i \cap (n, n+1] \right\}_{i=1}^{\infty}\) are also pairwise disjoint. Since \(\mathbb{P}_n\) is the uniform measure on \((n, n+1]\), it satisfies countable additivity. That is, \[ \begin{align*} &\mathbb{P}_n \left(\bigcup_{i=1}^{\infty} (A_i \cap (n, n+1] )\right) = \sum_{i=1}^{\infty} \mathbb{P}_n \left(A_i \cap (n, n+1] \right) \\\\ &\Longrightarrow \mathbb{P}_n \left( \left( \bigcup_{i=1}^{\infty} A_i \right) \cap (n, n+1] \right) = \sum_{i=1}^{\infty} \mathbb{P}_n \left(A_i \cap (n, n+1]\right) \\\\ \end{align*} \] Then we have: \[ \begin{align*} \mu\left( \bigcup_{i=1}^{\infty} A_i \right) &= \sum_{n=-\infty}^{\infty} \mathbb{P}_n \left(\left( \bigcup_{i=1}^{\infty} A_i \right) \cap (n, n+1] \right) \\\\ &= \sum_{n= -\infty}^{\infty} \sum_{i = 1}^{\infty} \mathbb{P}_n (A_i \cap (n, n+1] ) \\\\ &= \sum_{i = 1}^{\infty} \sum_{n= -\infty}^{\infty} \mathbb{P}_n (A_i \cap (n, n+1] ) \\\\ &= \sum_{i = 1}^{\infty} \mu(A_i). \end{align*} \] (We can exchange the order of summation because the terms in the sums are nonnegative.)
Therefore, \(\mu\) is a measure on \((\mathbb{R}, \mathcal{B})\), which is the Lebesgue measure.

Note: Since \(\mu(\mathbb{R}) = \infty\), this is NOT a probability measure.