Probability Space
In the previous section, we found out that some functions cannot be integrable using either the Riemann or improper
Riemann integration. To handle such cases, we need to introduce measure theory and the Lebesgue integralation.
Since our focus is on applied mathematics, particularly in the context of statistics and machine learning, we will concentrate
on probability-related measure theory. While we'll avoid covering every foundational topic and formal mathematical
detail, our goal is to build a solid understanding that leads to the introduction of the Lebesgue integration.
First, we need a formal definition of probalistic model:
A probability space is a triple \(\Omega, \mathcal{F}, \mathbb{P}\) where
- \(\Omega\) is the sample space:
The set of possible outcomes of an experiment.
- \(\mathcal{F}\) is a \(\sigma\)-algebra:
A collection of of subsets of the sample space \(\Omega\).
- \(\mathbb{P}\) is a probability measure:
A function that assigns a nonnegative probability to every set in the \(\sigma\)-algebra \(\mathcal{F}\).
Sample Space
The sample space \(\Omega\) can be finite, countable, or uncountable. An element of \(\Omega\) is denoted by
\(\omega\), and is called an elementary outcomes.
For example, if our experiment consists of an infinite number of consecutive rolls of a die, the sample space is the set:
\[
\Omega = \{1, 2, 3, 4, 5, 6\}^{\infty}
\]
and an elementary outcome is an infinite sequence such as:
\[
\omega = \{1, 1, 4, 3, 1, 5, \cdots\}.
\]
A simpler case of the probability space can be a discrete probability space. In this case, the sample space
is finite, or countable:
\[
\Omega = \{\omega_1, \omega_2, \cdots \}.
\]
and \(\sigma\)-algebra is the set of all subsets of \(\Omega\). Then the probability measure assigns a number in the set \([0, 1]\) to
every subset of \(\Omega\). It is defined in terms of the probabilities \(\mathbb{P}(\{\omega\})\) of the elementary outcomes and satisfies
\[
\forall A \subset \Omega, \quad P(A) = \sum_{\omega \in A}\mathbb{P}(\{\omega\})
\]
and
\[
\sum_{\omega \in \Omega}\mathbb{P}(\{\omega\}) = 1.
\]
Note: We will use \(\mathbb{P}(\omega)\) instead of \(\mathbb{P}(\{\omega\})\) and \(\mathbb{P}(\omega_i)\) will be denoted by \(p_i\).
\(\sigma\)-algebra (\(\sigma\)-field)
Ideally, we wish to specify the probability \(\mathbb{P} (A)\) of "every" subset of \(\Omega\). However, it
is too complicated, especially, in the case where \(\Omega\) is uncountable. So, we assign probabilities to only
a partial collection of subsets of \(\Omega\). The sets in this collection are to be thought of as the “nice”
and "interesting" subsets of \(\Omega\). Formally, we define the collection as follows:
\(\sigma\)-algebra, \(\mathcal{F}\) is a collection of subsets of \(\Omega\) with the following properties:
- \(\mathcal{\emptyset} \in \mathcal{F}\).
- If \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\).
- If \(\forall i \in \mathbb{N}, A_i \in \mathcal{F}\), then \(\bigcup_{i =1}^{\infty} A_i \in \mathcal{F}\).
An event \(A\) is called an \(\mathcal{F}\)-measurable set, or simply a measurable set. The pair
\((\Omega, \mathcal{F})\) is called a measurable space.
Given collection \(\mathcal{C}\) of subsets of \(\Omega\), we want to define \(\sigma\)-algebra \(\mathcal{F}\) as the intersection of
"all" \(\sigma\)-algebras that contains \(\mathcal{C}\) becuase we only need the smallest \(\sigma\)-algebra containing
\(\mathcal{C}\). In this manner, \(\mathcal{F}\) is said to be the \(\sigma\)-algebra generated by \(\mathcal{C}\), and is denoted
by \(\sigma(\mathcal{C})\).
Probability Measure
A collection of sets \(A_{\alpha} \subset \Omega\) where \(\alpha\) ranges over some index set is
mutually exclusive or that the sets are disjoint if \(A_{\alpha} \cap A_{\alpha'} = \emptyset \)
whenever \(\alpha \neq \alpha'\). Also, the sets \(A_{\alpha} \subset \Omega\) are called collectively exhaustive
if \(\, \cup_{\alpha} A_{\alpha} = \Omega\).
A measure is a function
\[
\mu: \mathcal{F} \to [0, \infty]
\]
which assigns a nonnegative extended real number \(\mu(A)\) to every set \(A \in \mathcal{F}\), and which satisfies the
following two conditions:
- \(\mu(\emptyset) = 0\)
- Countable additivity (\(\sigma\)-additivity):
If \(\{A_i\}\) is a sequence of disjoint sets that belong to \(\mathcal{F}\), then
\[
\mu(\cup_i A_i) = \sum_{i =1}^{\infty} \mu(A_i).
\]
A probability measure is a measure \(\mathbb{P}\) with the additional property \(\mathbb{P}(\Omega) = 1\).
In this case, the triple \((\Omega, \mathcal{F}, \mathbb{P})\) is called a probability space.
The countable additivity implies that probabilities (and more generally, measures) behave like the notions of volume:
the volume of a countable union of disjoint sets is the sum of their individual volumes. Indeed, a measure is to be
understood as some generalized notion of a volume.
Finite Additivity
Now, replacing countable unions and sums by finite ones, we get the following definitions:
An algebra (or, a field) is a collection \(\mathcal{F}_0\) of subsets of \(\Omega\) with the following properties:
- \(\emptyset \in \mathcal{F}\).
- If \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\).
- If \(A, B \in \mathcal{F}\), then \(A \cup B \in \mathcal{F}\).
A function \(\mathbb{P}: \mathcal{F}_0 \to [0, 1]\) is said to be finitely additive if
\[
\begin{align*}
&A, B \in \mathcal{F}_0 , \quad A \cap B = \emptyset \\\\
&\Longrightarrow \mathbb{P}(A \cap B) = \mathbb{P}(A) + \mathbb{P}(B).
\end{align*}
\]
Note: This is strictly weaker than the countable additivity property of probability measures.
Caratheodory's Extension Theorem
Theorem 1: Caratheodory's Extension Theorem
Let \(\mathcal{F}_0\) be an algebra of subsets of a smaple space \(\Omega\), and
let \(\mathcal{F} = \sigma(\mathcal{F}_0)\) be the \(\sigma\)-algebra that it generates.
Suppose that \(\mathbb{P}_0 : \mathcal{F}_0 \to [0, 1]\) that satisfies \(\mathbb{P}_0(\Omega) = 1\)
as well as countable additivity on \(\mathcal{F}_0\).
Then, \(\mathbb{P}_0\) can be extended uniquely to a probability measure on \((\Omega, \mathcal{F})\).
That is, there exists a unique probability measure \(\mathbb{P}\) on \((\Omega, \mathcal{F})\) such that
\[
\forall A \in \mathcal{F}_0, \quad \mathbb{P}(A) = \mathbb{P}_0(A).
\]
Lebesgue measure
The uniform distribution on the interval \([0, 1]\) assigns probability \(b - a\) to every
interval \([a, b] \subset [0, 1]\). We want to define the appropriate \(\sigma\)-algebra and the probability
measure on the sample space \(\Omega = [0, 1]\), but first, we consider the smaple space:
\[
\Omega' = (0, 1].
\]
Consider the collection \(\mathcal{C}\) of all intervals \([a, b]\) contained in \((0, 1]\) and let \(\mathcal{F}\) be
the \(\sigma\)-algebra generated by \(\mathcal{C}\). This is called the Borel \(\sigma\)-algebra and is
denoted by \(\mathcal{B}\). Every set that belongs to this \(\sigma\)-algebra is called the Borel (measurable) set.
Any set that can be formed by starting with intervals \([a, b]\) is a Borel set. For example,
the set of rational numbers in \((0, 1]\), and its complement, the set of irrational numbers in \((0, 1]\) are Borel sets.
Since defining a probability measure for all Borel sets is too complicated, we start with a "smaller" collection,
\(\mathcal{F}_0 \subset (0, 1]\). We let \(\mathcal{F}_0\) consist of the empty set and all sets that are finite unions of intervals
of the form \((a, b]\). For example,
\[
A \in \mathcal{F}_0 \Longrightarrow A = (a_1, b_1] \cup (a_2, b_2] \cdots \cup (a_n, b_n],
\]
where \(0 \leq a_1 < b_1 \leq a_2 < b_2 \leq \cdots \leq a_n < b_n \leq 1, \quad n \in \mathbb{N}\).
Also, we define:
\[
\mathbb{P}_0(A) = (b_1 - a_1) + (b_2 - a_2) + \cdots + (b_n - a_n)
\]
which is \(\sigma\)-additive on \(\mathcal{F}_0\).
We can now apply the Caratheodory's extension theorem, and conclude that there exists a probability measure \(\mathbb{P}\) defined on the
entire Borel \(\sigma\)-algebra \(\mathcal{B}\), that agrees with \(\mathbb{P}_0\) on \(\mathcal{F}_0\). We call this measure
the Lebesgue or uniform measure. In particular,
\[
\forall \, (a, b] \subset (0, 1], \quad \mathbb{P}((a, b]) = b - a.
\]
Finally, by adding \(\{0\}\) to the sample space \(\Omega'\), and assigning zero probability to it, we obtain the uniform distribution
model with sample space \(\Omega = [0, 1]\). (We only need to check the measurability of \(\{0\}\), and it is also a Borel set.)
Revisit our Problem:
Remember, in the previous section, we encountered the following problem:
Consider the Dirichlet function:
\[
f(x)=
\begin{cases}
1 &\text{if \(x \in \mathbb{Q}\)} \\
0 &\text{if \(x \in \mathbb{R} \setminus \mathbb{Q}\)}
\end{cases}
\]
compute:
\[
\int_0^1 f(x)dx
\]
Now, we can see the interval \([0, 1]\) as the sample space \(\Omega\). The smallest \(\sigma\)-algebra that contains
every interval \([a, b] \subset [0, 1]\) is the Borel \(\sigma\)-algebra. Moreover,
the Dirichlet function is measurable with respect to the Borel \(\sigma\)-algebra because both the set of
rationals: \(\mathbb{Q} \cap [0, 1]\) and irrationals \([0, 1] \setminus \mathbb{Q}\) are Borel sets. Therefore, in short,
the Dirichlet function is Lebesgue measurable.
We are getting closer to the Lebesgue integration, but we would like to learn more about Lebesgue measure.
Consider the sample space \(\Omega = \mathbb{R}\). As usual, we define a \(\sigma\)-algebra of subsets of \(\mathbb{R}\). Let
\(\mathcal{C}\) be the collection of all intervals of the form \([a, b]\) and let \(\mathcal{B} = \sigma(\mathcal{C})\) be the
\(\sigma\)-algebra that it generates.
Let \(\mathbb{P}_n\) be the uniform measure on \((n, n+1]\). Given a set \(A \subset \mathbb{R}\), we decompose it into countably
many pieces, each piece contained in some interval \((n, n+1]\), and define its “length” \(\mu(A)\) using countable additivity as follows:
\[
\mu(A) = \sum_{n = -\infty}^{\infty} \mathbb{P}_n \left(A \cap (n, n+1] \right).
\]
Since \(A \cap (n, n+1]\) is a measurable subset of \((n, n+1]\), \(\mathbb{P}_n (A \cap (n, n+1]) \geq 0\).
Thus, nonnegativity is hold:
\[
\mu(A) \in [0, \infty].
\]
Also, \(\emptyset \cap (n, n+1] = \emptyset\) and \(\mathbb{P}_n ( \emptyset) = 0\), thus
\[\mu(\emptyset) = 0.
\]
Now, we need to check countable additivity. Let \(\{A_i\}_{i=1}^{\infty}\) be a countable collection of
pairwise disjoint sets in \(\mathcal{B}\). For each fixed \(n\), the set \(\left\{A_i \cap (n, n+1] \right\}_{i=1}^{\infty}\) are
also pairwise disjoint. Since \(\mathbb{P}_n\) is the uniform measure on \((n, n+1]\), it satisfies countable additivity. That is,
\[
\begin{align*}
&\mathbb{P}_n \left(\bigcup_{i=1}^{\infty} (A_i \cap (n, n+1] )\right) = \sum_{i=1}^{\infty} \mathbb{P}_n \left(A_i \cap (n, n+1] \right) \\\\
&\Longrightarrow \mathbb{P}_n \left( \left( \bigcup_{i=1}^{\infty} A_i \right) \cap (n, n+1] \right) = \sum_{i=1}^{\infty} \mathbb{P}_n \left(A_i \cap (n, n+1]\right) \\\\
\end{align*}
\]
Then we have:
\[
\begin{align*}
\mu\left( \bigcup_{i=1}^{\infty} A_i \right) &= \sum_{n=-\infty}^{\infty} \mathbb{P}_n \left(\left( \bigcup_{i=1}^{\infty} A_i \right) \cap (n, n+1] \right) \\\\
&= \sum_{n= -\infty}^{\infty} \sum_{i = 1}^{\infty} \mathbb{P}_n (A_i \cap (n, n+1] ) \\\\
&= \sum_{i = 1}^{\infty} \sum_{n= -\infty}^{\infty} \mathbb{P}_n (A_i \cap (n, n+1] ) \\\\
&= \sum_{i = 1}^{\infty} \mu(A_i).
\end{align*}
\]
(We can exchange the order of summation because the terms in the sums are nonnegative.)
Therefore, \(\mu\) is a measure on \((\mathbb{R}, \mathcal{B})\), which is the Lebesgue measure.
Note: Since \(\mu(\mathbb{R}) = \infty\), this is NOT a probability measure.