Basic Probability Ideas


Probability

The collection of every possible outcome of an experiment is called the sample space, denoted \(S\). It can be discrete or continuous. An event \(A\) is a set of outcomes of an experiment, that is, a subset of the sample space: \(A \subseteq S\).

The probability of \(A\), denoted \(P(A)\), satisfies the following axioms:

  1. \( 0 \leq P(A) \leq 1\)
  2. \(P(S) = 1\) and \(P(\emptyset) = 0\)
  3. If the events \(A\) and \(B\) are mutually exclusive, \(P(A \cup B) = P(A) + P(B)\)
     Note: Mutually exclusive means that the two events \(A\) and \(B\) cannot occur at the same time.

For a finite sample space whose outcomes are equally likely, \(P(A)\) can be calculated as: \[ P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}. \]
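
As a quick illustration of this counting formula, here is a minimal Python sketch using an assumed example (one roll of a fair six-sided die), not taken from the text:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (assumed example).
S = {1, 2, 3, 4, 5, 6}

# Event A: the roll is even.
A = {s for s in S if s % 2 == 0}

# Classical probability: |A| / |S|, valid because the outcomes are equally likely.
P_A = Fraction(len(A), len(S))
print(P_A)  # 1/2
```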
Using event algebra, the following basic facts can be derived (a short numerical check follows the list):

  1. Complement: \(P(\bar{A}) = 1 - P(A)\)
     Note: \(1 = P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A})\)

  2. Addition: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
     Note: If \(A\) and \(B\) are mutually exclusive, \(P(A \cap B) = P(\emptyset) = 0\).

  3. Inclusion: If \(B \subset A\), then \(A \cap B = B\), and so \(P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(B)\)

  4. De Morgan's laws:
     \(P(\overline{A \cup B}) = P( \bar{A} \cap \bar{B})\)
     \(P(\overline{A \cap B}) = P( \bar{A} \cup \bar{B})\)
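
The following sketch verifies the complement, addition, and De Morgan rules by brute-force counting. The die events \(A\) and \(B\) are assumed examples chosen only for illustration:

```python
from fractions import Fraction

S = set(range(1, 7))   # one fair die (assumed equally likely outcomes)
A = {2, 4, 6}          # "the roll is even"
B = {4, 5, 6}          # "the roll is at least 4"

def P(E):
    """Classical probability of an event E within the sample space S."""
    return Fraction(len(E), len(S))

# Complement rule: P(not A) = 1 - P(A)
assert P(S - A) == 1 - P(A)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# De Morgan: the complement of a union is the intersection of the complements.
assert P(S - (A | B)) == P((S - A) & (S - B))
```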

There are some counting rules to find all possible outcomes in \(S\) quickly (a short numerical sketch follows this list):

  1. Multiplication principle:
     If an experiment consists of a sequence of operations \(O_1, O_2, \cdots, O_r\) with \(n_1, n_2, \cdots, n_r\) outcomes respectively, then the total number of possible outcomes is the product \(n_1 n_2 \cdots n_r\).

  2. Permutation:
     Sampling without replacement where the order of sampling matters. \[ \begin{align*} {}_n P_r &= n(n-1)(n-2)\cdots (n-r+1) \\\\ &= \frac{n!}{(n-r)!} \end{align*} \] (Multiplying the number of possible outcomes at each step until the \(r\)th one is made.)

  3. Combinations:
     Sampling without replacement where the order of sampling does not matter. \[ {}_n C_r = \binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{ {}_n P_r }{r!} \] Note: This is the binomial coefficient for the binomial expansion.

     We can generalize the binomial coefficient to \(k \geq 2\) groups so that there are \(r_i\) in group \(i \quad (1 \leq i \leq k)\) and \(r_1 + r_2 + \cdots + r_k = n\): \[ \frac{n!}{r_1 ! r_2 ! \cdots r_k !} \] This is called the multinomial coefficient.
     For example, if we deal 52 cards evenly among 4 players, giving each player 13 cards, the total number of different hands among the 4 players is calculated by \[ \frac{52!}{13!\,13!\,13!\,13!} \]
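
A minimal Python sketch of these counting rules, using the card-dealing example above and an assumed choice of \(n = 5\), \(r = 3\) for the permutation and combination formulas:

```python
import math

# Permutations vs. combinations: choose r = 3 out of n = 5 without replacement.
n, r = 5, 3
print(math.perm(n, r))  # 60 = 5!/(5-3)!   (order matters)
print(math.comb(n, r))  # 10 = 5!/(3!·2!)  (order does not matter)

# Multinomial coefficient: dealing 52 cards evenly to 4 players, 13 each.
deals = math.factorial(52) // math.factorial(13) ** 4
print(deals)  # ≈ 5.36e28 possible deals
```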

Conditional Probability

The conditional probability of an event \(A\) given an event \(B\) (with \(P(B) > 0\)) is defined as \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1} \] which can be rearranged as \[ P(A \cap B) = P(A \mid B) P(B). \tag{2} \] Also, from Equation (1), \[ P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)}. \] Using Equation (2), \[ P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)}. \tag{3} \] This is known as Bayes' theorem or the inverse probability law. We will discuss it in a more useful form later.
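
Here is a small sketch of Equations (1) and (3) by direct counting. The two-dice setup and the events \(A\) (sum at least 10) and \(B\) (first die shows 6) are assumed examples, not from the text:

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair six-sided dice.
S = list(product(range(1, 7), repeat=2))

A = [s for s in S if s[0] + s[1] >= 10]  # A: the sum is at least 10
B = [s for s in S if s[0] == 6]          # B: the first die shows 6

def P(E):
    return Fraction(len(E), len(S))

A_and_B = [s for s in A if s in B]

# Definition (1): P(A | B) = P(A ∩ B) / P(B)
P_A_given_B = P(A_and_B) / P(B)
print(P_A_given_B)                # 1/2 (the second die must show 4, 5, or 6)

# Equation (3): P(B | A) = P(A | B) P(B) / P(A), i.e. Bayes' theorem
print(P_A_given_B * P(B) / P(A))  # 1/2, matching P(B ∩ A)/P(A) = (1/12)/(1/6)
```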

If \(A\) and \(B\) are mutually independent, \[ P(A \mid B) = P(A), \text{ and } P(B \mid A) = P(B). \] Thus \[ P(A \cap B) = P(A) P(B). \] In addition, in this case, \[ \begin{align*} P(A)P(\bar{B}) &= P(A)[1 - P(B)] \\\\ &= P(A) - P(A)P(B) \\\\ &= P(A) - P(A \cap B) \\\\ &= P(A \cap \bar{B}) \end{align*} \] Similarly, \( P(\bar{A})P(B) = P(\bar{A} \cap B)\) and \( P(\bar{A})P(\bar{B}) = P(\bar{A} \cap \bar{B})\).

Note: Two events are mutually independent if the occurrence of one event does not affect the probability of the occurrence of the other event. Be careful not to confuse it with mutually exclusive.
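
The sketch below checks the product rule for independent events and for their complements by counting. The two-dice events are assumed examples; independence holds because each event depends on a different die:

```python
from fractions import Fraction
from itertools import product

# Assumed example: two fair dice; events about different dice are independent.
S = list(product(range(1, 7), repeat=2))

A = [s for s in S if s[0] % 2 == 0]  # A: the first die is even
B = [s for s in S if s[1] >= 5]      # B: the second die shows 5 or 6

def P(E):
    return Fraction(len(E), len(S))

not_B = [s for s in S if s not in B]

# Independence: P(A ∩ B) = P(A) P(B), and the complement inherits it.
assert P([s for s in A if s in B]) == P(A) * P(B)          # 1/2 · 1/3 = 1/6
assert P([s for s in A if s in not_B]) == P(A) * P(not_B)  # 1/2 · 2/3 = 1/3
```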

Law of Total Probability

Theorem 1: Law of Total Probability Let the sample space \(S\) be decomposed into \(k\) mutually exclusive events \(B_1, B_2, \cdots, B_k\) with \(P(B_i) > 0\) for each \(i\). Then for any event \(A\), \[ \begin{align*} P(A) &= P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_k)P(B_k) \\\\ &= \sum_{i =1} ^k P(A \mid B_i)P(B_i) \end{align*} \]
Proof: Since the \(B_i\) decompose \(S\), the events \(A \cap B_i\) are mutually exclusive and their union is \(A\), so \[ \begin{align*} P(A) &= P[(A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_k)] \\\\ &= P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_k) \\\\ &= P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_k)P(B_k) \\\\ &= \sum_{i =1} ^k P(A \mid B_i)P(B_i) \end{align*} \]
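
As a numerical sketch of Theorem 1, consider the classic (assumed) setup of three machines \(B_1, B_2, B_3\) producing items with different defect rates; the percentages below are illustrative only:

```python
from fractions import Fraction as F

# Assumed illustrative numbers: machines B1, B2, B3 produce 50%, 30%, and 20%
# of all items, with defect rates of 1%, 2%, and 3% respectively.
P_B = [F(50, 100), F(30, 100), F(20, 100)]        # P(B_i): a partition of S
P_A_given_B = [F(1, 100), F(2, 100), F(3, 100)]   # P(A | B_i), A = "item is defective"

# Law of Total Probability: P(A) = sum_i P(A | B_i) P(B_i)
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
print(P_A)  # 17/1000, i.e. 1.7% of items are defective overall
```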

Bayes' Theorem

We revisit Equation (3): \[ P(B \mid A) = \frac{P(A \mid B) P(B)}{P(A)}. \] \(P(B)\) is called the prior probability of \(B\) and \(P(B \mid A)\) is called the posterior probability of \(B\). This is the foundation of Bayesian statistics.

Here, using the Law of Total Probability, we can get a more general form of Bayes' Theorem.

Theorem 2: Bayes' Theorem Let mutually exclusive events \(B_1, B_2, \cdots, B_k\) form a partition of the sample space \(S\) with the condition that \(P(B_i) > 0\) for \(i = 1, 2, \cdots, k\). Then for \(j = 1, 2, \cdots, k\), \[ \begin{align*} P(B_j \mid A) &= \frac{P(A \mid B_j)P(B_j)}{\sum_{i=1}^k P(A \mid B_i)P(B_i)} \\\\ &= \frac{P(B_j \cap A)}{P(A)} \end{align*} \]
This is a powerful tool in machine learning tasks such as classification. For example, in medical diagnosis, \(B_1, B_2, \cdots, B_k\) can be the possible diseases and \(A\) the observed symptoms; then \(P(B_j \mid A)\) represents the probability of having disease \(B_j\) given the symptoms \(A\), as in the sketch below.
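
A minimal sketch of Theorem 2 for the diagnosis example. All prior prevalences and likelihoods below are assumed, illustrative numbers, not data from the text:

```python
from fractions import Fraction as F

# Assumed illustrative numbers: three candidate conditions B1, B2, B3 with
# prior prevalences, and the probability of observing the symptoms A under each.
priors = [F(1, 100), F(4, 100), F(95, 100)]        # P(B_j), summing to 1
likelihoods = [F(90, 100), F(50, 100), F(5, 100)]  # P(A | B_j)

# Denominator via the Law of Total Probability: P(A) = sum_j P(A | B_j) P(B_j)
P_A = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' Theorem: posterior P(B_j | A) for each candidate condition.
for j, (l, p) in enumerate(zip(likelihoods, priors), start=1):
    posterior = l * p / P_A
    print(f"P(B{j} | A) = {posterior} = {float(posterior):.3f}")
# The posteriors sum to 1 because the B_j partition the sample space.
```

Note how the rare condition \(B_1\) gains probability after observing \(A\) (its likelihood is high), yet the common condition \(B_3\) can still dominate through its prior; this interplay between prior and likelihood is exactly what the posterior captures.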