Random Variables

In machine learning, we can treat any unknown quantity as a random variable. A random variable assigns a numerical value to each possible outcome in the sample space. (Formally, it is a single-valued function on the sample space.) Usually, we denote a random variable by a capital letter (\(X\)) and a specific value taken by the random variable by the corresponding lowercase letter (\(x\)).

A random variable is discrete if the set of possible values it takes is finite or countably infinite. We can describe the collection of probabilities as a function of \(x\): \[ f(x) = P(X = x) \] We call \(f(x)\) the probability mass function (p.m.f.). A p.m.f. satisfies:

  1. \(f(x) \geq 0\) for all \(x\).
  2. \(\sum_{x} f(x) = 1\).
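As a quick sanity check, here is a minimal Python sketch of both properties, using a fair six-sided die as an assumed example (the die is not part of the definition, just an illustration):

```python
# p.m.f. of a fair six-sided die: f(x) = 1/6 for x = 1, ..., 6 (assumed example)
pmf = {x: 1/6 for x in range(1, 7)}

# Property 1: f(x) >= 0 for every x
assert all(p >= 0 for p in pmf.values())

# Property 2: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print(pmf[3])  # P(X = 3) = 1/6
```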

The cumulative distribution function (c.d.f.) associated with a p.m.f. is \[ F(x) = P(X \leq x) = \sum_{k \leq x} f(k). \] Note: for an integer-valued \(X\), \(P(a \leq X \leq b) = F(b) - F(a - 1)\).
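Continuing the assumed fair-die example, the c.d.f. is just a running sum of the p.m.f., and the note above holds because the die is integer-valued:

```python
from itertools import accumulate

pmf = {x: 1/6 for x in range(1, 7)}  # assumed fair-die p.m.f.

# F(x) = P(X <= x): running sums of the p.m.f. (dicts preserve insertion order)
cdf = dict(zip(pmf, accumulate(pmf.values())))

# P(2 <= X <= 4) = F(4) - F(2 - 1) = 3/6
a, b = 2, 4
assert abs((cdf[b] - cdf[a - 1]) - 3/6) < 1e-12
```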

A random variable is continuous if it can take any value in one or more intervals of real numbers. Since the possible values are uncountably infinite, we use a probability density function (p.d.f.) instead of a p.m.f. A p.d.f. satisfies:

  1. \(f(x) \geq 0\) for all \(x\) (unlike a p.m.f., a density may exceed \(1\)).
  2. \(\int_{- \infty}^\infty f(x)\,dx = 1\).
  3. \(P(a \leq X \leq b) = \int_{a}^b f(x)\,dx\) for any \(a \leq b\).

The cumulative distribution function (c.d.f.) associated with a p.d.f. is \[ F(x) = P(X \leq x) = \int_{- \infty}^x f(u)\,du. \] So, by the Fundamental Theorem of Calculus, \[ f(x) = \frac{dF(x)}{dx} \] wherever \(F\) is differentiable. Note: \(P(a \leq X \leq b) = \int_{a}^b f(x)\,dx = F(b) - F(a)\).
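A minimal numerical sketch of these relationships, assuming an exponential density \(f(x) = e^{-x}\) on \([0, \infty)\) as the example (any valid density would do):

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)        # assumed p.d.f. on [0, infinity)
F = lambda x: 1 - np.exp(-x)    # its c.d.f.

# Total probability integrates to 1 (property 2)
total, _ = quad(f, 0, np.inf)
assert abs(total - 1.0) < 1e-8

# P(a <= X <= b) = F(b) - F(a)
a, b = 0.5, 2.0
p, _ = quad(f, a, b)
assert abs(p - (F(b) - F(a))) < 1e-8

# Fundamental Theorem of Calculus: f(x) ~ dF/dx (central finite difference)
x, h = 1.0, 1e-6
assert abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) < 1e-5
```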

Expected Values

The expected value (mean) of a discrete random variable \(X\) is defined as \[ \mathbb{E}[X] = \mu = \sum_{x} x f(x), \] which is the weighted average of all possible values taken by \(X\), with weights given by their probabilities.
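For instance, with the assumed fair-die p.m.f., the weighted average works out to \(3.5\):

```python
pmf = {x: 1/6 for x in range(1, 7)}  # assumed fair-die p.m.f.

# E[X] = sum over x of x * f(x)
mean = sum(x * p for x, p in pmf.items())
assert abs(mean - 3.5) < 1e-12  # (1 + 2 + ... + 6) / 6 = 3.5
```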

For a continuous random variable, \[ \mathbb{E}[X] = \mu = \int x f(x) \,dx. \] We can think of the expected value as the center of gravity of the distribution of \(X\). In particular, if the distribution of \(X\) is symmetric and the mean exists, the mean equals the median.
Note (shown here for the continuous case; the discrete case is analogous): \[ \begin{align*} \mathbb{E}[aX+b] &= \int (ax+b) f(x) \,dx \\\\ &= a \int x f(x)\,dx + b \int f(x)\,dx \\\\ &= a\mathbb{E}[X]+ b, \end{align*} \] where the last step uses \(\int f(x)\,dx = 1\).
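A quick numerical check of the linearity rule \(\mathbb{E}[aX+b] = a\mathbb{E}[X] + b\), again with the assumed exponential density \(e^{-x}\) (for which \(\mathbb{E}[X] = 1\)) and arbitrary constants:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)  # assumed p.d.f. on [0, infinity); E[X] = 1
a, b = 3.0, -2.0          # arbitrary constants

mean, _ = quad(lambda x: x * f(x), 0, np.inf)
lhs, _ = quad(lambda x: (a * x + b) * f(x), 0, np.inf)

# E[aX + b] = a * E[X] + b
assert abs(lhs - (a * mean + b)) < 1e-8
```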

Variance

The variance of a random variable \(X\) is defined as \[ \sigma^2 = \text{Var}(X) = \mathbb{E}[(X - \mu)^2] \geq 0. \] An alternative expression follows by expanding the square: \[ \begin{align*} \mathbb{E}[(X - \mu)^2] &= \mathbb{E}[X^2 - 2\mu X + \mu^2] \\\\ &= \mathbb{E}[X^2] - 2\mu \mathbb{E}[X] + \mu^2 \\\\ &= \mathbb{E}[X^2] - 2\mu^2 + \mu^2 \\\\ &= \mathbb{E}[X^2] - \mu^2. \end{align*} \] Note: \(\text{Var}[c] = 0\) for any constant \(c\).
Also, we define the square root of the variance as the standard deviation: \[ \text{SD}[X] = \sigma = \sqrt{\text{Var}(X)}. \]
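The shortcut formula \(\text{Var}(X) = \mathbb{E}[X^2] - \mu^2\) is easy to confirm on the assumed die example (both forms give \(35/12\)):

```python
import math

pmf = {x: 1/6 for x in range(1, 7)}  # assumed fair-die p.m.f.

mu = sum(x * p for x, p in pmf.items())                   # E[X] = 3.5
var_def = sum((x - mu)**2 * p for x, p in pmf.items())    # E[(X - mu)^2]
var_alt = sum(x**2 * p for x, p in pmf.items()) - mu**2   # E[X^2] - mu^2

assert abs(var_def - var_alt) < 1e-12  # both equal 35/12
sd = math.sqrt(var_def)                # standard deviation
```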

Theorem 1: \[ \text{Var}[aX + b] = a^2 \text{Var}[X] \quad a,\, b \in \mathbb{R} \] Note: the additive constant \(b\) shifts the distribution of \(X\) but does not change its spread, so it does not affect the variance.
Proof: \[ \begin{align*} \text{Var}[aX + b] &= \mathbb{E}[(aX+b)^2] - (\mathbb{E}[aX+b])^2 \\\\ &= a^2\mathbb{E}[X^2] + 2ab\mathbb{E}[X] + b^2 - (a^2(\mathbb{E}[X])^2 + 2ab\mathbb{E}[X] + b^2)\\\\ &= a^2(\mathbb{E}[X^2] - (\mathbb{E}[X])^2 )\\\\ &= a^2 \text{Var}[X] \end{align*} \]
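A Monte Carlo sanity check of Theorem 1; the exponential distribution, sample size, and constants below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # X ~ Exp(1), so Var[X] = 1
a, b = 3.0, 7.0                                 # arbitrary constants

# Var[aX + b] matches a^2 * Var[X]; the shift b drops out entirely
print(np.var(a * x + b))   # ~ 9.0, up to sampling noise
print(a**2 * np.var(x))    # the same quantity via a^2 * Var[X]
```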