Student's t-Distribution - MATH-CS COMPASS

Student's \(t\)-Distribution

The normal distribution is sensitive to outliers. A robust alternative is the Student's \(t\)-distribution. Its probability density function is given by: \[ f(y | \mu, \sigma^2, \nu) \propto \left[ 1 + \frac{1}{\nu}\left(\frac{y-\mu}{\sigma}\right)^2 \right]^{-\frac{\nu+1}{2}} \] where \(\mu\) is the location parameter (the mean, when it exists), \(\sigma > 0\) is the scale parameter (NOT the standard deviation), and \(\nu > 0\) is the degrees of freedom.
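
The density above is stated only up to proportionality. As a minimal sketch, the following Python function evaluates the full density, assuming the standard normalizing constant \(\Gamma\left(\frac{\nu+1}{2}\right) / \left(\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma\right)\), which is not written out in the text. The function name `student_t_pdf` and its default arguments are illustrative choices, not part of any library.

```python
import math

def student_t_pdf(y, mu=0.0, sigma=1.0, nu=1.0):
    """Density of the location-scale Student's t-distribution at y.

    Hypothetical helper: mu is the location, sigma the scale,
    nu the degrees of freedom.
    """
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    z = (y - mu) / sigma
    # Log of the assumed normalizing constant; lgamma avoids overflow for large nu.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    # Log of the kernel [1 + z^2 / nu]^(-(nu + 1) / 2) from the formula above.
    log_kernel = -(nu + 1) / 2 * math.log1p(z * z / nu)
    return math.exp(log_norm + log_kernel)

# Density at the center for a few degrees of freedom.
for nu in (1.0, 5.0, 30.0):
    print(nu, student_t_pdf(0.0, nu=nu))
```

Working in log space is a design choice here: the gamma functions grow quickly, so taking their ratio directly can overflow for large \(\nu\).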

The Student's \(t\)-distribution has heavy tails (there is more probability mass in the tails than with a normal distribution), which makes it robust to outliers.

As the degrees of freedom \(\nu\) grow, the pdf converges to that of the normal distribution \(\mathcal{N}(\mu, \sigma^2)\). For \(\nu \gg 5\), it is already close to a normal distribution and loses its robustness properties.
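
To make the effect of \(\nu\) concrete, here is a small sketch that compares the standard \(t\) density (\(\mu = 0\), \(\sigma = 1\)) with the standard normal density. It assumes the usual normalizing constant for the \(t\) density. The helper names (`t_pdf`, `normal_pdf`, `max_gap`), the grid, and the chosen values of \(\nu\) are all illustrative.

```python
import math

def t_pdf(z, nu):
    """Standard Student's t density (mu = 0, sigma = 1)."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def max_gap(nu, lo=-6.0, hi=6.0, steps=1200):
    """Largest absolute difference between the two densities on a grid."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(t_pdf(z, nu) - normal_pdf(z)) for z in grid)

for nu in (1, 2, 5, 30, 100):
    # Columns: nu, largest density gap, and the tail ratio t / normal at z = 4.
    print(nu, round(max_gap(nu), 5), round(t_pdf(4.0, nu) / normal_pdf(4.0), 2))
```

The gap shrinks as \(\nu\) increases, and the tail ratio at \(z = 4\) falls toward 1, which is the loss of robustness described above.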

Note: \[ \text{mean } = \text{mode } = \mu, \quad \text{variance } = \frac{\nu \sigma^2}{\nu - 2}, \] where the mean exists only if \(\nu > 1\) and the variance exists only if \(\nu > 2\).
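
The variance formula also shows why \(\sigma\) is not the standard deviation: the factor \(\nu / (\nu - 2)\) is always greater than 1. A minimal sketch follows; the function name `student_t_variance` is hypothetical. Returning `inf` for \(1 < \nu \le 2\) and `nan` for \(\nu \le 1\) is a convention assumed here, not something stated in the text.

```python
import math

def student_t_variance(sigma, nu):
    """Variance nu * sigma^2 / (nu - 2) of the Student's t-distribution."""
    if sigma <= 0 or nu <= 0:
        raise ValueError("sigma and nu must be positive")
    if nu > 2:
        return nu * sigma ** 2 / (nu - 2)
    if nu > 1:
        return math.inf  # assumed convention: mean exists, second moment diverges
    return math.nan      # assumed convention: mean does not exist, variance undefined

sigma = 1.0
for nu in (1.0, 1.5, 3.0, 10.0, 100.0):
    var = student_t_variance(sigma, nu)
    # The standard deviation exceeds sigma and approaches it as nu grows.
    print(nu, var, math.sqrt(var) if math.isfinite(var) else var)
```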

Cauchy Distribution

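When \(\nu = 1\), the Student's \(t\)-distribution becomes the Cauchy distribution, with the scale parameter written as \(\gamma\) in place of \(\sigma\). Its probability density function is given by: \[ f(x | \mu, \gamma) = \frac{1}{\gamma \pi}\left[ 1 + \left(\frac{x - \mu}{\gamma}\right)^2\right]^{-1}. \] This pdf has heavy tails compared to that of the normal distribution. Consider the standard Cauchy distribution (\(\mu = 0, \quad \gamma = 1\)). Then \[ \begin{align*} \mathbb{E}[X] &= \int_{-\infty}^{\infty} x f(x)\,dx \\\\ &= \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x}{1 + x^2}\,dx. \end{align*} \] For large \(|x|\) the integrand behaves like \(1/x\), and \[ \int_{0}^{R} \frac{x}{1 + x^2}\,dx = \frac{1}{2}\ln(1 + R^2) \to \infty \quad \text{as } R \to \infty, \] so the positive and negative halves both diverge. In short, since the tails are so heavy, the integral does not converge and the mean does not exist. (See Improper Riemann integrals for the details.)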
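
The divergence can also be checked numerically. The sketch below evaluates the truncated integral \(\frac{1}{\pi}\int_a^b \frac{x}{1+x^2}\,dx\) exactly, using the antiderivative \(\frac{1}{2\pi}\ln(1 + x^2)\). The function name `truncated_mean_integral` and the choice of limits are illustrative.

```python
import math

def truncated_mean_integral(a, b):
    """Exact value of (1/pi) * integral from a to b of x / (1 + x^2) dx.

    Uses the antiderivative log(1 + x^2) / (2 * pi).
    """
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)

for r in (1e1, 1e3, 1e6):
    one_sided = truncated_mean_integral(0.0, r)    # grows without bound, like log(r) / pi
    symmetric = truncated_mean_integral(-r, r)     # cancels to 0 for every r
    lopsided = truncated_mean_integral(-r, 2 * r)  # tends to log(2) / pi instead of 0
    print(r, one_sided, symmetric, lopsided)
```

The symmetric and lopsided truncations settle on different values, so the result depends on how the limits are taken. That is what it means for the improper integral, and hence the mean, to be undefined.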

In Bayesian modeling, we often want a distribution over \(\mathbb{R}^+\) with heavy tails but finite density at the origin. In such a case, the half Cauchy distribution is useful. Its probability density function, for \(x \geq 0\), is given by: \[ f(x | \gamma) = \frac{2}{\pi \gamma} \left[ 1 + \left(\frac{x}{\gamma}\right)^2\right]^{-1}. \] (This is the Cauchy distribution with \(\mu = 0\) restricted to \(x \geq 0\); the factor of 2 renormalizes the density so that it integrates to 1 over the half-line.)
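
As a quick check of the half Cauchy density, here is a minimal sketch of the pdf together with its cdf \(F(x) = \frac{2}{\pi}\arctan(x/\gamma)\), which follows from integrating the pdf. The function names `half_cauchy_pdf` and `half_cauchy_cdf` are hypothetical.

```python
import math

def half_cauchy_pdf(x, gamma=1.0):
    """Half Cauchy density with scale gamma; zero for negative x."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if x < 0:
        return 0.0
    return 2.0 / (math.pi * gamma * (1.0 + (x / gamma) ** 2))

def half_cauchy_cdf(x, gamma=1.0):
    """Cumulative distribution function, obtained by integrating the pdf."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if x < 0:
        return 0.0
    return 2.0 / math.pi * math.atan(x / gamma)

gamma = 2.0
print(half_cauchy_pdf(0.0, gamma))    # finite density at the origin: 2 / (pi * gamma)
print(half_cauchy_cdf(gamma, gamma))  # half of the mass lies below x = gamma
print(half_cauchy_cdf(1e9, gamma))    # total mass approaches 1
```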

Laplace Distribution

Like the half Cauchy distribution, the Laplace distribution (also known as the double-sided exponential distribution) has heavier tails than the normal distribution, and for this reason it is also popular in some machine learning models such as robust linear regression.

Its pdf is given by: \[ \text{Laplace } ( y | \mu, b) = \frac{1}{2b}\exp \left( - \frac{|y - \mu|}{b}\right) \] where \(\mu\) is a location parameter and \(b > 0\) is a scale parameter. Note: \[ \text{mean } = \text{mode } = \mu, \quad \text{variance } = 2b^2. \]
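
To illustrate the robustness claim, the sketch below evaluates the Laplace pdf and its negative log-likelihood on a small made-up sample with one outlier. It relies on the known fact that the sample median minimizes \(\sum_i |y_i - \mu|\) and therefore maximizes the Laplace likelihood in \(\mu\). The data values and the function names `laplace_pdf` and `laplace_nll` are invented for illustration.

```python
import math
import statistics

def laplace_pdf(y, mu=0.0, b=1.0):
    """Laplace density with location mu and scale b."""
    if b <= 0:
        raise ValueError("b must be positive")
    return math.exp(-abs(y - mu) / b) / (2.0 * b)

def laplace_nll(data, mu, b=1.0):
    """Negative log-likelihood of the data under Laplace(mu, b)."""
    return sum(math.log(2.0 * b) + abs(y - mu) / b for y in data)

# Made-up sample: five values near 1 and a single outlier.
data = [1.0, 1.2, 0.9, 1.1, 1.0, 25.0]
mean = statistics.mean(data)
median = statistics.median(data)

# The outlier drags the mean away from the bulk of the data; the median barely moves.
print("mean:", mean, "NLL at mean:", laplace_nll(data, mean))
print("median:", median, "NLL at median:", laplace_nll(data, median))
```

The absolute-value penalty grows only linearly in the residual, whereas the Gaussian penalty grows quadratically, which is why a single outlier has less pull on a Laplace fit.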
