Student's \(t\)-Distribution
The normal distribution is sensitive to outliers. A robust alternative is
Student's \(t\)-distribution. Its probability density function is given by:
\[
f(y | \mu, \sigma^2, \nu) \propto \left[ 1 + \frac{1}{\nu}\left(\frac{y-\mu}{\sigma}\right)^2 \right]^{-\frac{\nu+1}{2}}
\]
where \(\mu\) is the location parameter (the mean, when it exists), \(\sigma > 0\) is the scale parameter (not the standard deviation), and \(\nu > 0\) is the degrees of freedom.
Student's \(t\)-distribution has heavy tails (more probability mass in the tails than
a normal distribution), which makes it robust to outliers.
As the degrees of freedom \(\nu\) grow, the density approaches that of the normal distribution. For \(\nu \gg 5\), the pdf is already close to
a normal distribution and loses its robustness properties.
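To make the heavy-tail claim concrete, the following minimal sketch compares the \(t\) density with the standard normal density five scale units from the center. The density above is given only up to proportionality; the normalizing constant used here, \(\Gamma\left(\frac{\nu+1}{2}\right) / \left(\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma\right)\), is the standard one and is an addition to the text. The function names are illustrative only.

```python
import math

def student_t_pdf(y, mu=0.0, sigma=1.0, nu=1.0):
    """Normalized Student's t density (location mu, scale sigma, nu degrees of freedom)."""
    z = (y - mu) / sigma
    # Log of the normalizing constant; lgamma avoids overflow for large nu.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Ratio of the t density to the normal density at y = 5 for several nu.
for nu in (1, 5, 30, 1000):
    print(nu, student_t_pdf(5.0, nu=nu) / normal_pdf(5.0))
```

For small \(\nu\) the ratio is large, meaning the \(t\)-distribution treats a point at \(y = 5\) as far less surprising than the normal does. As \(\nu\) grows, the ratio moves toward 1.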
Note:
\[
\text{mean } = \text{mode } = \mu, \quad \text{variance } = \frac{\nu \sigma^2}{\nu - 2}
\]
and the mean only exists if \(\nu > 1\) and the variance only exists if \(\nu > 2\).
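As a sanity check on the variance formula, and on the remark that \(\sigma\) is not the standard deviation, here is a rough sketch that compares the closed form with a midpoint-rule approximation of \(\int (y-\mu)^2 f(y)\,dy\) on a truncated range. The truncation width and step count are arbitrary choices, and the helper names are hypothetical.

```python
import math

def student_t_pdf(y, mu, sigma, nu):
    """Normalized Student's t density (standard normalizing constant)."""
    z = (y - mu) / sigma
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))

def t_variance(sigma, nu):
    """Closed-form variance nu * sigma^2 / (nu - 2); only valid for nu > 2."""
    if nu <= 2:
        raise ValueError("the variance formula requires nu > 2")
    return nu * sigma ** 2 / (nu - 2)

def numeric_variance(mu, sigma, nu, half_width=200.0, steps=40_000):
    """Midpoint-rule approximation of the second central moment on
    [mu - half_width * sigma, mu + half_width * sigma]."""
    lo = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += (y - mu) ** 2 * student_t_pdf(y, mu, sigma, nu)
    return total * h

sigma, nu = 2.0, 5.0
print(t_variance(sigma, nu), numeric_variance(0.0, sigma, nu))
print(math.sqrt(t_variance(sigma, nu)))  # standard deviation, larger than sigma
```

With \(\sigma = 2\) and \(\nu = 5\), the variance is \(20/3\), so the standard deviation is noticeably larger than the scale parameter.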
Cauchy Distribution
When \(\nu = 1\), Student's \(t\)-distribution is known as the Cauchy distribution, with the scale \(\sigma\) written as \(\gamma\). Its probability density function is given by:
\[
f(x | \mu, \gamma) = \frac{1}{\gamma \pi}\left[ 1 + \left(\frac{x - \mu}{\gamma}\right)^2\right]^{-1}.
\]
This pdf has heavy tails compared to those of the normal distribution. Consider the standard Cauchy distribution (\(\mu = 0\), \(\gamma = 1\)).
Then
\[
\begin{align*}
\mathbb{E }[X] &= \int_{-\infty}^{\infty} x f(x)dx \\\\
&= \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x}{1 + x^2} dx.
\end{align*}
\]
In short, because the tails are so heavy, this improper integral does not converge, so the mean does not exist.
(See Improper Riemann integrals for the details.)
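One way to see the failure of convergence is to truncate the integral to \([-a, b]\). Using the antiderivative \(\frac{1}{2\pi}\log(1 + x^2)\), the truncated integral equals \(\frac{1}{2\pi}\log\frac{1 + b^2}{1 + a^2}\), whose limit depends on how \(a\) and \(b\) go to infinity. The sketch below, with an illustrative function name, evaluates it for symmetric and asymmetric limits.

```python
import math

def truncated_mean_integral(a, b):
    """Integral of x / (pi * (1 + x^2)) over [-a, b],
    computed from the antiderivative log(1 + x^2) / (2 * pi)."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)

for a in (1e2, 1e4, 1e6):
    symmetric = truncated_mean_integral(a, a)       # limits grow at the same rate
    lopsided = truncated_mean_integral(a, 2 * a)    # upper limit grows twice as fast
    one_sided = truncated_mean_integral(0.0, a)     # positive half only
    print(a, symmetric, lopsided, one_sided)
```

The symmetric truncation is always zero, the lopsided one tends to \(\log 2 / \pi\), and the one-sided integral grows without bound. Since different truncations give different answers, no single value can be assigned to the mean.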
In Bayesian modeling, we often want a distribution over \(\mathbb{R}^+\) with heavy tails but finite density
at the origin. In such cases, the half-Cauchy distribution is useful. Its probability density
function is given by:
\[
f(x | \gamma) = \frac{2}{\pi \gamma} \left[ 1 + \left(\frac{x}{\gamma}\right)^2\right]^{-1}.
\]
(This is the Cauchy distribution with \(\mu = 0\), restricted to \(x \geq 0\) and renormalized, which accounts for the factor of 2.)
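A minimal sketch of the half-Cauchy density, its cumulative distribution function \(F(x) = \frac{2}{\pi}\arctan(x / \gamma)\), and inverse-CDF sampling follows. The CDF and the sampler are standard results added here for illustration rather than taken from the text, and the function names are hypothetical.

```python
import math
import random

def half_cauchy_pdf(x, gamma=1.0):
    """Half-Cauchy density; zero for negative x."""
    if x < 0:
        return 0.0
    return 2.0 / (math.pi * gamma * (1.0 + (x / gamma) ** 2))

def half_cauchy_cdf(x, gamma=1.0):
    """F(x) = (2 / pi) * arctan(x / gamma) for x >= 0."""
    if x < 0:
        return 0.0
    return 2.0 / math.pi * math.atan(x / gamma)

def half_cauchy_sample(gamma=1.0, rng=random):
    """Inverse-CDF sampling: solve F(x) = u for u uniform on [0, 1)."""
    u = rng.random()
    return gamma * math.tan(math.pi * u / 2.0)

rng = random.Random(0)
draws = [half_cauchy_sample(2.0, rng) for _ in range(5)]
print(half_cauchy_pdf(0.0, 2.0), half_cauchy_cdf(2.0, 2.0), draws)
```

The density at the origin is the finite value \(2 / (\pi\gamma)\), and the median equals \(\gamma\), which gives the scale parameter a direct interpretation when choosing a prior.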
Laplace Distribution
Like the half-Cauchy distribution, the Laplace distribution (also called the
double-sided exponential distribution) has heavier tails than the normal
distribution, which makes it popular in some machine learning models such as robust linear regression.
Its pdf is given by:
\[
\text{Laplace } ( y | \mu, b) = \frac{1}{2b}\exp \left( - \frac{|y - \mu|}{b}\right)
\]
where \(\mu\) is a location parameter and \(b > 0\) is a scale parameter.
Note:
\[
\text{mean } = \text{mode } = \mu, \quad \text{variance } = 2b^2.
\]
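The link to robustness can be sketched as follows. Under a Laplace model the negative log-likelihood is \(n \log(2b) + \frac{1}{b}\sum_i |y_i - \mu|\), so the maximum likelihood estimate of \(\mu\) is a sample median and that of \(b\) is the mean absolute deviation from it. These estimators are standard results added here for illustration; the function names and the toy data are made up.

```python
import math
import statistics

def laplace_pdf(y, mu=0.0, b=1.0):
    """Laplace density with location mu and scale b > 0."""
    return math.exp(-abs(y - mu) / b) / (2.0 * b)

def laplace_mle(data):
    """Maximum likelihood estimates under a Laplace model:
    mu_hat is a sample median, b_hat the mean absolute deviation from it."""
    mu_hat = statistics.median(data)
    b_hat = sum(abs(y - mu_hat) for y in data) / len(data)
    return mu_hat, b_hat

# Toy data with one gross outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
mu_hat, b_hat = laplace_mle(data)
print(mu_hat, b_hat, statistics.mean(data))
```

The outlier drags the sample mean (the location estimate under a normal model) far from the bulk of the data, while the Laplace location estimate stays at the median.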