Continuous Fourier Transform
The Fourier transform extends the ideas of Fourier series
from periodic functions to non-periodic functions defined on \(\mathbb{R}\). While Fourier series decompose
periodic signals into discrete frequency components, the Fourier transform decomposes general signals into
a continuous spectrum of frequencies.
For a function \(f: \mathbb{R} \to \mathbb{C}\) (or \(\mathbb{R} \to \mathbb{R}\)), the Fourier transform
\(\hat{f}\) is defined by:
\[
\hat{f}(\omega) = \mathcal{F}\{f\}(\omega) = \int_{-\infty}^{\infty} f(x)e^{-i\omega x} \, dx
\]
where \(\omega \in \mathbb{R}\) is the frequency variable (or angular frequency).
The inverse Fourier transform recovers \(f\) from \(\hat{f}\):
\[
f(x) = \mathcal{F}^{-1}\{\hat{f}\}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\omega)e^{i\omega x} \, d\omega
\]
Note: Different conventions exist for the placement of the factor \(\frac{1}{2\pi}\). Some authors place
\(\frac{1}{\sqrt{2\pi}}\) in both the forward and inverse transforms to make them symmetric. The convention
above is common in mathematics and engineering.
The Fourier transform is well-defined for functions in \(L^1(\mathbb{R})\) (absolutely integrable functions):
\[
\int_{-\infty}^{\infty} |f(x)| \, dx < \infty
\]
and can be extended to \(L^2(\mathbb{R})\) (square-integrable functions) using limiting procedures.
Example: Gaussian Function
Consider the Gaussian function:
\[
f(x) = e^{-\frac{x^2}{2}}
\]
Its Fourier transform is:
\[
\begin{align*}
\hat{f}(\omega) &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}e^{-i\omega x} \, dx \\\\
&= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2} - i\omega x} \, dx
\end{align*}
\]
Complete the square in the exponent:
\[
-\frac{x^2}{2} - i\omega x = -\frac{1}{2}(x^2 + 2i\omega x) = -\frac{1}{2}\left((x + i\omega)^2 - (i\omega)^2\right) = -\frac{(x + i\omega)^2}{2} - \frac{\omega^2}{2}
\]
Therefore:
\[
\hat{f}(\omega) = e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{(x+i\omega)^2}{2}} \, dx
\]
The remaining integral is the integral of \(e^{-z^2/2}\) along the horizontal line \(\operatorname{Im} z = \omega\). Because the integrand is entire and decays rapidly as \(|\operatorname{Re} z| \to \infty\), Cauchy's integral theorem lets us shift this contour down to the real axis without changing the value of the integral, so:
\[
\int_{-\infty}^{\infty} e^{-\frac{(x+i\omega)^2}{2}} \, dx = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} \, dx = \sqrt{2\pi}
\]
Thus:
\[
\hat{f}(\omega) = \sqrt{2\pi} \cdot e^{-\frac{\omega^2}{2}}
\]
This remarkable result shows that the Gaussian function is an eigenfunction of the Fourier
transform: its transform is the same Gaussian, scaled by the constant \(\sqrt{2\pi}\) under this convention.
This property makes Gaussian functions fundamental in signal processing, probability theory, and quantum mechanics.
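This result is easy to sanity-check numerically. The sketch below (a minimal illustration assuming NumPy and SciPy are available; fourier_transform is an ad hoc helper, not a library routine) approximates \(\hat{f}(\omega)\) by quadrature and compares it with the closed form \(\sqrt{2\pi}\,e^{-\omega^2/2}\):
```python
import numpy as np
from scipy.integrate import quad

def fourier_transform(f, omega):
    """Quadrature approximation of f_hat(omega) = int f(x) e^{-i omega x} dx."""
    # e^{-i omega x} = cos(omega x) - i sin(omega x); integrate real and imaginary parts separately
    real, _ = quad(lambda x: f(x) * np.cos(omega * x), -np.inf, np.inf)
    imag, _ = quad(lambda x: -f(x) * np.sin(omega * x), -np.inf, np.inf)
    return real + 1j * imag

gaussian = lambda x: np.exp(-x**2 / 2)

for omega in [0.0, 0.5, 1.0, 2.0]:
    numeric = fourier_transform(gaussian, omega)
    exact = np.sqrt(2 * np.pi) * np.exp(-omega**2 / 2)
    print(f"omega={omega:.1f}  numeric={numeric.real:.6f}  exact={exact:.6f}")
```
The imaginary part of the numeric result is zero up to quadrature error, as expected for an even real-valued \(f\).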
Properties of Fourier Transform
The Fourier transform has several important properties that make it a powerful tool for analysis:
1. Linearity:
\[
\mathcal{F}\{\alpha f + \beta g\} = \alpha \mathcal{F}\{f\} + \beta \mathcal{F}\{g\}
\]
for constants \(\alpha, \beta \in \mathbb{C}\).
2. Translation (Shift) Property:
\[
\mathcal{F}\{f(x - a)\}(\omega) = e^{-i\omega a}\hat{f}(\omega)
\]
A shift in time corresponds to a phase shift in frequency.
3. Modulation Property:
\[
\mathcal{F}\{e^{i\omega_0 x}f(x)\}(\omega) = \hat{f}(\omega - \omega_0)
\]
Multiplication by a complex exponential shifts the frequency spectrum.
4. Scaling Property:
\[
\mathcal{F}\{f(ax)\}(\omega) = \frac{1}{|a|}\hat{f}\left(\frac{\omega}{a}\right), \quad a \neq 0
\]
Compressing a signal in time stretches its frequency spectrum, and vice versa. This reciprocal relationship
between time and frequency resolution underlies the uncertainty principle in signal processing.
5. Differentiation Property:
\[
\mathcal{F}\{f'(x)\}(\omega) = i\omega \hat{f}(\omega)
\]
More generally, for the \(n\)-th derivative:
\[
\mathcal{F}\{f^{(n)}(x)\}(\omega) = (i\omega)^n \hat{f}(\omega)
\]
This transforms differentiation into multiplication, which is why Fourier transforms are useful for solving
differential equations.
6. Plancherel's Theorem (Generalized Parseval):
\[
\int_{-\infty}^{\infty} |f(x)|^2 \, dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat{f}(\omega)|^2 \, d\omega
\]
This generalizes Parseval's identity to the continuous case,
showing energy conservation between time and frequency domains. The Fourier transform is a
unitary operator on \(L^2(\mathbb{R})\) (up to normalization).
7. Duality (Fourier Inversion):
\[
\mathcal{F}\{\mathcal{F}\{f\}\}(x) = 2\pi f(-x)
\]
Applying the Fourier transform twice (up to normalization and sign change) returns the original function,
demonstrating a deep symmetry between time and frequency domains.
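These properties can be probed numerically by approximating the continuous transform with a Riemann sum on a finite grid. The following sketch (assuming NumPy; the grid extents and resolutions are arbitrary choices) checks the translation property and Plancherel's theorem for the Gaussian from the example above:
```python
import numpy as np

# Grids for a Riemann-sum approximation of the continuous transform
x = np.linspace(-15, 15, 2001)
dx = x[1] - x[0]
omega = np.linspace(-8, 8, 801)
domega = omega[1] - omega[0]

def ft(f_vals):
    """Riemann-sum approximation of f_hat(omega) = int f(x) e^{-i omega x} dx."""
    return (f_vals[None, :] * np.exp(-1j * np.outer(omega, x))).sum(axis=1) * dx

f = np.exp(-x**2 / 2)   # Gaussian test function
a = 1.5                 # shift amount

# Translation property: F{f(x - a)}(omega) = e^{-i omega a} * f_hat(omega)
lhs = ft(np.exp(-(x - a)**2 / 2))
rhs = np.exp(-1j * omega * a) * ft(f)
print("shift property, max error:", np.max(np.abs(lhs - rhs)))

# Plancherel: int |f|^2 dx = (1/2pi) int |f_hat|^2 domega  (both equal sqrt(pi) here)
energy_time = np.sum(np.abs(f)**2) * dx
energy_freq = np.sum(np.abs(ft(f))**2) * domega / (2 * np.pi)
print("Plancherel:", energy_time, "vs", energy_freq)
```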
Convolution Theorem
The convolution of two functions \(f\) and \(g\) is defined as:
\[
(f * g)(x) = \int_{-\infty}^{\infty} f(t)g(x-t) \, dt
\]
The convolution theorem states that the Fourier transform converts convolution into pointwise
multiplication:
\[
\mathcal{F}\{f * g\}(\omega) = \hat{f}(\omega) \cdot \hat{g}(\omega)
\]
and conversely:
\[
\mathcal{F}\{f \cdot g\}(\omega) = \frac{1}{2\pi}(\hat{f} * \hat{g})(\omega)
\]
This is one of the most important properties of the Fourier transform. In many applications, computing
a convolution directly is computationally expensive (\(O(n^2)\) operations for discrete signals), but
using the FFT to compute the transform, multiplying in frequency domain, and transforming back can be
much faster (\(O(n \log n)\) operations).
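A minimal illustration of this speed-up (assuming NumPy) computes a linear convolution directly and via zero-padded FFTs and confirms that the two agree:
```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)
g = rng.standard_normal(300)

# Direct (full) linear convolution: O(n*m) operations
direct = np.convolve(f, g)

# FFT-based convolution: zero-pad to the full output length, multiply
# the spectra pointwise, transform back: O((n+m) log(n+m)) operations
n_out = len(f) + len(g) - 1
fft_conv = np.fft.irfft(np.fft.rfft(f, n_out) * np.fft.rfft(g, n_out), n_out)

print("max difference:", np.max(np.abs(direct - fft_conv)))  # floating-point level
```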
The convolution theorem is fundamental in:
- Signal processing: Filtering operations are implemented as convolutions
- Image processing: Blurring, sharpening, and edge detection use convolution kernels
- Deep learning: Convolutional neural networks (CNNs) use convolution operations
- Probability theory: The PDF of a sum of independent random variables is the convolution
of their individual PDFs
Proof of Convolution Theorem:
Starting with the definition of the Fourier transform and interchanging the order of integration
(justified by Fubini's theorem when \(f, g \in L^1(\mathbb{R})\)):
\[
\begin{align*}
\mathcal{F}\{f * g\}(\omega) &= \int_{-\infty}^{\infty} (f * g)(x)e^{-i\omega x} \, dx \\\\
&= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} f(t)g(x-t) \, dt\right) e^{-i\omega x} \, dx \\\\
&= \int_{-\infty}^{\infty} f(t) \left(\int_{-\infty}^{\infty} g(x-t)e^{-i\omega x} \, dx\right) dt
\end{align*}
\]
Substituting \(u = x - t\) (so that \(dx = du\)) in the inner integral:
\[
\begin{align*}
&= \int_{-\infty}^{\infty} f(t) \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega (u+t)} \, du\right) dt \\\\
&= \int_{-\infty}^{\infty} f(t)e^{-i\omega t} \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega u} \, du\right) dt \\\\
&= \left(\int_{-\infty}^{\infty} f(t)e^{-i\omega t} \, dt\right) \cdot \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega u} \, du\right) \\\\
&= \hat{f}(\omega) \cdot \hat{g}(\omega)
\end{align*}
\]
Discrete Fourier Transform (DFT)
In practice, we often work with discrete, finite sequences of data rather than continuous functions.
The Discrete Fourier Transform (DFT) is the discrete analog of the continuous Fourier transform.
For a sequence of \(N\) complex numbers \(\{x_0, x_1, \ldots, x_{N-1}\}\), the DFT is defined as:
\[
X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}kn}, \quad k = 0, 1, \ldots, N-1
\]
The inverse DFT (IDFT) recovers the original sequence:
\[
x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k e^{\frac{2\pi i}{N}kn}, \quad n = 0, 1, \ldots, N-1
\]
The DFT can be expressed as a matrix multiplication. Define the \(N \times N\) DFT matrix:
\[
W = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \cdots & \omega^{N-1} \\
1 & \omega^2 & \omega^4 & \cdots & \omega^{2(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)^2}
\end{bmatrix}
\]
where \(\omega = e^{-\frac{2\pi i}{N}}\) is a primitive \(N\)-th root of unity.
Then the DFT can be written as:
\[
\mathbf{X} = W\mathbf{x}
\]
where \(\mathbf{x} = [x_0, x_1, \ldots, x_{N-1}]^T\) and \(\mathbf{X} = [X_0, X_1, \ldots, X_{N-1}]^T\).
The DFT matrix has remarkable properties:
- The columns of \(W\) are orthogonal (but not orthonormal)
- \(W^*W = NI\), where \(W^*\) is the conjugate transpose
- The inverse is \(W^{-1} = \frac{1}{N}W^*\)
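These identities are easy to verify for a small \(N\). The sketch below (assuming NumPy) builds \(W\) explicitly, compares \(W\mathbf{x}\) against np.fft.fft, and checks the orthogonality and inverse relations:
```python
import numpy as np

N = 8
n = np.arange(N)
# W[k, m] = omega^{k m} with omega = exp(-2*pi*i/N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.default_rng(1).standard_normal(N)

# DFT as a matrix-vector product vs. the library FFT
print(np.allclose(W @ x, np.fft.fft(x)))               # True

# Orthogonality of the columns: W^* W = N I, hence W^{-1} = W^* / N
print(np.allclose(W.conj().T @ W, N * np.eye(N)))      # True
print(np.allclose(np.linalg.inv(W), W.conj().T / N))   # True
```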
Direct computation of the DFT using this matrix multiplication requires \(O(N^2)\) operations. However,
the Fast Fourier Transform (FFT) algorithm reduces this to \(O(N \log N)\), making
Fourier analysis practical for large datasets.
Fast Fourier Transform (FFT)
The Fast Fourier Transform (FFT) is not a different transform, but rather an efficient
algorithm for computing the DFT. The most common FFT algorithm is the Cooley-Tukey algorithm,
which uses a divide-and-conquer approach.
The key insight is that the DFT of size \(N\) can be expressed in terms of two DFTs of size \(\frac{N}{2}\)
(assuming \(N\) is a power of 2). We split the sequence into even and odd indexed elements:
\[
\begin{align*}
X_k &= \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}kn} \\\\
&= \sum_{m=0}^{N/2-1} x_{2m} e^{-\frac{2\pi i}{N}k(2m)} + \sum_{m=0}^{N/2-1} x_{2m+1} e^{-\frac{2\pi i}{N}k(2m+1)} \\\\
&= \sum_{m=0}^{N/2-1} x_{2m} e^{-\frac{2\pi i}{N/2}km} + e^{-\frac{2\pi i}{N}k}\sum_{m=0}^{N/2-1} x_{2m+1} e^{-\frac{2\pi i}{N/2}km}
\end{align*}
\]
Define \(E_k\) as the \(k\)-th element of the DFT of the even-indexed subsequence and \(O_k\) as the \(k\)-th
element of the DFT of the odd-indexed subsequence. Then for \(k = 0, 1, \ldots, N/2-1\):
\[
X_k = E_k + e^{-\frac{2\pi i}{N}k}O_k
\]
and, using the periodicity of the half-size DFTs (\(E_{k+N/2} = E_k\), \(O_{k+N/2} = O_k\)) together with
\(e^{-\frac{2\pi i}{N}(k + N/2)} = -e^{-\frac{2\pi i}{N}k}\), for \(k = N/2, \ldots, N-1\):
\[
X_k = E_{k-N/2} - e^{-\frac{2\pi i}{N}(k-N/2)}O_{k-N/2}
\]
This decomposition can be applied recursively when \(N\) is a power of 2. The recurrence relation is:
\[
T(N) = 2T(N/2) + O(N)
\]
which gives:
\[
T(N) = O(N \log N)
\]
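A bare-bones recursive implementation of this decomposition (an illustrative sketch assuming NumPy and \(N\) a power of 2; production code should use a tuned library such as numpy.fft or FFTW) looks as follows:
```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    E = fft_radix2(x[0::2])                    # DFT of the even-indexed subsequence
    O = fft_radix2(x[1::2])                    # DFT of the odd-indexed subsequence
    k = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * k / N) * O  # e^{-2*pi*i*k/N} * O_k
    # X_k = E_k + twiddle_k for k < N/2, and X_{k+N/2} = E_k - twiddle_k
    return np.concatenate([E + twiddle, E - twiddle])

x = np.random.default_rng(2).standard_normal(1024)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))  # True
```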
The FFT algorithm revolutionized signal processing when it was popularized by Cooley and Tukey in 1965.
It made real-time digital signal processing feasible and is now fundamental to:
- Audio and video compression (MP3, JPEG, MPEG)
- Wireless communications (OFDM in WiFi, 4G/5G)
- Medical imaging (MRI reconstruction)
- Numerical solution of PDEs
Modern implementations like FFTW (Fastest Fourier Transform in the West) use advanced techniques
to achieve near-optimal performance on various hardware architectures.
Applications in Machine Learning
Fourier methods have numerous applications in modern machine learning and data science:
1. Random Fourier Features:
As mentioned in the classification section,
random Fourier features provide an explicit approximation to shift-invariant kernel functions.
For a shift-invariant kernel \(k(x, y) = k(x - y)\), Bochner's theorem states that \(k\)
can be represented as the Fourier transform of a non-negative measure. Specifically, if \(k\) is a continuous,
positive definite, shift-invariant kernel on \(\mathbb{R}^d\), then there exists a non-negative finite measure
\(\mu\) such that:
\[
k(\delta) = \int_{\mathbb{R}^d} e^{i\omega^T\delta} d\mu(\omega)
\]
For the RBF (Gaussian) kernel \(k(\delta) = e^{-\frac{\|\delta\|^2}{2\sigma^2}}\), the corresponding measure is the
Gaussian distribution \(\mathcal{N}(0, \sigma^{-2}I)\).
We can approximate the kernel using Monte Carlo sampling. For real-valued features:
\[
z_\omega(x) = \sqrt{2}\cos(\omega^Tx + b)
\]
where \(\omega\) is sampled from \(\mu\) and \(b \sim \text{Uniform}[0, 2\pi]\). The feature map is:
\[
\phi(x) = \sqrt{\frac{2}{D}}\begin{bmatrix} \cos(\omega_1^Tx + b_1) \\ \vdots \\ \cos(\omega_D^Tx + b_D) \end{bmatrix}
\]
which satisfies \(\mathbb{E}_{\omega, b}[z_\omega(x)z_\omega(y)] = k(x-y)\), so the inner product \(\phi(x)^T\phi(y)\) is an unbiased Monte Carlo estimate of \(k(x-y)\).
This technique, introduced by Rahimi and Recht (2007), allows kernel methods to scale to large datasets by
converting them into linear methods with complexity \(O(nD)\) instead of \(O(n^2)\).
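The sketch below (assuming NumPy; function and variable names are illustrative) samples \(\omega \sim \mathcal{N}(0, \sigma^{-2}I)\) and \(b \sim \text{Uniform}[0, 2\pi]\) as above and compares the Gram matrix \(\Phi\Phi^T\) of the random features with the exact RBF kernel matrix:
```python
import numpy as np

def rff_features(X, D, sigma, rng):
    """Map X (n x d) to D random Fourier features approximating the RBF kernel."""
    n, d = X.shape
    Omega = rng.normal(scale=1.0 / sigma, size=(d, D))  # omega ~ N(0, sigma^{-2} I)
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ Omega + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sigma = 1.0

# Exact RBF kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq_dists / (2 * sigma**2))

# RFF approximation: K ~= Phi Phi^T; the error shrinks as D grows
Phi = rff_features(X, D=2000, sigma=sigma, rng=rng)
K_approx = Phi @ Phi.T
print("max abs error:", np.max(np.abs(K_exact - K_approx)))
```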
2. Time Series Analysis:
Fourier transforms are essential for analyzing temporal patterns:
- Spectral analysis: Identifying dominant frequencies and periodicities in data
(used in finance, climate science, neuroscience); see the sketch after this list
- Seasonal decomposition: Separating trend, seasonal, and residual components
using Fourier basis functions
- Filtering: Removing noise or extracting specific frequency bands
- Forecasting: Using Fourier features as inputs to ML models for capturing periodic patterns
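As a concrete instance of the spectral-analysis point above, the following sketch (assuming NumPy; the synthetic series is purely illustrative) recovers the period of a noisy seasonal signal from the peak of its spectrum:
```python
import numpy as np

# Synthetic "monthly" series: a yearly cycle (period 12 samples) plus noise
rng = np.random.default_rng(0)
n = 240                                   # 20 "years" of monthly samples
t = np.arange(n)
x = 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=n)

# Real FFT of the de-meaned series and the corresponding frequencies (cycles/sample)
spectrum = np.abs(np.fft.rfft(x - x.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)

dominant = freqs[spectrum.argmax()]
print("dominant frequency:", dominant, "-> period:", 1 / dominant, "samples")  # period ~ 12
```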
3. Signal and Image Processing:
FFT-based methods are ubiquitous in modern data processing:
- Audio processing: Speech recognition systems convert audio to spectrograms
(frequency representations over time) using the Short-Time Fourier Transform (STFT); a minimal sketch appears after this list
- Image compression: JPEG uses the Discrete Cosine Transform (DCT), closely related
to the DFT, to compress images
- Data augmentation: Manipulating frequency components for augmenting training data
(e.g., SpecAugment for speech)
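To make the STFT idea concrete, here is a minimal windowed-FFT spectrogram written directly in NumPy (an illustrative sketch; real systems would use a dedicated STFT routine with proper overlap handling):
```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram via a hopped, windowed FFT (a minimal STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)

# Test signal: 50 Hz for the first second, 120 Hz for the second
fs = 1000
t = np.arange(0, 2.0, 1 / fs)
x = np.where(t < 1.0, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))

S = stft_magnitude(x)
freqs = np.fft.rfftfreq(256, d=1 / fs)
# The dominant frequency bin moves from ~50 Hz in early frames to ~120 Hz in later ones
print(freqs[S[0].argmax()], freqs[S[-1].argmax()])
```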
4. Efficient Computation in Neural Networks:
While direct application of FFT in neural networks is uncommon in practice, understanding the Fourier
perspective provides valuable intuition:
- Convolution operations in CNNs can theoretically be computed via FFT for very large kernels
- Spectral analysis helps understand which frequencies neural networks learn to represent
- Frequency-domain data augmentation (phase/amplitude manipulation) improves model robustness
Understanding Fourier methods provides essential intuition for signal processing, time series analysis,
and frequency-domain thinking—skills that are invaluable across machine learning, data science, and
computational applications.