Fourier Transform

Calculus to Optimization & Analysis

- Continuous Fourier Transform
- Properties of Fourier Transform
- Convolution Theorem
- Discrete Fourier Transform
- Fast Fourier Transform
- Applications in Machine Learning

Continuous Fourier Transform

The Fourier transform extends the ideas of Fourier series from periodic functions to non-periodic functions defined on \(\mathbb{R}\). While Fourier series decompose periodic signals into discrete frequency components, the Fourier transform decomposes general signals into a continuous spectrum of frequencies.

For a function \(f: \mathbb{R} \to \mathbb{C}\) (or \(\mathbb{R} \to \mathbb{R}\)), the Fourier transform \(\hat{f}\) is defined by: \[ \hat{f}(\omega) = \mathcal{F}\{f\}(\omega) = \int_{-\infty}^{\infty} f(x)e^{-i\omega x} \, dx \] where \(\omega \in \mathbb{R}\) is the frequency variable (or angular frequency).

The inverse Fourier transform recovers \(f\) from \(\hat{f}\): \[ f(x) = \mathcal{F}^{-1}\{\hat{f}\}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\omega)e^{i\omega x} \, d\omega \]
Note: Different conventions exist for the placement of the factor \(\frac{1}{2\pi}\). Some authors place \(\frac{1}{\sqrt{2\pi}}\) in both the forward and inverse transforms to make them symmetric. The convention above is common in mathematics and engineering.

The Fourier transform is well-defined for functions in \(L^1(\mathbb{R})\) (absolutely integrable functions): \[ \int_{-\infty}^{\infty} |f(x)| \, dx < \infty \] and can be extended to \(L^2(\mathbb{R})\) (square-integrable functions) using limiting procedures.

Example: Gaussian Function. Consider the Gaussian function: \[ f(x) = e^{-\frac{x^2}{2}} \] Its Fourier transform is: \[ \begin{align*} \hat{f}(\omega) &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}e^{-i\omega x} \, dx \\ &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2} - i\omega x} \, dx \end{align*} \] Complete the square in the exponent: \[ -\frac{x^2}{2} - i\omega x = -\frac{1}{2}(x^2 + 2i\omega x) = -\frac{1}{2}\left((x + i\omega)^2 - (i\omega)^2\right) = -\frac{(x + i\omega)^2}{2} - \frac{\omega^2}{2} \] Therefore: \[ \hat{f}(\omega) = e^{-\frac{\omega^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{(x+i\omega)^2}{2}} \, dx \] By Cauchy's integral theorem (contour integration in the complex plane), shifting the contour of integration does not change the value of the integral, so: \[ \int_{-\infty}^{\infty} e^{-\frac{(x+i\omega)^2}{2}} \, dx = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} \, dx = \sqrt{2\pi} \] Thus: \[ \hat{f}(\omega) = \sqrt{2\pi} \cdot e^{-\frac{\omega^2}{2}} \] This remarkable result shows that the Gaussian function is an eigenfunction of the Fourier transform: its transform is also a Gaussian (up to a constant). This property makes Gaussian functions fundamental in signal processing, probability theory, and quantum mechanics.
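The closed-form result above can be verified numerically. Below is a minimal sketch (the grid bounds, step size, and variable names are illustrative choices) that approximates \(\hat{f}(\omega) = \int f(x)e^{-i\omega x}\,dx\) by a Riemann sum on a truncated grid:

```python
import numpy as np

# Approximate the Fourier transform of f(x) = exp(-x^2/2) on a truncated
# grid; the integrand decays so fast that truncation error is negligible.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

omegas = np.linspace(-3.0, 3.0, 13)
f_hat = np.array([np.sum(f * np.exp(-1j * w * x)) * dx for w in omegas])

# Closed-form answer derived above: sqrt(2*pi) * exp(-omega^2/2)
expected = np.sqrt(2 * np.pi) * np.exp(-omegas**2 / 2)
max_err = np.max(np.abs(f_hat - expected))
```

The approximation is extremely accurate here because the Gaussian and its transform both decay super-exponentially, so truncation and discretization errors are far below floating-point precision.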

Properties of Fourier Transform

The Fourier transform has several important properties that make it a powerful tool for analysis:

1. Linearity: \[ \mathcal{F}\{\alpha f + \beta g\} = \alpha \mathcal{F}\{f\} + \beta \mathcal{F}\{g\} \] for constants \(\alpha, \beta \in \mathbb{C}\).

2. Translation (Shift) Property: \[ \mathcal{F}\{f(x - a)\}(\omega) = e^{-i\omega a}\hat{f}(\omega) \] A shift in time corresponds to a phase shift in frequency.

3. Modulation Property: \[ \mathcal{F}\{e^{i\omega_0 x}f(x)\}(\omega) = \hat{f}(\omega - \omega_0) \] Multiplication by a complex exponential shifts the frequency spectrum.

4. Scaling Property: \[ \mathcal{F}\{f(ax)\}(\omega) = \frac{1}{|a|}\hat{f}\left(\frac{\omega}{a}\right), \quad a \neq 0 \] Compressing a signal in time stretches its frequency spectrum, and vice versa. This reciprocal relationship underlies the uncertainty principle in signal processing: a signal cannot be simultaneously well-localized in both time and frequency.

5. Differentiation Property: \[ \mathcal{F}\{f'(x)\}(\omega) = i\omega \hat{f}(\omega) \] More generally, for the \(n\)-th derivative: \[ \mathcal{F}\{f^{(n)}(x)\}(\omega) = (i\omega)^n \hat{f}(\omega) \] This transforms differentiation into multiplication, which is why Fourier transforms are useful for solving differential equations.

6. Plancherel's Theorem (Generalized Parseval): \[ \int_{-\infty}^{\infty} |f(x)|^2 \, dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat{f}(\omega)|^2 \, d\omega \] This generalizes Parseval's identity to the continuous case, showing energy conservation between time and frequency domains. The Fourier transform is a unitary operator on \(L^2(\mathbb{R})\) (up to normalization).

7. Duality (Fourier Inversion): \[ \mathcal{F}\{\mathcal{F}\{f\}\}(x) = 2\pi f(-x) \] Applying the Fourier transform twice (up to normalization and sign change) returns the original function, demonstrating a deep symmetry between time and frequency domains.
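As a concrete check of Plancherel's theorem, the Gaussian pair \(f(x) = e^{-x^2/2}\), \(\hat{f}(\omega) = \sqrt{2\pi}e^{-\omega^2/2}\) from the earlier example can be compared numerically (the grids below are illustrative choices; both energies equal \(\sqrt{\pi}\)):

```python
import numpy as np

# Energies of the Gaussian pair in the time and frequency domains.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

w = np.linspace(-20.0, 20.0, 4001)
dw = w[1] - w[0]
f_hat = np.sqrt(2 * np.pi) * np.exp(-w**2 / 2)

energy_time = np.sum(np.abs(f)**2) * dx                    # integral of |f|^2 dx
energy_freq = np.sum(np.abs(f_hat)**2) * dw / (2 * np.pi)  # (1/2pi) * integral of |f_hat|^2 dw
```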

Convolution Theorem

The convolution of two functions \(f\) and \(g\) is defined as: \[ (f * g)(x) = \int_{-\infty}^{\infty} f(t)g(x-t) \, dt \]
The convolution theorem states that the Fourier transform converts convolution into pointwise multiplication: \[ \mathcal{F}\{f * g\}(\omega) = \hat{f}(\omega) \cdot \hat{g}(\omega) \] and conversely: \[ \mathcal{F}\{f \cdot g\}(\omega) = \frac{1}{2\pi}(\hat{f} * \hat{g})(\omega) \]
This is one of the most important properties of the Fourier transform. In many applications, computing a convolution directly is computationally expensive (\(O(n^2)\) operations for discrete signals), but using the FFT to compute the transform, multiplying in frequency domain, and transforming back can be much faster (\(O(n \log n)\) operations).
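The transform–multiply–invert recipe can be sketched with NumPy's FFT. Zero-padding both sequences to length \(n + m - 1\) ensures that the circular convolution computed by the DFT agrees with ordinary (linear) convolution; the signals and sizes below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(400)
g = rng.standard_normal(150)

# Pad to the full linear-convolution length so wrap-around does not occur.
n = len(f) + len(g) - 1
conv_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

# Direct O(n^2) computation for comparison.
conv_direct = np.convolve(f, g)
```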

The convolution theorem is fundamental in signal filtering, in fast convolution via the FFT, and in probability theory, where the density of a sum of independent random variables is the convolution of their densities.

Proof of Convolution Theorem: Starting with the definition of the Fourier transform: \[ \begin{align*} \mathcal{F}\{f * g\}(\omega) &= \int_{-\infty}^{\infty} (f * g)(x)e^{-i\omega x} \, dx \\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} f(t)g(x-t) \, dt\right) e^{-i\omega x} \, dx \\ &= \int_{-\infty}^{\infty} f(t) \left(\int_{-\infty}^{\infty} g(x-t)e^{-i\omega x} \, dx\right) dt \end{align*} \] where the order of integration is interchanged by Fubini's theorem (valid for \(f, g \in L^1(\mathbb{R})\)). Let \(u = x - t\), so \(dx = du\) and: \[ \begin{align*} &= \int_{-\infty}^{\infty} f(t) \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega (u+t)} \, du\right) dt \\ &= \int_{-\infty}^{\infty} f(t)e^{-i\omega t} \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega u} \, du\right) dt \\ &= \left(\int_{-\infty}^{\infty} f(t)e^{-i\omega t} \, dt\right) \cdot \left(\int_{-\infty}^{\infty} g(u)e^{-i\omega u} \, du\right) \\ &= \hat{f}(\omega) \cdot \hat{g}(\omega) \end{align*} \]

Discrete Fourier Transform (DFT)

In practice, we often work with discrete, finite sequences of data rather than continuous functions. The Discrete Fourier Transform (DFT) is the discrete analog of the continuous Fourier transform.

For a sequence of \(N\) complex numbers \(\{x_0, x_1, \ldots, x_{N-1}\}\), the DFT is defined as: \[ X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}kn}, \quad k = 0, 1, \ldots, N-1 \]
The inverse DFT (IDFT) recovers the original sequence: \[ x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k e^{\frac{2\pi i}{N}kn}, \quad n = 0, 1, \ldots, N-1 \]
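The two sums above translate directly into code. The following naive \(O(N^2)\) implementation (a sketch; the function names are mine) mirrors the definitions, using the identity \(\text{IDFT}(X) = \frac{1}{N}\overline{\text{DFT}(\overline{X})}\) for the inverse:

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n exp(-2*pi*i*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    X = np.zeros(N, dtype=complex)
    for k in range(N):
        for n in range(N):
            X[k] += x[n] * np.exp(-2j * np.pi * k * n / N)
    return X

def idft(X):
    """Inverse DFT via conjugation: IDFT(X) = conj(DFT(conj(X))) / N."""
    X = np.asarray(X, dtype=complex)
    return np.conj(dft(np.conj(X))) / len(X)

x = np.random.default_rng(1).standard_normal(32)
X = dft(x)
```

NumPy's `np.fft.fft` uses the same sign convention, so the two agree up to rounding error.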
The DFT can be expressed as a matrix multiplication. Define the \(N \times N\) DFT matrix: \[ W = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{N-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)^2} \end{bmatrix} \] where \(\omega = e^{-\frac{2\pi i}{N}}\) is a primitive \(N\)-th root of unity.

Then the DFT can be written as: \[ \mathbf{X} = W\mathbf{x} \] where \(\mathbf{x} = [x_0, x_1, \ldots, x_{N-1}]^T\) and \(\mathbf{X} = [X_0, X_1, \ldots, X_{N-1}]^T\).
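The matrix form can be built explicitly (a sketch with an arbitrary small \(N\)):

```python
import numpy as np

N = 16
n = np.arange(N)
omega = np.exp(-2j * np.pi / N)   # primitive N-th root of unity
W = omega ** np.outer(n, n)       # W[j, k] = omega^(j*k)

x = np.random.default_rng(2).standard_normal(N)
X = W @ x                         # the DFT as a matrix-vector product
```

Since \(W\) is symmetric and \(\frac{1}{\sqrt{N}}W\) is unitary, \(W^{-1} = \frac{1}{N}\overline{W}\), which is exactly the inverse DFT matrix.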

The DFT matrix has remarkable properties:

- \(W\) is symmetric: \(W^T = W\)
- \(\frac{1}{\sqrt{N}}W\) is unitary, so the inverse DFT is given by \(W^{-1} = \frac{1}{N}\overline{W}\)
- Its columns (and rows) are mutually orthogonal, each with norm \(\sqrt{N}\)

Direct computation of the DFT using this matrix multiplication requires \(O(N^2)\) operations. However, the Fast Fourier Transform (FFT) algorithm reduces this to \(O(N \log N)\), making Fourier analysis practical for large datasets.

Fast Fourier Transform (FFT)

The Fast Fourier Transform (FFT) is not a different transform, but rather an efficient algorithm for computing the DFT. The most common FFT algorithm is the Cooley-Tukey algorithm, which uses a divide-and-conquer approach.

The key insight is that the DFT of size \(N\) can be expressed in terms of two DFTs of size \(\frac{N}{2}\) (assuming \(N\) is a power of 2). We split the sequence into even and odd indexed elements: \[ \begin{align*} X_k &= \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}kn} \\ &= \sum_{m=0}^{N/2-1} x_{2m} e^{-\frac{2\pi i}{N}k(2m)} + \sum_{m=0}^{N/2-1} x_{2m+1} e^{-\frac{2\pi i}{N}k(2m+1)} \\ &= \sum_{m=0}^{N/2-1} x_{2m} e^{-\frac{2\pi i}{N/2}km} + e^{-\frac{2\pi i}{N}k}\sum_{m=0}^{N/2-1} x_{2m+1} e^{-\frac{2\pi i}{N/2}km} \end{align*} \] Define \(E_k\) as the \(k\)-th element of the DFT of the even-indexed subsequence and \(O_k\) as the \(k\)-th element of the DFT of the odd-indexed subsequence. Then for \(k = 0, 1, \ldots, N/2-1\): \[ X_k = E_k + e^{-\frac{2\pi i}{N}k}O_k \] and using the periodicity of the DFT, for \(k = N/2, \ldots, N-1\): \[ X_k = E_{k-N/2} - e^{-\frac{2\pi i}{N}(k-N/2)}O_{k-N/2} \]
This decomposition can be applied recursively when \(N\) is a power of 2. The recurrence relation is: \[ T(N) = 2T(N/2) + O(N) \] which gives: \[ T(N) = O(N \log N) \]
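The even/odd split above yields a compact recursive implementation (a sketch assuming \(N\) is a power of 2; the names are mine):

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    E = fft_recursive(x[0::2])   # DFT of even-indexed elements
    O = fft_recursive(x[1::2])   # DFT of odd-indexed elements
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # Butterfly step: + for the first half, - for the second half
    # (using the periodicity of E and O).
    return np.concatenate([E + twiddle * O, E - twiddle * O])

x = np.random.default_rng(3).standard_normal(256)
```

Production libraries use iterative, cache-aware variants of this idea, but the recursion above is the algorithmic core.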
The FFT algorithm revolutionized signal processing when it was popularized by Cooley and Tukey in 1965. It made real-time digital signal processing feasible and is now fundamental to:

- Audio and image compression (e.g., MP3, JPEG)
- Telecommunications (e.g., OFDM modulation)
- Medical imaging (e.g., MRI reconstruction)
- Spectral analysis and digital filtering
- Fast polynomial and large-integer multiplication

Modern implementations like FFTW (Fastest Fourier Transform in the West) use advanced techniques to achieve near-optimal performance on various hardware architectures.

Applications in Machine Learning

Fourier methods have numerous applications in modern machine learning and data science:

1. Random Fourier Features:
As mentioned in the classification section, random Fourier features provide an explicit approximation to shift-invariant kernel functions. For a shift-invariant kernel \(k(x, y) = k(x - y)\), Bochner's theorem states that \(k\) can be represented as the Fourier transform of a non-negative measure. Specifically, if \(k\) is a continuous, positive definite, shift-invariant kernel on \(\mathbb{R}^d\), then there exists a non-negative finite measure \(\mu\) such that: \[ k(\delta) = \int_{\mathbb{R}^d} e^{i\omega^T\delta} d\mu(\omega) \]
For the RBF (Gaussian) kernel \(k(\delta) = e^{-\frac{\|\delta\|^2}{2\sigma^2}}\), the measure corresponds to \(\mu \sim \mathcal{N}(0, \sigma^{-2}I)\).

We can approximate the kernel using Monte Carlo sampling. For real-valued features: \[ z_\omega(x) = \sqrt{2}\cos(\omega^Tx + b) \] where \(\omega\) is sampled from \(\mu\) and \(b \sim \text{Uniform}[0, 2\pi]\). The feature map is: \[ \phi(x) = \sqrt{\frac{2}{D}}\begin{bmatrix} \cos(\omega_1^Tx + b_1) \\ \vdots \\ \cos(\omega_D^Tx + b_D) \end{bmatrix} \] which satisfies \(\mathbb{E}_{\omega, b}[z_\omega(x)z_\omega(y)] = k(x-y)\).

This technique, introduced by Rahimi and Recht (2007), allows kernel methods to scale to large datasets by converting them into linear methods with complexity \(O(nD)\) instead of \(O(n^2)\).
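The construction above can be sketched for the RBF kernel as follows (the dimension, bandwidth, and feature count \(D\) are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
d, D, sigma = 5, 5000, 1.0                 # input dim, feature count, bandwidth

# Sample frequencies from the spectral measure of the RBF kernel,
# omega ~ N(0, sigma^{-2} I), and phases b ~ Uniform[0, 2*pi].
omega = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(x):
    """Random Fourier feature map: sqrt(2/D) * cos(omega @ x + b)."""
    return np.sqrt(2.0 / D) * np.cos(omega @ x + b)

x = rng.standard_normal(d)
y = rng.standard_normal(d)
k_true = np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))
k_approx = phi(x) @ phi(y)                 # inner product approximates k(x, y)
```

By the Monte Carlo argument above, the approximation error shrinks like \(O(1/\sqrt{D})\), so larger \(D\) trades computation for accuracy.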

2. Time Series Analysis:
Fourier transforms are essential for analyzing temporal patterns:

- Detecting periodicity and seasonality in time series
- Estimating power spectral density
- Filtering noise in selected frequency bands
- Extracting spectral features (e.g., spectrograms) for downstream models

3. Signal and Image Processing:
FFT-based methods are ubiquitous in modern data processing:

- Image filtering, sharpening, and compression
- Audio feature extraction (e.g., spectrograms, MFCCs)
- Fast convolution and cross-correlation of large signals

4. Efficient Computation in Neural Networks:
While direct application of the FFT inside neural networks is uncommon in practice, the Fourier perspective provides valuable intuition:

- By the convolution theorem, convolutional layers act as pointwise multiplication in the frequency domain, and FFT-based convolution can be faster for very large kernels
- Sinusoidal positional encodings and Fourier feature mappings inject explicit frequency information into models
- Empirical studies of training dynamics suggest networks tend to fit low-frequency components of the target first (spectral bias)

Understanding Fourier methods provides essential intuition for signal processing, time series analysis, and frequency-domain thinking—skills that are invaluable across machine learning, data science, and computational applications.