Fourier Transform


Contents:
- Continuous Fourier Transform
- Properties of Fourier Transform
- Convolution Theorem
- Discrete Fourier Transform
- Fast Fourier Transform
- Applications in Machine Learning
- Interactive Demos

Continuous Fourier Transform

The Fourier transform extends the ideas of Fourier series from periodic functions to non-periodic functions defined on \(\mathbb{R}\). While Fourier series decompose periodic signals into discrete frequency components, the Fourier transform decomposes general signals into a continuous spectrum of frequencies.

Motivation from Fourier Series: Consider a function \(f_L\) that is periodic with period \(2L\). It has the complex Fourier series: \[ f_L(x) = \sum_{n=-\infty}^{\infty} c_n e^{-i\frac{n\pi x}{L}}, \quad c_n = \frac{1}{2L}\int_{-L}^{L} f_L(x)e^{i\frac{n\pi x}{L}} \, dx \] As \(L \to \infty\), the discrete frequencies \(\omega_n = \frac{n\pi}{L}\) become densely packed with spacing \(\Delta\omega = \frac{\pi}{L} \to 0\), and the sum approaches an integral. This limiting process yields the Fourier transform.
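
Concretely, writing \(\hat{f}_L(\omega_n) = \int_{-L}^{L} f_L(x)e^{i\omega_n x}\,dx = 2L\,c_n\) and using \(\frac{1}{2L} = \frac{\Delta\omega}{2\pi}\), the series becomes \[ f_L(x) = \frac{1}{2\pi}\sum_{n=-\infty}^{\infty} \hat{f}_L(\omega_n)\,e^{-i\omega_n x}\,\Delta\omega, \] a Riemann sum over frequency that formally converges to the inversion integral below as \(L \to \infty\).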

For a function \(f: \mathbb{R} \to \mathbb{C}\), we define: \[ \boxed{ \hat{f}(\xi) = \mathcal{F}\{f\}(\xi) = \int_{-\infty}^{\infty} f(x)e^{i x\xi} \, dx } \] where \(\xi \in \mathbb{R}\) is the frequency variable.

The inverse Fourier transform recovers \(f\) from \(\hat{f}\): \[ \boxed{ f(x) = \mathcal{F}^{-1}\{\hat{f}\}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\xi)e^{-i x\xi} \, d\xi } \]
Note on Conventions: As discussed in Part 14, we use the mathematical/PDE convention with positive sign in the forward transform and negative in the inverse. Engineering and physics often use the opposite convention: \(\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)e^{-i\omega t} \, dt\). Both are mathematically equivalent—just be consistent within your work.

Function Spaces: The Fourier transform is well-defined for different function classes:
- \(L^1(\mathbb{R})\) (absolutely integrable functions): the defining integral converges absolutely, and \(\hat{f}\) is bounded and continuous.
- \(L^2(\mathbb{R})\) (square-integrable functions): the transform extends by density to a unitary map, the setting of Plancherel's theorem below.
- The Schwartz space \(\mathcal{S}(\mathbb{R})\) of smooth, rapidly decaying functions: the Fourier transform maps \(\mathcal{S}\) bijectively onto itself, and by duality extends to tempered distributions.

Example: Gaussian Function Consider the Gaussian function: \[ f(x) = e^{-\frac{x^2}{2\sigma^2}} \] Its Fourier transform (using the mathematical convention) is: \[ \begin{align*} \hat{f}(\xi) &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}e^{i\xi x} \, dx \\\\ &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2} + i\xi x} \, dx \end{align*} \] Complete the square in the exponent: \[ -\frac{x^2}{2\sigma^2} + i\xi x = -\frac{1}{2\sigma^2}(x^2 - 2i\sigma^2\xi x) = -\frac{1}{2\sigma^2}((x - i\sigma^2\xi)^2 + \sigma^4\xi^2) \] Therefore: \[ \hat{f}(\xi) = e^{-\frac{\sigma^2\xi^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{(x-i\sigma^2\xi)^2}{2\sigma^2}} \, dx \] By contour integration (the integrand is analytic and decays exponentially), shifting the path doesn't change the value: \[ \int_{-\infty}^{\infty} e^{-\frac{(x-i\sigma^2\xi)^2}{2\sigma^2}} \, dx = \int_{-\infty}^{\infty} e^{-\frac{u^2}{2\sigma^2}} \, du = \sigma\sqrt{2\pi} \] Thus: \[ \boxed{\hat{f}(\xi) = \sigma\sqrt{2\pi} \cdot e^{-\frac{\sigma^2\xi^2}{2}}} \] This shows that the Gaussian is essentially an eigenfunction of the Fourier transform: its transform is also Gaussian, with inverse width (narrow in space ↔ wide in frequency). This property makes Gaussians fundamental in uncertainty principles, quantum mechanics, and signal processing.
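
A quick numerical check of this transform pair (a minimal sketch; the value of \(\sigma\), the truncation window, and the grid size below are arbitrary choices). Note that np.fft uses the opposite sign convention, but since the Gaussian is even, both conventions give the same transform; the phase factor compensates for the sampling grid starting at \(-L\) rather than \(0\):

                            import numpy as np

                            sigma, L_half, N = 1.5, 50.0, 4096
                            x = np.linspace(-L_half, L_half, N, endpoint=False)
                            dx = x[1] - x[0]
                            f = np.exp(-x**2 / (2 * sigma**2))

                            # Riemann-sum approximation of the continuous transform via the DFT;
                            # the phase factor accounts for the grid starting at -L_half instead of 0
                            xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
                            f_hat = dx * np.exp(1j * xi * L_half) * np.fft.fft(f)

                            expected = sigma * np.sqrt(2 * np.pi) * np.exp(-sigma**2 * xi**2 / 2)
                            print(np.max(np.abs(f_hat.real - expected)))   # ~1e-14: matches the boxed formula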

Properties of Fourier Transform

The Fourier transform has several important properties that make it a powerful tool for analysis. We state these using the mathematical convention \(\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)e^{ix\xi} \, dx\):

1. Linearity: \[ \mathcal{F}\{\alpha f + \beta g\} = \alpha \mathcal{F}\{f\} + \beta \mathcal{F}\{g\} \] for constants \(\alpha, \beta \in \mathbb{C}\).

2. Translation (Shift) Property: \[ \mathcal{F}\{f(x - a)\}(\xi) = e^{ia\xi}\hat{f}(\xi) \] A shift in space corresponds to a pure phase factor in frequency (with the engineering convention the phase is \(e^{-ia\xi}\)).

3. Modulation Property: \[ \mathcal{F}\{e^{-i\xi_0 x}f(x)\}(\xi) = \hat{f}(\xi - \xi_0) \] Multiplication by a complex exponential shifts the frequency spectrum (in this convention, modulating by \(e^{-i\xi_0 x}\) centers the spectrum at \(\xi_0\)).

4. Scaling Property: \[ \mathcal{F}\{f(ax)\}(\xi) = \frac{1}{|a|}\hat{f}\left(\frac{\xi}{a}\right), \quad a \neq 0 \] This embodies the uncertainty principle: compressing a signal in time stretches its frequency spectrum, and vice versa. One cannot localize a signal arbitrarily well in both time and frequency simultaneously.

5. Differentiation Property: \[ \mathcal{F}\{f'(x)\}(\xi) = -i\xi \hat{f}(\xi) \] More generally, for the \(n\)-th derivative: \[ \mathcal{F}\{f^{(n)}(x)\}(\xi) = (-i\xi)^n \hat{f}(\xi) \] This transforms differentiation into algebraic multiplication, which is why Fourier methods are powerful for solving differential equations. Note the sign difference from the engineering convention; a numerical check follows this list of properties.

6. Multiplication by \(x^n\) Property: \[ \mathcal{F}\{x^n f(x)\}(\xi) = (-i)^n \frac{d^n}{d\xi^n}\hat{f}(\xi) \] Multiplication by powers of \(x\) becomes differentiation in the frequency domain.

7. Plancherel's Theorem (Parseval for Non-periodic Functions):
For \(f, g \in L^2(\mathbb{R})\): \[ \int_{-\infty}^{\infty} f(x)\overline{g(x)} \, dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\xi)\overline{\hat{g}(\xi)} \, d\xi \] In particular, setting \(f = g\): \[ \boxed{\|f\|_{L^2}^2 = \int_{-\infty}^{\infty} |f(x)|^2 \, dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat{f}(\xi)|^2 \, d\xi = \frac{1}{2\pi}\|\hat{f}\|_{L^2}^2} \] This generalizes Parseval's identity and shows that the Fourier transform preserves energy (up to the factor \(2\pi\)). The map \(f \mapsto \frac{1}{\sqrt{2\pi}}\hat{f}\) is a unitary operator on \(L^2(\mathbb{R})\).

8. Fourier Inversion Theorem:
For suitable functions (e.g., \(f \in L^1(\mathbb{R})\) with \(\hat{f} \in L^1(\mathbb{R})\)): \[ \mathcal{F}\{\mathcal{F}\{f\}(\xi)\}(x) = 2\pi f(-x) \] Applying the Fourier transform twice (up to normalization) gives a reflection of the original function, demonstrating deep symmetry between spatial and frequency domains.

9. Riemann-Lebesgue Lemma:
For \(f \in L^1(\mathbb{R})\): \[ \lim_{|\xi| \to \infty} \hat{f}(\xi) = 0 \] The Fourier transform of an integrable function vanishes at infinity. This has important implications for the smoothness-decay duality: smoother functions have faster-decaying transforms.
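
As promised under property 5, here is a minimal numerical check using spectral differentiation of a periodic signal (the grid size and test signal are illustrative choices). Caution: np.fft uses the engineering convention \(e^{-ix\xi}\) in the forward transform, so differentiation corresponds to multiplication by \(+ik\) rather than \(-i\xi\):

                            import numpy as np

                            # Property 5 in action: spectral differentiation of a periodic signal
                            N = 256
                            x = np.linspace(0, 2 * np.pi, N, endpoint=False)
                            f = np.sin(3 * x) + 0.5 * np.cos(5 * x)
                            exact = 3 * np.cos(3 * x) - 2.5 * np.sin(5 * x)

                            k = 2 * np.pi * np.fft.fftfreq(N, d=2 * np.pi / N)   # integer wavenumbers
                            deriv = np.fft.ifft(1j * k * np.fft.fft(f)).real
                            print(np.max(np.abs(deriv - exact)))                 # ~1e-13 (spectral accuracy)

                            # Property 8 (discrete analog): applying the DFT twice returns
                            # N times the index-reversed signal, mirroring F{F{f}}(x) = 2*pi*f(-x)
                            print(np.allclose(np.fft.fft(np.fft.fft(f)), N * f[(-np.arange(N)) % N]))  # True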

Convolution Theorem

The convolution of two functions \(f, g: \mathbb{R} \to \mathbb{C}\) is defined as: \[ (f * g)(x) = \int_{-\infty}^{\infty} f(y)g(x-y) \, dy = \int_{-\infty}^{\infty} f(x-y)g(y) \, dy \] (The second equality shows convolution is commutative.)

The convolution theorem states that the Fourier transform converts convolution into pointwise multiplication: \[ \boxed{\mathcal{F}\{f * g\}(\xi) = \hat{f}(\xi) \cdot \hat{g}(\xi)} \] and conversely, the Fourier transform of a product is a (scaled) convolution: \[ \boxed{\mathcal{F}\{f \cdot g\}(\xi) = \frac{1}{2\pi}(\hat{f} * \hat{g})(\xi)} \]
This is one of the most important results in applied mathematics. It allows us to:
- compute convolutions in \(O(N \log N)\) time via the FFT instead of \(O(N^2)\) directly;
- analyze linear time-invariant systems, whose output is the convolution of the input with an impulse response;
- design filters by specifying a frequency response and multiplying spectra.

Applications in Computer Science and ML: fast convolution underpins large-kernel convolutions in neural networks, polynomial and big-integer multiplication, image filtering, and the FFT-based sequence models discussed later in this part.

Proof of Convolution Theorem: Using the mathematical convention with \(\hat{f}(\xi) = \int f(x)e^{ix\xi} \, dx\): \[ \begin{align*} \mathcal{F}\{f * g\}(\xi) &= \int_{-\infty}^{\infty} (f * g)(x)e^{ix\xi} \, dx \\\\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} f(y)g(x-y) \, dy\right) e^{ix\xi} \, dx \\\\ &= \int_{-\infty}^{\infty} f(y) \left(\int_{-\infty}^{\infty} g(x-y)e^{ix\xi} \, dx\right) dy \end{align*} \] Substituting \(u = x - y\) (so \(x = u + y\) and \(dx = du\)): \[ \begin{align*} &= \int_{-\infty}^{\infty} f(y) \left(\int_{-\infty}^{\infty} g(u)e^{i(u+y)\xi} \, du\right) dy \\\\ &= \int_{-\infty}^{\infty} f(y)e^{iy\xi} \left(\int_{-\infty}^{\infty} g(u)e^{iu\xi} \, du\right) dy \\\\ &= \left(\int_{-\infty}^{\infty} f(y)e^{iy\xi} \, dy\right) \cdot \left(\int_{-\infty}^{\infty} g(u)e^{iu\xi} \, du\right) \\\\ &= \hat{f}(\xi) \cdot \hat{g}(\xi) \end{align*} \] The interchange of integration order is justified by Fubini's theorem when \(f, g \in L^1(\mathbb{R})\).
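
Numerically, the discrete analog is immediate: a linear convolution equals the inverse DFT of the product of zero-padded DFTs. A minimal check (the array lengths below are arbitrary):

                            import numpy as np

                            rng = np.random.default_rng(0)
                            f = rng.standard_normal(100)
                            g = rng.standard_normal(50)

                            direct = np.convolve(f, g)                  # O(mn) direct linear convolution
                            n = len(f) + len(g) - 1                     # pad to avoid circular wraparound
                            via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

                            print(np.max(np.abs(direct - via_fft)))     # ~1e-13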

Discrete Fourier Transform (DFT)

In computational applications, we work with discrete, finite sequences rather than continuous functions. The Discrete Fourier Transform (DFT) is the discrete analog of the continuous Fourier transform.

Definition: For a sequence \(\{x_0, x_1, \ldots, x_{N-1}\}\) of \(N\) complex numbers, the DFT is: \[ X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N}kn}, \quad k = 0, 1, \ldots, N-1 \] The inverse DFT (IDFT) is: \[ x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k e^{\frac{2\pi i}{N}kn}, \quad n = 0, 1, \ldots, N-1 \]
Connection to Continuous Transform: The DFT can be viewed as:
- a Riemann-sum approximation of the continuous Fourier transform for a signal sampled at \(N\) points and truncated to a finite window;
- the Fourier series coefficients of the periodic extension of the sampled data;
- samples of the discrete-time Fourier transform (DTFT) at the \(N\) equally spaced frequencies \(\xi_k = \frac{2\pi k}{N}\).

Matrix Formulation: Define \(\omega = e^{-\frac{2\pi i}{N}}\), a primitive \(N\)-th root of unity (so \(\omega^N = 1\)). The DFT matrix is: \[ W = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{N-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)^2} \end{bmatrix} \] where \(W_{kn} = \omega^{kn}\). Then: \[ \mathbf{X} = W\mathbf{x}, \quad \mathbf{x} = \frac{1}{N}W^*\mathbf{X} \] where \(W^*\) is the conjugate transpose.
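
As a sanity check, the matrix form can be built directly and compared against a library FFT (a small sketch; \(N = 8\) is arbitrary):

                            import numpy as np

                            N = 8
                            n = np.arange(N)
                            W = np.exp(-2j * np.pi * np.outer(n, n) / N)    # W[k, n] = omega^{kn}

                            x = np.random.default_rng(1).standard_normal(N)
                            print(np.allclose(W @ x, np.fft.fft(x)))             # DFT as matrix-vector product
                            print(np.allclose(np.conj(W.T) @ (W @ x) / N, x))    # inverse: x = W* X / N
                            print(np.allclose(np.conj(W.T) @ W, N * np.eye(N)))  # W / sqrt(N) is unitary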

Properties of the DFT Matrix:
- \(W\) is symmetric (\(W^T = W\)) but not Hermitian; \(\frac{1}{\sqrt{N}}W\) is unitary, which yields the discrete Plancherel identity \(\sum_{n}|x_n|^2 = \frac{1}{N}\sum_k |X_k|^2\).
- The normalized matrix \(F = \frac{1}{\sqrt{N}}W\) satisfies \(F^4 = I\), so its eigenvalues lie in \(\{1, -1, i, -i\}\).
- For real input, the output has Hermitian symmetry: \(X_{N-k} = \overline{X_k}\).

Computational Complexity: direct evaluation of \(\mathbf{X} = W\mathbf{x}\) requires \(O(N^2)\) operations; the FFT (next section) exploits the algebraic structure of \(W\) to reduce this to \(O(N \log N)\).

Practical Considerations for CS/ML:
- For real-valued signals, use real-input transforms (np.fft.rfft, torch.fft.rfft) to roughly halve memory and compute.
- Normalization conventions differ between libraries; NumPy and PyTorch expose norm='backward' (the default), 'ortho', and 'forward'.
- Zero-pad inputs when using the DFT for linear convolution, since the DFT natively computes circular convolution.

Fast Fourier Transform (FFT)

The Fast Fourier Transform (FFT) is not a different transform but an efficient algorithm for computing the DFT, reducing complexity from \(O(N^2)\) to \(O(N \log N)\).

Cooley-Tukey Algorithm (Radix-2 FFT):
The key insight is to exploit the structure of the DFT matrix using divide-and-conquer. For \(N = 2^m\), split the input into even and odd indices: \[ \begin{align*} X_k &= \sum_{n=0}^{N-1} x_n \omega^{kn} \\\\ &= \sum_{j=0}^{N/2-1} x_{2j} \omega^{k(2j)} + \sum_{j=0}^{N/2-1} x_{2j+1} \omega^{k(2j+1)} \\\\ &= \sum_{j=0}^{N/2-1} x_{2j} (\omega^2)^{kj} + \omega^k \sum_{j=0}^{N/2-1} x_{2j+1} (\omega^2)^{kj} \end{align*} \] Note that \(\omega^2 = e^{-\frac{4\pi i}{N}} = e^{-\frac{2\pi i}{N/2}}\) is a primitive \((N/2)\)-th root of unity.

Let \(E_k\) be the DFT of the even-indexed subsequence and \(O_k\) the DFT of odd-indexed subsequence. Using the periodicity of the DFT, for \(k = 0, 1, \ldots, N/2-1\): \[ X_k = E_k + \omega^k O_k \] \[ X_{k+N/2} = E_k - \omega^k O_k \] (using the fact that \(\omega^{k+N/2} = -\omega^k\) since \(\omega^{N/2} = e^{-i\pi} = -1\)).

This gives the recurrence \(T(N) = 2T(N/2) + O(N)\), yielding \(T(N) = O(N \log N)\) by the Master theorem.
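
The recurrence translates directly into a short recursive implementation (a teaching sketch, not optimized; production code should use a library FFT):

                            import numpy as np

                            def fft_radix2(x):
                                """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
                                N = len(x)
                                if N == 1:
                                    return x.astype(complex)
                                E = fft_radix2(x[0::2])                          # DFT of even-indexed samples
                                O = fft_radix2(x[1::2])                          # DFT of odd-indexed samples
                                w = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors omega^k
                                return np.concatenate([E + w * O,                # X_k         = E_k + w^k O_k
                                                       E - w * O])               # X_{k + N/2} = E_k - w^k O_k

                            x = np.random.default_rng(2).standard_normal(16)
                            print(np.allclose(fft_radix2(x), np.fft.fft(x)))     # True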

Other FFT Algorithms:
- Mixed-radix and split-radix variants handle composite \(N\) and further reduce the operation count.
- Rader's algorithm handles prime \(N\) via a length-\((N-1)\) cyclic convolution.
- Bluestein's (chirp-z) algorithm computes the DFT for arbitrary \(N\) as a convolution, still in \(O(N \log N)\).
- Real-input FFTs exploit Hermitian symmetry to roughly halve the work.

Implementation Considerations:
- Production code should use optimized libraries (FFTW, cuFFT, np.fft, torch.fft) rather than hand-rolled transforms.
- Performance is best when \(N\) factors into small primes; padding to such lengths is often worthwhile.
- The FFT is numerically stable: round-off error grows only logarithmically with \(N\).
Python Example:
                            import numpy as np
                            import matplotlib.pyplot as plt

                            # Generate a signal with multiple frequency components
                            fs = 1000  # Sampling frequency (Hz)
                            t = np.linspace(0, 1, fs, endpoint=False)
                            signal = (np.sin(2*np.pi*50*t) +           # 50 Hz component
                                    0.5*np.sin(2*np.pi*120*t) +      # 120 Hz component
                                    0.3*np.cos(2*np.pi*200*t))       # 200 Hz component

                            # Add some Gaussian noise
                            np.random.seed(42)  # For reproducibility
                            signal += 0.3 * np.random.randn(len(t))

                            # Compute FFT
                            fft_vals = np.fft.fft(signal)
                            fft_freq = np.fft.fftfreq(len(signal), d=1/fs)

                            # Only plot positive frequencies (due to Hermitian symmetry for real signals)
                            pos_mask = fft_freq >= 0
                            plt.figure(figsize=(12, 4))

                            plt.subplot(1, 2, 1)
                            plt.plot(t[:100], signal[:100])
                            plt.xlabel('Time (s)')
                            plt.ylabel('Amplitude')
                            plt.title('Signal (first 100 samples)')
                            plt.grid(True, alpha=0.3)

                            plt.subplot(1, 2, 2)
                            plt.plot(fft_freq[pos_mask], np.abs(fft_vals[pos_mask])/len(signal)*2)
                            plt.xlabel('Frequency (Hz)')
                            plt.ylabel('Magnitude')
                            plt.title('Frequency Spectrum (Single-sided)')
                            plt.xlim(0, 300)
                            plt.grid(True, alpha=0.3)
                            plt.tight_layout()
                            plt.show()

                            # Verify Parseval's theorem
                            energy_time = np.sum(np.abs(signal)**2)
                            energy_freq = np.sum(np.abs(fft_vals)**2) / len(signal)
                            print(f"Energy in time domain: {energy_time:.2f}")
                            print(f"Energy in frequency domain: {energy_freq:.2f}")
                            print(f"Relative error: {abs(energy_time - energy_freq)/energy_time:.2e}")
                        

Applications in Machine Learning

Fourier methods are fundamental to modern machine learning, particularly in areas requiring efficient computation, signal processing, and function approximation. Here we focus on the most significant and current applications.

1. Fourier Neural Operators (FNO) for Scientific Computing:
Introduced by Li et al. (ICLR 2021), FNOs have reshaped neural PDE solving and are actively used in weather prediction (FourCastNet by NVIDIA, 2022), fluid dynamics, and climate modeling. They can achieve speedups of up to roughly 1000× over traditional numerical solvers while maintaining competitive accuracy.

Key Innovation: FNOs learn integral operators directly in Fourier space: \[ (\mathcal{K}\varphi)(x) = \int k(x, y)\varphi(y) \, dy \quad \xrightarrow{\text{Fourier}} \quad \widehat{(\mathcal{K}\varphi)}(\xi) = \hat{k}(\xi) \cdot \hat{\varphi}(\xi) \] By parameterizing \(\hat{k}(\xi)\) with neural networks in frequency domain, FNOs efficiently learn resolution-invariant operators.

                            import torch
                            import torch.nn as nn
                            import torch.fft

                            class SpectralConv2d(nn.Module):
                                """2D Fourier layer used in FNO. Efficient for learning PDEs."""
                                def __init__(self, in_channels, out_channels, modes1, modes2):
                                    super().__init__()
                                    self.in_channels = in_channels
                                    self.out_channels = out_channels
                                    self.modes1 = modes1  # Number of Fourier modes to keep
                                    self.modes2 = modes2
                                    
                                    # Complex weights for Fourier coefficients
                                    # Scaled random initialization (Glorot-style fan scaling)
                                    scale = (1 / (in_channels * out_channels))**(1/2)
                                    self.weights1 = nn.Parameter(scale * torch.randn(
                                        in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
                                    self.weights2 = nn.Parameter(scale * torch.randn(
                                        in_channels, out_channels, modes1, modes2, dtype=torch.cfloat))
                                
                                def forward(self, x):
                                    batch, channels, height, width = x.shape
                                    
                                    # Compute 2D FFT (real-to-complex for efficiency)
                                    x_ft = torch.fft.rfft2(x, norm='ortho')
                                    
                                    # Initialize output tensor in Fourier space
                                    out_ft = torch.zeros(batch, self.out_channels, height, width // 2 + 1,
                                                    dtype=torch.cfloat, device=x.device)
                                    
                                    # Multiply low frequencies (top-left corner in frequency space)
                                    out_ft[:, :, :self.modes1, :self.modes2] = torch.einsum(
                                        "bixy,ioxy->boxy",
                                        x_ft[:, :, :self.modes1, :self.modes2],
                                        self.weights1)
                                    
                                    # Multiply high frequencies (bottom-left corner due to periodicity)
                                    out_ft[:, :, -self.modes1:, :self.modes2] = torch.einsum(
                                        "bixy,ioxy->boxy",
                                        x_ft[:, :, -self.modes1:, :self.modes2],
                                        self.weights2)
                                    
                                    # Inverse FFT to return to physical space
                                    x = torch.fft.irfft2(out_ft, s=(height, width), norm='ortho')
                                    return x

                            # Example: FNO architecture for solving 2D PDEs
                            class FNO2d(nn.Module):
                                """Fourier Neural Operator for 2D PDEs (e.g., Navier-Stokes)"""
                                def __init__(self, modes=12, width=64, in_channels=3, out_channels=1):
                                    super().__init__()
                                    self.modes = modes
                                    self.width = width
                                    
                                    # Lifting layer (point-wise)
                                    self.fc0 = nn.Linear(in_channels, width)
                                    
                                    # Fourier layers
                                    self.conv0 = SpectralConv2d(width, width, modes, modes)
                                    self.conv1 = SpectralConv2d(width, width, modes, modes)
                                    self.conv2 = SpectralConv2d(width, width, modes, modes)
                                    self.conv3 = SpectralConv2d(width, width, modes, modes)
                                    
                                    # Regular convolutions for local features
                                    self.w0 = nn.Conv2d(width, width, 1)
                                    self.w1 = nn.Conv2d(width, width, 1)
                                    self.w2 = nn.Conv2d(width, width, 1)
                                    self.w3 = nn.Conv2d(width, width, 1)
                                    
                                    # Projection layer
                                    self.fc1 = nn.Linear(width, 128)
                                    self.fc2 = nn.Linear(128, out_channels)
                                    
                                def forward(self, x):
                                    # x shape: (batch, height, width, channels)
                                    # Lift input channels to the hidden width, then put channels first
                                    x = self.fc0(x).permute(0, 3, 1, 2)  # -> (batch, width, H, W)
                                    
                                    # Fourier layers with residual connections
                                    x1 = self.conv0(x)
                                    x2 = self.w0(x)
                                    x = torch.nn.functional.gelu(x1 + x2)
                                    
                                    x1 = self.conv1(x)
                                    x2 = self.w1(x)
                                    x = torch.nn.functional.gelu(x1 + x2)
                                    
                                    x1 = self.conv2(x)
                                    x2 = self.w2(x)
                                    x = torch.nn.functional.gelu(x1 + x2)
                                    
                                    x1 = self.conv3(x)
                                    x2 = self.w3(x)
                                    x = x1 + x2
                                    
                                    # Project to output
                                    x = x.permute(0, 2, 3, 1)  # -> (batch, height, width, channels)
                                    x = self.fc1(x)
                                    x = torch.nn.functional.gelu(x)
                                    x = self.fc2(x)
                                    return x
                        

2. State Space Models and Long-Range Sequence Modeling:
Recent breakthroughs like S4 (Structured State Spaces) by Gu et al. (ICLR 2022) use FFT-based long convolutions to model sequences with very long contexts efficiently; successors such as Mamba (Gu & Dao, 2023) build on the same state-space framework but replace the convolutional mode with a recurrent selective scan. In convolutional form, these models avoid the \(O(N^2)\) cost of attention, computing each layer in \(O(N \log N)\).

Key Insight: Long convolutions via FFT enable efficient computation of state space models: \[ y = K * u \quad \text{where} \quad K \in \mathbb{R}^L \text{ is a learned kernel} \]
                            import math

                            import torch
                            import torch.nn as nn

                            def fft_conv(u, k):
                                """Causal depthwise convolution u * k via FFT, the core trick
                                in S4-style long-convolution sequence models.

                                Args:
                                    u: Input sequence of shape (batch, length, channels)
                                    k: Per-channel kernel of shape (channels, length)
                                Returns:
                                    y: Output of shape (batch, length, channels)
                                """
                                L = u.shape[1]
                                # Zero-pad to 2L so circular convolution equals linear convolution
                                k_f = torch.fft.rfft(k, n=2 * L, dim=-1)      # (channels, L + 1)
                                u_f = torch.fft.rfft(u, n=2 * L, dim=1)       # (batch, L + 1, channels)
                                # Pointwise multiply in the frequency domain (convolution theorem)
                                y_f = u_f * k_f.transpose(0, 1).unsqueeze(0)
                                # Back to the time domain; keep the first L (causal) outputs
                                return torch.fft.irfft(y_f, n=2 * L, dim=1)[:, :L, :]

                            class S4DLayer(nn.Module):
                                """Simplified diagonal state space layer in the spirit of S4D.
                                The full S4 uses a structured (HiPPO-derived) state matrix; a
                                diagonal A keeps the kernel computation simple while preserving
                                the key idea: materialize K[t] = C exp(t*dt*A) B in closed form,
                                then apply it as one long convolution via the FFT."""
                                def __init__(self, d_model, d_state=64):
                                    super().__init__()
                                    self.d_model, self.d_state = d_model, d_state
                                    # S4D-Lin initialization: eigenvalues -1/2 + i*pi*n
                                    self.log_A_real = nn.Parameter(
                                        torch.full((d_model, d_state), math.log(0.5)))
                                    self.A_imag = nn.Parameter(
                                        math.pi * torch.arange(d_state).float().expand(d_model, d_state).clone())
                                    # C absorbs B (only their product enters the kernel)
                                    self.C = nn.Parameter(torch.randn(d_model, d_state, 2) * 0.5 ** 0.5)
                                    self.D = nn.Parameter(torch.randn(d_model))  # skip connection
                                    # Per-channel step size, log-uniform in [1e-3, 1e-1]
                                    self.log_dt = nn.Parameter(
                                        math.log(0.001) + (math.log(0.1) - math.log(0.001)) * torch.rand(d_model))

                                def kernel(self, L):
                                    """Materialize the length-L convolution kernel for every channel."""
                                    dt = torch.exp(self.log_dt).unsqueeze(-1)            # (d_model, 1)
                                    A = -torch.exp(self.log_A_real) + 1j * self.A_imag   # (d_model, d_state)
                                    C = torch.view_as_complex(self.C)                    # (d_model, d_state)
                                    dtA = dt * A
                                    # ZOH-style discretization: K[c, t] = sum_n C'[c, n] exp(dtA[c, n] t)
                                    C_disc = C * (torch.exp(dtA) - 1.0) / A
                                    t = torch.arange(L, device=dtA.device)
                                    K = torch.einsum('cn,cnt->ct', C_disc, torch.exp(dtA.unsqueeze(-1) * t))
                                    # Factor 2 accounts for the implicit complex-conjugate modes
                                    return 2 * K.real                                    # (d_model, L)

                                def forward(self, u):
                                    """u: (batch, length, d_model) -> y: (batch, length, d_model)"""
                                    K = self.kernel(u.shape[1])
                                    y = fft_conv(u, K)       # long convolution in O(L log L)
                                    return y + self.D * u    # skip connection
                        

3. Audio and Speech Processing Foundation Models:
Modern speech models rely on Fourier analysis in their front ends: Whisper (OpenAI, 2022) consumes log-mel spectrograms computed via the short-time Fourier transform, while Wav2Vec 2.0 (Meta, 2020) instead learns features directly from the raw waveform. The extractor below follows Whisper's published configuration:
                            import torch
                            import torchaudio
                            import torchaudio.transforms as T

                            class WhisperFeatureExtractor:
                                """Feature extraction pipeline used in OpenAI's Whisper model"""
                                def __init__(self, 
                                            sample_rate=16000,
                                            n_fft=400,
                                            hop_length=160,
                                            n_mels=80,
                                            chunk_length=30):
                                    self.sample_rate = sample_rate
                                    self.n_fft = n_fft
                                    self.hop_length = hop_length
                                    self.chunk_length = chunk_length
                                    self.n_samples = chunk_length * sample_rate
                                    
                                    # Mel-spectrogram transform (matches Whisper's configuration)
                                    self.mel_filters = T.MelSpectrogram(
                                        sample_rate=sample_rate,
                                        n_fft=n_fft,
                                        hop_length=hop_length,
                                        n_mels=n_mels,
                                        f_min=0,
                                        f_max=8000,
                                        mel_scale='slaney',  # Slaney-style mel filters, as in Whisper (librosa default)
                                        norm='slaney',
                                        window_fn=torch.hann_window
                                    )
                                    
                                def __call__(self, waveform):
                                    """
                                    Convert waveform to log-mel spectrogram features
                                    
                                    Args:
                                        waveform: Audio tensor of shape (channels, samples)
                                    Returns:
                                        features: Log-mel spectrogram of shape (n_mels, time_frames)
                                    """
                                    # Ensure mono audio
                                    if waveform.shape[0] > 1:
                                        waveform = waveform.mean(dim=0, keepdim=True)
                                    
                                    # Pad or trim to 30 seconds
                                    if waveform.shape[1] < self.n_samples:
                                        waveform = torch.nn.functional.pad(
                                            waveform, (0, self.n_samples - waveform.shape[1])
                                        )
                                    else:
                                        waveform = waveform[:, :self.n_samples]
                                    
                                    # Compute mel-spectrogram
                                    mel_spec = self.mel_filters(waveform)
                                    
                                    # Convert to log scale (matching Whisper's normalization)
                                    log_spec = torch.clamp(mel_spec, min=1e-10).log10()
                                    log_spec = torch.maximum(log_spec, log_spec.max() - 8.0)
                                    log_spec = (log_spec + 4.0) / 4.0
                                    
                                    return log_spec

                            # Example usage
                            feature_extractor = WhisperFeatureExtractor()
                            # audio, sr = torchaudio.load('speech.wav')
                            # if sr != 16000:
                            #     resampler = T.Resample(sr, 16000)
                            #     audio = resampler(audio)
                            # features = feature_extractor(audio)
                            # Now feed 'features' to Whisper encoder
                        

4. Vision Transformers and Frequency-Aware Architectures:
Recent vision models leverage frequency domain insights for improved efficiency and performance:
                            import torch
                            import torch.nn as nn

                            class FNetBlock(nn.Module):
                                """FNet block: FFT mixing replaces self-attention (Lee-Thorp et al., 2021)"""
                                def __init__(self, d_model, d_ff=None, dropout=0.1):
                                    super().__init__()
                                    d_ff = d_ff or 4 * d_model
                                    
                                    self.mixing_layer_norm = nn.LayerNorm(d_model)
                                    self.feed_forward = nn.Sequential(
                                        nn.LayerNorm(d_model),
                                        nn.Linear(d_model, d_ff),
                                        nn.GELU(),
                                        nn.Dropout(dropout),
                                        nn.Linear(d_ff, d_model),
                                        nn.Dropout(dropout)
                                    )
                                    
                                def forward(self, x):
                                    """
                                    Args:
                                        x: Input tensor of shape (batch, seq_len, d_model)
                                    """
                                    # FFT mixing on sequence and feature dimensions
                                    residual = x
                                    x = self.mixing_layer_norm(x)
                                    
                                    # Apply 2D FFT (real-valued FFT for efficiency)
                                    # FFT on sequence dimension
                                    x_complex = torch.fft.fft(x, dim=1, norm='ortho')
                                    # FFT on feature dimension  
                                    x_complex = torch.fft.fft(x_complex, dim=2, norm='ortho')
                                    # Keep only real part (as in the paper)
                                    x = x_complex.real
                                    
                                    x = residual + x
                                    
                                    # Feed-forward network
                                    x = x + self.feed_forward(x)
                                    
                                    return x

                            class FocalFrequencyLoss(nn.Module):
                                """Focal Frequency Loss for image reconstruction (Jiang et al., 2021)"""
                                def __init__(self, loss_weight=1.0, alpha=1.0, patch_factor=1):
                                    super().__init__()
                                    self.loss_weight = loss_weight
                                    self.alpha = alpha
                                    self.patch_factor = patch_factor
                                    
                                def forward(self, pred, target):
                                    """
                                    Args:
                                        pred: Predicted image (B, C, H, W)
                                        target: Target image (B, C, H, W)
                                    """
                                    # Compute 2D FFT and shift DC to the center so that distance
                                    # from the center corresponds to frequency magnitude
                                    pred_fft = torch.fft.fftshift(torch.fft.fft2(pred, norm='ortho'), dim=(-2, -1))
                                    target_fft = torch.fft.fftshift(torch.fft.fft2(target, norm='ortho'), dim=(-2, -1))
                                    
                                    # Compute frequency distance
                                    freq_distance = torch.abs(pred_fft - target_fft)
                                    
                                    # Frequency weights emphasizing high frequencies (vectorized)
                                    B, C, H, W = pred.shape
                                    h_half, w_half = H // 2, W // 2
                                    ys = torch.arange(H, device=pred.device).float().view(1, 1, H, 1) - h_half
                                    xs = torch.arange(W, device=pred.device).float().view(1, 1, 1, W) - w_half
                                    dist = torch.sqrt(ys**2 + xs**2)
                                    weight_matrix = 1 + self.alpha * dist / h_half   # broadcasts over (B, C, H, W)
                                    
                                    # Weighted frequency loss
                                    loss = torch.mean(weight_matrix * freq_distance)
                                    
                                    return self.loss_weight * loss
                        

5. Neural Fields and Positional Encodings:
Fourier features enable neural networks to learn high-frequency functions:
                            import torch
                            import torch.nn as nn
                            import numpy as np

                            class FourierFeatures(nn.Module):
                                """Fourier feature mapping for positional encoding (Tancik et al., 2020)"""
                                def __init__(self, input_dim, mapping_size=256, scale=10.0):
                                    super().__init__()
                                    self.input_dim = input_dim
                                    self.output_dim = 2 * mapping_size
                                    
                                    # Random Fourier feature mapping
                                    # B ~ N(0, scale^2)
                                    self.B = nn.Parameter(
                                        torch.randn(input_dim, mapping_size) * scale,
                                        requires_grad=False
                                    )
                                    
                                def forward(self, x):
                                    """
                                    Args:
                                        x: Coordinates of shape (..., input_dim)
                                    Returns:
                                        Fourier features of shape (..., 2*mapping_size)
                                    """
                                    x_proj = 2 * np.pi * x @ self.B
                                    return torch.cat([torch.sin(x_proj), torch.cos(x_proj)], dim=-1)

                            class NeRF(nn.Module):
                                """Simplified NeRF with Fourier positional encoding"""
                                def __init__(self, 
                                            pos_dim=3,
                                            view_dim=3,
                                            pos_encoding_freq=10,
                                            view_encoding_freq=4,
                                            hidden_dim=256):
                                    super().__init__()
                                    
                                    # Positional encoding dimensions
                                    pos_enc_dim = pos_dim * (2 * pos_encoding_freq + 1)
                                    view_enc_dim = view_dim * (2 * view_encoding_freq + 1)
                                    
                                    # Position encoding frequencies (following NeRF paper)
                                    self.pos_freq_bands = 2.**torch.linspace(0, pos_encoding_freq-1, pos_encoding_freq)
                                    self.view_freq_bands = 2.**torch.linspace(0, view_encoding_freq-1, view_encoding_freq)
                                    
                                    # MLP architecture (simplified)
                                    self.mlp1 = nn.Sequential(
                                        nn.Linear(pos_enc_dim, hidden_dim),
                                        nn.ReLU(),
                                        nn.Linear(hidden_dim, hidden_dim),
                                        nn.ReLU(),
                                        nn.Linear(hidden_dim, hidden_dim),
                                        nn.ReLU(),
                                        nn.Linear(hidden_dim, hidden_dim),
                                        nn.ReLU(),
                                    )
                                    
                                    self.density_head = nn.Linear(hidden_dim, 1)
                                    
                                    self.mlp2 = nn.Sequential(
                                        nn.Linear(hidden_dim + view_enc_dim, hidden_dim // 2),
                                        nn.ReLU(),
                                    )
                                    
                                    self.rgb_head = nn.Sequential(
                                        nn.Linear(hidden_dim // 2, 3),
                                        nn.Sigmoid()
                                    )
                                    
                                def positional_encoding(self, x, freq_bands):
                                    """Apply sinusoidal positional encoding"""
                                    out = [x]
                                    for freq in freq_bands:
                                        out.append(torch.sin(freq * x))
                                        out.append(torch.cos(freq * x))
                                    return torch.cat(out, dim=-1)
                                
                                def forward(self, pos, view_dir):
                                    """
                                    Args:
                                        pos: 3D positions (batch, 3)
                                        view_dir: Viewing directions (batch, 3)
                                    Returns:
                                        rgb: Colors (batch, 3)
                                        density: Volume density (batch, 1)
                                    """
                                    # Encode position with high-frequency features
                                    pos_enc = self.positional_encoding(pos, self.pos_freq_bands)
                                    
                                    # First part of network (position only)
                                    h = self.mlp1(pos_enc)
                                    density = self.density_head(h)
                                    
                                    # Encode viewing direction
                                    view_enc = self.positional_encoding(view_dir, self.view_freq_bands)
                                    
                                    # Second part (position features + viewing direction)
                                    h = self.mlp2(torch.cat([h, view_enc], dim=-1))
                                    rgb = self.rgb_head(h)
                                    
                                    return rgb, torch.relu(density)

                            # Rotary Position Embedding (RoPE) used in LLaMA, Mistral, etc.
                            class RotaryPositionalEmbedding(nn.Module):
                                """RoPE: Fourier-based positional encoding for transformers (Su et al., 2021)"""
                                def __init__(self, dim, max_seq_len=2048, base=10000):
                                    super().__init__()
                                    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
                                    self.register_buffer('inv_freq', inv_freq)
                                    
                                    # Precompute sin/cos for efficiency
                                    t = torch.arange(max_seq_len).float()
                                    freqs = torch.einsum('i,j->ij', t, self.inv_freq)
                                    self.register_buffer('cos_cached', torch.cos(freqs))
                                    self.register_buffer('sin_cached', torch.sin(freqs))
                                    
                                def forward(self, q, k, seq_len):
                                    """
                                    Apply rotary embeddings to query and key tensors
                                    
                                    Args:
                                        q, k: Tensors of shape (batch, seq_len, n_heads, head_dim)
                                        seq_len: Current sequence length
                                    """
                                    cos = self.cos_cached[:seq_len, :].unsqueeze(0).unsqueeze(2)
                                    sin = self.sin_cached[:seq_len, :].unsqueeze(0).unsqueeze(2)
                                    
                                    # Apply rotation using complex number multiplication
                                    q_rot = self.apply_rotation(q, cos, sin)
                                    k_rot = self.apply_rotation(k, cos, sin)
                                    
                                    return q_rot, k_rot
                                
                                def apply_rotation(self, x, cos, sin):
                                    """Apply the rotation to x"""
                                    # Split last dimension into pairs
                                    x1, x2 = x[..., ::2], x[..., 1::2]
                                    
                                    # Apply rotation matrix
                                    # [cos, -sin] [x1]
                                    # [sin,  cos] [x2]
                                    x_rot = torch.zeros_like(x)
                                    x_rot[..., ::2] = x1 * cos - x2 * sin
                                    x_rot[..., 1::2] = x1 * sin + x2 * cos
                                    
                                    return x_rot
                        


These applications demonstrate that Fourier methods are not just theoretical tools but are actively driving state-of-the-art performance in modern ML systems. From enabling million-token context windows to solving PDEs at unprecedented speeds, Fourier transforms remain essential for pushing the boundaries of what neural networks can achieve.

Interactive Demos

Now that we've covered the theory, let's see the Fourier Transform in action. We will start with a simple 1D signal to understand the core principles, then move to a 2D image to see a practical application (and a great example of the Convolution Theorem).

1D Signal & Frequency Spectrum

This demo shows how several sine waves (+ noise) are combined in the Time Domain to create a complex signal. Simultaneously, it computes the Fourier Transform (FFT) of that combined signal to show which frequencies are dominant in the Frequency Domain (the spectrum). Move the sliders to see how the two domains are linked in real-time. This is the fundamental concept of Fourier Analysis.

2D Image & Frequency Domain Filtering

An image is just a 2D signal. This demo computes the 2D FFT of an image to show its Frequency Spectrum. The center of the spectrum represents low frequencies (blurry parts of the image), while the edges represent high frequencies (edges and details). As you learned in the "Convolution Theorem" section, multiplying in the frequency domain is equivalent to convolution in the spatial domain. Try applying a Low-pass filter (which only keeps the center) to see the image blur, or a High-pass filter (which only keeps the edges) to extract details.
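
For readers who want to reproduce the 2D demo offline, here is a minimal sketch using a synthetic image (no image file is assumed; the white square and the radius-10 cutoff are arbitrary choices):

                            import numpy as np

                            img = np.zeros((128, 128))
                            img[32:96, 32:96] = 1.0                          # white square: sharp edges

                            spectrum = np.fft.fftshift(np.fft.fft2(img))     # move DC to the center

                            # Ideal low-pass: keep a disk of radius 10 around the center
                            yy, xx = np.mgrid[:128, :128]
                            mask = (yy - 64)**2 + (xx - 64)**2 <= 10**2
                            low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real    # blurred image
                            high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)).real  # edges / details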