Least-Squares Problems
If \(A \in \mathbb{R}^{m \times n}\) and \(b \in \mathbb{R}^m\), a least-squares solution of
\(Ax = b\) is an \(\hat{x} \in \mathbb{R}^n\) s.t.
\[\forall x \in \mathbb{R}^n, \quad \| b - A\hat{x} \| \leq \| b - Ax \|. \]
We call the norm \(\| b - A\hat{x} \| \) the least-squares error of the approximation.
In practice, it is possible that a solution of some problem \(Ax = b\) is required but it does not exist
(= the system is inconsistent). In that case, we try to find an \(x\) that makes \(Ax\) close to \(b\)
as much as possible. Let
\[\hat{b} = \text{proj}_{\text{Col }A} \, b \]
Since \(\hat{b} \in \text{Col }A \,\), \(Ax = \hat{b}\) is consistent.
This \(\hat{b}\) is the closest point to \(b\) in \(\text{Col }A\).
Then, there exists a least-squares solution \(\hat{x} \in \mathbb{R}^n\) s.t. \(A\hat{x} = \hat{b}\).
Since \(\hat{b}\) is orthogonal to \(b\), we can say that \(b - \hat{b} = b - A\hat{x}\) is orthognal
to \(\text{Col }A\). (Note:\(b = A\hat{x} + (b - A\hat{x})\))
Let \(a_j\) be a column of \(A\), then
\[
\begin{align*}
a_j \cdot (b-A\hat{x}) = 0
&\Longrightarrow a_j^T(b-A\hat{x}) = 0 \\\\
&\Longrightarrow A^T (b-A\hat{x}) = 0
\end{align*}
\]
and we obtain
\[A^TA\hat{x} = A^Tb\]
So, the set of least-squares solutions of \(Ax =b\) must satisfy the normal equations
\[A^TAx = A^Tb\]
If \(A^TA\) is invertible(= The columns of \(A\) are linearly independent.),
\[\hat{x} = (A^TA)^{-1}A^Tb\]
Finally, we also get the orthogonal projection of \(b\) onto the \(\text{Col } A\)
\[
\begin{align*}
\hat{b} &= \text{proj}_{\text{Col }A} \, b \\\\
&= A\hat{x} = A(A^TA)^{-1}A^Tb
\end{align*}
\]
In practice, computing \(A^TA\) makes relatively large errors in \(\hat{x}\). Instead, the QR factorization
gives us more reliable result. Let \(A = QR\). Then
\[
\begin{align*}
\hat{x} &= (R^TQ^TQR)^{-1}(R^TQ^T)b \\\\
&= (R^TR)^{-1}(R^TQ^T)b. \tag{1}
\end{align*}
\]
On the other hand,
\[
\begin{align*}
A\hat{x} = b &\Longrightarrow Q^TQR\hat{x} = Q^Tb \\\\
&\Longrightarrow R\hat{x} = Q^Tb \\\\
&\Longrightarrow \hat{x} = R^{-1}Q^Tb \\\\
\end{align*}
\]
Comparing with (1), \((R^TR)^{-1}R^T = R^{-1}\).
Thus, the unique least-squares solution can be generated by \(\hat{x} = R^{-1}Q^Tb\) but in practice, solving
\[R\hat{x} = Q^Tb\]
by back-substitution is much faster than computing \(R^{-1}\).