Projection matrices and least squares

Gilbert Strang's lecture “Projection matrices and least squares” is very nice and useful (you can find it here), but, as often happens with him, you have to prove some of the steps on your own.

Now the problem. 

Given a real matrix A with m rows and n columns, its columns span a subspace of R^{m}, the column space C(A); it is all of R^{m} exactly when m <= n and m of the columns are linearly independent. Given a vector b in R^{m} that does not necessarily belong to C(A), which vector of C(A) is nearest to b?

And now we start to investigate…

First consideration.
We can restrict A to a maximal set of linearly independent columns, because they form a basis for C(A) and so span all of it.

Second consideration.
Suppose that a vector p \in C(A) exists such that e=b-p is orthogonal to C(A). Would p then be the solution we are looking for? Yes. Why?
The reason is simple: take any other vector of C(A), call it p_1, and write b=p_1+e_1. Is e_1 longer or shorter than e? It is longer. Indeed b=p+e=p_1+(p-p_1)+e, so e_1=(p-p_1)+e.
Now |e_1|^2=e_1 \cdot e_1=|p-p_1|^2+|e|^2+2(p-p_1) \cdot e
(where \cdot stands for the inner product between vectors).
But p-p_1 belongs to C(A) and e is orthogonal to C(A), so (p-p_1) \cdot e=0. Hence |e_1|^2=|p-p_1|^2+|e|^2, which is strictly greater than |e|^2 because p_1 differs from p!
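
The argument above can be checked on numbers. This is only a small sketch: the matrix columns a1, a2, the vector b and the helper names (dot, sub, error_sq) are illustrative choices, not something taken from the lecture.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [x - y for x, y in zip(u, v)]

# Assumed example: the two columns of A, and a vector b outside C(A).
a1 = [1, 1, 1]
a2 = [0, 1, 2]
b = [6, 0, 0]

# p = 5*a1 - 3*a2 belongs to C(A); e = b - p is the candidate error.
p = [5, 2, -1]
e = sub(b, p)

# e is orthogonal to both columns, hence to all of C(A).
assert dot(e, a1) == 0 and dot(e, a2) == 0

def error_sq(c1, c2):
    """For p_1 = c1*a1 + c2*a2 return (|e_1|^2, |p - p_1|^2)."""
    p1 = [c1 * x + c2 * y for x, y in zip(a1, a2)]
    e1 = sub(b, p1)
    gap = sub(p, p1)
    return dot(e1, e1), dot(gap, gap)

# For every other p_1 tried here: |e_1|^2 = |p - p_1|^2 + |e|^2 > |e|^2.
for c1, c2 in [(0, 0), (1, 1), (5, -2), (4, -3)]:
    e1_sq, gap_sq = error_sq(c1, c2)
    assert e1_sq == gap_sq + dot(e, e)
    assert e1_sq > dot(e, e)
```

Integer data keeps every comparison exact, so the equalities are not blurred by rounding.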

But now we have another question: does such a vector p always exist?
From previous lessons we know that R^m is the direct sum (not merely the union) of two specific subspaces: the column space of A and the null space of A^T, which is the orthogonal complement of C(A). So any vector b in R^m can be written in a unique way as a linear combination of the vectors of two bases put together: one basis of C(A) and one of N(A^T). The part of the combination coming from the first basis is p and the part coming from the second is e! So such a projection exists and is unique.

Now we want to find the projection. Is there a way to express it as a function of A and b?
Yes, there is. Consider a vector x of R^n such that Ax=p. We know that b=p+e and that e is orthogonal to C(A).
So e=b-Ax is orthogonal to C(A), and so is its opposite Ax-b. This can be expressed using the inner product as follows: (Ax-b) \cdot Az=0 for any z belonging to R^n, that is, (Ax-b)^TAz = 0 for any z.
As a consequence the row vector (Ax-b)^TA must be 0, since it gives 0 against every z!

So x^TA^TA =b^TA or equivalently A^TAx=A^Tb. But we know that such an x exists and that it is unique too: any solution x makes b-Ax orthogonal to C(A), so Ax = p, and since we chose to keep only the independent columns of A, only one x gives p.
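
As a rough illustration, the equation A^TAx=A^Tb can be solved by hand for a small case. The matrix A, the vector b and the helper names below are assumptions made for this example, and Cramer's rule is used only because A^TA is 2 by 2 here.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Matrix product M N."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*N)]
            for row in M]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(x * y for x, y in zip(row, v)) for row in M]

# Assumed example: 3 rows, 2 independent columns, b outside C(A).
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

At = transpose(A)
AtA = matmul(At, A)   # the 2x2 matrix A^T A
Atb = matvec(At, b)   # the right-hand side A^T b

# Solve the 2x2 system A^T A x = A^T b with Cramer's rule (exact fractions).
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det),
     Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)]

p = matvec(A, x)                      # projection of b onto C(A)
e = [bi - pi for bi, pi in zip(b, p)] # error b - p

# The error is orthogonal to every column of A, i.e. A^T e = 0.
assert matvec(At, e) == [0, 0]
```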

But then it means that A^TA is invertible: it is a square matrix and the system A^TAx=A^Tb has exactly one solution. So x=(A^TA)^{-1}A^Tb and p=A (A^TA)^{-1}A^Tb, where the matrix P=A (A^TA)^{-1}A^T is called the projection matrix: it gives the projection onto C(A) of any vector b.
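
A minimal sketch of the projection matrix for a concrete case follows. The matrix A, the vector b and the helper names (including inverse_2x2, which only handles the 2 by 2 case) are assumptions for this example. It also checks two properties one would expect of a projection: P is symmetric and applying it twice changes nothing.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Matrix product M N."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*N)]
            for row in M]

def inverse_2x2(M):
    """Exact inverse of a 2x2 integer matrix; fails if it is singular."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b_, det)],
            [Fraction(-c, det), Fraction(a, det)]]

# Assumed example: independent columns, so A^T A is invertible.
A = [[1, 0], [1, 1], [1, 2]]
b = [[6], [0], [0]]   # b stored as a 3x1 column

At = transpose(A)
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)  # P = A (A^T A)^{-1} A^T
p = matmul(P, b)                                       # projection of b

assert transpose(P) == P   # P is symmetric
assert matmul(P, P) == P   # projecting twice equals projecting once
```

Exact fractions are used so that the two identities hold as equalities; with floating point they would hold only up to rounding.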

An indirect but interesting result is that if the columns of A are independent then A^TA is invertible!
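
A quick numeric look at this, under assumed data: for a matrix with two columns, the determinant of A^TA is nonzero when the columns are independent and zero when one column is a multiple of the other. The second case illustrates the converse direction, not the statement itself. The function name gram_det_2cols is hypothetical.

```python
def gram_det_2cols(A):
    """Determinant of A^T A for a matrix A with exactly two columns."""
    s11 = sum(row[0] * row[0] for row in A)  # column 1 . column 1
    s12 = sum(row[0] * row[1] for row in A)  # column 1 . column 2
    s22 = sum(row[1] * row[1] for row in A)  # column 2 . column 2
    return s11 * s22 - s12 * s12

# Independent columns: A^T A is invertible (nonzero determinant).
independent = [[1, 0], [1, 1], [1, 2]]
det_independent = gram_det_2cols(independent)

# Second column is twice the first: A^T A is singular.
dependent = [[1, 2], [1, 2], [1, 2]]
det_dependent = gram_det_2cols(dependent)

assert det_independent != 0
assert det_dependent == 0
```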

