Gilbert Strang's lesson “Projection matrices and least squares” is very nice and useful (you can find it here), but, as often happens with him, you have to prove some steps on your own.
Now the problem.
Given a matrix $A$ of real numbers with $m$ rows and $n$ columns, its columns span a vector subspace of $\mathbb{R}^m$, which coincides with $\mathbb{R}^m$ itself in case $n \ge m$ and at least $m$ columns are linearly independent. Given a vector $b$ in $\mathbb{R}^m$ not necessarily belonging to the column space of $A$ ($C(A)$), which is the nearest vector of $C(A)$ to $b$?
And now we start to investigate…
We can restrict the columns of $A$ just to those that are independent, because they are a basis for $C(A)$ and so they span all of it.
Suppose that a vector $p$ in $C(A)$ exists such that $b - p$ is orthogonal to $C(A)$. In such a case would it be the solution we are looking for? Yes, of course. Why?
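The reason is very simple: consider any other vector of $C(A)$, which we call $q$; then $b - q = (b - p) + (p - q)$. Is $b - q$ longer or shorter than $b - p$? It's longer. Indeed $\|b - q\|^2 = (b - q) \cdot (b - q)$, but then $\|b - q\|^2 = \|b - p\|^2 + 2\,(b - p) \cdot (p - q) + \|p - q\|^2$.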
(where the $\cdot$ stands for the inner product between vectors).
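But $b - p$ is orthogonal to $C(A)$ and $p - q$ belongs to $C(A)$, so $(b - p) \cdot (p - q) = 0$. Finally $\|b - q\|^2 = \|b - p\|^2 + \|p - q\|^2$ is greater than $\|b - p\|^2$!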
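The inequality above can be checked numerically. The following is only a minimal sketch in plain Python: the matrix `A`, the vector `b` and the candidate `p` are hypothetical example values (not taken from the lesson), and the helper names `dot`, `sub`, `matvec`, `columns` and `check_pythagoras` are made up for this illustration.

```python
# Hypothetical example data: A has independent columns, b is not in C(A).
A = [[1, 0],
     [1, 1],
     [1, 2]]
b = [6, 0, 0]
p = [5, 2, -1]  # candidate: 5 * (first column) - 3 * (second column)


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    """Component-wise difference u - v."""
    return [ui - vi for ui, vi in zip(u, v)]


def matvec(M, y):
    """Product M y, i.e. a combination of the columns of M."""
    return [dot(row, y) for row in M]


def columns(M):
    """Columns of M as a list of vectors."""
    return [list(col) for col in zip(*M)]


e = sub(b, p)  # the vector b - p

# b - p is orthogonal to every column of A, hence to all of C(A).
assert all(dot(col, e) == 0 for col in columns(A))


def check_pythagoras(y):
    """Return (|b - q|^2, |b - p|^2 + |p - q|^2) for q = A y."""
    q = matvec(A, y)
    lhs = dot(sub(b, q), sub(b, q))
    rhs = dot(e, e) + dot(sub(p, q), sub(p, q))
    return lhs, rhs


for y in ([0, 0], [1, 1], [-2, 3]):
    lhs, rhs = check_pythagoras(y)
    print(y, lhs, rhs, lhs > dot(e, e))
```

For every choice of `y` the two returned numbers should coincide, and the squared distance from `b` to `q` should exceed the squared distance from `b` to `p`.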
But now we have another question: does such a vector surely exist?
From previous lessons we know that $\mathbb{R}^m$ is the direct sum of 2 specific subspaces: the column space of $A$ and the null space of $A^T$, $N(A^T)$, which is orthogonal to $C(A)$. So any vector $b$ belonging to $\mathbb{R}^m$ can be expressed in a unique way as a linear combination of the union of 2 bases: one of $C(A)$ and one of $N(A^T)$. But the combination from the first basis is $p$ and the one from the second is $b - p$! So such a projection exists and is unique.
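To make the decomposition concrete, here is a small worked example. The matrix and the vector are hypothetical values chosen only for illustration:

$$
A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
b = \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix}
= \underbrace{\begin{pmatrix} 5 \\ 2 \\ -1 \end{pmatrix}}_{\in\, C(A)}
+ \underbrace{\begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix}}_{\in\, N(A^T)}
$$

The first piece is 5 times the first column of $A$ minus 3 times the second one, so it lies in $C(A)$. The second piece has inner product 0 with both columns of $A$ ($1 - 2 + 1 = 0$ and $0 - 2 + 2 = 0$), so it lies in $N(A^T)$.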
Now we want to find the projection. Is there a way to express it as a function of $A$ and $b$?
Yes, there is. Consider the vector $\hat{x}$ of $\mathbb{R}^n$ such that $A\hat{x} = p$. We know that $p \in C(A)$ and that $b - p$ is orthogonal to $C(A)$.
So $b - A\hat{x}$ is orthogonal to $C(A)$. This can be expressed using the inner product as follows: $(Ay) \cdot (b - A\hat{x}) = 0$ for any $Ay$ belonging to $C(A)$. But then it means that $y^T A^T (b - A\hat{x}) = 0$ for any $y$.
As a consequence it means that the vector $A^T (b - A\hat{x})$ must be 0: it is orthogonal to every $y$, and therefore also to itself!
So $A^T (b - A\hat{x}) = 0$ or equivalently $A^T A \hat{x} = A^T b$. But we know that surely such an $\hat{x}$ exists and that it is unique too: in fact $p$ belongs to $C(A)$, and we chose to limit the columns of $A$ to the independent ones only.
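But then it means that $A^T A$ is invertible: it is a square matrix and the system $A^T A \hat{x} = A^T b$ has exactly one solution. So $\hat{x} = (A^T A)^{-1} A^T b$ and $p = A\hat{x} = A (A^T A)^{-1} A^T b$, where the matrix $P = A (A^T A)^{-1} A^T$ is called the projection matrix: it allows us to get the projection onto $C(A)$ of any vector in $\mathbb{R}^m$.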
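The formulas for $\hat{x}$, $p$ and $P$ can be tried on a small case. What follows is a minimal sketch in plain Python with exact fractions, not a production implementation: the data `A` and `b` are hypothetical example values, and the helpers `transpose`, `matmul`, `inverse` and `projection_matrix` are names invented for this illustration. In practice a numerical library would normally be used instead of a hand-written inverse.

```python
from fractions import Fraction


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    """Matrix product X Y."""
    Yt = transpose(Y)
    return [[sum(u * v for u, v in zip(row, col)) for col in Yt] for row in X]


def inverse(M):
    """Gauss-Jordan inverse with exact fractions; raises ValueError if singular."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def projection_matrix(A):
    """P = A (A^T A)^{-1} A^T, assuming the columns of A are independent."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)


# Hypothetical example data with independent columns.
A = [[1, 0],
     [1, 1],
     [1, 2]]
b = [[6], [0], [0]]  # column vector

At = transpose(A)
x_hat = matmul(inverse(matmul(At, A)), matmul(At, b))  # solves A^T A x = A^T b
P = projection_matrix(A)
p = matmul(P, b)  # projection of b onto C(A)

print("x_hat =", [str(row[0]) for row in x_hat])
print("p     =", [str(row[0]) for row in p])
```

Two properties are worth checking on the resulting `P`: projecting twice changes nothing (`matmul(P, P) == P`), and `P` is symmetric (`transpose(P) == P`).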
An indirect but interesting result is that if the columns of $A$ are independent then $A^T A$ is invertible!
Suppose $A$ is an $m \times n$ matrix with real values. It has a null space $N(A)$ and a rank $r$. Can we infer $N(A^T A)$ and its rank?
We know that $N(A)$ is contained in $N(A^T A)$, because if $Ax = 0$ then $A^T A x = 0$. But how can we be sure that no $x$ exists such that $Ax \neq 0$ but $A^T A x = 0$?
$Ax$ is a combination of the columns of $A$, so it belongs to the column space of $A$ ($C(A)$) or equivalently to the row space of $A^T$. At the same time, if $A^T (Ax) = 0$, then it means that $Ax$ belongs to the null space of $A^T$. But we know that these 2 vector subspaces are orthogonal and share only the 0 vector; otherwise it would be that $(Ax)^T (Ax) = 0$ while $Ax \neq 0$, but the inner product of a real vector with itself is the square of its length, so it cannot be 0 for a nonzero vector!
And this demonstrates that $N(A^T A) = N(A)$.
Because the rank of an $m \times n$ matrix is equal to $n$ minus the dimension of its null space, and $A^T A$ has $n$ columns just like $A$, we can also say that $\mathrm{rank}(A^T A) = \mathrm{rank}(A)$.
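The equality of the two ranks can be observed on a matrix whose columns are dependent. This is a minimal sketch in plain Python with exact fractions: the matrix `A` and the vector `x` are hypothetical example values, and `transpose`, `matmul` and `rank` are helper names made up for this illustration.

```python
from fractions import Fraction


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    """Matrix product X Y."""
    Yt = transpose(Y)
    return [[sum(u * v for u, v in zip(row, col)) for col in Yt] for row in X]


def rank(M):
    """Rank computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical example: the third column is the sum of the first two.
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1],
     [0, 1, 1]]
AtA = matmul(transpose(A), A)

# x is in N(A), so it must be in N(A^T A) as well.
x = [[1], [1], [-1]]

print("rank(A)     =", rank(A))
print("rank(A^T A) =", rank(AtA))
print("A x         =", matmul(A, x))
print("A^T A x     =", matmul(AtA, x))
```

Here $A^T A$ is a $3 \times 3$ matrix with the same rank as $A$, so it is not invertible, which is consistent with the columns of $A$ being dependent.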