Why the columns of the Fast Fourier Transform are orthogonal

The Fast Fourier Transform is an algorithm based on a complex matrix, which we’ll call F. It’s defined as follows:

F=\begin{bmatrix}1 & 1 & 1 &\cdots & 1 \\1 & w & w^2 & \cdots & w^{n-1}\\1 & w^2 & w^4 & \cdots & w^{2(n-1)} \\\cdots & \cdots & \cdots & \cdots & \cdots\\1 & w^{n-1} & w^{2(n-1)} & \cdots & w^{(n-1)^2}\end{bmatrix}

More briefly we have that element f_{ij} of F is equal to w^{ij}, with 0<=i,j<=n-1,
where w is the complex number: w=e^{i\frac{2\pi}{n}}.

We want to show why any column of F is orthogonal to any other.

Consider 2 generic columns: C_j and C_k of F, with j<k. The inner product between them is: <C_j, C_k>=\overline{C_j}^TC_k, where with \overline{C_j}^T we mean the transposed conjugate of C_j.

It can be developed as follows:

<C_j, C_k>=\sum_{h=0}^{n-1} \overline{w^{hj}}w^{hk}=\sum_{h=0}^{n-1} (\overline{w^j})^h(w^jw^{k-j})^h}=\sum_{h=0}^{n-1} (\overline{w^j})^h(w^j)^h(w^{k-j})^h}=\sum_{h=0}^{n-1} (w^{k-j})^h}

where (\overline{w^j})^h(w^j)^h = 1 because (w^j)^h has amplitude 1 and if multiplied for its conjugate the result is its amplitude squared, so 1.

We can rewrite the last sum as follows:

<C_j, C_k>=\sum_{h=0}^{n-1} (e^{i\frac{2\pi(k-j)}{n}})^h}

Consider the set of the elements of that sum. What happens if we rotate each of them of an angle \frac{2\pi(k-j)}{n}? Does the set change? It’s simple to see that it doesn’t! The first becomes the the second, the second becomes the third and so on until the last that becomes the first. Hence also the sum result does not change. But the only complex number that does not change after a rotation is 0.

On the Determinant and the Trace of a matrix

Suppose you have a squared matrix A with n rows and columns. Is there a relationship between its eigenvalues and its determinant? And what about its trace?

Determinant of A.

Consider the determinant of \mid A- \lambda I \mid, which is a polynomial of degree n. The values of \lambda that solve the equation \mid A- \lambda I \mid = 0, are the eigenvalues of A, and are n as its degree.
We can write such a polynomial using its roots (its eigenvalues) as follows:
\mid A- \lambda I \mid = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)

Its constant term is: (-1)^n \lambda_1\cdots\lambda_n,
but it’s also the value of \mid A- \lambda I \mid with \lambda = 0, which obviously is \mid A \mid.

So \mid A \mid = (-1)^n \lambda_1\cdots\lambda_n.

Trace of A.

If we develop the term of degree n-1 we obtain:
-(\lambda_1+\lambda_2+\cdots\+lambda_n).

For simplicity we consider the case of an 3x3 matrix:

\mid A- \lambda I \mid = \begin{bmatrix}a_{11}-\lambda & a_{12} & a_{13} \\a_{21} & a_{22}-\lambda &  a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}

It can be decomposed as follows:

\mid A- \lambda I \mid = \begin{bmatrix}a_{11} & a_{12} & a_{13} \\a_{21} & a_{22}-\lambda & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}+\begin{bmatrix}-\lambda & 0 & 0 \\a_{21} & a_{22}-\lambda & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}

Now we can apply the decomposition on the second row of the second matrix:

\begin{bmatrix}-\lambda & 0 & 0 \\a_{21} & a_{22}-\lambda & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}=\begin{bmatrix}-\lambda & 0 & 0 \\a_{21} & a_{22} & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}+\begin{bmatrix}-\lambda & 0 & 0 \\0 & -\lambda & 0 \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}

And finally we can apply the decomposition on the third row of the last matrix:


\begin{bmatrix}-\lambda & 0 & 0 \\0 & -\lambda & 0 \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}=\begin{bmatrix}-\lambda & 0 & 0 \\0 & -\lambda & 0 \\a_{31} & a_{32} & a_{33} \end{bmatrix}+\begin{bmatrix}-\lambda & 0 & 0 \\0 & -\lambda & 0 \\0 & 0 & -\lambda \end{bmatrix}

The matrices that give a contribute to the n-1 degree of \mid A- \lambda I \mid are:

D1=\begin{bmatrix}a_{11} & a_{12} & a_{13} \\a_{21} & a_{22}-\lambda & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}

D2=\begin{bmatrix}-\lambda & 0 & 0 \\a_{21} & a_{22} & a_{23} \\a_{31} & a_{32} & a_{33}-\lambda \end{bmatrix}

D3=\begin{bmatrix}-\lambda & 0 & 0 \\0 & -\lambda & 0 \\a_{31} & a_{32} & a_{33} \end{bmatrix}

So their n-1 degree contribute is: (a_{11}+a_{22}+a_{33}).

Generalizing the contribution is: (-1)^{n-1}(a_{11}+\cdots+a_{nn}).

Finally Trace(A)=\sum_1^n a_{ii}=(-1)^n\sum_1^n \lambda_{j}

Projection matrices and least squares

The lesson by Gilbert Strang “Projection matrices and least squares” is very nice and useful (you can find it here), but as often happens with him you have to demonstrate some passages alone.

Now the problem. 

Given a matrix A of real numbers with m rows and n columns, its columns span a vector subspace of  R^{m}, which corresponds to it in case m <= n and at least m columns are linearly independent. Given a vector b in R^{m} not necessarily belonging to the column space of A (C(A)), which is the nearest vector of C(A) to b?

And now we start to investigate…

First consideration.
We can restrict the columns of A just to those that are independents, because they are a basis for C(A) and so they span it all.

Second consideration.
Suppose that a vector p \epsilon C(A) exists such that e=b-p is orthogonal to C(A). In such a case would it be the solution we are looking for? Yes of course. Why?
The reason is very simple: consider any other vector of C(A), that we call p_1, then b=p_1+e_1. Is e_1 longer or shorter than e? It’s longer. Indeed b=p1+(p-p1)+e, but then e1=(p-p1)+e.
Now |e_1|^2=e_1 \cdot e_1=|p-p1|^2+|e|^2+2(p-p1) \cdot e
(where the \cdot stands for the inner product between vectors).
But e is orthogonal to C(A), so (p-p1) \cdot e=0. Finally |e_1|^2 is greater than |e|^2!

But now we have another question: does surely such a vector p exist?
From previous lessons we know that R^m is the union of 2 specific subspaces: the column space of A and the null space of A^T, which is orthogonal to C(A). So any vector  b belonging to R^m can be expressed in a unique way as a linear combination of the union of 2 basis: one from C(A) and one from N(A^T). But the combination from the first base is p and the other is e! So such a projection exists and is unique.

Now we want to find the projection. Is there a way to express it as a function of A and b?
Yes, there is. Consider the vector x of R^n such that Ax=p. We know that b=p+e and that e is orthogonal to C(A).
So e=Ax-b is orthogonal to C(A).This can be expressed using the inner product as follows: (Ax-b)Az=0 for any z belonging to R^n. But then it means that (Ax-b)^TAz = 0 for any z.
As a consequence it means that the transposed vector (Ax-b)^TA must be 0!

So x^TA^TA =b^TA or equivalently A^TAx=A^Tb. But we know that surely such an x exists and that it is unique too: in fact Ax = p, and we chose to limit the columns of A to the only independent ones.

But then it means that A^TA is invertible. So x=(AA^T)^{-1}A^Tb and p=A (AA^T)^{-1}A^Tb, where the matrix P=A (AA^T)^{-1}A^T is called the projection matrix: It allows to get the projection of any b vector in C(A).

An indirect but interesting result is that if the columns of A are independent then A^TA is invertible!

 

The transpose of a matrix multiplied by itself

Suppose A is an m*n matrix with real values. It has a Null Space N(A) and a rank r. Can we infer N(ATA) and its rank?

We know that N(A) is contained in N(ATA), because if Ax = 0 then ATAx = 0. But how can we be sure that no x exists such that Ax != 0 but ATAx = 0?

Ax is a combinations of the columns of A, so it belongs to the columns space of A ( C(A) ) or equivalently to the row space of AT. At the same time, if AT(Ax) = 0, then it means that Ax belongs to the null space of AT. But we know that these 2 vector subspaces are orthogonal and share only the 0 vector; otherwise it would be that (Ax)T(Ax) = 0 while Ax != 0, but the inner product of a real vector is the square of its length, so it cannot be 0 for a non zero vector!
And this demonstrate that N(ATA) = N(A).

Because the rank of a matrix m*n is equal to n – dimension of N(matrix), we can also say that the rank(ATA) = rank(A).