General Theory of Relativity

This page contains my notes and considerations on the General Relativity course “The We-Heraeus International Winter School on Gravity and Light (2015)”, available on YouTube, in which Professor Frederic P. Schuller explains the subject very clearly.
I have also used the very good book “Geometrical methods of mathematical physics” by Professor Bernard Schutz to help me better understand some topics.

I have decided to publish these notes as I write them, without necessarily respecting the logical order in which topics are presented in the course. So this page is a work in progress.

In the spirit of this site, which is also a sort of account of my journey into modern physics, sometimes I’ll write not only about the topic itself but also about the difficulties I have encountered and the way I have tried to overcome them: I think this can be useful to those who, like me, are approaching modern physics.

You can find my notes about some differential geometry topics, which are preparatory to what follows, at this page:

1. Affine Connection

Suppose we have a function f \in C^{\infty}(M) (where M is a d-dimensional smooth manifold) and a vector field X on M. Let U be an open set of M where X never vanishes. Then there exists a congruence derived from X, that is, a family of curves such that each point in U belongs to exactly one of them and, at every point, X is the tangent vector to the curve passing through that point.
Suppose that \gamma is a curve of the congruence of X (you can find its definition and explanation in Prof. Schutz’s book) and that \gamma(\lambda_0)=P. It makes sense to ask for the value of the derivative of f at P with respect to X: it is the derivative of f along the curve \gamma at P.

This derivative is defined as follows: \frac{df}{d\lambda}\Big|_{\lambda_0}=(f\circ\gamma)^{'}(\lambda_0)

It does not depend on the coordinate system chosen in U; in other words, it is chart independent. So it is well defined and makes sense!
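The chart independence stated above can be checked numerically. The following is only a minimal sketch under my own assumptions: the function f, the curve \gamma, the point \lambda_0 = 0.3 and the two charts (Cartesian and polar on the plane) are illustrative choices that do not come from the course, and all derivatives are approximated by central finite differences.

```python
import math

# Hypothetical example data (not from the course): a function and a curve
# on the plane, described in two different charts.
def f_cart(x, y):
    """f expressed in the Cartesian chart (x, y)."""
    return x * x * y

def f_polar(r, th):
    """The same f expressed in the polar chart (r, theta)."""
    return f_cart(r * math.cos(th), r * math.sin(th))

def gamma_cart(lam):
    """The curve gamma expressed in the Cartesian chart."""
    return (1.0 + lam, 2.0 * lam + 0.5)

def gamma_polar(lam):
    """The same curve expressed in the polar chart."""
    x, y = gamma_cart(lam)
    return (math.hypot(x, y), math.atan2(y, x))

def d(g, t, h=1e-5):
    """Central finite difference of a scalar function g at t."""
    return (g(t + h) - g(t - h)) / (2 * h)

def chart_formula(f, gamma, lam):
    """X^i * df/dx^i with X^i = d(gamma^i)/d(lam), everything in one chart."""
    p = gamma(lam)
    total = 0.0
    for i in range(2):
        Xi = d(lambda t: gamma(t)[i], lam)
        dfi = d(lambda s: f(*[s if k == i else p[k] for k in range(2)]), p[i])
        total += Xi * dfi
    return total

lam0 = 0.3
# Definition used in the text: (f o gamma)'(lam0), no chart components needed.
direct = d(lambda t: f_cart(*gamma_cart(t)), lam0)
in_cart = chart_formula(f_cart, gamma_cart, lam0)
in_polar = chart_formula(f_polar, gamma_polar, lam0)
print(direct, in_cart, in_polar)   # three values that agree up to rounding
```

The components of X and the partial derivatives of f are different in the two charts, but their contraction is the same number, which is the derivative of f along \gamma.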

What happens if we try to compute the derivative along X in P of a vector field Y or more generally of a tensor T?
We could be tempted to consider its components in a specific chart and apply to them what we have seen for a generic smooth function f. But we have 2 big problems (stated here for a vector field, but easily extended to any tensor field):

  1. the result of the operation must be a vector field, and so independent of the chart used, but it’s simple to see that this is not the case: the derivatives of the components do not transform like the components of a vector under a change of chart;
  2. how can we compute the difference between 2 vectors defined in 2 different tangent spaces of M? They belong to 2 totally disjoint vector spaces!

Professor Schuller used a totally abstract approach, starting from the properties we expect the affine connection to have and then deriving the general formulation. The problem I encountered during the dedicated lesson was the presentation of Leibniz’s rule in 2 different ways, saying that they are equivalent without proving it (he left this task to his students). I haven’t been able to prove it myself, which is why I started studying Professor Schutz’s book, where I found an approach that was clearer to me. Now I’m back to the topic.

To solve the second of these problems we must find a way to connect the tangent spaces of M, which means defining a way to transport a vector from a tangent space T_{p_1}(M) to another T_{p_2}(M) so that they can be “compared”. The first problem can be solved by requiring that the transport is chart independent.

With these premises we can define the “Affine Connection” or equivalently the “Covariant Derivative” of a tensor T in P with respect to a vector field X as follows:

\nabla_{X}T=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\lambda_0)-T(\lambda_0)}{\epsilon}

where T_{\lambda_0+\epsilon}(\lambda_0) is the tensor field evaluated at \gamma(\lambda_0+\epsilon) and transported through the connection to \gamma(\lambda_0), and \gamma is the curve of the congruence of X passing through the point P.

With this general definition we can now list the properties of the \nabla operator:

  • if T is a tensor field of type (r, s) then \nabla_{X}(T) is another tensor field of the same type.
  • if f is a smooth function on M, then \nabla_Xf = X(f)
  • \nabla_{X}(T \bigotimes S)=\nabla_{X}(T)\bigotimes S+T\bigotimes\nabla_{X}(S)

This is Leibniz’s rule in its classic form, where the symbol \bigotimes represents the tensor product, which is defined as follows:
let T be a tensor of type (r, s) and S a tensor of type (r', s'), then
(T \bigotimes S)(\tilde{\omega}^1,...,\tilde{\omega}^r,\tilde{\underline{\omega}}^{1},...,\tilde{\underline{\omega}}^{r'},\bar{v}_1,...,\bar{v}_s,\underline{\bar{v}}_1,...,\underline{\bar{v}}_{s'}) =
T(\tilde{\omega}^1,...,\tilde{\omega}^r,\bar{v}_1,...,\bar{v}_s) \cdot S(\tilde{\underline{\omega}}^{1},...,\tilde{\underline{\omega}}^{r'},\underline{\bar{v}}_1,...,\underline{\bar{v}}_{s'})
where the product on the right-hand side is the ordinary product of two real numbers.


The rule follows from the definition (here the tensor product of the transported tensors is written simply as a product, and I assume that transporting T \bigotimes S gives the tensor product of the transported T and S):
\nabla_{X}(T \bigotimes S)=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\lambda_0) S_{\lambda_0+\epsilon}(\lambda_0)-T(\lambda_0) S(\lambda_0)}{\epsilon} =
=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\lambda_0) S_{\lambda_0+\epsilon}(\lambda_0)-T(\lambda_0)S_{\lambda_0+\epsilon}(\lambda_0)+T(\lambda_0)S_{\lambda_0+\epsilon}(\lambda_0)-T(\lambda_0) S(\lambda_0)}{\epsilon}=
=\lim_{\epsilon \rightarrow 0}\left[\frac{T_{\lambda_0+\epsilon}(\lambda_0)-T(\lambda_0)}{\epsilon}S_{\lambda_0+\epsilon}(\lambda_0)\right]+
+\lim_{\epsilon \rightarrow 0}\left[T(\lambda_0) \frac{S_{\lambda_0+\epsilon}(\lambda_0)-S(\lambda_0)}{\epsilon}\right]

but \lim_{\epsilon \rightarrow 0}S_{\lambda_0+\epsilon}(\lambda_0)=S(\lambda_0)
because we assume that the transport defined by the affine connection is continuous and S_{\lambda_0+\epsilon}(\lambda_0) for \epsilon=0 is equal to S(\lambda_0).
So the two limits are \nabla_{X}(T)\bigotimes S and T\bigotimes\nabla_{X}(S) respectively, and the rule is proved.

  • \nabla_{X}(T(\tilde{\omega}, \bar{Y}))=(\nabla_{X}T)(\tilde{\omega}, \bar{Y})+T(\nabla_{X}\tilde{\omega}, \bar{Y})+T(\tilde{\omega}, \nabla_{X}\bar{Y})

where T is a tensor field, \tilde{\omega} is a covector field and \bar{Y} is a vector field. This is another version of Leibniz’s rule. Here prof. Schuller said it is equivalent to the classical version, but I couldn’t prove it myself, so I tried to prove it starting from the general definition as before.


For simplicity I’ll write T_{\lambda_0+\epsilon}(\lambda_0) as T_{\lambda_0+\epsilon}, just to shorten the next expressions; a symbol without subscript (T, \tilde{\omega}, \bar{Y}) denotes the value at \lambda_0.

\nabla_{X}(T(\tilde{\omega}, \bar{Y}))=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}, \bar{Y})}{\epsilon}
but the numerator T_{\lambda_0+\epsilon}(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}, \bar{Y}) can be rearranged as:
T_{\lambda_0+\epsilon}(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})+T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})+T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}, \bar{Y})

So \nabla_{X}(T(\tilde{\omega}, \bar{Y}))=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})}{\epsilon}+
+\lim_{\epsilon \rightarrow 0}\frac{T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})}{\epsilon}+
+\lim_{\epsilon \rightarrow 0}\frac{T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}, \bar{Y})}{\epsilon}.

Now we analyze these 3 terms separately:

  1. \lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+\epsilon}(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})}{\epsilon}=\lim_{\epsilon \rightarrow 0}\left[\frac{T_{\lambda_0+\epsilon}-T}{\epsilon}\right](\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})=(\nabla_{X}T)(\lim_{\epsilon \rightarrow 0}\tilde{\omega}_{\lambda_0+\epsilon},\lim_{\epsilon \rightarrow 0}\bar{Y}_{\lambda_0+\epsilon})
  2. \lim_{\epsilon \rightarrow 0}\frac{T(\tilde{\omega}_{\lambda_0+\epsilon},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})}{\epsilon}=\lim_{\epsilon \rightarrow 0}\frac{T(\tilde{\omega}_{\lambda_0+\epsilon}-\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})}{\epsilon}=
    =T(\lim_{\epsilon \rightarrow 0}\frac{\tilde{\omega}_{\lambda_0+\epsilon}-\tilde{\omega}}{\epsilon},\lim_{\epsilon \rightarrow 0}\bar{Y}_{\lambda_0+\epsilon})=T(\nabla_X\tilde{\omega}, \lim_{\epsilon \rightarrow 0}\bar{Y}_{\lambda_0+\epsilon})
    where I have exploited the multilinearity of tensors.
  3. \lim_{\epsilon \rightarrow 0}\frac{T(\tilde{\omega},\bar{Y}_{\lambda_0+\epsilon})-T(\tilde{\omega}, \bar{Y})}{\epsilon}=T(\tilde{\omega},\lim_{\epsilon \rightarrow 0}\frac{\bar{Y}_{\lambda_0+\epsilon}-\bar{Y}}{\epsilon})=T(\tilde{\omega},\nabla_X\bar{Y})
    As before I have used the multilinearity of tensors.

Because of the continuity of the transport defined by the connection we know that:

\lim_{\epsilon \rightarrow 0}\tilde{\omega}_{\lambda_0+\epsilon}=\tilde{\omega}(\lambda_0)
\lim_{\epsilon \rightarrow 0}\bar{Y}_{\lambda_0+\epsilon}=\bar{Y}(\lambda_0)

So we have demonstrated that:
\nabla_{X}(T(\tilde{\omega}, \bar{Y}))=(\nabla_{X}T)(\tilde{\omega}, \bar{Y})+T(\nabla_{X}\tilde{\omega}, \bar{Y})+T(\tilde{\omega}, \nabla_{X}\bar{Y})

  • \nabla_{fX}T=f \nabla_X T
    where f is a smooth function on M.


Consider a curve \gamma of the congruence of X passing through the point P on M and ask: does this curve belong to the congruence of the vector field fX? The answer is yes, but a reparameterization of \gamma is necessary (unless f is identically 1 along the curve).
Consider \gamma in a chart (U, x), with P\in U: if M is a d-dimensional manifold, then x\circ\gamma has d components in R^d: \gamma^i with i going from 1 to d.
We know that \frac{d\gamma^i}{d\lambda}=\frac{d(x\circ\gamma)^i}{d\lambda}=X^i(P), where \gamma(\lambda)=P. We can thus interpret the components of X in (U, x) as the velocity, evaluated in that chart, of a point running along the curve according to the law \gamma(\lambda), with \lambda considered as a time variable. Multiplying X by f at each point of the curve requires the point running along the curve to adapt its velocity to the new value at each point: intuitively, this can always be done. But we can examine the question more precisely, as follows.
If we call \tilde\gamma=x\circ\gamma the representation of \gamma in the chart (U, x), does a reparameterization of \gamma (which I call \mu(\alpha)) exist such that (\tilde\gamma\circ\mu)^{i\,\prime}(\alpha)=((fX)\circ x^{-1})^{i}(\tilde\gamma(\mu(\alpha))), with \gamma(\mu(\alpha)) spanning the whole curve?
Because (\tilde\gamma\circ\mu)^{i\,\prime}(\alpha)=\frac{d\tilde\gamma^{i}}{d\lambda}(\mu(\alpha)) \frac{d\mu}{d\alpha}(\alpha), and \frac{d\tilde\gamma^{i}}{d\lambda}(\mu(\alpha)) is the i-th component of X at \gamma(\mu(\alpha)), this is equivalent to asking that \frac{d\mu}{d\alpha}(\alpha)=f(\gamma(\mu(\alpha))) for each \alpha. Given the initial condition \mu(\alpha_0)=\lambda_0, with \gamma(\lambda_0)=P and P\in U, this is a first-order ordinary differential equation for \mu with a smooth right-hand side, so it admits a solution (at least locally around \alpha_0).

So \nabla_{fX}T(P) = \lim_{\epsilon \rightarrow 0}\frac{T_{\alpha_0+\epsilon}(\alpha_0)-T(\alpha_0)}{\epsilon}=\lim_{\epsilon \rightarrow 0}\frac{T_{\lambda_0+(\mu(\alpha_0+\epsilon)-\mu(\alpha_0))}(\lambda_0)-T(\lambda_0)}{\mu(\alpha_0+\epsilon)-\mu(\alpha_0)}\frac{\mu(\alpha_0+\epsilon)-\mu(\alpha_0)}{\epsilon}=
=\nabla_{X}T(P) \mu^{'}(\alpha_0)=\nabla_{X}T(P) f(P).
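The reparameterization argument can be illustrated on a very simple case. This is just a sketch with hypothetical data chosen by me: M = R with the identity chart, X = d/dx (so \gamma(\lambda) = \lambda through P = 0) and f(x) = 1 + x^2. The equation \frac{d\mu}{d\alpha}=f(\gamma(\mu(\alpha))) is integrated with a classical Runge-Kutta scheme; for this particular f it also has the closed-form solution \mu(\alpha) = \tan(\alpha), which is used only as a cross-check.

```python
import math

# Hypothetical data: M = R with the identity chart, X = d/dx, f(x) = 1 + x^2.
def gamma(lam):
    """Integral curve of X = d/dx through P = 0: gamma(lam) = lam."""
    return lam

def f(x):
    return 1.0 + x * x

def solve_mu(alpha_end, steps=2000, alpha0=0.0, lam0=0.0):
    """Integrate d(mu)/d(alpha) = f(gamma(mu)), mu(alpha0) = lam0, with RK4."""
    h = (alpha_end - alpha0) / steps
    mu = lam0
    rhs = lambda m: f(gamma(m))
    for _ in range(steps):
        k1 = rhs(mu)
        k2 = rhs(mu + 0.5 * h * k1)
        k3 = rhs(mu + 0.5 * h * k2)
        k4 = rhs(mu + h * k3)
        mu += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return mu

alpha = 0.7
mu_num = solve_mu(alpha)
mu_exact = math.tan(alpha)   # closed form of mu' = 1 + mu^2, mu(0) = 0

# Velocity of the reparameterized curve gamma(mu(alpha)), by central difference.
eps = 1e-4
velocity = (gamma(solve_mu(alpha + eps)) - gamma(solve_mu(alpha - eps))) / (2 * eps)
# Component of fX at the same point: f * X^1, with X^1 = 1 in this chart.
fX_at_point = f(gamma(mu_num)) * 1.0
print(mu_num, mu_exact, velocity, fX_at_point)
```

The reparameterized curve \gamma\circ\mu runs over the same points as \gamma, but its velocity at each point is f times the original one, as required for an integral curve of fX.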

The properties listed so far derive directly from the definition of the affine connection given before, based on the concept of transport of a tensor along a curve (an element of the congruence of a vector field). But now we ask the connection to satisfy 2 additional properties, which are:

  • \nabla_{X}(T+S)=\nabla_{X}(T)+\nabla_{X}(S)

which means that summing T and S at \gamma(\lambda_0 + \epsilon) and then transporting the result to P (where P is the point reached by the integral curve of X at \lambda_0) is equivalent to transporting T and S separately from \gamma(\lambda_0 + \epsilon) to P and then summing them.

  • \nabla_{X+Y}T=\nabla_{X}T+\nabla_{Y}T

This property, together with \nabla_{fX}T=f \nabla_X T, makes the affine connection linear with respect to the vector field along which it is taken:

\nabla_{fX+gY}T=f\nabla_{X}T+g\nabla_{Y}T

f and g being smooth functions on M.

Using all these properties, it’s possible and very interesting to find how the \nabla operator acts on a vector field, a covector field and a more general tensor field, respectively.

1.2 Affine Connection of a Vector field Y:

\nabla_XY=\nabla_{X^m\frac{\partial }{\partial x^m}}(Y^i\frac{\partial }{\partial x^i})=X^m\nabla_{\frac{\partial }{\partial x^m}}(Y^i\frac{\partial }{\partial x^i})=
=X^m\nabla_{\frac{\partial }{\partial x^m}}(Y^i)\frac{\partial }{\partial x^i}+X^mY^i\nabla_{\frac{\partial }{\partial x^m}}(\frac{\partial }{\partial x^i})

Since Y^i is a function on M (or at least on an open set U of it):

X^m\nabla_{\frac{\partial }{\partial x^m}}(Y^i)=X^m\frac{\partial Y^i}{\partial x^m}=X(Y^i)

But what about \nabla_{\frac{\partial }{\partial x^m}}(\frac{\partial }{\partial x^i})? The \nabla operator transforms a tensor field into another of the same type, so the result is again a vector field, which can be expanded on the chart-induced basis: \nabla_{\frac{\partial }{\partial x^m}}(\frac{\partial }{\partial x^i})=\Gamma^q_{\;im}\; \frac{\partial }{\partial x^q}

Therefore:
\nabla_XY=X(Y^i)\;\frac{\partial }{\partial x^i}+X^mY^i\;\Gamma^q_{\;im}\; \frac{\partial }{\partial x^q}          (1.0)

Looking at the i-th component:
(\nabla_XY)^i=X(Y^i)+X^mY^j\;\Gamma^i_{\;jm} , where j and m run from 1 to d.
The \Gamma^i_{\;jm} are the connection coefficient functions, and they entirely define the affine connection. They are functions of the point of the manifold at which we apply the \nabla operator, and they depend on the chosen chart.
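To see the component formula at work, here is a small numerical sketch. It relies on an assumption that is not derived at this point of the notes: I take the usual flat connection on the plane, whose non-zero coefficients in the polar chart (r, \theta) are \Gamma^r_{\;\theta\theta}=-r and \Gamma^\theta_{\;r\theta}=\Gamma^\theta_{\;\theta r}=1/r. The field Y is the Cartesian field \frac{\partial}{\partial x} written in polar components, and X is an arbitrary field chosen only for illustration.

```python
import math

# Assumption (not derived at this point of the notes): flat connection on the
# plane in the polar chart (x^0, x^1) = (r, theta).
def gamma_coeff(i, j, m, p):
    """Gamma^i_{jm} at the point p = (r, theta), index order as in the text."""
    r, _ = p
    if (i, j, m) == (0, 1, 1):
        return -r
    if (i, j, m) in ((1, 0, 1), (1, 1, 0)):
        return 1.0 / r
    return 0.0

def partial(g, p, m, h=1e-6):
    """Central finite difference of g along the coordinate x^m at p."""
    up = list(p); up[m] += h
    dn = list(p); dn[m] -= h
    return (g(tuple(up)) - g(tuple(dn))) / (2 * h)

def cov_derivative(X, Y, p):
    """Components (nabla_X Y)^i = X(Y^i) + X^m Y^j Gamma^i_{jm} at p."""
    Xp, Yp = X(p), Y(p)
    out = []
    for i in range(2):
        x_of_yi = sum(Xp[m] * partial(lambda q: Y(q)[i], p, m) for m in range(2))
        gamma_term = sum(Xp[m] * Yp[j] * gamma_coeff(i, j, m, p)
                         for j in range(2) for m in range(2))
        out.append(x_of_yi + gamma_term)
    return out

def Y(p):
    """The Cartesian field d/dx written in polar components (Y^r, Y^theta)."""
    r, th = p
    return (math.cos(th), -math.sin(th) / r)

def X(p):
    """An arbitrary illustrative field (X^r, X^theta)."""
    r, th = p
    return (r * th, 1.0 + r)

p = (2.0, 0.8)
# Derivative of the components alone, without the Gamma term.
naive = [sum(X(p)[m] * partial(lambda q: Y(q)[i], p, m) for m in range(2))
         for i in range(2)]
nabla_XY = cov_derivative(X, Y, p)
print(naive)      # not zero: the polar components of d/dx change from point to point
print(nabla_XY)   # zero up to rounding: d/dx is constant for the flat connection
```

The first output is the chart-dependent quantity warned about at the beginning of this section, while the second shows how the \Gamma term compensates for the variation of the basis vectors.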

How can \Gamma^i_{\;jm} be computed?

\Gamma^i_{\;jm}=dx^i(\nabla_{\frac{\partial}{\partial x^m}}\frac{\partial}{\partial x^j}) , indeed:

choosing X=\frac{\partial}{\partial x^m} and Y=\frac{\partial}{\partial x^j} in eq. 1.0 we have that:
– Y^i=\delta^i_{\;j} , so X(Y^i)=\frac{\partial Y^i}{\partial x^m} = 0;
– in X^mY^i\;\Gamma^q_{\;im}\; \frac{\partial }{\partial x^q} , Y^i = \delta^i_{\;j} and X^m=1 so it reduces to \Gamma^q_{\;jm}\; \frac{\partial }{\partial x^q}.

Finally: \nabla_{\frac{\partial}{\partial x^m}}\frac{\partial}{\partial x^j} = \Gamma^q_{\;jm}\; \frac{\partial }{\partial x^q}

Consequently dx^i(\nabla_{\frac{\partial}{\partial x^m}}\frac{\partial}{\partial x^j}) = dx^i(\Gamma^q_{\;jm}\; \frac{\partial }{\partial x^q}) = \Gamma^q_{\;jm}\;dx^i(\frac{\partial }{\partial x^q}) =
= \Gamma^q_{\;jm}\;\frac{\partial x^i}{\partial x^q} = \Gamma^q_{\;jm}\;\delta^i_{\;q} = \Gamma^i_{\;jm}

So the result is proved.

We will discover in a moment that the \Gamma^i_{\;jm} fully define not only how \nabla_X acts on a vector field, but also how it acts on any type of tensor field.

1.3 Affine Connection of a Covector field \omega:


2. Riemann Tensor

It is defined as follows:

Riem(\omega, Z, X, Y)=\omega(\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]}Z)

where \omega is a covector field, while X, Y and Z are vector fields. [X, Y] is the commutator of X and Y:
[X, Y] = XY - YX .

It’s a (1, 3) tensor and its components are Riem^i_{\;\;\;jab}.

But wait, \nabla_{[X, Y]} makes sense only if [X, Y] is a vector field. It’s not difficult to see that it is, by applying this operator to a smooth function f on M:
[X, Y](f) = [X^i \frac{\partial}{\partial x^i}, Y^j \frac{\partial}{\partial x^j}](f) = X^i \frac{\partial}{\partial x^i}(Y^j \frac{\partial f}{\partial x^j}) - Y^j \frac{\partial}{\partial x^j}(X^i \frac{\partial f}{\partial x^i}) =
= X^i \frac{\partial Y^j}{\partial x^i}\frac{\partial f}{\partial x^j} + X^i Y^j \frac{\partial^2 f}{\partial x^i \partial x^j} - Y^j \frac{\partial X^i}{\partial x^j}\frac{\partial f}{\partial x^i} - Y^j X^i \frac{\partial^2 f}{\partial x^j \partial x^i} ,
the second and the fourth terms cancel each other (partial derivatives commute), and so (renaming indices in the first term):
[X, Y](f) = X^j \frac{\partial Y^i}{\partial x^j}\frac{\partial f}{\partial x^i} - Y^j \frac{\partial X^i}{\partial x^j}\frac{\partial f}{\partial x^i} = [(X^j \frac{\partial Y^i}{\partial x^j} - Y^j \frac{\partial X^i}{\partial x^j})\frac{\partial}{\partial x^i}](f)
So [X, Y] acts on f as a first-order differential operator, that is, as a vector field with components [X, Y]^i = X^j \frac{\partial Y^i}{\partial x^j} - Y^j \frac{\partial X^i}{\partial x^j}.
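The cancellation of the second-order terms can also be checked numerically. In this sketch the two vector fields (a rotation generator and a translation on R^2), the test function f and the point p are hypothetical choices of mine, and derivatives are approximated by central finite differences, so the comparison holds only up to a small tolerance.

```python
import math

def partial(g, p, i, h=1e-3):
    """Central finite difference of the scalar function g along x^i at p."""
    up = list(p); up[i] += h
    dn = list(p); dn[i] -= h
    return (g(tuple(up)) - g(tuple(dn))) / (2 * h)

def apply(V, g):
    """Return the function V(g): p -> V^i(p) * dg/dx^i(p)."""
    return lambda p: sum(V(p)[i] * partial(g, p, i) for i in range(2))

def commutator_components(X, Y, p):
    """[X, Y]^i = X^j dY^i/dx^j - Y^j dX^i/dx^j, evaluated at p."""
    return [
        sum(X(p)[j] * partial(lambda q: Y(q)[i], p, j)
            - Y(p)[j] * partial(lambda q: X(q)[i], p, j) for j in range(2))
        for i in range(2)
    ]

# Hypothetical fields on R^2 in the Cartesian chart (x, y).
X = lambda p: (-p[1], p[0])      # X = -y d/dx + x d/dy
Y = lambda p: (1.0, 0.0)         # Y = d/dx
f = lambda p: math.sin(p[0]) * p[1] + p[0] * p[1] ** 2

p = (0.4, -1.1)
# Left-hand side: X(Y f) - Y(X f), which involves second derivatives of f.
lhs = apply(X, apply(Y, f))(p) - apply(Y, apply(X, f))(p)
# Right-hand side: the first-order operator [X, Y]^i d/dx^i applied to f.
Z = commutator_components(X, Y, p)
rhs = sum(Z[i] * partial(f, p, i) for i in range(2))
print(Z, lhs, rhs)
```

For these two fields the component formula gives [X, Y] = -\frac{\partial}{\partial y}, and the two ways of computing [X, Y](f) agree within the finite-difference error.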

If we consider X, Y and Z as given vector fields and \omega as a variable, we can write:

Riem(\bullet, Z, X, Y)=\bullet(\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]}Z)

both sides of the equation being linear maps of the argument \omega at any point of the manifold. But a linear map from V^* to R is just a vector (for a finite-dimensional vector space V).

The last equation can be rearranged as follows:

\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z = Riem(\bullet, Z, X, Y) + \nabla_{[X, Y]}Z

but Riem(\bullet, Z, X, Y) is a vector field and so it can be written as:
Riem(\bullet, Z, X, Y) = R^m \frac{\partial}{\partial x^m} .

If we choose X and Y as 2 elements of a chart-induced basis:
X = \frac{\partial}{\partial x^a} and Y = \frac{\partial}{\partial x^b} , then [X, Y] = [\frac{\partial}{\partial x^a}, \frac{\partial}{\partial x^b}] = \frac{\partial^2}{\partial x^a \partial x^b} - \frac{\partial^2}{\partial x^b \partial x^a} = 0.

Then \nabla_{[X, Y]}Z = \nabla_0 Z = 0, because the integral curves of the null vector field are just points, so there is no transport of the vector field Z along them.

On the other hand (R^i \frac{\partial}{\partial x^i})(dx^m) = R^i (\frac{\partial x^m}{\partial x^i}) = R^i \delta^m_i = R^m.

But (R^i \frac{\partial}{\partial x^i})(dx^m) = Riem(dx^m, Z, \frac{\partial}{\partial x^a}, \frac{\partial}{\partial x^b}) =
= Riem(dx^m, Z^j \frac{\partial}{\partial x^j}, \frac{\partial}{\partial x^a}, \frac{\partial}{\partial x^b}) = Riem^m_{\;\;\;jab}\;Z^j .

Finally, if we abbreviate \nabla_{\frac{\partial}{\partial x^a}} into \nabla_a, then the m-th component of the rearranged equation reads:

(\nabla_a \nabla_b Z - \nabla_b \nabla_a Z)^m = Riem^m_{\;\;\;jab}\;Z^j

And we’ll see that \nabla_a \nabla_b - \nabla_b \nabla_a gives a measure of the curvature of the manifold at any point. This is why the Riemann tensor is a curvature tensor.
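As a concrete illustration, the components Riem^m_{\;\;\;jab} can be computed from the connection coefficients by expanding \nabla_a \nabla_b \frac{\partial}{\partial x^j} - \nabla_b \nabla_a \frac{\partial}{\partial x^j} with the rule \nabla_{\frac{\partial}{\partial x^m}}\frac{\partial}{\partial x^j} = \Gamma^q_{\;jm}\frac{\partial}{\partial x^q} used in these notes. The sketch below rests on assumptions that are not derived here: the chart (\theta, \phi) on the unit 2-sphere and the standard coefficients \Gamma^\theta_{\;\phi\phi}=-\sin\theta\cos\theta, \Gamma^\phi_{\;\theta\phi}=\Gamma^\phi_{\;\phi\theta}=\cot\theta of its usual metric connection. Derivatives are again finite differences.

```python
import math

# Assumptions (not derived in the notes so far): chart (x^0, x^1) = (theta, phi)
# on the unit 2-sphere, with the standard coefficients of its metric connection.
def gamma_coeff(i, j, m, p):
    """Gamma^i_{jm} at p = (theta, phi), index order as in the text."""
    th, _ = p
    if (i, j, m) == (0, 1, 1):
        return -math.sin(th) * math.cos(th)
    if (i, j, m) in ((1, 0, 1), (1, 1, 0)):
        return math.cos(th) / math.sin(th)
    return 0.0

def partial(g, p, a, h=1e-5):
    """Central finite difference of g along the coordinate x^a at p."""
    up = list(p); up[a] += h
    dn = list(p); dn[a] -= h
    return (g(tuple(up)) - g(tuple(dn))) / (2 * h)

def riem(m, j, a, b, p):
    """Riem^m_{jab} at p, from nabla_a nabla_b d_j - nabla_b nabla_a d_j.

    Using nabla_a (Gamma^q_{jb} d_q) = (d_a Gamma^m_{jb}) d_m
                                       + Gamma^q_{jb} Gamma^m_{qa} d_m.
    """
    d_a = partial(lambda q: gamma_coeff(m, j, b, q), p, a)
    d_b = partial(lambda q: gamma_coeff(m, j, a, q), p, b)
    quad = sum(gamma_coeff(m, s, a, p) * gamma_coeff(s, j, b, p)
               - gamma_coeff(m, s, b, p) * gamma_coeff(s, j, a, p)
               for s in range(2))
    return d_a - d_b + quad

p = (1.0, 0.3)
R_th_ph_th_ph = riem(0, 1, 0, 1, p)   # Riem^theta_{phi theta phi}
R_ph_th_th_ph = riem(1, 0, 0, 1, p)   # Riem^phi_{theta theta phi}
print(R_th_ph_th_ph, math.sin(p[0]) ** 2, R_ph_th_th_ph)
```

With these coefficients the first component equals \sin^2\theta and the second equals -1, and every component changes sign when a and b are swapped, as expected from the definition.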

Suppose X and Y are 2 vector fields that commute ([X, Y] = 0) on the manifold or on an open set of it. As seen before, Riem(\bullet, Z, X, Y)=\bullet(\nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z) and now we try to see how the term \nabla_X \nabla_Y - \nabla_Y \nabla_X can be interpreted geometrically.
The next figure shows the reasoning:

It is based on the fact that the transport through the affine connection of the sum of 2 tensor fields is equal to the sum of the tensor fields transported separately, as shown in the next figure:

Auto-parallel transported Curve

Metric Tensor, Speed and Length of a curve


Parallel transport and Covariant derivative

Deriving Christoffel Symbols from the Metric Tensor (with g_{\mu\nu;\beta} = 0)

First Step: symmetry of Christoffel Symbols with respect to the low indices

Second Step: deriving Christoffel Symbols in the case g_{\mu\nu;\beta} = 0

Proof of the Local-Flatness Theorem