6. Relativistic quantum mechanics#

So far, our quantum mechanical descriptions have been of massive particles with velocities much lower than the speed of light. This formalism gets us a long way: we can use it to describe all of chemistry, and, by extension, it has many applications in biology. Nonrelativistic quantum mechanics also gives us basic physics insights, like in the emission spectra of atoms, and engineering applications including lasers, NMR, qubits, and quantum computers. However, it intrinsically cannot give us a description of light, even though light does have an innate quantum nature, and it was the quantum nature of light that triggered the quantum revolution. Because light, inevitably, travels at the speed of light, we will need to include relativistic effects in our theory if we want it to describe light (and all processes where light interacts with matter), not just as a correction like in our discussion of the fine structure of hydrogen in Section 4.3, but at the basis of our theory. In this theory, we will combine quantum mechanics with the special theory of relativity. Its ultimate form, the Standard Model of particle physics, is very powerful and extremely accurate, combining three of the four known fundamental forces. It is however not complete: gravity isn’t part of the theory, and at present nobody knows how to integrate the Standard Model with the general theory of relativity.

6.1. The central equation of relativistic quantum mechanics#

6.1.1. The Klein-Gordon equation#

A first attempt at constructing a base equation for relativistic quantum mechanics could be to ‘quantize’ the special theory of relativity. This attempt is based on the observation that the Schrödinger equation, after a fashion, can be seen as the ‘quantization’ of the classical equation for conservation of energy:

(6.1)#\[ E = K + V = \frac{p^2}{2m} + V. \]

To turn equation (6.1) into a quantum one, we apply the same procedure we used to arrive at quantum-mechanical analogs of the angular momentum (see Section 3.1): we replace the momentum and energy with quantum operators:

(6.2)#\[ \bm{p} \to -i \hbar \bm{\nabla} \qquad \text{and} \qquad E \to i \hbar \frac{\partial}{\partial t}. \]

We also replace the potential \(V\) with the potential energy operator \(\hat{V}\). If we make these substitutions in equation (6.1) and then have both sides act on a wave function \(\Psi(\bm{x}, t)\), we indeed arrive at the Schrödinger equation:

(6.3)#\[ i \hbar \frac{\partial \Psi}{\partial t} = - \frac{\hbar^2}{2m} \nabla^2 \Psi + \hat{V} \Psi. \]

Note that this procedure does not give us a true ‘derivation’ of the Schrödinger equation (we still need Axiom 1.2), as the ‘quantization’ recipe in equation (6.2) follows from the Schrödinger equation. However, it does give us an idea about how we could extend quantum mechanics to relativistic systems, as we know that in relativity, the energy equation gets an extra term (see equation (4.28)):

(6.4)#\[ E^2 = m^2 c^4 + p^2 c^2. \]

Giving equation (6.4) the same ‘quantization treatment’ as equation (6.1), we arrive at

(6.5)#\[ -\frac{1}{c^2} \frac{\partial^2 \psi}{\partial t^2} + \nabla^2 \psi = \frac{m^2 c^2}{\hbar^2} \psi. \]

Equation (6.5) is known as the Klein-Gordon equation. Unlike the Schrödinger equation, it is a proper wave equation, and it is the correct relativistic quantum equation for spin-\(0\) particles. Unfortunately, however, the particles we are most interested in (such as the electron) all have nonzero spin, and we need a more general form.
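
As a quick sanity check, we can verify symbolically (a sketch using sympy; the symbol names are illustrative) that a one-dimensional plane wave solves the Klein-Gordon equation precisely when the relativistic dispersion relation \((\hbar\omega)^2 = (mc^2)^2 + (\hbar k c)^2\) holds:

```python
# Symbolic check that psi = exp(i(k x - omega t)) solves the
# Klein-Gordon equation (6.5) exactly on the relativistic mass shell.
import sympy as sp

x, t, k, w, m, c, hbar = sp.symbols('x t k omega m c hbar', positive=True)
psi = sp.exp(sp.I * (k * x - w * t))

# Left- and right-hand sides of equation (6.5), one-dimensional version.
lhs = -sp.diff(psi, t, 2) / c**2 + sp.diff(psi, x, 2)
rhs = (m * c / hbar)**2 * psi

# The residual vanishes exactly when omega satisfies the dispersion relation.
residual = sp.simplify((lhs - rhs) / psi)
on_shell = residual.subs(w, sp.sqrt(m**2 * c**4 + hbar**2 * k**2 * c**2) / hbar)
print(sp.simplify(on_shell))  # 0
```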

6.1.2. Four-vectors#

In equation (6.4), the momentum \(p\) is the ‘three-momentum’, the (length of) the classical momentum vector \(\bm{p}\). In relativity theory, we work with four-vectors, which describe relativistic quantities in the four dimensions of spacetime, reflecting the relativistic notion that time is no longer ‘just’ a parameter, but a dimension, and transformations from one (inertial) frame of reference to another affect both the spatial and temporal coordinates. In short[1], four-vectors get an extra (‘zeroth’) component, which for the position vector represents the time, and for the momentum vector the energy (times factors of \(c\), the speed of light, which is a universal constant):

(6.6)#\[\begin{split} \bm{\bar{x}} = \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad \text{and} \qquad \bm{\bar{p}} = \begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix} = \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix}. \end{split}\]

When making a transformation from one inertial frame to another (e.g. from that of an observer on a platform, to that of an observer on a train moving at constant velocity), the coordinates change according to the Lorentz transformations, \(\bm{\bar{x}}' = \bm{L} \bm{\bar{x}}\), which can be expressed in matrix form:

(6.7)#\[\begin{split} \begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma(u) & - \gamma(u) \frac{u}{c} & 0 & 0 \\ - \gamma(u) \frac{u}{c} & \gamma(u) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \end{split}\]

for a transformation between a stationary frame \(S\) with coordinates \(\bm{x}\) and a frame \(S'\) with coordinates \(\bm{x}'\) moving in the positive \(x\) direction with speed \(u\) with respect to frame \(S\). Here \(\gamma(u)\) is the contraction factor from special relativity,

(6.8)#\[ \gamma(u) = \frac{1}{\sqrt{1-(u/c)^2}}. \]

Relativistic four-vectors form a space which is close but not equal to \(\mathbb{R}^4\), as they have an inner product[2] which is defined[3] differently[4]:

(6.9)#\[ \bm{\bar{x}} \cdot \bm{\bar{y}} = x^0 y^0 - x^1 y^1 - x^2 y^2 - x^3 y^3. \]

An easy calculation shows that the squared ‘length’ of a four-vector (the quantity \(\bm{\bar{x}} \cdot \bm{\bar{x}}\)), and by extension, the inner product between any two four-vectors, is invariant under Lorentz transformations.
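
This invariance is easy to check numerically; the following sketch (with arbitrary illustrative vectors) applies the boost matrix of equation (6.7) and compares inner products:

```python
import numpy as np

def boost(u, c=1.0):
    """Lorentz boost matrix along x, equation (6.7)."""
    g = 1.0 / np.sqrt(1.0 - (u / c)**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * u / c
    return L

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # metric tensor, equation (6.11)

def minkowski(x, y):
    """Minkowski inner product, equation (6.9)."""
    return x @ eta @ y

rng = np.random.default_rng(0)
x, y = rng.normal(size=4), rng.normal(size=4)
L = boost(0.6)  # boost with u = 0.6 c

# The inner product is the same before and after the boost.
print(np.isclose(minkowski(L @ x, L @ y), minkowski(x, y)))  # True
```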

To distinguish between three-vectors and four-vectors, components of three-vectors are indicated with Roman indices like \(p_i\), while those of four-vectors are indicated with Greek ones, like \(p^\mu\). The upper index represents a (standard) column-vector like configuration (known as the contravariant components of the vector). We also have a version with a lower index, corresponding to a row-vector like configuration (known as the covariant components); the inner product can be written as \(\bm{\bar{x}} \cdot \bm{\bar{y}} = x_\mu y^\mu = x^\mu y_\mu\), where the sum over \(\mu\) (ranging from \(0\) to \(3\)) is implicit[5]. Within special relativity, the covariant components of a vector are the same as the contravariant ones, except for a minus sign on the space components: \(x_0 = x^0\), but \(x_i = - x^i\). We can summarize these relations using the metric tensor, which for special relativity is usually written as \(\eta^{\mu \nu}\) (and its inverse, \(\eta_{\mu \nu}\)), to distinguish from the general relativity version \(g^{\mu \nu}\). Using the Einstein summation convention, the metric tensor is defined through

(6.10)#\[ \bm{\bar{x}} \cdot \bm{\bar{y}} = x^\mu y_\mu = x_\mu y^\mu = \eta_{\mu \nu} x^\mu y^\nu = \eta^{\mu \nu} x_\mu y_\nu, \]

from which we can read off that

(6.11)#\[ \eta^{00} = 1, \quad \eta^{11} = \eta^{22} = \eta^{33} = -1, \quad \eta^{\mu \nu} = 0 \;\;\text{if}\;\; \mu \neq \nu. \]

The coefficients of \(\eta_{\mu \nu}\) are the same as those of \(\eta^{\mu \nu}\).

Derivatives can be taken with respect to any of the four components of the position (or time-position) vector; in short-hand notation, we have[6]

(6.12)#\[\begin{align*} \partial_\mu f &= \frac{\partial f}{\partial x^\mu} \end{align*}\]
(6.13)#\[\begin{align*} \square f = \partial^\mu \partial_\mu f &= \frac{1}{c^2} \frac{\partial^2 f}{\partial t^2} - \nabla^2 f. \end{align*}\]

Equation (6.12) thus generalizes the partial derivative, and (6.13) the Laplacian; the operator \(\square\) is known as the d’Alembertian.

In terms of four-vectors, we can re-write equation (6.4) in (even) more concise form:

(6.14)#\[ \bm{\bar{p}} \cdot \bm{\bar{p}} = m^2 c^2. \]

If we now apply the ‘quantization recipe’ of equation (6.2) to our four-vectors, we get

(6.15)#\[\begin{split}\begin{align*} p_\mu &\to i \hbar \partial_\mu = i \hbar \frac{\partial}{\partial x^\mu}\\ p_0 &\to i \hbar \partial_0 = \frac{i\hbar}{c} \frac{\partial}{\partial t} \qquad \text{and} \qquad p_i \to i \hbar \partial_i = i \hbar \frac{\partial}{\partial x^i}. \end{align*}\end{split}\]

Unsurprisingly, just substituting the ‘quantization’ of the four-momentum in equation (6.14) and having it act on a wave function again gives us the Klein-Gordon equation, albeit in more concise form:

(6.16)#\[ -\hbar^2 \square \psi = -\hbar^2 \partial^\mu \partial_\mu \psi = m^2 c^2 \psi. \]

However, you might now guess where things go wrong: rather than ‘applying’ \(\bm{\bar{p}} \cdot \bm{\bar{p}}\), which only gives us the magnitude of the four-momentum, we’ll want each individual component, and will therefore have to ‘factorize’ equation (6.14) to get a more detailed view.

6.1.3. The Dirac equation#

To motivate why we’d want to factorize equation (6.14), let’s consider the case of a stationary particle[7], for which the three-momentum is zero, and we only have one nonzero component of the four-momentum, \(p^0\), directly related to its energy. In that case, equation (6.14) simplifies to

(6.17)#\[ 0 = p^0 p_0 - m^2 c^2 = (p^0)^2 - m^2 c^2 = (p^0 + mc)(p^0 - mc), \]

where we used that \(p_0 = p^0\) (i.e., the zeroth components of the covariant and contravariant representations are identical) in special relativity. We find that we have two solutions: either \(p^0 = mc\) or \(p^0 = -mc\). As \(p^0 = E/c\) is (proportional to) the energy of our particle, classically we’d dismiss the second, negative-energy solution, but as we’ll see below, we will in fact always get two solutions in relativistic quantum mechanics.

Unfortunately, the factorization for a moving particle is more involved, because with the extra components we get cross terms, and moreover the covariant and contravariant components are no longer identical. Introducing new coefficients \(\beta^\nu\) and \(\gamma^\lambda\) (so four each, for \(\nu, \lambda = 0, 1, 2, 3\)), we can formally proceed with the factorization:

(6.18)#\[\begin{split}\begin{align*} 0 &= \bm{\bar{p}} \cdot \bm{\bar{p}} - m^2 c^2 = p^\mu p_\mu - m^2 c^2 \\ &= \left( \beta^\nu p_\nu + mc \right) \left( \gamma^\lambda p_\lambda - mc \right) \\ &= \beta^\nu \gamma^\lambda p_\nu p_\lambda - mc \left(\beta^\nu - \gamma^\nu \right) p_\nu - m^2 c^2 \end{align*}\end{split}\]

Note that the first term in the last line of (6.18) is a sum over sixteen terms, and the second a sum over four terms. The second term however should vanish, as in the original sum in the first line of (6.18) there are no linear terms in the momentum. Therefore, we have \(\beta^\nu = \gamma^\nu\), and we’re left with four unknowns, the coefficients \(\gamma^\nu\), which satisfy \(p^\mu p_\mu = \gamma^\nu \gamma^\lambda p_\nu p_\lambda\). By writing out the four terms on the left and sixteen terms on the right of this equation, we get

(6.19)#\[\begin{split}\begin{align*} &\left(\gamma^0\right)^2 = 1, \qquad \left(\gamma^1\right)^2 = \left(\gamma^2\right)^2 = \left(\gamma^3\right)^2 = -1, \\ &\gamma^\nu \gamma^\lambda + \gamma^\lambda \gamma^\nu = 0 \qquad \text{if}\; \nu \neq \lambda. \end{align*}\end{split}\]

We can summarize equations (6.19) using the anticommutator, \(\{a, b\} = ab + ba\), and the metric tensor \(\eta^{\mu \nu}\):

(6.20)#\[ \left\{ \gamma^\mu, \gamma^\nu \right\} = 2 \eta^{\mu \nu}. \]

There is no solution of equations (6.20) in terms of ordinary numbers. However, there are solutions in which the coefficients \(\gamma^\mu\) are matrices. The smallest solutions are \(4 \times 4\) matrices, which can be expressed in terms of the \(2 \times 2\) identity matrix \(I_2\), the \(2 \times 2\) Pauli spin matrices \(\sigma^i\) (equation (3.50)), and the \(2 \times 2\) zero matrix \(0_2\):

(6.21)#\[\begin{split} \gamma^0 = \begin{pmatrix} I_2 & 0_2 \\ 0_2 & -I_2 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0_2 & \sigma^i \\ -\sigma^i & 0_2 \end{pmatrix}. \end{split}\]
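
A quick numerical check (a numpy sketch) confirms that these matrices, with \(\gamma^0 = \operatorname{diag}(I_2, -I_2)\) in the standard Dirac representation, indeed satisfy the anticommutation relations (6.20):

```python
import numpy as np

# Dirac-representation gamma matrices built from 2x2 blocks.
I2, Z2 = np.eye(2), np.zeros((2, 2))
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

gamma = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# Check {gamma^mu, gamma^nu} = 2 eta^{mu nu} I_4 for all sixteen pairs.
for mu in range(4):
    for nu in range(4):
        anti = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(anti, 2 * eta[mu, nu] * np.eye(4))
print("Clifford algebra (6.20) verified")
```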

With these matrices as the coefficients, we can finally factorize equation (6.18), ‘quantize’ the momenta \(p_\mu\), and have them act on a quantum function \(\psi\), which gives us the Dirac equation:

(6.22)#\[ i \hbar \gamma^\mu \partial_\mu \psi - m c \psi = 0, \]

where

(6.23)#\[\begin{split} \psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix} \end{split}\]

is known as the bispinor or Dirac spinor. Note that \(\psi\) is not a four-vector; it simply contains four pieces of information that can be cast in (regular) vector form.

6.1.4. Solutions to the Dirac equation#

To understand what the bispinor represents, it is helpful to look at a few simple solutions to the Dirac equation. These solutions are similar to the free particle solutions of the Schrödinger equation: they are solutions to the equation, but do not represent actual physical particles, as the solutions are not normalizable. Nonetheless, they give us some insight into the nature of the solutions, and like for the free particle, we can use them as a basis for solutions that are normalizable.

6.1.4.1. Particle at rest#

If a particle is at rest, its three-momentum \(\bm{p}\) is zero. In this case, the Dirac spinor is independent of position, i.e., all three spatial derivatives vanish:

(6.24)#\[ \frac{\partial \psi}{\partial x} = \frac{\partial \psi}{\partial y} = \frac{\partial \psi}{\partial z} = 0. \]

The Dirac equation now simplifies to a first-order differential equation in time:

(6.25)#\[ \frac{i \hbar}{c} \gamma^0 \frac{\partial \psi}{\partial t} - m c \psi = 0, \]

or

(6.26)#\[\begin{split} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \partial_t \psi_A \\ \partial_t \psi_B \end{pmatrix} = -i \frac{m c^2}{\hbar} \begin{pmatrix} \psi_A \\ \psi_B \end{pmatrix}, \end{split}\]

where \(\psi_A = \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}\) and \(\psi_B = \begin{pmatrix} \psi_3 \\ \psi_4 \end{pmatrix}\) are two spinors as we’ve encountered them in Section 3.4. Because equation (6.26) separates into two equations for \(\psi_A\) and \(\psi_B\), we can solve them separately, and find

(6.27)#\[ \psi_A(t) = \psi_A(0) \exp\left(-\frac{i mc^2}{\hbar}t \right), \qquad \psi_B(t) = \psi_B(0) \exp\left(\frac{i mc^2}{\hbar}t \right). \]

These solutions are identical in form to the stationary-state solutions of the Schrödinger equation, for particles that have (rest) energy \(E = mc^2\) (for \(\psi_A\)) and \(E = - mc^2\) (for \(\psi_B\)), respectively.
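
We can verify these phases directly (a symbolic sketch using sympy), by checking that they solve the two decoupled equations contained in (6.26):

```python
import sympy as sp

# The two rest-frame solutions of equation (6.27).
t, m, c, hbar = sp.symbols('t m c hbar', positive=True)
psi_A = sp.exp(-sp.I * m * c**2 * t / hbar)
psi_B = sp.exp(+sp.I * m * c**2 * t / hbar)

# Equation (6.26): +d/dt psi_A = rate * psi_A, -d/dt psi_B = rate * psi_B,
# with rate = -i m c^2 / hbar.
rate = -sp.I * m * c**2 / hbar
assert sp.simplify(sp.diff(psi_A, t) - rate * psi_A) == 0
assert sp.simplify(-sp.diff(psi_B, t) - rate * psi_B) == 0
print("rest-frame solutions verified")
```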

The Dirac equation thus predicts that there are two solutions: one with rest energy \(m c^2\), as we would expect from the theory of special relativity, and a second one with rest energy \(- mc^2\). To interpret this result, Dirac pictured the vacuum not as empty, but as filled exactly up to \(E = 0\) with particles, similar to how a metal is filled up to its Fermi energy with electrons. From this ‘sea’ of electrons, we can ‘free’ one (giving it a higher energy, allowing it to move around), but that act also creates a ‘hole’ with an effective positive charge in the remaining material. The hole moreover also moves around. If the electron and hole come together again, they can annihilate each other, returning the system to its initial state. Likewise, for every particle with mass \(m\) we ‘create’ from the vacuum, we need to open up a ‘hole’ with mass \(-m\). We can observe the hole, not as a particle with negative mass, but as an antiparticle with identical mass (it behaves like a particle with mass \(m\)) and opposite charge to the ‘regular’ particle.

The existence of antiparticles is a direct consequence of the Dirac equation, as published by Dirac in 1928 [Dirac, 1928], and realized in full by Oppenheimer in 1930 [Oppenheimer, 1930]. The first antiparticles, ‘anti-electrons’ (now known as positrons), were discovered as early as 1932 by Anderson [Anderson, 1932], earning him (half of) the 1936 Nobel prize in physics; Schrödinger and Dirac shared the 1933 Nobel prize.

6.1.4.2. Plane-wave solutions#

For a free particle, we found in Section 2.2.2 that the solution is a simple plane wave. This solution generalizes to the Dirac equation, with a bispinor that is a function of the position four-vector \(\bm{\bar{x}}\), and a wave described by a wave four-vector[8] \(\bm{\bar{k}}\):

(6.28)#\[ \psi(\bm{\bar{x}}) = \exp\left( - i \bm{\bar{k}} \cdot \bm{\bar{x}} \right) u(\bm{\bar{k}}). \]

To find \(u(\bm{\bar{k}})\), we substitute the solution (6.28) into the Dirac equation (6.22), which gives

(6.29)#\[\begin{split}\begin{align*} \hbar \gamma^\mu k_\mu e^{-i \bm{\bar{k}} \cdot \bm{\bar{x}}} u - m c e^{-i \bm{\bar{k}} \cdot \bm{\bar{x}}} u &= 0 \\ \left( \hbar \gamma^\mu k_\mu - m c \right) u &= 0. \end{align*}\end{split}\]

Note that in equation (6.29), the first term is a matrix, so to add it to the second term, we must multiply the second term by the identity matrix. Writing out the sum \(\gamma^\mu k_\mu\), we get

(6.30)#\[\begin{split} \gamma^\mu k_\mu = \gamma^0 k_0 - \bm{\gamma} \cdot \bm{k} = k_0 \begin{pmatrix} I_2 & 0_2 \\ 0_2 & -I_2 \end{pmatrix} - \bm{k} \cdot \begin{pmatrix} 0_2 & \bm{\sigma} \\ -\bm{\sigma} & 0_2 \end{pmatrix} = \begin{pmatrix} k_0 I_2 & - \bm{k} \cdot \bm{\sigma} \\ \bm{k} \cdot \bm{\sigma} & - k_0 I_2 \end{pmatrix}, \end{split}\]

and therefore

(6.31)#\[\begin{split}\begin{align*} 0 = \left( \hbar \gamma^\mu k_\mu - m c \right) u &= \begin{pmatrix} \hbar k_0 - mc & - \hbar \bm{k} \cdot \bm{\sigma} \\ \hbar \bm{k} \cdot \bm{\sigma} & - \hbar k_0 - mc \end{pmatrix} \begin{pmatrix} u_A \\ u_B \end{pmatrix} \\ &= \begin{pmatrix} (\hbar k_0 - m c) u_A - \hbar \bm{k} \cdot \bm{\sigma} u_B \\ -(\hbar k_0 + m c) u_B + \hbar \bm{k} \cdot \bm{\sigma} u_A \end{pmatrix}, \end{align*}\end{split}\]

from which we get (using that \((\bm{k} \cdot \bm{\sigma})^2 = \bm{k}^2 I_2\)):

(6.32)#\[\begin{split}\begin{align*} \left((\hbar k_0)^2 - (mc)^2\right) u_A &= (\hbar \bm{k} \cdot \bm{\sigma})^2 u_A = \hbar^2 \bm{k}^2 u_A, \\ \bm{\bar{k}} \cdot \bm{\bar{k}} &= k_0^2 - \bm{k} \cdot \bm{k} = \left(\frac{mc}{\hbar}\right)^2. \end{align*}\end{split}\]
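
The identity \((\bm{k} \cdot \bm{\sigma})^2 = \bm{k}^2 I_2\) used in the first line can be verified numerically (a numpy sketch with an arbitrary, illustrative \(\bm{k}\)):

```python
import numpy as np

# Pauli matrices, stacked so that sigma[i] is the i-th matrix.
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

rng = np.random.default_rng(1)
k = rng.normal(size=3)                          # arbitrary three-vector
k_dot_sigma = np.einsum('i,ijk->jk', k, sigma)  # k . sigma as a 2x2 matrix

# (k . sigma)^2 equals |k|^2 times the 2x2 identity.
assert np.allclose(k_dot_sigma @ k_dot_sigma, (k @ k) * np.eye(2))
print("(k.sigma)^2 = |k|^2 I_2 verified")
```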

Unsurprisingly, we retrieve the de Broglie relation for the energy (1.1) and momentum (1.2), now combined into a single four-momentum[9]:

(6.33)#\[ k^\mu = \pm \frac{p^\mu}{\hbar}. \]

Like for the stationary particle, we get two solutions, one with positive and one with negative energy, representing a particle and its antiparticle. Likewise, the bispinor again splits into two pieces, with the first two entries setting the spinor of the particle, and the last two that of the antiparticle, i.e., our solution set is given by

(6.34)#\[\begin{split} u = \begin{pmatrix} u_A^{(1)} \\ u_A^{(2)} \\ u_B^{(1)} \\ u_B^{(2)} \end{pmatrix}. \end{split}\]

As in nonrelativistic quantum mechanics, our solution contains a free scalar factor, which means that we can normalize it. There are multiple possible choices for the normalization. The simplest would be to set \(u^\dagger u = 1\), but you may also encounter \(u^\dagger u = 2E/c\) or \(u^\dagger u = E/mc^2\). Naturally the physics won’t depend on the choice of normalization.

Another common notation convention is to set \(u_A = u\) and \(u_B = v\); in this case, \(u\) and \(v\) are the spinors representing the particle and the antiparticle, respectively, which satisfy the momentum-space Dirac equations

(6.35)#\[ \left( \gamma^\mu p_\mu - mc \right) u = 0 \]

for the particles and

(6.36)#\[ \left( \gamma^\mu p_\mu + mc \right) v = 0 \]

for the antiparticles.

6.1.5. Electromagnetic field quantization: photons#

Light is an electromagnetic wave, and as such, a specific solution to the Maxwell equations for electrodynamics (cf problem). Just like we obtained the Klein-Gordon and Dirac equations by ‘quantizing’ the energy-momentum equation (6.4) of special relativity, we can get a quantized description of light by looking at the Maxwell equations through a quantum-mechanical lens. First however, we will rewrite the equations in relativistic form; because the equations are already compatible with special relativity, this really is only a different way of writing them down. In classical form, there are four Maxwell equations:

(6.37)#\[\begin{split}\begin{align*} \bm{\nabla} \cdot \bm{E} &= \frac{\rho}{\varepsilon_0}, \\ \bm{\nabla} \times \bm{E} + \frac{\partial \bm{B}}{\partial t} &=0, \\ \bm{\nabla} \cdot \bm{B} &= 0, \\ \bm{\nabla} \times \bm{B} - \mu_0 \varepsilon_0 \frac{\partial \bm{E}}{\partial t} &= \mu_0 \bm{J} , \end{align*}\end{split}\]

where \(\bm{E}\) and \(\bm{B}\) are the electric and magnetic fields, \(\rho\) is the electric charge density, \(\bm{J}\) the current density, and \(\varepsilon_0\) and \(\mu_0\) are the permittivity and permeability of free space, satisfying \(\mu_0 \varepsilon_0 = 1/c^2\).

To cast the Maxwell equations into relativistic form, we need to convert our fields, currents, and densities to four-vectors. For the charge and current density this is easy, as together they form a four-vector:

(6.38)#\[\begin{split} \bm{\bar{J}} = \begin{pmatrix} c\rho \\ J_x \\ J_y \\ J_z \end{pmatrix}. \end{split}\]

The electric and magnetic fields have less straightforward counterparts. Rather than extending them with a zeroth component, we need to combine them into an antisymmetric two-tensor, known as the field strength tensor \(\bm{\bar{F}}\), with components:

(6.39)#\[\begin{split} \bm{\bar{F}} = \begin{pmatrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & -B_z & B_y \\ E_y/c & B_z & 0 & -B_x \\ E_z/c & -B_y & B_x & 0 \end{pmatrix}. \end{split}\]

Next to the field strength tensor, there is another way of combining the fields into an antisymmetric tensor, known as the dual (field strength) tensor \(\bm{\bar{G}}\):

(6.40)#\[\begin{split} \bm{\bar{G}} = \begin{pmatrix} 0 & -B_x & -B_y & -B_z \\ B_x & 0 & E_z/c & -E_y/c \\ B_y & -E_z/c & 0 & E_x/c \\ B_z & E_y/c & -E_x/c & 0 \end{pmatrix}. \end{split}\]

The two homogeneous Maxwell equations (6.37)b and (6.37)c can together be written very succinctly in terms of the tensor \(\bm{\bar{G}}\):

(6.41)#\[ \partial_\mu G^{\mu\nu} = 0. \]

With the current density four vector and the field strength tensor, we can likewise rewrite the two inhomogeneous Maxwell equations (6.37)a and (6.37)d as a single equation:

(6.42)#\[ \partial_\mu F^{\mu\nu} = \mu_0 J^\nu. \]

From the antisymmetry of \(F^{\mu\nu}\), we also get the continuity equation, likewise in succinct form (see Exercise 6.4):

(6.43)#\[ \partial_\nu J^\nu = 0. \]

As you know, in electromagnetism both the electric and the magnetic field can be written in terms of a potential, respectively a scalar potential \(V\) and a vector potential \(\bm{A}\) (see Exercise 6.5). Unsurprisingly, they too combine into a four-vector potential:

(6.44)#\[\begin{split} \bm{\bar{A}} = \begin{pmatrix} V/c \\ A_x \\ A_y \\ A_z \end{pmatrix}. \end{split}\]

The field strength tensor \(\bm{\bar{F}}\) can be written in terms of the potential as

(6.45)#\[ F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \]
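
We can check equation (6.45) against the classical field definition \(\bm{E} = -\bm{\nabla}V - \partial\bm{A}/\partial t\) (a symbolic sketch using sympy, taking \(A^0 = V/c\) so that the components match (6.39); the function names are illustrative):

```python
import sympy as sp

# Coordinates and potentials as generic functions of (t, x, y, z).
t, x, y, z, c = sp.symbols('t x y z c', positive=True)
V = sp.Function('V')(t, x, y, z)
A = [sp.Function(f'A{i}')(t, x, y, z) for i in '123']

A4 = [V / c] + A          # contravariant four-potential A^mu = (V/c, A)
coords = [t, x, y, z]

def d_up(mu, f):
    """Contravariant derivative: d^0 = (1/c) d/dt, d^i = -d/dx^i."""
    return sp.diff(f, coords[mu]) / c if mu == 0 else -sp.diff(f, coords[mu])

# Field strength tensor, equation (6.45).
F = [[d_up(mu, A4[nu]) - d_up(nu, A4[mu]) for nu in range(4)]
     for mu in range(4)]

# Classical electric field E = -grad V - dA/dt; check F^{0i} = -E_i / c.
E = [-sp.diff(V, coords[i + 1]) - sp.diff(A[i], t) for i in range(3)]
for i in range(3):
    assert sp.simplify(F[0][i + 1] + E[i] / c) == 0
print("F^{0i} = -E_i/c verified")
```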

The advantage of introducing the potential is that the homogeneous Maxwell equations (6.41) are then always satisfied (see Exercise 6.5). The downside is that the potential contains an undetermined part, just like the scalar and vector potentials do in classical electrodynamics. It is easy to see that this is the case: simply add the four-gradient of any scalar function (i.e., a term \(\partial^\mu \lambda\), with \(\lambda\) an arbitrary scalar function) to the four-vector \(\bm{\bar{A}}\), and equation (6.45) remains true. A change in the potential which does not affect the fields is known as a gauge transformation. We can exploit this additional degree of freedom to make a convenient choice for the potential; physicists call this step a gauge choice. A commonly used one in relativistic electrodynamics is the Lorentz gauge:

(6.46)#\[ \partial_\mu A^\mu = 0. \]

In the Lorentz gauge, the remaining (inhomogeneous) Maxwell equations (6.42) become

(6.47)#\[ \square A^\mu = \partial^\nu \partial_\nu A^\mu = \mu_0 J^\mu. \]

While the Lorentz gauge eliminates part of the gauge freedom, it does not fix the potential completely: for any scalar function \(\lambda\) that satisfies the wave equation \(\square \lambda = 0\), the shift \(A^\mu \to A^\mu + \partial^\mu \lambda\) affects neither the fields nor the gauge condition (6.46). We can remove this additional degree of freedom by specifically fixing one of the components of \(\bm{\bar{A}}\). The downside of doing so is that we essentially select a fixed coordinate frame to be our inertial frame of reference; we therefore lose some of the manifest underlying symmetry. In practice however, there is always an ‘observer frame’, and we can work from there. The specific choice we will take here is known as the Coulomb gauge, which simply sets

(6.48)#\[ A^0 = 0. \]

The photon is simply the quantization (the particle representation) of the electromagnetic field. In particular, it satisfies the wave equation (6.47). In vacuum, this equation reduces to the Klein-Gordon equation for a massless particle:

(6.49)#\[ \square A^\mu = 0. \]

The solutions to this vacuum wave equation are plane waves again:

(6.50)#\[ A^\mu(\bm{\bar{x}}) = \exp(-i \bm{\bar{k}} \cdot \bm{\bar{x}}) \varepsilon^\mu(\bm{\bar{k}}), \]

where \(\bm{\bar{k}}\) is the wave four-vector, and \(\bm{\bar{\varepsilon}}\) the polarization vector, which tells us the polarization (i.e., the spin state) of the photon. Substituting the plane-wave solution into the wave equation (6.49), we find a constraint on the wave vector:

(6.51)#\[ 0 = k^\mu k_\mu, \]

so the wave vector has ‘length’ zero; writing \(k^0 = \omega/c\), this gives the relation \(\omega = c|\bm{k}|\), as it should for any electromagnetic wave. Moreover, as \(\bm{\bar{p}} = \hbar \bm{\bar{k}}\), we also have \(p^\mu p_\mu = 0\), so the particle has mass \(0\), again as it should. Finally, the gauge choices put some constraints on the polarization vector \(\bm{\bar{\varepsilon}}\). The Lorentz gauge gives

(6.52)#\[ k^\mu \varepsilon_\mu = 0, \]

giving us a linear relation between the four components of \(\bm{\bar{\varepsilon}}\). The Coulomb gauge moreover gives

(6.53)#\[ \varepsilon^0 = 0, \]

which removes one component from the polarization vector, and reduces equation (6.52) to a three-vector condition:

(6.54)#\[ \bm{k} \cdot \bm{\varepsilon} = 0. \]

By equation (6.54), the polarization three-vector \(\bm{\varepsilon}\) is always perpendicular to the photon’s direction of propagation (along \(\bm{k}\)); photons are thus transversely polarized. The two constraints combined leave us with two degrees of freedom: these of course are the familiar horizontal and vertical polarizations, or equivalently, the left- and right-handed circular polarizations of light. Although photons have spin 1, they thus have only two spin states, due to the constraint that their polarization has to be perpendicular to their direction of propagation; this constraint ultimately comes from the fact that they have no mass. This result holds more generally: massless particles with spin \(s\) have only two magnetic spin states, \(m_s = \pm s\), unlike their massive counterparts, which have \(2s + 1\) possible spin states[10].
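
As an illustration, we can construct an explicit pair of transverse polarization vectors for a given \(\bm{k}\) (a numpy sketch; the construction and the numbers are illustrative):

```python
import numpy as np

def polarization_basis(k):
    """Two orthonormal three-vectors perpendicular to k (eq. (6.54))."""
    k = np.asarray(k, dtype=float)
    khat = k / np.linalg.norm(k)
    # Any seed vector not (nearly) parallel to k works.
    seed = np.array([1.0, 0.0, 0.0])
    if abs(khat @ seed) > 0.9:
        seed = np.array([0.0, 1.0, 0.0])
    e1 = seed - (seed @ khat) * khat   # project out the k-component
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(khat, e1)            # completes a right-handed triad
    return e1, e2

k = np.array([0.3, -1.2, 2.0])  # illustrative propagation direction
e1, e2 = polarization_basis(k)
assert np.isclose(k @ e1, 0) and np.isclose(k @ e2, 0)  # transversality
assert np.isclose(e1 @ e2, 0)                           # orthogonal pair
print("two transverse polarizations constructed")
```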

6.2. Quantum electrodynamics#

As we discussed in Section 6.1.4 above, we can interpret the negative-energy solutions to the Dirac equation as positive-energy antiparticles. Each spin-½ particle then has a corresponding antiparticle, with identical mass and opposite charge. The antiparticle of the electron is the positron.

Electrodynamics is the study of the motion and interactions of charged particles. While many charged particles exist, electrons and protons are by far the most prevalent. Protons are about \(2000\) times heavier than electrons, and in matter are bound in nuclei, which (in a conducting solid) are fixed in a lattice; electrodynamics is therefore essentially the study of the motion and interactions of electrons.

In classical electrodynamics, the interactions between electrons are described by forces and fields. In quantum field theory, of which quantum electrodynamics is the simplest example, we quantize the fields, just like we quantized electromagnetic waves as photons above. Interactions between material particles (which are all fermions) are then mediated by force-carrying gauge bosons. In quantum electrodynamics (or QED), the only material particles are the electron and the positron; the force-carrying gauge boson is the photon.

Quantum field theories are used to calculate what happens when particles interact. There are two important processes that involve such interactions: scattering (when particles collide, or come close enough together that they interact with each other, for example through Coulomb repulsion) and decay (of atomic nuclei, or other composite particles). In QED, we only have scattering, as there are no composite particles that consist of only electrons and positrons.

6.2.1. Symmetry#

As in all physical theories, symmetries play an important role in quantum field theories. We will need two of them to understand the basics of field theory calculations.

6.2.1.1. CPT symmetry#

The charge, parity, and time (CPT) symmetry states that if we flip the sign of all charges in a system, mirror it in space, and reverse the direction of time, the behavior of the resulting system is exactly the same as that of the original[11]. The parity transformation can be either the flip of one spatial direction or (in three dimensions) of all three at the same time; in both cases, a right-handed circular motion becomes a left-handed one. Combined with time reversal, we can also consider a PT transformation as a flip of all four components of spacetime. Specifically in QED, the uncharged photon should not be affected if we change both parity and time, and indeed it is not: the parity transformation changes a right-handed polarization into a left-handed one, but traveling in the opposite time direction (i.e., looking ‘down’ instead of ‘up’ the time axis), we are back with a right-handed polarization.

The CPT symmetry has an important implication (the Feynman-Stückelberg interpretation): we can interpret antiparticles as regular particles traveling backwards in time. Consequently, in QED we would have only two particles: electrons (traveling either forward or backward in time) and photons. Of course, we do not see electrons moving backwards in time; we see positrons moving forward, but mathematically, their behavior is identical.

6.2.1.2. Crossing symmetry#

The crossing symmetry states that if a reaction of the form \(\text{A} + \text{B} \to \text{C} + \text{D}\) is known to occur (e.g., two electrons colliding with each other, in which case all four letters would correspond to an electron, but with different energies and momenta), then any of these particles can ‘cross over’ to the other side of the reaction, if it is replaced by its antiparticle. Therefore, denoting the antiparticle with an overline, we have the following reactions which all exhibit the same crossing symmetry:

\[\begin{split}\begin{align*} \text{A} + \text{B} &\to \text{C} + \text{D}, \\ \text{A} &\to \overline{\text{B}} + \text{C} + \text{D}, \\ \text{A} + \overline{\text{C}} &\to \overline{\text{B}} + \text{D}, \\ \overline{\text{C}} + \overline{\text{D}} & \to \overline{\text{A}} + \overline{\text{B}}, \end{align*}\end{split}\]

and so on. An explicit example is that if scattering between two electrons is possible, an electron and a positron can also scatter.

6.2.2. Feynman diagrams#

As stated above, the objective of QED (and of all quantum field theories) is to calculate what happens when particles interact. Given that we are dealing with quantum-mechanical particles, we can only calculate probabilities, e.g. the probability that two scattering electrons will emerge with certain momenta given their incoming momenta. Each such probability is given by the squared modulus of a complex number known as the amplitude of the process. The calculation of these amplitudes is in general tremendously complicated. Luckily, it can be conceptually simplified by using a visual series expansion scheme due to Feynman, known as Feynman diagrams. The idea is that the interaction is built up from elementary processes involving the material (fermionic) and force-carrying (bosonic) particles in the theory. In quantum electrodynamics, exploiting the symmetries indicated above, there is only one type of elementary process, which can be drawn as a simple diagram (Fig. 6.1):

../_images/QEDelementaryprocess.svg

Fig. 6.1 The elementary interaction in quantum electrodynamics: a material particle emitting or absorbing a force-carrying photon. The arrow on the fermionic (matter) line reflects the traveling direction of the particle; by CPT symmetry, we can mirror the diagram in time, which reverses the direction of the arrow if we also swap the charge; there is therefore an equivalent version in which the electron is replaced by a positron.#

By conservation of energy-momentum, the elementary process cannot occur in isolation. It is not hard to see why: by the crossing symmetry, an equivalent version would be the collision of an electron and a positron, resulting in their annihilation and the production of a single photon. We could then always make a transformation to the center-of-momentum frame of the electron-positron system before the collision, meaning that in that system the net momentum is zero. After the collision however, we would be left with a single photon, traveling at the speed of light, which cannot have zero momentum. Therefore, we always need at least two elementary interaction steps in any process.
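The argument in the preceding paragraph can be checked numerically. The minimal sketch below (using natural units with \(c = 1\) and the electron rest energy in MeV) verifies that the initial electron-positron state has a positive invariant mass, while any single photon, obeying \(E = |p| c\), has zero invariant mass and necessarily nonzero momentum:

```python
m_e = 0.511   # electron rest energy in MeV (m c^2)
c = 1.0       # natural units: c = 1

# Initial state: electron and positron at rest in the COM frame.
E_total = 2 * m_e
p_total = 0.0

# Invariant mass squared of the initial state: (2 m_e)^2 > 0.
s_initial = E_total**2 - (p_total * c)**2

# A single final-state photon would need E_photon = E_total by energy
# conservation, but the photon dispersion relation fixes |p| = E / c:
E_photon = E_total
p_photon = E_photon / c

# The photon's invariant mass vanishes identically:
s_photon = E_photon**2 - (p_photon * c)**2

assert s_initial > 0 and s_photon == 0
# Momentum conservation would require p_photon == p_total = 0,
# which is impossible for any nonzero photon energy:
assert p_photon != p_total
```

The contradiction between the two assertions' premises (a massive initial state, a massless single-particle final state) is exactly why a second elementary interaction is always required.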

The simplest example of a process containing two elementary interactions is the lowest-order contribution to the scattering between two electrons, known as Møller scattering (Fig. 6.2):

../_images/QEDMollerscattering.svg

Fig. 6.2 Feynman diagram for the lowest-order contribution to the Møller scattering of two electrons.#

You may have noticed that we’ve drawn the line representing the photon to go straight up in the elementary process of Fig. 6.1 and the diagram for the Møller scattering in Fig. 6.2. That is not by accident: any intermediate particles that participate in a Feynman diagram are known as virtual particles, which are inherently undetectable. At every vertex, energy and momentum must be conserved, but we cannot distinguish between the processes ‘the top electron emits a photon that is absorbed by the bottom one’ and ‘the bottom electron emits a photon that is absorbed by the top one’, which is why we can combine them into a single diagram.

By the crossing symmetry, we can draw a diagram very similar to the one of Fig. 6.2 for the scattering between an electron and a positron (known as Bhabha scattering). However, by the CPT symmetry, there is a second process that also describes Bhabha scattering, and also has two elementary interactions. As we cannot know what is actually happening, we have to account for both possibilities when doing calculations about Bhabha scattering probabilities (Fig. 6.3):

../_images/QEDBhabhascattering.svg

Fig. 6.3 Feynman diagrams for the two lowest-order contributions to the Bhabha scattering of an electron and a positron.#

Further application of the symmetries gives us the lowest-order diagrams for three more processes (all described by a single diagram): annihilation of an electron-positron pair, creation of such a pair, and the Compton scattering between an electron and a photon (Fig. 6.4):

../_images/QEDcreationannihilationCompton.svg

Fig. 6.4 Feynman diagrams for the lowest-order contribution to electron-positron annihilation and creation, and the Compton scattering between an electron and a photon. Note that in every vertex, the direction of the arrow of the material (fermionic) particle is continuous.#

Fig. 6.2, Fig. 6.3 and Fig. 6.4 all contain diagrams with two vertices. They represent the lowest-order contributions to the processes they describe. However, there are more contributions: within each diagram, we can add additional vertices, giving additional diagrams that contribute to the process (or, more precisely, contribute to the calculation of the probability that the process occurs). For example, to the diagram describing Møller scattering in Fig. 6.2, we can add a spontaneous generation and annihilation of an electron-positron pair in the trajectory of the photon, an additional interaction between the electron and itself, or a second photon connecting the two electrons, as sketched in Fig. 6.5.

../_images/QEDMollerscatteringsecondorder.svg

Fig. 6.5 The three qualitatively different Feynman diagrams of the second-order contribution to the Møller scattering of two electrons.#

To calculate the total amplitude of Møller scattering to second order, we need to add all contributions, just as we already have to do for the two lowest-order contributions to Bhabha scattering. Moreover, the expansion does not end with the second-order contributions. There are also contributions at third order (with six vertices), fourth order (eight vertices), fifth order (ten vertices), and so on. Taken together, any process is therefore the sum of infinitely many diagrams, as often indicated by the general diagram shown in Fig. 6.6.

../_images/QEDgeneraldiagram.svg

Fig. 6.6 General Feynman diagram of the interaction between two particles.#

Luckily, although the number of diagrams with more vertices grows rapidly, their contributions to the amplitude of the process we are calculating decrease even faster. Every vertex adds a factor with magnitude \(\alpha = e^2 / \hbar c \approx 1/137\) (the fine-structure constant), and because every next order adds two vertices, the contributions of the second-order diagrams are already a factor \(10^4\) smaller than those of the first order. Nonetheless, they can be calculated, and in some cases experimentally verified to many digits of precision. Higher-order calculations have also been essential in narrowing down the possible masses of known but previously undetected particles, resulting in their discovery at high-energy particle colliders including the Tevatron at Fermilab and the Large Hadron Collider at CERN.
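The suppression of higher orders is easy to make quantitative. The short sketch below follows the counting in the text (one factor of magnitude \(\alpha\) per extra vertex, two extra vertices per order); the loop range is illustrative:

```python
alpha = 1 / 137.036  # fine-structure constant (dimensionless)

for order in range(1, 4):
    extra_vertices = 2 * (order - 1)
    # Relative size of this order's contribution compared to the
    # lowest order, one factor of alpha per extra vertex:
    relative = alpha ** extra_vertices
    print(f"order {order}: {2 * order} vertices, relative size ~ {relative:.1e}")
```

For the second order this gives \(\alpha^2 \approx 5 \times 10^{-5}\), i.e. roughly the factor \(10^4\) suppression quoted above.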

6.3. Problems#