6. Relativistic quantum mechanics#

So far, our quantum mechanical descriptions have been of massive particles with velocities much lower than the speed of light. This formalism gets us a long way: we can use it to describe all of chemistry, and, by extension, it has many applications in biology. Nonrelativistic quantum mechanics also gives us basic physics insights, such as the emission spectra of atoms, and engineering applications including lasers, NMR, qubits, and quantum computers. However, it intrinsically cannot give us a description of light, even though light has an innate quantum nature, and it was the quantum nature of light that triggered the quantum revolution. Because light, inevitably, travels at the speed of light, we will need to include relativistic effects in our theory if we want it to describe light (and all processes where light interacts with matter), not just as a correction like in our discussion of the fine structure of hydrogen in Section 4.3, but at the basis of our theory. In this theory, we will combine quantum mechanics with the special theory of relativity. Its ultimate form, quantum field theory (QFT), is very powerful and extremely accurate, combining three of the four known fundamental forces. It is however not complete: gravity isn’t part of the theory, and at present nobody knows how to integrate QFT with the general theory of relativity.

6.1. The central equation of relativistic quantum mechanics#

6.1.1. The Klein-Gordon equation#

A first attempt at constructing a base equation for relativistic quantum mechanics could be to ‘quantize’ the special theory of relativity. This attempt is based on the observation that the Schrödinger equation, after a fashion, can be seen as the ‘quantization’ of the classical equation for conservation of energy:

(6.1)#\[ E = K + V = \frac{p^2}{2m} + V. \]

To turn equation (6.1) into a quantum one, we apply the same procedure we used to arrive at quantum-mechanical analogs of the angular momentum (see Section 3.1): we replace the momentum and energy with quantum operators:

(6.2)#\[ \bm{p} \to -i \hbar \bm{\nabla} \qquad \text{and} \qquad E \to i \hbar \frac{\partial}{\partial t}. \]

We also replace the potential \(V\) with the potential energy operator \(\hat{V}\). If we make these substitutions in equation (6.1) and then have both sides act on a wave function \(\Psi(\bm{x}, t)\), we indeed arrive at the Schrödinger equation:

(6.3)#\[ i \hbar \frac{\partial \Psi}{\partial t} = - \frac{\hbar^2}{2m} \nabla^2 \Psi + \hat{V} \Psi. \]

Note that this procedure does not give us a true ‘derivation’ of the Schrödinger equation (we still need Axiom 1.2), as the ‘quantization’ recipe in equation (6.2) follows from the Schrödinger equation. However, it does give us an idea about how we could extend quantum mechanics to relativistic systems, as we know that in relativity, the energy equation gets an extra term (see equation (4.28)):

(6.4)#\[ E^2 = m^2 c^4 + p^2 c^2. \]

Giving equation (6.4) the same ‘quantization treatment’ as equation (6.1), we arrive at

(6.5)#\[ -\frac{1}{c^2} \frac{\partial^2 \psi}{\partial t^2} + \nabla^2 \psi = \frac{m^2 c^2}{\hbar^2} \psi. \]

Equation (6.5) is known as the Klein-Gordon equation. Unlike the Schrödinger equation, it is a proper wave equation, and it is the correct relativistic quantum equation for spin-\(0\) particles. Unfortunately however, the particles of interest all have nonzero spin, and we need a more general form.
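As a quick numerical sanity check, the sketch below tests whether a plane wave \(\exp[i(kx - \omega t)]\) satisfies equation (6.5) when \(\omega\) and \(k\) obey the relativistic energy relation (6.4) with \(E = \hbar\omega\) and \(p = \hbar k\). The restriction to one spatial dimension, the natural units \(\hbar = c = m = 1\), the finite-difference step, and all function names are assumptions made for this illustration only.

```python
import cmath
import math

# Natural units (an assumption for this sketch): hbar = c = m = 1.
HBAR = 1.0
C = 1.0
M = 1.0


def omega_of_k(k):
    """Angular frequency from E^2 = m^2 c^4 + p^2 c^2 with E = hbar*omega, p = hbar*k."""
    return math.sqrt((C * k) ** 2 + (M * C**2 / HBAR) ** 2)


def plane_wave(x, t, k):
    """One-dimensional plane wave exp(i(kx - omega t))."""
    return cmath.exp(1j * (k * x - omega_of_k(k) * t))


def second_derivative(f, s, h=1e-3):
    """Central finite-difference approximation of f''(s)."""
    return (f(s + h) - 2 * f(s) + f(s - h)) / h**2


def klein_gordon_residual(x, t, k):
    """Left-hand side minus right-hand side of equation (6.5) for the plane wave."""
    d2t = second_derivative(lambda s: plane_wave(x, s, k), t)
    d2x = second_derivative(lambda s: plane_wave(s, t, k), x)
    psi = plane_wave(x, t, k)
    return -d2t / C**2 + d2x - (M * C / HBAR) ** 2 * psi


# The residual should vanish up to the finite-difference error.
print(abs(klein_gordon_residual(0.3, 0.7, k=2.0)))
```

The residual is limited by the step size \(h\) of the finite differences, not by the equation itself; replacing `omega_of_k` by the nonrelativistic dispersion relation would leave a residual of order one.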

6.1.2. Four-vectors#

In equation (6.4), the momentum \(p\) is the ‘three-momentum’, the (length of) the classical momentum vector \(\bm{p}\). In relativity theory, we work with four-vectors, which describe relativistic quantities in the four dimensions of spacetime, reflecting the relativistic notion that time is no longer ‘just’ a parameter, but a dimension, and transformations from one (inertial) frame of reference to another affect both the spatial and temporal coordinates. In short[1], four-vectors get an extra (‘zeroth’) component, which for the position vector represents the time, and for the momentum vector the energy (times factors of \(c\), the speed of light, which is a universal constant):

(6.6)#\[\begin{split} \bm{\bar{x}} = \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \qquad \text{and} \qquad \bm{\bar{p}} = \begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix} = \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix}. \end{split}\]

When making a transformation from one inertial frame to another (e.g. from that of an observer on a platform, to that of an observer on a train moving at constant velocity), the coordinates change according to the Lorentz transformations, \(\bm{\bar{x}}' = \bm{L} \bm{\bar{x}}\), which can be expressed in matrix form:

(6.7)#\[\begin{split} \begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma(u) & - \gamma(u) \frac{u}{c} & 0 & 0 \\ - \gamma(u) \frac{u}{c} & \gamma(u) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} \end{split}\]

for a transformation between a stationary frame \(S\) with coordinates \(\bm{\bar{x}}\) and a frame \(S'\) with coordinates \(\bm{\bar{x}}'\) moving in the positive \(x\) direction with speed \(u\) with respect to frame \(S\). Here \(\gamma(u)\) is the Lorentz factor from special relativity,

(6.8)#\[ \gamma(u) = \frac{1}{\sqrt{1-(u/c)^2}}. \]

Relativistic four-vectors form a space which is close but not equal to \(\mathbb{R}^4\), as they have an inner product[2] which is defined[3] differently[4]:

(6.9)#\[ \bm{\bar{x}} \cdot \bm{\bar{y}} = x^0 y^0 - x^1 y^1 - x^2 y^2 - x^3 y^3. \]

An easy calculation shows that the ‘length’ of a four-vector (the quantity \(\bm{\bar{x}} \cdot \bm{\bar{x}}\)), and by extension, the inner product between any two four-vectors, is invariant under Lorentz transformations.
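The invariance can also be checked numerically. The sketch below applies the boost of equation (6.7) to two four-vectors and compares their inner product (6.9) before and after the transformation. The sample vectors, the boost speed of \(0.6c\), and the function names are arbitrary choices for this illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma_factor(u, c=C):
    """Factor gamma(u) = 1 / sqrt(1 - (u/c)^2) appearing in the Lorentz transformation."""
    return 1.0 / math.sqrt(1.0 - (u / c) ** 2)


def boost_x(four_vector, u, c=C):
    """Apply the Lorentz transformation (6.7): a boost with speed u along x."""
    g = gamma_factor(u, c)
    b = u / c
    x0, x1, x2, x3 = four_vector
    return (g * (x0 - b * x1), g * (x1 - b * x0), x2, x3)


def minkowski_dot(x, y):
    """Inner product (6.9): x^0 y^0 - x^1 y^1 - x^2 y^2 - x^3 y^3."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]


# Two arbitrary (hypothetical) four-vectors and a boost speed of 0.6 c.
x = (5.0, 1.0, 2.0, 3.0)
y = (4.0, -2.0, 0.5, 1.0)
u = 0.6 * C

x_prime = boost_x(x, u)
y_prime = boost_x(y, u)

# Each pair of numbers should agree up to floating-point rounding.
print(minkowski_dot(x, y), minkowski_dot(x_prime, y_prime))
print(minkowski_dot(x, x), minkowski_dot(x_prime, x_prime))
```

Note that the individual components of the vectors do change under the boost; only the combination in equation (6.9) is preserved.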

To distinguish between three-vectors and four-vectors, components of three-vectors are indicated with Roman indices like \(p_i\), while those of four-vectors are indicated with Greek ones, like \(p^\mu\). The upper index represents a (standard) column-vector like configuration (known as the contravariant components of the vector). We also have a version with a lower index, corresponding to a row-vector like configuration (known as the covariant components); the inner product can be written as \(\bm{\bar{x}} \cdot \bm{\bar{y}} = x_\mu y^\mu = x^\mu y_\mu\), where the sum over \(\mu\) (ranging from \(0\) to \(3\)) is implicit[5]. Within special relativity, the covariant components of a vector are the same as the contravariant ones, except for a minus sign on the space components: \(x_0 = x^0\), but \(x_i = - x^i\). We can summarize these relations using the metric tensor, which for special relativity is usually written as \(\eta^{\mu \nu}\) (and its inverse, \(\eta_{\mu \nu}\)), to distinguish from the general relativity version \(g^{\mu \nu}\). Using the Einstein summation convention, the metric tensor is defined through

(6.10)#\[ \bm{\bar{x}} \cdot \bm{\bar{y}} = x^\mu y_\mu = x_\mu y^\mu = \eta_{\mu \nu} x^\mu y^\nu = \eta^{\mu \nu} x_\mu y_\nu, \]

from which we can read off that

(6.11)#\[ \eta^{00} = 1, \quad \eta^{11} = \eta^{22} = \eta^{33} = -1, \quad \eta^{\mu \nu} = 0 \;\;\text{if}\;\; \mu \neq \nu. \]

The coefficients of \(\eta_{\mu \nu}\) are the same as those of \(\eta^{\mu \nu}\).

Derivatives can be taken with respect to any of the four components of the position (or time-position) vector; in short-hand notation, we have

(6.12)#\[\begin{align*} \partial_\mu f &= \frac{\partial f}{\partial x^\mu} \end{align*}\]
(6.13)#\[\begin{align*} \square f = \partial^\mu \partial_\mu f &= \frac{1}{c^2} \frac{\partial^2 f}{\partial t^2} - \nabla^2 f. \end{align*}\]

Equation (6.12) thus generalizes the partial derivative, and (6.13) the Laplacian; the operator \(\square\) is known as the d’Alembertian.

In terms of four-vectors, we can re-write equation (6.4) in (even) more concise form:

(6.14)#\[ \bm{\bar{p}} \cdot \bm{\bar{p}} = m^2 c^2. \]

If we now apply the ‘quantization recipe’ of equation (6.2) to our four-vectors, we get

(6.15)#\[\begin{split}\begin{align*} p_\mu &\to i \hbar \partial_\mu = i \hbar \frac{\partial}{\partial x^\mu}\\ p_0 &\to i \hbar \partial_0 = \frac{i\hbar}{c} \frac{\partial}{\partial t} \qquad \text{and} \qquad p_i \to i \hbar \partial_i = i \hbar \frac{\partial}{\partial x^i}. \end{align*}\end{split}\]

Unsurprisingly, substituting the ‘quantization’ of the four-momentum into equation (6.14) and having it act on a wave function again gives us the Klein-Gordon equation, albeit in more concise form:

(6.16)#\[ -\hbar^2 \square \psi = -\hbar^2 \partial^\mu \partial_\mu \psi = m^2 c^2 \psi. \]

However, you might now guess where things go wrong: rather than ‘applying’ \(\bm{\bar{p}} \cdot \bm{\bar{p}}\), which only gives us the magnitude of the four-momentum, we’ll want each individual component, and will therefore have to ‘factorize’ equation (6.14) to get a more detailed view.

6.1.3. The Dirac equation#

To motivate why we’d want to factorize equation (6.14), let’s consider the case of a stationary particle[6], for which the three-momentum is zero, and we only have one nonzero component of the four-momentum, \(p^0\), directly related to its energy. In that case, equation (6.14) simplifies to

(6.17)#\[ 0 = p^0 p_0 - m^2 c^2 = (p^0)^2 - m^2 c^2 = (p^0 + mc)(p^0 - mc), \]

where we used that \(p_0 = p^0\) (i.e., the zeroth component of the covariant and contravariant representations are identical) in special relativity. We find that we have two solutions: either \(p^0 = mc\) or \(p^0 = -mc\). As \(p^0 = E/c\) is proportional to the energy of our particle, classically we’d dismiss the second solution, but as we’ll see below, we will in fact always get two solutions in relativistic quantum mechanics.

Unfortunately, the factorization for a moving particle is more involved, because with the extra components we get cross terms, and moreover the covariant and contravariant components are no longer identical. Introducing new components \(\beta^\nu\) and \(\gamma^\lambda\) (so four each, for \(\nu, \lambda = 0, 1, 2, 3\)), we can formally proceed with the factorization:

(6.18)#\[\begin{split}\begin{align*} 0 &= \bm{\bar{p}} \cdot \bm{\bar{p}} - m^2 c^2 = p^\mu p_\mu - m^2 c^2 \\ &= \left( \beta^\nu p_\nu + mc \right) \left( \gamma^\lambda p_\lambda - mc \right) \\ &= \beta^\nu \gamma^\lambda p_\nu p_\lambda - mc \left(\beta^\nu - \gamma^\nu \right) p_\nu - m^2 c^2 \end{align*}\end{split}\]

Note that the first term in the last line of (6.18) is a sum over sixteen terms, and the second a sum over four terms. The second term however should vanish, as in the original sum in the first line of (6.18) there are no linear terms in the momentum. Therefore, we have \(\beta^\nu = \gamma^\nu\), and we’re left with four unknowns, the coefficients \(\gamma^\nu\), which satisfy \(p^\mu p_\mu = \gamma^\nu \gamma^\lambda p_\nu p_\lambda\). By writing out the four terms on the left and sixteen terms on the right of this equation, we get

(6.19)#\[\begin{split}\begin{align*} &\left(\gamma^0\right)^2 = 1, \qquad \left(\gamma^1\right)^2 = \left(\gamma^2\right)^2 = \left(\gamma^3\right)^2 = -1, \\ &\gamma^\nu \gamma^\lambda + \gamma^\lambda \gamma^\nu = 0 \qquad \text{if}\; \nu \neq \lambda. \end{align*}\end{split}\]

We can summarize equations (6.19) using the anticommutator, \(\{a, b\} = ab + ba\), and the metric tensor \(\eta^{\mu \nu}\):

(6.20)#\[ \left\{ \gamma^\mu, \gamma^\nu \right\} = 2 \eta^{\mu \nu}. \]

There is no solution of equations (6.20) in terms of numbers. However, there are solutions in which the coefficients \(\gamma^\mu\) are matrices. The smallest solutions are \(4 \times 4\) matrices, which can be expressed in terms of the \(2 \times 2\) identity matrix \(I_2\), the \(2 \times 2\) Pauli spin matrices \(\sigma^i\) (equation (3.50)), and the \(2 \times 2\) zero matrices \(0_2\):

(6.21)#\[\begin{split} \gamma^0 = \begin{pmatrix} I_2 & 0_2 \\ 0_2 & -I_2 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0_2 & \sigma^i \\ -\sigma^i & 0_2 \end{pmatrix}. \end{split}\]
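That a given set of matrices satisfies the anticommutation relations (6.20) can be verified by direct multiplication. The sketch below does so in plain Python for the choice \(\gamma^0 = \mathrm{diag}(I_2, -I_2)\) and \(\gamma^i\) built from the Pauli matrices. The helper names (`matmul`, `block`, `satisfies_clifford`) are hypothetical and introduced only for this illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(s, m):
    """Multiply every entry of matrix m by the scalar s."""
    return [[s * v for v in row] for row in m]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from the 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [  # Pauli matrices sigma^1, sigma^2, sigma^3
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

# gamma^0 = diag(I_2, -I_2) and gamma^i = [[0, sigma^i], [-sigma^i, 0]].
GAMMA = [block(I2, Z2, Z2, scale(-1, I2))]
GAMMA += [block(Z2, s, scale(-1, s), Z2) for s in SIGMA]

# Metric tensor eta^{mu nu} of equation (6.11).
ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]


def satisfies_clifford(gammas):
    """Check {gamma^mu, gamma^nu} = 2 eta^{mu nu} (times the 4x4 identity)."""
    for mu in range(4):
        for nu in range(4):
            ab = matmul(gammas[mu], gammas[nu])
            ba = matmul(gammas[nu], gammas[mu])
            for i in range(4):
                for j in range(4):
                    expected = 2 * ETA[mu][nu] if i == j else 0
                    if ab[i][j] + ba[i][j] != expected:
                        return False
    return True


print(satisfies_clifford(GAMMA))
```

The relative minus sign between the two diagonal blocks of \(\gamma^0\) is essential: if \(\gamma^0\) is replaced by the \(4 \times 4\) identity matrix, it commutes with every \(\gamma^i\) and the check fails.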

With these matrices as the coefficients, we can finally factorize equation (6.18). Taking the second factor, ‘quantizing’ the momenta \(p_\mu\) according to equation (6.15), and having the result act on a wave function \(\psi\) gives us the Dirac equation:

(6.22)#\[ i \hbar \gamma^\mu \partial_\mu \psi - m c \psi = 0, \]

where

(6.23)#\[\begin{split} \psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix} \end{split}\]

is known as the bispinor or Dirac spinor. Note that \(\psi\) is not a four-vector; it simply contains four pieces of information that can be cast in (regular) vector form.

6.1.4. Solutions to the Dirac equation#

To understand what the bispinor represents, it is helpful to look at a few simple solutions to the Dirac equation. These solutions are similar to the free particle solutions of the Schrödinger equation: they are solutions to the equation, but do not represent actual physical particles, as the solutions are not normalizable. Nonetheless, they give us some insight into the nature of the solutions, and like for the free particle, we can use them as a basis for solutions that are normalizable.

6.1.4.1. Particle at rest#

If a particle is at rest, its three-momentum \(\bm{p}\) is zero. In this case, the Dirac spinor is independent of position, i.e., all three spatial derivatives vanish:

(6.24)#\[ \frac{\partial \psi}{\partial x} = \frac{\partial \psi}{\partial y} = \frac{\partial \psi}{\partial z} = 0. \]

The Dirac equation now simplifies to a first-order differential equation in time:

(6.25)#\[ \frac{i \hbar}{c} \gamma^0 \frac{\partial \psi}{\partial t} - m c \psi = 0, \]

or

(6.26)#\[\begin{split} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \partial_t \psi_A \\ \partial_t \psi_B \end{pmatrix} = -i \frac{m c^2}{\hbar} \begin{pmatrix} \psi_A \\ \psi_B \end{pmatrix}, \end{split}\]

where \(\psi_A = \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}\) and \(\psi_B = \begin{pmatrix} \psi_3 \\ \psi_4 \end{pmatrix}\) are two spinors as we’ve encountered them in Section 3.4. Because equation (6.26) separates into two equations for \(\psi_A\) and \(\psi_B\), we can solve them separately, and find

(6.27)#\[ \psi_A(t) = \psi_A(0) \exp\left(-\frac{i mc^2}{\hbar}t \right), \qquad \psi_B(t) = \psi_B(0) \exp\left(\frac{i mc^2}{\hbar}t \right). \]

These solutions are identical to those of the Schrödinger equation, for particles that have (rest) energy \(E = mc^2\) (for \(\psi_A\)) and \(E = - mc^2\) (for \(\psi_B\)), respectively.

The Dirac equation thus predicts that there are two solutions: one with rest energy \(m c^2\), as we would expect from the theory of special relativity, and a second one with rest energy \(- mc^2\). To interpret this result, Dirac pictured the vacuum not as empty, but as filled exactly up to \(E = 0\) with particles, similar to how a metal is filled up to its Fermi energy with electrons. From this ‘sea’ of electrons, we can ‘free’ one (giving it a higher energy, allowing it to move around), but that act also creates a ‘hole’ with an effective positive charge in the remaining material. The hole moreover also moves around. If the electron and hole come together again, they can annihilate each other, going back to the initial state. Likewise, for every particle with mass \(m\) we ‘create’ from the vacuum, we need to open up a ‘hole’ with mass \(-m\). We can observe the hole - not as a particle with negative mass, but as an antiparticle with identical mass (it behaves like a particle with mass \(m\)) but opposite charge to the ‘regular’ particle.

The existence of antiparticles is a direct consequence of the Dirac equation, as published by Dirac in 1928 [Dirac, 1928], and realized in full by Oppenheimer in 1930 [Oppenheimer, 1930]. The first antiparticles, ‘anti-electrons’ (now known as positrons) were discovered already in 1932 by Anderson [Anderson, 1932], earning him (half of) the 1936 Nobel prize in physics; Schrödinger and Dirac shared the 1933 Nobel prize.

6.1.4.2. Plane-wave solutions#

For a free particle, we found in Section 2.2.2 that the solution is a simple plane wave. This solution generalizes to the Dirac equation, with a bispinor that is a function of the position four-vector \(\bm{\bar{x}}\), and a wave described by a wave four-vector[7] \(\bm{\bar{k}}\):

(6.28)#\[ \psi(\bm{\bar{x}}) = \exp\left( - i \bm{\bar{k}} \cdot \bm{\bar{x}} \right) u(\bm{\bar{k}}). \]

To find \(u(\bm{\bar{k}})\), we substitute the solution (6.28) into the Dirac equation (6.22), which gives

(6.29)#\[\begin{split}\begin{align*} \hbar \gamma^\mu k_\mu e^{-i \bm{\bar{k}} \cdot \bm{\bar{x}}} u - m c e^{-i \bm{\bar{k}} \cdot \bm{\bar{x}}} u &= 0 \\ \left( \hbar \gamma^\mu k_\mu - m c \right) u &= 0. \end{align*}\end{split}\]

Note that in equation (6.29), the first term is a matrix, so to add it to the second term, we must multiply the second term with the identity matrix. Writing out the sum \(\gamma^\mu k_\mu\), we get

(6.30)#\[\begin{split} \gamma^\mu k_\mu = \gamma^0 k_0 - \bm{\gamma} \cdot \bm{k} = k_0 \begin{pmatrix} I_2 & 0_2 \\ 0_2 & -I_2 \end{pmatrix} - \bm{k} \cdot \begin{pmatrix} 0_2 & \bm{\sigma} \\ -\bm{\sigma} & 0_2 \end{pmatrix} = \begin{pmatrix} k_0 I_2 & - \bm{k} \cdot \bm{\sigma} \\ \bm{k} \cdot \bm{\sigma} & - k_0 I_2 \end{pmatrix}, \end{split}\]

and therefore

(6.31)#\[\begin{split} 0 = \left( \hbar \gamma^\mu k_\mu - m c \right) u = \begin{pmatrix} \hbar k_0 - mc & - \hbar \bm{k} \cdot \bm{\sigma} \\ \hbar \bm{k} \cdot \bm{\sigma} & - \hbar k_0 - mc \end{pmatrix} \begin{pmatrix} u_A \\ u_B \end{pmatrix} = \begin{pmatrix} (\hbar k_0 - m c) u_A - \hbar \bm{k} \cdot \bm{\sigma} u_B \\ -(\hbar k_0 + m c) u_B + \hbar \bm{k} \cdot \bm{\sigma} u_A \end{pmatrix}, \end{split}\]

from which we get, by solving the second row for \(u_B\), substituting the result in the first row, and using that \((\bm{k} \cdot \bm{\sigma})^2 = \bm{k}^2 I_2\):

(6.32)#\[\begin{split}\begin{align*} \left(\hbar k_0 - mc\right) \left(\hbar k_0 + mc\right) u_A &= (\hbar \bm{k} \cdot \bm{\sigma})^2 u_A = \hbar^2 \bm{k}^2 u_A, \\ \bm{\bar{k}} \cdot \bm{\bar{k}} &= k_0^2 - \bm{k} \cdot \bm{k} = \left(\frac{mc}{\hbar}\right)^2. \end{align*}\end{split}\]
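The algebra above can be checked numerically. The sketch below builds the matrix \(\hbar \gamma^\mu k_\mu - mc\) with \(\gamma^0 = \mathrm{diag}(I_2, -I_2)\), picks \(k_0 > 0\) from the condition \(\bm{\bar{k}} \cdot \bm{\bar{k}} = (mc/\hbar)^2\), fixes \(u_B = \hbar (\bm{k} \cdot \bm{\sigma}) u_A / (\hbar k_0 + mc)\) from the lower block row, and confirms that the resulting \(u\) is annihilated by the matrix. The natural units \(\hbar = c = m = 1\), the sample wave vector, and all function names are assumptions made for this illustration.

```python
import math

# Natural units hbar = c = m = 1 (an assumption made for this sketch only).
HBAR = 1.0
C = 1.0
M = 1.0

SIGMA = [  # Pauli matrices sigma^1, sigma^2, sigma^3
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def k_dot_sigma(k):
    """2x2 matrix k . sigma for a three-vector k."""
    return [[sum(k[a] * SIGMA[a][i][j] for a in range(3)) for j in range(2)]
            for i in range(2)]


def dirac_matrix(k0, k):
    """4x4 matrix hbar gamma^mu k_mu - m c, with gamma^0 = diag(I_2, -I_2)."""
    ks = k_dot_sigma(k)
    rows = []
    for i in range(2):  # upper block row: (hbar k0 - mc) I_2, -hbar k.sigma
        diag = [(HBAR * k0 - M * C) if i == j else 0 for j in range(2)]
        rows.append(diag + [-HBAR * ks[i][j] for j in range(2)])
    for i in range(2):  # lower block row: hbar k.sigma, -(hbar k0 + mc) I_2
        diag = [-(HBAR * k0 + M * C) if i == j else 0 for j in range(2)]
        rows.append([HBAR * ks[i][j] for j in range(2)] + diag)
    return rows


def positive_energy_spinor(k, u_a):
    """Return (k0, u) with k0 > 0 on shell and u_B fixed by the lower block row."""
    k0 = math.sqrt(sum(comp * comp for comp in k) + (M * C / HBAR) ** 2)
    ks = k_dot_sigma(k)
    u_b = [HBAR * sum(ks[i][j] * u_a[j] for j in range(2)) / (HBAR * k0 + M * C)
           for i in range(2)]
    return k0, list(u_a) + u_b


def residual(k0, k, u):
    """Largest component (in absolute value) of (hbar gamma^mu k_mu - m c) u."""
    mat = dirac_matrix(k0, k)
    return max(abs(sum(mat[i][j] * u[j] for j in range(4))) for i in range(4))


k = (0.3, -0.4, 1.2)  # arbitrary sample wave vector
k0, u = positive_energy_spinor(k, [1, 0])
print(k0**2 - sum(comp * comp for comp in k))  # close to (m c / hbar)^2 = 1
print(residual(k0, k, u))                       # close to 0
```

Choosing \(u_A = (0, 1)\) instead of \((1, 0)\) gives a second, independent solution with the same \(k_0\); shifting \(k_0\) away from its on-shell value leaves a nonzero residual.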

Unsurprisingly, we retrieve the de Broglie relation for the energy (1.1) and momentum (1.2), now combined into a single four-momentum[8]:

(6.33)#\[ k^\mu = \pm \frac{p^\mu}{\hbar}. \]

Like for the stationary particle, we get two solutions, one with positive and one with negative energy, representing a particle and its antiparticle. Likewise, the bispinor again splits into two pieces, with the first two entries setting the spinor of the particle, and the second two those of the antiparticle, i.e., our solution set is given by

(6.34)#\[ \lbrace u^{(1)}, u^{(2)}, v^{(1)}, v^{(2)} \rbrace. \]

6.1.4.3. Photons#

Unlike the Schrödinger equation, the Dirac equation also has solutions for traveling massless particles. In the simplest case, these are again plane waves, as given by equation (6.28), so we also again obtain equation (6.32) for the wavevector \(\bm{\bar{k}}\), except that the right-hand side of the equation now equals \(0\). The massless particle thus has a four-wavevector with length zero, which means that it satisfies the relation \(\omega = c k\), which holds for electromagnetic waves (i.e., light, and all other parts of the electromagnetic spectrum). The massless particles described by the Dirac equation are therefore photons, the quantized components of light. The two solutions of the Dirac equation of a photon correspond to its two possible polarization states (either left- and right-handed circular polarization or horizontal and vertical linear polarization). The spin of a photon however is not \(\frac12\), but \(1\); the magnetic spin quantum numbers of a photon are \(\pm 1\), corresponding to the two polarization states.

To see why that is the case, we use the correspondence principle, which states that for a large number of particles, results from classical and quantum calculations must be identical. For example, we already identified the energy of a photon as \(E_\mathrm{photon} = \hbar \omega\); the energy of \(N\) photons in a box of volume \(V\) is then \(N \hbar \omega\), giving an energy density of \(N \hbar \omega / V\). From classical electrodynamics, we know that the energy density of an electric field of strength \(\bm{E}\) is given (in Gaussian units) by \(|\bm{E}|^2 / 8 \pi\), which allows us to calculate the number of photons in the box as

(6.35)#\[ N = \frac{|\bm{E}|^2 V}{8 \pi \hbar \omega}. \]

Note that in equation (6.35), all factors on the right hand side are either constants or classically measurable, whereas the number of photons on the left hand side is a purely quantum mechanical concept. We can make similar calculations for the total momentum from the wave number (which gives us the familiar relation \(p = \hbar k\)), and the \(z\)-component of the angular momentum, which is given by
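To get a feeling for the numbers involved, the sketch below evaluates equation (6.35) in Gaussian (CGS) units, which is the unit system in which the energy density takes the form \(|\bm{E}|^2 / 8\pi\). The sample field strength, volume, and angular frequency are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

HBAR_CGS = 1.054571817e-27  # reduced Planck constant in erg s


def energy_density(field_strength):
    """Classical energy density |E|^2 / (8 pi) in erg/cm^3 (Gaussian units)."""
    return field_strength**2 / (8 * math.pi)


def photon_number(field_strength, volume, omega):
    """Equation (6.35): N = |E|^2 V / (8 pi hbar omega)."""
    return energy_density(field_strength) * volume / (HBAR_CGS * omega)


# Illustrative values (assumptions): 1 statV/cm, 1 cm^3, omega = 3e15 rad/s.
field = 1.0
volume = 1.0
omega = 3.0e15

n_photons = photon_number(field, volume, omega)
print(f"N is approximately {n_photons:.3e}")

# Consistency with the quantum picture: N photons of energy hbar*omega
# carry the same total energy as the classical field in the box.
print(n_photons * HBAR_CGS * omega, energy_density(field) * volume)
```

Even for this modest field strength the photon number is of order \(10^{10}\), which illustrates why the classical description works so well for everyday fields.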

(6.36)#\[ L_z = \frac{|\bm{E}|^2}{8 \pi \omega} \left( |\psi_\mathrm{R}|^2 - |\psi_\mathrm{L}|^2 \right) = \frac{N \hbar}{V} \left( |\psi_\mathrm{R}|^2 - |\psi_\mathrm{L}|^2 \right), \]

where in the classical expression, we interpret \(|\psi_\mathrm{R}|^2\) and \(|\psi_\mathrm{L}|^2\) as the fractions of photons with right-handed and left-handed polarization, respectively. For a single photon, these fractions become the probabilities that the photon has right-handed or left-handed polarization. The \(z\)-component of the angular momentum of the photon is then given by

(6.37)#\[ L_z = \hbar \left( |\psi_\mathrm{R}|^2 - |\psi_\mathrm{L}|^2 \right). \]

As always, performing a measurement forces the photon to collapse to a single polarized state; as this can be either left or right handed, the possible outcomes of a measurement of the \(z\)-component of the photon’s angular momentum are thus \(\pm \hbar\), giving us possible magnetic spin quantum numbers of \(\pm 1\). Photons are therefore bosons with spin quantum number \(1\).