6. Relativistic quantum mechanics
So far, our quantum mechanical descriptions have been of massive particles with velocities much lower than the speed of light. This formalism gets us a long way: we can use it to describe all of chemistry, and, by extension, it has many applications in biology. Nonrelativistic quantum mechanics also gives us basic physics insights, such as the emission spectra of atoms, and engineering applications including lasers, NMR, qubits, and quantum computers. However, it intrinsically cannot give us a description of light, even though light has an innate quantum nature, and it was the quantum nature of light that triggered the quantum revolution. Because light, inevitably, travels at the speed of light, we will need to include relativistic effects in our theory if we want it to describe light (and all processes where light interacts with matter), not just as a correction like in our discussion of the fine structure of hydrogen in Section 4.3, but at the basis of our theory. In this theory, we will combine quantum mechanics with the special theory of relativity. Its ultimate form, the Standard Model of particle physics, is very powerful and extremely accurate, combining three of the four known fundamental forces (quantum chromodynamics (QCD) is the part that describes the strong force). It is however not complete: gravity isn’t part of the theory, and at present nobody knows how to integrate it with the general theory of relativity.
6.1. The central equation of relativistic quantum mechanics
6.1.1. The Klein-Gordon equation
A first attempt at constructing a base equation for relativistic quantum mechanics could be to ‘quantize’ the special theory of relativity. This attempt is based on the observation that the Schrödinger equation, after a fashion, can be seen as the ‘quantization’ of the classical equation for conservation of energy:
\[ \frac{p^2}{2m} + V = E. \tag{6.1} \]
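To turn equation (6.1) into a quantum one, we apply the same procedure we used to arrive at quantum-mechanical analogs of the angular momentum (see Section 3.1): we replace the momentum and energy with quantum operators:
\[ \bm{p} \to \hat{\bm{p}} = -i\hbar \bm{\nabla}, \qquad E \to \hat{E} = i\hbar \frac{\partial}{\partial t}. \tag{6.2} \]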
We also replace the potential \(V\) with the potential energy operator \(\hat{V}\). If we make these substitutions in equation (6.1) and then have both sides act on a wave function \(\Psi(\bm{x}, t)\), we indeed arrive at the Schrödinger equation:
\[ -\frac{\hbar^2}{2m} \nabla^2 \Psi + \hat{V} \Psi = i\hbar \frac{\partial \Psi}{\partial t}. \]
Note that this procedure does not give us a true ‘derivation’ of the Schrödinger equation (we still need Axiom 1.2), as the ‘quantization’ recipe in equation (6.2) follows from the Schrödinger equation. However, it does give us an idea about how we could extend quantum mechanics to relativistic systems, as we know that in relativity, the energy equation (here for a free particle) gets an extra term (see equation (4.20)):
\[ E^2 = p^2 c^2 + m^2 c^4. \tag{6.3} \]
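Giving equation (6.3) the same ‘quantization treatment’ as equation (6.1), we arrive at
\[ \frac{1}{c^2} \frac{\partial^2 \Psi}{\partial t^2} - \nabla^2 \Psi + \frac{m^2 c^2}{\hbar^2} \Psi = 0. \tag{6.4} \]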
Equation (6.4) is known as the Klein-Gordon equation. Unlike the Schrödinger equation, it is a proper wave equation, and it is the correct relativistic quantum equation for spin-\(0\) particles. Unfortunately however, the particles of interest all have nonzero spin, and we need a more general form.
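As a quick consistency check (a sketch, assuming a free-particle plane wave; the amplitude \(A\), wave vector \(\bm{k}\) and angular frequency \(\omega\) are symbols introduced here for illustration), we can substitute \(\Psi = A e^{i(\bm{k} \cdot \bm{x} - \omega t)}\) into the Klein-Gordon equation written as \(\frac{1}{c^2} \partial_t^2 \Psi - \nabla^2 \Psi + \frac{m^2 c^2}{\hbar^2} \Psi = 0\). Each time derivative brings down a factor \(-i\omega\) and each spatial derivative a factor \(i\bm{k}\), so
\[ \left( -\frac{\omega^2}{c^2} + k^2 + \frac{m^2 c^2}{\hbar^2} \right) \Psi = 0 \quad \Longrightarrow \quad (\hbar \omega)^2 = (\hbar k)^2 c^2 + m^2 c^4. \]
With the identifications \(E = \hbar \omega\) and \(p = \hbar k\), this is the relativistic energy relation of equation (6.3). Because the relation is quadratic in \(\omega\), both signs \(E = \pm \sqrt{p^2 c^2 + m^2 c^4}\) are allowed.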
6.1.2. Four-vectors
In equation (6.3), the momentum \(p\) is the ‘three-momentum’, (the length of) the classical momentum vector \(\bm{p}\). In relativity theory, we work with four-vectors, which describe relativistic quantities in the four dimensions of spacetime, reflecting the relativistic notion that time is no longer ‘just’ a parameter, but a dimension, and transformations from one (inertial) frame of reference to another affect both the spatial and temporal coordinates. In short, four-vectors get an extra (‘zeroth’) component, which for the position vector represents the time, and for the momentum vector the energy (times factors of \(c\), the speed of light, which is a universal constant):
\[ \bm{\bar{x}} = \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}, \qquad \bm{\bar{p}} = \begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix}. \]
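When making a transformation from one inertial frame to another (e.g. from that of an observer on a platform, to that of an observer on a train moving at constant velocity), the coordinates change according to the Lorentz transformations, \(\bm{\bar{x}}' = \bm{L} \bm{\bar{x}}\), which can be expressed in matrix form:
\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma(u) & -\gamma(u) \frac{u}{c} & 0 & 0 \\ -\gamma(u) \frac{u}{c} & \gamma(u) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} \]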
for a transformation between a stationary frame \(S\) with coordinates \(\bm{\bar{x}}\) and a frame \(S'\) with coordinates \(\bm{\bar{x}}'\) moving in the positive \(x\) direction with speed \(u\) with respect to frame \(S\). Here \(\gamma(u)\) is the Lorentz factor from special relativity,
\[ \gamma(u) = \frac{1}{\sqrt{1 - u^2 / c^2}}. \]
Relativistic four-vectors form a space which is close but not equal to \(\mathbb{R}^4\), as they have an inner product[1] which is defined[2] differently[3]:
\[ \bm{\bar{x}} \cdot \bm{\bar{y}} = x^0 y^0 - x^1 y^1 - x^2 y^2 - x^3 y^3. \]
An easy calculation shows that the ‘length’ of a four-vector (the quantity \(\bm{\bar{x}} \cdot \bm{\bar{x}}\)), and by extension, the inner product between any two four-vectors, is invariant under Lorentz transformations.
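The invariance can also be checked numerically. The snippet below is a minimal sketch, assuming units in which \(c = 1\), a boost along the positive \(x\) direction, and the sign convention \((+, -, -, -)\) for the inner product; the function names `lorentz_factor`, `boost_x` and `minkowski_dot` and the sample vectors are illustrative choices, not part of the text.

```python
import numpy as np

def lorentz_factor(u, c=1.0):
    """Lorentz factor gamma(u) for relative speed u (same units as c)."""
    return 1.0 / np.sqrt(1.0 - (u / c) ** 2)

def boost_x(u, c=1.0):
    """Lorentz boost along +x, acting on four-vectors (ct, x, y, z)."""
    g = lorentz_factor(u, c)
    b = u / c
    return np.array([
        [g, -g * b, 0.0, 0.0],
        [-g * b, g, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Metric with signature (+, -, -, -).
ETA = np.diag([1.0, -1.0, -1.0, -1.0])

def minkowski_dot(a, b):
    """Inner product a^0 b^0 - a^1 b^1 - a^2 b^2 - a^3 b^3."""
    return a @ ETA @ b

# Two arbitrary sample four-vectors and a boost with u = 0.6 c.
x = np.array([2.0, 0.5, -1.0, 0.3])
y = np.array([1.0, -0.2, 0.4, 0.7])
L = boost_x(0.6)

invariant_before = minkowski_dot(x, y)
invariant_after = minkowski_dot(L @ x, L @ y)
print(invariant_before, invariant_after)  # the two values agree up to rounding
```

The underlying reason is that the boost matrix satisfies \(\bm{L}^T \eta \bm{L} = \eta\), with \(\eta\) the diagonal matrix `ETA`; checking that identity directly is an equivalent test.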
To distinguish between three-vectors and four-vectors, components of three-vectors are indicated with Roman indices like \(p_i\), while those of four-vectors are indicated with Greek ones, like \(p^\mu\). The upper index represents a (standard) column-vector like configuration (known as the contravariant components of the vector). We also have a version with a lower index, corresponding to a row-vector like configuration (known as the covariant components); the inner product can be written as \(\bm{\bar{x}} \cdot \bm{\bar{y}} = x_\mu y^\mu = x^\mu y_\mu\), where the sum over \(\mu\) (ranging from \(0\) to \(3\)) is implicit[4]. Within special relativity, the covariant components of a vector are the same as the contravariant ones, except for a minus sign on the space components: \(x_0 = x^0\), but \(x_i = - x^i\). We can summarize these relations using the metric tensor, which for special relativity is usually written as \(\eta^{\mu \nu}\) (and its inverse, \(\eta_{\mu \nu}\)), to distinguish it from the general relativity version \(g^{\mu \nu}\). Using the Einstein summation convention, the metric tensor is defined through
\[ x^\mu = \eta^{\mu \nu} x_\nu, \]
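from which we can read off that
\[ \eta^{\mu \nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]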
The coefficients of \(\eta_{\mu \nu}\) are the same as those of \(\eta^{\mu \nu}\).
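Raising and lowering indices with the metric maps directly onto array contractions. The following is a small sketch, assuming the \((+, -, -, -)\) convention used above; the variable names and the sample vector are illustrative only.

```python
import numpy as np

eta_upper = np.diag([1.0, -1.0, -1.0, -1.0])  # eta^{mu nu}
eta_lower = np.linalg.inv(eta_upper)          # eta_{mu nu}, the inverse

# Contravariant components x^mu of a sample four-vector.
x_upper = np.array([3.0, 1.0, 2.0, -0.5])

# Lower the index: x_mu = eta_{mu nu} x^nu (sum over the repeated index nu).
x_lower = np.einsum("mn,n->m", eta_lower, x_upper)

# Raise it again: x^mu = eta^{mu nu} x_nu.
x_back = np.einsum("mn,n->m", eta_upper, x_lower)

# Invariant 'length': x_mu x^mu.
length_sq = np.einsum("m,m->", x_lower, x_upper)

print(x_lower)    # time component unchanged, space components sign-flipped
print(length_sq)
```

The index strings passed to `np.einsum` mirror the Einstein summation convention: an index that appears twice is summed over.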
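Derivatives can be taken with respect to any of the four components of the position (or time-position) vector; in short-hand notation, we have
\[ \partial_\mu = \frac{\partial}{\partial x^\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t}, \bm{\nabla} \right), \tag{6.10} \]
\[ \square = \partial^\mu \partial_\mu = \frac{1}{c^2} \frac{\partial^2}{\partial t^2} - \nabla^2. \tag{6.11} \]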
Equation (6.10) thus generalizes the partial derivative, and (6.11) the Laplacian; the operator \(\square\) is known as the d’Alembertian.
In terms of four-vectors, we can re-write equation (6.3) in (even) more concise form:
\[ \bm{\bar{p}} \cdot \bm{\bar{p}} - m^2 c^2 = p^\mu p_\mu - m^2 c^2 = 0. \tag{6.12} \]
If we now apply the ‘quantization recipe’ of equation (6.2) to our four-vectors, we get
\[ p_\mu \to i \hbar \partial_\mu. \]
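Unsurprisingly, substituting the ‘quantization’ of the four-momentum in equation (6.12) and having it act on a wave function again gives us the Klein-Gordon equation, albeit in more concise form:
\[ -\hbar^2 \partial^\mu \partial_\mu \Psi - m^2 c^2 \Psi = 0 \quad \Longleftrightarrow \quad \square \Psi + \left( \frac{mc}{\hbar} \right)^2 \Psi = 0. \]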
However, you might now guess where things go wrong: rather than ‘applying’ \(\bm{\bar{p}} \cdot \bm{\bar{p}}\), which only gives us the magnitude of the four-momentum, we’ll want each individual component, and will therefore have to ‘factorize’ equation (6.12) to get a more detailed view.
6.1.3. The Dirac equation
To motivate why we’d want to factorize equation (6.12), let’s consider the case of a stationary particle[5], for which the three-momentum is zero, and we only have one nonzero component of the four-momentum, \(p^0\), directly related to its energy. In that case, equation (6.12) simplifies to
\[ p^0 p_0 - m^2 c^2 = (p^0)^2 - m^2 c^2 = (p^0 + mc)(p^0 - mc) = 0, \]
where we used that \(p_0 = p^0\) (i.e., the zeroth components of the covariant and contravariant representations are identical) in special relativity. We find that we have two solutions: either \(p^0 = mc\) or \(p^0 = -mc\). As \(p^0 = E/c\) is proportional to the energy of our particle, classically we’d dismiss the second (negative-energy) solution, but as we’ll see below, we will in fact always get two solutions in relativistic quantum mechanics.
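Unfortunately, the factorization for a moving particle is more involved, because with the extra components we get cross terms, and moreover the covariant and contravariant components are no longer identical. Introducing new components \(\beta^\nu\) and \(\gamma^\lambda\) (so four each, for \(\nu, \lambda = 0, 1, 2, 3\)), we can formally proceed with the factorization:
\[ \begin{aligned} p^\mu p_\mu - m^2 c^2 &= \left( \beta^\nu p_\nu + mc \right) \left( \gamma^\lambda p_\lambda - mc \right) \\ &= \beta^\nu \gamma^\lambda p_\nu p_\lambda - mc \left( \beta^\nu - \gamma^\nu \right) p_\nu - m^2 c^2. \end{aligned} \tag{6.15} \]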
Note that the first term in the last line of (6.15) is a sum over sixteen terms, and the second a sum over four terms. The second term however should vanish, as in the original sum in the first line of (6.15) there are no linear terms in the momentum. Therefore, we have \(\beta^\nu = \gamma^\nu\), and we’re left with four unknowns, the coefficients \(\gamma^\nu\), which satisfy \(p^\mu p_\mu = \gamma^\nu \gamma^\lambda p_\nu p_\lambda\). By writing out the four terms on the left and sixteen terms on the right of this equation, we get
\[ (\gamma^0)^2 = 1, \qquad (\gamma^i)^2 = -1 \quad (i = 1, 2, 3), \qquad \gamma^\mu \gamma^\nu + \gamma^\nu \gamma^\mu = 0 \quad (\mu \neq \nu). \tag{6.16} \]
We can summarize equations (6.16) using the anticommutator, \(\{a, b\} = ab + ba\), and the metric tensor \(\eta^{\mu \nu}\):
\[ \{ \gamma^\mu, \gamma^\nu \} = 2 \eta^{\mu \nu}. \tag{6.17} \]
There is no solution of equations (6.17) in terms of numbers. However, there are solutions in which the coefficients \(\gamma^\mu\) are matrices. The smallest solutions are \(4 \times 4\) matrices, which can be expressed in terms of the \(2 \times 2\) identity matrix \(I_2\), the \(2 \times 2\) Pauli spin matrices \(\sigma^i\), and the \(2 \times 2\) zero matrix \(0_2\):
\[ \gamma^0 = \begin{pmatrix} I_2 & 0_2 \\ 0_2 & -I_2 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0_2 & \sigma^i \\ -\sigma^i & 0_2 \end{pmatrix} \quad (i = 1, 2, 3). \]
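That a given set of matrices satisfies the anticommutation relations \(\{\gamma^\mu, \gamma^\nu\} = 2 \eta^{\mu \nu}\) (with the right-hand side multiplying the \(4 \times 4\) identity) is easy to verify numerically. The sketch below assumes the standard (Dirac) representation, \(\gamma^0 = \mathrm{diag}(I_2, -I_2)\) and \(\gamma^i\) with off-diagonal blocks \(\sigma^i\) and \(-\sigma^i\); other representations exist, and the variable names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# Pauli spin matrices sigma^1, sigma^2, sigma^3.
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Gamma matrices in the Dirac representation (an assumption of this sketch).
gammas = [np.block([[I2, Z2], [Z2, -I2]])]
gammas += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

eta = np.diag([1.0, -1.0, -1.0, -1.0])
I4 = np.eye(4)

def anticommutator(a, b):
    """Return {a, b} = ab + ba."""
    return a @ b + b @ a

# Check all sixteen index combinations (mu, nu).
clifford_ok = all(
    np.allclose(anticommutator(gammas[mu], gammas[nu]), 2 * eta[mu, nu] * I4)
    for mu in range(4)
    for nu in range(4)
)
print(clifford_ok)  # True
```

The check also illustrates why ordinary numbers cannot work: for commuting numbers, \(\gamma^0 \gamma^1 + \gamma^1 \gamma^0 = 0\) would force \(\gamma^0 \gamma^1 = 0\), which contradicts both squares being nonzero.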
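With these matrices as the coefficients, we can finally complete the factorization in equation (6.15), pick one of the two factors (conventionally \(\gamma^\mu p_\mu - mc\)), ‘quantize’ the momenta \(p_\mu\), and have the result act on a wave function \(\psi\), which gives us the Dirac equation:
\[ i \hbar \gamma^\mu \partial_\mu \psi - mc \, \psi = 0, \]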
where
\[ \psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix} \]
is known as the bispinor or Dirac spinor. Note that \(\psi\) is not a four-vector; it simply contains four pieces of information that can be cast in (regular) vector form.
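The factorization behind the Dirac equation can be checked numerically as well. The sketch below assumes the Dirac representation of the \(\gamma^\mu\), illustrative units with \(m = c = 1\), and arbitrary sample momenta; the helper names `contract` and `factor_product` are hypothetical. It verifies that \((\gamma^\nu p_\nu + mc)(\gamma^\lambda p_\lambda - mc) = (p^\mu p_\mu - m^2 c^2) I_4\) for any momentum, and that for a physical (‘on-shell’) momentum the matrix \(\gamma^\mu p_\mu - mc\) is singular, so that nonzero four-component solutions exist.

```python
import numpy as np

# Gamma matrices in the Dirac representation (an assumption of this sketch).
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
GAMMA = [np.block([[I2, Z2], [Z2, -I2]])]
GAMMA += [np.block([[Z2, s], [-s, Z2]]) for s in PAULI]

ETA = np.diag([1.0, -1.0, -1.0, -1.0])
I4 = np.eye(4, dtype=complex)
m, c = 1.0, 1.0  # illustrative units

def contract(p_lower):
    """Return the 4x4 matrix gamma^mu p_mu for covariant components p_mu."""
    return sum(p * g for p, g in zip(p_lower, GAMMA))

def factor_product(p_lower):
    """Return (gamma^nu p_nu + mc)(gamma^lambda p_lambda - mc)."""
    slashed = contract(p_lower)
    return (slashed + m * c * I4) @ (slashed - m * c * I4)

# 1. Arbitrary (off-shell) momentum: product equals (p^mu p_mu - m^2 c^2) I_4.
q_lower = np.array([0.7, 0.2, -1.1, 0.5])
q_squared = q_lower @ ETA @ q_lower
off_shell_ok = np.allclose(
    factor_product(q_lower), (q_squared - (m * c) ** 2) * I4
)

# 2. On-shell momentum: p^0 = sqrt(|p|^2 + m^2 c^2), so the product vanishes.
p_vec = np.array([0.3, -0.4, 1.2])
p_upper = np.concatenate(([np.sqrt(p_vec @ p_vec + (m * c) ** 2)], p_vec))
p_lower = ETA @ p_upper
on_shell_ok = np.allclose(factor_product(p_lower), 0.0)

# 3. Number of independent constant spinors u with (gamma^mu p_mu - mc) u = 0.
n_solutions = 4 - np.linalg.matrix_rank(
    contract(p_lower) - m * c * I4, tol=1e-10
)
print(off_shell_ok, on_shell_ok, n_solutions)
```

The explicit tolerance in `matrix_rank` is a design choice to keep the rank estimate robust against rounding; the null space found in step 3 is two-dimensional for this factor.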