6. Relativistic quantum mechanics
So far, our quantum mechanical descriptions have been of massive particles with velocities much lower than the speed of light. This formalism gets us a long way: we can use it to describe all of chemistry, and, by extension, it has many applications in biology. Nonrelativistic quantum mechanics also gives us basic physics insights, such as the emission spectra of atoms, and engineering applications including lasers, NMR, qubits, and quantum computers. However, it intrinsically cannot give us a description of light, even though light has an innate quantum nature, and it was the quantum nature of light that triggered the quantum revolution. Because light, inevitably, travels at the speed of light, we will need to include relativistic effects in our theory if we want it to describe light (and all processes where light interacts with matter), not just as a correction like in our discussion of the fine structure of hydrogen in Section 4.3, but at the basis of our theory. In this theory, we will combine quantum mechanics with the special theory of relativity. Its ultimate form, the Standard Model of particle physics (which includes quantum chromodynamics, QCD), is very powerful and extremely accurate, combining three of the four known fundamental forces. It is however not complete: gravity isn’t part of the theory, and at present nobody knows how to integrate it with the general theory of relativity.
6.1. The central equation of relativistic quantum mechanics
6.1.1. The Klein-Gordon equation
A first attempt at constructing a base equation for relativistic quantum mechanics could be to ‘quantize’ the special theory of relativity. This attempt is based on the observation that the Schrödinger equation, after a fashion, can be seen as the ‘quantization’ of the classical equation for conservation of energy:
To turn equation (6.1) into a quantum one, we apply the same procedure we used to arrive at quantum-mechanical analogs of the angular momentum (see Section 3.1): we replace the momentum and energy with quantum operators:
We also replace the potential \(V\) with the potential energy operator \(\hat{V}\). If we make these substitutions in equation (6.1) and then have both sides act on a wave function \(\Psi(\bm{x}, t)\), we indeed arrive at the Schrödinger equation:
Note that this procedure does not give us a true ‘derivation’ of the Schrödinger equation (we still need Axiom 1.2), as the ‘quantization’ recipe in equation (6.2) follows from the Schrödinger equation. However, it does give us an idea about how we could extend quantum mechanics to relativistic systems, as we know that in relativity, the energy equation gets an extra term (see equation (4.28)):
Giving equation (6.4) the same ‘quantization treatment’ as equation (6.1), we arrive at
Equation (6.5) is known as the Klein-Gordon equation. Unlike the Schrödinger equation, it is a proper wave equation, and it is the correct relativistic quantum equation for spin-0 particles. Unfortunately however, the particles of interest all have nonzero spin, and we need a more general form.
6.1.2. Four-vectors
In equation (6.4), the momentum \(p\) is the ‘three-momentum’, the (length of the) classical momentum vector \(\bm{p}\). In relativity theory, we work with four-vectors, which describe relativistic quantities in the four dimensions of spacetime, reflecting the relativistic notion that time is no longer ‘just’ a parameter, but a dimension, and transformations from one (inertial) frame of reference to another affect both the spatial and temporal coordinates. In short[1], four-vectors get an extra (‘zeroth’) component, which for the position vector represents the time, and for the momentum vector the energy (times factors of \(c\), the speed of light, which is a universal constant):
When making a transformation from one inertial frame to another (e.g. from that of an observer on a platform, to that of an observer on a train moving at constant velocity), the coordinates change according to the Lorentz transformations, \(\bm{\bar{x}}' = \bm{L} \bm{\bar{x}}\), which can be expressed in matrix form:
for a transformation between a stationary frame \(S\) with coordinates \(\bm{x}\) and a frame \(S'\) with coordinates \(\bm{x}'\) moving in the positive \(x\) direction with speed \(u\) with respect to frame \(S\). Here \(\gamma(u)\) is the Lorentz factor from special relativity,
Relativistic four-vectors form a space which is close but not equal to \(\mathbb{R}^4\), as they have an inner product[2] which is defined[3] differently[4]:
An easy calculation shows that the ‘length’ of a four-vector (the quantity \(\bm{\bar{x}} \cdot \bm{\bar{x}}\)), and by extension, the inner product between any two four-vectors, is invariant under Lorentz transformations.
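The invariance claim can be checked numerically. The following is a minimal sketch, assuming the standard boost matrix along the positive \(x\) direction acting on \((ct, x, y, z)\) and a metric with signature \((+, -, -, -)\), which matches the sign convention \(x_0 = x^0\), \(x_i = -x^i\) used in this section. The names `boost_x`, `ETA` and `minkowski_dot` are illustrative and not part of the text.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def boost_x(u):
    """Lorentz boost for speed u along +x, acting on (ct, x, y, z)."""
    beta = u / C
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L


# Assumed metric signature (+, -, -, -).
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def minkowski_dot(a, b):
    """Inner product a^mu eta_{mu nu} b^nu of two four-vectors."""
    return a @ ETA @ b


x = np.array([3.0, 1.0, -2.0, 0.5])  # arbitrary four-vector (ct, x, y, z)
y = np.array([1.5, 0.2, 0.7, -1.0])  # a second arbitrary four-vector
L = boost_x(0.6 * C)

before = minkowski_dot(x, y)
after = minkowski_dot(L @ x, L @ y)
print(before, after)  # the two values agree up to rounding
```

The same statement in matrix form is \(\bm{L}^T \eta \bm{L} = \eta\), which holds for any boost speed below \(c\).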
To distinguish between three-vectors and four-vectors, components of three-vectors are indicated with Roman indices like \(p_i\), while those of four-vectors are indicated with Greek ones, like \(p^\mu\). The upper index represents a (standard) column-vector like configuration (known as the contravariant components of the vector). We also have a version with a lower index, corresponding to a row-vector like configuration (known as the covariant components); the inner product can be written as \(\bm{\bar{x}} \cdot \bm{\bar{y}} = x_\mu y^\mu = x^\mu y_\mu\), where the sum over \(\mu\) (ranging from \(0\) to \(3\)) is implicit[5]. Within special relativity, the covariant components of a vector are the same as the contravariant ones, except for a minus sign on the space components: \(x_0 = x^0\), but \(x_i = - x^i\). We can summarize these relations using the metric tensor, which for special relativity is usually written as \(\eta^{\mu \nu}\) (and its inverse, \(\eta_{\mu \nu}\)), to distinguish from the general relativity version \(g^{\mu \nu}\). Using the Einstein summation convention, the metric tensor is defined through
from which we can read off that
The coefficients of \(\eta_{\mu \nu}\) are the same as those of \(\eta^{\mu \nu}\).
Derivatives can be taken with respect to any of the four components of the position (or time-position) vector; in short-hand notation, we have[6]
Equation (6.12) thus generalizes the partial derivative, and (6.13) the Laplacian; the operator \(\square\) is known as the d’Alembertian.
In terms of four-vectors, we can re-write equation (6.4) in (even) more concise form:
If we now apply the ‘quantization recipe’ of equation (6.2) to our four-vectors, we get
Unsurprisingly, just substituting the ‘quantization’ of the four-momentum in equation (6.14) and having it act on a wave function again gives us the Klein-Gordon equation, albeit in more concise form:
However, you might now guess where things go wrong: rather than ‘applying’ \(\bm{\bar{p}} \cdot \bm{\bar{p}}\), which only gives us the magnitude of the four-momentum, we’ll want each individual component, and will therefore have to ‘factorize’ equation (6.14) to get a more detailed view.
6.1.3. The Dirac equation
To motivate why we’d want to factorize equation (6.14), let’s consider the case of a stationary particle[7], for which the three-momentum is zero, and we only have one nonzero component of the four-momentum, \(p^0\), directly related to its energy. In that case, equation (6.14) simplifies to
where we used that \(p_0 = p^0\) (i.e., the zeroth component of the covariant and contravariant representations are identical) in special relativity. We find that we have two solutions: either \(p^0 = mc\) or \(p^0 = -mc\). As \(p^0 = E/c\) is proportional to the energy of our particle, classically we’d dismiss the second solution, but as we’ll see below, we will in fact always get two solutions in relativistic quantum mechanics.
Unfortunately, the factorization for a moving particle is more involved, because with the extra components we get cross terms, and moreover the covariant and contravariant components are no longer identical. Introducing new components \(\beta^\nu\) and \(\gamma^\lambda\) (so four each, for \(\nu, \lambda = 0, 1, 2, 3\)), we can formally proceed with the factorization:
Note that the first term in the last line of (6.18) is a sum over sixteen terms, and the second a sum over four terms. The second term however should vanish, as in the original sum in the first line of (6.18) there are no linear terms in the momentum. Therefore, we have \(\beta^\nu = \gamma^\nu\), and we’re left with four unknowns, the coefficients \(\gamma^\nu\), which satisfy \(p^\mu p_\mu = \gamma^\nu \gamma^\lambda p_\nu p_\lambda\). By writing out the four terms on the left and sixteen terms on the right of this equation, we get
We can summarize equations (6.19) using the anticommutator, \(\{a, b\} = ab + ba\), and the metric tensor \(\eta^{\mu \nu}\):
There is no solution of equations (6.20) in terms of numbers. However, there are solutions in which the coefficients \(\gamma^\mu\) are matrices. The smallest solutions are \(4 \times 4\) matrices, which can be expressed in terms of the \(2 \times 2\) identity matrix \(I_2\), the \(2 \times 2\) Pauli spin matrices \(\sigma^i\) (equation (3.50)), and the \(2 \times 2\) zero matrices \(0_2\):
With these matrices as the coefficients, we can finally factorize equation (6.18), ‘quantize’ the momenta \(p_\mu\), and have them act on a quantum function \(\psi\), which gives us the Dirac equation:
where
is known as the bispinor or Dirac spinor. Note that \(\psi\) is not a four-vector; it simply contains four pieces of information that can be cast in (regular) vector form.
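The anticommutation relations (6.20) are easy to verify numerically once a matrix representation is chosen. The sketch below assumes the Dirac (standard) representation, one common choice built from \(I_2\), \(\sigma^i\) and \(0_2\); the representation used elsewhere in this chapter may differ by a change of basis. The names `GAMMA`, `ETA` and `clifford_holds` are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma^1
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma^2
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma^3
]

# Assumed Dirac representation: gamma^0 = diag(I2, -I2),
# gamma^i = [[0, sigma^i], [-sigma^i, 0]].
GAMMA = [np.block([[I2, Z2], [Z2, -I2]])] + [
    np.block([[Z2, s], [-s, Z2]]) for s in SIGMA
]

# Metric with signature (+, -, -, -).
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def anticommutator(a, b):
    """Return {a, b} = ab + ba."""
    return a @ b + b @ a


def clifford_holds():
    """Check {gamma^mu, gamma^nu} = 2 eta^{mu nu} times the 4x4 identity."""
    for mu in range(4):
        for nu in range(4):
            expected = 2 * ETA[mu, nu] * np.eye(4)
            if not np.allclose(anticommutator(GAMMA[mu], GAMMA[nu]), expected):
                return False
    return True


print(clifford_holds())
```

Replacing the matrices by ordinary numbers makes the off-diagonal conditions impossible to satisfy, which is why no scalar solution exists.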
6.1.4. Solutions to the Dirac equation
To understand what the bispinor represents, it is helpful to look at a few simple solutions to the Dirac equation. These solutions are similar to the free particle solutions of the Schrödinger equation: they are solutions to the equation, but do not represent actual physical particles, as the solutions are not normalizable. Nonetheless, they give us some insight into the nature of the solutions, and like for the free particle, we can use them as a basis for solutions that are normalizable.
6.1.4.1. Particle at rest
If a particle is at rest, its three-momentum \(\bm{p}\) is zero. In this case, the Dirac spinor is independent of position, i.e., all three spatial derivatives vanish:
The Dirac equation now simplifies to a first-order differential equation in time:
or
where \(\psi_A = \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}\) and \(\psi_B = \begin{pmatrix} \psi_3 \\ \psi_4 \end{pmatrix}\) are two spinors as we’ve encountered them in Section 3.4. Because equation (6.26) separates into two equations for \(\psi_A\) and \(\psi_B\), we can solve them separately, and find
These solutions are identical to those of the Schrödinger equation, for particles that have (rest) energy \(E = mc^2\) (for \(\psi_A\)) and \(E = - mc^2\) (for \(\psi_B\)), respectively.
The Dirac equation thus predicts that there are two solutions: one with rest energy \(m c^2\), as we would expect from the theory of special relativity, and a second one with rest energy \(- mc^2\). To interpret this result, Dirac pictured the vacuum not as empty, but as filled exactly up to \(E = 0\) with particles, similar to how a metal is filled up to its Fermi energy with electrons. From this ‘sea’ of electrons, we can ‘free’ one (giving it a higher energy, allowing it to move around), but that act also creates a ‘hole’ with an effective positive charge in the remaining material. The hole moreover also moves around. If the electron and hole come together again, they can annihilate each other, going back to the initial state. Likewise, for every particle with mass \(m\) we ‘create’ from the vacuum, we need to open up a ‘hole’ with mass \(-m\). We can observe the hole - not as a particle with negative mass, but as an antiparticle with identical mass (it behaves like a particle with mass \(m\)) but opposite charge to the ‘regular’ particle.
The existence of antiparticles is a direct consequence of the Dirac equation, as published by Dirac in 1928 [Dirac, 1928], and realized in full by Oppenheimer in 1930 [Oppenheimer, 1930]. The first antiparticles, ‘anti-electrons’ (now known as positrons) were discovered already in 1932 by Anderson [Anderson, 1932], earning him (half of) the 1936 Nobel prize in physics; Schrödinger and Dirac shared the 1933 Nobel prize.
6.1.4.2. Plane-wave solutions
For a free particle, we found in Section 2.2.2 that the solution is a simple plane wave. This solution generalizes to the Dirac equation, with a bispinor that is a function of the position four-vector \(\bm{\bar{x}}\), and a wave described by a wave four-vector[8] \(\bm{\bar{k}}\):
To find \(u(\bm{\bar{k}})\), we substitute the solution (6.28) into the Dirac equation (6.22), which gives
Note that in equation (6.29), the first term is a matrix, so to add it to the second term, we must multiply the second term with the identity matrix. Writing out the sum \(\gamma^\mu k_\mu\), we get
and therefore
from which we get (using that \((\bm{k} \cdot \bm{\sigma})^2 = \bm{k}^2 I_2\)):
Unsurprisingly, we retrieve the de Broglie relation for the energy (1.1) and momentum (1.2), now combined into a single four-momentum[9]:
Like for the stationary particle, we get two solutions, one with positive and one with negative energy, representing a particle and its antiparticle. Likewise, the bispinor again splits into two pieces, with the first two entries setting the spinor of the particle, and the second two those of the antiparticle, i.e., our solution set is given by
As in nonrelativistic quantum mechanics, our solution contains a free scalar factor, which means that we can normalize it. There are multiple possible choices for the normalization. The simplest would be to set \(u^\dagger u = 1\), but you may also encounter \(u^\dagger u = 2E/c\) or \(u^\dagger u = E/mc^2\). Naturally the physics won’t depend on the choice of normalization.
Another notation convention that is often used is to set \(u_A = u\) and \(u_B = v\); in this case, \(u\) and \(v\) are the spinors representing the particle and antiparticle, satisfying the momentum-space Dirac equation, respectively
for the particles and
for the antiparticles.
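As a numerical illustration of the plane-wave solutions, the sketch below builds one particle spinor and one antiparticle spinor and checks that they satisfy the momentum-space equations. Several things are assumed here rather than taken from the text: units with \(c = 1\), the Dirac representation of the \(\gamma\) matrices, momentum-space equations of the form \((\gamma^\mu p_\mu - mc)u = 0\) and \((\gamma^\mu p_\mu + mc)v = 0\), and the common textbook form of the spinors (unnormalized). All function and variable names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
# Assumed Dirac representation of the gamma matrices.
GAMMA = [np.block([[I2, Z2], [Z2, -I2]])] + [
    np.block([[Z2, s], [-s, Z2]]) for s in SIGMA
]


def sigma_dot(p):
    """Return the 2x2 matrix p . sigma for a three-vector p."""
    return sum(p_i * s for p_i, s in zip(p, SIGMA))


def slashed(p0, p):
    """gamma^mu p_mu = gamma^0 p^0 - gamma . p for metric (+, -, -, -)."""
    return p0 * GAMMA[0] - sum(p_i * g for p_i, g in zip(p, GAMMA[1:]))


m = 1.0                          # mass, in units with c = 1
p = np.array([0.3, -0.4, 1.2])   # arbitrary three-momentum
p0 = np.sqrt(p @ p + m**2)       # positive root of the energy-momentum relation
chi = np.array([1.0, 0.0], dtype=complex)  # arbitrary two-component spinor

small = sigma_dot(p) @ chi / (p0 + m)
u = np.concatenate([chi, small])   # particle spinor (unnormalized)
v = np.concatenate([small, chi])   # antiparticle spinor (unnormalized)

residual_u = (slashed(p0, p) - m * np.eye(4)) @ u
residual_v = (slashed(p0, p) + m * np.eye(4)) @ v
print(np.abs(residual_u).max(), np.abs(residual_v).max())
```

Both residuals vanish up to rounding only because \(p^0\) satisfies the energy-momentum relation, which in turn relies on the identity \((\bm{p} \cdot \bm{\sigma})^2 = \bm{p}^2 I_2\).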
6.1.5. Electromagnetic field quantization: photons
Light is an electromagnetic wave, and as such, a specific solution to the Maxwell equations for electrodynamics (cf problem). Just like we obtained the Klein-Gordon and Dirac equations by ‘quantizing’ the energy-momentum equation (6.4) of special relativity, we can get a quantized description of light by looking at the Maxwell equations through a quantum-mechanical lens. First however, we will rewrite the equations in relativistic form; because the equations are already compatible with special relativity, this really is only a different way of writing them down. In classical form, there are four Maxwell equations:
where \(\bm{E}\) and \(\bm{B}\) are the electric and magnetic fields, \(\rho\) is the electric charge density, \(\bm{J}\) the current density, and \(\varepsilon_0\) and \(\mu_0\) are the permittivity and permeability of free space, satisfying \(\mu_0 \varepsilon_0 = 1/c^2\).
To cast the Maxwell equations into relativistic form, we need to convert our fields, currents and densities to four-vectors. For the charge and current density that is easy, as together they form a four-vector:
The electric and magnetic fields have less easy counterparts. Rather than extending them with a zeroth component, we need to combine them into an antisymmetric two-tensor, known as the field strength tensor \(\bm{\bar{F}}\), with components:
Next to the field strength tensor, there is another way of combining the fields into an antisymmetric tensor, known as the dual (field strength) tensor \(\bm{\bar{G}}\):
The two homogeneous Maxwell equations (6.37)b and (6.37)c can together be written very succinctly in terms of the tensor \(\bm{\bar{G}}\):
With the current density four vector and the field strength tensor, we can likewise rewrite the two inhomogeneous Maxwell equations (6.37)a and (6.37)d as a single equation:
From the antisymmetry of \(F^{\mu\nu}\), we also get the continuity equation, likewise in succinct form (see Exercise 6.4):
As you know, in electromagnetism both the electric and the magnetic field can be written in terms of a potential, respectively a scalar potential \(V\) and a vector potential \(\bm{A}\) (see Exercise 6.5). Unsurprisingly, they too combine into a four-vector potential:
The field strength tensor \(\bm{\bar{F}}\) can be written in terms of the potential as
The advantage of introducing the potential is that the homogeneous Maxwell equations (6.41) are always satisfied (see Exercise 6.5). The downside is that the potential contains an undetermined part, just like the scalar and vector potentials do in classical electrodynamics. It is easy to see that this is the case: simply add the four-gradient of any scalar function (i.e., a term \(\partial_\mu \lambda\), with \(\lambda\) an arbitrary scalar function) to the four-vector \(\bm{\bar{A}}\), and equation (6.45) remains true. A change in the potential which does not affect the fields is known as a gauge transformation. We can exploit the additional degree of freedom to make a convenient choice for the potential; physicists call this step a gauge choice. A commonly used one in relativistic electrodynamics is the Lorentz gauge:
In the Lorentz gauge, the remaining (inhomogeneous) Maxwell equations (6.42) become
While the Lorentz gauge eliminated part of the gauge freedom, from equation (6.47) it is clear that we can still change \(\bm{\bar{A}}\) without affecting the physical current density \(\bm{\bar{J}}\): if we have a scalar function \(\phi\) that satisfies the wave equation \(\square \phi = 0\), adding its four-gradient \(\partial^\mu \phi\) to the potential \(\bm{\bar{A}}\) changes nothing. We can remove this additional degree of freedom by specifically picking one of the components of \(\bm{\bar{A}}\). The downside of doing so is that we essentially select a fixed coordinate frame to be our inertial frame of reference; we therefore remove some of the inherent underlying symmetry. In practice however, there is always an ‘observer frame’, and we can work from there. The specific choice we will take here is known as the Coulomb gauge, which simply sets
The photon is simply the quantization (the particle representation) of the electromagnetic field. In particular, it satisfies the wave equation (6.47). In vacuum, this equation reduces to the Klein-Gordon equation for a massless particle:
The solutions to this vacuum wave equation are plane waves again:
where \(\bm{\bar{k}}\) is the wave four-vector, and \(\bm{\bar{\varepsilon}}\) the polarization vector, which tells us the polarization (i.e., the spin state) of the photon. Substituting the plane-wave solution into the wave equation (6.49), we find a constraint on the wave vector:
so the wave vector has length zero, and hence it satisfies the relation \(\omega = ck\), as it should for any electromagnetic wave. Moreover, as \(\bm{\bar{p}} = \hbar \bm{\bar{k}}\), we also have \(p^\mu p_\mu = 0\), so the particle has mass \(0\), again as it should. Finally, the gauge choice puts some constraints on the polarization vector \(\bm{\bar{\varepsilon}}\). The Lorentz gauge gives
giving us a linear relation between the four components of \(\bm{\bar{\varepsilon}}\). The Coulomb gauge moreover gives
which removes one component from the polarization vector, and reduces equation (6.52) to a three-vector condition:
By equation (6.54), the polarization three-vector \(\bm{\varepsilon}\) is always perpendicular to the photon’s direction of propagation (along \(\bm{k}\)); photons are thus transversely polarized. The two constraints combined leave us two degrees of freedom: these of course are the familiar horizontal and vertical polarization, or equivalently, left- and right-handed circular polarization of light. Although photons have spin 1, they thus only have two spin states, due to the constraint that their polarization has to be perpendicular to their direction of propagation; this constraint ultimately comes from the fact that they have no mass. This result holds more generally: massless particles with spin \(s\) have only two magnetic spin states, \(m_s = \pm s\), unlike their massive counterparts, which have \(2s + 1\) possible spin states[10].
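The counting of polarization states can be made concrete with a small numerical sketch. It assumes units with \(c = 1\) and an arbitrarily chosen wave vector; the helper `transverse_polarizations` is hypothetical and simply constructs two orthonormal vectors perpendicular to \(\bm{k}\), from which the two circular polarizations follow as complex combinations.

```python
import numpy as np


def transverse_polarizations(k):
    """Return two orthonormal three-vectors perpendicular to the wave vector k."""
    k = np.asarray(k, dtype=float)
    k_hat = k / np.linalg.norm(k)
    # Use the coordinate axis least aligned with k, so the cross product is nonzero.
    helper = np.eye(3)[np.argmin(np.abs(k_hat))]
    e1 = np.cross(k_hat, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(k_hat, e1)
    return e1, e2


k = np.array([1.0, 2.0, -0.5])   # arbitrary wave three-vector, units with c = 1
omega = np.linalg.norm(k)        # dispersion relation omega = c |k| with c = 1

# Wave four-vector (omega / c, k) and its Minkowski length, signature (+, -, -, -).
k_four = np.concatenate([[omega], k])
ETA = np.diag([1.0, -1.0, -1.0, -1.0])
k_squared = k_four @ ETA @ k_four

e1, e2 = transverse_polarizations(k)   # two linear polarizations
circ_plus = (e1 + 1j * e2) / np.sqrt(2)    # one circular polarization
circ_minus = (e1 - 1j * e2) / np.sqrt(2)   # the opposite circular polarization

print(k_squared, e1 @ k, e2 @ k)  # all zero up to rounding
```

Any polarization three-vector satisfying the transversality condition is a linear combination of `e1` and `e2`, which is the statement that only two degrees of freedom remain.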
6.2. Quantum electrodynamics
As we discussed in Section 6.1.4 above, we can interpret the negative-energy solutions to the Dirac equation as positive-energy antiparticles. Each spin-½ particle then has a corresponding antiparticle, with identical mass and opposite charge. The antiparticle of the electron is the positron.
Electrodynamics is the study of the motion and interactions of charged particles. While many charged particles exist, the electrons and protons are by far the most prevalent. Protons are about \(2 000\) times heavier than electrons, and in matter are bound in nuclei, which (in a conducting solid) are fixed in a lattice; electrodynamics is therefore essentially the study of the motion and interaction of electrons.
In classical electrodynamics, the interactions between electrons are described by forces and fields. In quantum field theory, of which quantum electrodynamics is the simplest example, we quantize the fields, just like we quantized electromagnetic waves as photons above. Interactions between material particles (which are all fermions) are then mediated by force-carrying gauge bosons. In quantum electrodynamics (or QED), the only material particles are the electron and the positron; the force-carrying gauge boson is the photon.
Quantum field theories are used to calculate what happens when particles interact. There are two important processes that involve such interactions: scattering (when particles collide, or come close enough together that they interact with each other, for example through Coulomb repulsion) and decay (of atomic nuclei, or other composite particles). In QED, we only have scattering, as there are no composite particles that consist of only electrons and positrons.
6.2.1. Symmetry
As in all physical theories, symmetries play an important role in quantum field theories. We will need two of them to understand the basics of field theory calculations.
6.2.1.1. CPT symmetry
The charge, parity and time (CPT) symmetry states that if we flip the sign of all charges in a system, mirror it in space, and reverse the direction of time, the behavior of the resulting system is exactly the same as that of the original[11]. The parity transformation can be either the flip of one spatial direction, or (in three dimensions), of all three at the same time; in both cases, a right handed circular motion becomes a left handed one. Combined with time reversal, we could consider a PT transformation also as a flip in all four components of spacetime. Specifically in QED, the uncharged photon should not be affected if we change both parity and time, and indeed it is not: the parity transformation changes a right handed polarization into a left handed one, but traveling in the opposite time direction (i.e., looking ‘down’ instead of ‘up’ the time axis), we are back with a right handed polarization.
The CPT symmetry has an important implication (the Feynman-Stückelberg interpretation): we can interpret antiparticles as regular particles traveling backwards in time. Consequently, in QED we would have only two particles: electrons (traveling either forward or backward in time) and photons. Of course, we do not see electrons moving backwards in time; we see positrons moving forward, but mathematically, their behavior is identical.
6.2.1.2. Crossing symmetry
The crossing symmetry states that if a reaction of the form \(\text{A} + \text{B} \to \text{C} + \text{D}\) is known to occur (e.g., two electrons colliding with each other, in which case all four letters would correspond to an electron, but with different energies and momenta), then any of these particles can ‘cross over’ to the other side of the reaction, if it is replaced by its antiparticle. Therefore, denoting the antiparticle with an overline, we have the following reactions which all exhibit the same crossing symmetry:
and so on. An explicit example is that if scattering between two electrons is possible, an electron and a positron can also scatter.
6.2.2. Feynman diagrams
As stated above, the objective of QED (and all quantum field theories) is to calculate what happens if particles interact. Given that we are dealing with quantum-mechanical particles, we can only calculate probabilities, e.g. the probability that two scattering electrons will emerge with certain momenta given their incoming momenta. These probabilities are similar to the transition probabilities between states we encountered in Chapter 4, when we discussed time-dependent perturbations and Fermi’s golden rule. Similar to the golden rule, the probability of a specific scattering (or decay, as we’ll encounter below) process is proportional to the square of the ‘weight’ of that process, which in Section 4.4 was the coefficient \(c_b(t)\) of the higher-energy state. The ‘weight’ of a process in quantum field theory is usually referred to as its amplitude. The calculation of these amplitudes is in general tremendously complicated. Luckily, it can be conceptually simplified by using a visual series expansion scheme due to Feynman, known as the Feynman diagrams. The idea is that the interaction is built up from elementary processes involving the material (fermionic) and force-carrying (bosonic) particles in the theory. In quantum electrodynamics, exploiting the symmetries indicated above, there is only one type of elementary process, which can be drawn as a simple diagram (Fig. 6.1):
Fig. 6.1 The elementary interaction in quantum electrodynamics: a material particle emitting or absorbing a force-carrying photon. The arrow in the fermionic (matter) particle reflects the traveling direction of the particle; by the CPT symmetry, we can mirror in time, which results in the arrow pointing in the opposite direction, if we also swap the charge; there is therefore an equivalent version where the electron is replaced by a positron.
By conservation of energy-momentum, the elementary process cannot occur in isolation. It is not hard to see why: by the crossing symmetry, an equivalent version would be the collision of an electron and a positron, resulting in their annihilation and the production of a single photon. We could then always make a transformation to the center-of-momentum frame of the electron-positron system before the collision, meaning that in that system the net momentum is zero. After the collision however, we would be left with a single photon, traveling at the speed of light, which cannot have zero momentum. Therefore, we always need at least two elementary interaction steps in any process.
The simplest example of a process containing two elementary interactions is the lowest-order contribution to the scattering between two electrons, known as Møller scattering (Fig. 6.2):
Fig. 6.2 Feynman diagram for the lowest-order contribution to the Møller scattering of two electrons.
You may have noticed that we’ve drawn the line representing the photon to go straight up in the elementary process of Fig. 6.1 and the diagram for the Møller scattering in Fig. 6.2. That is not by accident: any intermediate particles that participate in a Feynman diagram are known as virtual particles, which are inherently undetectable. At every vertex, energy and momentum must be conserved, but we cannot distinguish between the processes ‘the top electron emits a photon that is absorbed by the bottom one’ and ‘the bottom electron emits a photon that is absorbed by the top one’, which is why we can combine them into a single diagram.
By the crossing symmetry, we can draw a diagram very similar to the one of Fig. 6.2 for the scattering between an electron and a positron (known as Bhabha scattering). However, by the CPT symmetry, there is a second process that also describes Bhabha scattering, and also has two elementary interactions. As we cannot know what is actually happening, we have to account for both possibilities when doing calculations about Bhabha scattering probabilities (Fig. 6.3):
Fig. 6.3 Feynman diagrams for the two lowest-order contributions to the Bhabha scattering of an electron and a positron.
Further application of the symmetries gives us the lowest-order diagrams for three more processes (all described by a single diagram): annihilation of an electron-positron pair, creation of such a pair, and the Compton scattering between an electron and a photon (Fig. 6.4):
Fig. 6.4 Feynman diagrams for the lowest-order contribution to electron-positron annihilation and creation, and the Compton scattering between an electron and a photon. Note that in every vertex, the direction of the arrow of the material (fermionic) particle is continuous.
Fig. 6.2, Fig. 6.3 and Fig. 6.4 all contain diagrams with two vertices. They represent the lowest-order contributions to the processes they describe. However, there are more contributions: within each diagram, we can add additional vertices, giving additional diagrams that contribute to the process (or, more precisely, contribute to the calculation of the probability that the process occurs). For example, to the diagram describing Møller scattering in Fig. 6.2, we can add a spontaneous generation and annihilation of an electron-positron pair in the trajectory of the photon, an additional interaction between the electron and itself, or a second photon connecting the two electrons, as sketched in Fig. 6.5. For obvious reasons, the first-order diagrams like the ones in Fig. 6.2 are called ‘tree-level’ diagrams, whereas the second-order diagrams of Fig. 6.5 are known as ‘one-loop’ diagrams.
Fig. 6.5 Three qualitatively different Feynman diagrams of the second-order contribution to the Møller scattering of two electrons. Note that these are not all possible diagrams with four vertices.
To calculate the total amplitude of Møller scattering to second order, we need to add all contributions, like we already have to do for the two lowest-order contributions for Bhabha scattering. Moreover, our process does not end with second-order contributions. There are also contributions at third order (with six vertices), fourth order (eight vertices), fifth order (ten vertices), and so on. Taken together, any process is therefore the sum of infinitely many diagrams, as often indicated in the general diagram shown in Fig. 6.6.
Fig. 6.6 General Feynman diagram of the interaction between two particles.
Luckily, although the number of diagrams with more vertices goes up rapidly, their contributions to the amplitude of the process we are calculating decrease even faster. Every vertex adds a factor with magnitude \(\alpha = e^2 / \hbar c \approx 1/137\) (the fine-structure constant, written here in Gaussian units; in SI units \(\alpha = e^2 / 4\pi\varepsilon_0 \hbar c\)), and because in every next order we add two vertices, the contributions of the second-order diagrams are already a factor \(10^4\) smaller than those of the first order. Nonetheless, they can be calculated, and in some cases experimentally verified to very high precision. Higher-order calculations have also been essential in narrowing down the potential masses of predicted but previously undetected particles, resulting in their discovery in high-energy colliders including the Tevatron at Fermilab and the Large Hadron Collider at CERN.
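The size of this suppression is easy to estimate. The sketch below computes the fine-structure constant from SI values of the fundamental constants (CODATA 2018 values, typed in by hand) and then applies the simplified counting used in this section, one factor of \(\alpha\) per vertex and two extra vertices per order. The function `relative_suppression` is a hypothetical helper that encodes only this rough counting, not an actual amplitude calculation.

```python
import math

# SI values of the constants (CODATA 2018; e and c are exact by definition).
E_CHARGE = 1.602176634e-19     # elementary charge in C
HBAR = 1.054571817e-34         # reduced Planck constant in J s
C = 299_792_458.0              # speed of light in m/s
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity in F/m

# Fine-structure constant in SI units.
alpha = E_CHARGE**2 / (4 * math.pi * EPSILON_0 * HBAR * C)


def relative_suppression(order, alpha=alpha):
    """Rough size of an order-n contribution relative to the lowest order.

    Assumes one factor of alpha per vertex and two extra vertices per order,
    following the simplified counting in the text.
    """
    return alpha ** (2 * (order - 1))


print(f"1/alpha = {1 / alpha:.3f}")
for order in (1, 2, 3):
    print(order, relative_suppression(order))
```

With this counting, each additional order is suppressed by a further factor \(\alpha^2\), which is of the order \(10^{-4}\).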
Translating Feynman diagrams into probabilities
As indicated at the start of this section, the probability that a certain process will occur is proportional to the square of the amplitude of that process. The probability moreover depends on the energies and momenta of the particles involved (unsurprisingly, the probability that two particles will scatter if they are moving away from each other gets smaller, whereas a collision gets more likely if they are on trajectories that will bring them closer together). This point is expressed in Fermi’s golden rule (equation (4.61)), relating the transition probability per unit time to the square of the amplitude of a perturbation times the density of states of that perturbation.
To calculate the amplitude, we need to take all internal parts of the Feynman diagram (vertices and connecting edges) into account, and integrate over all possible trajectories the internal particle could have taken. Such an integral over all possible trajectories is known as a path integral. The factor added by each element depends on the nature of the particle. In QED, there is only one type of vertex (so all vertex contributions are the same), and two types of connecting edges (known as ‘propagators’ in field theory), one for an electron and one for a photon. Finally, we impose the rule that energy and momentum must be conserved at every vertex. Therefore, for the simplest Møller scattering diagram (Fig. 6.2), for the amplitude we get
where the integral is taken over all possible photon paths, weighted by their probability. In (four-)position space, these integrals are usually very hard. However, we can Fourier transform to (four-)momentum space, where we can exploit the constraint that energy and momentum (i.e., four-momentum) must be conserved at every vertex. This simplifies the math and makes evaluating the integral a doable task, albeit beyond the scope of this book. Higher-order diagrams add more vertices and propagators, and thus their amplitudes involve multiple integrals; they too can be evaluated in four-momentum space, allowing physicists to calculate scattering probabilities with high precision.
6.3. Quantum field theory of the strong and the weak force#
6.3.1. Quarks and gluons: the strong force#
A universe consisting only of electrons, positrons and photons would be rather boring: to create atoms, we also need nuclei. Nuclei consist of protons and neutrons, but unlike electrons, they are not elementary particles, as they themselves consist of other particles known as quarks. Quarks are, to the best of our knowledge, elementary; they are spin-½ particles that can combine in pairs, triplets and quintets to make other particles. Both protons and neutrons consist of three quarks, of two different types, known as the ‘up’ and ‘down’ quark. A proton is a combination of two up quarks and one down quark, and a neutron a combination of two down quarks and one up quark. From these combinations, we can deduce that up quarks have charge \(+\frac23 e\) and down quarks have charge \(-\frac13 e\). The electron charge is thus not the smallest unit of charge. Quarks are however never observed in isolation, and to date no isolated particles have been found whose charge is not an integer multiple of the electron charge.
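The deduction of the quark charges amounts to solving two linear equations: with \(q_u\) and \(q_d\) the up and down quark charges in units of \(e\), the proton gives \(2 q_u + q_d = 1\) and the neutron gives \(q_u + 2 q_d = 0\), so \(q_u = \frac23\) and \(q_d = -\frac13\). A minimal sketch that checks the result with exact fractions (the dictionary and function names are illustrative):

```python
from fractions import Fraction

# Quark charges in units of the elementary charge e (values quoted in the text).
CHARGE = {"u": Fraction(2, 3), "d": Fraction(-1, 3)}


def total_charge(quarks: str) -> Fraction:
    """Sum the charges of a string of quark labels, e.g. 'uud' for the proton."""
    return sum((CHARGE[q] for q in quarks), Fraction(0))


proton = total_charge("uud")   # two up quarks, one down quark
neutron = total_charge("udd")  # one up quark, two down quarks
print(f"proton charge:  {proton} e")
print(f"neutron charge: {neutron} e")
```

Using `Fraction` rather than floating-point numbers keeps the sums exact, so the proton and neutron charges come out as exactly one and zero.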
As we discussed above, the electromagnetic interaction between charged particles can be described through the exchange of a force boson, which is the photon for the electromagnetic force. Quarks, as charged particles, also interact with each other through the electromagnetic force, and these interactions can be described in the same way as the interactions between leptons (electrons, positrons, their heavier cousins the muons and tauons, and neutrinos, which we will discuss below). However, if quarks interacted only through the electromagnetic force, protons and neutrons would not be stable. To explain their existence, there therefore needs to be another force at play, which is fundamentally different from the electromagnetic force. This force, which is responsible for binding quarks together so strongly that they are never found in isolation, is known as the strong force. At the scale of the proton and neutron, it is much stronger than the electromagnetic force. It moreover only acts between quarks, ignoring leptons altogether. Unlike the electromagnetic and gravitational force, the strong force increases with distance, thus ensuring that the particles it acts between remain strongly bound together.
To describe the strong force, physicists have introduced the concept of a ‘color charge’. Color charges are similar to electrical charges, except that they come in three instead of two (positive and negative) types. The three color charges are commonly referred to as red, green and blue, which combine into a ‘charge neutral’ state known as white. Composite particles can only exist if their color charge is white. The three quarks in a proton or neutron thus must all have a different color. Composite particles consisting of two quarks, like the pion, consist of a quark and an antiquark, with an anti-color; for example, the \(\pi^+\) pion consists of an up and an anti-down quark, which can have any combination of a color and the corresponding anti-color. Because of the central role the color charge plays in the strong force, the quantum field theory of the strong force is known as quantum chromodynamics or QCD.
Just like the electromagnetic force, the strong force can be described using force-carrying bosons, known as gluons[12]. Like the quarks, gluons have a color charge, or more precisely, a combination of a color and anti-color charge. The basic Feynman diagram vertex for the strong force is similar to that of the electromagnetic force, but with the additional constraint of conservation of color. The simplest interaction between two quarks is then a ‘color exchange’, the analog of electron-electron scattering in quantum electrodynamics, see Fig. 6.7.
Fig. 6.7 Strong-force (quantum chromodynamics) interaction. Left: elementary vertex, a quark emitting or absorbing a gluon, with conservation of color (q = quark, g = gluon, r/g/b stands for color, an overbar for anti-color). Right: color-exchange interaction between two quarks.#
The interacting gluon in Fig. 6.7 could also have been a blue - anti-green gluon. For any interaction between two quarks, we therefore have two contributing diagrams that are identical except for the gluon charges, as illustrated in Fig. 6.8.
Fig. 6.8 Strong-force interactions can contain multiple almost identical diagrams with different gluon colors.#
The presence of multiple gluon intermediaries already makes the calculations of the strong force more involved than those of the electromagnetic force. There is, however, more: the gluons can also interact with each other. We have two more possible vertices, one with three and one with four gluons, as illustrated in Fig. 6.9.
Fig. 6.9 Interactions between gluons with different color combinations.#
The nine possible color-anticolor combinations for gluons that we work with in the Feynman diagrams are known as effective states. They are however not the actual states of the gluons; instead, the gluons, like spin-½ particles, combine into superposition states. The simplest combination would be the ‘color singlet’, given as
Stable hadronic particles (i.e., particles consisting of quarks) have to be ‘colorless’ (or ‘white’). For baryonic (three-quark) particles like the proton and the neutron, this condition implies that their constituent quarks each have a different color. Two such colorless particles could have a strong-force interaction if color-singlet gluons existed. However, measurements show that there are no long-range strong interactions; all such interactions are limited to the quarks as bound inside the hadronic particle. Therefore, there are no gluons in the singlet state.
The remaining ‘color octet’ for gluons can be written in many bases. The commonly used one is given below[13]
No doubt you’ve spotted the green anti-blue / blue anti-green combination of the diagram in Fig. 6.8.
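The counting behind the singlet and the octet can be made explicit with a small sketch. It lists the nine color-anticolor combinations, writes the singlet as the equal-weight sum of the three same-color combinations, and checks that a few octet states are orthogonal to it. The normalization of the singlet and the three octet states used here follow one conventional (Gell-Mann-type) choice; they are assumptions for illustration and not necessarily the basis used in this text.

```python
from itertools import product
from math import sqrt, isclose

COLORS = ["r", "g", "b"]

# The nine color-anticolor combinations, e.g. ('r', 'g') for red anti-green.
PAIRS = list(product(COLORS, COLORS))


def state(coeffs: dict) -> list:
    """Return a 9-component vector of (real) coefficients in the PAIRS basis."""
    return [coeffs.get(pair, 0.0) for pair in PAIRS]


def inner(v: list, w: list) -> float:
    """Inner product of two real coefficient vectors."""
    return sum(a * b for a, b in zip(v, w))


# Color singlet: equal-weight sum of the same-color combinations (assumed normalization).
singlet = state({(c, c): 1 / sqrt(3) for c in COLORS})

# Three of the eight octet states in one conventional basis (an assumption).
octet_examples = [
    state({("r", "g"): 1 / sqrt(2), ("g", "r"): 1 / sqrt(2)}),
    state({("r", "r"): 1 / sqrt(2), ("g", "g"): -1 / sqrt(2)}),
    state({("r", "r"): 1 / sqrt(6), ("g", "g"): 1 / sqrt(6), ("b", "b"): -2 / sqrt(6)}),
]

print("number of color-anticolor combinations:", len(PAIRS))
print("gluon states left after removing the singlet:", len(PAIRS) - 1)
for s in octet_examples:
    print("overlap with singlet:", round(inner(singlet, s), 12))
```

Nine combinations minus the one excluded singlet leaves the eight gluon states of the octet; any valid octet basis has to be orthogonal to the singlet, which is what the overlaps check.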
6.3.2. Quark-lepton interactions: the weak force#
As we’ve seen above, photons mediate the electromagnetic force between charged particles (irrespective of whether they are leptons or quarks), and gluons mediate the strong force between quarks. Both these forces leave the nature of the interacting particles (their ‘flavor’, as physicists call it) unchanged. Nothing, however, lasts forever: particles can change flavor through radioactive decay. Neither the electromagnetic nor the strong force can describe what happens in a decay process; this is the purview of the weak force. Because weak forces can change the flavor of a particle, the quantum field theory of the weak force is sometimes called quantum flavordynamics.
There are three bosons that can mediate weak forces. Unlike the photons and gluons, they have nonzero mass, and two of them are charged. The uncharged weak force boson is known as the \(Z\)-boson, and its vertex is similar to that of the electromagnetic force. The charged weak force bosons are known as the \(W^\pm\) bosons, and their vertices are fundamentally different, as the incoming and outgoing fermions are different (see Fig. 6.10). Naturally, the total charge must be conserved in any process. For example, a down quark can be converted into an up quark through the emission of a \(W^-\) boson. For leptons, the existence of the \(W^\pm\) bosons necessitates the introduction of a new, charge-neutral particle, since the charge of the electron and the boson is the same. This new particle is known as the neutrino, first postulated by Pauli to ensure conservation of energy, momentum and angular momentum in beta-decay processes. Because they only interact through the weak force (and gravity), neutrinos can pass through other matter almost unhindered; the sun produces great numbers of them that pass through your body as you are reading this. Detecting neutrinos is therefore also very difficult, and requires large specialized facilities located underground or at the bottom of the sea to shield them from other particles. Neutrinos have a very small mass compared to other fundamental particles; although it is small, experiments have shown the neutrino mass to be nonzero.
Fig. 6.10 The basic vertices of the weak force, mediated by neutral \(Z\) and charged \(W^\pm\) bosons. \(Z\) bosons can interact with any fermion. \(W^\pm\) bosons change the flavor of a fermion, and therefore participate in specific reactions, including a lepton-based vertex with an electron and a neutrino (middle) and a quark-based vertex involving a change of flavor (right).#
To illustrate how the weak force acts in radioactive decay, we look at the beta-decay process, in which an atomic nucleus emits an electron (and, as it turns out, an anti-neutrino), see Fig. 6.11. In the nucleus, only one neutron participates in the reaction, and in the neutron, only one of its quarks participates. This down quark gets converted to an up quark under the emission of a \(W^-\) boson, converting the neutron into a proton. To lowest order in the number of vertices, the \(W^-\) boson in turn decays into an electron and an anti-neutrino.
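As a quick consistency check, we can verify that electric charge is conserved at both vertices of this process, and therefore in the decay as a whole. The sketch below uses charges in units of \(e\); the particle labels (such as `anti-nu_e`) and the helper function are illustrative names, not notation from the text.

```python
from fractions import Fraction

# Electric charges in units of e; the labels are illustrative.
CHARGE = {
    "d": Fraction(-1, 3),
    "u": Fraction(2, 3),
    "W-": Fraction(-1),
    "e-": Fraction(-1),
    "anti-nu_e": Fraction(0),
}


def charge_conserved(incoming: list, outgoing: list) -> bool:
    """Check that the summed charge is the same before and after."""
    total_in = sum((CHARGE[p] for p in incoming), Fraction(0))
    total_out = sum((CHARGE[p] for p in outgoing), Fraction(0))
    return total_in == total_out


# Quark vertex: a down quark emits a W- and becomes an up quark.
quark_vertex = charge_conserved(["d"], ["u", "W-"])
# Lepton vertex: the W- decays into an electron and an anti-neutrino.
lepton_vertex = charge_conserved(["W-"], ["e-", "anti-nu_e"])
# Overall process at the quark level.
overall = charge_conserved(["d"], ["u", "e-", "anti-nu_e"])

print("quark vertex conserves charge: ", quark_vertex)
print("lepton vertex conserves charge:", lepton_vertex)
print("overall decay conserves charge:", overall)
```

The same bookkeeping also shows why the neutral particle is needed on the lepton side: the electron alone already carries the full charge of the \(W^-\), so whatever accompanies it must have charge zero.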
Fig. 6.11 The lowest-order Feynman diagram for radioactive \(\beta\)-decay, in which a nucleus emits an electron. As illustrated, one of the neutrons in the nucleus gets converted to a proton, through a conversion of one of its down quarks into an up quark, accompanied by the creation of an electron and an electron anti-neutrino. The interaction between the quarks and the leptons is mediated by a \(W^-\) boson.#
6.3.3. Electroweak theory and the Higgs mechanism#
Just as the theories of electricity and magnetism could be unified into electromagnetism, the quantum field theories of electromagnetism and the weak force can be unified into electroweak theory. While the effects of electromagnetic and weak interactions are very different at our everyday, comparatively low temperatures, if the temperature gets high enough (estimated at around \(10^{15}\;\mathrm{K}\)), the effects become indistinguishable from each other, and the two forces combine into the electroweak force.
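To get a feeling for this temperature scale, we can convert it into a typical thermal energy \(k_\mathrm{B} T\) and compare it with the rest energies of the weak-force bosons. The sketch below does this arithmetic; the Boltzmann constant and the approximate \(W\) and \(Z\) rest energies (about 80 and 91 GeV) are standard reference values that are not quoted in this text, and the names are illustrative.

```python
# Boltzmann constant in eV per kelvin (rounded reference value).
K_B_EV_PER_K = 8.617e-5

# Approximate rest energies of the weak-force bosons in GeV (reference values,
# not taken from this text).
W_REST_ENERGY_GEV = 80.4
Z_REST_ENERGY_GEV = 91.2


def thermal_energy_gev(temperature_k: float) -> float:
    """Typical thermal energy k_B * T, converted from eV to GeV."""
    return K_B_EV_PER_K * temperature_k / 1e9


T_UNIFICATION = 1e15  # kelvin, the estimate quoted in the text
energy = thermal_energy_gev(T_UNIFICATION)

print(f"k_B T at {T_UNIFICATION:.0e} K is about {energy:.0f} GeV")
print(f"W rest energy: {W_REST_ENERGY_GEV} GeV, Z rest energy: {Z_REST_ENERGY_GEV} GeV")
```

The thermal energy at \(10^{15}\;\mathrm{K}\) comes out between the two rest energies, which is consistent with the order-of-magnitude estimate: at such temperatures, the masses of the weak bosons no longer set them apart from the photon.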
Similar to the strong force, in the electroweak field theory there are interactions between the bosons, leading to two new vertices, one with three and one with four bosons interacting (Fig. 6.12).
Fig. 6.12 Interactions between bosons in the electroweak field theory.#
At high temperatures, the electroweak theory has a high degree of symmetry, making the bosons indistinguishable from each other. The only time this happened in the evolution of the universe was shortly after the big bang. As the universe expanded and cooled down, the symmetry was broken, and the bosons as we observe them now emerged. The present-day bosons were not the only possibility; in principle, there was an infinite range of options, from which one was selected, through a process known as spontaneous symmetry breaking. Such a symmetry breaking happens in many branches of physics, for example in phase transitions, when a system in a phase with a high degree of symmetry transitions into one with lower symmetry. To illustrate how the process works, imagine a square table set for four people (see Fig. 6.13(a)) in a symmetric fashion: each person has their own plate set directly in front of them, but the four glasses of wine are placed exactly between two neighboring people. Until someone picks up a glass, the table has two symmetries: rotation over \(90^\circ\) and reflection over a diagonal. However, once one person has picked a glass (say with their right hand), everyone else has to follow suit, as otherwise not everyone gets a glass. The table then still has the rotational symmetry, but mirroring it will no longer be a symmetric operation (flipping between the choice of the right-hand and the left-hand glass). In physical systems undergoing a phase transition, something similar happens (Fig. 6.13(b)). As the control parameter (usually the temperature) changes, the energy describing the system changes shape, from a high-symmetry version with a single minimum to a system with multiple minima that is no longer fully symmetric about any of those minima. The system can evolve into either of the new minima, but once a minimum is chosen, the choice is fixed.
While in Fig. 6.13(b) the system had two minima to choose from, if the parameter \(\phi\) is a complex variable, the number of possible minima may be infinite (see Fig. 6.13(c), showing a sombrero-shaped potential due to Jeffrey Goldstone). A similar potential governs the electroweak theory below the critical temperature.
Fig. 6.13 Spontaneous symmetry breaking. (a) The table arrangement is fully symmetric. If four people sit at the table, each of them has a plate in front of them, and a glass of wine equally close to their left and right hand. Once one of the diners picks up a glass, the symmetry is broken; if the first diner to pick a glass takes the right-hand one, then all others have to do the same, to ensure everyone has a glass. (b) Symmetry breaking due to a change in the potential. Plotted lines are the potential \(V(\phi) = a \phi^2 + \phi^4\), for three different values of \(a\) (\(1\), \(-1\) and \(-2\), respectively). For \(a=1\), the potential is symmetric about its minimum at \(\phi = 0\). If we decrease \(a\), the minimum becomes an unstable local maximum, while new minima emerge at \(\phi \neq 0\). Any perturbation will cause the symmetry to break, and the system to choose one of the (otherwise equivalent) new minima. (c) Goldstone’s sombrero potential (\(V(\phi) = -5 |\phi|^2 + |\phi|^4\) for a complex scalar \(\phi\)) with infinitely many equivalent but nonzero minima at \(\phi = \sqrt{5/2} e^{i\theta}\).#
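The positions of the minima in panels (b) and (c) follow from setting the derivative of the potential to zero: \(\mathrm{d}V/\mathrm{d}\phi = 2 a \phi + 4 \phi^3 = 0\) gives \(\phi = 0\) or \(\phi^2 = -a/2\), and the second derivative \(2a + 12\phi^2\) equals \(-4a\) at the nonzero solutions, so these are minima whenever \(a < 0\). The sketch below evaluates this for the three values of \(a\) in the caption and for the radius of the sombrero potential; the function names are illustrative.

```python
from math import sqrt, isclose


def potential(phi: float, a: float) -> float:
    """The potential V(phi) = a phi^2 + phi^4 from the caption of Fig. 6.13(b)."""
    return a * phi**2 + phi**4


def minima(a: float) -> list:
    """Locations of the minima, from dV/dphi = 2 a phi + 4 phi^3 = 0."""
    if a >= 0:
        return [0.0]          # single symmetric minimum
    phi0 = sqrt(-a / 2)       # broken-symmetry minima at +/- phi0
    return [-phi0, phi0]


for a in (1, -1, -2):
    points = minima(a)
    depths = [potential(p, a) for p in points]
    print(f"a = {a:+d}: minima at {points}, V = {depths}")

# The sombrero potential depends only on |phi| and has the same radial form with
# a = -5, so its circle of minima has radius sqrt(5/2).
sombrero_radius = minima(-5)[1]
print("sombrero minimum radius:", sombrero_radius)
```

For \(a = 1\) the only minimum is at \(\phi = 0\), while for negative \(a\) two equivalent minima appear that move outward and become deeper as \(a\) decreases, matching the curves in panel (b).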
Although the original high-symmetry state is no longer a minimum in the broken-symmetry potential in Fig. 6.13(b) and (c), it is a local maximum, and thus the system needs a perturbation to push it to evolve to a new minimum. In phase transitions, the perturbation is due to thermal fluctuations. In the evolution of the universe, the perturbation was caused by the Higgs mechanism, introducing one more interaction to the collection we have built so far. The Higgs mechanism was originally introduced to explain why the \(W^\pm\) and \(Z\) bosons have nonzero mass. The idea is that there is a quantum field (called, unsurprisingly, the Higgs field) that permeates the universe, interacting both with massive fermions and the three weak field bosons. The Higgs field can either be described as a sort of fluid that fills all of space, causing a drag force on fermions and massive bosons (whereas the photon and gluons, which have no mass, would move through the field unaffected), or through a boson particle that interacts with fermions and weak bosons as described by two new vertices (see Fig. 6.14). Interactions with the Higgs field would then result in an effective mass of the weak force bosons; the presence of the Higgs field would also be the cause of the breaking of the electroweak symmetry.
The Higgs boson has spin zero; it also carries neither charge nor color. It does not mediate a force like the other bosons do, but rather helps explain why particles have mass. The Higgs mechanism (and corresponding boson) was developed by three independent teams in 1964; the boson was only detected in 2012, as the last of the bosons describing the fundamental interactions of quantum particles.
Fig. 6.14 Vertices of the field theory of the Higgs mechanism: the Higgs boson can interact with any massive fermion (left), and with the massive bosons of the weak force (right).#
6.3.4. The standard model of particle physics#
We can combine all interactions introduced in this chapter (together describing the electromagnetic, weak and strong force) and the fermionic material particles they mediate forces between, into a coherent overview, known as the standard model of particle physics, see Fig. 6.15 and Fig. 6.16. The standard model contains two types of material particles: quarks and leptons. Of each, there are three ‘generations’; in addition to the up and down quark, and the electron and associated electron-neutrino, with their antiparticles, there are two more groups of heavier cousins that otherwise behave the same. For the quarks, we have the strange and charm quark[14] in the second generation, and the top and bottom (or, for the more poetically inclined, truth and beauty) quarks in the third. For the leptons, the heavier cousins of the electron are the muon and the tauon. They each also have a neutrino, although neutrinos follow their own logic: any created neutrino is actually a superposition of the three possible neutrinos, making these elusive particles even harder to study than they already are.
Nobody knows why there are three generations of massive particles. We are quite certain however that there are only three; people have calculated what would happen in high-energy collision experiments if there were four (in particular, if there were four neutrinos), and the resulting predictions fit the observations less well than those based on three families [collaboration", 1989].
The standard model is one of the major successes of both experimental and theoretical physics, and the path that led to it is a perfect example of how theory and experiment can stimulate and support each other, with predictions leading to observations, and observations leading to new theory. We do know however that the standard model is not fully correct; for example, in the standard model neutrinos have no mass, and gravity is absent from the standard model altogether.
Fig. 6.15 Overview of the particles (fermionic matter and bosonic interactions) that together form the standard model of particle physics [15]. Thin lines indicate which fermions each of the bosons interacts with.#
Fig. 6.16 All possible interactions in the standard model, between fermions (straight solid lines with arrows) and bosons (curved and dotted lines).#
Following the success of creating a unified theory of electromagnetism and the weak force, physicists have been searching for a theory combining all four known fundamental forces (the strong, weak, electromagnetic, and gravitational forces). It is generally assumed that at very high temperatures and small length scales all four forces should combine into a single one, but to date no theory describing this unified force has been found. Gravity, while excellently described at the scale of large masses and distances by Einstein’s theory of general relativity, has been especially elusive. A field theory of gravity would predict the existence of at least one graviton, a boson with spin 2 (set by the properties of gravitational waves), which is sometimes added tentatively to the standard model, but nobody knows how a graviton would interact with fermionic matter. The search continues.
6.4. Problems#
Exercise 6.1 (A basis for the plane wave solutions to the Dirac equation)
In this problem, we’ll construct a basis for the plane wave solutions to the Dirac equation (6.28).
(a) Set \(u_A = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\), then find \(u_B\) from equation (6.31). Note that you have to pick the \(+\) sign solution in equation (6.33) to prevent \(u_B\) from diverging as \(p \to 0\).
(b) Set \(u_A = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\), then find \(u_B\) from equation (6.31).
(c) Verify that the expressions you found for \(u_B\) in (a) and (b) are orthogonal. Remember that for vectors with complex components, the dot product includes a complex conjugate, just like the function inner product: \(\braket{\bm{v}|\bm{w}} = \bm{v}^\dagger \cdot \bm{w}\).
(d) Verify that setting \(u_B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\) and \(u_B = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\), you get the same expressions for \(u_A\) as you found for \(u_B\) in (a) and (b), if you choose the \(-\) sign solution in equation (6.33) - again as you have to, to prevent \(u_A\) from diverging for \(p \to 0\).
Exercise 6.2 (Spin eigenstates in the solutions of the Dirac equation)
The spin matrices of the bispinor solutions of the Dirac equation can be written as
where the \(\bm{\sigma}\) are the usual Pauli spin matrices.
(a) For a stationary particle, we can easily write down a basis for the bispinors, as the two components are independent:
(6.59)#\[\begin{split}\begin{align*} \psi^{(1)} &= \exp\left(-\frac{i mc^2}{\hbar}t \right) \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \psi^{(2)} = \exp\left(-\frac{i mc^2}{\hbar}t \right) \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \\ \psi^{(3)} &= \exp\left(\frac{i mc^2}{\hbar}t \right) \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \psi^{(4)} = \exp\left(\frac{i mc^2}{\hbar}t \right) \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \end{align*}\end{split}\]
Unsurprisingly, \(\psi^{(1)}\) and \(\psi^{(2)}\) correspond to the ‘spin up’ and ‘spin down’ eigenstates of the \(\hat{S}_z\) operator of the electron, respectively. Argue why for the positron it is the other way around: \(\psi^{(3)}\) corresponds to the ‘spin down’ eigenstate, and \(\psi^{(4)}\) to the ‘spin up’ eigenstate. Hint: think of Dirac’s interpretation of the negative-energy solutions.
(b) For the plane-wave solutions, the basis states we constructed in Exercise 6.1 are in general not eigenstates of \(\hat{S}_z\). Verify this statement explicitly for \(u^{(1)}\), i.e., the state you constructed in Exercise 6.1(a).
(c) From your answer at (b), conclude that if the particle is moving in the \(z\) direction (i.e., we re-orient the \(z\)-axis to point in the direction of motion), \(u^{(1)}\) is an eigenstate of \(\hat{S}_z\). Find the eigenvalue (the answer should not surprise you).
Exercise 6.3 (Relativistic Maxwell equations)
Show that equations (6.41) and (6.42) indeed are equivalent to the four classical Maxwell equations (6.37).
Exercise 6.4 (Continuity equation in relativistic electrodynamics)
(a) By taking the derivative of \(J^\nu\) with respect to \(x^\nu\) and summing (i.e., calculating the four-divergence \(\partial_\nu J^\nu\)), and using that the field strength tensor \(\bm{\bar{F}}\) is anti-symmetric (i.e., \(F^{\mu \nu} = -F^{\nu \mu}\)), derive equation (6.43) from equation (6.42).
(b) Rewrite equation (6.43) to a relation between the divergence of the three-dimensional current density \(\bm{J}\) and the time derivative of the electric charge density \(\rho\), and use this relation to show that charge is conserved.
Exercise 6.5 (Relativistic electromagnetic potential)
In classical electromagnetism, the magnetic field can be written as the curl of a vector potential:
(6.60)#\[\bm{B} = \bm{\nabla} \times \bm{A}.\]
(a) Show that such a field \(\bm{B}\) indeed satisfies equation (6.37)c for an arbitrary vector field \(\bm{A}\).
(b) Show that adding the gradient of an arbitrary scalar function \(f\) to \(\bm{A}\) does not change the magnetic field.
(c) Use the vector potential to re-write Faraday’s law, equation (6.37)b, as
(6.61)#\[\bm{\nabla} \times \left( \bm{E} + \frac{\partial \bm{A}}{\partial t} \right) = 0.\]
(d) Use equation (6.61) to show that, in addition to the vector potential \(\bm{A}\), we can introduce a scalar potential \(V\), such that
(6.62)#\[\bm{E} = - \bm{\nabla} V - \frac{\partial \bm{A}}{\partial t}.\]
(e) Combining the results of (a)-(d), show that if we introduce the potentials \(V\) and \(\bm{A}\), the homogeneous Maxwell equations (6.37)c and (6.37)b are always satisfied.
(f) Finally, combining the scalar and vector potentials into a four-vector potential as in equation (6.44), show that we can write the field strength tensor as in equation (6.45).
Exercise 6.6 (Second order Feynman diagrams for Bhabha scattering)
Determine how many possible second-order (four-vertex, or one-loop) diagrams there are for the Bhabha scattering process.