# 10. Random variables

Recommended reference: Wasserman [Was04], Sections 1.1–1.3, 2.1–2.2 and 3.1–3.3.

## 10.1. Introduction

A **probability distribution** assigns *probabilities* (real numbers
in \([0,1]\)) to elements or subsets of a *sample space*. The elements
of a sample space are called *outcomes*, subsets are called *events*.

The space of outcomes is usually of one of two kinds:

- some finite or countable set (modelling the number of particles hitting a detector, for example), or
- the real line, a higher-dimensional space or some subset of one of these (modelling the position of a particle, for example).

These correspond to the two types of probability distributions that
are usually distinguished: *discrete* and *continuous* probability
distributions.

## 10.2. Random variables

A **random variable** is any real-valued function on the space of
outcomes of a probability distribution. Random variables can often be
interpreted as observable (scalar) quantities such as length,
position, energy, the number of occurrences of some event or the spin
of an elementary particle.

Any random variable has a *distribution function*, which is how we
usually describe random variables. We will look at distribution
functions separately for discrete and continuous random variables.

## 10.3. Discrete random variables

A discrete random variable \(X\) can take finitely many or countably
many distinct values \(x_0,x_1,x_2,\ldots\) in \(\RR\). It is
characterised by its *probability mass function* (PMF).

(Probability mass function)

The *probability mass function* of the discrete random variable \(X\)
is defined by

\[
f_X(x) = P(X = x).
\]

One can visualise a probability mass function by placing a vertical bar of height \(f_X(x)\) at each value \(x\), as in Fig. 10.1.

```
from matplotlib import pyplot
from myst_nb import glue
fig, ax = pyplot.subplots()
# One vertical bar of height f(x) at each possible value x.
for x, px in ((-0.2, 0.3), (0.4, 0.2), (0.7, 0.5)):
    ax.add_line(pyplot.Line2D((x, x), (0, px), linewidth=2))
ax.set_xbound(-0.3, 0.8)
ax.set_ybound(0, 0.6)
ax.set_xlabel('$x$')
ax.set_ylabel('$f(x)$')
glue("pmf", fig)
```

(Properties of a probability mass function)

Since the \(f_X(x)\) are probabilities, they satisfy

\[
0 \le f_X(x) \le 1 \quad\text{for all } x.
\]

Since the \(x_i\) are all the possible values and their total probability equals 1, we also have

\[
\sum_i f_X(x_i) = 1.
\]
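As a quick sanity check, both properties can be verified numerically for the three-point distribution used in the plotting code above. This is only a sketch: the dictionary `pmf` and the helper `is_pmf` are illustrative names, not part of any library.

```python
import math

# Values x and probabilities f(x), taken from the plotting code above.
pmf = {-0.2: 0.3, 0.4: 0.2, 0.7: 0.5}

def is_pmf(f, tol=1e-12):
    """Check that all values lie in [0, 1] and that they sum to 1."""
    in_range = all(0 <= p <= 1 for p in f.values())
    sums_to_one = math.isclose(sum(f.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_pmf(pmf))
```

A table whose entries do not sum to 1, such as `{0: 0.5, 1: 0.6}`, fails the same check.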

## 10.4. Continuous random variables

For a continuous random variable \(X\), the set of possible values is
usually a (finite or infinite) interval, and the probability of any
single value occurring is usually zero. We therefore consider the
probability of the value lying in some interval. This can be
described by a *probability density function* (PDF).

(Probability density function)

The *probability density function* of the continuous random variable
\(X\) is a function \(f_X\colon\RR\to\RR\) such that

\[
P(a \le X \le b) = \int_a^b f_X(x)\,dx \quad\text{for all } a \le b.
\]

To visualise a continuous random variable, one often plots the probability density function, as in Fig. 10.2.

(Properties of a probability density function)

- \(f_X(x)\ge0\) for all \(x\);
- \(\int_{-\infty}^\infty f_X(x)\,dx=1\).

```
from matplotlib import pyplot
from myst_nb import glue
import numpy as np
x = np.linspace(0, 16, 101)
fx = 1/120 * x**5 * np.exp(-x)
fig, ax = pyplot.subplots()
ax.plot(x, fx)
ax.set_xbound(0, 16)
ax.set_ybound(0, 0.2)
ax.set_xlabel('$x$')
ax.set_ylabel('$f(x)$')
glue("pdf", fig)
```
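The density plotted above, \(f(x)=\frac{1}{120}x^5\exp(-x)\) for \(x\ge0\), can be checked against the normalisation property numerically. The following is a minimal sketch using only the standard library; the helper `integrate` (a composite midpoint rule) and the cut-off at \(x=60\) are assumptions made for this illustration.

```python
import math

def f(x):
    """Density used for the plot above: x^5 exp(-x) / 120 for x >= 0."""
    return x**5 * math.exp(-x) / 120 if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# The tail beyond x = 60 is negligible, so [0, 60] stands in for the real line.
total = integrate(f, 0.0, 60.0)
print(total)
```

The printed value should agree with 1 up to a small numerical error, consistent with \(\int_0^\infty x^5\exp(-x)\,dx = 5! = 120\).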

## 10.5. Expectation and variance

The *expectation* (or *expected value*, or *mean*) of a random
variable is, informally, the long-run average of many independent
samples of that variable. The *variance* and the closely related
*standard deviation* measure by how much samples tend to deviate from
this average.

(Expectation)

The *expectation* or *mean* of a discrete random variable \(X\) with
probability mass function \(f_X\) is

\[
E(X) = \sum_i x_i f_X(x_i).
\]

The *expectation* or *mean* of a continuous random variable \(X\)
with probability density function \(f_X\) is

\[
E(X) = \int_{-\infty}^\infty x f_X(x)\,dx.
\]

The expectation of \(X\) is often denoted by \(\mu\) or \(\mu(X)\).

(Variance and standard deviation)

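The *variance* of a (discrete or continuous) random variable \(X\)
with mean \(\mu\) is

\[
\operatorname{Var}(X) = E\bigl((X-\mu)^2\bigr).
\]

The *standard deviation* of \(X\) is

\[
\sigma(X) = \sqrt{\operatorname{Var}(X)}.
\]

It is not hard to show (see Exercise 10.3) that

\[
\operatorname{Var}(X) = E(X^2) - E(X)^2. \tag{10.1}
\]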
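To make these definitions concrete, the sketch below computes the mean, variance and standard deviation of the three-point discrete distribution from Section 10.3, and compares the variance computed from the definition with the value \(E(X^2)-E(X)^2\). The names `pmf` and `expectation` are illustrative choices.

```python
import math

# Three-point distribution from the plotting code in Section 10.3.
pmf = {-0.2: 0.3, 0.4: 0.2, 0.7: 0.5}

def expectation(f, g=lambda x: x):
    """Expectation of g(X) for a discrete X with probability mass function f."""
    return sum(g(x) * p for x, p in f.items())

mu = expectation(pmf)
var_def = expectation(pmf, lambda x: (x - mu) ** 2)   # definition of the variance
var_alt = expectation(pmf, lambda x: x ** 2) - mu**2  # E(X^2) - E(X)^2
sigma = math.sqrt(var_def)

print(mu, var_def, var_alt, sigma)
```

Up to rounding, this gives a mean of 0.37, a variance of 0.1521 (by either formula) and a standard deviation of 0.39.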

## 10.6. Cumulative distribution functions

Besides the probability mass functions and probability density
functions introduced above, it is often useful to look at *cumulative*
distribution functions. Among other things, these have the advantage
that the definition is the same for discrete and continuous random
variables.

(Cumulative distribution function)

The *cumulative distribution function* of a (discrete or continuous)
random variable \(X\) is the function \(F_X\colon\RR\to\RR\) defined by

\[
F_X(x) = P(X \le x).
\]

If \(X\) is a discrete random variable, this comes down to

\[
F_X(x) = \sum_{x_i \le x} f_X(x_i).
\]

This is illustrated in Fig. 10.3.

```
from matplotlib import pyplot
from myst_nb import glue
fig, ax = pyplot.subplots()
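# One horizontal segment per step of the CDF; the lowest and highest
# segments are drawn slightly inside the plot so that they stay visible.
for x0, x1, h in ((-0.3, -0.2, 0.005), (-0.2, 0.4, 0.3),
                  (0.4, 0.7, 0.5), (0.7, 0.8, 0.995)):
    ax.add_line(pyplot.Line2D((x0, x1), (h, h), linewidth=2))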
ax.set_xbound(-0.3, 0.8)
ax.set_ybound(0, 1)
ax.set_xlabel('$x$')
ax.set_ylabel('$F(x)$')
glue("cdf-discrete", fig)
```
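The step function in the figure can also be evaluated directly: for a discrete random variable, \(F_X(x)\) is the total probability of all values not exceeding \(x\). A minimal sketch for the three-point distribution from Section 10.3 follows; the function name `cdf` is an illustrative choice.

```python
# Three-point distribution from the plotting code in Section 10.3.
pmf = {-0.2: 0.3, 0.4: 0.2, 0.7: 0.5}

def cdf(x, f=pmf):
    """Sum the probabilities of all possible values x_i with x_i <= x."""
    return sum(p for xi, p in f.items() if xi <= x)

for x in (-0.3, -0.2, 0.4, 0.5, 0.7):
    print(x, cdf(x))
```

The function is constant between the possible values and jumps by \(f_X(x_i)\) at each \(x_i\), which matches the heights 0, 0.3, 0.5 and 1 of the segments drawn above.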

Now suppose \(X\) is a continuous random variable. Taking the limit \(a\to-\infty\) in the definition of the probability density function, we obtain

\[
F_X(x) = \int_{-\infty}^x f_X(t)\,dt.
\]

This is illustrated in Fig. 10.4.

```
from matplotlib import pyplot
from myst_nb import glue
import numpy as np
x = np.linspace(0, 16, 101)
Fx = 1 - 1/120 * (x**5 + 5*x**4 + 20*x**3 + 60*x**2 + 120*x + 120) * np.exp(-x)
fig, ax = pyplot.subplots()
ax.plot(x, Fx)
ax.set_xbound(0, 16)
ax.set_ybound(0, 1)
ax.set_xlabel('$x$')
ax.set_ylabel('$F(x)$')
glue("cdf", fig)
```
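The closed-form expression used in the plotting code above can be compared with a direct numerical integration of the density \(\frac{1}{120}x^5\exp(-x)\) from Section 10.4. The sketch below uses only the standard library; `integrate` is a hypothetical helper implementing the composite midpoint rule.

```python
import math

def f(x):
    """Density from Section 10.4: x^5 exp(-x) / 120 for x >= 0."""
    return x**5 * math.exp(-x) / 120 if x >= 0 else 0.0

def F_closed(x):
    """Closed-form cumulative distribution function used for the plot above."""
    poly = x**5 + 5*x**4 + 20*x**3 + 60*x**2 + 120*x + 120
    return 1 - poly * math.exp(-x) / 120

def integrate(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# The density vanishes for x < 0, so the integral can start at 0.
for x in (2.0, 5.0, 10.0):
    print(x, F_closed(x), integrate(f, 0.0, x))
```

The two columns should agree up to a small numerical error at each point.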

## 10.7. Independence and conditional probability

Recommended reference for this subsection: Wasserman [Was04], Sections 2.5–2.8.

Informally speaking, two random variables \(X\) and \(Y\) are independent
if knowledge of the value of one of the two does not tell us anything
about the value of the other. However, knowledge of one random
variable often *does* give you information about another random
variable. This is encoded in the concept of conditional probability
distributions.

To make the notions of independence and conditional probability precise,
we need two further concepts: the *joint probability mass function*
\(P(X=x\text{ and }Y=y)\) in the case of discrete random variables, and
the *joint probability density function* \(f_{X,Y}(x,y)\) in the case of
continuous random variables. These describe the joint probability
distribution of the two random variables \(X\) and \(Y\). Rather than
defining these here, we refer to Wasserman [Was04], Section 2.5. We
only note the relationship with the corresponding single-variable
functions. If \(X\) and \(Y\) are discrete random variables, we have

\[
P(X=x) = \sum_y P(X=x\text{ and }Y=y), \qquad
P(Y=y) = \sum_x P(X=x\text{ and }Y=y).
\]

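Similarly, if \(X\) and \(Y\) are continuous random variables, the relationship is given by

\[
f_X(x) = \int_{-\infty}^\infty f_{X,Y}(x,y)\,dy, \qquad
f_Y(y) = \int_{-\infty}^\infty f_{X,Y}(x,y)\,dx.
\]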

In this situation, the distributions of \(X\) and \(Y\) individually are
called the *marginal distributions* of the joint distribution of \(X\)
and \(Y\).

(Independence of random variables)

Two discrete random variables \(X\) and \(Y\) are *independent* if for all
possible values \(x\) and \(y\) of \(X\) and \(Y\), respectively, we have

\[
P(X=x\text{ and }Y=y) = P(X=x)\,P(Y=y).
\]

Similarly, two continuous random variables \(X\) and \(Y\) are
*independent* if their probability density functions satisfy

\[
f_{X,Y}(x,y) = f_X(x)\,f_Y(y) \quad\text{for all } x, y.
\]
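For discrete random variables, the marginal distributions and the independence condition can be checked mechanically from a table of joint probabilities. The sketch below does this with exact fractions; the two joint tables `indep` and `dep` are made-up examples, and `marginals` and `independent` are illustrative helper names.

```python
from fractions import Fraction as F
from itertools import product

def marginals(joint):
    """Sum a joint PMF {(x, y): p} over y and over x, respectively."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def independent(joint):
    """Check P(X=x and Y=y) == P(X=x) P(Y=y) for all pairs (x, y)."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y]
               for x, y in product(px, py))

# Hypothetical example 1: a joint PMF built as a product of two marginals.
p_x = {0: F(2, 5), 1: F(3, 5)}
p_y = {0: F(1, 5), 1: F(3, 10), 2: F(1, 2)}
indep = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

# Hypothetical example 2: X and Y always take the same value.
dep = {(0, 0): F(1, 2), (1, 1): F(1, 2)}

print(independent(indep), independent(dep))
```

Exact fractions are used so that the equality test is not affected by floating-point rounding; with floats, one would compare up to a tolerance instead.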

(Conditional probability)

Consider two discrete random variables \(X\) and \(Y\), and a value \(y\)
such that \(P(Y=y)>0\). The *conditional probability* of \(x\) given \(y\)
is defined as

\[
P(X=x\mid Y=y) = \frac{P(X=x\text{ and }Y=y)}{P(Y=y)}.
\]

Analogously, consider two continuous random variables \(X\) and \(Y\), and
a value \(y\) such that \(f_Y(y)>0\). The *conditional probability density* of
\(x\) given \(y\) is defined as

\[
f_{X\mid Y}(x\mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.
\]

In Exercise 10.5, you will show that if two random variables \(X\) and \(Y\) are independent, then the distribution of \(X\) given a certain value of \(Y\) is the same as the distribution of \(X\).
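In the discrete case, the conditional distribution of \(X\) given \(Y=y\) is obtained by selecting the entries of the joint table with that value of \(y\) and dividing by \(P(Y=y)\). The following sketch illustrates this for a made-up joint table `joint`; the function name `conditional_x_given_y` is hypothetical.

```python
from fractions import Fraction as F

def conditional_x_given_y(joint, y):
    """Return {x: P(X=x | Y=y)} for a joint PMF {(x, y): p}."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("P(Y = y) must be positive")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

# Hypothetical joint PMF of two random variables with values in {0, 1}.
joint = {(0, 0): F(1, 10), (0, 1): F(3, 10),
         (1, 0): F(2, 5),  (1, 1): F(1, 5)}

cond = conditional_x_given_y(joint, 1)
print(cond)
```

Here \(P(Y=1)=1/2\), so the conditional probabilities are \(3/5\) and \(2/5\), whereas the marginal probabilities of \(X\) are \(2/5\) and \(3/5\). The two distributions differ, so in this example \(X\) and \(Y\) are not independent.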

## 10.8. Covariance and correlation

Recommended reference for this subsection: Wasserman [Was04], Section 3.3.

(Covariance of two random variables)

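Let \(X\) and \(Y\) be random variables with means \(\mu_X\) and \(\mu_Y\),
respectively. The *covariance* of \(X\) and \(Y\) is

\[
\operatorname{Cov}(X,Y) = E\bigl((X-\mu_X)(Y-\mu_Y)\bigr).
\]

The *correlation* of \(X\) and \(Y\) is

\[
\rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y},
\]

where \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of \(X\) and \(Y\).

The covariance can be expressed in a way reminiscent of (10.1) as follows (see Exercise 10.6):

\[
\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y). \tag{10.2}
\]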

Using the Cauchy–Schwarz inequality (see Exercise 8.4), one can show that the correlation satisfies \(-1\le\rho(X,Y)\le 1\).
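As an illustration, the covariance and correlation of two discrete random variables can be computed directly from a joint table. The sketch below uses a made-up joint probability mass function `joint` and a hypothetical helper `expect`, computes the covariance both from the definition and as \(E(XY)-E(X)E(Y)\), and evaluates the correlation.

```python
import math
from fractions import Fraction as F

# Hypothetical joint PMF of two random variables with values in {0, 1}.
joint = {(0, 0): F(1, 10), (0, 1): F(3, 10),
         (1, 0): F(2, 5),  (1, 1): F(1, 5)}

def expect(g):
    """Expectation of g(X, Y) under the joint PMF above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_def = expect(lambda x, y: (x - mu_x) * (y - mu_y))  # definition
cov_alt = expect(lambda x, y: x * y) - mu_x * mu_y      # E(XY) - E(X)E(Y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
rho = float(cov_def) / math.sqrt(var_x * var_y)

print(cov_def, cov_alt, rho)
```

Both expressions give a covariance of \(-1/10\) for this table, and the correlation is approximately \(-0.41\), which lies between \(-1\) and \(1\) as expected.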

## 10.9. Moments and the characteristic function

Note

The topics in this section are not necessarily treated in a BSc-level probability course. We include them both because they have applications in physics and data analysis, and because they connect nicely to various other topics in this module.

(Moments of a random variable)

Let \(X\) be a random variable. For \(j=0,1,\ldots\), the \(j\)-th *moment*
of \(X\) is the expectation of \(X^j\).

Concretely, for a discrete random variable \(X\), this means that the \(j\)-th moment is given by

\[
E(X^j) = \sum_i x_i^j f_X(x_i).
\]

For a continuous random variable, the \(j\)-th moment can be expressed as

\[
E(X^j) = \int_{-\infty}^\infty x^j f_X(x)\,dx.
\]

Note that we have already encountered the first and second moments in the definition of the expectation (Definition 10.3) and in the formula (10.1) for the variance.

It is sometimes convenient to collect all the moments of \(X\) in a
power series. For this, we introduce the *moment generating function*
of \(X\).

(Moment generating function of a random variable)

Let \(X\) be a random variable. The *moment generating function* of \(X\)
is the following function of a real variable \(t\):

\[
M_X(t) = E(\exp(tX)).
\]

This definition may look a bit mysterious at first sight. Assuming that we can treat the random variable \(X\) in the same way as an ordinary real number, we can compute the Taylor series of \(M_X(t)\) via the usual formula (see The exponential function), which will reveal how \(M_X(t)\) encodes the moments of \(X\):

\[
M_X(t) = E\Bigl(\sum_{j=0}^\infty \frac{(tX)^j}{j!}\Bigr)
= \sum_{j=0}^\infty \frac{E(X^j)}{j!}\,t^j.
\]
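For a discrete random variable with finitely many values, the relation between the moment generating function and the moments can be checked numerically. The sketch below evaluates \(E(\exp(tX))\) directly for the three-point distribution from Section 10.3 and compares it with the power series in the moments, truncated after 20 terms; the function names and the truncation order are choices made for this illustration.

```python
import math

# Three-point distribution from the plotting code in Section 10.3.
pmf = {-0.2: 0.3, 0.4: 0.2, 0.7: 0.5}

def moment(j):
    """j-th moment E(X^j) of the discrete distribution above."""
    return sum(x**j * p for x, p in pmf.items())

def mgf(t):
    """Moment generating function E(exp(tX)), computed directly."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def mgf_series(t, terms=20):
    """Truncated power series: sum over j of E(X^j) t^j / j!."""
    return sum(moment(j) * t**j / math.factorial(j) for j in range(terms))

for t in (0.0, 0.5, 1.0):
    print(t, mgf(t), mgf_series(t))
```

Because all values of \(X\) are small here, the series converges quickly and the two columns agree to many digits.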

In some situations, it is better to use a variant called the
*characteristic function*. (One reason is that the characteristic
function is defined for every probability distribution; the moment
generating function does not exist for ‘badly behaved’ distributions
like the Cauchy distribution that we will see in
Definition 11.1.)

(Characteristic function of a random variable)

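Let \(X\) be a random variable. The *characteristic function* of \(X\) is
the following (complex-valued) function of a real variable \(t\):

\[
\phi_X(t) = E(\exp(itX)).
\]

A similar computation as above gives

\[
\phi_X(t) = \sum_{j=0}^\infty \frac{E(X^j)}{j!}\,(it)^j.
\]

On the other hand, using the probability density function \(f_X\), we can also express \(\phi_X(t)\) in a different way, namely as

\[
\phi_X(t) = \int_{-\infty}^\infty \exp(itx)\,f_X(x)\,dx.
\]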

This means that \(\phi_X(t)\) is essentially the Fourier transform of \(f_X\) (see Definition 3.1); more precisely, the relation is given by

The characteristic function can be used to give a proof of the
*central limit theorem*, which will be introduced in
Section 12.3.
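As a small illustration of the characteristic function, the sketch below evaluates \(E(\exp(itX))\) for the three-point discrete distribution from Section 10.3 (where the expectation is a finite sum rather than an integral) and compares it with the power series in the moments. The names `char_fn` and `char_series` and the truncation after 25 terms are assumptions made for this example.

```python
import cmath
import math

# Three-point distribution from the plotting code in Section 10.3.
pmf = {-0.2: 0.3, 0.4: 0.2, 0.7: 0.5}

def moment(j):
    """j-th moment E(X^j) of the discrete distribution above."""
    return sum(x**j * p for x, p in pmf.items())

def char_fn(t):
    """Characteristic function E(exp(itX)), computed directly."""
    return sum(cmath.exp(1j * t * x) * p for x, p in pmf.items())

def char_series(t, terms=25):
    """Truncated power series: sum over j of E(X^j) (it)^j / j!."""
    return sum(moment(j) * (1j * t) ** j / math.factorial(j)
               for j in range(terms))

for t in (0.0, 1.0, 2.0):
    print(t, char_fn(t), abs(char_fn(t)), char_series(t))
```

Since \(|\exp(itx)|=1\) for real \(t\) and \(x\), the absolute value of the characteristic function never exceeds 1, which is one way to see why it is defined for every probability distribution.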

## 10.10. Exercises

Show that the function

is a probability mass function.

Which of the following functions are probability density functions?

- \(f(x)=\begin{cases} 0& \text{if }x<0\\ x\exp(-x)& \text{if }x\ge 0\end{cases}\)
- \(f(x)=\begin{cases} 1/4& \text{if }-2\le x\le 2\\ 0& \text{otherwise}\end{cases}\)
- \(f(x)=\begin{cases} \frac{3}{4}(x^2-1)& \text{if }-2\le x\le 2\\ 0& \text{otherwise} \end{cases}\)

Deduce (10.1) from the definition of the variance.

Consider a continuous random variable \(X\) with probability density function

Compute the expectation and the variance of \(X\).

Consider two discrete random variables \(X\) and \(Y\). Show that if \(X\) and \(Y\) are independent, we have

\[
P(X=x\mid Y=y) = P(X=x)
\]

for all \(x\) and all \(y\) with \(P(Y=y)>0\).

(Intuitively, this means that observing \(Y\) tells us nothing about the probability of observing a certain value of \(X\).)

Let \(X\) and \(Y\) be random variables. Prove the formula (10.2).

Show that the moment generating function of the random variable from Exercise 10.4 is given by

Let \(X\) be a random variable with moment generating function \(M_X(t)\). Show that for all \(n\ge0\), the \(n\)-th moment of \(X\) can be computed as the \(n\)-th derivative of \(M_X(t)\) at \(t=0\), i.e.

\[
E(X^n) = M_X^{(n)}(0).
\]