6.3. Diagonalizability#

6.3.1. Similar matrices#

Definition 6.3.1

Two \(n \times n\) matrices \(A\) and \(B\) are called similar if they are related via the property

\[ B = PAP^{-1} \quad \text{for some invertible matrix } P. \]

Notation: \(A \sim B\).

Admittedly, we already used the symbol \(\sim\) earlier to denote row equivalence of (augmented) matrices. Whenever we use it, the intended meaning will be clear from the context.

Remark 6.3.1

In the definition it seems as if \(A\) and \(B\) play different roles, but that is not the case. This can be seen as follows:

\[ A \sim B \quad \iff \quad B = PAP^{-1} \quad \iff \quad P^{-1}BP = P^{-1}(PAP^{-1})P = A. \]

Since \((P^{-1})^{-1} = P\), we see that

\[ B = PAP^{-1} \quad \iff \quad A = QBQ^{-1}, \quad \text{where } Q = P^{-1}, \]

so similarity works both ways, that is,

\[ A \sim B \quad \iff \quad B \sim A. \]
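
For readers who like to experiment, this symmetry is easy to check numerically. The following minimal sketch (in Python with NumPy; the matrices \(A\) and \(P\) are arbitrary illustrative choices) builds \(B = PAP^{-1}\) and then recovers \(A\) as \(QBQ^{-1}\) with \(Q = P^{-1}\).

```python
import numpy as np

# An arbitrary matrix A and an arbitrary invertible matrix P (illustrative choices).
A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])            # det(P) = 1, so P is invertible

B = P @ A @ np.linalg.inv(P)          # B = P A P^{-1}, so A ~ B
Q = np.linalg.inv(P)                  # take Q = P^{-1}
A_again = Q @ B @ np.linalg.inv(Q)    # Q B Q^{-1} should give back A

print(np.allclose(A, A_again))        # True: indeed B ~ A as well
```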

Similar matrices share many properties, especially as regards eigenvalues and eigenvectors.

Proposition 6.3.1

If \(A = PBP^{-1}\), then \(A\) and \(B\) have the same eigenvalues.

Moreover, if \(\vect{v}\) is an eigenvector of \(B\), then \(P\vect{v}\) is an eigenvector of \(A\).

Proof of Proposition 6.3.1

Suppose \(\lambda\) is an eigenvalue of \(B\), and \(\vect{v}\) is a corresponding eigenvector. We then see that

\[ B\vect{v}= \lambda\vect{v} \quad \Longrightarrow \quad AP\vect{v} = (PBP^{-1})P\vect{v} = PB\vect{v} = P(\lambda\vect{v}) = \lambda P\vect{v} \]

So \(AP\vect{v} = \lambda P\vect{v}\), and \(P\vect{v}\) is an eigenvector of \(A\), provided it is not the zero vector. Since \(P\) is assumed to be invertible and \(\vect{v}\) is not the zero vector, \(P\vect{v}\) is indeed not the zero vector, and we are done.
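
This argument can be mirrored numerically. Here is a small sketch (Python/NumPy; the matrices \(B\) and \(P\) are illustrative choices) that checks that \(P\vect{v}\) is an eigenvector of \(A = PBP^{-1}\) for the same eigenvalue.

```python
import numpy as np

B = np.array([[3.0, 0.0],
              [0.0, -1.0]])            # B has eigenvalue 3 with eigenvector e1
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])            # an invertible matrix
A = P @ B @ np.linalg.inv(P)           # A = P B P^{-1}

lam = 3.0
v = np.array([1.0, 0.0])               # eigenpair of B
Pv = P @ v
print(np.allclose(A @ Pv, lam * Pv))   # True: P v is an eigenvector of A for eigenvalue 3
```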

Proposition 6.3.2

Similar matrices have the same characteristic polynomial.

Proof of Proposition 6.3.2

Suppose \(A = PBP^{-1}\).

Then we have

\[ \det{(A - \lambda I)} = \det{(PBP^{-1} - \lambda I)} = \det{(B - \lambda I)}. \]

The second equality is proved by the following chain of identities.

\[\begin{split} \begin{array}{rcl} \det{(PBP^{-1} - \lambda I)} &=& \det{(PBP^{-1} - \lambda PIP^{-1})} \\ & = & \det{(P(B- \lambda I)P^{-1})} \\ & = & \det{P}\cdot\det{(B- \lambda I)}\cdot\det{(P^{-1})} \\ & = & \det{P}\cdot\det{(B- \lambda I)}\cdot\dfrac{1}{\det{P}} \\ & = & \det{(B- \lambda I)}. \end{array} \end{split}\]

In fact, the first step contains the ‘smart move’: bringing in convenient factors \(P\) and \(P^{-1}\) via

\[ I = PIP^{-1}. \]

In the other steps we used the rule \(\det{(AB)} = \det{A}\det{B}\) and its consequence that for invertible matrices \(P\) we have

\[ \det{(P^{-1})} = \dfrac{1}{\det{P}}. \]
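
As a quick numerical illustration (a sketch in Python/NumPy; the matrices are arbitrary illustrative choices): `np.poly` returns the coefficients of the characteristic polynomial of a square matrix, and these coincide for similar matrices.

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])   # det(P) = 1, so P is invertible
A = P @ B @ np.linalg.inv(P)

# np.poly(X) gives the coefficients of det(lambda*I - X) for a square matrix X.
print(np.poly(A))   # approximately [ 1. -5.  6.]
print(np.poly(B))   # [ 1. -5.  6.]  -> the same characteristic polynomial
```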

From Proposition 6.3.2 it follows that similar matrices have the same eigenvalues with the same algebraic multiplicities.

From Proposition 6.3.1 it follows that they also have the same geometric multiplicities. That is, if \(\vect{v}_1, \ldots, \vect{v}_m\) are linearly independent eigenvectors of \(B\) for the eigenvalue \(\lambda_k\), and \(A = PBP^{-1}\), then the vectors \(P\vect{v}_1, \ldots, P\vect{v}_m\) are linearly independent eigenvectors of \(A\), and vice versa.

Exercise 6.3.1

Fill in the details of the last remark.

The above considerations are summarized in the following proposition.

Proposition 6.3.3

Suppose \(A\) and \(B\) are similar matrices. Then they have the same eigenvalues with the same algebraic and geometric multiplicities.

Using the properties of similar matrices we can prove the inequality

(6.3.1)#\[\text{g.m.}(\lambda) \leq \text{a.m.}(\lambda) \]

that holds for the geometric and the algebraic multiplicity of an eigenvalue (cf. Proposition 6.2.4).

One way to understand the similarity of similar matrices comes from considering the linear transformations they represent. In Section 4.3.4 it is shown that if \(T:\R^n\to\R^n\) is the linear transformation that has \(A\) as its standard matrix, and \(P = P_{\mathcal{B}}\) is the change-of-coordinates matrix from the basis \(\mathcal{B}\) to the standard basis, then the matrix of \(T\) with respect to the basis \(\mathcal{B}\) is given by

\[ [T]_{\mathcal{B}} = P^{-1}AP. \]

This means that if \(A\) and \(B\) are related via

\[ B = PAP^{-1} \]

then \(A\) and \(B\) are in fact matrices of the same linear transformation, only with respect to different bases. The fact that, among other things, they share the same eigenvalues is then not very surprising.

The following proposition captures some other properties that similar matrices share.

Proposition 6.3.4

Suppose \(A\) and \(B\) are similar matrices. Then the following statements are true.

  1. \(\det{A} = \det{B}\).

  2. If \(A\) is invertible, then \(B\) is invertible (and vice versa).

  3. \(A\) and \(B\) have the same rank.
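
These shared properties can also be checked numerically. The sketch below (Python/NumPy; the matrices are illustrative choices, with \(B\) deliberately singular) compares the determinants and the ranks of two similar matrices.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # a rank-1 (hence singular) matrix
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # an invertible matrix
A = P @ B @ np.linalg.inv(P)

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # True: equal determinants (here both 0)
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True: equal rank (here 1)
# Since both determinants are 0, neither A nor B is invertible.
```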

6.3.2. Diagonalizability#

Definition 6.3.2

A matrix \(A\) is called diagonalizable if it is similar to a diagonal matrix. That means that there exist a diagonal matrix \(D\) and an invertible matrix \(P\) such that

\[ A = PDP^{-1}. \]

We then say that \(PDP^{-1}\) is a diagonalization of \(A\).

An equivalent alternative characterization of diagonalizability is given in the following proposition.

Proposition 6.3.5

An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. Such a set of eigenvectors then forms a basis for \(\R^n\).

This proposition is a pillar on which much of the theory of matrices rests, and diagonalizable matrices are important because they are in many respects easy to work with, so we give two proofs.

Proof of Proposition 6.3.5

The first proof is algebraic. First we note that

\[ A = PDP^{-1} \,\, \iff \,\, AP = PDP^{-1}P \,\, \iff \,\, AP = PD. \]

Next we write out these last matrix products column by column:

\[ AP = A [\vect{p}_1 \quad \vect{p}_2 \quad \cdots \quad \vect{p}_n] = [A\vect{p}_1 \quad A\vect{p}_2 \quad \cdots \quad A\vect{p}_n] \]

and

\[\begin{split} PD = [\vect{p}_1 \quad \vect{p}_2 \quad \cdots \quad \vect{p}_n]\begin{bmatrix} d_1 & 0 & 0 & \ldots & 0 \\ 0 & d_2 & 0 & \ldots & 0 \\ 0 & 0 & d_3 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & d_n \end{bmatrix}, \end{split}\]

so

\[ PD = [d_1\vect{p}_1 \quad d_2\vect{p}_2 \quad \cdots \quad d_n\vect{p}_n]. \]

Comparing \(AP\) and \(PD\) column by column, we see that \(A\vect{p}_i = d_i\vect{p}_i\) for \(n\) linearly independent vectors in \(\R^n\); indeed, an invertible matrix \(P\) has linearly independent columns.

The second proof has a geometric flavour. Open it if you are interested.
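
Whichever proof one prefers, the column-by-column relation \(A\vect{p}_i = d_i\vect{p}_i\) is easy to check numerically. Here is a minimal sketch (Python/NumPy), using the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) that is worked out by hand in Example 6.3.1 below.

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
# Columns of P are eigenvectors of A; D carries the matching eigenvalues.
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])
D = np.diag([3.0, -1.0])

print(np.allclose(A @ P, P @ D))                         # True: AP = PD
for i in range(2):
    print(np.allclose(A @ P[:, i], D[i, i] * P[:, i]))   # True, True: A p_i = d_i p_i
```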

Example 6.3.1

We verify the relation \(A = PDP^{-1}\) for the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) we studied before. We found that \(A\) has the eigenvalues \(\lambda_1 = 3\), \(\lambda_2 = -1\), with corresponding eigenvectors \(\vect{v}_1 = \begin{bmatrix} 2 \\1 \end{bmatrix}\) and \(\vect{v}_2 = \begin{bmatrix} -2 \\1 \end{bmatrix}\).

Thus for a diagonalization of \(A\) we can take

\[\begin{split} P = \left[\begin{array}{cc}\vect{v}_1 & \vect{v}_2\end{array} \right] = \left[\begin{array}{cc}2 & -2 \\ 1 & 1 \end{array} \right] , \qquad D = \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right]. \end{split}\]

We will check that this is okay. To start with,

\[\begin{split} P^{-1} = \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right], \end{split}\]

so

\[\begin{split} PDP^{-1} = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] = \dfrac14 \left[\begin{array}{cc} 6 & 2 \\ 3 & -1 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] . \end{split}\]

The last product equals

\[\begin{split} \dfrac14 \begin{bmatrix} 4 & 16 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix} = A, \end{split}\]

as it should.

Note that the diagonalization is not unique: the order of the eigenvalues can be changed, and the eigenvectors may be scaled. However, the order of the eigenvectors in \(P\) must correspond to the order of the eigenvalues on the diagonal of \(D\). For instance, for the matrix \(A\) at hand, an alternative diagonalization is given by

\[\begin{split} A = \left[\begin{array}{cc} 4 & 6 \\ -2 & 3 \end{array} \right] \left[\begin{array}{cc} -1 & 0 \\ 0 & 3 \end{array} \right] \left[\begin{array}{cc} 4 & 6 \\ -2 & 3 \end{array} \right] ^{-1}. \end{split}\]
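
Both diagonalizations can be verified with a few lines of code. A minimal sketch (Python/NumPy):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])

# Diagonalization from the example.
P1 = np.array([[2.0, -2.0],
               [1.0,  1.0]])
D1 = np.diag([3.0, -1.0])

# Alternative diagonalization: reordered eigenvalues, rescaled eigenvectors.
P2 = np.array([[ 4.0, 6.0],
               [-2.0, 3.0]])
D2 = np.diag([-1.0, 3.0])

print(np.allclose(P1 @ D1 @ np.linalg.inv(P1), A))   # True
print(np.allclose(P2 @ D2 @ np.linalg.inv(P2), A))   # True
```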

Are all matrices diagonalizable? Most certainly not, as the following two examples, studied before, show.

Example 6.3.2

The matrix \(R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) of Example 6.1.8 does not have any (real) eigenvalues, so it has no eigenvectors either. Hence it cannot be diagonalized.

Remark 6.3.2

Things would be different if we allowed complex eigenvalues and eigenvectors. We will devote a separate section (Section 6.4) to this, and there it will appear that the matrix \(R\) is complex diagonalizable.

In the previous example there were not enough eigenvalues for the matrix \(R\) to be real diagonalizable. The following example shows another reason why a matrix can fail to be diagonalizable.

Example 6.3.3

The matrix \(A = \left[\begin{array}{cc} 2 & 1 \\ 0 & 2 \end{array} \right]\) has the double eigenvalue \(\lambda_1 = \lambda_2 = 2\). Since
\(A - 2I = \left[\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\) has rank 1, there is only one independent eigenvector. Thus there does not exist a basis of eigenvectors for \(A\), and consequently the matrix \(A\) is not diagonalizable.
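
The mismatch between the two multiplicities can be computed directly: the geometric multiplicity equals \(2 - \text{rank}(A - 2I)\). A minimal sketch (Python/NumPy):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0

# Geometric multiplicity = dimension of the eigenspace = 2 - rank(A - 2I).
geo = 2 - np.linalg.matrix_rank(A - lam * np.eye(2))
print(geo)   # 1, while the algebraic multiplicity of lambda = 2 is 2, so A is not diagonalizable
```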

Example 6.3.4

The matrix \(A = \left[\begin{array}{ccc} 4 & -1 & -2 \\0 & 3 & 0 \\ 1 & 2 & 1 \end{array} \right]\) of Example 6.2.4 and Example 6.2.5 provides another example of this phenomenon. It has two eigenvalues: \(\lambda_1=3\), of algebraic multiplicity 2, and \(\lambda_2 = 2\), of algebraic multiplicity 1. There is only one independent eigenvector for \(\lambda_{1}\). This, together with the single independent eigenvector for \(\lambda_2\), gives a maximal set of only two linearly independent eigenvectors for \(A\). So this matrix \(A\) is again not diagonalizable.
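
The same rank computation exposes the deficiency here. A minimal sketch (Python/NumPy) comparing, for each eigenvalue, the algebraic multiplicity stated above with the geometric multiplicity \(3 - \text{rank}(A - \lambda I)\):

```python
import numpy as np

A = np.array([[4.0, -1.0, -2.0],
              [0.0,  3.0,  0.0],
              [1.0,  2.0,  1.0]])

for lam, alg in [(3.0, 2), (2.0, 1)]:
    geo = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, alg, geo)
# lambda = 3: algebraic multiplicity 2 but geometric multiplicity 1,
# so at most 1 + 1 = 2 independent eigenvectors exist and A is not diagonalizable.
```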

Exercise 6.3.2

Is the matrix \(A = \left[\begin{array}{cccc}1 & 1 & 0 & 1 \\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right]\) diagonalizable?

These examples illustrate the two reasons why a matrix may fail to be diagonalizable, as is made explicit in the following theorem.

Theorem 6.3.1

The \(n \times n\) matrix \(A\) is (real) diagonalizable if and only if it satisfies the following two conditions.

  1. The characteristic polynomial of \(A\) has exactly \(n\) real roots, counting multiplicities.

  2. For each eigenvalue the geometric multiplicity is equal to the algebraic multiplicity.

Proof of Theorem 6.3.1

First we show that a diagonalizable matrix satisfies the two conditions.

If \(A\) is diagonalizable, then there must be \(n\) linearly independent eigenvectors. The sum of the dimensions of the eigenspaces \(E_{\lambda_i}\), i.e., the sum of the geometric multiplicities, must therefore be equal to \(n\). Since the algebraic multiplicities are at least as large as the geometric multiplicities, the sum of the algebraic multiplicities must be at least \(n\). Since this sum cannot be larger than \(n\) (the characteristic polynomial has degree \(n\)), it must be equal to \(n\), and hence all algebraic multiplicities must in fact be equal to the corresponding geometric multiplicities. This settles conditions i. and ii.

Conversely, conditions i. and ii. immediately imply that there must be \(n\) linearly independent eigenvectors. The basic idea is that, since eigenvectors for different eigenvalues are automatically linearly independent, the bases for the eigenspaces put together give exactly \(n\) linearly independent eigenvectors.
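
The two conditions of the theorem can also be verified exactly with a computer algebra system. Below is a sketch using SymPy (the matrices are those of Example 6.3.1 and Example 6.3.3); `eigenvects()` returns, for each eigenvalue, its algebraic multiplicity and a basis of the corresponding eigenspace.

```python
from sympy import Matrix

A = Matrix([[1, 4],
            [1, 1]])   # the matrix of Example 6.3.1
B = Matrix([[2, 1],
            [0, 2]])   # the matrix of Example 6.3.3

for M in (A, B):
    for lam, alg, basis in M.eigenvects():
        print(lam, alg, len(basis))          # eigenvalue, algebraic and geometric multiplicity
    print(M.is_diagonalizable(reals_only=True))
# A: real eigenvalues 3 and -1, each with alg = geo = 1  -> True (diagonalizable)
# B: eigenvalue 2 with alg = 2 but geo = 1               -> False (not diagonalizable)
```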

We saw that there is a weak connection between eigenvalues and (non-)invertibility:

Proposition 6.1.4 states: a matrix is singular if and only if it has the eigenvalue \(0\).

In Grasple Exercise 6.3.12 below you are invited to investigate the connection (or absence of a connection) between diagonalizability and invertibility.

We stated that diagonalizable matrices have nice properties. Here is one: for diagonalizable matrices finding (high) powers can be done very efficiently.

Example 6.3.5

If \(A = PDP^{-1}\), then \(A^k = PD^kP^{-1}\), for \(k = 0,1,2,3, \ldots\)

For instance,

\[ A^3 = (PDP^{-1})(PDP^{-1})(PDP^{-1}) = PD (P^{-1}P)D (P^{-1}P)D P^{-1} = PD^3P^{-1}, \]

since the internal factors \(P^{-1}P\) reduce to the identity matrix \(I\), and \(ID = D\).

Check for yourself what happens if \(k = 0\).

The advantage is the following. Normally, multiplication of two \(n \times n\) matrices requires \(n\) multiplications per entry (or \(2n-1\) operations, if additions are counted as well), and there are \(n\times n\) entries to be computed. So computing the \(k\)th power directly requires about \(k\times n^3\) multiplications of numbers. To compute \(PD^kP^{-1}\) we only need the \(n\) \(k\)th powers of the diagonal entries to find \(D^k\), and we are left with one ‘simple’ matrix product \(PD^k\) and one ‘full’ matrix product.
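
The two routes are easy to compare in code. A minimal sketch (Python/NumPy), using the matrix of Example 6.3.1, which is also worked out by hand in Example 6.3.6 below:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])
D = np.diag([3.0, -1.0])

k = 10
# Via the diagonalization: only the n diagonal entries are raised to the power k.
A_k_diag = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)
# Direct computation: k - 1 full matrix products.
A_k_direct = np.linalg.matrix_power(A, k)

print(np.allclose(A_k_diag, A_k_direct))   # True
```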

Example 6.3.6

We compute \(A^{10}\) for the matrix \(A = \left[\begin{array}{cc} 1 & 4 \\ 1 & 1 \end{array} \right]\) of Example 6.3.1.

There we already settled that \(A = PDP^{-1}\), with

\[\begin{split} P = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] , \quad D = \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right] , \quad P^{-1} = \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] . \end{split}\]

We see that

(6.3.4)#\[\begin{split}A^{10} = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3^{10}&0 \\ 0 & (-1 )^{10} \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right].\end{split}\]

This can be evaluated to yield

\[\begin{split} A^{10} = \frac14 \left[\begin{array}{cc} 2\cdot 3^{10} & -2 \\ 3^{10} & 1\end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] = \frac14 \left[\begin{array}{cc} 2\cdot 3^{10}+2 & 4\cdot 3^{10}-4 \\ 3^{10}-1 & 2\cdot 3^{10}+2 \end{array} \right] . \end{split}\]

An alternative way to denote the last matrix:

\[\begin{split} A^{10} = \frac{3^{10}}{4} \left[\begin{array}{cc} 2 & 4 \\ 1 & 2 \end{array} \right] + \frac{1}{4}\left[\begin{array}{cc} 2 & -4 \\ -1 & 2 \end{array} \right] . \end{split}\]

Note that we could have found any power of \(A\) just as easily: replacing \(10\) by \(n\) in Equation (6.3.4) gives

\[\begin{split} \begin{array}{rcl} A^{n} &=& \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3^{n}&0 \\ 0 & (-1 )^{n} \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2\end{array} \right] \\ &=& \dfrac14 \left[\begin{array}{cc} 2\cdot 3^{n}+2\cdot(-1)^n & 4\cdot 3^{n}-4\cdot(-1)^n \\ 3^{n}- (-1)^n & 2\cdot 3^{n}+2\cdot(-1)^n \end{array} \right] \end{array} \end{split}\]

To conclude this section we return to the ‘toy’ migration model (Example 6.1.1) of this chapter to illustrate the power of diagonalization.

Example 6.3.7

Suppose the migrations between two cities \(A\) and \(B\) are described by the model

\[\begin{split} \left[\begin{array}{c} x_{k+1} \\y_{k+1}\end{array} \right] = \left[\begin{array}{c} 0.9x_{k} + 0.2 y_k\\0.1x_{k} + 0.8 y_k\end{array} \right] = \left[\begin{array}{cc} 0.9 & 0.2 \\ 0.1 & 0.8 \end{array} \right] \left[\begin{array}{c} x_{k} \\y_{k}\end{array} \right] . \end{split}\]

In short

\[\begin{split} \vect{x}_{k+1} = \left[\begin{array}{cc} 0.9 & 0.2 \\ 0.1 & 0.8 \end{array} \right] \vect{x}_{k} = M\vect{x}_{k}, \end{split}\]

where

\[\begin{split} \vect{x}_k = \left[\begin{array}{c} x_{k} \\y_{k}\end{array} \right] = \left[\begin{array}{c} \text{population in city } A \text{ at time } k \\ \text{population in city } B \text{ at time } k\end{array} \right] . \end{split}\]

It can be shown that \(M\) has the eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 0.7\), with corresponding eigenvectors

\[\begin{split} \vect{v}_1 = \left[\begin{array}{c} 2 \\1\end{array} \right] , \quad \vect{v}_2 = \left[\begin{array}{c} 1 \\-1\end{array} \right] \quad \text{respectively.} \end{split}\]

Since \(\{\vect{v}_1, \vect{v}_2\}\) is a basis of eigenvectors, the matrix \(M\) is diagonalizable, and in fact we have

\[\begin{split} M = PDP^{-1} = \left[\begin{array}{cc} 2 &1\\1&-1\end{array} \right] \left[\begin{array}{cc} 1&0\\0&0.7\end{array} \right] \left[\begin{array}{cc} 2 &1\\1&-1\end{array} \right] ^{-1}. \end{split}\]

If the initial populations are given by

\[\begin{split} \vect{x}_0 = \left[\begin{array}{c} x_{0} \\y_{0}\end{array} \right] , \end{split}\]

then

\[ \vect{x}_k = M^k\vect{x}_0 = PD^kP^{-1}\vect{x}_0. \]

In this case we can clearly see what happens in the long run, i.e. when we let \(k\) go to infinity:

\[\begin{split} D^k = \left[\begin{array}{cc} 1^k&0\\0&0.7^k\end{array} \right] \longrightarrow \left[\begin{array}{cc} 1&0\\0&0\end{array} \right] , \quad \text{if } k \to \infty. \end{split}\]

By computing \(P^{-1}\) and the product of the three matrices \(P\), \(D\) and \(P^{-1}\) we find that if \( k \to \infty\),

\[\begin{split} M^k = PD^kP^{-1} \longrightarrow P\left[\begin{array}{cc} 1&0\\0&0\end{array} \right] P^{-1} = \frac13 \left[\begin{array}{cc} 2&2 \\ 1&1\end{array} \right].\end{split}\]

We may conclude that, for \( k \to \infty\),

\[\begin{split} \vect{x}_k = M^k\vect{x}_0 \longrightarrow \frac13 \left[\begin{array}{cc} 2&2 \\ 1&1\end{array} \right] \left[\begin{array}{c} x_{0} \\y_{0}\end{array} \right] = \frac13\left[\begin{array}{c} 2x_{0}+2y_{0} \\ x_{0}+y_{0}\end{array} \right] = \frac13(x_{0}+y_{0}) \left[\begin{array}{c} 2 \\ 1\end{array} \right] . \end{split}\]

The interpretation: in the long run the distribution of the people over the two cities approaches the steady state distribution where city \(A\) has twice as many inhabitants as city \(B\). Moreover, the total number of inhabitants of the two cities is still the same as at the beginning:

\[ x_{\infty} + y_{\infty} = \tfrac13(2x_{0}+2y_{0}) + \tfrac13(x_{0}+y_{0}) = x_{0}+y_{0}. \]
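
The convergence to the steady state can also be observed by simply iterating the model. A minimal sketch (Python/NumPy; the initial populations are arbitrary illustrative numbers):

```python
import numpy as np

M = np.array([[0.9, 0.2],
              [0.1, 0.8]])
x0 = np.array([300.0, 600.0])   # arbitrary initial populations of cities A and B

x = x0
for _ in range(50):             # iterate x_{k+1} = M x_k
    x = M @ x

steady = (x0.sum() / 3) * np.array([2.0, 1.0])   # the predicted steady state
print(x)        # approximately [600. 300.]
print(steady)   # [600. 300.]  -> same distribution, total population preserved
```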

6.3.3. Grasple Exercises#

Grasple Exercise 6.3.1

https://embed.grasple.com/exercises/bd1c8f7a-917f-431f-889b-463ab7a7c6f6?id=91486

Given a \(2\times 2\) matrix \(A\) and ‘diagonalizer’ \(P\), to find the diagonal matrix \(D\) such that \(A=PDP^{-1}\).

Grasple Exercise 6.3.2

https://embed.grasple.com/exercises/5bcb24df-9cfd-4e4b-bcae-b550fb0fad63?id=91488

To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).

Grasple Exercise 6.3.3

https://embed.grasple.com/exercises/c0d56365-5434-45b0-9c82-805112428024?id=91489

To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).

Grasple Exercise 6.3.4

https://embed.grasple.com/exercises/5a71e703-acd5-48b1-9b6d-8a51f4f8cf95?id=91501

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.5

https://embed.grasple.com/exercises/537a306b-47d1-422a-bc15-c7a75b81c24b?id=91496

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.6

https://embed.grasple.com/exercises/5a71e703-acd5-48b1-9b6d-8a51f4f8cf95?id=91501

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.7

https://embed.grasple.com/exercises/f61dfb8f-db65-4f17-80c7-b1702b0c2c07?id=104493

To investigate the diagonalizability of a \(3 \times 3\) matrix of rank 1.

Grasple Exercise 6.3.8

https://embed.grasple.com/exercises/70b5964e-b6c7-4a64-a2e3-d10dc915f324?id=91503

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.9

https://embed.grasple.com/exercises/534ce865-0960-403a-affc-0f23f2d14110?id=91521

For which \(\alpha\) is given (upper triangular) \(4 \times 4\) matrix diagonalizable?

Grasple Exercise 6.3.10

https://embed.grasple.com/exercises/f3cdb060-469a-4a30-be46-1ecc7197d66a?id=91522

True/False question (about a \(4\times4\) matrix with \(3\) distinct eigenvalues).

Grasple Exercise 6.3.11

https://embed.grasple.com/exercises/d1cb7e54-6c99-4a01-b161-832b37d650d0?id=91523

True/False question (does invertibility imply diagonalizability?)

Grasple Exercise 6.3.12

https://embed.grasple.com/exercises/4398c155-7971-42f6-b809-31ae507c0326?id=87331

Creating examples of all cases (non-)invertible versus (non-)diagonalizable.

Grasple Exercise 6.3.13

https://embed.grasple.com/exercises/9aca77fa-a7c8-4998-be00-a55c19e9fd70?id=62419

To draw conclusions from a diagonalization  \(A = PDP^{-1}\).