6.3. Diagonalizability#

6.3.1. Similar matrices#

Definition 6.3.1

Two \(n \times n\) matrices \(A\) and \(B\) are called similar if they are related via the property

\[ B = PAP^{-1} \quad \text{for some invertible matrix } P. \]

Notation: \(A \sim B\).

Admittedly, we already used the symbol \(\sim\) earlier to denote row equivalence of (augmented) matrices. Whenever we use it, the intended meaning will be clear from the context.

Remark 6.3.1

In the definition it seems as if \(A\) and \(B\) play different roles, but that is not the case. This can be seen as follows:

\[ A \sim B \quad \iff \quad B = PAP^{-1} \quad \iff \quad P^{-1}BP = P^{-1}(PAP^{-1})P = A. \]

Since \((P^{-1})^{-1} = P\), we see that

\[ B = PAP^{-1} \quad \iff \quad A = QBQ^{-1}, \quad \text{where } Q = P^{-1}, \]

so similarity works both ways, that is,

\[ A \sim B \quad \iff \quad B \sim A. \]
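
For readers who like to experiment, this symmetry is easy to check numerically. The following minimal sketch (in Python with NumPy; the matrices \(A\) and \(P\) are arbitrary illustrative choices) builds \(B = PAP^{-1}\) and then recovers \(A\) as \(QBQ^{-1}\) with \(Q = P^{-1}\).

```python
import numpy as np

# An arbitrary matrix A and an arbitrary invertible matrix P (illustrative choices).
A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])            # det(P) = 1, so P is invertible

B = P @ A @ np.linalg.inv(P)          # B = P A P^{-1}, so A ~ B
Q = np.linalg.inv(P)                  # take Q = P^{-1}
A_again = Q @ B @ np.linalg.inv(Q)    # Q B Q^{-1} should give back A

print(np.allclose(A, A_again))        # True: indeed B ~ A as well
```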

Similar matrices share many properties, especially as regards eigenvalues and eigenvectors.

Proposition 6.3.1

If \(A = PBP^{-1}\), then \(A\) and \(B\) have the same eigenvalues.

Moreover, if \(\vect{v}\) is an eigenvector of \(B\), then \(P\vect{v}\) is an eigenvector of \(A\).

Proof of Proposition 6.3.1

Suppose \(\lambda\) is an eigenvalue of \(B\), and \(\vect{v}\) is a corresponding eigenvector. We then see that

\[ B\vect{v}= \lambda\vect{v} \quad \Longrightarrow \quad AP\vect{v} = (PBP^{-1})P\vect{v} = PB\vect{v} = P(\lambda\vect{v}) = \lambda P\vect{v} \]

So \(AP\vect{v} = \lambda P\vect{v}\), and \(P\vect{v}\) is an eigenvector of \(A\), provided it is not the zero vector. Since \(P\) is assumed to be invertible and \(\vect{v}\) is not the zero vector, \(P\vect{v}\) is indeed not the zero vector, and we are done.
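
This argument can be mirrored numerically. Here is a small sketch (Python/NumPy; the matrices \(B\) and \(P\) are illustrative choices) that checks that \(P\vect{v}\) is an eigenvector of \(A = PBP^{-1}\) for the same eigenvalue.

```python
import numpy as np

B = np.array([[3.0, 0.0],
              [0.0, -1.0]])            # B has eigenvalue 3 with eigenvector e1
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])            # an invertible matrix
A = P @ B @ np.linalg.inv(P)           # A = P B P^{-1}

lam = 3.0
v = np.array([1.0, 0.0])               # eigenpair of B
Pv = P @ v
print(np.allclose(A @ Pv, lam * Pv))   # True: P v is an eigenvector of A for eigenvalue 3
```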

Proposition 6.3.2

Similar matrices have the same characteristic polynomial.

Proof of Proposition 6.3.2

Suppose \(A = PBP^{-1}\).

Then we have

\[ \det{(A - \lambda I)} = \det{(PBP^{-1} - \lambda I)} = \det{(B - \lambda I)}. \]

The second equality is proved by the following chain of identities.

\[\begin{split} \begin{array}{rcl} \det{(PBP^{-1} - \lambda I)} &=& \det{(PBP^{-1} - \lambda PIP^{-1})} \\ & = & \det{(P(B- \lambda I)P^{-1})} \\ & = & \det{P}\cdot\det{(B- \lambda I)}\cdot\det{(P^{-1})} \\ & = & \det{P}\cdot\det{(B- \lambda I)}\cdot\dfrac{1}{\det{P}} \\ & = & \det{(B- \lambda I)}. \end{array} \end{split}\]

In fact, the first step contains the ‘smart move’: bringing in convenient factors \(P\) and \(P^{-1}\) via

\[ I = PIP^{-1}. \]

In the other steps we used the rule \(\det{(AB)} = \det{A}\det{B}\) and its consequence that for invertible matrices \(P\) we have

\[ \det{(P^{-1})} = \dfrac{1}{\det{P}}. \]
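
As a quick numerical illustration (a sketch in Python/NumPy; the matrices are arbitrary illustrative choices): `np.poly` returns the coefficients of the characteristic polynomial of a square matrix, and these coincide for similar matrices.

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 2.0],
              [1.0, 3.0]])   # det(P) = 1, so P is invertible
A = P @ B @ np.linalg.inv(P)

# np.poly(X) gives the coefficients of det(lambda*I - X) for a square matrix X.
print(np.poly(A))   # approximately [ 1. -5.  6.]
print(np.poly(B))   # [ 1. -5.  6.]  -> the same characteristic polynomial
```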

From Proposition 6.3.2 it follows that similar matrices have the same eigenvalues with the same algebraic multiplicities.

From Proposition 6.3.1 it follows that they also have the same geometric multiplicities. That is, if \(\vect{v}_1, \ldots, \vect{v}_m\) are linearly independent eigenvectors of \(B\) for the eigenvalue \(\lambda_k\), and \(A = PBP^{-1}\), then the vectors \(P\vect{v}_1, \ldots, P\vect{v}_m\) are linearly independent eigenvectors of \(A\), and vice versa.

Exercise 6.3.1

Fill in the details of the last remark.

The above considerations are summarized in the following proposition.

Proposition 6.3.3

Suppose \(A\) and \(B\) are similar matrices. Then they have the same eigenvalues with the same algebraic and geometric multiplicities.

Using the properties of similar matrices we can prove the inequality

(6.3.1)#\[\text{g.m.}(\lambda) \leq \text{a.m.}(\lambda) \]

that holds for the geometric and the algebraic multiplicity of an eigenvalue (cf. Proposition 6.2.4).

One way to understand the similarity of similar matrices comes from considering the linear transformations they represent. In Section 4.3.4 it is shown that if \(T:\R^n\to\R^n\) is the linear transformation that has \(A\) as its standard matrix, and \(P = P_{\mathcal{B}}\) is the change-of-coordinates matrix from the basis \(\mathcal{B}\) to the standard basis, then the matrix of \(T\) with respect to the basis \(\mathcal{B}\) is given by

\[ [T]_{\mathcal{B}} = P^{-1}AP. \]

This means that if \(A\) and \(B\) are related via

\[ B = PAP^{-1} \]

then \(A\) and \(B\) are in fact matrices of the same linear transformation, only with respect to different bases. The fact that, among other things, they share the same eigenvalues is then not very surprising.

The following proposition captures some other properties that similar matrices share.

Proposition 6.3.4

Suppose \(A\) and \(B\) are similar matrices. Then the following statements are true.

  1. \(\det{A} = \det{B}\).

  2. If \(A\) is invertible, then \(B\) is invertible (and vice versa).

  3. \(A\) and \(B\) have the same rank.
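
These shared properties can also be checked numerically. The sketch below (Python/NumPy; the matrices are illustrative choices, with \(B\) deliberately singular) compares the determinants and the ranks of two similar matrices.

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # a rank-1 (hence singular) matrix
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # an invertible matrix
A = P @ B @ np.linalg.inv(P)

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))         # True: equal determinants (here both 0)
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))   # True: equal rank (here 1)
# Since both determinants are 0, neither A nor B is invertible.
```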

6.3.2. Diagonalizability#

Definition 6.3.2

A matrix \(A\) is called diagonalizable if it is similar to a diagonal matrix. That means that there exist a diagonal matrix \(D\) and an invertible matrix \(P\) such that

\[ A = PDP^{-1}. \]

We then say that \(PDP^{-1}\) is a diagonalization of \(A\).

An equivalent alternative characterization of diagonalizability is given in the following proposition.

Proposition 6.3.5

An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. Such a set of eigenvectors then forms a basis for \(\R^n\).

This proposition is a pillar on which much of the theory of matrices rests, and diagonalizable matrices are important because they are in many respects easy to work with, so we give two proofs.

Proof of Proposition 6.3.5

The first proof is algebraic. First we note that

\[ A = PDP^{-1} \,\, \iff \,\, AP = PDP^{-1}P \,\, \iff \,\, AP = PD. \]

Next we write out these last matrix products column by column:

\[ AP = A [\vect{p}_1 \quad \vect{p}_2 \quad \cdots \quad \vect{p}_n] = [A\vect{p}_1 \quad A\vect{p}_2 \quad \cdots \quad A\vect{p}_n] \]

and

\[\begin{split} PD = [\vect{p}_1 \quad \vect{p}_2 \quad \cdots \quad \vect{p}_n]\begin{bmatrix} d_1 & 0 & 0 & \ldots & 0 \\ 0 & d_2 & 0 & \ldots & 0 \\ 0 & 0 & d_3 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & d_n \end{bmatrix}, \end{split}\]

so

\[ PD = [d_1\vect{p}_1 \quad d_2\vect{p}_2 \quad \cdots \quad d_n\vect{p}_n]. \]

Comparing \(AP\) and \(PD\) column by column, we see that \(A\vect{p}_i = d_i\vect{p}_i\) for \(n\) linearly independent vectors in \(\R^n\); indeed, an invertible matrix \(P\) has linearly independent columns.

The second proof has a geometric flavour. Open it if you are interested.
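
Whichever proof one prefers, the column-by-column relation \(A\vect{p}_i = d_i\vect{p}_i\) is easy to check numerically. Here is a minimal sketch (Python/NumPy), using the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) that is worked out by hand in Example 6.3.1 below.

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
# Columns of P are eigenvectors of A; D carries the matching eigenvalues.
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])
D = np.diag([3.0, -1.0])

print(np.allclose(A @ P, P @ D))                         # True: AP = PD
for i in range(2):
    print(np.allclose(A @ P[:, i], D[i, i] * P[:, i]))   # True, True: A p_i = d_i p_i
```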

Example 6.3.1

We verify the relation \(A = PDP^{-1}\) for the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) we studied before. We found that \(A\) has the eigenvalues \(\lambda_1 = 3\), \(\lambda_2 = -1\), with corresponding eigenvectors \(\vect{v}_1 = \begin{bmatrix} 2 \\1 \end{bmatrix}\) and \(\vect{v}_2 = \begin{bmatrix} -2 \\1 \end{bmatrix}\).

Thus for a diagonalization of \(A\) we can take

\[\begin{split} P = \left[\begin{array}{cc}\vect{v}_1 & \vect{v}_2\end{array} \right] = \left[\begin{array}{cc}2 & -2 \\ 1 & 1 \end{array} \right] , \qquad D = \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right]. \end{split}\]

We will check that this is okay. To start with,

\[\begin{split} P^{-1} = \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right], \end{split}\]

so

\[\begin{split} PDP^{-1} = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] = \dfrac14 \left[\begin{array}{cc} 6 & 2 \\ 3 & -1 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] . \end{split}\]

The last product equals

\[\begin{split} \dfrac14 \begin{bmatrix} 4 & 16 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix} = A, \end{split}\]

as it should.

Note that the diagonalization is not unique: the order of the eigenvalues can be changed, and the eigenvectors may be scaled. However, the order of the eigenvectors in \(P\) must correspond to the order of the eigenvalues on the diagonal of \(D\). For instance, for the matrix \(A\) at hand, an alternative diagonalization is given by

\[\begin{split} A = \left[\begin{array}{cc} 4 & 6 \\ -2 & 3 \end{array} \right] \left[\begin{array}{cc} -1 & 0 \\ 0 & 3 \end{array} \right] \left[\begin{array}{cc} 4 & 6 \\ -2 & 3 \end{array} \right] ^{-1}. \end{split}\]
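
Both diagonalizations can be verified with a few lines of code. A minimal sketch (Python/NumPy):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])

# Diagonalization from the example.
P1 = np.array([[2.0, -2.0],
               [1.0,  1.0]])
D1 = np.diag([3.0, -1.0])

# Alternative diagonalization: reordered eigenvalues, rescaled eigenvectors.
P2 = np.array([[ 4.0, 6.0],
               [-2.0, 3.0]])
D2 = np.diag([-1.0, 3.0])

print(np.allclose(P1 @ D1 @ np.linalg.inv(P1), A))   # True
print(np.allclose(P2 @ D2 @ np.linalg.inv(P2), A))   # True
```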

Are all matrices diagonalizable? Most certainly not, as the following two examples, studied before, show.

Example 6.3.2

The matrix \(R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) of Example 6.1.8 does not have any (real) eigenvalues, so it has no eigenvectors either. Hence it cannot be diagonalized.

Remark 6.3.2

Things would be different if we allowed complex eigenvalues and eigenvectors. We will devote a separate section (Section 6.4) to this, and there it will appear that the matrix \(R\) is complex diagonalizable.

In the previous example there were not enough eigenvalues for the matrix \(R\) to be real diagonalizable. The following example shows another reason why a matrix can fail to be diagonalizable.

Example 6.3.3

The matrix \(A = \left[\begin{array}{cc} 2 & 1 \\ 0 & 2 \end{array} \right]\) has the double eigenvalue \(\lambda_1 = \lambda_2 = 2\). Since
\(A - 2I = \left[\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\) has rank 1, there is only one independent eigenvector. Thus there does not exist a basis of eigenvectors for \(A\), and consequently the matrix \(A\) is not diagonalizable.
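
The mismatch between the two multiplicities can be computed directly: the geometric multiplicity equals \(2 - \text{rank}(A - 2I)\). A minimal sketch (Python/NumPy):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0

# Geometric multiplicity = dimension of the eigenspace = 2 - rank(A - 2I).
geo = 2 - np.linalg.matrix_rank(A - lam * np.eye(2))
print(geo)   # 1, while the algebraic multiplicity of lambda = 2 is 2, so A is not diagonalizable
```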

Example 6.3.4

The matrix \(A = \left[\begin{array}{ccc} 4 & -1 & -2 \\0 & 3 & 0 \\ 1 & 2 & 1 \end{array} \right]\) of Example 6.2.4 and Example 6.2.5 provides another example of this phenomenon. It has two eigenvalues: \(\lambda_1=3\), of algebraic multiplicity 2, and \(\lambda_2 = 2\), of algebraic multiplicity 1. There is only one independent eigenvector for \(\lambda_{1}\). This, together with the single independent eigenvector for \(\lambda_2\), gives a maximal set of only two linearly independent eigenvectors for \(A\). So this matrix \(A\) is again not diagonalizable.
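
The same rank computation exposes the deficiency here. A minimal sketch (Python/NumPy) comparing, for each eigenvalue, the algebraic multiplicity stated above with the geometric multiplicity \(3 - \text{rank}(A - \lambda I)\):

```python
import numpy as np

A = np.array([[4.0, -1.0, -2.0],
              [0.0,  3.0,  0.0],
              [1.0,  2.0,  1.0]])

for lam, alg in [(3.0, 2), (2.0, 1)]:
    geo = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, alg, geo)
# lambda = 3: algebraic multiplicity 2 but geometric multiplicity 1,
# so at most 1 + 1 = 2 independent eigenvectors exist and A is not diagonalizable.
```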

Exercise 6.3.2

Is the matrix \(A = \left[\begin{array}{cccc}1 & 1 & 0 & 1 \\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right]\) diagonalizable?

These examples illustrate the two reasons why a matrix may fail to be diagonalizable, as is made explicit in the following theorem.

Theorem 6.3.1

The \(n \times n\) matrix \(A\) is (real) diagonalizable if and only if it satisfies the following two conditions.

  1. The characteristic polynomial of \(A\) has exactly \(n\) real roots, counting multiplicities.

  2. For each eigenvalue the geometric multiplicity is equal to the algebraic multiplicity.

Proof of Theorem 6.3.1

First we show that a diagonalizable matrix satisfies the two conditions.

If \(A\) is diagonalizable, then there must be \(n\) linearly independent eigenvectors. The sum of the dimensions of the eigenspaces \(E_{\lambda_i}\), i.e., the sum of the geometric multiplicities, must therefore be equal to \(n\). Since the algebraic multiplicities are at least as large as the geometric multiplicities, the sum of the algebraic multiplicities must be at least \(n\). Since this sum cannot be larger than \(n\) (the characteristic polynomial has degree \(n\)), it must be equal to \(n\), and hence all algebraic multiplicities must in fact be equal to the corresponding geometric multiplicities. This settles conditions i. and ii.

Conversely, conditions i. and ii. immediately imply that there must be \(n\) linearly independent eigenvectors. The basic idea is that, since eigenvectors for different eigenvalues are automatically linearly independent, the bases for the eigenspaces put together give exactly \(n\) linearly independent eigenvectors.
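
The two conditions of the theorem can also be verified exactly with a computer algebra system. Below is a sketch using SymPy (the matrices are those of Example 6.3.1 and Example 6.3.3); `eigenvects()` returns, for each eigenvalue, its algebraic multiplicity and a basis of the corresponding eigenspace.

```python
from sympy import Matrix

A = Matrix([[1, 4],
            [1, 1]])   # the matrix of Example 6.3.1
B = Matrix([[2, 1],
            [0, 2]])   # the matrix of Example 6.3.3

for M in (A, B):
    for lam, alg, basis in M.eigenvects():
        print(lam, alg, len(basis))          # eigenvalue, algebraic and geometric multiplicity
    print(M.is_diagonalizable(reals_only=True))
# A: real eigenvalues 3 and -1, each with alg = geo = 1  -> True (diagonalizable)
# B: eigenvalue 2 with alg = 2 but geo = 1               -> False (not diagonalizable)
```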

We saw that there is a weak connection between eigenvalues and (non-)invertibility:

Proposition 6.1.4 states: a matrix is singular if and only if it has the eigenvalue \(0\).

In Grasple Exercise 6.3.12 below you are invited to investigate the connection (or absence of a connection) between diagonalizability and invertibility.

We stated that diagonalizable matrices have nice properties. Here is one: for diagonalizable matrices finding (high) powers can be done very efficiently.

Example 6.3.5

If \(A = PDP^{-1}\), then \(A^k = PD^kP^{-1}\), for \(k = 0,1,2,3, \ldots\)

For instance,

\[ A^3 = (PDP^{-1})(PDP^{-1})(PDP^{-1}) = PD (P^{-1}P)D (P^{-1}P)D P^{-1} = PD^3P^{-1}, \]

since the internal factors \(P^{-1}P\) reduce to the identity matrix \(I\), and \(ID = D\).

Check for yourself what happens if \(k = 0\).

The advantage is the following. Normally, multiplication of two \(n \times n\) matrices requires \(n\) multiplications per entry (or \(2n-1\) operations, if additions are counted as well), and there are \(n\times n\) entries to be computed. So computing the \(k\)th power directly requires about \(k\times n^3\) multiplications of numbers. To compute \(PD^kP^{-1}\) we only need the \(n\) \(k\)th powers of the diagonal entries to find \(D^k\), and we are left with one ‘simple’ matrix product \(PD^k\) and one ‘full’ matrix product.
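
The two routes are easy to compare in code. A minimal sketch (Python/NumPy), using the matrix of Example 6.3.1, which is also worked out by hand in Example 6.3.6 below:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])
D = np.diag([3.0, -1.0])

k = 10
# Via the diagonalization: only the n diagonal entries are raised to the power k.
A_k_diag = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)
# Direct computation: k - 1 full matrix products.
A_k_direct = np.linalg.matrix_power(A, k)

print(np.allclose(A_k_diag, A_k_direct))   # True
```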

Example 6.3.6

We compute \(A^{10}\) for the matrix \(A = \left[\begin{array}{cc} 1 & 4 \\ 1 & 1 \end{array} \right]\) of Example 6.3.1.

There we already settled that \(A = PDP^{-1}\), with

\[\begin{split} P = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] , \quad D = \left[\begin{array}{cc} 3&0 \\ 0 & -1 \end{array} \right] , \quad P^{-1} = \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] . \end{split}\]

We see that

(6.3.4)#\[\begin{split}A^{10} = \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3^{10}&0 \\ 0 & (-1 )^{10} \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right].\end{split}\]

This can be evaluated to yield

\[\begin{split} A^{10} = \frac14 \left[\begin{array}{cc} 2\cdot 3^{10} & -2 \\ 3^{10} & 1\end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ -1 & 2 \end{array} \right] = \frac14 \left[\begin{array}{cc} 2\cdot 3^{10}+2 & 4\cdot 3^{10}-4 \\ 3^{10}-1 & 2\cdot 3^{10}+2 \end{array} \right] . \end{split}\]

An alternative way to denote the last matrix:

\[\begin{split} A^{10} = \frac{3^{10}}{4} \left[\begin{array}{cc} 2 & 4 \\ 1 & 2 \end{array} \right] + \frac{1}{4}\left[\begin{array}{cc} 2 & -4 \\ -1 & 2 \end{array} \right] . \end{split}\]

Note that we could have found any power of \(A\) just as easily: replacing \(10\) by \(n\) in Equation (6.3.4) gives

\[\begin{split} \begin{array}{rcl} A^{n} &=& \left[\begin{array}{cc} 2 & -2 \\ 1 & 1 \end{array} \right] \left[\begin{array}{cc} 3^{n}&0 \\ 0 & (-1 )^{n} \end{array} \right] \dfrac14\left[\begin{array}{cc} 1 & 2 \\ -1 & 2\end{array} \right] \\ &=& \dfrac14 \left[\begin{array}{cc} 2\cdot 3^{n}+2\cdot(-1)^n & 4\cdot 3^{n}-4\cdot(-1)^n \\ 3^{n}- (-1)^n & 2\cdot 3^{n}+2\cdot(-1)^n \end{array} \right] \end{array} \end{split}\]

To conclude this section we return to the ‘toy’ migration model (Example 6.1.1) of this chapter to illustrate the power of diagonalization.

Example 6.3.7

Suppose the migrations between two cities \(A\) and \(B\) are described by the model

\[\begin{split} \left[\begin{array}{c} x_{k+1} \\y_{k+1}\end{array} \right] = \left[\begin{array}{c} 0.9x_{k} + 0.2 y_k\\0.1x_{k} + 0.8 y_k\end{array} \right] = \left[\begin{array}{cc} 0.9 & 0.2 \\ 0.1 & 0.8 \end{array} \right] \left[\begin{array}{c} x_{k} \\y_{k}\end{array} \right] . \end{split}\]

In short

\[\begin{split} \vect{x}_{k+1} = \left[\begin{array}{cc} 0.9 & 0.2 \\ 0.1 & 0.8 \end{array} \right] \vect{x}_{k} = M\vect{x}_{k}, \end{split}\]

where

\[\begin{split} \vect{x}_k = \left[\begin{array}{c} x_{k} \\y_{k}\end{array} \right] = \left[\begin{array}{c} \text{population in city } A \text{ at time } k \\ \text{population in city } B \text{ at time } k\end{array} \right] . \end{split}\]

It can be shown that \(M\) has the eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 0.7\), with corresponding eigenvectors

\[\begin{split} \vect{v}_1 = \left[\begin{array}{c} 2 \\1\end{array} \right] , \quad \vect{v}_2 = \left[\begin{array}{c} 1 \\-1\end{array} \right] \quad \text{respectively.} \end{split}\]

Since \(\{\vect{v}_1, \vect{v}_2\}\) is a basis of eigenvectors, the matrix \(M\) is diagonalizable, and in fact we have

\[\begin{split} M = PDP^{-1} = \left[\begin{array}{cc} 2 &1\\1&-1\end{array} \right] \left[\begin{array}{cc} 1&0\\0&0.7\end{array} \right] \left[\begin{array}{cc} 2 &1\\1&-1\end{array} \right] ^{-1}. \end{split}\]

If the initial populations are given by

\[\begin{split} \vect{x}_0 = \left[\begin{array}{c} x_{0} \\y_{0}\end{array} \right] , \end{split}\]

then

\[ \vect{x}_k = M^k\vect{x}_0 = PD^kP^{-1}\vect{x}_0. \]

In this case we can clearly see what happens in the long run, i.e. when we let \(k\) go to infinity:

\[\begin{split} D^k = \left[\begin{array}{cc} 1^k&0\\0&0.7^k\end{array} \right] \longrightarrow \left[\begin{array}{cc} 1&0\\0&0\end{array} \right] , \quad \text{if } k \to \infty. \end{split}\]

By computing \(P^{-1}\) and the product of the three matrices \(P\), \(D\) and \(P^{-1}\) we find that if \( k \to \infty\),

\[\begin{split} M^k = PD^kP^{-1} \longrightarrow P\left[\begin{array}{cc} 1&0\\0&0\end{array} \right] P^{-1} = \frac13 \left[\begin{array}{cc} 2&2 \\ 1&1\end{array} \right].\end{split}\]

We may conclude that, for \( k \to \infty\),

\[\begin{split} \vect{x}_k = M^k\vect{x}_0 \longrightarrow \frac13 \left[\begin{array}{cc} 2&2 \\ 1&1\end{array} \right] \left[\begin{array}{c} x_{0} \\y_{0}\end{array} \right] = \frac13\left[\begin{array}{c} 2x_{0}+2y_{0} \\ x_{0}+y_{0}\end{array} \right] = \frac13(x_{0}+y_{0}) \left[\begin{array}{c} 2 \\ 1\end{array} \right] . \end{split}\]

The interpretation: in the long run the distribution of the people over the two cities approaches the steady state distribution where city \(A\) has twice as many inhabitants as city \(B\). Moreover, the total number of inhabitants of the two cities is still the same as at the beginning:

\[ x_{\infty} + y_{\infty} = \tfrac13(2x_{0}+2y_{0}) + \tfrac13(x_{0}+y_{0}) = x_{0}+y_{0}. \]
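
The convergence to the steady state can also be observed by simply iterating the model. A minimal sketch (Python/NumPy; the initial populations are arbitrary illustrative numbers):

```python
import numpy as np

M = np.array([[0.9, 0.2],
              [0.1, 0.8]])
x0 = np.array([300.0, 600.0])   # arbitrary initial populations of cities A and B

x = x0
for _ in range(50):             # iterate x_{k+1} = M x_k
    x = M @ x

steady = (x0.sum() / 3) * np.array([2.0, 1.0])   # the predicted steady state
print(x)        # approximately [600. 300.]
print(steady)   # [600. 300.]  -> same distribution, total population preserved
```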

6.3.3. Grasple Exercises#

Grasple Exercise 6.3.1

https://embed.grasple.com/exercises/bd1c8f7a-917f-431f-889b-463ab7a7c6f6?id=91486

Given a \(2\times 2\) matrix \(A\) and ‘diagonalizer’ \(P\), to find the diagonal matrix \(D\) such that \(A=PDP^{-1}\).

Grasple Exercise 6.3.2

https://embed.grasple.com/exercises/5bcb24df-9cfd-4e4b-bcae-b550fb0fad63?id=91488

To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).

Grasple Exercise 6.3.3

https://embed.grasple.com/exercises/c0d56365-5434-45b0-9c82-805112428024?id=91489

To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).

Grasple Exercise 6.3.4

https://embed.grasple.com/exercises/5a71e703-acd5-48b1-9b6d-8a51f4f8cf95?id=91501

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.5

https://embed.grasple.com/exercises/537a306b-47d1-422a-bc15-c7a75b81c24b?id=91496

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.6

https://embed.grasple.com/exercises/5a71e703-acd5-48b1-9b6d-8a51f4f8cf95?id=91501

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.7

https://embed.grasple.com/exercises/f61dfb8f-db65-4f17-80c7-b1702b0c2c07?id=104493

To investigate the diagonalizability of a \(3 \times 3\) matrix of rank 1.

Grasple Exercise 6.3.8

https://embed.grasple.com/exercises/70b5964e-b6c7-4a64-a2e3-d10dc915f324?id=91503

To investigate the diagonalizability of a (\(3 \times 3\)) matrix.

Grasple Exercise 6.3.9

https://embed.grasple.com/exercises/534ce865-0960-403a-affc-0f23f2d14110?id=91521

For which \(\alpha\) is given (upper triangular) \(4 \times 4\) matrix diagonalizable?

Grasple Exercise 6.3.10

https://embed.grasple.com/exercises/f3cdb060-469a-4a30-be46-1ecc7197d66a?id=91522

True/False question (about a \(4\times4\) matrix with \(3\) distinct eigenvalues).

Grasple Exercise 6.3.11

https://embed.grasple.com/exercises/d1cb7e54-6c99-4a01-b161-832b37d650d0?id=91523

True/False question (does invertibility imply diagonalizability?)

Grasple Exercise 6.3.12

https://embed.grasple.com/exercises/4398c155-7971-42f6-b809-31ae507c0326?id=87331

Creating examples of all cases (non-)invertible versus (non-)diagonalizable.

Grasple Exercise 6.3.13

https://embed.grasple.com/exercises/9aca77fa-a7c8-4998-be00-a55c19e9fd70?id=62419

To draw conclusions from a diagonalization  \(A = PDP^{-1}\).