6.3. Diagonalizability#
6.3.1. Similar matrices#
Two \(n \times n\) matrices \(A\) and \(B\) are called similar if they are related via the property
\[ A = PBP^{-1} \]
for some invertible \(n \times n\) matrix \(P\).
Notation: \(A \sim B\).
Note that we already used the symbol \(\sim\) earlier to denote row equivalence of (augmented) matrices. Whenever we use it, the intended meaning will be clear from the context.
In the definition it seems as if \(A\) and \(B\) play different roles, but that is not the case. This can be seen as follows.
Since \((P^{-1})^{-1} = P\), we see that
\[ A = PBP^{-1} \quad \Longrightarrow \quad B = P^{-1}AP = QAQ^{-1}, \quad \text{where } Q = P^{-1}, \]
so similarity works both ways, that is,
\[ A \sim B \quad \iff \quad B \sim A. \]
Similar matrices have similar properties, especially as regards eigenvalues and eigenvectors.
If \(A = PBP^{-1}\), then \(A\) and \(B\) have the same eigenvalues.
Moreover, if \(\vect{v}\) is an eigenvector of \(B\), then \(P\vect{v}\) is an eigenvector of \(A\).
Proof of Proposition 6.3.1
Suppose \(\lambda\) is an eigenvalue of \(B\), and \(\vect{v}\) is a corresponding eigenvector. We then see that
\[ A(P\vect{v}) = PBP^{-1}P\vect{v} = PB\vect{v} = P(\lambda\vect{v}) = \lambda P\vect{v}. \]
So \(AP\vect{v} = \lambda P\vect{v}\), and \(P\vect{v}\) is an eigenvector, provided it is not the zero vector. Since \(P\) is invertible and \(\vect{v}\) is not the zero vector, \(P\vect{v}\) is indeed not the zero vector, and we are done.
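As a quick numerical sanity check of this proposition, the following sketch (assuming NumPy, with arbitrarily chosen matrices \(B\) and \(P\), not taken from the text) verifies that \(P\vect{v}\) is indeed an eigenvector of \(A = PBP^{-1}\).

```python
import numpy as np

# Arbitrary illustrative choices: B with eigenvalues 2 and 5, P invertible.
B = np.array([[2.0, 1.0],
              [0.0, 5.0]])
P = np.array([[1.0, 2.0],
              [3.0, 4.0]])
A = P @ B @ np.linalg.inv(P)            # A is similar to B

eigvals, eigvecs = np.linalg.eig(B)     # eigenvalues/eigenvectors of B
for lam, v in zip(eigvals, eigvecs.T):
    w = P @ v                           # candidate eigenvector of A
    print(np.allclose(A @ w, lam * w))  # True, True: A(Pv) = lambda * (Pv)
```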
Similar matrices have the same characteristic polynomial.
Proof of Proposition 6.3.2
Suppose \(A = PBP^{-1}\).
Then we have
\[ \det(A - \lambda I) = \det(PBP^{-1} - \lambda I) = \det(B - \lambda I). \]
The second equality is proved by the following chain of identities:
\[ \det(PBP^{-1} - \lambda I) = \det\big(P(B - \lambda I)P^{-1}\big) = \det{P}\,\det(B - \lambda I)\,\det(P^{-1}) = \det(B - \lambda I). \]
In fact, the first step contains the ‘smart move’, to bring in convenient factors \(P\) and \(P^{-1}\) via
\[ \lambda I = \lambda PP^{-1} = P(\lambda I)P^{-1}. \]
In the other steps we used the rule \(\det{(AB)} = \det{A}\det{B}\) and its consequence that for invertible matrices \(P\) we have
\[ \det(P^{-1}) = (\det{P})^{-1}. \]
From Proposition 6.3.2 it follows that similar matrices have the same eigenvalues with the same algebraic multiplicities.
From Proposition 6.3.1 it follows that they also have the same geometric multiplicities. That is,
if \(\vect{v}_1, \ldots, \vect{v}_m\) are linearly independent eigenvectors of \(B\) for the eigenvalue \(\lambda_k\), and \(A = PBP^{-1}\),
then the vectors \(P\vect{v}_1, \ldots, P\vect{v}_m\) are linearly independent eigenvectors of \(A\).
And vice versa.
Fill in the details of the last remark.
The above considerations are summarized in the following proposition.
Suppose \(A\) and \(B\) are similar matrices. Then they have the same eigenvalues with the same algebraic and geometric multiplicities.
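This can also be checked numerically. The sketch below (assuming NumPy; the matrices \(B\) and \(P\) are arbitrary illustrative choices) compares the eigenvalues and the geometric multiplicity of a repeated eigenvalue for a similar pair.

```python
import numpy as np

def geometric_multiplicity(M, lam, tol=1e-9):
    """Dimension of the eigenspace of M for eigenvalue lam: n - rank(M - lam*I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n), tol=tol)

# Illustrative choice: B has eigenvalue 3 with geometric multiplicity 2.
B = np.diag([3.0, 3.0, 7.0])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ B @ np.linalg.inv(P)

print(np.round(np.linalg.eigvals(A), 6))   # 3, 3, 7 (in some order)
print(np.round(np.linalg.eigvals(B), 6))   # 3, 3, 7
print(geometric_multiplicity(A, 3.0))      # 2
print(geometric_multiplicity(B, 3.0))      # 2
```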
Using the properties of similar matrices we can prove the inequality
\[ \text{geometric multiplicity} \;\leq\; \text{algebraic multiplicity} \]
that holds for every eigenvalue of a matrix (cf. Proposition 6.2.4).
Proof of Proposition 6.2.4 (geom.mult. \(\leq\) alg.mult.)
Suppose the \(n\times n\) matrix \(A\) has the eigenvalue \(\lambda_1\) of geometric multiplicity \(k\). We have to show that the algebraic multiplicity of \(\lambda_1\) is at least equal to \(k\). We will do so by constructing a matrix \(B\) that is similar to \(A\) and for which the eigenvalue \(\lambda_1\) will clearly have algebraic multiplicity at least equal to \(k\).
Suppose \(\vect{v}_1,\ldots,\vect{v}_k\) are \(k\) linearly independent eigenvectors for \(\lambda_1\). We can extend \(\{\vect{v}_1,\ldots,\vect{v}_k\}\) to a basis \(\{\vect{v}_1,\ldots,\vect{v}_k, \ldots, \vect{v}_n \}\) of \(\mathbb{R}^n\).
Let \(P\) be the matrix with \(\vect{v}_1,\ldots,\vect{v}_n\) as columns. \(P\) is invertible, and we have that
\[ AP = \begin{bmatrix} A\vect{v}_1 & \cdots & A\vect{v}_k & A\vect{v}_{k+1} & \cdots & A\vect{v}_n \end{bmatrix} = \begin{bmatrix} \lambda_1\vect{v}_1 & \cdots & \lambda_1\vect{v}_k & A\vect{v}_{k+1} & \cdots & A\vect{v}_n \end{bmatrix}, \]
since \(\vect{v}_1, \ldots, \vect{v}_k\) were supposed to be eigenvectors for \(\lambda_1\).
Since \(P\) is invertible, for each \(j \geq k+1\) the equation \(P\vect{x} = A\vect{v}_{j}\) has the unique solution \(\vect{b}_j = P^{-1}A\vect{v}_{j}\). So we see that
\[ AP = PB, \quad \text{where } B = \begin{bmatrix} \lambda_1\vect{e}_1 & \cdots & \lambda_1\vect{e}_k & \vect{b}_{k+1} & \cdots & \vect{b}_n \end{bmatrix}. \]
So we have that \(A = PBP^{-1}\), which means that \(A\) and \(B\) are similar, hence they have the same eigenvalues with the same algebraic (and also geometric) multiplicities.
Note that \(B\) is of the form
\[ B = \left[\begin{array}{cc} \lambda_1 I_k & * \\ O & * \end{array}\right], \]
where there are \(k\) entries \(\lambda_1\) on the diagonal.
It follows that the characteristic polynomial \(\det(B - \lambda I)\) will have at least \(k\) factors \((\lambda - \lambda_1)\).
Thus the algebraic multiplicity of the eigenvalue \(\lambda_1\) for the matrix \(B\) is greater than or equal to \(k\). From the observed similarity \(A \sim B\) it follows that this also holds for the algebraic multiplicity of \(\lambda_1\) for the matrix \(A\).
So indeed the inequality
\[ \text{geometric multiplicity} \;\leq\; \text{algebraic multiplicity} \]
is universally true.
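The construction in this proof can be made concrete with a few lines of NumPy. The matrix \(A\) below and the basis extension are illustrative choices, not taken from the text: the eigenvalue \(\lambda_1 = 5\) has geometric multiplicity \(2\), and the similar matrix \(B = P^{-1}AP\) indeed starts with the columns \(\lambda_1\vect{e}_1\) and \(\lambda_1\vect{e}_2\).

```python
import numpy as np

# Illustrative matrix with eigenvalue 5 of geometric multiplicity 2.
A = np.array([[5.0, 0.0, 0.0],
              [0.0, 5.0, 0.0],
              [1.0, 2.0, 3.0]])

# Two independent eigenvectors for lambda_1 = 5 (they span Nul(A - 5I)),
# extended with a third vector to a basis of R^3.
v1 = np.array([2.0, -1.0, 0.0])   # (A - 5I) v1 = 0
v2 = np.array([2.0,  0.0, 1.0])   # (A - 5I) v2 = 0
v3 = np.array([1.0,  0.0, 0.0])   # extension vector (any choice keeping P invertible)

P = np.column_stack([v1, v2, v3])
B = np.linalg.inv(P) @ A @ P      # B is similar to A
print(np.round(B, 10))            # first two columns are 5*e1 and 5*e2
```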
One way to understand the similarity of similar matrices comes from considering the linear transformations they represent. In Section 4.3.4 it is shown that if \(T:\R^n\to\R^n\) is the linear transformation that has \(A\) as its standard matrix, and \(P = P_{\mathcal{B}}\) is the change-of-coordinates matrix from the basis \(\mathcal{B}\) to the standard basis, then the matrix of \(T\) with respect to the basis \(\mathcal{B}\) is given by
\[ [T]_{\mathcal{B}} = P^{-1}AP. \]
This means that if \(A\) and \(B\) are related via
\[ B = P^{-1}AP, \quad \text{that is,} \quad A = PBP^{-1}, \]
then \(A\) and \(B\) are in fact matrices of the same linear transformation, only with respect to different bases. The fact that, for one thing, they share the same eigenvalues is then not very surprising.
The following proposition captures some other properties that similar matrices share.
Suppose \(A\) and \(B\) are similar matrices. Then the following statements are true.
i. \(\det{A} = \det{B}\).

ii. If \(A\) is invertible, then \(B\) is invertible (and vice versa).

iii. \(A\) and \(B\) have the same rank.
Proof of Proposition 6.3.4
Suppose \(A = PBP^{-1}\).
i. As in the proof of the equality of the characteristic polynomials (Proposition 6.3.2) we have: if \(A = PBP^{-1}\), then
\[ \det{A} = \det{(PBP^{-1})} = \det{P}\det{B}\det{(P^{-1})}, \]
which can be rewritten as follows:
\[ \det{P}\det{B}\det{(P^{-1})} = \det{P}\det{B}(\det{P})^{-1} = \det{B}. \]

ii. Follows immediately from i.: a matrix \(A\) is invertible \(\quad \iff \quad \det{(A)} \neq 0\).

iii. If \(A\) has rank \(n\), then \(A\) is invertible, and then \(B\) is also invertible, so \(B\) has rank \(n\) too.
If \(\text{rank}\,A < n\), then \(\lambda = 0\) is an eigenvalue of both \(A\) and \(B\). In this case we can use
\[ \text{rank}\,A = n - \text{dim Nul}\,A. \]
Recall that \(\text{Nul}\,A\) is the eigenspace for the eigenvalue \(\lambda = 0\), so \(\text{dim Nul}\,A\) is the geometric multiplicity of the eigenvalue \(\lambda = 0\), which is the same for \(A\) and \(B\). We deduce that
\[ \text{rank}\,A = n - \text{dim Nul}\,A = n - \text{dim Nul}\,B = \text{rank}\,B. \]
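A quick numerical illustration of these properties (a sketch assuming NumPy; the matrices \(B\) and \(P\) are arbitrary choices, with \(B\) deliberately singular of rank \(2\)):

```python
import numpy as np

# Illustrative pair of similar matrices (arbitrary singular B and invertible P).
B = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 4.0],
              [0.0, 0.0, 5.0]])   # rank 2, det 0
P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ B @ np.linalg.inv(P)

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))        # True: equal determinants
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))  # True: both have rank 2
```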
6.3.2. Diagonalizability#
A matrix \(A\) is called diagonalizable if it is similar to a diagonal matrix. That means that a diagonal matrix \(D\) and an invertible matrix \(P\) exist such that
\[ A = PDP^{-1}. \]
We then say that \(PDP^{-1}\) is a diagonalization of \(A\).
An equivalent alternative characterization of diagonalizability is given in the following proposition.
An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. Such a set of eigenvectors then forms a basis for \(\R^n\).
Since this proposition is such a pillar on which much of the theory of matrices rests, and diagonalizable matrices are important because they are in many respects easy to work with, we give two proofs.
Proof of Proposition 6.3.5
The first proof is algebraic. First we note that
\[ A = PDP^{-1} \quad \iff \quad AP = PD. \]
Next we write out these last matrix products column by column:
\[ AP = A\begin{bmatrix} \vect{p}_1 & \vect{p}_2 & \cdots & \vect{p}_n \end{bmatrix} = \begin{bmatrix} A\vect{p}_1 & A\vect{p}_2 & \cdots & A\vect{p}_n \end{bmatrix} \]
and
\[ PD = \begin{bmatrix} \vect{p}_1 & \vect{p}_2 & \cdots & \vect{p}_n \end{bmatrix}\begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} = \begin{bmatrix} d_1\vect{p}_1 & d_2\vect{p}_2 & \cdots & d_n\vect{p}_n \end{bmatrix}, \]
so
\[ \begin{bmatrix} A\vect{p}_1 & A\vect{p}_2 & \cdots & A\vect{p}_n \end{bmatrix} = \begin{bmatrix} d_1\vect{p}_1 & d_2\vect{p}_2 & \cdots & d_n\vect{p}_n \end{bmatrix}. \]
Comparing \(AP\) and \(PD\) column by column we see that \(A\vect{p}_i = d_i\vect{p}_i\) for \(n\) linearly independent vectors in \(\R^n\). Indeed, an invertible matrix \(P\) has linearly independent columns.
The second proof has a geometric flavour. Open it if you are interested.
Second proof of Proposition 6.3.5
First we show that diagonalizability implies the existence of \(n\) linearly independent eigenvectors.
If \(A = PDP^{-1}\), then by Proposition 6.3.1 \(A\) and \(D\) have the same eigenvalues and the relation between the eigenvectors is:

- if \(\vect{v}\) is an eigenvector of \(D\) for the eigenvalue \(\lambda\), then \(P\vect{v}\) is an eigenvector of \(A\) for the same \(\lambda\).
The eigenvalues of \(D\) are simply the diagonal entries \(d_i\) with the vectors \(\vect{e}_i\) of the standard basis as corresponding eigenvectors.
Thus \(A = PDP^{-1}\) has the eigenvalues \(d_i\) with corresponding eigenvectors \(P\vect{e}_i = \vect{p}_i\). Thus the \(n\) columns of \(P\), which are linearly independent since \(P\) is invertible, give a basis of eigenvectors for \(A\).
The other half is a bit more involved. It relies on the transformation formula of matrix representations (see Proposition 4.3.6).
Let \(T: \R^n \to \R^n\) be the linear transformation with standard matrix \(A\), i.e., \(T(\vect{x}) = A\vect{x}\), and suppose \(A\) has \(n\) linearly independent eigenvectors \(\vect{v}_1, \ldots, \vect{v}_n\). Let \(\lambda_1, \ldots, \lambda_n\) denote the eigenvalues. So \(A\vect{v}_i =\lambda_i\vect{v}_i\).
For the basis \(\mathcal{B} = (\vect{v}_1, \ldots, \vect{v}_n)\) we then see that
\[ [T]_{\mathcal{B}} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = D, \]
since \([T(\vect{v}_i)]_{\mathcal{B}} = [\lambda_i\vect{v}_i]_{\mathcal{B}} = \lambda_i\vect{e}_i\), and the transformation formula gives
\[ D = [T]_{\mathcal{B}} = P^{-1}AP, \tag{6.3.2} \]
where
\[ P = \begin{bmatrix} \vect{v}_1 & \vect{v}_2 & \cdots & \vect{v}_n \end{bmatrix} \]
is the change-of-coordinates matrix from \(\mathcal{B}\) to the standard basis.
Lastly, the identity \(D=P^{-1}AP\) in Equation (6.3.2) is equivalent to \(A = PDP^{-1}\).
We verify the relation \(A = PDP^{-1}\) for the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) we studied before. We found that \(A\) has the eigenvalues \(\lambda_1 = 3\), \(\lambda_2 = -1\), with corresponding eigenvectors \(\vect{v}_1 = \begin{bmatrix} 2 \\1 \end{bmatrix}\) and \(\vect{v}_2 = \begin{bmatrix} -2 \\1 \end{bmatrix}\).
Thus for a diagonalization of \(A\) we can take
\[ P = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}. \]
We will check that this is okay. To start with,
\[ P^{-1} = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}, \]
so
\[ PDP^{-1} = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}\cdot\frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 6 & 2 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
The last product equals
\[ \frac{1}{4}\begin{bmatrix} 4 & 16 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix} = A, \]
as it should.
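The same check can be done numerically; the following snippet is a small sketch (assuming NumPy) that reproduces the verification.

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, -2.0],
              [1.0,  1.0]])       # eigenvectors as columns
D = np.diag([3.0, -1.0])          # eigenvalues in the same order

print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True: A = P D P^{-1}

# np.linalg.eig finds an equivalent diagonalization (columns possibly scaled/reordered).
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                    # eigenvalues 3 and -1 (order may vary)
```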
Note that the diagonalization is not unique: the order of the eigenvalues can be changed, and the eigenvectors may be scaled. However, the order of the eigenvectors in \(P\) must correspond to the order of the eigenvalues on the diagonal of \(D\). For instance, for the matrix \(A\) at hand, an alternative diagonalization is given by
\[ A = \begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}^{-1}. \]
Are all matrices diagonalizable? Most certainly not, as the following two examples, studied before, show.
The matrix \(R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) of Example 6.1.8 does not have any (real) eigenvalues, and hence no eigenvectors. Therefore it cannot be diagonalized.
Things would be different if we allowed complex eigenvalues and eigenvectors. We will devote a separate section (Section 6.4) to this. There it will turn out that the matrix \(R\) is complex diagonalizable.
In the previous example there were not enough eigenvalues for the matrix to be real diagonalizable. The following example shows another reason why a matrix can fail to be diagonalizable.
The matrix \(A = \left[\begin{array}{cc} 2 & 1 \\ 0 & 2 \end{array} \right]\) has the double eigenvalue \(\lambda_1 = \lambda_2 = 2\).
Since \(A - 2I = \left[\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\) has rank 1, there is only one independent eigenvector.
Thus there does not exist a basis of eigenvectors for \(A\), and consequently the matrix \(A\) is not diagonalizable.
The matrix \(A = \left[\begin{array}{ccc} 4 & -1 & -2 \\0 & 3 & 0 \\ 1 & 2 & 1 \end{array} \right]\) of Example 6.2.4 and Example 6.2.5 provides another example of this phenomenon. It has two eigenvalues: \(\lambda_1=3\), of algebraic multiplicity 2, and \(\lambda_2 = 2\), of algebraic multiplicity 1. There is only one independent eigenvector for \(\lambda_{1}\). This, together with the single independent eigenvector for \(\lambda_2\), gives a maximal set of two linearly independent eigenvectors for \(A\). So this matrix \(A\) is again not diagonalizable.
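The rank computations behind this example can be verified numerically; the following sketch (assuming NumPy) confirms that each eigenvalue contributes only one independent eigenvector.

```python
import numpy as np

A = np.array([[4.0, -1.0, -2.0],
              [0.0,  3.0,  0.0],
              [1.0,  2.0,  1.0]])
I = np.eye(3)

print(np.linalg.matrix_rank(A - 3 * I))   # 2, so dim Nul(A - 3I) = 1
print(np.linalg.matrix_rank(A - 2 * I))   # 2, so dim Nul(A - 2I) = 1
# At most 1 + 1 = 2 independent eigenvectors in total: A is not diagonalizable.
```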
Is the matrix \(A = \left[\begin{array}{cccc}1 & 1 & 0 & 1 \\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right]\) diagonalizable?
These examples show the two reasons why a matrix may fail to be diagonalizable, as is made explicit in the following theorem.
The \(n \times n\) matrix \(A\) is (real) diagonalizable if and only if it satisfies the following two conditions.
i. The characteristic polynomial of \(A\) has exactly \(n\) real roots, counting multiplicities.

ii. For each eigenvalue the geometric multiplicity is equal to the algebraic multiplicity.
Proof of Theorem 6.3.1
First we show that a diagonalizable matrix satisfies the two conditions.
If \(A\) is diagonalizable, then there must be \(n\) independent eigenvectors. The sum of the dimensions of the eigenspaces \(E_{\lambda_i}\), i.e., the sum of the geometric multiplicities, must therefore be equal to \(n\). Since the algebraic multiplicities are at least as large as the geometric multiplicities, the sum of the algebraic multiplicities must be \(\geq n\). Since this sum cannot be larger than \(n\) (the characteristic polynomial has degree \(n\)), the sum is equal to \(n\). Thus all algebraic multiplicities must in fact be equal to the corresponding geometric multiplicities, and the characteristic polynomial has \(n\) real roots, counting multiplicities. This settles properties i. and ii.
Conversely, conditions i. and ii. immediately imply that there must be \(n\) linearly independent eigenvectors. The basic idea is that, since eigenvectors for different eigenvalues are automatically linearly independent, the bases for the eigenspaces put together give exactly \(n\) linearly independent eigenvectors.
(More detailed) Proof of Theorem 6.3.1
Suppose that the \(n \times n\) matrix \(A\) has only real eigenvalues, say \(\lambda_1,\ldots,\lambda_k\), and that for each eigenvalue \(\lambda_i\) the geometric multiplicity \(m_i\) is equal to the algebraic multiplicity, so
\[ \dim E(\lambda_i) = m_i = \text{the algebraic multiplicity of } \lambda_i, \quad i = 1, \ldots, k. \]
Since the sum of the algebraic multiplicities is equal to \(n\), the sum of the geometric multiplicities must be equal to \(n\) too,
\[ m_1 + m_2 + \cdots + m_k = n. \]
For each \(i\), let \(\{\vect{v}^{(i)}_1, \ldots, \vect{v}^{(i)}_{m_i}\}\) be a basis for the eigenspace \(E(\lambda_i)\). If we can show that the union of all these bases is a basis for \(\R^n\), we have a basis of eigenvectors for the matrix \(A\), and by Proposition 6.3.5 \(A\) is diagonalizable. To this end it is sufficient to show that the total set
\[ \{\vect{v}^{(1)}_1, \ldots, \vect{v}^{(1)}_{m_1},\; \vect{v}^{(2)}_1, \ldots, \vect{v}^{(2)}_{m_2},\; \ldots,\; \vect{v}^{(k)}_1, \ldots, \vect{v}^{(k)}_{m_k}\} \]
of \(n\) vectors is linearly independent. So, suppose that
\[ \underbrace{c^{(1)}_1\vect{v}^{(1)}_1 + \cdots + c^{(1)}_{m_1}\vect{v}^{(1)}_{m_1}} + \cdots + \underbrace{c^{(k)}_1\vect{v}^{(k)}_1 + \cdots + c^{(k)}_{m_k}\vect{v}^{(k)}_{m_k}} = \vect{0}. \tag{6.3.3} \]
If we introduce
\[ \vect{y}_i = c^{(i)}_1\vect{v}^{(i)}_1 + \cdots + c^{(i)}_{m_i}\vect{v}^{(i)}_{m_i}, \quad i = 1, \ldots, k, \]
we have that
\[ \vect{y}_1 + \vect{y}_2 + \cdots + \vect{y}_k = \vect{0}. \]
Since each vector \(\vect{y}_i\) lies in the eigenspace \(E(\lambda_i)\), and by Proposition 6.1.3 eigenvectors for different eigenvalues are linearly independent, it follows that
\[ \vect{y}_1 = \vect{y}_2 = \cdots = \vect{y}_k = \vect{0}. \]
So we have for each underbraced term in Equation (6.3.3)
\[ c^{(i)}_1\vect{v}^{(i)}_1 + \cdots + c^{(i)}_{m_i}\vect{v}^{(i)}_{m_i} = \vect{0}. \]
Since the vectors in a set \(\{\vect{v}^{(i)}_1, \ldots, \vect{v}^{(i)}_{m_i}\}\) were supposed to form a basis for the eigenspace \(E(\lambda_i)\), they must be linearly independent. Thus it follows for the coefficients in the last sum that
\[ c^{(i)}_1 = c^{(i)}_2 = \cdots = c^{(i)}_{m_i} = 0. \]
This shows that (6.3.3) can only hold if all coefficients are zero, and consequently the set
\[ \{\vect{v}^{(1)}_1, \ldots, \vect{v}^{(1)}_{m_1},\; \ldots,\; \vect{v}^{(k)}_1, \ldots, \vect{v}^{(k)}_{m_k}\}, \]
being a linearly independent set of \(n\) vectors, will indeed be a basis of \(\R^n\) consisting of eigenvectors.
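Theorem 6.3.1 translates directly into a numerical test. The function below is a rough sketch (assuming NumPy, with a simple tolerance-based grouping of the computed eigenvalues, so it is a heuristic rather than an exact criterion).

```python
import numpy as np

def is_diagonalizable(A, tol=1e-8):
    """Heuristic test of Theorem 6.3.1 for a real square matrix A."""
    n = A.shape[0]
    eigvals = np.linalg.eigvals(A)
    if np.any(np.abs(eigvals.imag) > tol):
        return False                       # condition i. fails: non-real eigenvalues
    eigvals = eigvals.real
    total = 0
    for lam in np.unique(np.round(eigvals, 8)):
        alg = np.sum(np.isclose(eigvals, lam, atol=tol))                  # algebraic multiplicity
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)     # geometric multiplicity
        if geo != alg:
            return False                   # condition ii. fails for this eigenvalue
        total += geo
    return total == n

print(is_diagonalizable(np.array([[2.0, 1.0], [0.0, 2.0]])))   # False
print(is_diagonalizable(np.array([[1.0, 4.0], [1.0, 1.0]])))   # True
```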
We saw that there is a weak connection between eigenvalues and (non-)invertibility: Proposition 6.1.4 states that a matrix is singular if and only if it has the eigenvalue \(0\).
In Exercise 6.3.12 below you are invited to investigate the connection (or lack thereof) between diagonalizability and invertibility.
We stated that diagonalizable matrices have nice properties. Here is one: for diagonalizable matrices, (high) powers can be computed very efficiently.
If \(A = PDP^{-1}\), then \(A^k = PD^kP^{-1}\), for \(k = 0,1,2,3, \ldots\)
For instance,
\[ A^3 = PDP^{-1}\,PDP^{-1}\,PDP^{-1} = PD(P^{-1}P)D(P^{-1}P)DP^{-1} = PD^3P^{-1}, \]
since the internal factors \(P^{-1}P\) reduce to the identity matrix \(I\), and \(ID = D\).
Check for yourself what happens if \(k = 0\).
The advantage is the following. Normally, multiplication of two \(n \times n\) matrices requires \(n\) multiplications per entry (or \(2n-1\) operations, if additions are counted as well), and there are \(n\times n\) entries to be computed. So for the \(k\)th power that requires about \(k\times n^3\) multiplications of numbers. To compute \(PD^kP^{-1}\) we need \(n\) \(k\)th powers to find \(D^k\), and we are left with one ‘simple’ matrix product \(PD^k\) and one ‘full’ matrix product.
We compute \(A^{10}\) for the matrix \(A = \left[\begin{array}{cc} 1 & 4 \\ 1 & 1 \end{array} \right]\) of Example 6.3.1.
There we already settled that \(A = PDP^{-1}\), with
\[ P = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}, \quad P^{-1} = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
We see that
\[ A^{10} = PD^{10}P^{-1} = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3^{10} & 0 \\ 0 & (-1)^{10} \end{bmatrix}\cdot\frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \tag{6.3.4} \]
This can be evaluated to yield
\[ A^{10} = \frac{1}{4}\begin{bmatrix} 2\cdot 3^{10} & -2 \\ 3^{10} & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 29525 & 59048 \\ 14762 & 29525 \end{bmatrix}. \]
An alternative way to denote the last matrix:
\[ A^{10} = \frac{1}{4}\begin{bmatrix} 2\cdot 3^{10} + 2 & 4\cdot 3^{10} - 4 \\ 3^{10} - 1 & 2\cdot 3^{10} + 2 \end{bmatrix}. \]
Note that we could have found any power of \(A\) just as easily: replacing \(10\) by \(n\) in Equation (6.3.4) gives
\[ A^{n} = \frac{1}{4}\begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3^{n} & 0 \\ 0 & (-1)^{n} \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 2\cdot 3^{n} + 2(-1)^{n} & 4\cdot 3^{n} - 4(-1)^{n} \\ 3^{n} - (-1)^{n} & 2\cdot 3^{n} + 2(-1)^{n} \end{bmatrix}. \]
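This computation is easily confirmed numerically; the snippet below (a sketch assuming NumPy) compares \(PD^{10}P^{-1}\) with repeated matrix multiplication.

```python
import numpy as np

A = np.array([[1, 4],
              [1, 1]])
P = np.array([[2, -2],
              [1,  1]])

# Power via diagonalization: only the diagonal entries are raised to the 10th power.
D10 = np.diag([3**10, (-1)**10])
A10_diag = P @ D10 @ np.linalg.inv(P)

# Power via repeated multiplication, for comparison.
A10_direct = np.linalg.matrix_power(A, 10)

print(A10_direct)                          # [[29525 59048]
                                           #  [14762 29525]]
print(np.allclose(A10_diag, A10_direct))   # True
```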
To conclude this section we return to the ‘toy’ migration model (Example 6.1.1) of this chapter to illustrate the power of diagonalization.
Suppose the migrations between two cities \(A\) and \(B\) are described by the model of Example 6.1.1, in short
\[ \vect{x}_{k+1} = M\vect{x}_{k}, \]
where \(\vect{x}_k\) contains the populations of the two cities after \(k\) time steps and \(M\) is the migration matrix of that example.
It can be shown that \(M\) has the eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 0.7\), with corresponding eigenvectors
Since \(\{\vect{v}_1, \vect{v}_2\}\) is a basis of eigenvectors, the matrix \(M\) is diagonalizable, and in fact we have
If the initial populations are given by
then
In this case we can clearly see what happens in the long run, i.e. when we let \(k\) go to infinity:
By computing \(P^{-1}\) and the product of the three matrices \(P\), \(D\) and \(P^{-1}\) we find that if \( k \to \infty\),
We may conclude that, for \( k \to \infty\),
The interpretation: in the long run the distribution of the people over the two cities approaches the steady state distribution where city \(A\) has twice as many inhabitants as city \(B\). Moreover, the total number of inhabitants of the two cities is still the same as at the beginning:
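As a numerical illustration of this long-run behaviour, here is a sketch assuming NumPy. The matrix \(M\) and the initial populations are assumptions chosen to be consistent with the stated eigenvalues and the \(2:1\) steady state; the actual data of Example 6.1.1 are not reproduced here and may differ.

```python
import numpy as np

# Assumed migration matrix with eigenvalues 1 and 0.7 and steady state ratio 2:1;
# the actual matrix of Example 6.1.1 may differ.
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])
x0 = np.array([300.0, 300.0])      # assumed initial populations of cities A and B

eigvals, P = np.linalg.eig(M)      # columns of P are eigenvectors of M
print(eigvals)                     # eigenvalues 1 and 0.7 (order may vary)

# x_k = M^k x_0 = P D^k P^{-1} x_0; as k grows, the 0.7^k term dies out.
for k in (1, 10, 50):
    xk = P @ np.diag(eigvals ** k) @ np.linalg.inv(P) @ x0
    print(k, np.round(xk, 2))      # tends to the 2:1 steady state; total stays 600
```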
6.3.3. Grasple Exercises#
Given a \(2\times 2\) matrix \(A\) and ‘diagonalizer’ \(P\), to find the diagonal matrix \(D\) such that \(A=PDP^{-1}\).
To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).
To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix of rank 1.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
For which \(\alpha\) is a given (upper triangular) \(4 \times 4\) matrix diagonalizable?
True/False question (about a \(4\times4\) matrix with \(3\) distinct eigenvalues).
True/False question (does invertibility imply diagonalizability?).
Creating examples of all cases (non-)invertible versus (non-)diagonalizable.
To draw conclusions from a diagonalization \(A = PDP^{-1}\).