6.3. Diagonalizability#
6.3.1. Similar matrices#
Two \(n \times n\) matrices \(A\) and \(B\) are called similar if they are related via the property
\[ A = PBP^{-1} \quad \text{for some invertible matrix } P. \]
Notation: \(A \sim B\).
We already used the symbol \(\sim\) earlier to denote row equivalence of (augmented) matrices. Whenever we use it, the intended meaning will be clear from the context.
In the definition it seems as if \(A\) and \(B\) play different roles, but that is not the case. This can be seen as follows. Since \((P^{-1})^{-1} = P\), we see that
\[ A = PBP^{-1} \quad \iff \quad B = P^{-1}AP = QAQ^{-1}, \quad \text{where } Q = P^{-1}, \]
so similarity works both ways, that is,
\[ A \sim B \quad \iff \quad B \sim A. \]
Similar matrices share many properties, especially as regards eigenvalues and eigenvectors.
If \(A = PBP^{-1}\), then \(A\) and \(B\) have the same eigenvalues.
Moreover, if \(\vect{v}\) is an eigenvector of \(B\), then \(P\vect{v}\) is an eigenvector of \(A\).
Proof. Suppose \(\lambda\) is an eigenvalue of \(B\), and \(\vect{v}\) is a corresponding eigenvector. We then see that
\[ A(P\vect{v}) = PBP^{-1}P\vect{v} = PB\vect{v} = P(\lambda\vect{v}) = \lambda P\vect{v}. \]
So \(AP\vect{v} = \lambda P\vect{v} \), and \(P\vect{v} \) is an eigenvector, provided it is not the zero vector. Since \(P\) is supposed to be invertible, and \(\vect{v}\) is not the zero vector, it is true that \(P\vect{v} \) is not the zero vector, and we are done.
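The proposition is easy to check numerically. The following NumPy sketch (the matrices \(B\) and \(P\) are hypothetical choices, not taken from the text) verifies both claims: \(A = PBP^{-1}\) has the same eigenvalues as \(B\), and \(P\vect{v}\) is an eigenvector of \(A\) whenever \(\vect{v}\) is one of \(B\).

```python
import numpy as np

# Hypothetical example: an upper triangular B (eigenvalues 2, 3, 5)
# and an invertible P (det P = 3).
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ B @ np.linalg.inv(P)                 # A = P B P^{-1}, so A ~ B

# Similar matrices have the same eigenvalues (up to ordering).
eig_A = np.sort(np.linalg.eigvals(A).real)
eig_B = np.sort(np.linalg.eigvals(B).real)
print(np.allclose(eig_A, eig_B))             # True

# If B v = lambda v, then A (P v) = lambda (P v).
lam, v = 3.0, np.array([1.0, 1.0, 0.0])      # B @ v equals 3 v
print(np.allclose(A @ (P @ v), lam * (P @ v)))   # True
```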
Similar matrices have the same characteristic polynomial.
Proof. Suppose \(A = PBP^{-1}\).
Then we have
\[ \det{(A - \lambda I)} = \det{(B - \lambda I)} \quad \text{for all } \lambda. \]
This equality is shown in the following chain of identities:
\[ \det{(A - \lambda I)} = \det{(PBP^{-1} - \lambda PP^{-1})} = \det{(P(B - \lambda I)P^{-1})} = \det{P}\det{(B - \lambda I)}\det{(P^{-1})} = \det{(B - \lambda I)}. \]
In fact, the first step contains the ‘smart move’, to bring in convenient factors \(P\) and \(P^{-1}\) via
\[ \lambda I = \lambda PP^{-1} = P(\lambda I)P^{-1}. \]
In the other steps we used the rule \(\det{(AB)} = \det{A}\det{B}\) and its consequence that for invertible matrices \(P\) we have
\[ \det{(P^{-1})} = (\det{P})^{-1}. \]
From Proposition 6.3.2 it follows that similar matrices have the same eigenvalues with the same algebraic multiplicities.
From Proposition 6.3.1 it follows that they also have the same geometric multiplicities. That is,
if \(\vect{v}_1, \ldots, \vect{v}_m\) are linearly independent eigenvectors of \(B\) for the eigenvalue \(\lambda_k\), and \(A = PBP^{-1}\),
then the vectors \(P\vect{v}_1, \ldots, P\vect{v}_m\) are linearly independent eigenvectors of \(A\).
And vice versa.
Fill in the details of the last remark.
One way to understand the similarity of similar matrices comes from considering the linear transformations they represent. In Section 4.3.4 it is shown that if \(T:\R^n\to\R^n\) is the linear transformation that has \(A\) as its standard matrix, and \(P = P_{\mathcal{B}}\) is the change-of-coordinates matrix from the basis \(\mathcal{B}\) to the standard basis, then the matrix of \(T\) with respect to the basis \(\mathcal{B}\) is given by \(P^{-1}AP\).
This means that if \(A\) and \(B\) are related via
\[ B = P^{-1}AP, \]
then \(A\) and \(B\) are in fact matrices of the same linear transformation, only with respect to different bases. The fact that they share the same eigenvalues, for one thing, is then not very surprising.
The following proposition captures some other properties that similar matrices share.
Suppose \(A\) and \(B\) are similar matrices. Then the following statements are true.
i. \(\det{A} = \det{B}\).

ii. If \(A\) is invertible, then \(B\) is invertible (and vice versa).

iii. \(A\) and \(B\) have the same rank.
Proof of Proposition 6.3.3
Suppose \(A = PBP^{-1}\).
i. As in the proof of the equality of the characteristic polynomials (Proposition 6.3.2) we have: if \(A = PBP^{-1}\), then
\[ \det{A} = \det{(PBP^{-1})} = \det{P}\det{B}\det{(P^{-1})}, \]
which can be rewritten as follows:
\[ \det{P}\det{B}\det{(P^{-1})} = \det{P}\det{B}(\det{P})^{-1} = \det{B}. \]

ii. Follows immediately from i.: a matrix \(A\) is invertible \(\quad \iff \quad \det{(A)} \neq 0\).

iii. We can use the identities of Proposition 4.2.7 from the section ‘Basis and Dimension’. Since \(P\) and \(P^{-1}\) are both invertible, we find: if \(A = PBP^{-1}\), then
\[ \text{rank}(A) = \text{rank}(PBP^{-1}) = \text{rank}(PB) = \text{rank}(B). \]
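These shared properties can also be checked numerically. A small NumPy sketch (with a hypothetical singular \(B\) and invertible \(P\), not taken from the text):

```python
import numpy as np

# Hypothetical pair of similar matrices A = P B P^{-1}.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])                   # singular: det B = 0, rank 1
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])                   # invertible: det P = 1
A = P @ B @ np.linalg.inv(P)

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))        # True: equal determinants
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))  # True: equal ranks
```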
6.3.2. Diagonalizability#
A matrix \(A\) is called diagonalizable if it is similar to a diagonal matrix. That means that a diagonal matrix \(D\) and an invertible matrix \(P\) exist such that
\[ A = PDP^{-1}. \]
We then say that \(PDP^{-1}\) is a diagonalization of \(A\).
An equivalent alternative characterization of diagonalizability is given in the following proposition.
An \(n \times n\) matrix \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. Such a set of eigenvectors then forms a basis for \(\R^n\).
Since this proposition is such a pillar on which much of the theory of matrices rests, and diagonalizable matrices are important because they are in many respects easy to work with, we give two proofs.
Proof. The first proof is algebraic. First we note that
\[ A = PDP^{-1} \quad \iff \quad AP = PD. \]
Next we write out these last matrix products column by column. Writing \(P = \big[\, \vect{p}_1\ \vect{p}_2\ \cdots\ \vect{p}_n \,\big]\) and \(D\) for the diagonal matrix with diagonal entries \(d_1, \ldots, d_n\), we have
\[ AP = \big[\, A\vect{p}_1\ A\vect{p}_2\ \cdots\ A\vect{p}_n \,\big] \]
and
\[ PD = \big[\, d_1\vect{p}_1\ d_2\vect{p}_2\ \cdots\ d_n\vect{p}_n \,\big], \]
so
\[ AP = PD \quad \iff \quad A\vect{p}_i = d_i\vect{p}_i \quad \text{for } i = 1, \ldots, n. \]
Comparing \(AP\) and \(PD\) column by column, we see that \(A\vect{p}_i = d_i\vect{p}_i\) for \(n\) linearly independent vectors in \(\R^n\); namely, an invertible matrix \(P\) has linearly independent columns.
The second proof has a geometric flavour.
Proof. First we show that diagonalizability implies the existence of \(n\) linearly independent eigenvectors.
If \(A = PDP^{-1}\), then by Proposition 6.3.1 \(A\) and \(D\) have the same eigenvalues, and the relation between the eigenvectors is: if \(\vect{v}\) is an eigenvector of \(D\) for the eigenvalue \(\lambda\), then \(P\vect{v}\) is an eigenvector of \(A\) for the same \(\lambda\).
The eigenvalues of \(D\) are simply the diagonal entries \(d_i\) with the vectors \(\vect{e}_i\) of the standard basis as corresponding eigenvectors.
Thus \(A = PDP^{-1}\) has the eigenvalues \(d_i\) with corresponding eigenvectors \(P\vect{e}_i = \vect{p}_i\). Thus the \(n\) columns of \(P\), which are linearly independent since \(P\) is invertible, give a basis of eigenvectors for \(A\).
The other half is a bit more involved. It relies on the transformation formula of matrix representations (see Proposition 4.3.6).
Let \(T: \R^n \to \R^n\) be the linear transformation with standard matrix \(A\), i.e., \(T(\vect{x}) = A\vect{x}\), and suppose \(A\) has \(n\) linearly independent eigenvectors \(\vect{v}_1, \ldots, \vect{v}_n\). Let \(\lambda_1, \ldots, \lambda_n\) denote the eigenvalues. So \(A\vect{v}_i =\lambda_i\vect{v}_i\).
For the basis \(\mathcal{B} = (\vect{v}_1, \ldots, \vect{v}_n)\) we then see that \(T(\vect{v}_i) = \lambda_i\vect{v}_i\), so the matrix of \(T\) with respect to \(\mathcal{B}\) is the diagonal matrix
\[ D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \]
and the transformation formula gives
\[ D = P^{-1}AP, \]
where
\[ P = \big[\, \vect{v}_1\ \vect{v}_2\ \cdots\ \vect{v}_n \,\big] \]
is the change-of-coordinates matrix from \(\mathcal{B}\) to the standard basis.
Lastly, the identity \(D=P^{-1}AP\) in Equation (6.3.1) is equivalent to \(A = PDP^{-1}\).
We verify the relation \(A = PDP^{-1}\) for the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) we studied before. We found that \(A\) has the eigenvalues \(\lambda_1 = 3\), \(\lambda_2 = -1\), with corresponding eigenvectors \(\vect{v}_1 = \begin{bmatrix} 2 \\1 \end{bmatrix}\) and \(\vect{v}_2 = \begin{bmatrix} -2 \\1 \end{bmatrix}\).
Thus for a diagonalization of \(A\) we can take
\[ P = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}. \]
We will check that this is okay. To start with,
\[ P^{-1} = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}, \]
so
\[ PDP^{-1} = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}\cdot\frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 6 & 2 \\ 3 & -1 \end{bmatrix}\cdot\frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
The last product equals
\[ \frac{1}{4}\begin{bmatrix} 4 & 16 \\ 4 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix} = A, \]
as it should.
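The same check can be done in a few lines of NumPy (a sketch for the matrices of this example):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
# Eigenvectors as columns of P, eigenvalues in the same order on the diagonal of D.
P = np.array([[2.0, -2.0],
              [1.0, 1.0]])
D = np.diag([3.0, -1.0])

print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}
```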
Note that the diagonalization is not unique: the order of the eigenvalues is free to choose, and the eigenvectors may be scaled. However, the order of the eigenvectors in \(P\) must correspond to the order of the eigenvalues on the diagonal of \(D\). For instance, for the matrix \(A\) at hand, an alternative diagonalization is given by
\[ A = \begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} -2 & 2 \\ 1 & 1 \end{bmatrix}^{-1}. \]
Are all matrices diagonalizable? Most certainly not, as the following two examples, studied before, show.
The matrix \(R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\) of Example 6.1.8 does not have any (real) eigenvalues, so also no eigenvectors. Hence it cannot be diagonalized.
Things would be different if we allowed complex eigenvalues and eigenvectors. We will devote a special section to this; there it will appear that the matrix \(R\) is complex diagonalizable.
In the previous example there were not enough (real) eigenvalues for the matrix \(R\) to be real diagonalizable. The following example shows another reason why a matrix can fail to be diagonalizable.
The matrix \(A = \left[\begin{array}{cc} 2 & 1 \\ 0 & 2 \end{array}\right]\) has the double eigenvalue \(\lambda_1 = \lambda_2 = 2\).
Since \(A - 2I = \left[\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array}\right]\) has rank 1, there is only one independent eigenvector.
Thus there does not exist a basis of eigenvectors for \(A\), and consequently the matrix \(A\) is not diagonalizable.
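The geometric multiplicity used here can be computed as \(n - \text{rank}(A - \lambda I)\); a short NumPy sketch for this example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0                                    # double eigenvalue of A

# geometric multiplicity = dim Nul(A - lam I) = n - rank(A - lam I)
n = A.shape[0]
geo_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geo_mult)                              # 1, less than the algebraic multiplicity 2
```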
The matrix \(A = \left[\begin{array}{ccc} 4 & -1 & 2 \\0 & 3 & 0 \\ 1 & 2 & 1 \end{array} \right]\) of Example 6.2.4 and Example 6.2.5 provides another example of this phenomenon. It has two eigenvalues: \(\lambda_1=3\), of algebraic multiplicity 2, and \(\lambda_2 = 2\), of algebraic multiplicity 1. There is only one independent eigenvector for \(\lambda_{1}\). This, together with the single independent eigenvector for \(\lambda_2\), gives a maximal set of two linearly independent eigenvectors for \(A\). So this matrix \(A\) is again not diagonalizable.
Is the matrix \(A = \left[\begin{array}{cccc}1 & 1 & 0 & 1 \\ 0 & 2 & 0 & 0\\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{array} \right]\) diagonalizable?
These examples illustrate the two reasons why a matrix may fail to be diagonalizable, as is made explicit in the following proposition.
The \(n \times n\) matrix \(A\) is (real) diagonalizable if and only if it satisfies the following two conditions.
i. The characteristic polynomial of \(A\) has exactly \(n\) real roots, counting multiplicities.

ii. For each eigenvalue the geometric multiplicity is equal to the algebraic multiplicity.
Proof. First we show that a diagonalizable matrix satisfies the two conditions.
If \(A\) is diagonalizable, then there must be \(n\) independent eigenvectors. The sum of the dimensions of the eigenspaces \(E_{\lambda_k}\), i.e., the sum of the geometric multiplicities, must therefore be equal to \(n\). Since each algebraic multiplicity is at least as large as the corresponding geometric multiplicity, the sum of the algebraic multiplicities must be \(\geq n\). Since this sum cannot be larger than \(n\), it must be equal to \(n\). Thus each algebraic multiplicity must in fact be equal to the corresponding geometric multiplicity. This settles properties i. and ii.
Conversely, conditions i. and ii. immediately imply that there must be \(n\) linearly independent eigenvectors: since eigenvectors for different eigenvalues are automatically linearly independent, bases for the eigenspaces, put together, give exactly \(n\) linearly independent eigenvectors.
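The two conditions of the proposition can be turned into a rough numerical check. In the sketch below, `is_diagonalizable` is a hypothetical helper (not from the text) that assumes the eigenvalues and their algebraic multiplicities are known exactly:

```python
import numpy as np

def is_diagonalizable(A, eigenvalues):
    """eigenvalues: list of (lambda, algebraic multiplicity), known exactly."""
    n = A.shape[0]
    # Condition i: the characteristic polynomial has n real roots, counting multiplicities.
    if sum(m for _, m in eigenvalues) != n:
        return False
    # Condition ii: each geometric multiplicity equals the algebraic multiplicity.
    for lam, m in eigenvalues:
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))
        if geo != m:
            return False
    return True

print(is_diagonalizable(np.array([[1.0, 4.0], [1.0, 1.0]]), [(3.0, 1), (-1.0, 1)]))  # True
print(is_diagonalizable(np.array([[2.0, 1.0], [0.0, 2.0]]), [(2.0, 2)]))             # False
```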
We saw that there is a weak connection between eigenvalues and (non-)invertibility:
Proposition 6.1.4 states: a matrix is singular if and only if it has the eigenvalue \(0\).
In Exercise 6.3.11 below you are invited to investigate the connection (or lack thereof) between diagonalizability and invertibility.
We stated that diagonalizable matrices have nice properties. Here is one: for diagonalizable matrices finding (high) powers can be done very efficiently.
If \(A = PDP^{-1}\), then \(A^k = PD^kP^{-1}\), for \(k = 0,1,2,3, \ldots\)
For instance,
\[ A^2 = PDP^{-1}PDP^{-1} = PD(P^{-1}P)DP^{-1} = PDIDP^{-1} = PD^2P^{-1}, \]
since the internal factors \(P^{-1}P\) reduce to the identity matrix \(I\), and \(ID = D\).
Check for yourself what happens if \(k = 0\).
The advantage is the following. Normally, multiplication of two \(n \times n\) matrices requires \(n\) multiplications per entry (or \(2n-1\) operations, if additions are counted as well), and there are \(n\times n\) entries to be computed. So computing the \(k\)th power requires about \(k\times n^3\) multiplications of numbers. To compute \(PD^kP^{-1}\) instead, we only need \(n\) \(k\)th powers of numbers to find \(D^k\), after which we are left with one ‘simple’ matrix product \(PD^k\) and one ‘full’ matrix product with \(P^{-1}\).
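This speed-up is easy to demonstrate with NumPy; the sketch below uses the diagonalization of the matrix \(A = \begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\) found earlier and compares the result with repeated multiplication:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, -2.0],
              [1.0, 1.0]])
D = np.diag([3.0, -1.0])                     # A = P D P^{-1}

k = 10
Dk = np.diag(np.diag(D) ** k)                # only n scalar k-th powers
Ak = P @ Dk @ np.linalg.inv(P)               # plus two matrix products

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```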
We compute \(A^{10}\) for the matrix \(A = \left[\begin{array}{cc} 1 & 4 \\ 1 & 1 \end{array} \right]\) of Example 6.3.1.
There we already settled that \(A = PDP^{-1}\), with
\[ P = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}, \quad P^{-1} = \frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
We see that
\[ A^{10} = PD^{10}P^{-1} = \begin{bmatrix} 2 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3^{10} & 0 \\ 0 & (-1)^{10} \end{bmatrix}\cdot\frac{1}{4}\begin{bmatrix} 1 & 2 \\ -1 & 2 \end{bmatrix}. \]
This can be evaluated to yield
\[ A^{10} = \frac{1}{4}\begin{bmatrix} 2\cdot 3^{10}+2 & 4\cdot 3^{10}-4 \\ 3^{10}-1 & 2\cdot 3^{10}+2 \end{bmatrix}. \]
An alternative way to denote the last matrix:
\[ A^{10} = \begin{bmatrix} 29525 & 59048 \\ 14762 & 29525 \end{bmatrix}. \]
Note that we could have found any power of \(A\) just as easily: replacing \(10\) by \(k\) in the computation above gives
\[ A^{k} = \frac{1}{4}\begin{bmatrix} 2\cdot 3^{k}+2(-1)^{k} & 4\cdot 3^{k}-4(-1)^{k} \\ 3^{k}-(-1)^{k} & 2\cdot 3^{k}+2(-1)^{k} \end{bmatrix}. \]
To conclude this section we return to the ‘toy’ migration model (Example 6.1.1) of this chapter to illustrate the power of diagonalization.
Suppose the migrations between two cities \(A\) and \(B\) are described by the model
\[ \vect{x}_{k+1} = M\vect{x}_k, \]
where \(M\) is the migration matrix and \(\vect{x}_k\) contains the populations of the two cities after \(k\) years. It can be shown that \(M\) has the eigenvalues \(\lambda_1 = 1\) and \(\lambda_2 = 0.7\), with corresponding eigenvectors
\[ \vect{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \vect{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. \]
Since \(\{\vect{v}_1, \vect{v}_2\}\) is a basis of eigenvectors, the matrix \(M\) is diagonalizable, and in fact we have
\[ M = PDP^{-1}, \quad \text{where } P = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 \\ 0 & 0.7 \end{bmatrix}. \]
If the initial populations are given by the vector \(\vect{x}_0\), then
\[ \vect{x}_k = M^k\vect{x}_0 = PD^kP^{-1}\vect{x}_0. \]
In this case we can clearly see what happens in the long run, i.e. when we let \(k\) go to infinity:
\[ D^k = \begin{bmatrix} 1 & 0 \\ 0 & 0.7^k \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]
By computing \(P^{-1}\) and the product of the three matrices \(P\), \(D^k\) and \(P^{-1}\) we find that, if \( k \to \infty\),
\[ M^k = PD^kP^{-1} \rightarrow \frac{1}{3}\begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}. \]
We may conclude that, for \( k \to \infty\),
\[ \vect{x}_k \rightarrow \frac{1}{3}\begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}\vect{x}_0. \]
The interpretation: in the long run the distribution of the people over the two cities approaches the steady state distribution in which city \(A\) has twice as many inhabitants as city \(B\). Moreover, the total number of inhabitants of the two cities is still the same as at the beginning:
\[ x_k + y_k = x_0 + y_0 \quad \text{for all } k, \]
where \(x_k\) and \(y_k\) denote the populations of cities \(A\) and \(B\) after \(k\) years.
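The long-run behaviour can also be simulated directly. In the sketch below the migration matrix \(M\) and the initial populations are assumptions, not taken from the text: \(M\) is the column-stochastic matrix consistent with the stated eigenvalues \(1\) and \(0.7\) and the 2:1 steady state, and the initial populations are hypothetical.

```python
import numpy as np

# Assumed migration matrix (column sums 1, eigenvalues 1 and 0.7, steady state 2:1).
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

x = np.array([300.0, 300.0])   # hypothetical initial populations of cities A and B
total = x.sum()

for _ in range(100):           # iterate x_{k+1} = M x_k
    x = M @ x

print(np.allclose(x, [2 * total / 3, total / 3]))   # True: city A ends with twice B's population
print(np.isclose(x.sum(), total))                   # True: total population preserved
```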
6.3.3. Grasple Exercises#
Given a \(2\times 2\) matrix \(A\) and ‘diagonalizer’ \(P\), to find the diagonal matrix \(D\) such that \(A=PDP^{-1}\).
To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).
To find a diagonalization of a \(2\times 2\) matrix (insofar as it exists).
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
To investigate the diagonalizability of a (\(3 \times 3\)) matrix.
For which \(\alpha\) is given (upper triangular) \(4 \times 4\) matrix diagonalizable?
True/False question (about a \(4\times4\) matrix with \(3\) distinct eigenvalues).
True/False question (invertibility implies diagonalizability?)
Creating examples of all cases (non-)invertible versus (non-)diagonalizable.
To draw conclusions from a diagonalization \(A = PDP^{-1}\).