3.6. The \(LU\) decomposition#
As we have seen, one way to solve a linear system \(A\vect{x}=\vect{b}\) is to row reduce it to echelon form and then use back substitution. See for instance Example 2.1.8.
In this section we will learn how to solve an \(m\times n\) linear system \(A\mathbf{x}=\mathbf{b}\) by decomposing (or factorising) a matrix into a product of two ‘special’ matrices \(L\) and \(U\).
This is called an \(LU\)-decomposition of the matrix \(A\).
Consider the system \(A\mathbf{x} = \mathbf{b}\) where
To start with, the matrix \(A\) can be factorised as
We will see how this helps to solve the system in an efficient way.
With this factorisation we can solve the system in two steps, using forward and backward substitution. In fact, the matrix \(U\) is an echelon matrix equivalent to \(A\). To solve the equation \(LU\vect{x} = L(U\vect{x}) =\vect{b}\) we can introduce the auxiliary vector \(\vect{y}\) via
We then first solve
and after that we find \(\mathbf{x}\) by solving
where \(\tilde{\vect{y}}\) denotes the solution of system (3.6.1). For the solution \(\tilde{\vect{x}}\) of this second system we then have
We illustrate matters with the linear system of Example 3.6.1.
The system \(A\vect{x} = \vect{b}\) of Example 3.6.1 can be written as
By setting \(\mathbf{y} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 6 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1\\ x_2 \\ x_3 \end{bmatrix}\) we have
With forward substitution we can find the solution:
Next we solve
by back substitution. This gives the solution
which is the solution to our original system. Since the factorisation was given, we did not have to solve the system from scratch by trying to find an echelon form.
Here is one to try for yourself.
To solve a system \(LU\vect{x} = \vect{b}\).
Show/Hide Content
There are several methods for factorising matrices. The factorisations that we will see in this section use direct methods, i.e., methods that produce the exact factorisation in a finite number of steps.
In the next subsection we will address the questions of whether an \(LU\) decomposition always exists and, if so, how to construct it. With an extra condition on \(L\) the decomposition, if it exists, will be unique. In the case where an \(LU\) decomposition does not exist we can instead consider the slightly more general \(PLU\) decomposition. In the remainder of the section we will consider the generalization to non-square matrices and we will analyze to what extent the \((P)LU\) decomposition gives an efficiency boost.
3.6.1. \(LU\) decomposition of a square matrix#
Let \(A\) be an \(n\times n\) matrix. An \(LU\) decomposition of \(A\) is a factorization of the type
where \(L\) is an \(n\times n\) lower triangular matrix with \(1\)s on the diagonal, and \(U\) an \(n\times n\) upper triangular matrix. So,
For convenience we have used blanks for zeros.
The next proposition captures the main interest of the \(LU\) decomposition, as was already illustrated in Example 3.6.2.
Suppose that \(A=LU\), so that a linear system of equations \(A\mathbf{x}=\mathbf{b}\) can be written as \(LU\mathbf{x}=\mathbf{b}\). Then, by setting \(\mathbf{y} = U\mathbf{x}\), we can solve the linear system in two steps.
Solve the system \(L\mathbf{y}=\mathbf{b}\) to find \(\mathbf{y}\).
Solve the system \(U\mathbf{x}=\mathbf{y}\) to find \(\mathbf{x}\).
The solution of the second system is then a solution of the system \(A\mathbf{x}=\mathbf{b}\).
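In code the two steps are exactly a forward and a backward substitution. The following is a minimal Python sketch (the function names are our own, not taken from any standard library), assuming \(L\) has \(1\)s on its diagonal and \(U\) has nonzero diagonal entries:

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L with 1s on its diagonal."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        # The diagonal entry of L is 1, so no division is needed.
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def backward_substitution(U, y):
    """Solve U x = y for an upper triangular U with nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def lu_solve(L, U, b):
    """Solve (LU) x = b in two steps: first L y = b, then U x = y."""
    return backward_substitution(U, forward_substitution(L, b))
```

For an \(LU\) decomposition of an invertible matrix \(A\), calling `lu_solve(L, U, b)` then returns the solution of \(A\mathbf{x}=\mathbf{b}\).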
The next proposition tells us which matrices do have an \(LU\) decomposition.
A matrix \(A\) can be written as \(A = LU\), with \(L\) and \(U\) as described in Definition 3.6.1, if and only if it can be row reduced to an echelon matrix with only additions of multiples of rows to rows below it. We will call this a top-down row reduction.
Instead of giving a formal proof, we will illustrate matters first with an example. In this example you will see how an \(LU\)-decomposition of a matrix is found via a top-down row reduction that keeps track of the row operations involved.
We consider the matrix
To find an \(LU\) decomposition we will work towards an echelon form in the top-down direction. In addition, we will keep track of the actions we perform. For the first step,
The numbers \(2/3\) and \(1/3\) are called multipliers.
A second row reduction step, involving the multiplier \(1/2\), leads to an echelon/upper triangular matrix.
We can carry out these row operations by matrix multiplications.
In the same way the second row reduction step can be described by
The echelon matrix \(A_3\) can act as our upper triangular matrix \(U\), and the above computations can be summarized as
where \(F = F_2F_1\).
We conclude that
which is indeed a product of a lower triangular matrix \(L\) (with 1’s on its diagonal) and an upper triangular matrix \(U\).
The above procedure shows how in principle an \(LU\) decomposition can be computed. However, writing down the explicit elementary-like matrices \(F_i\), computing their product \(F\), and finding the inverse of \(F\) is unnecessary. Look at the matrix \(L = F^{-1}\) we found! The entries below the diagonal are exactly the numbers \(2/3\), \(1/3\) and \(1/2\) that we called the multipliers. The following algorithm describes this ‘shortcut’ to find an \(LU\) decomposition.
Suppose the \(n\times n\) matrix \(A\) can be row reduced top-down to the echelon matrix \(U\). If the numbers \(m_{jk}\) denote the multiples of the \(k\)th row that are subtracted from the rows below it in the \(k\)th step (so \(1 \leq k < j \leq n\)), then
Another look at Example 3.6.3 explains why the detour via the matrices \(F_i\) and the inverse of their product can be skipped, as is expressed in the above algorithm.
Consider again the matrix \(A= \begin{bmatrix} 3 & 1 & -2 \\ 2 & 4 & 1 \\ 1 & 2 & 1 \end{bmatrix}\) of Example 3.6.3.
The ‘trick’ is not to work with the matrices \(F_1\) and \(F_2\), but with their inverses.
The first row reduction step
can be rewritten as
Likewise the second row reduction step
can be represented as
Combining the two equations gives
We see the multipliers nicely fall into place!
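The algorithm is just as compact in code. Below is a minimal Python sketch of Algorithm 3.6.2 (the name `lu_nopivot` is our own); it row reduces a copy of \(A\) top-down and stores each multiplier \(m_{jk}\) directly in position \((j,k)\) of \(L\). Applied to the matrix of Example 3.6.3 it reproduces the multipliers \(2/3\), \(1/3\) and \(1/2\):

```python
import numpy as np

def lu_nopivot(A):
    """Top-down LU decomposition without row exchanges (Algorithm 3.6.2).

    Returns L (lower triangular, 1s on the diagonal) and U (echelon form).
    """
    U = np.array(A, dtype=float)   # work on a copy of A
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        if U[k, k] == 0:
            raise ValueError("zero pivot: no top-down LU decomposition")
        for j in range(k + 1, n):
            m = U[j, k] / U[k, k]      # the multiplier m_{jk}
            U[j, k:] -= m * U[k, k:]   # subtract m times row k from row j
            L[j, k] = m                # ... and store m_{jk} in L
    return L, U

# The matrix of Example 3.6.3:
A = np.array([[3., 1., -2.],
              [2., 4., 1.],
              [1., 2., 1.]])
L, U = lu_nopivot(A)
print(L)                      # multipliers 2/3, 1/3, 1/2 below the diagonal
assert np.allclose(L @ U, A)  # the factors restore A
```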
Here is an example where we apply the algorithm without further ado to a \(4 \times 4\) matrix.
We apply the \(LU\) algorithm to the matrix \(A = \begin{bmatrix} 2 & 1 & -2 & 3 \\ 2 & -3 & -4 & 7 \\ -4 & 0 & -2 & -5 \\ -6 & -1 & 8 & -8 \end{bmatrix}.\)
We row reduce the matrix \(A\) top-down, diligently registering the multipliers. Recall that the multiplier \(m_{jk}\) is defined as the multiple of row \(k\) that is subtracted from row \(j\) in the \(k\)-th step of the reduction process.
Here we go:
Putting every multiplier in its right place gives
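As a check, the `lu_nopivot` sketch above (our rendering of Algorithm 3.6.2) reproduces this computation: the product of the two factors restores \(A\).

```python
import numpy as np

A = np.array([[ 2.,  1., -2.,  3.],
              [ 2., -3., -4.,  7.],
              [-4.,  0., -2., -5.],
              [-6., -1.,  8., -8.]])

L, U = lu_nopivot(A)          # the sketch from Algorithm 3.6.2 above
assert np.allclose(L @ U, A)  # the factorisation restores A
```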
For the matrix \(A= \begin{bmatrix} 1 & 2 & 1 \\ 4 & 8 & 6 \\ 2 & 5 & 4 \end{bmatrix}\) the procedure breaks down at the second step.
The next step towards an echelon matrix needs a row swap. But then the lower triangular structure of \(L\) will be broken. In Subsection 3.6.2 we will study what we can do in such a situation.
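Numerically the breakdown is easy to spot: after one top-down elimination step, the pivot position \((2,2)\) contains a zero, so the `lu_nopivot` sketch above would raise an error here. A minimal illustration:

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [4., 8., 6.],
              [2., 5., 4.]])

# One top-down elimination step, with multipliers 4 and 2:
A1 = A.copy()
A1[1] -= 4 * A1[0]   # row 2 minus 4 times row 1
A1[2] -= 2 * A1[0]   # row 3 minus 2 times row 1
print(A1)
# [[1. 2. 1.]
#  [0. 0. 2.]
#  [0. 1. 2.]]
# The zero in pivot position (2,2) stalls the top-down reduction.
```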
Here’s one to try for yourself.
To compute the \(LU\)-decomposition of a 3x3 matrix \(A\).
Show/Hide Content
Readers who are not satisfied with two examples to show that Algorithm 3.6.2 works can have a look at the following (informal) proof of Proposition 3.6.1. It also proves the statement that the existence of an \(LU\) decomposition of a matrix \(A\) implies that \(A\) can be row reduced top-down to a matrix in echelon form.
Proof of Proposition 3.6.1
Suppose \(A\) is an \(n\times n\) matrix that can be row reduced top-down to an echelon matrix \(U\) (which will then be upper triangular). We can row reduce \(A\) from top to bottom, in the same fashion as in Example 3.6.4. For instance, the first two steps are
where \(A_2\) will be of the form
The next step will lead to
where \(A_3\) will be of the form
We thus end up with a product
where each matrix \(L_k\) is a matrix containing the multipliers of the \(k\)th step in the process.
The crucial thing is that in the product \(L_{n-1}L_{n-2}\cdots L_2L_1\) in this order the multipliers do not ‘interact’. For instance, for \(n = 4\), we would get
At the end of this section we will analyze whether an \(LU\) decomposition, if it exists, is unique. To answer that question we need the following properties of lower and upper triangular matrices.
Suppose \(A\) and \(B\) are lower triangular matrices with \(1\)s on their diagonals. Then the following properties hold.
\(AB\) is also a lower triangular matrix with \(1\)s on its diagonal.
\(A^{-1}\) is also a lower triangular matrix with \(1\)s on its diagonal.
In both properties ‘lower’ can be replaced by ‘upper’.
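These closure properties are easy to verify numerically for any concrete pair of matrices; here is a small sanity check with two hypothetical \(3\times 3\) examples:

```python
import numpy as np

# Two hypothetical unit lower triangular matrices.
A = np.array([[1., 0., 0.],
              [2., 1., 0.],
              [3., 4., 1.]])
B = np.array([[ 1., 0., 0.],
              [-1., 1., 0.],
              [ 5., 2., 1.]])

for M in (A @ B, np.linalg.inv(A)):
    assert np.allclose(M, np.tril(M))    # still lower triangular
    assert np.allclose(np.diag(M), 1.0)  # still 1s on the diagonal
```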
There are several ways to prove Proposition 3.6.2. The best would be to think of a proof yourself, but you can also have a peek at the exposition below.
Proof of Proposition 3.6.2
Suppose \(A\) and \(B\) are lower triangular matrices with \(1\)s on their diagonals.
One way to prove that \(AB\) also has these properties is to use the column-row expansion of the product. (Cf. Exercise 3.2.8.)
Let \(\vect{a}_k\) be the \(k\)th column of \(A\), and \(\vect{b}^{(k)}\) the \(k\)th row of \(B\). Then\[\begin{split} \vect{a}_k = \begin{bmatrix}0 \\ \vdots \\ 0 \\ a_{kk} \\ \vdots \\ a_{nk} \end{bmatrix} \quad \text{and} \quad \vect{b}^{(k)} = \begin{bmatrix}b_{k1} & \cdots & b_{kk} & 0 & \cdots & 0 \end{bmatrix}, \end{split}\]where moreover \(a_{kk} = b_{kk} = 1\). So the \(k\)th term in the column-row expansion of \(AB\) becomes
\[\begin{split} \begin{array}{rcl} \vect{a}_k\vect{b}^{(k)} &=& \begin{bmatrix}b_{k1}\vect{a}_k & b_{k2}\vect{a}_k & \cdots & b_{kk}\vect{a}_k & \vect{0} & \cdots & \vect{0}\end{bmatrix} \\ &=& \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ a_{kk}b_{k1} & a_{kk}b_{k2} & \cdots & a_{kk}b_{kk} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{nk}b_{k1} & a_{nk}b_{k2} & \cdots & a_{nk}b_{kk} & 0 & \cdots & 0 \end{bmatrix} \end{array}. \end{split}\]This is a lower triangular matrix, with a \(1\) on position \((k,k)\) and for the rest \(0\)s on the diagonal. Adding these \(n\) column-row products gives a matrix of the required form.
Next suppose that \(A\) is a lower triangular matrix with \(1\)s on the diagonal. If we apply the algorithm of Proposition 3.4.7 to find the inverse using the augmented matrix
\[\begin{split} [\,A | I \,] = \left[\begin{array}{ccccc|ccccc} 1 & 0 & 0 & \cdots & 0 & 1 & 0 & 0 & \cdots & 0 \\ a_{21} & 1 & 0 & \cdots & 0 & 0 & 1 & 0 & \cdots & 0 \\ a_{31} & a_{32} & 1 & \cdots & 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots &\vdots & & \ddots & 0 & \vdots & \vdots & & \ddots & 0 \\ a_{n1} &a_{n2} & a_{n3} & \cdots & 1 & 0 & 0 & 0 & \cdots & 1 \end{array} \right], \end{split}\]to row reduce \(A\) to \(I\) we only need to subtract multiples of rows from rows below it. Thus in the matrix to the right of the bar, where the identity matrix is transformed into the inverse matrix of \(A\), the triangular structure with \(1\)s on the diagonal remains.
The uniqueness issue is not of major importance, but for completeness’ sake we phrase it in a proposition, followed by a possible argument to prove it.
For an invertible matrix \(A\), if it has an \(LU\)-decomposition, it will be unique.
Proof of Proposition 3.6.3
Suppose \(A\) is an invertible matrix with two \(LU\) decompositions
We then have to show that
Since \(A\) is invertible, the matrices \(L_1, L_2, U_1, U_2\) are also invertible.
From
it follows that
From Proposition 3.6.2 we know that \(L_2^{-1}\) is a lower triangular matrix with \(1\)s on the diagonal, and that the product \(L_2^{-1}L_1\) is also of this form. At the same time this product equals \(U_2U_1^{-1}\), which is an upper triangular matrix. A matrix that is both lower triangular with \(1\)s on the diagonal and upper triangular must be the identity matrix. Then (3.6.3) implies that
from which the identities
immediately follow.
Lastly we give an example to show that the uniqueness may fail for a singular matrix.
For the matrix \(A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{bmatrix}\) it holds that
where the parameter \(a\) can be chosen freely.
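One such family (with \(U\) having a single nonzero row) puts the free parameter \(a\) in position \((3,2)\) of \(L\); since the second row of \(U\) is zero, \(a\) never affects the product. A quick numerical check of this claim:

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [2., 2., 2.],
              [3., 3., 3.]])
U = np.array([[1., 1., 1.],
              [0., 0., 0.],
              [0., 0., 0.]])

for a in (0.0, 1.0, -3.5):           # any value of a works
    L = np.array([[1., 0., 0.],
                  [2., 1., 0.],
                  [3., a,  1.]])
    assert np.allclose(L @ U, A)     # every choice of a gives A back
```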
3.6.2. Generalization to non-square matrices and \(PLU\) decomposition#
In this section we describe what can be said regarding \(LU\) decompositions for non-square matrices, and present a ‘workaround’ for matrices for which there is no top-down row reduction to echelon form.
We start with an example to show that not much changes if we apply Algorithm 3.6.2 to non-square matrices.
The matrix \(A\) is given by
We can row reduce \(A\) top-down to an echelon matrix:
We then have
If we want to solve an equation like
we can use the same two-step procedure as in Example 3.6.2.
For instance, to solve the system
we can first solve
Using forward substitution we find the solution
and then the system
gives the solution(s)
The generalization of Proposition 3.6.1 to non-square matrices is captured in the next proposition.
An \(m\times n\) matrix \(A\), with \(m \leq n\), can be written as \(A = LU\), with \(L\) an \(m\times m\) lower triangular matrix with \(1\)s on its diagonal and \(U\) an echelon matrix, if and only if \(A\) can be row reduced top-down to the echelon matrix \(U\).
Moreover, if the rows of \(A\) are linearly independent the matrices \(L\) and \(U\) are unique.
A similar proposition holds for \(m \times n\) matrices with \(m > n\). However, in this case the systems \(A\vect{x} = \vect{b}\) are inconsistent for most vectors \(\vect{b}\), and then other techniques come into play (e.g., see Section 7.4.2).
In the remainder of this subsection we address the next best thing we can do in case a matrix does not have an \(LU\) decomposition.
Let us return to Example 3.6.6 where the matrix \(A\) cannot be written as \(LU\). We copy from there
We need a row swap to bring the last matrix to echelon form. The key idea is to perform all the row exchanges first and then add multiples of rows (top-down) to other rows to obtain an echelon form. Here the one row exchange is captured by the matrix
We get
This is what we will call a \(PLU\) decomposition.
Note that first subtracting row 1 four times from row 2 and two times from row 3, followed by exchanging rows 2 and 3, leads to the same echelon matrix as first swapping rows 2 and 3 and then subtracting row 1 two times from row 2 and four times from row 3.
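Assuming \(P\) is the permutation matrix that swaps rows \(2\) and \(3\), this claim can be checked with the `lu_nopivot` sketch from Algorithm 3.6.2 above: the permuted matrix \(PA\) does admit a top-down \(LU\) decomposition, and since a single row swap satisfies \(P^{-1}=P\), this also yields \(A = PLU\).

```python
import numpy as np

A = np.array([[1., 2., 1.],
              [4., 8., 6.],
              [2., 5., 4.]])
P = np.array([[1., 0., 0.],   # swaps rows 2 and 3
              [0., 0., 1.],
              [0., 1., 0.]])

L, U = lu_nopivot(P @ A)            # sketch from Algorithm 3.6.2 above
assert np.allclose(P @ A, L @ U)    # PA = LU succeeds top-down
assert np.allclose(A, P @ (L @ U))  # A = P L U, since P^{-1} = P here
```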
A permutation matrix is an \(n \times n\) matrix \(P\) with only entries \(0\) and \(1\), in such a way that each row and each column contains exactly one \(1\).
Two \(4\times 4\) permutation matrices are
Note that for an arbitrary \(4 \times 4\) matrix \(A\) we have
and
In general (pre-)multiplication of a matrix \(A\) with any permutation matrix reorders the rows of \(A\).
The following properties of permutation matrices come in handy.
The product of two \(n\times n\) permutation matrices is again a permutation matrix.
The inverse of a permutation matrix is its transpose. Thus, \(P^{-1} = P^T\).
The following example provides some illustrations, and the interested reader may open the proof below it.
For the matrices \(P_1\) and \(P_2\) from (3.6.4) we have
and
Note that from the second identity it follows that \(P_2^{-1} = P_2^T\).
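Both properties can also be verified numerically; here is a small sketch with sample permutation matrices (not necessarily the \(P_1\) and \(P_2\) from (3.6.4)):

```python
import numpy as np

P = np.array([[0., 1., 0., 0.],
              [0., 0., 0., 1.],
              [1., 0., 0., 0.],
              [0., 0., 1., 0.]])
Q = np.eye(4)[[3, 2, 0, 1]]              # another permutation matrix

R = P @ Q                                # the product is again a permutation:
assert set(R.ravel()) == {0.0, 1.0}      # only 0s and 1s ...
assert np.allclose(R.sum(axis=0), 1.0)   # ... one 1 in each column ...
assert np.allclose(R.sum(axis=1), 1.0)   # ... and one 1 in each row
assert np.allclose(P @ P.T, np.eye(4))   # P^{-1} = P^T
```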
Proof of Proposition 3.6.5
The product of two \(n\times n\) permutation matrices is again a permutation matrix.
Suppose \(P_1\) and \(P_2\) are permutation matrices. In the product \(P_1P_2\) the rows of \(P_2\) are reordered, which leaves the defining properties (a \(1\) in each row, a \(1\) in each column, all other entries equal to \(0\)) intact.
The inverse of a permutation matrix is its transpose. Thus, \(P^{-1} = P^T\).
In a product \(A^TA\), the entry on position \((i,j)\) is the inner product of the \(i\)th column of \(A\) with the \(j\)th column of \(A\). (Cf. Exercise 3.2.4.) Since the \(n\) columns of \(A\) are the \(n\) columns \(\vect{e}_j\) of the identity matrix (in some order), and
\[\begin{split} \vect{e}_i\ip\vect{e}_j = \left\{\begin{array}{l} 1, \,\,\text{if}\,\, i = j\\ 0, \,\,\text{if}\,\, i \neq j, \end{array} \right. \end{split}\]the result immediately follows.
(Existence of a \(PLU\) Decomposition)
Suppose that \(A\) is an \(m\times n\) matrix with real coefficients, and let \(m \leq n\). Then there exist a permutation matrix \(P\), a lower triangular matrix \(L\) and an echelon matrix \(U\) such that
As was mentioned before, the key idea is to perform the row exchanges first. These can be put together in the permutation matrix \(P\). The algorithm to actually find the \(LU\) decomposition of \(PA\) without redoing the whole row reduction process for \(PA\) is rather intricate, and in our view belongs in a course on numerical linear algebra. There it will be explained that it may also be preferable to work towards a \(PLU\) decomposition instead of an \(LU\) decomposition in cases where theoretically it is not absolutely necessary: for numerical reasons, having to do with the finite accuracy of representing real numbers in computers, it may be better to choose the pivots in another order than just top-down.
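In practice one therefore calls a library routine. SciPy, for instance, provides `scipy.linalg.lu`, which uses partial pivoting and returns factors with \(A = PLU\); its pivot choices need not coincide with our top-down order. A sketch on the matrix of Example 3.6.6:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1., 2., 1.],
              [4., 8., 6.],
              [2., 5., 4.]])

P, L, U = lu(A)   # partial pivoting: SciPy picks 4 as the first pivot
assert np.allclose(P @ L @ U, A)     # A = P L U
assert np.allclose(np.diag(L), 1.0)  # L has 1s on its diagonal
```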
In the following example the matrix does have a proper \(LU\) decomposition, but we will row reduce it to echelon form in another order than top-down. It is a tiny example to illustrate that it is possible to deduce a \(PLU\) decomposition when we keep a record of both the multipliers and the positions of the pivots.
We will row reduce the matrix \( A= \begin{bmatrix}2&4&3 \\ 1&2&3\\1&3&2 \end{bmatrix}\) in a different order than top-down and extract a \(PLU\) decomposition from it.
If we put together the matrices that describe the row operations we get
If we had started from
then using the same pivots we would have arrived at
so
The matrix \(P\) is the inverse (so also the transpose) of the matrix that describes the positions of the pivots as in Equation (3.6.5). It requires some careful analysis to reconstruct \(L\) from \(\tilde{L}\).
3.6.3. Efficiency Issues#
To be filled in later.
3.6.4. Grasple Exercises#
To identify triangular matrices.
Show/Hide Content
To compute the \(LU\)-decomposition of a 2x2 matrix \(A\).
Show/Hide Content
To compute the \(LU\)-decomposition of a 3x3 matrix \(A\).
Show/Hide Content
To compute the \(LU\)-decomposition of a 3x3 matrix \(A\) and use it to solve \(A\vect{x} = \vect{b}\).
Show/Hide Content
To compute the \(LU\)-decomposition of a 3x3 matrix \(A\) and use it to solve \(A\vect{x} = \vect{b}\).
Show/Hide Content
To decide solving \(A\vect{x} = \vect{b}\) via (given) \(A=LU\) or (given) \(A^{-1}\).
Show/Hide Content
To compute \(A^{-1}\) using \(A = LU\).
Show/Hide Content
Explorative exercise about the \(LDU\)-decomposition
Show/Hide Content
To compute the \(LU\)-decomposition of 3x4 matrix \(A\) and use it to solve \(A\vect{x} = \vect{b}\).
Show/Hide Content
To compute a \(PLU\)-decomposition of a 3x3 matrix.
Show/Hide Content
To solve a system \(A\vect{x} = \vect{b}\) using \( PA = LU\).