7.3. The Gram-Schmidt process#
7.3.1. Introduction#
In Section 7.2 we saw that orthogonal bases are nice to work with, both for finding coordinates and for computing orthogonal projections.
In this section an algorithm is presented to construct an orthogonal basis from an arbitrary basis of a subspace \(W\) in \(\R^n\).
As a starter, have a look at the following example.
Example 7.3.1
Let \(W\) be the subspace in \(\R^3\) spanned by the two vectors \(\vect{a}_1 = \begin{pmatrix} 2 \\ 1 \\3 \end{pmatrix}\) and \(\vect{a}_2 = \begin{pmatrix} 3 \\ -2 \\1 \end{pmatrix}\). We want to construct an orthogonal basis \(\{\vect{b}_1,\vect{b}_2 \}\) for \(W\).
For the first vector of this ‘new’ basis we can simply take \(\vect{b}_1 = \vect{a}_1\).
As a second basis vector we can take
It is then clear that \(\vect{b}_2\) is in \(\operatorname{Span}\{\vect{a}_1, \vect{a}_2\}\), and by the property of the orthogonal projection \(\operatorname{proj}_{\vect{a}_1}(\vect{a}_2)\) it follows that \(\vect{b}_2 \perp \vect{b}_1\).
Since \(\vect{b}_1\) and \(\vect{b}_2\) are nonzero and mutually orthogonal, the set \(\{\vect{b}_1, \vect{b}_2\}\) is linearly independent, so it is an orthogonal basis for \(W = \operatorname{Span}{\{\vect{a}_1, \vect{a}_2\}}\).
The explicit vectors we find are
If we prefer vectors without fractions, we can rescale the second vector, and then find the orthogonal basis
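This small computation is easy to mirror numerically. Below is a minimal sketch using NumPy (the variable names are ours); it reproduces \(\vect{b}_2\) before any rescaling.

```python
import numpy as np

# The vectors of Example 7.3.1
a1 = np.array([2.0, 1.0, 3.0])
a2 = np.array([3.0, -2.0, 1.0])

b1 = a1
# subtract from a2 its orthogonal projection onto b1
b2 = a2 - (a2 @ b1) / (b1 @ b1) * b1

print(b2)        # [ 2.  -2.5 -0.5], a multiple of (4, -5, -1)
print(b1 @ b2)   # 0.0, so b1 and b2 are indeed orthogonal
```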
7.3.2. The Gram-Schmidt process#
Theorem 7.3.1
Suppose \(W\) is a subspace in \(\R^n\) with basis \(\{\vect{a}_1,\ldots,\vect{a}_m\}\). Construct the set of vectors \(\vect{b}_1,\ldots,\vect{b}_m\) according to the following rules
and in general
for \(j = 1,2,\ldots, m-1\).
Then all along the way
is an orthogonal basis for
In particular, in the end \(\{\vect{b}_1,\ldots,\vect{b}_m\}\) will be an orthogonal basis for \(\operatorname{Span}\{\vect{a}_1, \ldots, \vect{a}_m\} = W\).
The above construction/algorithm is called the Gram-Schmidt process.
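The rules above translate almost directly into code. Below is a minimal sketch in Python/NumPy; the function name `gram_schmidt` is ours, and the input is assumed to be a list of linearly independent vectors.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis b_1, ..., b_m for the span of the
    given linearly independent vectors a_1, ..., a_m."""
    basis = []
    for v in vectors:
        a = np.asarray(v, dtype=float)
        b = a.copy()
        # subtract the projections of a onto the vectors b_i found so far
        for q in basis:
            b -= (a @ q) / (q @ q) * q
        basis.append(b)
    return basis
```

Applying `gram_schmidt([[2, 1, 3], [3, -2, 1]])` reproduces the computation of Example 7.3.1.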
Example 7.3.2
Let \(W\) be defined as the span of the set \(\left\{\begin{pmatrix} 1 \\ 1 \\ -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 3 \\ -2 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \\ 4 \\ -4 \end{pmatrix} \right\}\). It can be shown that these vectors are linearly independent.
We use the Gram-Schmidt algorithm to create an orthogonal basis for \(W\).
Step by step we find
and
Check for yourself that the vectors \(\vect{b}_1,\vect{b}_2, \vect{b}_3\) are indeed orthogonal.
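One way to perform this check all at once is to recompute the vectors numerically, collect them as the columns of a matrix \(B\) and inspect \(B^TB\), which is diagonal precisely when the columns are mutually orthogonal. A sketch (NumPy; no rescaling is applied, so the columns may differ from the vectors above by scalar factors):

```python
import numpy as np

# Columns: a1, a2, a3 from the example above
A = np.array([[ 1.0,  3.0,  3.0],
              [ 1.0,  3.0,  1.0],
              [-1.0, -2.0,  4.0],
              [ 1.0,  0.0, -4.0]])

B = A.copy()
for j in range(1, A.shape[1]):
    for i in range(j):
        # replace column j by its component orthogonal to column i
        B[:, j] -= (B[:, j] @ B[:, i]) / (B[:, i] @ B[:, i]) * B[:, i]

print(B.T @ B)   # a diagonal matrix, so the columns b1, b2, b3 are mutually orthogonal
```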
Proof of Theorem 7.3.1
Let \(W_j\) be the subspace spanned by the first \(j\) vectors \(\vect{a}_1, \ldots, \vect{a}_j\), for \(j = 1,2,\ldots,m\).
First of all
In the other steps, assume that we have so far created the orthogonal basis \(\{\vect{b}_1, \ldots,\vect{b}_j \}\) for \(W_j\).
Then in fact
Namely, the projection \(\operatorname{proj}_{W_j}(\vect{a}_{j+1})\) can be computed using the already created orthogonal basis \(\{\vect{b}_1, \ldots, \vect{b}_j\}\) of \(W_j\). See Figure 7.3.1.
Fig. 7.3.1 First steps in the Gram-Schmidt process.#
That is,
This makes \(\operatorname{proj}_{W_j}(\vect{a}_{j+1})\) an element of \(W_{j}\) and \(\vect{b}_{j+1} = \vect{a}_{j+1} - \operatorname{proj}_{W_j}(\vect{a}_{j+1})\) an element of \(W_{j+1}\).
Note that \(\vect{b}_{j+1} \neq \vect{0}\): since we assumed the vectors \(\vect{a}_1, \ldots, \vect{a}_m\) to be linearly independent, \(\vect{a}_{j+1}\) does not lie in \(W_j\).
Moreover, \(\vect{b}_{j+1} = \vect{a}_{j+1} - \operatorname{proj}_{W_j}(\vect{a}_{j+1})\) is perpendicular to \(W_j\) by the properties of orthogonal projection.
Since all \(\vect{b}_i\) with \(i \leq j\) lie in \(W_j\), this makes \(\vect{b}_{j+1}\) orthogonal to all its predecessors \(\vect{b}_1, \ldots, \vect{b}_j\).
Hence if the vectors \(\vect{b}_1, \ldots, \vect{b}_j\) are orthogonal,
so are the vectors \(\vect{b}_1, \ldots, \vect{b}_j, \vect{b}_{j+1}\).
(By using mathematical induction the proof can be made absolutely rigorous.)
Grasple Exercise 7.3.1
Gram-Schmidt D.I.Y.
The following example shows what happens if the Gram-Schmidt construction is applied to a subspace \(W = \operatorname{Span}\{\vect{a}_1, \ldots, \vect{a}_m\}\) where the vectors \(\vect{a}_i\) are not linearly independent.
Example 7.3.3
Let \(W\) be defined as the span of the set \(\left\{\begin{pmatrix} 1 \\ -1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 2 \\ -2 \\ 4 \\ 6 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ -3 \\ -4 \end{pmatrix} \right\}\).
Let us denote the vectors by \(\vect{a}_1, \ldots, \vect{a}_4\). As in the proof of the Gram-Schmidt process we use the notation \(W_j\) for the span of the vectors \( \vect{a}_1, \ldots, \vect{a}_j\).
Just following the protocol we find
$$\vect{b}_1 = \vect{a}_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \\ 3 \end{pmatrix}, \quad \vect{b}_2 = \vect{a}_{2} - \dfrac{\vect{a}_{2}\ip\vect{b}_1}{\vect{b}_1\ip\vect{b}_1}\vect{b}_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
The explanation is that \(\vect{b}_2 = \vect{a}_{2} - \operatorname{proj}_{\vect{a}_1}(\vect{a}_2) = \vect{a}_{2} - \vect{a}_{2} = \vect{0}\), since \(\vect{a}_2\) lies in \(\operatorname{Span}\{\vect{a}_1\}\). In other words, since \(W_2 = W_1\).
We discard the zero vector and continue with the computation of the next new basis vector.
To get rid of fractions, we continue instead with \(\vect{b}_3 = \begin{pmatrix} 4 \\ 2 \\ -1 \\ 0 \end{pmatrix}\).
The last step:
Here again we may conclude that \(\vect{a}_4\) lies in \(\operatorname{Span}\{\vect{b}_1,\vect{b}_3\}\), and we discard \(\vect{b}_4\) from our basis.
The conclusion is that
is an orthogonal basis for \(\operatorname{Span}\{\vect{a}_1,\ldots, \vect{a}_4\}\).
The idea of Example 7.3.3 can be generalised as follows. Suppose \(W = \operatorname{Span}\{\vect{a}_1,\ldots,\vect{a}_m\}\), where the vectors \(\vect{a}_i\) are not necessarily linearly independent. If we apply the Gram-Schmidt construction and discard the zero vector if it comes up, then we end up with an orthogonal basis \(\{\vect{b}_1,\ldots,\vect{b}_k\}\) for \(W\). Note that \(k < m\) occurs precisely when the original generating set of vectors \(\{\vect{a}_1,\ldots,\vect{a}_m\}\) is linearly dependent.
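A sketch of this more forgiving variant of the process (the function name and the tolerance used to decide when a residual vector counts as zero are our own choices):

```python
import numpy as np

def gram_schmidt_with_discards(vectors, tol=1e-12):
    """Orthogonal basis for the span of possibly dependent vectors;
    vectors whose residual after projection is (numerically) zero are skipped."""
    basis = []
    for v in vectors:
        a = np.asarray(v, dtype=float)
        b = a.copy()
        for q in basis:
            b -= (a @ q) / (q @ q) * q
        if np.linalg.norm(b) > tol:   # keep only nonzero residuals
            basis.append(b)
    return basis

# The four vectors of Example 7.3.3: only two survive,
# and the second survivor is a multiple of (4, 2, -1, 0)
print(gram_schmidt_with_discards(
    [[1, -1, 2, 3], [2, -2, 4, 6], [2, 0, 1, 2], [0, 2, -3, -4]]))
```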
Remark 7.3.1
By applying the Gram-Schmidt process to a linearly independent set of vectors \(\{\vect{a}_1,\ldots,\vect{a}_m\}\) we get an orthogonal basis \(\{\vect{b}_1,\ldots,\vect{b}_m\}\) for the subspace \(W = \operatorname{Span}\{\vect{a}_1,\ldots,\vect{a}_m\}\).
By rescaling the vectors \(\vect{b}_i\) as follows
we can turn this new basis into an orthonormal basis.
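In code, this rescaling step is a one-liner per basis vector. A small sketch (the two orthogonal vectors below are just an illustration):

```python
import numpy as np

orthogonal_basis = [np.array([2.0, 1.0, 3.0]),
                    np.array([4.0, -5.0, -1.0])]   # two orthogonal vectors in R^3

orthonormal_basis = [b / np.linalg.norm(b) for b in orthogonal_basis]

# each vector now has length 1, and orthogonality is preserved
print([float(np.linalg.norm(q)) for q in orthonormal_basis])
```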
Example 7.3.4
For the subspace
we found, in Example 7.3.2, the orthogonal basis
Rescaling (or normalising) gives the orthonormal basis
Grasple Exercise 7.3.2
Finding an orthonormal basis.
7.3.3. The \(QR\) decomposition#
The Gram-Schmidt process leads to the following interesting decomposition of an \(n \times m\)-matrix \(A\) with linearly independent columns.
Theorem 7.3.2 (\(QR\) decomposition)
Suppose \(A\) is an \(n \times m\)-matrix of rank \(m\). Then \(A\) can be written as
where \(Q\) is an \(n \times m\)-matrix with orthonormal columns, and \(R\) is an upper-triangular \(m \times m\)-matrix, with positive diagonal entries.
The matrix \(Q\) is found by applying the Gram-Schmidt process to the (linearly independent) columns \(\vect{a}_1,\ldots,\vect{a}_m\) of the matrix \(A\) and then renormalising.
Proof of Theorem 7.3.2
One way to see this is to look at the creation of the orthonormal set \(\{\vect{q}_1,\ldots,\vect{q}_m\}\) from the linearly independent set \(\{\vect{a}_1,\ldots,\vect{a}_m\}\).
The general step in the Gram-Schmidt process is of the form
Realising that each \(\vect{b}_i\) is in the span of \(\{\vect{a}_1, \vect{a}_2, \ldots , \vect{a}_{i} \}\),
it follows that
so
Normalising the vectors \(\vect{b}_i\) can be seen as multiplying the matrix \(B\) with a diagonal matrix \(D\):
where the diagonal entries \(d_{ii}\) of \(D\) are given by
So
where \(CD\) is an upper-triangular matrix with positive diagonal entries.
Multiplying both sides with the inverse of \(CD\) gives
where \(R = (CD)^{-1}\) is still an upper-triangular matrix with positive diagonal entries.
There is a quicker way to find the matrix \(R\) than by inverting the matrix \(CD\) as in the proof of Theorem 7.3.2. This is the content of the next proposition.
Proposition 7.3.1
Let \(Q = \begin{pmatrix}\vect{q}_1&\cdots&\vect{q}_m\end{pmatrix}\) be the matrix constructed by applying the Gram-Schmidt process to the columns of \(A\), followed by normalising. Define \(R = Q^TA\).
Then \(R\) is an upper-triangular matrix with positive entries on its diagonal, and
Proof of Proposition 7.3.1
We know that for the matrix \(Q\) as specified the decomposition
exists, with an upper-triangular matrix \(R\).
The matrix \(Q\) has orthonormal columns, so
Thus we can retrieve \(R\) if we multiply both sides of the equation \(\,A = QR\,\) from the left by \(\,Q^T\):
The following example provides an illustration of this last proposition.
Example 7.3.5
Consider the matrix
From Example 7.3.2 and Example 7.3.4 we know that applying the Gram-Schmidt process to the columns of \(A\) leads to the matrix
We compute \(R = Q^TA\)
and see that this is indeed an upper-triangular matrix (with positive diagonal entries).
You may check for yourself that \(QR = A\).
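This check is easily done numerically. The sketch below assumes, following Example 7.3.2, that the columns of \(A\) are the three vectors of that example; it rebuilds \(Q\) by Gram-Schmidt plus normalisation and then forms \(R = Q^TA\).

```python
import numpy as np

A = np.array([[ 1.0,  3.0,  3.0],
              [ 1.0,  3.0,  1.0],
              [-1.0, -2.0,  4.0],
              [ 1.0,  0.0, -4.0]])

Q = np.zeros_like(A)
for j in range(A.shape[1]):
    b = A[:, j].copy()
    for i in range(j):
        b -= (b @ Q[:, i]) * Q[:, i]     # Q[:, i] already has length 1
    Q[:, j] = b / np.linalg.norm(b)

R = Q.T @ A
print(np.round(R, 10))         # upper triangular with positive diagonal entries
print(np.allclose(Q @ R, A))   # True
```

NumPy also offers a built-in routine, `numpy.linalg.qr`, that computes a \(QR\) decomposition; note, however, that it does not guarantee positive diagonal entries in \(R\), so its output may differ from the factorisation above by signs.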
Warning
The columns of \(Q\) being orthonormal is equivalent to \(Q^TQ = I\). However, in the case where \(Q\) is not a square matrix, this does not imply that \(QQ^T = I\).
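A tiny illustration of this warning (a hand-picked non-square matrix with orthonormal columns):

```python
import numpy as np

Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # orthonormal columns, but Q is 3 x 2, not square

print(Q.T @ Q)   # the 2 x 2 identity matrix
print(Q @ Q.T)   # a 3 x 3 matrix that is NOT the identity
```

In fact, \(QQ^T\) is the matrix of the orthogonal projection onto \(\operatorname{Col}(Q)\), which equals the identity only when the columns of \(Q\) span the whole of \(\R^n\).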
7.3.4. Grasple exercises#
The first exercises are about applying the Gram-Schmidt (GS) algorithm.
Grasple Exercise 7.3.3
Performing one step in the Gram-Schmidt process.
Grasple Exercise 7.3.4
Applying the Gram-Schmidt process for two vectors in \(\R^3\).
Grasple Exercise 7.3.5
Applying Gram-Schmidt for two vectors in \(\R^4\).
Grasple Exercise 7.3.6
Applying the Gram-Schmidt process for three vectors in \(\R^4\).
Grasple Exercise 7.3.7
Orthogonal basis for \(\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_2,\mathbf{a}_3\}\) in \(\R^3\).
Grasple Exercise 7.3.8
Finding an orthonormal basis for \(\operatorname{Span}\{\mathbf{a}_1, \mathbf{a}_2\}\) in \(\R^3\).
Grasple Exercise 7.3.9
Applying Gram-Schmidt for three vectors in \(\R^4\).
Grasple Exercise 7.3.10
Applying Gram-Schmidt for three vectors in \(\R^5\).
Grasple Exercise 7.3.11
Finding the \(QR\) decomposition of a \(2\times2\)-matrix.
The following exercises are about the \(QR\) decomposition.
Grasple Exercise 7.3.12
Finding the \(QR\) decomposition of a \(3\times2\)-matrix.
Grasple Exercise 7.3.13
Finding the \(QR\) decomposition of a \(3\times3\)-matrix.
Grasple Exercise 7.3.14
Finding the \(QR\) decomposition of a \(3\times3\)-matrix.
Grasple Exercise 7.3.15
Finding the \(QR\) decomposition of a \(4\times3\)-matrix.
Grasple Exercise 7.3.16
Finding the orthogonal projection onto \(\operatorname{Col}(A)\) via a \(QR\) decomposition.
The last exercises are more conceptual than computational.
Grasple Exercise 7.3.17
True/False question about properties of the Gram-Schmidt process.
Grasple Exercise 7.3.18
From orthogonal to orthonormal.
Grasple Exercise 7.3.19
What about GS for a linearly dependent set?
Grasple Exercise 7.3.20
GS for a linearly dependent set of vectors.
Grasple Exercise 7.3.21
Building an orthogonal basis for the column space of a matrix.
Grasple Exercise 7.3.22
Ponderings about the \(QR\) decomposition (of a \(4 \times 2\)-matrix \(A\)).
Grasple Exercise 7.3.23
How many \(QR\) decompositions are there for an \(m \times n\)-matrix?