7.3. The Gram-Schmidt Process#
7.3.1. Introduction#
In Section 7.2 we have seen that orthogonal bases are nice to work with, both for finding coordinates and for finding orthogonal projections.
In this section an algorithm is presented to construct an orthogonal basis from an arbitrary basis of a subspace \(W\) in \(\R^n\).
As a starter, have a look at the following example.
Let \(W\) be the subspace in \(\R^3\) spanned by the two vectors \(\vect{a}_1 = \begin{bmatrix} 2 \\ 1 \\3 \end{bmatrix}\) and \(\vect{a}_2 = \begin{bmatrix} 3 \\ -2 \\1 \end{bmatrix}\). We want to construct an orthogonal basis \(\{\vect{b}_1,\vect{b}_2 \}\) for \(W\).
For the first vector of this ‘new’ basis we can simply take \(\vect{b}_1 = \vect{a}_1\).
As a second basis vector we can take
It is then clear that \(\vect{b}_2\) is in span\(\{\vect{a}_1, \vect{a}_2\}\), and by the property of the orthogonal projection \(\text{proj}_{\vect{a}_1}(\vect{a}_2)\) it follows that \(\vect{b}_2 \perp \vect{b}_1\).
Since \(\vect{b}_2 \neq \vect{0}\) (as \(\vect{a}_2\) is not a multiple of \(\vect{a}_1\)) and \(\vect{b}_2 \perp \vect{b}_1\), the set \(\{\vect{b}_1, \vect{b}_2\}\) is linearly independent, so \(\{\vect{b}_1, \vect{b}_2\}\) is an orthogonal basis for \(W = \text{span}\{\vect{a}_1, \vect{a}_2\}\).
The explicit vectors we find are
If we prefer vectors without fractions, we can rescale the second vector, and then find the orthogonal basis
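As a quick numerical check of this example, here is a minimal NumPy sketch (an illustration, not part of the derivation) that reproduces the computation of \(\vect{b}_2\) and verifies the orthogonality:

```python
import numpy as np

a1 = np.array([2.0, 1.0, 3.0])
a2 = np.array([3.0, -2.0, 1.0])

b1 = a1
# subtract from a2 its orthogonal projection onto b1
b2 = a2 - (a2 @ b1) / (b1 @ b1) * b1

print(b2)        # [ 2.  -2.5 -0.5]
print(b1 @ b2)   # 0.0, so b1 and b2 are indeed orthogonal
print(2 * b2)    # [ 4. -5. -1.], a rescaled, fraction-free second basis vector
```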
7.3.2. The Gram-Schmidt process#
Suppose \(W\) is a subspace in \(\R^n\) with basis \(\{\vect{a}_1,\ldots,\vect{a}_m\}\). Construct the set of vectors \(\vect{b}_1,\ldots,\vect{b}_m\) according to the following rules
and in general
for \(j = 1,2,\ldots, m-1\).
Then all along the way
is an orthogonal basis for
In particular, in the end \(\{\vect{b}_1,\ldots,\vect{b}_m\}\) will be an orthogonal basis for \(\text{span}\{\vect{a}_1, \ldots, \vect{a}_m\} = W\).
The above construction/algorithm is called the Gram-Schmidt process.
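The process translates almost literally into code. The following is a minimal NumPy sketch (the helper name `gram_schmidt` is ours, not a library routine); it assumes the columns of the input matrix are linearly independent.

```python
import numpy as np

def gram_schmidt(A):
    """Return a matrix whose columns form an orthogonal basis of Col A,
    assuming the columns of A are linearly independent."""
    B = []
    for a in A.T:                       # the columns a_1, ..., a_m, in order
        b = a.astype(float)
        for q in B:                     # subtract the projection onto each earlier b_i
            b = b - (a @ q) / (q @ q) * q
        B.append(b)
    return np.column_stack(B)
```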
Let \(W\) be defined as the span of the set \(\left\{\begin{bmatrix} 1 \\ 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \\ -2 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 4 \\ -4 \end{bmatrix} \right\}\). It can be shown that these vectors are linearly independent.
We use the Gram-Schmidt algorithm to create an orthogonal basis for \(W\).
Step by step we find
and
Check for yourself that the vectors \(\vect{b}_1,\vect{b}_2, \vect{b}_3\) are indeed orthogonal.
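This check can also be done numerically, reusing the `gram_schmidt` sketch from the previous subsection:

```python
import numpy as np

A = np.array([[ 1,  3,  3],
              [ 1,  3,  1],
              [-1, -2,  4],
              [ 1,  0, -4]], dtype=float)   # the vectors a_1, a_2, a_3 as columns

B = gram_schmidt(A)              # sketch from the previous subsection
print(B)                         # the columns b_1, b_2, b_3 found above
print(np.round(B.T @ B, 10))     # a diagonal matrix, so the b_i are mutually orthogonal
```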
Proof of Theorem 7.3.1
Let \(W_j\) be the subspace spanned by the first \(j\) vectors \(\vect{a}_1, \ldots, \vect{a}_j\), for \(j = 1,2,\ldots,m\).
First of all
In the other steps, assume that we have so far created the orthogonal basis \(\{\vect{b}_1, \ldots,\vect{b}_j \}\) for \(W_j\).
Then in fact
Namely, the projection \(\text{proj}_{W_j}(\vect{a}_{j+1})\) can be computed using the already constructed orthogonal basis \(\{\vect{b}_1, \ldots, \vect{b}_j\}\) of \(W_j\). See Figure 7.3.1.
That is,
This makes \(\text{proj}_{W_j}(\vect{a}_{j+1})\) an element of \(W_{j}\) and \(\vect{b}_{j+1} = \vect{a}_{j+1} - \text{proj}_{W_j}(\vect{a}_{j+1})\) an element of \(W_{j+1}\).
Since we assumed that the vectors \(\vect{a}_1, \ldots, \vect{a}_m\) are linearly independent, \(\vect{a}_{j+1}\) does not lie in \(W_j\), and therefore \(\vect{b}_{j+1} \neq \vect{0}\).
Moreover, \(\vect{b}_{j+1} = \vect{a}_{j+1} - \text{proj}_{W_j}(\vect{a}_{j+1})\) is perpendicular to \(W_j\) by the properties of orthogonal projection.
Since all \(\vect{b}_i\) with \(i \leq j\) lie in \(W_j\), this makes \(\vect{b}_{j+1}\) orthogonal to all its predecessors \(\vect{b}_1, \ldots, \vect{b}_j\).
Hence if the vectors \(\vect{b}_1, \ldots, \vect{b}_j\) are orthogonal,
so are the vectors \(\vect{b}_1, \ldots, \vect{b}_j, \vect{b}_{j+1}\).
(By using mathematical induction the proof can be made absolutely rigorous.)
Gram-Schmidt D.I.Y.
The following example shows what happens if the Gram-Schmidt construction is applied to a subspace \(W = \text{span}\{\vect{a}_1, \ldots, \vect{a}_m\}\) where the vectors \(\vect{a}_i\) are not linearly independent.
Let \(W\) be defined as the span of the set \(\left\{\begin{bmatrix} 1 \\ -1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ -2 \\ 4 \\ 6 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ -3 \\ -4 \end{bmatrix} \right\}\).
Let us denote the vectors by \(\vect{a}_1, \ldots, \vect{a}_4\). As in the proof of the Gram-Schmidt process we use the notation \(W_j\) for the span of the vectors \( \vect{a}_1, \ldots, \vect{a}_j\).
Just following the protocol we find
\(\vect{b}_1 = \vect{a}_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 3 \end{bmatrix}, \quad \vect{b}_2 = \vect{a}_{2} - \dfrac{\vect{a}_{2}\ip\vect{b}_1}{\vect{b}_1\ip\vect{b}_1}\vect{b}_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\).
The explanation is that \(\vect{b}_2 = \vect{a}_{2} - \text{proj}_{\vect{a}_1}(\vect{a}_2) = \vect{a}_{2} - \vect{a}_{2} = \vect{0}\), since \(\vect{a}_2\) lies in span\(\{\vect{a}_1\}\). In other words, since \(W_2 = W_1\).
We discard the zero vector and continue with the computation of the next basis vector.
To get rid of fractions, we continue instead with \(\vect{b}_3 = \begin{bmatrix} 4 \\ 2 \\ -1 \\ 0 \end{bmatrix}\).
The last step:
Here again we may conclude that \(\vect{a}_4\) lies in span\(\{\vect{b}_1,\vect{b}_3\}\), and we discard \(\vect{b}_4\) from our basis.
The conclusion is that
is an orthogonal basis for span\(\{\vect{a}_1,\ldots, \vect{a}_4\}\).
The idea of Example 7.3.3 can be generalized as follows. Suppose \(W = \text{span}\{\vect{a}_1,\ldots,\vect{a}_m\}\), where the vectors \(\vect{a}_i\) are not necessarily linearly independent. If we apply the Gram-Schmidt construction and discard the zero vector if it comes up, then we end up with an orthogonal basis \(\{\vect{b}_1,\ldots,\vect{b}_k\}\) for \(W\). Note that \(k < m\) occurs precisely when the original generating set of vectors \(\{\vect{a}_1,\ldots,\vect{a}_m\}\) is linearly dependent.
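In code this amounts to one extra test: discard a new vector whenever it is (numerically) zero. A minimal sketch, along the lines of the `gram_schmidt` helper above (again an illustration, with a hypothetical tolerance parameter):

```python
import numpy as np

def gram_schmidt_dependent(A, tol=1e-12):
    """Gram-Schmidt that discards (numerically) zero vectors,
    so linearly dependent columns are allowed."""
    B = []
    for a in A.T:
        b = a.astype(float)
        for q in B:
            b = b - (a @ q) / (q @ q) * q
        if np.linalg.norm(b) > tol:     # keep b only if it is not (close to) zero
            B.append(b)
    return np.column_stack(B)

# the four vectors of Example 7.3.3 as columns; only two orthogonal vectors survive
A = np.array([[ 1,  2,  2,  0],
              [-1, -2,  0,  2],
              [ 2,  4,  1, -3],
              [ 3,  6,  2, -4]], dtype=float)
print(gram_schmidt_dependent(A))        # two columns: b_1 and a multiple of (4, 2, -1, 0)
```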
By applying the Gram-Schmidt process to a linearly independent set of vectors \(\{\vect{a}_1,\ldots,\vect{a}_m\}\) we get an orthogonal basis \(\{\vect{b}_1,\ldots,\vect{b}_m\}\) for the subspace \(W = \text{span}\{\vect{a}_1,\ldots,\vect{a}_m\}\).
By rescaling the vectors \(\vect{b}_i\) as follows
we can turn this new basis into an orthonormal basis.
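In code the rescaling is a single division of each column by its length. A small NumPy illustration (here `B` is assumed to contain an orthogonal basis in its columns):

```python
import numpy as np

B = np.array([[ 1,  1,  2],
              [ 1,  1,  0],
              [-1,  0,  3],
              [ 1, -2,  1]], dtype=float)   # an orthogonal basis, stored as columns

Q = B / np.linalg.norm(B, axis=0)    # divide every column by its length
print(np.round(Q.T @ Q, 10))         # the identity matrix: the columns of Q are orthonormal
```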
For the subspace
we found, in Example 7.3.2, the orthogonal basis
Rescaling (or normalizing) gives the orthonormal basis
Finding an orthonormal basis
7.3.3. The QR decomposition#
The Gram-Schmidt process leads to the following interesting decomposition of an \(n \times m\) matrix \(A\) with linearly independent columns.
(QR decomposition)
Suppose \(A\) is an \(n \times m\) matrix of rank \(m\). Then \(A\) can be written as
where \(Q\) is an \(n \times m\) matrix with orthonormal columns, and \(R\) is an upper triangular \(m \times m\) matrix, with positive diagonal entries.
The matrix \(Q\) is found by applying the Gram-Schmidt process to the (linearly independent) columns \(\vect{a}_1,\ldots,\vect{a}_m\) of the matrix \(A\) and then normalizing the resulting vectors.
Proof of Theorem 7.3.2
One way to see this is to look at the creation of the orthonormal set \(\{\vect{q}_1,\ldots,\vect{q}_m\}\) from the linearly independent set \(\{\vect{a}_1,\ldots,\vect{a}_m\}\).
The general step in the Gram-Schmidt process is of the form
Realizing that each \(\vect{b}_i\) is in the span of \(\{\vect{a}_1, \vect{a}_2, \ldots , \vect{a}_{i} \}\),
it follows that
so
Normalizing the vectors \(\vect{b}_i\) can be seen as multiplying the matrix \(B\) with a diagonal matrix \(D\):
where the diagonal entries \(d_{ii}\) of \(D\) are given by
So
where \(CD\) is an upper triangular matrix with positive diagonal entries.
Multiplying both sides with the inverse of \(CD\) gives
where \(R = (CD)^{-1}\) is still an upper triangular matrix with positive diagonal entries.
There is a quicker way to find the matrix \(R\) than inverting the matrix \(CD\) as in the proof of Theorem 7.3.2; it is the content of the next proposition.
Let \(Q = [\vect{q}_1,\ldots,\vect{q}_m]\) be the matrix constructed by applying the Gram-Schmidt process followed by rescaling, exactly as described above. Define \(R = Q^TA\).
Then \(R\) is an upper triangular matrix with positive entries on its diagonal, and
Proof of Proposition 7.3.1
We know that for the matrix \(Q\) as specified the decomposition
exists, with an upper triangular matrix \(R\).
The matrix \(Q\) has orthonormal columns, so
Thus we can retrieve \(R\) if we multiply both sides of the equation \(\,A = QR\,\) by \(\,Q^T\):
The following example provides an illustration of this last proposition.
Consider the matrix
From Example 7.3.2 and Example 7.3.4 we know that applying the Gram-Schmidt process to the columns of \(A\) leads to the matrix
We compute \(R = Q^TA\)
and see that this is indeed an upper triangular matrix (with positive diagonal entries).
You may check for yourself that \(QR = A\).
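Both checks are easily done numerically as well. Below is a minimal sketch that reuses the `gram_schmidt` helper from Section 7.3.2; NumPy's built-in routine `np.linalg.qr` computes a decomposition of the same kind, although its \(Q\) and \(R\) may differ from ours by signs.

```python
import numpy as np

A = np.array([[ 1,  3,  3],
              [ 1,  3,  1],
              [-1, -2,  4],
              [ 1,  0, -4]], dtype=float)

B = gram_schmidt(A)                  # orthogonal columns (sketch from Section 7.3.2)
Q = B / np.linalg.norm(B, axis=0)    # normalize to get orthonormal columns
R = Q.T @ A                          # upper triangular with positive diagonal entries

print(np.round(R, 4))
print(np.allclose(Q @ R, A))         # True: Q and R indeed give back A
```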
Fill in the details of the following more direct proof of this quick way to find the QR-decomposition.
Let \(Q\) be the matrix coming from the Gram-Schmidt process, followed by rescaling.
Since by construction
it follows that the entries below the diagonal of the product \(Q^TA\) are all equal to zero. So: \(Q^TA = R\) is an upper triangular matrix.
Recalling the construction of \(\vect{b}_i\) and \(\vect{q}_i\) from \(\vect{a}_1,\ldots,\vect{a}_i\), it can be shown that the diagonal entries \(r_{ii}\) are equal to \(\vect{q}_i^T\vect{a}_i = \vect{q}_i^T\vect{b}_i = \norm{\vect{b}_i} > 0\).
Furthermore, since the columns of \(Q\) form an orthonormal basis of \(\text{Col}\,{A}\), the matrix \(QQ^T\) represents the orthogonal projection onto \(\text{Col}\,A\). Use this to show that \(QQ^TA = A\).
Warning: the columns of \(Q\) being orthonormal is equivalent to \(Q^TQ = I\). However, in the case where \(Q\) is not a square matrix, this does not imply that \(QQ^T = I\).
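A tiny numerical illustration of this warning, with a made-up \(3 \times 2\) matrix \(Q\) whose columns are orthonormal:

```python
import numpy as np

Q = np.array([[1, 0],
              [0, 1],
              [0, 0]], dtype=float)   # orthonormal columns, but Q is not square

print(Q.T @ Q)   # the 2x2 identity matrix
print(Q @ Q.T)   # a 3x3 matrix that is NOT the identity; it projects onto Col Q
```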
7.3.4. Grasple Exercises#
Performing one step in the Gram-Schmidt process.
Applying the Gram-Schmidt process for two vectors in \(\R^3\).
Applying Gram-Schmidt for two vectors in \(\R^4\).
Applying the Gram-Schmidt process for three vectors in \(\R^4\).
Orthogonal basis for span\(\{\mathbf{a}_1, \mathbf{a}_2,\mathbf{a}_3\}\) in \(\R^3\).
Finding an orthonormal basis for span\(\{\mathbf{a}_1, \mathbf{a}_2\}\) in \(\R^3\).
Applying Gram-Schmidt for three vectors in \(\R^4\).
Applying Gram-Schmidt for three vectors in \(\R^5\).
Finding the QR-decomposition of a \(2\times2\) matrix.
Finding the QR-decomposition of a \(3\times2\) matrix.
Finding the QR-decomposition of a \(3\times3\) matrix.
Finding the QR-decomposition of a \(3\times3\) matrix.
Finding the QR-decomposition of a \(4\times3\) matrix.
T/F question about properties of the Gram-Schmidt process.
From orthogonal to orthonormal.
What about GS for a linearly dependent set?
GS for a linearly dependent set of vectors.
Building an orthogonal basis for the column space of a matrix.
Ponderings about the QR-decomposition (of a \(4 \times 2\) matrix \(A\)).
How many QR-decompositions are there for an \(m \times n\) matrix?