7.2. Orthogonal and Orthonormal bases#
7.2.1. Orthogonal and Orthonormal bases#
A subset \(S\) of \(\R^{n}\) is called orthogonal if any two distinct vectors \(\vect{v}_{1}\) and \(\vect{v}_{2}\) in \(S\) are orthogonal to each other. If \(S\) is a basis for a subspace \(V\) and \(S\) is orthogonal, we say it is an orthogonal basis for \(V\).
Consider the plane
Both \(\vect{v}_{1}\) and \(\vect{v}_{2}\) lie in \(\mathcal{P}\). The set \(\mathcal{B}=\left\{\vect{v}_{1},\vect{v}_{2}\right\}\) is a linearly independent set of two vectors in \(\mathcal{P}\). Since \(\dim(\mathcal{P})=2\), it must therefore be a basis. Furthermore, \(\vect{v}_{1}\ip\vect{v}_{2}=1-1-0=0\) so \(\vect{v}_{1}\) is orthogonal to \(\vect{v}_{2}\). Hence \(\mathcal{B}\) is an orthogonal basis for \(\mathcal{P}\).
Since \(\vect{0}\) is orthogonal to every vector, adding it to a set or removing it from a set does not change whether the set is orthogonal or not.
An orthogonal set \(S\) which does not contain \(\vect{0}\) is linearly independent.
Proof of Proposition 7.2.1
Assume \(S\) is linearly dependent. Then there are vectors \(\vect{v}_{1},...,\vect{v}_{n}\) in \(S\) and scalars \(c_{1},...,c_{n}\), not all zero, such that \(\vect{0}=c_{1}\vect{v}_{1}+\cdots +c_{n}\vect{v}_{n}.\) But then, for any \(i\):

\[ 0=\vect{0}\ip\vect{v}_{i}=(c_{1}\vect{v}_{1}+\cdots +c_{n}\vect{v}_{n})\ip\vect{v}_{i}=c_{i}(\vect{v}_{i}\ip\vect{v}_{i}), \]

since \(\vect{v}_{j}\ip\vect{v}_{i}=0\) for all \(j\neq i\).
Since no \(\vect{v}_{i}\) is \(\vect{0}\), all \(\vect{v}_{i}\ip\vect{v}_{i}\) are non-zero, hence all \(c_{i}\) must be zero, which contradicts our assumption.
As a consequence of Proposition 7.2.1, any orthogonal set that does not contain \(\vect{0}\) is an orthogonal basis for its span.
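To make this concrete, here is a small numerical check, a NumPy sketch with vectors chosen purely for illustration: a set of pairwise orthogonal, nonzero vectors does indeed turn out to be linearly independent.

```python
import numpy as np

# Three pairwise orthogonal, nonzero vectors in R^4 (illustrative choice).
S = np.array([[1.0,  1.0,  0.0, 0.0],
              [1.0, -1.0,  2.0, 0.0],
              [1.0, -1.0, -1.0, 3.0]])

# The Gram matrix S S^T is diagonal, so the rows are pairwise orthogonal...
G = S @ S.T
assert np.allclose(G, np.diag(np.diag(G)))

# ...and, as Proposition 7.2.1 predicts, they are linearly independent.
assert np.linalg.matrix_rank(S) == 3
```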
An orthogonal basis is called orthonormal if all elements in the basis have norm \(1\).
If \(\vect{v}_{1},...,\vect{v}_{n}\) is an orthogonal basis for a subspace \(V\), then an orthonormal basis for \(V\) can be obtained by dividing each \(\vect{v}_{i}\) by its norm.
Consider the plane \(\mathcal{P}\), the vectors \(\vect{v}_{1},\vect{v}_{2}\) and the basis \(\mathcal{B}\) from Example 7.2.1. This \(\mathcal{B}\) is an orthogonal basis, but \(\norm{\vect{v}_{1}}=\sqrt{2}\) and \(\norm{\vect{v}_{2}}=\sqrt{6}\) so it is not orthonormal.
We can remedy this by considering the basis \(\mathcal{B}_{2}=\left\{\vect{u}_{1},\vect{u}_{2}\right\}\) where

\[ \vect{u}_{1}=\frac{\vect{v}_{1}}{\norm{\vect{v}_{1}}}=\frac{1}{\sqrt{2}}\vect{v}_{1}\quad\text{and}\quad\vect{u}_{2}=\frac{\vect{v}_{2}}{\norm{\vect{v}_{2}}}=\frac{1}{\sqrt{6}}\vect{v}_{2}. \]
This new basis \(\mathcal{B}_{2}\) is an orthonormal basis. We have kept the directions of \(\vect{v}_{1}\) and \(\vect{v}_{2}\), but we have made sure that their norms are now \(1\).
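For readers who like to experiment, the following NumPy sketch carries out this normalization for two orthogonal vectors; the vectors `v1` and `v2` are chosen for illustration only and are not necessarily those of Example 7.2.1.

```python
import numpy as np

# Two orthogonal (but not orthonormal) vectors, chosen for illustration.
v1 = np.array([1.0,  1.0, 0.0])
v2 = np.array([1.0, -1.0, 2.0])
assert np.isclose(v1 @ v2, 0.0)              # the pair is orthogonal

# Divide each vector by its norm to obtain an orthonormal basis of the span.
u1 = v1 / np.linalg.norm(v1)
u2 = v2 / np.linalg.norm(v2)

print(np.linalg.norm(u1), np.linalg.norm(u2))  # both norms are 1.0
print(u1 @ u2)                                 # still 0.0
```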
The essence of Theorem 7.2.1 is that it is easy to find the coordinates of any vector in a subspace \(V\) with respect to a given orthogonal basis of \(V\). In fact, this is largely why we are interested in such bases.
Let \(V\) be a subspace of \(\R^{n}\) and assume \(\vect{v}_{1},...,\vect{v}_{k}\) is an orthogonal basis for \(V\). Then any vector \(\vect{v}\) in \(V\) can be written as:

\[ \vect{v}=\frac{\vect{v}\ip\vect{v}_{1}}{\vect{v}_{1}\ip\vect{v}_{1}}\vect{v}_{1}+\cdots+\frac{\vect{v}\ip\vect{v}_{k}}{\vect{v}_{k}\ip\vect{v}_{k}}\vect{v}_{k}. \]
In particular, if \(\vect{v}_{1},...,\vect{v}_{k}\) is an orthonormal basis, then any \(\vect{v}\) in \(V\) can be written as:

\[ \vect{v}=(\vect{v}\ip\vect{v}_{1})\vect{v}_{1}+\cdots+(\vect{v}\ip\vect{v}_{k})\vect{v}_{k}. \]
Proof of Theorem 7.2.1
Since \(\vect{v}_{1},...,\vect{v}_{k}\) is a basis for \(V\) and \(\vect{v}\) is in \(V\), there are scalars \(c_{1},...,c_{k}\) such that \(\vect{v}=c_{1}\vect{v}_{1}+\cdots +c_{k}\vect{v}_{k}\). We only have to show that these scalars are as claimed. For any \(j\) between \(1\) and \(k\),

\[ \vect{v}\ip\vect{v}_{j}=(c_{1}\vect{v}_{1}+\cdots +c_{k}\vect{v}_{k})\ip\vect{v}_{j}=c_{j}(\vect{v}_{j}\ip\vect{v}_{j}) \]
by the orthogonality of \(\left\{\vect{v}_{1},...,\vect{v}_{k}\right\}\). This implies \(c_{j}=\frac{\vect{v}\ip\vect{v}_{j}}{\vect{v}_{j}\ip\vect{v}_{j}}\) as claimed.
If \(\vect{v}_{1},...,\vect{v}_{k}\) is orthonormal, then \(\vect{v}_{j}\ip\vect{v}_{j}=1\) for every \(j\), so this reduces to \(c_{j}=\vect{v}\ip\vect{v}_{j}\).
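The following NumPy sketch illustrates Theorem 7.2.1 on an illustrative orthogonal basis: the coordinates of a vector in the span are recovered directly from the quotients \(\frac{\vect{v}\ip\vect{v}_{j}}{\vect{v}_{j}\ip\vect{v}_{j}}\).

```python
import numpy as np

# Orthogonal basis of a subspace V of R^3 (illustrative choice).
v1 = np.array([1.0,  1.0, 0.0])
v2 = np.array([1.0, -1.0, 2.0])

# A vector that is known to lie in V = span{v1, v2}.
v = 3 * v1 - 2 * v2

# Coordinates with respect to the orthogonal basis, via Theorem 7.2.1.
c1 = (v @ v1) / (v1 @ v1)
c2 = (v @ v2) / (v2 @ v2)

print(c1, c2)                                  # 3.0 and -2.0
assert np.allclose(v, c1 * v1 + c2 * v2)
```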
In this theorem, it is vital that \(\vect{v}\) is known to be in \(V\). If \(\vect{v}\) is not in \(V\), then it cannot be expressed as a linear combination of basis elements of \(V\) at all. However, the right-hand side appearing in Theorem 7.2.1 is still very important: it returns in Theorem 7.2.2.
7.2.2. Orthogonal Projections Revisited#
In Section 3.3.1, we already briefly touched upon orthogonal projections in higher dimensions. Now that we know about orthogonal bases, we can make this more concrete. Let us start with a general definition of the orthogonal projection.
Let \(V\) be a subspace of \(\R^{n}\), let \(\vect{u}\) be a vector in \(\R^{n}\), and let \(\vect{u}=\vect{u}_{V}+\vect{u}_{V^{\bot}}\) be the orthogonal decomposition of \(\vect{u}\) with respect to \(V\) as defined in Proposition 7.1.5. We call \(\vect{u}_{V}\) the orthogonal projection of \(\vect{u}\) on \(V\).
We now establish the following useful facts about the orthogonal projection. Of particular interest is iii., which states in essence that \(\vect{u}_{V}\) is the best approximation of \(\vect{u}\) by a vector from \(V\) or, in other words, that the projection of \(\vect{u}\) onto \(V\) is the point in \(V\) which is closest to \(\vect{u}\).
Let \(V\) be a subspace of \(\R^{n}\) and let \(\vect{u}\) be an arbitrary vector in \(\R^{n}\) with orthogonal decomposition \(\vect{u}=\vect{u}_{V}+\vect{u}_{V^{\bot}}\). Then:
i. \(\norm{\vect{u}}\geq \norm{\vect{u}_{V}}\).

ii. \(\vect{u}\ip\vect{u}_{V}\geq 0\) and \(\vect{u}\ip\vect{u}_{V}=0\) precisely when \(\vect{u}\) is in \(V^{\bot}\).

iii. For any \(\vect{v}\) in \(V\), \(\norm{\vect{u}-\vect{u}_{V}}\leq \norm{\vect{u}-\vect{v}}\).
Proof of Proposition 7.2.2
Recall that the inner product of any vector with itself is non-negative and that \(\vect{u}_{V}\ip\vect{u}_{V^{\bot}}=0\).
i. We find:

\[\begin{split} \begin{align*} \norm{\vect{u}}&=\sqrt{\vect{u}\ip\vect{u}}=\sqrt{(\vect{u}_{V}+\vect{u}_{V^{\bot}})\ip(\vect{u}_{V}+\vect{u}_{V^{\bot}})}\\ &=\sqrt{\vect{u}_{V}\ip\vect{u}_{V}+2\vect{u}_{V}\ip\vect{u}_{V^{\bot}}+\vect{u}_{V^{\bot}}\ip\vect{u}_{V^{\bot}}}=\sqrt{\vect{u}_{V}\ip\vect{u}_{V}+\vect{u}_{V^{\bot}}\ip\vect{u}_{V^{\bot}}}\\ &\geq\sqrt{\vect{u}_{V}\ip\vect{u}_{V}}=\norm{\vect{u}_{V}}. \end{align*} \end{split}\]

ii. We have:

\[\begin{split} \begin{align*} \vect{u}\ip\vect{u}_{V}&=(\vect{u}_{V}+\vect{u}_{V^{\bot}})\ip\vect{u}_{V}\\ &=\vect{u}_{V}\ip\vect{u}_{V}\geq 0. \end{align*} \end{split}\]

Furthermore, \(\vect{u}\ip\vect{u}_{V}=\vect{u}_{V}\ip\vect{u}_{V}=0\) implies \(\vect{u}_{V}=\vect{0}\), so \(\vect{u}=\vect{u}_{V^{\bot}}\), which is in \(V^{\bot}\). Conversely, if \(\vect{u}\) is in \(V^{\bot}\), then \(\vect{u}_{V}=\vect{0}\) and consequently \(\vect{u}\ip\vect{u}_{V}=0\).

iii. For arbitrary \(\vect{v}\) in \(V\), \(\vect{u}_{V}-\vect{v}\) is in \(V\). As \(\vect{u}_{V^{\bot}}\) is in \(V^{\bot}\), this implies \((\vect{u}_{V}-\vect{v})\ip \vect{u}_{V^{\bot}}=0\). Therefore,

\[\begin{split} \begin{align*} \norm{\vect{u}-\vect{v}}&=\sqrt{(\vect{u}_{V}+\vect{u}_{V^{\bot}}-\vect{v})\ip(\vect{u}_{V}+\vect{u}_{V^{\bot}}-\vect{v})}\\ &=\sqrt{(\vect{u}_{V}-\vect{v})\ip(\vect{u}_{V}-\vect{v})+\vect{u}_{V^{\bot}}\ip\vect{u}_{V^{\bot}}}\\ &\geq\sqrt{\vect{u}_{V^{\bot}}\ip\vect{u}_{V^{\bot}}}=\norm{\vect{u}-\vect{u}_{V}}. \end{align*} \end{split}\]
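As a quick numerical illustration of iii., here is a sketch assuming NumPy, with \(V\) taken to be the \(xy\)-plane in \(\R^{3}\), so that the orthogonal projection simply drops the \(z\)-coordinate.

```python
import numpy as np

# Take V to be the xy-plane in R^3; then u_V just zeroes out the z-coordinate.
u   = np.array([2.0, -1.0, 3.0])
u_V = np.array([2.0, -1.0, 0.0])

# Any other vector v in V is at least as far away from u as u_V is.
rng = np.random.default_rng(0)
for _ in range(1000):
    v = np.append(rng.normal(size=2), 0.0)   # a random vector in V
    assert np.linalg.norm(u - u_V) <= np.linalg.norm(u - v)
```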
Naturally, we want to know how to find such an orthogonal projection. If we have an orthogonal basis for \(V\), there turns out to be a convenient way to compute it, as per Theorem 7.2.2.
Suppose \(V\) is a subspace of \(\R^{n}\) with orthogonal basis \(\vect{v}_{1},...,\vect{v}_{k}\) and let \(\vect{u}\) be a vector in \(\R^{n}\). Then

\[ \vect{u}_{V}=\frac{\vect{u}\ip\vect{v}_{1}}{\vect{v}_{1}\ip\vect{v}_{1}}\vect{v}_{1}+\cdots+\frac{\vect{u}\ip\vect{v}_{k}}{\vect{v}_{k}\ip\vect{v}_{k}}\vect{v}_{k}. \]
Proof of Theorem 7.2.2
Put

\[ \vect{w}=\frac{\vect{u}\ip\vect{v}_{1}}{\vect{v}_{1}\ip\vect{v}_{1}}\vect{v}_{1}+\cdots+\frac{\vect{u}\ip\vect{v}_{k}}{\vect{v}_{k}\ip\vect{v}_{k}}\vect{v}_{k}. \]
Since all the \(\vect{v}_{i}\)’s are in \(V\), so is \(\vect{w}\). It suffices to show that \(\vect{u}-\vect{w}\) is in \(V^{\bot}\), because then \(\vect{u}=\vect{w}+(\vect{u}-\vect{w})\) must be the decomposition as in Proposition 7.1.5.
To prove this, we check that \(\vect{u}-\vect{w}\) is orthogonal to all the \(\vect{v}_{i}\)’s, which form a basis of \(V\). This follows readily:

\[ (\vect{u}-\vect{w})\ip\vect{v}_{i}=\vect{u}\ip\vect{v}_{i}-\frac{\vect{u}\ip\vect{v}_{i}}{\vect{v}_{i}\ip\vect{v}_{i}}(\vect{v}_{i}\ip\vect{v}_{i})=\vect{u}\ip\vect{v}_{i}-\vect{u}\ip\vect{v}_{i}=0, \]

since \(\vect{v}_{j}\ip\vect{v}_{i}=0\) for all \(j\neq i\).
It is worthwhile to compare this result to the formula for the projection of one vector on another given in Proposition 1.2.3. What Theorem 7.2.2 states is essentially this: if \(V\) has an orthogonal basis \(\vect{v}_{1},...,\vect{v}_{k}\), then the projection of any vector \(\vect{u}\) onto \(V\) is the sum of the projections of \(\vect{u}\) onto the \(\vect{v}_{i}\)’s. This is illustrated in Figure 7.2.1.
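In code, the sum-of-projections formula of Theorem 7.2.2 might look as follows; this is a NumPy sketch with an illustrative orthogonal basis, and the helper `project` is ours, not part of the text.

```python
import numpy as np

# Orthogonal basis of a subspace V of R^3 (illustrative choice).
v1 = np.array([1.0,  1.0, 0.0])
v2 = np.array([1.0, -1.0, 2.0])

def project(u, basis):
    """Orthogonal projection of u onto span(basis), for an orthogonal basis."""
    return sum(((u @ v) / (v @ v)) * v for v in basis)

u = np.array([1.0, 2.0, 3.0])
u_V = project(u, [v1, v2])
print(u_V)

# u - u_V is orthogonal to V, exactly as the proof of Theorem 7.2.2 requires.
assert np.isclose((u - u_V) @ v1, 0.0)
assert np.isclose((u - u_V) @ v2, 0.0)
```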
Let us revisit the plane \(\mathcal{P}\) with orthogonal basis \(\mathcal{B}=\left\{\vect{v}_{1},\vect{v}_{2}\right\}\) from Example 7.2.1, i.e.
We find \(\vect{u}\ip\vect{v}_{1}=-2,\vect{u}\ip\vect{v}_{2}=-4,\) and \(\vect{v}_{1}\ip\vect{v}_{1}=2,\vect{v}_{2}\ip\vect{v}_{2}=6\). Consequently,
is the orthogonal projection of \(\vect{u}\) on \(\mathcal{P}\).
If \(V\) is a subspace of \(\R^{n}\), then

\[ T:\R^{n}\to\R^{n},\quad \vect{u}\mapsto\vect{u}_{V} \]

is a linear transformation. It is called the orthogonal projection on \(V\). The standard matrix of this transformation is the matrix for which the \(i\)-th column is:

\[ T(\vect{e}_{i})=\frac{\vect{e}_{i}\ip\vect{v}_{1}}{\vect{v}_{1}\ip\vect{v}_{1}}\vect{v}_{1}+\cdots+\frac{\vect{e}_{i}\ip\vect{v}_{k}}{\vect{v}_{k}\ip\vect{v}_{k}}\vect{v}_{k}. \]
Here the \(\vect{v}_{1},...,\vect{v}_{k}\) are an arbitrary orthogonal basis for \(V\).
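A possible way to assemble this standard matrix numerically is sketched below, assuming NumPy and an illustrative orthogonal basis; summing the rank-one matrices \(\vect{v}_{i}\vect{v}_{i}^{T}/(\vect{v}_{i}\ip\vect{v}_{i})\) amounts to computing the columns \(T(\vect{e}_{i})\) one by one.

```python
import numpy as np

# Orthogonal basis of a subspace V of R^3 (illustrative choice).
v1 = np.array([1.0,  1.0, 0.0])
v2 = np.array([1.0, -1.0, 2.0])

# Standard matrix of the orthogonal projection onto V: the i-th column is the
# projection of the i-th standard basis vector, which is the same as summing
# the rank-one matrices v v^T / (v.v).
P = sum(np.outer(v, v) / (v @ v) for v in (v1, v2))
print(P)

# A projection matrix onto a subspace satisfies P P = P and P^T = P.
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)
```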
Let us once more consider Example 7.2.1 and let us find the standard matrix corresponding to the linear transformation \(T:\R^{3}\to\R^{3}, \vect{u}\mapsto\vect{u}_{V}.\) In Example 7.2.1, we already found that \(\vect{v}_{1}\ip\vect{v}_{1}=2\) and \(\vect{v}_{2}\ip\vect{v}_{2}=6\). Standard computations yield:
so the first column of the standard matrix will be:
Similarly, we find
so the second column of the standard matrix will be:
Finally,
so the last column of the standard matrix will be:
Let us verify that, for the vector \(\vect{u}\) from Example 7.2.3, we do indeed get the right answer:
7.2.3. Orthogonal Matrices#
Square matrices for which the columns are orthonormal turn out to be of particular importance. For instance, they turn up in numerical linear algebra, where using them can speed up certain computations considerably.
We call a square matrix orthogonal if its columns form an orthonormal set.
A matrix for which the columns are orthogonal is not necessarily an orthogonal matrix! It is vital that the columns are orthonormal. The terminology is somewhat confusing, but it has become standard.
Let us consider some examples and non-examples.
i. The identity matrix \(I_{n}\) is an orthogonal matrix for any \(n\).

ii. The matrix

\[\begin{split} A=\begin{bmatrix} 1&1\\ 1&-1 \end{bmatrix} \end{split}\]

is not orthogonal. Its columns are pairwise orthogonal, but neither column has norm \(1\). Indeed, the norm of both columns is \(\sqrt{2}\).

iii. If we take the matrix from ii. and divide both columns by their norms, we obtain:

\[\begin{split} B=\begin{bmatrix} \frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}} \end{bmatrix}. \end{split}\]

This matrix really is orthogonal.
What we did in going from ii. to iii. works in general: if we have a matrix \(A\) with orthogonal columns, we can turn it into an orthogonal matrix \(B\) by dividing every column by its norm. Under one condition, though: none of the columns of \(A\) may be the zero vector, for then we would need to divide by \(0\), which is impossible.
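This normalization step is easy to carry out numerically; the following NumPy sketch reproduces the passage from ii. to iii.

```python
import numpy as np

# A matrix with pairwise orthogonal, nonzero columns...
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])

# ...becomes an orthogonal matrix once each column is divided by its norm.
B = A / np.linalg.norm(A, axis=0)
print(B)

assert np.allclose(B.T @ B, np.eye(2))
```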
An \(n\times n\)-matrix \(A\) is orthogonal if and only if \(A^{T}A=I_{n}\).
Proof of Proposition 7.2.3
Let \(\vect{v}_{1},\vect{v}_{2},...,\vect{v}_{n}\) be the columns of \(A\), so \(\vect{v}_{1}^{T},\vect{v}_{2}^{T},...,\vect{v}_{n}^{T}\) are the rows of \(A^{T}\). Consequently,

\[\begin{split} A^{T}A=\begin{bmatrix} \vect{v}_{1}\ip\vect{v}_{1}&\vect{v}_{1}\ip\vect{v}_{2}&\cdots&\vect{v}_{1}\ip\vect{v}_{n}\\ \vect{v}_{2}\ip\vect{v}_{1}&\vect{v}_{2}\ip\vect{v}_{2}&\cdots&\vect{v}_{2}\ip\vect{v}_{n}\\ \vdots&\vdots&\ddots&\vdots\\ \vect{v}_{n}\ip\vect{v}_{1}&\vect{v}_{n}\ip\vect{v}_{2}&\cdots&\vect{v}_{n}\ip\vect{v}_{n} \end{bmatrix}. \end{split}\]
The matrix on the right-hand side is \(I_{n}\) if and only if all diagonal entries are \(1\) and all off-diagonal entries are \(0\). This happens precisely when

\[ \vect{v}_{i}\ip\vect{v}_{i}=1\quad\text{for all }i\quad\text{and}\quad\vect{v}_{i}\ip\vect{v}_{j}=0\quad\text{for all }i\neq j, \]
that is, when \(\left\{\vect{v}_{1},\vect{v}_{2},...,\vect{v}_{n}\right\}\) is an orthonormal set.
A square matrix \(A\) is orthogonal if and only if \(A^{T}=A^{-1}\).
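Proposition 7.2.3 and Corollary 7.2.1 also give a convenient numerical test for orthogonality; here is a minimal NumPy sketch using the \(2\times 2\) matrix \(B\) from the example above.

```python
import numpy as np

# The normalized 2x2 matrix B from the example above.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

# Orthogonality can be tested via Proposition 7.2.3 ...
assert np.allclose(B.T @ B, np.eye(2))
# ... and the transpose then coincides with the inverse (Corollary 7.2.1).
assert np.allclose(B.T, np.linalg.inv(B))
```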
The main reason orthogonal matrices are so useful is that they preserve lengths and angles. That this is so is shown in Proposition 7.2.4.
Let \(A\) be an orthogonal \(n\times n\)-matrix and let \(\vect{v}_{1},\vect{v}_{2}\) be arbitrary vectors in \(\R^{n}\). Then:
i. \((A\vect{v}_{1})\ip(A\vect{v}_{2})=\vect{v}_{1}\ip\vect{v}_{2}\),

ii. \(\norm{A\vect{v}_{1}}=\norm{\vect{v}_{1}}\),

iii. \(\angle(A\vect{v}_{1},A\vect{v}_{2})=\angle(\vect{v}_{1},\vect{v}_{2})\).
Proof of Proposition 7.2.4
Using \(A^{T}A=I_{n}\), we find:

\[ (A\vect{v}_{1})\ip(A\vect{v}_{2})=(A\vect{v}_{1})^{T}(A\vect{v}_{2})=\vect{v}_{1}^{T}A^{T}A\vect{v}_{2}=\vect{v}_{1}^{T}\vect{v}_{2}=\vect{v}_{1}\ip\vect{v}_{2}, \]
which establishes i. The other points are direct consequences of i.; we leave their proofs to the reader.
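The three properties can also be checked numerically; the sketch below uses a rotation matrix as an illustrative orthogonal matrix, and the helper `angle` is ours, not part of the text.

```python
import numpy as np

# An orthogonal 2x2 matrix (a rotation, for illustration).
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v1 = np.array([3.0, -1.0])
v2 = np.array([2.0,  5.0])

def angle(x, y):
    """Angle between two nonzero vectors."""
    return np.arccos((x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# i. inner products, ii. norms and iii. angles are all preserved by A.
assert np.isclose((A @ v1) @ (A @ v2), v1 @ v2)
assert np.isclose(np.linalg.norm(A @ v1), np.linalg.norm(v1))
assert np.isclose(angle(A @ v1, A @ v2), angle(v1, v2))
```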
Many statements about orthogonal matrices still hold for non-square matrices, as long as the columns form an orthonormal set. Both Proposition 7.2.3 and Proposition 7.2.4 remain precisely the same, with the same proof, for an \(m\times n\) matrix \(A\). Corollary 7.2.1, however, does not hold for non-square matrices, since a non-square matrix cannot have an inverse.
You could of course also consider matrices for which the rows are orthonormal. It turns out, however, that this yields the exact same concept.
An \(n\times n\)-matrix \(A\) is orthogonal if and only if its rows are orthonormal.
Proof of Proposition 7.2.5
We know that \(A\) is orthogonal if and only if \(A^{T}A=I_{n}\). But this implies \(A^{T}=A^{-1}\) and therefore also \(AA^{T}=I_{n}\). Since \((A^{T})^{T}A^{T}=AA^{T}=I_{n}\), \(A^{T}\) must be orthogonal by Proposition 7.2.3. Hence the columns of \(A^{T}\), which are the rows of \(A\), must be orthonormal.
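A short NumPy check of this equivalence, using the \(2\times 2\) orthogonal matrix from before:

```python
import numpy as np

# For a square matrix with orthonormal columns, the rows are orthonormal too.
B = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)

assert np.allclose(B.T @ B, np.eye(2))   # columns are orthonormal
assert np.allclose(B @ B.T, np.eye(2))   # and so are the rows
```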
7.2.4. Grasple Exercises#
Orthogonal basis and scalar multiplication.
Extending a vector in \(\R^3\) to an orthogonal basis for \(\R^3\).
Extending a set of 2 orthogonal vectors in \(\R^4\) to an orthogonal basis of \(\R^4\).
Matrix of projection onto plane in \(\R^3\) with an orthogonal basis
Projection onto a 2-dimensional subspace of \(\R^4\) with orthogonal basis
Projection onto the null space of a matrix
Projection formula in case of a non-orthogonal basis?
Alternative definition of an orthogonal matrix?
Ponderings about orthogonal matrices