8.3. Singular Value Decomposition (SVD)#

We have already seen several ways to factorise matrices. In the section on the \(LU\) decomposition we studied the \(LU\) and the \(PLU\) factorisations, and in Section 7.3.3 we introduced the QR decomposition. In Section 8.1 we showed that every symmetric (square) matrix \(A\) can be written as \(A = QDQ^{-1} = QDQ^T\). In this section it is, in a sense, this last decomposition that we will generalize to non-symmetric, and even to non-square, matrices: we will introduce and study the so-called singular value decomposition (SVD) of a matrix. In the first subsection (Section 8.3.1) we give the definition of the SVD and illustrate it with a few examples. In the second subsection (Section 8.3.2) an algorithm to compute the SVD is presented and illustrated, and it is shown that this algorithm always yields a proper SVD. The last two subsections are devoted to understanding the SVD in a geometric way, and to possible practical uses of the SVD.

8.3.1. Definition of the Singular Value Decomposition#

Let \(A\) be an \(m\times n\) matrix, and let \(p\) be the minimum of \(m\) and \(n\).

Definition 8.3.1

A singular value decomposition of \(A\) is a decomposition

\[ A = U\Sigma V^T, \]

where

  • \(U\) is an \(m\times m\) orthogonal matrix,

  • \(V\) is an \(n \times n\) orthogonal matrix,

  • \(\Sigma\) is an \(m\times n\) matrix which is zero everywhere, apart from the entries \(\Sigma_{ii} = \sigma_i\), \(i = 1,\ldots , p\), which are all \(\geq 0\), and in decreasing order. That is, \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p\).

The nonnegative numbers \(\sigma_i\) are called the singular values of \(A\).

So the decomposition will look like either

(8.3.1)#\[\begin{split}U\begin{bmatrix} \sigma_1 & 0 & \cdots & \cdots& 0 & 0& \cdots& 0\\ 0 & \sigma_2 & \ddots & & \vdots& 0& \cdots& 0 \\ \vdots & \ddots & \ddots & \ddots &\vdots & \vdots & & \vdots\\ \vdots & & \ddots & \ddots & 0 &\vdots & & \vdots\\ 0 & \cdots & \cdots & 0 & \sigma_p & 0& \cdots& 0\\ \end{bmatrix} V^T, \quad m \leq n\end{split}\]

or

(8.3.2)#\[\begin{split}U\begin{bmatrix} \sigma_1 & 0 & \cdots & \cdots& 0 \\ 0 & \sigma_2 & \ddots & & \vdots \\ \vdots & \ddots & \ddots & \ddots &\vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & \sigma_p \\ & & & & \\ 0 & 0 & \cdots & \cdots & 0 \\ \vdots & \vdots & & \ &\vdots \\ 0 & 0 & \cdots & \cdots & 0 \\ \end{bmatrix} V^T, \quad m > n.\end{split}\]

Note that the singular values are only required to be nonnegative, so some of the (last) singular values may be \(0\).
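For readers who like to experiment on a computer, such a decomposition can be computed with NumPy. The following sketch (our own illustration, not part of the pen-and-paper development of this section) uses `numpy.linalg.svd` with `full_matrices=True`, so that \(U\) and \(V\) have the square sizes required by the definition; the matrix is the one of Example 8.3.1 below.

```python
import numpy as np

# The 3x2 matrix of Example 8.3.1 below (m > n, so Sigma gets a zero row).
A = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [3.0, 1.0]])

# full_matrices=True returns U as an m x m and Vt as an n x n orthogonal matrix;
# s contains the singular values in non-increasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)     # embed the singular values in an m x n matrix

print(s)                                 # approximately [4.899, 2.0] = [sqrt(24), 2]
print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))   # True: U is orthogonal
```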

Example 8.3.1

For the matrix \(A = \begin{bmatrix}1&3\\2&2\\3&1 \end{bmatrix}\) a singular value decomposition is given by \(A = U\Sigma V^T\) where

\[\begin{split} U = \begin{bmatrix}\dfrac{1}{\sqrt{3}}&-\dfrac{1}{\sqrt{2}}&-\dfrac{1}{\sqrt{6}}\\ \dfrac{1}{\sqrt{3}}&0&\dfrac{2}{\sqrt{6}}\\\dfrac{1}{\sqrt{3}}&\dfrac{1}{\sqrt{2}}&-\dfrac{1}{\sqrt{6}} \end{bmatrix}, \quad \Sigma = \begin{bmatrix}\sqrt{24}&0\\0&2\\0&0 \end{bmatrix}, \quad V = \begin{bmatrix}\dfrac1{\sqrt{2}} &\dfrac1{\sqrt{2}} \\ \dfrac1{\sqrt{2}} & -\dfrac1{\sqrt{2}} \end{bmatrix}. \end{split}\]

Let us point out a few properties of this decomposition.

  • The matrix \(A\) has size \(3\times2\), so \(\Sigma\) is of the form depicted in (8.3.2).

  • The third column of \(U\) does not really play a role in the product, since its entries are only ever multiplied by the zeros in the last row of \(\Sigma\).
    We can write this SVD in a more ‘economic’ form by leaving out the third column of \(U\) and the third row of \(\Sigma\):


    \[\begin{split}A = \begin{bmatrix}\dfrac{1}{\sqrt{3}}&-\dfrac{1}{\sqrt{2}}\\ \dfrac{1}{\sqrt{3}}&0\\ \dfrac{1}{\sqrt{3}}&\dfrac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix}\sqrt{24}&0\\0&2\end{bmatrix} \begin{bmatrix}\dfrac1{\sqrt{2}} &\dfrac1{\sqrt{2}} \\ \dfrac1{\sqrt{2}} & -\dfrac1{\sqrt{2}} \end{bmatrix}^T\end{split}\]

  • The first two columns of \(U\), multiples of the vectors \(\begin{bmatrix}1\\1\\1\end{bmatrix}\) and \(\begin{bmatrix}-1\\0\\1\end{bmatrix}\), give an orthonormal basis of the column space of the matrix \(A\). Indeed, \(\begin{bmatrix}1\\1\\1\end{bmatrix} = \dfrac14\begin{bmatrix}1\\2\\3\end{bmatrix} +\dfrac14\begin{bmatrix}3\\2\\1\end{bmatrix}= \frac14\vect{a}_1 + \frac14\vect{a}_2\), and \(\begin{bmatrix}-1\\0\\1\end{bmatrix} = \frac12\vect{a}_1 - \frac12\vect{a}_2\).

  • The two columns of the matrix \(V\) give an orthonormal basis of the row space of the matrix \(A\). (Which is not so striking here, since that row space is the whole of \(\R^2\).)

  • The number of nonzero singular values is two, which is equal to the number of independent columns (and also rows) of \(A\), which is the rank of \(A\).

  • The decomposition can be rewritten in a way analogous to the spectral decomposition of Theorem 8.1.2.


    \[\begin{split} \begin{array}{ccl} A &=& \sigma_1 \mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T \\ &=& \sqrt{24} \begin{bmatrix}\dfrac{1}{\sqrt{3}} \\ \dfrac{1}{\sqrt{3}} \\ \dfrac{1}{\sqrt{3}}\end{bmatrix} \begin{bmatrix}\dfrac{1}{\sqrt{2}} & \dfrac{1}{\sqrt{2}}\end{bmatrix} + 2\begin{bmatrix}-\dfrac{1}{\sqrt{2}} \\0\\ \dfrac{1}{\sqrt{2}}\end{bmatrix} \begin{bmatrix}\dfrac{1}{\sqrt{2}} & -\dfrac{1}{\sqrt{2}}\end{bmatrix} \end{array}. \end{split}\]

    With the spectral decomposition (see Theorem 8.1.2) we found that any symmetric matrix \(A\) can be written as a linear combination of rank one matrices \(P_i\). Moreover, these matrices \(P_i\) can be interpreted as projections onto orthogonal one-dimensional subspaces of \(\R^n\). Here the least we can say is that we have written the matrix \(A\) of rank 2 as a linear combination of two rank 1 matrices.

The properties mentioned in Example 8.3.1 hold more generally. We collect a few (and add a fourth) in the next proposition.

Proposition 8.3.1 (Basic properties of a singular value decomposition)

Suppose \(A = U\Sigma V^T\), with \(U, \Sigma, V\) as in the definition.

  1. The number \(r\) of nonzero singular values is equal to the rank of \(A\).

  2. The first \(r\) columns of \(U\) give an orthonormal basis for the column space of \(A\).

  3. The first \(r\) columns of \(V\) give an orthonormal basis for the row space of \(A\).

  4. A singular value decomposition of the matrix \(A^T\) is given by \(A^T = V\Sigma^TU^T\).
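These properties are easy to check numerically. The sketch below (our own, with \(A\) again the matrix of Example 8.3.1 and an arbitrary tolerance for ‘nonzero’) verifies all four with NumPy.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [3.0, 1.0]])       # the matrix of Example 8.3.1; its rank is 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))       # number of 'nonzero' singular values

# (1) r equals the rank of A
print(r == np.linalg.matrix_rank(A))          # True

# (2) the first r columns of U span Col A: projecting A onto them changes nothing
P = U[:, :r] @ U[:, :r].T
print(np.allclose(P @ A, A))                  # True

# (3) the first r columns of V span Row A: projecting the rows changes nothing
Q = Vt[:r, :].T @ Vt[:r, :]
print(np.allclose(A @ Q, A))                  # True

# (4) the singular values of A^T are the same as those of A
print(np.allclose(s, np.linalg.svd(A.T, compute_uv=False)))   # True
```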

The main goal is to show that a singular value decomposition always exists and how to construct one. Both questions will be answered, in reverse order, in the next subsection.

8.3.2. Existence of a Singular Value Decomposition#

We start with an important observation that explains the central role of the matrix \(A^TA\) in the algorithm to come.

Proposition 8.3.2 (Computing the Singular Values)

Let \(A\) be an \(m\times n\) matrix.
The singular values of a matrix \(A\) are the square roots of the (nonnegative!) eigenvalues of the \((n \times n)\) matrix \(A^TA\). Thus, if \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\) are the eigenvalues of \(A^TA\), then \(\sigma_i = \sqrt{\lambda_i}\) are the singular values of \(A\).

More specifically, if \(A\) is an \(m\times n\) matrix with the singular value decomposition \(A = U\Sigma V^T\), then the ‘diagonal’ elements \(\Sigma_{ii}\) of \(\Sigma\), i.e., the singular values \( \sigma_i\), are the square roots of the eigenvalues \(\lambda_i\) of the matrix \(A^TA\).
Moreover, the columns of \(V\) are corresponding eigenvectors (of \(A^TA\)).
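As a quick numerical illustration (a sketch of our own; the matrix is the one of Example 8.3.2 below):

```python
import numpy as np

A = np.array([[ 5.0, -1.0],
              [-3.0,  2.0],
              [-1.0,  3.0]])               # the matrix of Example 8.3.2 below

# eigenvalues of A^T A, sorted in decreasing order
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(lam)                                  # approximately [42., 7.]

# the singular values of A are their square roots
print(np.sqrt(lam))
print(np.linalg.svd(A, compute_uv=False))   # the same numbers: [sqrt(42), sqrt(7)]
```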

We are now ready to present the algorithm, to be followed by examples and some (theoretical) considerations. Suppose \(A\) is an \(m\times n\) matrix of rank \(r\). (The rank, as we have seen in Proposition 8.3.1, is the number of nonzero singular values.)

Algorithm 8.3.1

  1. Compute \(A^TA\).

  2. Find the eigenvalues \(\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n\) of \(A^TA\).

  3. Construct the \(m\times n\) matrix \(\Sigma\), putting zeros on every position except on the main ‘diagonal’, where \(\Sigma_{ii}=\sigma_i = \sqrt{\lambda_i} \).

  4. Compute a complete set of orthonormal eigenvectors \(\mathbf{v}_1,\dots,\mathbf{v}_n\), corresponding to \(\lambda_1, \ldots, \lambda_n\), and take them as columns in the matrix \(V\).

  5. Compute \(\mathbf{u}_i = \dfrac{1}{\sigma_i}A\mathbf{v}_i\), for \(i=1,\dots,r\), where \(r\) is the number of nonzero singular values. If \(r < m\) extend the set \(\{\mathbf{u}_1,\dots,\mathbf{u}_r\}\) to an orthonormal basis \(\{\mathbf{u}_1, \dots ,\mathbf{u}_m\}\) of \(\mathbb{R}^m\).

  6. Construct the \(m\times m\) matrix \(U\) with columns \(\mathbf{u}_1\), \(\dots\), \(\mathbf{u}_m\).
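For readers who want to check their pen-and-paper work by computer, the steps translate almost line by line into NumPy. The sketch below mirrors the algorithm (the function name `svd_via_gram_matrix` and the QR-based basis extension in step 5 are our own choices; it is not a numerically robust routine, for which one would simply call `np.linalg.svd`).

```python
import numpy as np

def svd_via_gram_matrix(A, tol=1e-12):
    """SVD following the steps of Algorithm 8.3.1 (illustration only)."""
    m, n = A.shape
    # Steps 1-2: eigenvalues and orthonormal eigenvectors of the symmetric
    # matrix A^T A, sorted so that lambda_1 >= ... >= lambda_n.
    lam, W = np.linalg.eigh(A.T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], W[:, order]
    sigma = np.sqrt(np.clip(lam, 0.0, None))   # clip tiny negatives from round-off
    # Step 3: the m x n matrix Sigma with the sigma_i on its 'diagonal'.
    Sigma = np.zeros((m, n))
    np.fill_diagonal(Sigma, sigma)
    # Step 5: u_i = (1/sigma_i) A v_i for the r nonzero singular values ...
    r = int(np.sum(sigma > tol))
    U = np.zeros((m, m))
    U[:, :r] = A @ V[:, :r] / sigma[:r]
    # ... and, if r < m, extend to an orthonormal basis of R^m
    # (here via a QR factorisation of [U_r | I], a convenient choice).
    if r < m:
        Q, _ = np.linalg.qr(np.hstack([U[:, :r], np.eye(m)]))
        U[:, r:] = Q[:, r:m]
    # Step 6: U is now the m x m matrix with columns u_1, ..., u_m.
    return U, Sigma, V

A = np.array([[5.0, -1.0], [-3.0, 2.0], [-1.0, 3.0]])   # the matrix of Example 8.3.2
U, Sigma, V = svd_via_gram_matrix(A)
print(np.allclose(A, U @ Sigma @ V.T))                  # True
print(np.round(np.diag(Sigma)**2))                      # [42. 7.]
```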

Apart from step 2, where we need the eigenvalues of the \(n\times n\) matrix \(A^TA\), every step can be worked out with pen and paper (though steps 4 and 5 can be terribly error-prone).
The step that, we think, most needs some explanation is step 5: why does it lead to an orthonormal set of vectors \(\{\mathbf{u}_1,\dots,\mathbf{u}_r\}\)? We will show that it indeed does in the proof of Theorem 8.3.1. It is time for an example first (no nice numbers though!).

Example 8.3.2

We will find a singular value decomposition of the matrix \( A = \begin{bmatrix} 5 & -1 \\ -3 & 2 \\ -1 & 3 \end{bmatrix}. \)

We follow the steps of the algorithm.

  1. We first compute \(A^TA = \begin{bmatrix} 35 & -14 \\ -14 & 14 \end{bmatrix} \).

  2. The eigenvalues \(\lambda_1 \ge \lambda_2\) of \(A^TA\) are given by \(\lambda_1 = 42\), \(\lambda_2 = 7\).
    This gives us the singular values \(\sigma_1 = \sqrt{42}\), and \(\sigma_2=\sqrt{7}\).

  3. Our matrix \(\Sigma\) becomes \(\Sigma = \begin{bmatrix} \sqrt{42} & 0 \\ 0 & \sqrt{7} \\ 0 & 0 \end{bmatrix}\).

  4. We have to find the eigenvectors of \(A^TA\) for \(\lambda_1 = 42\), \(\lambda_2 = 7\).
    Skipping the computations we find \(\mathbf{w}_1 = \begin{bmatrix} 2\\-1 \end{bmatrix}\) and \(\mathbf{w}_2 = \begin{bmatrix} 1\\2 \end{bmatrix}\).
    Normalizing and putting them in a matrix gives \(V = \begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\[.5ex] -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}\).

  5. We compute
    \(\vect{u}_1 = \dfrac{1}{\sigma_1}A\vect{v}_1 = \dfrac{1}{\sqrt{42}}\times\dfrac{1}{\sqrt{5}} \begin{bmatrix} 5 & -1 \\ -3 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 2\\-1 \end{bmatrix} = \dfrac{1}{\sqrt{210}}\begin{bmatrix}11\\-8\\-5 \end{bmatrix}\)
    and
    \(\vect{u}_2 = \dfrac{1}{\sigma_2}A\vect{v}_2 = \dfrac{1}{\sqrt{7}}\times\dfrac{1}{\sqrt{5}}\begin{bmatrix} 5 & -1 \\ -3 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 1\\2 \end{bmatrix} = \dfrac{1}{\sqrt{35}}\begin{bmatrix}3\\1\\5 \end{bmatrix}\).

Note that ‘magically’ \(\{\mathbf{u}_1, \mathbf{u}_2\}\) is indeed an orthonormal set!

We have to extend this to an orthonormal basis of \(\R^3\). For this low dimensional problem we can use the cross product!
First we compute the orthogonal vector \(\quad \vect{w}_3 = \begin{bmatrix}11\\-8\\-5 \end{bmatrix} \times \begin{bmatrix}3\\1\\5 \end{bmatrix} = \begin{bmatrix}-35\\-70\\35 \end{bmatrix} = 35 \begin{bmatrix}-1\\-2\\1 \end{bmatrix}\).
Normalizing \(\vect{w}_3\) gives the third basis vector \(\vect{u}_3 = \dfrac{1}{\sqrt{6}} \begin{bmatrix}-1\\-2\\1 \end{bmatrix}\).

Thus we end up with the matrix \(U = \begin{bmatrix}\frac{11}{\sqrt{210}}&\frac{3}{\sqrt{35}} &-\frac{1}{\sqrt{6}}\\ -\frac{8}{\sqrt{210}}&\frac{1}{\sqrt{35}} &-\frac{2}{\sqrt{6}}\\-\frac{5}{\sqrt{210}}&\frac{5}{\sqrt{35}} & \frac{1}{\sqrt{6}}\end{bmatrix}\).
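Those who distrust the square roots can let the computer confirm the result; a small NumPy check (our own) of the matrices found above:

```python
import numpy as np

A = np.array([[5.0, -1.0], [-3.0, 2.0], [-1.0, 3.0]])
U = np.column_stack([np.array([11.0, -8.0, -5.0]) / np.sqrt(210),
                     np.array([ 3.0,  1.0,  5.0]) / np.sqrt(35),
                     np.array([-1.0, -2.0,  1.0]) / np.sqrt(6)])
Sigma = np.array([[np.sqrt(42), 0.0],
                  [0.0, np.sqrt(7)],
                  [0.0, 0.0]])
V = np.array([[ 2.0, 1.0],
              [-1.0, 2.0]]) / np.sqrt(5)

print(np.allclose(U @ Sigma @ V.T, A))    # True: this is indeed an SVD of A
print(np.allclose(U.T @ U, np.eye(3)))    # True: U is orthogonal
print(np.allclose(V.T @ V, np.eye(2)))    # True: V is orthogonal
```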

Along the way we came across some explicit and implicit properties of the matrix \(A^TA\). We have collected them in the following proposition.

Proposition 8.3.3 (Properties of the matrix \(A^TA\))

Let \(A\) be an \(m\times n\) matrix with real entries. Then the following properties hold:

  1. The matrices \(AA^T\) and \(A^TA\) are symmetric.

  2. \(\Nul{A} = \Nul{(A^TA)}\).

  3. \(\Rank{A} = \Rank{(A^TA)}\).

  4. The eigenvalues of \(A^TA\) are real and nonnegative.

  5. The non-zero eigenvalues of \(AA^T\) are the same as the non-zero eigenvalues of \(A^TA\). Moreover the algebraic and geometric multiplicities of these eigenvalues are the same.

We want to stress the importance of property 4. We know from Section 8.1 that the eigenvalues of a symmetric matrix are real. The previous proposition tells us that, in addition, the eigenvalues of the symmetric matrix \(A^TA\) are nonnegative. This property is the key to the singular value decomposition.
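These properties, too, are easy to test numerically; a sketch (our own) using a random \(5\times 3\) matrix, so that generically the rank is 3:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))            # any real matrix will do

G = A.T @ A
print(np.allclose(G, G.T))                                   # (1) A^T A is symmetric
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(G))  # (3) equal ranks
print(np.all(np.linalg.eigvalsh(G) >= -1e-12))               # (4) eigenvalues >= 0

# (5) the nonzero eigenvalues of A A^T and A^T A coincide;
# A A^T is 5x5 and (generically) has two extra zero eigenvalues.
ev_big   = np.sort(np.linalg.eigvalsh(A @ A.T))
ev_small = np.sort(np.linalg.eigvalsh(A.T @ A))
print(np.allclose(ev_big[-3:], ev_small))                    # the three largest agree
```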

Theorem 8.3.1 (Existence of a singular value decomposition)

For every \(m \times n\) matrix \(A\) a singular value decomposition exists.

The proof consists in showing that all steps in the algorithm do what they are supposed to do, and that the final result consists of three matrices \(U, \Sigma, V\) that can act as a singular value decomposition of \(A\).

Some concluding remarks concerning the algorithm.

Remark 8.3.1

  1. Because of the basic property that transposing an SVD of an \(m \times n\) matrix \(A\) gives an SVD of \(A^T\) (Proposition 8.3.1, property 4), it may be profitable to find an SVD for \(A^T\) first, and then transpose it.
    The singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\), an \(n \times n\) matrix; the singular values of \(A^T\) are the square roots of the eigenvalues of \(AA^T\), an \(m \times m\) matrix. The smaller the better!
    In most applications the singular value decomposition will be applied to \(m\times n\) matrices \(A\) with many more rows than columns, so \(m \gg n\). For such a matrix \(A\), working with \(A^TA\) is the best bet.

  2. The normalization of the vectors \(\mathbf{v}_i\) and \(\mathbf{u}_j\) may be postponed till the end of step 5. That prevents dragging along the obnoxious square root denominators.

To illustrate Remark 8.3.1 and to conclude this subsection let us consider a second example.

Example 8.3.3

We will find a singular value decomposition of the matrix \( A = \begin{bmatrix} 1 & 1 & 0 &-1\\ -1 & 1 & 0&3\\ -2 & 1 & 2&0 \end{bmatrix} \).

Following the suggestion of the remark we first construct an SVD of the matrix \(B = A^T = \begin{bmatrix} 1 & -1 &-2 \\ 1 & 1 &1 \\ 0 & 0 & 2 \\ -1 & 3 & 0 \end{bmatrix}\).

We will pass along all steps of Algorithm 8.3.1.

Step 1. \(B^TB = AA^T = \begin{bmatrix} 3 & -3 & -1\\ -3 &11 &3 \\ -1 & 3 & 9 \end{bmatrix}\). This is straightforward.

Step 2. Computing the characteristic polynomial is already quite a task here, but it is doable. The result:

\[ \det(\lambda \mathrm{I} - B^TB) = \lambda^3 -23\lambda^2 +140 \lambda -196. \]

Without a computer we would be stuck. How to find the zeros of this polynomial? However, we have come up with a very special matrix \(A\) here, for which the squares of all the singular values are integers. (So in that sense, this is not a very representative example, and in general the computations will be even worse, not to say impossible.) Here, the eigenvalues of \(B^TB\) are given by \(\lambda_1 = 14, \lambda_2 = 7, \lambda_3 = 2\). Which finishes step 2.

Step 3 is straightforward: \(\Sigma = \begin{bmatrix} \sqrt{14} & 0 & 0 \\ 0 & \sqrt{7} & 0 \\ 0 & 0 & \sqrt{2} \\ 0 & 0 & 0 \end{bmatrix}\).

Step 4. With some effort we can find eigenvectors:
\(\vect{v}_1 = \begin{bmatrix} 1 \\ -3 \\ -2 \end{bmatrix}\), for \(\lambda_1 = 14\), \(\vect{v}_2 = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}\), for \(\lambda_2 = 7\), and \(\vect{v}_3 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}\), for \(\lambda_3 = 2\).
Note that these are indeed three orthogonal vectors, which, according to Remark 8.3.1, are better not normalized immediately.

Step 5. Next we compute the vectors \(\vect{u}_i\), again without normalizing, and also for the moment not taking the (ugly!) factors \(\frac{1}{\sigma_i} \) into account. Since the singular values are nonzero, we use all three vectors \(\vect{v}_i\).
This gives

\[\begin{split} \mathbf{u}_1 = B\mathbf{v}_1 = \begin{bmatrix} 8 \\ -4 \\ -4 \\ -10 \end{bmatrix}, \quad \mathbf{u}_2 = B\mathbf{v}_2 = \begin{bmatrix}-6 \\ 3 \\ 10 \\ -10 \end{bmatrix}, \quad \mathbf{u}_3 = B\mathbf{v}_3 = \begin{bmatrix} 2 \\ 4 \\ 0 \\ 0 \end{bmatrix}. \end{split}\]

It should not come as a surprise that these vectors are orthogonal!
We have to find a fourth orthogonal vector \(\vect{u}_4\). One way is to look for a nonzero vector in the null space of the matrix \(\begin{bmatrix} 8 & -4 & -4 &-10 \\ -6 & 3 & 10 & -10 \\ 2 & 4 & 0 & 0\end{bmatrix}\).
You may check that the vector \(\vect{u}_4 = \begin{bmatrix} 4 \\ -2 \\ 5 \\ 2 \end{bmatrix}\) does the trick.

Step 6. (Where we also still have to present our \(V\).)
We rescale all vectors to unit vectors and put them side by side, to arrive at the matrices

\[\begin{split} V = \begin{bmatrix} \frac{1}{\sqrt{14}} &\frac{1}{\sqrt{35}} & \frac{3}{\sqrt{10}} \\ -\frac{3}{\sqrt{14}} & -\frac{3}{\sqrt{35}} &\frac{1}{\sqrt{10}} \\ -\frac{2}{\sqrt{14}} & \frac{5}{\sqrt{35}} & 0 \end{bmatrix}, \quad U = \begin{bmatrix} \frac{4}{7} & -\frac{6}{\sqrt{245}} & \frac{1}{\sqrt{5}} & \frac{4}{7}\\ -\frac{2}{7} & \frac{3}{\sqrt{245}} & \frac{2}{\sqrt{5}}& -\frac{2}{7}\\ -\frac{2}{7} & \frac{10}{\sqrt{245}} & 0 &\frac{5}{7} \\ -\frac{5}{7} & -\frac{10}{\sqrt{245}}& 0 & \frac{2}{7}\\ \end{bmatrix}. \end{split}\]

And then we must not forget that we have just constructed an SVD for \(A^T\) instead of \(A\)!

From \(A^T = U\Sigma V^T\) it follows swiftly that \(A = V \Sigma^TU^T\) is an SVD for \(A\).
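NumPy does not take the detour via \(A^T\), of course, but we can still confirm the conclusion numerically; a sketch (our own) with the matrix of Example 8.3.3:

```python
import numpy as np

A = np.array([[ 1.0, 1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0,  3.0],
              [-2.0, 1.0, 2.0,  0.0]])      # the 3x4 matrix of Example 8.3.3

# An SVD of B = A^T (a 4x3 matrix): A^T = U Sigma V^T.
U, s, Vt = np.linalg.svd(A.T, full_matrices=True)
print(np.round(s**2))                       # [14. 7. 2.], as found above

Sigma = np.zeros(A.T.shape)                 # 4x3
np.fill_diagonal(Sigma, s)

# Transposing gives an SVD of A itself: A = V Sigma^T U^T.
V = Vt.T
print(np.allclose(A, V @ Sigma.T @ U.T))    # True
```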

8.3.3. Understanding the SVD Geometrically#

In this section we will have a deeper look at the decomposition and its meaning. As we have done previously, we can think about an \(m\times n\) matrix \(A\) as the standard matrix of a linear transformation from \(\R^n\) to \(\R^m\).

By definition, the matrices \(U\) and \(V\) in the SVD of an \(m\times n\) matrix \(A\) are orthogonal matrices. Thus the columns of \(U\) give an orthonormal basis of \(\R^m\), the columns of \(V\) an orthonormal basis of \(\R^n\). The decomposition \(U\Sigma V^T\) then becomes a composition of transformations. We can visualise this using the graph in Figure 8.3.1:

../_images/Fig-SVD-Decomposition.svg

Fig. 8.3.1 Diagram showing the SVD as a composition of linear transformations.#

Let us first consider the case where \(A\) is a \(2 \times 2\) matrix, as in that case everything takes place in the plane, and we can make an exact drawing of what is going on. Every \(2\times 2\) orthogonal matrix has one of the two forms

\[\begin{split} \begin{bmatrix} \cos({\varphi})&-\sin({\varphi}) \\ \sin({\varphi}) & \cos({\varphi})\end{bmatrix}, \quad \quad \begin{bmatrix} \cos({\varphi})&\sin({\varphi}) \\ \sin({\varphi}) & -\cos({\varphi})\end{bmatrix}. \end{split}\]

The first matrix represents a rotation, the second matrix represents a reflection. In an SVD of \(A\), we can always construct \(V\) to be a rotation: the columns of \(V\) must be eigenvectors of the matrix \(A^TA\), and eigenvectors remain eigenvectors if one multiplies them by a factor \((-1)\). In that case \(V^T\), which is just \(V^{-1}\), is also a rotation. Since the matrix \(U\) is in general uniquely determined once \(V\) is chosen (the exception being the case where \(A\) has zero as a singular value), \(U\) is either a rotation (when \(\det(A)>0\)) or a reflection (when \(\det(A)<0\)). The workings of the composition \(U\Sigma V^T\) are then

  1. Multiplication by \(V^T\) rotates the eigenvectors \(\mathbf{v}_1\) and \(\mathbf{v}_2\) of \(A^TA\) to the standard basis vectors \(\mathbf{e}_1\) and \(\mathbf{e}_2\).

  2. Multiplication by \(\Sigma\) stretches the basis vectors \(\mathbf{e}_1\) and \(\mathbf{e}_2\) by factors \(\sigma_1\) and \(\sigma_2\).

  3. Multiplication by \(U\) rotates or reflects the vectors \(\sigma_1\mathbf{e}_1\) and \(\sigma_2\mathbf{e}_2\) to the vectors \(\sigma_1\mathbf{u}_1\) and \(\sigma_2\mathbf{u}_2\).

If we consider the total effect on the unit circle (i.e., all vectors of length 1), then

  1. is a rotation of the circle,

  2. shrinks or stretches the unit circle to an ellipse, and

  3. rotates or reflects this ellipse.

Let us consider a numerical example.

Example 8.3.4

We will analyze the SVD of the matrix \(A = \begin{bmatrix} 5 & 2 \\ 0 & 6 \end{bmatrix}\).

The matrix

\[\begin{split} A^TA = \begin{bmatrix} 25 & 10 \\ 10 & 40 \end{bmatrix} = 5 \begin{bmatrix} 5 & 2 \\ 2 & 8 \end{bmatrix} \end{split}\]

has the eigenvalues

\[ \lambda_1 = 45, \quad \lambda_2 = 20, \]

with corresponding eigenvectors

\[\begin{split} \mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}. \end{split}\]

Normalizing them, and giving the second vector a minus sign, we find for an SVD of the matrix \(A\) the matrices

\[\begin{split} V = \dfrac{1}{\sqrt{5}} \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix}\sqrt{45} & 0\\0 & \sqrt{20} \end{bmatrix} = \begin{bmatrix} 3\sqrt{5} & 0\\0 & 2\sqrt{5} \end{bmatrix} \end{split}\]

to start with.

Applying the last two steps of Algorithm 8.3.1 we find that

\[\begin{split} U = \dfrac{1}{5} \begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}. \end{split}\]

So, \(V\) represents a rotation through an angle \(\varphi = \arccos\left(\frac{1}{\sqrt{5}}\right) \approx 63^{\circ}\), and \(U\) represents a rotation through an angle \(\psi = \arccos\left(\frac{3}{5}\right) \approx 53^{\circ}\). Between those two rotations, \(\Sigma\) ‘stretches’ vectors by a factor \(3\sqrt{5}\) in the \(x\)-direction and by a factor \(2\sqrt{5}\) in the \(y\)-direction.

Note that \(V\) maps \(\vect{e}_1,\vect{e}_2\) to \(\vect{v}_1,\vect{v}_2\),   so \(V^T = V^{-1}\) maps \(\vect{v}_1,\vect{v}_2\) to \(\vect{e}_1,\vect{e}_2\).

We give the matrix an extra factor \(\dfrac1{\sqrt{5}}\) to get a better picture. With the matrix as it was given, the stretching factors \(3\sqrt{5}\) and \(2\sqrt{5}\) would ‘blow up’ the unit circle too much for our taste. With this extra factor, the matrix

\[\begin{split} \tilde{A} = \frac{1}{\sqrt{5}} \begin{bmatrix} 5 & 2 \\ 0 & 6 \end{bmatrix} \end{split}\]

has the SVD

\[ \tilde{A} = U\tilde{\Sigma}V^T \]

where \(U\) and \(V\) are the same as for \(A\), and \(\tilde{\Sigma} = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}\).

Figure 8.3.2 visualizes what is going on.

Note that at the end the vector \(\vect{e}_1\) comes to rest on the \(x\)-axis, as it should, since

\[\begin{split} \tilde{A}\vect{e}_1 \,=\, \dfrac{1}{\sqrt{5}}\begin{bmatrix} 5 & 2 \\ 0 & 6 \end{bmatrix}\begin{bmatrix} 1\\ 0 \end{bmatrix} \,=\, \begin{bmatrix} \sqrt{5}\\ 0 \end{bmatrix}. \end{split}\]
../_images/Fig-SVD-GeometricView.svg

Fig. 8.3.2 Geometric decomposition of \(\tilde{A} = \dfrac{1}{\sqrt{5}}\begin{bmatrix} 5 & 2 \\ 0 & 6 \end{bmatrix} = U\tilde{\Sigma}V^T\).#

In general, for an \(m\times n\) matrix \(A\) of say rank \(r\), suppose

\[ A = U\Sigma V^T \]

is a singular value decomposition. Since \(V^T\) is an orthogonal matrix, it maps \(\mathbb{R}^n\) onto itself and preserves the norms of all vectors. More precisely, it maps the ‘principal axes’ corresponding to \(\vect{v}_1,\ldots,\vect{v}_n\) onto the coordinate axes generated by \(\vect{e}_1, \ldots, \vect{e}_n\). With that, the \(n\)-dimensional unit sphere, consisting of all vectors of norm 1, is mapped onto itself.

The \(m \times n\) matrix \(\Sigma\) maps the first \(r\) basis vectors \(\vect{e}_1, \ldots, \vect{e}_r\) in \(\R^n\) to the vectors \(\sigma_1\vect{e}_1, \ldots, \sigma_r\vect{e}_r\) in \(\R^m\). The remaining (if any) basis vectors \(\vect{e}_{r+1}, \ldots, \vect{e}_n\) are mapped to \(\vect{0}\in \R^{m}\). All in all, \(\Sigma\) maps the unit sphere in \(\R^n\) onto an \(r\)-dimensional ‘ellipsoid’ in \(\R^m\).

Lastly the orthogonal matrix \(U\) rotates/reflects this ellipsoid in \(\R^m\).
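The picture of the unit sphere being mapped onto an ellipsoid is easy to reproduce numerically. The sketch below (our own) pushes a sample of unit vectors through the matrix \(\tilde{A}\) of Example 8.3.4 and checks that the images, written in the coordinates of \(\vect{u}_1,\vect{u}_2\), satisfy the equation of the ellipse with semi-axes \(\sigma_1 = 3\) and \(\sigma_2 = 2\).

```python
import numpy as np

A_tilde = np.array([[5.0, 2.0],
                    [0.0, 6.0]]) / np.sqrt(5)
U, s, Vt = np.linalg.svd(A_tilde)
print(s)                                        # [3. 2.]

# Sample points on the unit circle and map them through A_tilde.
theta = np.linspace(0.0, 2*np.pi, 200)
circle = np.vstack([np.cos(theta), np.sin(theta)])
image = A_tilde @ circle

# Express the image points in the coordinates of u_1 and u_2;
# they lie on the ellipse (x_1/sigma_1)^2 + (x_2/sigma_2)^2 = 1.
coords = U.T @ image
print(np.allclose((coords[0]/s[0])**2 + (coords[1]/s[1])**2, 1.0))   # True
```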


8.3.4. Applications of the SVD#

There will be two applications described in this section.

  1. Data compression

  2. Linear Least Squares.

We start with the first.
Numerical data can be stored in a matrix.
For instance, a black-and-white picture/photo can be stored ‘pixel by pixel’, by numbers that indicate the gray scale, which may for instance be any integer from 0 (completely white) to 31 (completely black). A 4:5 photo may then be stored as a 1080x1350 matrix.
As another example, think of a survey of \(n\) questions that have to be answered using a 1-5 scale. If the number of respondents is \(N\), the data can be represented by an \(N \times n\) matrix.

In the first situation there will be a high correlation both between columns that are close to each other and between nearby rows. If the picture is a true photo the matrix will be far from a ‘random’ matrix. In the second context one might expect that the columns will be highly correlated: people that agree on certain issues are more likely to agree on other issues as well.

The main features of the data may be filtered out by analyzing an SVD of the matrix at hand.

The basic idea comes from the ‘spectral decomposition’ as in the last observation of Example 8.3.1. Suppose \(A\) is a matrix of rank \(r\), with nonzero singular values \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0\). Furthermore, let \(U\Sigma V^T\) be a singular value decomposition of the matrix \(A\). Let

\[\begin{split} \Delta = \begin{bmatrix} \sigma_1 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_3 & \cdots & 0 \\ \vdots & \vdots & & \ddots &\vdots \\ 0 & 0 & 0 & \ldots & \sigma_r \end{bmatrix}. \end{split}\]

So \(\Delta\) is the diagonal matrix that remains if all the zero rows and zero columns (if any) of \(\Sigma\) are removed. If \(U_r\) is the matrix with the first \(r\) columns of \(U\), and \(V_r\) is the matrix with the first \(r\) columns of \(V\), then as in Example 8.3.1 we have that

\[ A = U\Sigma V^T = U_r\Delta V_r^T \]

which can be rewritten as

(8.3.4)#\[ A = \sigma_1 \vect{u}_1\vect{v}_1^T + \sigma_2 \vect{u}_2\vect{v}_2^T + \cdots + \sigma_r \vect{u}_r\vect{v}_r^T.\]

Here \(A\) is written as a sum of rank 1 matrices, and because of the decreasing singular values, these rank 1 matrices get less and less ‘important’. If the \(\sigma_i\) become very small (relatively) for, say, \(k < i \leq r\), we might expect that the sum

\[ \sigma_1 \vect{u}_1\vect{v}_1^T + \ldots + \sigma_k \vect{u}_k\vect{v}_k^T \]

of the first \(k\) terms gives a good approximation of the matrix \(A\).

The gain is the following. If the data is put in the form of an \(m \times n\) matrix \(A\), then it needs \(m \times n\) memory cells to store \(A\). If \(k\) is much smaller than \(r = \) rank \(A\) (which in general will be equal to the smallest of \(m\) and \(n\)), then \(U_k\), \(V_k\) and the \(k\) largest singular values only take up \(m\times k + n\times k + k = (m+n+1)k\) memory cells.
If, for instance, a 1080x1350 ( \(\approx\) 1.45 MB) image is stored using the highest thirty per cent of the singular values, the storage space reduces to 324x(1080+1350+1) \(\approx\) 0.79 MB. So the data has been ‘compressed’ by more or less a factor \(0.79/1.45 \approx 0.54\).

In general, the higher the correlation/dependency between the columns (or, for that matter, the rows) of a matrix \(A\), the fewer singular values are needed for a good approximation of \(A\).
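A minimal sketch of this idea in NumPy: the function below keeps only the \(k\) largest singular values, as in a truncated version of the sum (8.3.4). The test matrix (a rank-2 matrix plus a little noise, with sizes chosen arbitrarily for the illustration) mimics data with highly correlated columns.

```python
import numpy as np

def best_rank_k(A, k):
    """Keep only the k largest singular values in the sum (8.3.4)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Build a 100x80 matrix that is 'almost' of rank 2: strongly correlated columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 80))
A += 0.01 * rng.standard_normal(A.shape)

A2 = best_rank_k(A, 2)
print(np.linalg.norm(A - A2) / np.linalg.norm(A))        # small relative error
print((100 + 80 + 1) * 2, "numbers instead of", 100*80)  # (m+n+1)k versus m*n
```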

Example 8.3.5

\(A = \begin{bmatrix} 104.39 & 44.80 & 78.92 & 47.65 & 134.79 & 100.40 & 60.09 & 52.01 & 97.98 & 31.55 \\ 98.31 & 52.03 & 86.88 & 47.15 & 137.53 & 87.79 & 61.69 & 30.50 & 80.36 & 21.78 \\ 96.90 & 30.09 & 72.26 & 41.60 & 117.75 & 85.80 & 61.83 & 49.55 & 83.76 & 29.89 \\ 103.10 & 37.40 & 87.82 & 44.96 & 132.17 & 83.69 & 70.21 & 34.02 & 75.32 & 23.11 \\ 98.15 & 26.47 & 86.50 & 41.39 & 120.58 & 70.97 & 71.95 & 29.99 & 60.89 & 19.06 \\ 110.52 & 53.64 & 100.51 & 52.45 & 152.16 & 94.76 & 71.29 & 30.70 & 85.70 & 22.33 \\ 77.84 & 42.01 & 64.82 & 38.50 & 107.74 & 76.16 & 47.39 & 31.58 & 72.85 & 20.04 \\ 97.45 & 27.40 & 67.92 & 39.84 & 115.48 & 88.80 & 59.76 & 53.93 & 85.09 & 30.08 \\ 82.95 & 53.48 & 76.69 & 42.60 & 122.52 & 79.24 & 48.94 & 21.20 & 73.14 & 16.97 \\ 100.79 & 24.25 & 73.17 & 40.82 & 117.69 & 87.36 & 65.54 & 54.21 & 83.48 & 31.19 \\ 120.68 & 49.88 & 102.65 & 55.36 & 158.59 & 103.20 & 78.26 & 43.66 & 95.51 & 28.63 \\ 103.22 & 54.07 & 79.81 & 50.01 & 139.32 & 104.55 & 57.99 & 51.55 & 103.19 & 31.47 \end{bmatrix}\).

The singular values are

\[ \sigma_1 = 839.36, \quad \sigma_2 = 58.84, \quad \sigma_3 = 45.32, \quad \sigma_4 = 2.82, \quad \sigma_5 = 2.14, \,\, \ldots \]

Note the severe drop-off after the third singular value.
If we take from the singular value decomposition \(A = U\Sigma V^T\) only the first three columns of \(U\) and \(V\) and the first three singular values, i.e. we put \(A_3 = U_3\Sigma_{33}V_3^T\), we get

\[\begin{split} A_3 = \begin{bmatrix} 0.31 & 0.27 & -0.21 \\ 0.29 & -0.27 & -0.15 \\ 0.27 & 0.30 & 0.13 \\ 0.29 & -0.18 & 0.26 \\ 0.26 & -0.26 & 0.58 \\ 0.32 & -0.38 & -0.02 \\ 0.24 & -0.01 & -0.26 \\ 0.27 & 0.42 & 0.12 \\ 0.26 & -0.33 & -0.37 \\ 0.28 & 0.38 & 0.29 \\ 0.35 & -0.14 & 0.10 \\ 0.31 & 0.24 & -0.44 \end{bmatrix} \begin{bmatrix} 839.36 & 0 & 0 \\ 0 & 58.84 & 0 \\ 0 & 0 & 45.32 \end{bmatrix} \begin{bmatrix} 0.41 & 0.05 & 0.32 \\ 0.17 & -0.32 &-0.63 \\ 0.34 & -0.38 & 0.23 \\ 0.19 & -0.08 &-0.06 \\ 0.54 & -0.28 &-0.06 \\ 0.37 & 0.23 &-0.24 \\ 0.26 & -0.07 & 0.51 \\ 0.17 & 0.63 & 0.10 \\ 0.34 & 0.37 &-0.35 \\ 0.11 & 0.28 & 0.02 \end{bmatrix}^T \end{split}\]

We expect \(A_3\) to be a good approximation of \(A\).
Comparing the two matrices:

\[\begin{split} \begin{array}{ccc} A = U\Sigma V^T & & A_3 = U_3\Sigma_{33} V_3^T \\[1ex] \begin{bmatrix} 104.3 & 44.8 & .\,.\,. & 97.9 & 31.5 \\ 98.3 & 52.0 & .\,.\,. & 80.3 & 21.7 \\ 96.9 & 30.0 & .\,.\,. & 83.7 & 29.8 \\ 103.1 & 37.4 & .\,.\,. & 75.3 & 23.1 \\ 98.1 & 26.4 & .\,.\,. & 60.8 & 19.0 \\ 110.5 & 53.6 & .\,.\,. & 85.7 & 22.3 \\ 77.8 & 42.0 & .\,.\,. & 72.8 & 20.0 \\ 97.4 & 27.4 & .\,.\,. & 85.0 & 30.0 \\ 82.9 & 53.4 & .\,.\,. & 73.1 & 16.9 \\ 100.7 & 24.2 & .\,.\,. & 83.4 & 31.1 \\ 120.6 & 49.8 & .\,.\,. & 95.5 & 28.6 \\ 103.2 & 54.0 & .\,.\,. & 103.1 & 31.4 \end{bmatrix} & \quad & \begin{bmatrix} 103.9 & 45.1 & .\,.\,. & 98.0 & 31.4 \\ 98.1 & 51.5 & .\,.\,. & 81.0 & 21.3 \\ 97.2 & 29.9 & .\,.\,. & 83.1 & 29.1 \\ 102.8 & 37.3 & .\,.\,. & 75.1 & 22.8 \\ 98.2 & 26.1 & .\,.\,. & 60.9 & 19.5 \\ 110.4 & 54.1 & .\,.\,. & 85.4 & 22.4 \\ 78.5 & 41.7 & .\,.\,. & 72.5 & 20.6 \\ 96.8 & 27.7 & .\,.\,. & 85.5 & 30.9 \\ 82.6 & 53.8 & .\,.\,. & 73.0 & 17.0 \\ 101.0 & 24.3 & .\,.\,. & 83.5 & 30.9 \\ 120.7 & 49.9 & .\,.\,. & 95.4 & 28.4 \\ 103.4 & 53.5 & .\,.\,. & 103.2 & 31.4 \end{bmatrix} \end{array} \end{split}\]

As you can see, \(A_3\) closely resembles \(A\). (You have to trust us with regard to the hidden columns. 😉)
The gain: to store the \(12\times10\) matrix \(A\), we have to store \(120\) reals.
To store \(U_3, \Sigma_{33}\) and \(V_3\), we only have to store \(12\times3 + 3 + 10\times 3 = 69\) numbers. The middle 3 comes from the first three elements (\(=\) singular values) on the diagonal of \(\Sigma\).

With (much) larger matrices the reduction in terms of storage capacity may be even better.

8.3.5. Grasple Exercises#

Grasple Exercise 8.3.1

https://embed.grasple.com/exercises/27adae2a-db2a-46fa-800f-49e4c0dfe4fa?id=93487

If \(A = U\Sigma V^T\) for an \(m\times n\) matrix \(A\), what are the sizes of \(U\), \(\Sigma\) and \(V\)?

Grasple Exercise 8.3.2

https://embed.grasple.com/exercises/caac29d1-9700-4a30-8c7c-19ea6148258f?id=93490

To describe the meaning of the singular values of a matrix.

Grasple Exercise 8.3.3

https://embed.grasple.com/exercises/e4c651aa-a998-4e19-957b-20ddf41509bf?id=93468

To find the singular values of a 2x2 matrix \(A\)

Grasple Exercise 8.3.4

https://embed.grasple.com/exercises/47ebaa77-9f3c-4363-a57e-d37242c6e598?id=93471

To compute the singular values of a 3x2 matrix \(A\).

Grasple Exercise 8.3.5

https://embed.grasple.com/exercises/79d22478-56e3-49ee-9b19-77ab1ad06eaf?id=93470

To compute an SVD for a 2x3 matrix \(A\).

Grasple Exercise 8.3.6

https://embed.grasple.com/exercises/10affbae-4221-40f8-bf0e-df626a0e64ae?id=93479

To compute an SVD for a 2x3 matrix \(A\) (of rank 1).

Grasple Exercise 8.3.7

https://embed.grasple.com/exercises/37ea17f1-1bfb-4a19-b9e2-1292a593dfa3?id=93480

To compute an SVD for a 3x2 matrix \(A\) (of rank 1).

Grasple Exercise 8.3.8

https://embed.grasple.com/exercises/20dd219a-35f3-48d7-ad9d-35038047336b?id=92586

To compute an SVD for a matrix \(A\) with orthogonal columns.

Grasple Exercise 8.3.9

https://embed.grasple.com/exercises/9848d7be-1530-46b0-941f-9ae76e95abfa?id=93481

To draw conclusion(s) about \(A\) from a given SVD of \(A\).

Grasple Exercise 8.3.10

https://embed.grasple.com/exercises/3fdad317-fc18-4f88-b416-87cbd1d5e708?id=93495

Finding the maximal value of \(\norm{A\vect{x}}\), if \(\norm{\vect{x}}=1\), \(A\) a 3x2 matrix.