1.2. Dot product#

1.2.1. Introduction#

In this section we will consider other (geometric) properties of vectors, like the length of a vector and the angle between two vectors. When the angle between two vectors is equal to \(\frac12\pi\), the two vectors are called perpendicular, or orthogonal. These properties can all be expressed using a new operation: the inner product or dot product.

We will start by considering vectors in \(\mathbb{R}^2\) and \(\mathbb{R}^3\). The translation of the concepts to the general space \(\mathbb{R}^n\) will then become more or less immediate.

1.2.2. Length and perpendicularity in \(\mathbb{R}^2\) and \(\mathbb{R}^3\)#

The length of a vector

\[\begin{split} \mathbf{v}=\begin{pmatrix} a_{1}\\a_{2} \end{pmatrix} \end{split}\]

in the plane, which we denote by \(\norm{\mathbf{v}}\), can be computed using the Pythagorean theorem:

(1.2.1)#\[\norm{\mathbf{v}} = \sqrt{a_1^2+a_2^2}.\]

../_images/Fig-InnerProduct-Length-2D.svg — Fig. 1.2.1 The length of a vector via Pythagoras’ Theorem.#

../_images/Fig-InnerProduct-length-3D.svg — Fig. 1.2.2 The length of a vector via Pythagoras’ Theorem.#

Using this theorem twice we find a similar formula for the length of a vector

\[\begin{split} \mathbf{v}=\begin{pmatrix} a_{1}\\a_{2}\\a_{3}\end{pmatrix} \end{split}\]

in \(\mathbb{R}^3\). Look at Figure 1.2.2. There are two right triangles: \(\Delta OPQ\) where \(\angle OPQ\) is right, and \(\Delta OQA\) where \(\angle OQA\) is right.

From

\[ OQ^2 = OP^2 + PQ^2 = a_1^2 + a_2^2, \]

where for two points \(A\) and \(B\), by \(AB\) we denote the length of the vector \(\overrightarrow{AB}\), and

\[ OA^2 = OQ^2+QA^2 = a_1^2 + a_2^2+a_3^2 \]

we find that

(1.2.2)#\[\norm{\mathbf{v}}= OA = \sqrt{a_1^2 + a_2^2+a_3^2}.\]

../_images/Fig-InnerProduct-perp-non-perp.svg — Fig. 1.2.3 Perpendicular versus non-perpendicular.#

Let us now turn our attention to another important geometric concept, namely that of perpendicularity. It is clear from Figure 1.2.3 that the vectors \(\begin{pmatrix}2\\3\end{pmatrix}\) and \(\begin{pmatrix}-3\\2\end{pmatrix}\) are perpendicular, whereas the vectors \(\begin{pmatrix}2\\3\end{pmatrix}\) and \(\begin{pmatrix}-1\\3\end{pmatrix}\) are not.
There is another way to look at this, which will be useful for the definition of perpendicularity in higher dimensions. To that end, consider Figure 1.2.4. Here you see two vectors \(\vect{v}\) and \(\vect{w}\) and the parallelogram they span. You also see the diagonals of this parallelogram, which are given by \(\vect{v}+\vect{w}\) and \(\vect{v}-\vect{w}\). Two vectors are perpendicular if and only if the parallelogram they span is a rectangle, and this is exactly the situation where the diagonals have the same length, i.e.,

(1.2.3)#\[\norm{\mathbf{v}+\mathbf{w}} = \norm{\mathbf{v}-\mathbf{w}}.\]

../_images/Fig-InnerProduct-diagonal-parallelogram.svg — Fig. 1.2.4 The parallelogram spanned by \(\vect{v}\) and \(\vect{w}\) and its diagonals. How should you choose \(\vect{v}\) and \(\vect{w}\) such that the diagonals have the same length?#

You can change \(\mathbf{v}\) and/or \(\mathbf{w}\) in the picture such that these two vectors are not perpendicular and

\[ \norm{\mathbf{v}+\mathbf{w}} \neq \norm{\mathbf{v}-\mathbf{w}}. \]

So far we have been talking about two (non-zero) vectors in the plane, i.e., in \(\mathbb{R}^2\). However, two vectors in \(\mathbb{R}^3\) form a parallelogram as well, which also becomes a rectangle if and only if the vectors are perpendicular. We introduce a notation for this: if \( \mathbf{v}\) and \(\mathbf{w}\) are perpendicular, we write this as

(1.2.4)#\[\mathbf{v} \perp \mathbf{w}.\]

Taking squares in Equation (1.2.3), we see that the following holds both in \(\mathbb{R}^2\) and in \(\mathbb{R}^3\):

\[ \mathbf{v} \perp \mathbf{w} \iff \norm{\mathbf{v}+\mathbf{w}}^2 = \norm{\mathbf{v}-\mathbf{w}}^2. \]

If we write this out for two arbitrary vectors \(\mathbf{v}=\begin{pmatrix} a_{1}\\a_{2}\end{pmatrix},\mathbf{w}=\begin{pmatrix} b_{1}\\b_{2}\end{pmatrix}\) in \(\mathbb{R}^2\) we get the following:

\[\begin{split} \begin{array}{rcl} \mathbf{v} \perp \mathbf{w} &\iff &\norm{\mathbf{v}+\mathbf{w}}^2 = \norm{\mathbf{v}-\mathbf{w}}^2\\ &\iff &(a_1+b_1)^2 + (a_2+b_2)^2 = (a_1-b_1)^2 + (a_2-b_2)^2\\ &\iff &a_1^2+2a_1b_1 + b_1^2 + a_2^2+2a_2b_2 + b_2^2 = a_1^2 -2a_1b_1+b_1^2+ a_2^2 -2a_2b_2+b_2^2\\ &\iff &4(a_1b_1 +a_2b_2)=0 \\ &\iff &a_1b_1 +a_2b_2=0. \end{array} \end{split}\]

Likewise, for vectors \(\mathbf{v}=\begin{pmatrix} a_{1}\\a_{2}\\a_{3}\end{pmatrix},\,\mathbf{w}=\begin{pmatrix} b_{1}\\b_{2}\\b_{3}\end{pmatrix}\) in \(\mathbb{R}^3\):

(1.2.5)#\[\mathbf{v} \perp \mathbf{w} \iff a_1b_1 +a_2b_2+a_3b_3=0.\]

The derivation is completely analogous to the one above, only now we have one extra term. So to check ‘algebraically’ whether two vectors are perpendicular we just have to compute \(a_1b_1 +a_2b_2\, (\,+\,a_3b_3\,)\) and see whether this is equal to \(0\).

This expression is called the dot product (or inner product) of the vectors \(\mathbf{v}\) and \(\mathbf{w}\). We denote it by \(\mathbf{v}\ip\mathbf{w}\). Note that the dot product of a general vector \(\mathbf{v}=\begin{pmatrix} a_{1}\\a_{2}\\a_{3}\end{pmatrix}\) in \(\mathbb{R}^3\) with itself gives

\[ \mathbf{v}\ip\mathbf{v} = a_1^2+a_2^2+a_3^2 = \norm{\mathbf{v}}^2, \]

so the length of a vector can be expressed as follows using the dot product

(1.2.6)#\[\norm{\mathbf{v}} = \sqrt{\mathbf{v}\ip\mathbf{v}\,}.\]

Using the dot product, the concepts of length and perpendicularity carry over easily to \(\mathbb{R}^n\) for any \(n \geq 4\). Let’s do it one by one, starting by generalising the dot product in the next subsection.
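The equivalence between equal diagonals and a vanishing dot product can also be checked numerically. The following is a minimal sketch in Python, using the vectors from Figure 1.2.3; the function names `squared_length` and `diagonal_gap` are chosen for this illustration only and do not come from the text.

```python
def squared_length(v):
    """Squared length a_1^2 + a_2^2 (+ a_3^2) of a vector given as a list."""
    return sum(a * a for a in v)


def diagonal_gap(v, w):
    """Return ||v + w||^2 - ||v - w||^2 for two vectors of equal size."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return squared_length(plus) - squared_length(minus)


v = [2, 3]
print(diagonal_gap(v, [-3, 2]))  # 0: the diagonals are equally long
print(diagonal_gap(v, [-1, 3]))  # 28 = 4 * (2*(-1) + 3*3)
```

In both cases the gap equals \(4(a_1b_1+a_2b_2)\), in line with the derivation above.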

1.2.3. Dot product in \(\mathbb{R}^n\)#

Definition 1.2.1

The dot product (or inner product) of two vectors \(\mathbf{v}=\begin{pmatrix}a_{1}\\a_{2}\\ \vdots\\a_{n}\end{pmatrix}\) and \(\mathbf{w}=\begin{pmatrix}b_{1}\\b_{2}\\ \vdots\\b_{n}\end{pmatrix}\) in \(\mathbb{R}^n\) is defined as

(1.2.7)#\[\mathbf{v}\ip\mathbf{w} = a_1b_1 +a_2b_2+ \cdots + a_nb_n.\]

Example 1.2.1

The dot product of the two vectors

\[\begin{split} \mathbf{v}_1=\begin{pmatrix} 5\\3\\4\\-2\end{pmatrix} \quad \text{and}\quad \mathbf{v}_2=\begin{pmatrix} 2\\3\\0\\1\end{pmatrix} \end{split}\]

is given by

\[ \mathbf{v}_1\ip\mathbf{v}_2 = 5\cdot2 + 3\cdot3 +4\cdot0 + (-2)\cdot1 = 17. \]

And the dot product of the two vectors

\[\begin{split} \mathbf{v}_1=\begin{pmatrix} 5\\3\\4\\-2\end{pmatrix} \quad \text{and}\quad \mathbf{v}_3=\begin{pmatrix} -4\\3\\2\end{pmatrix} \end{split}\]

is not defined. In fact, the dot product of a vector \(\mathbf{v}\) in \(\mathbb{R}^m\) and a vector \(\mathbf{w}\) in \(\mathbb{R}^n\) is only defined if \(m = n\).
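Definition 1.2.1, including the requirement that both vectors have the same number of entries, can be sketched in a few lines of Python. The function name `dot` and the representation of vectors as plain lists are assumptions made for this illustration.

```python
def dot(v, w):
    """Dot product of two vectors given as sequences of numbers."""
    if len(v) != len(w):
        # A vector in R^m and a vector in R^n need m == n.
        raise ValueError("dot product is only defined for vectors of equal size")
    return sum(a * b for a, b in zip(v, w))


v1 = [5, 3, 4, -2]
v2 = [2, 3, 0, 1]
v3 = [-4, 3, 2]

print(dot(v1, v2))  # 5*2 + 3*3 + 4*0 + (-2)*1 = 17

try:
    dot(v1, v3)
except ValueError as err:
    print("not defined:", err)
```

Raising an error for mismatched sizes mirrors the statement that the dot product is only defined if \(m = n\).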

We state the characteristic rules of the dot product in \(\mathbb{R}^n\), which in the sequel we will use time and again, in the following proposition.

Proposition 1.2.1

The following properties hold for any vectors \(\mathbf{v},\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\) in \(\mathbb{R}^n\) and scalars \(c \in \mathbb{R}\):

i. \(\mathbf{v}_1\ip\mathbf{v}_2 = \mathbf{v}_2\ip\mathbf{v}_1\).

ii. \((c\mathbf{v}_1)\ip\mathbf{v}_2 = c(\mathbf{v}_1\ip\mathbf{v}_2) = \mathbf{v}_1\ip(c \mathbf{v}_2)\).

iii. \((\mathbf{v}_1+\mathbf{v}_2)\ip\mathbf{v}_3 = \mathbf{v}_1\ip\mathbf{v}_3+\mathbf{v}_2\ip\mathbf{v}_3\).

iv. \(\mathbf{v}\ip\mathbf{v} \geq 0\), and \(\mathbf{v}\ip\mathbf{v} = 0 \iff \mathbf{v} = \mathbf{0}\).

Proof of Proposition 1.2.1

The first three properties follow from the corresponding properties of real numbers. For instance, for the first rule we simply use that \(ab = ba\) holds for the product of real numbers \(a\) and \(b\).

i. Let

\[\begin{split} \mathbf{v}_1=\begin{pmatrix} a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} \quad \text{and}\quad \mathbf{v}_2=\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \end{split}\]

be two arbitrary vectors in \(\mathbb{R}^n\). Then

\[\begin{split} \begin{align*} \mathbf{v}_1 \ip \mathbf{v}_2 &= \begin{pmatrix}a_{1} \\ a_{2}\\ \vdots\\a_{n}\end{pmatrix} \ip \begin{pmatrix}b_{1} \\ b_{2}\\ \vdots \\ b_{n}\end{pmatrix} \\ &= a_1b_1 +a_2b_2+ \cdots + a_nb_n \\ &= b_1a_1 +b_2a_2+ \cdots + b_na_n \\&= \begin{pmatrix}b_{1} \\ b_{2}\\ \vdots \\ b_{n}\end{pmatrix}\ip\begin{pmatrix}a_{1} \\ a_{2}\\ \vdots\\ a_{n}\end{pmatrix} \\&= \mathbf{v}_2\ip\mathbf{v}_1. \end{align*} \end{split}\]

ii. For two vectors \(\vect{v}_1 = \begin{pmatrix}a_{1} \\ a_{2}\\ \vdots\\ a_{n}\end{pmatrix}\), \(\vect{v}_2 = \begin{pmatrix}b_{1} \\ b_{2}\\ \vdots\\ b_{n}\end{pmatrix}\) and any constant \(c\) we see that

\[\begin{split} \begin{align*} (c\mathbf{v}_1)\ip\mathbf{v}_2 &= \begin{pmatrix}ca_{1}\\ca_{2}\\ \vdots\\ca_{n}\end{pmatrix}\ip\begin{pmatrix}b_{1}\\b_{2}\\ \vdots\\b_{n}\end{pmatrix} \\ &= (ca_1)b_1 + (ca_2)b_2+ \cdots + (ca_n)b_n \\ &= c\,(a_1b_1 +a_2b_2+ \cdots + a_nb_n) \\ &= c\, (\mathbf{v}_1\ip\mathbf{v}_2). \end{align*} \end{split}\]

iii. This is proved in the same way as ii.; the details are the subject of Exercise 1.2.1.

iv. This consists of two statements. For the first, we note that

\[ \mathbf{v}\ip\mathbf{v} = a_1a_1 +a_2a_2+ \cdots + a_na_n = a_1^2+a_2^2 + \cdots + a_n^2 \]

is the sum of squares of real numbers, so it is non-negative. That is,

\[ \mathbf{v}\ip\mathbf{v} \geq 0. \]

To prove the second statement, we see that

\[ \mathbf{v}\ip\mathbf{v} = a_1^2+a_2^2 + \cdots + a_n^2 = 0 \]

if and only if all the squares are \(0\), which only happens if each entry \(a_i\) is equal to zero, that is, if \(\mathbf{v} = \mathbf{0}\).

Exercise 1.2.1

Prove property iii.

Solution to Exercise 1.2.1

Let

\[\begin{split} \mathbf{v}_1=\begin{pmatrix} a_1\\a_2\\ \vdots\\ a_n\end{pmatrix} \quad \text{and}\quad \mathbf{v}_2=\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \quad \text{and}\quad \mathbf{v}_3=\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}\end{split}\]

be three arbitrary vectors in \(\mathbb{R}^n\). Then

\[\begin{split} \begin{align*} \left(\mathbf{v}_1 + \mathbf{v}_2 \right) \ip \mathbf{v}_3 &= \left(\begin{pmatrix}a_{1} \\ a_{2}\\ \vdots\\a_{n}\end{pmatrix} + \begin{pmatrix}b_{1} \\ b_{2}\\ \vdots \\ b_{n}\end{pmatrix} \right) \ip \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \\ &= \begin{pmatrix} a_1+b_1\\a_2+b_2\\ \vdots\\ a_n+b_n\end{pmatrix}\ip \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \\ &= (a_1+b_1)c_1 +(a_2+b_2)c_2+ \cdots + (a_n+b_n)c_n \\ &= a_1c_1 +b_1c_1+a_2c_2+b_2c_2 + \cdots + a_nc_n+b_nc_n \\ &= a_1c_1 +a_2c_2+\cdots + a_nc_n +b_1c_1+b_2c_2 + \cdots +b_nc_n \\ &= \begin{pmatrix}a_{1} \\ a_{2}\\ \vdots\\a_{n}\end{pmatrix}\ip\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}+\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}\ip\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \\ &= \mathbf{v}_1\ip\mathbf{v}_3+\mathbf{v}_2\ip\mathbf{v}_3. \end{align*} \end{split}\]

Exercise 1.2.2

Prove the identity

\[ (\mathbf{v}_1+\mathbf{v}_2)\ip(\mathbf{v}_1-\mathbf{v}_2) = \mathbf{v}_1\ip\mathbf{v}_1-\mathbf{v}_2\ip\mathbf{v}_2. \]

Solution to Exercise 1.2.2

First of all, because of rule i. and rule iii. of Proposition 1.2.1 it holds that

\[ \mathbf{v}_1\ip(\mathbf{v}_2+\mathbf{v}_3) = \mathbf{v}_1\ip\mathbf{v}_2+\mathbf{v}_1\ip\mathbf{v}_3 \]

and it also follows from ii. and iii. that

\[ \mathbf{v}_1\ip(\mathbf{v}_2-\mathbf{v}_3) = \mathbf{v}_1\ip(\mathbf{v}_2+(-1)\mathbf{v}_3) =\mathbf{v}_1\ip\mathbf{v}_2+\mathbf{v}_1\ip(-1\mathbf{v}_3) = \mathbf{v}_1\ip\mathbf{v}_2-\mathbf{v}_1\ip\mathbf{v}_3. \]

Then the statement is proved by the following chain of identities

\[\begin{split} \begin{array}{rcl}(\mathbf{v}_1+\mathbf{v}_2)\ip(\mathbf{v}_1-\mathbf{v}_2) &=& \mathbf{v}_1\ip(\mathbf{v}_1-\mathbf{v}_2) + \mathbf{v}_2\ip(\mathbf{v}_1-\mathbf{v}_2) \\ &=& \mathbf{v}_1\ip\mathbf{v}_1-\mathbf{v}_1\ip\mathbf{v}_2 + \mathbf{v}_2\ip\mathbf{v}_1-\mathbf{v}_2\ip\mathbf{v}_2\\ &=& \mathbf{v}_1\ip\mathbf{v}_1-\mathbf{v}_2\ip\mathbf{v}_2. \end{array}\end{split}\]

Exercise 1.2.3

Prove the identity

\[ \norm{\mathbf{v}_1+\mathbf{v}_2}^2 + \norm{\mathbf{v}_1-\mathbf{v}_2}^2 = 2 (\norm{\mathbf{v}_1}^2 + \norm{\mathbf{v}_2}^2), \]

and explain why it is called the parallelogram rule.

Solution to Exercise 1.2.3

Again it’s a chain of identities using basic properties of the dot product.

\[\begin{split} \begin{array}{rcl} \norm{\mathbf{v}_1+\mathbf{v}_2}^2 + \norm{\mathbf{v}_1-\mathbf{v}_2}^2&=& (\mathbf{v}_1+\mathbf{v}_2)\cdot(\mathbf{v}_1+\mathbf{v}_2) + (\mathbf{v}_1-\mathbf{v}_2)\cdot(\mathbf{v}_1-\mathbf{v}_2) \\ &=& \mathbf{v}_1\cdot\mathbf{v}_1 +2\mathbf{v}_1\cdot\mathbf{v}_2 + \mathbf{v}_2\cdot\mathbf{v}_2 + \mathbf{v}_1\cdot\mathbf{v}_1 -2\mathbf{v}_1\cdot\mathbf{v}_2 + \mathbf{v}_2\cdot\mathbf{v}_2 \\ &=& 2\,\mathbf{v}_1\cdot\mathbf{v}_1 +2\,\mathbf{v}_2\cdot\mathbf{v}_2 \\ &=& 2 (\norm{\mathbf{v}_1}^2 + \norm{\mathbf{v}_2}^2). \end{array}\end{split}\]

The figure explains the name.

../_images/Fig-InnerProduct-ParGramRule.svg — Fig. 1.2.5 Parallelogram rule explained.#

In the parallelogram \(OABC\) the sum of the squares of the lengths of the four sides equals the sum of the squares of the lengths of the diagonals.

\[\begin{split} \begin{array}{rcl} OA^2 + AB^2 + BC^2 + CO^2 &=& 2\norm{\vect{v}_1}^2 + 2\norm{\vect{v}_2}^2 \\ &=& \norm{\mathbf{v}_1+\mathbf{v}_2}^2 + \norm{\mathbf{v}_1-\mathbf{v}_2}^2 \\ &=& OB^2 + CA^2. \end{array} \end{split}\]

Orthogonality

In \(\mathbb{R}^2\) and \(\mathbb{R}^3\) the dot product gives an easy way to check whether two vectors are perpendicular:

\[ \mathbf{v}\perp\mathbf{w} \iff \mathbf{v}\ip\mathbf{w} = 0. \]

We use this identity to define the concept of perpendicularity in \(\mathbb{R}^n\). It seems a bit ‘academic’, but in this more general setting the term orthogonal is used.

Definition 1.2.2

Two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) are called orthogonal if \(\mathbf{v}\ip\mathbf{w} = 0\). As before, we denote this by \(\mathbf{v}\perp\mathbf{w}\).

Example 1.2.2

Let \(\mathbf{u} = \begin{pmatrix} 1\\2\\-1\\-1\end{pmatrix}\), \(\mathbf{v} = \begin{pmatrix} 3\\-1\\2\\-1\end{pmatrix}\) and
\(\mathbf{w} = \begin{pmatrix} 2\\2\\-1\\2\end{pmatrix}\).

We compute

\[ \mathbf{u}\ip\mathbf{v} = 3-2-2+1 = 0, \]

\[ \mathbf{u}\ip\mathbf{w} = 2+4+1-2 = 5, \]

\[ \mathbf{v}\ip\mathbf{w} = 6 - 2 - 2 - 2 = 0, \]

and conclude that \(\mathbf{u}\) and \(\mathbf{v}\) are orthogonal, \(\mathbf{u}\) and \(\mathbf{w}\) are not orthogonal,
\(\mathbf{v}\) and \(\mathbf{w}\) are orthogonal.
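The three checks of Example 1.2.2 can be reproduced as follows. This is a small sketch in Python; the helper names `dot` and `is_orthogonal` are hypothetical, and the exact comparison with zero is only appropriate because all entries are integers.

```python
def dot(v, w):
    """Dot product; assumes v and w have the same number of entries."""
    return sum(a * b for a, b in zip(v, w))


def is_orthogonal(v, w):
    # Exact test, suitable for integer entries.
    # For floating-point entries a tolerance would be needed instead.
    return dot(v, w) == 0


u = [1, 2, -1, -1]
v = [3, -1, 2, -1]
w = [2, 2, -1, 2]

print(is_orthogonal(u, v))  # True
print(is_orthogonal(u, w))  # False, since the dot product is 5
print(is_orthogonal(v, w))  # True
```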

Grasple exercise 1.2.1

https://embed.grasple.com/exercises/59912254-6fc8-43c7-9c44-1ea7eab1c236?id=62409

To compute some dot products in \(\R^2, \R^3, \R^4\).

In \(\mathbb{R}^2\), two non-zero vectors that are orthogonal to the same non-zero vector \(\mathbf{v}\) are automatically multiples of each other (i.e. have either the same or the opposite direction). In \(\mathbb{R}^n\) with \(n \geq 3\) this no longer holds. In the previous example both vectors \(\mathbf{u}\) and \(\mathbf{w}\) are orthogonal to the vector \(\mathbf{v}\), but \(\mathbf{u} \neq c\mathbf{w}\) for every scalar \(c\).

By definition the zero vector is orthogonal to any vector, since \(\mathbf{0}\ip\mathbf{v} = 0\). Moreover, the zero vector is the only vector that is orthogonal to itself, which is the content of the next proposition.

Proposition 1.2.2

Suppose \(\mathbf{v} \in \mathbb{R}^n\). Then \(\mathbf{v}\perp\mathbf{v} \iff \mathbf{v} = \mathbf{0}\).

Proof of Proposition 1.2.2

By definition

\[ \mathbf{v}\perp\mathbf{v} \iff \mathbf{v}\ip\mathbf{v}=0 \]

In Proposition 1.2.1 iv. it was stated that the last equality only holds for \(\mathbf{v} = \mathbf{0}\).

The fact that the zero vector is orthogonal to any vector is an immediate consequence of the definition, but it may seem counterintuitive to you. The following example illustrates a situation where this orthogonality leads to a much nicer outcome.

Example 1.2.3

Let \(\mathbf{n}\) be any non-zero vector in the plane. The vectors that are orthogonal to \(\mathbf{n}\) all lie on a line \(\mathcal{L}\) through the origin. (See Figure 1.2.6.) Since we agree that \(\mathbf{0}\perp\mathbf{n}\), these vectors make up the whole line, the origin included. The vector \(\mathbf{n}\) is often said to be a normal vector to the line.

../_images/Fig-InnerProduct-PerpendicularLine.svg — Fig. 1.2.6 The line \(\mathcal{L}\) of vectors orthogonal to a non-zero vector \(\mathbf{n}\) in the plane.#

We conclude this subsection with another concept that we will come across later in a much more general context. Informally, it is the (orthogonal) projection of a vector onto another vector. More precisely, it is the orthogonal projection of a vector \(\mathbf{w}\) onto the line \(\mathcal{L}\) generated by the non-zero vector \(\mathbf{v}\), by which we mean \(\mathcal{L}= \{ c\mathbf{v}: c \in \mathbb{R}\}\).

See Figure 1.2.7.

Definition 1.2.3

The orthogonal projection of a vector \(\mathbf{w}\) onto a non-zero vector \(\mathbf{v}\) is the vector \(\mathbf{\hat{w}} = c\mathbf{v} \) for which

\[ (\mathbf{w} - \mathbf{\hat{w}}) \perp \mathbf{v}. \]

Another notation for this vector is

\[ \mathbf{\hat{w}} = \operatorname{proj}_{\mathbf{v}}(\mathbf{w}). \]

../_images/Fig-InnerProduct-ProjectionVectorLine.svg — Fig. 1.2.7 Projection of a vector \(\mathbf{w}\) onto a non-zero vector \(\mathbf{v}\).#

Proposition 1.2.3

In the definition above the vector \(\mathbf{\hat{w}}\) with these properties is unique and it is given by

\[ \operatorname{proj}_{\mathbf{v}}(\mathbf{w}) = \mathbf{\hat{w}} = \frac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v}. \]

Proof of Proposition 1.2.3

With the rules of the dot product the vector \(\mathbf{\hat{w}}\) is easily constructed.

Starting from

\[ \mathbf{\hat{w}} = c\mathbf{v}, \text{ for some } c\in\mathbb{R} \]

and

\[ (\mathbf{w} - \mathbf{\hat{w}}) \perp \mathbf{v}, \]

it follows that we must have

\[ (\mathbf{w} - c\mathbf{v}) \ip \mathbf{v} = \mathbf{w}\ip \mathbf{v} - c \,(\mathbf{v}\ip \mathbf{v}) = 0. \]

So \(c\) is uniquely given by

\[ c = \frac{\mathbf{w}\ip \mathbf{v}}{\mathbf{v}\ip \mathbf{v}} \]

and indeed \(\mathbf{\hat{w}}\) must be as stated.

Example 1.2.4

We compute the orthogonal projection of the vector

\[\begin{split} \mathbf{w} = \begin{pmatrix} 2\\ -4 \\ -1 \\ -5\end{pmatrix} \end{split}\]

onto the vector

\[\begin{split} \mathbf{v} = \begin{pmatrix} 1 \\1\\1\\1\end{pmatrix}. \end{split}\]

We proceed as follows

\[\begin{split} \mathbf{\hat{w}} = \operatorname{proj}_{\mathbf{v}}(\mathbf{w}) = \frac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v} = \frac{-8}{4}\begin{pmatrix} 1 \\1\\1\\1\end{pmatrix} = \begin{pmatrix} -2\\-2\\-2\\-2\end{pmatrix}. \end{split}\]

We verify the orthogonality:

\[\begin{split} (\mathbf{w} - \mathbf{\hat{w}} )\ip \mathbf{v} = \begin{pmatrix} 4 \\-2\\1\\-3\end{pmatrix} \ip \begin{pmatrix} 1 \\1\\1\\1\end{pmatrix} = 4-2+1-3 = 0, \end{split}\]

so indeed

\[ (\mathbf{w} - \mathbf{\hat{w}} )\perp \mathbf{v}, \]

as required.
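The formula of Proposition 1.2.3 translates directly into code. The sketch below recomputes Example 1.2.4 in Python; the name `project` is an assumption of this illustration, and `fractions.Fraction` is used so that the arithmetic stays exact, which presumes integer entries.

```python
from fractions import Fraction


def dot(v, w):
    """Dot product; assumes v and w have the same number of entries."""
    return sum(a * b for a, b in zip(v, w))


def project(w, v):
    """Orthogonal projection of w onto a non-zero vector v (integer entries)."""
    vv = dot(v, v)
    if vv == 0:
        raise ValueError("cannot project onto the zero vector")
    c = Fraction(dot(w, v), vv)  # the scalar (w . v) / (v . v)
    return [c * a for a in v]


w = [2, -4, -1, -5]
v = [1, 1, 1, 1]

w_hat = project(w, v)
residual = [a - b for a, b in zip(w, w_hat)]

print(w_hat == [-2, -2, -2, -2])  # True
print(dot(residual, v))           # 0, so (w - w_hat) is orthogonal to v
```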

Grasple exercise 1.2.2

https://embed.grasple.com/exercises/88c460cd-36ee-49b0-8fb8-d29b55ad253a?id=84822

Computing the projection of a vector \(\vect{w}\) onto a vector \(\vect{v}\).

Exercise 1.2.4

Suppose \(\operatorname{proj}_{\mathbf{v}}(\mathbf{w}_1) = \operatorname{proj}_{\mathbf{v}}(\mathbf{w}_2) \), for three non-zero vectors \(\mathbf{v}, \,\mathbf{w}_1,\,\mathbf{w}_2\) in \(\mathbb{R}^n\). What does this say about the relative positions of the three vectors?

Verify your statement for the following three vectors

\[\begin{split} \mathbf{v} = \begin{pmatrix} 1\\ 1 \\ -2 \\ -3\end{pmatrix}, \quad \mathbf{w}_1 = \begin{pmatrix} 6\\ 4 \\ -7 \\ -7\end{pmatrix}, \quad \mathbf{w}_2 = \begin{pmatrix} 5\\ 6 \\ -2 \\ -10\end{pmatrix}. \end{split}\]

Solution to Exercise 1.2.4

Suppose \(\operatorname{proj}_{\mathbf{v}}(\mathbf{w}_1) = \operatorname{proj}_{\mathbf{v}}(\mathbf{w}_2) \). Thus \(\dfrac{\mathbf{w}_1\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v} = \dfrac{\mathbf{w}_2\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v}\).

Since \(\mathbf{v}\) is not the zero vector this implies that \(\mathbf{w}_1\ip\mathbf{v} = \mathbf{w}_2\ip\mathbf{v}\). In other words,

\[ \mathbf{w}_1\ip\mathbf{v} - \mathbf{w}_2\ip\mathbf{v} = (\mathbf{w}_1 - \mathbf{w}_2)\ip \mathbf{v} = 0, \]

which expresses that \((\mathbf{w}_1 - \mathbf{w}_2)\perp \vect{v}\).

For the given vectors \(\mathbf{v}, \mathbf{w}_1, \mathbf{w}_2\) we find

\[ \dfrac{\mathbf{w}_1\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v} = \frac{\mathbf{w}_2\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \mathbf{v} = \dfrac{45}{15}\mathbf{v} \]

and

\[\begin{split} \mathbf{w}_1 - \mathbf{w}_2 = \begin{pmatrix} 6\\ 4 \\ -7 \\ -7\end{pmatrix} - \begin{pmatrix} 5\\ 6 \\ -2 \\ -10 \end{pmatrix} = \begin{pmatrix} 1\\ -2 \\ -5 \\ 3\end{pmatrix}. \end{split}\]

We see \((\mathbf{w}_1 - \mathbf{w}_2)\ip \mathbf{v} = 1 - 2 + 10 - 9 = 0\), so indeed \((\mathbf{w}_1 - \mathbf{w}_2)\) and \(\vect{v}\) are orthogonal.

Figure 1.2.8 shows what is going on.

../_images/Fig-InnerProduct-SameProj.svg — Fig. 1.2.8 Two vectors \(\vect{w}_1\), \(\vect{w}_2 \) with the same projection onto \(\vect{v}\).#

1.2.4. Norm in \(\mathbb{R}^n\)#

The length of a vector in the plane can be computed using the dot product: for \(\mathbf{v}=\begin{pmatrix}a_{1}\\a_{2}\end{pmatrix}\) in \(\mathbb{R}^2\) we have seen that

\[ \norm{\mathbf{v}} = \sqrt{a_1^2 + a_2^2} = \sqrt{\mathbf{v}\ip\mathbf{v}}. \]

The identity \(\norm{\mathbf{v}} = \sqrt{\mathbf{v}\ip\mathbf{v}}\) also holds in \(\mathbb{R}^3\).

It seems natural to extend the concept to \(\mathbb{R}^n\). Again, for this more general space a new word is introduced.

Definition 1.2.4

The norm of a vector \(\mathbf{v}\) in \(\mathbb{R}^n\), denoted by \(\norm{\mathbf{v}}\), is defined by

\[ \norm{\mathbf{v}} = \sqrt{\mathbf{v}\ip\mathbf{v}\,}. \]

Expressed in the entries of \(\mathbf{v}\) this yields

\[ \norm{\mathbf{v}} = \sqrt{a_1^2+ a_2^2 + \cdots +a_n^2\,}\,, \]

so for vectors in \(\mathbb{R}^2\) and \(\mathbb{R}^3\) the norm of a vector is just the length of the vector.

As we might expect the norm has many properties in common with length.

Proposition 1.2.4

For any \(\mathbf{v}, \,\mathbf{w} \in \mathbb{R}^{n}\) and all \(c \in \mathbb{R}\) the following holds:

i. \(\norm{\mathbf{v}}\geq 0\), and \(\norm{\mathbf{v}} = 0\) only for \(\mathbf{v}=\mathbf{0}\).

ii. Scaling property:

(1.2.8)#\[\norm{c\mathbf{v}} = |c|\norm{\mathbf{v}}.\]

iii. Triangle Inequality:

(1.2.9)#\[\norm{\mathbf{v}+\mathbf{w}} \leq \norm{\mathbf{v}}+\norm{\mathbf{w}}.\]

The first two of these properties are very easy to prove. The proof of the triangle inequality we postpone until the end of the section. Figure 1.2.9 explains the name.

../_images/Fig-InnerProduct-TriangleInequality.svg — Fig. 1.2.9 The Triangle Inequality.#

Example 1.2.5

We compute the norms of the vectors

\[\begin{split} \mathbf{v} = \begin{pmatrix} 1 \\ -2 \\ 3 \\ -1 \end{pmatrix} \quad \text{and} \quad -2\mathbf{v} = \begin{pmatrix} -2 \\ 4 \\ -6 \\ 2 \end{pmatrix}. \end{split}\]

We find

\[ \norm{\mathbf{v}} = \sqrt{1^2 + (-2)^2 + 3^2 + (-1)^2\,} = \sqrt{15} \]

and

\[ \norm{-2\mathbf{v}} = \sqrt{(-2)^2 + 4^2 + (-6)^2 + 2^2\,} = \sqrt{60} = 2\sqrt{15}. \]

The last norm can also be found via

\[ \norm{-2\mathbf{v}} = |-2|\cdot\norm{\mathbf{v}} = 2 \sqrt{15}. \]
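The scaling property can be checked numerically for the vectors of Example 1.2.5. This is a minimal sketch in Python; the function name `norm` is chosen for the illustration, and because square roots are computed in floating point the comparison uses `math.isclose` rather than exact equality.

```python
import math


def norm(v):
    """Norm of a vector: the square root of its dot product with itself."""
    return math.sqrt(sum(a * a for a in v))


v = [1, -2, 3, -1]
scaled = [-2 * a for a in v]

print(norm(v))       # sqrt(15), about 3.873
print(norm(scaled))  # sqrt(60), about 7.746
print(math.isclose(norm(scaled), abs(-2) * norm(v)))  # True
```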

Definition 1.2.5

The distance between two vectors in \(\R^n\) is defined by

\[ \operatorname{dist}(\vect{u},\vect{v}) = \norm{\vect{v}-\vect{u}}. \]

Example 1.2.6

For the vectors \(\vect{u} = \begin{pmatrix}1 \\ 3 \\ 2 \\ 4 \end{pmatrix}\) and \(\vect{v} = \begin{pmatrix}5 \\ 1 \\ 3 \\ 4 \end{pmatrix}\) in \(\R^4\)

the distance is given by

\[\begin{split} \norm{\vect{v}-\vect{u}} = \norm{\begin{pmatrix}4 \\ -2 \\ 1 \\ 0 \end{pmatrix}} = \sqrt{4^2 + (-2)^2 + 1^2 + 0^2} = \sqrt{21}. \end{split}\]
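As an illustration, the distance of Definition 1.2.5 can be computed as below. The sketch is in Python and the function name `dist` simply mirrors the notation \(\operatorname{dist}\); it assumes both vectors have the same number of entries.

```python
import math


def dist(u, v):
    """Distance between u and v: the norm of the difference v - u."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(u, v)))


u = [1, 3, 2, 4]
v = [5, 1, 3, 4]

print(dist(u, v))                # sqrt(21), about 4.583
print(dist(u, v) == dist(v, u))  # True: the order of the vectors is irrelevant
```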

../_images/Fig-InnerProduct-Distance.svg — Fig. 1.2.10 The distance between two vectors.#

Grasple exercise 1.2.3

https://embed.grasple.com/exercises/5bc4274c-56a0-461b-bd3d-9f8bdb8f44e0?id=69740

Computing the distance between two vectors in \(\R^3\).

From the rules of the norm the following rules of the distance function can be deduced.

Proposition 1.2.5

For any three vectors \(\mathbf{u}, \mathbf{v}\) and \(\mathbf{w} \in \mathbb{R}^{n}\) the following statements hold.

i. \(\operatorname{dist}(\vect{u},\vect{v}) = \operatorname{dist}(\vect{v},\vect{u})\).

ii. \(\operatorname{dist}(\vect{u},\vect{v}) = 0 \iff \vect{u}=\vect{v}\).

iii. \(\operatorname{dist}(\vect{u},\vect{w}) \leq \operatorname{dist}(\vect{u},\vect{v}) + \operatorname{dist}(\vect{v},\vect{w})\).

Rule iii. is again called the Triangle Inequality.

Exercise 1.2.5

Check the three properties of the distance function as stated in Proposition 1.2.5.
For Rule iii., only show how it follows from the corresponding Rule iii. in Proposition 1.2.4.

Solution to Exercise 1.2.5

i. \(\operatorname{dist}(\mathbf{u},\mathbf{v})=\left\|\mathbf{v}-\mathbf{u}\right\|=\left\|(-1)\left(\mathbf{u}-\mathbf{v}\right)\right\|=\left|-1\right|\left\|\mathbf{u}-\mathbf{v}\right\|=\left\|\mathbf{u}-\mathbf{v}\right\|=\operatorname{dist}(\mathbf{v},\mathbf{u})\).

ii. \(\operatorname{dist}(\mathbf{u},\mathbf{v})=0 \iff \left\|\mathbf{v}-\mathbf{u}\right\|=0 \iff \mathbf{v}-\mathbf{u}=\mathbf{0} \iff \mathbf{u}=\mathbf{v}\).

iii. We insert \(\mathbf{v}-\mathbf{v}\) and apply Rule iii. of Proposition 1.2.4:

\[\begin{split}\begin{array}{rcl}\operatorname{dist}(\mathbf{u},\mathbf{w})&=&\left\|\mathbf{w}-\mathbf{u}\right\|\\&=&\left\|\mathbf{w}-\mathbf{u}+\mathbf{v}-\mathbf{v}\right\|\\&=&\left\|(\mathbf{w}-\mathbf{v})+(\mathbf{v}-\mathbf{u})\right\|\\&\leq&\left\|\mathbf{w}-\mathbf{v}\right\|+\left\|\mathbf{v}-\mathbf{u}\right\|\\&=&\operatorname{dist}(\vect{u},\vect{v}) + \operatorname{dist}(\vect{v},\vect{w}).\end{array}\end{split}\]

With the tools so far we can define a notion that comes in handy later.

Definition 1.2.6

A unit vector is a vector of norm \(1\).

Moreover, for any non-zero vector \(\mathbf{v}\), the vector

\[ \mathbf{u} = \frac{\mathbf{v}}{\norm{\mathbf{v}}} \]

is called the unit vector in the direction of \(\mathbf{v}\).

Proposition 1.2.6

For a non-zero vector \(\mathbf{v}\)

\[ \frac{\mathbf{v}}{\norm{\mathbf{v}}} \]

is the unique vector \(\mathbf{u}\) of norm 1 such that

\[ \mathbf{u} = k\mathbf{v}, \text{ for some } k > 0. \]

Proof of Proposition 1.2.6

Assume that \(\mathbf{v} \neq \mathbf{0}\). For \(\mathbf{u} = k\mathbf{v}\), with \(\norm{\mathbf{u}} = 1\) and \(k > 0\) to hold, we must have

\[ \norm{\mathbf{u}} = \norm{k\mathbf{v}} = |k|\norm{\mathbf{v}} = k\norm{\mathbf{v}} = 1. \]

We see that

\[ k = \dfrac{1}{\norm{\mathbf{v}}} \]

and consequently

\[ \mathbf{u} = k\mathbf{v} = \dfrac{1}{\norm{\mathbf{v}}}\mathbf{v} = \frac{\mathbf{v}}{\norm{\mathbf{v}}}. \]

Example 1.2.7

We compute the unit vector \(\mathbf{u}\) in the direction of the vector \(\mathbf{v} = \begin{pmatrix}1 \\ 2 \\ 4 \\ -2 \end{pmatrix}\) in \(\mathbb{R}^4\). First we find the norm of \(\mathbf{v}\):

\[\norm{\mathbf{v}} = \sqrt{1^2+2^2+4^2+(-2)^2} = \sqrt{25} = 5, \]

so

\[\begin{split} \mathbf{u} = \dfrac{1}{5} \begin{pmatrix}1 \\ 2 \\ 4 \\ -2 \end{pmatrix} = \begin{pmatrix}1/5 \\ 2/5 \\ 4/5 \\ -2/5 \end{pmatrix}. \end{split}\]
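The construction of a unit vector can be sketched in Python as follows. The function names `norm` and `unit_vector` are assumptions of this illustration; the zero vector is rejected because it has no direction.

```python
import math


def norm(v):
    """Norm of a vector given as a list of numbers."""
    return math.sqrt(sum(a * a for a in v))


def unit_vector(v):
    """Unit vector in the direction of a non-zero vector v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [a / length for a in v]


v = [1, 2, 4, -2]
u = unit_vector(v)

print(u)        # [0.2, 0.4, 0.8, -0.4]
print(norm(u))  # 1.0 up to floating-point rounding
```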

Interestingly, Pythagoras’ theorem also holds in \(\mathbb{R}^n\).

Theorem 1.2.1

For any two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\) we have

\[ \norm{\mathbf{v}+\mathbf{w}}^2 = \norm{\mathbf{v}}^2 + \norm{\mathbf{w}}^2 \iff \mathbf{v} \perp \mathbf{w}. \]

Proof of Theorem 1.2.1

This follows quite straightforwardly from the properties of the dot product.

Let us start from the identity on the left and work our way to the conclusion on the right, making sure that each step is reversible. Note that from the definition of the norm it follows immediately that \(\norm{\mathbf{v}}^2 = \mathbf{v}\ip\mathbf{v}\).

\[\begin{split} \begin{array}{cl} &\norm{\mathbf{v}+\mathbf{w}}^2 = \norm{\mathbf{v}}^2 + \norm{\mathbf{w}}^2 \\ \iff &(\mathbf{v}+\mathbf{w})\ip(\mathbf{v}+\mathbf{w}) = \mathbf{v}\ip\mathbf{v} + \mathbf{w}\ip\mathbf{w} \\ \iff&\mathbf{v}\ip\mathbf{v} + \mathbf{v}\ip\mathbf{w}+\mathbf{w}\ip\mathbf{v}+ \mathbf{w}\ip\mathbf{w} = \mathbf{v}\ip\mathbf{v} + \mathbf{w}\ip\mathbf{w}. \end{array} \end{split}\]

Next we subtract \(\mathbf{v}\ip\mathbf{v} + \mathbf{w}\ip\mathbf{w}\) from both sides. Thus the last identity is equivalent to

\[ \mathbf{v}\ip\mathbf{w}+\mathbf{w}\ip\mathbf{v} = 0. \]

And then we are almost there:

\[ \mathbf{v}\ip\mathbf{w}+\mathbf{w}\ip\mathbf{v} = 0 \iff 2\,\mathbf{v}\ip\mathbf{w} = 0 \iff \mathbf{v}\ip\mathbf{w}= 0 \iff \mathbf{v}\perp\mathbf{w}. \]

Example 1.2.8

We verify the equality for the vectors \(\mathbf{v} = \begin{pmatrix} 2 \\ -3\\ 3 \\ 1 \end{pmatrix}\) and \(\mathbf{w} = \begin{pmatrix} 2 \\ 4 \\ 1 \\ 5 \end{pmatrix}\) in \(\mathbb{R}^4\).

First of all

\[ \mathbf{v} \ip \mathbf{w} = 4 - 12 + 3 + 5 = 0, \]

so \(\mathbf{v}\perp \mathbf{w}\), and second

\[ \norm{\mathbf{v}} = \sqrt{2^2 + (-3)^2 + 3^2 + 1^2} = \sqrt{23}, \quad \norm{\mathbf{w}} = \sqrt{2^2 + 4^2 + 1^2 + 5^2} = \sqrt{46}. \]

Furthermore

\[\begin{split} \mathbf{v}+\mathbf{w} = \begin{pmatrix} 4 \\ 1 \\ 4 \\ 6 \end{pmatrix} \Longrightarrow \norm{\mathbf{v}+\mathbf{w}} = \sqrt{4^2+1^2+4^2+6^2} = \sqrt{69}\end{split}\]

and we see that indeed

\[ \norm{\mathbf{v}+\mathbf{w}}^2 = 69 = 23 + 46 = \norm{\mathbf{v}}^2+\norm{\mathbf{w}}^2. \]
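Both directions of Theorem 1.2.1 can be illustrated with exact integer arithmetic by working with squared norms. In the Python sketch below, `w_other` is a hypothetical vector that does not occur in the text; it is chosen so that it is not orthogonal to \(\mathbf{v}\).

```python
def dot(v, w):
    """Dot product; assumes v and w have the same number of entries."""
    return sum(a * b for a, b in zip(v, w))


def pythagoras_gap(v, w):
    """Return ||v + w||^2 - (||v||^2 + ||w||^2), which equals 2 (v . w)."""
    s = [a + b for a, b in zip(v, w)]
    return dot(s, s) - (dot(v, v) + dot(w, w))


v = [2, -3, 3, 1]
w = [2, 4, 1, 5]        # orthogonal to v
w_other = [2, 4, 1, 4]  # hypothetical vector, not orthogonal to v

print(dot(v, w), pythagoras_gap(v, w))              # 0 0
print(dot(v, w_other), pythagoras_gap(v, w_other))  # -1 -2
```

The gap vanishes exactly when the dot product does, which is the content of the theorem.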

One of the most basic properties, also one with a wide range of applications, is the so-called Cauchy-Schwarz Inequality.

Theorem 1.2.2 (Cauchy-Schwarz Inequality)

For any two vectors in \(\mathbb{R}^n\)

\[ |\mathbf{v}\ip\mathbf{w}| \leq \norm{\mathbf{v}} \norm{\mathbf{w}}. \]

There are many ways to prove the Cauchy-Schwarz inequality. There is even a whole book devoted to it: “The Cauchy-Schwarz Master Class” by J.M. Steele.

The following proof is based on orthogonal projection and Pythagoras’ Theorem.

Proof of Theorem 1.2.2 (Cauchy-Schwarz Inequality)

If \(\mathbf{v} = \mathbf{0}\), the zero vector, then the inequality obviously holds; in fact it becomes an equality:

\[ \mathbf{v} = \mathbf{0} \Longrightarrow \norm{\mathbf{v}} = 0 \Longrightarrow \norm{\mathbf{v}} \norm{\mathbf{w}} = 0 \]

and also

\[ \mathbf{v} = \mathbf{0} \Longrightarrow \mathbf{v}\ip \mathbf{w} = 0 \Longrightarrow |\mathbf{v}\ip \mathbf{w}| = 0. \]

So now suppose \(\mathbf{v} \neq \mathbf{0}\).

Let

\[ \mathbf{\hat{w}} = \dfrac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}}\,\mathbf{v} \]

be the projection of \(\mathbf{w}\) onto \(\mathbf{v}\). Then we can apply Pythagoras’ Theorem:

\[ (\mathbf{w} - \mathbf{\hat{w}}) \perp \mathbf{\hat{w}} \Longrightarrow \norm{\mathbf{w} - \mathbf{\hat{w}}}^2 + \norm{ \mathbf{\hat{w}}}^2 = \norm{(\mathbf{w} - \mathbf{\hat{w}}) + \mathbf{\hat{w}}}^2 = \norm{\mathbf{w}}^2. \]

It follows that

\[ \norm{ \mathbf{\hat{w}}}^2 = \norm{\mathbf{w}}^2 - \norm{\mathbf{w} - \mathbf{\hat{w}}}^2 \leq \norm{\mathbf{w}}^2. \]

Substitution of the expression for \(\mathbf{\hat{w}}\) leads to

\[ \left(\dfrac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}}\right)^2 \norm{\mathbf{v}}^2 = \dfrac{(\mathbf{w}\ip\mathbf{v})^2}{(\mathbf{v}\ip\mathbf{v})^2} \norm{\mathbf{v}}^2 \leq \norm{\mathbf{w}}^2. \]

Using

\[ \mathbf{v}\ip\mathbf{v} = \norm{\mathbf{v}}^2 \]

we deduce that

\[ (\mathbf{w}\ip\mathbf{v})^2 \leq \norm{\mathbf{v}}^2\norm{\mathbf{w}}^2. \]

Taking square roots we may conclude that indeed

\[ |\mathbf{w}\ip\mathbf{v}| \, \leq \, \norm{\mathbf{v}} \norm{\mathbf{w}}. \]

Example 1.2.9

We verify that the inequality holds for the vectors \(\mathbf{v} = \begin{pmatrix} 1 \\ -2\\ 3 \\ -4 \end{pmatrix}\) and \(\mathbf{w} = \begin{pmatrix} -5 \\ 4 \\-3 \\ 0 \end{pmatrix}\) in \(\mathbb{R}^4\).

We compute

\[ \mathbf{v}\ip\mathbf{w} = -5-8-9 = -22, \quad \norm{\mathbf{v}} = \sqrt{30}, \quad \norm{\mathbf{w}} = \sqrt{50} \]

and we see that indeed

\[ |\mathbf{v}\ip\mathbf{w}| = 22 \leq \norm{\mathbf{v}} \norm{\mathbf{w}} = \sqrt{1500}. \]
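The same verification can be done by machine. This is a small sketch in plain Python; `dot` and `norm` are helper names introduced here for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

v = [1, -2, 3, -4]
w = [-5, 4, -3, 0]

lhs = abs(dot(v, w))       # |v . w|
rhs = norm(v) * norm(w)    # ||v|| ||w||

print("|v . w|     =", lhs)
print("||v|| ||w|| =", round(rhs, 4))
print("inequality holds:", lhs <= rhs)
```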

With this inequality established, the Triangle Inequality in Equation (1.2.9) is easily proved. Let’s repeat it, and prove it.

Theorem 1.2.3

For any two vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\),

\[ \norm{\mathbf{v}+\mathbf{w}} \leq \norm{\mathbf{v}}+\norm{\mathbf{w}}. \]

Proof of Theorem 1.2.3

Since all terms involved are non-negative we may as well show that the inequality holds for the squares:

\[\begin{split} \begin{array}{l} \norm{\mathbf{v}+\mathbf{w}}^2 \leq (\norm{\mathbf{v}}+\norm{\mathbf{w}})^2 \\ \iff (\mathbf{v}+\mathbf{w})\ip(\mathbf{v}+\mathbf{w}) \leq \norm{\mathbf{v}}^2 + 2\norm{\mathbf{v}}\norm{\mathbf{w}} + \norm{\mathbf{w}}^2 \\ \iff \mathbf{v}\ip\mathbf{v} + 2\mathbf{v}\ip\mathbf{w}+\mathbf{w}\ip\mathbf{w} \leq \norm{\mathbf{v}}^2 + 2\norm{\mathbf{v}}\norm{\mathbf{w}} + \norm{\mathbf{w}}^2 \\ \iff 2\,\mathbf{v}\ip\mathbf{w} \leq 2\norm{\mathbf{v}}\norm{\mathbf{w}} \end{array} \end{split}\]

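and this last inequality follows from the Cauchy-Schwarz Inequality, since \(\mathbf{v}\ip\mathbf{w} \leq |\mathbf{v}\ip\mathbf{w}| \leq \norm{\mathbf{v}}\norm{\mathbf{w}}\).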

Example 1.2.10

We verify the inequality for the vectors \(\mathbf{v} = \begin{pmatrix} -1 \\ 2\\ 3 \end{pmatrix}\) and \(\mathbf{w} = \begin{pmatrix} 4 \\ -4\\ 3 \end{pmatrix}\):

\[ \norm{\mathbf{v} + \mathbf{w}} = \sqrt{3^2+(-2)^2+6^2} =\sqrt{49} = 7 \]

and indeed

\[ \norm{\mathbf{v}} + \norm{\mathbf{w}} = \sqrt{14} + \sqrt{41} \approx 10.1 > \norm{\mathbf{v} + \mathbf{w}}. \]
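As a cross-check of this example, the two sides of the Triangle Inequality can be computed directly. The sketch below uses plain Python; the helper names `dot` and `norm` are illustrative choices.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

v = [-1, 2, 3]
w = [4, -4, 3]
v_plus_w = [x + y for x, y in zip(v, w)]

lhs = norm(v_plus_w)        # ||v + w||
rhs = norm(v) + norm(w)     # ||v|| + ||w||

print(f"||v + w||     = {lhs:.4f}")
print(f"||v|| + ||w|| = {rhs:.4f}")
print("triangle inequality holds:", lhs <= rhs)
```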

1.2.5. Angles in \(\mathbb{R}^n\)#

The first motivation to consider the dot product came from the question of perpendicularity of two vectors in the plane or in \(\R^3\). Perpendicularity of two vectors means that the angle between them is equal to \(\frac12\pi\). Below we will show that it is possible to express the angle between any two (non-zero) vectors in terms of dot products, and we will use this to define the concept of angle in a general space \(\R^n\).

../_images/Fig-InnerProduct-AngleAndProjection.svg — Fig. 1.2.11 Angle between two vectors.#

First we will show a geometrical characterisation of the dot product that holds in \(\mathbb{R}^2\) as well as in \(\mathbb{R}^3\).

Proposition 1.2.7

For two non-zero vectors \(\mathbf{v}\) and \(\mathbf{w}\) in either \(\mathbb{R}^2\) or \(\mathbb{R}^3\) the following identity holds:

(1.2.10)#\[\mathbf{v}\ip\mathbf{w} = \norm{\mathbf{v}}\norm{\mathbf{w}} \cos(\varphi)\]

where \(\varphi\) is the angle between \(\mathbf{v}\) and \(\mathbf{w}\).

Note that this is in line with the special case of two perpendicular vectors:

\[ \mathbf{v}\perp\mathbf{w} \iff \mathbf{v}\ip\mathbf{w}=0 \iff \cos(\varphi)=0. \]

Observation 1.2.1

The angle between two non-zero vectors \(\mathbf{v}\) and \(\mathbf{w}\) is thus determined by dot products in the following way

\[ \cos(\varphi) = \frac{\mathbf{w}\ip\mathbf{v}}{\norm{\mathbf{v}}\norm{\mathbf{w}}}. \]

The value of \(\varphi\) between \(0\) and \(\pi\) is then uniquely determined by

\[ \varphi = \arccos\left(\frac{\mathbf{w}\ip\mathbf{v}}{\norm{\mathbf{v}}\norm{\mathbf{w}}}\right)= \cos^{-1}\left(\frac{\mathbf{w}\ip\mathbf{v}}{\norm{\mathbf{v}}\norm{\mathbf{w}}}\right). \]

Proof of Proposition 1.2.7

We will derive Equation (1.2.10). Assume that \(\mathbf{v}\) and \(\mathbf{w}\) are non-zero vectors. Recall the formula of the orthogonal projection of \(\mathbf{w}\) onto \(\mathbf{v}\),

\[ \mathbf{\hat{w}} = \dfrac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}}\mathbf{v}. \]

Let \(\varphi \in[0,\pi]\) denote the angle between \(\mathbf{v}\) and \(\mathbf{w}\).

From Figure 1.2.11 it is clear that the factor

\[ \dfrac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}} \]

is positive if the angle is acute, zero if the angle is right, and negative if the angle is obtuse.

In the case of an acute angle, by considering the right triangle \(\Delta OAB\), where \(A\) is the end point of \(\mathbf{\hat{w}}\) and \(B\) is the end point of \(\mathbf{w}\), we see that on the one hand

\[ OA = \norm{\dfrac{\mathbf{w}\ip\mathbf{v}}{\mathbf{v}\ip\mathbf{v}}\mathbf{v}} = \dfrac{|\mathbf{w}\ip\mathbf{v}|}{\mathbf{v}\ip\mathbf{v}}\norm{\mathbf{v}} = \dfrac{\mathbf{w}\ip\mathbf{v}}{\norm{\mathbf{v}}^2} \norm{\mathbf{v}} = \dfrac{\mathbf{w}\ip\mathbf{v}}{\norm{\mathbf{v}}} \]

and on the other hand

\[ OA = OB\cos(\varphi) = \norm{\mathbf{w}}\cos(\varphi). \]

So we may conclude that

(1.2.11)#\[\mathbf{w}\ip\mathbf{v} = \norm{\mathbf{v}}\norm{\mathbf{w}}\cos(\varphi).\]

In the case of an obtuse angle, we use that the projection of \(\mathbf{w}\) onto \(\mathbf{v}\) is equal to the projection of \(\mathbf{w}\) onto \(-\mathbf{v}\), as it is in fact the projection onto the line consisting of all multiples of \(\mathbf{v}\). Now look at the picture on the right of Figure 1.2.11. There you see that \(\mathbf{w}\) and \(-\mathbf{v}\) make an acute angle \(\psi = \pi - \varphi\), so we can apply Equation (1.2.11) to \(\mathbf{w}\) and \(-\mathbf{v}\):

\[\begin{split} \begin{array}{rcl} \mathbf{w}\ip\mathbf{v} = - \mathbf{w}\ip(\mathbf{-v}) &=& -\norm{\mathbf{w}}\norm{\mathbf{-v}}\cos(\psi) \\ &=& -\norm{\mathbf{w}}\norm{\mathbf{v}}\cos(\pi-\varphi) \\ &=& \norm{\mathbf{w}}\norm{\mathbf{v}}\cos(\varphi). \end{array} \end{split}\]
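In the plane, Equation (1.2.10) can be checked numerically by measuring the angle independently of the dot product. The sketch below does this via polar angles obtained from `math.atan2`; the function name `polar_angle_between` and the sample pairs (one acute, one obtuse) are assumptions made for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

def polar_angle_between(v, w):
    """Angle in [0, pi] between two non-zero vectors in R^2,
    computed from their polar angles rather than from the dot product."""
    diff = abs(math.atan2(w[1], w[0]) - math.atan2(v[1], v[0]))
    return min(diff, 2 * math.pi - diff)

# One pair with an acute angle and one with an obtuse angle.
pairs = [([2.0, 3.0], [-1.0, 3.0]),
         ([2.0, 3.0], [-3.0, -1.0])]

checks = []
for v, w in pairs:
    phi = polar_angle_between(v, w)
    lhs = dot(v, w)
    rhs = norm(v) * norm(w) * math.cos(phi)
    checks.append(math.isclose(lhs, rhs, abs_tol=1e-12))
    print(f"v.w = {lhs:.4f}, ||v|| ||w|| cos(phi) = {rhs:.4f}")
```

Both the acute and the obtuse pair satisfy the identity, in line with the two cases treated in the proof.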

Observation 1.2.2

Note that the absolute value of the expression

\[ \norm{\mathbf{w}}\cos(\varphi) \]

is the length of the orthogonal projection of \(\vect{w}\) onto \(\vect{v}\).
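A quick numerical illustration of this observation, for a pair of vectors that make an obtuse angle so that the absolute value matters. The vectors and helper names below are illustrative assumptions.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

v = [1.0, 2.0, 2.0]
w = [-3.0, 0.0, 1.0]   # w . v = -1, so the angle is obtuse

# Angle between v and w, from the cosine formula.
phi = math.acos(dot(v, w) / (norm(v) * norm(w)))
signed_length = norm(w) * math.cos(phi)   # negative for an obtuse angle

# Orthogonal projection of w onto v, and its length.
factor = dot(w, v) / dot(v, v)
w_hat = [factor * x for x in v]
projection_length = norm(w_hat)

print(signed_length, projection_length)
```

Here `signed_length` is negative while `projection_length` is positive; they agree in absolute value.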

Example 1.2.11

In a methane molecule \(\ce{CH_4}\) the four \(\ce{H}\)-atoms are positioned in a perfectly symmetrical way around the \(\ce{C}\)-atom. We can model this as follows: put the \(\ce{C}\)-atom at the origin of \(\mathbb{R}^3\), and the \(\ce{H}\)-atoms at the positions/vectors

\[\begin{split} \mathbf{v}_1 = \begin{pmatrix}1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix}-1 \\ -1 \\ 1 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix}-1 \\ 1 \\ -1 \end{pmatrix} \quad \text{and} \quad \mathbf{v}_4 = \begin{pmatrix}1 \\ -1 \\ -1 \end{pmatrix}. \end{split}\]

Then all four points have the same distance \(\sqrt{3}\) to the origin, and all points have the same distance to each other, namely

\[ \norm{\vect{v}_i - \vect{v}_j} = \sqrt{2^2 + 2^2 + 0^2} = \sqrt{8}, \text{ for } i \neq j. \]

The angle between, for instance, \(\mathbf{v}_1\) and \(\mathbf{v}_3\) is determined by

\[ \cos(\varphi) = \dfrac{\mathbf{v}_1\ip\mathbf{v}_3}{\norm{\mathbf{v}_1}\norm{\mathbf{v}_3}} = \dfrac{-1}{\sqrt{3}\cdot\sqrt{3}} = -\frac13. \]

So

\[ \varphi = \arccos(-\tfrac13) \approx 1.9106 \text{ rad} \approx 109.47^{\circ}. \]
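The symmetry claimed in this example can be confirmed for all six pairs of \(\ce{H}\)-atoms at once. The following is a sketch in plain Python; the variable names are chosen for illustration.

```python
import math
from itertools import combinations

def dot(a, b):
    """Dot product of two equally long sequences of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

# Positions of the four H-atoms; the C-atom sits at the origin.
h_atoms = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]

angles_deg = []
distances = []
for a, b in combinations(h_atoms, 2):
    cos_phi = dot(a, b) / (norm(a) * norm(b))
    angles_deg.append(math.degrees(math.acos(cos_phi)))
    distances.append(norm([x - y for x, y in zip(a, b)]))

print([round(angle, 2) for angle in angles_deg])
print([round(d, 4) for d in distances])
```

All six angles come out equal, as do all six mutual distances.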

Since we have defined the dot product and the norm in \(\mathbb{R}^n\), we can use the last formula to also define the angle between two vectors in \(\mathbb{R}^n\).


Definition 1.2.7

For two non-zero vectors \(\mathbf{v}\) and \(\mathbf{w}\) in \(\mathbb{R}^n\), the angle between the vectors is defined as

\[ \varphi = \angle(\mathbf{v},\mathbf{w}) = \arccos\left(\dfrac{\mathbf{v}\ip\mathbf{w}}{\norm{\mathbf{v}} \norm{\mathbf{w}}} \right). \]

This definition makes sense, since the Cauchy-Schwarz inequality (Theorem 1.2.2) implies

\[ -1 \leq \dfrac{\mathbf{v}\ip\mathbf{w}}{\norm{\mathbf{v}}\,\norm{\mathbf{w}}} \leq 1. \]

Note that just as before in the plane and in three-dimensional space, for non-zero vectors \(\mathbf{v}\) and \(\mathbf{w}\) we have

\[ \mathbf{v}\perp\mathbf{w} \iff \mathbf{v}\ip\mathbf{w}=0 \iff \dfrac{\mathbf{v}\ip\mathbf{w}}{\norm{\mathbf{v}}\,\norm{\mathbf{w}}}=0 \iff \varphi = \angle(\mathbf{v},\mathbf{w}) = \tfrac12\pi. \]
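Definition 1.2.7 translates almost literally into code. The sketch below is one possible version in plain Python; the function name `angle` is an illustrative choice. Clamping the quotient to \([-1,1]\) is an extra precaution that is not part of the definition: mathematically the Cauchy-Schwarz inequality already guarantees this range, but floating-point rounding can push the computed value slightly outside it.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

def angle(v, w):
    """Angle in [0, pi] between two non-zero vectors in R^n."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of entries")
    norm_v, norm_w = norm(v), norm(w)
    if norm_v == 0 or norm_w == 0:
        raise ValueError("the angle is only defined for non-zero vectors")
    quotient = dot(v, w) / (norm_v * norm_w)
    # Guard against rounding errors such as 1.0000000000000002.
    quotient = max(-1.0, min(1.0, quotient))
    return math.acos(quotient)

# Two orthogonal vectors in R^4 and a vector compared with itself.
print(angle([1, 0, 0, 0], [0, 1, 0, 0]))      # equals pi/2
print(angle([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # zero up to rounding
```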

Example 1.2.12

Let \(\mathbf{e}_1\) be the vector in \(\mathbb{R}^n\) with first entry equal to \(1\) and all other entries equal to \(0\), and let \(\mathbf{v}\) be the vector with all entries equal to \(1\). We find the angle between \(\mathbf{e}_1\) and \(\mathbf{v}\) in all cases \(n = 1, 2, 3,\ldots\)

For each \(n\geq1\) we write \(\varphi_n = \angle(\mathbf{e}_1,\mathbf{v})\). Then

\[ \cos(\varphi_n) = \dfrac{\mathbf{e}_1\ip\mathbf{v}}{\norm{\mathbf{e}_1}\norm{\mathbf{v}}} = \dfrac{1}{\sqrt{n}}. \]

So:

\[ \varphi_n = \arccos(\tfrac{1}{\sqrt{n}}), \, n = 1,2,3,\ldots \]

For \(n=1\) we find \(\cos(\varphi_1) = 1\), so \(\varphi_1 = 0\), which makes sense, and for \(n=2\), \(\cos(\varphi_2) = \frac{1}{\sqrt{2}}\), so \(\varphi_2 = \frac14\pi\), which you can check by a sketch in the plane.

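For \(n\geq3\) the answers are usually less neat (an exception is \(n=4\), where \(\cos(\varphi_4) = \frac12\), so \(\varphi_4 = \frac13\pi\)), but as \(\frac{1}{\sqrt{n}} \downarrow 0\) when \(n\) gets large, the angle \(\varphi_n\) approaches \(\frac12\pi\). So for large \(n\) the two vectors in \(\mathbb{R}^n\) are ‘almost’ orthogonal.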
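To see how quickly the angle approaches \(\frac12\pi\), one can tabulate \(\varphi_n\) for a few values of \(n\). The sketch below builds both vectors explicitly and applies the angle formula; the function name `phi` and the chosen values of \(n\) are illustrative.

```python
import math

def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))

def phi(n):
    """Angle between e_1 and the all-ones vector in R^n."""
    e1 = [1.0] + [0.0] * (n - 1)
    ones = [1.0] * n
    return math.acos(dot(e1, ones) / (norm(e1) * norm(ones)))

for n in (1, 2, 3, 4, 10, 100, 10_000):
    print(f"n = {n:6d}   phi_n = {phi(n):.4f} rad = {math.degrees(phi(n)):.2f} degrees")
```

The printed angles increase with \(n\) and stay below \(90^{\circ}\).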

1.2.6. Grasple exercises#

Grasple exercise 1.2.4

https://embed.grasple.com/exercises/7bb32c8c-9a2e-49bd-85fa-b7d205949510?id=114535

To compute dot products in \(\R^2\), \(\R^3\) and \(\R^4\).

Grasple exercise 1.2.5

https://embed.grasple.com/exercises/7b49e0f5-ae8b-4e92-8878-665dc080b7ee?id=65601

To find a vector orthogonal to a given vector in \(\R^2\).

Grasple exercise 1.2.6

https://embed.grasple.com/exercises/c8b4eed4-179f-42ab-9ec9-07f66445c960?id=69482

To find a vector orthogonal to two given vectors in \(\R^2\).

Grasple exercise 1.2.7

https://embed.grasple.com/exercises/b5a4e1c0-92ca-4307-9eb0-25a3a5807fc7?id=62415

To find a vector orthogonal to a given vector in \(\R^3\).

Grasple exercise 1.2.8

https://embed.grasple.com/exercises/34bbb9e1-207e-4c06-8686-1c32b3f3d0aa?id=78751

To find a vector orthogonal to a given vector in \(\R^4\).

Grasple exercise 1.2.9

https://embed.grasple.com/exercises/30a7abfe-9d40-4faa-a848-83bd67e024a0?id=62406

To compute the norms of vectors in \(\R^2\), \(\R^3\), \(\R^4\).

Grasple exercise 1.2.10

https://embed.grasple.com/exercises/7dc339bb-fe79-4eb9-914c-ea1a7ca85a85?id=69737

To find the norm of the ‘all one’ vector in \(\mathbb{R}^n\).

Grasple exercise 1.2.11

https://embed.grasple.com/exercises/8de90b0e-e89a-49a6-aa63-1b1e39f6e98e?id=79262

To find the distance between two vectors in \(\mathbb{R}^4\).

Grasple exercise 1.2.12

https://embed.grasple.com/exercises/d4dd1154-a3ec-497e-bc73-1cd96529f0e7?id=69741

Find \(h\) such that the distance between two points has a given value \(d\).

Grasple exercise 1.2.13

https://embed.grasple.com/exercises/c2242315-7e4f-463b-b3cf-09e9e15c8b2b?id=69739

To find a unit vector on a given line through \((0,0)\).

Grasple exercise 1.2.14

https://embed.grasple.com/exercises/67334454-d109-45a2-b640-545041ff896d?id=62416

Find \(\operatorname{proj}_{\mathbf{v}}(\mathbf{w})\) in \(\R^2\).

Grasple exercise 1.2.15

https://embed.grasple.com/exercises/9705b078-6c91-42c6-9768-8a043115b881?id=62658

Find \(\operatorname{proj}_{\mathbf{v}}(\mathbf{w})\) in \(\R^4\).

Grasple exercise 1.2.16

https://embed.grasple.com/exercises/531d3be2-dd62-4c21-b023-70e0b63809be?id=78747

Regarding norm and orthogonality of \(\vect{u}\), \(\vect{v}\), \(\vect{u}-\vect{v}\) and \(\vect{u}+\vect{v}\).

Grasple exercise 1.2.17

https://embed.grasple.com/exercises/c4d2743f-5f14-4812-9531-1a40c28c15cb?id=62413

To prove that \((\vect{v}+\vect{w})\ip\vect{x} = \vect{v}\ip\vect{x}+\vect{w}\ip\vect{x}\).

Grasple exercise 1.2.18

https://embed.grasple.com/exercises/161ecdf6-4cfb-41ba-bc16-685fe8532471?id=62414

To show that \((\vect{v}+\vect{w})\ip(\vect{v}-\vect{w}) = \norm{\vect{v}}^2 - \norm{\vect{w}}^2\).

Grasple exercise 1.2.19

https://embed.grasple.com/exercises/407cb45d-2baf-4b0d-a1eb-6e51186e19f3?id=69738

What to conclude from \(\norm{\vect{v}+\vect{w}} = \norm{\vect{v}}+\norm{\vect{w}}\)?

Grasple exercise 1.2.20

https://embed.grasple.com/exercises/c4c1c609-b1dd-4588-865f-53d7e8221f88?id=62689

To prove that \(-1 \leq \dfrac{\vect{u}\ip\vect{v}}{\norm{\vect{u}} \norm{\vect{v}}} \leq 1\).

Grasple exercise 1.2.21

https://embed.grasple.com/exercises/2a2423c3-0907-40b7-bd5f-7607baf7cc09?id=62668

What to conclude from \(\operatorname{proj}_{\mathbf{v}}(\mathbf{w}_1 ) = \operatorname{proj}_{\mathbf{v}}(\mathbf{w}_2)\)?