7.4. Least Squares Solutions#
7.4.1. Introduction#
In Chapter 2, especially Section 2.1, we studied linear systems. One way to write them down was as a matrix-vector equation
In this section we will reconsider the inconsistent situation and ask ourselves the question whether there is a vector
One common situation where an inconsistent linear system arises quite naturally is fitting a line through a set of points.
Suppose
There are different ways to define what the best line is. For instance, we may mean the line that minimizes the sum of the (perpendicular) distances of the points to the line. From a purely geometric point of view that seems the most natural choice. Alternatively, we can take the line for which the sum of the vertical distances from the points to the line, i.e.,
is minimal. This approach makes sense in a physical context where typically the
Both are sensible ideas. However, turning either of these ideas into an algorithm to find the best line is not as straightforward as the computations that arise if we put the question into the realm of linear algebra. And there it will turn out to be the problem of an inconsistent linear system.
Ideally there are parameters
contains all points
That means that all equations
will be satisfied simultaneously. This only happens if the matrix-vector equation
is consistent, which generally is not the case.
We will come back to this question in Subsection 7.4.5.
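To see the inconsistency concretely, here is a minimal numerical sketch (the data points below are made up for illustration; they are not the points used later in this section). Each point contributes one equation for the two line parameters, and stacking these equations gives an overdetermined system that, as the rank check shows, has no exact solution.

```python
import numpy as np

# Hypothetical data points (x_i, y_i), not lying on one line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.1, 2.9, 4.2])

# Each point gives one equation  beta_0 + beta_1 * x_i = y_i.
A = np.column_stack([np.ones_like(x), x])   # 4x2 coefficient matrix
b = y                                       # right-hand side

# The system A x = b is consistent only if b lies in Col A,
# i.e. if appending b to A does not increase the rank.
print(np.linalg.matrix_rank(A))                          # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))    # 3: inconsistent
```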
7.4.2. Least Squares Solutions#
Let
We have seen (Section 2.4, Remark 2.4.1) that the linear system
is equivalent to the vector equation
What we can do if the linear system is inconsistent, that is, if
is to try to find the best approximation of the vector
Let
holds.
The vector
is called the least squares error.
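As a small numerical illustration of these definitions (the matrix and vectors below are made up; they are not taken from the examples in this section), the error vector of a candidate vector and its norm can be computed directly:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])         # hypothetical 3x2 matrix
b = np.array([1.0, 0.0, 2.0])      # hypothetical right-hand side

x_trial = np.array([0.5, 0.5])     # some candidate 'solution'

error_vector = b - A @ x_trial           # the error vector b - A x
error = np.linalg.norm(error_vector)     # its norm ||b - A x||
print(error_vector, error)
```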
Consider the linear system
For the trial solution
and
By definition
By minimizing
From
we read off that a least squares solution yields a linear combination of the columns of
For a consistent linear system a least squares solution will be an actual solution, i.e., the least squares error will be zero.
In the situation where we want to fit a line
The interpretation of the ‘error vector’
and the error
The least squares solution
The questions we will address are:

i. Does a least squares solution always exist?

ii. How can we compute the least squares solution(s)?

iii. If a least squares solution exists, is it unique?
The next proposition provides the answers to question i. and question iii.
For each linear system
Proof of Proposition 7.4.1
In Remark 7.4.1 it was noted that a least squares solution corresponds to the vector in Col
The vector in Col
This projection, a linear combination of the columns of
The coefficients of this linear combination then give a least squares solution.
Lastly, these coefficients are unique if and only if the columns of
Find the least squares solution of the linear system
According to Proposition 7.4.1 the least squares solution consists of the coefficients of the orthogonal projection of the vector
In this first example we have chosen
So by the projection formula for an orthogonal basis (see Theorem 7.2.2, see side note), the projection is given by
And then the least squares solution is found to be
For this vector we find
Proposition 7.4.1 tells us this is the closest we can get.
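A numerical sketch of the same idea, with hypothetical orthogonal columns (not those of Example 7.4.2): because the columns are orthogonal, the projection coefficients are simple quotients of inner products, and these coefficients form the least squares solution.

```python
import numpy as np

# Two orthogonal columns (hypothetical).
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, -1.0, 0.0])
A = np.column_stack([a1, a2])
b = np.array([2.0, 0.0, 3.0])

# Coefficients of the orthogonal projection of b onto Col A.
c1 = (b @ a1) / (a1 @ a1)
c2 = (b @ a2) / (a2 @ a2)

x_hat = np.array([c1, c2])     # least squares solution
b_hat = A @ x_hat              # projection of b onto Col A
print(x_hat, np.linalg.norm(b - b_hat))   # solution and least squares error
```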
In Example 7.4.2 the coefficients of the orthogonal projection were quickly found due to the fact that the vectors
7.4.3. Normal Equations#
There is a direct way to find the coefficients of the orthogonal projection onto Col
(Normal Equations)
Suppose
Then the system of linear equations
is always consistent. The equations in this system are called the normal equations.
Any solution
Before having a look at the proof consider the following example.
We compute the least squares solution of the linear system
The normal equations lead to the augmented matrix
This can be row reduced to the echelon form
The least squares solution can be read off from the last column in this matrix.
The error vector and the least squares error are found to be
This is slightly better than the ‘trial solution’ in Example 7.4.1,
where the norm of the error vector was found to be
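The same recipe is easy to carry out numerically: form $A^TA$ and $A^T\mathbf{b}$ and solve the resulting square system. A minimal sketch, with a made-up system rather than the one of Example 7.4.3:

```python
import numpy as np

# Hypothetical 3x2 system with independent columns.
A = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations:  (A^T A) x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Error vector and least squares error.
error_vector = b - A @ x_hat
print(x_hat, np.linalg.norm(error_vector))
```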
In the proof, properties of the orthogonal projection are combined in a clever way. As usual, feel free to skip it.
Proof of Theorem 7.4.1
As usual we denote the columns of the
From the section about orthogonal projections, we know that the orthogonal projection of
for certain constants
By the definition of the orthogonal projection we have that
This leads to
for the unknowns
These equations can be rewritten as
and further to
In matrix-vector form this becomes
This leads to the following very concise form of the normal equations: $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$.
So, to find the least squares solution(s) of the linear system
If
If the columns
must have a unique solution.
There is another way to see this, which follows from the next proposition.
Suppose
Proof of Proposition 7.4.2
In fact, something stronger holds:
First if
To prove the converse, suppose
Then
Now realize that
So
The equivalence (7.4.4) implies: if
Prove the converse of Proposition 7.4.2.
For any
Solution to Exercise 7.4.1
Suppose that
has only the trivial solution
So suppose that
The assumption that
As stated, the least squares solution of a system
of
For a matrix
So for a matrix
We already knew how to find this projection if the columns are orthogonal. Using the normal equations we don’t need orthogonality.
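A short computational sketch of this (with a hypothetical matrix and vector): solving the normal equations gives the coefficients, multiplying by $A$ gives the projection of $\mathbf{b}$ onto Col $A$, and as a check the residual is orthogonal to every column of $A$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])           # hypothetical matrix with independent columns
b = np.array([1.0, 2.0, 0.0, 1.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # coefficients from the normal equations
b_hat = A @ x_hat                           # orthogonal projection of b onto Col A

# The residual b - b_hat is orthogonal to the columns of A.
print(b_hat)
print(A.T @ (b - b_hat))                    # numerically (close to) the zero vector
```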
Applying Theorem 7.4.1 to the earlier example (Example 7.4.2) shows that in the case of orthogonal vectors there is actually nothing new. This is illustrated in the next example.
Let us again find, but now using Theorem 7.4.1, the least squares solution of the linear system
The normal equations give the augmented matrix
Note that the orthogonality of the columns leads to a coefficient matrix
This (again) yields the least squares solution
The previous example can be generalized as follows.
If the columns
of a vector
Since the columns are orthogonal, all products
Expressing the equation using inner products we find
This leads to the good old expressions
As before (Theorem 7.2.2) the orthogonal projection becomes
Show that Formula (7.4.5) for a matrix
Also explain this simpler formula by interpreting the
Solution to Exercise 7.4.2
This involves some elementary matrix operations.
Suppose
Substitution of
gives
Using
The interpretation is as follows. The columns
can be written in a very concise way as
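Assuming the simpler formula in question is $\hat{\mathbf{b}} = AA^T\mathbf{b}$ for a matrix with orthonormal columns (when the columns are orthonormal, $A^TA$ is the identity matrix, so this is what the projection via the normal equations reduces to), here is a small numerical check with a hypothetical matrix:

```python
import numpy as np

# Build a matrix Q with orthonormal columns via a QR factorization (hypothetical data).
M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(M)            # Q has orthonormal columns, Col Q = Col M
b = np.array([1.0, 2.0, 3.0])

# Projection of b onto Col Q via the normal equations ...
x_hat = np.linalg.solve(Q.T @ Q, Q.T @ b)
proj_normal = Q @ x_hat

# ... and via the simpler formula Q Q^T b; the two agree.
proj_simple = Q @ (Q.T @ b)
print(proj_normal, proj_simple)
```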
To conclude this section we will consider the situation where the matrix
Let us look at an example first.
We find the least squares solutions of the linear system
Note that the columns of
For the least squares solution we have to solve the system with the augmented matrix
The augmented matrix can be row reduced to
From this we can read off all least squares solutions. They are given by
For this low-dimensional problem we can draw a picture.
The least squares solutions are depicted as the line
We can analyse Example 7.4.5 from a higher perspective. In the general solution of the normal equations
the ‘homogeneous’ part
which is visualized in Figure 7.4.2.
Let us give one more example to illustrate matters.
We find the least squares solutions of the linear system
The first column of
For the least squares solution we have to solve the system with augmented matrix
The augmented matrix can be row reduced to
From this we can read off all least squares solutions. They are given by
As in Example 7.4.5 the ‘homogeneous’ part
For instance, by taking
For both we find the least squares error
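Numerically, linearly dependent columns make $A^TA$ singular, so `np.linalg.solve` cannot be applied to the normal equations; a routine such as `np.linalg.lstsq`, however, still returns one particular least squares solution (the one of minimal norm). A sketch with a hypothetical matrix whose second column is a multiple of the first:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])           # dependent columns: second = 2 * first
b = np.array([1.0, 2.0, 3.0, 4.0])

# A^T A is singular, so the normal equations have infinitely many solutions.
print(np.linalg.matrix_rank(A.T @ A))         # 1 instead of 2

# lstsq returns the minimum-norm least squares solution.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat, np.linalg.norm(b - A @ x_hat))   # every LS solution gives this same error
```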
7.4.4. Grasple Exercises#
About the interpretation of a least squares solution.
About the uniqueness of a least squares solution.
About least squares solutions and normal equations.
Finding the LS solution for 3x2 systems in three steps.
LS solution + LS error for a 3x2 system.
LS solution + LS error for a 4x4 system.
LS solution + LS error for a 4x3 system.
On the connection between orthogonal projections and least squares problems.
LS solution + LS error for a 3x2 system.
LS solutions + LS error for a 4x3 system.
Finding the LS solution for a 4x3 system (involving quite a bit of reduction work).
Finding the LS solution for a 4x3 system (with some tricky reduction work).
What is the quickest way to find the least squares solution?
7.4.5. Linear Models#
In Section 7.4.1 we looked at ways to fit a line
One way to define the best fitting line
Note that in these equations the parameters
This line is sometimes referred to as the least squares line.
Suppose we want to find the least squares line for the set of four points
At first sight the line through the first and the last point, i.e.,
seems a good candidate. The points and the ‘first guess’ are depicted in Figure 7.4.3.
For this line the sum of the squares of the errors
which in this context are called residuals, becomes
To find the least squares line we consider the four equations in the form
Since the matrix has linearly independent columns the normal equations, with augmented matrix
give a unique least squares solution, and it is
Figure 7.4.4 shows both lines.
For the line
This is indeed slightly better than with the line found ‘at first sight’, where the sum was equal to
We can even find a ready-made formula for the least squares line through
The coefficients of the least squares line
Introducing the notation
the coefficients get the following more concise form,
We will derive the formula by diligently working through the normal equations.
In matrix-vector form the linear system we want to solve reads
Noting that
and
This leads to the normal equations
If we multiply both sides of this equation by
we see the expressions (7.4.6) readily appearing.
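In code the closed-form coefficients are just a couple of averages and centred sums. The sketch below (with made-up data points, and writing the line as $y = \beta_0 + \beta_1 x$, which may differ from the notation used above) also cross-checks the result against the normal equations.

```python
import numpy as np

# Hypothetical data points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 1.5, 3.0, 4.5])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form coefficients of the least squares line y = beta0 + beta1 * x.
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

# Cross-check via the normal equations for the design matrix [1  x].
A = np.column_stack([np.ones_like(x), x])
print(beta0, beta1, np.linalg.solve(A.T @ A, A.T @ y))
```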
Fitting a line to a set of points may be looked upon as fitting a linear combination of the functions
For instance, when we use
We may even go beyond that; then we get a more general, so-called linear model.
(Linear Model)
Suppose
The linear model refers to the ‘best’ linear combination, in the least squares sense, of the form
That is, the linear combination that minimizes
the sum of the squares of the errors/residuals
The epithet linear refers to the fact that the parameters
The parameters
In practice the points
Several generalizations are possible. We mention two.
- The input may be multivariate. The set of data points then has the form (7.4.9), and we want to find the linear combination that best fits these data. For instance, to find the linear function that best fits the data in (7.4.9), we can take the constant function together with the coordinate functions. In a least squares model we then look for the parameters that minimize the sum of squares in (7.4.10).
- In a weighted least squares model the terms in the sum (7.4.10) get different weights. When building a statistical model this may be desirable when some data give more ‘information’. The expression we want to minimize is then a weighted sum of the squared errors. This approach may be relevant if the variance of some of the observations seems smaller than the variance of other observations. (A small computational sketch of this variant follows after this list.)
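The weighted variant is not worked out in this section; under the usual formulation, minimizing the weighted sum of squares comes down to solving the weighted normal equations $A^TWA\,\mathbf{x} = A^TW\mathbf{b}$, with $W$ the diagonal matrix of weights. A sketch with made-up data, where one unreliable point gets a small weight:

```python
import numpy as np

# Hypothetical data for a line fit; the last point is considered unreliable.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.9, 8.0])
w = np.array([1.0, 1.0, 1.0, 0.1])     # small weight = little influence

A = np.column_stack([np.ones_like(x), x])
W = np.diag(w)

# Weighted normal equations:  (A^T W A) x = A^T W y.
beta_weighted = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
beta_plain = np.linalg.solve(A.T @ A, A.T @ y)
print(beta_weighted, beta_plain)       # the outlier pulls the unweighted fit much more
```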
To illustrate matters we present two examples.
(Fitting a plane)
Suppose
of the functions
In the least squares sense we would then have to solve the normal equations for the linear system
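Assuming the plane is written as $z = c_0 + c_1x + c_2y$ (the exact notation of the example is not reproduced here), the design matrix has a column of ones and the columns of $x$- and $y$-values, and the coefficients follow from the normal equations:

```python
import numpy as np

# Hypothetical data points (x_i, y_i, z_i).
x = np.array([0.0, 1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 2.0])
z = np.array([1.0, 2.1, 2.9, 4.2, 7.0])

# Design matrix for the model z = c0 + c1*x + c2*y.
A = np.column_stack([np.ones_like(x), x, y])

# Least squares coefficients via the normal equations.
c = np.linalg.solve(A.T @ A, A.T @ z)
print(c)    # [c0, c1, c2]
```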
Suppose we have
One way to go about finding suitable parameters
The relation between
We can find the least squares linear fit to the points
for the points
If we have found the least squares parameters
Figure 7.4.5 shows what’s going on.
The first plot shows the points
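The details of this example (and of Figure 7.4.5) are not reproduced here, so purely as an illustration of the transformation idea, the sketch below assumes a power-law model $y = c\,x^a$: taking logarithms gives $\ln y = \ln c + a\ln x$, which is linear in the unknowns $\ln c$ and $a$ and can therefore be fitted with an ordinary least squares line.

```python
import numpy as np

# Hypothetical positive data roughly following y = c * x**a.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 5.5, 10.5, 16.0, 23.0])

# After taking logarithms the model  ln y = ln c + a * ln x  is linear.
A = np.column_stack([np.ones_like(x), np.log(x)])
b = np.log(y)

ln_c, a = np.linalg.solve(A.T @ A, A.T @ b)
print(np.exp(ln_c), a)    # parameters c and a of the fitted curve y = c * x**a
```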
7.4.6. Grasple Exercises (for Linear Models)#
On the precise definition of the least squares line.
Design matrix and input vector for the regression line through a set of points.
Fitting a line through a set of points and computing the residual vector.
Fitting a parabola to a set of points.
Fitting a quadratic polynomial to a set of points.
Design matrix to fit
Design matrix to fit