2.1. Systems of Linear Equations#
2.1.1. Consistent and Inconsistent Linear Systems#
In Chapter 1 the question whether two lines or two planes intersect or do not intersect was touched upon. In the case of two planes the question can be resolved by finding equations for the planes and checking whether there are points that simultaneously satisfy these two equations. We can write this in the form
which we will call a system of two linear equations in three unknowns.
Note that we have adapted the names of the variables from \(x,y,z\) to \(x_1,x_2, x_3\). This notation is more convenient when we come across equations with more than three variables.
A linear equation is an equation that can be written in the form
The numbers \(a_1,a_2,\ldots,a_n\) are called the coefficients and the variables \(x_1,x_2, \ldots, x_n\) the unknowns. The term \(k\) is referred to as the constant term.
The equation
is a linear equation, as it can be rewritten as
By contrast, the equation
is not linear because of the nonlinear term \(2x_1x_2\).
A set of one or more linear equations is called a system of linear equations (or linear system, for short). In the case of \(m\) linear equations in the variables \(x_1,x_2,\ldots, x_n\) we speak of a system of \(m\) equations in \(n\) unknowns. The most general system then looks as follows:
Of course, if we have equations we want to solve them. Here is what we mean by that in a precise way.
A solution of a linear system is an ordered list of \(n\) values \((c_1, c_2, \ldots, c_n)\), or, depending on the context, a vector \(\begin{bmatrix}c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}\), such that substitution of
into each of the equations yields true identities.
The solution set or general solution of a system is the set of all solutions.
Two solutions for the system of equations
are given by
or, equivalently, by
For instance, substitution of the second proposed solution yields
which are both true identities.
To check whether a vector is a solution of a linear system.
Show/Hide Content
The solution set may be empty, as in the following example.
The system
has no solutions because the first equation conflicts with the third equation: if \((c_1,c_2,c_3)\) is a triple that satisfies
then automatically
so \((c_1,c_2,c_3)\) cannot also be a solution of the third equation.
If the solution set of a linear system is empty, a system is said to be inconsistent. This concept and its opposite are sufficiently important to be properly defined.
A system of linear equations is consistent if it has at least one solution. Otherwise it is called inconsistent.
The simplest inconsistent system may well be the system with the one equation
As we will see later, this conflicting equation in a way pops up in any inconsistent system.
A consistent system of one equation in \(n\) unknowns is easily solved. Any unknown with a nonzero coefficient can be expressed in the other unknowns, and the other unknowns can be chosen freely. In words this may look more complicated than it is, as the following example illustrates.
Find all solutions of the equation in the variables \(x_1,\ldots,x_5\).
One way to denote the set of solutions:
By this we mean: if we assign arbitrary values to the variables \(x_2, x_3, x_4\) and \(x_5\), say
and put
then
so
is indeed a solution of the given equation.
However, this is not the only way to write down the general solution: in this example almost any set of four variables can act as free variables. The descriptions
and
are just as good. You might want to avoid fractions though. The only set of four variables that cannot act as free variables is the set \(\{x_1,x_2,x_3,x_5\}\).
The idea behind all methods to find the general solution of a linear system: rewrite the system in simpler forms, basically by eliminating variables from equations. We illustrate this by an example.
We solve the system
First method: from the first equation it follows that
From this it follows that
Substitution of the expression for \(x_1\) into the second equation yields
an equation with \(x_2\) as single unknown (the jargon: \(x_1\) has been eliminated), and then
and finally
Thus we have found that there is a unique solution:
There is nothing wrong with this method, but with more than two equations it has the tendency to become messy.
Second method: take clever combinations of the equations to eliminate variables. For the above example we may for instance subtract the first equation twice from the second. Think a moment why this is okay. It is the crucial step in the elimination method we will explain in the next subsection.
Again we see that \(x_2 = 2\), in fact the equation \(3x_2 = 6\) is the same equation that we arrived at with the substitution method above. Substitution of this into the first equation again yields
2.1.2. Solving a Linear System by Elimination#
We start with an example of three equations in three unknowns.
We can simplify this system by successively eliminating unknowns from equations by combining equations in a clever way. We can for instance eliminate the variable \(x_1\) from the second equation by subtracting the first equation three times from the second equation. Likewise we can subtract the first equation twice from the third equation:
With the arrow we express that if we have a solution of the system on the left, this will also be a solution of the system on the right.
Now, why is this okay, why is it allowed to ‘subtract equations’? Let us introduce the shorthand notation
for the expressions on the left sides of the first two equations. Then the given equations are:
It then follows that we must have
which yields
The last equation is exactly the second equation of the second system.
The crucial thing to note is that these operations can be undone. If in the second system the first equation is added three times to the second equation and added twice to the third equation we end up with the original system. So in fact we have
The implication works two ways, which we can write as follows.
So any triple of values \((c_1,c_2,c_3)\) that satisfies the first system satisfies the second system and vice versa. Systems that have the same set of solutions are called equivalent.
Two systems of linear equations are called equivalent if they have the same set of solutions.
By the same line of reasoning as in the above example we can deduce that adding an arbitrary multiple of any equation to another equation does not change the solution set of the system. Of course if we multiply an equation with some nonzero constant, the solution set also remains invariant. This operation is called scaling. For the system at hand we could, as a next step, scale the second equation with a factor \(-\frac12\). The following proposition summarizes the suitable operations to adapt a system of equations.
The following operations applied to a linear system always yield an equivalent system
-
Adding a multiple of an equation to another equation.
-
Scaling an equation with a nonzero scaling factor \(c\).
-
Changing the order of the equations.
Proof of Proposition 2.1.1
The correctness of the first operation is illustrated in
Example 2.1.7. One example is by far not a proof, but the explanation given there can be generalized/formalized.
The other two statements are rather obvious.
We take up Example 2.1.7 at the point where we left it and work our way to its solution. Also we will introduce a notation that makes it easier for the reader to see what’s going on. And also for yourself, in case you look back at your computations later, e.g., if you want to check your computations. The ‘\(E\)’ stands for: ‘Equation’.
We scale the second equation with a factor \(-\frac12\)
and then subtract the second equation four times from the third:
From the third equation it follows that
and then we can work upwards to find that
and finally from the first equation it follows that
Conclusion: the system has the unique solution
Problem solved.
The last few steps, working our way up from the last equation to successively find \(x_3\), \(x_2\), \(x_1\), is referred to as backward substitution.
Consider the following system of equations
Making use of the notation introduced in the previous example we simplify the system:
From the last equation it immediately follows that there are no solutions, in other words, the system is inconsistent.
Let us look at one more example. Here we will see how to find a solution that contains a free variable.
We find the general solution of the linear system
Using the shorthand notation just introduced the system can be simplified as follows: we first interchange the first and the third equation to have a first equation where the coefficient of \(x_1\) is equal to 1. That way we avoid fractions in at least the first elimination step.
Again we can find the solution by back-substitution:
the third equation can be rewritten as
Via the second equation we can express \(x_2\) as a function of \(x_4\)
And then it follows from the first equation that
So the solution can be written as
Note that the row swap that we used as a first step is not really necessary. However, this way we avoided having to work with non-integer multiples of rows.
We summarize the elimination method in a
Summary
Any linear system in the variables \(x_1,\ldots, x_n\) can be solved as follows:
-
Using the operations of Proposition 2.1.1 the system can be simplified to an equivalent linear system with the following property: in each equation at least one more of the first unknowns has a coefficient 0 than in the previous equation. If an unknown has a coefficient 0 we say that the unknown has been eliminated.
-
If an equation
\[ 0x_1 + 0x_2 + \cdots + 0x_n = b, \]with \(b\neq 0\) pops up, the system is inconsistent.
-
If no such equation appears, the general solution can be found by backward substitution: starting from the last equation, we work our way upwards.
In theory the method works for any linear system, however large, though with pen and paper it soon becomes cumbersome. In the next subsection we will use an appropriate representation of a linear system to solve it in a more efficient way. And we will also see how the procedure of back-substitution can be incorporated in the elimination process.
2.1.3. Augmented Matrices#
We will introduce a convenient shorthand notation for linear systems. This notation contains the essentials of the system in a structured way.
Before that, we define the concept of one of the most basic building blocks in linear algebra: the matrix.
An \(m \times n\) matrix \(A\) is a rectangular array of numbers \(a_{ij}\), \(1\leq i \leq m\), \(1 \leq j \leq n\).
It consists of \(m\) horizontal rows of size \(n\), or, equivalently, of \(n\) vertical columns of size \(m\).
In a statement about a matrix the first index always refers to the row(s), the second index to the column(s). E.g., \(a_{ij}\) is the number in the \(i\)-th row and the \(j\)-th column and an \(m \times n\) matrix has \(m\) rows and \(n\) columns.
A matrix is usually surrounded by parentheses or (square) brackets. We opt for brackets.
The matrix
is a \(3\times 5\) matrix.
Its second row is \(\begin{bmatrix} 2 & 7 & -1 & 0 & 8 \end{bmatrix}\) and its third column is \( \left[ \begin{array}{c} 3 \\ -1 \\ 5 \end{array}\right]. \)
Matrices play an important role in linear algebra. In this section we will use them as concise representations of linear systems, in which the computations involved to solve a system can be done quite systematically.
The augmented matrix for a system of equations
is the matrix
The part before the vertical bar, i.e.
is called the coefficient matrix of the system. The column behind the bar contains the constant terms.
The augmented matrix is nothing more than an abbreviation for a system of equations. With the vertical bar we want to indicate that the last column plays a special role, namely, it contains the constants on the right sides of the equations. If we denote these terms by the vector
the augmented matrix can be written as
To conclude this subsection we will reconsider the earlier example of a system of three equations in three unknowns
We will apply the same simplifications to the system as before. Parallel to this we adapt the augmented matrix accordingly, using a notation that speaks for itself. \(R\) stands for ‘row’.
As we have seen before, the solution can now be found by backward substitution.
The right moment to start this backward substitution is when the augmented matrix has been simplified to so-called echelon form.
2.1.4. Row Reduction and Echelon Forms#
In Section 2.1.2 we have solved linear systems by eliminating variables from equations. It would be nice to have a clear mark where we can stop rewriting the given system, to forestall ending up in a never ending loop. When we use the notation of an augmented matrix we can identify such a mark. We first need a few more definitions.
A matrix is in row echelon form if it has the following two properties:
-
All non-zero rows are above all rows that contain only zeros.
-
Each non-zero row that is not the last row starts with fewer zeros than the rows below it.
Such a matrix is also called a row echelon matrix.
The following three matrices are meant to visualize the structure of an echelon matrix. The symbol \(\blacksquare\) denotes an arbitrary nonzero number and \(\ast\) just any real number.
In a similar manner we can define the concept of a column echelon matrix. However, since we will only consider row echelon matrices we will not do this. In the sequel we will drop the epithet ‘row’ and simply speak of echelon form and echelon matrix.
A pivot of a row in an echelon matrix is the first nonzero element (the so-called leading entry) of that row.
The following three matrices are in echelon form:
The following two matrices are not in echelon form
Namely, in matrix \(A_4\) the second row is a non-zero row that is below the all-zero first row. And in matrix \(A_5\) the third row does not start with more zeros than the second row.
To check whether a matrix is in echelon form.
Show/Hide Content
Here are the three echelon matrices again, with boxes around their pivots:
The third and the fourth row of the second matrix do not have pivots.
In practice the pivots are the coefficients in the equations of a system that are used to eliminate variables from other equations. In the context of augmented matrices: they are the entries used to create zeros in the column below that entry.
Note that from the second condition in Definition 2.1.8 it follows that automatically all entries in the column below a pivot must be 0.
Now have a look again at the derivation at the end of the previous subsection. We worked our way downwards through the rows to create zeros in the first columns, while keeping in mind that we did not change the solution set of the corresponding linear system. The process is called row reduction. The end point, from which we could start building the solution by backward substitution, was an augmented matrix in echelon form!
The elementary row operations that one can apply to a matrix are
-
Adding a multiple of a row to another row.
-
Multiplying a row by a non-zero number. This is also referred to as scaling.
-
Interchanging (or: swapping) two rows.
Note that these row operations match exactly the operations of Proposition 2.1.1. This proposition now states that the row operations do not change the solutions of the corresponding linear system.
Matrices that can be transformed into each other via row operations are called row equivalent. If two matrices \(A\) and \(B\) are row equivalent we denote this by \(A \sim B\).
If two augmented matrices are row equivalent it means that the linear systems they represent are equivalent (i.e., have the same solution set).
Above we applied row operations to an augmented matrix, to work our way to the solution of a system of equations. In fact we simplified the system and the matrix along parallel paths. From now on we will simplify a system by working almost always with the corresponding augmented matrix.
In later chapters we will also apply row reduction to matrices in other contexts, i.c. for other purposes.
To perform row operations on a matrix.
Show/Hide Content
We will row reduce the matrix
to a matrix \(E\) in echelon form:
Here a row swap was essential to bring the matrix into echelon form. Sometimes a row swap may just be convenient to simplify the computations. Note that we have also introduced a notation for a row swap. We stress again that it is good practice to use a notation like this when you do a row reduction process yourself. To speed up the process it may be preferable to combine row operations that do not interfere. In this example the second and the third step both involved adding multiples of the first row to the other rows. This can be done simultaneously:
Any matrix is row equivalent to an echelon matrix.
We will not give a formal proof.
The idea that a matrix can be reduced to an echelon matrix is as follows: just start from the top left and work downwards.
If \(a_{11}\) is not 0, that will be the first pivot. We can use it to make all the other elements in the first column 0.
We then get
From then on we will leave the first row as it is.
If \(a_{11} = 0\), then we try to find a non-zero element in the first column. If this is the element \(a_{i1}\), then we can start by interchanging the first and the \(i\)-th row. After this row swap we use the new first row to create zeros in the first column.
If the first column happens to consist of zeros only, we skip it and start from the first non-zero column.
We continue with the part of the matrix below and to the right of the first pivot, i.e.,
And so on, until we get to the last row, or until we get to a row below which all rows only contain zeros.
The echelon matrix to which a matrix can be reduced is in no way unique. For instance, by scaling a row in an echelon matrix the echelon form persists. We can go a bit further, namely we can create zeros in the columns above the pivots as well. The following example shows how. First we work downwards to the echelon form and then work upwards to create the extra zeros, as mentioned.
The matrix
is row equivalent to all of the following echelon matrices:
Or, using the notation for the row operations:
There are three important observations regarding this example.
Apart from the second step, where two rows were scaled, in each step one pivot was used to make all elements right above and right below it equal to 0. In this way we move forward all the time to a matrix with more and more zeros in a structured way.
The last matrix can really be seen as a natural end point of the reduction process.
-
The pivots are all 1, the simplest non-zero number.
-
If we try to create more zeros, we can only do so in the fourth column. But then we we will lose one or more of the zeros in the first three columns.
The third remark is the most important one, keeping in mind the goal of this section: solving linear systems.
If the matrix \(M\) were actually an augmented matrix for a system, in which we’d better have written
then the linear system corresponding to the final echelon matrix
is given by
which is in fact the solution!
This natural end point of the row reduction process has got his own name.
A reduced echelon matrix or matrix in reduced echelon form is an echelon matrix with the extra properties
-
All pivots are 1.
-
In a column with a pivot all other elements are 0.
Of the matrices
the first and the third are echelon matrices and only the third is a reduced echelon matrix.
Checking whether a matrix is in (reduced) echelon form.
Show/Hide Content
The big advantage of reduced echelon matrices, already hinted at in Remark 2.1.7, is the following:
If the augmented matrix of a linear system is in reduced echelon from, the solution of the system is found by expressing the variables corresponding to the pivots in terms of the other variables. These other variables can be assigned arbitrary values.
In the solution as constructed according to the previous proposition the pivot variables are called the basic variables. The other variables are called free variables.
We find the solution of the linear system with the following augmented matrix, which is already in row reduced echelon:
We go back to the corresponding system and bring the non-pivot variables \(x_3\) and \(x_5\) to the right:
and we add: ‘\(x_3\) and \(x_5\) are free’.
The row reduction of the augmented matrix to echelon form corresponds to the forward substitution as in Example 2.1.6 and Example 2.1.10. There we found the solution by backward substitution. When the augmented matrix is reduced to reduced echelon form we have in fact incorporated this backward substitution part and can write down the general solution directly. We think that to solve a system ‘with pen and paper’, working with the reduced echelon matrix is less error prone. This holds in particular for a system where the solution contains one or more free variables.
Any matrix is row equivalent to a reduced echelon matrix. Moreover, this last matrix is unique.
Again we give no formal proof. In the previous subsection we showed, also informally, that any matrix can be reduced to a matrix in echelon form.
In this echelon matrix we may divide each row by its pivot (first nonzero element).
And lastly ‘working upwards’ step by step we use each pivot – which we made equal to 1 – to create zeros in all positions above it.
(The last two simplifications may be done in reversed order: first use the pivots to create zeros in the positions above them and then scale the rows.) This reasoning supports the validity of the first statement. The uniqueness is harder to show in an intuitive way, and it is definitely harder to prove rigorously.
Let us further simplify the echelon matrix
to reduced echelon form.
step 1: use the pivot in the third row to create zeros above it;
step 2: use the pivot in the second row to create a zero above it;
step 3: scale all rows:
Instead of a formal proof of the uniqueness of the row reduced echelon form of a matrix, we illustrate this uniqueness with one example.
We will find the row reduced echelon form of the matrix
via two different routes.
Route 1: Use the top left entry \(a_{11} = 2\) as a first pivot. An auxiliary step, to avoid fractions, is to scale the second row with a factor 2:
Alternatively, we may start with a row swap:
the same outcome as before.
The following algorithm summarizes the solution method for a linear system.
Elimination method to solve a linear system.
Any system of linear equations can be solved as follows.
-
Write down the augmented matrix corresponding to the system.
-
Row reduce the augmented matrix to reduced echelon form.
-
If there is a pivot in the last column (the column ‘behind the bar’), the system is inconsistent.
-
If the last column does not contain a pivot, write down the corresponding system of equations and express the variables in the pivot columns into the other variables (if any). These other variables are free variables.
The word ‘elimination’ refers to the fact that the zeros that are created in the augmented matrix correspond to the elimination of variables from the equations.
Applying the algorithm to compute a solution of a linear system.
Show/Hide Content
(Row reduced echelon matrix versus back-substitution)
We started this section by solving a linear system by reducing it to an equivalent system in echelon form and then use back-substitution. This is still a viable option. The advantage of the method described in Algorithm 2.1.1 is that it avoids the clutter back-substitution may lead to in the case of free variables.
The following important general statement about the solution set of linear systems has already been lurking in the background all the time.
A system of linear equations has either zero, or one, or infinitely many solutions. In the case when there is exactly one solution we speak of a unique solution.
Proof of Theorem 2.1.2
This just depends on the outcome of the elimination method (i.e. Algorithm 2.1.1). If iii. occurs, the number of solutions is zero; if iv. occurs and there are no free variables, there is just one solution. Lastly, if there is at least one free variable, the solution set automatically contains infinitely many solutions.
Note that to answer the question which of the three cases – zero solutions, a unique solution or infinitely many solutions – holds, it suffices to reduce the augmented matrix to just any echelon form. From this echelon form it can already be decided whether the system is consistent, and if it is, whether there are free variables.
We want to find out whether the linear system
has zero, exactly one, or infinitely many solutions.
We row reduce the augmented matrix just as far as necessary:
From the echelon matrix at the end we can see that the system is consistent and there will be a free variable. We can conclude that the system has infinitely many solutions.
This example stresses that the conclusion whether a linear system has zero, one or infinitely many solutions, is basically a matter of the structure of an echelon matrix that is equivalent to the augmented matrix of the system.
Suppose the augmented matrices of three linear systems can be row reduced to the following matrices
and
where \(\blacksquare\) denotes an arbitrary nonzero number, and \(\ast\) just any real number.
Then the first system has a unique solution, the second system has infinitely many solutions, the third system is inconsistent.
Conclusions about the solutions from the structure of the echelon form.
Show/Hide Content
A linear system of \(m\) equations in \(n\) unknowns can only have a unique solution if \(m \geq n\), i.e. if the number of unknowns is at most equal to the number of equations.
Proof of Proposition 2.1.4
Let
be the augmented matrix of the system, and
an equivalent echelon matrix. Here \(E\) is an \(m\times n\) echelon matrix. Since the pivots are in different rows, there are at most \(m\) pivots.
If \(m < n\), there must be at least one column without a pivot. This implies that either the system is inconsistent (zero solutions) or the system has a solution with at least one free variable (infinitely many solutions). And we have shown that a unique solution is impossible for a system of \(m\) equations in \(n\) unknowns with \(m < n\).
For geometric interpretation of the last proposition, suppose \(n = 3\).
The solution set of a linear equation
can be seen as a plane in \(\mathbb{R}^3\). The previous proposition tells us that the intersection of \(m\) planes in \(\mathbb{R}^3\), where \(m < 3\), cannot be a single point.
2.1.5. Grasple Exercises#
The first exercises are quite straightfordwardly computational. The remaining exercises tend to be a bit more theoretic.
Solving a linear system of 2 equations in 2 unknowns.
Show/Hide Content
To determine the number of sulutions of a \(2\times 2\) linear system.
Show/Hide Content
Identifying the size of a linear system.
Show/Hide Content
\(3 \times 3\) linear system ‘in triangular form’.
Show/Hide Content
To solve a \(3 \times 3\) linear system.
Show/Hide Content
To solve a \(3 \times 4\) linear system.
Show/Hide Content
Two \(3\times 3\) systems with the same coefficient matrix.
Show/Hide Content
To write down the augmented matrix of a linear system.
Show/Hide Content
To check whether a linear system is consistent.
Show/Hide Content
To check whether a linear system is consistent.
Show/Hide Content
To recognize row operations.
Show/Hide Content
To find the row reduced echelon form.
Show/Hide Content
To find the row reduced echelon form.
Show/Hide Content
To find the row reduced echelon form.
Show/Hide Content
To find the row reduced echelon form.
Show/Hide Content
Row reduced echelon form of a \(3 \times 5\) matrix.
Show/Hide Content
Solving a linear system using the augmented matrix.
Show/Hide Content
Solving a linear system using the augmented matrix.
Show/Hide Content
The remaining exercises are a bit more theoretical.
To solve a linear system by geometric considerations.
Show/Hide Content
How many pivots can an \(m\times n\) matrix have?
Show/Hide Content
To determine which variables can be taken as free variables.
Show/Hide Content
Four exercises about linear systems with a parameter.
A \(2\times 2\) linear system with a parameter \(h\).
Show/Hide Content
A \(2\times 2\) linear system with a parameter \(h\).
Show/Hide Content
Yet another \(2\times 2\) linear system with a parameter \(h\).
Show/Hide Content
Fourth and last \(2\times 2\) linear system with a parameter \(h\).
Show/Hide Content
Three exercises about linear systems and pivots.
Linear systems and pivots.
Show/Hide Content
Linear systems and pivots.
Show/Hide Content
Linear systems and pivots.
Show/Hide Content
How many ‘different’ echelon forms are there?
Show/Hide Content
To determine (in)consistency without computations (‘by inspection’).
Show/Hide Content
Freedom of free variables?
Show/Hide Content
To check whether matrices are (row) equivalent.