Direct Methods for Solving Linear Systems of Equations

This section covers direct methods for solving linear systems of equations. Later, we'll also cover iterative methods; the distinction will be obvious once both types of methods are discussed.

Solving systems of linear equations

Recall the three basic rules for matrix manipulation from linear algebra:

Switching two rows or columns does not change the solution of the linear system.
Any row can be multiplied by a constant without changing the solution of the linear system.
Any row or linear multiple of a row can be added/subtracted to/from another row without changing the solution of the linear system.

Consider the system of equations:

$$3 x_1 + 8 x_2 = 10$$$$4 x_1 + 6 x_2 = 2$$

We can write in augmented matrix form:

$$ \begin{bmatrix} 3 & 8 & 10\\ 4 & 6 & 2 \end{bmatrix} $$

We can multiply the 1st row $\left(R_1\right)$ by $-\frac{4}{3}$ and add it to the 2nd row $\left(R_2\right)$, etc...

$$ \begin{bmatrix} 3 & 8 & 10 \\ 4 & 6 & 2 \end{bmatrix} \xrightarrow{-\frac{4}{3}R_1 \to R_1} \begin{bmatrix} -4 & -\frac{32}{3} & -\frac{40}{3} \\ 4 & 6 & 2 \end{bmatrix} \xrightarrow{R_1 + R_2 \to R_2} \begin{bmatrix} -4 & -\frac{32}{3} & -\frac{40}{3} \\ 0 & -\frac{14}{3} & -\frac{34}{3} \end{bmatrix} {\xrightarrow[-\frac{3}{14}R_2 \to R_2]{-\frac{1}{4}R_1 \to R_1}} \begin{bmatrix} 1 & \frac{8}{3} & \frac{10}{3} \\ 0 & 1 & \frac{17}{7} \end{bmatrix} $$

By inspection we can see that $x_2 = \frac{17}{7}$ we can then substitute this solution into a modified first equation

$$ x_1 + \frac{8}{3}x_2 = \frac{10}{3} $$

and solve for $x_1$. The final solution is $x_1 = -\frac{22}{7}$, $x_2 = \frac{17}{7}$.

This procedure is called Gaussian elimination.

Stability of Gaussian elimination

Recall that another way of writing a linear system of equations in matrix form is $\mathbf{A}\vec{x} = \vec{b}$, where $\mathbf{A}$ is the coefficient matrix, $\vec{x}$ is the vector of unknowns, and $\vec{b}$ is the right-hand side vector.

One method of determining if a linear system has a solution is to take the determinate of the $\mathbf{A}$ matrix. If the determinate of $\mathbf{A} = 0$ then the matrix is said to be singular and the system either has no solution or an infinite number of solutions.

A matrix that is near-singular is said to be ill-conditioned whereas one that is far from singular is said to be well-conditioned. There is a notion of condition number that mathematically quantifies a matrix's amenability to numeric computations with matrices having high condition numbers being well conditioned and matrices with low condition numbers being ill conditioned. We will return to condition numbers in more detail when we discuss singular value decomposition. Below is an example of two matrices $\mathbf{A}$ and $\mathbf{B}$. $\mathbf{A}$ is an ill-conditioned matrix, $\mathbf{B}$ is well-conditioned.

If we were to change the last entry of $\mathbf{A}$ to $a_{22} = 1$, it would be singular.

A = np.array([[1, 1], 
              [1, 1.0001]])

np.linalg.det(A)

9.99999999999889e-05

B = np.array([[0.0001, 1],
              [1     , 1]])

np.linalg.det(B)

-0.9999

To illustrate the ill-conditioning of $\mathbf{A}$ in another way let's consider two very close right-hand side vectors, $\vec{b}$:

\begin{align} x_1 &+ \phantom{1.0001}x_2 = 2 \\ x_1 &+ 1.0001x_2 = 2 \label{system1} \tag{1} \end{align}

and \begin{align} x_1 &+ \phantom{1.0001}x_2 = 2 \\ x_1 &+ 1.0001 x_2 = 2.0001 \label{system2} \tag{2} \end{align}

The system of equations (\ref{system1}) has the solution $x_1 = 2, x_2 = 0$, and the system of equations (\ref{system2}) has the solution $x_1 = x_2 = 1$. Notice that a change in the fifth digit of $\vec{b}$ was amplified to a change in the first digit of the solution. Even the most robust numerical method will have trouble with this sensitivity to small perturbations.

Now I would like to illustrate that even a well-conditioned matrix like $\mathbf{B}$ can be ruined by a poor algorithm. Consider the matrix $\mathbf{B}$ with the right-hand side vector, $\vec{b} = \left[1,2\right]$ and let's proceed with Gaussian elimination.

\begin{equation} \begin{bmatrix} 0.0001 & 1 & 1\\1&1&2\end{bmatrix} \xrightarrow{10000 R_1 + R_2 \to R_2} \begin{bmatrix} 0.0001&1&1\\0&-9999&-9998\end{bmatrix} \xrightarrow{-\frac{1}{9999}R_2 \to R_2} \begin{bmatrix} 0.0001&1&1\\0&1&0.9999\end{bmatrix} \end{equation}

by inspection we see that $x_2 = 0.9999$, now we can back substitute into the first equation \begin{equation} x_1+10000x_2 = 10000 \end{equation}

and solve for $x_1$. The final solution is $x_1 = 1, x_2 = 0.9999$. Now let us do the same computation, this time on a computer that rounds the results of the computations to 3 decimal places (a machine epsilon of 0.0001), the resulting matrix after manipulations would be

The leading non-zero entry of the row you are using to eliminate entries below is called the pivot. $0.0001$ is the pivot in this example.

\begin{equation} \begin{bmatrix} 0.0001 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \end{equation}

by inspection we see that $x_2 = 1$ and back substitution yields $x_1 = 0$. This is the incorrect solution brought on by the small pivot 0.0001. For this matrix Gaussian elimination is a poor algorithm. Fortunately there is an easy solution: exchange rows.

Now lets do the same computation on the same computer that rounds off the computation to 3 decimal places. This time we will exchange the rows before proceeding with the Gaussian elimination. Eliminating some detail we can see

$$ \begin{bmatrix}1&1&2\\0.0001&1&1\end{bmatrix}\xrightarrow{R_1 - 10000 R_2 \to R_2} \begin{bmatrix}1&1&2\\0&0.999&0.999\end{bmatrix}\xrightarrow{\frac{1}{0.999}R_2 \to R_2} \begin{bmatrix}1&1&2\\0&1&1\end{bmatrix} $$

by inspection we see that $x_2 = 1$, and back substitution yields $x_1 = 1$. We can see that this solution differs from the exact solution only by the roundoff error of the computer. Exchanging rows to obtain the largest possible pivot is called partial pivoting. Exchanging both rows and columns to obtain the largest possible pivot is called full pivoting. Full pivoting will result in the most stable algorithm but requires more computations and the requirement of tracking column permutations. For most matrices partial pivoting is sufficient enough for stability.

Pseudocode for Gaussian elimination with back substitution and partial pivoting

Consider a linear system with $n$ equations and $n$ unknowns:

\begin{align} E_1&: a_{11}x_1+a_{12}x_2+\ldots+a_{1n}x_n = a_{1,n+1}\\ E_2&: a_{21}x_1+a_{22}x_2+\ldots+a_{2n}x_n = a_{2,n_1}\\ & \vdots \\ E_n&: a_{n1}x_1+a_{n2}x_2+\ldots+a_{nn}x_n = a_{n,n+1} \end{align}

Pseudocode is a human readable computer algorithm description. It omits any language specific syntax in favor of clarity using general programming flow control and conditional statements.

Steps
1	For $i = 1$, ..., $n$ do Steps 2-4
2	$\phantom{--}$ Find $p$, where $p$ is the largest number with $i\leq p\leq n$
3	$\phantom{--}$ If $p \neq i$, then exchange row $i$ with row $p$
4	$\phantom{--}$ For $j=i+1, ..., n$ do Steps 5-6
5	$\phantom{----}$ set $m_{ji} = a_{ji}/a_{ii}$
6	$\phantom{----}$ Perform $E_j = ( E_j - m_{ji}E_i)$
7	Set $x_n = a_{n,n+1}/a_{nn}$
8	For $i = n-1, ..., 1$ do Step 9
9	$$ x_i=\left.\left(a_{i,n+1}-\sum_{j=i+1}a_{ij}x_j \right) \middle/ a_{ii} \right.$$

Python/NumPy implementation for Gaussian elimination with back substitution and partial pivoting

This implementation eliminates a few of the explicit loops described in the algorithm pseudocode by using NumPy broadcasting operations.

def gauss_solve(A, b):
    
    #Concontanate the matrix A and right hand side column 
    #vector b into one matrix
    temp_mat = np.c_[A, b]
    
    #Get the number of rows
    n = temp_mat.shape[0]
    
    #Loop over rows
    for i in range(n):
            
        #Find the pivot index by looking down the ith 
        #column from the ith row to find the maximum 
        #(in magnitude) entry.
        p = np.abs(temp_mat[i:, i]).argmax()
            
        #We have to reindex the pivot index to be the 
        #appropriate entry in the entire matrix, not 
        #just from the ith row down.
        p += i 
    
        #Swapping rows to make the maximal entry the 
        #pivot (if needed).
        if p != i:
            temp_mat[[p, i]] = temp_mat[[i, p]]
            
        #Eliminate all entries below the pivot
        factor = temp_mat[i+1:, i] / temp_mat[i, i]
        temp_mat[i+1:] -= factor[:, np.newaxis] * temp_mat[i]
                
    #Allocating space for the solution vector
    x = np.zeros_like(b, dtype=np.double);

    #Here we perform the back-substitution.  Initializing 
    #with the last row.
    x[-1] = temp_mat[-1,-1] / temp_mat[-1, -2]
    
    #Looping over rows in reverse (from the bottom up), starting with the second to
    #last row, because the last row solve was completed in the last step.
    for i in range(n-2, -1, -1):
        x[i] = (temp_mat[i,-1] - np.dot(temp_mat[i,i:-1], x[i:])) / temp_mat[i,i]
        
    return x

Gauss-Jordan elimination

Gauss-Jordan elimination simply adds steps to the simple Gauss elimination procedure to produce a matrix that is in reduced row echelon form. This is done by eliminating values both above and below the pivots and ensuring that each pivot has the value 1. Starting where we ended on the exact solution of matrix $\mathbf{B}$ earlier we can simply add two steps to produce a reduced row echelon matrix.

$$ \begin{bmatrix}0.0001&1&1\\0&1&0.9999\end{bmatrix}\xrightarrow{R_1 - R_2 \to R_1}\begin{bmatrix}0.0001&0&0.0001\\0&1&0.9999\end{bmatrix} $$

then we normalize the first equation to produce a 1 on the pivot:

$$ \begin{bmatrix}0.0001&0&0.0001\\0&1&0.9999\end{bmatrix}\xrightarrow{\frac{1}{0.0001}R_1 \to R_1} \begin{bmatrix}1&0&1\\0&1&0.9999\end{bmatrix} $$

now by inspection (and without the use of back substitution) we can see that the solution is $x_1 = 1, x_2 = 0.9999$ just as before.

For a linear system with $n$ equations and $m$ unknowns. The innermost loops (one subtraction and one multiplication) of a Gauss-Jordan elimination routine are executed $n^3$ and $n^2$ $m$ times. The corresponding loops in a Gaussian scheme are executed $\frac{1}{3}n^3$ and $\frac{1}{2}n^2m$ times, followed by a back substitution for a similar loop (one subtraction and one multiplication) that occurs $\frac{1}{2}n^2$ times. If there are far more equations that unknowns the Gaussian scheme with back substitution enjoys roughly a factor of 3 advantage over the Gauss-Jordan, for $n=m$ Gaussian elimination with back substitution enjoys a factor of 6 advantage over Gauss-Jordan. For matrix inversion (discussed later) the two methods have identical efficiencies.

Pseudocode for Gauss-Jordan elimination with partial pivoting

Consider a linear system with $n$ equations and $n$ unknowns:

\begin{align} E_1&: a_{11}x_1+a_{12}x_2+...+a_{1n}x_n = a_{1,n+1}\\ E_2&: a_{21}x_1+a_{22}x_2+...+a_{2n}x_n = a_{2,n_1}\\ & \vdots\\ E_n&: a_{n1}x_1+a_{n2}x_2+...+a_{nn}x_n = a_{n,n+1} \end{align}

Steps
1	For $i=1,...,n$ do Steps 2-4
2	$\phantom{--}$ Find $p$, where $p$ is the largest number with $i\leq p \leq n$
3	$\phantom{--}$ If $p \neq i$, then exchange row $i$ with row $p$
4	$\phantom{--}$ For $j=1,...,n$ do Steps 5-6
5	$\phantom{----}$ Perform $E_i = \frac{1}{a_{ii}}E_i$
6	$\phantom{----}$ If $i \neq j$ perform $E_j-a_{ji}E_i$
7	For $i = 1,...,n$ set $x_i = a_{i,n+1}$

Python/NumPy implementation for Gauss-Jordan elimination with partial pivoting

def gauss_jordan_solve(A, b):
    
    #Concontanate the matrix A and right hand side column 
    #vector b into one matrix
    temp_mat = np.c_[A, b]
    
    #Get the number of rows
    n = temp_mat.shape[0]
    
    #Loop over rows
    for i in range(n):
            
        #Find the pivot index by looking down the ith 
        #column from the ith row to find the maximum 
        #(in magnitude) entry.
        p = np.abs(temp_mat[i:, i]).argmax()
            
        #We have to reindex the pivot index to be the 
        #appropriate entry in the entire matrix, not 
        #just from the ith row down.
        p += i 
    
        #Swapping rows to make the maximal entry the 
        #pivot (if needed).
        if p != i:
            temp_mat[[p, i]] = temp_mat[[i, p]]
            
        #Make the diagonal entires 1
        temp_mat = temp_mat / np.diagonal(temp_mat)[:, np.newaxis]
        
        #Eliminate all entries above the pivot
        factor = temp_mat[:i, i] 
        temp_mat[:i] -= factor[:, np.newaxis] * temp_mat[i]
            
        #Eliminate all entries below the pivot
        factor = temp_mat[i+1:, i] 
        temp_mat[i+1:] -= factor[:, np.newaxis] * temp_mat[i]
    
    #If solution is a column vector, flatten it into a 1D array
    if temp_mat[:,n:].shape[1] == 1:
        return temp_mat[:,n:].flatten()
    else:
        return temp_mat[:,n:]

Matrix Inversion with Gauss-Jordan scheme

For matrix inversion, both the Gauss elimination with back-substitution and Gauss-Jordan schemes described previously have identical efficiencies. For this reason we will, for simplicities sake, only consider the process for matrix inversion using the Gauss-Jordan scheme.

All we have to do is augment an $\mathbf{A}$ matrix with an identity matrix of the same dimensions and proceed with Gauss-Jordan elimination exactly as we have done previously. Consider the following $\mathbf{A}$ matrix:

\begin{equation} \mathbf{A} = \begin{bmatrix}1&3&2\\2&4&5\\10&7&3\end{bmatrix} \end{equation}

we augment it with a $3\times3$ identity matrix $\mathbf{A}\vert\mathbf{I}$ and proceed with Gauss-Jordan Elimination.

$$ \begin{bmatrix} 1&3&2&1&0&0\\2&4&5&0&1&0\\10&7&3&0&0&1 \end{bmatrix} \rightarrow \begin{bmatrix} 1&0&0&-\frac{23}{57}&\frac{5}{57}&\frac{7}{57}\\0&1&0&\frac{44}{57}&-\frac{17}{57}&-\frac{1}{57}\\0&0&1&-\frac{26}{57}&-\frac{23}{57}&-\frac{2}{57} \end{bmatrix} $$

We can see by inspection the right side $3\times3$ submatrix on the right is the inverse of $\mathbf{A}$.

Python / NumPy implementation of matrix inverse with Gauss-Jordan algorithm

def gauss_jordan_inverse(A):
    #initialize b as an identity matrix the same size as A, and call
    #gauss_jordan_solve as before.
    return gauss_jordan_solve(A, np.eye(A.shape[0]))