Iterative techniques are rarely used for solving linear systems of small dimension because the computation time required for convergence usually exceeds that required for direct methods such as Gaussian elimination. However, for very large systems, especially sparse systems (systems with a high percentage of 0 entries in the matrix), these iterative techniques can be very efficient in terms of computational run times and memory usage.
An iterative technique for solving the matrix equation $\mathbf{A}\vec{x} = \vec{b}$ starts with an initial approximation $\vec{x}^0$ and generates a sequence of vectors $\{\vec{x}^1,\vec{x}^2,\ldots, \vec{x}^N\}$ that converges to $\vec{x}$ as $N\rightarrow\infty$. These techniques involve a process that converts the system $\mathbf{A}\vec{x}=\vec{b}$ into an equivalent system of the form $\vec{x}=\mathbf{T}\vec{x}+\vec{c}$. Starting from the initial guess $\vec{x}^0$, the process then computes $\vec{x}^k = \mathbf{T}\vec{x}^{k-1} + \vec{c}$ for each $k \geq 1$.
We will stop the iteration when some convergence criterion has been reached. A popular convergence criterion uses the $L_{\infty}$ norm. The $L_{\infty}$ norm is a metric that represents the greatest magnitude among the components of a vector or matrix. Expressed mathematically,

$$\Vert \vec{x} \Vert_{L_\infty} = \max_{1 \leq i \leq n} \vert x_i \vert.$$
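As a quick sketch (with made-up values standing in for two successive iterates), the relative stopping criterion used throughout this section can be evaluated with NumPy's norm routine:

```python
import numpy as np

# Hypothetical successive iterates, just to illustrate the stopping test
x_old = np.array([0.5, 2.0, -1.0, 1.5])
x_new = np.array([0.6, 2.2727, -1.1, 1.875])

# ord=np.inf gives the L-infinity norm, i.e. max(abs(v))
relative_change = np.linalg.norm(x_new - x_old, ord=np.inf) / np.linalg.norm(x_new, ord=np.inf)
print(relative_change < 1e-10)  # compare against the chosen tolerance
```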
Consider, for example, the linear system

$$ \begin{matrix} E_1: & 10x_1 & -x_2 & +2x_3 & & = 6\\ E_2: & -x_1 & +11x_2 & -x_3 & +3x_4 & = 25\\ E_3: & 2x_1 & -x_2 & +10x_3 & -x_4 & = -11\\ E_4: & & 3x_2 & -x_3 & +8x_4 & = 15 \end{matrix} $$

Let us solve each equation, $E_j$, for the variable $x_j$.
$$ \begin{matrix} E_1: & x_1 =& &\frac{1}{10}x_2& - \frac{1}{5}x_3 & & +\frac{3}{5}\\ E_2: & x_2 = &\frac{1}{11}x_1 & & +\frac{1}{11}x_3 & -\frac{3}{11}x_4 & +\frac{25}{11}\\ E_3: & x_3 = &-\frac{1}{5}x_1 & +\frac{1}{10}x_2 & & +\frac{1}{10}x_4 & -\frac{11}{10}\\ E_4: & x_4 = & & -\frac{3}{8}x_2 & + \frac{1}{8}x_3 & & +\frac{15}{8} \end{matrix} $$

Using the notation introduced previously, for the initial guess

$$\vec{x}^0 = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$

we obtain

$$\vec{x}^1 = \begin{bmatrix}0.6000\\2.2727\\-1.1000\\1.8750\end{bmatrix}.$$

We repeat this process until the desired convergence has been reached. This technique is called the Jacobi iterative method.
Pseudocode for Jacobi iteration
For the matrix equation $\mathbf{A} \vec{x} = \vec{b}$ with an initial guess $\vec{x}^0$.
\begin{equation} A = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \quad \vec{b} = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n}\end{bmatrix}, \quad \vec{x}^{0} = \begin{bmatrix} x^0_{1} \\ x^0_{2} \\ \vdots \\ x^0_{n}\end{bmatrix} \end{equation}

Steps | |
---|---|
1. | While $\frac{\Vert \vec{x}^k - \vec{x}^{k-1} \Vert_{L_\infty}}{\Vert \vec{x}^k \Vert_{L_\infty}}> tolerance$ do Step 2 |
2. | $\phantom{--}$ For $i = 1, 2, \ldots, n$ do Step 3 |
3. | $$\phantom{----} x^k_i = \left.\left( -\sum_{\substack{j=1 \\ j \ne i}}^n a_{ij} x^{k-1}_j + b_i\right) \middle/ a_{ii} \right.$$ |
```python
import numpy as np

def jacobi(A, b, tolerance=1e-10, max_iterations=10000):
    x = np.zeros_like(b, dtype=np.double)
    # T holds the off-diagonal entries of A; the diagonal is handled separately
    T = A - np.diag(np.diagonal(A))
    # Iterate
    for k in range(max_iterations):
        x_old = x.copy()
        # Update all components simultaneously from the previous iterate
        x[:] = (b - np.dot(T, x)) / np.diagonal(A)
        # Stop condition
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```
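Applying the solver to the example system above (the system whose rearranged equations $E_1$ through $E_4$ were given earlier), the first iterate matches the hand computation, and the iteration converges to the exact solution $(1, 2, -1, 1)$:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])
b = np.array([6., 25., -11., 15.])

x = jacobi(A, b)
print(x)  # approximately [ 1.  2. -1.  1.]
```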
Let us consider a matrix $\mathbf{A}$ that we split into three matrices, $\mathbf{D}$, $\mathbf{U}$, and $\mathbf{L}$, which are diagonal, strictly upper triangular, and strictly lower triangular, respectively.
If we let $\mathbf{A = D-L-U}$, then the matrix equation $\mathbf{A}\vec{x} = \vec{b}$ becomes
\begin{equation} \left(\mathbf{D-L-U}\right)\vec{x} = \vec{b} \quad \mathrm{or} \quad \mathbf{D}\vec{x} = \left(\mathbf{L+U}\right)\vec{x}+\vec{b}. \end{equation}

If $\mathbf{D}^{-1}$ exists, which is the case exactly when every $a_{jj} \neq 0$, then

$$\vec{x} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right)\vec{x} + \mathbf{D}^{-1}\vec{b}.$$

This results in the matrix form of the Jacobi iteration method,

$$\vec{x}^{k} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right)\vec{x}^{k-1} + \mathbf{D}^{-1}\vec{b}, \qquad \text{i.e.} \quad \mathbf{T} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right), \quad \vec{c} = \mathbf{D}^{-1}\vec{b}.$$
We can see that one requirement for the Jacobi iteration to work is for $a_{ii} \neq 0$. This may involve row exchanges before iterating for some linear systems.
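The matrix form translates directly into code. Here is a minimal sketch, assuming the $\mathbf{A = D - L - U}$ split described above (so `L` and `U` hold the negated strict triangles of `A`); the name `jacobi_matrix_form` is ours:

```python
def jacobi_matrix_form(A, b, tolerance=1e-10, max_iterations=10000):
    D = np.diag(np.diagonal(A))
    L = -np.tril(A, k=-1)  # strictly lower triangle, negated, so A = D - L - U
    U = -np.triu(A, k=1)   # strictly upper triangle, negated
    T = np.linalg.solve(D, L + U)  # D^{-1}(L + U)
    c = np.linalg.solve(D, b)      # D^{-1} b
    x = np.zeros_like(b, dtype=np.double)
    for k in range(max_iterations):
        x_old = x
        x = T @ x_old + c  # x^k = T x^{k-1} + c
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```

In practice the component-wise version above avoids forming $\mathbf{T}$ explicitly, but the two are algebraically identical.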
During the Jacobi iteration we always use the components of $\vec{x}^{k-1}$ to compute $\vec{x}^{k}$, but for $i > 1$ the components ${x}^{k}_1, \ldots, {x}^{k}_{i-1}$ have already been computed and are most likely better approximations of the real solution than ${x}^{k-1}_1, \ldots, {x}^{k-1}_{i-1}$. Therefore, we can calculate ${x}^{k}_i$ using the most recently calculated values when they are available. This technique is called Gauss-Seidel iteration. The pseudocode is as follows.
Pseudocode for Gauss-Seidel iteration
For the matrix equation $\mathbf{A} \vec{x} = \vec{b}$ with an initial guess $\vec{x}^0$.
$$ A=\begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \quad \vec{b} = \begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}, \quad \vec{x}^0 = \begin{bmatrix}x^{0}_1\\x^{0}_2\\\vdots\\x^{0}_n\end{bmatrix} $$

Steps | |
---|---|
1. | While $\frac{\Vert\vec{x}^{k}-\vec{x}^{k-1}\Vert_{L_\infty}}{\Vert\vec{x}^{k}\Vert_{L_\infty}} > tolerance$ do Step 2 |
2. | $\phantom{--}$ For $i = 1, 2, \ldots, n$ do Step 3 |
3. | $$\phantom{----} x^{k}_i = \left.\left(-\sum_{j=1}^{i-1}a_{ij}x^{k}_j-\sum_{j=i+1}^{n}a_{ij}x^{k-1}_j+b_i\right) \middle/ {a_{ii}}\right.$$ |
```python
def gauss_seidel(A, b, tolerance=1e-10, max_iterations=10000):
    x = np.zeros_like(b, dtype=np.double)
    # Iterate
    for k in range(max_iterations):
        x_old = x.copy()
        # Loop over rows, using updated components as soon as they are available
        for i in range(A.shape[0]):
            x[i] = (b[i] - np.dot(A[i, :i], x[:i]) - np.dot(A[i, (i+1):], x_old[(i+1):])) / A[i, i]
        # Stop condition
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```
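Running Gauss-Seidel on the same example system reproduces the solution, typically in fewer iterations than Jacobi for systems like this one:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])
b = np.array([6., 25., -11., 15.])

x = gauss_seidel(A, b)
print(x)  # approximately [ 1.  2. -1.  1.]
```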
As before, we split the matrix $\mathbf{A}$ into three matrices, $\mathbf{D}$, $\mathbf{U}$, and $\mathbf{L}$: diagonal, strictly upper triangular, and strictly lower triangular, respectively.
We will leave the derivation as an exercise for the student, but with $\mathbf{A = D-L-U}$ the matrix equation for the Gauss-Seidel iteration method is as follows:

$$\vec{x}^{k} = \left(\mathbf{D-L}\right)^{-1}\mathbf{U}\vec{x}^{k-1} + \left(\mathbf{D-L}\right)^{-1}\vec{b}$$
In order for the lower triangular matrix $\mathbf{D-L}$ to be invertible, it is necessary and sufficient that $a_{ii}\neq 0$ for all $i$. As before, this may involve row exchanges before iterating for some linear systems.
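As a sketch of this matrix form (again assuming the $\mathbf{A = D - L - U}$ split, with a name of our choosing), note that $\mathbf{D-L}$ is simply the lower triangle of $\mathbf{A}$, and each iterate can solve a triangular system rather than forming $\left(\mathbf{D-L}\right)^{-1}$ explicitly:

```python
def gauss_seidel_matrix_form(A, b, tolerance=1e-10, max_iterations=10000):
    DL = np.tril(A)        # D - L: the full lower triangle of A
    U = -np.triu(A, k=1)   # strictly upper triangle, negated
    x = np.zeros_like(b, dtype=np.double)
    for k in range(max_iterations):
        x_old = x
        # Solve (D - L) x^k = U x^{k-1} + b
        x = np.linalg.solve(DL, U @ x_old + b)
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```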
First, a definition: the spectral radius, $\rho$, of a matrix $\mathbf{A}$ is the maximum of the absolute values of $\mathbf{A}$'s eigenvalues,

$$\rho\left(\mathbf{A}\right) = \max_i \vert\lambda_i\vert,$$

where the $\lambda_i$'s are the eigenvalues of $\mathbf{A}$.
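In code, the spectral radius is one line with NumPy's eigenvalue routine; `spectral_radius` is a helper name of our choosing:

```python
def spectral_radius(A):
    # rho(A) = max |lambda_i| over the eigenvalues of A
    return np.max(np.abs(np.linalg.eigvals(A)))
```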
Now, if $\rho\left(\mathbf{A}\right) < 1$, then $(\mathbf{I-A})^{-1}$ exists, and

$$\left(\mathbf{I-A}\right)^{-1} = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \ldots = \sum_{j=0}^{\infty}\mathbf{A}^j.$$
We can prove this, starting with the eigenvalue equation
$$\mathbf{A}\vec{x} = \lambda \vec{x} \quad\rightarrow\quad (\mathbf{I-A})\vec{x} = (1-\lambda)\vec{x}.$$

So $\lambda$ is an eigenvalue of $\mathbf{A}$ exactly when $(1-\lambda)$ is an eigenvalue of $\mathbf{I-A}$. But $\vert\lambda\vert \leq \rho\left(\mathbf{A}\right) < 1$, therefore $1$ cannot be an eigenvalue of $\mathbf{A}$, and $0$ cannot be an eigenvalue of $\mathbf{I-A}$. A matrix in which none of the eigenvalues are zero is always invertible, therefore $\left(\mathbf{I-A}\right)^{-1}$ exists. There is also a theorem which states that a matrix $\mathbf{A}$ is convergent if and only if $\rho\left(\mathbf{A}\right) < 1$, where convergent means that $\lim_{n\to\infty}\mathbf{A}^n\vec{x} = 0$ for all $\vec{x}$. Now, let
$$\mathbf{S}_m = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \ldots + \mathbf{A}^m.$$
Then,
$$\left(\mathbf{I-A}\right)\mathbf{S}_m = \left(\mathbf{I}+\mathbf{A}+\mathbf{A}^2+\ldots + \mathbf{A}^m\right)-\left(\mathbf{A}+\mathbf{A}^2+\ldots+\mathbf{A}^{m+1}\right)=\mathbf{I}-\mathbf{A}^{m+1}.$$

Since $\mathbf{A}$ is convergent, we can see by taking the limit of both sides that
$$\lim_{m\to\infty}\left(\mathbf{I-A}\right)\mathbf{S}_m=\lim_{m\to\infty}\left(\mathbf{I-A}^{m+1}\right)=\mathbf{I},$$therefore
$$\left(\mathbf{I-A}\right)^{-1} = \lim_{m\to\infty}\mathbf{S}_m = \sum_{j=0}^{\infty}\mathbf{A}^j.$$

Now, for any $\vec{x}^0\in\mathbb{R}^n$, the sequence $\{\vec{x}^k\}_{k=0}^{\infty}$ computed by

$$\vec{x}^k = \mathbf{T}\vec{x}^{k-1}+\vec{c}, \qquad k\geq 1,$$

converges to the unique solution of $\vec{x} = \mathbf{T} \vec{x} + \vec{c}$ if and only if $\rho \left(\mathbf{T}\right) < 1$. We can prove this: first, assume that $\rho\left(\mathbf{T}\right) < 1$. Then, expanding the recurrence,

$$\vec{x}^k = \mathbf{T}\vec{x}^{k-1}+\vec{c} = \mathbf{T}^2\vec{x}^{k-2}+\left(\mathbf{T}+\mathbf{I}\right)\vec{c} = \ldots = \mathbf{T}^k\vec{x}^{0}+\left(\mathbf{T}^{k-1}+\ldots+\mathbf{T}+\mathbf{I}\right)\vec{c}.$$
Since $\rho\left(\mathbf{T}\right) < 1$, $\mathbf{T}$ is convergent, and from what we learned previously it follows that
$$\lim_{k\to\infty}\vec{x}^k = \lim_{k\to\infty}\mathbf{T}^k\vec{x}^0 + \left(\sum_{j=0}^{\infty}\mathbf{T}^j\right)\vec{c} = \mathbf{0} + \left(\mathbf{I-T}\right)^{-1} \vec{c}.$$

Therefore, $\vec{x}^k$ converges to $\left(\mathbf{I-T}\right)^{-1}\vec{c}$, which is indeed the unique solution of $\vec{x} = \mathbf{T}\vec{x} + \vec{c}$.
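As a closing check, we can apply this convergence criterion to the example system from earlier. Both iteration matrices (built here with the `spectral_radius` helper and the $\mathbf{D}$, $\mathbf{L}$, $\mathbf{U}$ split from above) have spectral radius below $1$, which is exactly why both methods converged:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])

D = np.diag(np.diagonal(A))
L = -np.tril(A, k=-1)
U = -np.triu(A, k=1)

T_jacobi = np.linalg.solve(D, L + U)        # D^{-1}(L + U)
T_gauss_seidel = np.linalg.solve(D - L, U)  # (D - L)^{-1} U

print(spectral_radius(T_jacobi))        # < 1, so Jacobi converges
print(spectral_radius(T_gauss_seidel))  # < 1, so Gauss-Seidel converges
```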