Iterative techniques are rarely used for solving linear systems of small dimension because the computation time required for convergence usually exceeds that required for direct methods such as Gaussian elimination. However, for very large systems, especially sparse systems (systems with a high percentage of 0 entries in the matrix), these iterative techniques can be very efficient in terms of computational run times and memory usage.
An iterative technique for solving the matrix equation $\mathbf{A}\vec{x} = \vec{b}$ starts with an initial approximation $\vec{x}^0$ and generates a sequence of vectors $\{\vec{x}^1,\vec{x}^2,\ldots, \vec{x}^N\}$ that converges to $\vec{x}$ as $N\rightarrow\infty$. These techniques involve a process that converts the system $\mathbf{A}\vec{x}=\vec{b}$ into an equivalent system of the form $\vec{x}=\mathbf{T}\vec{x}+\vec{c}$. Starting from the initial guess $\vec{x}^0$, the process then computes $\vec{x}^k = \mathbf{T}\vec{x}^{k-1} + \vec{c}$ for each $k \geq 1$.
We will stop the iteration when some convergence criterion has been reached. A popular convergence criterion uses the $L_{\infty}$ norm. The $L_{\infty}$ norm is a metric that represents the greatest magnitude among the components of a vector or matrix. Expressed mathematically,

$$\Vert \vec{x} \Vert_{L_\infty} = \max_{1 \leq i \leq n} \vert x_i \vert.$$
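As a quick sketch (with made-up values standing in for two successive iterates), the relative stopping criterion used throughout this section can be evaluated with NumPy's norm routine:

```python
import numpy as np

# Hypothetical successive iterates, just to illustrate the stopping test
x_old = np.array([0.5, 2.0, -1.0, 1.5])
x_new = np.array([0.6, 2.2727, -1.1, 1.875])

# ord=np.inf gives the L-infinity norm, i.e. max(abs(v))
relative_change = np.linalg.norm(x_new - x_old, ord=np.inf) / np.linalg.norm(x_new, ord=np.inf)
print(relative_change < 1e-10)  # compare against the chosen tolerance
```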
Consider, for example, the linear system

$$ \begin{matrix} E_1: & 10x_1 & -x_2 & +2x_3 & & = 6\\ E_2: & -x_1 & +11x_2 & -x_3 & +3x_4 & = 25\\ E_3: & 2x_1 & -x_2 & +10x_3 & -x_4 & = -11\\ E_4: & & 3x_2 & -x_3 & +8x_4 & = 15 \end{matrix} $$

Let us solve each equation, $E_j$, for the variable $x_j$.
$$ \begin{matrix} E_1: & x_1 =& &\frac{1}{10}x_2& - \frac{1}{5}x_3 & & +\frac{3}{5}\\ E_2: & x_2 = &\frac{1}{11}x_1 & & +\frac{1}{11}x_3 & -\frac{3}{11}x_4 & +\frac{25}{11}\\ E_3: & x_3 = &-\frac{1}{5}x_1 & +\frac{1}{10}x_2 & & +\frac{1}{10}x_4 & -\frac{11}{10}\\ E_4: & x_4 = & & -\frac{3}{8}x_2 & + \frac{1}{8}x_3 & & +\frac{15}{8} \end{matrix} $$

Using the notation introduced previously, for the initial guess

$$\vec{x}^0 = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$

we obtain

$$\vec{x}^1 = \begin{bmatrix}0.6000\\2.2727\\-1.1000\\1.8750\end{bmatrix}.$$

We repeat this process until the desired convergence has been reached. This technique is called the Jacobi iterative method.
Pseudocode for Jacobi iteration
For the matrix equation $\mathbf{A} \vec{x} = \vec{b}$ with an initial guess $\vec{x}^0$.
\begin{equation} A = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \quad \vec{b} = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n}\end{bmatrix}, \quad \vec{x}^{0} = \begin{bmatrix} x^0_{1} \\ x^0_{2} \\ \vdots \\ x^0_{n}\end{bmatrix} \end{equation}

Steps | |
---|---|
1. | While $\frac{\Vert \vec{x}^k - \vec{x}^{k-1} \Vert_{L_\infty}}{\Vert \vec{x}^k \Vert_{L_\infty}}> tolerance$ do Step 2 |
2. | $\phantom{--}$ For $i = 1, 2, \ldots, n$ do Step 3 |
3. | $$\phantom{----} x^k_i = \left.\left( -\sum_{\substack{j=1 \\ j \ne i}}^n a_{ij} x^{k-1}_j + b_i\right) \middle/ a_{ii} \right.$$ |
```python
import numpy as np

def jacobi(A, b, tolerance=1e-10, max_iterations=10000):
    x = np.zeros_like(b, dtype=np.double)
    # T holds the off-diagonal entries of A; the diagonal is handled separately
    T = A - np.diag(np.diagonal(A))
    # Iterate
    for k in range(max_iterations):
        x_old = x.copy()
        # Update all components simultaneously from the previous iterate
        x[:] = (b - np.dot(T, x)) / np.diagonal(A)
        # Stop condition
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```
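Applying the solver to the example system above (the system whose rearranged equations $E_1$ through $E_4$ were given earlier), the first iterate matches the hand computation, and the iteration converges to the exact solution $(1, 2, -1, 1)$:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])
b = np.array([6., 25., -11., 15.])

x = jacobi(A, b)
print(x)  # approximately [ 1.  2. -1.  1.]
```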
Let us consider a matrix $\mathbf{A}$ that we split into three matrices, $\mathbf{D}$, $\mathbf{U}$, and $\mathbf{L}$, which are diagonal, strictly upper triangular, and strictly lower triangular, respectively.
If we let $\mathbf{A = D-L-U}$, then the matrix equation $\mathbf{A}\vec{x} = \vec{b}$ becomes
\begin{equation} \left(\mathbf{D-L-U}\right)\vec{x} = \vec{b} \quad \mathrm{or} \quad \mathbf{D}\vec{x} = \left(\mathbf{L+U}\right)\vec{x}+\vec{b}. \end{equation}

If $\mathbf{D}^{-1}$ exists, which is the case exactly when every $a_{jj} \neq 0$, then

$$\vec{x} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right)\vec{x} + \mathbf{D}^{-1}\vec{b}.$$

This results in the matrix form of the Jacobi iteration method,

$$\vec{x}^{k} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right)\vec{x}^{k-1} + \mathbf{D}^{-1}\vec{b}, \qquad \text{i.e.} \quad \mathbf{T} = \mathbf{D}^{-1}\left(\mathbf{L+U}\right), \quad \vec{c} = \mathbf{D}^{-1}\vec{b}.$$
We can see that one requirement for the Jacobi iteration to work is for $a_{ii} \neq 0$. This may involve row exchanges before iterating for some linear systems.
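The matrix form translates directly into code. Here is a minimal sketch, assuming the $\mathbf{A = D - L - U}$ split described above (so `L` and `U` hold the negated strict triangles of `A`); the name `jacobi_matrix_form` is ours:

```python
def jacobi_matrix_form(A, b, tolerance=1e-10, max_iterations=10000):
    D = np.diag(np.diagonal(A))
    L = -np.tril(A, k=-1)  # strictly lower triangle, negated, so A = D - L - U
    U = -np.triu(A, k=1)   # strictly upper triangle, negated
    T = np.linalg.solve(D, L + U)  # D^{-1}(L + U)
    c = np.linalg.solve(D, b)      # D^{-1} b
    x = np.zeros_like(b, dtype=np.double)
    for k in range(max_iterations):
        x_old = x
        x = T @ x_old + c  # x^k = T x^{k-1} + c
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```

In practice the component-wise version above avoids forming $\mathbf{T}$ explicitly, but the two are algebraically identical.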
During the Jacobi iteration we always use the components of $\vec{x}^{k-1}$ to compute $\vec{x}^{k}$, but for $i > 1$ the components ${x}^{k}_1, \ldots, {x}^{k}_{i-1}$ have already been computed and are most likely better approximations of the real solution than ${x}^{k-1}_1, \ldots, {x}^{k-1}_{i-1}$. Therefore, we can calculate ${x}^{k}_i$ using the most recently calculated values when they are available. This technique is called Gauss-Seidel iteration. The pseudocode is as follows.
Pseudocode for Gauss-Seidel iteration
For the matrix equation $\mathbf{A} \vec{x} = \vec{b}$ with an initial guess $\vec{x}^0$.
$$ A=\begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \quad \vec{b} = \begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}, \quad \vec{x}^0 = \begin{bmatrix}x^{0}_1\\x^{0}_2\\\vdots\\x^{0}_n\end{bmatrix} $$

Steps | |
---|---|
1. | While $\frac{\Vert\vec{x}^{k}-\vec{x}^{k-1}\Vert_{L_\infty}}{\Vert\vec{x}^{k}\Vert_{L_\infty}} > tolerance$ do Step 2 |
2. | $\phantom{--}$ For $i = 1, 2, \ldots, n$ do Step 3 |
3. | $$\phantom{----} x^{k}_i = \left.\left(-\sum_{j=1}^{i-1}a_{ij}x^{k}_j-\sum_{j=i+1}^{n}a_{ij}x^{k-1}_j+b_i\right) \middle/ {a_{ii}}\right.$$ |
```python
def gauss_seidel(A, b, tolerance=1e-10, max_iterations=10000):
    x = np.zeros_like(b, dtype=np.double)
    # Iterate
    for k in range(max_iterations):
        x_old = x.copy()
        # Loop over rows, using updated components as soon as they are available
        for i in range(A.shape[0]):
            x[i] = (b[i] - np.dot(A[i, :i], x[:i]) - np.dot(A[i, (i+1):], x_old[(i+1):])) / A[i, i]
        # Stop condition
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```
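Running Gauss-Seidel on the same example system reproduces the solution, typically in fewer iterations than Jacobi for systems like this one:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])
b = np.array([6., 25., -11., 15.])

x = gauss_seidel(A, b)
print(x)  # approximately [ 1.  2. -1.  1.]
```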
As before, we split the matrix $\mathbf{A}$ into three matrices, $\mathbf{D}$, $\mathbf{U}$, and $\mathbf{L}$: diagonal, strictly upper triangular, and strictly lower triangular, respectively.
We will leave the derivation as an exercise for the student, but with $\mathbf{A = D-L-U}$ the matrix equation for the Gauss-Seidel iteration method is as follows:

$$\vec{x}^{k} = \left(\mathbf{D-L}\right)^{-1}\mathbf{U}\vec{x}^{k-1} + \left(\mathbf{D-L}\right)^{-1}\vec{b}$$
In order for the lower triangular matrix $\mathbf{D-L}$ to be invertible, it is necessary and sufficient that $a_{ii}\neq 0$ for all $i$. As before, this may involve row exchanges before iterating for some linear systems.
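As a sketch of this matrix form (again assuming the $\mathbf{A = D - L - U}$ split, with a name of our choosing), note that $\mathbf{D-L}$ is simply the lower triangle of $\mathbf{A}$, and each iterate can solve a triangular system rather than forming $\left(\mathbf{D-L}\right)^{-1}$ explicitly:

```python
def gauss_seidel_matrix_form(A, b, tolerance=1e-10, max_iterations=10000):
    DL = np.tril(A)        # D - L: the full lower triangle of A
    U = -np.triu(A, k=1)   # strictly upper triangle, negated
    x = np.zeros_like(b, dtype=np.double)
    for k in range(max_iterations):
        x_old = x
        # Solve (D - L) x^k = U x^{k-1} + b
        x = np.linalg.solve(DL, U @ x_old + b)
        if np.linalg.norm(x - x_old, ord=np.inf) / np.linalg.norm(x, ord=np.inf) < tolerance:
            break
    return x
```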
First, a definition: the spectral radius, $\rho$, of a matrix $\mathbf{A}$ is the maximum of the absolute values of $\mathbf{A}$'s eigenvalues,

$$\rho\left(\mathbf{A}\right) = \max_i \vert\lambda_i\vert,$$

where the $\lambda_i$'s are the eigenvalues of $\mathbf{A}$.
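In code, the spectral radius is one line with NumPy's eigenvalue routine; `spectral_radius` is a helper name of our choosing:

```python
def spectral_radius(A):
    # rho(A) = max |lambda_i| over the eigenvalues of A
    return np.max(np.abs(np.linalg.eigvals(A)))
```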
Now, if $\rho\left(\mathbf{A}\right) < 1$, then $(\mathbf{I-A})^{-1}$ exists, and

$$\left(\mathbf{I-A}\right)^{-1} = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \ldots = \sum_{j=0}^{\infty}\mathbf{A}^j.$$
We can prove this, starting with the eigenvalue equation
$$\mathbf{A}\vec{x} = \lambda \vec{x} \quad\rightarrow\quad (\mathbf{I-A})\vec{x} = (1-\lambda)\vec{x}.$$

So $\lambda$ is an eigenvalue of $\mathbf{A}$ exactly when $(1-\lambda)$ is an eigenvalue of $\mathbf{I-A}$. But $\vert\lambda\vert \leq \rho\left(\mathbf{A}\right) < 1$, therefore $1$ cannot be an eigenvalue of $\mathbf{A}$, and $0$ cannot be an eigenvalue of $\mathbf{I-A}$. A matrix in which none of the eigenvalues are zero is always invertible, therefore $\left(\mathbf{I-A}\right)^{-1}$ exists. There is also a theorem which states that a matrix $\mathbf{A}$ is convergent if and only if $\rho\left(\mathbf{A}\right) < 1$, where convergent means that $\lim_{n\to\infty}\mathbf{A}^n\vec{x} = 0$ for all $\vec{x}$. Now, let
$$\mathbf{S}_m = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \ldots + \mathbf{A}^m.$$
Then,
$$\left(\mathbf{I-A}\right)\mathbf{S}_m = \left(\mathbf{I}+\mathbf{A}+\mathbf{A}^2+\ldots + \mathbf{A}^m\right)-\left(\mathbf{A}+\mathbf{A}^2+\ldots+\mathbf{A}^{m+1}\right)=\mathbf{I}-\mathbf{A}^{m+1}.$$

Since $\mathbf{A}$ is convergent, we can see by taking the limit of both sides that
$$\lim_{m\to\infty}\left(\mathbf{I-A}\right)\mathbf{S}_m=\lim_{m\to\infty}\left(\mathbf{I-A}^{m+1}\right)=\mathbf{I},$$therefore
$$\left(\mathbf{I-A}\right)^{-1} = \lim_{m\to\infty}\mathbf{S}_m = \sum_{j=0}^{\infty}\mathbf{A}^j.$$

Now, for any $\vec{x}^0\in\mathbb{R}^n$, the sequence $\{\vec{x}^k\}_{k=0}^{\infty}$ computed by

$$\vec{x}^k = \mathbf{T}\vec{x}^{k-1}+\vec{c}, \qquad k\geq 1,$$

converges to the unique solution of $\vec{x} = \mathbf{T} \vec{x} + \vec{c}$ if and only if $\rho \left(\mathbf{T}\right) < 1$. We can prove this: first, assume that $\rho\left(\mathbf{T}\right) < 1$. Then, expanding the recurrence,

$$\vec{x}^k = \mathbf{T}\vec{x}^{k-1}+\vec{c} = \mathbf{T}^2\vec{x}^{k-2}+\left(\mathbf{T}+\mathbf{I}\right)\vec{c} = \ldots = \mathbf{T}^k\vec{x}^{0}+\left(\mathbf{T}^{k-1}+\ldots+\mathbf{T}+\mathbf{I}\right)\vec{c}.$$
Since $\rho\left(\mathbf{T}\right) < 1$, $\mathbf{T}$ is convergent, and from what we learned previously it follows that
$$\lim_{k\to\infty}\vec{x}^k = \lim_{k\to\infty}\mathbf{T}^k\vec{x}^0 + \left(\sum_{j=0}^{\infty}\mathbf{T}^j\right)\vec{c} = \mathbf{0} + \left(\mathbf{I-T}\right)^{-1} \vec{c}.$$

Therefore, $\vec{x}^k$ converges to $\left(\mathbf{I-T}\right)^{-1}\vec{c}$, which is indeed the unique solution of $\vec{x} = \mathbf{T}\vec{x} + \vec{c}$.
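As a closing check, we can apply this convergence criterion to the example system from earlier. Both iteration matrices (built here with the `spectral_radius` helper and the $\mathbf{D}$, $\mathbf{L}$, $\mathbf{U}$ split from above) have spectral radius below $1$, which is exactly why both methods converged:

```python
A = np.array([[10., -1.,  2.,  0.],
              [-1., 11., -1.,  3.],
              [ 2., -1., 10., -1.],
              [ 0.,  3., -1.,  8.]])

D = np.diag(np.diagonal(A))
L = -np.tril(A, k=-1)
U = -np.triu(A, k=1)

T_jacobi = np.linalg.solve(D, L + U)        # D^{-1}(L + U)
T_gauss_seidel = np.linalg.solve(D - L, U)  # (D - L)^{-1} U

print(spectral_radius(T_jacobi))        # < 1, so Jacobi converges
print(spectral_radius(T_gauss_seidel))  # < 1, so Gauss-Seidel converges
```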