The eigenvalue problem was discussed previously in conjunction with the convergence of iterative methods for solving linear systems. As engineers, we are often introduced to the eigenproblem in mechanics courses as the principal values and directions of the moment of inertia, stress, and strain tensors, and as the natural frequencies and modes in vibration theory. From a mathematical standpoint, the eigenproblem provides an elegant way of viewing the structure of a matrix. A more formal introduction to the problem, and to the methods one might use to solve it, is the subject of this section.
First, a short refresher. Suppose a scalar $\lambda$ and a nonzero vector $\vec{x}$ are found to satisfy the matrix equation

$$\mathbf{A}\vec{x} = \lambda\vec{x}$$

Then $\lambda$ is said to be an eigenvalue and $\vec{x}$ an eigenvector of $\mathbf{A}$. We can rearrange this equation into the equivalent form

$$\left(\mathbf{A} - \lambda\mathbf{I}\right)\vec{x} = \vec{0}$$

In order for a non-trivial solution to exist, we require

$$\det\left(\mathbf{A} - \lambda\mathbf{I}\right) = 0$$
which results in a polynomial equation in $\lambda$ known as the characteristic polynomial. The roots of the characteristic polynomial are the eigenvalues of $\mathbf{A}$. This is the direct method most of us were taught in a vibrations course, or when solving for principal stresses in mechanics of materials. However, for large $\mathbf{A}$ the problem becomes intractable by hand and we must resort to the computer for help. The difficulty is that computing determinants is somewhat expensive, and finding good approximations to the roots of the characteristic polynomial can be difficult (as we shall see when we cover nonlinear root solving). There are other computational ways of estimating the eigenvalues of a matrix; these methods are discussed in the sequel.
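To make the direct approach concrete, here is a minimal sketch (the $2\times 2$ matrix below is chosen purely for illustration): build the coefficients of the characteristic polynomial with NumPy, find its roots, and cross-check against np.linalg.eigvals.

import numpy as np

# A small symmetric matrix, chosen here only to illustrate the direct approach
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Coefficients of the characteristic polynomial det(A - λI), highest power first
coefficients = np.poly(A)        # [1., -5., 5.] for this matrix
eigenvalues = np.roots(coefficients)

print(eigenvalues)               # roots of the characteristic polynomial
print(np.linalg.eigvals(A))      # direct computation for comparison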
The Power Method is an iterative technique used to determine the eigenvalue of largest magnitude of a matrix. Once this method is understood, we will modify it slightly to determine other eigenvalues. The method also produces an associated eigenvector, and it is sometimes used to find an eigenvector when an eigenvalue is already known or has been found by other means. Let us assume that an $n \times n$ matrix $\mathbf{A}$ has $n$ eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, with an associated collection of linearly independent eigenvectors $\vec{v}_1, \vec{v}_2, \ldots,\vec{v}_n$. Let us also assume that $\lambda_1$ is the largest in magnitude, meaning $\vert\lambda_1\vert > \vert\lambda_2\vert \geq \ldots \geq \vert\lambda_n\vert$. If we take any vector $\vec{x}$ in $\mathbb{R}^n$, then because the eigenvectors are linearly independent, there exist constants $\beta_1, \beta_2, \ldots, \beta_n$ such that

$$\vec{x} = \sum_{j=1}^n \beta_j\vec{v}_j$$

Multiplying both sides of this equation by $\mathbf{A}, \mathbf{A}^2, \ldots, \mathbf{A}^k$ we have:
$$\begin{aligned} \mathbf{A}\vec{x} &= \sum_{j=1}^n\beta_j\mathbf{A}\vec{v}_j = \sum_{j=1}^n\beta_j\lambda_j\vec{v}_j\\ \mathbf{A}^2\vec{x} &= \sum_{j=1}^n\beta_j\mathbf{A}^2\vec{v}_j = \sum_{j=1}^n\beta_j\lambda_j^2\vec{v}_j\\ &\;\;\vdots\\ \mathbf{A}^k\vec{x} &= \sum_{j=1}^n\beta_j\mathbf{A}^k\vec{v}_j = \sum_{j=1}^n\beta_j\lambda_j^k\vec{v}_j \end{aligned}$$Now let us factor out $\lambda_1^k$ from the right-hand side of the last equation.
$$\mathbf{A}^k\vec{x} = \lambda_1^k\sum_{j=1}^n\beta_j\left(\frac{\lambda_j}{\lambda_1}\right)^k\vec{v}_j$$Now take the limit of both sides as $k \to \infty$.
For $j = 2, 3, \ldots, n$ we have $\lim_{k\to\infty}\left(\frac{\lambda_j}{\lambda_1}\right)^k = 0$ because $\vert\lambda_1\vert > \vert\lambda_j\vert$, which leaves

$$\lim_{k\to\infty}\mathbf{A}^k\vec{x} = \lim_{k\to\infty}\lambda_1^k\beta_1\vec{v}_1$$

This sequence converges to $\vec{0}$ if $\vert\lambda_1\vert < 1$ and diverges if $\vert\lambda_1\vert > 1$ (assuming, of course, that $\beta_1 \neq 0$).
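A quick numerical check of this growth and decay (both matrices below are chosen here for illustration, one with $\vert\lambda_1\vert > 1$ and one with $\vert\lambda_1\vert < 1$):

import numpy as np

A_grow = np.array([[2.0, 1.0],
                   [1.0, 2.0]])     # eigenvalues 3 and 1, so |λ1| > 1
A_decay = np.array([[0.2, 0.1],
                    [0.1, 0.2]])    # eigenvalues 0.3 and 0.1, so |λ1| < 1
x = np.array([1.0, 0.0])

for k in (1, 5, 10, 20):
    grow = np.dot(np.linalg.matrix_power(A_grow, k), x)
    decay = np.dot(np.linalg.matrix_power(A_decay, k), x)
    print(k, np.linalg.norm(grow, np.inf), np.linalg.norm(decay, np.inf))

In the first case the norms blow up and in the second they shrink toward zero, which is why the iterates below are rescaled at every step.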
To avoid confusion with powers of scalars, we use parentheses around the superscript to indicate the iteration number.
Looking at the final relationship, we can scale $\mathbf{A}^k\vec{x}$ in a manner which ensures that the limit is always finite and non-zero. The scaling begins by choosing $\vec{x}$ to be a unit vector, $\vec{x}^{(0)}$, relative to the infinity norm, and choosing a component $x_{p_0}^{(0)}$ of $\vec{x}^{(0)}$ such that

$$x_{p_0}^{(0)} = 1 = \Vert\vec{x}^{(0)}\Vert_{L_\infty}$$
Now, we let $\vec{y}^{(1)}=\mathbf{A}\vec{x}^{(0)}$, and define $\mu^{(1)} = y_{p_0}^{(1)}$. Then,
$$ \mu^{(1)} = y_{p_0}^{(1)} = \frac{y_{p_0}^{(1)}}{x_{p_0}^{(0)}} = \frac{\beta_1\lambda_1 v_{p_0}^{(1)} + \sum_{j=2}^n \beta_j \lambda_j v_{p_0}^{(j)}}{\beta_1 v_{p_0}^{(1)} + \sum_{j=2}^n \beta_j v_{p_0}^{(j)}} = \lambda_1 \left[\frac{\beta_1 v_{p_0}^{(1)} + \sum_{j=2}^n \beta_j \left(\frac{\lambda_j}{\lambda_1}\right) v_{p_0}^{(j)}}{\beta_1 v_{p_0}^{(1)} + \sum_{j=2}^n \beta_j v_{p_0}^{(j)}}\right] $$Let $p_1$ be the smallest integer such that
$$\vert y_{p_1}^{(1)}\vert = \Vert\vec{y}^{(1)}\Vert_{L_\infty}$$finally we define $\vec{x}^{(1)}$ as
$$\vec{x}^{(1)} = \frac{1}{y_{p_1}^{(1)}}\vec{y}^{(1)} = \frac{1}{y_{p_1}^{(1)}}\mathbf{A}\vec{x}^{(0)}$$then,
$$x_{p_1}^{(1)} = 1 = \Vert\vec{x}^{(1)}\Vert_{L_\infty}$$Now define,
$$\vec{y}^{(2)} = \mathbf{A} \vec{x}^{(1)} = \mathbf{A}^2 \vec{x}^{(0)}$$and
$$\mu^{(2)} = y_{p_1}^{(2)} = \frac{y_{p_1}^{(2)}}{x_{p_1}^{(1)}} = \frac{\left.\left[\beta_1 \lambda_1^2 v_{p_1}^{(1)}+ \sum_{j=2}^n \beta_j \lambda_j^2 v_{p_1}^{(j)}\right]\middle/y_{p_1}^{(1)}\right.}{\left.\left[\beta_1 \lambda_1 v_{p_1}^{(1)} + \sum_{j=2}^n \beta_j \lambda_j v_{p_1}^{(j)}\right]\middle/y_{p_1}^{(1)}\right.} = \lambda_1 \left[ \frac{\beta_1 v_{p_1}^{(1)} + \sum_{j=2}^n \beta_j \left(\frac{\lambda_j}{\lambda_1}\right)^2 v_{p_1}^{(j)}}{\beta_1 v_{p_1}^{(1)}+\sum_{j=2}^n \beta_j \left(\frac{\lambda_j}{\lambda_1}\right) v_{p_1}^{(j)}}\right]$$Let $p_2$ be the smallest integer such that
$$\vert y_{p_2}^{(2)}\vert = \Vert\vec{y}^{(2)}\Vert_{L_\infty}$$finally we define $\vec{x}^{(2)}$ as
$$\vec{x}^{(2)} = \frac{1}{y_{p_2}^{(2)}}\vec{y}^{(2)} = \frac{1}{y_{p_2}^{(2)}}\mathbf{A}\vec{x}^{(1)} = \frac{1}{y_{p_2}^{(2)}y_{p_1}^{(1)}}\mathbf{A}^2\vec{x}^{(0)}$$Continuing this sequence we have,
$$\vec{y}^{(m)} = \mathbf{A}\vec{x}^{(m-1)}$$$$\mu^{(m)} = y_{p_{m-1}}^{(m)} = \frac{y_{p_{m-1}}^{(m)}}{x_{p_{m-1}}^{(m-1)}} = \lambda_1 \left[\frac{\beta_1 v_{p_{m-1}}^{(1)} + \sum_{j=2}^n \beta_j \left(\frac{\lambda_j}{\lambda_1}\right)^m v_{p_{m-1}}^{(j)}}{\beta_1 v_{p_{m-1}}^{(1)} + \sum_{j=2}^n \beta_j \left(\frac{\lambda_j}{\lambda_1}\right)^{m-1} v_{p_{m-1}}^{(j)}}\right] \label{eqn:power}\tag{1}$$and
$$\vec{x}^{(m)} = \frac{\vec{y}^{(m)}}{y_{p_m}^{(m)}} = \frac{\mathbf{A}^m \vec{x}^{(0)}}{\prod_{k=1}^m y_{p_k}^{(k)}}$$where at each step, $p_m$ is used to represent the smallest integer for which
$$\vert y_{p_m}^{(m)}\vert = \Vert\vec{y}^{(m)}\Vert_{L_\infty}$$If we look at (\ref{eqn:power}) we can see that as $m \to \infty$, $\mu^{(m)} \to \lambda_1$, as long as $\vec{x}^{(0)}$ is chosen so that $\beta_1 \neq 0$. Also, the sequence of vectors $\vec{x}^{(m)}$ converges to an eigenvector associated with $\lambda_1$ that has an infinity norm of unity. This method has the disadvantage that it will not work if the matrix does not have a single dominant eigenvalue.
Pseudocode for Power Method
Steps | |
---|---|
1. | Initialize a starting vector, $\vec{x} = \left[x_1, x_2, \ldots, x_n\right]^{\intercal}$, that is not in the nullspace of $\mathbf{A}$, i.e. $\mathbf{A}\vec{x} \ne \vec{0}$ |
2. | Initialize $ERR > TOL$ |
3. | Find the smallest integer $p$ with $1\leq p\leq n$ and $\vert x_p\vert=\Vert\vec{x}\Vert_{L_\infty}$ |
4. | Set $\vec{x}=\frac{\vec{x}}{x_p}$ |
5. | While $ERR>TOL$, do Steps 6-10. |
6. | $\phantom{--} \vec{y} = \mathbf{A}\vec{x}$ |
7. | $\phantom{--} \mu = y_p$ |
8. | $\phantom{--}$ Find the smallest integer $p$ with $1\leq p\leq n$ and $\vert y_p \vert = \Vert\vec{y}\Vert_{L_\infty}$ |
9. | $\phantom{--}$ Set $ERR = \Vert\vec{x}-\left(\frac{\vec{y}}{y_p}\right)\Vert_{L_\infty}$ |
10. | $\phantom{--}$ Set $\vec{x} = \frac{\vec{y}}{y_p}$ |
The rate of convergence for the Power Method is $O\left(\left\vert\frac{\lambda_2}{\lambda_1}\right\vert^m\right)$. Next we will make some modifications to help improve the rate of convergence.
import numpy as np


def __find_p(x):
    """Return the smallest index p such that |x_p| equals the infinity norm of x."""
    return np.argwhere(np.isclose(np.abs(x), np.linalg.norm(x, np.inf))).min()


def __iterate(A, x, p):
    """Perform one power-method iteration and return (error, p, μ, x)."""
    y = np.dot(A, x)
    # Eigenvalue estimate: the component of y at the previous pivot index p
    μ = y[p]
    # New pivot index and rescaled iterate
    p = __find_p(y)
    error = np.linalg.norm(x - y / y[p], np.inf)
    x = y / y[p]
    return (error, p, μ, x)


def power_method(A, tolerance=1e-10, max_iterations=10000):
    """Approximate the dominant eigenvalue of A and an associated eigenvector."""
    n = A.shape[0]
    x = np.ones(n)
    p = __find_p(x)
    error = 1
    x = x / x[p]
    for _ in range(max_iterations):
        if error < tolerance:
            break
        error, p, μ, x = __iterate(A, x, p)
    return (μ, x)
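As a quick usage check (a sketch; this is the same matrix used in the simulation below, whose dominant eigenvalue is $1.1$ with eigenvector proportional to $[1, 1]^\intercal$):

A = np.array([[0.1, 1.0],
              [1.0, 0.1]])
λ, v = power_method(A)
print(λ)                      # approximately 1.1
print(v)                      # approximately [1, 1] (infinity norm 1)

# Each iteration shrinks the error by roughly |λ2/λ1| = 0.9/1.1
print(np.abs(np.linalg.eigvals(A)))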
Simulation of Power Method Converging
The following simulation shows the Power Method converging to an eigenvalue and eigenvector for the following $\mathbf{A}$ matrix and initial guess for an eigenvector.
$$\mathbf{A} = \begin{bmatrix}0.1 & 1\\1 & 0.1\end{bmatrix}, \qquad \vec{v} = \begin{bmatrix}1\\0\end{bmatrix}$$

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import matplotlib
matplotlib.use('Agg')
matplotlib.rc('animation', html='jshtml')
A = np.array([[0.1, 1],[1, 0.1]])
n = A.shape[0]
x = np.array([1.0, 0.0])
p = __find_p(x)
error = 1
x = x / x[p]
number_of_iterations = 20
x_array = [x]
λ_array = [0.0]
for _ in range(1, number_of_iterations):
error, p, μ, x = __iterate(A, x, p)
x_array.append(x)
λ_array.append(μ)
fig, ax = plt.subplots()
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
lines = []
lines.append(ax.plot(np.sqrt(1.1), np.sqrt(1.1), 'ko',label='Exact Eigenvalue')[0])
lines.append(ax.plot([], [], 'ro', label='Eigenvalue')[0])
lines.append(ax.plot([], [], 'k--', label='Eigenvector')[0])
lines.append(ax.plot([], [], 'k--')[0])
ax.legend(loc='upper left');
def animate(i):
    # Line2D.set_data expects sequences, so wrap the scalar coordinates in lists
    lines[0].set_data([np.sqrt(1.1)], [np.sqrt(1.1)])
    λ = np.sqrt(λ_array[i])
    lines[1].set_data([λ], [λ])
    # Draw the current eigenvector estimate as a line through the origin
    lines[2].set_data([0, 10 * x_array[i][0]], [0, 10 * x_array[i][1]])
    lines[3].set_data([0, -10 * x_array[i][0]], [0, -10 * x_array[i][1]])
    return
FuncAnimation(fig, animate, frames=number_of_iterations, interval=300)
Inverse Power Method
The Inverse Power Method is a modification of the power method that gives faster convergence. It is used to determine the eigenvalue of $\mathbf{A}$ that is closest to a specific number $q$.
Suppose the matrix $\mathbf{A}$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ with linearly independent eigenvectors $\vec{v}^{(1)},\ldots,\vec{v}^{(n)}$. The eigenvalues of $(\mathbf{A}-q\mathbf{I})^{-1}$, where $q \neq \lambda_i$, for $i = 1, 2, \ldots, n$, are
$$\frac{1}{\lambda_1 - q}, \frac{1}{\lambda_2 -q}, \ldots, \frac{1}{\lambda_n - q},$$If we apply the power method to $(\mathbf{A}- q\mathbf{I})^{-1}$ we have
$$\vec{y}^{(m)} = (\mathbf{A}-q\mathbf{I})^{-1} \vec{x}^{(m-1)}$$$$\mu^{(m)} = y_{p_{m-1}}^{(m)} = \frac{y_{p_{m-1}}^{(m)}}{x_{p_{m-1}}^{(m-1)}} = \frac{\sum_{j=1}^n \beta_j \frac{1}{(\lambda_j -q)^m}v_{p_{m-1}}^{(j)}}{\sum_{j=1}^n \beta_j \frac{1}{(\lambda_j-q)^{m-1}} v_{p_{m-1}}^{(j)}}$$and
$$\vec{x}^{(m)} = \frac{\vec{y}^{(m)}}{y_{p_{m}}^{(m)}}$$where, at each step, $p_m$ represents the smallest integer for which $\vert y_{p_{m}}^{(m)}\vert = \Vert\vec{y}^{(m)}\Vert_{L_\infty}$. $\mu$ will then converge to $\frac{1}{(\lambda_k - q)}$, where
$$\frac{1}{\vert\lambda_k - q\vert} = \max_{1\leq j \leq n}\left( \frac{1}{\vert\lambda_j - q\vert}\right)$$so that $\lambda_k \approx q + \frac{1}{\mu^{(m)}}$ is the eigenvalue of $\mathbf{A}$ closest to $q$.
Once $k$ is known, we can factor $\frac{1}{(\lambda_k - q)^m}$ out of the numerator and $\frac{1}{(\lambda_k - q)^{m-1}}$ out of the denominator to write

$$\mu^{(m)} = \frac{1}{\lambda_k - q}\left[\frac{\beta_k v_{p_{m-1}}^{(k)} + \sum_{\substack{j=1 \\ j\neq k}}^{n} \beta_j \left(\frac{\lambda_k - q}{\lambda_j - q}\right)^m v_{p_{m-1}}^{(j)}}{\beta_k v_{p_{m-1}}^{(k)} + \sum_{\substack{j=1 \\ j\neq k}}^{n} \beta_j \left(\frac{\lambda_k - q}{\lambda_j - q}\right)^{m-1} v_{p_{m-1}}^{(j)}}\right]$$
The choice of $q$ determines the convergence, provided that $\frac{1}{\lambda_k -q}$ is the dominant eigenvalue of $(\mathbf{A}-q\mathbf{I})^{-1}$. The closer $q$ is to the eigenvalue $\lambda_k$, the faster the convergence, which is of order

$$O\left(\left\vert\frac{\lambda_k - q}{\lambda - q}\right\vert^m\right)$$

where $\lambda$ represents the eigenvalue of $\mathbf{A}$ that is second closest to $q$.
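A quick numerical check of the shift-and-invert relationship (the matrix and shift below are chosen here for illustration): the eigenvalues of $(\mathbf{A}-q\mathbf{I})^{-1}$ are indeed $\frac{1}{\lambda_i - q}$, and the largest in magnitude corresponds to the $\lambda_i$ closest to $q$.

A = np.array([[0.1, 1.0],
              [1.0, 0.1]])     # eigenvalues 1.1 and -0.9
q = 1.0

# Eigenvalues of the shifted inverse ...
print(np.sort(np.linalg.eigvals(np.linalg.inv(A - q * np.eye(2)))))
# ... match 1 / (λ_i - q); the dominant one (10.0) comes from λ = 1.1, closest to q
print(np.sort(1.0 / (np.linalg.eigvals(A) - q)))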
The vector $\vec{y}^{(m)}$ is obtained by solving the equation,
$$(\mathbf{A}-q\mathbf{I})\vec{y}^{(m)} = \vec{x}^{(m-1)}$$This equation can be solved by any of the methods we previously learned for solving linear systems. The selection of $q$ can be based on some means of localizing an eigenvalue (e.g. the Gershgorin Circle Theorem). If we have an initial approximation of the eigenvector, $\vec{x}^{(0)}$, we can choose $q$ as
$$q = \frac{\vec{x}^{(0)T}\mathbf{A}\vec{x}^{(0)}}{\vec{x}^{(0)T} \vec{x}^{(0)}}$$

Pseudocode for Inverse Power Method
Steps | |
---|---|
1. | Initialize a starting vector, $\vec{x} = \left[x_1, x_2,\ldots, x_n\right]^\intercal$, that is not in the null space of $\mathbf{A}$ |
2. | Initialize $ERR > TOL$ |
3. | Set $q = \frac{\vec{x}^T \mathbf{A} \vec{x}}{\vec{x}^T \vec{x}}$ |
4. | Find the smallest integer $p$ with $1\leq p \leq n$ and $\vert x_p\vert = \Vert\vec{x}\Vert_{L_\infty}$ |
5. | Set $\vec{x} = \frac{\vec{x}}{x_p}$ |
6. | While $ERR > TOL$, do steps 7-12 |
7. | $\phantom{--}$ Solve $(\mathbf{A}-q\mathbf{I})\vec{y} = \vec{x}$ |
8. | $\phantom{--}$ $\mu = y_p$ |
9. | $\phantom{--}$ Find the smallest integer $p$ with $1 \leq p \leq n$ and $\vert y_p\vert = \Vert\vec{y}\Vert_{L_\infty}$ |
10. | $\phantom{--}$ Set $ERR = \left\Vert\vec{x}-\left(\frac{\vec{y}}{y_p}\right)\right\Vert_{L_\infty}$ |
11. | $\phantom{--}$ Set $\vec{x} = \frac{\vec{y}}{y_p}$ |
12. | $\phantom{--}$ Set $\mu = \left(\frac{1}{\mu}\right) + q$ |
def inverse_power_method(A, tolerance=1e-10, max_iterations=10000):
    """Approximate the eigenvalue of A closest to the Rayleigh-quotient shift q."""
    n = A.shape[0]
    x = np.ones(n)
    I = np.eye(n)
    # Choose the shift q as the Rayleigh quotient of the initial vector
    q = np.dot(x, np.dot(A, x)) / np.dot(x, x)
    p = __find_p(x)
    error = 1
    x = x / x[p]
    for _ in range(max_iterations):
        if error < tolerance:
            break
        # One power-method step applied to (A - qI)^{-1}, implemented by
        # solving the linear system rather than forming the inverse
        y = np.linalg.solve(A - q * I, x)
        μ = y[p]
        p = __find_p(y)
        error = np.linalg.norm(x - y / y[p], np.inf)
        x = y / y[p]
        # Recover the eigenvalue of A from the eigenvalue of (A - qI)^{-1}
        μ = 1. / μ + q
    return (μ, x)
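A quick usage check (a sketch; the $3\times 3$ matrix below is chosen here for illustration, and its eigenvalues are $3$ and $3 \pm \sqrt{3}$):

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
λ, v = inverse_power_method(A)
# The Rayleigh quotient of the all-ones starting vector is q = 13/3 ≈ 4.33,
# so the method converges to the closest eigenvalue, 3 + √3 ≈ 4.732
print(λ)
print(np.linalg.eigvals(A))    # cross-check against numpy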
There are several ways of determining approximations to the other eigenvalues once a dominant eigenvalue is known. A class of techniques called deflation techniques involves forming a new matrix $\mathbf{B}$ whose eigenvalues are identical to those of $\mathbf{A}$, except that the dominant eigenvalue of $\mathbf{A}$ is replaced with $0$. A series of further steps then allows you to find the remaining eigenvalues. One popular deflation technique is Wielandt deflation. Deflation techniques tend to be very susceptible to roundoff error, so in practice they are usually only used to approximate the initial guess, $q$, for the Inverse Power Method. When all the eigenvalues of a matrix are required, there are better methods for approximating them. We will therefore forgo a detailed discussion of deflation techniques and move on to these other methods, which involve similarity transformations.
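As a brief illustration of the deflation idea only (a minimal sketch following the standard Wielandt construction; the helper function, index choice, and example matrix are our own):

def wielandt_deflation(A, λ1, v1):
    # Standard Wielandt construction: pick an index i with v1[i] != 0 (here the
    # largest-magnitude component), build x from the i-th row of A, and form
    # B = A - λ1 * v1 x^T, which has the eigenvalues of A with λ1 replaced by 0
    i = np.argmax(np.abs(v1))
    x = A[i, :] / (λ1 * v1[i])
    return A - λ1 * np.outer(v1, x)

# Deflating the 2 x 2 example used earlier removes its dominant eigenvalue 1.1
A = np.array([[0.1, 1.0],
              [1.0, 0.1]])
λ1, v1 = power_method(A)
B = wielandt_deflation(A, λ1, v1)
print(np.linalg.eigvals(B))    # expected: approximately 0 and -0.9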