Search
LU Factorization

Any non-singular matrix $\mathbf{A}$ can be factored into a lower triangular matrix $\mathbf{L}$, and upper triangular matrix $\mathbf{U}$ using procedures we have already established with Gaussian elimination. This proves very useful for numerical computation and is, in fact, one of the most common ways most packaged linear algebra solvers solve non-sparse linear systems.

Previously, we learned that by using Gaussian elimination we can solve the linear system $\mathbf{A}\vec{x} = \vec{b}$ in $O\left(\frac{1}{3}n^3\right)$ arithmetic operations to determine $\vec{x}$. It turns out if $\mathbf{A}$ has the form $\mathbf{A=LU}$ we can solve for $\vec{x}$ using a two step process. First we let $\vec{y}=\mathbf{U}\vec{x}$ and solve the system for $L\vec{y}=\vec{b}$ for $\vec{y}$. Since $\mathbf{L}$ is lower triangular we use a forward substitution process that only takes $O\left(n^2\right)$ operations. Once $\vec{y}$ is known, the upper triangular system $\mathbf{U}\vec{x}=\vec{y}$ can be solved with back substitution in $O\left(n^2\right)$ operations. Therefore using this procedure we can reduce the solution of $ \mathbf{A}\vec{x} = \vec{b}$ from $O\left(\frac{1}{3}n^3\right)$ to $O\left(2n^2\right)$ operations. For large systems this can reduce the calculation time by more than 95%. Determining the $\mathbf{L}$ and $\mathbf{U}$ matrices still takes $O\left(\frac{1}{3}n^3\right)$, but it only has to be done for a single $\mathbf{A}$ matrix and can be used efficiently to solve for $\vec{x}$ with many right-hand side vectors, $\vec{b}$.

We already know one way to transform $\mathbf{A}$ into $\mathbf{U}$ by using Gaussian elimination. We will leave the proof for linear algebra class, but we find $\mathbf{L}$ by performing negations of the same operations on the identity matrix in reverse order. For example when going $\mathbf{A}\rightarrow \mathbf{U}$ if we perform the row operation

$$ E_i - m_{ji}E_j\rightarrow E_i $$

to undo this and return to $\mathbf{A}$ we would perform

$$ E_i+m_{ji}E_j\rightarrow E_i $$

by performing the second operation on the identity matrix with the same dimensions as $\mathbf{A}$ we will end up with an $\mathbf{L}$. I emphasize an here because this is only one way to decompose $\mathbf{A}$ into $\mathbf{L}$ and $\mathbf{U}$, there are other methods and an infinite number of $\mathbf{L}$ and $\mathbf{U}$ matrices, i.e. they are not unique.

$\mathbf{LU}$ Example

We will look at a simple example and write out the row operations.

$$ \mathbf{A} = \begin{bmatrix}1&1&0\\2&1&-1\\3&-1&-1\end{bmatrix} \xrightarrow{E_2-2E_1} \begin{bmatrix}1&1&0\\0&-1&-1\\3&-1&-1\end{bmatrix} \xrightarrow{E_3-3E_1} \begin{bmatrix}1&1&0\\0&-1&-1\\0&-4&-1\end{bmatrix} \xrightarrow{E_3-4E_2} \begin{bmatrix}1&1&0\\0&-1&-1\\0&0&3\end{bmatrix} = \mathbf{U} $$
$$ \mathbf{I} = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} \xrightarrow{E_3+4E_2} \begin{bmatrix}1&0&0\\0&1&0\\0&4&1\end{bmatrix} \xrightarrow{E_3+3E_1} \begin{bmatrix}1&0&0\\0&1&0\\3&4&1\end{bmatrix} \xrightarrow{E_2+2E_1} \begin{bmatrix}1&0&0\\2&1&0\\3&4&1\end{bmatrix} = \mathbf{L} $$

Let's check our results with Python

L = np.array([[1,0,0],[2,1,0],[3,4,1]])
U = np.array([[1,1,0],[0,-1,-1],[0,0,3]])
L @ U
array([[ 1,  1,  0],
       [ 2,  1, -1],
       [ 3, -1, -1]])

Notice that the entries in $\mathbf{L}$ below the diagonal are simply the $m_{ij}$'s therefore we do not actually have to perform the row operations we can simply insert the $m_{ij}$ components into their proper place and move forward.

Psuedocode for a simple $\mathbf{LU}$ factorization

Consider the matricies:

\begin{equation} \mathbf{A} = \begin{pmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &a_{2n}\\\vdots& &\ddots& \vdots\\a_{n1}&...&...&a_{nn}\end{pmatrix}, \mathbf{L} = \begin{pmatrix}l_{11}&l_{12}&\ldots&l_{1n}\\l_{21}&\ddots& &l_{2n}\\\vdots& &\ddots& \vdots\\l_{n1}&...&...&l_{nn}\end{pmatrix},\\ \mathbf{U} = \begin{pmatrix}u_{11}&u_{12}&\ldots&u_{1n}\\u_{21}&\ddots& &u_{2n}\\\vdots& &\ddots& \vdots\\u_{n1}&...&...&u_{nn}\end{pmatrix} \end{equation}
Steps
1. Initialize $\mathbf{L}$ to an identity matrix, $\mathbf{I}$ of dimension $n\times n$ and $\mathbf{U = A}$.
2. For $i = 1, \ldots, n$ do Step 3
3. $\phantom{--}$ For $j=i+1, \ldots, n$ do Steps 4-5
4. $\phantom{----}$ Set $l_{ji}=u_{ji}/u_{ii}$
5. $\phantom{----}$ Perform $U_j = (U_j-l_{ji}U_i)$ (where $U_i, U_j$ represent the $i$ and $j$ rows of the matrix $\mathbf{U}$, respectively)

Python / NumPy implementation of a simple $\mathbf{LU}$ factorization

def lu(A):
    
    #Get the number of rows
    n = A.shape[0]
    
    U = A.copy()
    L = np.eye(n, dtype=np.double)
    
    #Loop over rows
    for i in range(n):
            
        #Eliminate entries below i with row operations 
        #on U and reverse the row operations to 
        #manipulate L
        factor = U[i+1:, i] / U[i, i]
        L[i+1:, i] = factor
        U[i+1:] -= factor[:, np.newaxis] * U[i]
        
    return L, U

Psuedocode for a simple $\mathbf{PLU}$ factorization

If we observe the pseudocode presented previously we can see that something as simple as $a_{ii}=0$ will cause our algorithm to fail. We can prevent this from happening by doing a row exchange with a row that has a nonzero value in the $i^\mathrm{th}$ column. Like Gauss elimination there are several pivot strategies that can be utilized. For simplicity, we will exchange any row that has a zero on the diagonal with the first row below it that has a nonzero number in that column. Here we have to keep track of the row exchanges by creating a permutation matrix, $\mathbf{P}$ by performing the same row exchanges to an identity matrix. This gives a matrix with precisely one nonzero entry in each row and in each column and each nonzero entry is 1. We will need $\mathbf{P}$ later, when we use the $\mathbf{LU}$ factorization to solve the permuted system of equations, $\mathbf{P A}\vec{x} = \mathbf{P} \vec{b}$.

Consider the matrices:

\begin{equation} \mathbf{A} = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &a_{2n}\\\vdots& &\ddots& \vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \mathbf{L} = \begin{bmatrix}l_{11}&l_{12}&\ldots&l_{1n}\\l_{21}&\ddots& &l_{2n}\\\vdots& &\ddots& \vdots\\l_{n1}&\ldots&\ldots&l_{nn}\end{bmatrix}, \\ \mathbf{U} = \begin{bmatrix}u_{11}&u_{12}&\ldots&u_{1n}\\u_{21}&\ddots& &u_{2n}\\\vdots& &\ddots& \vdots\\u_{n1}&\ldots&\ldots&u_{nn}\end{bmatrix}, \mathbf{P} = \begin{bmatrix}p_{11}&p_{12}&\ldots&p_{1n}\\p_{21}&\ddots& &p_{2n}\\\vdots& &\ddots& \vdots\\p_{n1}&\ldots&\ldots&p_{nn}\end{bmatrix} \end{equation}
Steps
1. Initialize $\mathbf{L = P = I}$ of dimension $n \times n$ and $\mathbf{U = A}$
2. For $i = 1, \ldots, n$ do Steps 3-4, 8
3. $\phantom{--}$ Let $k = i$,
4. $\phantom{--}$ While $u_{ii}=0$, do Steps 5-7
5. $\phantom{----}$ Swap row $U_i$ with row $U_{ k+1 }$
6. $\phantom{----}$ Swap row $P_i$ with row $P_{ k+1 }$
7. $\phantom{----}$ Increment $k$ by $1$.
8. $\phantom{--}$ For $j = i+1, \ldots, n$ do Steps 9-10
9. $\phantom{----}$ Set $l_{ji} = u_{ ji }/ u_{ ii }$
10. $\phantom{----}$ Perform $U_{ j }=U_{j} - l_{ji} U_{ i }$ (where $U_i, U_j$ represent the $i$ and $j$ rows of the matrix $\mathbf{U}$, respectively)

Python/NumPy implementation of $\mathbf{PLU}$ decomposition

def plu(A):
    
    #Get the number of rows
    n = A.shape[0]
    
    #Allocate space for P, L, and U
    U = A.copy()
    L = np.eye(n, dtype=np.double)
    P = np.eye(n, dtype=np.double)
    
    #Loop over rows
    for i in range(n):
        
        #Permute rows if needed
        for k in range(i, n): 
            if ~np.isclose(U[i, i], 0.0):
                break
            U[[k, k+1]] = U[[k+1, k]]
            P[[k, k+1]] = P[[k+1, k]]
            
        #Eliminate entries below i with row 
        #operations on U and #reverse the row 
        #operations to manipulate L
        factor = U[i+1:, i] / U[i, i]
        L[i+1:, i] = factor
        U[i+1:] -= factor[:, np.newaxis] * U[i]
        
    return P, L, U

The SciPy function scipy.linalg.lu performs a $\mathbf{PLU}$ decomposition. However, we can't compare our implementation to SciPy's in general, because the SciPy implementation uses a slightly different strategy which could result in a different (but still correct) decomposition.

Solving equations after $\mathbf{LU}$ factorization

Once we have $\mathbf{L}$ and $\mathbf{U}$ we can solve for as many right-hand side vectors $\vec{b}$ as desired very quickly using the following two step process. First we let $\vec{y}=\mathbf{U}\vec{x}$ and then solve for $\mathbf{L}\vec{y}=\vec{b}$ for $\vec{y}$ by using forward substitution. The pseudocode for this is as follows

Psuedocode for forward substitution

Steps
1. Set $y_1 = b_1/l_{11}$
2. For $i=2, 3, \ldots n$ do Step 3
3. $$\phantom{--} y_i = \left.\left(b_i - \sum_{j=1}^{i-1} l_{ij}y_j \right)\middle/l_{ii} \right. $$

Python/NumPy implementation of forward substitution

def forward_substitution(L, b):
    
    #Get number of rows
    n = L.shape[0]
    
    #Allocating space for the solution vector
    y = np.zeros_like(b, dtype=np.double);
    
    #Here we perform the forward-substitution.  
    #Initializing  with the first row.
    y[0] = b[0] / L[0, 0]
    
    #Looping over rows in reverse (from the bottom  up),
    #starting with the second to last row, because  the 
    #last row solve was completed in the last step.
    for i in range(1, n):
        y[i] = (b[i] - np.dot(L[i,:i], y[:i])) / L[i,i]
        
    return y

Then solve $\mathbf{U}\vec{x} = \vec{y}$ for $\vec{x}$ using backward substitution. The psuedocode for this is

Pseudocode Back substitution

Steps
1. Set $x_n = y_n / u_{nn}$
2. For $i = n-1, n-2, ..., 1$ do Step 3.
3. $$\phantom{--} x_i = \left. \left(y_i - \sum_{j=i+1}^{n} u_{ij} x_j\right)\middle/u_{ii}\right.$$

Python/NumPy implementation of backward substitution

def back_substitution(U, y):
    
    #Number of rows
    n = U.shape[0]
    
    #Allocating space for the solution vector
    x = np.zeros_like(y, dtype=np.double);

    #Here we perform the back-substitution.  
    #Initializing with the last row.
    x[-1] = y[-1] / U[-1, -1]
    
    #Looping over rows in reverse (from the bottom up), 
    #starting with the second to last row, because the 
    #last row solve was completed in the last step.
    for i in range(n-2, -1, -1):
        x[i] = (y[i] - np.dot(U[i,i:], x[i:])) / U[i,i]
        
    return x

Putting everything together for a solution process

def lu_solve(A, b):
    
    L, U = lu(A)
    
    y = forward_substitution(L, b)
    
    return back_substitution(U, y)

Testing the code

A = np.array([[1, 4, 5], [6, 8, 22], [32, 5., 5]])
b = np.array([1, 2, 3.])

lu_solve(A, b)
array([ 0.05614973,  0.25935829, -0.01871658])

and comparing to NumPy's np.lingalg.solve

np.linalg.solve(A, b)
array([ 0.05614973,  0.25935829, -0.01871658])

For a $\mathbf{PLU}$ factorization would have the additional step of permuting the right-hand side vector $\vec{b} = \mathbf{P}\vec{b}$ before doing the forward substitution to solve for $\vec{y}$.

def plu_solve(A, b):
    
    P, L, U = plu(A)
    
    y = forward_substitution(L, np.dot(P, b))
    
    return back_substitution(U, y)

Testing our implementation

A = np.array([[0, 4, 5], [6, 8, 22], [32, 5., 5]])
b = np.array([1, 2, 3.])

plu_solve(A, b)
array([ 0.05363985,  0.2835249 , -0.02681992])

Comparing to the NumPy np.linalg.solve function (which also uses the PLU factorization method)

np.linalg.solve(A, b)
array([ 0.05363985,  0.2835249 , -0.02681992])

$\mathbf{LU}$ factorization, another look

Let's consider a generic $\mathbf{LU}$ factorization, for simplicity we will consider a set of $3 \times 3$ matrices, but these ideas will apply in the $n\times n$ case as well

\begin{equation} \mathbf{LU} = \begin{bmatrix}l_{11}&0&0\\l_{21}&l_{22}&0\\l_{31}&l_{32}&l_{33}\end{bmatrix} \begin{bmatrix}u_{11}&u_{12}&u_{13}\\0&u_{22}&u_{23}\\0&0&u_{33}\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} = \mathbf{A} \end{equation}

If we multiply the two matrices on the left together, we have

\begin{equation} \begin{bmatrix}l_{11}u_{11}&l_{11}u_{12}&l_{11}u_{13}\\l_{21}u_{11}&l_{21}u_{12}+l_{22}u_{22}&l_{21}u_{13}+l_{22}u_{23}\\l_{31}u_{11}&l_{31}u_{12}+l_{32}u_{22}&l_{31}u_{13}+l_{32}u_{23}+l_{33}u_{33}\end{bmatrix} = \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix} \end{equation}

Now if we take the same approach as before where the $l_{ii}$'s are the 1's we can solve the first row of equations trivially, namely

$$u_{11} = a_{11}, \quad u_{12} = a_{12}, \quad u_{13} = a_{13},$$

then we have enough information to solve the rest of the first column,

$$l_{21} = a_{21}/a_{11}, \quad l_{31}=a_{31}/a_{11},$$

and the rest of the second row,

$$u_{22} = (a_{22}-a_{21}^{2}/a_{11}), \quad u_{23} = (a_{23}-a_{21}a_{23}/a_{11}),$$

etc.

When this procedure is generalized it is known as the Doolittle alogrithm. There is a similar procedure known as Crout's method that uses the 1's on the diagonal of the $\mathbf{U}$ matrix. The generalization of these methods will be shown in the sequel described in their pseudocode.

Pseudocode for Doolittle algorithm.

Consider the matrices

\begin{equation} \mathbf{A} = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &a_{2n}\\\vdots& &\ddots& \vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \mathbf{L} = \begin{bmatrix}l_{11}&l_{12}&\ldots&l_{1n}\\l_{21}&\ddots& &l_{2n}\\\vdots& &\ddots& \vdots\\l_{n1}&\ldots&\ldots&l_{nn}\end{bmatrix}, \\ \mathbf{U} = \begin{bmatrix}u_{11}&u_{12}&\ldots&u_{1n}\\u_{21}&\ddots& &u_{2n}\\\vdots& &\ddots& \vdots\\u_{n1}&\ldots&\ldots&u_{nn}\end{bmatrix} \end{equation}
Steps
1. For $k = 1, 2, \ldots, n$ do Steps 2-3, 5
2. $\phantom{--}$ Set $l_{kk}=1$
3. $\phantom{--}$ For $j=k, k+1, \ldots, n$ do Step 4
4. $$\phantom{----} u_{kj} = a_{kj} - \sum_{m=1}^{k-1}l_{km}u_{mj}$$
5. $\phantom{--}$ For $i=k+1, k+2, \ldots, n$ do Step 6
6. $$\phantom{----} l_{ik}=\left.\left(a_{ik}-\sum_{m=1}^{k-1}l_{im}u_{mk}\right) \middle/u_{kk}\right.$$
def doolittle(A):
    
    n = A.shape[0]
    
    U = np.zeros((n, n), dtype=np.double)
    L = np.eye(n, dtype=np.double)
    
    for k in range(n):
        
        U[k, k:] = A[k, k:] - L[k,:k] @ U[:k,k:]
        L[(k+1):,k] = (A[(k+1):,k] - L[(k+1):,:] @ U[:,k]) / U[k, k]
    
    return L, U
A = np.array([[1, 4, 5], [6, 8, 22], [32, 5., 5]])
L, U = doolittle(A)
display(Latex('$\mathbf A =\phantom${}'.format(A)))
display(Latex('$\mathbf  =\phantom${}'.format(L @ U)))
$\mathbf A =\phantom{.}$[[ 1. 4. 5.] [ 6. 8. 22.] [32. 5. 5.]]
$\mathbf {LU} =\phantom{.}$[[ 1. 4. 5.] [ 6. 8. 22.] [32. 5. 5.]]

Pseudocode for Crout algorithm

Consider the matrices

\begin{equation} \mathbf{A} = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &a_{2n}\\\vdots& &\ddots& \vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \mathbf{L} = \begin{bmatrix}l_{11}&l_{12}&\ldots&l_{1n}\\l_{21}&\ddots& &l_{2n}\\\vdots& &\ddots& \vdots\\l_{n1}&\ldots&\ldots&l_{nn}\end{bmatrix}, \\ \mathbf{U} = \begin{bmatrix}u_{11}&u_{12}&\ldots&u_{1n}\\u_{21}&\ddots& &u_{2n}\\\vdots& &\ddots& \vdots\\u_{n1}&\ldots&\ldots&u_{nn}\end{bmatrix} \end{equation}
Steps
1. For $k = 1, 2, ..., n$ do Steps 2-3, 5
2. $\phantom{--}$ Set $l_{kk} = a_{kk}-\sum_{m=1}^{k-1}l_{km}u_{mk}$
3. $\phantom{--}$ For $j = k, k+1, \ldots n$ do Step 4
4. $$\phantom{----} u_{kj} = \left.\left(a_{kj}-\sum_{m=1}^{k-1}l_{km}u_{mj}\right)\middle/l_{kk}\right.$$
5. $\phantom{--}$ For $i = k+1, k+2, \ldots, n$ do Step
6. $$\phantom{----} l_{ik} = \left.\left(a_{ik}-\sum_{m=1}^{k-1}l_{im}u_{mk}\right) \middle/ u_{kk}\right.$$
def crout(A):
    
    n = A.shape[0]
    
    U = np.zeros((n, n), dtype=np.double)
    L = np.zeros((n, n), dtype=np.double)
    
    for k in range(n):
        
        L[k, k] = A[k, k] - L[k, :] @ U[:, k]
        
        U[k, k:] = (A[k, k:] - L[k, :k] @ U[:k, k:]) / L[k, k]
        L[(k+1):, k] = (A[(k+1):, k] - L[(k+1):, :] @ U[:, k]) / U[k, k]
    
    return L, U
L, U = crout(A)
display(Latex('$\mathbf A =\phantom${}'.format(A)))
display(Latex('$\mathbf  =\phantom${}'.format(L @ U)))
$\mathbf A =\phantom{.}$[[ 1. 4. 5.] [ 6. 8. 22.] [32. 5. 5.]]
$\mathbf {LU} =\phantom{.}$[[ 1. 4. 5.] [ 6. 8. 22.] [32. 5. 5.]]

Pseudocode for Cholesky decomposition

If matrix $\mathbf{A}$ is symmetric and positive definite, then there exists a lower triangular matrix $\mathbf{L}$ such that $\mathbf{A=LL}^\intercal$. This is just a special case of the $\mathbf{LU}$ decomposition, $\mathbf{U=L}^\intercal$. The algorithm is slightly simpler than the Doolittle or Crout methods.

Consider the matrices

\begin{equation} \mathbf{A} = \begin{bmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\a_{21}&\ddots& &a_{2n}\\\vdots& &\ddots& \vdots\\a_{n1}&\ldots&\ldots&a_{nn}\end{bmatrix}, \mathbf{L} = \begin{bmatrix}l_{11}&l_{12}&\ldots&l_{1n}\\l_{21}&\ddots& &l_{2n}\\\vdots& &\ddots& \vdots\\l_{n1}&\ldots&\ldots&l_{nn}\end{bmatrix}, \end{equation}
Steps
1. For $k=1,2,\ldots n$ do Steps 2-3
2. $\phantom{--}$ Set $l_{kk} = \sqrt{a_{kk}-\sum_{m=1}^{k-1}l_{km}^2}$
3. $\phantom{--}$ For $i=k+1,k+2,\ldots, n$ do Step 4.
4. $$\phantom{----} l_{ik}=\left.\left(a_{ik}-\sum_{m=1}^{k-1}l_{im}l_{mk}\right) \middle / l_{kk}\right.$$
def cholesky(A):
    
    n = A.shape[0]
    
    L = np.zeros((n, n), dtype=np.double)
    
    for k in range(n):
        
        L[k, k] = np.sqrt(A[k, k] - np.sum(L[k, :] ** 2))
        
        L[(k+1):, k] = (A[(k+1):, k] - L[(k+1):, :] @ L[:, k]) / L[k, k]
    
    return L

Testing the implementation

A = np.array([[2, -1, 0],[-1, 2, -1.], [0, -1, 2.]])
L = cholesky(A)
display(Latex('$\mathbf L =\phantom${}'.format(L)))
display(Latex('$\mathbf A = \mathbf  =\phantom${}'.format(L @ L.T)))
$\mathbf L =\phantom{.}$[[ 1.41421356 0. 0. ] [-0.70710678 1.22474487 0. ] [ 0. -0.81649658 1.15470054]]
$\mathbf A = \mathbf {L L^\intercal} =\phantom{.}$[[ 2. -1. 0.] [-1. 2. -1.] [ 0. -1. 2.]]

Testing with the built in numpy.linalg.cholesky function

L = np.linalg.cholesky(A)
display(Latex('$\mathbf L =\phantom${}'.format(L)))
$\mathbf L =\phantom{.}$[[ 1.41421356 0. 0. ] [-0.70710678 1.22474487 0. ] [ 0. -0.81649658 1.15470054]]

Solving for the inverse of $\mathbf A$ with the $\mathbf{LU}$ decomposition

Once the $\mathbf{LU}$ decomposition of $\mathbf{A}$ is complete it is straightforward to find the inverse of $\mathbf{A}$, using the same forward and backward substitution process we used when solving for an arbitrary right hand side vector $\vec{b}$. Except we will do the procedure $n$ times, where $n$ is the number of columns of $\mathbf{A}$ and

$$\vec{b} = \left[\vec{b}_1, \vec{b}_2, \ldots, \vec{b}_n\right] = \mathbf{I}$$

are the columns of the identity matrix. Stated differently, we will use the columns of the identity matrix as individual right-hand side vectors, $\vec{b}$, in order to solve for the inverse column-by-column.

Pseudocode for inverse solve after $\mathbf{LU}$ decomposition

Steps
1. Set $\vec{b} = \mathbf{I}$ with dimension $n \times n$
2. For $k = 1, 2, \ldots, n$ do Steps 3-4, 6-7
3. $\phantom{--}$ Set $y_{1k}=\frac{b_{1k}}{l_{11}}$
4. $\phantom{--}$ For $i = 2, 3, \ldots n$ do Step 5.
5. $$\phantom{----} y_{ik} = \left.\left(b_{ik}-\sum\nolimits_{j=1}^{i-1}l_{ij}y_{jk} \right) \middle/ l_{ii} \right.$$
6. $\phantom{--}$ Set $x_{nk} = \frac{y_{nk}}{u_{nn}}$
7. $\phantom{--}$ For $i = n-1, n-2, \ldots, 1$ do Step 8
8. $$\phantom{----} x_{ik}= \left.\left(y_{ik}-\sum_{j=i+1}^{n}u_{ij}x_{jk}\right) \middle/ u_{ii}\right.$$
def plu_inverse(A):
    
    n = A.shape[0]
    
    b = np.eye(n)
    Ainv = np.zeros((n, n))
    
    P, L, U = plu(A)
    
    for i in range(n):
        
        y = forward_substitution(L, np.dot(P, b[i, :]))
        
        Ainv[i, :] = back_substitution(U, y)
        
    return Ainv

Testing the plu_inverse implementation

plu_inverse(A)
array([[0.75, 0.5 , 0.25],
       [0.5 , 1.  , 0.5 ],
       [0.25, 0.5 , 0.75]])

and comparing with the built in NumPy matrix inverse function np.linalg.inv

np.linalg.inv(A)
array([[0.75, 0.5 , 0.25],
       [0.5 , 1.  , 0.5 ],
       [0.25, 0.5 , 0.75]])

Determinant of a Matrix

You might recall from linear algebra that there are several ways of computing the determinant of a matrix (e.g. Leibniz formula, Laplace formula, Cramer's rule, etc.); however, none of these are as computationally efficient as using the $\mathbf{LU}$ decomposition and a few properties of determinants to solve for the determinate of a matrix $\mathbf{A}$. Recall,

For $\mathbf{A = LU}, \implies \det(\mathbf{A}) = \det(\mathbf{L})\det(\mathbf{U})$

also for an upper (or lower) triangular matrix. The determinate of the matrix is simply the product of the diagonal entries. Therefore, if we solve for $\mathbf{L}$ and $\mathbf{U}$ using the Doolittle method, where there are 1's on the diagonal of the $\mathbf{L}$ matrix, then the determinate of $\mathbf{L}$ is 1. Thus,

$$\det(\mathbf{A}) = 1 \cdot \det(\mathbf{U}) = \prod_{j=1}^{n}u_{jj}$$

Similarly, for a $\mathbf{PLU}$ decomposition,

$$\det(\mathbf{A}) = \det(\mathbf{P})\det(\mathbf{L})\det(\mathbf{U}) = \det(\mathbf{P})\cdot\prod_{j=1}^{n}u_{jj}$$

but, $\mathbf{P}$ is just a permutation of the identity matrix. Let's observe what happens when we do five semi-random row permutations of the identity matrix and calculate the determinate of $\mathbf{P}$ each time.

P = np.eye(5)
P[[0, 1]] = P[[1, 0]]
print(np.linalg.det(P))

P[[1, 2]] = P[[2, 1]]
print(np.linalg.det(P))

P[[0, 4]] = P[[4, 0]]
print(np.linalg.det(P))

P[[1, 3]] = P[[3, 1]]
print(np.linalg.det(P))

P[[4, 1]] = P[[1, 4]]
print(np.linalg.det(P))
-1.0
1.0
-1.0
1.0
-1.0

Therefore all we need to do is keep track of the number of row permutations and the $\det(\mathbf{P})$ will have the following properties

$$ \det(P) = \begin{cases} 1, & \textrm{for even number of permutations} \\ -1,& \textrm{for odd number of permutations} \end{cases} $$

Python/NumPy implementation of $\mathbf{PLU}$ determinate

First we modify our plu function from before to only return $\mathbf{U}$ and the number of permutations of $\mathbf{P}$.

def plu_for_det(A):
    
    #Get the number of rows
    n = A.shape[0]
    
    U = A.copy()
    L = np.eye(n, dtype=np.double)
    P = np.eye(n, dtype=np.double)
    
    number_of_permutations = 0
    
    #Loop over rows
    for i in range(n):
        
        for k in range(i, n): 
            if ~np.isclose(U[i, i], 0.0):
                break
            U[[k, k+1]] = U[[k+1, k]]
            P[[k, k+1]] = P[[k+1, k]]
            number_of_permutations += 1
            
        #Eliminate entries below i with row operations on U and
        #reverse the row operations to manipulate L
        factor = U[i+1:, i] / U[i, i]
        L[i+1:, i] = factor
        U[i+1:] -= factor[:, np.newaxis] * U[i]
        
    return U, number_of_permutations

Now we can easily implement the determinate calculation

def plu_det(A):
    
    U, number_of_permutations = plu_for_det(A)
    
    if number_of_permutations % 2 == 0:
        return np.diagonal(U).prod()
    else:
        return -np.diagonal(U).prod()

Checking our implementation

plu_det(A)
4.0

Comparing against np.linalg.det

np.linalg.det(A)
4.0