Computation of Eigenvalues and Eigenvectors

For a small matrix \(A\), we usually compute its eigenvalues by solving the characteristic equation \(\det(A-\lambda I) = 0\). In practice, however, this approach is far too expensive for large matrices. Moreover, in many cases we only need the largest and/or smallest eigenvalues (or just a few of them), so it would be overkill to always compute all eigenvalues.
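
For example, for \(A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\) the characteristic equation is \(\det(A-\lambda I) = (2-\lambda)^{2} - 1 = \lambda^{2} - 4\lambda + 3 = 0\), giving \(\lambda_{1} = 3\) and \(\lambda_{2} = 1\). For an \(n \times n\) matrix this becomes a degree-\(n\) polynomial, which is impractical to form and solve when \(n\) is large.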

*Singular value decomposition is closely linked to eigenvalues and eigenvectors (the singular values of \(A\) are the square roots of the eigenvalues of \(A^{\top}A\)), so methods for eigenvalues and eigenvectors are also methods for SVD.
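
As a quick illustration of that link, here is a minimal NumPy sketch (the example matrix is arbitrary, not from the notes): the singular values of \(A\) are recovered as the square roots of the eigenvalues of \(A^{\top}A\).

```python
import numpy as np

# Arbitrary example matrix (not from the text); any real matrix works here.
A = np.array([[3.0, 1.0],
              [2.0, 2.0],
              [0.0, 1.0]])

# Eigenvalues of the symmetric matrix A^T A (returned in ascending order).
w = np.linalg.eigvalsh(A.T @ A)

print(np.sqrt(w[::-1]))                    # square roots, in descending order
print(np.linalg.svd(A, compute_uv=False))  # singular values of A -- the same numbers
```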

Iterative methods

There are iterative methods for linear systems of equations, nonlinear equations, eigenvalue/eigenvector problems, and optimization. For eigenvalues and eigenvectors, iterative methods are the only practical option:

  1. A process that generates a sequence of approximations: \(v^{(1)}, v^{(2)}, v^{(3)}, \cdots\)

  2. If the sequence converges, the values get closer and closer to the true value (otherwise it diverges)

  3. Each step is based on the values in previous step

  4. Typically an initial value (guess) is needed to start the process

  5. The iterations are stopped when the values are good enough

The power method

The steps of the power method are as follows (a short code sketch is given right after the list):

  1. Start with \(v_{0}=(1, 1, \cdots)^{\top}\); at each step compute \(v_{t+1}=Av_{t}\) and normalize \(v_{t+1}\)

  2. Continue the iteration in (1) until the elements of \(v\) no longer change (to within a tolerance)

  3. Normalize the final vector \(v_u\) with the 2-norm to get the eigenvector, and compute the corresponding largest eigenvalue via the Rayleigh quotient \(\lambda = \frac{v_{u}^{\top}(Av_u)}{v_{u}^{\top}v_u}=v_{u}^{\top}Av_u\) (note: if \(v_u\) is normalized, \(v_{u}^{\top}v_u=1\))
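
A minimal NumPy sketch of these steps; the function name, tolerance, stopping rule, and test matrix are illustrative choices, not part of the original notes:

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Power method sketch: returns (dominant eigenvalue, unit eigenvector)."""
    v = np.ones(A.shape[0])             # step 1: start with v0 = (1, 1, ...)^T
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        w = A @ v                       # v_{t+1} = A v_t
        w /= np.linalg.norm(w)          # normalize with the 2-norm
        # step 2: stop when v no longer changes (compare up to sign, since a
        # negative dominant eigenvalue flips the direction at every step)
        if min(np.linalg.norm(w - v), np.linalg.norm(w + v)) < tol:
            v = w
            break
        v = w
    lam = v @ (A @ v)                   # step 3: Rayleigh quotient, v^T v = 1
    return lam, v

# Illustrative test matrix (not from the text):
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
print(power_method(A))                  # approx. (4.618..., dominant eigenvector)
```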

The mathematical theory behind this method:

  1. Start from an initial guess \(x^{(0)}\). If the matrix \(A\) is diagonalizable, then it has \(n\) linearly independent eigenvectors: \(v_{1}, v_{2}, \cdots, v_{n}\)

  2. \(x^{(0)}\) can be expressed as \(x^{(0)}=c_{1}v_{1}+c_{2}v_{2}+ \cdots +c_{n}v_{n}\)

  3. Multiply by \(A\Rightarrow x^{(1)} = A x^{(0)} =A\left[c_{1} v_{1}+c_{2} v_{2}+\cdots+c_{n} v_{n}\right]\)
    \(\quad \quad \quad \quad \quad \quad \quad \quad \enspace=c_{1} A v_{1}+c_{2} A v_{2}+\cdots+c_{n} A v_{n}\)
    \(\quad \quad \quad \quad \quad \quad \quad \quad \enspace=c_{1} \lambda_{1} v_{1}+c_{2} \lambda_{2} v_{2}+\cdots+c_{n} \lambda _{n} v_{n}\)
    Normalize the new iterate: \(x^{(1)} \leftarrow \frac{x^{(1)}}{\parallel x^{(1)} \parallel _2}\)

  4. Multiply by \(A\) again \(\Rightarrow x^{(2)} = A^{2} x^{(0)} =c_{1} \lambda_{1} A v_{1}+c_{2} \lambda_{2} A v_{2}+\cdots+c_{n} \lambda _{n} A v_{n}\)
    \(\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad =c_{1} {\lambda_{1}}^{2} v_{1}+c_{2} {\lambda_{2}}^{2} v_{2}+\cdots+c_{n} {\lambda _{n}}^{2} v_{n}\)
    In general, after \(k\) steps \(\Rightarrow x^{(k)} = A^{k}x^{(0)}=c_{1} {\lambda_{1}}^{k} v_{1}+c_{2} {\lambda_{2}}^{k} v_{2}+\cdots+c_{n} {\lambda _{n}}^{k} v_{n}\)
    Normalize the new \(x^{(k)}\) again

  5. Assume \(\lambda _{1}\) is the largest in magnitude, so \(\left|\frac{\lambda _{i}}{\lambda _{1}}\right| < 1\) for \(i \geq 2\):
    \(A^{k}x^{(0)}={\lambda_{1}}^{k}\left(c_{1}v_{1}+c_{2}\left(\frac{\lambda _{2}}{\lambda _{1}}\right)^{k}v_{2}+ \cdots + c_{n}\left(\frac{\lambda _{n}}{\lambda _{1}}\right)^{k} v_{n}\right)\)
    \(\qquad \quad \approx{\lambda _{1}}^{k}c_{1}v_{1}\), \(\quad\) as \(k\rightarrow \infty\), since all the ratio terms vanish
    Then we get a multiple of the dominant eigenvector \(v_{1}\).

The convergence rate of the power method depends on the ratios \(\left|\frac{\lambda _{i}}{\lambda _{1}}\right|\) for \(i=2,3,\cdots\), and in particular on \(\left|\frac{\lambda _{2}}{\lambda _{1}}\right|\): the smaller this ratio, the faster the convergence.
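
A tiny NumPy experiment (with made-up diagonal test matrices) illustrates this: with \(|\lambda_{2}/\lambda_{1}| = 0.1\) the iterate aligns with the dominant eigenvector after a handful of steps, while with \(|\lambda_{2}/\lambda_{1}| = 0.9\) it takes far longer.

```python
import numpy as np

def power_error(A, k):
    """Error of the power-method iterate after k steps, measured against e_1
    (the dominant eigenvector of a diagonal test matrix)."""
    v = np.ones(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(k):
        v = A @ v
        v /= np.linalg.norm(v)
    return 1.0 - abs(v[0])               # 0 when v is exactly +/- e_1

# Same lambda_1 = 1, different second eigenvalue (hypothetical test matrices):
fast = np.diag([1.0, 0.1, 0.05])         # |lambda_2 / lambda_1| = 0.1
slow = np.diag([1.0, 0.9, 0.05])         # |lambda_2 / lambda_1| = 0.9
for k in (5, 10, 20):
    print(k, power_error(fast, k), power_error(slow, k))
```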

To find the smallest eigenvalue (in magnitude) and the corresponding eigenvector, we can perform the same iteration on \(A^{-1}\), whose eigenvalues are \(1/\lambda\); this is called inverse power iteration.
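
A sketch of inverse power iteration, solving a linear system with \(A\) at each step rather than forming \(A^{-1}\) explicitly (function name and test matrix are again just for illustration):

```python
import numpy as np

def inverse_power_method(A, tol=1e-10, max_iter=1000):
    """Inverse power iteration sketch: eigenvalue of A of smallest magnitude."""
    v = np.ones(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(max_iter):
        w = np.linalg.solve(A, v)        # w = A^{-1} v, without forming A^{-1}
        w /= np.linalg.norm(w)
        if min(np.linalg.norm(w - v), np.linalg.norm(w + v)) < tol:
            v = w
            break
        v = w
    return v @ (A @ v), v                # Rayleigh quotient with A itself

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
print(inverse_power_method(A))           # approx. the smallest eigenvalue, 2.38...
```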

By using a shift \(\mu\) and applying the power method to the shifted eigenvalue problem \((A-\mu I)v = (\lambda - \mu)v\), we can find eigenvalues other than the largest. For example, with \(\mu = \lambda _{1}\) the eigenvalue farthest from \(\lambda _{1}\), i.e. the smallest one, becomes dominant in the shifted problem, so the power method then finds the smallest eigenvalue of the original problem. In principle we can find all eigenvalues by using shifts, but other methods are used in practice.
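
As a quick check of the shift idea, we can reuse the `power_method` sketch and the matrix `A` from above (with \(\mu\) taken as the dominant eigenvalue just computed):

```python
# With mu = lambda_1, the dominant eigenvalue of A - mu*I is lambda_min - mu,
# so the power method on the shifted matrix recovers the smallest eigenvalue of A.
mu, _ = power_method(A)
lam_shifted, v_min = power_method(A - mu * np.eye(A.shape[0]))
print(lam_shifted + mu, v_min)           # approx. the smallest eigenvalue of A
```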

The QR-method

The QR-method (or QR-iteration) is a method that can be used to find all eigenvalues at the same time. The idea is to construct a similarity transformation of the form \(A = Q \Lambda Q^{\top}\) and then read off the eigenvalues on the diagonal of \(\Lambda\).

For matrices that are not diagonalizable, we use a different similarity transform. If \(A = VBV^{-1}\) for some nonsingular matrix \(V\), then \(A\) and \(B\) are similar, so they have the same eigenvalues. If \(B\) happens to be triangular, then we can read off the eigenvalues of \(A\) from the diagonal of \(B\).

Schur decomposition (Schur form): if the matrix \(A\) is real with real eigenvalues, then there exists an orthogonal matrix \(Q\) such that \(A = QSQ^{-1} = QSQ^{\top}\), with \(S\) real and upper triangular. \(A\) and \(S\) are similar, so the eigenvalues of \(A\) appear on the diagonal of \(S\). If \(A\) is real with complex eigenvalues, then there exists an orthogonal matrix \(Q\) such that \(A = Q \tilde{S} Q^{-1} = Q \tilde{S} Q^{\top}\), where \(\tilde{S}\) is quasi upper triangular (upper triangular apart from \(2 \times 2\) blocks on the diagonal).
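
A quick numerical illustration of the real Schur form (a sketch using `scipy.linalg.schur`; the matrix \(P\) and the chosen eigenvalues are arbitrary):

```python
import numpy as np
from scipy.linalg import schur

# Build a nonsymmetric matrix with known real eigenvalues 1, 2, 3 through a
# similarity transform (P is an arbitrary invertible matrix).
P = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = P @ np.diag([1.0, 2.0, 3.0]) @ np.linalg.inv(P)

S, Q = schur(A)                          # real Schur form: A = Q S Q^T, Q orthogonal
print(np.allclose(A, Q @ S @ Q.T))       # True -- A and S are similar
print(np.sort(np.diag(S)))               # [1. 2. 3.] -- eigenvalues on the diagonal
```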

  1. Start with \(A\) and compute a QR-factorization \(A=QR\); then form \(A^{(1)}=RQ=Q^{\top}AQ\)

  2. Next step: factorize \(A^{(1)}=Q^{(1)}R^{(1)}\), form \(A^{(2)}=R^{(1)}Q^{(1)}={Q^{(1)}}^{\top}A^{(1)}Q^{(1)}\), and so forth

If we look at \(A^{(3)}={Q^{(2)}}^{\top}{Q^{(1)}}^{\top}Q^{\top}AQQ^{(1)}Q^{(2)}\), we see that each \(A^{(k)}\) is similar to \(A\). The accumulated product \(\tilde{Q}=QQ^{(1)}Q^{(2)}\cdots\) converges to the orthogonal factor of the Schur form (which carries the eigenvector information), and \(A^{(k)}\) converges to \(S\) in the Schur form (triangular, with the eigenvalues on the diagonal).
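
A minimal NumPy sketch of the unshifted QR-iteration (practical implementations first reduce \(A\) to Hessenberg form and add shifts, which this sketch omits; the test matrix is made up):

```python
import numpy as np

def qr_iteration(A, num_iter=200):
    """Unshifted QR-iteration sketch: returns A^(k) and the accumulated Q-tilde."""
    Ak = A.copy()
    Q_tilde = np.eye(A.shape[0])
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Ak)          # factorize A^(k) = Q R
        Ak = R @ Q                       # A^(k+1) = R Q = Q^T A^(k) Q
        Q_tilde = Q_tilde @ Q            # accumulate the similarity transform
    return Ak, Q_tilde

# Symmetric test matrix (made up for illustration), so all eigenvalues are real:
A = np.array([[5.0, 2.0, 1.0],
              [2.0, 4.0, 0.0],
              [1.0, 0.0, 3.0]])
S, Q_tilde = qr_iteration(A)
print(np.round(S, 6))                            # (nearly) triangular
print(np.sort(np.diag(S)))                       # approx. [2, 3.268, 6.732]
print(np.allclose(A, Q_tilde @ S @ Q_tilde.T))   # True: A = Q~ A^(k) Q~^T
```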

