Principal Component Analysis (PCA)

Principal Component Analysis is a technique for simplifying datasets. It is a linear transformation that maps the data into a new coordinate system such that the direction of greatest variance of the projected data lies along the first coordinate (called the first principal component), the direction of second-greatest variance lies along the second coordinate (the second principal component), and so on. PCA is often used to reduce the dimensionality of a dataset while retaining the features that contribute most to its variance. This is done by keeping the low-order principal components and discarding the high-order ones, since the low-order components tend to capture the most important aspects of the data.

Covariance matrix

Algorithms like PCA depend heavily on the covariance. The correlation coefficient tells us how two variables are related; it is the covariance normalized to the range \([-1, 1]\). The covariance matrix is an \(m \times m\) matrix (\(m\) is the number of variables), and it is symmetric because the covariance between \(x_1\) and \(x_2\) equals the covariance between \(x_2\) and \(x_1\). The diagonal entries are the variances (the covariance between \(x_1\) and \(x_1\) is the variance of \(x_1\)). There are two methods for calculating the covariance matrix, both shown with the example below:

\(x_{1}=\left(\begin{array}{l}2 \\ 3 \\ 4\end{array}\right), \quad \bar{x}_{1}=3, \quad x_{2}=\left(\begin{array}{l}3 \\ 1 \\ 2\end{array}\right), \quad \bar{x}_{2}=2\)

number of samples \(n=\) number of elements in each \(x_{i}=3, \quad\) number of variables \(m=\) number of vectors \(x_{i}=2\)

Method 1:

\(\sigma^{2}\left(x_{1}\right)=\frac{1}{n-1} \sum_{j=1}^{n}\left(x_{1 j}-\bar{x}_{1}\right)^{2}=\frac{1}{2}\left((2-3)^{2}+(3-3)^{2}+(4-3)^{2}\right)=1\)

\(\sigma^{2}\left(x_{2}\right)=\frac{1}{n-1} \sum_{j=1}^{n}\left(x_{2 j}-\bar{x}_{2}\right)^{2}=\frac{1}{2}\left((3-2)^{2}+(1-2)^{2}+(2-2)^{2}\right)=1\)

\(\sigma^{2}\left(x_{1}, x_{2}\right)=\frac{1}{n-1} \sum_{j=1}^{n}\left(x_{1 j}-\bar{x}_{1}\right)\left(x_{2 j}-\bar{x}_{2}\right)=\frac{1}{2}\left((2-3)(3-2)+(3-3)(1-2)+(4-3)(2-2)\right)=-\frac{1}{2}\)

\(\sigma^{2}\left(x_{2}, x_{1}\right)=\sigma^{2}\left(x_{1}, x_{2}\right)=-\frac{1}{2}\)

\(C\left(x_{2}, x_{1}\right)=\left(\begin{array}{ll}\sigma^{2}\left(x_{1}\right) & \sigma^{2}\left(x_{1}, x_{2}\right) \\ \sigma^{2}\left(x_{2}, x_{1}\right) & \sigma^{2}\left(x_{2}\right)\end{array}\right)=\left(\begin{array}{cc}1 & -1 / 2 \\ -1 / 2 & 1\end{array}\right)\)
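Method 1 can be checked numerically. The following is a minimal sketch using the example vectors above; the helper name `sample_cov` is made up for illustration and is not part of NumPy.

```python
import numpy as np

# Example vectors from the text: two variables, three samples each.
x1 = np.array([2.0, 3.0, 4.0])
x2 = np.array([3.0, 1.0, 2.0])


def sample_cov(a, b):
    """Sample covariance with the 1/(n-1) normalization (hypothetical helper)."""
    n = len(a)
    return np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)


# Assemble the 2x2 covariance matrix entry by entry.
C = np.array([
    [sample_cov(x1, x1), sample_cov(x1, x2)],
    [sample_cov(x2, x1), sample_cov(x2, x2)],
])
print(C)
```

The diagonal entries come out as the two variances and the off-diagonal entries as the shared covariance, matching the matrix computed by hand.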

Method 2:

Let \(\bar{A}=\left(\begin{array}{l} x_{1}^{\top} \\ x_{2}^{\top}\end{array}\right)=\left(\begin{array}{ccc}2 & 3 & 4 \\ 3 & 1 & 2 \end{array}\right)\)

Center matrix \(\bar{A}\) by subtracting each row's mean: \(A=\left(\begin{array}{l} x_{1}^{\top}-\bar{x}_{1} \\ x_{2}^{\top}-\bar{x}_{2}\end{array}\right)=\left(\begin{array}{ccc}-1 & 0 & 1 \\ 1 & -1 & 0 \end{array}\right)\)

\(C\left(x_{2}, x_{1}\right)=\frac{1}{n-1} A A^{\top}=\frac{1}{2}\left(\begin{array}{cc}2 & -1 \\ -1 & 2\end{array}\right)=\left(\begin{array}{cc}1 & -1 / 2 \\ -1 / 2 & 1\end{array}\right)\)
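Method 2 translates almost directly into NumPy. This sketch assumes the same layout as the matrices above (variables in rows, samples in columns), compares the result with `np.cov`, and also normalizes the covariance into a correlation matrix; the variable names are chosen only for illustration.

```python
import numpy as np

# Rows are variables, columns are samples, as in the matrix A-bar above.
A_bar = np.array([[2.0, 3.0, 4.0],
                  [3.0, 1.0, 2.0]])
n = A_bar.shape[1]

# Subtract each row's mean so every variable is centered at zero.
A = A_bar - A_bar.mean(axis=1, keepdims=True)

# Covariance matrix as a single matrix product.
C = A @ A.T / (n - 1)

# np.cov treats rows as variables by default and also divides by n-1.
assert np.allclose(C, np.cov(A_bar))

# Correlation matrix: covariance divided by the product of standard deviations.
std = np.sqrt(np.diag(C))
R = C / np.outer(std, std)
print(C)
print(R)
```

In this example both variances equal 1, so the correlation matrix happens to coincide with the covariance matrix.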

SVD and PCA

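Take the SVD of the centered matrix from Method 2, \(A=U \Sigma V^{\top}\), where variables are in rows and samples are in columns. \(U=(u_{1}, \cdots, u_{m})\) are the left singular vectors of \(A\) (eigenvectors of \(C\)); they point in the directions of largest variance of the data and can be viewed as the principal directions. Projecting the data onto them, \(U^{\top} A=\Sigma V^{\top}\), gives the principal components of the data points. (If the samples are stored in rows instead, the roles of \(U\) and \(V\) swap: \(V\) holds the principal directions and \(AV=U \Sigma\) gives the principal components.) We can get the eigenvalues of \(C\) from the SVD of \(A\): \(\lambda _{i}=\frac{1}{n-1} \sigma_{i}^2\), since \(C=\frac{1}{n-1} A A^{\top}=U \frac{\Sigma \Sigma^{\top}}{n-1} U^{\top}\). Each eigenvalue \(\lambda _{i}\) is the variance (spread) of the data along the \(u_i\)-direction.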
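The link between the singular values of \(A\) and the eigenvalues of \(C\) can be verified on the small example from the covariance section. This is a sketch under the assumption that \(A\) is the centered matrix with variables in rows; it only checks the two relations stated above.

```python
import numpy as np

# Centered data from the covariance example: variables in rows, samples in columns.
A = np.array([[-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
n = A.shape[1]
C = A @ A.T / (n - 1)

# Thin SVD; NumPy returns the singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam_from_svd = s**2 / (n - 1)

# eigvalsh returns ascending eigenvalues of a symmetric matrix, so reverse them.
lam_from_eig = np.linalg.eigvalsh(C)[::-1]
assert np.allclose(lam_from_svd, lam_from_eig)

# Columns of U are eigenvectors of C: C u_i = lambda_i u_i.
assert np.allclose(C @ U, U * lam_from_svd)
print(lam_from_svd)
```

Working from the SVD of \(A\) avoids forming \(C\) explicitly, which is usually the numerically safer route.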

Total variance = trace(\(C\)) = the sum of the eigenvalues of \(C\) = the sum of the diagonal elements of \(C\). Dividing each eigenvalue by the total variance gives the fraction of the total variance explained by the corresponding principal component. For example, with 2 eigenvalues \(\lambda _{1}=28.9\), \(\lambda _{2}=0.1\), and trace(\(C\))=29, we get \(\frac{\lambda _{1}}{trace(C)}= \frac{28.9}{29}\approx 0.997\), so the first principal component explains about \(99.7\)% of the total variance.
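The explained-variance arithmetic is short enough to sketch directly. The eigenvalues below are the illustrative numbers from the example, not values computed from real data.

```python
import numpy as np

# Illustrative eigenvalues from the example above.
eigenvalues = np.array([28.9, 0.1])

# Total variance equals the sum of the eigenvalues, i.e. trace(C).
total_variance = eigenvalues.sum()

# Fraction of the total variance explained by each principal component.
explained_ratio = eigenvalues / total_variance

# The running total helps decide how many components to keep.
cumulative = np.cumsum(explained_ratio)
print(explained_ratio)
print(cumulative)
```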

PCA in Python

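import numpy as np

# Rows of Q are samples (5), columns are variables (4).
Q = np.array([[5,5,0,4], [1,1,5,0], [3,2,0,4], [3,5,0,5], [0,0,4,0]])
# Center each variable (column) at zero mean.
A = Q - Q.mean(axis=0, keepdims=True)
# A^T A is (n-1) times the covariance matrix of the variables.
ATA = np.dot(A.T, A)
# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(ATA)
# A A^T shares its nonzero eigenvalues with A^T A.
AAT = np.dot(A, A.T)
eig2 = np.linalg.eigh(AAT)
# Reverse the columns so the largest-variance direction comes first.
PCA = np.dot(A, eigvecs[:, ::-1])
print(PCA, '\n')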
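As a cross-check, the same projection can be obtained from the SVD of the centered data, which also makes it easy to keep only the leading components. This is a sketch rather than a reference implementation: it assumes samples are stored in rows as in the listing above, and the choice `k = 2` is arbitrary and only for illustration.

```python
import numpy as np

Q = np.array([[5, 5, 0, 4], [1, 1, 5, 0], [3, 2, 0, 4],
              [3, 5, 0, 5], [0, 0, 4, 0]], dtype=float)
mean = Q.mean(axis=0, keepdims=True)
A = Q - mean

# Samples are in rows here, so the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Principal components of every sample: A V, which equals U * s.
scores = A @ Vt.T

# Keep only the first k components (k is an arbitrary illustrative choice).
k = 2
reduced = scores[:, :k]

# Share of the total variance carried by each component.
explained = s**2 / np.sum(s**2)

# Map the reduced data back to the original 4-dimensional space.
approx = reduced @ Vt[:k] + mean
print(reduced)
print(explained)
```

The squared reconstruction error of `approx` equals the sum of the discarded squared singular values, so `explained` indicates how much is lost for a given `k`. Individual components may differ in sign from the eigendecomposition result, since a principal direction is only defined up to sign.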

All articles in this blog adopt the CC BY-SA 4.0 agreement except for special statements. Please indicate the source for reprinting!