Principal components analysis
From Wikinfo
In statistics, principal components analysis (PCA) is a technique that can be used to simplify a dataset; more formally it is a transform that chooses a new coordinate system for the data set, such, that that greatest variance by any projection of the data set comes to lie on the first axis (then called the first principal component), the second greatest variance on the second axis and so on. PCA can be used for reducing dimensionality in a dataset while retaining those characteristics of the dataset that contribute most to its variance by eliminating the later principal components (by a more or less heuristic decision). These characteristics may be the 'most important', but this is not necessarily the case, depending on the application.
PCA is also called the Karhunen-Lo�ve transform or the Hotelling transform. PCA has the speciality of being the optimal linear transform for keeping the subspace that has largest variance. However this comes at the price of greater computational requirement, e.g. if compared to the discrete cosine transform. Unlike other linear transforms, the PCA does not have a fixed set of basis vectors. Its basis vectors depend on the data set.
The principal component w1 of a dataset x can be defined as (assuming zero mean, i.e. E(x)=0)
This article needs some serious revamping, to say the least. One cannot assume without loss of generality that the expectation is zero. If the expectation were observable, one could subtract it from x and get something with zero expectation, and so no generality would be lost by this assumption. In practice the expecation is never observable, and one must consider the probability distribution of the difference between x and an estimate, based on data, of the expectation of x.
- <math>\mathbf{w}_1
= \arg\max_{\Vert \mathbf{w} \Vert = 1} E\left\{ \left( \mathbf{w}^T \mathbf{x}\right)^2 \right\}</math>
(See arg max for the notation.) With the first <math>k - 1</math> components, the <math>k</math>-th component can be found by subtracting the first <math>k - 1</math> principal components from x:
- <math>\mathbf{\hat{x}}_{k - 1}
= \mathbf{x} -
\sum_{i = 1}^{k - 1}
\mathbf{w}_i \mathbf{w}_i^T \mathbf{x}</math>
and by substituting this as the new dataset to find a principal component in:
- <math>\mathbf{w}_k
= \arg\max_{\Vert \mathbf{w} \Vert = 1} E\left\{
\left( \mathbf{w}^T \mathbf{\hat{x}}_{k - 1}
\right)^2 \right\}.</math>
A simpler way to calculate the components wi uses the covariance matrix of x, the measurement vector. By finding the eigenvalues and eigenvectors of the covariance matrix, we find that the eigenvectors with the largest eigenvalues correspond to the dimensions that have the strongest correlation in the dataset. The original measurements are finally projected onto the reduced vector space.
Related (or even more similar than related?) is the calculus of empirical orthogonal functions (EOF).
Another method of dimension reduction is a self-organizing map.
PCA is a popular technique in pattern recognition. However, PCA is not optimized for class separability. An alternative is the linear discriminant analysis, which does take this into account. PCA optimally minimizes reconstruction error under the L2 norm, assuming Gaussian distributed data.
Contents |
Pseudocode
Psuedocode for PCA using the covariance method. Suppose you have n data vectors of d dimensions each. You want to project your data into a k dimensional subspace.
Find the basis vectors
- Organize your data into column vectors, so you end up with a <math>d \times n</math> matrix, D.
- Find the mean along each dimension, so you end up with a <math>d \times 1</math> mean vector, M.
- Subtract the mean vector M from each column of the data matrix D. Store mean subtracted data matrix in S.
- Find the covariance matrix C of S. <math>C = S \cdot S^T</math>.
- Compute and sort by decreasing eigenvalue, the eigenvectors V of C.
- Save the mean vector M. Save the first k columns of V as P. P will have dimension <math>d \times k</math>. <math> 1 \leq k \leq d</math>
Projecting new data
Suppose you have a d�1 data vector D. Then the k�1 projected vector is v = PT(D − M).
Derivation of PCA using the covariance method
Let X be a d-dimensional random vector expressed as column vector. Without loss of generality, assume X has zero mean.
Excuse me, but that is absurd. If the mean were observable, then one could simply subtract the mean from X, getting something with zero mean, and then indeed no generality would be lost by assuming that. In practice, one must use a data-based and therefore uncertain estimate of the mean, and one must therefore consider the probability distribution of the difference between X and the estimate of the mean of X.
We want to find a <math>d \times d</math> orthonormal projection matrix P such that
<math>Y = P^\top X</math>
with the constraint that
<math>\operatorname{cov}(Y)</math> is a diagonal matrix and <math>P^{-1} = P^\top</math>.
By substitution, and matrix algebra, we get
<math> \begin{matrix} \operatorname{cov}(Y) &=& \operatorname{E}[YY^\top]\\ \ &=& \operatorname{E}[(P^\top X) (P^\top X)^\top]\\ \ &=& \operatorname{E}[(P^\top X) (X^\top P)]\\ \ &=& P^\top \operatorname{E}[X X^\top] P\\ \ &=& P^\top \operatorname{cov}(X) P \end{matrix} </math>
We now have
<math> \begin{matrix} P\operatorname{cov}(Y) &=& P P^\top \operatorname{cov}(X) P\\ \ &=& \operatorname{cov}(X) P\\ \end{matrix} </math>
Rewrite P as d <math>d \times 1</math> column vectors, so
<math>P = [P_1, P_2, \ldots, P_d]</math>
and <math>\operatorname{cov}(Y)</math> as
<math> \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_d \end{bmatrix} </math>
Substituting into equation above, we get
<math>[\lambda_1 P_1, \lambda_2 P_2, \ldots, \lambda_d P_d] = [\operatorname{cov}(X)P_1, \operatorname{cov}(X)P_2, \ldots, \operatorname{cov}(X)P_d]</math>
Notice that in <math>\lambda_i P_i = \operatorname{cov}(X)P_i</math>, Pi is an eigenvector of Xs covariance matrix. Therefore, by finding the eigenvectors of Xs convariance matrix, we find a projection matrix P that satisfies the original constraints.
See also:
References
- Adapted from the Wikipedia article, "Principal_components_analysis" http://en.wikipedia.org/wiki/Principal_components_analysis, used under the GNU Free Documentation License

