Algorithm flow:
- Mean-centering (X = X - X.mean(axis=0), with one sample per row)
- Compute the covariance matrix (cov = np.dot(X.T, X) / X.shape[0]). The normalization does not change U or the variance ratios used later.
- SVD (U, S, V = np.linalg.svd(cov))
The first k columns of U form the dimensionality-reduction matrix.
- Auto-determine k: find the smallest k such that sum(S[0:k])/sum(S) >= threshold. The threshold is usually 0.95 or 0.99, meaning 95% or 99% of the variance is retained.
- Output the data after dimensionality reduction, using the centered X (return np.dot(X, U[:,:k])).
- To reconstruct an approximation of the centered data from the compressed data, compute np.dot(compressed_data, U[:,:k].T). Add the mean back to return to the original coordinates.
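
The steps above can be put together roughly as follows. This is only a sketch under a few assumptions: X holds one sample per row, the covariance is normalized by the number of samples, and the helper names (pca_fit, pca_compress, pca_reconstruct) are hypothetical and chosen here for illustration. The smallest k is taken as the first position where the cumulative variance fraction reaches the threshold.

```python
import numpy as np


def pca_fit(X, threshold=0.95):
    """Hypothetical helper: learn the mean, the basis U, and k from X (rows = samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center each feature
    cov = np.dot(Xc.T, Xc) / Xc.shape[0]       # covariance (assumed 1/m normalization)
    U, S, _ = np.linalg.svd(cov)               # S comes back sorted in descending order
    retained = np.cumsum(S) / np.sum(S)        # variance fraction kept by the first k components
    # Index of the first cumulative fraction >= threshold, converted to a count.
    k = int(np.searchsorted(retained, threshold)) + 1
    k = min(k, len(S))                         # guard against floating-point round-off
    return mean, U, k


def pca_compress(X, mean, U, k):
    """Project centered data onto the first k columns of U."""
    return np.dot(X - mean, U[:, :k])


def pca_reconstruct(Z, mean, U, k):
    """Map compressed data back and add the mean to return to the original coordinates."""
    return np.dot(Z, U[:, :k].T) + mean


# Usage on toy data that lies on a single line in 3-D.
t = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([t, 2.0 * t, -t]) + np.array([1.0, 2.0, 3.0])
mean, U, k = pca_fit(X, threshold=0.95)
Z = pca_compress(X, mean, U, k)
X_rec = pca_reconstruct(Z, mean, U, k)
print(k, Z.shape, np.allclose(X_rec, X))
```

Keeping the mean alongside U and k lets new data be compressed with the same transform, and lets the reconstruction be returned in the original (uncentered) coordinates.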