PCA Cheat Sheet
Everything you need on one page. Perfect for revision, interviews, and quick reference.
In practice, sklearn uses SVD instead of explicit eigendecomposition for better numerical stability.
In sklearn: PCA(n_components=0.95) selects the smallest $k$ whose cumulative explained variance reaches 95%.
SVD avoids forming $X^TX$ explicitly, which makes it more numerically stable and efficient than eigendecomposition of the covariance matrix.
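A quick NumPy sanity check of this equivalence (synthetic data, illustrative only): the SVD route recovers the same eigenvalues as explicit eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated features
Xc = X - X.mean(axis=0)                                   # center the data
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / n
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]        # sort descending

# Route 2: SVD of the centered data matrix (X^T X is never formed)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S**2 / n                                       # lambda_i = sigma_i^2 / n

assert np.allclose(eigvals, svd_vals)
```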
Kernel PCA handles nonlinear relationships by implicitly mapping to higher-dimensional spaces. Use when linear PCA fails to capture structure (e.g., concentric circles).
Q: What is PCA?
A: PCA is an unsupervised dimensionality reduction technique that projects data onto orthogonal directions (principal components) that capture maximum variance, using the eigenvectors of the covariance matrix.
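A minimal sklearn sketch of this (synthetic data; the redundant fourth feature is just for illustration), showing the two defining properties: components are orthonormal, and they are ordered by explained variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)  # redundant feature

pca = PCA(n_components=2).fit(X)

# Rows of components_ are orthonormal principal directions
W = pca.components_
assert np.allclose(W @ W.T, np.eye(2), atol=1e-10)

# explained_variance_ holds the covariance eigenvalues, in descending order
assert pca.explained_variance_[0] >= pca.explained_variance_[1]
```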
Q: What objective does PCA optimize?
A: PCA maximizes the variance of the projected data. Equivalently, it minimizes the reconstruction error (the information lost when projecting onto fewer dimensions).
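The equivalence can be made concrete: the mean squared reconstruction error of a $k$-component projection equals the sum of the discarded eigenvalues. A NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

eigvals, V = np.linalg.eigh(Xc.T @ Xc / n)
eigvals, V = eigvals[::-1], V[:, ::-1]   # sort descending

k = 2
Vk = V[:, :k]
X_hat = Xc @ Vk @ Vk.T                   # project down, then reconstruct
mse = np.mean(np.sum((Xc - X_hat) ** 2, axis=1))

# Reconstruction error = total variance minus variance kept
assert np.allclose(mse, eigvals[k:].sum())
```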
Q: Why must you standardize features before PCA?
A: PCA is based on variance. Without scaling, features with larger numerical ranges dominate the principal components, regardless of their actual importance. Standardization ensures equal contribution from all features.
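A sketch of the effect (synthetic data: two correlated, equally informative features, one on a ~1000x larger scale):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
z = rng.normal(size=300)
x1 = z + 0.3 * rng.normal(size=300)            # unit scale
x2 = 1000 * (z + 0.3 * rng.normal(size=300))   # same signal, 1000x scale
X = np.column_stack([x1, x2])

pc1_raw = PCA(n_components=1).fit(X).components_[0]
pc1_std = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]

# Unscaled: PC1 is essentially just the large-range feature
assert abs(pc1_raw[1]) > 0.99
# Standardized: both features contribute equally (|w| = 1/sqrt(2) each)
assert np.allclose(np.abs(pc1_std), [2**-0.5, 2**-0.5], atol=1e-6)
```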
Q: How do you choose the number of components $k$?
A: Use the scree plot (look for the elbow), a cumulative variance threshold (typically 95%), or the Kaiser criterion (eigenvalue > 1 when using the correlation matrix). In sklearn, use PCA(n_components=0.95).
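A sketch of the sklearn shortcut (synthetic data with 3 latent factors behind 10 noisy features, so a small $k$ should suffice):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# 10 features driven by 3 latent factors plus small noise
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

pca = PCA(n_components=0.95).fit(X)   # keep enough PCs for 95% variance

assert pca.explained_variance_ratio_.sum() >= 0.95
assert pca.n_components_ <= 3         # the 3 latent factors carry the variance
```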
Q: What is the difference between PCA and LDA?
A: PCA is unsupervised and maximizes total variance. LDA is supervised and maximizes class separability. PCA ignores class labels; LDA uses them. Use PCA for general reduction; LDA when classification performance matters.
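A sketch of the contrast (synthetic two-class data: large shared variance along x, but the classes are only shifted along y):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
n = 500
X0 = rng.normal(size=(n, 2)) * [5.0, 0.5] + [0.0, -1.5]
X1 = rng.normal(size=(n, 2)) * [5.0, 0.5] + [0.0,  1.5]
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n), np.ones(n)]

pc1 = PCA(n_components=1).fit(X).components_[0]
lda_dir = LinearDiscriminantAnalysis().fit(X, y).coef_[0]

# PCA's top direction follows total variance (the x axis)...
assert abs(pc1[0]) > 0.9
# ...while LDA's direction separates the classes (the y axis)
assert abs(lda_dir[1]) > abs(lda_dir[0])
```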
Q: How does SVD relate to PCA?
A: SVD decomposes $X = U\Sigma V^T$. The columns of $V$ are the principal directions, and $\lambda_i = \sigma_i^2/n$. SVD computes PCA without forming $X^TX$, which is more stable and efficient.
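A cross-check of this correspondence against sklearn (synthetic data). Note one detail: sklearn's explained_variance_ uses the sample convention $\sigma_i^2/(n-1)$ rather than $\sigma_i^2/n$.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca = PCA(n_components=4).fit(X)

# Rows of components_ match rows of V^T up to sign
for v_svd, v_pca in zip(Vt, pca.components_):
    assert np.allclose(abs(v_svd @ v_pca), 1.0)

# sklearn's explained_variance_ is sigma_i^2 / (n - 1)
assert np.allclose(pca.explained_variance_, S**2 / (X.shape[0] - 1))
```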
Q: When should you use Kernel PCA?
A: When the data has nonlinear structure that standard PCA cannot capture (e.g., concentric circles, Swiss roll). Kernel PCA maps data to a higher-dimensional space via a kernel function before extracting principal components.
Q: Can PCA fix multicollinearity in regression?
A: Yes. Principal components are orthogonal and uncorrelated by construction. Using PCA-transformed features in regression eliminates multicollinearity problems. However, you lose interpretability of the original features.
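A sketch with two nearly collinear synthetic predictors: the raw correlation is close to 1, while the PCA scores are uncorrelated up to numerical precision.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
n = 300
z = rng.normal(size=n)
# Two nearly collinear predictors
X = np.column_stack([z + 0.01 * rng.normal(size=n),
                     z + 0.01 * rng.normal(size=n)])

# Raw features are almost perfectly correlated
corr_raw = np.corrcoef(X, rowvar=False)[0, 1]
assert corr_raw > 0.99

# PCA scores are uncorrelated by construction
Z = PCA(n_components=2).fit_transform(X)
corr_pca = np.corrcoef(Z, rowvar=False)[0, 1]
assert abs(corr_pca) < 1e-6
```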