
PCA Cheat Sheet

Everything you need on one page. Perfect for revision, interviews, and quick reference.

Key Formulas

Covariance Matrix (of centered data $X_c$):
$$C = \frac{1}{n} X_c^T X_c$$
Eigenvalue Equation:
$$C\mathbf{v} = \lambda \mathbf{v}$$
Projection:
$$Z = X_c W_k$$
Variance Maximization:
$$\max_{\mathbf{w}} \mathbf{w}^T C \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^T \mathbf{w} = 1$$
Explained Variance Ratio:
$$\frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j}$$
Reconstruction:
$$\hat{X} = Z W_k^T + \bar{X}$$
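The formulas above can be traced end-to-end in a few lines of NumPy; the toy data and the choice $k = 2$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy correlated data
n = X.shape[0]

X_bar = X.mean(axis=0)
Xc = X - X_bar                        # center the data
C = (Xc.T @ Xc) / n                   # covariance matrix C = (1/n) Xc^T Xc

eigvals, eigvecs = np.linalg.eigh(C)  # solves C v = lambda v (C is symmetric)
order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
eigvals, W = eigvals[order], eigvecs[:, order]

k = 2
Wk = W[:, :k]                         # top-k eigenvectors
Z = Xc @ Wk                           # projection Z = Xc W_k
ratio = eigvals / eigvals.sum()       # explained variance ratio
X_hat = Z @ Wk.T + X_bar              # reconstruction from k components
```

Dropping the smallest-eigenvalue direction keeps most of the variance, so `X_hat` differs from `X` only along the discarded component.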

Algorithm Steps

  1. Center the data: Subtract the mean of each feature. Standardize if features are on different scales.
  2. Compute covariance matrix: $C = \frac{1}{n} X_c^T X_c$ where $X_c$ is centered data.
  3. Eigendecomposition: Solve $C\mathbf{v} = \lambda \mathbf{v}$ for all eigenvalues and eigenvectors.
  4. Sort by eigenvalue: Order eigenvectors by decreasing eigenvalue: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$.
  5. Select top $k$ components: Choose $k$ based on scree plot or cumulative variance threshold.
  6. Project: $Z = X_c \cdot W_k$ where $W_k$ contains the top $k$ eigenvectors.

In practice, sklearn uses SVD instead of explicit eigendecomposition for better numerical stability.
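The steps above reduce to a few sklearn calls. A minimal sketch (the random data and scale factors are made up for illustration):

```python
# Minimal sklearn sketch of the algorithm steps; toy data is illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4)) * np.array([1.0, 5.0, 0.5, 10.0])  # mixed scales

X_std = StandardScaler().fit_transform(X)  # step 1: center and scale
pca = PCA(n_components=2)                  # steps 2-5 happen inside fit (via SVD)
Z = pca.fit_transform(X_std)               # step 6: project onto top 2 components

print(Z.shape)                             # (150, 2)
print(pca.explained_variance_ratio_)       # per-component variance share
```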

Choosing Components

Cumulative Variance (95%):
$$k = \min\left\{k : \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{d}\lambda_i} \geq 0.95\right\}$$
Kaiser Criterion:
Keep components with $\lambda_i > 1$ (valid when PCA is computed on the correlation matrix, i.e. on standardized features)
Scree Plot:
Look for the "elbow" where eigenvalues transition from steep to flat

In sklearn: PCA(n_components=0.95) auto-selects $k$ for 95% variance.
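Both the automatic and the manual selection can be checked against each other; this sketch uses the sklearn digits dataset purely for illustration:

```python
# Component selection sketch on sklearn's digits data (dataset is illustrative).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_digits().data)

# Auto-select the smallest k reaching 95% cumulative variance
pca = PCA(n_components=0.95).fit(X)

# The same k, computed by hand from the full eigenvalue spectrum
cum = np.cumsum(PCA().fit(X).explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.95)) + 1
```

`pca.n_components_` holds the selected $k$; `cum` is also exactly what a cumulative-variance plot would display.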

SVD Connection

SVD Decomposition (of the centered data $X_c$):
$$X_c = U \Sigma V^T$$
Principal Directions:
Columns of $V$ = eigenvectors of $C = \frac{1}{n}X_c^T X_c$
Eigenvalue Relation:
$$\lambda_i = \frac{\sigma_i^2}{n}$$
Projected Data:
$$Z = X_c V = U\Sigma$$

SVD avoids forming $X^TX$ explicitly, which makes it more numerically stable and efficient; sklearn uses it internally.
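The correspondence is easy to verify numerically on random centered data:

```python
# Numerical check of the SVD-PCA correspondence on random centered data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = U Sigma V^T

lam = s**2 / n                                     # lambda_i = sigma_i^2 / n
C = Xc.T @ Xc / n
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]     # eigenvalues of C, descending

print(np.allclose(lam, eigvals))                   # True
print(np.allclose(Xc @ Vt.T, U * s))               # True: Z = Xc V = U Sigma
```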

Kernel PCA

Kernel Matrix:
$$K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$$
RBF Kernel:
$$K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2)$$
Polynomial Kernel:
$$K(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + r)^d$$

Kernel PCA handles nonlinear relationships by implicitly mapping to higher-dimensional spaces. Use when linear PCA fails to capture structure (e.g., concentric circles).
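A hedged sketch of the concentric-circles case; `gamma=10` and the dataset parameters are illustrative choices, not tuned values:

```python
# RBF kernel PCA on concentric circles, where linear PCA (just a rotation
# of 2-D data) cannot separate the rings. gamma=10 is illustrative.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
# In Z_rbf the two rings become (near-)linearly separable; in Z_lin they do not.
```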

Assumptions & Limitations

Assumptions:

  • Principal components are linear combinations of original features
  • Directions of maximum variance are the most important
  • Components are mutually orthogonal (uncorrelated)
  • Data should be centered (mean-subtracted) before applying PCA

Limitations:

  • Cannot capture nonlinear relationships (use kernel PCA, t-SNE, UMAP)
  • Highly sensitive to feature scaling -- always standardize first
  • Components lose interpretability (mixtures of all original features)
  • Variance may not equal importance for classification tasks (consider LDA)
  • Sensitive to outliers that can distort the covariance matrix

Common Pitfalls

  • Forgetting to scale/standardize features before PCA (most common mistake)
  • Treating principal components as if they have interpretable meaning
  • Fitting PCA on the entire dataset before train/test split (data leakage)
  • Using PCA when features are already independent and uncorrelated
  • Keeping too few components and losing critical information
  • Using PCA for classification without checking if it preserves class separation
  • Applying standard PCA to nonlinear data structures (use kernel PCA)
  • Forgetting to apply the same PCA transformation to test data
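The scaling, leakage, and test-transform pitfalls are all avoided by one pattern: keep the scaler and PCA inside a Pipeline so they are refit on training data only. A sketch (dataset and model choices are illustrative):

```python
# Leakage-safe pattern: scaling and PCA live inside the Pipeline, so they
# are fit per training fold, never on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=0.95),
                     LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)  # scaler+PCA refit per fold
```

Calling `pipe.fit(X_train)` then `pipe.predict(X_test)` likewise guarantees the test data is transformed with the training-set mean, scale, and components.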

Interview Quick-Fire

Q: What is PCA?

A: PCA is an unsupervised dimensionality reduction technique that projects data onto orthogonal directions (principal components) that capture maximum variance, using the eigenvectors of the covariance matrix.

Q: What does PCA maximize?

A: PCA maximizes the variance of the projected data. Equivalently, it minimizes the reconstruction error (the information lost when projecting onto fewer dimensions).

Q: Why is scaling important for PCA?

A: PCA is based on variance. Without scaling, features with larger numerical ranges dominate the principal components, regardless of their actual importance. Standardization ensures equal contribution from all features.

Q: How do you choose the number of components?

A: Use the scree plot (look for the elbow), cumulative variance threshold (typically 95%), or Kaiser criterion (eigenvalue > 1 with correlation matrix). In sklearn, use PCA(n_components=0.95).

Q: PCA vs LDA -- what's the difference?

A: PCA is unsupervised and maximizes total variance. LDA is supervised and maximizes class separability. PCA ignores class labels; LDA uses them. Use PCA for general reduction; LDA when classification performance matters.

Q: What is the relationship between PCA and SVD?

A: SVD decomposes $X = U\Sigma V^T$. The columns of $V$ are the principal directions, and $\lambda_i = \sigma_i^2/n$. SVD computes PCA without forming $X^TX$, which is more stable and efficient.

Q: When should you use kernel PCA?

A: When the data has nonlinear structure that standard PCA cannot capture (e.g., concentric circles, Swiss roll). Kernel PCA maps data to a higher-dimensional space via a kernel function before extracting principal components.

Q: Can PCA help with multicollinearity?

A: Yes. Principal components are orthogonal and uncorrelated by construction. Using PCA-transformed features in regression eliminates multicollinearity problems. However, you lose interpretability of the original features.
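This is quick to confirm on made-up, nearly collinear features: the covariance matrix of the PCA scores comes out diagonal.

```python
# The covariance of PCA scores is diagonal: transformed features are
# uncorrelated even when the original features are nearly collinear.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
base = rng.normal(size=(300, 2))
X = np.column_stack([base[:, 0],
                     0.99 * base[:, 0] + 0.01 * base[:, 1],  # near-duplicate
                     base[:, 1]])

Z = PCA().fit_transform(X)
cov = np.cov(Z, rowvar=False)
print(np.allclose(cov, np.diag(np.diag(cov)), atol=1e-10))  # True
```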
