
KNN Cheat Sheet

Everything you need on one page. Perfect for revision, interviews, and quick reference.

Distance Formulas

Euclidean (L2):
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$
Manhattan (L1):
$$d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$$
Minkowski (Lp):
$$d(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$
Cosine Distance:
$$d(\mathbf{x}, \mathbf{y}) = 1 - \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \cdot \|\mathbf{y}\|}$$

Euclidean is the usual default. Manhattan is more robust to outliers than Euclidean. Minkowski generalizes both ($p=1$ Manhattan, $p=2$ Euclidean). Cosine ignores magnitude and compares direction only.
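The four metrics can be sketched in a few lines of NumPy (function names here are illustrative, not from any particular library):

```python
import numpy as np

def euclidean(x, y):
    # L2: square root of the sum of squared differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # L1: sum of absolute differences
    return np.sum(np.abs(x - y))

def minkowski(x, y, p):
    # Lp: generalizes L1 (p=1) and L2 (p=2)
    return np.sum(np.abs(x - y) ** p) ** (1 / p)

def cosine_distance(x, y):
    # 1 minus cosine similarity; ignores vector magnitude
    return 1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(euclidean(x, y))       # 5.0
print(manhattan(x, y))       # 7.0
print(minkowski(x, y, 2))    # 5.0 (same as Euclidean)
```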

Choosing K

Rule of Thumb:
$$K \approx \sqrt{n}$$
  • Use odd K for binary classification to avoid ties
  • Small $K$: low bias, high variance (overfitting)
  • Large $K$: high bias, low variance (underfitting)
  • $K = 1$: Voronoi tessellation decision boundary
Tuning procedure:

  1. Start with $K = \sqrt{n}$ where $n$ is the training set size
  2. Use cross-validation to test a range of $K$ values
  3. Plot accuracy vs. $K$ and look for the elbow point
  4. Prefer the simplest model (larger $K$) at similar performance
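The tuning steps above can be sketched with leave-one-out cross-validation in pure NumPy (the toy data and function names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    # Majority vote among the k nearest training points (Euclidean distance)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

def loo_accuracy(X, y, k):
    # Leave-one-out CV: predict each point from all the others
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += knn_predict(X[mask], y[mask], X[i], k) == y[i]
    return hits / len(X)

# Toy data: two well-separated 2-D clusters (hypothetical example)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Scan odd k values around sqrt(n) and compare accuracies
for k in [1, 3, 5, 7]:
    print(k, loo_accuracy(X, y, k))
```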

Classification

Majority Voting:
$$\hat{y} = \arg\max_c \sum_{i=1}^{K} \mathbb{1}(y_i = c)$$
Weighted Voting:
$$\hat{y} = \arg\max_c \sum_{i=1}^{K} w_i \cdot \mathbb{1}(y_i = c)$$
Weight (inverse dist.):
$$w_i = \frac{1}{d(\mathbf{x}_q, \mathbf{x}_i) + \epsilon}$$

Majority voting treats all $K$ neighbors equally. Weighted voting gives closer neighbors more influence. Add $\epsilon$ to avoid division by zero.
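A minimal sketch of inverse-distance weighted voting, assuming the $K$ nearest distances and labels have already been found (function name is illustrative):

```python
import numpy as np

def weighted_vote(distances, labels, eps=1e-8):
    # w_i = 1 / (d_i + eps); the class with the largest total weight wins
    weights = 1.0 / (np.asarray(distances, dtype=float) + eps)
    classes = np.unique(labels)
    totals = [weights[np.asarray(labels) == c].sum() for c in classes]
    return classes[int(np.argmax(totals))]

# One very close class-1 neighbor outweighs two distant class-0 neighbors
print(weighted_vote([0.1, 2.0, 2.5], [1, 0, 0]))  # 1
```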

Regression

Mean of Neighbors:
$$\hat{y} = \frac{1}{K} \sum_{i=1}^{K} y_i$$
Weighted Mean:
$$\hat{y} = \frac{\sum_{i=1}^{K} w_i \cdot y_i}{\sum_{i=1}^{K} w_i}$$
Inverse Distance Weight:
$$w_i = \frac{1}{d(\mathbf{x}_q, \mathbf{x}_i)^2}$$

For regression, KNN averages the target values of the $K$ closest neighbors. Inverse-distance weighting often improves accuracy by reducing the effect of distant neighbors.
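The weighted-mean formula above can be sketched directly, again assuming the neighbor distances and targets are already known (function name is illustrative):

```python
import numpy as np

def knn_regress(distances, targets):
    # Inverse-squared-distance weighted mean of the neighbor targets
    d = np.asarray(distances, dtype=float)
    t = np.asarray(targets, dtype=float)
    w = 1.0 / d ** 2
    return np.sum(w * t) / np.sum(w)

# Closer neighbor (d=1) pulls the estimate toward its target
print(knn_regress([1.0, 2.0], [10.0, 20.0]))  # 12.0
```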

Feature Scaling

Min-Max Normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
Scales to:
$x' \in [0, 1]$
Z-Score Standardization:
$$z = \frac{x - \mu}{\sigma}$$
Result:
$\mu = 0, \quad \sigma = 1$

Feature scaling is critical for KNN. Without it, features with larger ranges dominate the distance calculation. Always fit on training data, then transform test data using the same parameters.
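The fit-on-train, transform-on-test discipline for min-max scaling looks like this (a sketch mirroring what a library scaler does; function names are illustrative):

```python
import numpy as np

def fit_minmax(X_train):
    # Learn per-feature min and max on the training data only
    return X_train.min(axis=0), X_train.max(axis=0)

def transform_minmax(X, lo, hi):
    # Apply the training-set parameters to any split (train or test)
    return (X - lo) / (hi - lo)

X_train = np.array([[0.0, 100.0], [10.0, 200.0]])
X_test = np.array([[5.0, 150.0]])
lo, hi = fit_minmax(X_train)
print(transform_minmax(X_test, lo, hi))  # [[0.5 0.5]]
```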

Curse of Dimensionality

Volume of Hypersphere:
$$V_d(r) = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)} r^d$$
Key Ratio:
$$\frac{d_{\max} - d_{\min}}{d_{\min}} \to 0 \quad \text{as } d \to \infty$$
  • In high dimensions, all points become nearly equidistant
  • Distance metrics lose discriminative power
  • Need exponentially more data: $n \propto e^d$
  • Most volume of a hypersphere concentrates near the surface
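The distance-concentration effect (the key ratio above shrinking toward zero) can be observed directly with a small simulation on random points in the unit cube:

```python
import numpy as np

# As d grows, (d_max - d_min) / d_min shrinks: all points look equidistant
rng = np.random.default_rng(1)
for d in [2, 100, 10000]:
    X = rng.random((500, d))   # 500 random points in the unit hypercube
    q = rng.random(d)          # one random query point
    dists = np.linalg.norm(X - q, axis=1)
    ratio = (dists.max() - dists.min()) / dists.min()
    print(d, round(ratio, 3))
```

The printed ratio drops sharply as $d$ increases, which is exactly why the nearest neighbor stops being meaningfully "nearest" in high dimensions.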

Solutions:

  • PCA or t-SNE for dimensionality reduction
  • Feature selection to remove irrelevant features
  • Use Manhattan distance (more robust in high-$d$)

Efficient Search

Brute Force:
$O(n \cdot d)$ per query
KD-Tree:
$O(d \cdot \log n)$ average per query
KD-Tree Build:
$O(n \cdot d \cdot \log n)$
Ball Tree:
$O(d \cdot \log n)$ average per query
  • Brute Force: best for small $n$ or high $d$ (above ~20)
  • KD-Tree: partitions space by axis-aligned splits; degrades in high dimensions
  • Ball Tree: uses hypersphere partitions; better than KD-Tree for $d > 20$
  • LSH (Locality-Sensitive Hashing): approximate NN with sublinear query time
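A compact KD-tree nearest-neighbor sketch, showing the axis-aligned splits and branch pruning (simplified for illustration; real implementations add leaf buckets and $k > 1$ support):

```python
import numpy as np

def build_kdtree(points, depth=0):
    # Split on alternating axes at the median (axis-aligned partitions)
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_nearest(node, query, best=None):
    # Depth-first search; prune branches farther than the current best
    if node is None:
        return best
    if best is None or (np.linalg.norm(node["point"] - query)
                        < np.linalg.norm(best - query)):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = kdtree_nearest(node[near], query, best)
    if abs(diff) < np.linalg.norm(best - query):  # far side may hold a closer point
        best = kdtree_nearest(node[far], query, best)
    return best

rng = np.random.default_rng(2)
X = rng.random((200, 2))
q = np.array([0.5, 0.5])
tree = build_kdtree(X)
nearest = kdtree_nearest(tree, q)
brute = X[np.argmin(np.linalg.norm(X - q, axis=1))]
print(np.allclose(nearest, brute))  # tree search agrees with brute force
```

In low dimensions most branches are pruned, giving the $O(\log n)$ average query; in high dimensions almost nothing can be pruned, which is why the tree degrades toward brute force.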

Advantages & Limitations

Advantages:

  • No training phase (lazy learner) - $O(1)$ training
  • Naturally handles multi-class classification
  • Simple, intuitive, and easy to interpret
  • Non-parametric: makes no assumptions on data distribution
  • Adapts to any decision boundary shape

Limitations:

  • Slow prediction: $O(n \cdot d)$ per query (brute force)
  • High memory usage - stores entire training set
  • Sensitive to irrelevant and redundant features
  • Struggles with imbalanced datasets
  • Performance degrades in high dimensions

Common Mistakes

  • Not scaling features before computing distances
  • Choosing $K$ without cross-validation
  • Using even $K$ for binary classification (risks voting ties)
  • Ignoring the curse of dimensionality with many features
  • Not removing irrelevant or noisy features
  • Using Euclidean distance on categorical variables
  • Applying KNN to very large datasets without approximate NN
  • Forgetting to use the same scaler on train and test data

Interview Quick-Fire

Q: Why is KNN called a "lazy learner"?

A: Because it performs no training. It simply stores the dataset and defers all computation to prediction time, unlike eager learners (e.g., SVM, logistic regression) that build a model during training.

Q: How does KNN handle ties in classification?

A: Common strategies include using an odd $K$, applying distance-weighted voting so closer neighbors break the tie, or randomly selecting among the tied classes.

Q: What is the time complexity of KNN prediction?

A: Brute force is $O(n \cdot d)$ per query. KD-Trees and Ball Trees reduce this to $O(d \cdot \log n)$ on average. LSH provides approximate nearest neighbors in sublinear query time.

Q: Why is feature scaling essential for KNN?

A: KNN relies on distance. If features have different scales, those with larger ranges dominate the distance. Scaling ensures all features contribute equally.

Q: KNN vs. Naive Bayes - when to choose which?

A: Choose Naive Bayes for high-dimensional sparse data (text), fast training, and when features are roughly independent. Choose KNN for low-dimensional data with complex, non-linear boundaries.

Q: How does the curse of dimensionality affect KNN?

A: In high dimensions, distances between all points converge, making nearest neighbors no more meaningful than random points. This degrades KNN's performance significantly.

Q: Is KNN parametric or non-parametric?

A: Non-parametric. It makes no assumptions about the underlying data distribution and the model complexity grows with the training set size.

Q: How do you handle imbalanced classes in KNN?

A: Use distance-weighted voting so closer neighbors have more influence, oversample the minority class, undersample the majority class, or use algorithms like SMOTE before applying KNN.
