Random Forest Cheat Sheet

Your quick reference for Random Forest -- from bootstrap sampling and feature randomness to OOB error and hyperparameter tuning.

Key Formulas

Bootstrap Probability:
$$P(\text{not selected}) = \left(1 - \frac{1}{n}\right)^n \approx e^{-1} \approx 0.368$$
Ensemble (Classification):
$$\hat{y} = \text{mode}\left(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_B\right)$$
Ensemble (Regression):
$$\hat{y} = \frac{1}{B}\sum_{b=1}^{B} f_b(\mathbf{x})$$
Ensemble Variance:
$$\text{Var}(\bar{f}) = \rho\sigma^2 + \frac{1 - \rho}{B}\sigma^2$$
Feature Subset Size:
$$m = \lfloor\sqrt{p}\rfloor \text{ (classification)}, \quad m = \left\lfloor \frac{p}{3} \right\rfloor \text{ (regression)}$$

Bootstrap Aggregating (Bagging)

Sampling:
Draw $n$ samples with replacement from original dataset of size $n$
Unique Samples:
$$\approx 63.2\%\text{ unique samples per bootstrap}$$
OOB Samples:
$$\approx 36.8\%\text{ left out per tree (Out-of-Bag)}$$
Aggregation:
Majority vote (classification) or average (regression) across all $B$ trees

Each tree sees a different random subset of the training data. This diversity among trees is what reduces variance and prevents overfitting compared to a single decision tree.
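The 63.2% / 36.8% split above can be verified empirically. A minimal NumPy sketch: draw one bootstrap sample of size $n$ and count the fraction of unique indices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # dataset size

# One bootstrap sample: n indices drawn with replacement
bootstrap_idx = rng.integers(0, n, size=n)

unique_frac = len(np.unique(bootstrap_idx)) / n
oob_frac = 1 - unique_frac

print(f"unique in bootstrap: {unique_frac:.3f}")  # close to 1 - e^-1 = 0.632
print(f"out-of-bag:          {oob_frac:.3f}")     # close to e^-1 = 0.368
```

The out-of-bag fraction converges to $e^{-1} \approx 0.368$ as $n$ grows, matching the bootstrap probability formula above.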

Feature Randomness

Random Subset:
At each split, only $m$ random features out of $p$ total are considered
Decorrelation Effect:
$$\text{Var}(\bar{f}) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2 \xrightarrow{\rho \to 0} \frac{\sigma^2}{B}$$
Standard Choices:
$$m = \lfloor\sqrt{p}\rfloor, \quad m = \lfloor\log_2(p)\rfloor, \quad m = \left\lfloor\frac{p}{3}\right\rfloor$$

Feature randomness decorrelates trees -- lowering $\rho$ in the variance formula. Even if one feature is highly predictive, not every tree will use it at the root, creating diverse tree structures.
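The variance formula can be checked with a quick simulation: build $B$ equicorrelated Gaussian "estimators" with pairwise correlation $\rho$ and compare the empirical variance of their mean against $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. (A toy model of the ensemble, not actual trees.)

```python
import numpy as np

rng = np.random.default_rng(0)
B, rho, sigma = 50, 0.3, 1.0
n_trials = 200_000

# Equicorrelated estimators: a shared component (weight sqrt(rho))
# plus independent noise (weight sqrt(1 - rho)) gives correlation rho
shared = rng.standard_normal((n_trials, 1))
noise = rng.standard_normal((n_trials, B))
estimators = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise)

empirical = estimators.mean(axis=1).var()
theoretical = rho * sigma**2 + (1 - rho) / B * sigma**2

print(f"empirical:   {empirical:.4f}")
print(f"theoretical: {theoretical:.4f}")  # 0.3 + 0.7/50 = 0.314
```

Note that the $\rho\sigma^2$ term does not shrink as $B$ grows, which is exactly why lowering $\rho$ via feature randomness matters more than adding trees.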

Out-of-Bag (OOB) Error

OOB Prediction:
$$\hat{y}_i^{\text{OOB}} = \text{aggregate}\left(\{f_b(\mathbf{x}_i) : i \notin \mathcal{B}_b\}\right)$$
OOB Error:
$$\text{OOB Error} = \frac{1}{n}\sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{\text{OOB}}\right)$$
  1. Each sample $\mathbf{x}_i$ is predicted only by trees that did not include it in their bootstrap sample
  2. OOB error approximates leave-one-out cross-validation
  3. No need for a separate validation set -- built-in honest estimate
  4. Enable in scikit-learn with oob_score=True
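A minimal scikit-learn sketch of the built-in OOB estimate, using a synthetic dataset (assumes scikit-learn is installed; dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,   # score each sample only with trees that never saw it
    random_state=0,
    n_jobs=-1,
)
rf.fit(X, y)

print(f"OOB accuracy: {rf.oob_score_:.3f}")
```

No train/validation split was needed: `oob_score_` is computed from the samples each tree left out of its bootstrap.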

Feature Importance

Gini Importance:
$$\text{Imp}(X_j) = \sum_{\text{nodes splitting on } X_j} \Delta G, \quad \text{where } \Delta G = G_{\text{parent}} - \sum_k \frac{n_k}{n_{\text{parent}}} G_k$$
Permutation Importance:
$$\text{Imp}(X_j) = \text{Score}_{\text{original}} - \text{Score}_{\text{permuted } X_j}$$
Gini Impurity:
$$G = 1 - \sum_{k=1}^{K} p_k^2$$

Gini importance (default in sklearn) is biased toward high-cardinality features. Permutation importance is more reliable and model-agnostic. SHAP values provide theoretically grounded, per-prediction importance.
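A hedged sketch comparing the two methods in scikit-learn on a synthetic dataset (feature counts and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Gini (impurity-based) importance: free byproduct of training,
# but biased toward high-cardinality features
gini_imp = rf.feature_importances_

# Permutation importance: score drop when each feature is shuffled,
# measured on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
perm_imp = perm.importances_mean

print("Gini:       ", gini_imp.round(3))
print("Permutation:", perm_imp.round(3))
```

Computing permutation importance on a held-out set (as above) avoids crediting features the forest merely memorized.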

Hyperparameters

n_estimators:
Number of trees. Typical range: 100--500. More trees = better but slower. Performance plateaus eventually.
max_depth:
Maximum tree depth. Default: None (fully grown). Limit to prevent overfitting on noisy data.
max_features:
Features per split: $\sqrt{p}$, $\log_2(p)$, or tune via CV. Controls tree correlation $\rho$.
min_samples_split:
Minimum samples to split a node. Default: 2. Increase for regularization.
min_samples_leaf:
Minimum samples in a leaf node. Default: 1. Increase to smooth predictions.
bootstrap:
Default: True. Set to False to use the entire dataset per tree (loses OOB capability).
n_jobs:
Set to -1 for full parallelism. Trees are independent and train in parallel.

Random Forest is remarkably robust to hyperparameters. Start with defaults, then tune n_estimators, max_features, and max_depth using cross-validation.
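A minimal cross-validation tuning sketch over the three parameters named above (the grid values are illustrative starting points, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,       # 3-fold cross-validation per parameter combination
    n_jobs=-1,  # trees and folds are independent, so parallelize freely
)
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` samples the space instead of exhausting it and usually finds comparable parameters faster.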

Pros vs Cons

Pros:

  • No feature scaling required -- tree-based splits are scale-invariant
  • Handles mixed feature types naturally (missing-value support depends on the implementation)
  • Built-in feature importance via Gini or permutation methods
  • Fully parallelizable -- each tree trains independently
  • Robust to outliers in features -- splits depend on value ordering, not magnitude, so extreme values have limited influence
  • Rarely overfits with more trees -- adding trees reduces variance without increasing bias

Cons:

  • Less interpretable than a single decision tree -- hundreds of trees are hard to visualize
  • Memory intensive -- stores all trees in memory at prediction time
  • Slower prediction than a single tree -- must query all $B$ trees
  • Default Gini importance is biased for high-cardinality and continuous features
  • Cannot extrapolate beyond training range (regression) -- predictions bounded by seen values

Interview Quick-Fire

Q: What is a Random Forest?

A: An ensemble of decision trees, each trained on a bootstrap sample with random feature subsets at each split. Final prediction is the majority vote (classification) or average (regression) of all trees. Combines bagging with feature randomness to reduce variance.

Q: Bagging vs. Boosting -- what is the difference?

A: Bagging (used in RF) trains trees independently on bootstrap samples and aggregates via voting/averaging -- it reduces variance. Boosting (e.g., XGBoost) trains trees sequentially, where each new tree corrects errors of previous ones -- it reduces bias. Bagging is parallel; boosting is sequential.

Q: What is OOB error and why is it useful?

A: Out-of-Bag error uses the ~36.8% of samples left out of each bootstrap to evaluate that tree. Each sample is predicted only by trees that did not train on it. OOB error approximates leave-one-out CV without needing a separate validation set, saving data and computation.

Q: How does Random Forest measure feature importance?

A: Two main methods: (1) Gini importance -- total decrease in impurity across all splits using that feature, averaged over trees. (2) Permutation importance -- measures accuracy drop when a feature's values are randomly shuffled. Permutation importance is preferred as Gini importance is biased toward high-cardinality features.

Q: Random Forest vs. single Decision Tree -- when to pick RF?

A: Always prefer RF when accuracy matters -- it reduces the high variance of individual trees by averaging many decorrelated trees. A single tree is preferred only when full interpretability is essential (e.g., clinical decision rules). RF sacrifices interpretability for significantly better generalization.

Q: When should you use Random Forest?

A: RF excels with tabular data, mixed feature types, when you need a strong baseline with minimal tuning, and when feature importance is desired. It works well for both classification and regression. Avoid RF when you need real-time low-latency predictions, extrapolation, or when data is very high-dimensional and sparse (e.g., text).

Q: How does Random Forest handle overfitting?

A: RF resists overfitting through two mechanisms: (1) Bootstrap sampling gives each tree a different training set, and (2) random feature subsets at each split decorrelate the trees. More trees never increase overfitting -- they only reduce variance. The ensemble variance formula shows: as $B \to \infty$, variance approaches $\rho\sigma^2$, where $\rho$ is controlled by max_features.

Q: Can Random Forest be used for regression?

A: Yes. RandomForestRegressor averages predictions from all trees: $\hat{y} = \frac{1}{B}\sum f_b(\mathbf{x})$. Key difference from classification: uses $m = \lfloor p/3 \rfloor$ features per split (vs. $\sqrt{p}$) and mean squared error for splitting. Limitation: RF regression cannot extrapolate beyond the range of training target values.
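The extrapolation limitation can be demonstrated directly: train on a linear target over $[0, 10]$ and predict far outside that range. A sketch assuming scikit-learn is installed (data is synthetic and illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(300, 1))
y_train = 2.0 * X_train.ravel() + rng.normal(0, 0.5, 300)  # targets roughly in [0, 20]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# Every tree predicts a mean of training targets in some leaf, so the
# forest's prediction can never exceed the training target range
pred_far = rf.predict([[100.0]])[0]  # true value would be ~200
print(f"prediction at x=100: {pred_far:.2f} (max training target: {y_train.max():.2f})")
```

For targets with strong trends, a common workaround is to model the trend separately (e.g. a linear term) and fit the forest on the residuals.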
