Gradient Boosting Cheat Sheet
Everything you need on one page. Perfect for revision, interviews, and quick reference.
For classification, apply sigmoid ($\sigma$) or softmax to $F_M(\mathbf{x})$ to get probabilities.
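A minimal sketch of that conversion (plain NumPy; the raw score value is a made-up example):

```python
import numpy as np

def sigmoid(f):
    # Binary case: map a raw ensemble score F_M(x) to P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-f))

def softmax(f):
    # Multiclass case: one raw score per class; subtracting the max
    # keeps the exponentials numerically stable.
    z = np.exp(f - np.max(f))
    return z / z.sum()

raw_score = 0.0            # hypothetical F_M(x) for one example
print(sigmoid(raw_score))  # 0.5 -- a raw score of 0 means maximal uncertainty
```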
Start with learning_rate=0.1, max_depth=4, subsample=0.8 and tune from there.
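As a config fragment (the parameter names follow the scikit-learn / XGBoost convention; treat the values as a starting point, not a recipe):

```python
# Hypothetical starter configuration for a gradient boosting model.
starter_params = {
    "learning_rate": 0.1,  # shrinkage applied to each tree's contribution
    "max_depth": 4,        # shallow trees generalize better on most tabular data
    "subsample": 0.8,      # fit each tree on 80% of rows (stochastic boosting)
}
print(starter_params)
```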
Early stopping + low learning rate is the single most effective regularization strategy for gradient boosting.
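A sketch of that combination using scikit-learn's `GradientBoostingClassifier` (any boosting library with a validation-based stopping rule works the same way; the dataset here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = GradientBoostingClassifier(
    learning_rate=0.05,       # low shrinkage: each tree contributes little
    n_estimators=500,         # generous budget; early stopping prunes it
    validation_fraction=0.2,  # held-out split monitored during fitting
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
clf.fit(X, y)
print(clf.n_estimators_)      # trees actually fitted, typically well below 500
```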
XGBoost, LightGBM, and CatBoost typically reach similar accuracy. Choose LightGBM for speed, CatBoost for categorical data, and XGBoost for general reliability.
SHAP values are the gold standard -- they provide consistent, local+global, directional explanations.
For imbalanced classes, use scale_pos_weight or resampling.

Q: What is gradient boosting?
A: An ensemble method that builds models sequentially, where each new model corrects the errors of the previous ensemble by fitting to the negative gradient of the loss function (the pseudo-residuals).
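The fit-to-the-residuals idea can be made concrete with a minimal, self-contained sketch for squared-error regression, where the negative gradient is simply $y - F(\mathbf{x})$ (decision stumps and 1-D inputs are simplifying assumptions; real libraries grow deeper trees):

```python
import numpy as np

def fit_stump(x, r):
    # Best single-split regression stump on 1-D inputs: predict the mean
    # of r on each side of the threshold that minimizes squared error.
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def gradient_boost(x, y, n_trees=50, lr=0.1):
    # Squared-error loss: the negative gradient is the residual y - F(x),
    # so each weak learner is fitted to the current residuals.
    pred = np.full_like(y, y.mean())     # F_0: best constant prediction
    for _ in range(n_trees):
        residual = y - pred              # pseudo-residuals
        stump = fit_stump(x, residual)   # weak learner fits the residuals
        pred = pred + lr * stump(x)      # shrunken additive update
    return pred

x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)
pred = gradient_boost(x, y)
print(np.mean((y - pred) ** 2))  # training MSE shrinks as trees are added
```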
Q: How does boosting differ from bagging?
A: Boosting builds models sequentially (each depends on the previous), focusing on hard examples. Bagging (e.g., Random Forest) builds models independently in parallel and averages them.
Q: What does the learning rate do?
A: It shrinks the contribution of each tree, requiring more trees but improving generalization. Lower learning rate + more trees = better performance but slower training.
Q: Why does XGBoost use second-order gradients?
A: The Hessian (second derivative) provides curvature information, enabling more accurate split decisions and optimal leaf weights -- similar to Newton's method vs. gradient descent.
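The Newton-style reasoning behind this answer can be written out. In XGBoost's derivation, with $g_i$ and $h_i$ the first and second derivatives of the loss at the current prediction and $\lambda$ the L2 leaf penalty, the step-$t$ objective and the resulting optimal weight of leaf $j$ (instance set $I_j$) are:

```latex
% Second-order Taylor expansion of the loss at iteration t:
\mathcal{L}^{(t)} \approx \sum_i \left[ g_i\, f_t(\mathbf{x}_i)
  + \tfrac{1}{2}\, h_i\, f_t(\mathbf{x}_i)^2 \right] + \Omega(f_t)

% Optimal weight of leaf j with instance set I_j and L2 penalty \lambda:
w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

The denominator is where the Hessian enters: leaves seen with high curvature get smaller, more conservative weights.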
Q: When should you use gradient boosting?
A: For structured/tabular data with mixed feature types. It dominates when data is not images, text, or audio. Especially strong with moderate-sized datasets (1K-10M rows).