Logistic Regression Cheat Sheet
Everything you need on one page. Perfect for revision, interviews, and quick reference.
The BCE loss is convex in the weights, so gradient descent is guaranteed to reach the global minimum.
Gradient = (prediction error) × (input features)
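The "gradient = (prediction error) × (input features)" rule can be sketched in NumPy. The data values and zero initialization below are illustrative, not from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features (hypothetical values)
X = np.array([[0.5, 1.2], [1.0, -0.3], [-1.5, 0.8], [0.2, 0.2]])
y = np.array([1, 1, 0, 0])
w = np.zeros(2)

# BCE gradient w.r.t. w: prediction error projected onto the features
error = sigmoid(X @ w) - y      # (prediction - label), shape (4,)
grad = X.T @ error / len(y)     # average over samples, shape (2,)
print(grad)
```

Note the gradient has one entry per feature: each is the average of (error × that feature) over the samples.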
L1 produces sparse weights (automatic feature selection). L2 shrinks all weights toward zero but rarely to exactly zero.
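The sparsity difference shows up directly in a single regularized update step. Below is a minimal sketch with hypothetical weights: an L2 step scales every weight down, while an L1 step (soft-thresholding) snaps small weights to exactly zero:

```python
import numpy as np

w = np.array([0.05, -0.8, 0.002, 1.5])   # hypothetical weights
lam, lr = 0.1, 1.0                        # regularization strength, step size

# L2 step: shrinks every weight proportionally; none become exactly zero
w_l2 = w * (1 - lr * lam)

# L1 step (soft-thresholding): weights smaller than the threshold become 0
w_l1 = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

print(w_l2)   # all four entries still nonzero
print(w_l1)   # the two small entries are now exactly zero
```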
Q: Why use binary cross-entropy (BCE) loss instead of MSE?
A: MSE with sigmoid creates a non-convex loss surface with local minima. BCE is convex, ensuring convergence to the global optimum.
Q: Does logistic regression have a closed-form solution?
A: No. The sigmoid makes the equation nonlinear in the weights, so we use iterative methods such as gradient descent.
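Since there is no closed form, training is an iterative loop. A minimal gradient-descent sketch on a toy linearly separable dataset (all values hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data (illustrative)
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, -1.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])
Xb = np.hstack([X, np.ones((4, 1))])   # append a bias column
w = np.zeros(3)

# No closed form: repeatedly step down the BCE gradient
for _ in range(2000):
    p = sigmoid(Xb @ w)
    w -= 0.5 * (Xb.T @ (p - y)) / len(y)

print((sigmoid(Xb @ w) > 0.5).astype(int))  # predictions should match y
```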
Q: How do you interpret the coefficients?
A: Each coefficient is the change in log-odds for a one-unit increase in the corresponding feature (equivalently, the odds are multiplied by $e^{\beta}$).
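This interpretation can be checked numerically: increasing a feature by one unit changes the log-odds by exactly the coefficient. The weight and intercept below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    return np.log(p / (1 - p))

w, b = 0.7, -0.2   # hypothetical fitted weight and intercept
x = 1.5
p0 = sigmoid(w * x + b)         # probability at x
p1 = sigmoid(w * (x + 1) + b)   # probability at x + 1

# The log-odds change for a one-unit increase equals the coefficient w
print(log_odds(p1) - log_odds(p0))   # 0.7 (up to float error)
```

Equivalently, the odds are multiplied by exp(0.7) ≈ 2.01 per unit increase.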
Q: How is logistic regression related to neural networks?
A: Logistic regression is a single-layer neural network with sigmoid activation. It is the building block of deep learning.
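The equivalence is literal: one fully connected unit with a sigmoid activation computes exactly the logistic regression prediction. A sketch with hypothetical parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_sigmoid_unit(x, W, b):
    # One fully connected neuron with sigmoid activation
    return sigmoid(W @ x + b)

# Hypothetical fitted logistic-regression parameters and input
w = np.array([0.4, -1.1])
b = 0.3
x = np.array([2.0, 0.5])

logreg_prob = sigmoid(w @ x + b)         # logistic regression prediction
nn_prob = dense_sigmoid_unit(x, w, b)    # same computation as one neuron
print(logreg_prob == nn_prob)
```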
Q: When should you prefer logistic regression over more complex models?
A: When interpretability matters, data is limited, the classes are (roughly) linearly separable, or you need calibrated probabilities.
Q: What is the decision boundary?
A: The hyperplane where $w^Tx + b = 0$, i.e., where the predicted probability is exactly 0.5.
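This can be verified directly: any point satisfying $w^Tx + b = 0$ gets probability exactly 0.5. The weights below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2D model: boundary is the line 1.0*x1 - 2.0*x2 + 0.5 = 0
w = np.array([1.0, -2.0])
b = 0.5

# A point on that line: 1.0*1.5 - 2.0*1.0 + 0.5 = 0
x_on = np.array([1.5, 1.0])
print(sigmoid(w @ x_on + b))   # 0.5
```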