Logistic Regression Cheat Sheet

Everything you need on one page. Perfect for revision, interviews, and quick reference.

Key Formulas

Linear Model:
$$z = w^Tx + b$$
Sigmoid:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Prediction:
$$P(y=1|x) = \sigma(w^Tx + b)$$
Odds:
$$\frac{P}{1-P} = e^{w^Tx+b}$$
Log-Odds:
$$\log\left(\frac{P}{1-P}\right) = w^Tx + b$$
Sigmoid Derivative:
$$\sigma'(z) = \sigma(z)(1-\sigma(z))$$
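The formulas above can be sketched in a few lines of NumPy (the weights, bias, and input below are illustrative values, not from any real model):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# hypothetical parameters for a 2-feature model
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 1.0])

z = w @ x + b        # linear model: z = w^T x + b
p = sigmoid(z)       # P(y=1|x)
odds = p / (1 - p)   # equals e^(w^T x + b)

# sigmoid derivative: sigma'(z) = sigma(z) * (1 - sigma(z))
deriv = sigmoid(z) * (1 - sigmoid(z))
```

Note that `odds` matches `np.exp(z)` exactly, which is the odds identity above.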

Loss Function

BCE Loss:
$$\mathcal{L} = -\frac{1}{n}\sum_{i=1}^{n}[y_i\log(p_i) + (1-y_i)\log(1-p_i)]$$
When y=1:
$$\mathcal{L} = -\log(p) \quad \text{(large when } p \text{ is near 0)}$$
When y=0:
$$\mathcal{L} = -\log(1-p) \quad \text{(large when } p \text{ is near 1)}$$

The BCE loss is convex in $w$ and $b$, so gradient descent converges to the global minimum.
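A minimal NumPy sketch of the BCE loss (the labels and probabilities below are made-up sample values):

```python
import numpy as np

def bce_loss(y, p, eps=1e-12):
    # clip probabilities so log(0) never occurs
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# sample labels and predicted probabilities
y = np.array([1.0, 0.0, 1.0, 1.0])
p = np.array([0.9, 0.2, 0.6, 0.99])
loss = bce_loss(y, p)
```

Being confidently wrong (e.g. predicting p=0.01 when y=1) costs far more than being mildly unsure, which is exactly the penalty behavior described above.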

Optimization Steps

  1. Initialize weights $w$ and bias $b$ (zeros or random)
  2. Compute predictions: $p_i = \sigma(w^Tx_i + b)$
  3. Compute loss: $\mathcal{L}(w,b)$
  4. Compute gradients
  5. Update weights: $w := w - \eta \nabla_w \mathcal{L}$
  6. Repeat steps 2-5 until convergence
No closed-form solution exists, so iterative optimization is required.
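The six steps above can be sketched as a batch gradient descent loop in NumPy (the toy 1-D dataset and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=2000):
    n, d = X.shape
    w = np.zeros(d)                    # step 1: initialize weights
    b = 0.0                            # step 1: initialize bias
    for _ in range(epochs):            # step 6: repeat until convergence
        p = sigmoid(X @ w + b)         # step 2: predictions
        grad_w = X.T @ (p - y) / n     # step 4: weight gradient
        grad_b = np.mean(p - y)        # step 4: bias gradient
        w -= lr * grad_w               # step 5: update weights
        b -= lr * grad_b               # step 5: update bias
    return w, b

# toy linearly separable data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logreg(X, y)
```

(Step 3, computing the loss, is only needed for monitoring convergence and is omitted here.)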

Gradient

Weight Gradient:
$$\frac{\partial \mathcal{L}}{\partial w} = \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)x_i$$
Bias Gradient:
$$\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)$$
Update Rule:
$$w := w - \eta \cdot \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)x_i$$

Gradient = (prediction error) × (input features)
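The analytic gradient above can be verified against a finite-difference approximation of the BCE loss (random data generated for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)

# analytic gradient: (1/n) * X^T (p - y)
p = sigmoid(X @ w)
analytic = X.T @ (p - y) / len(y)

# numeric gradient via central differences
eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    numeric[j] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * eps)
```

The two gradients agree to several decimal places, confirming the "(prediction error) × (input features)" form.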

Assumptions

  • Binary outcome variable (0 or 1)
  • Linear relationship between features and log-odds
  • Observations are independent
  • Little or no multicollinearity among features
  • Large enough sample size
  • No extreme outliers in continuous predictors

Common Mistakes

  • Using MSE loss instead of BCE (non-convex!)
  • Not scaling features before training
  • Ignoring multicollinearity
  • Expecting non-linear decision boundaries
  • Confusing logistic regression with linear regression
  • Forgetting to check class imbalance
  • Interpreting coefficients as probabilities (they are log-odds!)
  • Using accuracy as the only metric for imbalanced data

Regularization

L2 (Ridge):
$$\mathcal{L} + \frac{\lambda}{2}\|w\|^2$$
L1 (Lasso):
$$\mathcal{L} + \lambda\|w\|_1$$

L1 produces sparse weights (feature selection). L2 shrinks all weights toward zero but rarely makes any exactly zero.
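The sparsity difference can be seen with scikit-learn, assuming it is available (note that its `C` parameter is the inverse of $\lambda$; the synthetic data below is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# only the first two features matter; the other eight are pure noise
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

# strong regularization: small C means large lambda
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
```

L1 drives the noise-feature coefficients to exactly zero, while L2 leaves them small but nonzero.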

Interview Questions

Q: Why not use MSE for logistic regression?

A: MSE with sigmoid creates a non-convex loss surface with local minima. BCE is convex, ensuring global optimum.

Q: Is there a closed-form solution?

A: No. The sigmoid makes the equation nonlinear in weights. We use iterative methods like gradient descent.

Q: What does the coefficient represent?

A: Each coefficient represents the change in log-odds for a one-unit increase in the corresponding feature.

Q: How is it related to neural networks?

A: Logistic regression is a single-layer neural network with sigmoid activation. It is the building block of deep learning.

Q: When would you choose logistic regression over a neural network?

A: When interpretability matters, data is limited, features are linearly separable, or you need calibrated probabilities.

Q: What is the decision boundary?

A: The hyperplane where $w^Tx + b = 0$, i.e., where the predicted probability is exactly 0.5.
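This can be checked directly: any point satisfying $w^Tx + b = 0$ maps to probability 0.5, since $\sigma(0) = 0.5$ (the parameters and point below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical trained parameters
w = np.array([2.0, -1.0])
b = 0.5

# a point on the hyperplane: 2*0.25 - 1*1.0 + 0.5 = 0
x_on_boundary = np.array([0.25, 1.0])
p = sigmoid(w @ x_on_boundary + b)
```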