Naive Bayes
Cheat Sheet
Everything you need on one page. Perfect for revision, interviews, and quick reference.
Gaussian Naive Bayes: Used for continuous features. Assumes each feature follows a normal distribution within each class.
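As a minimal sketch of the Gaussian variant's per-feature likelihood (the class mean and variance here are made-up numbers):

```python
import math

def gaussian_pdf(x, mean, var):
    # Density of x under a normal distribution with the given mean and variance;
    # Gaussian NB evaluates this per feature, per class, using statistics
    # estimated from the training data.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical class statistics for one continuous feature (e.g. height in metres)
print(gaussian_pdf(1.80, 1.75, 0.01))  # density is highest near the class mean
```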
Multinomial Naive Bayes: Best for text classification with word counts (bag-of-words model).
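A minimal sketch of the multinomial model on a made-up spam/ham corpus: training is just counting word frequencies per class, and scoring sums smoothed log-likelihoods.

```python
from collections import Counter
import math

# Toy corpus: (text, label) pairs; all documents and labels are hypothetical
docs = [("buy cheap pills now", "spam"),
        ("cheap pills buy buy", "spam"),
        ("meeting agenda for monday", "ham"),
        ("agenda notes from the meeting", "ham")]

# "Training" is just counting word occurrences per class
counts = {"spam": Counter(), "ham": Counter()}
for text, label in docs:
    counts[label].update(text.split())

vocab = {w for c in counts.values() for w in c}

def log_likelihood(text, label, alpha=1.0):
    # log P(text | class) under the multinomial model, with Laplace smoothing
    c = counts[label]
    total = sum(c.values())
    return sum(math.log((c[w] + alpha) / (total + alpha * len(vocab)))
               for w in text.split())

print(log_likelihood("cheap pills", "spam") > log_likelihood("cheap pills", "ham"))  # True
```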
Bernoulli Naive Bayes: For binary features. Models both presence AND absence of features (unlike Multinomial, which ignores absence).
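A small sketch of the presence/absence scoring, using hypothetical per-class probabilities; note that the absent word contributes a factor of 1 - p:

```python
import math

# Hypothetical Bernoulli parameters: P(word present | class)
p_present = {"spam": {"free": 0.8, "meeting": 0.1},
             "ham":  {"free": 0.1, "meeting": 0.7}}

def log_likelihood(present_words, label):
    # Bernoulli NB scores EVERY vocabulary word: log p if present, log(1 - p) if absent
    total = 0.0
    for word, p in p_present[label].items():
        total += math.log(p) if word in present_words else math.log(1 - p)
    return total

doc = {"free"}  # "meeting" is absent, and that absence itself counts as evidence
print(log_likelihood(doc, "spam") > log_likelihood(doc, "ham"))  # True
```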
Laplace (add-one) smoothing: Prevents zero probabilities from destroying predictions. Essential for real-world applications.
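A tiny numeric sketch of why smoothing matters (the counts and vocabulary size are made up): without it, a word never seen in a class gets probability zero.

```python
# Hypothetical training counts: this word never appeared in the class's documents
count = 0          # occurrences of the word in the class
total = 500        # total words observed in the class
vocab_size = 10_000

unsmoothed = count / total                         # 0.0 -> zeroes the whole product
smoothed = (count + 1) / (total + vocab_size)      # Laplace (add-one) smoothing
print(unsmoothed, smoothed)                        # small but nonzero after smoothing
```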
Fast training: No iterative optimization needed - training is just counting, and prediction is just multiplying.
Q: Why is Naive Bayes called "naive"?
A: Because it assumes all features are conditionally independent given the class label - a simplification that rarely holds in practice but works surprisingly well.
Q: When should you reach for Naive Bayes?
A: When you have very little training data, need extremely fast training/prediction, or have a very high-dimensional feature space (like text with thousands of words).
Q: How does Naive Bayes handle missing feature values at prediction time?
A: Simply omit the missing feature from the likelihood product. Since features are assumed independent, this is mathematically valid.
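A sketch of dropping a missing feature from the (log-space) product, using made-up per-class log-likelihoods for three features:

```python
import math

# Hypothetical per-class log-likelihoods for three features; feature 1 is missing (None)
feature_loglik = {"cat": [-0.5, None, -1.2],
                  "dog": [-0.9, None, -0.4]}
log_prior = {"cat": math.log(0.5), "dog": math.log(0.5)}

def score(label):
    # Independence lets us simply skip the missing feature in the sum
    return log_prior[label] + sum(ll for ll in feature_loglik[label] if ll is not None)

print(max(feature_loglik, key=score))  # dog
```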
Q: Why are zero probabilities so dangerous?
A: If any feature has zero probability for a class, the entire posterior for that class becomes zero, regardless of all other evidence. This is why Laplace smoothing is essential.
Q: Is Naive Bayes a generative or discriminative model?
A: Generative - it models the joint distribution P(x,y) by estimating P(x|y) and P(y), then uses Bayes' rule to get P(y|x).
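A worked sketch of that Bayes' rule step, with made-up prior and likelihood values for a single observed document x:

```python
# Hypothetical estimates from a generative model
prior = {"spam": 0.3, "ham": 0.7}            # P(y)
likelihood = {"spam": 0.02, "ham": 0.001}    # P(x | y) for the observed x

# Bayes' rule: P(y | x) = P(x | y) P(y) / P(x)
evidence = sum(likelihood[y] * prior[y] for y in prior)          # P(x)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
print(posterior)  # posteriors sum to 1; "spam" wins despite its smaller prior
```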
Q: Why does Naive Bayes classify well even when its probability estimates are poor?
A: The classification decision only needs the correct ranking of class probabilities, not accurate probability values. Even biased probability estimates often produce the correct argmax.