Mastering Conditional Probability and Markov Models

Conditional Probability

Instead of writing out "the chance of rain given that I saw a wet umbrella," we write it as: P(rain | wet umbrella).

The Difference in One Sentence

Question               | Filter First?                 | Divide By
P(ice cream | sunny)   | YES, only look at sunny days  | Number of sunny days (5)
P(sunny AND ice cream) | NO, look at everything        | Total days (10)

Think of It This Way 🧠

  • AND = out of the whole world.
  • GIVEN = out of a smaller filtered world.

So now you know two things:

  • P(A | B) = out of only the B days, how many have A (divide by B days).
  • P(A, B) = out of ALL days, how many have both (divide by total days).

The fundamental formulas:

  • P(A, B) = P(A | B) × P(B)
  • P(A | B) = P(A, B) / P(B)
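
As a quick sanity check, here is a minimal Python sketch of both formulas. The 10-day log below is made up for illustration (5 sunny days, as in the table above).

```python
# Hypothetical 10-day log: (sunny?, ate ice cream?) for each day.
days = [
    (True, True), (True, True), (True, True), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False), (False, False),
]

total = len(days)                             # 10 days
sunny = sum(1 for s, _ in days if s)          # 5 sunny days (the filtered world)
both = sum(1 for s, i in days if s and i)     # 3 days: sunny AND ice cream

p_and = both / total        # P(sunny AND ice cream) = 3/10, divide by ALL days
p_given = both / sunny      # P(ice cream | sunny)   = 3/5, divide by sunny days only

# The two formulas connect the same counts: P(A, B) = P(A | B) * P(B)
assert abs(p_and - p_given * (sunny / total)) < 1e-9
print(p_and, p_given)
```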

What You Know So Far

Idea        | In Plain English                | Formula
P(A)        | Chance of A out of everything   | count of A / total
P(A, B)     | Chance of BOTH A and B          | count of both / total
P(A | B)    | Chance of A, knowing B happened | filter to B first, then count A
The formula | Connecting all three            | P(A|B) = P(A,B) / P(B)

The Recipe: Do This Every Time

Step 1: Imagine 100 people. Always start with 100 people total.

Step 2: Split them into 2 groups. Use the first percentage given (e.g., “10% have a cavity” → 10 people have a cavity, 90 don’t).

Step 3: Find the AND people in EACH group. Use the percentage given for each group (e.g., “75% of cavity people have a toothache” → 10 × 0.75 = 7.5, and “10% of no-cavity people have a toothache” → 90 × 0.10 = 9).

Step 4: Add the AND people together. (e.g., 7.5 + 9 = 16.5 total toothache people).

Step 5: Divide. Your specific AND group / total AND people (e.g., 7.5 / 16.5 = 45.5%).

🧠 One-Line Summary

Split 100 people → find the AND in each group → add them → divide!
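
Here is the same recipe as a short Python sketch, using the cavity/toothache numbers from the steps above.

```python
# Step 1: imagine 100 people.
total_people = 100

# Step 2: split by the first percentage given (10% have a cavity).
cavity = total_people * 0.10        # 10 people
no_cavity = total_people * 0.90     # 90 people

# Step 3: find the AND people in each group.
cavity_and_toothache = cavity * 0.75          # 75% of cavity people -> 7.5
no_cavity_and_toothache = no_cavity * 0.10    # 10% of no-cavity people -> 9.0

# Step 4: add the AND people together.
toothache_total = cavity_and_toothache + no_cavity_and_toothache   # 16.5

# Step 5: divide.
p_cavity_given_toothache = cavity_and_toothache / toothache_total
print(round(p_cavity_given_toothache, 3))     # ~0.455, i.e. about 45.5%
```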

The Pattern That Fits Every Problem

Name       | Always Means                       | How to Find It
Prior      | The FIRST percentage given         | Read from the problem
Likelihood | The percentage for the ‘yes’ group | Read from the problem
AND        | Prior × Likelihood                 | Multiply
Evidence   | Total of both AND groups           | Add both AND groups
Posterior  | Final answer                       | AND / Evidence
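
The table rows map directly onto a tiny reusable function. This is a sketch assuming a binary split into a ‘yes’ group and a ‘no’ group; the names mirror the rows above.

```python
def posterior(prior, likelihood_yes, likelihood_no):
    """Bayes' rule for a binary split: returns P(yes | evidence)."""
    and_yes = prior * likelihood_yes          # AND for the 'yes' group
    and_no = (1 - prior) * likelihood_no      # AND for the 'no' group
    evidence = and_yes + and_no               # total of both AND groups
    return and_yes / evidence                 # posterior = AND / evidence

print(posterior(0.10, 0.75, 0.10))  # the cavity example again: ~0.455
```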

Independence Rules

Situation     | Meaning
P(A) = P(A|B) | ✅ INDEPENDENT: B gives no new info about A
P(A) ≠ P(A|B) | ❌ DEPENDENT: B gives new info about A

For conditional independence, check: P(A, B | C) = P(A|C) × P(B|C).
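
A quick numeric check of the independence rule, using a small made-up joint distribution (conditional independence is the same kind of check, applied inside each slice of C).

```python
# Made-up joint distribution over (A, B).
p = {
    (True, True): 0.12, (True, False): 0.18,
    (False, True): 0.28, (False, False): 0.42,
}

p_a = p[(True, True)] + p[(True, False)]   # P(A)   = 0.30
p_b = p[(True, True)] + p[(False, True)]   # P(B)   = 0.40
p_a_given_b = p[(True, True)] / p_b        # P(A|B) = 0.12 / 0.40 = 0.30

# P(A) == P(A|B), so here B gives no new info about A: independent.
print(p_a, p_a_given_b)
```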

Naive Bayes Example

Step 1: Priors (Positive: 3/5, Negative: 2/5).
Step 2: Total Words (Positive: 9, Negative: 11).
Step 3: Word Probabilities (e.g., “good” in Positive: 2/9).
Step 4: Final Scores (Multiply priors by word probabilities).
Step 5: Prediction (The class with the higher score wins).

Note: Use Laplace (add-one) smoothing to avoid zero probabilities: P(word | class) = (count + 1) / (total words in class + vocabulary size).
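
A minimal Naive Bayes sketch with add-one smoothing. The five training sentences are invented so the counts match the steps above (3 positive and 2 negative documents, 9 and 11 total words, “good” appearing twice in the positive class).

```python
from collections import Counter

# Tiny made-up training set chosen to match the counts above.
train = [
    ("good fun movie", "pos"), ("very good acting", "pos"), ("great story fun", "pos"),
    ("boring bad plot bad acting", "neg"), ("terrible movie not fun at all", "neg"),
]

# Step 1: priors.
docs = Counter(label for _, label in train)
priors = {c: docs[c] / len(train) for c in docs}      # pos: 3/5, neg: 2/5

# Steps 2-3: word counts and totals per class.
words = {c: Counter() for c in docs}
for text, label in train:
    words[label].update(text.split())
totals = {c: sum(words[c].values()) for c in docs}    # pos: 9, neg: 11
vocab = {w for c in words for w in words[c]}

def score(text, c):
    """Prior times smoothed word probabilities: (count + 1) / (total + |vocab|)."""
    s = priors[c]
    for w in text.split():
        s *= (words[c][w] + 1) / (totals[c] + len(vocab))
    return s

# Steps 4-5: compute both scores, pick the larger one.
test = "good fun story"
print(max(docs, key=lambda c: score(test, c)))        # -> 'pos'
```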

Language Modeling

A collection of text is called a Corpus.

  • Unigram: P(word); multiply individual word probabilities.
  • Bigram: P(word2 | word1).
  • Trigram: P(word3 | word1, word2).

Markov Assumption: “I only need to know where I am NOW to predict where I’ll go NEXT.”
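
A minimal bigram sketch over a made-up three-sentence corpus, estimating P(word2 | word1) from counts.

```python
from collections import Counter

# Tiny made-up corpus (one sentence per string).
corpus = ["i like cheese", "i like rain", "i hate rain"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()        # <s> marks the start of a sentence
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w2, w1):
    """P(w2 | w1) = count(w1 w2) / count(w1): the Markov assumption in action."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("like", "i"))     # 2/3: after 'i', 'like' follows 2 of 3 times
print(p_bigram("rain", "like"))  # 1/2
```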

Hidden Markov Models (HMM)

HMMs have two types of probabilities:

  • Transition: Moving from one hidden state to another.
  • Emission: Hidden state producing an observation.

Forward Algorithm: Build the probability day by day, summing over the hidden states at each step, then add everything up at the end to get the probability of an observation sequence.
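
A minimal forward-algorithm sketch for a two-state weather HMM; all transition and emission probabilities below are made-up example numbers.

```python
# Hidden states and made-up HMM parameters.
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
transition = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.4, "sunny": 0.6}}
emission = {"rainy": {"umbrella": 0.9, "no umbrella": 0.1},
            "sunny": {"umbrella": 0.2, "no umbrella": 0.8}}

def forward(observations):
    """P(observation sequence): build day by day, summing over hidden states."""
    # Day 1: start probability times emission probability.
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    # Each later day: sum over every way of arriving at each state.
    for obs in observations[1:]:
        alpha = {s: sum(alpha[prev] * transition[prev][s] for prev in states)
                    * emission[s][obs]
                 for s in states}
    return sum(alpha.values())   # add everything up at the end

print(forward(["umbrella", "umbrella", "no umbrella"]))
```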

Summary Cheat Sheet

  • Given the hidden-state sequence, want its probability? Just multiply the probabilities along the path!
  • Given observations, want their probability? Use Forward Algorithm!
  • Given observations, want hidden states? Use Viterbi!
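
And a matching Viterbi sketch: the same structure as the forward algorithm, but taking the max over paths instead of summing (reusing the same made-up weather HMM).

```python
# Same toy weather HMM as the forward sketch above (made-up numbers).
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
transition = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.4, "sunny": 0.6}}
emission = {"rainy": {"umbrella": 0.9, "no umbrella": 0.1},
            "sunny": {"umbrella": 0.2, "no umbrella": 0.8}}

def viterbi(observations):
    """Most likely hidden-state sequence: like forward, but max instead of sum."""
    best = {s: (start[s] * emission[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {s: max(((p * transition[prev][s] * emission[s][obs], path + [s])
                        for prev, (p, path) in best.items()), key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["umbrella", "umbrella", "no umbrella"]))
```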