Mastering Conditional Probability and Markov Models

Conditional Probability

Instead of saying: "The chance of rain given that I saw a wet umbrella," we write it as: P(rain | wet umbrella).

The Difference in One Sentence

  • P(ice cream | sunny): filter first? YES, only look at sunny days. Divide by the number of sunny days (5).
  • P(sunny AND ice cream): filter first? NO, look at everything. Divide by the total number of days (10).

Think of It This Way 🧠

  • AND = out of the whole world.
  • GIVEN = out of a smaller filtered world.

So now you know two things:

  • P(A | B) = out of only the B days, how many have A (divide by B days).
  • P(A, B) = out of ALL days, how many have both (divide by total days).

The fundamental formulas:

  • P(A, B) = P(A | B) × P(B)
  • P(A | B) = P(A, B) / P(B)
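Both quantities can be checked directly against a day log. A minimal sketch, assuming a made-up 10-day record with 5 sunny days (the ice-cream counts are invented for illustration):

```python
# Toy 10-day log: each entry is (sunny, ate_ice_cream). Counts are made up.
days = [
    (True, True), (True, True), (True, True), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False), (False, False),
]

total = len(days)                          # 10 days: the whole world
sunny = [d for d in days if d[0]]          # filter first: the smaller "given" world
both = [d for d in days if d[0] and d[1]]  # sunny AND ice cream

p_and = len(both) / total         # out of ALL days -> 3/10 = 0.3
p_given = len(both) / len(sunny)  # out of only sunny days -> 3/5 = 0.6

# The fundamental formula ties them together: P(A, B) = P(A | B) * P(B).
print(p_and, p_given)
```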

What You Know So Far

  • P(A): chance of A out of everything. Formula: count of A / total.
  • P(A, B): chance of BOTH A and B. Formula: count of both / total.
  • P(A | B): chance of A, knowing B happened. Formula: filter to B first, then count A.
  • The formula connecting all three: P(A | B) = P(A, B) / P(B).

The Recipe: Do This Every Time

Step 1: Imagine 100 people. Always start with 100 people total.

Step 2: Split them into 2 groups. Use the first percentage given (e.g., "10% have a cavity" → 10 people have a cavity, 90 don't).

Step 3: Find the AND people in EACH group. Use the percentages given for each group (e.g., "75% of cavity people have a toothache" → 10 × 0.75 = 7.5, and "10% of no-cavity people have a toothache" → 90 × 0.10 = 9).

Step 4: Add the AND people together. (e.g., 7.5 + 9 = 16.5 total toothache people).

Step 5: Divide. Your specific AND group / total AND people (e.g., 7.5 / 16.5 = 45.5%).

🧠 One-Line Summary

Split 100 people → find the AND in each group → add them → divide!
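The five steps run as plain arithmetic. A sketch using the cavity/toothache numbers above; the 10% toothache rate for the no-cavity group is an assumption, chosen so the numbers match the 9 in Step 4:

```python
people = 100  # Step 1: always start with 100 people.

# Step 2: split by the first percentage given (10% have a cavity).
cavity = people * 0.10         # 10 people
no_cavity = people * 0.90      # 90 people

# Step 3: find the AND people in EACH group.
cavity_and_ache = cavity * 0.75        # 75% of cavity people -> 7.5
no_cavity_and_ache = no_cavity * 0.10  # assumed 10% of no-cavity people -> 9

# Step 4: add the AND people together.
total_ache = cavity_and_ache + no_cavity_and_ache  # 16.5

# Step 5: divide your specific AND group by the total AND people.
p_cavity_given_ache = cavity_and_ache / total_ache
print(round(p_cavity_given_ache, 3))  # 0.455
```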

The Pattern That Fits Every Problem

  • Prior: the FIRST percentage given (read from the problem).
  • Likelihood: the percentage for the 'yes' group (read from the problem).
  • AND: Prior × Likelihood (multiply).
  • Evidence: the total of both AND groups (add both AND groups).
  • Posterior: the final answer (AND / Evidence).
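The pattern fits in one small function for the two-group case. The function and parameter names below are my own, not a standard API:

```python
def posterior(prior, likelihood_yes, likelihood_no):
    """Bayes via the pattern: Posterior = AND / Evidence (two groups)."""
    and_yes = prior * likelihood_yes      # AND for the 'yes' group
    and_no = (1 - prior) * likelihood_no  # AND for the 'no' group
    evidence = and_yes + and_no           # total of both AND groups
    return and_yes / evidence             # posterior

# Cavity/toothache numbers: prior 10%, likelihoods 75% (cavity) and 10% (no cavity).
print(round(posterior(0.10, 0.75, 0.10), 3))  # 0.455
```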

Independence Rules

  • P(A) = P(A | B): ✅ INDEPENDENT, B gives no new info about A.
  • P(A) ≠ P(A | B): ❌ DEPENDENT, B gives new info about A.

For conditional independence, check: P(A, B | C) = P(A | C) × P(B | C).
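To make the independence check concrete, here is a sketch over a small made-up joint distribution (the numbers are chosen so A and B come out independent):

```python
import math

# Hypothetical joint distribution over A (first coord) and B (second coord).
joint = {
    (True, True): 0.12, (True, False): 0.28,
    (False, True): 0.18, (False, False): 0.42,
}

p_a = sum(p for (a, b), p in joint.items() if a)  # P(A) = 0.40
p_b = sum(p for (a, b), p in joint.items() if b)  # P(B) = 0.30
p_a_given_b = joint[(True, True)] / p_b           # P(A | B) = 0.12 / 0.30 = 0.40

# P(A) == P(A | B) -> independent: B gives no new info about A.
print(math.isclose(p_a, p_a_given_b))  # True
```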

Naive Bayes Example

Step 1: Priors (Positive: 3/5, Negative: 2/5).
Step 2: Total Words (Positive: 9, Negative: 11).
Step 3: Word Probabilities (e.g., "good" in Positive: 2/9).
Step 4: Final Scores (Multiply priors by word probabilities).
Step 5: Prediction (The class with the higher score wins).

Note: Use smoothing to avoid zero probabilities: P(word|class) = (count + 1) / (total words + vocabulary size).
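A runnable sketch of the whole pipeline. The five training documents are invented, so the counts differ from the worked example above, but the steps (priors, word counts, add-one smoothing, multiply, pick the winner) are the same:

```python
from collections import Counter

# Made-up toy training set: (text, label) pairs.
docs = [
    ("good fun movie", "pos"), ("good plot", "pos"), ("great acting", "pos"),
    ("boring bad movie", "neg"), ("bad plot", "neg"),
]

classes = {"pos", "neg"}
# Step 1: priors = fraction of documents in each class.
priors = {c: sum(1 for _, y in docs if y == c) / len(docs) for c in classes}
# Step 2/3: word counts per class; vocabulary for smoothing.
counts = {c: Counter(w for text, y in docs if y == c for w in text.split())
          for c in classes}
vocab = {w for text, _ in docs for w in text.split()}

def score(text, c):
    """Step 4: prior times smoothed word probabilities for class c."""
    total = sum(counts[c].values())
    s = priors[c]
    for w in text.split():
        # Add-one smoothing: (count + 1) / (total words + vocabulary size).
        s *= (counts[c][w] + 1) / (total + len(vocab))
    return s

# Step 5: the class with the higher score wins.
pred = max(classes, key=lambda c: score("good movie", c))
print(pred)  # pos
```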

Language Modeling

A collection of text is called a Corpus.

  • Unigram: Multiply individual word probabilities.
  • Bigram: P(word2 | word1).
  • Trigram: P(word3 | word1, word2).

Markov Assumption: "I only need to know where I am NOW to predict where I'll go NEXT."
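A minimal bigram sketch over a toy corpus (the corpus text is made up; a real one would be far larger):

```python
from collections import Counter

# Tiny made-up corpus, already tokenized.
corpus = "i like cats . i like dogs . dogs like cats .".split()

unigrams = Counter(corpus)                 # count(word)
bigrams = Counter(zip(corpus, corpus[1:])) # count(word1 word2)

def p_bigram(w2, w1):
    """P(word2 | word1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("cats", "like"))  # count("like cats") = 2, count("like") = 3 -> 2/3
```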

Hidden Markov Models (HMM)

HMMs have two types of probabilities:

  • Transition: Moving from one hidden state to another.
  • Emission: Hidden state producing an observation.

Forward Algorithm: Build day by day, carrying forward the probability of ending in each hidden state, then add everything up at the end to get the probability of the observation sequence.
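A sketch of the Forward Algorithm for a two-state weather HMM; every probability below is a made-up illustration number:

```python
# Hidden states and made-up model parameters.
states = ["rainy", "sunny"]
start = {"rainy": 0.5, "sunny": 0.5}
trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},   # transition probabilities
         "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit = {"rainy": {"umbrella": 0.9, "no_umbrella": 0.1},  # emission probabilities
        "sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def forward(observations):
    """P(observation sequence): build day by day, sum at the end."""
    # Day 1: start probability times emission.
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    # Each later day: sum over every way of arriving in state s.
    for obs in observations[1:]:
        alpha = {s: sum(alpha[prev] * trans[prev][s] for prev in states)
                    * emit[s][obs]
                 for s in states}
    return sum(alpha.values())  # add everything up at the end

print(round(forward(["umbrella", "umbrella"]), 4))  # 0.3585
```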

Summary Cheat Sheet

  • Given states, want probability? Just multiply!
  • Given observations, want their probability? Use Forward Algorithm!
  • Given observations, want hidden states? Use Viterbi!
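For the last bullet, a self-contained Viterbi sketch (same two-state weather setup as in the Forward Algorithm description, all numbers invented): it runs like the forward pass but takes the MAX instead of the sum, and remembers which path achieved it.

```python
# Hidden states and made-up model parameters.
states = ["rainy", "sunny"]
start = {"rainy": 0.5, "sunny": 0.5}
trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
         "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit = {"rainy": {"umbrella": 0.9, "no_umbrella": 0.1},
        "sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def viterbi(observations):
    """Most likely hidden-state sequence for the observations."""
    # best[s] = (probability of the best path ending in s, that path).
    best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        # Keep only the single best way of arriving in each state.
        best = {s: max(((p * trans[prev][s] * emit[s][obs], path + [s])
                        for prev, (p, path) in best.items()),
                       key=lambda t: t[0])
                for s in states}
    prob, path = max(best.values(), key=lambda t: t[0])
    return path

print(viterbi(["umbrella", "umbrella", "no_umbrella"]))
```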