Mastering Conditional Probability and Markov Models
Conditional Probability
Instead of saying: “The chance of rain given I saw a wet umbrella,” we write it as: P(rain | wet umbrella).
The Difference in One Sentence
| Question | Filter First? | Divide By |
|---|---|---|
| P(ice cream \| sunny) | YES, only look at sunny days | Number of sunny days (5) |
| P(sunny AND ice cream) | NO, look at everything | Total days (10) |
Think of It This Way 🧠
- AND = out of the whole world.
- GIVEN = out of a smaller filtered world.
So now you know two things:
- P(A | B) = out of only the B days, how many have A (divide by B days).
- P(A, B) = out of ALL days, how many have both (divide by total days).
The fundamental formulas:
- P(A, B) = P(A | B) × P(B)
- P(A | B) = P(A, B) / P(B)
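The two formulas are re-arrangements of each other. A quick sketch with a made-up 10-day log (the counts below are assumptions for illustration, matching the 5-sunny-days example) shows they agree:

```python
# Hypothetical 10-day log: each day records (sunny?, ate ice cream?)
days = [(True, True)] * 3 + [(True, False)] * 2 + [(False, True)] * 1 + [(False, False)] * 4

total = len(days)                          # 10 days
sunny = sum(1 for s, _ in days if s)       # 5 sunny days
both = sum(1 for s, i in days if s and i)  # 3 days with both

p_b = sunny / total         # P(B): out of the whole world
p_ab = both / total         # P(A, B): out of the whole world
p_a_given_b = both / sunny  # P(A | B): filter to sunny days first

# P(A, B) = P(A | B) * P(B)
assert abs(p_ab - p_a_given_b * p_b) < 1e-12
print(p_ab, p_a_given_b, p_b)  # 0.3 0.6 0.5
```

Notice that the only difference between P(A, B) and P(A | B) is the denominator: the whole world versus the filtered world.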
What You Know So Far
| Idea | In Plain English | Formula |
|---|---|---|
| P(A) | Chance of A out of everything | count of A / total |
| P(A, B) | Chance of BOTH A and B | count of both / total |
| P(A \| B) | Chance of A, knowing B happened | filter to B first, then count A |
| The formula | Connecting all three | P(A\|B) = P(A,B) / P(B) |
The Recipe: Do This Every Time
Step 1: Imagine 100 people. Always start with 100 people total.
Step 2: Split them into 2 groups. Use the first percentage given (e.g., “10% have a cavity” → 10 people have a cavity, 90 don’t).
Step 3: Find the AND people in EACH group. Use the percentages given for each group (e.g., “75% of cavity people have a toothache” → 10 × 0.75 = 7.5).
Step 4: Add the AND people together. (e.g., if 10% of the 90 no-cavity people also have a toothache, that’s 9 more, so 7.5 + 9 = 16.5 total toothache people).
Step 5: Divide. Your specific AND group / total AND people (e.g., 7.5 / 16.5 ≈ 45.5%).
🧠 One-Line Summary
Split 100 people → find the AND in each group → add them → divide!
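The recipe can be sketched directly in code. The numbers below follow the cavity/toothache example; the 10% toothache rate in the no-cavity group is an assumption chosen so that it produces the 9 people from Step 4:

```python
# The 5-step recipe with the cavity/toothache numbers.
# Assumption: 10% of the no-cavity group has a toothache (giving the 9 in Step 4).
people = 100                    # Step 1: imagine 100 people

cavity = people * 0.10          # Step 2: split by the prior -> 10
no_cavity = people * 0.90       #                              90

ache_and_cavity = cavity * 0.75        # Step 3: AND people in each group -> 7.5
ache_and_no_cavity = no_cavity * 0.10  #                                     9.0

total_ache = ache_and_cavity + ache_and_no_cavity  # Step 4: add -> 16.5

p_cavity_given_ache = ache_and_cavity / total_ache  # Step 5: divide
print(round(p_cavity_given_ache, 3))  # 0.455
```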
The Pattern That Fits Every Problem
| Name | Always Means | How to Find It |
|---|---|---|
| Prior | The FIRST percentage given | Read from problem |
| Likelihood | Percentage for the ‘yes’ group | Read from problem |
| AND | Prior × Likelihood | Multiply |
| Evidence | Total of both AND groups | Add both AND groups |
| Posterior | Final answer | AND / Evidence |
Independence Rules
| Situation | Meaning |
|---|---|
| P(A) = P(A\|B) | ✅ INDEPENDENT → B gives no new info about A |
| P(A) ≠ P(A\|B) | ❌ DEPENDENT → B gives new info about A |
For conditional independence, check: P(A, B | C) = P(A|C) × P(B|C).
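The independence check can be run numerically on any joint table. The four probabilities below are made up for illustration (deliberately constructed so the check passes):

```python
# Made-up joint distribution P(A=a, B=b) over binary A and B (illustrative only).
joint = {(1, 1): 0.12, (1, 0): 0.18, (0, 1): 0.28, (0, 0): 0.42}

p_a = sum(p for (a, b), p in joint.items() if a == 1)  # P(A) = 0.30
p_b = sum(p for (a, b), p in joint.items() if b == 1)  # P(B) = 0.40
p_a_given_b = joint[(1, 1)] / p_b                      # P(A|B) = 0.12 / 0.40 = 0.30

# P(A) = P(A|B)  ->  independent: B gives no new info about A
print(abs(p_a - p_a_given_b) < 1e-12)  # True
```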
Naive Bayes Example
Step 1: Priors (Positive: 3/5, Negative: 2/5).
Step 2: Total Words (Positive: 9, Negative: 11).
Step 3: Word Probabilities (e.g., “good” in Positive: 2/9).
Step 4: Final Scores (Multiply priors by word probabilities).
Step 5: Prediction (The class with the higher score wins).
Note: Use smoothing to avoid zero probabilities: P(word|class) = (count + 1) / (total words + vocabulary size).
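Steps 1–5 plus the smoothing note can be sketched together. The three toy documents below are an assumption for illustration (not the corpus behind the 3/5 and 9-word counts above), but the smoothed formula is exactly the one from the note:

```python
# Minimal Naive Bayes with add-1 smoothing (toy documents are assumed for illustration).
from collections import Counter

docs = [("good movie", "pos"), ("great good fun", "pos"), ("not good", "neg")]

priors = Counter(label for _, label in docs)           # Step 1: class counts
word_counts = {"pos": Counter(), "neg": Counter()}
for text, label in docs:
    word_counts[label].update(text.split())            # Step 2: words per class

vocab = {w for c in word_counts.values() for w in c}   # vocabulary size for smoothing

def score(text, label):
    # Step 3-4: prior * product of smoothed P(word | class)
    total = sum(word_counts[label].values())
    s = priors[label] / len(docs)
    for w in text.split():
        s *= (word_counts[label][w] + 1) / (total + len(vocab))
    return s

# Step 5: the class with the higher score wins
pred = max(("pos", "neg"), key=lambda label: score("good fun", label))
print(pred)  # pos
```

Without the `+ 1` smoothing, any unseen word would zero out the whole product, which is exactly why the note above matters.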
Language Modeling
A collection of text is called a Corpus.
- Unigram: Multiply individual word probabilities.
- Bigram: P(word2 | word1).
- Trigram: P(word3 | word1, word2).
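Estimating a bigram probability from a corpus is just counting: P(word2 | word1) = count(word1 word2) / count(word1). A sketch on an assumed eight-word toy corpus:

```python
# Bigram probabilities from a tiny assumed corpus (illustrative counts only).
from collections import Counter

corpus = "i like cats i like dogs i sleep".split()

unigrams = Counter(corpus)                  # count(word1)
bigrams = Counter(zip(corpus, corpus[1:]))  # count(word1 word2)

def p_bigram(w1, w2):
    # P(word2 | word1) -- filter to places where word1 occurred, then count word2
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("i", "like"))  # 2/3: "i" occurs 3 times, followed by "like" twice
```

Note this is the same filter-first idea as conditional probability: the denominator is the smaller world where word1 already happened.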
Markov Assumption: “I only need to know where I am NOW to predict where I’ll go NEXT.”
Hidden Markov Models (HMM)
HMMs have two types of probabilities:
- Transition: Moving from one hidden state to another.
- Emission: Hidden state producing an observation.
Forward Algorithm: Build the probabilities day by day (summing over all hidden-state paths), then add everything up at the end to get the probability of the observation sequence.
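A minimal sketch of the Forward Algorithm, using a classic weather HMM; all start, transition, and emission probabilities below are made-up assumptions:

```python
# Forward algorithm sketch (all probabilities are assumed for illustration).
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},   # transition: state -> state
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "umbrella": 0.9},  # emission: state -> observation
        "Sunny": {"walk": 0.8, "umbrella": 0.2}}

def forward(observations):
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:  # build day by day
        alpha = {s: sum(alpha[prev] * trans[prev][s] for prev in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())    # add everything up at the end

print(forward(["umbrella", "walk"]))  # ~0.209
```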
Summary Cheat Sheet
- Given states, want probability? Just multiply!
- Given observations, want their probability? Use Forward Algorithm!
- Given observations, want hidden states? Use Viterbi!
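For the last row of the cheat sheet, here is a Viterbi sketch: instead of summing over paths like the Forward Algorithm, it keeps only the best path into each state. The HMM parameters are the same made-up assumptions as in the forward example:

```python
# Viterbi sketch: most likely hidden-state path (parameters assumed for illustration).
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "umbrella": 0.9},
        "Sunny": {"walk": 0.8, "umbrella": 0.2}}

def viterbi(observations):
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {s: max(((p * trans[prev][s] * emit[s][obs], path + [s])
                        for prev, (p, path) in best.items()), key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["umbrella", "umbrella", "walk"]))  # ['Rainy', 'Rainy', 'Sunny']
```

Swapping `max` for `sum` (and dropping the path bookkeeping) turns this back into the Forward Algorithm, which is why the two fit the same table structure.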
