Mastering Conditional Probability and Markov Models

Conditional Probability

Instead of writing out "the chance of rain given that I saw a wet umbrella," we write it as: P(rain | wet umbrella).

The Difference in One Sentence

Question               | Filter First?                 | Divide By
P(ice cream | sunny)   | YES, only look at sunny days  | Number of sunny days (5)
P(sunny AND ice cream) | NO, look at everything        | Total days (10)

Think of It This Way 🧠

  • AND = out of the whole world.
  • GIVEN = out of a smaller filtered world.

So now you know two things:

  • P(A | B) = out of only the B days, how many have A (divide by B days).
  • P(A, B) = out of ALL days, how many have both (divide by total days).

The fundamental formulas:

  • P(A, B) = P(A | B) × P(B)
  • P(A | B) = P(A, B) / P(B)
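
As a quick sanity check, here is a minimal Python sketch of both formulas. The 10-day log below is made up for illustration (5 sunny days, as in the table above).

```python
# Hypothetical 10-day log: (sunny?, ate ice cream?) for each day.
days = [
    (True, True), (True, True), (True, True), (True, False), (True, False),
    (False, True), (False, False), (False, False), (False, False), (False, False),
]

total = len(days)                             # 10 days
sunny = sum(1 for s, _ in days if s)          # 5 sunny days (the filtered world)
both = sum(1 for s, i in days if s and i)     # 3 days: sunny AND ice cream

p_and = both / total        # P(sunny AND ice cream) = 3/10, divide by ALL days
p_given = both / sunny      # P(ice cream | sunny)   = 3/5, divide by sunny days only

# The two formulas connect the same counts: P(A, B) = P(A | B) * P(B)
assert abs(p_and - p_given * (sunny / total)) < 1e-9
print(p_and, p_given)
```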

What You Know So Far

Idea        | In Plain English                | Formula
P(A)        | Chance of A out of everything   | count of A / total
P(A, B)     | Chance of BOTH A and B          | count of both / total
P(A | B)    | Chance of A, knowing B happened | filter to B first, then count A
The formula | Connecting all three            | P(A|B) = P(A,B) / P(B)

The Recipe: Do This Every Time

Step 1: Imagine 100 people. Always start with 100 people total.

Step 2: Split them into 2 groups. Use the first percentage given (e.g., “10% have a cavity” → 10 people have a cavity, 90 don’t).

Step 3: Find the AND people in EACH group. Use the percentage given for each group (e.g., “75% of cavity people have a toothache” → 10 × 0.75 = 7.5, and “10% of no-cavity people have a toothache” → 90 × 0.10 = 9).

Step 4: Add the AND people together. (e.g., 7.5 + 9 = 16.5 total toothache people).

Step 5: Divide. Your specific AND group / total AND people (e.g., 7.5 / 16.5 = 45.5%).

🧠 One-Line Summary

Split 100 people → find the AND in each group → add them → divide!
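
Here is the same recipe as a short Python sketch, using the cavity/toothache numbers from the steps above.

```python
# Step 1: imagine 100 people.
total_people = 100

# Step 2: split by the first percentage given (10% have a cavity).
cavity = total_people * 0.10        # 10 people
no_cavity = total_people * 0.90     # 90 people

# Step 3: find the AND people in each group.
cavity_and_toothache = cavity * 0.75          # 75% of cavity people -> 7.5
no_cavity_and_toothache = no_cavity * 0.10    # 10% of no-cavity people -> 9.0

# Step 4: add the AND people together.
toothache_total = cavity_and_toothache + no_cavity_and_toothache   # 16.5

# Step 5: divide.
p_cavity_given_toothache = cavity_and_toothache / toothache_total
print(round(p_cavity_given_toothache, 3))     # ~0.455, i.e. about 45.5%
```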

The Pattern That Fits Every Problem

Name       | Always Means                       | How to Find It
Prior      | The FIRST percentage given         | Read from the problem
Likelihood | The percentage for the ‘yes’ group | Read from the problem
AND        | Prior × Likelihood                 | Multiply
Evidence   | Total of both AND groups           | Add both AND groups
Posterior  | Final answer                       | AND / Evidence
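
The table rows map directly onto a tiny reusable function. This is a sketch assuming a binary split into a ‘yes’ group and a ‘no’ group; the names mirror the rows above.

```python
def posterior(prior, likelihood_yes, likelihood_no):
    """Bayes' rule for a binary split: returns P(yes | evidence)."""
    and_yes = prior * likelihood_yes          # AND for the 'yes' group
    and_no = (1 - prior) * likelihood_no      # AND for the 'no' group
    evidence = and_yes + and_no               # total of both AND groups
    return and_yes / evidence                 # posterior = AND / evidence

print(posterior(0.10, 0.75, 0.10))  # the cavity example again: ~0.455
```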

Independence Rules

Situation     | Meaning
P(A) = P(A|B) | ✅ INDEPENDENT: B gives no new info about A
P(A) ≠ P(A|B) | ❌ DEPENDENT: B gives new info about A

For conditional independence, check: P(A, B | C) = P(A|C) × P(B|C).
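
A quick numeric check of the independence rule, using a small made-up joint distribution (conditional independence is the same kind of check, applied inside each slice of C).

```python
# Made-up joint distribution over (A, B).
p = {
    (True, True): 0.12, (True, False): 0.18,
    (False, True): 0.28, (False, False): 0.42,
}

p_a = p[(True, True)] + p[(True, False)]   # P(A)   = 0.30
p_b = p[(True, True)] + p[(False, True)]   # P(B)   = 0.40
p_a_given_b = p[(True, True)] / p_b        # P(A|B) = 0.12 / 0.40 = 0.30

# P(A) == P(A|B), so here B gives no new info about A: independent.
print(p_a, p_a_given_b)
```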

Naive Bayes Example

Step 1: Priors (Positive: 3/5, Negative: 2/5).
Step 2: Total Words (Positive: 9, Negative: 11).
Step 3: Word Probabilities (e.g., “good” in Positive: 2/9).
Step 4: Final Scores (Multiply priors by word probabilities).
Step 5: Prediction (The class with the higher score wins).

Note: Use Laplace (add-one) smoothing to avoid zero probabilities: P(word | class) = (count + 1) / (total words in class + vocabulary size).
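
A minimal Naive Bayes sketch with add-one smoothing. The five training sentences are invented so the counts match the steps above (3 positive and 2 negative documents, 9 and 11 total words, “good” appearing twice in the positive class).

```python
from collections import Counter

# Tiny made-up training set chosen to match the counts above.
train = [
    ("good fun movie", "pos"), ("very good acting", "pos"), ("great story fun", "pos"),
    ("boring bad plot bad acting", "neg"), ("terrible movie not fun at all", "neg"),
]

# Step 1: priors.
docs = Counter(label for _, label in train)
priors = {c: docs[c] / len(train) for c in docs}      # pos: 3/5, neg: 2/5

# Steps 2-3: word counts and totals per class.
words = {c: Counter() for c in docs}
for text, label in train:
    words[label].update(text.split())
totals = {c: sum(words[c].values()) for c in docs}    # pos: 9, neg: 11
vocab = {w for c in words for w in words[c]}

def score(text, c):
    """Prior times smoothed word probabilities: (count + 1) / (total + |vocab|)."""
    s = priors[c]
    for w in text.split():
        s *= (words[c][w] + 1) / (totals[c] + len(vocab))
    return s

# Steps 4-5: compute both scores, pick the larger one.
test = "good fun story"
print(max(docs, key=lambda c: score(test, c)))        # -> 'pos'
```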

Language Modeling

A collection of text is called a Corpus.

  • Unigram: P(word); multiply individual word probabilities.
  • Bigram: P(word2 | word1).
  • Trigram: P(word3 | word1, word2).

Markov Assumption: “I only need to know where I am NOW to predict where I’ll go NEXT.”
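
A minimal bigram sketch over a made-up three-sentence corpus, estimating P(word2 | word1) from counts.

```python
from collections import Counter

# Tiny made-up corpus (one sentence per string).
corpus = ["i like cheese", "i like rain", "i hate rain"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split()        # <s> marks the start of a sentence
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w2, w1):
    """P(w2 | w1) = count(w1 w2) / count(w1): the Markov assumption in action."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_bigram("like", "i"))     # 2/3: after 'i', 'like' follows 2 of 3 times
print(p_bigram("rain", "like"))  # 1/2
```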

Hidden Markov Models (HMM)

HMMs have two types of probabilities:

  • Transition: Moving from one hidden state to another.
  • Emission: Hidden state producing an observation.

Forward Algorithm: Build the probability day by day, summing over the hidden states at each step, then add everything up at the end to get the probability of an observation sequence.
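
A minimal forward-algorithm sketch for a two-state weather HMM; all transition and emission probabilities below are made-up example numbers.

```python
# Hidden states and made-up HMM parameters.
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
transition = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.4, "sunny": 0.6}}
emission = {"rainy": {"umbrella": 0.9, "no umbrella": 0.1},
            "sunny": {"umbrella": 0.2, "no umbrella": 0.8}}

def forward(observations):
    """P(observation sequence): build day by day, summing over hidden states."""
    # Day 1: start probability times emission probability.
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    # Each later day: sum over every way of arriving at each state.
    for obs in observations[1:]:
        alpha = {s: sum(alpha[prev] * transition[prev][s] for prev in states)
                    * emission[s][obs]
                 for s in states}
    return sum(alpha.values())   # add everything up at the end

print(forward(["umbrella", "umbrella", "no umbrella"]))
```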

Summary Cheat Sheet

  • Given the hidden-state sequence, want its probability? Just multiply the probabilities along the path!
  • Given observations, want their probability? Use Forward Algorithm!
  • Given observations, want hidden states? Use Viterbi!
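
And a matching Viterbi sketch: the same structure as the forward algorithm, but taking the max over paths instead of summing (reusing the same made-up weather HMM).

```python
# Same toy weather HMM as the forward sketch above (made-up numbers).
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
transition = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.4, "sunny": 0.6}}
emission = {"rainy": {"umbrella": 0.9, "no umbrella": 0.1},
            "sunny": {"umbrella": 0.2, "no umbrella": 0.8}}

def viterbi(observations):
    """Most likely hidden-state sequence: like forward, but max instead of sum."""
    best = {s: (start[s] * emission[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {s: max(((p * transition[prev][s] * emission[s][obs], path + [s])
                        for prev, (p, path) in best.items()), key=lambda t: t[0])
                for s in states}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["umbrella", "umbrella", "no umbrella"]))
```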