# Introduction to Probability and Statistical Inference

## Chapters 12-15: Probability

### Definitions

– A phenomenon is *random* if individual outcomes are uncertain but there is a predictable pattern in a large number of repetitions (random ≠ haphazard).

– The *probability* of any outcome is the proportion of times the outcome would occur in a very long (infinitely long) series of repetitions.

– Repetitions, or *trials*, are said to be *independent* if the outcome of one trial does not affect the outcome of another.

### Operations on Events

– The *intersection* of two events *A* and *B* is the collection of outcomes that are in both *A* and *B*:

– Denoted *A*∩*B* and read as “*A* and *B*”.

– The *union* of two events *A* and *B* is the collection of outcomes that are in *A* or *B* (including events in both):

– Denoted *A*∪*B* and read as “*A* or *B*”.

– The *complement* of an event *A* is the collection of outcomes not in *A*:

– Denoted *A*^{c} (sometimes written *Ā*) and read as “not *A*”.

### The General Addition Rule

General addition rule: *P*(*A* ∪ *B*) = *P*(*A*) + *P*(*B*) − *P*(*A* ∩ *B*)
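The rule can be checked by enumeration when all outcomes are equally likely. A minimal sketch using a single roll of a fair die (a hypothetical example, not from the notes):

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (hypothetical example).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}          # event: roll is even
B = {4, 5, 6}          # event: roll is at least 4

def P(event):
    # Equally likely outcomes: probability = |event| / |S|
    return Fraction(len(event), len(S))

lhs = P(A | B)                      # P(A ∪ B) directly
rhs = P(A) + P(B) - P(A & B)        # general addition rule
print(lhs, rhs)                     # both equal 2/3
```

Here *P*(*A* ∩ *B*) = 1/3 (outcomes 4 and 6), so the subtraction corrects the double-counting of those outcomes.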

### Independent Events

– Events *A* and *B* are *independent* if knowing that one has occurred provides no information about whether the other will.

– E.g., consider a random experiment involving 2 tosses of a fair coin.

– Sample space is S = {*HH*, *HT*, *TH*, *TT*}

– All 4 outcomes in S have chance 1/4.

– Knowing *H* occurs on the 1st toss provides no info about whether *H* will occur in the 2nd toss.

– Let *A* = {*HH*, *HT*} be the event of a head on the 1st toss and *B* = {*HH*, *TH*} be the event of a head on the 2nd toss.

– The events *A* and *B* are independent.
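The claim that *A* and *B* are independent can be verified directly from the sample space, since independence is equivalent to *P*(*A* ∩ *B*) = *P*(*A*) × *P*(*B*) (the multiplication rule stated below). A short check:

```python
from fractions import Fraction

# Two tosses of a fair coin, all 4 outcomes equally likely.
S = ["HH", "HT", "TH", "TT"]
A = {s for s in S if s[0] == "H"}   # head on the 1st toss: {HH, HT}
B = {s for s in S if s[1] == "H"}   # head on the 2nd toss: {HH, TH}

def P(event):
    return Fraction(len(event), len(S))

# Independence check: P(A ∩ B) == P(A) * P(B)
print(P(A & B), P(A) * P(B))   # 1/4 and 1/4 — the events are independent
```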

### General Multiplication Rule

– For 2 (not necessarily independent) events, the rule is: *P*(*A*∩*B*) = *P*(*A*|*B*)×*P*(*B*), or

*P*(*A*∩*B*) = *P*(*B*|*A*)×*P*(*A*)

– Read *A*|*B* as “*A* given *B*” and *P*(*A*|*B*) as “the conditional probability of *A* given *B*.”

– When *A* and *B* are independent,

– *P*(*A*|*B*) = *P*(*A*) and *P*(*B*|*A*) = *P*(*B*), so that

*P*(*A* ∩ *B*) = *P*(*A*) × *P*(*B*).

– The multiplication rule for independent events is thus a special case of the general multiplication rule.

### Bayes’ Theorem and Partitioning the Sample Space

– Connects conditional probabilities:

*P*(*A* | *B*) = *P*(*B* | *A*)*P*(*A*) / *P*(*B*)

– Writing *B* as {*A* ∩ *B*} ∪ {*A*^{c} ∩ *B*} and noticing that these two sets are disjoint (see previous diagram) means that we can write *P*(*B*) = *P*(*A* ∩ *B*) + *P*(*A*^{c} ∩ *B*).

– Applying the general multiplication rule to both terms in the sum for *P*(*B*), we obtain

*P*(*B*) = *P*(*B* | *A*)*P*(*A*) + *P*(*B* | *A*^{c})*P*(*A*^{c}).

– Which means an alternate form of Bayes’ Theorem is

*P*(*A* | *B*) = *P*(*B* | *A*)*P*(*A*) / (*P*(*B* | *A*)*P*(*A*) + *P*(*B* | *A*^{c})*P*(*A*^{c}))
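The alternate form is what makes Bayes’ theorem usable in practice: the denominator *P*(*B*) is built from the same conditional probabilities as the numerator. A sketch with the classic diagnostic-test setup (all numbers are hypothetical, chosen only to illustrate the formula):

```python
# Hypothetical numbers for illustration only (not from the notes).
p_A = 0.01            # P(A): prevalence of a condition
p_B_given_A = 0.95    # P(B | A): test is positive given the condition
p_B_given_Ac = 0.05   # P(B | A^c): false-positive rate

# Partition the sample space: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' theorem: P(A | B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))   # about 0.161
```

Even with a fairly accurate test, *P*(*A* | *B*) is small here because *A* itself is rare: most positives come from the large *A*^{c} group.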

## Chapters 20-21: Inference for a Population Mean

### Review of Key Terminology

– **Parameter:** a number that describes a distribution, or the relationship between two variables.

– **Statistic:** a quantity that can be computed from data (does not depend on unknown parameters).

– **Sampling Distribution:** the distribution of a statistic that we’d observe if we sampled repeatedly.

– **Null/Alternative Hypothesis (***H*_{0}, *H*_{a}**):** Hypothesized values of a parameter that we are interested in testing.

– **Probability of an Outcome:** the proportion of times the outcome of a random phenomenon would occur if we sampled repeatedly.

### Inference for a Population Mean

– When σ is unknown we estimate it with the sample SD *s*.

– The sampling distribution of *X̄* is *N*(μ,σ/√n). Since we don’t know σ/√n we estimate it with *s*/√n.

– When we estimate the standard deviation of the sampling distribution of a statistic we call it the **standard error**. Thus *s*/√n is the standard error of *X̄*.

– When we knew σ, confidence intervals and tests for μ were based on the “pivotal quantity”

*Z* = (*X̄* − μ) / (σ/√n) ~ *N*(0,1).

– The natural thing to do when we don’t know σ is use

*T* = (*X̄* − μ) / (*s*/√n).

– **Question:** What is the sampling distribution of *T*?

### Sampling Distribution of *T*

– When we replace σ by *s*, we tend to get slightly more extreme observations in *T* than *Z* because of the extra variation in *s*.

– How much more extreme is determined by *n*.

– When *n* is large, *s* is very close to σ and the sampling distribution of *T* is very close to the standard normal.

– The distribution of *T* based on *n* observations is called the *t* distribution with *n* − 1 **degrees of freedom** (d.f.). The d.f. is from the degrees of freedom in the statistic *s*.
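The “more extreme” behavior can be seen by comparing *t* critical values with the corresponding normal critical value as *n* grows. A minimal sketch, assuming scipy is available:

```python
from scipy import stats

# 97.5th-percentile critical values: the t distribution has heavier
# tails than N(0,1), but approaches it as the d.f. grow.
z_star = stats.norm.ppf(0.975)
for df in (2, 10, 30, 100):
    t_star = stats.t.ppf(0.975, df)
    print(df, round(t_star, 3))
print("normal:", round(z_star, 3))   # about 1.96
```

For small d.f. the *t* critical value is much larger (around 4.3 at 2 d.f.), which is exactly the extra allowance for the variation in *s*.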

### *t* Confidence Intervals

– Now that we can characterize the sampling distribution of *T* we can construct a confidence interval as before:

– For a level *C* confidence interval (CI) for the mean based on *n* observations we find the *p* = (1 − *C*)/2 critical value of the *t* distribution with *n* − 1 degrees of freedom, call it *t*^{*} (more precisely, *t*^{*}_{n−1,p}).

– *C*% of *t* statistics will fall between −*t*^{*} and *t*^{*}; that is,

−*t*^{*} ≤ (*x̄* − μ) / (*s*/√n) ≤ *t*^{*} *C*% of the time … a little algebra …

μ − *t*^{*}(*s*/√n) ≤ *x̄* ≤ μ + *t*^{*}(*s*/√n) *C*% of the time

– So

*x̄* ± *t*^{*}(*s*/√n)

will cover μ *C*% of the time.
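The interval *x̄* ± *t*^{*}(*s*/√n) can be computed in a few lines. A sketch with made-up data, assuming scipy for the *t* critical value (in R this is what `t.test()` reports):

```python
import math
from scipy import stats

# Made-up sample, for illustration only.
data = [4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.4, 5.0]
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))  # sample SD

C = 0.95
t_star = stats.t.ppf(1 - (1 - C) / 2, df=n - 1)   # critical value t*_{n-1,p}
moe = t_star * s / math.sqrt(n)                   # margin of error
print((xbar - moe, xbar + moe))                   # level-C CI for mu
```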

### Conditions for Inference about Two Population Means

– The data are *independent* SRSs of size *n*_{1} and *n*_{2} from two populations.

– The two populations can be populations of individuals under two treatments. Then we assume the data are from a randomized experiment (subjects randomly assigned to treatments) with one factor having two treatments.

– Both population distributions are normal with unknown means μ_{1} and μ_{2} and unknown SDs σ_{1} and σ_{2}, respectively.

– Use the sample size conditions for one-sample inference with the sample size *n* = *n*_{1} + *n*_{2}; i.e.,

– *n* < 15: use *t* only if the data distribution is roughly symmetric and unimodal without outliers

– 15 ≤ *n* < 40: Use *t* except in the presence of outliers or strong skewness

– *n* ≥ 40: Use *t* even for skewed distributions.

### Overview of Inference

– Let *x*_{1} and *x*_{2} be the variables measured from the two populations.

– We want to compare the two populations by either a confidence interval for μ_{1} − μ_{2} or a test of *H*_{0}: μ_{1} − μ_{2} = 0.

– Inference is based on the difference in sample means, *x̄*_{1} − *x̄*_{2}, and the sampling distribution.

#### Sampling Distribution

– We need the sampling distribution of *X̄*_{1} − *X̄*_{2} under our assumptions for inference.

– The *x*_{1i}’s are from a population with distribution *N*(μ_{1}, σ_{1}) and the *x*_{2i}’s are from a population with distribution *N*(μ_{2}, σ_{2}).

– Then *X̄*_{1} − *X̄*_{2} has variance σ^{2}_{1}/*n*_{1} + σ^{2}_{2}/*n*_{2} and hence SD √(σ^{2}_{1}/*n*_{1} + σ^{2}_{2}/*n*_{2}). (Variances add, not SDs.)

– We can standardize to a standard normal:

*Z* = ((*X̄*_{1} − *X̄*_{2}) − (μ_{1} − μ_{2})) / √(σ^{2}_{1}/*n*_{1} + σ^{2}_{2}/*n*_{2})

#### Estimated SDs

– If σ_{1} and σ_{2} are both unknown, substitute *s*_{1} and *s*_{2} to obtain the standard error (SE) √(*s*^{2}_{1}/*n*_{1} + *s*^{2}_{2}/*n*_{2}) and the pivotal quantity

*T* = ((*X̄*_{1} − *X̄*_{2}) − (μ_{1} − μ_{2})) / √(*s*^{2}_{1}/*n*_{1} + *s*^{2}_{2}/*n*_{2})

– The form is (estimated difference − true difference)/SE.

– What is the sampling distribution of *T*?

– We can work around the problem by fudging the d.f.

– **Option 1:** (preferred) Get a computer to *estimate* the d.f. – see output from t.test().

– **Option 2:** Take the d.f. equal to the smaller of *n*_{1} − 1 and *n*_{2} − 1.
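The d.f. that software estimates in Option 1 is typically the Welch–Satterthwaite approximation, which can also be computed directly. A sketch with made-up sample SDs and sizes:

```python
# Welch–Satterthwaite approximation to the d.f. of the two-sample T
# (this is what software such as R's t.test() typically reports).
def welch_df(s1, n1, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2   # the two variance components of SE^2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Made-up values, for illustration only.
df = welch_df(s1=2.0, n1=12, s2=3.5, n2=15)
print(round(df, 2))   # about 22.9
```

The estimate always falls between Option 2’s conservative choice, min(*n*_{1} − 1, *n*_{2} − 1), and the pooled value *n*_{1} + *n*_{2} − 2.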

#### Hypothesis Tests

– For a value μ_{0} of μ_{1} − μ_{2} we have the *t* statistic

*T* = ((*x̄*_{1} − *x̄*_{2}) − μ_{0}) / √(*s*^{2}_{1}/*n*_{1} + *s*^{2}_{2}/*n*_{2})

– (estimated difference – hypothesized difference)/SE

– Compare to the *t* distribution with estimated d.f. to obtain a p-value to summarize the evidence against the null hypothesis in favor of the alternative hypothesis.

– *p* < 0.001 **very strong**

– 0.001 < *p* < 0.01 **strong**

– 0.01 < *p* < 0.05 **good**

– 0.05 < *p* < 0.1 **some**

– *p* > 0.1 **little**
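The whole test (statistic, estimated d.f., and p-value) is a one-liner in most software. A sketch with made-up data, assuming scipy; `equal_var=False` requests the estimated d.f. of Option 1, matching the default behavior of R’s `t.test()`:

```python
from scipy import stats

# Made-up samples from two groups, for illustration only.
group1 = [23.1, 25.4, 22.8, 26.0, 24.3, 23.7]
group2 = [21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 21.4]

# Welch two-sample t test of H0: mu1 - mu2 = 0.
res = stats.ttest_ind(group1, group2, equal_var=False)
print(round(res.statistic, 3), round(res.pvalue, 4))
```

The p-value is then read against the evidence scale above.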

#### Confidence Intervals

– From the pivotal quantity we can derive a confidence interval of the usual form estimated difference ± margin of error, where

– estimated difference is *x̄*_{1} − *x̄*_{2}

– margin of error is critical value × SE
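Putting the pieces together, a level-*C* interval for μ_{1} − μ_{2} is (*x̄*_{1} − *x̄*_{2}) ± *t*^{*} × SE. A sketch with made-up summary statistics, using Option 2’s conservative d.f. and assuming scipy for the critical value:

```python
import math
from scipy import stats

# Made-up summary statistics, for illustration only.
xbar1, s1, n1 = 24.2, 1.3, 6
xbar2, s2, n2 = 21.8, 0.8, 7

se = math.sqrt(s1**2 / n1 + s2**2 / n2)           # standard error of the difference
t_star = stats.t.ppf(0.975, df=min(n1, n2) - 1)   # 95% CI, conservative d.f.
moe = t_star * se                                 # margin of error

diff = xbar1 - xbar2
print((diff - moe, diff + moe))                   # 95% CI for mu1 - mu2
```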