Statistical Inference: Hypothesis Testing and Confidence Intervals

Chapter 17: Inferences When Standard Deviation is Unknown

Big Idea: T-Test for Unknown Standard Deviation

The t-test is used when the population standard deviation (SD) is unknown. Compared with the z-test for inferences about a mean, it drops the requirement that the population SD be known; its conditions are:

  • Data is a simple random sample (SRS) from a larger population.
  • Observations follow a normal distribution.

Because the population SD is unknown, we estimate the standard error of the sample mean as s / √n, where s is the sample SD.

T-Test Formula and Properties

The t-test statistic is calculated as: t = (X̄ – μ₀) / (s / √n). Because σ is estimated by s, the statistic is more variable than a z statistic and follows a t distribution, which is wider (heavier-tailed) than the standard normal. The degrees of freedom (df) is n – 1, and higher df brings the t distribution closer to the normal.

Steps for Conducting a T-Test

  1. Check if conditions are met.
  2. Calculate the t-test statistic using sample mean (X̄), sample SD (s), sample size (n), and hypothesized population mean (μ₀).
  3. Compute the probability (p-value) of observing the test statistic t or more extreme under the null hypothesis.
  4. Interpret the p-value to draw conclusions about the hypothesis.
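
A minimal sketch in R of steps 2 and 3, with all summary numbers hypothetical:

    # hypothetical summary statistics, testing H0: mu = 50
    x_bar <- 52; s <- 8; n <- 25; mu_0 <- 50
    t_stat <- (x_bar - mu_0) / (s / sqrt(n))              # t = 1.25
    2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)   # two-sided p-value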

T-Star and Confidence Intervals

For a 95% confidence interval (CI) with 25 observations, we use the qt() function in R to find t*: t_star <- qt(p = 0.975, df = 24). The formula for the CI is: X̄ ± t_star * s / √n.
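
A self-contained sketch of the full computation (summary numbers hypothetical):

    x_bar <- 52; s <- 8; n <- 25             # hypothetical summary statistics
    t_star <- qt(p = 0.975, df = n - 1)      # 95% confidence, df = 24
    x_bar + c(-1, 1) * t_star * s / sqrt(n)  # lower and upper CI bounds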

Robustness of T-Tests

T-tests are robust against non-normality, except in cases of outliers or strong skew. Larger sample sizes improve robustness against skew, but outliers should be investigated at any sample size.

Assumptions and Considerations

  • Plot data to check for outliers and skew.
  • SRS is more important than normality.
  • For n < 15, use t-procedures only if the data appear close to normal.
  • For 15 ≤ n < 40, use t-procedures unless there are outliers or strong skew.
  • For n ≥ 40, t-procedures can generally be used, even for skewed data.

Chapter 17 Part 2: Paired T-Test

Matching by Design

Paired t-tests are used when data is matched by design, such as before-and-after measurements on the same individuals. The test is a one-sample t-test on the within-pair differences: t = (X̄_d – 0) / (s_d / √n), where X̄_d is the mean of the differences and s_d is the sample SD of the differences.

R Functions for Paired T-Tests

  • Use pull() to extract a single column from a data frame as a vector.
  • Use qt() to find the t quantile for the desired confidence level.
  • Use t.test() with paired = TRUE to conduct a paired t-test (see the sketch below).
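
A sketch assuming a hypothetical data frame scores with columns before and after (one row per subject):

    library(dplyr)                        # for pull()
    before <- pull(scores, before)
    after  <- pull(scores, after)
    t.test(after, before, paired = TRUE)  # paired t-test
    t.test(after - before, mu = 0)        # equivalent one-sample test on differences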

Advantages and Considerations

Paired t-tests remove confounding by comparing within-subject differences. Ensure a wash-out period between treatments to avoid carryover effects.

Chapter 18: Comparing Two Population Means

Two-Sample Tests

We now move from one-sample to two-sample tests, comparing means from two populations. The null hypothesis (H₀) is: μ₁ – μ₂ = 0, and the alternative hypothesis (H₁) is: μ₁ – μ₂ ≠ 0.

Graphical Comparison

Compare the distributions of the two samples using histograms or boxplots to assess their shapes, centers, and spreads.

Conditions for Two-Sample T-Tests

  • Two SRSs from two populations.
  • Independent samples.
  • Same quantitative variable for both samples.
  • Both populations are approximately normally distributed, and neither sample has outliers.

Standard Error and T-Statistic

The standard error (SE) is estimated as: SE = √(s₁²/n₁ + s₂²/n₂). The two-sample t-test statistic is: t = (X̄₁ – X̄₂) / SE.

Degrees of Freedom and Confidence Interval

The degrees of freedom formula (the Welch–Satterthwaite approximation) is complex, so it is usually left to software; a conservative by-hand alternative is the smaller of n₁ – 1 and n₂ – 1. The confidence interval is calculated as: (X̄₁ – X̄₂) ± t_star * √(s₁²/n₁ + s₂²/n₂).

R Function for Two-Sample T-Tests

Use t.test() to conduct a two-sample t-test, specifying the two samples and the alternative hypothesis.
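
A sketch assuming two hypothetical numeric vectors group1 and group2:

    # Welch two-sample t-test; R's default does not assume equal variances
    t.test(group1, group2, alternative = "two.sided")

The output reports the t statistic, the Welch degrees of freedom, the p-value, and a confidence interval for μ₁ – μ₂.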

Robustness

Two-sample t-tests are more robust than one-sample tests, especially for skewed data. Sample sizes as small as 5 can work when the two samples are of equal size and the two distributions have similar shapes; larger samples are needed when the populations differ in shape.

Chapter 19: Inference About a Population Proportion

Binary Data and Confidence Intervals

This chapter deals with binary data (e.g., success/failure). The large sample CI for a proportion is: p̂ ± z_star * √[p̂(1-p̂)/n]. However, its actual coverage can fall below the advertised level, and the Plus 4 method is often more accurate.

Plus 4 Method

The Plus 4 method improves the accuracy of CIs for binary data, especially for small sample sizes. It adds 2 successes and 2 failures to the data, calculates the adjusted proportion (p̃), and uses it to compute the CI.
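
A minimal sketch with hypothetical counts:

    x <- 8; n <- 20               # hypothetical: 8 successes in 20 trials
    p_tilde <- (x + 2) / (n + 4)  # add 2 successes and 2 failures
    z_star  <- qnorm(0.975)       # 95% confidence
    p_tilde + c(-1, 1) * z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))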

Other Methods for Proportion CIs

  • Wilson Score Interval (prop.test in R): Similar to Plus 4 with a correction factor.
  • Clopper-Pearson or Exact Interval (binom.test in R): Conservative; its actual coverage is at least the advertised level.

Example and Comparison of Methods

The example compares different methods for calculating a 95% CI for the proportion of elderly individuals who died within a year of a hip fracture. The Plus 4 and Wilson Score methods provide similar results, while the large sample method is less accurate.

Finding Sample Size for Proportion Studies

To determine the required sample size (n) for a desired margin of error (m), we use the formula: n = (z_star/m)² * p_star * (1-p_star), where p_star is a prior estimate of the true proportion (use 0.5 if unknown, which gives the largest, most conservative n).
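
For example, for a 95% CI with a hypothetical margin of error of 3 percentage points:

    z_star <- qnorm(0.975)   # 95% confidence
    m <- 0.03                # hypothetical desired margin of error
    p_star <- 0.5            # conservative guess when the proportion is unknown
    ceiling((z_star / m)^2 * p_star * (1 - p_star))   # round up; about 1068 here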

Calculating P-Values

Use the pnorm() function in R to find the p-value for a given z-value, specifying lower.tail = FALSE for an upper-tail probability.
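
For instance, for a hypothetical z = 2.1:

    pnorm(2.1, lower.tail = FALSE)        # upper-tail (one-sided) p-value
    2 * pnorm(2.1, lower.tail = FALSE)    # two-sided p-value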

Chapter 20: Inference for Comparing Two Proportions

Large Sample CI for Difference of Two Proportions

Use this method when the number of successes and failures is greater than 10 for both samples. The formula is: (p̂₁ – p̂₂) ± z_star * √[p̂₁(1-p̂₁)/n₁ + p̂₂(1-p̂₂)/n₂]. However, it may have low coverage for small samples.

Plus 4 Method for Two Proportions

Similar to the Plus 4 method for one proportion, this method adds 1 success and 1 failure to each sample and calculates adjusted proportions (p̃₁ and p̃₂) to compute the CI. It is more accurate for small sample sizes.

Z-Test for Two Proportions

The z-test statistic is calculated as: (p̂₁ – p̂₂) / √[p̂(1-p̂)(1/n₁ + 1/n₂)], where p̂ is the pooled proportion: the total number of successes divided by n₁ + n₂ (pooling is valid under H₀: p₁ = p₂). Use pnorm() to find the p-value.
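
A sketch with hypothetical counts:

    x1 <- 45; n1 <- 100                    # hypothetical successes / size, sample 1
    x2 <- 30; n2 <- 90                     # hypothetical successes / size, sample 2
    p1 <- x1 / n1; p2 <- x2 / n2
    p_pool <- (x1 + x2) / (n1 + n2)        # pooled proportion under H0
    z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
    2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value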

Conditions and Considerations

  • Use the large sample CI or z-test when counts of success and failure are greater than 5 for both samples.
  • Use the Plus 4 method when sample sizes are at least 5.
  • Consider using a z-test for one-sided hypotheses, as chi-squared tests are only two-sided.

Chapter 21: The Chi-Squared Goodness of Fit Test

Categorical Data with More Than Two Categories

The chi-squared goodness of fit test assesses how well observed data fits a hypothesized distribution for a single categorical variable with more than two categories.

Example: Jury Selection and Ethnicity

The example examines whether the ethnic distribution of jurors matches the expected distribution based on population proportions.

Chi-Squared Statistic and Distribution

The chi-squared statistic is calculated as the sum of (observed – expected)² / expected for each category. The chi-squared distribution has degrees of freedom (df) equal to the number of categories minus 1.

P-Value and Chi-Squared Test in R

Use pchisq() to find the p-value for a given chi-squared value and df. Use chisq.test() to conduct a chi-squared goodness of fit test.
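
A sketch with hypothetical juror counts and population proportions:

    observed <- c(205, 26, 25, 19)             # hypothetical counts by ethnicity
    p_expected <- c(0.72, 0.07, 0.12, 0.09)    # hypothetical population proportions
    chisq.test(x = observed, p = p_expected)   # goodness of fit test
    # by hand, the p-value is the upper tail of the chi-squared distribution:
    expected <- sum(observed) * p_expected
    chi_sq <- sum((observed - expected)^2 / expected)
    pchisq(chi_sq, df = length(observed) - 1, lower.tail = FALSE)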

Conditions for Chi-Squared Test

  • Fixed number of observations.
  • Independent observations.
  • Mutually exclusive categories.
  • At least 80% of cells have expected counts of 5 or more.
  • All cells have expected counts greater than 1.

Chapter 22: Inference for Two-Way Tables

Analyzing Two Categorical Variables

This chapter extends the chi-squared test to analyze the relationship between two categorical variables.

Example: Vaping and JUUL Advertisements

The example investigates the association between exposure to JUUL advertisements and vaping among teens.

Expected Counts and Chi-Squared Statistic

The expected count for each cell is calculated as (row total * column total) / overall total. The chi-squared statistic is calculated similarly to the goodness of fit test.

Degrees of Freedom and P-Value

The degrees of freedom for a two-way table is (number of rows – 1) * (number of columns – 1). Use pchisq() to find the p-value.

Chi-Squared Test of Independence

Use chisq.test() to conduct a chi-squared test of independence. The correct = TRUE option (the default for 2×2 tables) applies a continuity correction that helps with small sample sizes.
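
A sketch with a hypothetical 2×2 table of counts:

    # rows: ad exposure; columns: vaping status (all counts hypothetical)
    tab <- matrix(c(30, 70, 15, 85), nrow = 2, byrow = TRUE,
                  dimnames = list(ads = c("exposed", "not exposed"),
                                  vaped = c("yes", "no")))
    chisq.test(tab)            # continuity correction applied by default for 2x2
    chisq.test(tab)$expected   # verify the expected-count conditions below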

Conditions and Assumptions

  • Expected counts of at least 5 for at least 80% of cells.
  • All expected counts greater than 1.
  • Data from independent SRSs or a single SRS with individuals classified according to two categorical variables.

Z-Test for Two-Way Tables

For 2×2 tables, a z-test can be used to find one-sided p-values, which are not available with the chi-squared test.

Graphical Comparison

Use dodged bar charts to compare the conditional distributions of one categorical variable across levels of the other.

Chapter 23: Inference for Regression

Recap of Regression Analysis

This chapter reviews key concepts of regression analysis, including linearity, correlation, line of best fit, and interpretation of slope, intercept, and R².

Assumptions for Regression Inference

  • Linear relationship between x and y.
  • Normality of residuals (vertical distances between observed and fitted values).
  • Independent observations.
  • Equal standard deviation of responses for all values of x.

Graphs for Checking Assumptions

  • Scatter plot: Shows data, fitted line, and residuals.
  • QQ plot: Checks normality of residuals.
  • Fitted vs. Residuals plot: Checks for random scatter.
  • Amount Explained plot: Compares boxplots of y and residuals.

Robustness and Outliers

Regression is relatively robust to non-normality but sensitive to outliers.

Chapter 23 Part 2: Inference for Regression

R Functions for Regression Output

  • tidy(): Presents regression coefficients and statistics.
  • glance(): Provides overall model fit statistics.
  • augment(): Creates an augmented data frame with fitted values and residuals.

Sum of Squared Errors (SSE) and Regression Standard Error

SSE, the sum of squared residuals, measures the overall discrepancy between observed and fitted values. The regression standard error, s = √(SSE / (n – 2)), estimates the spread of responses about the regression line; lower values indicate better fit.

Hypothesis Testing for Regression Slope

We test the null hypothesis H₀: b = 0 (no association between x and y) against the alternative hypothesis H₁: b ≠ 0 (association exists).

R Functions for Hypothesis Testing

Use tidy() to obtain the estimated slope (b̂), its standard error (SE_b), and the t-statistic. Use pt() with df = n – 2 to find the p-value.

Confidence Interval for Regression Slope

The CI for the slope is calculated as: b̂ ± t_star * SE_b, where t_star is obtained from the qt() function with df = n – 2.
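
A sketch assuming a hypothetical data frame dat with columns y and x:

    library(broom)
    fit <- lm(y ~ x, data = dat)
    est <- tidy(fit)                         # estimate, std.error, statistic, p.value
    b_hat <- est$estimate[2]                 # slope is the second row
    se_b  <- est$std.error[2]
    t_star <- qt(0.975, df = nrow(dat) - 2)  # df = n - 2 for simple regression
    b_hat + c(-1, 1) * t_star * se_b         # 95% CI; confint(fit) agrees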

Test for Lack of Correlation

The t-test of H₀: b = 0 is equivalent to a test of zero linear correlation between x and y. Note that a non-significant result does not prove there is no association; it only means the data are compatible with a slope of zero.

Chapter 24: ANOVA

Analysis of Variance

ANOVA compares means of multiple groups to determine if there is a statistically significant difference between them.

Example: Cancer Treatment in Mice

The example investigates the effect of different cancer treatments on tumor volume in mice.

ANOVA Test Statistic (F)

The F statistic is the ratio of mean squares for groups (MSG) to mean squares for error (MSE). MSG measures the variation between group means, while MSE measures the variation within groups.

ANOVA in R

Use aov() to conduct an ANOVA test. Use tidy() to display the results, including degrees of freedom, sum of squares, mean squares, F statistic, and p-value.

Tukey’s Honestly Significant Difference (HSD) Test

Tukey’s HSD test is used to identify which specific groups differ from each other after a significant ANOVA result.
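
A sketch assuming a hypothetical data frame mice with a numeric volume column and a factor treatment column:

    fit <- aov(volume ~ treatment, data = mice)
    broom::tidy(fit)   # df, sum of squares, mean squares, F statistic, p-value
    TukeyHSD(fit)      # pairwise group differences with family-wise 95% CIs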

Conditions for ANOVA

  • Independent SRSs from each population.
  • Normal distribution of populations (robustness to non-normality exists).
  • Equal standard deviations of populations (rule of thumb: largest SD < 2 * smallest SD).

Bootstrap Confidence Intervals

Non-Parametric Confidence Intervals

Bootstrap methods are used to construct CIs when data is not normally distributed or when standard formulas are not applicable.

Steps for Bootstrap CI

  1. Find the median (or other statistic) of the original sample.
  2. Repeatedly resample with replacement and calculate the statistic for each resample.
  3. Create a histogram of the resampled statistics to approximate the sampling distribution.
  4. Find the percentiles that capture the middle 95% of the distribution as the CI bounds.
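
A minimal sketch for a bootstrap CI for the median (the sample is hypothetical):

    set.seed(1)                            # reproducibility
    x <- rexp(30)                          # hypothetical skewed sample
    boot_stats <- replicate(10000, median(sample(x, replace = TRUE)))
    quantile(boot_stats, c(0.025, 0.975))  # middle 95% = percentile CI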

When to Use Bootstrap

  • No formula or unknown formula for CI.
  • Assumptions for standard formulas not met.
  • CI for any statistic.

Permutation Tests

Hypothesis Testing with Small Samples or Non-SRS Data

Permutation tests are used for hypothesis testing when sample sizes are small or data is not from an SRS.

Example: Malaria and Alcohol Consumption

The example examines the effect of beer consumption on mosquito attraction.

Permutation Test in R

Use the infer package to conduct a permutation test. The specify(), hypothesize(), generate(), and calculate() functions are used to define the test and generate the null distribution.
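
A sketch with the infer package; the data frame mosquito and its columns attracted (number of mosquitoes attracted) and drink ("beer"/"water") are hypothetical:

    library(infer)
    obs_diff <- mosquito |>
      specify(attracted ~ drink) |>
      calculate(stat = "diff in means", order = c("beer", "water"))
    null_dist <- mosquito |>
      specify(attracted ~ drink) |>
      hypothesize(null = "independence") |>
      generate(reps = 1000, type = "permute") |>
      calculate(stat = "diff in means", order = c("beer", "water"))
    get_p_value(null_dist, obs_stat = obs_diff, direction = "greater")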

Interpretation

If the null hypothesis is true, the group labels are interchangeable, so shuffling (permuting) them should produce test statistics similar to the observed one. The p-value is the proportion of permutations with a statistic at least as extreme as the observed statistic.

Bonus Chapter: Regression Model with Categorical Exposure

Key R Functions

  • qt(), pt(), qnorm(), pnorm(), pchisq(): Distribution functions.
  • t.test(), binom.test(), prop.test(), chisq.test(): Hypothesis testing functions.
  • Broom package: tidy(), glance(), augment(): Functions for summarizing and manipulating model output.
  • lm(), predict(), confint(), aov(), TukeyHSD(): Regression and ANOVA functions.
  • ggplot2, dplyr: Data visualization and manipulation packages.

Example: Calcium Intake and Bone Density

The example uses ANOVA to compare mean daily calcium intake in adults with different bone densities.