Statistical Sampling Distributions and Inference Exercises

Review Exercises for Sampling Distributions

8.56 Consider the data displayed in Exercise 1.20 on page 31. Construct a box-and-whisker plot and comment on the nature of the sample. Compute the sample mean and sample standard deviation.

8.57 If X1, X2, …, Xn are independent random variables having identical exponential distributions with parameter θ, show that the density function of the random variable Y = X1 + X2 + … + Xn is that of a gamma distribution with parameters α = n and β = θ.

8.58 In testing for carbon monoxide in a certain brand of cigarette, the data, in milligrams per cigarette, were coded by subtracting 12 from each observation. Use the results of Exercise 8.14 on page 231 to find the standard deviation for the carbon monoxide content of a random sample of 15 cigarettes of this brand if the coded measurements are: 3.8, −0.9, 5.4, 4.5, 5.2, 5.6, 2.7, −0.1, −0.3, −1.7, 5.7, 3.3, 4.4, −0.5, and 1.9.

8.59 If S₁² and S₂² represent the variances of independent random samples of size n₁ = 8 and n₂ = 12, taken from normal populations with equal variances, find P(S₁² / S₂² < 4.89).

8.60 A random sample of 5 bank presidents indicated annual salaries of $395,000, $521,000, $483,000, $479,000, and $510,000. Find the variance of this set.

8.61 If the number of hurricanes that hit a certain area of the eastern United States per year is a random variable having a Poisson distribution with μ = 6, find the probability that this area will be hit by:

  • (a) exactly 15 hurricanes in 2 years;
  • (b) at most 9 hurricanes in 2 years.

8.62 A taxi company tests a random sample of 10 steel-belted radial tires of a certain brand and records the following tread wear: 48,000, 53,000, 45,000, 61,000, 59,000, 56,000, 63,000, 49,000, 53,000, and 54,000 kilometers. Use the results of Exercise 8.14 on page 231 to find the standard deviation of this set of data by first dividing each observation by 1000 and then subtracting 55.
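The coding step in Exercise 8.62 can be sketched in Python. This is an illustrative check, not the textbook's solution. It assumes the usual result that subtracting a constant leaves the standard deviation unchanged, while dividing by a constant c divides it by c. The variable names are hypothetical.

```python
import statistics

# Tread wear in kilometers, as listed in Exercise 8.62.
tread_wear_km = [48000, 53000, 45000, 61000, 59000,
                 56000, 63000, 49000, 53000, 54000]

# Code the data: divide each observation by 1000, then subtract 55.
coded = [x / 1000 - 55 for x in tread_wear_km]

# The shift by 55 does not affect the spread; the division by 1000 is
# undone by multiplying the coded standard deviation by 1000.
s_coded = statistics.stdev(coded)
s_km = 1000 * s_coded

# Direct computation on the raw data, for comparison.
s_direct = statistics.stdev(tread_wear_km)

print(round(s_coded, 3), round(s_km, 1), round(s_direct, 1))
```

The two routes should agree up to floating-point rounding, which is the point of coding: smaller numbers, same answer after rescaling.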

8.63 Consider the data of Exercise 1.19 on page 31. Construct a box-and-whisker plot. Comment on the results. Compute the sample mean and sample standard deviation.

8.64 If S₁² and S₂² represent the variances of independent random samples of size n₁ = 25 and n₂ = 31, taken from normal populations with variances σ₁² = 10 and σ₂² = 15, respectively, find P(S₁² / S₂² > 1.26).

8.65 Consider Example 1.5 on page 25. Comment on any outliers.

8.66 Consider Review Exercise 8.56. Comment on any outliers in the data.

8.67 The breaking strength X of a certain rivet used in a machine engine has a mean of 5000 psi and a standard deviation of 400 psi. A random sample of 36 rivets is taken. Consider the distribution of x̄, the sample mean breaking strength.

  • (a) What is the probability that the sample mean falls between 4800 psi and 5200 psi?
  • (b) What sample size n would be necessary in order to have P(4900 < x̄ < 5100) = 0.99?
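One way to approach Exercise 8.67 numerically is sketched below. It assumes the Central Limit Theorem approximation, so the sample mean is treated as normal with mean 5000 and standard deviation 400/√n. It uses `NormalDist` from Python's standard library, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 5000.0, 400.0, 36

# (a) Approximate sampling distribution of the sample mean under the CLT.
std_error = sigma / math.sqrt(n)
xbar_dist = NormalDist(mu, std_error)
p_between = xbar_dist.cdf(5200) - xbar_dist.cdf(4800)

# (b) Smallest n such that P(4900 < xbar < 5100) is at least 0.99.
# The half-width 100 must equal z * sigma / sqrt(n), where z is the
# upper 0.005 quantile of the standard normal distribution.
half_width = 100.0
z = NormalDist().inv_cdf(1 - 0.01 / 2)
n_required = math.ceil((z * sigma / half_width) ** 2)

print(round(p_between, 4), n_required)
```

Rounding n up, not to the nearest integer, keeps the probability at or above the target.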

8.68 Consider the situation of Review Exercise 8.62. If the population from which the sample was taken has a population mean μ = 53,000 kilometers, does the sample information here seem to support that claim? In your answer, compute t = (x̄ − 53,000) / (s / √10) and determine from Table A.4 (with 9 d.f.) whether the computed t-value is reasonable or appears to be a rare event.
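The t-value requested in Exercise 8.68 can be computed as in the following sketch, which reuses the tread-wear data of Review Exercise 8.62. Only the arithmetic is shown. Judging whether the value is rare still requires Table A.4 with 9 degrees of freedom. Variable names are illustrative.

```python
import math
import statistics

# Tread wear in kilometers, from Review Exercise 8.62.
tread_wear_km = [48000, 53000, 45000, 61000, 59000,
                 56000, 63000, 49000, 53000, 54000]

n = len(tread_wear_km)
xbar = statistics.mean(tread_wear_km)   # sample mean
s = statistics.stdev(tread_wear_km)     # sample standard deviation (n - 1 divisor)

# t = (xbar - mu0) / (s / sqrt(n)) with the claimed mean mu0 = 53,000 km.
mu0 = 53000
t_value = (xbar - mu0) / (s / math.sqrt(n))

print(xbar, round(s, 1), round(t_value, 3))
```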

8.69 Two distinct solid fuel propellants, type A and type B, are being considered for a space program activity. Burning rates of the propellant are crucial. Random samples of 20 specimens of the two propellants are taken with sample means of 20.5 cm/sec for propellant A and 24.50 cm/sec for propellant B. It is generally assumed that the variability in burning rate is roughly the same for the two propellants and is given by a population standard deviation of 5 cm/sec. Assume that the burning rates for each propellant are approximately normal and hence make use of the Central Limit Theorem. Nothing is known about the two population mean burning rates, and it is hoped that this experiment might shed some light on them.

  • (a) If, indeed, μA = μB, what is P(x̄B − x̄A ≥ 4.0)?
  • (b) Use your answer in (a) to shed some light on the proposition that μA = μB.
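The probability in part (a) of Exercise 8.69 can be sketched as follows. It assumes that, when μA = μB, the difference x̄B − x̄A is approximately normal with mean 0 and standard deviation σ√(1/n + 1/n). Variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma, n = 5.0, 20                # common population sd, specimens per propellant
observed_diff = 24.50 - 20.5      # xbar_B - xbar_A from the exercise

# Standard error of the difference between two independent sample means.
se_diff = sigma * math.sqrt(1 / n + 1 / n)

# Upper-tail probability under the assumption mu_A = mu_B.
p_tail = 1 - NormalDist(0.0, se_diff).cdf(observed_diff)

print(round(se_diff, 4), round(p_tail, 4))
```

A small tail probability means a difference this large would rarely occur if the two population means were equal, which is the reasoning part (b) asks for.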

8.70 The concentration of an active ingredient in the output of a chemical reaction is strongly influenced by the catalyst that is used in the reaction. It is felt that when catalyst A is used, the population mean concentration exceeds 65%. The standard deviation is known to be σ = 5%. A sample of outputs from 30 independent experiments gives the average concentration of x̄A = 64.5%.

  • (a) Does this sample information with an average concentration of x̄A = 64.5% provide disturbing information that perhaps μA is not 65%, but less than 65%? Support your answer with a probability statement.
  • (b) Suppose a similar experiment is done with the use of another catalyst, catalyst B. The standard deviation σ is still assumed to be 5% and x̄B turns out to be 70%. Comment on whether or not the sample information on catalyst B strongly suggests that μB is truly greater than μA. Support your answer by computing P(x̄B − x̄A ≥ 5.5 | μB = μA).
  • (c) Under the condition that μA = μB = 65%, give the approximate distribution of the following quantities (with mean and variance of each), making use of the Central Limit Theorem:
    • i) x̄B;
    • ii) x̄A − x̄B;
    • iii) (x̄A − x̄B) / (σ√(2/30)).

8.71 From the information in Review Exercise 8.70, compute (assuming μB = 65%) P(x̄B ≥ 70).

8.72 Given a normal random variable X with mean 20 and variance 9, and a random sample of size n taken from the distribution, what sample size n is necessary in order that P(19.9 ≤ x̄ ≤ 20.1) = 0.95?

8.73 In Chapter 9, the concept of parameter estimation will be discussed at length. Suppose X is a random variable with mean μ and variance σ² = 1.0. Suppose also that a random sample of size n is to be taken and x̄ is to be used as an estimate of μ. When the data are taken and the sample mean is measured, we wish it to be within 0.05 unit of the true mean with probability 0.99. That is, we want there to be a good chance that the x̄ computed from the sample is “very close” to the population mean, so we wish P(|x̄ − μ| ≤ 0.05) = 0.99. What sample size is required?
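The requirement in Exercise 8.73 can be turned into a general sample size formula. The derivation below is a sketch that assumes the Central Limit Theorem approximation for the sample mean. The symbols e (the allowed error, here 0.05) and z_{α/2} (the upper α/2 quantile of the standard normal distribution, here with α = 0.01) are notation introduced for illustration.

```latex
P\left(|\bar{X} - \mu| \le e\right)
  = P\left(|Z| \le \frac{e}{\sigma/\sqrt{n}}\right) = 1 - \alpha
\quad\Longrightarrow\quad
\frac{e\sqrt{n}}{\sigma} = z_{\alpha/2}
\quad\Longrightarrow\quad
n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}
```

Because n must be an integer, the result is rounded up so that the probability is at least 1 − α.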

8.74 Suppose a filling machine is used to fill cartons with a liquid product. The specification that is strictly enforced for the filling machine is 9 ± 1.5 oz. If any carton is produced with weight outside these bounds, it is considered by the supplier to be defective. It is hoped that at least 99% of cartons will meet these specifications. With the conditions μ = 9 and σ = 1, what proportion of cartons from the process are defective? If changes are made to reduce variability, what must σ be reduced to in order to meet specifications with probability 0.99? Assume a normal distribution for the weight.
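The two questions in Exercise 8.74 can be explored with the sketch below. It assumes carton weights are normal, as the exercise states, and that the process stays centered at μ = 9 after the variability is reduced. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 9.0, 1.0
lower, upper = 9.0 - 1.5, 9.0 + 1.5    # specification limits in ounces

# Proportion of cartons falling outside the specification limits.
weight = NormalDist(mu, sigma)
p_defective = weight.cdf(lower) + (1 - weight.cdf(upper))

# For 99% of cartons to be within limits, 1.5 / sigma must equal the
# upper 0.005 quantile of the standard normal distribution.
z = NormalDist().inv_cdf(1 - 0.01 / 2)
sigma_needed = (upper - mu) / z

print(round(p_defective, 4), round(sigma_needed, 4))
```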

8.75 Consider the situation in Review Exercise 8.74. Suppose a considerable effort is conducted to “tighten” the variability in the system. Following the effort, a random sample of size 40 is taken from the new assembly line and the sample variance is s² = 0.188 ounces². Do we have strong numerical evidence that σ² has been reduced below 1.0? Consider the probability P(S² ≤ 0.188 | σ² = 1.0), and give your conclusion.
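The probability in Exercise 8.75 is normally obtained from the chi-squared distribution of (n − 1)S²/σ² with 39 degrees of freedom. As a rough check that needs no tables, the Monte Carlo sketch below simulates sample variances from a normal population with σ² = 1.0 and counts how often one falls at or below 0.188. The seed, the number of replications, and the population mean of 0 are arbitrary choices made for illustration.

```python
import random

random.seed(1)           # fixed seed so the run is reproducible
n, reps = 40, 10000      # sample size from the exercise; replications are arbitrary
sigma = 1.0              # population sd under the assumption sigma^2 = 1.0

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Simulate the sampling distribution of S^2 for normal samples of size n.
variances = [
    sample_variance([random.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(reps)
]

# Fraction of simulated sample variances at or below the observed 0.188.
frac_below = sum(v <= 0.188 for v in variances) / reps
mean_variance = sum(variances) / reps

print(frac_below, round(mean_variance, 3), round(min(variances), 3))
```

If essentially none of the simulated variances come near 0.188, the observed value would be a very rare event under σ² = 1.0.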

8.76 Group Project: The class should be divided into groups of four people. The four students in each group should go to the college gym or a local fitness center. The students should ask each person who comes through the door his or her height in inches. Each group will then divide the height data by gender and work together to answer the following questions.

  • (a) Construct a normal quantile-quantile plot of the data. Based on the plot, do the data appear to follow a normal distribution?
  • (b) Use the estimated sample variance as the true variance for each gender. Assume that the population mean height for male students is actually three inches larger than that of female students. What is the probability that the average height of the male students will be 4 inches larger than that of the female students in your sample?
  • (c) What factors could render these results misleading?

Potential Misconceptions and Hazards

The Central Limit Theorem is one of the most powerful tools in all of statistics. The notion of a sampling distribution is one of the most important fundamental concepts in all of statistics, and the student should gain a clear understanding of it before proceeding. All chapters that follow will make considerable use of sampling distributions.

Suppose one wants to use the statistic x̄ to draw inferences about the population mean μ. This will be done by using the observed x̄ value from a single sample of size n. Any inference made must take into account the theoretical structure, or distribution, of all x̄ values that could be observed from samples of size n. This distribution is the basis for the Central Limit Theorem. The t, χ², and F-distributions are also used in the context of sampling distributions.

For example, the t-distribution represents the structure that occurs if all of the values of (x̄ − μ) / (s / √n) are formed, where x̄ and s are taken from samples of size n from a N(x; μ, σ) distribution. Similar remarks apply to χ² and F. The samples that form the statistics for all of these distributions come from normal populations; wherever there is a t, F, or χ², the source was a sample from a normal distribution.

Key Considerations for Sampling Distributions

There are three things that one must bear in mind regarding these fundamental sampling distributions:

  1. One cannot use the Central Limit Theorem unless σ is known. When σ is not known, it should be replaced by s, the sample standard deviation, in order to use the Central Limit Theorem.
  2. The T statistic is not a result of the Central Limit Theorem; x1, x2, …, xn must come from a N(x; μ, σ) distribution in order for (x̄ − μ) / (s / √n) to follow a t-distribution.
  3. The concept of degrees of freedom is intuitive, as the nature of the distribution of S and t depends on the amount of information in the sample.
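The second and third points can be illustrated with a small simulation. The sketch below draws many samples of size 5 from a normal population, forms (x̄ − μ) / (s / √n) for each, and counts how often its absolute value exceeds 1.96. For a standard normal variable that happens about 5% of the time. With only 4 degrees of freedom the t statistic exceeds it noticeably more often. The seed, sample size, and number of replications are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)           # fixed seed so the run is reproducible
n, reps = 5, 20000       # small sample size, so only n - 1 = 4 degrees of freedom
mu, sigma = 0.0, 1.0     # normal population parameters (arbitrary)

def t_statistic(xs, mu):
    """Compute (xbar - mu) / (s / sqrt(n)) for one sample."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return (m - mu) / (s / math.sqrt(len(xs)))

# Count how often |t| exceeds the standard normal cutoff 1.96.
exceed = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    if abs(t_statistic(sample, mu)) > 1.96:
        exceed += 1

frac_exceed = exceed / reps
print(frac_exceed)   # compare with 0.05 for a standard normal variable
```

Increasing n in the sketch brings the fraction closer to 0.05, which reflects the growing amount of information about σ in the sample.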

One- and Two-Sample Estimation Problems

9.1 Introduction

In previous chapters, we emphasized sampling properties of the sample mean and variance. The purpose of these presentations is to build a foundation that allows us to draw conclusions about the population parameters from experimental data. For example, the Central Limit Theorem provides information about the distribution of the sample mean x̄. The distribution involves the population mean μ. Thus, any conclusions concerning μ drawn from an observed sample average must depend on knowledge of this sampling distribution. Similar comments apply to S² and σ².
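The way the sampling distribution of the sample mean depends on μ can be seen in a short simulation. The sketch below repeatedly draws samples of size 30 from an exponential population with mean 1 and standard deviation 1, then compares the mean and standard deviation of the resulting sample means with μ and σ/√n. The population, seed, sample size, and number of replications are all assumptions chosen only for illustration.

```python
import math
import random
import statistics

random.seed(0)                 # fixed seed so the run is reproducible
n, reps = 30, 5000             # sample size and number of simulated samples
pop_mean, pop_sd = 1.0, 1.0    # exponential population with rate 1

# Each entry is the mean of one simulated sample of size n.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theory_sd = pop_sd / math.sqrt(n)   # sigma / sqrt(n) from the CLT

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theory_sd, 3))
```

The sample means cluster around μ with spread close to σ/√n, even though the population itself is skewed.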

9.2 Statistical Inference

Statistical inference consists of those methods by which one makes inferences or generalizations about a population. The trend today is to distinguish between the classical method of estimating a population parameter, whereby inferences are based strictly on information obtained from a random sample, and the Bayesian method, which utilizes prior subjective knowledge in conjunction with sample data.

Statistical inference may be divided into two major areas: estimation and tests of hypotheses. We treat these two areas separately, dealing with theory and applications of estimation in this chapter and hypothesis testing in Chapter 10.