6.1 Sampling Distribution of the Sample Mean
LEARNING OBJECTIVES
- Describe the distribution of the sample mean.
- Solve probability problems involving the distribution of the sample mean.
Suppose all samples of size [latex]n[/latex] are selected from a population with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex]. For each sample, the sample mean [latex]\overline{x}[/latex] is recorded. The probability distribution of these sample means is called the sampling distribution of the sample means. The central limit theorem describes the properties of the sampling distribution of the sample means.
THE CENTRAL LIMIT THEOREM
Suppose all samples of size [latex]n[/latex] are taken from a population with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex]. The collection of sample means forms a probability distribution called the sampling distribution of the sample mean.
- The mean of the distribution of the sample means, denoted [latex]\mu_{\overline{x}}[/latex], equals the mean of the population.
[latex]\begin{eqnarray*}\\\mu_{\overline{x}}&=&\mu\\\\\end{eqnarray*}[/latex]
- The standard deviation of the sample means (called the standard error of the mean), denoted [latex]\sigma_{\overline{x}}[/latex], equals the standard deviation of the population divided by the square root of the sample size.
[latex]\begin{eqnarray*}\\\sigma_{\overline{x}}&=&\frac{\sigma}{\sqrt{n}}\\\\\end{eqnarray*}[/latex]
- The distribution of the sample means follows a normal distribution (exactly or, for large samples, approximately) if one of the following conditions is met:
- The population the samples are drawn from is normal, regardless of the sample size [latex]n[/latex].
- The sample size [latex]n\geq 30[/latex].
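The three properties above can be checked empirically with a short simulation. The Python sketch below is not part of the original text; it assumes a uniform population on the interval from 0 to 1 (clearly not normal), a sample size of 36, and 5,000 repeated samples, all chosen only for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from Uniform(0, 1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(num_samples)
    ]

# Population parameters for Uniform(0, 1): mean 1/2, standard deviation sqrt(1/12).
mu = 0.5
sigma = math.sqrt(1 / 12)
n = 36  # assumed sample size, at least 30

means = simulate_sample_means(n, num_samples=5000)

# The mean of the sample means should be close to mu, and their
# standard deviation should be close to sigma / sqrt(n).
print("mean of sample means:", statistics.fmean(means), "vs mu =", mu)
print("std dev of sample means:", statistics.stdev(means),
      "vs sigma/sqrt(n) =", sigma / math.sqrt(n))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is flat, which is the third property in action.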
Video: “Central limit theorem | Inferential statistics | Probability and Statistics | Khan Academy” by Khan Academy [9:49] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Video: “Sampling distribution of the sample mean | Probability and Statistics | Khan Academy” by Khan Academy [10:52] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Video: “Standard error of the mean | Inferential statistics | Probability and Statistics | Khan Academy” by Khan Academy [15:15] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Calculating Probabilities for Sample Means
Because the central limit theorem states that the sampling distribution of the sample means follows a normal distribution (under the right conditions), the normal distribution can be used to answer probability questions about sample means. The [latex]z[/latex]-score for the sampling distribution of the sample means is
[latex]\displaystyle{z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}}[/latex]
where [latex]\mu[/latex] is the mean of the population the sample is taken from, [latex]\sigma[/latex] is the standard deviation of the population the sample is taken from, and [latex]n[/latex] is the sample size.
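As a minimal sketch, the z-score formula can be written as a small Python function. The function name `z_score_sample_mean` is hypothetical, and the input values are borrowed from the soccer example later in this section.

```python
import math

def z_score_sample_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Population mean 2, population standard deviation 0.5, sample size 25,
# observed sample mean 1.7.
z = z_score_sample_mean(1.7, 2, 0.5, 25)
print(round(z, 4))  # -3.0
```

Note that the denominator is the standard error, not the population standard deviation; dividing by 0.5 instead of 0.1 here would give a z-score of only -0.6.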
CALCULATING PROBABILITIES ABOUT SAMPLE MEANS IN EXCEL
Because the sample means follow a normal distribution (under the right conditions), the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate probabilities associated with a sample mean.
- For x, enter the value for [latex]\overline{x}[/latex].
- For [latex]\mu[/latex], enter the mean of the sample means [latex]\mu_{\overline{x}}[/latex]. Because the mean of the sample means equals the mean of the population the sample is taken from, enter [latex]\mu[/latex], the mean of the population.
- For [latex]\sigma[/latex], enter the standard error of the sample means [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
- For the logic operator, enter true. Note: Because we are calculating the area under the curve, we always enter true for the logic operator.
NOTES
- In this case, we want to calculate probabilities associated with a sample mean. The sample means follow a normal distribution (under the right conditions), which allows us to use the norm.dist function to calculate probabilities. Because we are working with sample means, we must enter the mean and the standard deviation of the distribution of the sample means into the norm.dist function, and not the mean and standard deviation of the population the samples are taken from. The mean of the sample means equals the mean of the population, so we enter the value of [latex]\mu[/latex] into the second field of the norm.dist function. But the standard deviation of the sample means equals [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex], so we must enter this value into the third field of the norm.dist function.
- We use the norm.dist function in the same way as we learned previously to calculate the probability a sample mean is less than a given value, a sample mean is greater than a given value, or a sample mean is in between two given values.
- An alternative approach in Excel is to use the norm.s.dist(z,true) function. In the norm.s.dist function, we enter the z-score for the corresponding value of [latex]\overline{x}[/latex] (using the [latex]z[/latex]-score for sample means given above).
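For readers working outside Excel, the same calculations can be sketched in Python with the standard library's `statistics.NormalDist`. The helper names below are hypothetical, and the inputs (population mean 50, population standard deviation 12, sample size 36) are made up for illustration. The sketch mirrors the norm.dist approach and the norm.s.dist alternative and shows that they agree.

```python
import math
from statistics import NormalDist

def prob_mean_less_than(x_bar, mu, sigma, n):
    """P(sample mean < x_bar); mirrors norm.dist(x_bar, mu, sigma/sqrt(n), true)."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(x_bar)

def prob_mean_greater_than(x_bar, mu, sigma, n):
    """P(sample mean > x_bar); mirrors 1 - norm.dist(...)."""
    return 1 - prob_mean_less_than(x_bar, mu, sigma, n)

def prob_mean_between(low, high, mu, sigma, n):
    """P(low < sample mean < high); mirrors norm.dist(high, ...) - norm.dist(low, ...)."""
    return (prob_mean_less_than(high, mu, sigma, n)
            - prob_mean_less_than(low, mu, sigma, n))

def prob_via_z(x_bar, mu, sigma, n):
    """Same left-tail area through the z-score; mirrors norm.s.dist(z, true)."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    return NormalDist().cdf(z)

# Hypothetical inputs, not taken from the text.
mu, sigma, n = 50, 12, 36
print(prob_mean_less_than(47, mu, sigma, n))
print(prob_via_z(47, mu, sigma, n))  # same area, computed from the z-score
print(prob_mean_between(48, 52, mu, sigma, n))
```

In every helper the third argument passed to the normal distribution is the standard error, which is the step most often missed when moving from single observations to sample means.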
EXAMPLE
The length of time, in hours, it takes an “over 40” group of people to play one soccer match is normally distributed with a mean of [latex]2[/latex] hours and a standard deviation of [latex]0.5[/latex] hours. Suppose a sample of size [latex]25[/latex] is drawn randomly from the population.
- Is the distribution of the sample means normal? Explain.
- What are the mean and the standard deviation of the distribution of the sample means?
- What is the probability that the mean of the sample is less than [latex]1.7[/latex] hours?
- What is the probability that the mean of the sample is more than [latex]2.2[/latex] hours?
- What is the probability that the sample mean is between [latex]1.8[/latex] hours and [latex]2.3[/latex] hours?
Solution
- Because the population the sample is taken from follows a normal distribution, the distribution of the sample means also follows a normal distribution.
- The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=2[/latex]. The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{0.5}{\sqrt{25}}=0.1}[/latex].
-
Function | norm.dist |
---|---|
Field 1 | 1.7 |
Field 2 | 2 |
Field 3 | 0.5/sqrt(25) |
Field 4 | true |
Answer | 0.0013 |
The probability the sample mean is less than [latex]1.7[/latex] hours is [latex]0.0013[/latex] (or [latex]0.13\%[/latex]).
Note: Because we are calculating a probability for a sample mean, we enter the standard deviation of the sample means 0.5/sqrt(25) into field 3 (and not the standard deviation of the population).
-
Function | 1-norm.dist |
---|---|
Field 1 | 2.2 |
Field 2 | 2 |
Field 3 | 0.5/sqrt(25) |
Field 4 | true |
Answer | 0.0228 |
The probability the sample mean is more than [latex]2.2[/latex] hours is [latex]0.0228[/latex] (or [latex]2.28\%[/latex]).
-
Function | norm.dist | -norm.dist |
---|---|---|
Field 1 | 2.3 | 1.8 |
Field 2 | 2 | 2 |
Field 3 | 0.5/sqrt(25) | 0.5/sqrt(25) |
Field 4 | true | true |
Answer | 0.9759 |
The probability the sample mean is between [latex]1.8[/latex] hours and [latex]2.3[/latex] hours is [latex]0.9759[/latex] (or [latex]97.59\%[/latex]).
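The three probabilities in this example can be cross-checked outside Excel. The following Python sketch is an addition to the original text and uses only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean for the soccer example:
# population mean 2 hours, population standard deviation 0.5 hours, n = 25.
sampling_dist = NormalDist(mu=2, sigma=0.5 / math.sqrt(25))

p_less_than_1_7 = sampling_dist.cdf(1.7)
p_more_than_2_2 = 1 - sampling_dist.cdf(2.2)
p_between_1_8_and_2_3 = sampling_dist.cdf(2.3) - sampling_dist.cdf(1.8)

print(f"P(mean < 1.7)       = {p_less_than_1_7:.4f}")
print(f"P(mean > 2.2)       = {p_more_than_2_2:.4f}")
print(f"P(1.8 < mean < 2.3) = {p_between_1_8_and_2_3:.4f}")
```

The printed values should agree with the Excel answers in the solution to four decimal places.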
TRY IT
The length of time taken on the SAT for a group of students has a mean of [latex]2.5[/latex] hours and a standard deviation of [latex]0.25[/latex] hours. A sample of size [latex]60[/latex] is drawn randomly from the population.
- Is the distribution of the sample means normal? Explain.
- What is the probability that the sample mean is between [latex]2.4[/latex] hours and [latex]2.8[/latex] hours?
- What is the probability that the sample mean is at least [latex]2.6[/latex] hours?
- What is the probability that the sample mean is at most [latex]2.45[/latex] hours?
Solution
- The distribution of the sample means is normal because the sample size of [latex]60[/latex] is greater than [latex]30[/latex].
-
Function | norm.dist | -norm.dist |
---|---|---|
Field 1 | 2.8 | 2.4 |
Field 2 | 2.5 | 2.5 |
Field 3 | 0.25/sqrt(60) | 0.25/sqrt(60) |
Field 4 | true | true |
Answer | 0.9990 |
-
Function | 1-norm.dist |
---|---|
Field 1 | 2.6 |
Field 2 | 2.5 |
Field 3 | 0.25/sqrt(60) |
Field 4 | true |
Answer | 0.0010 |
-
Function | norm.dist |
---|---|
Field 1 | 2.45 |
Field 2 | 2.5 |
Field 3 | 0.25/sqrt(60) |
Field 4 | true |
Answer | 0.0607 |
EXAMPLE
In a recent study reported on Oct. 29, 2012, on the Flurry Blog, the mean age of tablet users is [latex]34[/latex] years, and the standard deviation is [latex]15[/latex] years. Suppose a sample of [latex]100[/latex] tablet users is taken.
- What are the mean and standard deviation for the sample mean ages of tablet users?
- What is the distribution of the sample means? Explain.
- Find the probability that the sample mean age is more than [latex]30[/latex] years.
Solution
- The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=34[/latex]. The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{15}{\sqrt{100}}=1.5}[/latex].
- The distribution of the sample means is normal because the sample size of [latex]100[/latex] is greater than [latex]30[/latex].
-
Function | 1-norm.dist |
---|---|
Field 1 | 30 |
Field 2 | 34 |
Field 3 | 15/sqrt(100) |
Field 4 | true |
Answer | 0.9962 |
The probability the sample mean is more than [latex]30[/latex] years of age is [latex]0.9962[/latex] (or [latex]99.62\%[/latex]).
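The same answer can be reached by hand with the [latex]z[/latex]-score for sample means, as a check on the Excel result. In this worked version the [latex]z[/latex]-score is rounded to two decimal places, as it would be when reading a standard normal table:
[latex]\begin{eqnarray*}z&=&\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}=\frac{30-34}{\frac{15}{\sqrt{100}}}=\frac{-4}{1.5}\approx -2.67\\\\P(\overline{x}>30)&=&P(z>-2.67)=1-P(z<-2.67)\approx 0.9962\end{eqnarray*}[/latex]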
TRY IT
In an article on Flurry Blog, a gaming marketing gap for men between the ages of 30 and 40 is identified. You are researching a start-up game targeted at the 35-year-old demographic. Your idea is to develop a strategy game that can be played by men from their late 20s through their late 30s. Based on the article’s data, industry research shows that the average strategy player is [latex]28[/latex] years old with a standard deviation of [latex]4.8[/latex] years. You take a sample of [latex]100[/latex] randomly selected gamers. If your target market is 29- to 35-year-olds, should you continue with your development strategy?
Solution
You need to determine the probability that the mean age of the [latex]100[/latex] sampled gamers is between 29 and 35 years.
Function | norm.dist | -norm.dist |
---|---|---|
Field 1 | 35 | 29 |
Field 2 | 28 | 28 |
Field 3 | 4.8/sqrt(100) | 4.8/sqrt(100) |
Field 4 | true | true |
Answer | 0.0186 |
There is a [latex]1.86\%[/latex] chance that the mean age of men who will play your game is between 29 years and 35 years. Because this is a very low probability, you should not continue your development strategy.
EXAMPLE
The mean number of minutes for app engagement by a tablet user is [latex]8.2[/latex] minutes with a standard deviation of [latex]1[/latex] minute. Suppose a sample of [latex]60[/latex] tablet users is taken.
- Is the distribution of the sample mean normal? Explain.
- What are the mean and standard deviation for the sample mean number of minutes for app engagement?
- Find the probability that the sample mean is between [latex]8[/latex] minutes and [latex]8.5[/latex] minutes.
- Find the probability that the sample mean is less than [latex]8.3[/latex] minutes.
Solution
- Because the sample size of [latex]60[/latex] is greater than [latex]30[/latex], the distribution of the sample means also follows a normal distribution.
- The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=8.2[/latex]. The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{1}{\sqrt{60}}\approx 0.13}[/latex].
-
Function | norm.dist | -norm.dist |
---|---|---|
Field 1 | 8.5 | 8 |
Field 2 | 8.2 | 8.2 |
Field 3 | 1/sqrt(60) | 1/sqrt(60) |
Field 4 | true | true |
Answer | 0.9293 |
The probability that the sample mean is between [latex]8[/latex] and [latex]8.5[/latex] minutes is [latex]0.9293[/latex] (or [latex]92.93\%[/latex]).
-
Function | norm.dist |
---|---|
Field 1 | 8.3 |
Field 2 | 8.2 |
Field 3 | 1/sqrt(60) |
Field 4 | true |
Answer | 0.7807 |
The probability that the sample mean is less than [latex]8.3[/latex] minutes is [latex]0.7807[/latex] (or [latex]78.07\%[/latex]).
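The Excel entries in this example use the unrounded standard error 1/sqrt(60) rather than the rounded value 0.13. The Python sketch below, which is an illustration added here and not part of the original solution, suggests why: rounding the standard error before computing the probability shifts the answer in the third decimal place. The helper name `prob_between` is hypothetical.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8.2, 1, 60

def prob_between(low, high, standard_error):
    """P(low < sample mean < high) for a given standard error."""
    dist = NormalDist(mu, standard_error)
    return dist.cdf(high) - dist.cdf(low)

exact_se = sigma / math.sqrt(n)  # about 0.1291, as entered in Excel
rounded_se = 0.13                # standard error rounded to two decimals

p_exact = prob_between(8, 8.5, exact_se)
p_rounded = prob_between(8, 8.5, rounded_se)

print(f"unrounded standard error: {p_exact:.4f}")
print(f"rounded standard error:   {p_rounded:.4f}")
```

Keeping the full expression in the calculation and rounding only the final probability avoids this discrepancy.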
TRY IT
Cans of a cola beverage claim to contain [latex]16[/latex] ounces with a standard deviation of [latex]0.143[/latex] ounces. The amounts in a sample of [latex]34[/latex] cans are measured, and the mean is [latex]16.01[/latex] ounces. Find the probability that a sample of [latex]34[/latex] cans will have an average amount greater than [latex]16.01[/latex] ounces. Do the results suggest that cans are filled with an amount greater than [latex]16[/latex] ounces?
Solution
Function | 1-norm.dist |
---|---|
Field 1 | 16.01 |
Field 2 | 16 |
Field 3 | 0.143/sqrt(34) |
Field 4 | true |
Answer | 0.3417 |
Because there is a [latex]34.17\%[/latex] probability that the mean of a sample of [latex]34[/latex] cans is greater than [latex]16.01[/latex] ounces when the true mean is [latex]16[/latex] ounces, a sample mean of [latex]16.01[/latex] ounces is not unusual. That is, based on this sample, there is no strong evidence that the average volume of the cans is higher than the claimed [latex]16[/latex] ounces.
As consumers, we would be glad if the average were higher than [latex]16[/latex] ounces because we would be receiving more cola in the can than what we paid for. As the manufacturer, we would still monitor our bottling process to confirm that the process is working within acceptable limits.
Video: “Excel Statistics 76: Sampling Distribution Of Sample Mean & Central Limit Theorem” by excelisfun [24:06] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Exercises
- Yoonie is a personnel manager in a large corporation. Each month she must review [latex]16[/latex] of the employees. From past experience, she has found that the reviews take her approximately [latex]4[/latex] hours each, with a standard deviation of [latex]1.2[/latex] hours. Assume the time it takes her to complete one review is normally distributed. Suppose [latex]16[/latex] reviews are selected at random.
- What are the mean and standard deviation of the population?
- What is the distribution of the sample means? Explain.
- What are the mean and standard deviation of the sample means?
- Find the probability that one review will take Yoonie from [latex]3.5[/latex] to [latex]4.25[/latex] hours.
- Find the probability that the mean of a month’s reviews will take Yoonie from [latex]3.5[/latex] to [latex]4.25[/latex] hours.
- Why are the probabilities in (d) and (e) different?
- Find the probability that the mean of a month’s reviews will take Yoonie more than [latex]5[/latex] hours.
Answer
- [latex]4[/latex], [latex]1.2[/latex]
- Normal because the population the sample is taken from is normal.
- [latex]4[/latex], [latex]0.3[/latex]
- [latex]0.2441[/latex]
- [latex]0.7499[/latex]
- Part (d) is the probability of a single element from the population, and part (e) is the probability of the sample mean.
- [latex]0.0004[/latex]
- Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with a mean of [latex]250[/latex] feet and a standard deviation of [latex]50[/latex] feet. We randomly sample [latex]49[/latex] fly balls.
- What is the probability that the [latex]49[/latex] balls travelled an average of less than [latex]230[/latex] feet?
- What is the probability that the [latex]49[/latex] balls travelled an average of [latex]245[/latex] feet to [latex]255[/latex] feet?
- What is the probability that the 49 balls travelled an average of more than [latex]260[/latex] feet?
Answer
- [latex]0.0026[/latex]
- [latex]0.5161[/latex]
- [latex]0.0808[/latex]
- According to the CRA, the average length of time for an individual to complete (keep records for, learn, prepare, copy, assemble, and send) their tax return is [latex]10.53[/latex] hours with a standard deviation of [latex]2[/latex] hours. Suppose we randomly sample [latex]36[/latex] taxpayers.
- What is the distribution of the sample means? Explain.
- Find the probability that the [latex]36[/latex] taxpayers in the sample finished their tax returns in an average of less than [latex]10[/latex] hours.
- Would you be surprised if the [latex]36[/latex] taxpayers finished their tax returns in an average of more than [latex]11.5[/latex] hours? Explain.
- Would you be surprised if one taxpayer finished their tax return in more than [latex]11.5[/latex] hours? Explain.
Answer
- Normal because the sample size ([latex]36[/latex]) is greater than [latex]30[/latex].
- [latex]0.0559[/latex]
- Yes, because the probability the average time is greater than [latex]11.5[/latex] hours is only [latex]0.0018[/latex].
- No, because the probability that an individual taxpayer took more than [latex]11.5[/latex] hours is [latex]0.3138[/latex].
- Suppose that a category of world-class runners is known to run a marathon ([latex]26[/latex] miles) in an average of [latex]145[/latex] minutes with a standard deviation of [latex]14[/latex] minutes. Consider [latex]49[/latex] of the races.
- Find the probability that the runner will average between [latex]142[/latex] and [latex]146[/latex] minutes in these [latex]49[/latex] marathons.
- Find the probability that the runner will average less than [latex]140[/latex] minutes in these [latex]49[/latex] marathons.
- Find the probability that the runner will average more than [latex]148[/latex] minutes in these [latex]49[/latex] marathons.
Answer
- [latex]0.6247[/latex]
- [latex]0.0062[/latex]
- [latex]0.0668[/latex]
- In 1940, the average size of a U.S. farm was [latex]174[/latex] acres, and the standard deviation was [latex]55[/latex] acres. Suppose we randomly survey [latex]38[/latex] farmers from 1940.
- What is the distribution of the sample means? Explain.
- What are the mean and standard deviation of the sample means?
- What is the probability that the sample mean is less than [latex]170[/latex] acres?
- What is the probability that the sample mean is more than [latex]180[/latex] acres?
- What is the probability that the sample mean is between [latex]165[/latex] and [latex]175[/latex] acres?
Answer
- Normal because the sample size ([latex]38[/latex]) is greater than [latex]30[/latex].
- [latex]174[/latex], [latex]8.92[/latex]
- [latex]0.3270[/latex]
- [latex]0.2506[/latex]
- [latex]0.3881[/latex]
- The percent of fat calories that a person in America consumes each day is normally distributed with a mean of [latex]36[/latex] and a standard deviation of [latex]10[/latex]. Suppose that [latex]16[/latex] individuals are randomly chosen.
- What is the distribution of the sample means?
- What are the mean and standard deviation of the sample means?
- Find the probability that the average percent of fat calories consumed in the group of [latex]16[/latex] is more than [latex]35[/latex].
- Find the probability that the average percent of fat calories consumed in the group of [latex]16[/latex] is less than [latex]30[/latex].
- Find the probability that the average percent of fat calories consumed in the group of [latex]16[/latex] is between [latex]40[/latex] and [latex]45[/latex].
Answer
- Normal because the population the sample is taken from is normal.
- [latex]36[/latex], [latex]2.5[/latex]
- [latex]0.6554[/latex]
- [latex]0.0082[/latex]
- [latex]0.0546[/latex]
- The distribution of income in some Third World countries is considered wedge-shaped (many very poor people, very few middle-income people, and even fewer wealthy people). Suppose we pick a country with a wedge-shaped distribution. Suppose the average salary is [latex]\$2,000[/latex] per year with a standard deviation of [latex]\$8,000[/latex]. We randomly survey [latex]1,000[/latex] residents of that country.
- How is it possible for the standard deviation to be greater than the average?
- What is the distribution of the sample means? Explain.
- Is it likely that the average salary of the [latex]1,000[/latex] residents is more than [latex]\$2,800[/latex]? Explain.
- Is it likely that the average salary of the [latex]1,000[/latex] residents is less than [latex]\$1,800[/latex]? Explain.
- Why is it more likely that the average salary of the [latex]1,000[/latex] residents will be from [latex]\$2,000[/latex] to [latex]\$2,100[/latex] than from [latex]\$2,100[/latex] to [latex]\$2,200[/latex]?
Answer
- Because there are many more poor people compared to middle-income or wealthy people, the mean will be closer to the income of the poorer people. The middle income and wealthy people will have a very large dispersion away from this mean, which causes a large standard deviation.
- Normal because the sample size ([latex]1,000[/latex]) is greater than [latex]30[/latex].
- No, because the probability that the average salary is more than [latex]\$2,800[/latex] is only [latex]0.0008[/latex].
- Yes, it is reasonably likely, because the probability that the average salary is less than [latex]\$1,800[/latex] is [latex]0.2146[/latex].
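- The sampling distribution is normal and centred at [latex]\$2,000[/latex], so an interval closer to the mean covers more area under the curve. The probability that the average is between [latex]\$2,000[/latex] and [latex]\$2,100[/latex] is higher at [latex]0.1537[/latex], compared to the probability that the average is between [latex]\$2,100[/latex] and [latex]\$2,200[/latex] at [latex]0.1317[/latex].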
- NeverReady Batteries has engineered a newer, longer-lasting AAA battery. The company claims this battery has an average life span of [latex]17[/latex] hours with a standard deviation of [latex]0.8[/latex] hours. Your statistics class questions this claim. As a class, you randomly select [latex]30[/latex] batteries and find that the sample mean life span is [latex]16.7[/latex] hours. If the process is working properly, what is the probability of getting a random sample of [latex]30[/latex] batteries in which the sample mean lifetime is [latex]16.7[/latex] hours or less? Is the company’s claim reasonable?
Answer
[latex]0.02[/latex]. The claim is not reasonable because, if the population mean were [latex]17[/latex] hours, the probability of getting a sample mean of [latex]16.7[/latex] hours or less would be only [latex]0.02[/latex]. A sample mean this low would be unusual if the company's claim were true.
- Your company has a contract to perform preventive maintenance on thousands of air-conditioners in a large city. Based on service records from previous years, the time that a technician spends servicing a unit averages one hour with a standard deviation of one hour. In the coming week, your company will service a simple random sample of [latex]70[/latex] units in the city. You plan to budget an average of [latex]1.1[/latex] hours per technician to complete the work. Will this be enough time?
Answer
Yes, because the probability that the average time is less than [latex]1.1[/latex] hours is [latex]0.7986[/latex].
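Parts (d) and (e) of the first exercise (Yoonie's reviews) contrast the probability for a single observation with the probability for a sample mean over the same interval. The Python sketch below is an added illustration, with variable names chosen here rather than taken from the text. It shows that the only change between the two calculations is the standard deviation supplied to the normal distribution.

```python
import math
from statistics import NormalDist

# Yoonie's review times: population mean 4 hours, standard deviation 1.2 hours,
# sample size 16.
mu, sigma, n = 4, 1.2, 16

single_review = NormalDist(mu, sigma)                    # one review time
mean_of_reviews = NormalDist(mu, sigma / math.sqrt(n))   # mean of 16 review times

# Same interval, 3.5 to 4.25 hours, under each distribution.
p_single = single_review.cdf(4.25) - single_review.cdf(3.5)
p_mean = mean_of_reviews.cdf(4.25) - mean_of_reviews.cdf(3.5)

print(f"one review between 3.5 and 4.25 hours:         {p_single:.4f}")
print(f"mean of 16 reviews between 3.5 and 4.25 hours: {p_mean:.4f}")
```

Because the standard error (0.3) is much smaller than the population standard deviation (1.2), sample means cluster more tightly around 4 hours, so the same interval captures far more of the probability.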
“6.2 Sampling Distribution of the Sample Mean” and “6.4 Exercises” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.