6.2 Sampling Distribution of the Sample Mean

LEARNING OBJECTIVES

  • Describe the distribution of the sample mean.
  • Solve probability problems involving the distribution of the sample mean.

Suppose all samples of size [latex]n[/latex] are selected from a population with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex].  For each sample, the sample mean [latex]\overline{x}[/latex] is recorded.  The probability distribution of these sample means is called the sampling distribution of the sample means.  The central limit theorem describes the properties of the sampling distribution of the sample means.

THE CENTRAL LIMIT THEOREM

Suppose all samples of size [latex]n[/latex] are taken from a population with mean [latex]\mu[/latex] and standard deviation [latex]\sigma[/latex].  The collection of sample means forms a probability distribution called the sampling distribution of the sample mean.

  1. The mean of the distribution of the sample means, denoted [latex]\mu_{\overline{x}}[/latex], equals the mean of the population.

    [latex]\begin{eqnarray*} \\ \mu_{\overline{x}}& = & \mu \\ \\ \end{eqnarray*}[/latex]

  2. The standard deviation of the sample means (called the standard error of the mean), denoted [latex]\sigma_{\overline{x}}[/latex], equals the standard deviation of the population divided by the square root of the sample size.

    [latex]\begin{eqnarray*} \\ \sigma_{\overline{x}} & = & \frac{\sigma}{\sqrt{n}} \\ \\ \end{eqnarray*}[/latex]

  3. The distribution of the sample means follows a normal distribution if one of the following conditions is met:
    • The population the samples are drawn from is normal, regardless of the sample size [latex]n[/latex].
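    • The sample size [latex]n \geq 30[/latex]. If the population is not normal, the sampling distribution is then only approximately normal, and the approximation improves as [latex]n[/latex] increases.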
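
Properties 1 and 2 can be checked with a small simulation. The Python sketch below is illustrative only: it assumes a hypothetical population that is uniform on the interval from 0 to 12 (mean 6, standard deviation 12/√12 ≈ 3.464), which is deliberately not normal, and it uses only the standard library (Python 3.8 or later). The helper name simulate_sample_means is our own and not part of any library.

```python
import random
import statistics

# Assumed, illustrative population (not from the text): uniform on [0, 12].
# Its mean is 6 and its standard deviation is 12 / sqrt(12), about 3.464.
POP_LOW, POP_HIGH = 0.0, 12.0
mu = (POP_LOW + POP_HIGH) / 2
sigma = (POP_HIGH - POP_LOW) / 12 ** 0.5


def simulate_sample_means(n, reps, seed=1):
    """Hypothetical helper: draw `reps` random samples of size n, return their means."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        statistics.fmean(rng.uniform(POP_LOW, POP_HIGH) for _ in range(n))
        for _ in range(reps)
    ]


n = 36
means = simulate_sample_means(n, reps=10_000)
print("mean of sample means:", round(statistics.fmean(means), 3), "| mu:", mu)
print("sd of sample means:  ", round(statistics.stdev(means), 3),
      "| sigma/sqrt(n):", round(sigma / n ** 0.5, 3))
```

Each printed pair should be close (the simulated values shift slightly with the seed): the mean of the sample means lands near 6, and their standard deviation lands near 3.464/6 ≈ 0.577, as [latex]\mu_{\overline{x}}=\mu[/latex] and [latex]\sigma_{\overline{x}}=\sigma/\sqrt{n}[/latex] predict.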

Watch this video: Sampling distribution of the sample mean | Probability and Statistics | Khan Academy by Khan Academy [10:51] 


Watch this video: Standard error of the mean | Inferential statistics | Probability and Statistics | Khan Academy by Khan Academy [15:14] 


Because the central limit theorem states that the sampling distribution of the sample means follows a normal distribution (under the right conditions), the normal distribution can be used to answer probability questions about sample means. The [latex]z[/latex]-score for the sampling distribution of the sample means is

[latex]\displaystyle{z=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}}[/latex]

where [latex]\mu[/latex] is the mean of the population the sample is taken from, [latex]\sigma[/latex] is the standard deviation of the population the sample is taken from, and [latex]n[/latex] is the sample size.
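
The formula translates directly into code. This is a minimal Python sketch; the function name z_score_sample_mean is hypothetical, and the inputs are the values from the soccer-match example later in this section.

```python
import math


def z_score_sample_mean(x_bar, mu, sigma, n):
    """Hypothetical helper: z = (x_bar - mu) / (sigma / sqrt(n)) for a sample mean."""
    standard_error = sigma / math.sqrt(n)  # sigma_xbar, the standard error of the mean
    return (x_bar - mu) / standard_error


# Soccer-match example: mu = 2, sigma = 0.5, n = 25, sample mean 1.7
z = z_score_sample_mean(1.7, 2, 0.5, 25)
print(round(z, 4))  # -3.0
```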

CALCULATING PROBABILITIES ABOUT SAMPLE MEANS IN EXCEL

Because the distribution of the sample means follows a normal distribution (under the right conditions), the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate probabilities associated with a sample mean.

  • For x, enter the value for [latex]\overline{x}[/latex].
  • For [latex]\mu[/latex], enter the mean of the sample means, [latex]\mu_{\overline{x}}[/latex].  Because the mean of the sample means equals the mean of the population the sample is taken from, we enter [latex]\mu[/latex], the mean of the population.
  • For [latex]\sigma[/latex], enter the standard error of the mean [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex].
  • For the logic operator, enter true.  Note:  Because we are calculating the cumulative area under the curve to the left of x, we always enter true for the logic operator.
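
Outside Excel, the same calculation can be sketched in Python. This assumes Python 3.8 or later, where the standard-library class statistics.NormalDist(mu, sigma) has a cdf(x) method returning the area to the left of x, which is what norm.dist returns when the fourth field is true. The function name prob_sample_mean_below is hypothetical.

```python
import math
from statistics import NormalDist


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Hypothetical helper: P(sample mean < x_bar), the analogue of
    norm.dist(x_bar, mu, sigma/sqrt(n), true) in Excel."""
    standard_error = sigma / math.sqrt(n)  # third field: the standard error, not sigma
    return NormalDist(mu, standard_error).cdf(x_bar)


# Soccer-match example, part 3: mu = 2, sigma = 0.5, n = 25, sample mean 1.7
print(round(prob_sample_mean_below(1.7, 2, 0.5, 25), 4))  # 0.0013
```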

NOTE

In this case, we want to calculate probabilities associated with a sample mean.  The sample means follow a normal distribution (under the right conditions), which allows us to use the norm.dist function to calculate probabilities.  Because we are working with sample means, we must enter the mean and the standard deviation of the distribution of the sample means into the norm.dist function, and not the mean and standard deviation of the population the samples are taken from.  The mean of the sample means equals the mean of the population, so we enter the value of [latex]\mu[/latex] into the second field of the norm.dist function.  But the standard deviation of the sample means equals [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex], so we must enter this value into the third field of the norm.dist function.
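
To see why the third field matters, the Python sketch below (3.8 or later, standard library only) computes the probability that the sample mean is less than 1.7 hours in the soccer-match example twice: once incorrectly with the population standard deviation, and once with the standard error.

```python
import math
from statistics import NormalDist

mu, sigma, n, x_bar = 2, 0.5, 25, 1.7  # soccer-match example values

# Wrong: the population standard deviation in the third field.
wrong = NormalDist(mu, sigma).cdf(x_bar)
# Right: the standard error sigma / sqrt(n) in the third field.
right = NormalDist(mu, sigma / math.sqrt(n)).cdf(x_bar)

print(round(wrong, 4), round(right, 4))  # 0.2743 0.0013
```

The first value answers a different question (the chance that a single match lasts less than 1.7 hours), which is why it is so much larger.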

We use the norm.dist function in the same way as we learned previously to calculate the probability a sample mean is less than a given value, a sample mean is greater than a given value, or a sample mean is in between two given values.
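
The three cases can be written as three small helper functions. This is a Python sketch (3.8 or later, standard library) with hypothetical function names; each one mirrors the corresponding norm.dist pattern used in the examples, and the usage lines apply them to parts 4 and 5 of the soccer-match example.

```python
import math
from statistics import NormalDist


def xbar_distribution(mu, sigma, n):
    """Hypothetical helper: normal model for the sample mean (center mu, sd sigma/sqrt(n))."""
    return NormalDist(mu, sigma / math.sqrt(n))


def prob_less(dist, a):
    """P(sample mean < a), like norm.dist(a, ..., true)."""
    return dist.cdf(a)


def prob_greater(dist, a):
    """P(sample mean > a), like 1 - norm.dist(a, ..., true)."""
    return 1 - dist.cdf(a)


def prob_between(dist, a, b):
    """P(a < sample mean < b), like norm.dist(b, ...) - norm.dist(a, ...)."""
    return dist.cdf(b) - dist.cdf(a)


# Soccer-match example: mu = 2, sigma = 0.5, n = 25
d = xbar_distribution(2, 0.5, 25)
print(round(prob_greater(d, 2.2), 4))       # 0.0228
print(round(prob_between(d, 1.8, 2.3), 4))  # 0.9759
```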

An alternative approach in Excel is to use the norm.s.dist(z,true) function.  In the norm.s.dist function, we enter the z-score for the corresponding value of [latex]\overline{x}[/latex] (using the z-score for sample means given above).
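
The two Excel routes give the same probability, because standardizing and then using the standard normal is equivalent to using the sampling distribution directly. A Python sketch (3.8 or later, standard library; hypothetical function names) that compares the two routes for several sample means from the soccer-match example:

```python
import math
from statistics import NormalDist


def prob_via_z(x_bar, mu, sigma, n):
    """Standardize first, then use the standard normal (like norm.s.dist(z, true))."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    return NormalDist().cdf(z)


def prob_via_x(x_bar, mu, sigma, n):
    """Use the sampling distribution directly (like norm.dist with sigma/sqrt(n), true)."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(x_bar)


for x_bar in (1.7, 1.8, 2.2, 2.3):
    # The two values agree up to floating-point rounding.
    print(x_bar, round(prob_via_z(x_bar, 2, 0.5, 25), 4),
          round(prob_via_x(x_bar, 2, 0.5, 25), 4))
```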

EXAMPLE

The length of time, in hours, it takes an “over 40” group of people to play one soccer match is normally distributed with a mean of 2 hours and a standard deviation of 0.5 hours.  Suppose a sample of size 25 is drawn randomly from the population.

  1. Is the distribution of the sample means normal?  Explain.
  2. What are the mean and the standard deviation of the distribution of the sample means?
  3. What is the probability that the mean of the sample is less than 1.7 hours?
  4. What is the probability that the mean of the sample is more than 2.2 hours?
  5. What is the probability that the sample mean is between 1.8 hours and 2.3 hours?

Solution:

  1. Because the population the sample is taken from follows a normal distribution, the distribution of the sample means also follows a normal distribution.
  2. The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=2[/latex].  The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{0.5}{\sqrt{25}}=0.1}[/latex].
  3. Function | norm.dist | Answer
    Field 1 | 1.7 | 0.0013
    Field 2 | 2 |
    Field 3 | [latex]\frac{0.5}{\sqrt{25}}[/latex] |
    Field 4 | true |

    The probability the sample mean is less than 1.7 hours is 0.0013 (or 0.13%).

    Note: Because we are calculating a probability for a sample mean, we enter the standard deviation of the sample means [latex]\frac{0.5}{\sqrt{25}}[/latex] into field 3 (and not the standard deviation of the population).

  4. Function | 1-norm.dist | Answer
    Field 1 | 2.2 | 0.0228
    Field 2 | 2 |
    Field 3 | [latex]\frac{0.5}{\sqrt{25}}[/latex] |
    Field 4 | true |

    The probability the sample mean is more than 2.2 hours is 0.0228 (or 2.28%).

  5. Function | norm.dist | -norm.dist | Answer
    Field 1 | 2.3 | 1.8 | 0.9759
    Field 2 | 2 | 2 |
    Field 3 | [latex]\frac{0.5}{\sqrt{25}}[/latex] | [latex]\frac{0.5}{\sqrt{25}}[/latex] |
    Field 4 | true | true |

    The probability the sample mean is between 1.8 hours and 2.3 hours is 0.9759 (or 97.59%).

TRY IT

The length of time taken on the SAT for a group of students has a mean of 2.5 hours and a standard deviation of 0.25 hours.  A sample of size 60 is drawn randomly from the population.

  1. Is the distribution of the sample means normal?  Explain.
  2. What is the probability that the sample mean is between 2.4 hours and 2.8 hours?
  3. What is the probability that the sample mean is at least 2.6 hours?
  4. What is the probability that the sample mean is at most 2.45 hours?
Solution:

  1. The distribution of the sample means is normal because the sample size of 60 is greater than 30.
  2. Function | norm.dist | -norm.dist | Answer
    Field 1 | 2.8 | 2.4 | 0.9990
    Field 2 | 2.5 | 2.5 |
    Field 3 | [latex]\frac{0.25}{\sqrt{60}}[/latex] | [latex]\frac{0.25}{\sqrt{60}}[/latex] |
    Field 4 | true | true |
  3. Function | 1-norm.dist | Answer
    Field 1 | 2.6 | 0.0010
    Field 2 | 2.5 |
    Field 3 | [latex]\frac{0.25}{\sqrt{60}}[/latex] |
    Field 4 | true |
  4. Function | norm.dist | Answer
    Field 1 | 2.45 | 0.0607
    Field 2 | 2.5 |
    Field 3 | [latex]\frac{0.25}{\sqrt{60}}[/latex] |
    Field 4 | true |
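
The Try It answers can be reproduced with a short Python sketch (3.8 or later, standard library). Here statistics.NormalDist stands in for norm.dist, with the standard error 0.25/√60 as its standard deviation.

```python
import math
from statistics import NormalDist

# SAT Try It: mu = 2.5 h, sigma = 0.25 h, n = 60 (n >= 30, so a normal model is used).
xbar_dist = NormalDist(2.5, 0.25 / math.sqrt(60))

p_between = xbar_dist.cdf(2.8) - xbar_dist.cdf(2.4)  # part 2: norm.dist - norm.dist
p_at_least = 1 - xbar_dist.cdf(2.6)                  # part 3: 1 - norm.dist
p_at_most = xbar_dist.cdf(2.45)                      # part 4: norm.dist

print(round(p_between, 4), round(p_at_least, 4), round(p_at_most, 4))  # 0.999 0.001 0.0607
```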

EXAMPLE

In a recent study reported Oct. 29, 2012 on the Flurry Blog, the mean age of tablet users is 34 years and the standard deviation is 15 years.  Suppose a sample of 100 tablet users is taken.

  1. What are the mean and standard deviation for the sample mean ages of tablet users?
  2. What is the distribution of the sample means? Explain.
  3. Find the probability that the sample mean age is more than 30 years.

Solution:

  1. The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=34[/latex].  The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{15}{\sqrt{100}}=1.5}[/latex].
  2. The distribution of the sample means is normal because the sample size of 100 is greater than 30.
  3. Function | 1-norm.dist | Answer
    Field 1 | 30 | 0.9962
    Field 2 | 34 |
    Field 3 | [latex]\frac{15}{\sqrt{100}}[/latex] |
    Field 4 | true |

    The probability the sample mean is more than 30 years of age is 0.9962 (or 99.62%).
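
A quick Python check of parts 1 and 3 (assuming Python 3.8 or later and its statistics.NormalDist class) uses the complement rule in the same way as 1-norm.dist:

```python
import math
from statistics import NormalDist

mu, sigma, n = 34, 15, 100           # tablet-user ages, in years
se = sigma / math.sqrt(n)            # standard error of the mean
p_more_than_30 = 1 - NormalDist(mu, se).cdf(30)

print(se, round(p_more_than_30, 4))  # 1.5 0.9962
```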

TRY IT

In an article on Flurry Blog, a gaming marketing gap for men between the ages of 30 and 40 is identified.  You are researching a start-up game targeted at the 35-year-old demographic.  Your idea is to develop a strategy game that can be played by men from their late 20s through their late 30s.  Based on the article’s data, industry research shows that the average strategy player is 28 years old with a standard deviation of 4.8 years.  You take a sample of 100 randomly selected gamers.  If your target market is 29- to 35-year-olds, should you continue with your development strategy?

 

Solution:

You need to determine the probability that the mean age of a random sample of 100 strategy players is between 29 and 35 years.

Function | norm.dist | -norm.dist | Answer
Field 1 | 35 | 29 | 0.0186
Field 2 | 28 | 28 |
Field 3 | [latex]\frac{4.8}{\sqrt{100}}[/latex] | [latex]\frac{4.8}{\sqrt{100}}[/latex] |
Field 4 | true | true |

There is only a 1.86% chance that the mean age of a random sample of 100 strategy players is between 29 and 35 years.  Because this is a very low probability, you should not continue your development strategy.
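
The gamer probability can be checked the same way. In this Python sketch (3.8 or later, standard library), the upper limit of 35 years is about 14.6 standard errors above the mean, so its cdf value is essentially 1 and nearly all of the answer comes from the lower limit of 29 years.

```python
import math
from statistics import NormalDist

# Strategy-gamer Try It: mu = 28 years, sigma = 4.8 years, n = 100.
xbar_dist = NormalDist(28, 4.8 / math.sqrt(100))   # standard error 0.48
p_target = xbar_dist.cdf(35) - xbar_dist.cdf(29)   # norm.dist - norm.dist

print(round(p_target, 4))  # 0.0186
```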

EXAMPLE

The mean number of minutes for app engagement by a tablet user is 8.2 minutes with a standard deviation of 1 minute.  Suppose a sample of 60 tablet users is taken.

  1. Is the distribution of the sample mean normal?  Explain.
  2. What are the mean and standard deviation for the sample mean number of minutes for app engagement?
  3. Find the probability that the sample mean is between 8 minutes and 8.5 minutes.
  4. Find the probability that the sample mean is less than 8.3 minutes.

Solution:

  1. Because the sample size of 60 is greater than 30, the distribution of the sample means follows a normal distribution.
  2. The mean of the distribution of the sample means is [latex]\mu_{\overline{x}}=8.2[/latex].  The standard deviation of the sample means is [latex]\displaystyle{\sigma_{\overline{x}}=\frac{\sigma}{\sqrt{n}}=\frac{1}{\sqrt{60}}\approx 0.13}[/latex].
  3. Function | norm.dist | -norm.dist | Answer
    Field 1 | 8.5 | 8 | 0.9293
    Field 2 | 8.2 | 8.2 |
    Field 3 | [latex]\frac{1}{\sqrt{60}}[/latex] | [latex]\frac{1}{\sqrt{60}}[/latex] |
    Field 4 | true | true |

    The probability that the sample mean is between 8 and 8.5 minutes is 0.9293 (or 92.93%).

  4. Function | norm.dist | Answer
    Field 1 | 8.3 | 0.7807
    Field 2 | 8.2 |
    Field 3 | [latex]\frac{1}{\sqrt{60}}[/latex] |
    Field 4 | true |

    The probability that the sample mean is less than 8.3 minutes is 0.7807 (or 78.07%).
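
Both probabilities in the app-engagement example, and the rounded standard error in part 2, can be reproduced with this Python sketch (3.8 or later, standard library):

```python
import math
from statistics import NormalDist

# App-engagement example: mu = 8.2 min, sigma = 1 min, n = 60.
se = 1 / math.sqrt(60)
xbar_dist = NormalDist(8.2, se)

print(round(se, 2))                                     # 0.13
print(round(xbar_dist.cdf(8.5) - xbar_dist.cdf(8), 4))  # 0.9293
print(round(xbar_dist.cdf(8.3), 4))                     # 0.7807
```

Note that the code keeps the unrounded standard error; rounding it to 0.13 before computing the probabilities would shift the fourth decimal place.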

TRY IT

Cans of a cola beverage are labeled to contain 16 ounces.  Assume the cans are filled so that the amounts have a mean of 16 ounces and a standard deviation of 0.143 ounces.  The amounts in a sample of 34 cans are measured and the mean is 16.01 ounces.  Find the probability that a sample of 34 cans will have an average amount greater than 16.01 ounces.  Do the results suggest that cans are filled with an amount greater than 16 ounces?

 

Solution:

Function | 1-norm.dist | Answer
Field 1 | 16.01 | 0.3417
Field 2 | 16 |
Field 3 | [latex]\frac{0.143}{\sqrt{34}}[/latex] |
Field 4 | true |

Because there is a 34.17% probability that the mean of a random sample of 34 cans is greater than 16.01 ounces when the true mean is 16 ounces, a sample mean of 16.01 ounces is not unusual.  That is, this sample does not provide convincing evidence that the average volume of the cans is higher than the claimed 16 ounces; a difference of 0.01 ounces is easily explained by random sampling variation.

If the average volume were in fact higher than 16 ounces, consumers would be glad because they would be receiving more cola in the can than what they paid for.  As the manufacturer, we would need to inspect our bottling process to determine whether the process is working within acceptable limits.
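
The probability in the table can be reproduced in Python (3.8 or later, standard library). The sketch assumes, as the problem does, that the true mean fill is 16 ounces:

```python
import math
from statistics import NormalDist

# Cola Try It: assumed mean 16 oz, sigma = 0.143 oz, n = 34 cans.
xbar_dist = NormalDist(16, 0.143 / math.sqrt(34))
p_above = 1 - xbar_dist.cdf(16.01)  # like 1 - norm.dist(16.01, 16, 0.143/sqrt(34), true)

print(round(p_above, 4))  # 0.3417
```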


Watch this video: Excel Statistics 76: Sampling Distribution Of Sample Mean & Central Limit Theorem by ExcelIsFun [24:05]


Concept Review

The distribution of the sample means follows a normal distribution if one of the following conditions is met:

  • The population the samples are taken from is normal.
  • The sample size is greater than or equal to 30.

The mean of the sample means [latex]\mu_{\overline{x}}[/latex] equals the population mean [latex]\mu[/latex]. The standard deviation of the sample means [latex]\sigma_{\overline{x}}[/latex] is equal to [latex]\displaystyle{\frac{\sigma}{\sqrt{n}}}[/latex] where [latex]\sigma[/latex] is the population standard deviation and [latex]n[/latex] is the sample size.


Attribution

“7.1 The Central Limit Theorem for Sample Means (Averages)” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

License


Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.