6.3 Sampling Distribution of the Sample Proportion

LEARNING OBJECTIVES

  • Describe the distribution of the sample proportion.
  • Solve probability problems involving the distribution of the sample proportion.

The Central Limit Theorem tells us that the distribution of the sample means follows a normal distribution under the right conditions. This allows us to answer probability questions about the sample mean [latex]\overline{x}[/latex].  Now we want to investigate the sampling distribution of another important statistic: the sample proportion.  Once we know what distribution the sample proportions follow, we can answer probability questions about sample proportions.

A proportion is the percent, fraction, or ratio of a sample or population that has a characteristic of interest.  The population proportion is denoted by [latex]p[/latex] and the sample proportion is denoted by [latex]\hat{p}[/latex].

[latex]\begin{eqnarray*} \\ \mbox{Proportion} & = & \frac{\mbox{Number of Items with Characteristic of Interest}}{\mbox{Total Number of Items}}\\ & = & \frac{x}{n} \\ \\ \end{eqnarray*}[/latex]

If the random variable is discrete, such as for categorical data, then the parameter we wish to estimate is the population proportion. This is, of course, the probability of drawing a success in any one random draw.  Because we are interested in the number of successes, we are dealing with the binomial distribution.  The random variable [latex]X[/latex] is the number of successes and the parameter we wish to know is [latex]p[/latex], the probability of drawing a success, which is of course the proportion of successes in the population.  What is the distribution of the sample proportion [latex]\hat{p}[/latex]?

THE CENTRAL LIMIT THEOREM FOR SAMPLE PROPORTIONS

Suppose all samples of size [latex]n[/latex] are taken from a population with proportion [latex]p[/latex].  The collection of sample proportions forms a probability distribution called the sampling distribution of the sample proportion.

  1. The mean of the distribution of the sample proportions, denoted [latex]\mu_{\hat{p}}[/latex], equals the population proportion.

    [latex]\begin{eqnarray*}\\ \mu_{\hat{p}} & = & p  \\ \\ \end{eqnarray*}[/latex]

  2. The standard deviation of the sample proportions (called the standard error of the proportion), denoted [latex]\sigma_{\hat{p}}[/latex], is

    [latex]\begin{eqnarray*} \\ \sigma_{\hat{p}}&= & \sqrt{\frac{p \times (1-p)}{n}} \\ \\ \end{eqnarray*}[/latex]

  3. The distribution of the sample proportion is:
    • Normal if [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex].
    • Binomial if either [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex].
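
The mean and standard error stated above can be checked empirically. The following is a minimal simulation sketch in Python, not part of the original text. It assumes hypothetical values [latex]p=0.3[/latex] and [latex]n=150[/latex], a fixed random seed, and an illustrative function name `simulate_sample_proportions`.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each of the n draws is a success with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p, n = 0.3, 150  # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.pstdev(p_hats)
theoretical_se = math.sqrt(p * (1 - p) / n)  # sqrt(p(1-p)/n)

print(f"mean of sample proportions: {simulated_mean:.4f} (theory: {p})")
print(f"std. dev. of sample proportions: {simulated_sd:.4f} (theory: {theoretical_se:.4f})")
```

With enough simulated samples, the mean of the simulated sample proportions should land close to [latex]p[/latex] and their standard deviation close to the standard error formula.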

Watch this video: Sampling Distribution of the Sample Proportion by Khan Academy [9:57]


Watch this video: Sampling Distribution of the Sample Proportion by Khan Academy [4:34]


When [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], the central limit theorem states that the sampling distribution of the sample proportions follows a normal distribution.  In this case the normal distribution can be used to answer probability questions about sample proportions and the [latex]z[/latex]-score for the sampling distribution of the sample proportions is

[latex]\displaystyle{z=\frac{\hat{p}-p}{\sqrt{\frac{p \times (1-p)}{n}}}}[/latex]

where [latex]p[/latex] is the population proportion and [latex]n[/latex] is the sample size.

CALCULATING PROBABILITIES ABOUT SAMPLE PROPORTIONS IN EXCEL (NORMAL)

When the distribution of the sample proportions follows a normal distribution (when [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex]), the norm.dist(x,[latex]\mu[/latex],[latex]\sigma[/latex],logic operator) function can be used to calculate probabilities associated with a sample proportion.

  • For x, enter the value for [latex]\hat{p}[/latex].
  • For [latex]\mu[/latex], enter the mean of the sample proportions [latex]p[/latex].  Because the mean of the sample proportions equals the proportion of the population the sample is taken from, we enter [latex]p[/latex], the population proportion.
  • For [latex]\sigma[/latex], enter the standard error of the proportion [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
  • For the logic operator, enter true.  Note:  Because we are calculating the area under the curve, we always enter true for the logic operator.

NOTE

In this case, we want to calculate probabilities associated with a sample proportion.  The sample proportions follow a normal distribution (under the right conditions), which allows us to use the norm.dist function to calculate probabilities.  Because we are working with sample proportions, we must enter the mean and the standard deviation of the distribution of the sample proportions into the norm.dist function.  The mean of the sample proportions equals the population proportion, so we enter the value of [latex]p[/latex] into the second field of the norm.dist function.  The standard deviation of the sample proportions equals [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex], so we must enter this value into the third field of the norm.dist function.

We use the norm.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is greater than a given value, or a sample proportion is in between two given values.

An alternative approach in Excel is to use the norm.s.dist(z,true) function.  In the norm.s.dist function, we enter the [latex]z[/latex]-score for the corresponding value of [latex]\hat{p}[/latex] (using the [latex]z[/latex]-score for sample proportions given above).
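
For readers working outside Excel, the two approaches above can be sketched in Python with the standard library's `statistics.NormalDist`. This is an illustrative sketch only: the function names and the values [latex]p=0.4[/latex], [latex]n=100[/latex], and [latex]\hat{p}=0.45[/latex] are assumptions chosen for the demonstration, not taken from the text.

```python
import math
from statistics import NormalDist

def prob_p_hat_at_most(p_hat, p, n):
    """P(sample proportion <= p_hat), analogous to norm.dist(p_hat, p, se, true)."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return NormalDist(mu=p, sigma=se).cdf(p_hat)

def prob_via_z_score(p_hat, p, n):
    """The same probability via the z-score, analogous to norm.s.dist(z, true)."""
    se = math.sqrt(p * (1 - p) / n)
    z = (p_hat - p) / se
    return NormalDist().cdf(z)  # standard normal: mean 0, standard deviation 1

# Hypothetical values chosen only for illustration (n*p = 40 and n*(1-p) = 60).
p, n, p_hat = 0.4, 100, 0.45
direct = prob_p_hat_at_most(p_hat, p, n)
via_z = prob_via_z_score(p_hat, p, n)
print(round(direct, 4), round(via_z, 4))
```

Both routes should return the same probability, because converting [latex]\hat{p}[/latex] to a [latex]z[/latex]-score only rescales the horizontal axis of the normal curve.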

EXAMPLE

A recent study asked working adults if they worked most of their time remotely.  The study found that 30% of employees spend the majority of their time working remotely.  Suppose a sample of 150 working adults is taken.

  1. What is the distribution of the sample proportion?  Explain.
  2. What are the mean and standard deviation of the sample proportion?
  3. What is the probability that at most 27% of the workers in the sample work remotely most of the time?
  4. What is the probability that at least 51 of the workers in the sample work remotely most of the time?
  5. What is the probability that between 32% and 35% of the workers in the sample work remotely most of the time?

Solution:

  1. [latex]n=150[/latex] and [latex]p=0.3[/latex].  Checking [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex]:

    [latex]\begin{eqnarray*} \\ n \times p & = & 150 \times 0.3=45 \geq 5 \\ \\n \times (1-p) & = & 150 \times (1-0.3)=105 \geq 5 \\ \\ \end{eqnarray*}[/latex]

    Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex] the distribution of the sample proportion is normal.

  2. The mean of the distribution of the sample proportions is [latex]\mu_{\hat{p}}=0.3[/latex].  The standard deviation of the sample proportions is [latex]\displaystyle{\sigma_{\hat{p}}=\sqrt{\frac{p \times (1-p)}{n}}=\sqrt{\frac{0.3 \times (1-0.3)}{150}}=0.0374}[/latex].
  3. Function norm.dist Answer
    Field 1 0.27 0.2113
    Field 2 0.3
    Field 3 sqrt(0.3*(1-0.3)/150)
    Field 4 true

    The probability the sample proportion is at most 27% is 0.2113 (or 21.13%).

    Note: Because we are calculating a probability for a sample proportion, we enter the mean of the sample proportions 0.3 (which is the population proportion) into field 2 and the standard deviation of the sample proportions sqrt(0.3*(1-0.3)/150) into field 3.

  4. In this case, 51 is not a proportion. It is the number of items in the sample that have the characteristic of interest. We need to convert 51 out of 150 into a proportion: [latex]\displaystyle{\frac{51}{150}=0.34}[/latex].  This question is asking us to find the probability that at least 34% of the workers in the sample work remotely most of the time.
    Function 1-norm.dist Answer
    Field 1 0.34 0.1425
    Field 2 0.3
    Field 3 sqrt(0.3*(1-0.3)/150)
    Field 4 true

    The probability the sample proportion is at least 34% is 0.1425 (or 14.25%).

  5. Function norm.dist -norm.dist Answer
    Field 1 0.35 0.32 0.2058
    Field 2 0.3 0.3
    Field 3 sqrt(0.3*(1-0.3)/150) sqrt(0.3*(1-0.3)/150)
    Field 4 true true

    The probability the sample proportion is between 32% and 35% is 0.2058 (or 20.58%).
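
As a cross-check on the Excel steps in this example, the same calculations can be sketched in Python. This is a hedged illustration rather than the text's method: it uses the standard library's `statistics.NormalDist` in place of norm.dist, and the variable names are made up for the sketch.

```python
import math
from statistics import NormalDist

n, p = 150, 0.3

# Part 1: check the conditions for using the normal distribution.
is_normal = n * p >= 5 and n * (1 - p) >= 5

# Part 2: mean and standard error of the sample proportion.
mean = p
se = math.sqrt(p * (1 - p) / n)
dist = NormalDist(mu=mean, sigma=se)

# Part 3: P(p_hat <= 0.27)
at_most_27 = dist.cdf(0.27)

# Part 4: convert the count 51 to a proportion, then P(p_hat >= 51/150)
at_least_51 = 1 - dist.cdf(51 / n)

# Part 5: P(0.32 <= p_hat <= 0.35)
between = dist.cdf(0.35) - dist.cdf(0.32)

print(is_normal, round(se, 4))
print(round(at_most_27, 4), round(at_least_51, 4), round(between, 4))
```

The printed values should agree with the answers 0.2113, 0.1425, and 0.2058 found above, up to rounding.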

TRY IT

According to a recent study, 17.5% of the adult population of Canada are smokers.  Suppose a random sample of 200 adult Canadians is taken.

  1. What is the distribution of the sample proportion?  Explain.
  2. What are the mean and standard deviation of the sample proportion?
  3. What is the probability that less than 32 of the adults in the sample are smokers?
  4. What is the probability that more than 20% of the adults in the sample are smokers?
  5. What is the probability that between 34 and 44 of the adults in the sample are smokers?

Solution:

  1. Because [latex]n \times p=200 \times 0.175=35 \geq 5[/latex] and [latex]n \times (1-p)=200 \times (1-0.175)=165 \geq 5[/latex] the distribution of the sample proportions is normal.
  2. The mean of the distribution of the sample proportions is [latex]\mu_{\hat{p}}=0.175[/latex].  The standard deviation of the sample proportions is [latex]\displaystyle{\sigma_{\hat{p}}=\sqrt{\frac{p \times (1-p)}{n}}=\sqrt{\frac{0.175 \times (1-0.175)}{200}}=0.02687}[/latex].
  3. Function norm.dist Answer
    Field 1 0.16 0.2883
    Field 2 0.175
    Field 3 sqrt(0.175*(1-0.175)/200)
    Field 4 true
  4. Function 1-norm.dist Answer
    Field 1 0.2 0.1761
    Field 2 0.175
    Field 3 sqrt(0.175*(1-0.175)/200)
    Field 4 true
  5. Function norm.dist -norm.dist Answer
    Field 1 0.22 0.17 0.5268
    Field 2 0.175 0.175
    Field 3 sqrt(0.175*(1-0.175)/200) sqrt(0.175*(1-0.175)/200)
    Field 4 true true

When one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], the sampling distribution of the sample proportions follows a binomial distribution, and so we must use the binomial distribution to answer probability questions about sample proportions.  In these cases, we are actually answering probability questions about the number of items with the characteristic of interest, [latex]x[/latex].  In other words, we are answering questions about the number of successes [latex]x[/latex] we get in [latex]n[/latex] trials (the sample size) where the probability of success is the population proportion [latex]p[/latex].  These are exactly the same type of questions we answered previously with the binomial distribution.

CALCULATING PROBABILITIES ABOUT SAMPLE PROPORTIONS IN EXCEL (BINOMIAL)

When the distribution of the sample proportions follows a binomial distribution (when one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex]), the binom.dist(x,n,p,logic operator) function can be used to calculate probabilities associated with a sample proportion.

  • For x, enter the number of items with the characteristic of interest [latex]x[/latex].
  • For n, enter the sample size [latex]n[/latex].  The sample size is the number of trials in the binomial experiment.
  • For p, enter the population proportion [latex]p[/latex].  The population proportion is the probability of success.
  • For the logic operator, enter true.  Note:  Because probabilities for sample proportions are generally inequalities ([latex]\lt, \leq, \gt, \geq[/latex]), we enter true for the logic operator.  We would only enter false to find the probability that the sample proportion exactly equals a given value.

NOTE

We use the binom.dist function in the same way as we learned previously to calculate the probability a sample proportion is less than a given value, a sample proportion is at most a given value, a sample proportion is greater than a given value, or a sample proportion is at least a given value.
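
The four kinds of binomial probabilities listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `binom_pmf` and `binom_cdf` are hypothetical helper names standing in for binom.dist with false and true, and the values [latex]n=20[/latex] and [latex]p=0.1[/latex] are made up for the demonstration.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x), analogous to binom.dist(x, n, p, false)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x), analogous to binom.dist(x, n, p, true)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

# Hypothetical values for illustration: n * p = 2 < 5, so the binomial applies.
n, p = 20, 0.1

less_than_3 = binom_cdf(2, n, p)         # P(X < 3)  = P(X <= 2)
at_most_3 = binom_cdf(3, n, p)           # P(X <= 3)
greater_than_3 = 1 - binom_cdf(3, n, p)  # P(X > 3)  = 1 - P(X <= 3)
at_least_3 = 1 - binom_cdf(2, n, p)      # P(X >= 3) = 1 - P(X <= 2)

print(round(less_than_3, 4), round(at_most_3, 4))
print(round(greater_than_3, 4), round(at_least_3, 4))
```

Because the number of successes is a whole number, strict and non-strict inequalities differ by one count, which is why the first argument shifts between 2 and 3.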

EXAMPLE

At the local humane society, 3% of the dogs have heartworm disease.  Suppose a sample of 60 dogs at the humane society is taken.

  1. What is the distribution of the sample proportion?  Explain.
  2. What is the probability that at most 5% of the dogs in the sample have heartworm disease?
  3. What is the probability that less than 7 of the dogs in the sample have heartworm disease?
  4. What is the probability that more than 8% of the dogs in the sample have heartworm disease?
  5. What is the probability that at least 6 of the dogs in the sample have heartworm disease?

Solution:

  1. Because [latex]n \times p=60 \times 0.03=1.8 \lt 5[/latex] the distribution of the sample proportions is binomial.
  2. We want to find [latex]P(\hat{p} \leq 0.05)[/latex].  Because we are using the binomial distribution, we have to convert 5% into the number of items [latex]x[/latex] in the sample with the required characteristic:  [latex]x=0.05 \times 60=3[/latex].  In terms of the binomial distribution, we need to find [latex]P(x \leq 3)[/latex].
    Function binom.dist Answer
    Field 1 3 0.8943
    Field 2 60
    Field 3 0.03
    Field 4 true

    The probability that at most 5% of the dogs in the sample have heartworm disease is 0.8943 (or 89.43%).

  3. We want to find [latex]P(x \lt 7)[/latex].  Because we are using the binomial distribution, this probability is the same as [latex]P(x \leq 6)[/latex].
    Function binom.dist Answer
    Field 1 6 0.9979
    Field 2 60
    Field 3 0.03
    Field 4 true

    The probability that less than 7 of the dogs in the sample have heartworm disease is 0.9979 (or 99.79%).

  4. We want to find [latex]P(\hat{p} \gt 0.08)[/latex].  Because we are using the binomial distribution, we have to convert 8% into the number of items [latex]x[/latex] in the sample with the required characteristic:  [latex]x=0.08 \times 60=4.8[/latex].  In terms of the binomial distribution, we need to find [latex]P(x \gt 4.8)[/latex].  This is the same as [latex]1-P(x \leq 4)[/latex].
    Function 1-binom.dist Answer
    Field 1 4 0.0340
    Field 2 60
    Field 3 0.03
    Field 4 true

    The probability that more than 8% of the dogs in the sample have heartworm disease is 0.0340 (or 3.4%).

  5. We want to find [latex]P(x \geq 6)[/latex].   Because we are using the binomial distribution, this probability is the same as [latex]1-P(x \leq 5)[/latex].
    Function 1-binom.dist Answer
    Field 1 5 0.0091
    Field 2 60
    Field 3 0.03
    Field 4 true

    The probability that at least 6 of the dogs in the sample have heartworm disease is 0.0091 (or 0.91%).
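
The conversions from percentages to counts in this example can be sketched in Python. This is an illustrative sketch, not the text's method: `binom_cdf` and `largest_count_at_most` are hypothetical helper names, and percentages are passed as whole numbers so that integer arithmetic avoids floating-point rounding.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def largest_count_at_most(percent, n):
    """Largest whole count x with x/n <= percent/100 (percent given as an integer)."""
    return (percent * n) // 100

n, p = 60, 0.03

# Part 2: P(p_hat <= 0.05) = P(X <= 3), since 0.05 * 60 = 3
part2 = binom_cdf(largest_count_at_most(5, n), n, p)

# Part 3: P(X < 7) = P(X <= 6)
part3 = binom_cdf(6, n, p)

# Part 4: P(p_hat > 0.08) = P(X > 4.8) = 1 - P(X <= 4)
part4 = 1 - binom_cdf(largest_count_at_most(8, n), n, p)

# Part 5: P(X >= 6) = 1 - P(X <= 5)
part5 = 1 - binom_cdf(5, n, p)

print(round(part2, 4), round(part3, 4), round(part4, 4), round(part5, 4))
```

The printed values should agree with the answers 0.8943, 0.9979, 0.0340, and 0.0091 found above, up to rounding.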

TRY IT

During the past tax season, 92% of tax returns were filed using an electronic filing system.  Suppose a sample of 40 tax returns is selected.

  1. What is the distribution of the sample proportions?
  2. What is the probability at most 35 of the tax returns in the sample were filed electronically?
  3. What is the probability less than 93% of the tax returns in the sample were filed electronically?
  4. What is the probability more than 36 of the tax returns in the sample were filed electronically?
  5. What is the probability at least 88% of the tax returns in the sample were filed electronically?

Solution:

  1. Because [latex]n \times (1-p)=40 \times (1-0.92)=3.2 \lt 5[/latex] the distribution of the sample proportions is binomial.
  2. Function binom.dist Answer
    Field 1 35 0.2132
    Field 2 40
    Field 3 0.92
    Field 4 true
  3. Function binom.dist Answer
    Field 1 37 0.6306
    Field 2 40
    Field 3 0.92
    Field 4 true
  4. Function 1-binom.dist Answer
    Field 1 36 0.6007
    Field 2 40
    Field 3 0.92
    Field 4 true
  5. Function 1-binom.dist Answer
    Field 1 35 0.7868
    Field 2 40
    Field 3 0.92
    Field 4 true

Watch this video: Excel Statistics 79: Proportions Sampling Distribution by ExcelIsFun [8:54]


Concept Review

The distribution of the sample proportions follows a

  • normal distribution if both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex].
  • binomial distribution if either [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex].

The mean of the sample proportion [latex]\mu_{\hat{p}}[/latex] equals the population proportion [latex]p[/latex].  The standard deviation of the sample proportions [latex]\sigma_{\hat{p}}[/latex] is equal to [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex] where [latex]p[/latex] is the population proportion and [latex]n[/latex] is the sample size.


Attribution

7.3 The Central Limit Theorem for Proportions in Introductory Business Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

License

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.