7.4 Confidence Intervals for a Population Proportion
LEARNING OBJECTIVES
- Calculate and interpret confidence intervals for estimating a population proportion.
During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has [latex]40\%[/latex] of the vote within three percentage points (if the sample is large enough). Often, election polls are calculated with [latex]95\%[/latex] confidence, so, the pollsters would be [latex]95\%[/latex] confident that the true proportion of voters who favoured the candidate would be between [latex]37\%[/latex] and [latex]43\%[/latex].
Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households that own personal computers.
A confidence interval for a population proportion is based on the fact that the sample proportions follow an approximately normal distribution when both [latex]n\times p\geq 5[/latex] and [latex]n\times(1-p)\geq 5[/latex]. Similar to confidence intervals for population means, a confidence interval for a population proportion is constructed by taking a sample of size [latex]n[/latex] from the population, calculating the sample proportion [latex]\hat{p}[/latex], and then adding and subtracting the margin of error from [latex]\hat{p}[/latex] to get the limits of the confidence interval.
To construct a confidence interval for a population proportion, we must be able to assume the sample proportions follow a normal distribution. As we have seen previously, this assumption holds when both [latex]n\times p\geq 5[/latex] and [latex]n\times(1-p)\geq 5[/latex]. In this situation, however, the population proportion [latex]p[/latex] is unknown, so we cannot check the values of [latex]n\times p[/latex] and [latex]n\times(1-p)[/latex] directly. Instead, because we take a sample and calculate the sample proportion [latex]\hat{p}[/latex], we check the quantities [latex]n\times\hat{p}[/latex] and [latex]n\times(1-\hat{p})[/latex]. For a confidence interval for a population proportion, if both [latex]n\times\hat{p}\geq 5[/latex] and [latex]n\times(1-\hat{p})\geq 5[/latex], we can assume the sample proportions follow a normal distribution.
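The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the Excel workflow used in this section, and the function name `normal_approximation_ok` is made up for the sketch. Because [latex]n\times\hat{p}[/latex] is simply the number of sampled items with the characteristic of interest, the sketch compares the two counts with the threshold directly, which avoids rounding problems.

```python
def normal_approximation_ok(successes: int, n: int, threshold: int = 5) -> bool:
    """Check that n*p_hat and n*(1 - p_hat) are both at least `threshold`.

    n*p_hat equals the count of successes and n*(1 - p_hat) equals the
    count of failures, so the two counts are compared directly.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical samples, chosen only to show both outcomes of the check.
print(normal_approximation_ok(40, 100))  # both counts are at least 5
print(normal_approximation_ok(2, 100))   # only 2 successes, so the check fails
```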
Calculating the Margin of Error
The margin of error for a confidence interval with confidence level [latex]C[/latex] for an unknown population proportion [latex]p[/latex] is
[latex]\displaystyle{\text{Margin of Error}=z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}}[/latex]
where [latex]z[/latex] is the [latex]z[/latex]-score from the standard normal distribution so that the area to the left of [latex]z[/latex] is [latex]\displaystyle{C+\frac{1-C}{2}}[/latex].
NOTE
In the margin of error formula, the sample proportion [latex]\hat{p}[/latex] is used to estimate the unknown population proportion [latex]p[/latex], because [latex]p[/latex] is the quantity we are trying to estimate with the confidence interval. The sample proportion [latex]\hat{p}[/latex] is calculated from the sample taken to construct the confidence interval, where
[latex]\displaystyle{\hat{p}=\frac{\text{number of items in the sample with characteristic of interest}}{n}}[/latex]
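As a rough sketch, the margin of error formula can be written as a small Python function. The function name `margin_of_error` and the input values below are illustrative assumptions, not taken from this section. The [latex]z[/latex]-score comes from `NormalDist().inv_cdf` in Python's standard `statistics` module, which returns the value with a given area to its left under the standard normal curve.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n) for a proportion."""
    # Area to the left of z is C + (1 - C) / 2.
    area_to_left = confidence + (1 - confidence) / 2
    z = NormalDist().inv_cdf(area_to_left)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical inputs: a sample proportion of 0.5 from 100 items, 95% confidence.
print(round(margin_of_error(0.5, 100, 0.95), 4))
```

For a fixed sample size, the product [latex]\hat{p}\times(1-\hat{p})[/latex] is largest when [latex]\hat{p}=0.5[/latex], so the hypothetical inputs above give the widest margin of error possible for [latex]n=100[/latex] at this confidence level.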
Constructing the Confidence Interval
The limits for the confidence interval with confidence level [latex]C[/latex] for an unknown population proportion [latex]p[/latex] are
[latex]\begin{eqnarray*}\\\text{Lower Limit}&=&\hat{p}-z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\\\\text{Upper Limit}&=&\hat{p}+z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\\\\end{eqnarray*}[/latex]
where [latex]z[/latex] is the [latex]z[/latex]-score from the standard normal distribution so that the area to the left of [latex]z[/latex] is [latex]\displaystyle{C+\frac{1-C}{2}}[/latex].
NOTE
The confidence interval can only be used if we can assume the sample proportions follow a normal distribution. This means we must check that [latex]n\times\hat{p}\geq 5[/latex] and [latex]n\times(1-\hat{p})\geq 5[/latex] before constructing the confidence interval. If one of [latex]n\times\hat{p}[/latex] or [latex]n\times(1-\hat{p})[/latex] is less than [latex]5[/latex], we cannot construct the confidence interval.
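Putting the limits and the normality check together, one possible sketch in Python is shown below. The function name `proportion_confidence_interval` and the sample values are assumptions made for illustration. The sketch refuses to return an interval when either count is below [latex]5[/latex], mirroring the rule in the note above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, confidence: float):
    """Return (lower, upper) limits for a population proportion."""
    # n*p_hat is the success count and n*(1 - p_hat) is the failure count.
    if successes < 5 or n - successes < 5:
        raise ValueError("normality check failed: cannot construct the interval")
    p_hat = successes / n
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 60 of 100 items have the characteristic of interest.
lower, upper = proportion_confidence_interval(60, 100, 0.95)
print(f"{lower:.4f} to {upper:.4f}")
```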
CALCULATING THE [latex]\color{white}{z}[/latex]-SCORE FOR A CONFIDENCE INTERVAL IN EXCEL
To find the [latex]z[/latex]-score to construct a confidence interval with confidence level [latex]C[/latex], use the norm.s.inv(area to the left of z) function.
- For area to the left of z, enter the entire area to the left of the [latex]z[/latex]-score required. For a confidence interval, the area to the left of [latex]z[/latex] is [latex]\displaystyle{C+\frac{1-C}{2}}[/latex].
The output from the norm.s.inv function is the value of [latex]z[/latex]-score needed to construct the confidence interval.
NOTE
The norm.s.inv function requires that we enter the entire area to the left of the unknown [latex]z[/latex]-score. This area includes the confidence level [latex]C[/latex] (the area in the middle of the distribution) plus the remaining area in the left tail [latex]\frac{1-C}{2}[/latex].
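For readers working outside Excel, a comparable calculation can be sketched in Python. This is an alternative to, not a description of, the norm.s.inv function: it relies on `NormalDist().inv_cdf` from Python's standard `statistics` module, which likewise takes the entire area to the left of the unknown [latex]z[/latex]-score. The helper name `z_for_confidence` is made up for this sketch.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """z-score whose area to the left is C + (1 - C) / 2."""
    area_to_left = confidence + (1 - confidence) / 2
    return NormalDist().inv_cdf(area_to_left)


# Print the area to the left and the z-score for a few confidence levels.
for level in (0.90, 0.95, 0.98):
    area = level + (1 - level) / 2
    print(f"C = {level:.2f}  area to the left = {area:.3f}  z = {z_for_confidence(level):.4f}")
```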
EXAMPLE
Suppose that a market research firm is hired to estimate the percentage of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the [latex]500[/latex] people surveyed, [latex]421[/latex] responded yes – they own cell phones.
- Construct a [latex]95\%[/latex] confidence interval for the proportion of adult residents of this city who have cell phones.
- Interpret the confidence interval found in part 1.
- Is it reasonable to conclude that [latex]85\%[/latex] of the adult residents of this city have cell phones? Explain.
Solution
- The sample proportion is [latex]\displaystyle{\hat{p}=\frac{421}{500}=0.842}[/latex]. We need to check [latex]n\times\hat{p}[/latex] and [latex]n\times(1-\hat{p})[/latex]:
[latex]\begin{eqnarray*}\\n\times\hat{p}&=&500\times 0.842=421\geq 5\\\\n\times(1-\hat{p})&=&500\times(1-0.842)=79\geq 5\\\\\end{eqnarray*}[/latex]
Because both [latex]n \times \hat{p} \geq 5[/latex] and [latex]n\times(1-\hat{p})\geq 5[/latex], the sample proportions follow a normal distribution and we can construct the confidence interval.
To find the confidence interval, we need to find the [latex]z[/latex]-score for the [latex]95\%[/latex] confidence interval. This means that we need to find the [latex]z[/latex]-score from the standard normal distribution so that the entire area to the left of [latex]z[/latex] is [latex]\displaystyle{0.95+\frac{1-0.95}{2}=0.975}[/latex].
Function: norm.s.inv
Field 1: 0.975
Answer: 1.9599…
So [latex]z=1.9599...[/latex]. The [latex]95\%[/latex] confidence interval is
[latex]\begin{eqnarray*}\\\text{Lower Limit}&=&\hat{p}-z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.842-1.9599...\times\sqrt{\frac{0.842\times(1-0.842)}{500}}\\&=&0.8100\\\\\text{Upper Limit}&=&\hat{p}+z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.842+1.9599...\times\sqrt{\frac{0.842\times(1-0.842)}{500}}\\&=&0.8740\\\\\end{eqnarray*}[/latex]
- We are [latex]95\%[/latex] confident that the proportion of adult residents of this city who have cell phones is between [latex]81\%[/latex] and [latex]87.4\%[/latex].
- It is reasonable to conclude that [latex]85\%[/latex] of the adult residents of this city have cell phones because [latex]85\%[/latex] is inside the confidence interval.
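The three parts of this solution can be reproduced with a short Python sketch, offered as a cross-check on the Excel calculation rather than a replacement for it. The variable names are illustrative. The sample values are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

# Values from the cell phone example: 421 of 500 adults said yes.
successes, n, confidence = 421, 500, 0.95

p_hat = successes / n
z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

# Part 3: a claimed proportion is reasonable if it lies inside the interval.
claimed = 0.85
claim_is_reasonable = lower <= claimed <= upper

print(f"{lower:.4f} to {upper:.4f}; 85% inside the interval: {claim_is_reasonable}")
```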
NOTES
- When calculating the limits for the confidence interval, keep all of the decimals in the [latex]z[/latex]-score and other values throughout the calculation. This will ensure that there is no round-off error in the answers. Use Excel to do the calculation of the limits, clicking on the cells containing the [latex]z[/latex]-score and any other values, to ensure that all of the decimal places are used in the calculation.
- The limits for the confidence interval are percents. In the above example, the upper limit of [latex]0.8740[/latex] is the decimal form of a percent: [latex]87.4\%[/latex].
- When writing down the interpretation of the confidence interval, make sure to include the confidence level, and the actual population proportion captured by the confidence interval (i.e. be specific to the context of the question), and express the limits as percents.
- With a confidence level of [latex]C\%[/latex], we expect [latex]C\%[/latex] of all confidence intervals constructed this way to contain the true population parameter. In the above example, this means that [latex]95\%[/latex] of all confidence intervals constructed this way contain the true proportion of adult residents in this city who have a cell phone. In other words, if we constructed [latex]100[/latex] of these confidence intervals (using [latex]100[/latex] different samples of size [latex]500[/latex]), we would expect about [latex]95[/latex] of them to contain the true proportion of adult residents in this city who have a cell phone.
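The last note can be illustrated with a small simulation, sketched here in Python. Everything in it is an assumption made for illustration: in practice the true proportion is unknown, so the sketch simply pretends it is [latex]0.84[/latex], draws many samples of size [latex]500[/latex], and counts how often the resulting interval contains that value. The function name `coverage_rate` and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(true_p, n, confidence, repetitions, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / repetitions


# Hypothetical true proportion of 0.84; 2000 simulated samples of size 500.
rate = coverage_rate(true_p=0.84, n=500, confidence=0.95, repetitions=2000)
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```

The printed share should land close to, but not exactly at, [latex]0.95[/latex], because the simulation uses a finite number of samples and the normal distribution is only an approximation for sample proportions.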
TRY IT
Suppose [latex]250[/latex] randomly selected people are surveyed to determine if they own a tablet. Of the [latex]250[/latex] surveyed, [latex]98[/latex] reported owning a tablet.
- Construct a [latex]94\%[/latex] confidence interval for the proportion of people who own tablets.
- Interpret the confidence interval found in part 1.
- Is it reasonable to assume that [latex]30\%[/latex] of people own tablets? Explain.
Click to see Solution
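- The sample proportion is [latex]\displaystyle{\hat{p}=\frac{98}{250}=0.392}[/latex]. Because [latex]n\times\hat{p}=98\geq 5[/latex] and [latex]n\times(1-\hat{p})=152\geq 5[/latex], we can construct the confidence interval. For a [latex]94\%[/latex] confidence interval, the entire area to the left of [latex]z[/latex] is [latex]\displaystyle{0.94+\frac{1-0.94}{2}=0.97}[/latex].
Function: norm.s.inv
Field 1: 0.97
Answer: 1.8807…
[latex]\begin{eqnarray*}\\\text{Lower Limit}&=&\hat{p}-z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.392-1.8807...\times\sqrt{\frac{0.392\times(1-0.392)}{250}}\\&=&0.3339\\\\\text{Upper Limit}&=&\hat{p}+z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.392+1.8807...\times\sqrt{\frac{0.392\times(1-0.392)}{250}}\\&=&0.4501\\\\\end{eqnarray*}[/latex]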
- We are [latex]94\%[/latex] confident that the proportion of people who own tablets is between [latex]33.39\%[/latex] and [latex]45.01\%[/latex].
- It is not reasonable to claim the proportion of people who own tablets is [latex]30\%[/latex] because [latex]30\%[/latex] is outside the confidence interval.
EXAMPLE
For a class project, a political science student at a large university wants to estimate the percentage of students who are registered voters. He surveys [latex]500[/latex] students and finds that [latex]300[/latex] are registered voters.
- Construct a [latex]90\%[/latex] confidence interval for the percent of students who are registered voters.
- Interpret the confidence interval found in part 1.
Solution
- The sample proportion is [latex]\displaystyle{\hat{p}=\frac{300}{500}=0.6}[/latex]. We need to check [latex]n\times\hat{p}[/latex] and [latex]n\times(1-\hat{p})[/latex]:
[latex]\begin{eqnarray*}\\n\times\hat{p}&=&500\times 0.6=300\geq 5\\\\n\times(1-\hat{p})&=&500\times(1-0.6)=200\geq 5\\\\\end{eqnarray*}[/latex]
Because both [latex]n \times \hat{p} \geq 5[/latex] and [latex]n\times(1-\hat{p})\geq 5[/latex], the sample proportions follow a normal distribution and we can construct the confidence interval.
To find the confidence interval, we need to find the [latex]z[/latex]-score for the [latex]90\%[/latex] confidence interval. This means that we need to find the [latex]z[/latex]-score from the standard normal distribution so that the entire area to the left of [latex]z[/latex] is [latex]\displaystyle{0.90+\frac{1-0.90}{2}=0.95}[/latex].
Function: norm.s.inv
Field 1: 0.95
Answer: 1.6448…
So [latex]z=1.6448...[/latex]. The [latex]90\%[/latex] confidence interval is
[latex]\begin{eqnarray*}\\\text{Lower Limit}&=&\hat{p}-z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.6-1.6448...\times\sqrt{\frac{0.6\times(1-0.6)}{500}}\\&=&0.5640\\\\\text{Upper Limit}&=&\hat{p}+z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.6+1.6448...\times\sqrt{\frac{0.6\times(1-0.6)}{500}}\\&=&0.6360\\\\\end{eqnarray*}[/latex]
- We are [latex]90\%[/latex] confident that the percent of students who are registered voters is between [latex]56.4\%[/latex] and [latex]63.6\%[/latex].
TRY IT
A student polls her school to see if students in the school district are for or against the new legislation regarding school uniforms. She surveys [latex]600[/latex] students and finds that [latex]480[/latex] are against the new legislation.
- Construct a [latex]98\%[/latex] confidence interval for the proportion of students who are against the new legislation.
- Interpret the confidence interval found in part 1.
- A parents’ group claims that only [latex]75\%[/latex] of students are against the legislation. Is it reasonable for the group to make this claim? Explain.
Click to see Solution
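- The sample proportion is [latex]\displaystyle{\hat{p}=\frac{480}{600}=0.8}[/latex]. Because [latex]n\times\hat{p}=480\geq 5[/latex] and [latex]n\times(1-\hat{p})=120\geq 5[/latex], we can construct the confidence interval. For a [latex]98\%[/latex] confidence interval, the entire area to the left of [latex]z[/latex] is [latex]\displaystyle{0.98+\frac{1-0.98}{2}=0.99}[/latex].
Function: norm.s.inv
Field 1: 0.99
Answer: 2.3263…
[latex]\begin{eqnarray*}\\\text{Lower Limit}&=&\hat{p}-z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.8-2.3263...\times\sqrt{\frac{0.8\times(1-0.8)}{600}}\\&=&0.7620\\\\\text{Upper Limit}&=&\hat{p}+z\times\sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\\&=&0.8+2.3263...\times\sqrt{\frac{0.8\times(1-0.8)}{600}}\\&=&0.8380\\\\\end{eqnarray*}[/latex]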
- We are [latex]98\%[/latex] confident that the proportion of students who are against the new legislation is between [latex]76.20\%[/latex] and [latex]83.80\%[/latex].
- It is not reasonable for the group to claim the proportion is [latex]75\%[/latex] because [latex]75\%[/latex] is outside of the confidence interval.
Video: “Excel Statistics 85: Confidence Intervals for Proportions #1” by excelisfun [8:34] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Video: “Excel Statistics 86: Confidence Intervals for Proportions #2” by excelisfun [4:52] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
Exercises
- A company manufactures individual fruit snack bags with each bag containing a mixture of grape, orange, berry, and banana-flavoured snack pieces. A sample of bags contained a total of [latex]68[/latex] snack pieces, [latex]5[/latex] of which were berry flavoured.
- Construct a [latex]96\%[/latex] confidence interval for the proportion of berry-flavoured snack pieces per bag.
- Interpret the confidence interval found in part (a).
- Is it reasonable for the company to claim that each bag contains [latex]8\%[/latex] berry-flavoured snack pieces? Explain.
Click to see Answer
- [latex]\text{Lower Limit}=0.0085[/latex], [latex]\text{Upper Limit}=0.1385[/latex]
- We are [latex]96\%[/latex] confident that the proportion of berry-flavoured snack pieces per bag is between [latex]0.85\%[/latex] and [latex]13.85\%[/latex].
- Yes, because [latex]8\%[/latex] is inside the confidence interval.
- Suppose a marketing company is interested in the proportion of households in which women make the majority of the purchasing decisions. They randomly surveyed [latex]200[/latex] households and found that in [latex]120[/latex] of them, the woman made the majority of the household purchasing decisions.
- Construct a [latex]95\%[/latex] confidence interval for the proportion of households where the women make the majority of the purchasing decisions.
- Interpret the confidence interval found in part (a).
- Is it reasonable for the marketing company to claim that women make the majority of purchasing decisions in [latex]70\%[/latex] of households? Explain.
Click to see Answer
- [latex]\text{Lower Limit}=0.5321[/latex], [latex]\text{Upper Limit}=0.6679[/latex]
- We are [latex]95\%[/latex] confident that the proportion of households where women make the majority of the purchasing decisions is between [latex]53.21\%[/latex] and [latex]66.79\%[/latex].
- No, because [latex]70\%[/latex] is outside the confidence interval.
- A pollster is interested in what voters think is the most important issue in an upcoming election. In a poll of [latex]1,200[/latex] voters, [latex]65\%[/latex] said the economy was the most important issue.
- Construct a [latex]90\%[/latex] confidence interval for the proportion of voters who believe the economy is the most important issue in the upcoming election.
- Interpret the confidence interval found in part (a).
- Is it reasonable to claim that [latex]60\%[/latex] of voters believe the economy is the most important issue in the upcoming election? Explain.
- What would happen to the confidence interval if the level of confidence were [latex]95\%[/latex]?
Click to see Answer
- [latex]\text{Lower Limit}=0.6274[/latex], [latex]\text{Upper Limit}=0.6726[/latex]
- We are [latex]90\%[/latex] confident that the proportion of voters who believe the economy is the most important issue in the upcoming election is between [latex]62.74\%[/latex] and [latex]67.26\%[/latex].
- No, because [latex]60\%[/latex] is outside the confidence interval.
- The confidence interval would get wider.
- The Ice Chalet skating school offers dozens of different ice-skating classes at various levels of ability. The Ice Chalet wants to know the proportion of girls, aged 8 to 12, in all of their ice-skating classes. In a sample of [latex]80[/latex] skating students, [latex]64[/latex] were girls, aged 8 to 12.
- Construct a [latex]92\%[/latex] confidence interval for the proportion of girls, aged 8 to 12, in the ice-skating classes.
- Interpret the confidence interval found in part (a).
Click to see Answer
- [latex]\text{Lower Limit}=0.7217[/latex], [latex]\text{Upper Limit}=0.8783[/latex]
- We are [latex]92\%[/latex] confident that the proportion of girls, aged 8 to 12, in the ice-skating classes is between [latex]72.17\%[/latex] and [latex]87.83\%[/latex].
- A university conducted a study of whether running is healthy for adults over age 50. During the eight-year study, [latex]1.5\%[/latex] of the [latex]451[/latex] members of the 50-Plus Fitness Association died.
- Construct a [latex]97\%[/latex] confidence interval for the proportion of adults over 50 who ran and died in the same eight-year period.
- Explain what a “[latex]97\%[/latex] confidence interval” means for this study.
Click to see Answer
- [latex]\text{Lower Limit}=0.0026[/latex], [latex]\text{Upper Limit}=0.0274[/latex]
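- We are [latex]97\%[/latex] confident that the proportion of adults over 50 who ran and died in the eight-year period is between [latex]0.26\%[/latex] and [latex]2.74\%[/latex]. That is, we expect [latex]97\%[/latex] of all confidence intervals constructed this way to contain the true proportion.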
- A national news magazine conducts a poll of [latex]1,000[/latex] adults across the country. One of the questions asked in the poll was, “What is the main problem facing the country?” [latex]20\%[/latex] of those polled answered “crime.”
- Construct a [latex]93\%[/latex] confidence interval for the proportion of adults in the country who feel that crime is the main problem.
- Interpret the confidence interval found in part (a).
- Is it reasonable to claim that [latex]30\%[/latex] of adults feel crime is the main problem? Explain.
Click to see Answer
- [latex]\text{Lower Limit}=0.1771[/latex], [latex]\text{Upper Limit}=0.2229[/latex]
- We are [latex]93\%[/latex] confident that the proportion of adults in the country who feel that crime is the main problem is between [latex]17.71\%[/latex] and [latex]22.29\%[/latex].
- No, because [latex]30\%[/latex] is outside the confidence interval.
- A city’s mayor wants to estimate the proportion of residents who believe that education is one of the top issues facing the city. In a survey of [latex]506[/latex] adult city residents, [latex]400[/latex] said that education is a top issue for the city.
- Construct a [latex]98\%[/latex] confidence interval for the proportion of adult city residents who believe education is one of the top issues in the city.
- Interpret the confidence interval found in part (a).
- The mayor claims that [latex]90\%[/latex] of adult city residents feel education is one of the top issues in the city. Is the mayor right? Explain.
Click to see Answer
- [latex]\text{Lower Limit}=0.7484[/latex], [latex]\text{Upper Limit}=0.8326[/latex]
- We are [latex]98\%[/latex] confident that the proportion of adult city residents who believe education is one of the top issues in the city is between [latex]74.84\%[/latex] and [latex]83.26\%[/latex].
- No, because [latex]90\%[/latex] is outside the confidence interval.
- A bank wants to study the retirement savings plans of young adults between the ages of 25 and 35. In a sample of [latex]300[/latex] young adults between the ages of 25 and 35, only [latex]20[/latex] made regular contributions to some type of retirement savings plan.
- Construct a [latex]99\%[/latex] confidence interval for the proportion of young adults between the ages of 25 and 35 who made regular contributions to some type of retirement savings plan.
- Interpret the confidence interval found in part (a).
- Can the bank claim that [latex]10\%[/latex] of young adults between the ages of 25 and 35 make regular contributions to some type of retirement savings plan? Explain.
- Without performing any calculations, describe how the confidence interval would change if the confidence level changed from [latex]99\%[/latex] to [latex]90\%[/latex].
Click to see Answer
- [latex]\text{Lower Limit}=0.0296[/latex], [latex]\text{Upper Limit}=0.1038[/latex]
- We are [latex]99\%[/latex] confident that the proportion of young adults between ages 25 and 35 who make regular contributions to some type of retirement savings plan is between [latex]2.96\%[/latex] and [latex]10.38\%[/latex].
- Yes, because [latex]10\%[/latex] is inside the confidence interval.
- The confidence interval would get narrower.
“7.4 Confidence Intervals for a Population Proportion” and “7.6 Exercises” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.