7.1 Introduction to Confidence Intervals

This is a photo of M&Ms piled together. The M&Ms are red, blue, green, yellow, orange and brown.
Have you ever wondered what the average number of M&Ms in a bag at the grocery store is? You can use confidence intervals to answer this question. Photo by comedy_nose, CC BY 4.0.

Suppose you want to determine the mean rent of a two-bedroom apartment in your town.  You might look in the classified section of the newspaper, write down several rents listed, and average them together.  You would obtain a point estimate of the true mean rent of two-bedroom apartments in your town.  If you are trying to determine the percentage of times you make a basket when shooting a basketball, you might count the number of shots you make and divide that by the number of shots you attempted.  In this case, you would obtain a point estimate for the true proportion of the baskets you make when shooting a basketball.
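As a minimal sketch of these two point estimates, the Python snippet below computes a sample mean and a sample proportion. The rents and the shot counts are hypothetical numbers chosen only for illustration; they do not come from real data.

```python
from statistics import mean

# Hypothetical rents (in dollars) written down from classified ads.
rents = [1200, 1350, 1100, 1425, 1275]

# Point estimate of the true mean rent: the sample mean.
mean_rent = mean(rents)

# Hypothetical shooting record: 14 baskets made out of 20 attempts.
shots_made = 14
shots_attempted = 20

# Point estimate of the true proportion of baskets made: the sample proportion.
proportion_made = shots_made / shots_attempted

print(f"Estimated mean rent: {mean_rent}")
print(f"Estimated proportion of baskets made: {proportion_made}")
```

Each result is a single number, which is why it is called a point estimate.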

We use sample data to make generalizations about an unknown population.  This part of statistics is called inferential statistics.  The sample data help us to make an estimate of a population parameter.  We realize that the point estimate is most likely not the exact value of the population parameter, but close to it.  After calculating point estimates, we construct interval estimates, called confidence intervals.

In this chapter, you will learn to construct and interpret confidence intervals. You will also learn a new distribution, the [latex]t[/latex]-distribution, and how it is used with these intervals. Throughout the chapter, it is important to keep in mind that the confidence interval is random: its endpoints are calculated from sample data, so they change from sample to sample. It is the population parameter that is fixed.


Watch this video: Understanding Confidence Intervals: Statistics Help by Dr Nic’s Math and Stats [4:02]


If you worked in the marketing department of an entertainment company, you might be interested in the mean number of songs a consumer downloads a month from iTunes.  If so, you could conduct a survey and calculate the sample mean [latex]\overline{x}[/latex] and the sample standard deviation [latex]s[/latex].  You would use the sample mean [latex]\overline{x}[/latex] to estimate the population mean and the sample standard deviation [latex]s[/latex] to estimate the population standard deviation.  The sample mean [latex]\overline{x}[/latex] is the point estimate for the population mean [latex]\mu[/latex].  The sample standard deviation [latex]s[/latex] is the point estimate for the population standard deviation [latex]\sigma[/latex].  Each of [latex]\overline{x}[/latex] and [latex]s[/latex] is called a statistic.
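The two statistics can be computed with Python's standard library, as in the rough sketch below. The survey responses are made up for illustration, and the variable names `x_bar` and `s` are simply chosen to match the symbols in the text.

```python
from statistics import mean, stdev

# Hypothetical survey responses: songs downloaded per month by 8 consumers.
downloads = [2, 0, 3, 1, 4, 2, 1, 3]

# Sample mean: the point estimate for the population mean (mu).
x_bar = mean(downloads)

# Sample standard deviation (n - 1 in the denominator):
# the point estimate for the population standard deviation (sigma).
s = stdev(downloads)

print(f"Sample mean: {x_bar}")
print(f"Sample standard deviation: {s:.3f}")
```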

A confidence interval is another type of estimate but, instead of being just one number, it is an interval of numbers.  The interval of numbers is a range of values calculated from a given set of sample data. The confidence interval is likely to include the unknown population parameter.

Suppose, for the iTunes example, we do not know the population mean [latex]\mu[/latex], but we do know that the population standard deviation is [latex]\sigma = 1[/latex] and the sample size is [latex]n=100[/latex].  Then, by the central limit theorem, the standard deviation of the sample mean (also called the standard error of the mean) is

[latex]\displaystyle{\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{100}}=0.1}[/latex]

The empirical rule, which applies to bell-shaped distributions, says that in approximately 95% of the samples, the sample mean [latex]\overline{x}[/latex] will be within two standard deviations of the population mean [latex]\mu[/latex].  For our iTunes example, two standard deviations is [latex]2 \times 0.1 = 0.2[/latex].  The sample mean [latex]\overline{x}[/latex] is likely to be within [latex]0.2[/latex] units of [latex]\mu[/latex].

Because [latex]\overline{x}[/latex] is within [latex]0.2[/latex] units of [latex]\mu[/latex] in approximately 95% of the samples, the unknown [latex]\mu[/latex] is within [latex]0.2[/latex] units of [latex]\overline{x}[/latex] in approximately 95% of the samples. In those samples, the population mean [latex]\mu[/latex] is contained in an interval whose lower number is calculated by taking the sample mean and subtracting two standard deviations ([latex]2 \times 0.1=0.2[/latex]) and whose upper number is calculated by taking the sample mean and adding two standard deviations. In other words, [latex]\mu[/latex] is between [latex]\overline{x}-0.2[/latex] and [latex]\overline{x}+0.2[/latex] in approximately 95% of all the samples. Suppose that a sample produced a sample mean [latex]\overline{x}=2[/latex]. Then the interval for the unknown population mean [latex]\mu[/latex] runs from [latex]\overline{x}-0.2=2-0.2=1.8[/latex] to [latex]\overline{x}+0.2=2+0.2=2.2[/latex].

We say that we are 95% confident that the (unknown) population mean number of songs downloaded from iTunes per month is between [latex]1.8[/latex] and [latex]2.2[/latex]. The 95% confidence interval is the interval with lower limit [latex]1.8[/latex] and upper limit [latex]2.2[/latex].
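The arithmetic behind this interval can be reproduced in a few lines of Python. This is only a sketch of the calculation in the text: the values of [latex]\sigma[/latex], [latex]n[/latex], and [latex]\overline{x}[/latex] come from the example, while the variable names are chosen here for readability.

```python
import math

sigma = 1.0   # known population standard deviation (from the example)
n = 100       # sample size (from the example)
x_bar = 2.0   # sample mean produced by the sample (from the example)

# Standard deviation of the sample mean: sigma / sqrt(n) = 0.1
standard_error = sigma / math.sqrt(n)

# Empirical rule: about 95% of sample means fall within two
# standard deviations of the population mean.
margin_of_error = 2 * standard_error

lower = x_bar - margin_of_error
upper = x_bar + margin_of_error

print(f"95% confidence interval: ({lower:.1f}, {upper:.1f})")
```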

The 95% confidence interval implies two possibilities.  Either the interval [latex]1.8[/latex] to [latex]2.2[/latex] contains the true mean [latex]\mu[/latex] or our sample produced an [latex]\overline{x}[/latex] that is not within [latex]0.2[/latex] units of the true mean [latex]\mu[/latex].   Because we are 95% confident that the true population mean is inside the interval, the second possibility, that the population mean is not inside the interval, happens for only 5% of all the samples.
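One way to see what "95% of all the samples" means is to simulate many samples and count how often the interval captures the mean. The sketch below rests on assumptions that are not part of the example: it pretends the true mean is [latex]\mu = 2[/latex] (in practice [latex]\mu[/latex] is unknown), it draws the data from a normal distribution, and it uses 2000 repetitions with a fixed random seed.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

mu = 2.0        # "true" mean, assumed for the simulation only
sigma = 1.0     # known population standard deviation
n = 100         # sample size
margin = 2 * sigma / math.sqrt(n)  # two standard deviations of the sample mean
num_samples = 2000                 # number of simulated samples (assumption)

hits = 0
for _ in range(num_samples):
    # Draw one sample of size n and compute its sample mean.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Check whether this sample's interval contains the true mean.
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / num_samples
print(f"Share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95. Each simulated sample gives a different interval, while [latex]\mu[/latex] stays the same throughout.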

Remember that a confidence interval is created for an unknown population parameter like the population mean [latex]\mu[/latex].  Confidence intervals for some parameters have the form:

[latex]\begin{eqnarray*}  \mbox{Lower Limit} & = & \mbox{point estimate}-\mbox{margin of error} \\ \\ \mbox{Upper Limit} & = & \mbox{point estimate}+\mbox{margin of error} \\ \end{eqnarray*}[/latex]

The margin of error depends on the confidence level and the standard error of the mean.
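To illustrate how the margin of error changes with the confidence level, the sketch below builds the interval from a point estimate and a standard error. The function name `confidence_interval` is hypothetical, and the code assumes a normal critical value is appropriate. For 95% it uses the more precise multiplier of about 1.96 instead of the rounded value 2 from the empirical rule, so its 95% interval is slightly narrower than 1.8 to 2.2.

```python
import math
from statistics import NormalDist


def confidence_interval(point_estimate, standard_error, confidence_level):
    """Return (lower, upper) limits using a standard normal critical value."""
    # Area left in each tail, e.g. 0.025 for a 95% confidence level.
    tail = (1 - confidence_level) / 2
    # Critical value: the z-score with area `tail` to its right.
    z = NormalDist().inv_cdf(1 - tail)
    margin_of_error = z * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# iTunes example: sigma = 1 and n = 100 give a standard error of 0.1.
standard_error = 1 / math.sqrt(100)

for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(2.0, standard_error, level)
    print(f"{level:.0%}: ({lower:.3f}, {upper:.3f})")
```

A higher confidence level gives a larger margin of error and therefore a wider interval. A smaller standard error gives a narrower one.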

When you read newspapers and journals, some reports will use the phrase “margin of error.”  Other reports will not use that phrase, but include a confidence interval as the point estimate plus or minus the margin of error.  These are two ways of expressing the same concept.


Attribution

“Chapter 8 Introduction” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

License


Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.