7.1 Introduction to Confidence Intervals
LEARNING OBJECTIVES
- Differentiate between point estimates and interval estimates.
- Explain key terms related to confidence intervals.
The marketing department of an entertainment company is interested in the mean number of songs a consumer downloads a month from an online music service. How could the marketing department find this mean? They could take a sample of consumers and calculate the sample mean [latex]\overline{x}[/latex] and the sample standard deviation [latex]s[/latex] of the sample. The sample mean [latex]\overline{x}[/latex] is a point estimate for the population mean [latex]\mu[/latex]. The sample standard deviation [latex]s[/latex] is a point estimate for the population standard deviation [latex]\sigma[/latex].
In general, when a sample is taken from a population, the sample statistic calculated from the sample is a point estimate for the corresponding population parameter. The point estimate is a single value used to estimate the population parameter. For example, the sample mean [latex]\overline{x}[/latex] is a point estimate for the population mean [latex]\mu[/latex] and the sample proportion [latex]\hat{p}[/latex] is a point estimate for the population proportion [latex]p[/latex].
There are a few issues with relying on just a point estimate to estimate a population parameter. Different samples from the same population will produce different point estimates, so how do we know which point estimate is the best one? Also, a point estimate is a single value, which may or may not be close to the actual population parameter, so how do we know how far away the population parameter is from the point estimate? The difference between the point estimate and the population parameter is called the sampling error. Because the population parameter is unknown, we cannot answer these questions: we have no way to know whether a point estimate over- or under-estimates a population parameter, or by how much.
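The sampling variability described above can be illustrated with a small simulation. This is only a sketch: the population (normally distributed with mean 2 and standard deviation 1) and the helper name `draw_sample_mean` are assumptions made for illustration, and in practice the population mean would not be known.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population, assumed for illustration only.
TRUE_MEAN = 2.0
TRUE_SD = 1.0

def draw_sample_mean(n):
    """Draw one random sample of size n and return its sample mean."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    return statistics.fmean(sample)

# Five different samples from the same population give five different point estimates.
sample_means = [draw_sample_mean(100) for _ in range(5)]

# The sampling error can be computed here only because the simulation knows TRUE_MEAN.
sampling_errors = [x_bar - TRUE_MEAN for x_bar in sample_means]

for x_bar, error in zip(sample_means, sampling_errors):
    print(f"point estimate = {x_bar:.3f}, sampling error = {error:+.3f}")
```

Each run of the loop plays the role of a different marketing survey. With real data only one of these point estimates would be available, and its sampling error could not be computed.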
Instead of a point estimate, an interval estimate, called a confidence interval, provides additional insight into a population parameter. Instead of being just one number, a confidence interval is an interval of numbers: a range of values calculated from a given set of sample data. The confidence interval is likely to include the unknown population parameter.
Suppose, for the above example, we do not know the population mean [latex]\mu[/latex], but we do know that the population standard deviation is [latex]\sigma=1[/latex] and the sample size is [latex]n=100[/latex]. By the central limit theorem, the distribution of the sample mean is approximately normal for a sample this large, and its standard deviation (the standard error of the mean) is
[latex]\displaystyle{\frac{\sigma}{\sqrt{n}}=\frac{1}{\sqrt{100}}=0.1}[/latex]
The empirical rule, which applies to bell-shaped distributions, says that in approximately [latex]95\%[/latex] of the samples, the sample mean [latex]\overline{x}[/latex] will be within two standard deviations of the population mean [latex]\mu[/latex]. For our example, two standard deviations is [latex]2\times 0.1=0.2[/latex]. The sample mean [latex]\overline{x}[/latex] is likely to be within [latex]0.2[/latex] units of [latex]\mu[/latex].
Because [latex]\overline{x}[/latex] is within [latex]0.2[/latex] units of [latex]\mu[/latex] in approximately [latex]95\%[/latex] of the samples, the unknown [latex]\mu[/latex] is within [latex]0.2[/latex] units of [latex]\overline{x}[/latex] in approximately [latex]95\%[/latex] of the samples. In those samples, the population mean [latex]\mu[/latex] is contained in an interval whose lower number is calculated by taking the sample mean and subtracting two standard deviations ([latex]2\times 0.1=0.2[/latex]) and whose upper number is calculated by taking the sample mean and adding two standard deviations. In other words, [latex]\mu[/latex] is between [latex]\overline{x}-0.2[/latex] and [latex]\overline{x}+0.2[/latex] in approximately [latex]95\%[/latex] of all the samples. Suppose that a sample produced a sample mean [latex]\overline{x}=2[/latex]. Then the interval for the unknown population mean [latex]\mu[/latex] runs from [latex]\overline{x}-0.2=2-0.2=1.8[/latex] to [latex]\overline{x}+0.2=2+0.2=2.2[/latex].
We say that we are [latex]95\%[/latex] confident that the (unknown) population mean number of songs downloaded per month is between [latex]1.8[/latex] and [latex]2.2[/latex]. The [latex]95\%[/latex] confidence interval is the interval with a lower limit [latex]1.8[/latex] and upper limit [latex]2.2[/latex].
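The arithmetic behind this interval can be reproduced in a few lines. The sketch below uses only the values from the example ([latex]\sigma=1[/latex], [latex]n=100[/latex], [latex]\overline{x}=2[/latex]) and the empirical-rule multiplier of two standard deviations; the variable names are chosen for illustration.

```python
import math

sigma = 1.0   # known population standard deviation (from the example)
n = 100       # sample size (from the example)
x_bar = 2.0   # sample mean produced by the sample (from the example)

# Standard deviation of the sample mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# Empirical rule: about 95% of sample means fall within two standard deviations
margin = 2 * standard_error

lower = x_bar - margin
upper = x_bar + margin

print(f"standard error = {standard_error:.1f}")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```

Changing `x_bar` shifts the interval without changing its width, while changing `sigma` or `n` changes the width.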
The [latex]95\%[/latex] confidence interval implies two possibilities. Either the interval [latex]1.8[/latex] to [latex]2.2[/latex] contains the true mean [latex]\mu[/latex], or our sample produced an [latex]\overline{x}[/latex] that is not within [latex]0.2[/latex] units of the true mean [latex]\mu[/latex], in which case the population mean is not inside the interval. The second possibility happens for only about [latex]5\%[/latex] of all the samples, which is why we are [latex]95\%[/latex] confident that the true population mean is inside the interval.
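The claim that about [latex]95\%[/latex] of samples produce an interval containing [latex]\mu[/latex] can be checked by simulation. The sketch below assumes a hypothetical normally distributed population with [latex]\mu=2[/latex] and [latex]\sigma=1[/latex]; these values, the seed, and the number of trials are illustrative choices, not part of the original example.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 2.0   # hypothetical; unknown in a real study
SIGMA = 1.0       # known population standard deviation
N = 100           # sample size
TRIALS = 10_000   # number of repeated samples to simulate

margin = 2 * SIGMA / math.sqrt(N)  # two standard deviations of the sample mean

hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The printed proportion should land close to [latex]0.95[/latex]. Each individual interval either contains the true mean or it does not; the [latex]95\%[/latex] describes how often the procedure succeeds across many samples.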
Remember that a confidence interval is created for an unknown population parameter like the population mean [latex]\mu[/latex]. Confidence intervals for some parameters have the form:
[latex]\begin{eqnarray*}\text{Lower Limit}&=&\text{point estimate}-\text{margin of error}\\\\\text{Upper Limit}&=&\text{point estimate}+\text{margin of error}\\\end{eqnarray*}[/latex]
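Note that the margin of error depends on the confidence level and on the standard error of the point estimate (for a population mean, the standard error of the mean, [latex]\frac{\sigma}{\sqrt{n}}[/latex]).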
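To see how the confidence level affects the margin of error, the following sketch computes intervals at several levels. It assumes the point estimate is a sample mean with known [latex]\sigma[/latex] and an approximately normal sampling distribution, so that the margin of error is a normal critical value times the standard error. That formula and the function names `margin_of_error` and `confidence_interval` are assumptions introduced for illustration; this section has not defined them.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence_level, sigma, n):
    """Margin of error for a mean with known sigma (assumed normal sampling distribution)."""
    alpha = 1 - confidence_level
    # Critical value that leaves alpha/2 in each tail of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

def confidence_interval(point_estimate, margin):
    """Return (lower limit, upper limit) = point estimate -/+ margin of error."""
    return point_estimate - margin, point_estimate + margin

# Values from the example: sigma = 1, n = 100, sample mean = 2
for level in (0.90, 0.95, 0.99):
    m = margin_of_error(level, sigma=1.0, n=100)
    lower, upper = confidence_interval(2.0, m)
    print(f"{level:.0%} level: margin = {m:.3f}, interval = {lower:.3f} to {upper:.3f}")
```

At the [latex]95\%[/latex] level the critical value is about [latex]1.96[/latex], which the empirical rule rounds to [latex]2[/latex], so the margin here is slightly smaller than the [latex]0.2[/latex] used in the example. A higher confidence level gives a wider interval.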
Video: “Understanding Confidence Intervals: Statistics Help” by Dr Nic’s Maths and Stats [4:02] is licensed under the Standard YouTube License. Transcript and closed captions available on YouTube.
“7.1 Introduction to Confidence Intervals” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.