11.3 One-Way ANOVA and Hypothesis Tests for Three or More Population Means

Valerie Watts

LEARNING OBJECTIVES

Conduct and interpret hypothesis tests for three or more population means using one-way ANOVA.

The purpose of a one-way ANOVA (analysis of variance) test is to determine the existence of a statistically significant difference among the means of three or more populations. The test actually uses variances to help determine if the population means are equal or not.

Throughout this section, we will use subscripts to identify the values for the means, sample sizes, and standard deviations for the populations.

Symbol for:	Population [latex]k[/latex]
Population Mean	[latex]\mu_k[/latex]
Population Standard Deviation	[latex]\sigma_k[/latex]
Sample Size	[latex]n_k[/latex]
Sample Mean	[latex]\overline{x}_k[/latex]
Sample Standard Deviation	[latex]s_k[/latex]

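[latex]k[/latex] is the number of populations under study, [latex]n[/latex] is the total number of observations in all of the samples combined, and [latex]\overline{\overline{x}}[/latex] is the overall sample mean: the mean of all [latex]n[/latex] observations, calculated as the weighted mean of the sample means with the sample sizes as weights.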

[latex]\begin{eqnarray*}n&=&n_1+n_2+\cdots+n_k\\\\\overline{\overline{x}}&=&\frac{n_1\times\overline{x}_1+n_2\times\overline{x}_2+\cdots+n_k\times\overline{x}_k}{n}\end{eqnarray*}[/latex]

One-Way ANOVA

A predictor variable is called a factor or independent variable. For example, age, temperature, and gender are factors. The groups or samples are often referred to as treatments. This terminology comes from the use of ANOVA procedures in medical and psychological research to determine if there is a difference in the effects of different treatments.

EXAMPLE

A local college wants to compare the mean GPA for players on four of its sports teams: basketball, baseball, hockey, and lacrosse. A random sample of players was taken from each team, and their GPA was recorded in the table below.

Basketball	Baseball	Hockey	Lacrosse
3.6	2.1	4.0	2.0
2.9	2.6	2.0	3.6
2.5	3.9	2.6	3.9
3.3	3.1	3.2	2.7
3.8	3.4	3.2	2.5

In this example, the factor is the sports team.

	Basketball	Baseball	Hockey	Lacrosse
	Population 1	Population 2	Population 3	Population 4
Sample Size ([latex]n_i[/latex])	[latex]5[/latex]	[latex]5[/latex]	[latex]5[/latex]	[latex]5[/latex]
Sample Mean ([latex]\overline{x}_i[/latex])	[latex]3.22[/latex]	[latex]3.02[/latex]	[latex]3[/latex]	[latex]2.94[/latex]

[latex]\begin{eqnarray*}k&=&4\\\\n&=&n_1+n_2+n_3+n_4\\&=&5+5+5+5\\&=&20\\\\\overline{\overline{x}}&=&\frac{n_1\times\overline{x}_1+n_2\times\overline{x}_2+n_3\times\overline{x}_3+n_4\times\overline{x}_4}{n}\\&=&\frac{5\times 3.22+5\times 3.02+5\times 3+5\times 2.94}{20}\\&=&3.045\end{eqnarray*}[/latex]
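As a quick check on the arithmetic above, the following Python sketch recomputes [latex]k[/latex], [latex]n[/latex], and the overall sample mean from the raw GPA data. The variable names are illustrative choices and are not part of the original example.

```python
# GPA samples from the example, one list per team.
samples = {
    "Basketball": [3.6, 2.9, 2.5, 3.3, 3.8],
    "Baseball": [2.1, 2.6, 3.9, 3.1, 3.4],
    "Hockey": [4.0, 2.0, 2.6, 3.2, 3.2],
    "Lacrosse": [2.0, 3.6, 3.9, 2.7, 2.5],
}

k = len(samples)                                   # number of populations
sizes = {team: len(gpas) for team, gpas in samples.items()}
means = {team: sum(gpas) / len(gpas) for team, gpas in samples.items()}
n = sum(sizes.values())                            # total number of observations

# Overall sample mean: sample means weighted by their sample sizes.
grand_mean = sum(sizes[team] * means[team] for team in samples) / n

print(k, n, round(grand_mean, 3))
```

Because every sample here has the same size, the weighted mean equals the simple average of the four sample means. With unequal sample sizes, the weights matter.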

The following assumptions are required to use a one-way ANOVA test:

Each population from which a sample is taken is normally distributed.
All samples are randomly selected and independently taken from the populations.
The populations are assumed to have equal variances.
The population data is numerical (interval or ratio level).

The logic behind one-way ANOVA is to compare population means based on two independent estimates of the (assumed) equal variance [latex]\sigma^2[/latex] between the populations:

One estimate of the equal variance [latex]\sigma^2[/latex] is based on the variability among the sample means themselves (called the between-groups estimate of population variance).
One estimate of the equal variance [latex]\sigma^2[/latex] is based on the variability of the data within each sample (called the within-groups estimate of population variance).

The one-way ANOVA procedure compares these two estimates of the population variance [latex]\sigma^2[/latex] to determine if the population means are equal or if there is a difference in the population means. Because ANOVA involves the comparison of two estimates of variance, an [latex]F[/latex]-distribution is used to conduct the ANOVA test. The test statistic is an [latex]F[/latex]-score that is the ratio of the two estimates of population variance:

[latex]\displaystyle{F=\frac{\text{variance between groups}}{\text{variance within groups}}}[/latex]

The degrees of freedom for the [latex]F[/latex]-distribution are [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex] where [latex]k[/latex] is the number of populations and [latex]n[/latex] is the total number of observations in all of the samples combined.

The variance between groups estimate of the population variance is called the mean square due to treatment, [latex]MST[/latex]. The [latex]MST[/latex] is the estimate of the population variance determined by the variance of the sample means from the overall sample mean [latex]\overline{\overline{x}}[/latex]. When the population means are equal, [latex]MST[/latex] provides an unbiased estimate of the population variance. When the population means are not equal, [latex]MST[/latex] provides an overestimate of the population variance.

[latex]\begin{eqnarray*}SST&=&n_1\times(\overline{x}_1-\overline{\overline{x}})^2+n_2\times(\overline{x}_2-\overline{\overline{x}})^2+\cdots+n_k\times(\overline{x}_k-\overline{\overline{x}})^2\\\\MST&=&\frac{SST}{k-1}\end{eqnarray*}[/latex]
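The two formulas above translate directly into a few lines of Python. This is a minimal sketch: the function name `mean_square_treatment` is a hypothetical helper, and the sample sizes and sample means passed to it are taken from the GPA example.

```python
def mean_square_treatment(sizes, means):
    """Return (SST, MST) from the sample sizes and sample means of k groups."""
    n = sum(sizes)
    k = len(sizes)
    # Overall sample mean: weighted mean of the sample means.
    grand_mean = sum(n_i * x_i for n_i, x_i in zip(sizes, means)) / n
    # SST: weighted squared deviations of the sample means from the overall mean.
    sst = sum(n_i * (x_i - grand_mean) ** 2 for n_i, x_i in zip(sizes, means))
    return sst, sst / (k - 1)


# Sample sizes and sample means from the GPA example.
sst, mst = mean_square_treatment([5, 5, 5, 5], [3.22, 3.02, 3.0, 2.94])
print(round(sst, 4), round(mst, 6))
```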

The variance within groups estimate of the population variance is called the mean square due to error, [latex]MSE[/latex]. The [latex]MSE[/latex] is the pooled estimate of the population variance using the sample variances as estimates for the population variance. The [latex]MSE[/latex] always provides an unbiased estimate of the population variance because it is not affected by whether or not the population means are equal.

[latex]\begin{eqnarray*}SSE&=&(n_1-1)\times s_1^2+(n_2-1)\times s_2^2+\cdots+(n_k-1)\times s_k^2\\\\MSE&=&\frac{SSE}{n-k}\end{eqnarray*}[/latex]
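A similar sketch computes [latex]SSE[/latex] and [latex]MSE[/latex] from raw data. It assumes the samples are available as lists of numbers and uses `statistics.variance` from the Python standard library, which returns the sample variance (dividing by [latex]n_i-1[/latex]). The function name `mean_square_error` is a hypothetical helper.

```python
from statistics import variance


def mean_square_error(samples):
    """Return (SSE, MSE) for a list of samples, each a list of observations."""
    n = sum(len(sample) for sample in samples)
    k = len(samples)
    # SSE pools the sample variances, each weighted by its degrees of freedom.
    sse = sum((len(sample) - 1) * variance(sample) for sample in samples)
    return sse, sse / (n - k)


# GPA samples from the example: basketball, baseball, hockey, lacrosse.
gpa_samples = [
    [3.6, 2.9, 2.5, 3.3, 3.8],
    [2.1, 2.6, 3.9, 3.1, 3.4],
    [4.0, 2.0, 2.6, 3.2, 3.2],
    [2.0, 3.6, 3.9, 2.7, 2.5],
]
sse, mse = mean_square_error(gpa_samples)
print(round(sse, 3), round(mse, 5))
```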

The one-way ANOVA test depends on the fact that the variance between groups [latex]MST[/latex] is influenced by differences between the population means, which results in [latex]MST[/latex] being either an unbiased estimate or an overestimate of the population variance. Because the variance within groups [latex]MSE[/latex] compares the values in each group to that group's own mean, [latex]MSE[/latex] is not affected by differences between the population means and is always an unbiased estimate of the population variance.

The null hypothesis in a one-way ANOVA test is that the population means are all equal, and the alternative hypothesis is that there is a difference in the population means. The [latex]F[/latex]-score for the one-way ANOVA test is [latex]\displaystyle{F=\frac{MST}{MSE}}[/latex] with [latex]df_1=k-1[/latex] and [latex]df_2=n-k[/latex]. The [latex]p-\text{value}[/latex] for the test is the area in the right tail of the [latex]F[/latex]-distribution, to the right of the [latex]F[/latex]-score.

When the variance between groups [latex]MST[/latex] and variance within groups [latex]MSE[/latex] are close in value, the [latex]F[/latex]-score is close to [latex]1[/latex] and results in a large [latex]p-\text{value}[/latex]. In this case, there is not enough evidence to conclude that the population means differ.

When the variance between groups [latex]MST[/latex] is significantly larger than the variability within groups [latex]MSE[/latex], the [latex]F[/latex]-score is large and results in a small [latex]p-\text{value}[/latex]. In this case, the conclusion is that there is a difference in the population means.
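To make the link between the [latex]F[/latex]-score and the [latex]p-\text{value}[/latex] concrete, here is a sketch of the right-tail area calculation using only the Python standard library. It relies on the standard identity that the right-tail area of an [latex]F[/latex]-distribution equals a regularized incomplete beta function, which is evaluated here with a continued fraction. All function names are hypothetical helpers, and a statistics package would normally be used instead.

```python
import math


def _lentz_step(term, c, d, tiny=1e-300):
    """One update of the modified Lentz recurrence for a continued fraction."""
    d = 1.0 + term * d
    d = tiny if abs(d) < tiny else d
    c = 1.0 + term / c
    c = tiny if abs(c) < tiny else c
    return c, 1.0 / d


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used to evaluate the incomplete beta function."""
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (1e-300 if abs(d) < 1e-300 else d)
    c, h = 1.0, d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        c, d = _lentz_step(even, c, d)
        h *= c * d
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        c, d = _lentz_step(odd, c, d)
        h *= c * d
        if abs(c * d - 1.0) < eps:  # converged
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_(1-x)(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f_score, df1, df2):
    """Area to the right of f_score under the F-distribution with (df1, df2)."""
    if f_score <= 0:
        return 1.0
    x = df2 / (df2 + df1 * f_score)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


# An F-score near or below 1 gives a large p-value; a large F-score gives a small one.
print(f_right_tail(0.15, 3, 16))
print(f_right_tail(6.0, 3, 16))
```

The two printed values illustrate the pattern described above: the first area is large and the second is small.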

Conducting a Hypothesis Test for Three or More Population Means

Follow these steps to perform a hypothesis test on three or more population means:

Verify that the one-way ANOVA assumptions are met.
Write down the null hypothesis that there is no difference in the population means:
[latex]\begin{eqnarray*}\\H_0:&&\mu_1=\mu_2=\cdots=\mu_k\end{eqnarray*}[/latex]

The null hypothesis is always the claim that the population means are equal.
Write down the alternative hypothesis that there is some difference in the population means:
[latex]\begin{eqnarray*}\\H_a:&&\text{at least one population mean is different from the others}\\\\\end{eqnarray*}[/latex]
Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
Calculate the [latex]p-\text{value}[/latex], which is the area in the right tail of the [latex]F[/latex]-distribution to the right of the [latex]F[/latex]-score. The [latex]F[/latex]-score and degrees of freedom are
[latex]\begin{eqnarray*}F&=&\frac{MST}{MSE}\\\\df_1&=&k-1\\\\df_2&=&n-k\\\\\end{eqnarray*}[/latex]
Compare the [latex]p-\text{value}[/latex] to the significance level and state the outcome of the test.
- If [latex]p-\text{value}\leq\alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].
  - The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
- If [latex]p-\text{value}\gt\alpha[/latex], do not reject [latex]H_0[/latex].
  - The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] is correct.
Write down a concluding sentence specific to the context of the question.
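The decision rule in the steps above can be sketched as a small function. The name `anova_decision` and the returned strings are illustrative assumptions. The inputs in the usage lines are [latex]p-\text{value}[/latex] and significance level pairs taken from examples in this section.

```python
def anova_decision(p_value, alpha):
    """Apply the p-value decision rule for a one-way ANOVA test."""
    if p_value <= alpha:
        # Significant result: reject H0 in favour of Ha.
        return "reject H0"
    # Not significant: the sample does not give enough evidence against H0.
    return "do not reject H0"


print(anova_decision(0.9271, 0.05))
print(anova_decision(0.0462, 0.05))
print(anova_decision(0.0073, 0.01))
```

Note that "do not reject" is not the same as showing that the means are equal. It only says that the sample does not provide enough evidence of a difference.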

EXAMPLE

A local college wants to compare the mean GPA for players on four of its sports teams: basketball, baseball, hockey, and lacrosse. A random sample of players was taken from each team, and their GPA was recorded in the table below.

Basketball	Baseball	Hockey	Lacrosse
3.6	2.1	4.0	2.0
2.9	2.6	2.0	3.6
2.5	3.9	2.6	3.9
3.3	3.1	3.2	2.7
3.8	3.4	3.2	2.5

Assume the populations are normally distributed and have equal variances. At the [latex]5\%[/latex] significance level, is there a difference in the average GPA among the sports teams?

Solution

Let basketball be population 1, let baseball be population 2, let hockey be population 3, and let lacrosse be population 4. From the question, we have the following information:

Basketball	Baseball	Hockey	Lacrosse
[latex]n_1=5[/latex]	[latex]n_2=5[/latex]	[latex]n_3=5[/latex]	[latex]n_4=5[/latex]
[latex]\overline{x}_1=3.22[/latex]	[latex]\overline{x}_2=3.02[/latex]	[latex]\overline{x}_3=3[/latex]	[latex]\overline{x}_4=2.94[/latex]
[latex]s_1^2=0.277[/latex]	[latex]s_2^2=0.487[/latex]	[latex]s_3^2=0.56[/latex]	[latex]s_4^2=0.623[/latex]

Previously, we found [latex]k=4[/latex], [latex]n=20[/latex], and [latex]\overline{\overline{x}}=3.045[/latex].

Hypotheses:

[latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]

[latex]p-\text{value}[/latex]:

To calculate the [latex]F[/latex]-score, we need to find [latex]MST[/latex] and [latex]MSE[/latex].

[latex]\begin{eqnarray*}SST&=&n_1\times(\overline{x}_1-\overline{\overline{x}})^2+n_2\times(\overline{x}_2-\overline{\overline{x}})^2+n_3\times(\overline{x}_3-\overline{\overline{x}})^2+n_4\times(\overline{x}_4-\overline{\overline{x}})^2\\&=&5\times(3.22-3.045)^2+5\times(3.02-3.045)^2+5\times(3-3.045)^2\\&&+5\times(2.94-3.045)^2\\&=&0.2215\\\\MST&=&\frac{SST}{k-1}\\&=&\frac{0.2215}{4-1}\\&=&0.0738\ldots\\\\SSE&=&(n_1-1)\times s_1^2+(n_2-1)\times s_2^2+(n_3-1)\times s_3^2+(n_4-1)\times s_4^2\\&=&(5-1)\times 0.277+(5-1)\times 0.487+(5-1)\times 0.56+(5-1)\times 0.623\\&=&7.788\\\\MSE&=&\frac{SSE}{n-k}\\&=&\frac{7.788}{20-4}\\&=&0.48675\end{eqnarray*}[/latex]

The [latex]p-\text{value}[/latex] is the area in the right tail of the [latex]F[/latex]-distribution. To use the f.dist.rt function, we need to calculate the [latex]F[/latex]-score and the degrees of freedom:

[latex]\begin{eqnarray*}F&=&\frac{MST}{MSE}\\&=&\frac{0.0738\ldots}{0.48675}\\&=&0.15168\ldots\\\\df_1&=&k-1\\&=&4-1\\&=&3\\\\df_2&=&n-k\\&=&20-4\\&=&16\end{eqnarray*}[/latex]

Function	f.dist.rt
Field 1	0.15168…
Field 2	3
Field 3	16
Answer	0.9271

So the [latex]p-\text{value}=0.9271[/latex].

Conclusion:

Because [latex]p-\text{value}=0.9271\gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean GPA among the sports teams.

NOTES

The null hypothesis [latex]\mu_1=\mu_2=\mu_3=\mu_4[/latex] is the claim that the mean GPA for the sports teams are all equal.
The alternative hypothesis is the claim that at least one of the population means is not equal to the others. The alternative hypothesis does not say that all of the population means are not equal, only that at least one of them is not equal to the others.
The [latex]p-\text{value}[/latex] is the area in the right tail of the [latex]F[/latex]-distribution, to the right of [latex]F=0.15168\ldots[/latex]. In the calculation of the [latex]p-\text{value}[/latex]:
- The function is f.dist.rt because we are finding the area in the right tail of an [latex]F[/latex]-distribution.
- Field 1 is the value of [latex]F[/latex].
- Field 2 is the value of [latex]df_1[/latex].
- Field 3 is the value of [latex]df_2[/latex].
The [latex]p-\text{value}[/latex] of [latex]0.9271[/latex] is a large probability compared to the significance level, so a sample result like this one is likely to happen when the null hypothesis is true. The sample data are consistent with the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis. In other words, there is not enough evidence to conclude that the population means differ.

ANOVA Summary Tables

The calculation of the [latex]MST[/latex], [latex]MSE[/latex], and the [latex]F[/latex]-score for a one-way ANOVA test can be time-consuming, even with the help of software like Excel. However, Excel has a built-in one-way ANOVA summary table that not only generates the averages, variances, [latex]MST[/latex], and [latex]MSE[/latex], but also calculates the required [latex]F[/latex]-score and [latex]p-\text{value}[/latex] for the test.

USING EXCEL TO CREATE A ONE-WAY ANOVA SUMMARY TABLE

In order to create a one-way ANOVA summary table, we need to use the Analysis ToolPak. Follow these instructions to add the Analysis ToolPak.

Enter the data into an Excel worksheet.
Go to the Data tab and click on Data Analysis. If you do not see Data Analysis in the Data tab, you will need to install the Analysis ToolPak.
In the Data Analysis window, select Anova: Single Factor. Click OK.
In the Input range, enter the cell range for the data.
In the Grouped By box, select rows if your data is entered as rows (the default is columns).
Click on Labels in first row if you included the column headings in the input range.
In the Alpha box, enter the significance level for the test.
From the Output Options, select the location where you want the output to appear.
Click OK.

NOTE

Because we are using the [latex]p-\text{value}[/latex] approach to hypothesis testing, it is not crucial that we enter the actual significance level we are using for the test. The [latex]p-\text{value}[/latex] (the area in the right tail of the [latex]F[/latex]-distribution) is not affected by significance level. For the critical-value approach to hypothesis testing, we must enter the correct significance level for the test because the critical value does depend on the significance level.
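The following Python sketch illustrates the point of this note: the [latex]p-\text{value}[/latex] depends only on the [latex]F[/latex]-score and the degrees of freedom, while the critical value changes with the significance level. It uses only the standard library. The right-tail area is computed through the regularized incomplete beta function, and the critical value is found by bisection. All function names are hypothetical helpers, not Excel or library functions.

```python
import math


def _lentz_step(term, c, d, tiny=1e-300):
    """One update of the modified Lentz recurrence for a continued fraction."""
    d = 1.0 + term * d
    d = tiny if abs(d) < tiny else d
    c = 1.0 + term / c
    c = tiny if abs(c) < tiny else c
    return c, 1.0 / d


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used to evaluate the incomplete beta function."""
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (1e-300 if abs(d) < 1e-300 else d)
    c, h = 1.0, d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        c, d = _lentz_step(even, c, d)
        h *= c * d
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        c, d = _lentz_step(odd, c, d)
        h *= c * d
        if abs(c * d - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f_score, df1, df2):
    """Area to the right of f_score under the F-distribution with (df1, df2)."""
    if f_score <= 0:
        return 1.0
    x = df2 / (df2 + df1 * f_score)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def f_critical(alpha, df1, df2, tol=1e-10):
    """F-score whose right-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_right_tail(hi, df1, df2) > alpha:  # widen until the tail area drops below alpha
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f_right_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Different significance levels give different critical values for the same df.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(f_critical(alpha, 3, 16), 4))
```

Under these assumptions, the value computed for [latex]\alpha=0.05[/latex] with [latex]df_1=3[/latex] and [latex]df_2=16[/latex] should agree with the F crit entry that Excel reports for the GPA example.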

EXAMPLE

A local college wants to compare the mean GPA for players on four of its sports teams: basketball, baseball, hockey, and lacrosse. A random sample of players was taken from each team, and their GPA was recorded in the table below.

Basketball	Baseball	Hockey	Lacrosse
3.6	2.1	4.0	2.0
2.9	2.6	2.0	3.6
2.5	3.9	2.6	3.9
3.3	3.1	3.2	2.7
3.8	3.4	3.2	2.5

Assume the populations are normally distributed and have equal variances. At the [latex]5\%[/latex] significance level, is there a difference in the average GPA among the sports teams?

Solution

Let basketball be population 1, let baseball be population 2, let hockey be population 3, and let lacrosse be population 4.

Hypotheses:

[latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]

[latex]p-\text{value}[/latex]:

The ANOVA summary table generated by Excel is shown below:

Anova: Single Factor

SUMMARY
Groups	Count	Sum	Average	Variance
Basketball	5	16.1	3.22	0.277
Baseball	5	15.1	3.02	0.487
Hockey	5	15	3	0.56
Lacrosse	5	14.7	2.94	0.623

ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	0.2215	3	0.073833	0.151686	0.927083	3.238872
Within Groups	7.788	16	0.48675

Total	8.0095	19

The [latex]p-\text{value}[/latex] for the test is in the P-value column of the between groups row. So the [latex]p-\text{value}=0.9271[/latex].

Conclusion:

Because [latex]p-\text{value}=0.9271\gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean GPA among the sports teams.

NOTES

In the top part of the ANOVA summary table (under the Summary heading), we have the averages and variances for each of the groups (basketball, baseball, hockey, and lacrosse).
In the bottom part of the ANOVA summary table (under the ANOVA heading), we have
- The value of [latex]SST[/latex] (in the SS column of the between groups row).
- The value of [latex]MST[/latex] (in the MS column of the between groups row).
- The value of [latex]SSE[/latex] (in the SS column of the within groups row).
- The value of [latex]MSE[/latex] (in the MS column of the within groups row).
- The value of the [latex]F[/latex]-score (in the F column of the between groups row).
- The [latex]p-\text{value}[/latex] (in the P-value column of the between groups row).
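For readers working outside Excel, the entries in the ANOVA part of the summary table can be reproduced from the raw data. The sketch below is one way to do this in Python using only the standard library. The function `one_way_anova` and its helpers are hypothetical names, and the [latex]p-\text{value}[/latex] is computed through the regularized incomplete beta function rather than with Excel's f.dist.rt.

```python
import math
from statistics import mean, variance


def _lentz_step(term, c, d, tiny=1e-300):
    """One update of the modified Lentz recurrence for a continued fraction."""
    d = 1.0 + term * d
    d = tiny if abs(d) < tiny else d
    c = 1.0 + term / c
    c = tiny if abs(c) < tiny else c
    return c, 1.0 / d


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used to evaluate the incomplete beta function."""
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (1e-300 if abs(d) < 1e-300 else d)
    c, h = 1.0, d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        c, d = _lentz_step(even, c, d)
        h *= c * d
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        c, d = _lentz_step(odd, c, d)
        h *= c * d
        if abs(c * d - 1.0) < eps:
            break
    return h


def _f_right_tail(f_score, df1, df2):
    """Right-tail area of the F-distribution via the incomplete beta function."""
    if f_score <= 0:
        return 1.0
    a, b, x = df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_score)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def one_way_anova(groups):
    """Return the ANOVA table entries for a dict mapping group name to data."""
    k = len(groups)
    n = sum(len(values) for values in groups.values())
    grand_mean = sum(sum(values) for values in groups.values()) / n
    sst = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    sse = sum((len(v) - 1) * variance(v) for v in groups.values())
    df1, df2 = k - 1, n - k
    mst, mse = sst / df1, sse / df2
    f_score = mst / mse
    return {"SST": sst, "df1": df1, "MST": mst, "SSE": sse, "df2": df2,
            "MSE": mse, "F": f_score, "p_value": _f_right_tail(f_score, df1, df2)}


gpa = {
    "Basketball": [3.6, 2.9, 2.5, 3.3, 3.8],
    "Baseball": [2.1, 2.6, 3.9, 3.1, 3.4],
    "Hockey": [4.0, 2.0, 2.6, 3.2, 3.2],
    "Lacrosse": [2.0, 3.6, 3.9, 2.7, 2.5],
}
table = one_way_anova(gpa)
for label, value in table.items():
    print(label, round(value, 6))
```

The printed entries can be compared with the SS, df, MS, F, and P-value columns of the Excel output above.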

EXAMPLE

A fourth-grade class is studying the environment. One of the assignments is to grow bean plants in different soils. Tommy chose to grow his bean plants in soil found outside his classroom mixed with dryer lint. Tara chose to grow her bean plants in potting soil bought at the local nursery. Nick chose to grow his bean plants in soil from his mother’s garden. No chemicals were used on the plants, only water. They were grown inside the classroom next to a large window. Each child grew five plants. At the end of the growing period, each plant was measured, producing the data (in inches) in the table below.

Tommy’s Plants	Tara’s Plants	Nick’s Plants
24	25	23
21	31	27
23	23	22
30	20	30
23	28	20

Assume the heights of the plants are normally distributed and have equal variance. At the [latex]5\%[/latex] significance level, does it appear that the three media in which the bean plants were grown produced the same mean height?

Solution

Let Tommy’s plants be population 1, let Tara’s plants be population 2, and let Nick’s plants be population 3.

Hypotheses:

[latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]

[latex]p-\text{value}[/latex]:

The ANOVA summary table generated by Excel is shown below:

Anova: Single Factor

SUMMARY
Groups	Count	Sum	Average	Variance
Tommy’s Plants	5	121	24.2	11.7
Tara’s Plants	5	127	25.4	18.3
Nick’s Plants	5	122	24.4	16.3

ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	4.133333	2	2.066667	0.133909	0.875958	3.885294
Within Groups	185.2	12	15.43333

Total	189.3333	14

So the [latex]p-\text{value}=0.8760[/latex].

Conclusion:

Because [latex]p-\text{value}=0.8760\gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean heights of the plants grown in the three media.

NOTES

The null hypothesis [latex]\mu_1=\mu_2=\mu_3[/latex] is the claim that the mean heights of the plants grown in the three different media are all equal.
The alternative hypothesis is the claim that at least one of the population means is not equal to the others. The alternative hypothesis does not say that all of the population means are not equal, only that at least one of them is not equal to the others.
The [latex]p-\text{value}[/latex] of [latex]0.8760[/latex] is a large probability compared to the significance level, so a sample result like this one is likely to happen when the null hypothesis is true. The sample data are consistent with the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis. In other words, there is not enough evidence to conclude that the population means differ.

TRY IT

A statistics professor wants to study the average GPA of students in four different programs: marketing, management, accounting, and human resources. The professor took a random sample of GPAs of students in those programs at the end of the past semester. The data is recorded in the table below.

Marketing	Management	Accounting	Human Resources
2.17	2.63	3.21	3.27
1.85	1.77	3.78	3.45
2.83	3.25	4.00	2.85
1.69	1.86	2.95	2.26
3.33	2.21	2.65	3.18

Assume the GPAs of the students are normally distributed and have equal variance. At the [latex]5\%[/latex] significance level, is there a difference in the average GPA of the students in the different programs?

Solution

Let marketing be population 1, let management be population 2, let accounting be population 3, and let human resources be population 4.

Hypotheses:

[latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]

[latex]p-\text{value}[/latex]:

The ANOVA summary table generated by Excel is shown below:

Anova: Single Factor

SUMMARY
Groups	Count	Sum	Average	Variance
Marketing	5	11.87	2.374	0.47648
Management	5	11.72	2.344	0.37108
Accounting	5	16.59	3.318	0.31797
Human Resources	5	15.01	3.002	0.21947

ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	3.459895	3	1.153298	3.330826	0.046214	3.238872
Within Groups	5.54	16	0.34625

Total	8.999895	19

So the [latex]p-\text{value}=0.0462[/latex].

Conclusion:

Because [latex]p-\text{value}=0.0462\lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the [latex]5\%[/latex] significance level, there is enough evidence to suggest that there is a difference in the average GPA of the students in the different programs.

TRY IT

A manufacturing company runs three different production lines to produce one of its products. The company wants to know if the mean production rate is the same for the three lines. For each production line, a sample of eight-hour shifts was taken, and the number of items produced during each shift was recorded in the table below.

Line 1	Line 2	Line 3
35	21	31
35	36	34
36	22	24
39	38	21
37	28	27
36	34	29
31	35	33
38	39	20
33	40	24

Assume the numbers of items produced on each line during an eight-hour shift are normally distributed and have equal variance. At the [latex]1\%[/latex] significance level, is there a difference in the average production rate for the three lines?

Solution

Let Line 1 be population 1, let Line 2 be population 2, and let Line 3 be population 3.

Hypotheses:

[latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]

[latex]p-\text{value}[/latex]:

The ANOVA summary table generated by Excel is shown below:

Anova: Single Factor

SUMMARY
Groups	Count	Sum	Average	Variance
Line 1	9	320	35.55556	6.027778
Line 2	9	293	32.55556	51.52778
Line 3	9	243	27	26

ANOVA
Source of Variation	SS	df	MS	F	P-value	F crit
Between Groups	339.1852	2	169.5926	6.089096	0.007264	5.613591
Within Groups	668.4444	24	27.85185

Total	1007.63	26

So the [latex]p-\text{value}=0.0073[/latex].

Conclusion:

Because [latex]p-\text{value}=0.0073\lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the [latex]1\%[/latex] significance level, there is enough evidence to suggest that there is a difference in the mean production rate of the three lines.

Exercises

Three different traffic routes are tested for mean driving time. The entries in the table are the driving times, in minutes, on the three different routes. Assume the driving times are normally distributed and have equal variance.

Route 1	Route 2	Route 3
30	27	16
32	29	41
27	28	22
35	36	31

At the [latex]5\%[/latex] significance level, test if the mean driving time for the three routes is the same.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.7728[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean driving time for the three routes.
Suppose a group is interested in determining whether teenagers obtain their driver’s licenses at approximately the same mean age across the country. Suppose that the following data are randomly collected from five teenagers in each region of the country. The numbers represent the age at which teenagers obtained their driver’s licenses. Assume the ages are normally distributed and have equal variance.

Northeast	South	West	Central	East
16.3	16.9	16.4	16.2	17.1
16.1	16.5	16.5	16.6	17.2
16.4	16.4	16.6	16.5	16.6
16.5	16.2	16.1	16.4	16.8

At the [latex]5\%[/latex] significance level, determine if the mean age when teenagers get their driver’s license is the same in the different regions of the country.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4=\mu_5\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.0174[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is enough evidence to suggest that there is a difference in the mean age when teenagers get their driver’s licenses in different regions of the country.
Groups of men from three different areas of the country are to be tested for mean weight. The entries in the table are the weights for the different groups. Assume the weights are normally distributed and have equal variance.

Group 1	Group 2	Group 3
216	202	170
198	213	165
240	284	182
187	228	197
176	210	201

At the [latex]1\%[/latex] significance level, test if the mean weight for men is the same for the three groups.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.0546[/latex]
- Conclusion: At the [latex]1\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean weight for men among the three groups.
Girls from four different soccer teams are tested for mean goals scored per game. The entries in the table are the goals per game for the different teams for a sample of games. Assume the goals scored per game are normally distributed and have equal variance.

Team 1	Team 2	Team 3	Team 4
1	2	0	3
2	3	1	4
0	2	1	4
3	4	0	3
2	4	0	2

At the [latex]1\%[/latex] significance level, test if the mean goals scored per game is the same for the four teams.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.0005[/latex]
- Conclusion: At the [latex]1\%[/latex] significance level, there is enough evidence to suggest that there is a difference in the mean goals scored per game for the four teams.
Five basketball teams took a random sample of players regarding how high each player can jump (in centimetres). Assume the heights are normally distributed and have equal variance.

Team 1	Team 2	Team 3	Team 4	Team 5
90	80	120	95	102.5
105	87.5	125	110	97.5
127.5	95	97.5	115	100

At the [latex]5\%[/latex] significance level, is there a difference in the mean jump heights among the teams?
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4=\mu_5\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.1614[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean jump height among the teams.
A video game developer is testing a new game on three different groups. Each group represents a different target market for the game. The developer collects scores from a random sample from each group. The scores are recorded in the table below. Assume the scores are normally distributed and have equal variance.

Group A	Group B	Group C
101	151	101
108	149	109
98	160	198
107	112	186
111	126	160

At the [latex]5\%[/latex] significance level, test if the mean scores are the same for the different groups.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.0592[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean scores for the different groups.
Three students, Linda, Tuan, and Javier, are given five laboratory rats each for a nutritional experiment. Each rat’s weight is recorded in grams. Linda feeds her rats Formula [latex]A[/latex], Tuan feeds his rats Formula [latex]B[/latex], and Javier feeds his rats Formula [latex]C[/latex]. At the end of a specified time period, each rat is weighed again, and the net gain, in grams, is recorded. The results are shown in the table below. Assume the net weight gains are normally distributed and have equal variance.

Linda’s Rats	Tuan’s Rats	Javier’s Rats
43.5	47.0	51.2
39.4	40.5	40.9
41.3	38.9	37.9
46.0	46.3	45.0
38.2	44.2	48.6

At the [latex]5\%[/latex] significance level, determine if the three formulas produce the same mean net weight gain.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.5305[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean net weight gain for the three formulas.
A grassroots group opposed to a proposed increase in the gas tax claimed that the increase would hurt working-class people the most because they commute the farthest to work. Suppose that the group randomly surveyed [latex]8[/latex] individuals in each of three different income groups (working-class, middle-income, and wealthy), and asked them their daily one-way commuting distance, in kilometres. Assume the distances are normally distributed and have equal variance.

Working-Class	Middle Income	Wealthy
17.8	16.5	8.5
26.7	17.4	6.3
49.4	22.0	4.6
9.4	7.4	12.6
65.4	9.4	11.0
47.1	2.1	28.6
19.5	6.4	15.4
51.2	13.9	9.3

At the [latex]1\%[/latex] significance level, determine if there is a difference in the mean commuting distance for the three income groups.
Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p-\text{value}=0.0014[/latex]
- Conclusion: At the [latex]1\%[/latex] significance level, there is enough evidence to suggest that there is a difference in the mean commuting distance for the three income groups.
The following table lists the number of pages in four different types of magazines. Assume the numbers of pages are normally distributed and have equal variance.

Home Decorating	News	Health	Computer
172	87	82	104
286	94	153	136
163	123	87	98
205	106	103	207
197	101	96	146

At the [latex]5\%[/latex] significance level, test if the four magazine types have the same mean number of pages.
Click to see Answer
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3=\mu_4\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p\text{-value}=0.0012[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is enough evidence to suggest that the four magazine types do not have the same mean number of pages.
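
With four groups the numerator degrees of freedom equal 3, so the p-value has to come from the general F-distribution. The sketch below is one way to check the answer using only the Python standard library; it is an illustration under stated assumptions, not the method used to produce the answer. The helper names `one_way_anova` and `f_right_tail` are hypothetical. The right-tail area is obtained from the identity p = I_x(df2/2, df1/2), where I_x is the regularized incomplete beta function and x = df2/(df2 + df1·F), and the integral is approximated with Simpson's rule.

```python
import math


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_right_tail(f_stat, df1, df2, steps=2000):
    """Approximate P(F > f_stat) for an F(df1, df2) distribution.

    Assumes f_stat > 0 and df2 >= 2 so the integrand stays bounded
    on the interval of integration; steps must be even.
    """
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f_stat)       # upper limit, strictly below 1
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Composite Simpson's rule on [0, x]
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


# Data copied from the magazine table (number of pages)
home_decorating = [172, 286, 163, 205, 197]
news = [87, 94, 123, 106, 101]
health = [82, 153, 87, 103, 96]
computer = [104, 136, 98, 207, 146]

f_stat, df1, df2 = one_way_anova([home_decorating, news, health, computer])
p_value = f_right_tail(f_stat, df1, df2)

print(f"F = {f_stat:.2f}, df = ({df1}, {df2}), p-value = {p_value:.4f}")
```

The result should be close to the p-value of 0.0012 given in the answer. In practice a statistics package or spreadsheet function would normally replace the hand-written integration.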
Are the mean final exam scores the same for all statistics class delivery types? The table below shows the mean scores on final exams from several randomly selected classes that used the different delivery types. Assume the mean scores are normally distributed and have equal variance.

Online	Hybrid	Face-to-Face
72	83	80
84	73	78
77	84	84
80	81	81
81		86
		79
		82

At the [latex]5\%[/latex] significance level, determine if the mean score on the final exam is the same for the different course delivery methods.
Answer:
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p\text{-value}=0.5437[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that the mean score on the final exam differs among the course delivery methods.
A local ski resort wants to know if the mean number of daily visitors is the same for the three types of snow conditions. A sample of days is taken, and the number of visitors each day and the snow conditions are recorded. The results are shown in the table below. Assume the number of daily visitors is normally distributed and has equal variance.

Powder	Machine Made	Hard Packed
1,210	2,107	2,846
1,080	1,149	1,638
1,537	862	2,019
941	1,870	1,178
	1,528	2,233
	1,382

At the [latex]5\%[/latex] significance level, determine if there is a difference in the mean number of daily visitors for the different snow conditions.
Answer:
- Hypotheses: [latex]\begin{eqnarray*}H_0:&&\mu_1=\mu_2=\mu_3\\H_a:&&\text{at least one population mean is different from the others}\end{eqnarray*}[/latex]
- [latex]p\text{-value}=0.0807[/latex]
- Conclusion: At the [latex]5\%[/latex] significance level, there is not enough evidence to suggest that there is a difference in the mean number of daily visitors for the different snow conditions.
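
Because the three samples have different sizes, this exercise is a useful check on the weighted formula for [latex]\overline{\overline{x}}[/latex]. The calculation below is a worked sketch of where the p-value comes from. The labels [latex]SS_{\text{between}}[/latex] and [latex]SS_{\text{within}}[/latex] are chosen here for illustration and may differ from the notation used elsewhere in the book. Here [latex]n_1=4[/latex], [latex]n_2=6[/latex], [latex]n_3=5[/latex], [latex]n=15[/latex], and [latex]k=3[/latex].

[latex]\begin{eqnarray*}\overline{x}_1&=&\frac{4768}{4}=1192\\\overline{x}_2&=&\frac{8898}{6}=1483\\\overline{x}_3&=&\frac{9914}{5}=1982.8\\\overline{\overline{x}}&=&\frac{4\times 1192+6\times 1483+5\times 1982.8}{15}=1572\end{eqnarray*}[/latex]

[latex]\begin{eqnarray*}SS_{\text{between}}&=&4(1192-1572)^2+6(1483-1572)^2+5(1982.8-1572)^2=1468909.2\\SS_{\text{within}}&=&194894+1048568+1575614.8=2819076.8\\F&=&\frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(n-k)}=\frac{1468909.2/2}{2819076.8/12}\approx 3.13\end{eqnarray*}[/latex]

The three terms in [latex]SS_{\text{within}}[/latex] are the sums of squared deviations of the observations from their own sample mean for powder, machine made, and hard packed snow, respectively. The p-value is the area to the right of [latex]F[/latex] under an [latex]F[/latex]-distribution with [latex]k-1=2[/latex] and [latex]n-k=12[/latex] degrees of freedom. When the numerator degrees of freedom equal [latex]2[/latex], and only then, this area has a simple closed form:

[latex]\begin{eqnarray*}p\text{-value}&=&\left(1+\frac{2F}{n-k}\right)^{-(n-k)/2}\approx\left(1+\frac{2\times 3.126}{12}\right)^{-6}\approx 0.0807\end{eqnarray*}[/latex]

Since [latex]0.0807>0.05[/latex], the null hypothesis is not rejected at the [latex]5\%[/latex] significance level.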

“11.4 One-Way ANOVA and Hypothesis Tests for Three or More Population Means” and “11.5 Exercises” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

License

Introduction to Statistics - Second Edition Copyright © 2025 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

