10.4 The Goodness-of-Fit Test

Valerie Watts

10.4 The Goodness-of-Fit Test

LEARNING OBJECTIVES

Conduct and interpret [latex]\chi^2[/latex]-goodness-of-fit hypothesis tests.

Recall that a categorical (or qualitative) variable is a variable where the data can be grouped by specific categories. Examples of categorical variables include eye colour, blood type, or brand of car. A categorical variable is a random variable that takes on categories. Suppose we want to determine whether the data from a categorical variable “fit” a particular distribution or not. That is, for a categorical variable with a historical or assumed probability distribution, does a new sample from the population support the assumed probability distribution or does the sample indicate that there has been a change in the probability distribution?

The [latex]\chi^2[/latex]-goodness-of-fit test allows us the test if the sample data from a categorical variable fits the pattern of expected probabilities for the variable. In a [latex]\chi^2[/latex]-goodness-of-fit test, we are analyzing the distribution of the frequencies for one categorical variable. This is a hypothesis test where the hypotheses state that the categorial variable does or does not follow an assumed probability distribution and a [latex]\chi^2[/latex]-distribution is used to determine the p-value for the test.

Steps to Conduct a [latex]\chi^2[/latex]-Goodness-of-Fit Test

Suppose a categorical variable has [latex]k[/latex] possible outcomes (categories) with probabilities [latex]p_1, p_2,...,p_k[/latex]. Suppose [latex]n[/latex] independent observations are taken from this categorical variable.

Write down the null and alternative hypotheses:
[latex]\begin{eqnarray*} \\ H_0: & & p_1=p_{1_0}, p_2=p_{2_0}.,...p_k=p_{k_0} \\ H_a: & &\mbox{at least one } p_i\neq p_{i_0} \\ \\ \end{eqnarray*}[/latex]
Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
Use the [latex]\chi^2[/latex]-distribution to find the p-value, which is the area in the right tail of the distribution. The [latex]\chi^2[/latex]-score and degrees of freedom are
[latex]\begin{eqnarray*}\chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ \\ df &= & k-1 \\ \\ \mbox{observed} & = & \mbox{observed frequency from the sample data} \\ \mbox{expected} & = & \mbox{expected frequency from assumed distribution} \\ k & = & \mbox{number of categories} \\ \\ \end{eqnarray*}[/latex]
Compare the p-value to the significance level and state the outcome of the test:
- If p-value[latex]\leq \alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].
  - The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
- If p-value[latex]\gt \alpha[/latex], do not reject [latex]H_0[/latex].
  - The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
Write down a concluding sentence specific to the context of the question.

NOTES

The null hypothesis is the claim that the categorial variable follows the assumed distribution. That is, the probability [latex]p_i[/latex] of each possible outcome of the categorical variable equals a hypothesized probability [latex]p_{i_0}[/latex].
The alternative hypothesis is the claim that the categorical variable does not follow the assumed distribution. That is, for at least one possible outcome of the categorical variable the probability [latex]p_i[/latex] does not equal the claimed probability [latex]p_{i_0}[/latex].
In order to use the [latex]\chi^2[/latex]-goodness-of-fit test, the expected frequency for each category must be at least 5.
The p-value for a [latex]\chi^2[/latex]-goodness-of-fit test is always the area in the right tail of the [latex]\chi^2[/latex]-distribution. So, we use chisq.dist.rt to find the p-value for a [latex]\chi^2[/latex]-goodness-of-fit test.
To calculate the [latex]\chi^2[/latex]-score:
1. For each of the possible outcomes of the categorical variable, calculate [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex]:
  1. Find the difference between the observed frequency (from the sample) and the expected frequency (from the null hypothesis). The expected frequency equals [latex]n \times p_{i_0}[/latex] where [latex]n[/latex] is the sample size and [latex]p_{i_0}[/latex] is the assumed probability for the [latex]i[/latex]th outcome claimed in the null hypothesis.
  2. Square the difference in step (i).
  3. Divide the value found in step (iii) by the expected frequency.
2. Add up the values of [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each of the outcomes.
We expect that there will be a discrepancy between the observed frequency and the expected frequency. If this discrepancy is very large, the value of [latex]\chi^2[/latex] will be very large and result in a small p-value.

EXAMPLE

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Suppose that a study was done to determine if the actual student absenteeism rate follows faculty perception. The faculty believe that the distribution of the number of absences per term is as follows:

Number of Absences per Term	Expected Percent of Students
0–2	50%
3–5	30%
6–8	12%
9–11	6%
12+	2%

At the end of the semester, a random survey of 300 students across all mathematics courses was taken and the actual (observed) number of absences for the 300 students is recorded.

Number of Absences per Term	Observed Number of Students
0–2	120
3–5	100
6–8	55
9–11	15
12+	10

At the 5% significance level, determine if the number of absences per term follow the distribution assumed by the faculty.

Solution:

Let [latex]p_1[/latex] be the probability a student has 0-2 absences, [latex]p_2[/latex] be the probability a student has 3-5 absences, [latex]p_3[/latex] be the probability a student has 6-8 absences, [latex]p_4[/latex] be the probability a student has 9-11 absences, and [latex]p_5[/latex] be the probability a student has 12 or more absences.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=50\%, p_2=30\%, p_3=12\%, p_4=6\%, p_5=2\% \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value:

From the question, we have [latex]n=300[/latex] and [latex]k=5[/latex]. Now we need to calculate out the [latex]\chi^2[/latex]-score for the test.

The observed frequency for each category is the number of observations in the sample that fall into that category. This is the information provided in the sample above.

Next, we must calculate out the expected frequencies. The expected frequency is the number of observations we would expect to see in the sample, assuming the null hypothesis is true. To calculate the expected frequency for each category, we multiply the sample size [latex]n[/latex] by the probability associated with that category claimed in the null hypothesis.

Number of Absences per Term	Observed Frequency	Expected Frequency
0-2	120	0.5[latex]\times[/latex]300=150
3-5	100	0.3[latex]\times[/latex]300=90
6-8	55	0.12[latex]\times[/latex]300=36
9-11	15	0.06[latex]\times[/latex]300=18
12+	10	0.02[latex]\times[/latex]300=6

To calculate the [latex]\chi^2[/latex]-score, we work out the quantity [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each category and then add up these quantities.

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(120-150)^2}{150}+\frac{(100-90)^2}{90}+\frac{(55-36)^2}{36}+\frac{(15-18)^2}{18}+\frac{(10-6)^2}{6} \\ & = & 20.305... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex]. The [latex]\chi^2[/latex]-goodness-of-fit test is a right tailed test, so we use the chisq.dist.rt function to find the p-value:

Function

chisq.dist.rt

Answer

Field 1

20.305….

0.0004

Field 2

4

So the p-value[latex]=0.0004[/latex].

Conclusion:

Because p-value[latex]=0.0004 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that the number of absences per term does not follow the distribution assumed by faculty.

NOTES

The null hypothesis is the claim that the percent of students that fall into each category is as stated. That is, 50% students miss between 0 and 2 classes, 30% of the students miss between 3 and 5 students, etc.
The alternative hypothesis is the claim that at least one of the percent of students that fall into each category is not as stated. The alternative hypothesis does not say that every [latex]p_i[/latex] does not equal its stated probabilities, only that one of them does not equal its stated probability.
Keep all of the decimals throughout the calculation (i.e. in the calculation of the [latex]\chi^2[/latex]-score) to avoid any round-off error in the calculation of the p-value. This ensures that we get the most accurate value for the p-value. You can use Excel to calculate the expected frequencies and the [latex]\chi^2[/latex]-score.
The p-value is the area in the right tail of the [latex]\chi^2[/latex]-distribution, to the right of [latex]\chi^2=20.305...[/latex]. In the calculation of the p-value:
- The function is chisq.dist.rt because we are finding the area in the right tail of a [latex]\chi^2[/latex]-distribution.
- Field 1 is the value of [latex]\chi^2[/latex].
- Field 2 is the value of the degrees of freedom [latex]df[/latex].
The p-value of 0.0004 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis. In other words, student absenteeism does not fit faculty perception.

EXAMPLE

Employers want to know which days of the week employees have the highest number of absences in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers are asked on which day of the week they had the highest number of employee absences. The results are recorded in the table below. At the 5% significance level, test if the day of the week with the highest number of absences occur with equal frequency during a five-day work week.

Day of the Week	Observed Frequency
Monday	15
Tuesday	11
Wednesday	10
Thursday	9
Friday	15

Solution:

Let [latex]p_1[/latex] be the probability the highest number of absences occurs on Monday, [latex]p_2[/latex] be probability the highest number of absences occurs on Tuesday, [latex]p_3[/latex] be the probability the highest number of absences occurs on Wednesday, [latex]p_4[/latex] be the probability the highest number of absences occurs on Thursday, and [latex]p_5[/latex] be the probability the highest number of absences occurs on Friday.

If the day of the week with the highest number of absences occurs with equal frequency, then the probability that any day has the highest number of absences is the same as any other day. Because there are 5 days (categories), if the frequencies are equal then each day would have a probability of 20% [latex]\left(\mbox{or }\frac{1}{5}\right)[/latex].

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=p_2=p_3=p_4=p_5=20\% \\ H_a: & & \mbox{at least one of the } p_i \neq 20\% \end{eqnarray*}[/latex]

p-value:

From the question, we have [latex]n=60[/latex] and [latex]k=5[/latex]. Now we need to calculate out the [latex]\chi^2[/latex]-score for the test.

The observed frequency for each category is the number of observations in the sample that fall into that category. This is the information provided in the sample above.

Next, we must calculate out the expected frequencies. The expected frequency is the number of observations we would expect to see in the sample, assuming the null hypothesis is true. To calculate the expected frequency for each category, we multiply the sample size [latex]n[/latex] by the probability associated with that category claimed in the null hypothesis.

Day of the Week

Observed Frequency

Expected Frequency

Monday

15

0.2[latex]\times[/latex]60=12

Tuesday

11

0.2[latex]\times[/latex]60=12

Wednesday

10

0.2[latex]\times[/latex]60=12

Thursday

9

0.2[latex]\times[/latex]60=12

Friday

15

0.2[latex]\times[/latex]60=12

To calculate the [latex]\chi^2[/latex]-score, we work out the quantity [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each category and then add up these quantities.

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(15-12)^2}{12}+\frac{(11-12)^2}{12}+\frac{(10-12)^2}{12}+\frac{(9-12)^2}{12}+\frac{(15-12)^2}{12} \\ & = & 2.666... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex]. The [latex]\chi^2[/latex]-goodness-of-fit test is a right tailed test, so we use the chisq.dist.rt function to find the p-value:

This is a chi square distribution. Along the horizontal axis the point chi square is labeled. The area in the right tail to the right of chi square is shaded and labeled with p-value.

Function

chisq.dist.rt

Answer

Field 1

2.666….

0.6151

Field 2

4

So the p-value[latex]=0.6151[/latex].

Conclusion:

Because p-value[latex]=0.6151 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the 5% significance level there is enough evidence to suggest that the day of the week with the highest number of absences occur with equal frequency during a five-day work week.

NOTES

The null hypothesis is the claim that the probability each day of the week has the highest number of absences is 20%.
The alternative hypothesis is the claim that at least one of the probabilities is not 20%. The alternative hypothesis does not say that every [latex]p_i[/latex] does not equal 20%, only that one of them does not equal 20%.
Keep all of the decimals throughout the calculation (i.e. in the calculation of the [latex]\chi^2[/latex]-score) to avoid any round-off error in the calculation of the p-value. This ensures that we get the most accurate value for the p-value.
The p-value of 0.6151 is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.

TRY IT

Teachers want to know which night each week their students are doing most of their homework. Most teachers think that students do homework equally throughout the week. Suppose a random sample of 49 students are asked on which night of the week they did the most homework. The results are shown in the table below. At the 5% significance level, are the nights that students do most of their homework equally distributed?

Day of Week	Number of Students
Sunday	11
Monday	8
Tuesday	10
Wednesday	7
Thursday	10
Friday	5
Saturday	5

Click to see Solution

Let [latex]p_1[/latex] be the probability students do their homework on Sunday, [latex]p_2[/latex] be the probability students do their homework on Monday, [latex]p_3[/latex] be the probability students do their homework on Tuesday, [latex]p_4[/latex] be the probability students do their homework on Wednesday, [latex]p_5[/latex] be the probability students do their homework on Thursday, [latex]p_6[/latex] be the probability students do their homework on Friday, and [latex]p_7[/latex] be the probability students do their homework on Saturday.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=p_2=p_3=p_4=p_5=p_6=p_7=\frac{1}{7} \\ H_a: & & \mbox{at least one of the } p_i \neq \frac{1}{7} \end{eqnarray*}[/latex]

p-value:

From the question, we have [latex]n=49[/latex] and [latex]k=7[/latex].

Day of the Week

Observed Frequency

Expected Frequency

Sunday

11

1/7[latex]\times[/latex]49=7

Monday

8

1/7[latex]\times[/latex]49=7

Tuesday

10

1/7[latex]\times[/latex]49=7

Wednesday

7

1/7[latex]\times[/latex]49=7

Thursday

10

1/7[latex]\times[/latex]49=7

Friday

5

1/7[latex]\times[/latex]49=7

Saturday

5

1/7[latex]\times[/latex]49=7

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(11-7)^2}{7}+\frac{(8-7)^2}{7}+\frac{(10-7)^2}{7}+\frac{(7-7)^2}{7} \\ & & +\frac{(10-7)^2}{7}+\frac{(5-7)^2}{7}+\frac{(5-7)^2}{7} \\ & = & 6.142... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=7-1=6[/latex].

Function

chisq.dist.rt

Answer

Field 1

6.142….

0.4074

Field 2

6

So the p-value[latex]=0.4074[/latex].

Conclusion:

Because p-value[latex]=0.4074 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the 5% significance level there is enough evidence to suggest that the nights students do most of their homework are equally distributed.

TRY IT

One study indicates that the number of televisions that American families have is distributed as shown in this table:

Number of Televisions	Percent
0	10%
1	16%
2	55%
3	11%
4 or more	8%

A researcher wants to determine if the number of televisions that families in the far western part of the U.S. have the same distribution as the above study. A random sample of 600 families in the far western U.S. is taken and the results are recorded in the following table:

Number of Televisions	Observed Frequency
0	66
1	119
2	340
3	60
4 or more	15

At the 1% significance level, does it appear that the distribution of the number of televisions for families in the far western U.S is different from the distribution for the American population as a whole?

Click to see Solution

Let [latex]p_1[/latex] be the probability a family owns 0 televisions, [latex]p_2[/latex] be the probability a family owns 1 television, [latex]p_3[/latex] be the probability a family owns 2 televisions, [latex]p_4[/latex] be the probability a family owns 3 televisions, and [latex]p_5[/latex] be the probability a family owns 4 or more televisions.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=10\%, p_2= 16\%, p_3= 55\%, p_4= 11\%, p_5=8\% \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value:

From the question, we have [latex]n=600[/latex] and [latex]k=5[/latex].

Number of Televisions	Observed Frequency	Expected Frequency
0	66	0.1[latex]\times[/latex]600=60
1	119	0.16[latex]\times[/latex]600=96
2	340	0.55[latex]\times[/latex]600=330
3	60	0.11[latex]\times[/latex]600=66
4 or more	15	0.08[latex]\times[/latex]600=48

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(66-60)^2}{60}+\frac{(119-96)^2}{96}+\frac{(340-330)^2}{330}+\frac{(60-66)^2}{66} +\frac{(15-48)^2}{48} \\ & = & 29.646... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex].

Function

chisq.dist.rt

Answer

Field 1

29.646….

0.000006

Field 2

4

So the p-value[latex]=0.000006[/latex].

Conclusion:

Because p-value[latex]=0.000006 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 1% significance level there is enough evidence to suggest that the distribution of the number of televisions for families in the far western U.S is different from the distribution for the American population as a whole.

TRY IT

The expected percentage of the number of pets students in the United States have in their homes is distributed as follows:

Number of Pets	Percent
0	18%
1	25%
2	30%
3	18%
4 or more	9%

A researcher wants to find out if the distribution of the number of pets students in Canada have is the same as the distribution shown in the U.S. A random sample of 1,000 students from Canada is taken and the results are shown in the table below:

Number of Pets	Observed Frequency
0	210
1	240
2	320
3	140
4+	90

At the 1% significance level, is the distribution of the number of pets students in Canada have different from the distribution for the United States?

Click to see Solution

Let [latex]p_1[/latex] be the probability a student owns 0 pets, [latex]p_2[/latex] be the probability a student owns 1 pet, [latex]p_3[/latex] be the probability a student owns 2 pets, [latex]p_4[/latex] be the probability a student owns 3 pets, and [latex]p_5[/latex] be the probability a student owns 4 or more pets.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=18\%, p_2= 25\%, p_3= 30\%, p_4= 18\%, p_5=9\% \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value:

From the question, we have [latex]n=1000[/latex] and [latex]k=5[/latex].

Number of Pets	Observed Frequency	Expected Frequency
0	210	0.18[latex]\times[/latex]1000=180
1	240	0.25[latex]\times[/latex]1000=250
2	320	0.30[latex]\times[/latex]1000=300
3	140	0.18[latex]\times[/latex]1000=180
4 or more	90	0.09[latex]\times[/latex]1000=90

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(210-180)^2}{180}+\frac{(240-250)^2}{250}+\frac{(320-300)^2}{300}+\frac{(140-180)^2}{180} +\frac{(90-90)^2}{90} \\ & = & 15.622... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df-k-1=5-1=4[/latex].

Function

chisq.dist.rt

Answer

Field 1

15.622….

0.0036

Field 2

4

So the p-value[latex]=0.0036[/latex].

Conclusion:

Because p-value[latex]=0.0036 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 1% significance level there is enough evidence to suggest that the distribution of the number of pets students in Canada have is different from the distribution for the United States.

_{Watch this video: Pearson’s chi square test (goodness of fit) | Probability and Statistics | Khan Academy by Khan Academy [11:47]}

Concept Review

The [latex]\chi^2[/latex]-goodness-of-fit test is used to determine if a categorical variable follows a hypothesized distribution. The goodness-of-fit test is a well established process:

Write down the null and alternative hypotheses. The null hypothesis is the claim that the categorical variable follows the hypothesized distribution and the alternative hypothesis is the claim that the categorical variable does not follow the hypothesized distribution.
Collect the sample information for the test and identify the significance level.
The p-value is the area in the right tail of the [latex]\chi^2[/latex]-distribution where [latex]\displaystyle{\chi^2 = \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] and [latex]df=k-1[/latex].
Compare the p-value to the significance level and state the outcome of the test.
Write down a concluding sentence specific to the context of the question.

Attribution

“11.2 Goodness-of-Fit Test“ in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License