10.4 The Goodness-of-Fit Test

LEARNING OBJECTIVES

  • Conduct and interpret [latex]\chi^2[/latex]-goodness-of-fit hypothesis tests.

Recall that a categorical (or qualitative) variable is a variable where the data can be grouped by specific categories.  Examples of categorical variables include eye colour, blood type, or brand of car.  A categorical variable is a random variable that takes on categories.  Suppose we want to determine whether the data from a categorical variable “fit” a particular distribution or not.  That is, for a categorical variable with a historical or assumed probability distribution, does a new sample from the population support the assumed probability distribution or does the sample indicate that there has been a change in the probability distribution?

The [latex]\chi^2[/latex]-goodness-of-fit test allows us the test if the sample data from a categorical variable fits the pattern of expected probabilities for the variable.  In a [latex]\chi^2[/latex]-goodness-of-fit test, we are analyzing the distribution of the frequencies for one categorical variable.  This is a hypothesis test where the hypotheses state that the categorial variable does or does not follow an assumed probability distribution and a [latex]\chi^2[/latex]-distribution is used to determine the p-value for the test.

Steps to Conduct a [latex]\chi^2[/latex]-Goodness-of-Fit Test

Suppose a categorical variable has [latex]k[/latex] possible outcomes (categories) with probabilities [latex]p_1, p_2,...,p_k[/latex].  Suppose [latex]n[/latex] independent observations are taken from this categorical variable.

  1. Write down the null and alternative hypotheses:

    [latex]\begin{eqnarray*} \\ H_0: & & p_1=p_{1_0}, p_2=p_{2_0}.,...p_k=p_{k_0} \\ H_a: & &\mbox{at least one } p_i\neq p_{i_0} \\ \\ \end{eqnarray*}[/latex]

  2. Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
  3. Use the [latex]\chi^2[/latex]-distribution to find the p-value, which is the area in the right tail of the distribution.  The [latex]\chi^2[/latex]-score and degrees of freedom are

    [latex]\begin{eqnarray*}\chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ \\ df &= & k-1  \\ \\ \mbox{observed} & = & \mbox{observed frequency from the sample data} \\ \mbox{expected} & = & \mbox{expected frequency from assumed distribution} \\ k & = & \mbox{number of categories} \\ \\ \end{eqnarray*}[/latex]

  4. Compare the p-value to the significance level and state the outcome of the test:
    • If p-value[latex]\leq \alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].
      • The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
    • If p-value[latex]\gt \alpha[/latex], do not reject [latex]H_0[/latex].
      • The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  5. Write down a concluding sentence specific to the context of the question.

NOTES

  1. The null hypothesis is the claim that the categorial variable follows the assumed distribution.  That is, the probability [latex]p_i[/latex] of each possible outcome of the categorical variable equals a hypothesized probability [latex]p_{i_0}[/latex].
  2. The alternative hypothesis is the claim that the categorical variable does not follow the assumed distribution.  That is, for at least one possible outcome of the categorical variable the probability [latex]p_i[/latex] does not equal the claimed probability [latex]p_{i_0}[/latex].
  3. In order to use the [latex]\chi^2[/latex]-goodness-of-fit test, the expected frequency for each category must be at least 5.
  4. The p-value for a [latex]\chi^2[/latex]-goodness-of-fit test is always the area in the right tail of the [latex]\chi^2[/latex]-distribution.  So, we use chisq.dist.rt to find the p-value for a [latex]\chi^2[/latex]-goodness-of-fit test.
  5. To calculate the [latex]\chi^2[/latex]-score:
    1. For each of the possible outcomes of the categorical variable, calculate [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex]:
      1. Find the difference between the observed frequency (from the sample) and the expected frequency (from the null hypothesis).  The expected frequency equals [latex]n \times p_{i_0}[/latex] where [latex]n[/latex] is the sample size and [latex]p_{i_0}[/latex] is the assumed probability for the [latex]i[/latex]th outcome claimed in the null hypothesis.
      2. Square the difference in step (i).
      3. Divide the value found in step (iii) by the expected frequency.
    2. Add up the values of [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each of the outcomes.
  6. We expect that there will be a discrepancy between the observed frequency and the expected frequency.  If this discrepancy is very large, the value of [latex]\chi^2[/latex] will be very large and result in a small p-value.

EXAMPLE

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate.  Suppose that a study was done to determine if the actual student absenteeism rate follows faculty perception.  The faculty believe that the distribution of the number of absences per term is as follows:

Number of Absences per Term Expected Percent of Students
0–2 50%
3–5 30%
6–8 12%
9–11 6%
12+ 2%

At the end of the semester, a random survey of 300 students across all mathematics courses was taken and the actual (observed) number of absences for the 300 students is recorded.

Number of Absences per Term Observed Number of Students
0–2 120
3–5 100
6–8 55
9–11 15
12+ 10

At the 5% significance level, determine if the number of absences per term follow the distribution assumed by the faculty.

Solution:

Let [latex]p_1[/latex] be the probability a student has 0-2 absences,  [latex]p_2[/latex] be the probability a student has 3-5 absences, [latex]p_3[/latex] be the probability a student has 6-8 absences, [latex]p_4[/latex] be the probability a student has 9-11 absences, and [latex]p_5[/latex] be the probability a student has 12 or more absences.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=50\%, p_2=30\%, p_3=12\%, p_4=6\%, p_5=2\%  \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value: 

From the question, we have [latex]n=300[/latex] and [latex]k=5[/latex].  Now we need to calculate out the [latex]\chi^2[/latex]-score for the test.

The observed frequency for each category is the number of observations in the sample that fall into that category.  This is the information provided in the sample above.

Next, we must calculate out the expected frequencies.  The expected frequency is the number of observations we would expect to see in the sample, assuming the null hypothesis is true.  To calculate the expected frequency for each category, we multiply the sample size [latex]n[/latex] by the probability associated with that category claimed in the null hypothesis.

Number of Absences per Term Observed Frequency Expected Frequency
0-2 120 0.5[latex]\times[/latex]300=150
3-5 100 0.3[latex]\times[/latex]300=90
6-8 55 0.12[latex]\times[/latex]300=36
9-11 15 0.06[latex]\times[/latex]300=18
12+ 10 0.02[latex]\times[/latex]300=6

To calculate the [latex]\chi^2[/latex]-score, we work out the quantity [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each category and then add up these quantities.

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(120-150)^2}{150}+\frac{(100-90)^2}{90}+\frac{(55-36)^2}{36}+\frac{(15-18)^2}{18}+\frac{(10-6)^2}{6} \\ & = & 20.305... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex].  The [latex]\chi^2[/latex]-goodness-of-fit test is a right tailed test, so we use the chisq.dist.rt function to find the p-value:

This is a chi square distribution. Along the horizontal axis the point chi square is labeled. The area in the right tail to the right of chi square is shaded and labeled with p-value.

Function  chisq.dist.rt Answer
Field 1 20.305…. 0.0004
Field 2 4

So the p-value[latex]=0.0004[/latex].

Conclusion:

Because p-value[latex]=0.0004 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that the number of absences per term does not follow the distribution assumed by faculty.

NOTES

  1. The null hypothesis is the claim that the percent of students that fall into each category is as stated.  That is, 50% students miss between 0 and 2 classes, 30% of the students miss between 3 and 5 students, etc.
  2. The alternative hypothesis is the claim that at least one of the percent of students that fall into each category is not as stated.  The alternative hypothesis does not say that every [latex]p_i[/latex] does not equal its stated probabilities, only that one of them does not equal its stated probability.
  3. Keep all of the decimals throughout the calculation (i.e. in the calculation of the [latex]\chi^2[/latex]-score) to avoid any round-off error in the calculation of the p-value.  This ensures that we get the most accurate value for the p-value.  You can use Excel to calculate the expected frequencies and the [latex]\chi^2[/latex]-score.
  4. The p-value is the area in the right tail of the [latex]\chi^2[/latex]-distribution, to the right of [latex]\chi^2=20.305...[/latex].  In the calculation of the p-value:
    • The function is chisq.dist.rt because we are finding the area in the right tail of a [latex]\chi^2[/latex]-distribution.
    • Field 1 is the value of [latex]\chi^2[/latex].
    • Field 2 is the value of the degrees of freedom [latex]df[/latex].
  5. The p-value of 0.0004 is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, student absenteeism does not fit faculty perception.

EXAMPLE

Employers want to know which days of the week employees have the highest number of absences in a five-day work week.  Most employers would like to believe that employees are absent equally during the week.  Suppose a random sample of 60 managers are asked on which day of the week they had the highest number of employee absences.  The results are recorded in the table below.  At the 5% significance level, test if the day of the week with the highest number of absences occur with equal frequency during a five-day work week.

Day of the Week Observed Frequency
Monday 15
Tuesday 11
Wednesday 10
Thursday 9
Friday 15

 

Solution:

Let [latex]p_1[/latex] be the probability the highest number of absences occurs on Monday,  [latex]p_2[/latex] be probability the highest number of absences occurs on Tuesday, [latex]p_3[/latex] be the probability the highest number of absences occurs on Wednesday, [latex]p_4[/latex] be the probability the highest number of absences occurs on Thursday, and [latex]p_5[/latex] be the probability the highest number of absences occurs on Friday.

If the day of the week with the highest number of absences occurs with equal frequency, then the probability that any day has the highest number of absences is the same as any other day.  Because there are 5 days (categories), if the frequencies are equal then each day would have a probability of 20% [latex]\left(\mbox{or }\frac{1}{5}\right)[/latex].

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=p_2=p_3=p_4=p_5=20\%  \\ H_a: & & \mbox{at least one of the } p_i \neq 20\%  \end{eqnarray*}[/latex]

p-value: 

From the question, we have [latex]n=60[/latex] and [latex]k=5[/latex].  Now we need to calculate out the [latex]\chi^2[/latex]-score for the test.

The observed frequency for each category is the number of observations in the sample that fall into that category.  This is the information provided in the sample above.

Next, we must calculate out the expected frequencies.  The expected frequency is the number of observations we would expect to see in the sample, assuming the null hypothesis is true.  To calculate the expected frequency for each category, we multiply the sample size [latex]n[/latex] by the probability associated with that category claimed in the null hypothesis.

Day of the Week Observed Frequency Expected Frequency
Monday 15 0.2[latex]\times[/latex]60=12
Tuesday 11 0.2[latex]\times[/latex]60=12
Wednesday 10 0.2[latex]\times[/latex]60=12
Thursday 9 0.2[latex]\times[/latex]60=12
Friday 15 0.2[latex]\times[/latex]60=12

To calculate the [latex]\chi^2[/latex]-score, we work out the quantity [latex]\displaystyle{\frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] for each category and then add up these quantities.

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(15-12)^2}{12}+\frac{(11-12)^2}{12}+\frac{(10-12)^2}{12}+\frac{(9-12)^2}{12}+\frac{(15-12)^2}{12} \\ & = & 2.666... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex].  The [latex]\chi^2[/latex]-goodness-of-fit test is a right tailed test, so we use the chisq.dist.rt function to find the p-value:

This is a chi square distribution. Along the horizontal axis the point chi square is labeled. The area in the right tail to the right of chi square is shaded and labeled with p-value.

Function  chisq.dist.rt Answer
Field 1 2.666…. 0.6151
Field 2 4

So the p-value[latex]=0.6151[/latex].

Conclusion:

Because p-value[latex]=0.6151 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is enough evidence to suggest that the day of the week with the highest number of absences occur with equal frequency during a five-day work week.

NOTES

  1. The null hypothesis is the claim that the probability each day of the week has the highest number of absences is 20%.
  2. The alternative hypothesis is the claim that at least one of the probabilities is not 20%.  The alternative hypothesis does not say that every [latex]p_i[/latex] does not equal 20%, only that one of them does not equal 20%.
  3. Keep all of the decimals throughout the calculation (i.e. in the calculation of the [latex]\chi^2[/latex]-score) to avoid any round-off error in the calculation of the p-value.  This ensures that we get the most accurate value for the p-value.
  4. The p-value of 0.6151 is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.

TRY IT

Teachers want to know which night each week their students are doing most of their homework.  Most teachers think that students do homework equally throughout the week.  Suppose a random sample of 49 students are asked on which night of the week they did the most homework.  The results are shown in the table below.  At the 5% significance level, are the nights that students do most of their homework equally distributed?

Day of Week Number of Students
Sunday 11
Monday 8
Tuesday 10
Wednesday 7
Thursday 10
Friday 5
Saturday 5

 

Click to see Solution

 

Let [latex]p_1[/latex] be the probability students do their homework on Sunday,  [latex]p_2[/latex] be the probability students do their homework on Monday, [latex]p_3[/latex] be the probability students do their homework on Tuesday, [latex]p_4[/latex] be the probability students do their homework on Wednesday, [latex]p_5[/latex] be the probability students do their homework on Thursday, [latex]p_6[/latex] be the probability students do their homework on Friday, and [latex]p_7[/latex] be the probability students do their homework on Saturday.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=p_2=p_3=p_4=p_5=p_6=p_7=\frac{1}{7}  \\ H_a: & & \mbox{at least one of the } p_i \neq \frac{1}{7}  \end{eqnarray*}[/latex]

p-value: 

From the question, we have [latex]n=49[/latex] and [latex]k=7[/latex].

Day of the Week Observed Frequency Expected Frequency
Sunday 11 1/7[latex]\times[/latex]49=7
Monday 8 1/7[latex]\times[/latex]49=7
Tuesday 10 1/7[latex]\times[/latex]49=7
Wednesday 7 1/7[latex]\times[/latex]49=7
Thursday 10 1/7[latex]\times[/latex]49=7
Friday 5 1/7[latex]\times[/latex]49=7
Saturday 5 1/7[latex]\times[/latex]49=7

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(11-7)^2}{7}+\frac{(8-7)^2}{7}+\frac{(10-7)^2}{7}+\frac{(7-7)^2}{7} \\ &  & +\frac{(10-7)^2}{7}+\frac{(5-7)^2}{7}+\frac{(5-7)^2}{7} \\ & = & 6.142... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=7-1=6[/latex].

Function  chisq.dist.rt Answer
Field 1 6.142…. 0.4074
Field 2 6

So the p-value[latex]=0.4074[/latex].

Conclusion:

Because p-value[latex]=0.4074 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is enough evidence to suggest that the nights students do most of their homework are equally distributed.

TRY IT

One study indicates that the number of televisions that American families have is distributed as shown in this table:

Number of Televisions Percent
0 10%
1 16%
2 55%
3 11%
4 or more 8%

A researcher wants to determine if the number of televisions that families in the far western part of the U.S. have the same distribution as the above study.  A random sample of 600 families in the far western U.S. is taken and the results are recorded in the following table:

Number of Televisions Observed Frequency
0 66
1 119
2 340
3 60
4 or more 15

At the 1% significance level, does it appear that the distribution of the number of televisions for families in the far western U.S is different from the distribution for the American population as a whole?

 

Click to see Solution

 

Let [latex]p_1[/latex] be the probability a family owns 0 televisions,  [latex]p_2[/latex] be the probability a family owns 1 television, [latex]p_3[/latex] be the probability a family owns 2 televisions, [latex]p_4[/latex] be the probability a family owns 3 televisions, and [latex]p_5[/latex] be the probability a family owns 4 or more televisions.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=10\%, p_2= 16\%, p_3= 55\%, p_4= 11\%, p_5=8\%  \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value: 

From the question, we have [latex]n=600[/latex] and [latex]k=5[/latex].

Number of Televisions Observed Frequency Expected Frequency
0 66 0.1[latex]\times[/latex]600=60
1 119 0.16[latex]\times[/latex]600=96
2 340 0.55[latex]\times[/latex]600=330
3 60 0.11[latex]\times[/latex]600=66
4 or more 15 0.08[latex]\times[/latex]600=48

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(66-60)^2}{60}+\frac{(119-96)^2}{96}+\frac{(340-330)^2}{330}+\frac{(60-66)^2}{66}  +\frac{(15-48)^2}{48} \\ & = & 29.646... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df=k-1=5-1=4[/latex].

Function  chisq.dist.rt Answer
Field 1 29.646…. 0.000006
Field 2 4

So the p-value[latex]=0.000006[/latex].

Conclusion:

Because p-value[latex]=0.000006 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 1% significance level there is enough evidence to suggest that the distribution of the number of televisions for families in the far western U.S is different from the distribution for the American population as a whole.

TRY IT

The expected percentage of the number of pets students in the United States have in their homes is distributed as follows:

Number of Pets Percent
0 18%
1 25%
2 30%
3 18%
4 or more 9%

A researcher wants to find out if the distribution of the number of pets students in Canada have is the same as the distribution shown in the U.S.  A random sample of 1,000 students from Canada is taken and the results are shown in the table below:

Number of Pets Observed Frequency
0 210
1 240
2 320
3 140
4+ 90

At the 1% significance level, is the distribution of the number of pets students in Canada have different from the distribution for the United States?

 

Click to see Solution

 

Let [latex]p_1[/latex] be the probability a student owns 0 pets,  [latex]p_2[/latex] be the probability a student owns 1 pet, [latex]p_3[/latex] be the probability a student owns 2 pets, [latex]p_4[/latex] be the probability a student owns 3 pets, and [latex]p_5[/latex] be the probability a student owns 4 or more pets.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p_1=18\%, p_2= 25\%, p_3= 30\%, p_4= 18\%, p_5=9\%  \\ H_a: & & \mbox{at least one of the } p_i's \mbox{ does not equal its stated probability} \end{eqnarray*}[/latex]

p-value: 

From the question, we have [latex]n=1000[/latex] and [latex]k=5[/latex].

Number of Pets Observed Frequency Expected Frequency
0 210 0.18[latex]\times[/latex]1000=180
1 240 0.25[latex]\times[/latex]1000=250
2 320 0.30[latex]\times[/latex]1000=300
3 140 0.18[latex]\times[/latex]1000=180
4 or more 90 0.09[latex]\times[/latex]1000=90

[latex]\begin{eqnarray*} \chi^2 & = & \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}} \\ & = & \frac{(210-180)^2}{180}+\frac{(240-250)^2}{250}+\frac{(320-300)^2}{300}+\frac{(140-180)^2}{180}  +\frac{(90-90)^2}{90} \\ & = & 15.622... \end{eqnarray*}[/latex]

The degrees of freedom for the [latex]\chi^2[/latex]-distribution is [latex]df-k-1=5-1=4[/latex].

Function  chisq.dist.rt Answer
Field 1 15.622…. 0.0036
Field 2 4

So the p-value[latex]=0.0036[/latex].

Conclusion:

Because p-value[latex]=0.0036 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 1% significance level there is enough evidence to suggest that the distribution of the number of pets students in Canada have is different from the distribution for the United States.


Watch this video: Pearson’s chi square test (goodness of fit) | Probability and Statistics | Khan Academy by Khan Academy [11:47]


Concept Review

The [latex]\chi^2[/latex]-goodness-of-fit test is used to determine if a categorical variable follows a hypothesized distribution.  The goodness-of-fit test is a well established process:

  1. Write down the null and alternative hypotheses.  The null hypothesis is the claim that the categorical variable follows the hypothesized distribution and the alternative hypothesis is the claim that the categorical variable does not follow the hypothesized distribution.
  2. Collect the sample information for the test and identify the significance level.
  3. The p-value is the area in the right tail of the [latex]\chi^2[/latex]-distribution where [latex]\displaystyle{\chi^2  =  \sum \frac{(\mbox{observed-expected})^2}{\mbox{expected}}}[/latex] and [latex]df=k-1[/latex].
  4. Compare the p-value to the significance level and state the outcome of the test.
  5. Write down a concluding sentence specific to the context of the question.

Attribution

11.2 Goodness-of-Fit Test in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.