
12.5 Coefficient of Determination

LEARNING OBJECTIVES

  • Calculate and interpret the coefficient of determination.

Previously, we saw how to use the correlation coefficient to measure the strength and direction of the linear relationship between the independent and dependent variables. The correlation coefficient gives us one way to measure how well a linear regression model fits the data. The coefficient of determination is another way to evaluate the fit. Denoted [latex]r^2[/latex], the coefficient of determination is the proportion of variation in the dependent variable that can be explained by the regression equation based on the independent variable. The coefficient of determination is the square of the correlation coefficient.

The coefficient of determination is a number between [latex]0[/latex] and [latex]1[/latex] and is the decimal form of a percent. The closer the coefficient of determination is to [latex]1[/latex], the better the independent variable is at predicting the dependent variable. When we interpret the coefficient of determination, we use the percent form. When expressed as a percent, [latex]r^2[/latex] represents the percent of variation in the dependent variable [latex]y[/latex] that can be explained by the variation in the independent variable [latex]x[/latex] using the regression line. When interpreting the coefficient of determination, remember to be specific to the context of the question.
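
One way to see why [latex]r^2[/latex] measures the proportion of variation explained is to split the variation in [latex]y[/latex] into two parts. The abbreviations used here are introduced only for illustration and are not used elsewhere in this section: SST is the total variation of the observed [latex]y[/latex]-values about their mean [latex]\overline{y}[/latex], and SSE is the variation of the observed [latex]y[/latex]-values about the values [latex]\hat{y}[/latex] predicted by the regression line. For a least-squares regression line with one independent variable:

[latex]\displaystyle{r^2=1-\frac{SSE}{SST}=1-\frac{\sum(y-\hat{y})^2}{\sum(y-\overline{y})^2}}[/latex]

If every point falls exactly on the line, then [latex]SSE=0[/latex] and [latex]r^2=1[/latex]. If the line explains none of the variation, then [latex]SSE=SST[/latex] and [latex]r^2=0[/latex].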

EXAMPLE

A statistics professor wants to study the relationship between a student’s score on the third exam in the course and their final exam score. The professor took a random sample of [latex]11[/latex] students and recorded their third exam score (out of [latex]80[/latex]) and their final exam score (out of [latex]200[/latex]). The results are recorded in the table below. The professor wants to develop a linear regression model to predict a student’s final exam score from the third exam score.

Student    Third Exam Score    Final Exam Score
1          65                  175
2          67                  133
3          71                  185
4          71                  163
5          66                  126
6          75                  198
7          67                  153
8          70                  163
9          71                  159
10         69                  151
11         69                  159

Previously, we found the correlation coefficient [latex]r=0.6631[/latex] and the line of best fit [latex]\hat{y}=-173.51+4.83x[/latex], where [latex]x[/latex] is the third exam score and [latex]\hat{y}[/latex] is the predicted final exam score.

  1. Find the coefficient of determination.
  2. Interpret the coefficient of determination found in part 1.

Solution

  1. [latex]\displaystyle{r^2=(0.6631)^2=0.4397}[/latex].
  2. [latex]43.97\%[/latex] of the variation in the final exam score can be explained by the regression line based on the third exam score.
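
The calculation in this example can also be checked with software. The following is a minimal Python sketch, not part of the original solution, that computes [latex]r[/latex] directly from the table and then squares it. The variable names and the helper function `correlation` are illustrative choices made for this sketch.

```python
from math import sqrt

# Third exam scores (x) and final exam scores (y) from the table above.
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def correlation(x, y):
    """Return the correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared and cross deviations from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = correlation(third_exam, final_exam)
r_squared = r ** 2  # coefficient of determination

print(f"r   = {r:.4f}")
print(f"r^2 = {r_squared:.4f} ({r_squared:.2%} of the variation explained)")
```

Rounded to four decimal places, the printed values should agree with [latex]r=0.6631[/latex] and [latex]r^2=0.4397[/latex] from the solution.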

TRY IT

SCUBA divers have maximum dive times they cannot exceed when going to different depths. The data in the table below shows different depths with the maximum dive times in minutes. Previously, we found the correlation coefficient and the regression line to predict the maximum dive time from depth.

Depth (in feet)    Maximum Dive Time (in minutes)
50                 80
60                 55
70                 45
80                 35
90                 25
100                22

  1. Find the coefficient of determination.
  2. Interpret the coefficient of determination found in part 1.

Solution

  1. [latex]\displaystyle{r^2=(-0.9629)^2=0.9272}[/latex].
  2. [latex]92.72\%[/latex] of the variation in the maximum dive time can be explained by the regression line based on depth.
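
The same value can be reached without the correlation coefficient, by comparing the variation left over around the regression line with the total variation in the dive times. The sketch below is an illustration under stated assumptions rather than the method used in the solution: it fits the least-squares line from the table and computes [latex]r^2[/latex] as one minus the ratio of unexplained variation (called `sse` here) to total variation (called `sst` here). Those names are labels chosen for this sketch.

```python
# Depth in feet (x) and maximum dive time in minutes (y) from the table above.
depth = [50, 60, 70, 80, 90, 100]
dive_time = [80, 55, 45, 35, 25, 22]

n = len(depth)
mean_x = sum(depth) / n
mean_y = sum(dive_time) / n

# Least-squares slope and intercept of the line y-hat = intercept + slope * x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depth, dive_time))
s_xx = sum((x - mean_x) ** 2 for x in depth)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# sst: total variation of the dive times about their mean.
# sse: variation of the dive times about the regression line (unexplained).
sst = sum((y - mean_y) ** 2 for y in dive_time)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(depth, dive_time))

r_squared = 1 - sse / sst  # proportion of variation explained by the line

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Because this sketch works from unrounded values, its result can differ slightly in the last decimal place from the solution, which squares the rounded value [latex]r=-0.9629[/latex]. The slope is negative, which matches the negative correlation coefficient, while [latex]r^2[/latex] itself is never negative.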

Exercises

  1. In a random sample of ten professional athletes, the number of endorsements the player has and the amount of money (in millions of dollars) the player earns are recorded in the table below. (Note: for identification of the independent and dependent variables, refer back to Question 4 in Section 12.4.)
    Player    Number of Endorsements    Money Earned (in millions)
    1         0                         2
    2         3                         8
    3         2                         7
    4         1                         3
    5         5                         13
    6         5                         12
    7         4                         9
    8         3                         9
    9         0                         3
    10        4                         10
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.9577[/latex]
    2. [latex]95.77\%[/latex] of the variation in the money earned by an athlete can be explained by the regression line based on the number of endorsements.

     

  2. The table below gives the percentage of workers who are paid hourly rates for the years 1979 to 1992. (Note: for identification of the independent and dependent variables, refer back to Question 7 in Section 12.2.)
    Year    Percent of Workers Paid Hourly Rates
    1979    61.2
    1980    60.7
    1981    61.3
    1982    61.3
    1983    61.8
    1984    61.7
    1985    61.8
    1986    62.0
    1987    62.7
    1990    62.8
    1992    62.9
    1. Find the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.8926[/latex]
    2. [latex]89.26\%[/latex] of the variation in the percent of workers paid an hourly rate can be explained by the regression line based on year.

     

  3. The table below contains real data for the first two decades of AIDS cases. (Note: for identification of the independent and dependent variables, refer back to Question 1 in Section 12.2.)
    Year    Number of AIDS Cases
    1981    319
    1982    1,170
    1983    3,076
    1984    6,240
    1985    11,776
    1986    19,032
    1987    28,564
    1988    35,447
    1989    42,674
    1990    48,634
    1991    59,660
    1992    78,530
    1993    78,834
    1994    71,874
    1995    68,505
    1996    59,347
    1997    47,149
    1998    38,393
    1999    25,174
    2000    25,522
    2001    25,643
    2002    26,464
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.2049[/latex]
    2. [latex]20.49\%[/latex] of the variation in the number of AIDS cases is explained by the regression line based on the year.

     

  4. Recently, the annual number of driver deaths per [latex]100,000[/latex] for the selected age groups was as shown in the table below. (Note: for identification of the independent and dependent variables, refer back to Question 8 in Section 12.2.)
    Age     Number of Driver Deaths per [latex]100,000[/latex]
    17.5    38
    22      36
    29.5    24
    44.5    20
    64.5    18
    80      28
    1. Find the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.3349[/latex]
    2. [latex]33.49\%[/latex] of the variation in the number of driver deaths per [latex]100,000[/latex] is explained by the regression line based on age.

     

  5. The table below shows the life expectancy for an individual born in the United States in certain years. (Note: for identification of the independent and dependent variables, refer back to Question 9 in Section 12.2.)
    Year of Birth    Life Expectancy
    1930             59.7
    1940             62.9
    1950             70.2
    1965             69.7
    1973             71.4
    1982             74.5
    1987             75
    1992             75.7
    2010             78.7
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.9240[/latex]
    2. [latex]92.40\%[/latex] of the variation in life expectancy is explained by the regression line based on the year.

     

  6. The height (sidewalk to roof) of notable tall buildings in America is compared to the number of stories of the building (beginning at street level). (Note: for identification of the independent and dependent variables, refer back to Question 10 in Section 12.2.)
    Height (in feet)    Number of Stories
    1,050               57
    428                 28
    362                 26
    529                 40
    790                 60
    401                 22
    380                 38
    1,454               110
    1,127               100
    700                 46
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.8903[/latex]
    2. [latex]89.03\%[/latex] of the variation in the height is explained by the regression line based on the number of stories.

     

  7. The following table shows data on average per capita wine consumption and heart disease rate in a random sample of 10 countries. (Note: for identification of the independent and dependent variables, refer back to Question 11 in Section 12.2.)
    Per Capita Yearly Wine Consumption in Liters    Per Capita Death from Heart Disease
    2.5                                             221
    3.9                                             167
    2.9                                             131
    2.4                                             191
    2.9                                             220
    0.8                                             297
    9.1                                             71
    2.7                                             172
    0.8                                             211
    0.7                                             300
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.6987[/latex]
    2. [latex]69.87\%[/latex] of the variation in the heart disease rate is explained by the regression line based on yearly wine consumption.

     

  8. The following table consists of one student athlete’s time (in minutes) to swim 2000 meters and the student’s heart rate (beats per minute) after swimming on a random sample of 10 days. (Note: for identification of the independent and dependent variables, refer back to Question 12 in Section 12.2.)
    Swim Time (in minutes)    Heart Rate (in beats per minute)
    34.12                     144
    35.72                     152
    34.72                     124
    34.05                     140
    34.13                     152
    35.73                     146
    36.17                     128
    35.57                     136
    35.37                     144
    35.57                     148
    1. Calculate the coefficient of determination.
    2. Interpret the coefficient of determination.
    Answer
    1. [latex]0.0153[/latex]
    2. [latex]1.53\%[/latex] of the variation in heart rate is explained by the regression line based on swim time.

     


“12.6 Coefficient of Determination” and “12.8 Exercises” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

License

Introduction to Statistics - Second Edition Copyright © 2025 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
