
13.3 Coefficient of Multiple Determination

LEARNING OBJECTIVES

  • Calculate and interpret the coefficient of multiple determination.
  • Calculate and interpret the adjusted coefficient of multiple determination.

Previously, we learned about the coefficient of determination, [latex]r^2[/latex], for simple linear regression, which is the proportion of variation in the dependent variable that can be explained by the simple linear regression model based on the independent variable.  The coefficient of determination is a good way to measure how well the simple linear regression model fits the data.

Coefficient of Multiple Determination

The coefficient of multiple determination, denoted [latex]R^2[/latex], in multiple regression is similar to the coefficient of determination in simple linear regression, except in multiple regression, there is more than one independent variable.  The coefficient of multiple determination is the proportion of variation in the dependent variable that can be explained by the multiple regression model based on the independent variables.

The value of the coefficient of multiple determination is found on the regression summary table, which we learned how to generate in Excel in a previous section.  We interpret the coefficient of multiple determination in the same way that we interpret the coefficient of determination for simple linear regression.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of [latex]25[/latex] employees at the company is taken, and the data is recorded in the table below.  The employee’s income is recorded in [latex]\$1000[/latex]s, and the job satisfaction score is out of [latex]10[/latex], with higher values indicating greater job satisfaction.

Job Satisfaction Hours of Unpaid Work per Week Age Income ([latex]\$1000[/latex]s)
4 3 23 60
5 8 32 114
2 9 28 45
6 4 60 187
7 3 62 175
8 1 43 125
7 6 60 93
3 3 37 57
5 2 24 47
5 5 64 128
7 2 28 66
8 1 66 146
5 7 35 89
2 5 37 56
4 0 59 65
6 2 32 95
5 6 76 82
7 5 25 90
9 0 55 137
8 3 34 91
7 5 54 184
9 1 57 60
7 0 68 39
10 2 66 187
5 0 50 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\\\\hat{y}&=&\text{predicted job satisfaction score}\\x_1&=&\text{hours of unpaid work per week}\\x_2&=&\text{age}\\x_3&=&\text{income (\$1000s)}\end{eqnarray*}[/latex]

  1. Find the coefficient of multiple determination.
  2. Interpret the coefficient of multiple determination.

Solution

  1. The regression summary table generated by Excel is shown below:
    SUMMARY OUTPUT
    Regression Statistics
    Multiple R 0.711779225
    R Square 0.506629665
    Adjusted R Square 0.436148189
    Standard Error 1.585212784
    Observations 25
    ANOVA
    df SS MS F Significance F
    Regression 3 54.189109 18.06303633 7.18812504 0.001683189
    Residual 21 52.770891 2.512899571
    Total 24 106.96
    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
    Intercept 4.799258185 1.197185164 4.008785216 0.00063622 2.309575344 7.288941027
    Hours of Unpaid Work per Week -0.38184722 0.130750479 -2.9204269 0.008177146 -0.65375772 -0.10993671
    Age 0.004555815 0.022855709 0.199329423 0.843922453 -0.04297523 0.052086864
    Income ([latex]\$1000[/latex]s) 0.023250418 0.007610353 3.055103771 0.006012895 0.007423823 0.039077013

    The coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the R Square row.  The value of the coefficient of multiple determination is [latex]R^2=0.5066[/latex].

  2. [latex]50.66\%[/latex] of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
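The R Square value can also be reproduced from the ANOVA part of the summary table, because the coefficient of multiple determination equals the regression sum of squares divided by the total sum of squares.  The short Python sketch below is only an illustration (this book generates the table in Excel); the variable names are our own, and the three sums of squares are copied from the SS column of the table above.

```python
# Sums of squares copied from the ANOVA section of the Excel summary output.
ss_regression = 54.189109   # variation explained by the regression model
ss_residual = 52.770891     # variation left unexplained
ss_total = 106.96           # total variation in the job satisfaction scores

# R^2 is the explained share of the total variation.
r_squared = ss_regression / ss_total

# Equivalent form: one minus the unexplained share.
r_squared_alt = 1 - ss_residual / ss_total

print(f"R^2 = {r_squared:.4f}")
print(f"R^2 (from residual SS) = {r_squared_alt:.4f}")
```

Both lines print 0.5066, which agrees with the R Square row of the summary table.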

Adjusted Coefficient of Multiple Determination

The value of the coefficient of multiple determination never decreases, and almost always increases, when another independent variable is added to the model, even if the new independent variable has no relationship with the dependent variable.  As a result, [latex]R^2[/latex] is inflated when the additional independent variables contribute no meaningful information about the dependent variable, and it overestimates how well the independent variables explain the dependent variable.

Instead, we use the adjusted coefficient of multiple determination, denoted [latex]\text{adjusted}\;R^2[/latex], which adjusts the value of [latex]R^2[/latex] to account for the number of independent variables in the model and so corrects this overestimation.  The adjusted coefficient of multiple determination is interpreted in the same way as the coefficient of multiple determination.

The adjusted coefficient of multiple determination is calculated from the value of [latex]R^2[/latex]:

[latex]\displaystyle{\text{adjusted}\;R^2=1-\left(\frac{(n-1)\times(1-R^2)}{n-k-1}\right)}[/latex]

where [latex]n[/latex] is the number of observations and [latex]k[/latex] is the number of independent variables.  Although we can calculate the adjusted coefficient of multiple determination using the above formula, its value is also reported on the regression summary table.
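As a quick check of the formula, the following Python sketch applies it to the job satisfaction model from the first example, using the R Square value from that summary table with [latex]n=25[/latex] and [latex]k=3[/latex].  The function name `adjusted_r_squared` is our own choice for illustration and is not part of Excel or of any particular library.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 from R^2, n observations, and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return 1 - ((n - 1) * (1 - r_squared)) / (n - k - 1)

# Job satisfaction model: 25 employees, 3 independent variables.
adj = adjusted_r_squared(0.506629665, n=25, k=3)
print(f"adjusted R^2 = {adj:.4f}")
```

The printed value, 0.4361, matches the Adjusted R Square row that Excel reports for this model.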

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of [latex]25[/latex] employees at the company is taken, and the data is recorded in the table below. The employee’s income is recorded in [latex]\$1000[/latex]s, and the job satisfaction score is out of [latex]10[/latex], with higher values indicating greater job satisfaction.

Job Satisfaction Hours of Unpaid Work per Week Age Income ([latex]\$1000[/latex]s)
4 3 23 60
5 8 32 114
2 9 28 45
6 4 60 187
7 3 62 175
8 1 43 125
7 6 60 93
3 3 37 57
5 2 24 47
5 5 64 128
7 2 28 66
8 1 66 146
5 7 35 89
2 5 37 56
4 0 59 65
6 2 32 95
5 6 76 82
7 5 25 90
9 0 55 137
8 3 34 91
7 5 54 184
9 1 57 60
7 0 68 39
10 2 66 187
5 0 50 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\\\\hat{y}&=&\text{predicted job satisfaction score}\\x_1&=&\text{hours of unpaid work per week}\\x_2&=&\text{age}\\x_3&=&\text{income (\$1000s)}\end{eqnarray*}[/latex]

  1. Find the adjusted coefficient of multiple determination.
  2. Interpret the adjusted coefficient of multiple determination.

Solution

  1. The regression summary table generated by Excel is shown below:
    SUMMARY OUTPUT
    Regression Statistics
    Multiple R 0.711779225
    R Square 0.506629665
    Adjusted R Square 0.436148189
    Standard Error 1.585212784
    Observations 25
    ANOVA
    df SS MS F Significance F
    Regression 3 54.189109 18.06303633 7.18812504 0.001683189
    Residual 21 52.770891 2.512899571
    Total 24 106.96
    Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
    Intercept 4.799258185 1.197185164 4.008785216 0.00063622 2.309575344 7.288941027
    Hours of Unpaid Work per Week -0.38184722 0.130750479 -2.9204269 0.008177146 -0.65375772 -0.10993671
    Age 0.004555815 0.022855709 0.199329423 0.843922453 -0.04297523 0.052086864
    Income ([latex]\$1000[/latex]s) 0.023250418 0.007610353 3.055103771 0.006012895 0.007423823 0.039077013

    The adjusted coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the Adjusted R Square row.  The value of the adjusted coefficient of multiple determination is [latex]\text{adjusted}\;R^2=0.4361[/latex].

  2. [latex]43.61\%[/latex] of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”

If the addition of a new independent variable increases the value of the adjusted coefficient of multiple determination, then this is an indication that the regression model has improved as a result of adding the new independent variable.  However, if the addition of a new independent variable decreases the value of the adjusted coefficient of multiple determination, then the added independent variable has not improved the overall regression model.  In such cases, the new independent variable should not be added to the model.
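To see how this comparison behaves, the Python sketch below rearranges the adjusted [latex]R^2[/latex] formula to find how large [latex]R^2[/latex] must become after adding one variable before the adjusted value stops falling.  The numbers used ([latex]n=25[/latex], two variables, [latex]R^2=0.50[/latex]) are hypothetical and chosen only for illustration, and both function names are our own.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 from R^2, n observations, and k independent variables."""
    return 1 - ((n - 1) * (1 - r_squared)) / (n - k - 1)

def min_r_squared_to_improve(r_squared_old, n, k_old):
    """R^2 that a model with k_old + 1 variables must exceed so that its
    adjusted R^2 is higher than that of the model with k_old variables."""
    return 1 - (1 - r_squared_old) * (n - k_old - 2) / (n - k_old - 1)

# Hypothetical values, not taken from any data set in this section.
n, k_old, r2_old = 25, 2, 0.50
threshold = min_r_squared_to_improve(r2_old, n, k_old)
print(f"R^2 must exceed {threshold:.4f}")

# A small gain in R^2 lowers adjusted R^2; a larger gain raises it.
before = adjusted_r_squared(r2_old, n, k_old)
for r2_new in (0.51, 0.55):
    after = adjusted_r_squared(r2_new, n, k_old + 1)
    verdict = "improves" if after > before else "does not improve"
    print(f"R^2 = {r2_new}: adjusted R^2 {verdict}")
```

With these hypothetical values the threshold is about 0.5227, so a rise in [latex]R^2[/latex] from 0.50 to 0.51 lowers the adjusted value, while a rise to 0.55 increases it.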


Exercises

  1. A local restaurant advocacy group wants to study the relationship between a restaurant’s average weekly profit, the restaurant’s seating capacity, and the average daily traffic that passes the restaurant’s location.  The group took a sample of restaurants and recorded their average weekly profit (in [latex]\$1000[/latex]s), the restaurant’s seating capacity, and the average number of cars (in [latex]1000[/latex]s) that pass the restaurant’s location each day.  The data is recorded in the following table:
    Seating Capacity Traffic Count ([latex]1000[/latex]s) Weekly Net Profit ([latex]\$1000[/latex]s)
    120 19 23.8
    180 8 29.2
    150 12 22
    180 15 26.2
    220 16 33.5
    235 10 32
    115 18 22.4
    110 12 20.4
    165 21 23.7
    220 20 34.7
    140 24 27.1
    145 24 23.3
    140 13 20.9
    200 14 29.6
    210 14 31.4
    175 12 23.2
    175 15 31.1
    190 17 28.2
    100 23 25.2
    145 20 20.7
    135 13 37.2
    25 13 26.3
    140 25 20
    130 14 28.2
    135 10 24.6
    160 23 23.7

    In Question 1 of Section 13.1, we found the regression model to predict the average weekly profit from other variables.

    1. Find the adjusted coefficient of multiple determination for the regression model.
    2. Interpret the adjusted coefficient of multiple determination.
    Answer
    1. [latex]0.2250[/latex]
    2. [latex]22.50\%[/latex] of the variation in the average weekly profit can be explained by the regression model based on seating capacity and traffic count.

     

  2. A local university wants to study the relationship between a student’s GPA, the average number of hours they spend studying each night, and the average number of nights they go out each week.  The university took a sample of students and recorded the following data:
    GPA Average Number of Hours Spent Studying Each Night Average Number of Nights Go Out Each Week
    3.72 5 1
    3.88 3 1
    3.67 2 1
    3.87 3 4
    2.49 1 4
    1.29 1 2
    1.01 2 4
    2.12 1 1
    1.9 1 5
    3.42 3 2
    1.33 1 4
    1.07 0 2
    2.75 3 1
    3.82 4 1
    3.91 5 0
    2.25 2 3
    2.06 1 5
    2.92 3 2
    3.06 3 1
    3.65 2 2
    3.69 4 1

    In Question 2 of Section 13.1, we found the regression model to predict GPA from other variables.

    1. Find the adjusted coefficient of multiple determination for the regression model.
    2. Interpret the adjusted coefficient of multiple determination.
    Answer
    1. [latex]0.5833[/latex]
    2. [latex]58.33\%[/latex] of the variation in GPA can be explained by the regression model based on the average number of hours spent studying a night and the average number of nights a student goes out each week.

     

  3. A very large company wants to study the relationship between the salaries of employees in management positions, their age, the number of years the employee spent in college, and the number of years the employee has been with the company.  A sample of management employees is taken, and the data is recorded below:
    Age Years of College Years with Company Salary ([latex]\$1000[/latex]s)
    60 8 29 317.3
    33 3 5 97.3
    57 6 27 263.1
    32 4 5 101.3
    31 6 3 114.2
    61 8 19 350.4
    41 7 8 146.9
    35 4 2 91.7
    51 6 21 198.2
    50 8 10 196.5
    57 5 15 105.7
    49 6 18 118.3
    62 7 27 305.2
    52 8 26 239.9
    39 4 8 145.9
    42 7 5 175.4
    62 4 24 219.4
    60 4 22 202.1
    65 3 21 196.3
    40 4 10 143.9
    62 6 29 408.7
    53 7 5 145.2
    48 8 5 175.1
    61 5 6 152.7
    38 7 3 99.7
    40 7 12 174.9
    45 7 7 149.2
    58 7 14 282.8
    38 4 3 95.7
    41 5 18 232.8

    In Question 3 of Section 13.1, we found the regression model to predict salary from other variables.

    1. Find the adjusted coefficient of multiple determination for the regression model.
    2. Interpret the adjusted coefficient of multiple determination.
    Answer
    1. [latex]0.6959[/latex]
    2. [latex]69.59\%[/latex] of the variation in salary can be explained by the regression model based on age, years of college, and years with the company.

     


“13.4 Coefficient of Multiple Determination” and “13.8 Exercises” from Introduction to Statistics by Valerie Watts are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

License

Introduction to Statistics - Second Edition Copyright © 2025 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
