13.4 Coefficient of Multiple Determination

LEARNING OBJECTIVES

  • Calculate and interpret the coefficient of multiple determination.

Previously, we learned about the coefficient of determination, [latex]r^2[/latex], for simple linear regression, which is the proportion of variation in the dependent variable that can be explained by the simple linear regression model based on the independent variable.  The coefficient of determination is a good way to measure how well the simple linear regression model fits the data.

Coefficient of Multiple Determination

The coefficient of multiple determination, denoted [latex]R^2[/latex], in multiple regression is similar to the coefficient of determination in simple linear regression, except in multiple regression there is more than one independent variable.  The coefficient of multiple determination is the proportion of variation in the dependent variable that can be explained by the multiple regression model based on the independent variables.

The value of the coefficient of multiple determination is found on the regression summary table, which we learned how to generate in Excel in a previous section.  We interpret the coefficient of multiple determination in the same way that we interpret the coefficient of determination for simple linear regression.
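The definition above says that [latex]R^2[/latex] is the share of the total variation in the dependent variable that the model explains.  The minimal Python sketch below makes that concrete.  It assumes you already have the observed values and the model's predicted values.  The function name `r_squared` is hypothetical and is not an Excel or library function.

```python
def r_squared(y, y_hat):
    """Coefficient of (multiple) determination: 1 - SSE / SST.

    y:     observed values of the dependent variable
    y_hat: values predicted by the regression model
    """
    if len(y) != len(y_hat) or len(y) < 2:
        raise ValueError("y and y_hat must have the same length (at least 2)")
    y_bar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - sse / sst


print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect predictions: prints 1.0
print(r_squared([1, 2, 3], [2, 2, 2]))  # always predicting the mean: prints 0.0
```

For a least-squares fit that includes an intercept, this value equals the regression sum of squares divided by the total sum of squares.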

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s)
4 | 3 | 23 | 60
5 | 8 | 32 | 114
2 | 9 | 28 | 45
6 | 4 | 60 | 187
7 | 3 | 62 | 175
8 | 1 | 43 | 125
7 | 6 | 60 | 93
3 | 3 | 37 | 57
5 | 2 | 24 | 47
5 | 5 | 64 | 128
7 | 2 | 28 | 66
8 | 1 | 66 | 146
5 | 7 | 35 | 89
2 | 5 | 37 | 56
4 | 0 | 59 | 65
6 | 2 | 32 | 95
5 | 6 | 76 | 82
7 | 5 | 25 | 90
9 | 0 | 55 | 137
8 | 3 | 34 | 91
7 | 5 | 54 | 184
9 | 1 | 57 | 60
7 | 0 | 68 | 39
10 | 2 | 66 | 187
5 | 0 | 50 | 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
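The coefficients in this equation came from Excel.  As a cross-check, the sketch below refits the same model to the 25 rows in the table by ordinary least squares.  It solves the normal equations [latex]X^{T}Xb = X^{T}y[/latex] with plain Gaussian elimination, so no third-party library is assumed.  The names `DATA`, `solve`, and `fit_ols` are hypothetical.  This illustrates least squares; it is not necessarily the method Excel uses internally.

```python
# Each row: (job satisfaction, hours of unpaid work per week, age, income in $1000s)
DATA = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimizing the squared error; b0 is the intercept."""
    X = [[1.0, *r] for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


y = [row[0] for row in DATA]
rows = [row[1:] for row in DATA]  # hours, age, income
b = fit_ols(rows, y)
print([round(v, 4) for v in b])
```

Up to floating-point rounding, the printed list should match the coefficients in the equation above.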

  1. Find the coefficient of multiple determination.
  2. Interpret the coefficient of multiple determination.

Solution: 

  1. The regression summary table generated by Excel is shown below:
    SUMMARY OUTPUT
    Regression Statistics
    Multiple R | 0.711779225
    R Square | 0.506629665
    Adjusted R Square | 0.436148189
    Standard Error | 1.585212784
    Observations | 25
    ANOVA
    | df | SS | MS | F | Significance F
    Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189
    Residual | 21 | 52.770891 | 2.512899571
    Total | 24 | 106.96
    | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
    Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027
    Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671
    Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864
    Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013

    The coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the R Square row.  The value of the coefficient of multiple determination is [latex]R^2=0.5066[/latex].

  2. 50.66% of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
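As a check on the Excel value, [latex]R^2[/latex] is also the regression sum of squares (the SS entry in the Regression row of the ANOVA part of the table) divided by the total sum of squares (the SS entry in the Total row):

[latex]\displaystyle{R^2 = \frac{SS_{\mbox{Regression}}}{SS_{\mbox{Total}}} = \frac{54.189109}{106.96} \approx 0.5066}[/latex]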

Adjusted Coefficient of Multiple Determination

The value of the coefficient of multiple determination never decreases when more independent variables are added to the model, and it almost always increases, even if the new independent variable has no relationship with the dependent variable.  As a result, [latex]R^2[/latex] is inflated when the additional independent variables do not add any meaningful information about the dependent variable.  Consequently, [latex]R^2[/latex] overestimates the contribution of the independent variables when new independent variables are added to the model.
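The claim that [latex]R^2[/latex] cannot fall when a variable is added can be checked numerically.  The sketch below is a hypothetical illustration, not part of the original example.  It appends a column of random numbers, which by construction has nothing to do with job satisfaction, to the three predictors and refits by least squares.  The helper names and the random seed are assumptions.

```python
import random

# Each row: (job satisfaction, hours of unpaid work per week, age, income in $1000s)
DATA = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, b):
    """1 - SSE/SST for the fitted coefficients b."""
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


y = [row[0] for row in DATA]
base = [row[1:] for row in DATA]                # the three original predictors
rng = random.Random(2022)                       # hypothetical fixed seed, for repeatability
padded = [(*r, rng.gauss(0, 1)) for r in base]  # add an unrelated 4th predictor

r2_base = r_squared(base, y, fit_ols(base, y))
r2_padded = r_squared(padded, y, fit_ols(padded, y))
print(f"R^2, 3 predictors:         {r2_base:.4f}")
print(f"R^2, 3 predictors + noise: {r2_padded:.4f}")
```

The larger model can always reproduce the smaller one by setting the noise coefficient to zero, so its least-squares SSE cannot be larger.  That is why [latex]R^2[/latex] cannot decrease, no matter which random column is drawn.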

Instead, we use the adjusted coefficient of multiple determination, denoted [latex]adjusted \; R^2[/latex], which corrects for this overestimation.  The adjusted coefficient of multiple determination adjusts the value of [latex]R^2[/latex] to account for the number of independent variables in the model, so that it does not overstate the impact of adding independent variables to the model.  It is interpreted in the same way as the coefficient of multiple determination.

The adjusted coefficient of multiple determination is calculated from the value of [latex]R^2[/latex]:

[latex]\displaystyle{adjusted \; R^2  =  1-\left( \frac{(n-1) \times (1-R^2)}{n-k-1}\right)}[/latex]

where [latex]n[/latex] is the number of observations and [latex]k[/latex] is the number of independent variables.  Although we can calculate the adjusted coefficient of multiple determination using the above formula, its value is also given in the regression summary table.
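The formula translates directly into code.  The minimal Python sketch below checks it against the R Square value from the job satisfaction example.  The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (n - 1)(1 - R^2) / (n - k - 1).

    r2: coefficient of multiple determination
    n:  number of observations
    k:  number of independent variables
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables plus one")
    return 1 - (n - 1) * (1 - r2) / (n - k - 1)


# Job satisfaction example: R^2 = 0.506629665, n = 25 observations, k = 3 variables
print(round(adjusted_r2(0.506629665, n=25, k=3), 4))  # prints 0.4361
```

Because the denominator [latex]n-k-1[/latex] shrinks as [latex]k[/latex] grows, a new variable raises the adjusted value only if it reduces [latex]1-R^2[/latex] enough to offset the lost degree of freedom.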

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s)
4 | 3 | 23 | 60
5 | 8 | 32 | 114
2 | 9 | 28 | 45
6 | 4 | 60 | 187
7 | 3 | 62 | 175
8 | 1 | 43 | 125
7 | 6 | 60 | 93
3 | 3 | 37 | 57
5 | 2 | 24 | 47
5 | 5 | 64 | 128
7 | 2 | 28 | 66
8 | 1 | 66 | 146
5 | 7 | 35 | 89
2 | 5 | 37 | 56
4 | 0 | 59 | 65
6 | 2 | 32 | 95
5 | 6 | 76 | 82
7 | 5 | 25 | 90
9 | 0 | 55 | 137
8 | 3 | 34 | 91
7 | 5 | 54 | 184
9 | 1 | 57 | 60
7 | 0 | 68 | 39
10 | 2 | 66 | 187
5 | 0 | 50 | 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]

  1. Find the adjusted coefficient of multiple determination.
  2. Interpret the adjusted coefficient of multiple determination.

Solution: 

  1. The regression summary table generated by Excel is shown below:
    SUMMARY OUTPUT
    Regression Statistics
    Multiple R | 0.711779225
    R Square | 0.506629665
    Adjusted R Square | 0.436148189
    Standard Error | 1.585212784
    Observations | 25
    ANOVA
    | df | SS | MS | F | Significance F
    Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189
    Residual | 21 | 52.770891 | 2.512899571
    Total | 24 | 106.96
    | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
    Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027
    Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671
    Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864
    Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013

    The adjusted coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the Adjusted R Square row.  The value of the adjusted coefficient of multiple determination is [latex]adjusted \; R^2=0.4361[/latex].

  2. 43.61% of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
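The same value follows from the adjusted [latex]R^2[/latex] formula with [latex]n=25[/latex] observations, [latex]k=3[/latex] independent variables, and [latex]R^2=0.506629665[/latex]:

[latex]\displaystyle{adjusted \; R^2 = 1-\left( \frac{(25-1) \times (1-0.506629665)}{25-3-1}\right) = 1-\frac{11.84088804}{21} \approx 0.4361}[/latex]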

If adding a new independent variable increases the value of the adjusted coefficient of multiple determination, this indicates that the regression model has improved as a result of adding the new independent variable.  But if adding a new independent variable decreases the value of the adjusted coefficient of multiple determination, then the added independent variable has not improved the overall regression model, and it should not be added to the model.
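The rule above can be tried on the job satisfaction data.  In the Excel output, Age had a large p-value (0.8439), so one natural comparison is the full three-variable model against a model that leaves Age out.  The sketch below refits both models by least squares and compares [latex]R^2[/latex] with [latex]adjusted \; R^2[/latex].  The helper names are hypothetical, and the solver is a plain textbook method, not Excel's.

```python
# Each row: (job satisfaction, hours of unpaid work per week, age, income in $1000s)
DATA = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(rows, y, b):
    """1 - SSE/SST for the fitted coefficients b."""
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1)(1 - R^2) / (n - k - 1)."""
    return 1 - (n - 1) * (1 - r2) / (n - k - 1)


y = [row[0] for row in DATA]
models = {
    "hours + age + income": [row[1:] for row in DATA],
    "hours + income": [(row[1], row[3]) for row in DATA],  # Age left out
}
results = {}
for name, rows in models.items():
    r2 = r_squared(rows, y, fit_ols(rows, y))
    adj = adjusted_r2(r2, n=len(y), k=len(rows[0]))
    results[name] = (r2, adj)
    print(f"{name}: R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

On this data, dropping Age lowers [latex]R^2[/latex] slightly, as it must, but raises [latex]adjusted \; R^2[/latex].  By the rule above, Age does not appear to improve the model.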


Concept Review

The coefficient of multiple determination, [latex]R^2[/latex], is the proportion of variation in the dependent variable that can be explained by the multiple regression model based on the independent variables.  However, adding more independent variables to the model never decreases the value of [latex]R^2[/latex] and almost always increases it, whether or not the added independent variables are actually related to the dependent variable.  The adjusted coefficient of multiple determination, [latex]adjusted \; R^2[/latex], corrects for this overestimation of [latex]R^2[/latex] by accounting for the number of independent variables in the model.

License


Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.