13.4 Coefficient of Multiple Determination
LEARNING OBJECTIVES
- Calculate and interpret the coefficient of multiple determination.
Previously, we learned about the coefficient of determination, [latex]r^2[/latex], for simple linear regression, which is the proportion of variation in the dependent variable that can be explained by the simple linear regression model based on the independent variable. The coefficient of determination is a good way to measure how well the simple linear regression model fits the data.
Coefficient of Multiple Determination
The coefficient of multiple determination, denoted [latex]R^2[/latex], in multiple regression is similar to the coefficient of determination in simple linear regression, except in multiple regression there is more than one independent variable. The coefficient of multiple determination is the proportion of variation in the dependent variable that can be explained by the multiple regression model based on the independent variables.
The value of the coefficient of multiple determination is found on the regression summary table, which we learned how to generate in Excel in a previous section. We interpret the coefficient of multiple determination in the same way that we interpret the coefficient of determination for simple linear regression.
EXAMPLE
The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.
Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
4 | 3 | 23 | 60 |
5 | 8 | 32 | 114 |
2 | 9 | 28 | 45 |
6 | 4 | 60 | 187 |
7 | 3 | 62 | 175 |
8 | 1 | 43 | 125 |
7 | 6 | 60 | 93 |
3 | 3 | 37 | 57 |
5 | 2 | 24 | 47 |
5 | 5 | 64 | 128 |
7 | 2 | 28 | 66 |
8 | 1 | 66 | 146 |
5 | 7 | 35 | 89 |
2 | 5 | 37 | 56 |
4 | 0 | 59 | 65 |
6 | 2 | 32 | 95 |
5 | 6 | 76 | 82 |
7 | 5 | 25 | 90 |
9 | 0 | 55 | 137 |
8 | 3 | 34 | 91 |
7 | 5 | 54 | 184 |
9 | 1 | 57 | 60 |
7 | 0 | 68 | 39 |
10 | 2 | 66 | 187 |
5 | 0 | 50 | 49 |
Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:
[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
- Find the coefficient of multiple determination.
- Interpret the coefficient of multiple determination.
Solution:
- The regression summary table generated by Excel is shown below:
SUMMARY OUTPUT
Regression Statistics | Value |
Multiple R | 0.711779225 |
R Square | 0.506629665 |
Adjusted R Square | 0.436148189 |
Standard Error | 1.585212784 |
Observations | 25 |
ANOVA | df | SS | MS | F | Significance F |
Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 |
Residual | 21 | 52.770891 | 2.512899571 | | |
Total | 24 | 106.96 | | | |
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |
The coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the R Square row. The value of the coefficient of multiple determination is [latex]R^2=0.5066[/latex].
- 50.66% of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
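The R Square value can also be cross-checked against the ANOVA part of the same summary table. The sketch below assumes the standard definition of [latex]R^2[/latex] as the regression sum of squares divided by the total sum of squares, which this section does not state explicitly. The variable names are illustrative only.

```python
# Sums of squares copied from the SS column of the ANOVA table above.
ss_regression = 54.189109   # variation explained by the regression model
ss_residual = 52.770891     # variation left unexplained
ss_total = 106.96           # total variation in the job satisfaction score

# R^2 as the explained share of the total variation (assumed standard definition).
r_squared = ss_regression / ss_total

# Equivalent form: one minus the unexplained share.
r_squared_alt = 1 - ss_residual / ss_total

print(round(r_squared, 4))      # 0.5066
print(round(r_squared_alt, 4))  # 0.5066
```

Both forms agree with the R Square row of the Excel output, as expected, because the regression and residual sums of squares add up to the total.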
Adjusted Coefficient of Multiple Determination
The value of the coefficient of multiple determination never decreases, and in practice almost always increases, as more independent variables are added to the model, even if a new independent variable has no relationship with the dependent variable. As a result, the coefficient of multiple determination is inflated when additional independent variables do not add any meaningful information about the dependent variable, and it overestimates the contribution of the independent variables when new independent variables are added to the model.
Instead, we use the adjusted coefficient of multiple determination, denoted [latex]adjusted \; R^2[/latex], which adjusts the value of [latex]R^2[/latex] to account for the number of independent variables in the model. This corrects the overestimation that occurs when new independent variables are added to the model. The adjusted coefficient of multiple determination is interpreted in the same way as the coefficient of multiple determination.
The adjusted coefficient of multiple determination is calculated from the value of [latex]R^2[/latex]:
[latex]\displaystyle{adjusted \; R^2 = 1-\left( \frac{(n-1) \times (1-R^2)}{n-k-1}\right)}[/latex]
where [latex]n[/latex] is the number of observations and [latex]k[/latex] is the number of independent variables. Although we can find the value of the adjusted coefficient of multiple determination using the above formula, the value of the adjusted coefficient of multiple determination is also found on the regression summary table.
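As a minimal sketch of the formula above, the function below computes the adjusted value from [latex]R^2[/latex], [latex]n[/latex], and [latex]k[/latex]. The function name `adjusted_r_squared` is hypothetical, and the inputs are taken from the job satisfaction example earlier in this section (25 employees, 3 independent variables, R Square from the Excel output).

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 from R^2, n observations, and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("the formula needs n greater than k + 1")
    return 1 - ((n - 1) * (1 - r_squared)) / (n - k - 1)


# Job satisfaction example: n = 25 observations, k = 3 independent variables.
value = adjusted_r_squared(0.506629665, n=25, k=3)
print(round(value, 4))  # 0.4361
```

Because [latex]n-1[/latex] is larger than [latex]n-k-1[/latex] whenever [latex]k \geq 1[/latex], the adjusted value is never larger than [latex]R^2[/latex] itself, and the gap widens as more independent variables are added for a fixed sample size.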
EXAMPLE
The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.
Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
4 | 3 | 23 | 60 |
5 | 8 | 32 | 114 |
2 | 9 | 28 | 45 |
6 | 4 | 60 | 187 |
7 | 3 | 62 | 175 |
8 | 1 | 43 | 125 |
7 | 6 | 60 | 93 |
3 | 3 | 37 | 57 |
5 | 2 | 24 | 47 |
5 | 5 | 64 | 128 |
7 | 2 | 28 | 66 |
8 | 1 | 66 | 146 |
5 | 7 | 35 | 89 |
2 | 5 | 37 | 56 |
4 | 0 | 59 | 65 |
6 | 2 | 32 | 95 |
5 | 6 | 76 | 82 |
7 | 5 | 25 | 90 |
9 | 0 | 55 | 137 |
8 | 3 | 34 | 91 |
7 | 5 | 54 | 184 |
9 | 1 | 57 | 60 |
7 | 0 | 68 | 39 |
10 | 2 | 66 | 187 |
5 | 0 | 50 | 49 |
Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:
[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
- Find the adjusted coefficient of multiple determination.
- Interpret the adjusted coefficient of multiple determination.
Solution:
- The regression summary table generated by Excel is shown below:
SUMMARY OUTPUT
Regression Statistics | Value |
Multiple R | 0.711779225 |
R Square | 0.506629665 |
Adjusted R Square | 0.436148189 |
Standard Error | 1.585212784 |
Observations | 25 |
ANOVA | df | SS | MS | F | Significance F |
Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 |
Residual | 21 | 52.770891 | 2.512899571 | | |
Total | 24 | 106.96 | | | |
Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |
The adjusted coefficient of multiple determination for the regression model is in the top part of the table, under the Regression Statistics heading in the Adjusted R Square row. The value of the adjusted coefficient of multiple determination is [latex]adjusted \; R^2=0.4361[/latex].
- After adjusting for the number of independent variables in the model, 43.61% of the variation in the job satisfaction score can be explained by the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
If the addition of a new independent variable increases the value of the adjusted coefficient of multiple determination, then it is an indication that the regression model has improved as a result of adding the new independent variable. But if the addition of a new independent variable decreases the value of the adjusted coefficient of multiple determination, then the added independent variable has not improved the overall regression model. In such cases, the new independent variable should not be added to the model.
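To illustrate this comparison, the sketch below fits the job satisfaction model twice, once without age and once with age, and reports both measures for each fit. It is an illustration under stated assumptions, not the method used in this book, which relies on Excel: the least squares fit is done by solving the normal equations in plain Python, and the helper names `solve` and `fit_r_squared` are hypothetical.

```python
# Each row: (job satisfaction, unpaid hours per week, age, income in $1000s),
# copied from the example data table in this section.
DATA = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_r_squared(columns):
    """Least squares fit on the chosen DATA columns; returns (R^2, adjusted R^2)."""
    y = [row[0] for row in DATA]
    xs = [[1.0] + [float(row[c]) for c in columns] for row in DATA]  # 1.0 = intercept
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(xs, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    predictions = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    r2 = 1 - ss_residual / ss_total
    n, k = len(y), len(columns)
    adjusted = 1 - ((n - 1) * (1 - r2)) / (n - k - 1)
    return r2, adjusted


r2_without_age, adj_without_age = fit_r_squared([1, 3])     # hours, income
r2_with_age, adj_with_age = fit_r_squared([1, 2, 3])        # hours, age, income

print(f"without age: R^2 = {r2_without_age:.4f}, adjusted R^2 = {adj_without_age:.4f}")
print(f"with age:    R^2 = {r2_with_age:.4f}, adjusted R^2 = {adj_with_age:.4f}")
```

If the fit reproduces the Excel output, the "with age" line should report roughly 0.5066 and 0.4361. Adding age cannot lower [latex]R^2[/latex], but because age contributes very little here (its P-value in the summary table is about 0.84), the adjusted value with age should come out lower than the adjusted value without it, which is the situation described above in which the new variable should not be added.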
Concept Review
The coefficient of multiple determination, [latex]R^2[/latex], is the proportion of variation in the dependent variable that can be explained by the multiple regression model based on the independent variables. However, adding more independent variables to the model never causes the value of [latex]R^2[/latex] to decrease, and almost always causes it to increase, whether or not the added independent variables are actually related to the dependent variable. The adjusted coefficient of multiple determination, [latex]adjusted \; R^2[/latex], corrects for this overestimation by accounting for the number of independent variables in the model.