13.6 Testing the Regression Coefficients
LEARNING OBJECTIVES
- Conduct and interpret a hypothesis test on individual regression coefficients.
Previously, we learned that the population model for the multiple regression equation is
[latex]\begin{eqnarray*} y & = & \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_kx_k +\epsilon \end{eqnarray*}[/latex]
where [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex] are the population regression coefficients, and [latex]\epsilon[/latex] is the error variable. In multiple regression, we estimate each population regression coefficient [latex]\beta_i[/latex] with the sample regression coefficient [latex]b_i[/latex].
In the previous section, we learned how to conduct an overall model test to determine if the regression model is valid. If the outcome of the overall model test is that the model is valid, then at least one of the independent variables is related to the dependent variable—in other words, at least one of the regression coefficients [latex]\beta_i[/latex] is not zero. However, the overall model test does not tell us which independent variables are related to the dependent variable. To determine which independent variables are related to the dependent variable, we must test each of the regression coefficients.
Testing the Regression Coefficients
For an individual regression coefficient, we want to test if there is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex].
- No Relationship. There is no relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex]. In this case, the regression coefficient [latex]\beta_i[/latex] is zero. This is the claim for the null hypothesis in an individual regression coefficient test: [latex]H_0: \beta_i=0[/latex].
- Relationship. There is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex]. In this case, the regression coefficient [latex]\beta_i[/latex] is not zero. This is the claim for the alternative hypothesis in an individual regression coefficient test: [latex]H_a: \beta_i \neq 0[/latex]. We are not interested in whether the regression coefficient [latex]\beta_i[/latex] is positive or negative, only in whether it is not zero. Showing that the coefficient is not zero is enough to demonstrate that there is a relationship between the dependent variable and the independent variable. This makes the test on a regression coefficient a two-tailed test.
In order to conduct a hypothesis test on an individual regression coefficient [latex]\beta_i[/latex], we need to use the distribution of the sample regression coefficient [latex]b_i[/latex]:
- The mean of the distribution of the sample regression coefficient is the population regression coefficient [latex]\beta_i[/latex].
- The standard deviation of the distribution of the sample regression coefficient is [latex]\sigma_{b_i}[/latex]. Because we do not know the population standard deviation, we must estimate [latex]\sigma_{b_i}[/latex] with the sample standard deviation [latex]s_{b_i}[/latex].
- The distribution of the sample regression coefficient follows a normal distribution. Because we estimate [latex]\sigma_{b_i}[/latex] with the sample standard deviation [latex]s_{b_i}[/latex], the test statistic follows a [latex]t[/latex]-distribution.
Steps to Conduct a Hypothesis Test on a Regression Coefficient
- Write down the null hypothesis that there is no relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex]:
[latex]\begin{eqnarray*} H_0: & & \beta_i=0 \\ \\ \end{eqnarray*}[/latex]
- Write down the alternative hypothesis that there is a relationship between the dependent variable [latex]y[/latex] and the independent variable [latex]x_i[/latex]:
[latex]\begin{eqnarray*} H_a: & & \beta_i \neq 0 \\ \\ \end{eqnarray*}[/latex]
- Collect the sample information for the test and identify the significance level [latex]\alpha[/latex].
- The p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution. The [latex]t[/latex]-score (with [latex]\beta_i=0[/latex] taken from the null hypothesis) and degrees of freedom are
[latex]\begin{eqnarray*}t & = & \frac{b_i-\beta_i}{s_{b_i}} \\ \\ df & = & n-k-1 \\ \\ \end{eqnarray*}[/latex]
- Compare the p-value to the significance level and state the outcome of the test:
- If p-value[latex]\leq \alpha[/latex], reject [latex]H_0[/latex] in favour of [latex]H_a[/latex].
- The results of the sample data are significant. There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
- If p-value[latex]\gt \alpha[/latex], do not reject [latex]H_0[/latex].
- The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
- Write down a concluding sentence specific to the context of the question.
The required [latex]t[/latex]-score and p-value for the test can be found in the regression summary table, which we learned how to generate in Excel in a previous section.
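Although Excel reports the [latex]t[/latex]-score and p-value directly, the calculation behind them can be sketched in a few lines. In the sketch below, the coefficient and standard error are hypothetical values chosen only for illustration, and the scipy library is assumed to be available:

```python
from scipy import stats

# Hypothetical sample values (for illustration only, not from any data set)
b_i = -0.35      # sample regression coefficient
s_bi = 0.13      # sample standard deviation (standard error) of b_i
n = 25           # sample size
k = 3            # number of independent variables

# t-score, with beta_i = 0 taken from the null hypothesis
t_score = (b_i - 0) / s_bi
df = n - k - 1

# Two-tailed p-value: the sum of the area in both tails of the t-distribution
p_value = 2 * stats.t.sf(abs(t_score), df)
```

Because the alternative hypothesis is two-tailed, the one-tail area from `stats.t.sf` is doubled.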
EXAMPLE
The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.
Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
4 | 3 | 23 | 60 |
5 | 8 | 32 | 114 |
2 | 9 | 28 | 45 |
6 | 4 | 60 | 187 |
7 | 3 | 62 | 175 |
8 | 1 | 43 | 125 |
7 | 6 | 60 | 93 |
3 | 3 | 37 | 57 |
5 | 2 | 24 | 47 |
5 | 5 | 64 | 128 |
7 | 2 | 28 | 66 |
8 | 1 | 66 | 146 |
5 | 7 | 35 | 89 |
2 | 5 | 37 | 56 |
4 | 0 | 59 | 65 |
6 | 2 | 32 | 95 |
5 | 6 | 76 | 82 |
7 | 5 | 25 | 90 |
9 | 0 | 55 | 137 |
8 | 3 | 34 | 91 |
7 | 5 | 54 | 184 |
9 | 1 | 57 | 60 |
7 | 0 | 68 | 39 |
10 | 2 | 66 | 187 |
5 | 0 | 50 | 49 |
Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:
[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week”.
Solution:
Hypotheses:
[latex]\begin{eqnarray*} H_0: & & \beta_1=0 \\ H_a: & & \beta_1 \neq 0 \end{eqnarray*}[/latex]
p-value:
The regression summary table generated by Excel is shown below:
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.711779225 | |||||
R Square | 0.506629665 | |||||
Adjusted R Square | 0.436148189 | |||||
Standard Error | 1.585212784 | |||||
Observations | 25 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 | |
Residual | 21 | 52.770891 | 2.512899571 | |||
Total | 24 | 106.96 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |
The p-value for the test on the hours of unpaid work per week regression coefficient is in the bottom part of the table under the P-value column of the Hours of Unpaid Work per Week row. So the p-value=[latex]0.0082[/latex].
Conclusion:
Because p-value[latex]=0.0082 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.”
NOTES
- The null hypothesis [latex]\beta_1=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is zero. That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
- The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_1[/latex] is not zero. The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “hours of unpaid work per week.”
- When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested. Here the subscript on [latex]\beta[/latex] is 1 because the “hours of unpaid work per week” is defined as [latex]x_1[/latex] in the regression model.
- The p-values for the tests on the regression coefficients are located in the bottom part of the table under the P-value column heading, in the corresponding independent variable row.
- Because the alternative hypothesis is a [latex]\neq[/latex], the p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution. This is the value calculated by Excel in the regression summary table.
- The p-value of 0.0082 is a small probability compared to the significance level, which means sample results at least this extreme are unlikely to occur if the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis. In other words, the regression coefficient [latex]\beta_1[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “hours of unpaid work per week.” This means that the independent variable “hours of unpaid work per week” is useful in predicting the dependent variable.
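The t Stat and P-value entries in the Hours of Unpaid Work per Week row can be reproduced from the Coefficients and Standard Error columns of the summary table. A quick check in Python, assuming the scipy library is available:

```python
from scipy import stats

# Values taken from the regression summary table
b_1 = -0.38184722      # coefficient for hours of unpaid work per week
s_b1 = 0.130750479     # its standard error
df = 25 - 3 - 1        # n - k - 1 = 21

t_score = b_1 / s_b1                         # beta_1 = 0 under H0
p_value = 2 * stats.t.sf(abs(t_score), df)   # two-tailed p-value

print(round(t_score, 4), round(p_value, 4))  # prints -2.9204 0.0082
```

This matches the t Stat (-2.9204269) and P-value (0.008177146) reported by Excel.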
EXAMPLE
The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.
Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
4 | 3 | 23 | 60 |
5 | 8 | 32 | 114 |
2 | 9 | 28 | 45 |
6 | 4 | 60 | 187 |
7 | 3 | 62 | 175 |
8 | 1 | 43 | 125 |
7 | 6 | 60 | 93 |
3 | 3 | 37 | 57 |
5 | 2 | 24 | 47 |
5 | 5 | 64 | 128 |
7 | 2 | 28 | 66 |
8 | 1 | 66 | 146 |
5 | 7 | 35 | 89 |
2 | 5 | 37 | 56 |
4 | 0 | 59 | 65 |
6 | 2 | 32 | 95 |
5 | 6 | 76 | 82 |
7 | 5 | 25 | 90 |
9 | 0 | 55 | 137 |
8 | 3 | 34 | 91 |
7 | 5 | 54 | 184 |
9 | 1 | 57 | 60 |
7 | 0 | 68 | 39 |
10 | 2 | 66 | 187 |
5 | 0 | 50 | 49 |
Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:
[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “age”.
Solution:
Hypotheses:
[latex]\begin{eqnarray*} H_0: & & \beta_2=0 \\ H_a: & & \beta_2 \neq 0 \end{eqnarray*}[/latex]
p-value:
The regression summary table generated by Excel is shown below:
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.711779225 | |||||
R Square | 0.506629665 | |||||
Adjusted R Square | 0.436148189 | |||||
Standard Error | 1.585212784 | |||||
Observations | 25 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 | |
Residual | 21 | 52.770891 | 2.512899571 | |||
Total | 24 | 106.96 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |
The p-value for the test on the age regression coefficient is in the bottom part of the table under the P-value column of the Age row. So the p-value=[latex]0.8439[/latex].
Conclusion:
Because p-value[latex]=0.8439 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis. At the 5% significance level there is not enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “age.”
NOTES
- The null hypothesis [latex]\beta_2=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is zero. That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “age.”
- The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_2[/latex] is not zero. The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “age.”
- When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested. Here the subscript on [latex]\beta[/latex] is 2 because “age” is defined as [latex]x_2[/latex] in the regression model.
- The p-value of 0.8439 is a large probability compared to the significance level, which means sample results like these are quite likely to occur if the null hypothesis is true. The sample data gives us no reason to doubt the null hypothesis, and so the conclusion of the test is to not reject the null hypothesis. In other words, there is no evidence that the regression coefficient [latex]\beta_2[/latex] is different from zero, and so no evidence of a relationship between the dependent variable “job satisfaction” and the independent variable “age.” This means that the independent variable “age” is not particularly useful in predicting the dependent variable. Note that failing to reject the null hypothesis does not prove that [latex]\beta_2[/latex] is exactly zero; it only means the sample does not provide evidence that [latex]\beta_2[/latex] is different from zero.
EXAMPLE
The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.
Job Satisfaction | Hours of Unpaid Work per Week | Age | Income ($1000s) |
4 | 3 | 23 | 60 |
5 | 8 | 32 | 114 |
2 | 9 | 28 | 45 |
6 | 4 | 60 | 187 |
7 | 3 | 62 | 175 |
8 | 1 | 43 | 125 |
7 | 6 | 60 | 93 |
3 | 3 | 37 | 57 |
5 | 2 | 24 | 47 |
5 | 5 | 64 | 128 |
7 | 2 | 28 | 66 |
8 | 1 | 66 | 146 |
5 | 7 | 35 | 89 |
2 | 5 | 37 | 56 |
4 | 0 | 59 | 65 |
6 | 2 | 32 | 95 |
5 | 6 | 76 | 82 |
7 | 5 | 25 | 90 |
9 | 0 | 55 | 137 |
8 | 3 | 34 | 91 |
7 | 5 | 54 | 184 |
9 | 1 | 57 | 60 |
7 | 0 | 68 | 39 |
10 | 2 | 66 | 187 |
5 | 0 | 50 | 49 |
Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:
[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]
At the 5% significance level, test the relationship between the dependent variable “job satisfaction” and the independent variable “income”.
Solution:
Hypotheses:
[latex]\begin{eqnarray*} H_0: & & \beta_3=0 \\ H_a: & & \beta_3 \neq 0 \end{eqnarray*}[/latex]
p-value:
The regression summary table generated by Excel is shown below:
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.711779225 | |||||
R Square | 0.506629665 | |||||
Adjusted R Square | 0.436148189 | |||||
Standard Error | 1.585212784 | |||||
Observations | 25 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 3 | 54.189109 | 18.06303633 | 7.18812504 | 0.001683189 | |
Residual | 21 | 52.770891 | 2.512899571 | |||
Total | 24 | 106.96 | ||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | 4.799258185 | 1.197185164 | 4.008785216 | 0.00063622 | 2.309575344 | 7.288941027 |
Hours of Unpaid Work per Week | -0.38184722 | 0.130750479 | -2.9204269 | 0.008177146 | -0.65375772 | -0.10993671 |
Age | 0.004555815 | 0.022855709 | 0.199329423 | 0.843922453 | -0.04297523 | 0.052086864 |
Income ($1000s) | 0.023250418 | 0.007610353 | 3.055103771 | 0.006012895 | 0.007423823 | 0.039077013 |
The p-value for the test on the income regression coefficient is in the bottom part of the table under the P-value column of the Income row. So the p-value=[latex]0.0060[/latex].
Conclusion:
Because p-value[latex]=0.0060 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis. At the 5% significance level there is enough evidence to suggest that there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.”
NOTES
- The null hypothesis [latex]\beta_3=0[/latex] is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is zero. That is, the null hypothesis is the claim that there is no relationship between the dependent variable and the independent variable “income.”
- The alternative hypothesis is the claim that the regression coefficient for the independent variable [latex]x_3[/latex] is not zero. The alternative hypothesis is the claim that there is a relationship between the dependent variable and the independent variable “income.”
- When conducting a test on a regression coefficient, make sure to use the correct subscript on [latex]\beta[/latex] to correspond to how the independent variables were defined in the regression model and which independent variable is being tested. Here the subscript on [latex]\beta[/latex] is 3 because “income” is defined as [latex]x_3[/latex] in the regression model.
- The p-value of 0.0060 is a small probability compared to the significance level, which means sample results at least this extreme are unlikely to occur if the null hypothesis is true. This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis. In other words, the regression coefficient [latex]\beta_3[/latex] is not zero, and so there is a relationship between the dependent variable “job satisfaction” and the independent variable “income.” This means that the independent variable “income” is useful in predicting the dependent variable.
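The entire coefficient table, including all three p-values tested in these examples, can also be reproduced outside of Excel. The following sketch fits the same model by least squares on the sample data above, assuming the numpy and scipy libraries are available:

```python
import numpy as np
from scipy import stats

# Data from the example: job satisfaction (y), hours of unpaid work per
# week (x1), age (x2), and income in $1000s (x3), for 25 employees.
y  = np.array([4,5,2,6,7,8,7,3,5,5,7,8,5,2,4,6,5,7,9,8,7,9,7,10,5], dtype=float)
x1 = np.array([3,8,9,4,3,1,6,3,2,5,2,1,7,5,0,2,6,5,0,3,5,1,0,2,0], dtype=float)
x2 = np.array([23,32,28,60,62,43,60,37,24,64,28,66,35,37,59,32,76,25,
               55,34,54,57,68,66,50], dtype=float)
x3 = np.array([60,114,45,187,175,125,93,57,47,128,66,146,89,56,65,95,
               82,90,137,91,184,60,39,187,49], dtype=float)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(y), x1, x2, x3])
n, n_params = X.shape            # n_params = k + 1

# Least-squares coefficients: b = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ y)

# Residual variance and standard errors of the coefficients
resid = y - X @ b
s2 = resid @ resid / (n - n_params)          # df = n - k - 1 = 21
se = np.sqrt(s2 * np.diag(XtX_inv))

# t-scores and two-tailed p-values for each coefficient
t_scores = b / se
p_values = 2 * stats.t.sf(np.abs(t_scores), n - n_params)
```

The resulting `b`, `t_scores`, and `p_values` agree with the Coefficients, t Stat, and P-value columns of Excel's regression summary table.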
Concept Review
The test on a regression coefficient determines if there is a relationship between the dependent variable and the corresponding independent variable. The p-value for the test is the sum of the area in the tails of the [latex]t[/latex]-distribution. The p-value can be found in the regression summary table generated by Excel.
The hypothesis test for a regression coefficient is a well-established process:
- Write down the null and alternative hypotheses in terms of the regression coefficient being tested. The null hypothesis is the claim that there is no relationship between the dependent variable and independent variable. The alternative hypothesis is the claim that there is a relationship between the dependent variable and independent variable.
- Collect the sample information for the test and identify the significance level.
- The p-value is the sum of the area in the tails of the [latex]t[/latex]-distribution. Use the regression summary table generated by Excel to find the p-value.
- Compare the p-value to the significance level and state the outcome of the test.
- Write down a concluding sentence specific to the context of the question.