13.3 Standard Error of the Estimate

LEARNING OBJECTIVES

  • Calculate and interpret the standard error of the estimate for multiple regression.

The difference between the actual value of the dependent variable [latex]y[/latex] (in the sample data) and the predicted value of the dependent variable [latex]\hat{y}[/latex] obtained from the multiple regression model is called the error or residual.

[latex]\begin{eqnarray*} \mbox{Error} & = & \mbox{Actual Value}-\mbox{Predicted Value}\end{eqnarray*}[/latex]

For the simple linear regression model, the standard error of the estimate measures the average vertical distance (the error) between the points on the scatter diagram and the regression line.

The image shows a scatter diagram and the line of best fit. Vertical lines are drawn from points on the scatter diagram to the line of best fit. The length of the vertical line is the absolute value of the error.

The standard error of the estimate, denoted [latex]s_e[/latex], is the standard deviation of the errors (residuals) in a regression model.  It measures the typical size of the errors, that is, of the differences between the [latex]y[/latex]-values in the sample and the [latex]\hat{y}[/latex]-values predicted by the multiple regression model.
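
The definition above can be written as a formula.  The notation here is assumed for illustration and does not appear elsewhere in this section: [latex]n[/latex] is the number of observations in the sample, [latex]k[/latex] is the number of independent variables, and [latex]y_i[/latex] and [latex]\hat{y}_i[/latex] are the actual and predicted values for the [latex]i[/latex]-th observation.

[latex]\begin{eqnarray*} s_e & = & \sqrt{\frac{\sum (y_i-\hat{y}_i)^2}{n-k-1}}\end{eqnarray*}[/latex]

The numerator is the sum of the squared errors, and the denominator [latex]n-k-1[/latex] is the residual degrees of freedom, which appears in the df column of the Residual row of a regression summary table.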

The value of [latex]s_e[/latex] tells us, on average, how much the dependent variable differs from the regression model based on the independent variables.  When interpreting the standard error of the estimate, remember to be specific to the question, using the actual names of the dependent and independent variables, and include appropriate units.  The units of the standard error of the estimate are the same as the units of the dependent variable.

The value of the standard error of the estimate for the regression model can be found on the regression summary table, which we learned how to generate in Excel in the previous section.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income.  A sample of 25 employees at the company is taken and the data is recorded in the table below.  The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Job Satisfaction   Hours of Unpaid Work per Week   Age   Income ($1000s)
4                  3                               23    60
5                  8                               32    114
2                  9                               28    45
6                  4                               60    187
7                  3                               62    175
8                  1                               43    125
7                  6                               60    93
3                  3                               37    57
5                  2                               24    47
5                  5                               64    128
7                  2                               28    66
8                  1                               66    146
5                  7                               35    89
2                  5                               37    56
4                  0                               59    65
6                  2                               32    95
5                  6                               76    82
7                  5                               25    90
9                  0                               55    137
8                  3                               34    91
7                  5                               54    184
9                  1                               57    60
7                  0                               68    39
10                 2                               66    187
5                  0                               50    49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]

  1. Find the standard error of the estimate.
  2. Interpret the standard error of the estimate.

Solution: 

  1. The regression summary table generated by Excel is shown below:
    SUMMARY OUTPUT

    Regression Statistics
    Multiple R          0.711779225
    R Square            0.506629665
    Adjusted R Square   0.436148189
    Standard Error      1.585212784
    Observations        25

    ANOVA
                 df   SS          MS            F            Significance F
    Regression   3    54.189109   18.06303633   7.18812504   0.001683189
    Residual     21   52.770891   2.512899571
    Total        24   106.96

                                    Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
    Intercept                       4.799258185    1.197185164      4.008785216   0.00063622    2.309575344   7.288941027
    Hours of Unpaid Work per Week   -0.38184722    0.130750479      -2.9204269    0.008177146   -0.65375772   -0.10993671
    Age                             0.004555815    0.022855709      0.199329423   0.843922453   -0.04297523   0.052086864
    Income ($1000s)                 0.023250418    0.007610353      3.055103771   0.006012895   0.007423823   0.039077013

    The standard error of the estimate for the regression model is in the top part of the table, under the Regression Statistics heading in the Standard Error row.  The value of the standard error of the estimate is [latex]s_e=1.5852[/latex].

  2. On average, the job satisfaction score is 1.5852 points away from the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”
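
As a cross-check on the Excel output, the standard error of the estimate can be recomputed directly from the errors.  The following is a minimal Python sketch, not part of the original Excel workflow.  It assumes the coefficients are copied at full precision from the summary table, and the names `data`, `residuals`, `sse`, `n`, `k`, and `s_e` are chosen here for illustration.

```python
import math

# Sample data from the example table:
# (job satisfaction, hours of unpaid work per week, age, income in $1000s)
data = [
    (4, 3, 23, 60), (5, 8, 32, 114), (2, 9, 28, 45), (6, 4, 60, 187),
    (7, 3, 62, 175), (8, 1, 43, 125), (7, 6, 60, 93), (3, 3, 37, 57),
    (5, 2, 24, 47), (5, 5, 64, 128), (7, 2, 28, 66), (8, 1, 66, 146),
    (5, 7, 35, 89), (2, 5, 37, 56), (4, 0, 59, 65), (6, 2, 32, 95),
    (5, 6, 76, 82), (7, 5, 25, 90), (9, 0, 55, 137), (8, 3, 34, 91),
    (7, 5, 54, 184), (9, 1, 57, 60), (7, 0, 68, 39), (10, 2, 66, 187),
    (5, 0, 50, 49),
]

# Coefficients copied from the Coefficients column of the Excel summary table
b0, b1, b2, b3 = 4.799258185, -0.38184722, 0.004555815, 0.023250418

# Error = actual value - predicted value, for each employee in the sample
residuals = [y - (b0 + b1 * x1 + b2 * x2 + b3 * x3) for y, x1, x2, x3 in data]

# Sum of squared errors (the SS entry in the Residual row of the ANOVA table)
sse = sum(e ** 2 for e in residuals)

n = len(data)  # number of observations
k = 3          # number of independent variables

# Standard error of the estimate: square root of SSE over residual df
s_e = math.sqrt(sse / (n - k - 1))

print(f"SSE = {sse:.4f}, residual df = {n - k - 1}, s_e = {s_e:.4f}")
```

The printed values should agree, up to rounding, with the Residual row of the ANOVA table (SS of 52.770891 on 21 degrees of freedom) and with the Standard Error row under Regression Statistics.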

NOTE

The standard error of the estimate for the regression model is located in the top part of the table under the Regression Statistics heading.  You will notice another standard error column at the bottom in the rows corresponding to the independent variables.  These standard errors in the bottom part of the table are not the standard error of the estimate.  Instead, the standard errors in the independent variable rows measure the uncertainty around the estimate of the regression coefficient for each independent variable.
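
One way to see that the bottom standard errors belong to the individual coefficients is to divide each coefficient by its standard error, which gives the value in the t Stat column.  The short Python sketch below checks this against the values in the summary table; the dictionary name `rows` and the variable names are hypothetical and used only for illustration.

```python
# (coefficient, standard error, t Stat) copied from the bottom part of the table
rows = {
    "Intercept": (4.799258185, 1.197185164, 4.008785216),
    "Hours of Unpaid Work per Week": (-0.38184722, 0.130750479, -2.9204269),
    "Age": (0.004555815, 0.022855709, 0.199329423),
    "Income ($1000s)": (0.023250418, 0.007610353, 3.055103771),
}

# Recompute each t statistic as coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (coef, se, t_excel) in rows.items():
    print(f"{name}: computed t = {t_stats[name]:.4f}, table t = {t_excel:.4f}")
```

None of these four standard errors equals the standard error of the estimate, [latex]s_e=1.5852[/latex], which describes the errors of the whole model rather than a single coefficient.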


Concept Review

The standard error of the estimate, [latex]s_e[/latex], measures the average deviation of the errors of the regression model.  The smaller the value of the standard error of the estimate, the better the fit of the regression model to the data.

License


Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.