13.3 Standard Error of the Estimate

Valerie Watts

13.3 Standard Error of the Estimate

LEARNING OBJECTIVES

Calculate and interpret the standard error of the estimate for multiple regression.

The difference between the actual value of the dependent variable [latex]y[/latex] (in the sample date) and the predicted value of the dependent variable [latex]\hat{y}[/latex] obtained from the multiple regression model is called the error or residual.

[latex]\begin{eqnarray*} \mbox{Error} & = & \mbox{Actual Value}-\mbox{Predicted Value}\end{eqnarray*}[/latex]

For the simple linear regression model, the standard error of the estimate measures the average vertical distance (the error) between the points on the scatter diagram and the regression line.

The image shows a scatter diagram and the line of best fit. Vertical lines are drawn from points on the scatter diagram to the line of best fit. The length of the vertical line is the absolute value of the error.

The standard error of the estimate, denoted [latex]s_e[/latex], is a measure of the standard deviation of the errors in a regression model. The standard error of the estimate is a measure of the average deviation of the errors, the difference between the [latex]\hat{y}[/latex]-values predicted by the multiple regression model and the [latex]y[/latex]-values in the sample. The standard error of the estimate for the regression model is the standard deviation of the errors/residuals.

The value of [latex]s_e[/latex] tells us, on average, how much the dependent variable differs from the regression model based on the independent variables. When interpreting the standard error of the estimate, remember to be specific to the question, using the actual names of the dependent and independent variables, and include appropriate units. The units of the standard error of the estimate are the same as the units of the dependent variable.

The value of the standard error of the estimate for the regression model can be found on the regression summary table, which we learned how to generate in Excel in the previous section.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of 25 employees at the company is taken and the data is recorded in the table below. The employee’s income is recorded in $1000s and the job satisfaction score is out of 10, with higher values indicating greater job satisfaction.

Job Satisfaction	Hours of Unpaid Work per Week	Age	Income ($1000s)
4	3	23	60
5	8	32	114
2	9	28	45
6	4	60	187
7	3	62	175
8	1	43	125
7	6	60	93
3	3	37	57
5	2	24	47
5	5	64	128
7	2	28	66
8	1	66	146
5	7	35	89
2	5	37	56
4	0	59	65
6	2	32	95
5	6	76	82
7	5	25	90
9	0	55	137
8	3	34	91
7	5	54	184
9	1	57	60
7	0	68	39
10	2	66	187
5	0	50	49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*} \hat{y} & = & 4.7993-0.3818x_1+0.0046x_2+0.0233x_3 \\ \\ \hat{y} & = & \mbox{predicted job satisfaction score} \\ x_1 & = & \mbox{hours of unpaid work per week} \\ x_2 & = & \mbox{age} \\ x_3 & = & \mbox{income (\$1000s)}\end{eqnarray*}[/latex]

Find the standard error of the estimate.
Interpret the standard error of the estimate.

Solution:

The regression summary table generated by Excel is shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R	0.711779225
R Square	0.506629665
Adjusted R Square	0.436148189
Standard Error	1.585212784
Observations	25

ANOVA
	df	SS	MS	F	Significance F
Regression	3	54.189109	18.06303633	7.18812504	0.001683189
Residual	21	52.770891	2.512899571
Total	24	106.96

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	4.799258185	1.197185164	4.008785216	0.00063622	2.309575344	7.288941027
Hours of Unpaid Work per Week	-0.38184722	0.130750479	-2.9204269	0.008177146	-0.65375772	-0.10993671
Age	0.004555815	0.022855709	0.199329423	0.843922453	-0.04297523	0.052086864
Income ($1000s)	0.023250418	0.007610353	3.055103771	0.006012895	0.007423823	0.039077013

The standard error of the estimate for the regression models is in the top part of the table, under the Regression Statistics heading in the Standard Error row. The value of the standard error of the estimate is [latex]s_e=1.5852[/latex].

On average, the job satisfaction score is 1.5852 points away from the regression model based on the independent variables “hours of unpaid work per week,” “age,” and “income.”

NOTE

The standard error of the estimate for the regression model is located in the top part of the table under the Regression Statistics heading. You will notice another standard error column at the bottom in the rows corresponding to the independent variables. These standard errors in the bottom part of the table are not related to the standard error of the estimate. In fact, the standard errors in the independent variable rows are measures of the uncertainty around the estimate of the regression coefficient for each independent variable.

Concept Review

The standard error of the estimate, [latex]s_e[/latex], measures the average deviation of the errors of the regression model. The smaller the value of the standard error of the estimate, the better the fit of the regression model to the data.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License