"

13.1 Multiple Regression

LEARNING OBJECTIVES

  • Develop a multiple regression model.
  • Use a multiple regression model to predict values of the dependent variable.
  • Interpret the partial regression coefficients.

Previously, we learned about simple linear regression, which models the linear relationship between one independent variable [latex]x[/latex] and one dependent variable [latex]y[/latex]. The equation for the regression line is:

[latex]\begin{eqnarray*}\hat{y}&=&b_0+b_1x\\\\\hat{y}&=&\text{predicted value of }y\\x&=&\text{value of the independent variable}\\b_0&=&y\text{-interecept of the line}\\b_1&=&\text{slope of the line}\end{eqnarray*}[/latex]

Multiple regression is an extension of simple linear regression where there is still only one dependent variable [latex]y[/latex] but two or more independent variables [latex]x_1,x_2,\ldots,x_k[/latex]. Multiple regression is motivated by scenarios where many independent variables may be simultaneously connected to a dependent variable. For example, the price of a product is related to the demand for the product, the time of year, and the competition’s price.

The equation for the multiple regression model is:

[latex]\begin{eqnarray*}\hat{y}&=&b_0+b_1x_1+b_2x_2+\cdots+b_kx_k\end{eqnarray*}[/latex]

where [latex]\hat{y}[/latex] is the predicted value of [latex]y[/latex], [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]b_0,b_1,\ldots,b_k[/latex] are the regression coefficients, and [latex]k[/latex] is the number of independent variables.

We will use Excel to generate the values of the regression coefficients. However, unlike simple linear regression, where we used individual, built-in functions to find the slope and [latex]y[/latex]-intercept, we use a regression summary table to generate the values of the regression coefficients. As we will see, the regression summary table contains lots of information relating to the multiple regression model. For now, we will use the regression summary table to find the regression coefficients to create the multiple regression model. In later sections, we will learn about some of the other information contained in the regression summary table.

USING EXCEL TO CREATE A REGRESSION SUMMARY TABLE

In order to create a regression summary table, we need to use the Analysis ToolPak. Follow these instructions to add the Analysis ToolPak.

  1. Enter the data into an Excel worksheet.
  2. Go to the Data tab and click on Data Analysis. If you do not see Data Analysis in the Data tab, you will need to install the Analysis ToolPak.
  3. In the Data Analysis window, select Regression and then click OK.
  4. In the Input Y Range, enter the cell range for the [latex]y[/latex] (dependent variable) data.
  5. In the Input X Range, enter the cell range for the [latex]x[/latex] (independent variables) data.
  6. Click on Labels in the first row if you included the column headings in the input range.
  7. From the Output Options, select the location where you want the output to appear. The default is a new worksheet.
  8. Click OK. Excel will then generate a regression summary table.

NOTES

  1. For the Input X Range, the data for the independent variables must all be together. That is, the columns (or rows) containing the data for the independent variables must all be consecutive. If the column (or row) containing data for the dependent variable is in between two columns (or rows) containing independent variables, copy the dependent variable column and paste the dependent variable column at the beginning or end of the columns (or rows) of data. Make sure to delete the original dependent variable column/row after placing a copy at the beginning or end of the data.
  2. There are several other options available in the Regression input window, such as for confidence intervals or information about residuals. We will not need any of this other information, so leave everything else unchecked.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of [latex]25[/latex] employees at the company is taken, and the data is recorded in the table below. The employee’s income is recorded in [latex]\$1000[/latex]s, and the job satisfaction score is out of [latex]10[/latex], with higher values indicating greater job satisfaction. Develop a multiple regression model to predict the job satisfaction score from the other variables.

Job Satisfaction Hours of Unpaid Work per Week Age Income ([latex]\$1000[/latex]s)
4 3 23 60
5 8 32 114
2 9 28 45
6 4 60 187
7 3 62 175
8 1 43 125
7 6 60 93
3 3 37 57
5 2 24 47
5 5 64 128
7 2 28 66
8 1 66 146
5 7 35 89
2 5 37 56
4 0 59 65
6 2 32 95
5 6 76 82
7 5 25 90
9 0 55 137
8 3 34 91
7 5 54 184
9 1 57 60
7 0 68 39
10 2 66 187
5 0 50 49

Solution

There are three independent variables: hours of unpaid work per week, age, and income ([latex]\$1000[/latex]s). Let [latex]x_1[/latex] be the hours of unpaid work per week, let [latex]x_2[/latex] be age, and let [latex]x_3[/latex] be income ($1000s). The regression summary table generated by Excel is shown below:

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.711779225
R Square 0.506629665
Adjusted R Square 0.436148189
Standard Error 1.585212784
Observations 25
ANOVA
df SS MS F Significance F
Regression 3 54.189109 18.06303633 7.18812504 0.001683189
Residual 21 52.770891 2.512899571
Total 24 106.96
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 4.799258185 1.197185164 4.008785216 0.00063622 2.309575344 7.288941027
Hours of Unpaid Work per Week -0.38184722 0.130750479 -2.9204269 0.008177146 -0.65375772 -0.10993671
Age 0.004555815 0.022855709 0.199329423 0.843922453 -0.04297523 0.052086864
Income ([latex]\$1000[/latex]s) 0.023250418 0.007610353 3.055103771 0.006012895 0.007423823 0.039077013

The coefficients for the multiple regression model are in the Coefficients column in the bottom part of the table. The value of [latex]b_0[/latex] is in the Intercept row, so [latex]b_0=4.7993[/latex]. The value of [latex]b_1[/latex], the coefficient for [latex]x_1[/latex], is in the Hours of Unpaid Work per Week row, so [latex]b_1=-0.3818[/latex]. The value of [latex]b_2[/latex], the coefficient of [latex]x_2[/latex], is in the Age row, so [latex]b_2=0.0046[/latex]. The value of [latex]b_3[/latex], the coefficient of [latex]x_3[/latex], is in the Income row, so [latex]b_3=0.0233[/latex].

The multiple regression equation is

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\\\\hat{y}&=&\text{predicted job satisfaction score}\\x_1&=&\text{hours of unpaid work per week}\\x_2&=&\text{age}\\x_3&=&\text{income (\$1000s)}\end{eqnarray*}[/latex]

NOTES

  1. When writing down the multiple regression equation, remember to define what the variables represent in the context of the question. That is, state what [latex]\hat{y}[/latex] and the independent variables represent in relation to the question.
  2. A couple of the columns on the right side of the regression summary table generated by Excel where deleted in order to fit the table onto the page. These columns are not necessary for the work we will be doing.

Video: “Basic Excel Business Analytics #50: Introduction to Multiple Regression, Data Analysis Regression” by excelisfun [13:34] is licensed under the Standard YouTube License.Transcript and closed captions available on YouTube.


Regression Coefficients

Recall that the slope [latex]b_1[/latex] in the simple linear regression model [latex]\hat{y}=b_0+b_1x[/latex] tells us how the dependent variable [latex]y[/latex] changes for a single unit increase in the independent variable [latex]x[/latex]. In a similar way, each regression coefficient [latex]b_i[/latex] represents the change (increase or decrease) in the dependent variable for a one-unit increase in the corresponding independent variable [latex]x_i[/latex], while all the other variables are held constant. When interpreting a regression coefficient, it is important to be specific to the question, using the actual names of the variables and correct units.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of [latex]25[/latex] employees at the company is taken, and the data is recorded in the table below. The employee’s income is recorded in [latex]\$1000[/latex]s, and the job satisfaction score is out of [latex]10[/latex], with higher values indicating greater job satisfaction.

Job Satisfaction Hours of Unpaid Work per Week Age Income ([latex]\$1000[/latex]s)
4 3 23 60
5 8 32 114
2 9 28 45
6 4 60 187
7 3 62 175
8 1 43 125
7 6 60 93
3 3 37 57
5 2 24 47
5 5 64 128
7 2 28 66
8 1 66 146
5 7 35 89
2 5 37 56
4 0 59 65
6 2 32 95
5 6 76 82
7 5 25 90
9 0 55 137
8 3 34 91
7 5 54 184
9 1 57 60
7 0 68 39
10 2 66 187
5 0 50 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\\\\hat{y}&=&\text{predicted job satisfaction score}\\x_1&=&\text{hours of unpaid work per week}\\x_2&=&\text{age}\\x_3&=&\text{income (\$1000s)}\end{eqnarray*}[/latex]

  1. Interpret the regression coefficient for hours of unpaid work per week.
  2. Interpret the regression coefficient for age.
  3. Interpret the regression coefficient for income.

Solution

  1. [latex]b_1=-0.3818[/latex]. Interpretation: For a one-hour increase in the hours of unpaid work per week, the job satisfaction score decreases by [latex]0.3818[/latex], while the other variables are held constant.
  2. [latex]b_2=0.0046[/latex]. Interpretation: For a one-year increase in the age of the employee, the job satisfaction score increases by[latex]0.0046[/latex], while the other variables are held constant.
  3. [latex]b_3=0.0233[/latex]. Interpretation: For a [latex]\$1000[/latex] increase in income, the job satisfaction score increases by [latex]0.0233[/latex], while the other variables are held constant.

NOTES

  1. Remember to include “while the other variables are held constant” with the interpretation of each regression coefficient. We can only talk about how the change in one independent variable affects the dependent variable, so the values of the other variables must be kept fixed.
  2. When writing down the interpretation of each regression coefficient, remember to be specific to the question using the actual names of the independent and dependent variables and appropriate units.
  3. Each regression coefficient has the same units as the dependent variable.
  4. Income is measured in [latex]\$1000[/latex]s, so a one-unit increase in the income variable actually corresponds to a [latex]\$1000[/latex] increase in income.

Making Predictions with a Multiple Regression Model

As with simple linear regression, a multiple regression model can be used to make predictions about the dependent variable from specific values of the independent variables. To make a prediction, substitute the corresponding values of the independent variables into the multiple regression equation and calculate out the value of [latex]\hat{y}[/latex]. Watch out for the units of measurement for each variable when using the multiple regression equation—the units of the values entered into the independent variable [latex]x_i[/latex] in the multiple regression equation must match the units of the independent variable in the sample data.

EXAMPLE

The human resources department at a large company wants to develop a model to predict an employee’s job satisfaction from the number of hours of unpaid work per week the employee does, the employee’s age, and the employee’s income. A sample of [latex]25[/latex] employees at the company is taken, and the data is recorded in the table below. The employee’s income is recorded in [latex]\$1000[/latex]s, and the job satisfaction score is out of [latex]10[/latex], with higher values indicating greater job satisfaction.

Job Satisfaction Hours of Unpaid Work per Week Age Income ([latex]\$1000[/latex]s)
4 3 23 60
5 8 32 114
2 9 28 45
6 4 60 187
7 3 62 175
8 1 43 125
7 6 60 93
3 3 37 57
5 2 24 47
5 5 64 128
7 2 28 66
8 1 66 146
5 7 35 89
2 5 37 56
4 0 59 65
6 2 32 95
5 6 76 82
7 5 25 90
9 0 55 137
8 3 34 91
7 5 54 184
9 1 57 60
7 0 68 39
10 2 66 187
5 0 50 49

Previously, we found the multiple regression equation to predict the job satisfaction score from the other variables:

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\\\\hat{y}&=&\text{predicted job satisfaction score}\\x_1&=&\text{hours of unpaid work per week}\\x_2&=&\text{age}\\x_3&=&\text{income (\$1000s)}\end{eqnarray*}[/latex]

Predict the job satisfaction score for a 40-year-old employee who works two hours of unpaid work per week and has an income of [latex]\$75,000[/latex].

Solution

The values of the independent variables we need to enter into the multiple regression model are [latex]x_1=2[/latex], [latex]x_2=40[/latex], and [latex]x_3=75[/latex]:

[latex]\begin{eqnarray*}\hat{y}&=&4.7993-0.3818x_1+0.0046x_2+0.0233x_3\\&=&4.7993-0.3818\times 2+0.0046\times 40+0.0233\times 75\\&=&5.96\end{eqnarray*}[/latex]

The predicted job satisfaction score for a 40-year-old employee who works two hours of unpaid work per week and has an income of [latex]\$75,000[/latex] is [latex]5.96[/latex].

NOTES

  1. In the sample data, income is measured in [latex]\$1000[/latex]s. So an income of [latex]\$75,000[/latex] would be recorded as [latex]75[/latex] in the sample data. So, we enter [latex]75[/latex] for the value of [latex]x_3[/latex]
  2. To get the most accurate answer, use Excel to calculate out the value of [latex]\hat{y}[/latex], clicking on the corresponding cells containing the values of the coefficients in the regression summary sheet.

Assumptions about the Multiple Regression Model

The multiple regression model given above is the model we create from sample data—a sample is taken from the population, and the sample data is used to find the regression coefficients in the model. So the regression coefficients, [latex]b_0,b_1,\ldots,b_k[/latex], are estimates of the corresponding population parameters for the regression coefficients, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex].

The population model for the multiple regression equation is

[latex]\begin{eqnarray*}y&=&\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_kx_k+\epsilon\end{eqnarray*}[/latex]

where [latex]x_1,x_2,\ldots,x_k[/latex] are the independent variables, [latex]\beta_0,\beta_1,\ldots,\beta_k[/latex] are the population parameters of the regression coefficients, and [latex]\epsilon[/latex] is the error variable. Because the independent variables do not account for all of the variability in the dependent variable [latex]y[/latex], the error variable [latex]\epsilon[/latex] captures the effects of variables other than the independent variables.

We must make certain assumptions about the regression model, in particular about the errors/residuals for the population data, in order to obtain valid conclusions about the multiple regression model. Recall that the error or residual is the difference between the actual value of [latex]y[/latex] in the sample data and the predicted value [latex]\hat{y}[/latex] from the model.  Because we do not have the population data to work with, we cannot verify if these conditions are met. However, much of regression analysis, including testing how well the data fit the model, depends on these assumptions being true.

Assumptions about the multiple regression model include:

  • The model is linear.
  • The errors/residuals have a normal distribution.
  • The mean of the errors/residuals is [latex]0[/latex].
  • The variance of the errors/residuals is constant.
  • The errors/residuals are independent.

Exercises

  1. A local restaurant advocacy group wants to study the relationship between a restaurant’s average weekly profit, the restaurant’s seating capacity, and the average daily traffic that passes the restaurant’s location. The group took a sample of restaurants and recorded their average weekly profit (in [latex]\$1000[/latex]s), the seating restaurant’s seating capacity, and the average number of cars (in [latex]1000[/latex]s) that passes the restaurant’s location. The data is recorded in the following table:
    Seating Capacity Traffic Count ([latex]1000[/latex]s) Weekly Net Profit ([latex]\$1000[/latex]s)
    120 19 23.8
    180 8 29.2
    150 12 22
    180 15 26.2
    220 16 33.5
    235 10 32
    115 18 22.4
    110 12 20.4
    165 21 23.7
    220 20 34.7
    140 24 27.1
    145 24 23.3
    140 13 20.9
    200 14 29.6
    210 14 31.4
    175 12 23.2
    175 15 31.1
    190 17 28.2
    100 23 25.2
    145 20 20.7
    135 13 37.2
    25 13 26.3
    140 25 20
    130 14 28.2
    135 10 24.6
    160 23 23.7
    1. Find the regression model to predict the average weekly profit from the other variables.
    2. Interpret the coefficient for seating capacity.
    3. Interpret the coefficient for traffic count.
    4. Predict the average weekly profit for a restaurant with a seating capacity of [latex]150[/latex] and a traffic count of [latex]25,000[/latex] cars.
    Click to see Answer
    1. [latex]\begin{eqnarray*}\hat{y}&=&21.989+0.046x_1-0.196x_2\\x_1&=&\text{seating capacity}\\x_2&=&\text{traffic count }(1000\text{s})\\\hat{y}&=&\text{average weekly profit }(\$1000\text{s})\end{eqnarray*}[/latex]
    2. For each additional seat in the restaurant, the average weekly profit increases by [latex]\$46[/latex].
    3. For each additional [latex]1000[/latex] cars that pass the restaurant, the average weekly profit decreases by [latex]\$196[/latex].
    4. [latex]\$24,519.20[/latex]

     

  2. A local university wants to study the relationship between a student’s GPA, the average number of hours they spend studying each night, and the average number of nights they go out each week. The university took a sample of students and recorded the following data:
    GPA Average Number of Hours Spent Studying Each Night Average Number of Nights Go Out Each Week
    3.72 5 1
    3.88 3 1
    3.67 2 1
    3.87 3 4
    2.49 1 4
    1.29 1 2
    1.01 2 4
    2.12 1 1
    1.9 1 5
    3.42 3 2
    1.33 1 4
    1.07 0 2
    2.75 3 1
    3.82 4 1
    3.91 5 0
    2.25 2 3
    2.06 1 5
    2.92 3 2
    3.06 3 1
    3.65 2 2
    3.69 4 1
    1. Find the regression model to predict GPA from the other variables.
    2. Interpret the coefficient for the average number of hours spent studying each night.
    3. Interpret the coefficient for the average number of nights a student goes out each week.
    4. Predict the GPA for a student who spends an average of [latex]4[/latex] hours a night studying and goes out an average of [latex]3[/latex] nights a week.
    Click to see Answer
    1. [latex]\begin{eqnarray*}\hat{y}&=&1.692+0.524x_1-0.082x_2\\x_1&=&\text{average number of hours spent studying a night}\\x_2&=&\text{average number of nights go out each week}\\\hat{y}&=&\text{GPA}\end{eqnarray*}[/latex]
    2. For each additional hour spent studying each night, the student’s GPA increases by [latex]0.524[/latex].
    3. For each additional hour a student goes out each week, the student’s GPA decreases by [latex]0.082[/latex].
    4. [latex]3.54[/latex]

     

  3. A very large company wants to study the relationship between the salaries of employees in management positions, their age, the number of years the employee spent in college, and the number of years the employee has been with the company. A sample of management employees is taken, and the data is recorded below:
    Age Years of College Years with Company Salary ([latex]\$1000[/latex]s)
    60 8 29 317.3
    33 3 5 97.3
    57 6 27 263.1
    32 4 5 101.3
    31 6 3 114.2
    61 8 19 350.4
    41 7 8 146.9
    35 4 2 91.7
    51 6 21 198.2
    50 8 10 196.5
    57 5 15 105.7
    49 6 18 118.3
    62 7 27 305.2
    52 8 26 239.9
    39 4 8 145.9
    42 7 5 175.4
    62 4 24 219.4
    60 4 22 202.1
    65 3 21 196.3
    40 4 10 143.9
    62 6 29 408.7
    53 7 5 145.2
    48 8 5 175.1
    61 5 6 152.7
    38 7 3 99.7
    40 7 12 174.9
    45 7 7 149.2
    58 7 14 282.8
    38 4 3 95.7
    41 5 18 232.8
    1. Find the regression model to predict salary from the other variables.
    2. Interpret the coefficient for age.
    3. Interpret the coefficient for years of college.
    4. Interpret the coefficient for years with the company.
    5. Predict the salary for a 47-year-old management employee who spent [latex]5[/latex] years in college and has been with the company for [latex]15[/latex] years.
    Click to see Answer
    1. [latex]\begin{eqnarray*}\hat{y}&=&-42.359+1.436x_1+14.758x_2+5.486x_3\\x_1&=&\text{age}\\x_2&=&\text{years of college}\\x_3&=&\text{years with the company}\\\hat{y}&=&\text{salary (\$1000s)}\end{eqnarray*}[/latex]
    2. For each additional year of age, the salary increases by [latex]\$1436.14[/latex].
    3. For each additional year of college, the salary increases by [latex]\$14,758.04[/latex].
    4. For each additional year with the company, the salary increases by [latex]\$5486.07[/latex].
    5. [latex]\$181,221.15[/latex]

     


13.2 Multiple Regression” and “13.8 Exercises” from Introduction to Statistics by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Introduction to Statistics - Second Edition Copyright © 2025 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Share This Book