What are examples of how multiple regression could be used in the real world?

In social science research, it is rare that we would want to include only one explanatory variable in a regression analysis. More often we want to investigate the effect that two or more factors have on an outcome, such as confidence in the police. Explanatory variables may overlap or be related to one another, so we want to estimate each variable's relationship with the outcome while controlling for the influence of the others. Multiple linear regression allows us to obtain predicted values under specific conditions, such as levels of police confidence for each sex, while controlling for the influence of other factors, such as ethnicity.

We are now going to add additional explanatory variables to our regression model and learn how to make predictions using a multiple linear regression model.

Multiple Regression Model

Using the same procedure outlined above for a simple model, you can fit a linear regression model with policeconf1 as the dependent variable and both sex and the dummy variables for ethnic group as explanatory variables.

To fit a multiple linear regression, select Analyze, Regression, and then Linear.

In the dialogue box that appears, move policeconf1 to the Dependent box and sex1, MIXED, ASIAN, BLACK, and OTHER to the Independent(s) box. (Remember we are still using WHITE as a baseline, so you do not need to include this dummy variable in your multiple linear regression model.) You should get output tables like the ones on the right.

Multiple Linear Regression Output

Now that we have run a multiple linear regression on the combined effect of sex and ethnicity on confidence in the police, have the predicted values changed?

Remember that in our simple sex linear regression, the predicted value of police confidence score was 13.325 in females and 13.761 in males. In our simple ethnicity linear regression, the predicted value of police confidence score was 14.617 for Mixed respondents, 12.711 for Asian respondents, 14.067 for Black respondents, 13.550 for White respondents, and 12.81 for respondents of all other ethnicities.

After our multiple linear regression, our values are:

policeconf1 = 13.789 + (1.073 x 1) + (-0.444 x 1) = 14.418 (Mixed Female)
policeconf1 = 13.789 + (1.073 x 1) + (-0.444 x 0) = 14.862 (Mixed Male)
policeconf1 = 13.789 + (-0.860 x 1) + (-0.444 x 1) = 12.485 (Asian Female)
policeconf1 = 13.789 + (-0.860 x 1) + (-0.444 x 0) = 12.929 (Asian Male)
policeconf1 = 13.789 + (0.533 x 1) + (-0.444 x 1) = 13.878 (Black Female)
policeconf1 = 13.789 + (0.533 x 1) + (-0.444 x 0) = 14.322 (Black Male)
policeconf1 = 13.789 + (-0.728 x 1) + (-0.444 x 1) = 12.617 (Other Female)
policeconf1 = 13.789 + (-0.728 x 1) + (-0.444 x 0) = 13.061 (Other Male)
policeconf1 = 13.789 + (-0.444 x 1) = 13.345 (White Female)
policeconf1 = 13.789 + (-0.444 x 0) = 13.789 (White Male)
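The predicted values above can also be computed programmatically. A minimal sketch in Python, using the coefficients reported in the SPSS output (WHITE and male are the baseline categories, so they contribute nothing beyond the intercept):

```python
# Coefficients from the multiple linear regression output above
intercept = 13.789
b_sex1 = -0.444  # sex1: female = 1, male = 0
b_ethnic = {"MIXED": 1.073, "ASIAN": -0.860, "BLACK": 0.533,
            "OTHER": -0.728, "WHITE": 0.0}  # WHITE is the baseline

def predict_policeconf1(ethnicity, female):
    """Predicted police confidence score for a given ethnicity and sex."""
    return intercept + b_ethnic[ethnicity] + b_sex1 * (1 if female else 0)

print(round(predict_policeconf1("MIXED", female=True), 3))   # 14.418
print(round(predict_policeconf1("WHITE", female=False), 3))  # 13.789
```

Each predicted score is just the intercept plus the relevant dummy coefficients, exactly as in the hand calculations above.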

Taking into consideration the trends we saw in the gender and ethnicity simple linear regression models, how closely do the results of our multiple linear regression follow the established patterns? Do women still have lower mean police confidence scores? Are differences with respect to ethnicity still seen?

Run another multiple linear regression, including wattack in the model along with sex1 and the ethnicity dummy variables. You’ll need to create dummy variables for the categories in wattack, and then select one of them to be the baseline category, remembering to leave that baseline category out of the multiple linear regression model. Do the predicted scores change at all when you control for the influence of wattack? Are the trends we saw previously still illustrated in this model?
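Although this tutorial works in SPSS, the dummy-coding step can be sketched outside it too. In Python, pandas can expand a categorical column into 0/1 dummies; the wattack category labels below are hypothetical stand-ins for whatever appears in your dataset:

```python
import pandas as pd

# Hypothetical wattack categories; "very worried" chosen as the baseline
df = pd.DataFrame({"wattack": ["very worried", "fairly worried",
                               "not very worried", "not at all worried"]})
dummies = pd.get_dummies(df["wattack"], prefix="wattack")

# Drop the baseline category so its effect is absorbed by the intercept,
# just as WHITE was left out of the ethnicity dummies
dummies = dummies.drop(columns=["wattack_very worried"])
print(list(dummies.columns))
```

Whichever category you drop becomes the baseline against which the remaining dummy coefficients are interpreted.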

Summary

Here, we’ve used multiple linear regression to examine the relationships between police confidence score and sex and ethnic background, each while controlling for the other. We’ve learned that there is still a statistically significant relationship between police confidence score and ethnicity, and between police confidence score and sex. Finally, we've used the ethnicity and sex coefficients presented to us in the multiple linear regression to predict police confidence scores for people falling into the various ethnicity and sex categories.

Note: as we are making changes to a dataset we’ll continue using for the rest of this section, please make sure to save your changes before you close down SPSS. This will save you having to repeat sections you’ve already completed.

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and response (dependent) variables. In essence, multiple regression is the extension of ordinary least-squares (OLS) regression because it involves more than one explanatory variable.

Key Takeaways

  • Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.
  • Multiple regression is an extension of simple linear (OLS) regression, which uses just one explanatory variable.
  • MLR is used extensively in econometrics and financial inference.

Formula and Calculation of Multiple Linear Regression

yi = β0 + β1xi1 + β2xi2 + ... + βpxip + ϵ

where, for i = 1, ..., n observations:

  • yi = dependent variable
  • xi = explanatory variables
  • β0 = y-intercept (constant term)
  • βp = slope coefficient for each explanatory variable
  • ϵ = the model's error term (also known as the residuals)
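The coefficients in this formula are typically estimated by ordinary least squares. A minimal sketch with NumPy on simulated data (the coefficient values and data here are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))             # two explanatory variables x_i1, x_i2
beta_true = np.array([1.5, -2.0, 0.7])  # beta_0, beta_1, beta_2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

# Add a column of ones for the intercept, then solve the least-squares problem
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # estimates close to [1.5, -2.0, 0.7]
```

With enough observations and modest noise, the estimated coefficients recover the values used to generate the data.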

What Multiple Linear Regression Can Tell You

Simple linear regression is a function that allows an analyst or statistician to make predictions about one variable based on the information that is known about another variable. Simple linear regression is used when one has two continuous variables—an independent variable and a dependent variable. The independent variable is the parameter used to predict the dependent variable, or outcome. A multiple regression model extends this to several explanatory variables.

The multiple regression model is based on the following assumptions:

  • There is a linear relationship between the dependent variable and the independent variables
  • The independent variables are not too highly correlated with each other
  • The yi observations are selected independently and randomly from the population
  • Residuals should be normally distributed with a mean of 0 and constant variance σ²

The coefficient of determination (R-squared) is a statistical metric that is used to measure how much of the variation in outcome can be explained by the variation in the independent variables. R2 always increases as more predictors are added to the MLR model, even though the predictors may not be related to the outcome variable.

Thus, R2 by itself cannot be used to identify which predictors should be included in a model and which should be excluded. R2 can only be between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of the independent variables and 1 indicates that the outcome can be predicted without error from the independent variables.
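The point that R2 never falls when a predictor is added can be checked directly. A sketch with NumPy, fitting the same simulated outcome with and without an extra predictor that is pure noise:

```python
import numpy as np

def r_squared(X, y):
    """Fit OLS with an intercept and return the coefficient of determination."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
noise_pred = rng.normal(size=100)  # a predictor unrelated to the outcome
y = 2.0 * x1 + rng.normal(size=100)

r2_small = r_squared(x1[:, None], y)                       # one predictor
r2_big = r_squared(np.column_stack([x1, noise_pred]), y)   # plus the noise column
print(r2_small <= r2_big)  # True: adding a predictor never lowers R-squared
```

This is why adjusted R2, which penalizes extra predictors, is often preferred when comparing models.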

When interpreting the results of multiple regression, beta coefficients are valid while holding all other variables constant ("all else equal"). The output from a multiple regression can be displayed horizontally as an equation, or vertically in table form.

Example of How to Use Multiple Linear Regression

As an example, an analyst may want to know how the movement of the market affects the price of ExxonMobil (XOM). In this case, their linear equation will have the value of the S&P 500 index as the independent variable, or predictor, and the price of XOM as the dependent variable.

In reality, multiple factors predict the outcome of an event. The price movement of ExxonMobil, for example, depends on more than just the performance of the overall market. Other predictors such as the price of oil, interest rates, and the price movement of oil futures can affect the price of XOM and stock prices of other oil companies. To understand a relationship in which more than two variables are present, multiple linear regression is used.

Multiple linear regression (MLR) is used to determine a mathematical relationship among several random variables. In other terms, MLR examines how multiple independent variables are related to one dependent variable. Once each of the independent factors has been determined to predict the dependent variable, the information on the multiple variables can be used to create an accurate prediction on the level of effect they have on the outcome variable. The model creates a relationship in the form of a straight line (linear) that best approximates all the individual data points.

Referring to the MLR equation above, in our example:

  • yi = dependent variable—the price of XOM
  • xi1 = interest rates
  • xi2 = oil price
  • xi3 = value of S&P 500 index
  • xi4= price of oil futures
  • β0 = y-intercept (constant term)
  • β1 = regression coefficient that measures a unit change in the dependent variable when xi1 changes—the change in XOM price when interest rates change
  • β2 = coefficient that measures a unit change in the dependent variable when xi2 changes—the change in XOM price when oil prices change

The least-squares estimates—β0, β1, β2, ..., βp—are usually computed by statistical software. Many variables can be included in the regression model, with each independent variable distinguished by a number: 1, 2, 3, ... p. The multiple regression model allows an analyst to predict an outcome based on information provided on multiple explanatory variables.

Still, the model is not always perfectly accurate, as each data point can differ slightly from the outcome predicted by the model. The residual value, ϵ, which is the difference between the actual outcome and the predicted outcome, is included in the model to account for such slight variations.
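One useful property: when the model includes an intercept, the OLS residuals sum to (numerically) zero. A quick check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Design matrix with an intercept column plus three predictors
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(size=50)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat  # actual minus predicted, one per observation
print(abs(residuals.sum()) < 1e-9)  # True: residuals sum to ~0 with an intercept
```

Each residual is one data point's gap between actual and predicted outcome; the intercept absorbs any systematic offset, which is why they balance out.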

Assume we run our XOM price regression model through statistical software, which returns this output:


Image by Sabrina Jiang © Investopedia 2020

An analyst would interpret this output to mean if other variables are held constant, the price of XOM will increase by 7.8% if the price of oil in the markets increases by 1%. The model also shows that the price of XOM will decrease by 1.5% following a 1% rise in interest rates. R2 indicates that 86.5% of the variations in the stock price of Exxon Mobil can be explained by changes in the interest rate, oil price, oil futures, and S&P 500 index.
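Read this way, each coefficient acts as a multiplier on a change in its predictor, holding the other predictors constant. A tiny sketch of that "all else equal" interpretation, using only the two effects quoted above (the model form behind those percentages is assumed, not shown in the output):

```python
# Effects quoted in the interpretation above, read "all else equal":
# percent change in XOM price per 1% change in the predictor
effects = {"oil_price": 7.8, "interest_rate": -1.5}

def expected_pct_change_in_xom(var, pct_change):
    """Scale the quoted effect, holding all other predictors constant."""
    return effects[var] * pct_change

print(expected_pct_change_in_xom("oil_price", 1))      # 7.8
print(expected_pct_change_in_xom("interest_rate", 2))  # -3.0
```

The key caveat is the ceteris paribus condition: each effect is valid only when the other predictors are held fixed.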

The Difference Between Linear and Multiple Regression

Ordinary least squares (OLS) regression compares the response of a dependent variable given a change in some explanatory variables. However, a dependent variable is rarely explained by only one variable. In this case, an analyst uses multiple regression, which attempts to explain a dependent variable using more than one independent variable. Multiple regressions can be linear and nonlinear.

Multiple regressions are based on the assumption that there is a linear relationship between the dependent variable and the independent variables. They also assume no major correlation among the independent variables.

What Makes a Multiple Regression Multiple?

A multiple regression considers the effect of more than one explanatory variable on some outcome of interest. It evaluates the relative effect of these explanatory, or independent, variables on the dependent variable when holding all the other variables in the model constant.

Why Would One Use a Multiple Regression Over a Simple OLS Regression?

A dependent variable is rarely explained by only one variable. In such cases, an analyst uses multiple regression, which attempts to explain a dependent variable using more than one independent variable. The model, however, assumes that there are no major correlations between the independent variables.

Can I Do a Multiple Regression by Hand?

It's unlikely, as multiple regression models are complex and become even more so when more variables are included or when the amount of data to analyze grows. To run a multiple regression you will likely need to use specialized statistical software or functions within programs such as Excel.

What Does It Mean for a Multiple Regression to Be Linear?

In multiple linear regression, the model calculates the line of best fit that minimizes the sum of squared differences between the observed values of the dependent variable and the values predicted from the explanatory variables. Because it fits a line, it is a linear model. There are also non-linear regression models involving multiple variables, such as logistic regression, quadratic regression, and probit models.

How Are Multiple Regression Models Used in Finance?

Any econometric model that looks at more than one variable may be a multiple regression. Factor models compare two or more factors to analyze relationships between variables and the resulting performance. The Fama and French Three-Factor Model is one such model: it expands on the capital asset pricing model (CAPM) by adding size risk and value risk factors to the market risk factor in CAPM (which is itself a regression model). By including these two additional factors, the model adjusts for the outperforming tendency of small-cap and value stocks, which is thought to make it a better tool for evaluating manager performance.

What is regression give a real life example?

Formulating a regression analysis helps you predict the effects of the independent variable on the dependent one. Example: a child's age and height can be described using a linear regression model. Since height increases as age increases, they have an approximately linear relationship.

What can multiple regression be used for?

Regression allows you to estimate how a dependent variable changes as the independent variable(s) change. Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable.

In what scenarios would using multiple regression be appropriate?

Use multiple regression when you have three or more measurement variables. One of the measurement variables is the dependent (Y) variable. The rest of the variables are the independent (X) variables; you think they may have an effect on the dependent variable.

What would the regression equation be used for in the real world?

That trend (growing three inches a year) can be modeled with a regression equation. In fact, most things in the real world (from gas prices to hurricanes) can be modeled with some kind of equation; it allows us to predict future events.