It is assumed that the user is sufficiently knowledgeable about the fundamentals of regression analysis. The general bivariate linear regression equation takes the form of whereis the intercept,is the slope, and ε is the error term. It is bivariate as there are only two variables, a Y or dependent variable, and an X or independent variable, where X is also known as the regressor (sometimes a bivariate regression is also known as a univariate regression as there is only a single independent variable X ). The dependent variable is named as such as it depends on the independent variable, for example, sales revenue depends on the amount of marketing costs expended on a product’s advertising and promotion, making the dependent variable sales and the independent variable marketing costs. An example of a bivariate regression is seen as simply inserting the best-fitting line through a set of data points in a two-dimensional plane, as seen on the left panel in Figure 11.6. In other cases, a multivariate regression can be performed, where we have multiple k number of independent X variables or regressors, where the general regression equation will now take the form ofIn this case, the best-fitting line will be within a k+1 dimensional plane.
Figure 11.6: Bivariate Regression
However, fitting a line through a set of data points in a scatter plot as in Figure 11.6 may result in numerous possible lines. The best-fitting line is defined as the single unique line that minimizes the total vertical errors, that is, the sum of the absolute distances between the actual data pointsand the estimated lineas shown on the right panel of Figure 11.6. To find the best-fitting unique line that minimizes the errors, a more sophisticated approach is applied, using regression analysis. Regression analysis finds the unique best-fitting line by requiring that the total errors be minimized, or by calculating
where only one unique line minimizes this sum of squared errors. The errors (vertical distances between the actual data and the predicted line) are squared to avoid the negative errors from canceling out the positive errors. Solving this minimization problem with respect to the slope and intercept requires calculating first derivatives and setting them equal to zero:
which yields the bivariate regression’s least-squares equations:
For multivariate regression, the analogy is expanded to account for multiple independent variables, whereand the estimated slopes can be calculated by:
In running multivariate regressions, great care must be taken to set up and interpret the results. For instance, a good understanding of econometric modeling is required (e.g., identifying regression pitfalls such as structural breaks, multicollinearity, heteroskedasticity, autocorrelation, specification tests, nonlinearities, and so forth) before a proper model can be constructed.
- Start Excel and type in or open your existing dataset (the illustration below uses Risk Simulator | Example Models | 09 Multiple Regression in the examples folder).
- Check to make sure that the data are arranged in columns and select the data including the variable headings, and click on Risk Simulator | Forecasting | Multiple Regression.
- Select the dependent variable and check the relevant options (lags, stepwise regression, nonlinear regression, and so forth) and click OK (Figure 11.7).
Figure 11.8 illustrates a sample multivariate regression result report generated. The report comes complete with all the regression results, analysis of variance results, fitted chart, and hypothesis test results. See the next chapter for technical details on interpreting the results from a regression analysis.