Chapter 13: Regression Analysis
Definition
In economics, regression analysis is used to estimate quantitative functional relationships between dependent variables and one or more independent causal variables from actual data (experimental, time series, or cross-sectional) when the relationship among the variables is statistical in nature rather than exact. A statistical relationship means that the dependent variable's observed values are generated by a probability distribution that is a function of other causal variables.
The values of economic variables are determined by the behavior of people, and hence these variables are stochastic. Empirical investigation of the relationships among them requires the tools of statistical inference, including regression analysis. This is true whether the purpose is to forecast future sales or the performance of an economic system, or to predict the impact of an innovation or a government regulation.
For example, an economist might wish to estimate the relationship between the quarterly sales of a product in a given geographical area and the total personal disposable income earned per quarter by all individuals living in that area.
If this relationship is assumed to be linear, the hypothesis is that
Y = a + bX + u    (13.1)
where Y = quarterly sales, X = total personal disposable income, and u = error term. The expected value of quarterly sales for a given X is a + bX.
This relationship is represented graphically by the upward sloping line in Figure 13.1. The data used to estimate the parameters of this relationship consist of paired observations of X and Y, Xi and Yi for n quarters i = 1, ..., n, which are represented in Figure 13.1 by the points plotted around the line.
The errors, u, in equation 13.1 above consist of differences between the actual observed values of Y and the expected or average values of Y determined by the linear relationship with X. They are represented graphically in Figure 13.1 by the vertical distances between each point (representing an X, Y observation) and the line (representing the relationship between X and the expected value of Y).
There are a number of reasons why these errors arise:
- Measurement: The sales figures may have been inaccurately recorded.
- Causal factors left out of account: Sales may have been affected by changes in prices or other variables influencing consumer purchase decisions that have not been included in the hypothesized relationship.
- Random behavior of people: People do not always behave the same way each time they confront the same circumstances.
- Misspecification: The functional form of the relationship may have been incorrectly specified.
Least squares regression is a means of estimating the parameters of the equation hypothesizing Y as a function of X. Graphically, it is a means of fitting a line to the scatter of paired observations of X and Y in Figure 13.1. It involves choosing â and b̂, estimators of the true parameters a and b, so as to minimize the sum of the squared differences between the actual values of Y and the estimates of Y given by the regression equation. These differences are the estimated values of the errors, ei for i = 1, ..., n. The least squares estimators â and b̂ minimize
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{a} - \hat{b} X_i \right)^2$$
If the errors are random with a zero mean, that is E(u) = 0, have constant variance, and are uncorrelated with one another and with X, the estimators â and b̂ of the parameters a and b of the true relationship obtained through least squares regression are statistically best in the following sense: among estimators that are unbiased and linear in the observed values of Y, they have the smallest variance, and so are closer on average to the true parameters. This holds regardless of the number of observations.
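As a minimal sketch of the least squares calculation for equation 13.1, the estimators can be computed from the standard closed-form solution, b̂ = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and â = Ȳ - b̂X̄, where X̄ and Ȳ are the sample means. The income and sales figures below are invented purely for illustration, and the function names `fit_line` and `residuals` are hypothetical.

```python
def fit_line(x, y):
    """Return (a_hat, b_hat) that minimize the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("X must vary across observations")
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def residuals(x, y, a_hat, b_hat):
    """Estimated errors e_i: actual Y minus the fitted value of Y."""
    return [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]


# Hypothetical quarterly data, invented for illustration only.
income = [10.0, 12.0, 14.0, 16.0, 18.0]  # X: total disposable income
sales = [25.0, 31.0, 33.0, 40.0, 41.0]   # Y: quarterly sales

a_hat, b_hat = fit_line(income, sales)
e = residuals(income, sales, a_hat, b_hat)
sse = sum(ei ** 2 for ei in e)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}, SSE = {sse:.3f}")
```

Because the fitted line includes an intercept, the residuals sum to zero (up to rounding), and moving either estimate away from the least squares value can only raise the sum of squared residuals.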
In the example cited above there is a one-way causal relationship between X and Y. Income affects sales but not vice versa. Frequently in economics a two-way causal relationship exists between variables, as is illustrated in the section on Econometric Modeling. In that case the methods of simultaneous estimation described in that section are required to obtain unbiased parameter estimates.
Multiple Regression
Where two or more independent variables affect the dependent variable, it is important to include them in the regression equation.
If the true relationship is
Yi = a + b1 Xi1 + b2 Xi2 + ui
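the regression equation is
Yi = â + b̂1 Xi1 + b̂2 Xi2 + ei
where ei is the estimated error for observation i. The least squares estimators are found by minimizing the sum of the squared ei with respect to â, b̂1, and b̂2. Setting the three partial derivatives equal to zero gives three simultaneous "normal" equations, which can in turn be solved to get â, b̂1, and b̂2.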
Note that b̂1 is the best estimate of the effect on Y of changes in X1 when X2 is constant, and b̂2 is the best estimate of the effect on Y of changes in X2 when X1 is constant.
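The three normal equations for the two-variable case can be set up and solved directly. The sketch below is one possible way to do so in plain Python, using Gaussian elimination; the function names (`dot`, `solve_linear`, `fit_two_regressors`) are hypothetical, and the income, price, and sales figures are invented for illustration, with price standing in as an assumed second causal variable.

```python
def dot(u, v):
    """Sum of elementwise products of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_two_regressors(x1, x2, y):
    """Return [a_hat, b1_hat, b2_hat] from the three normal equations."""
    columns = [[1.0] * len(y), list(x1), list(x2)]  # constant, X1, X2
    lhs = [[dot(ci, cj) for cj in columns] for ci in columns]
    rhs = [dot(ci, y) for ci in columns]
    return solve_linear(lhs, rhs)


# Hypothetical quarterly data, invented for illustration only.
income = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # X1
price = [5.0, 4.0, 6.0, 3.0, 5.0, 2.0]         # X2 (assumed second variable)
sales = [24.0, 31.0, 30.0, 42.0, 41.0, 52.0]   # Y

a_hat, b1_hat, b2_hat = fit_two_regressors(income, price, sales)
resid = [y - (a_hat + b1_hat * x1 + b2_hat * x2)
         for x1, x2, y in zip(income, price, sales)]
print(f"a_hat = {a_hat:.3f}, b1_hat = {b1_hat:.3f}, b2_hat = {b2_hat:.3f}")
```

The normal equations amount to requiring that the residuals sum to zero and are uncorrelated in the sample with each included variable, which gives a quick check on any computed solution.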
All important causal variables must be included in the equation. If some are omitted and they are correlated with those included, the least squares estimators will be biased.
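The effect of omitting a correlated variable can be illustrated with a small constructed example. In the sketch below the "true" parameters and all data values are assumptions chosen for illustration, and the errors are set to zero so that any gap between the estimated and true coefficient on X1 comes only from leaving X2 out. The function names `slope` and `make_y` are hypothetical.

```python
def slope(x, y):
    """Least squares slope from regressing y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Assumed "true" parameters of Y = a + b1*X1 + b2*X2 (illustration only).
A, B1, B2 = 1.0, 2.0, 3.0

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_correlated = [2.0, 3.0, 5.0, 4.0, 6.0]    # tends to rise with x1
x2_uncorrelated = [1.0, 0.0, 4.0, 0.0, 1.0]  # zero sample covariance with x1


def make_y(x2):
    """Generate Y from the assumed true relationship, with zero errors."""
    return [A + B1 * u + B2 * v for u, v in zip(x1, x2)]


# Regress Y on X1 alone, omitting X2 in both cases.
slope_corr = slope(x1, make_y(x2_correlated))
slope_uncorr = slope(x1, make_y(x2_uncorrelated))

# Standard omitted-variable result: true b1 plus b2 times the slope
# from regressing the omitted X2 on the included X1.
implied = B1 + B2 * slope(x1, x2_correlated)

print(f"true b1 = {B1}")
print(f"estimate with correlated X2 omitted   = {slope_corr:.3f}")
print(f"estimate with uncorrelated X2 omitted = {slope_uncorr:.3f}")
print(f"implied by omitted-variable formula   = {implied:.3f}")
```

When the omitted variable is uncorrelated with X1, the estimated coefficient on X1 matches the assumed true value; when it is correlated, the estimate absorbs part of the omitted variable's effect.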
Maintained by webmaster@wpi.edu
Last modified: November 07, 2006 12:52:32
