MODULE 6: MULTIPLE REGRESSION

CONTENTS

1. Introduction
2. Matrix Notation
3. Forecasting
4. Data Problems
   4.1 Multicollinearity
   4.2 Measurement Errors
   4.3 Outliers and Undue Influence
5. The F Test for Linear Restrictions
   5.1 The Redundant Variables Test
   5.2 The Linear Restrictions or Wald Test
   5.3 The Chow Test
6. Dummy Variables
   6.1 Introduction
   6.2 Generating Dummy Variables and Time Trends in Eviews
   6.3 Interpreting the Coefficients of Dummy Variables
   6.4 Comparing Regression Models
   6.5 An Event Study
7. Financial Applications
   7.1 Testing the CAPM
       7.1.1 The Fama-MacBeth Approach
       7.1.2 The Black-Jensen-Scholes Approach
   7.2 The Predictability of Share Returns
   7.3 Using the APT Model
       7.3.1 Introduction
       7.3.2 Estimating and Testing the APT Model
   7.4 Testing Market Timing and Stock-Selection Performance

REFERENCES

Gujarati, Ch 7, 8, 9 and 10
Johnston and DiNardo, Ch 3
Thomas, Ch 7 and 9
Wooldridge, Ch 4.4-4.5, 7.2-7.3, 9.4, Appendices D and E

Other Useful References

Z.Y. Bello and V. Janjigian (1997) A Reexamination of the Market-Timing and Security-Selection Performance of Mutual Funds, Financial Analysts Journal, September-October, p 24-30.
E.R. Berndt (1991) The Practice of Econometrics: Classic and Contemporary, Ch 2.
W. Daniel (1990) Applied Nonparametric Statistics, Duxbury, Boston, p 63-66.
M. Grinblatt and S. Titman (1998) Financial Markets and Corporate Strategy, Chs 5-6.
M. Kritzman (1994) What Practitioners Need to Know about Serial Dependence, Financial Analysts Journal, March-April, p 19-22.

1. INTRODUCTION

When there are two or more independent variables we have what is called a multiple regression model. Where there are k = 2 independent variables we write this model either as

$$E(Y \mid X_{1i}, X_{2i}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$

or

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (i = 1, \dots, n)$$

The coefficients $\beta_1$ and $\beta_2$ in this model are what we call partial derivatives.
This means that a coefficient such as $\beta_1$ tells us the impact on Y of a unit change in $X_1$ when the values of all other independent variables are held constant.

Once we have two or more independent variables the formulae used can become very large and very complicated. It is possible to write these formulae more concisely if we use the matrix notation discussed in section 2. The formulae needed when forecasting with this type of model are discussed in section 3.

There are many situations in which the data we have to work with lead to problems with these estimators, so that they no longer possess the desired properties. The key data problems that we need to take into consideration are discussed in section 4.

When we estimate a multiple regression model the estimates of the coefficients are not independent if the independent variables are related to each other. Because of this the t statistics are not always reliable, so we will often use F statistics instead. We will look at using F statistics for testing groups of variables and more complicated restrictions on the coefficients in section 5.

There are many situations in which our Y variable is affected by factors for which we do not possess a set of numerical values. Here we have to create what are called dummy variables. We will examine the way in which the coefficients are interpreted and how these variables are used in section 6.

In section 7 we examine some financial applications that use the multiple regression model. These include various approaches to testing the CAPM, the APT model and evaluating fund managers.

2. MATRIX NOTATION

If there are two or more independent variables and a sample of n observations then we use the following matrices, or blocks of values, when we write our model and the corresponding formulae.
With k = 2 independent variables we use the matrices:

$$
y_{(n \times 1)} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
\hat{y}_{(n \times 1)} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix}, \qquad
X_{(n \times 3)} = \begin{bmatrix} 1 & X_{11} & X_{21} \\ 1 & X_{12} & X_{22} \\ \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} \end{bmatrix}, \qquad
u_{(n \times 1)} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
$$

and

$$
\beta_{(3 \times 1)} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}, \qquad
\hat{\beta}_{(3 \times 1)} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
$$

This can easily be extended to any number of variables just by adding extra columns to the X matrix and extra rows to the $\beta$ and $\hat{\beta}$ vectors.

The PRF can be written in matrix form as

$$y = X\beta + u$$

and the SRF can be written in matrix form as

$$\hat{y} = X\hat{\beta}$$

The formula for the OLS estimated coefficients is as follows:

$$\hat{\beta} = (X'X)^{-1} X'y$$

Details of the derivation are given in the appendix to this module. You can show that the matrix formula with one independent variable gives the same formulae for $\hat{\beta}_0$ and $\hat{\beta}_1$ that we found in the univariate regression module. You should compare this formula with the formula for the slope in the simple linear regression model, where

$$\hat{\beta}_1 = \frac{\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})}{\sum_{t=1}^{n} (X_t - \bar{X})^2} = \frac{\sum_{t=1}^{n} x_t y_t}{\sum_{t=1}^{n} x_t^2}$$

In the matrix formula the expression $X'$ is called the transpose of the matrix X. We obtain the transpose of any matrix by taking the original matrix and turning it on its side so that all the columns become rows.

The matrix product $X'X$ plays the same role as the expression $\sum_{t=1}^{n} (X_t - \bar{X})^2$ which appears in the denominator: it collects the sums of squares and cross products of all the independent variables, and multiplying by its inverse $(X'X)^{-1}$ corresponds to dividing by that sum. The other matrix product $X'y$ plays the same role as the term $\sum_{t=1}^{n} (X_t - \bar{X})(Y_t - \bar{Y})$ which appears in the numerator: it collects the sums of the products of each independent variable with the dependent variable.

When we wish to test whether any individual coefficient is significantly different from 0 we use the t statistic.
In the simple linear regression model this t statistic is equal to the ratio of the estimated coefficient to the estimated standard deviation of the possible values of this estimated coefficient. The degrees of freedom of this t statistic will be n - 2 because we use 2 estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, when we obtain the estimate of the variance of the error terms used in this formula. We write this formula in the following way:

$$t_{n-2} = \frac{\hat{\beta}_1}{\sqrt{\hat{V}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{\hat{\beta}_1}{\hat{\sigma} \Big/ \sqrt{\sum_{i=1}^{n} x_i^2}}$$

where $x_i = X_i - \bar{X}$.

With the multiple regression model we have to obtain estimates $\hat{\beta}_0$ of the constant term and $\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_k$ of the coefficients of the k independent variables, and the formula for the t statistic is similar to the above expression. Because we have (k + 1) estimates the degrees of freedom will be (n - k - 1). The variance of the possible values of the estimate $\hat{\beta}_i$ of the coefficient of independent variable i is found from what is called the variance-covariance matrix. This is the product of the estimated variance of the error terms and the $(X'X)^{-1}$ matrix, which we write in the following way:

$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$$

This is a square matrix with (k + 1) rows and columns. The variances of the possible values of all the estimated coefficients are found on the main diagonal of this matrix. For $\hat{\beta}_i$ the variance is the (i + 1)th term down the main diagonal, because the first term belongs to the constant $\hat{\beta}_0$. The off-diagonal terms contain the covariances between the estimated values of different coefficients $\hat{\beta}_i$ and $\hat{\beta}_j$.

To obtain the coefficient of determination $R^2$ or the estimated variance of the error terms $\hat{\sigma}^2$ we use our different measures of variation.
Using matrix notation we write these formulae in the following ways:

$$\text{Total Variation or TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = y'y - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2$$

$$\text{Explained Variation or ESS} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}'X'y - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2$$

$$\text{Residual Variation or RSS} = TSS - ESS = y'y - \hat{\beta}'X'y$$

The formula for the estimated variance of the error terms is now written as:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - k - 1} = \frac{y'y - \hat{\beta}'X'y}{n - k - 1}$$

The formula for the coefficient of determination $R^2$ will be written in the following way:

$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Residual Variation}}{\text{Total Variation}} = 1 - \frac{y'y - \hat{\beta}'X'y}{y'y - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2}$$

The formula for the F statistic, which is used in the ANOVA procedure, can also be written using matrix notation. Here we now have:

$$F_{k,\,n-k-1} = \frac{\text{(Explained + Residual) Variance}}{\text{Residual Variance}} = \frac{ESS / k}{RSS / (n - k - 1)} = \frac{\left[\hat{\beta}'X'y - \frac{1}{n}\left(\sum_{i=1}^{n} Y_i\right)^2\right] / k}{\left[y'y - \hat{\beta}'X'y\right] / (n - k - 1)}$$

3. FORECASTING

If we have a model in which there are two independent variables then our PRF is:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (i = 1, \dots, n)$$

Suppose that we have estimated the coefficients and obtained the following SRF:

$$\hat{Y}_i = 10 + 3X_{1i} + 2X_{2i} \qquad (i = 1, \dots, n)$$

We want to obtain forecasts of E(Y|X) and Y when $X_1$ and $X_2$ are given the following values: $X_{10} = 4$ and $X_{20} = 7$.

To write the formulae for forecasts in matrix notation we first define $X_0'$. This is the row vector which contains a 1 for the constant and the values of $X_1$ and $X_2$ for which we want to produce a forecast. In our example

$$X_0' = \begin{bmatrix} 1 & 4 & 7 \end{bmatrix}$$

The point estimate of both the conditional mean $E(Y \mid X_{1i}, X_{2i})$ and the actual value Y is the value of the SRF for these values of the independent variables.
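As a quick numerical check, the point forecast for this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients and X values are those quoted in the example, and the helper name `point_forecast` is hypothetical rather than part of any package.

```python
# Estimated SRF coefficients from the example: constant, X1, X2.
beta_hat = [10.0, 3.0, 2.0]

# Row vector X0': a 1 for the constant, then the chosen values of X1 and X2.
x0 = [1.0, 4.0, 7.0]


def point_forecast(x0, beta_hat):
    """Return the inner product X0' * beta_hat (hypothetical helper)."""
    if len(x0) != len(beta_hat):
        raise ValueError("x0 and beta_hat must have the same length")
    return sum(x * b for x, b in zip(x0, beta_hat))


y0_hat = point_forecast(x0, beta_hat)
print(y0_hat)  # 36.0
```

The same function works for any number of independent variables, provided the first element of `x0` is the 1 that multiplies the constant.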
The formula we use when we are working with matrices is:

$$\hat{Y}_0 = X_0'\hat{\beta} = \begin{bmatrix} 1 & 4 & 7 \end{bmatrix} \begin{bmatrix} 10 \\ 3 \\ 2 \end{bmatrix} = (1 \times 10) + (4 \times 3) + (7 \times 2) = 36$$

To obtain an appropriate expression for the variance of the $\hat{Y}$ values in different samples, we note that $\hat{Y}$ depends on the values of 3 estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. The estimated variance of $\hat{Y}$ is a function of the estimated variances and covariances of $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$, and these values are stored in the estimated variance-covariance matrix:

$$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}$$

It is possible to show that when we wish to forecast the value of the conditional mean of the Y values, $E(Y \mid X_0')$, for the set of X values in the vector $X_0'$, the variance of the possible $\hat{Y}$ values is given by the following formula:

$$\hat{V}(\hat{Y}_0 \mid X_0') = \hat{\sigma}^2 X_0'(X'X)^{-1}X_0$$

The formula for the interval estimate is:

$$\hat{Y}_0 \pm t_{(n-k-1)} \left[\hat{\sigma}^2 X_0'(X'X)^{-1}X_0\right]^{0.5}$$

When we wish to forecast individual Y values rather than the conditional mean at this set of X values, the formula for the variance of these values is:

$$\hat{V}(\hat{Y}_0 \mid X_0') = \hat{\sigma}^2 \left[1 + X_0'(X'X)^{-1}X_0\right]$$

Our interval estimate will be:

$$\hat{Y}_0 \pm t_{(n-k-1)} \left[\hat{\sigma}^2 \left(1 + X_0'(X'X)^{-1}X_0\right)\right]^{0.5}$$

If we have a reasonably sized sample (say 50 or more) we can use the following approximate, but still reasonably accurate, 95% prediction interval. The term $X_0'(X'X)^{-1}X_0$ will be of the size 1/n and so can be neglected. The value $t_{(n-k-1)} \approx 1.96 \approx 2$. Using these approximations we have

$$\hat{Y}_0 \pm t_{(n-k-1)} \left[\hat{\sigma}^2 \left(1 + X_0'(X'X)^{-1}X_0\right)\right]^{0.5} \approx \hat{Y}_0 \pm 2\hat{\sigma}$$

If you need to use the matrix formulae for a small sample, or for the confidence interval for a mean value, this is not particularly easy in Eviews. You would need to use the underlying programming language. In such a case it is easier to copy and paste the covariance matrix into Excel and use the matrix multiplication commands to obtain the required interval.

4. DATA PROBLEMS

Besides the usual assumptions about the error terms we also make three further assumptions about the data we are using, namely that:

1. There is no multicollinearity, by which we mean that if we have three independent variables $X_1$, $X_2$ and $X_3$ then these variables are related to Y but they are not related to each other.

2. The individual observations or $X_{ij}$ terms in our sample do not exert undue influence on the estimated values of the $\beta_i$'s. This means that the values of our estimates will not change dramatically when we change the sample we are using by adding or taking away a particular case.

3. There are no measurement errors in the values of either the dependent variable or the independent variables.

We will now look at the impact on our estimates when these assumptions do not hold and how we can deal with these problems.

4.1 Multicollinearity

There are two types of multicollinearity that can occur, namely exact multicollinearity and close but not exact multicollinearity.

(a) Suppose that there is an exact linear relationship between two or more variables, for example one variable is the sum of or the difference between two of the other variables, with $X_3 = X_2 - X_1$. Then we cannot obtain estimates of the $\beta_j$'s. Indeed in this situation our usual approach to interpreting the value of $\beta_1$ makes little or no sense. We originally said that $\beta_1$ shows the impact on Y of a unit change in $X_1$ when all other X variables are constant. If there is exact multicollinearity, then unit changes in $X_1$ will always be accompanied by changes in the values of other X variables such as $X_3$. This means we can never be sure whether the change in Y was caused by the change in $X_1$ or the change in $X_3$.

For those of you who are familiar with matrix algebra, when there is an exact relationship between X variables such as $X_1$ and $X_3$ the determinant of $(X'X)$ is 0 and the elements of its inverse $(X'X)^{-1}$ are infinite because they are all equal to some finite value divided by 0.
This means that we cannot find the variance of the possible values of the estimates, which in turn means we cannot find interval estimates of the $\beta$s or conduct t tests of possible values.

Most computer packages will tell you that it is not possible to obtain estimates of the coefficients because you have a singular matrix, which means we cannot find $(X'X)^{-1}$.

(b) In most practical situations there is a close but not exact relationship between the X variables in the model. Our computer package will now obtain the estimates $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$ of the constant and the coefficients of the independent variables in the model.

This type of situation is more difficult to deal with than exact multicollinearity because now our package does not automatically tell us that we have a problem. We have to find a way of determining whether our X variables are closely related and whether this will have a significant impact on how useful our model is.

The reason why we need to find out whether there is a close but inexact relationship between the X variables is as follows. In this situation the determinant of $(X'X)$ is no longer 0 but it will be equal to some small value which is close to 0. When this happens small changes in the data can cause large proportional changes in the determinant. Here the inverse matrix can be found, but the elements of the inverse or $(X'X)^{-1}$ matrix are very large, as are the variances of $\hat{\beta}_1$, $\hat{\beta}_2$ and $\hat{\beta}_3$.

When these estimates have very large variances it means that their values vary widely from one sample to another. So when a new set of observations becomes available at the end of the month or the end of the quarter and we use these new values to re-estimate our model, we may obtain estimates that differ significantly from the ones we are now using. Where these estimates are changing from one period to the next we would be foolish to use the model to analyse market behaviour.
The model may still be used to forecast future values provided that we are using actual data values for the independent variables.

To help understand how we can determine whether there is close but not exact multicollinearity present in the independent variables that we are using, we will use the multiple regression model with two independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (i = 1, \dots, n)$$

In the simple linear model where there was one independent variable the variance of the possible values of $\hat{\beta}_1$, the estimate of the coefficient of $X_1$, is given by the following formula:

$$V(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}$$

When there are two independent variables in a multiple regression model it can be shown that the variance of the possible values of $\hat{\beta}_1$ is given by the following formula:

$$V(\hat{\beta}_1) = \frac{\sigma^2}{(1 - r_{12}^2) \sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}$$

In this formula the $r_{12}$ term is the correlation coefficient between the two independent variables $X_1$ and $X_2$. The square of this value, $r_{12}^2$, is the coefficient of determination or $R^2$ value when we have a model in which $X_1$ is the dependent variable and $X_2$ is the independent variable.

From this formula we can see that the variance $V(\hat{\beta}_1)$ will have a very large value when the value of $(1 - r_{12}^2)$ is very small. This occurs when $r_{12}^2$ is close to 1, which is the same as saying there is close but not exact multicollinearity.

If there are more than two independent variables in the model then the variance of the possible values of $\hat{\beta}_1$ is given by the following formula:

$$V(\hat{\beta}_1) = \frac{\sigma^2}{(1 - R_{1\cdot}^2) \sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2} = VIF \cdot \frac{\sigma^2}{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}$$

The variance inflation factor or VIF is defined in the following way:

$$VIF = \frac{1}{1 - R_{1\cdot}^2}$$

where the $R_{1\cdot}^2$ term in this expression is the coefficient of determination or $R^2$ value when we have a model in which $X_1$ is the dependent variable and there are (k - 1) independent variables, namely $X_2, X_3, \dots, X_k$, i.e. it is the $R^2$ value for the model

$$X_1 = \alpha_0 + \alpha_1 X_2 + \alpha_2 X_3 + \dots + \alpha_{k-1} X_k + v$$

(The corresponding expression for any other variable $X_j$ is $R_{j\cdot}^2$.) From this expression for $V(\hat{\beta}_1)$ we see that if multicollinearity is a problem and there are one or more $R_{j\cdot}^2$ terms which are close to 1 then we will have large values of the VIF and $V(\hat{\beta}_j)$.

The above results and the description of the consequences of close but inexact multicollinearity can be used to develop the following procedures for determining whether multicollinearity is a problem in any model we estimate.

(a) We look at our estimated coefficients to see if they are unstable. Where the variances $V(\hat{\beta}_j)$ of the possible values of the estimated coefficients are large, we will find that small changes in the existing values of the variables or the addition of new values to the sample can produce very large changes in the values of these estimates.

(b) Where there are large $V(\hat{\beta}_j)$ terms we will find that our estimates are not only unstable but will also take extreme or unusual values and can even have the opposite signs to what we would normally expect.

(c) With large $V(\hat{\beta}_j)$ values for different coefficients we may find that the t statistics indicate that the individual variables in the model do not have a significant impact. At the same time we may find that the $R^2$ for the complete model indicates that the combined impact of these supposedly insignificant independent variables is significant. In other words the presence of a large $R^2$ value along with small values for the t statistics can also indicate that multicollinearity is a problem.

(d) Large $R_{j\cdot}^2$ values indicate that multicollinearity could be a problem for a model. We say these values are large when the $R_{j\cdot}^2$ value for any variable $X_j$ exceeds the $R^2$ value for the complete model.

(e) Look at the correlation matrix between the X variables. Although this only gives the relationships between pairs of variables, high correlations almost always mean we will have problems with multicollinearity.

It should be noted that it is very difficult to solve all the problems associated with multicollinearity. The main reason is that where there are relationships between the k different X variables, these k variables do not provide k independent sources of information.

To achieve a partial resolution we could omit one or more independent variables from the model. If we omit important variables, however, this could make our estimates of the coefficients of the remaining variables biased, i.e. too high or too low, because these coefficients now also pick up the impact of the omitted variables.

Another approach is to develop alternative estimators to the OLS estimators which are biased but which have smaller variances. These estimators, however, still do not overcome the fundamental problem of accurately determining the impact of the individual X variables on Y.

Multicollinearity will make our t statistics unreliable due to the variance inflation effect. The F test and the $R^2$, which look at the total amount of variation explained by the model rather than the behaviour of the individual variables, are not affected by multicollinearity.

4.2 Measurement Errors

In any regression study we assume that it is possible to measure the values of the variables accurately. The impact of measurement errors depends upon where they occur.

(a) If there are errors in the Y values then this does not have a major impact on the quality of the estimates that we obtain with OLS. Consider the model

$$Y_i + \varepsilon_i = \beta_0 + \beta_1 X_i + u_i \quad \Rightarrow \quad Y_i = \beta_0 + \beta_1 X_i + \omega_i$$

where $\varepsilon_i$ is the measurement error and $\omega_i = u_i - \varepsilon_i$ is the combined error term. As you can see from the way the model has been rewritten, the estimates of the coefficients should be unbiased.
The estimate of the standard deviation of the model errors will be too high. If the errors in the Y values and the model errors are independent, the estimated variance will be the sum of the two variances. This means that all our test statistics will be too small, leading us to think the model is not a good fit to the data.

(b) When there are errors in the values of the X variables this has a major impact on the quality of the OLS estimates.

$$Y_i = \beta_0 + \beta_1 (X_i + \varepsilon_i) + u_i$$

The error in the X variable will affect the estimates of the coefficients. We now have estimates which are both biased and inconsistent, which means that even if we were to use very large samples the quality of our estimates would not improve. Typically the estimates of the coefficients will tend to be too small.

Measurement errors can also occur when a variable is correctly measured but is used as a proxy variable. For example, we often use indices such as the All Ordinaries as a measure, or proxy, of the market when estimating the market model. Although these indices are good indicators of what is happening to the market, they do not measure the whole of the market, and so contain measurement errors in that context.

The effects of measurement errors can sometimes be reduced by the use of instrumental variables (beyond the scope of this course, but details of this approach can be found in the Eviews manual and a detailed discussion is given in Wooldridge Chapter 15).

4.3 Outliers and Undue Influence

The second assumption says that the results for any model should not be too heavily influenced by one or two observations. Unusual observations can quite easily occur in Finance applications, particularly if we are working with returns. Large changes in share prices can produce very large changes in the returns on those shares. To see why this is a problem consider the situation where we have a sample of n = 7 values.
Case    X      Y
1       5.0    10
2       5.5    15
3       6.0     5
4       7.0    10
5       7.0    12
6       8.0     9
7      12.0    20

If you examine the diagram shown below you will see that 6 out of the 7 points lie in a circular area. For these 6 points the SRF has a slope which is negative or zero.

[Figure: The SRF for the 6 Typical Points]

When we include point 7, which is the unusual point, then using OLS we now get a SRF with a significant positive slope.

[Figure: The SRF when we include the Unusual Data Point]

From this simple numerical example we see that it is possible that our results are very dependent on a relatively small number of observations. These unusual observations are said to have undue influence over our results. This is why in any applied study you should always check whether your results are unduly affected by observations which may not be typical. In many cases the unusual observations are found to have been incorrectly entered into the data file with, for example, a decimal point in the wrong place.

There are 3 different approaches that are used to determine whether any of our observations have exercised undue influence.

(a) We can measure whether the X value of an observation has a very large impact on the value we obtain for the fitted or $\hat{Y}$ value.

(b) We can measure whether the Y value of an observation is very different by looking at the size of the residual $e = Y - \hat{Y}$, which is the difference between the actual value of Y and the estimated value of Y.

(c) We can measure the size of the change in the estimated values of the coefficients when an observation such as observation 7 is omitted. This third approach can be shown to be a combination of the first two approaches.
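The effect of observation 7 in this small sample can be checked directly by estimating the slope with and without it. The sketch below uses only the simple regression slope formula from section 2; the function name `ols_simple` is illustrative, and the (X, Y) pairs are assumed to be exactly the seven cases listed in the table.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS line Y = b0 + b1*X.

    Uses the deviation-from-mean formula: slope = Sxy / Sxx.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# The seven cases from the table (case 7 is the unusual point).
x = [5.0, 5.5, 6.0, 7.0, 7.0, 8.0, 12.0]
y = [10.0, 15.0, 5.0, 10.0, 12.0, 9.0, 20.0]

b0_six, b1_six = ols_simple(x[:6], y[:6])  # cases 1 to 6 only
b0_all, b1_all = ols_simple(x, y)          # all 7 cases

print(f"6 typical points: slope = {b1_six:.3f}")
print(f"all 7 points:     slope = {b1_all:.3f}")
```

With these values the slope is about -0.47 for the six typical points and about 1.34 once case 7 is included, so a single observation is enough to reverse the sign of the estimated relationship.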
THE IMPACT OF X VALUES

When we discuss the measure which looks at the impact of the X value we begin by noting that if we have a simple or multiple regression model then the set of estimates for all the coefficients in the model can be written in matrix form as

$$\hat{\beta} = (X'X)^{-1} X'y$$

Using matrix notation the forecast or fitted values of the dependent variable can be written as

$$\hat{y} = X\hat{\beta}$$

If we substitute the formula for $\hat{\beta}$ into this expression we obtain

$$\hat{y} = X(X'X)^{-1}X'y = Hy$$

where

$$H = X(X'X)^{-1}X'$$

is called the hat matrix. This is an n x n matrix in which we have one row and one column for each of the n sets of values for the independent variables in our sample. It is called the hat matrix because it appears in the expression which shows how the estimated or fitted values, which we call Y hat or $\hat{y}$, are related to the actual values of Y.

The n terms down the main diagonal of the H matrix, or $h_{ii}$ terms, are called the leverage values. They are called leverage values because the size of $h_{ii}$ is the main factor which determines how much influence or leverage the actual value $Y_i$ has on the fitted or estimated value $\hat{Y}_i$. It can be shown that if we have k independent variables in our model then the values of these leverage or $h_{ii}$ terms have the following properties:

$$0 \le h_{ii} \le 1 \qquad \text{and} \qquad \sum_{i=1}^{n} h_{ii} = k + 1$$

This means that the average value of the $h_{ii}$ terms is

$$\bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_{ii} = \frac{k + 1}{n}$$

Consider the second leverage term $h_{22}$. This term is associated with the second value in our sample. If we have k independent variables $X_1, X_2, X_3, \dots, X_k$ in our model then the set of second values for all the independent variables is written as follows:

$$[X_{12}, X_{22}, X_{32}, \dots, X_{k2}]$$

The set of average or mean values for all k independent variables is

$$[\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots, \bar{X}_k]$$

The second leverage value $h_{22}$ can be interpreted as the distance between the point associated with the second set of values in our sample and the point associated with the average values of all the independent variables. This implies that the further the X values are from their mean values, the more impact that set of sample values has on the estimated or fitted value of Y.

The criterion that is used to determine whether a leverage value such as $h_{22}$ is large enough to say that a sample value has X values which are unusual or outliers is twice the mean value. In other words we would say that the second value in the sample has an X value which is an outlier if

$$h_{22} \ge 2\bar{h} = \frac{2(k + 1)}{n}$$

THE IMPACT OF Y VALUES

To test whether the Y value in a period such as period 2 is an outlier we can look at the residual in this period,

$$e_2 = Y_2 - \hat{Y}_2$$

Because this value depends upon the units of measurement we look at its estimated z score, which we call the studentized residual. (This could also be called Student's t statistic for the residual.) As the mean of these values is 0 we define the studentized residual in period 2 in the following way:

$$r_2 = \frac{e_2}{\hat{\sigma}\sqrt{1 - h_{22}}}$$

(In this expression $\hat{\sigma}$ is the standard deviation of the residuals and $h_{22}$ is the leverage of the second value in our sample.) If the studentized residual exceeds 2 in absolute value this is seen as evidence that the Y value for this sample value is an outlier.

THE COMBINED IMPACT OF X AND Y VALUES

If a sample value is an outlier either because of its X or its Y value then we can determine whether this sample value has an undue influence on our estimates by using a measure called Cook's distance.

Before we define this measure we must note that in a multiple regression model the vector $\hat{\beta}$ contains the estimated values of the constant and the coefficients of the k independent variables which are found using all n values in the sample.
The vector $\hat{\beta}^{(i)}$ contains the estimated values of the constant and the coefficients of the k independent variables which are found after we have omitted observation i from the sample. This set of estimates is found using the remaining n - 1 values in the sample.

We write the formula for Cook's distance for value i in the sample in the following way:

$$D_i = \frac{(\hat{\beta} - \hat{\beta}^{(i)})'(X'X)(\hat{\beta} - \hat{\beta}^{(i)})}{(k + 1)\,\hat{\sigma}^2}$$

This measure can be interpreted as the squared standardized distance between our estimates of all the coefficients with the complete sample of n values and the estimates we obtain when we omit the i th value in the sample and use only n - 1 values.

Cook's distance can also be written as a function of the other two terms used to measure whether the separate X or Y value was an outlier. For value 2 in the sample we have

$$D_2 = \frac{1}{k + 1} \cdot \frac{h_{22}}{1 - h_{22}} \cdot r_2^2$$

Here k + 1 is the number of estimated coefficients or $\beta$ terms in the equation, and $h_{22}$ and $r_2^2$ are the leverage and the square of the studentized residual for value 2 in the sample.

The values of Cook's distance do not have exactly the same probability distribution as the values of the F statistic. We can however use the critical values from the F statistic tables to find values of $D_i$ which seem to show that value i does exert undue influence. The rules we use are based on an F statistic whose first degree of freedom is (k + 1), the number of coefficients we estimate, and whose second degree of freedom is (n - k - 1), the sample size less the number of estimates we use when finding $D_i$. The key points to remember are:

(a) If $D_i$ is less than the value of $F_{k+1,\,n-k-1}$ which gives 0.20 in the left tail and 0.80 in the right tail then value i is not an influential observation.

(b) If $D_i$ is greater than the value of $F_{k+1,\,n-k-1}$ which gives 0.50 in the left tail and 0.50 in the right tail then value i is an influential observation.
(c) If $D_i$ lies between the value of $F_{k+1,\,n-k-1}$ which gives 0.20 in the left tail and 0.80 in the right tail and the value of $F_{k+1,\,n-k-1}$ which gives 0.50 in the left tail and 0.50 in the right tail then it is uncertain whether value i is an influential observation. The closer $D_i$ is to the value of $F_{k+1,\,n-k-1}$ which gives 0.50 in the left tail and 0.50 in the right tail, the more likely it is that value i does exert undue influence on the estimates we obtain.

RECURSIVE ESTIMATION

The three techniques given above were designed to avoid the large amount of computation involved in re-estimating the model while leaving out one data point at a time and then checking what happened to the coefficients. With today's cheap, high-speed computers it is often easier to write a small program to estimate the coefficients, deleting each data point in turn, and obtain a graph of the coefficients. This allows us to directly observe the effect of each data point on the values of the coefficients.

A way of obtaining similar information is to use the recursive estimation option in Eviews. On your estimated equation select

View → Stability tests → Recursive estimation

and in the dialogue box that appears select "Recursive Coefficients". This will start with a small sample and then introduce one more data point at a time. You would expect the estimated coefficients to be slightly unstable to start with and then settle down to a steady value. Here is an example using the data file guj5i9.

[Figure: Recursive C(1) Estimates and Recursive C(2) Estimates, each shown with ± 2 S.E. bands]

It is clear that there are two highly influential data points here, points number 17 and 22.

5. THE F TEST FOR LINEAR RESTRICTIONS

5.1 The Redundant Variables Test

In the simple linear regression model the ANOVA or F test can be used to choose between the hypotheses:

$$H_0: \beta_1 = 0 \qquad \text{and} \qquad H_1: \beta_1 \ne 0$$

If you look at these two hypotheses you can see that in $H_0$ we are imposing a linear restriction, as we are saying that $\beta_1 = 0$. In $H_1$ the value that $\beta_1$ can take is not restricted in any way.

Suppose we have a multiple regression model such as the example given on pp 276-277 of Gujarati (3e) where we wish to model the Annual Sales or Demand for Telephone Cable (Y). The independent variables that we use in our demand function are

X1 = Gross National Product
X2 = Housing Starts
X3 = Unemployment Rate
X4 = Prime Rate lagged 6 months
X5 = Customer Line Gains

The yearly data from 1968 to 1983 is stored in the file GUJ832.XLS in the following order: X1, X2, X3, X4, X5, Y.

When we estimate the model which contains all 5 independent variables we obtain the following SRF:

$$\hat{Y}_t = 5962.656 + 4.883663X_{1t} + 2.363956X_{2t} - 819.1287X_{3t} + 12.01048X_{4t} - 851.3927X_{5t}$$

If we want to test whether the complete set of independent variables has a significant impact on the value of Y then our hypotheses will be

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$$

$$H_1: \text{Not all } \beta_i \text{ are equal to 0}$$

The null hypothesis can be thought of as a set of 5 linear restrictions, namely $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 0$, $\beta_4 = 0$ and $\beta_5 = 0$.

To choose between these two hypotheses we use the $F_{k,\,n-k-1}$ statistic and its p value, which are given in the bottom left-hand side of the EViews output. In this example we have

$$F_{5,10} = 9.283507 \qquad \text{and} \qquad p = 0.001615$$

This p value indicates that even for a very small level of significance such as α = 0.01 we would reject $H_0$ and conclude that this set of independent variables does have a significant impact on the level of demand (Y).

In this situation we are testing whether the complete set of independent variables has a significant combined impact on the dependent variable.
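The degrees of freedom quoted for this F statistic can be checked from the sample description, and the ANOVA formula from section 2 can be rearranged to show what $R^2$ the reported F value implies. The sketch below is an illustration only: the implied $R^2$ is derived here from $F = (R^2/k) / ((1 - R^2)/(n - k - 1))$ and is not a figure reported in the text.

```python
# Sample description from the example: yearly data, 1968 to 1983 inclusive.
n = 1983 - 1968 + 1   # 16 observations
k = 5                 # number of independent variables
f_stat = 9.283507     # overall F statistic reported in the text

# Degrees of freedom of the overall F test: k and n - k - 1.
df1 = k
df2 = n - k - 1
print(df1, df2)       # 5 10

# Rearranging F = (R2/k) / ((1 - R2)/(n - k - 1)) gives
# R2 = k*F / (k*F + n - k - 1). Derived value, not quoted in the text.
r2_implied = k * f_stat / (k * f_stat + df2)
print(round(r2_implied, 4))
```

On these figures the implied $R^2$ is roughly 0.82, which is consistent with the strong rejection of $H_0$.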
We want to develop a procedure which lets us test whether the combined impact of a subset of independent variables, X4 and X5, is significant.

(a) When we do this we will refer to the model which contains all 5 independent variables as the Unrestricted model, because in this model there are no restrictions on the values that the coefficients of the 5 independent variables can take. The Residual Sum of Squares (RSS) for this model is called the URSS to indicate that it is the RSS in the unrestricted model. The value of URSS shows what the model with all 5 variables cannot explain.

(b) The model we obtain when we impose the restrictions implied by the null hypothesis is called the restricted model, and the Residual Sum of Squares (RSS) for this model is called the RRSS to indicate that it is the RSS in the restricted model. The value of RRSS shows what the model from which variables have been excluded cannot explain.

If we want to test whether the combined impact of the independent variables X4 and X5 is significant we will now use the following hypotheses

H0 : β4 = β5 = 0
H1 : Either or both of β4 and β5 are not equal to 0

The model associated with this H0 is called the restricted model because in this model the values of the coefficients of X4 and X5 are restricted to a value of 0. If this model is correct then Y is affected by 3 independent variables with:

Yi = β0 + β1X1i + β2X2i + β3X3i + ui

When we estimate the SRF of this restricted model the sum of squared residuals RSS is now written as RRSS to indicate that it is the RSS for the Restricted model. Here the value of RRSS shows what the model with only 3 variables cannot explain.

The testing statistic that we use to choose between these hypotheses is

$$F_{m,\,n-k-1} = \frac{(RRSS - URSS)/m}{URSS/(n-k-1)}$$

To see why we use this statistic we first note that because we have an extra two independent variables in the Unrestricted model we would expect that less variation will be left unexplained than in the Restricted model.
This implies that URSS ≤ RRSS, so that their difference (RRSS - URSS) ≥ 0. In the numerator for this F statistic we have:

(RRSS - URSS)/m

To see how we interpret this expression we note that

1. The term (RRSS - URSS) shows how much more variation is left unexplained when we leave out the two variables. This is equivalent to saying it shows how much more variation we can explain when we include X4 and X5.

2. The term m represents the number of restrictions we impose in H0. When we divide the total amount of variation explained by these two variables by the number of variables we obtain the average amount of variation explained by each variable.

The F statistic can be thought of as the ratio of two different measures of variance

1. The numerator is said to measure explained variation because it shows the variation which is explained by the two excluded variables X4 and X5. It also measures unexplained or chance variation because it is based on a sample of values. Our choice of a sample, and hence the value of the numerator, can also be said to be affected by chance.

2. The denominator only measures chance variance as it is simply the estimated variance of the error terms based on the Unrestricted model with all 5 independent variables. We could write the denominator in the following way

$$\hat{\sigma}^2 = \frac{\sum_{t=1}^{n} e_t^2}{n-k-1} = \frac{URSS}{n-k-1}$$

If you did not have EViews to work with then you would have to estimate the model with all 5 independent variables and note the sum of squared residuals URSS, then estimate the model with only 3 independent variables and note the sum of squared residuals RRSS. You would then substitute these values into the formula for F. In this example there are m = 2 restrictions, n = 16 sample values and k = 5 independent variables in the unrestricted model.

F tests for redundant variables are particularly useful when we have multicollinearity. In this case the t-statistics are not reliable.
However, because the F test measures the difference in the amount of variation explained by the model with and without the variable(s), it will give a reliable test of whether the variable contributes to the model.

Doing the test in Eviews

The F Statistic and its p value which are given in the bottom of the EViews output are used to choose between the following hypotheses

H0 : β1 = β2 = β3 = ... = βk = 0
H1 : Not all βi are equal to 0

In this situation we are testing whether the combined impact of the complete set of independent variables has a significant impact on the dependent variable.

Suppose you wish to test whether the combined impact of X4 and X5 is significant. This is equivalent to choosing between the following hypotheses

H0 : β4 = β5 = 0
H1 : Either or both of β4 and β5 are not equal to 0

The model associated with H0 is called the restricted model because in this model the values of the coefficients of X4 and X5 are restricted to a value of 0. The residual sum of squares (RSS) in this model is called the RRSS. If 2 variables are excluded then we say that there are m = 2 restrictions.

The testing statistic that we use to choose between these hypotheses is

$$F_{m,\,n-k-1} = \frac{(RRSS - URSS)/m}{URSS/(n-k-1)}$$

In this expression the denominator is the estimate of the variance of the error terms based on the unrestricted model with all 5 independent variables. The numerator is the average amount of variation in the Y values that each of the two excluded variables X4 and X5 explains.

If you did not have EViews to work with then you would have to estimate the model with all 5 independent variables and note the sum of squared residuals URSS, then estimate the model with only 3 independent variables and note the sum of squared residuals RRSS. You would then substitute these values into the formula for F. In this example there are m = 2 restrictions, n = 16 sample values and k = 5 independent variables in the unrestricted model.
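The substitution step can be sketched in a few lines of Python. This is only an illustration of the formula: the function name `restriction_f` is an assumption, and the two residual sums of squares are hypothetical round numbers, not the values from the cable data. Only m, n and k are taken from the example.

```python
def restriction_f(rrss, urss, m, n, k):
    """F statistic for m linear restrictions.

    rrss: residual sum of squares of the restricted model
    urss: residual sum of squares of the unrestricted model
    n:    number of observations
    k:    number of independent variables in the unrestricted model
    """
    if urss <= 0 or rrss < urss:
        raise ValueError("expected 0 < URSS <= RRSS")
    numerator = (rrss - urss) / m       # average variation explained per restriction
    denominator = urss / (n - k - 1)    # estimated error variance, unrestricted model
    return numerator / denominator


# Hypothetical RRSS and URSS; m = 2, n = 16 and k = 5 as in the example.
f_value = restriction_f(rrss=900.0, urss=400.0, m=2, n=16, k=5)
print(f_value)
```

The check on the inputs reflects the point made earlier that dropping variables cannot reduce the unexplained variation, so RRSS should never be smaller than URSS.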
If you do have EViews to work with you would simply estimate the unrestricted model with all 5 independent variables then click

View → Coefficient Tests → Redundant Variables - Likelihood Ratio

When the Omitted-Redundant Variable Test dialog box appears you will enter the names of the variables whose joint impact you wish to test, namely X4 and X5. After you click [OK] the EViews output shown below appears.

This output consists of 4 parts. The final three parts are the standard output for the estimated restricted model which contains 3 independent variables. The information that we require is given at the top of the EViews output. It shows that our F statistic is 6.248127 and it has a p value of 0.017356.

In our example where we wish to choose between the hypotheses

H0 : β4 = β5 = 0
H1 : Either or both of β4 and β5 are not equal to 0

if we have α = 0.05 then p = 0.017356 < α and we would reject H0. Should we use a smaller level of significance such as α = 0.01 then we will have p > α and we would not reject H0.

Information needed to test whether the joint impact of X4 and X5 is significant:

Redundant Variables: X4 X5
F-statistic             6.248127    Probability    0.017356
Log likelihood ratio    12.97222    Probability    0.001524

Test Equation:
LS // Dependent Variable is Y
Sample: 1968 1983
Included observations: 16

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           195.6043       2308.317      0.084739       0.9339
X1          6.208393       1.905791      3.257646       0.0069
X2          1.495421       0.625402      2.391134       0.0341
X3          -469.7181      198.5697      -2.365507      0.0357

R-squared             0.601254     Mean dependent var       7543.125
Adjusted R-squared    0.501568     S.D. dependent var       1217.152
S.E. of regression    859.3058     Akaike info criterion    13.72457
Log likelihood        -128.4996    F-statistic              6.031455
Durbin-Watson stat    1.581264     Prob(F-statistic)        0.009557

5.2 The Linear Restrictions or Wald Test

When we test whether we should exclude one or more independent variables in our model we are testing linear restrictions such as β4 = 0 and β5 = 0. Our procedure can also be used to test other more complicated types of linear restrictions. We shall consider two common applications of this procedure, which is also called the Wald Test.

In the first application we have a cubic total cost function:

TC or Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui

In this model the dependent variable is the total cost, while the independent variables are the level of output along with the squared and the cubed values of the level of output. If we differentiate the cubic total cost function, we obtain as our marginal cost function the U-shaped quadratic function which appears in Microeconomic textbooks:

MC or Yi = β1 + 2β2Xi + 3β3Xi² + ui

Past experience with cubic Total Cost functions has led many analysts to believe that the coefficients of Xi² and Xi³ are equal. If we wish to test whether this is the case we will now have the following hypotheses:

H0 : β2 = β3 and H1 : β2 ≠ β3

We once again think of the model which is associated with the null hypothesis as the restricted model. If H0 is correct and β2 = β3, then our cubic total cost function can be written in the following way:

Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui
   = β0 + β1Xi + β2Xi² + β2Xi³ + ui
   = β0 + β1Xi + β2(Xi² + Xi³) + ui

So, if β2 = β3, our cubic function can be rewritten as a function which contains 2 rather than 3 independent variables, namely the level of output Xi and the sum of the squared and cubed levels of output (Xi² + Xi³). This function, which contains only these two independent variables, is called the restricted model. We now estimate this model and obtain the residual variation RRSS.
We then estimate the original cubic function or the unrestricted model and obtain the Residual variation URSS. These values are then substituted into our formula:

$$F_{m,\,n-k-1} = \frac{(RRSS - URSS)/m}{URSS/(n-k-1)}$$

The number of restrictions imposed is m = 1, namely that β2 = β3, and the number of independent variables k is 3, the number in the Unrestricted model.

A second common application arises when we are working with the Cobb-Douglas function. Although this function was originally used in production studies, it is now used in a wide range of finance applications, particularly in studies of the demand for money. The formula for this function is:

$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} u_i$$

This function is nonlinear in the values of the variables. If, however, we find the natural logs of both sides of this equation we obtain as our model:

ln Y = ln β0 + β1 ln X1 + β2 ln X2 + ln ui

This model is said to be linear in the logs of the values. The coefficients in this type of model now represent elasticities. For example, β1 shows the percentage change in Y when X1 changes by 1% of its current value.

Depending on the type of application, you may be asked to test various different restrictions upon the values of these coefficients, such as whether they are equal in magnitude but opposite in sign. This can be expressed in two ways, either as β1 = -β2 or as β1 + β2 = 0.

We can also be asked to test whether these two elasticities sum to 1. In this situation the hypotheses that we must choose between are:

H0 : β1 + β2 = 1
H1 : This restriction is not justified and β1 + β2 ≠ 1

Here in H0 we have m = 1 restriction, which we can write either as β1 + β2 = 1 or as β2 = 1 - β1. Once again we say that the model which is associated with H0 is the Restricted model.
We can write the log linear version of this model in the following way:

ln Yi = ln β0 + β1 ln X1i + β2 ln X2i + ln ui
      = ln β0 + β1 ln X1i + (1 - β1) ln X2i + ln ui
      = ln β0 + β1 ( ln X1i - ln X2i ) + ln X2i + ln ui

Subtracting ln X2i from both sides we obtain:

ln Yi - ln X2i = ln β0 + β1 ( ln X1i - ln X2i ) + ln ui

The difference between any two logs is equal to the log of their ratio. This means that the expression on the LHS, ln Yi - ln X2i, can be written as ln ( Yi / X2i ). Similarly the expression in brackets on the RHS, ln X1i - ln X2i, can be written as ln ( X1i / X2i ).

This means that when we impose the restriction that β1 + β2 = 1 our model can be written as a simple linear model in which the dependent variable is ln ( Yi / X2i ) and the independent variable is ln ( X1i / X2i ) with:

ln ( Yi / X2i ) = ln β0 + β1 ln ( X1i / X2i ) + ln ui

To test this restriction we estimate the original Unrestricted log linear model with 2 independent variables and obtain the residual variation URSS. We then estimate this Restricted model with just the 1 independent variable and obtain the residual variation RRSS. The number of restrictions is m = 1 and k = 2 is just the number of independent variables in the Unrestricted model. These values are now used in our general formula for F.

Doing the test in Eviews

When discussing this procedure we will use Example 7.3 on pp 214-217 of Gujarati (3e). Here we wish to model the level of real GNP (Y). The independent variables that we use in our production function are

X1 = Labour Input
X2 = Capital Input

The yearly data from 1958 to 1972 is stored in the file GUJT73.XLS in the following order: Y, X1, X2.

The model we use is called the Cobb-Douglas function, which is a power function with

$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2}$$

(Here to simplify our treatment we omit the error term.)

If there are constant returns to scale in an economy the powers or exponents of the inputs X1 and X2 should sum to 1.
When we test whether there are constant returns to scale we have the following hypotheses

H0 : β1 + β2 = 1
H1 : This restriction is not justified

While the Cobb-Douglas function is a nonlinear function, if we take the natural logs of both sides of this model we obtain a model which is linear in the logs of the values.

ln Y = ln β0 + β1 ln X1 + β2 ln X2

Before we estimate this model in EViews we must first obtain the sets of log values for all 3 variables. In the case of the Y variable, whose log we call LY, we click [Genr] then enter

LY = LOG(Y)

We do the same for the logs of X1 and X2, which we call LX1 and LX2. We now estimate the linear relationship between the logs of the values of the variables and obtain as our SRF

$$\widehat{\ln Y} = -3.338455 + 1.498767 \ln X_1 + 0.489858 \ln X_2$$

or

$$\hat{Y} = 0.035492\, X_1^{1.498767} X_2^{0.489858}$$

If we did not have EViews to work with we would call the sum of squared residuals from this model URSS, as in this model the values of the coefficients are unrestricted.

We would then estimate a second model in which the two coefficients were restricted in the sense that they were assumed to sum to 1, so that

β1 + β2 = 1 and β2 = 1 - β1

The sum of squared residuals from this restricted model is called RRSS.

To test this hypothesis of constant returns to scale with EViews, after we estimate the equation we click

View → Coefficient Tests → Wald - Coefficient Restrictions

In the Wald Test dialog box we enter the restriction that we wish to test. In EViews the coefficients are not written as β0, β1, β2, etc. Instead we write the estimated intercept or constant as C(1), the coefficient of X1 as C(2) and the coefficient of X2 as C(3). In this case we write the constraint as C(2) + C(3) = 1. When we click [OK] we obtain the following output

Wald Test:
Equation: Untitled

Test Statistic    Value       df         Probability
F-statistic       4.344966    (1, 12)    0.0592
Chi-square        4.344966    1          0.0371

Null Hypothesis Summary:
Normalized Restriction (= 0)    Value       Std. Err.
-1 + C(2) + C(3)                0.988625    0.474284

Restrictions are linear in coefficients.

From the EViews output we see that the F statistic has a value of 4.344966 with a p value of 0.059154. This means that if H0 is correct and β1 + β2 = 1, the probability of obtaining an F statistic at least as large as 4.344966 is 0.059154.

If we are using α = 0.05 then we would not reject H0 because we have p > α. When α is set at 0.10, however, we would reject H0 as we now have p < α.

5.3 The Chow Test

Another important application of this procedure is the Chow test. This is used to test the structural stability of a model. We use this test when our sample can be divided into meaningful components such as the values before and after financial deregulation or the returns from retail and manufacturing firms.

We will use the simple linear regression model to explain how this test is used. Suppose we are working with time series data and the sample of n observations can be divided into the n1 observations before some event and the n2 observations after this event. For example we might be using the n = 20 yearly returns from 1976 to 1995 to estimate the market model for a firm. There are n1 = 11 returns for the period before, and n2 = 9 returns for the period since, the crash of ’87.

One analyst thinks that the market model for this firm has not changed since the crash. We write the model for the complete 20 year period as:

Yi = β0 + β1 Xi + ui    i = 1 ... 20

We call it the restricted model because the coefficient values are restricted in the sense that they are the same in the two sub-periods before and after the crash.

A second analyst thinks that the market model for this firm changed after the Crash. His model is written in the following way. For the first n1 = 11 years it is

Yi = a0 + a1 Xi + ui    i = 1 ... 11

and for the final n2 = 9 years it will be

Yi = b0 + b1 Xi + ui    i = 12 ... 20

This is called the Unrestricted model because the intercept and slope are free to take different values a0, b0, a1 and b1 before and after the crash. The hypotheses that we choose between are:

H0 : a0 = b0 and a1 = b1
H1 : Either the intercepts, the slopes or both are not equal.

To obtain the URSS we use the Unrestricted model. Here we estimate the separate regression lines for each sub-period and note the two residual variations RSS1 and RSS2 for these two equations. Our URSS is just the sum of these separate residual sums of squares, i.e.

URSS = RSS1 + RSS2

The Restricted model is the one in which we say that the intercept and slope are equal in both periods. To obtain the RRSS we simply estimate the regression line using all n = 20 observations, and note the sum of squared residuals, which is equal to the RRSS.

The number of constraints in this situation is equal to the number of coefficients that we need to be equal for the model to be the same in both periods:

m = k + 1 = 2

In the denominator the degrees of freedom are now given by the following formula:

df2 = n - 2(k + 1)

whereas in our original formula they were equal to n - (k + 1). This change is necessary because in the Unrestricted model we are now estimating 2(k + 1) parameter values for two different equations.

When we are using the Chow test our formula for the F statistic changes from

$$F_{m,\,n-(k+1)} = \frac{(RRSS - URSS)/m}{URSS/(n-k-1)}$$

to

$$F_{k+1,\,n-2(k+1)} = \frac{(RRSS - URSS)/(k+1)}{URSS/(n-2(k+1))}$$

In order to carry out this test the model is fitted to both sub-periods separately. This means that we must have enough data in each period to be able to estimate the model reasonably accurately. If this is not the case, we can use the Chow Forecast test that will be discussed in Model 6.

You should not try doing the test at every possible date as this will mean you no longer know what your level of significance is.
Rather, pick one or two dates of events that you think may cause the coefficient values to change, or divide the data set into two or three equal parts.

Doing the test in Eviews

When discussing this procedure we will use the data in the file DHSY.XLS which contains quarterly data on gross consumption, income (measured by GDP) and a price index for the UK economy from 1957 to 1975.

First we estimate the equation

Consumptiont = β0 + β1 Incomet + ut

over the whole period. Then we click

View → Stability tests → Chow Breakpoint test

In the dialogue box which follows we enter the date of the start of the second period. In this case we will use the first quarter of 1971, which we write as 1971Q1 or 1971:1. We then get the following output

Chow Breakpoint Test: 1971Q1
F-statistic             3.870280    Probability    0.025323
Log likelihood ratio    7.760536    Probability    0.020645

So our test is written as

H0: The model does not change over the period
H1: The model is different after 1971
Level of Significance: α = 0.05
Test Statistic: Breakpoint Chow Test
p-value: p = 0.025323
Conclusion: Reject H0. The model is different after 1971.

6. DUMMY VARIABLES

6.1 Introduction

In many situations the variable whose behaviour we wish to model is influenced by factors that are difficult to quantify. An obvious example is where we wish to model the impact of some major change such as an alteration in the way the government influences the interest rate. We call studies of the impact of these changes event studies.

When we are attempting to model 'retail sales' we know that they will be higher during the quarter in which the major religious festival occurs. If we want to include factors such as the time of year, the sex of the buyer or the socio-economic group a person belongs to in our model, we must first create appropriate dummy variables.
If we are looking at quarterly retail sales in economies which celebrate Christmas, the dummy variable we would construct would consist of a column of values in which we have a 1 in every December quarter and a 0 in each of the other three quarters. For economies which celebrate Chinese New Year the dummy variable will have a 1 in the first quarter and 0’s in the other three quarters.

When the sex of the buyer is a relevant factor then we will have a column of values in which we have a 1 for female buyers and 0 for male buyers.

When we wish to model the impact of the time of the year, if we have quarterly data there are four different dummy variables we could have:

$$Q_1 = \begin{cases} 1 & \text{March quarter} \\ 0 & \text{otherwise} \end{cases} \qquad Q_2 = \begin{cases} 1 & \text{June quarter} \\ 0 & \text{otherwise} \end{cases}$$

$$Q_3 = \begin{cases} 1 & \text{September quarter} \\ 0 & \text{otherwise} \end{cases} \qquad Q_4 = \begin{cases} 1 & \text{December quarter} \\ 0 & \text{otherwise} \end{cases}$$

In our model, however, we never include all four dummy variables. If we did we would have a problem with exact multicollinearity, as the sum of all four quarterly dummy variables is a column with 1’s in every position. When we estimate a model which contains a constant or intercept term, the column of values for the intercept is also just a column of 1’s. In this situation we would be unable to obtain any OLS estimates of the parameters. This is why we can at most use 3 out of 4 quarterly dummy variables or 11 out of 12 monthly dummy variables. (Using all four dummy variables for the four quarters is called falling into the dummy variable trap.)

6.2 Creating Dummy Variables and Time Trends in Eviews

We can also use EViews to generate both Dummy Variables and Time Trends. To see how this is done suppose we have set up a workfile in which we have entered quarterly data for the period from 1978:1 to 1994:4, i.e. from the first quarter of 1978 to the last quarter of 1994.
To generate a dummy variable D3 with 1's in the third quarter of each year and 0's in other quarters we click the [Genr] button then enter the equation

D3 = @SEAS(3)

If we are working with the monthly data that we have used to estimate the Market Model then when testing for the ‘January Effect’ we might want to use a Dummy Variable DJAN which has a value of 1 in January, i.e. month 1 of each year, and 0’s elsewhere. To generate this variable we click the [Genr] button and then enter

DJAN = @SEAS(1)

If we want a dummy variable D87 to model the impact of the October ’87 Crash with zeros before October of 1987 and 1’s from October 1987 onwards, we click [Genr] and enter

D87 = 0

The column D87 now contains 0’s in every month. To make these values 1 from October 1987 onwards we click [Genr] again, change the sample from 1984:01 1993:12 to 1987:10 1993:12 and enter

D87 = 1

The column D87 will now contain values of 0 up to September 1987 and values of 1 from October 1987 onwards.

If we want a dummy variable DE87 to model the impact of the October ’87 Crash with a 1 in October of 1987 and 0’s elsewhere we click [Genr] and enter

DE87 = 0

The column DE87 contains 0’s in every month. To place a 1 in October 1987 we could change the sample as we did before to 1987:10 1987:10 and enter DE87 = 1. I find it easier to open the series, click on Edit+/-, scroll through to October 1987 and type in a 1. Make sure you hit enter, then click on Edit+/- again. The column DE87 will now contain a value of 1 in October 1987 and 0 values elsewhere.

These variables that we have generated are examples of what are called additive dummy variables. The estimated coefficients of additive dummy variables show the changes in the intercept when the quality which the dummy variable represents is present.

A multiplicative dummy variable is equal to the product of the additive dummy variable and the relevant independent variable.
The coefficient of the multiplicative dummy variable shows the impact on the slope when the quality which the dummy variable represents is present. To obtain the multiplicative dummy MD87 associated with the October ’87 crash and the independent variable in the Market Model, AGSMV, we click [Genr] and enter

MD87 = D87*AGSMV

A Trend Variable is one which contains the values 1, 2, 3, ..., n associated with the periods in the sample. Suppose we want to obtain a trend variable TR which consists of the set of values 1, 2, 3, ..., 15 for the 15 year period from 1958 to 1972. We would click [Genr] and then enter

TR = @TREND(1957)

You will note that in the TREND function we have the previous year 1957 rather than the year 1958 in which our data starts. The reason we do that is that EViews gives the year we enter a value of 0. If we want 1958 to have a value of 1 then we must use 1957 as the starting date.

6.3 Interpreting the Coefficients of Dummy Variables

Consider the problem faced by a firm that manufactures a product whose sales rise significantly in the December quarter. The accountant thinks sales depend on income and this December or Christmas effect. The PRF which she plans to use when forecasting sales (Y) has two independent variables, namely income (X) and what we call an additive dummy variable (D1). We write this PRF in the following way:

E(Y| Xi) = β0 + β1 Xi + β2 D1i

This dummy variable is our dummy variable for the fourth quarter or Q4. When she uses this model she is really saying that we need two different models to explain the sales of the product.

In the December quarter when Q4 or D1 = 1, the model we use is

E(Y| Xi) = β0 + β1 Xi + β2 (1) = (β0 + β2) + β1 Xi

In the other quarters D1 = 0, so our model will be:

E(Y| Xi) = β0 + β1 Xi + β2 (0) = β0 + β1 Xi

From these two models we see that including an additive dummy variable changes the intercept in the relevant quarter by an amount equal to the coefficient of the dummy variable.
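The two-intercept reading of the additive dummy can be sketched in Python. The coefficient values and the income figure are hypothetical and chosen only for illustration, and `december_dummy` and `expected_sales` are names introduced here.

```python
def december_dummy(quarters):
    """Return 1 for the December quarter (quarter 4) and 0 otherwise."""
    return [1 if q == 4 else 0 for q in quarters]


def expected_sales(income, d1, b0, b1, b2):
    """E(Y | X) = b0 + b1*X + b2*D1 for an additive dummy D1."""
    return b0 + b1 * income + b2 * d1


# Hypothetical coefficients: intercept 10, income slope 0.5, December shift 4.
b0, b1, b2 = 10.0, 0.5, 4.0
quarters = [1, 2, 3, 4]
d1 = december_dummy(quarters)

# Hold income fixed at 100 so that only the dummy changes the prediction.
fitted = [expected_sales(100.0, d, b0, b1, b2) for d in d1]
print(d1)
print(fitted)
```

With income held fixed, the December prediction differs from the other three quarters by exactly the dummy coefficient, which is the intercept shift described above.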
The additive dummy variable in this model could be used if we think that sales increase by β2 in the December quarter regardless of the level of income.

An alternative approach is to think of Christmas as having an effect on β1, the slope or coefficient of income, rather than on the intercept β0. That is, we think that Christmas affects the proportion of income spent rather than the amount of income spent. To model this type of effect we define a variable (D1i Xi) which is the product of our dummy variable and the independent or income variable. We call this a multiplicative dummy variable and we write our model as:

E(Y| Xi) = β0 + β1 Xi + β2 (D1i Xi)

During the December quarter we have D1i = 1 so that we now write our model as:

E(Y| Xi) = β0 + β1 Xi + β2 (1 Xi) = β0 + (β1 + β2) Xi

In the other quarters D1i = 0 and now our model will be:

E(Y| Xi) = β0 + β1 Xi + β2 (0 Xi) = β0 + β1 Xi

From this we see that the coefficient of the multiplicative dummy (D1i Xi) shows the impact of the December quarter on the slope β1, that is, on the proportion of income we spend.

If we think that Christmas affects spending in both ways (that is, it increases the basic level of sales β0 and the proportion of income spent β1) then we would include an additive and a multiplicative dummy variable, in which case our model is:

E(Y| Xi) = β0 + β1 Xi + β2 D1i + β3 (D1i Xi)

In this model the coefficient of the additive dummy β2 would show the impact on the intercept or general level of spending, and the coefficient of the multiplicative dummy β3 would show the impact on the slope or proportion of income spent in the December quarter.

6.4 Comparing Regression Models

Earlier, we saw that the Chow test could be used to determine whether there is a stable regression model for the whole sample. We can also use dummy variables to carry out this same procedure more effectively.

Suppose we have a sample of n quarterly observations on the sales of car air conditioners.
This sample can be divided into two groups. The first n1 values are the sales before the government banned the use of harmful gases, and the second n2 values are the sales after the ban. To represent the impact of the government legislation we use the following dummy variable:

$$D_2 = \begin{cases} 0 & \text{for the } n_1 \text{ values before the ban} \\ 1 & \text{for the } n_2 \text{ values after the ban} \end{cases}$$

In our standard model of car air conditioner sales we use 'per capita income' as our X or independent variable. If we want to test whether the government legislation has changed the nature of our model,

E(Y| Xi) = β0 + β1 Xi

we now estimate the model which contains both types of dummy variable:

E(Y| Xi) = β0 + β1 Xi + β2 D2i + β3 (D2i Xi)

When we used the Chow test we were able to choose between the following hypotheses:

H0 : a0 = b0 and a1 = b1
H1 : Either the intercepts, the slopes or both are not equal.

If our sample evidence supported H1, this told us that the parameter values were different, but this test did not make it possible to determine just where the differences lay. If we use the model with the additive and multiplicative dummy variables we can now determine where the differences lie by performing three different tests.

TEST 1

Suppose we want to test whether the model is different because the intercept has changed; that is, because the impact of other variables on sales is different. We estimate the SRF:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 D_{2i} + \hat{\beta}_3 (D_{2i} X_i)$$

and use the 't' statistic for the estimate of β2 to choose between:

H0 : β2 = 0 and H1 : β2 ≠ 0

If we choose H1 : β2 ≠ 0 this implies that the legislation has changed the intercept from β0 to β0 + β2.

TEST 2

To test whether the model is different because the slope has changed; that is, because the impact of income on sales is different, we use the t statistic for the estimate of β3 to choose between:

H0 : β3 = 0 and H1 : β3 ≠ 0

If we choose H1 : β3 ≠ 0 this implies that the legislation has changed the slope from β1 to β1 + β3.
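As a sketch of how the two dummy regressors are built and how the estimated coefficients translate into separate lines before and after the ban, consider the following Python fragment. The income values and coefficients are hypothetical, and the helper names `build_regressors` and `regime_lines` are introduced here for illustration only.

```python
def build_regressors(x, n1):
    """Columns D2 and D2*X for a sample whose first n1 values precede the ban."""
    d2 = [0] * n1 + [1] * (len(x) - n1)
    d2_x = [d * xi for d, xi in zip(d2, x)]
    return d2, d2_x


def regime_lines(b0, b1, b2, b3):
    """(intercept, slope) before the ban (D2 = 0) and after it (D2 = 1)."""
    before = (b0, b1)
    after = (b0 + b2, b1 + b3)
    return before, after


# Hypothetical per capita income values; the first 3 fall before the ban.
x = [20.0, 22.0, 25.0, 27.0, 30.0]
d2, d2_x = build_regressors(x, n1=3)

# Hypothetical estimates of beta0, beta1, beta2 and beta3.
before, after = regime_lines(b0=5.0, b1=2.0, b2=-1.5, b3=0.25)
print(d2, d2_x)
print(before, after)
```

Test 1 asks whether the two intercepts returned by `regime_lines` differ, and Test 2 asks the same question of the two slopes.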
We can, if we want to, test whether there is a specific change in the size of a parameter. For example, to test whether the change in the intercept is less than 2, we would use the following hypotheses:

H0 : β2 = 2 and H1 : β2 < 2

When choosing between these hypotheses the ‘t’ statistic we use is:

$$t_{n-k-1} = \frac{\hat{\beta}_2 - \beta_2}{\operatorname{se}(\hat{\beta}_2)} = \frac{\hat{\beta}_2 - 2}{\operatorname{se}(\hat{\beta}_2)}$$

where se(β̂2) is the standard error of the estimated coefficient reported in the regression output. If we choose H0 : β2 = 2 this implies that the legislation has increased the intercept by 2.

TEST 3

The third test we can perform is similar to the Chow test. If we wish to determine whether the model has changed because of changes in both the intercept and the slope, we now use the F test for linear restrictions. Our restricted model when we use the Chow test or dummy variables is the model in which the parameter values are the same over the whole sample.

The unrestricted model is the model which contains both types of dummy variable:

E(Y| Xi) = β0 + β1 Xi + β2 D2i + β3 (D2i Xi)

In this model there are k = 3 independent variables Xi, D2i and (D2i Xi). Our restricted model is the model in which the coefficients β2 and β3 for the two types of dummy variables D2i and (D2i Xi) are both set to 0 so:

E(Y| Xi) = β0 + β1 Xi

and hence there are m = 2 restrictions. We obtain the SRFs for both models and use them to obtain the RRSS and the URSS. These values are used in our formula:

$$F_{m,\,n-(k+1)} = \frac{(RRSS - URSS)/m}{URSS/(n-k-1)}$$

where in this example we will have:

$$F_{2,\,n-4} = \frac{(RRSS - URSS)/2}{URSS/(n-4)}$$

6.5 An Event Study

In the previous section we have looked at using dummy variables to examine the stability of a model. In this section we apply these techniques to examine the effect of the World Trade Center disaster on the market model for Boeing, the aircraft manufacturing company. In addition we look at a type of dummy variable, called an event dummy, which can be used to measure the effect of a single data point.

For this study I have used daily closing prices sourced from Datastream.
The data starts on the 1st June 2001 and ends on the 23rd November 2001. The S&P 500 index has been used to proxy for the market.

[Figure: scatter plot of RBOEING (vertical axis) against RMKT (horizontal axis)]

You should notice one very large negative return in the bottom left hand corner of the graph. This occurred on the day the New York Stock Exchange re-opened following the disaster. A further examination of the data set shows zero returns from the 11th September to the 14th September inclusive. This is the period the exchange was closed. I have therefore estimated the model without these zero return days. The Eviews output is given below.

Dependent Variable: RBOEING
Method: Least Squares
Sample(adjusted): 6/01/2001 9/11/2001 9/15/2001 11/23/2001
Included observations: 124 after adjusting endpoints

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            -0.003381      0.002220      -1.523200      0.1303
RMKT          1.697307      0.180181       9.419988      0.0000

R-squared             0.421077     Mean dependent var       -0.004582
Adjusted R-squared    0.416332     S.D. dependent var        0.032302
S.E. of regression    0.024678     Akaike info criterion    -4.549776
Sum squared resid     0.074301     Schwarz criterion        -4.504288
Log likelihood        284.0861     F-statistic               88.73618
Durbin-Watson stat    1.834838     Prob(F-statistic)         0.000000

An examination of this output shows the following points:
• The model explains a significant amount of variation, as evidenced by the p-value of the F-statistic,
• The R2 is approximately 42%, which is quite good for this type of model,
• The Durbin-Watson statistic shows no significant 1st order correlation and
• The mean return for Boeing over the period was negative.

If we carry out a t-test on the slope we see it is significantly greater than one, since t = (1.697307 - 1) / 0.180181 ≈ 3.87.

Now re-estimate the model with dummy variables. The first, DA, contains zeros up until the 14th September and ones from the 15th September onwards. The other, DDAY, contains zeros everywhere except the 15th September, where it has a one.
I have also included a multiplicative dummy variable for DA. I have not included a multiplicative dummy for DDAY as this would give exact multicollinearity. The output is given below.

Dependent Variable: RBOEING
Method: Least Squares
Sample(adjusted): 6/01/2001 9/11/2001 9/15/2001 11/23/2001
Included observations: 124 after adjusting endpoints

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            -0.003432      0.002653      -1.293637      0.1983
RMKT          0.878987      0.258876       3.395404      0.0009
DA           -0.000268      0.004105      -0.065342      0.9480
DA*RMKT       0.924153      0.352788       2.619571      0.0100
DDAY         -0.099192      0.025621      -3.871479      0.0002

R-squared             0.546852     Mean dependent var       -0.004582
Adjusted R-squared    0.531620     S.D. dependent var        0.032302
S.E. of regression    0.022107     Akaike info criterion    -4.746340
Sum squared resid     0.058159     Schwarz criterion        -4.632619
Log likelihood        299.2731     F-statistic               35.90186
Durbin-Watson stat    1.949414     Prob(F-statistic)         0.000000

As before the model explains a significant amount of variation and the adjusted R2 has improved. Let us now consider each of the dummy variables.

The additive dummy variable, DA, tells us the change in the intercept. The p-value of 0.95 tells us this change is not significantly different from zero, that is it could easily be caused by sampling errors. As the intercept in the market model measures the risk free rate of return, this study has not detected any change in the risk free rate after the disaster.

The multiplicative dummy tells us the change in the slope. The p-value of 0.0100 tells us this change is statistically significant. The beta, or the systematic risk of Boeing, has increased dramatically from 0.88 before the disaster to 1.80 afterwards.

The event dummy, DDAY, tells us if there was an immediate exceptional one day return, in addition to the change in the systematic risk captured by the multiplicative dummy variable.
It is obvious from the scatter plot we looked at earlier that the return for Boeing on that day was exceptional, but so was the return on the market as a whole on the day the market re-opened. The coefficient of this dummy variable measures how well, or in this case badly, Boeing does after the effects of the general downturn have been removed. In this case the p-value shows this is significantly different from zero. On the day the market re-opened Boeing had an abnormal negative return compared to the market as a whole of about 10%.

A second, less rigorous, approach that is often used is to calculate the cumulative abnormal returns, CARs. First we estimate a market model, like the first in this section, without any dummy variables. The estimated beta tells us the expected stock return given a market return on any given day. The difference between the expected and the actual return is the abnormal return. An easy way to estimate these abnormal returns is to take the residuals from the previously estimated model. Sometimes even this level of sophistication is not used and the abnormal return is just taken to be the return on the share minus the market return. If we add these abnormal returns up over time we get the CARs, which can be plotted, and our interest lies in finding any informative patterns. Normally, we would expect the CARs to fluctuate randomly about the value zero. An increasing CAR suggests the stock is outperforming the market and vice versa. I have done this for the first model I estimated in this section and produced the graph given below.

[Figure: plot of the CAR for Boeing against time, horizontal axis running from 01:06 to 01:11]

We can see the CARs follow a rising pattern through July and August, then fall rapidly in September. This suggests that the market did not anticipate the events of September, which we know to be true.
By the end of September the CARs seem to have settled down into fairly steady behaviour, suggesting the impact of the attack was quickly priced into Boeing shares and its returns have been consistent with the market model since that time. Thus, the effect of the September events can be clearly seen using these CARs, although a disadvantage of this method is that it does not allow for the same sort of test procedures we were able to apply with the dummy variables.

7. FINANCIAL APPLICATIONS

In this section we will look at some further applications of Financial Econometrics. In most of these applications we will use the Multiple Regression model. The first type of application that we will look at is the set of procedures used to test the CAPM. We will look at the general theory behind these tests and two of the basic techniques that have been used to test this theory.

We will then look at other variables which have been included in regression models and which seem to give better forecasts of returns than the CAPM does. While using these other variables, such as size and the price to book ratio, gives superior forecasts, it has not been possible to find a theoretical justification for using these variables.

The Arbitrage Pricing Theory (APT) model is a multifactor model which is based on simpler assumptions than is the CAPM, namely that there are no arbitrage opportunities available in financial markets. In other words assets or portfolios of assets with the same level of risk must have the same level of returns. We will look at this model and one of the procedures which is used to estimate and test the APT model.

Our final application of the Multiple Regression model looks at how we can use the Treynor-Mazuy (TM) model to examine the market-timing and the stock-selection performance of Portfolio managers.
7.1 Testing the CAPM

If we assume that investors are only concerned with Risk and Expected Returns, and that the risk associated with any asset can be measured by the standard deviation of the returns from that asset, then we can use mean-variance analysis to select portfolios of assets which have the maximum returns for a given level of risk. If we also assume that the financial markets are frictionless or very competitive then it can be shown that for any set of assets the optimal combinations of risk and returns for different portfolios of these assets are shown on the efficient frontier. If we make a further assumption that there is some riskfree asset for which the rate of return is the riskfree rate of return Rf then it can now be shown that the optimal combinations of risk and returns for different portfolios of these assets are shown on the linear function which intercepts the vertical axis at the riskfree rate of return Rf and which just touches or is tangential to the efficient frontier.

In the capital asset pricing model (CAPM) it is also assumed that investors have the same beliefs about the risks and returns for these assets. If the tangency portfolio is the portfolio associated with the point at which this linear function touches the efficient frontier then the CAPM says that this tangency portfolio is the Market Portfolio i.e. the portfolio in which the asset shares for the portfolio are the same as the asset shares for the whole market. In the CAPM this linear function is called the Capital Market Line (CML) and we write it in the following way

$$E(R_P) = \beta_0 + \beta_1 SD(R_P) = R_f + \left[\frac{E(R_M) - R_f}{SD(R_M)}\right] SD(R_P)$$

You should note that the terms in this expression can be interpreted as follows
1. The dependent variable is the Expected or Equilibrium returns on the portfolio E(RP).
2. The independent variable is the measure of total risk, the standard deviation of portfolio returns SD(RP). The P subscript indicates that this is a measure for any Portfolio.
3.
The [E(RM) - Rf] term is the market risk premium. The M subscript indicates that this is a measure for the Market Portfolio.

The parameters in this model are as follows:
4. The intercept or β0 term is equal to the Risk-free rate of return Rf.
5. The slope or β1 term shows the impact on Equilibrium returns on the portfolio E(RP) of a unit change in the standard deviation of portfolio returns SD(RP). This slope, [E(RM) - Rf] / SD(RM), can also be interpreted as the Market Price of Risk as it shows the amount of Market Risk Premium for each unit of Market risk.

When we use the returns for the 14 large Australian firms in the file BFODAT then the efficient frontier and capital market line when Rf = 0.11 or 11% are as shown below.

[Figure: The CML and Efficient frontier for the firms in BFODAT]

The CML shows the theoretical relationship between the risk and the returns for a portfolio. At the tangency point it is possible to show that the returns on each asset are a linear function of the systematic or market risk for that asset. This linear model is called the security market line (SML) and we write it in the following way

E(Ri) = Rf + [E(RM) - Rf] βi

You should note that the terms in this expression can be interpreted as follows
1. The dependent variable is the Expected or Equilibrium returns on that asset E(Ri).
2. The independent variable is the measure of systematic risk, which we call beta or βi, for this asset with respect to the market portfolio.
3. The intercept term is equal to the Risk-free rate of return Rf.
4. The slope term is [E(RM) - Rf] or the market risk premium. This now shows the impact on Equilibrium returns on the asset E(Ri) of a unit change in the systematic risk.

It is obvious that the assumptions upon which the CAPM is based do not completely agree with what happens in actual financial markets. We need to note that
1. Investors do not look at just risk and returns and they do not all use the standard deviation of returns as the only measure of risk.
2.
Markets do have some imperfections i.e. they are not frictionless.
3. Not all investors have the same expectations about future risks and returns for all assets.

While these assumptions do not always hold, the results provided by the CAPM may be accurate enough to use in various applications. This is why we need to test whether the predictions provided by the CAPM are accurate enough for investors who need to model the returns on assets.

Before we look at two of the basic procedures that are used to test the CAPM we need to briefly consider the critique of these testing procedures which was put forward by Roll. In a 1977 paper Roll argued that it was not possible to test the CAPM. The underlying reason is that we do not know what the returns on the Market Portfolio are actually equal to. The best that we can do is to find the returns on some proxy for the market such as the All Ordinaries Index in Australia, the Hang Seng Index in Hong Kong or the S & P 500 Index in the US. The argument that tests which are based on these proxies can lead to incorrect conclusions can be summarized in the following way.
1. For any set of assets it is always the case that there must exist some portfolio for which the expected returns are a linear function of beta (i.e. a measure of systematic risk with respect to this portfolio). The portfolio must be mean-variance efficient or lie on the efficient frontier.
2. Even if the CAPM is incorrect, if the proxy portfolio we have chosen lies on the efficient frontier we will find that the expected returns are linear functions of the betas. Here the test will incorrectly support the CAPM.
3. If the CAPM is correct but we happen to choose a proxy portfolio which does not lie on the efficient frontier then the expected returns will not be linear functions of the betas. Now the test will incorrectly reject the CAPM.
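Roll's argument can be checked numerically. The sketch below is only an illustration under assumed inputs: the expected returns, volatilities, common correlation and riskfree rate are hypothetical numbers, and `sml_gap` is a helper name introduced here. It measures how far each asset's expected excess return lies from the value implied by the SML when betas are computed with respect to a chosen proxy portfolio.

```python
import numpy as np

# Hypothetical inputs: expected returns, volatilities and a common correlation.
mu = np.array([0.08, 0.10, 0.12, 0.15])
vol = np.array([0.15, 0.20, 0.25, 0.30])
rho, rf = 0.3, 0.03
corr = np.full((4, 4), rho) + (1 - rho) * np.eye(4)   # 1 on the diagonal, rho elsewhere
sigma = np.outer(vol, vol) * corr                     # covariance matrix

def sml_gap(w):
    """Largest absolute deviation of the assets from the SML built on proxy w."""
    beta = sigma @ w / (w @ sigma @ w)    # betas with respect to the proxy
    premium = w @ mu - rf                 # risk premium of the proxy portfolio
    return np.max(np.abs((mu - rf) - beta * premium))

# Proxy 1: the tangency portfolio, which lies on the efficient frontier.
w_tan = np.linalg.solve(sigma, mu - rf)
w_tan = w_tan / w_tan.sum()

# Proxy 2: an equally weighted portfolio, which is not efficient for these inputs.
w_eq = np.full(4, 0.25)

print("SML gap with efficient proxy:  ", sml_gap(w_tan))
print("SML gap with inefficient proxy:", sml_gap(w_eq))
```

With the efficient proxy the gap is zero up to rounding error, whatever the true pricing model is, while the inefficient proxy leaves non-zero deviations. This is the sense in which a test based on a proxy is really a test of whether the proxy is mean-variance efficient.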
7.1.1 The Fama-MacBeth Approach

To explain how the Fama-MacBeth or two-pass approach to testing the CAPM works we shall use the following example. Consider the set of returns for 14 large firms that are stored in the file BFORET.XLS in columns C to P in rows 2 to 121. We want to use the AGSM Value weighted index as the proxy for the Market portfolio. The returns on this index are stored in column R. The procedure that we use is as follows

1. At the first step or pass we calculate the systematic risk or betas for all 14 firms with respect to the AGSM Value weighted index. We do this by estimating the market model

Rit = β0 + β1RMt

for all 14 firms and noting the estimated values of the slopes. While it is usually a good idea to use EViews to perform most regression calculations, in this case it is easier to use Excel. To find all the betas we go to cell C126 and enter

=LINEST(C2:C121,$R2:$R121)

We now highlight C126 and drag it to cell P126.

2. At the second step or pass we estimate the relationship between the expected or average returns for all firms and their betas or measures of systematic risk. While we could estimate the simple linear regression model

$$\bar{R}_j = \gamma_0 + \gamma_1 \hat{\beta}_j + u_j$$

usually we estimate a multiple regression model in which beta, beta squared and some other key factors or characteristics (CHAR) are included as independent variables. We write this model in the following way

$$\bar{R}_j = \gamma_0 + \gamma_1 \hat{\beta}_j + \gamma_2 \hat{\beta}_j^2 + \gamma_3 \mathrm{CHAR}_j + u_j$$

The CHAR variable might be the variance of the error terms in the market models for the different firms. This variance can be seen as a measure of the nonsystematic variance or risk that is present. With this model we can conduct several tests
a. If γ3 is significantly different from 0 this means that nonsystematic risk has a significant impact on expected returns.
b. If γ2 is significantly different from 0 this means that there is a nonlinear relationship between the expected returns and systematic risk or betas.
c.
Once we establish that γ3 and γ2 are not significantly different from 0 then we can test whether γ1 is different from 0. If it is, then we can conclude that there is a significant linear relationship between expected returns and the measure of systematic risk that we call beta.

3. To find the average returns for each firm, the squared beta values and the variances of the error terms for each market model we use the following Excel procedures.
a. To find all 14 average returns go to cell C125 and enter

=AVERAGE(C2:C121)

We then click this cell and drag it to P125.
b. To find the squares of the beta values go to cell C127 and enter

=C126^2

We then click this cell and drag it to P127.
c. To find the variances of the error terms in all 14 Market Models we use the STEYX function to find their standard deviation and then we square this value. When we use the STEYX function we enter the Y values and then the X values. To do this go to cell C128 and enter the following

=STEYX(C2:C121,$R2:$R121)^2

We then click this cell and drag it to P128.

Once you have generated these values you can estimate the multiple regression by reading these values into EViews and using Eviews to estimate the model.

USING EVIEWS

To estimate the multiple regression model using Eviews we use the following procedure
1. We open a new workfile by clicking File → New → Workfile. When the Workfile Create dialog box shown below appears we note that the values for the 14 firms are Unstructured/Undated and that there are 14 of them.
4. To copy data into the Eviews workfile from the Excel file on the a: drive click Procs → Import → Read Text-Lotus-Excel. When the Open dialog box appears choose the a: drive and then the file BFORET. Now click the [Open] button.
5. When the Excel Spreadsheet Import dialog box shown below appears we now
a. Click By Series - series in rows
b. Enter the first cell which contains the values of the variables which is C125.
You can, if you wish, specify the worksheet in the workbook which contains the data.
c. Enter the names of the variables in the same order that they appear in the Excel worksheet, namely RET, BETA, BETA2 and VAR. Now click [OK].

Once you have imported the data from the Excel worksheet into the Eviews workfile you will estimate the equation in the usual way. The output that you obtain is shown below. In this case the values have been rounded to 4 decimal places.

Dependent Variable: RET
Method: Least Squares
Sample: 1 14
Included observations: 14

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C             0.0201        0.0062         3.2638        0.0085
BETA         -0.0151        0.0140        -1.0765        0.3070
BETA2         0.0116        0.0070         1.6450        0.1310
VAR          -0.7614        0.2886        -2.6380        0.0248

R-squared             0.4746     Mean dependent var        0.0130
Adjusted R-squared    0.3170     S.D. dependent var        0.0063
S.E. of regression    0.0053     Akaike info criterion    -7.4352
Sum squared resid     0.0003     Schwarz criterion        -7.2526
Log likelihood        56.0465    F-statistic               3.0110
Durbin-Watson stat    2.5091     Prob(F-statistic)         0.0811

7.1.2 The Black-Jensen-Scholes Approach

Another approach to testing the CAPM was developed by Black, Jensen and Scholes. With this approach we estimate the following model

Rjt - Rf = α0 + βj(RMt - Rf) + ujt

Here the dependent variable and the independent variable are the excess returns on the asset itself (Rjt - Rf) and on the Market portfolio (RMt - Rf). To generate these excess returns variables we subtract the returns from the riskfree asset from the original returns. In the file BFORET these returns from the riskfree asset are stored in column Q. Once we have estimated this model we then test whether the value of α0 is consistent with the value implied by the CAPM. Since the Security Market Line is

E(Rj) = Rf + βj[E(RM) - Rf]

if we subtract Rf from both sides we have

E(Rj) - Rf = [E(RM) - Rf] βj

This implies that in our model we should have α0 = 0.
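The Black-Jensen-Scholes regression can be sketched in a few lines. The returns below are simulated and purely hypothetical (they are not the BFORET data), and they are generated so that the CAPM holds with α0 = 0 and an assumed beta of 1.2. The sketch forms the excess returns, estimates the model by least squares and computes the t statistic for the intercept.

```python
import numpy as np

# Simulated (hypothetical) monthly returns, not the BFORET data.
rng = np.random.default_rng(42)
T = 120
rf = np.full(T, 0.005)                          # assumed riskfree rate
rm = rf + 0.006 + rng.normal(0, 0.04, T)        # assumed market returns
true_beta = 1.2
rj = rf + true_beta * (rm - rf) + rng.normal(0, 0.03, T)   # generated with alpha = 0

y = rj - rf                                     # excess return on the asset
x = rm - rf                                     # excess return on the market
X = np.column_stack([np.ones(T), x])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # [alpha0, beta_j]
resid = y - X @ coef
s2 = resid @ resid / (T - 2)                    # estimate of the error variance
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

alpha0, beta_j = coef
t_alpha = alpha0 / se[0]                        # t statistic for H0: alpha0 = 0
print(f"alpha0 = {alpha0:.5f}  (t = {t_alpha:.3f})")
print(f"beta_j = {beta_j:.4f}")
```

A t statistic for the intercept that is large in absolute value would be evidence against the CAPM restriction α0 = 0 for this asset or portfolio.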
Most researchers have used this approach to see whether firms with some particular characteristic have returns which are above those implied by the CAPM. For example, to test whether firm size affects returns, instead of using the returns for just 1 share we would use the returns from a portfolio of 100 shares of small firms. If the CAPM accurately describes returns the size of the firms is irrelevant and we should have α0 = 0. If on the other hand there is a size effect, with smaller firms giving higher returns than the CAPM predicts, we will find that α0 will have a positive value.

7.2 The Predictability of Share Returns

One way of testing the validity of the CAPM is to estimate multiple regression models in which returns are the dependent variable and there are other independent variables besides the measure of systematic risk or beta. If the CAPM is correct the impact of these other variables should be negligible. If they are found to have a significant impact then this can be interpreted as evidence that the CAPM does not provide a general explanation of how returns on assets are determined. Studies have shown that including the following variables as independent variables seems to give better forecasts, but it should be noted that it has not been possible to develop theoretical models which explain why these variables affect returns in these ways.

1. When the Market size or capitalization is used as an independent variable it is often found that it has a coefficient with a significant negative value i.e. smaller firms have higher returns. This relationship cannot be explained by the fact that smaller firms have larger betas.
2. If we use the ratio of the market price (P) and the book value (B) for each share we may find that the coefficient of the ratio (P/B) also has a significant negative value i.e. firms with a low price to book ratio tend to have higher returns.
It is possible that this variable has a significant impact when beta does not for the following reason. When there is a large fall in the price of a firm and its debt levels are stable, investors see the firm as being more highly leveraged and so more risky. The OLS procedure for estimating beta gives an equal weight to more distant and more recent values of returns and so may give a beta value which is too low. The (P/B) ratio may have a significant impact because it is measuring the systematic risk more accurately than beta is.
3. A third variable is the earnings rate (i.e. the ratio of the earnings and price (E/P)) or its inverse the price earnings ratio (P/E). If the (P/E) ratio is used as an independent variable we now find that the coefficient of the (P/E) ratio also has a significant negative value i.e. firms with a low price to earnings ratio tend to have higher returns. Some studies have used the Cash Flow (CF) rather than the earnings and looked at the impact of the (CF/P) ratio.
4. A fourth variable is the momentum, which is a measure of how well a stock has performed in some recent period such as the last 6 months. This coefficient is positive, indicating that stocks which have performed well in the last 6 months will perform well in the next 6 months.

The different variables, namely the size, the E/P or CF/P ratios and the P/B ratio, are also related to each other as all are affected by the share price P. Also, values of these ratios are highly correlated over time. If we attempt to include all of these variables in the one model we may have a problem with Multicollinearity. This means that our estimates of the coefficients for these variables will be quite inaccurate as the OLS procedure is unable to accurately estimate the impact of individual variables. The recent studies show that the most reliable forecasts are obtained when we use size and the P/B ratio as the independent variables.
An important feature of these studies is that the results were often found to change when different time periods were used. There are 2 key results which should be noted.
1. Most studies found that there was a very strong relationship between returns and these variables in the month of January but little or no relationship in other months.
2. We can also use long term contrarian strategies to make excess returns, as shares that have achieved poor returns in the last 3 or 5 years will often achieve high returns in the next 3 or 5 years. This means that long term performance or momentum has a negative relationship with future performance while short term momentum has a positive relationship with future performance.

7.3 Using the APT Model

7.3.1 Introduction

In the CAPM, returns are affected by a single factor, namely the measure of systematic risk. Many studies showed that when we look at the returns from different types of firms the forecasts from the CAPM consistently underestimate the returns from small firms or firms with a low market-to-book ratio. The Arbitrage Pricing Theory model can be thought of as an attempt to construct a sound theoretical model which takes into account further sources of risk over and above market risk, and which makes it possible to explain these so called anomalies i.e. why small firms and firms with low market-to-book ratios have higher returns than the CAPM says they should have. The APT model also uses fewer assumptions than does the CAPM. The 3 key assumptions are as follows.

1. As in any multifactor model the APT model assumes that the returns on any asset (Ri) are dependent upon a limited number of factors. If there are K factors then for asset i we can write this relationship in the following way

Ri = αi + βi1F1 + βi2F2 + … + βiKFK + ui

In this model the returns depend upon a number of factors F1, F2, … , FK.
These factors are usually interpreted as proxies for unexpected changes in macroeconomic variables such as GDP, inflation, interest rates, oil prices and the volatility of share prices. The values of these factors are usually scaled in such a way that they all have 0 expected values. This means that the expected returns E(Ri) should be the intercept term αi. For each asset i the impact of each factor j or sensitivity to that factor is βij. The error term ui represents the risk which is specific to this asset.
2. The second assumption is that in financial markets there are no arbitrage opportunities. This means that if any assets or portfolios of assets have the same risks they should also have the same returns as each other.
3. The third assumption is that there are a very large number of financial assets and we can use these to construct portfolios for which the portfolio risk is not affected by the risk which is specific to each asset.

Using these 3 assumptions it is possible to show that the expected return on the financial asset can be written in the following way

E(Ri) = Rf + βi1λ1 + βi2λ2 + … + βiKλK

The λj terms are the risk premiums for the separate factors. The equivalent term in the SML is the single market risk premium [E(RM) - Rf].

When we use the APT model we have to estimate the different factors that are used in the model. There are 3 procedures which are used to estimate the factors
1. The statistical procedure uses factor analysis to obtain sets of values which have similar covariances to the covariances between the returns on different assets. These factors are similar to index numbers which summarize the values of several variables into a single value. Unfortunately in most cases we are not able to determine which economic variables are summarized by this single set of values.
2. The second approach is to calculate the unexpected changes in the macroeconomic variables which affect the level of systematic or market risk.
The unexpected changes in macroeconomic variables such as GDP, inflation, interest rates, oil prices and the volatility of share prices are found by calculating the differences between the actual values and the forecast values of these variables. There are many different ways of forecasting these values. In the application discussed in the next section we will use a very simple forecast, namely the average of the past values. The 5 factors which have been shown to be most closely related to returns on assets in the US are
a. Changes in the monthly growth rate of the GDP. This variable affects investors' future expectations about the growth rate in GDP and corporate earnings.
b. Changes in the risk premium associated with a firm defaulting on bond repayments. This is calculated by finding the difference between the returns on bonds with a BAA rating and bonds with a AAA rating. As this increases investors worry more about firms defaulting.
c. Changes in the difference between the yields on long term and short term government bonds. These changes affect the discount rate which is used to find the present value of future returns from financial assets.
d. Unexpected changes in the rate of inflation, which alter the future returns for many firms.
e. Changes in the expected rate of inflation as measured by changes in the yield on short term Treasury bills. These changes affect interest rates, consumer confidence and government policy.
3. The third approach is to identify the characteristics of firms which have very high or very low returns. These are then used to form portfolios of shares where the firms possess this characteristic e.g. the firms with a small size or the firms with a low P/E ratio. The returns from these portfolios are treated as proxy factors. The reason for using this approach is as follows.
If there is a risk premium or extra return associated with some characteristic such as the small size of a firm then the portfolio of shares with this characteristic is usually very sensitive to the risk associated with this characteristic.

7.3.2 Estimating and Testing the APT Model

In general when we are testing the APT model we test whether the data is consistent with certain implications of the APT model. The key implications of this model are
1. The expected rate of return of any portfolio with zero betas is the riskfree rate of return.
2. The expected returns on securities increase linearly with increases in a given factor beta.
3. No other characteristics of stocks other than the factor betas affect expected returns.

The application that we will examine is based on question 9 in Chapter 2 of Berndt. The data can be found in the file APT which contains 121 monthly observations for the period from 1977:12 to 1987:12. There are values for the following six variables in columns B to G.
B. CPI : The consumer price index.
C. POIL : The price of domestic crude oil.
D. FRBIND : The Federal Reserve Board index of industrial production.
E. DEC : Monthly returns on Digital Corporation shares
F. MARKET : Monthly returns on the Market Portfolio
G. RKFREE : The riskfree rate of return

We will use the following procedure to estimate the APT model in this situation.
1. Our first task is to use the [Genr] function in Eviews to calculate the discrete growth rates for 3 of these variables. The new variables we must generate are as follows
a. RINF or the rate of inflation is the rate of change in the price level

$$\mathrm{RINF}_t = \frac{\mathrm{CPI}_t - \mathrm{CPI}_{t-1}}{\mathrm{CPI}_{t-1}}$$

b. ROIL is the rate of increase in the real price of oil where the real price is the ratio of the price of oil POIL over the general price level or CPI. We write this in the following way

$$\mathrm{ROIL}_t = \frac{\left(\dfrac{\mathrm{POIL}_t}{\mathrm{CPI}_t}\right) - \left(\dfrac{\mathrm{POIL}_{t-1}}{\mathrm{CPI}_{t-1}}\right)}{\left(\dfrac{\mathrm{POIL}_{t-1}}{\mathrm{CPI}_{t-1}}\right)}$$

c.
GIND or the discrete growth rate in industrial production is given by

$$\mathrm{GIND}_t = \frac{\mathrm{FRBIND}_t - \mathrm{FRBIND}_{t-1}}{\mathrm{FRBIND}_{t-1}}$$

When you calculate discrete growth rates you lose an observation from the start of the sample. This means we have 120 values of the growth rates from 1978:1 to 1987:12.
2. After calculating the growth rates we then calculate the forecasts of these growth rates which we will use to find the unexpected changes in the growth rates. A simple forecast is just the mean of the set of values. The difference between the values of these growth rates and the mean of each set of values is called a surprise variable. To find the unexpected change in the rate of inflation or SURINF we click [Genr] and enter the following function in EViews

SURINF=RINF-@MEAN(RINF)

A similar procedure is used to estimate the other surprise variables SUGIND and SUROIL.
3. Once we have generated these surprise variables we now estimate the APT model for the particular share which in this case is Digital Corporation or DEC. In this model we have
a. The risk premium for DEC shares as the dependent variable. To obtain this risk premium we find the difference between the returns for DEC and the riskfree rate of return.
b. We include four independent variables, namely the market risk premium MRP or difference between the return on the market index and the risk free rate of return, and the three surprise variables.

The output we obtain is shown below. This shows that as far as the DEC shares are concerned the CAPM rather than the APT model is the appropriate way to model the returns. The reasons are as follows
1. The p value for the intercept term shows that the intercept is not significantly different from 0.
2. The p values for the 3 surprise variables show that none of these coefficients is significantly different from 0, which is the same as saying that the unexpected changes in these variables do not help us to explain the returns on DEC shares.
3.
The p value for the Market Risk Premium shows that this coefficient is significantly different from 0 so that the returns on DEC shares are mainly affected by Market risk. Dependent Variable: DECRP Method: Least Squares Sample(adjusted): 1978M01 1987M12 Included observations: 120 after adjusting endpoints Variable Coefficient Std. Error t-Statistic Prob. C 0.006914 0.007517 0.919743 0.3596 56 MRP SURINF SUROIL SUGIND 0.838438 0.659194 0.059071 -0.018290 R-squared Adjusted R-squared S.E. of regression Sum squared resid Log likelihood Durbin-Watson stat 0.345182 0.322405 0.081880 0.771001 132.5808 2.144707 0.111875 0.866928 0.211734 0.074656 7.494414 0.760380 0.278985 -0.244985 0.0000 0.4486 0.7808 0.8069 Mean dependent var S.D. dependent var Akaike info criterion Schwarz criterion F-statistic Prob(F-statistic) 0.012911 0.099470 -2.126347 -2.010202 15.15531 0.000000 4. This evidence does not however provide very strong support for the CAPM as the 2 R value of 0.356053 indicates that about 35.6% of the variation in returns is systematic which means that about 64.4% is unsystematic or specific to DEC. The estimated model that we would use to forecast the future excess returns of DEC shares over and above the riskfree rate of return based on this output would be Rit - Rf = 0.829871(RMt - Rf) 7.4 Testing Market-Timing and Stock-Selection Performance An important issue that many investors are concerned with is whether they should invest in actively managed mutual funds. In these funds the portfolio manager attempts to choose assets that will outperform the market. The alternative approach is to choose a passive fund where the portfolio manager chooses assets which will replicate the performance of the market. One way of deciding whether we should choose an actively managed fund is to look at how well the portfolio managers in these funds select shares and time when to buy and sell shares. 
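As a side note on the APT procedure of Section 7.3.2, the three steps (growth rates, surprise variables, regression) can be sketched outside EViews. The following is a minimal Python sketch, not the course's own method. It assumes NumPy is available, and it uses simulated series in place of the Berndt data file, so the printed estimates will not match the output reported above. The helper names growth_rate and surprise are hypothetical.

```python
import numpy as np

def growth_rate(x):
    """Discrete growth rate (x_t - x_{t-1}) / x_{t-1}; drops the first observation."""
    x = np.asarray(x, dtype=float)
    return (x[1:] - x[:-1]) / x[:-1]

def surprise(g):
    """Deviation of a growth rate from its sample mean, as in RINF-@MEAN(RINF)."""
    g = np.asarray(g, dtype=float)
    return g - g.mean()

# Simulated stand-ins for the 121 monthly observations (NOT the Berndt data).
rng = np.random.default_rng(0)
n = 121
cpi = 100 * np.cumprod(1 + rng.normal(0.005, 0.003, n))
poil = 20 * np.cumprod(1 + rng.normal(0.002, 0.05, n))
frbind = 80 * np.cumprod(1 + rng.normal(0.002, 0.01, n))
market = rng.normal(0.012, 0.06, n)
rkfree = np.full(n, 0.006)
dec = rkfree + 0.8 * (market - rkfree) + rng.normal(0, 0.08, n)

# Steps 1 and 2: growth rates, then surprises (120 observations each).
surinf = surprise(growth_rate(cpi))
suroil = surprise(growth_rate(poil / cpi))   # growth in the real oil price
sugind = surprise(growth_rate(frbind))

# Step 3: align the risk premia with the growth rates and run OLS.
decrp = (dec - rkfree)[1:]
mrp = (market - rkfree)[1:]
X = np.column_stack([np.ones(n - 1), mrp, surinf, suroil, sugind])
coef, *_ = np.linalg.lstsq(X, decrp, rcond=None)
print(dict(zip(["C", "MRP", "SURINF", "SUROIL", "SUGIND"], coef.round(4))))
```

Dropping the first element of the return series keeps the risk premia aligned with the growth rates, which lose one observation at the start of the sample.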
One model which is used to evaluate the performance of the portfolio managers in actively managed funds is the Treynor-Mazuy model, in which the excess returns on the portfolio are treated as a quadratic function of the excess returns on the market. (Excess returns are the difference between the actual returns and the riskfree rate of return.) We write this model in the following way:

\[ (R_{Pt} - R_f) = \alpha_P + \beta_1 (R_{Mt} - R_f) + \beta_2 (R_{Mt} - R_f)^2 + u_t \]

where
(RPt - Rf) = the excess returns on portfolio P
(RMt - Rf) = the excess returns on the market portfolio
αP = estimated measure of selectivity performance
β1 = the traditional measure of systematic risk or beta
β2 = estimated measure of market-timing performance

The coefficient of the squared excess return on the market portfolio, β2, is seen as a measure of market-timing performance for the following reason. The derivative or slope of this function with respect to the excess returns on the market portfolio shows how unit changes in the excess returns on the market portfolio affect the excess returns on portfolio P. From the formula for the derivative

\[ \frac{d(R_{Pt} - R_f)}{d(R_{Mt} - R_f)} = \beta_1 + 2\beta_2 (R_{Mt} - R_f) \]

we see the following.

1. If β2 is 0 then the impact of a unit change in the excess returns on the market portfolio is just the traditional measure of systematic risk β1.
2. If however β2 has a positive value, then the level of the excess return on the market portfolio affects the size of the impact on the returns from portfolio P of a unit change in the excess returns from the market portfolio. We now find that:

a. When the market is performing well, so that the excess return on the market portfolio is positive, the size of the impact on the excess returns from portfolio P will be larger than it would have been under the CAPM, where it is β1. In other words, as long as β2 has a positive value we can conclude that portfolio managers are timing their trading in stocks in such a way that the greater the excess returns from the market portfolio, the greater the impact on the returns from portfolio P.
b. When the market is performing poorly and there is a negative excess return from the market portfolio, then because of the portfolio managers' ability to time the trading of shares the size of the negative impact on the excess returns from portfolio P will be smaller than it would have been according to the CAPM.

Because of the importance of this issue there have been many studies which looked at the performance of different types of actively managed portfolios. If you want to perform this type of study using the basic Treynor-Mazuy model you will use the following procedure.

1. Obtain a set of values for the market returns, the riskfree rate of return and the returns on a particular portfolio P. Use these values to calculate the excess returns for the market portfolio and the actively managed portfolio P.

2. Estimate the quadratic equation

\[ (R_{Pt} - R_f) = \alpha_P + \beta_1 (R_{Mt} - R_f) + \beta_2 (R_{Mt} - R_f)^2 + u_t \]

and use the estimated value of β2 to choose between the following hypotheses:

H0 : β2 = 0 There is no evidence of being able to time the market
H1 : β2 > 0 There is evidence of being able to time the market

3. Finally, use the estimated value of αP to choose between the following hypotheses:

H0 : αP = 0 There is no evidence of being able to select superior shares
H1 : αP ≠ 0 There is evidence that portfolio managers selected superior or inferior shares

For example, consider the file managed_fund.wf1, which contains the weekly exit price for an Australian Equity Fund. The estimation results are presented below. They produce a negative and statistically significant intercept αP, while the market-timing coefficient β2 is positive but not significantly different from 0, suggesting that this manager's trading activities do not add to portfolio wealth.
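Before turning to that output, steps 1 to 3 of the procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the EViews workflow used in the course: it assumes NumPy, the function name treynor_mazuy is hypothetical, and the data are simulated rather than read from managed_fund.wf1, so the numbers it prints are unrelated to the fund's results.

```python
import numpy as np

def treynor_mazuy(rp, rm, rf):
    """OLS fit of (rp - rf) on a constant, (rm - rf) and (rm - rf)^2.

    Returns (coef, tstat), each ordered [alpha_P, beta_1, beta_2].
    """
    y = np.asarray(rp, dtype=float) - np.asarray(rf, dtype=float)
    x = np.asarray(rm, dtype=float) - np.asarray(rf, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)              # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se

# Simulated weekly returns (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 520
rf = np.full(n, 0.001)
rm = rf + rng.normal(0.002, 0.02, n)
rp = rf + 0.9 * (rm - rf) + 2.0 * (rm - rf) ** 2 + rng.normal(0, 0.005, n)

coef, tstat = treynor_mazuy(rp, rm, rf)
for name, c, t in zip(["alpha_P", "beta_1", "beta_2"], coef, tstat):
    print(f"{name}: coef={c:.4f}, t={t:.2f}")
```

The t-statistic on beta_2 would be compared with a one-sided critical value, since H1 is β2 > 0, while the test on alpha_P is two-sided.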
Dependent Variable: RFUND-IRATE
Method: Least Squares
Date: 04/23/07 Time: 16:12
Sample(adjusted): 3/03/1995 11/02/2005
Included observations: 520 after adjusting endpoints

Variable          Coefficient   Std. Error   t-Statistic   Prob.
C                  -0.010721     0.002970     -3.609755    0.0003
RMKT-IRATE          0.868749     0.108945      7.974170    0.0000
(RMKT-IRATE)^2      0.892953     0.957923      0.932176    0.3517

R-squared             0.645532    Mean dependent var      -0.054572
Adjusted R-squared    0.644160    S.D. dependent var       0.017727
S.E. of regression    0.010575    Akaike info criterion   -6.254990
Sum squared resid     0.057811    Schwarz criterion       -6.230448
Log likelihood        1629.297    F-statistic              470.7611
Durbin-Watson stat    2.446525    Prob(F-statistic)        0.000000

More generally, various studies have produced quite different results. One reason why we can obtain different results with different data is that our model is not correctly specified.

1. A possible reason why it may not be correctly specified is that many studies use the Standard and Poors 500 Index as the proxy for the market portfolio, but portfolio managers often include shares of companies that are not among these 500 firms.
2. Another reason is that portfolio managers can also include other financial assets, such as government bonds and corporate bonds, in a portfolio.

The way to allow for this is to include two extra independent variables in our model. The first of these is the excess returns on a second share index, the Wilshire 4500 Index, which covers shares of 4500 firms other than those in the Standard and Poors 500 Index. The second is the excess returns on an index which summarizes the prices of government and corporate bonds. In the US one index which does this is the Shearson Lehman Government Corporate Index. If we call the excess returns on these two indices rW and rB, then the model we will now estimate is the following one.

\[ (R_{Pt} - R_f) = \alpha_P + \beta_1 (R_{Mt} - R_f) + \beta_2 (R_{Mt} - R_f)^2 + \beta_3 r_W + \beta_4 r_B + u_t \]

This should give us more accurate estimates of β2 and αP, and so a better idea of whether the portfolio managers have been able to time the market and to pick stocks with higher returns.

APPENDIX – DERIVATION OF THE MATRIX LEAST SQUARES EQUATION

As for the simple linear regression case, we calculate the residuals, then square and sum them to get the sum of squared residuals. Let the vector of estimated residuals be denoted by e. This is an n×1 matrix similar to the y and u vectors defined earlier.

\[
\begin{aligned}
SSE = \sum_{i=1}^{n} e_i^2 = e'e &= (y - \hat{y})'(y - \hat{y}) \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
&= (y' - \hat{\beta}'X')(y - X\hat{\beta}) \\
&= y'y - y'X\hat{\beta} - \hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}.
\end{aligned}
\]

Now, similarly to the simple linear regression case, we take the derivative with respect to \hat{\beta} and set this equal to zero to find the minimum of the sum of squared residuals.

\[
\frac{\partial SSE}{\partial \hat{\beta}} = 0 - X'y - X'y + \left( X'X + (X'X)' \right)\hat{\beta} = 0
\;\Rightarrow\; 2X'X\hat{\beta} = 2X'y
\]

Dividing both sides by 2, then premultiplying by (X'X)^{-1}, gives

\[ \hat{\beta} = (X'X)^{-1} X'y. \]

For those interested in the details of the matrix differentiation, rules can be found in Lütkepohl H. [1991] Introduction to Multiple Time Series Analysis, Springer-Verlag, Germany, Appendix A and Magnus J.R. and Neudecker H. [1998] Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, New York.
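The normal equations derived in this appendix can be checked numerically. The following is a minimal Python sketch, assuming NumPy and simulated data. The function name ols_normal_equations and the coefficient values used to generate y are illustrative assumptions, not part of the module.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Return beta_hat solving the normal equations X'X beta = X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system is numerically safer than forming (X'X)^{-1}.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data: a constant plus two regressors (assumed values).
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

beta_hat = ols_normal_equations(X, y)
resid = y - X @ beta_hat

# The first-order condition X'(y - X beta_hat) = 0 should hold up to rounding.
print("beta_hat:", beta_hat.round(4))
print("max |X'e|:", np.abs(X.T @ resid).max())
```

The sketch solves the system X'Xβ = X'y directly instead of computing the inverse explicitly. Both give the same estimator algebraically, and the residuals it produces are orthogonal to every column of X, which is exactly the first-order condition obtained above.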