ECON 301 - Introduction to Econometrics I
METU - Department of Economics
February 2014
Instructor: Dr. Ozan ERUYGUR
Lecture Notes

LECTURE 01
INTRODUCTION AND ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL (CLRM)

Outline of today’s lecture:
1. What is Econometrics?
2. Types of Econometrics
3. Simple Linear Regression Model
4. Linearity Issue
5. Stochastic Nature of Linear Regression Model
6. Sources of Disturbance Term
7. Assumptions of Classical Linear Regression Model (CLRM)

1. What is Econometrics?

Literally interpreted, econometrics means “economic measurement.” Although measurement is an important part of econometrics, the scope of econometrics is much broader, as can be seen from the following quotations:
o . . . econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference. [1]
o Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena. [2]
o Econometrics is concerned with the empirical determination of economic laws. [3]

[1] P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, “Report of the Evaluative Committee for Econometrica,” Econometrica, vol. 22, no. 2, April 1954, pp. 141–146.
[2] Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1.
[3] H. Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, p. 1.

2. Types of Econometrics

Econometrics may be divided into two broad categories: theoretical econometrics and applied econometrics. In each category, one can approach the subject in the classical (frequentist) or Bayesian tradition. In this course, the emphasis is on the classical approach.

3. Simple Linear Regression Model

In econometrics we deal exclusively with stochastic relations.
o Stochastic is a term for random or uncertain.
o It is the opposite of deterministic.

The simplest form of stochastic relation between two variables X and Y is called a simple linear regression model:

$$Y_t = \beta_0 + \beta_1 X_t + u_t, \qquad t = 1, \dots, n$$

where
Y is the dependent variable,
X is the independent (explanatory) variable [4],
u is the stochastic disturbance term, or error term,
$\beta_0$ and $\beta_1$ are the regression parameters, which are unknown.
The subscript t refers to the t-th observation.

[4] The terms regressor, regressand and covariate are also used.

The values of the variables X and Y are observable. However, the values of u are not observable.

Observations on X and Y can be made over time, in which case we speak of having “time series” data.
o For example, we may have data on Turkey’s GDP over 30 years (data collected over a period of time).
Or they can be made over individuals, groups of individuals, firms, groups of firms, countries, groups of countries, objects, geographical areas, etc., in which case we speak of having “cross-section” data.
o For example, we may have data on several countries’ GDP for one year (data collected at one point in time).

Hence, the subscript t may refer to the t-th year (quarter, month, day, etc.) or to the t-th individual or group (such as countries: Turkey, Germany, France, USA, Japan, etc.).

Data of both kinds can be combined to obtain “pooled time series and cross-section data”, or simply “pooled data”.
o For example, we may have data on 20 countries’ GDP over 30 years.
o There is a special type of pooled data, the panel or longitudinal data, also called micropanel data, in which the same cross-sectional unit (say, a family or firm) is surveyed over time (Gujarati, p.23).

Hence, panel data is a special case of “pooled data”. The phrase “pooled data” is more general: it implies that the data have both a time-series and a cross-sectional dimension, but it also covers the case where the individuals (or sections) are not the same, for example when some individuals drop out over time and are replaced by others.

4. Linearity Issue

The model

$$Y_t = \beta_0 + \beta_1^3 X_t + u_t$$

is nonlinear in the parameter $\beta_1$. This model is an example of a nonlinear (in the parameter) regression model.

5. Stochastic Nature of Linear Regression Model

The stochastic nature of the regression model implies that for every value of X there is a whole probability distribution of values of Y. Uncertainty concerning Y arises because of the presence of the stochastic disturbance u which, being random, imparts randomness to Y.
o For example, consider the production function of a firm.
o Suppose that output depends in some specified way on the quantity of labor (L), in accordance with the firm’s technology:

$$\text{Output}_t = \beta_0 + \beta_1 \text{Labor}_t + u_t$$

This function may apply in the short run, when the quantities of other factors are fixed.
o In general, the same quantity of labor will lead to different quantities of output because of variations in weather, human performance, frequency of machine breakdowns, and many other factors.
o Output, which is the dependent variable in this case, will depend not only on the quantity of labor input, but also on a large number of random causes, which we summarize in the form of the stochastic disturbance (u).

6. Sources of Disturbance Term

The existence of the disturbance term is justified in three main ways.

(1) Omission of the influence of chance events [Not Omitted Variable Case!]

Although income might be the major determinant of the level of consumption, it is not the only determinant. Other variables, such as the interest rate or liquid asset holdings, may have a systematic influence on consumption. Their omission constitutes one type of specification error: the nature of the economic relationship is not correctly specified. In addition to these systematic influences, however, there are innumerable less systematic influences (chance events) such as weather variations, taste changes, earthquakes, epidemics, strikes, etc. Although some of these variables may have a significant impact on consumption and thus should definitely be included in the specified relationship, many have only a very slight, irregular influence: the disturbance is often viewed as representing the net influence of a large number of such small and independent causes (Kennedy, p.3).
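The view of the disturbance as the net influence of many small, independent causes can be illustrated with a short simulation of the production example. This is only a sketch under assumed values: the parameters (beta0 = 10, beta1 = 2), the number of small causes (50) and their uniform distribution are hypothetical and do not come from the notes.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

BETA0, BETA1 = 10.0, 2.0   # hypothetical parameter values (assumption)
N_CAUSES = 50              # hypothetical number of small chance events


def disturbance():
    """Net influence of many small, independent causes, each centred on zero."""
    return sum(random.uniform(-0.5, 0.5) for _ in range(N_CAUSES))


def output(labor):
    """Output_t = beta0 + beta1 * Labor_t + u_t."""
    return BETA0 + BETA1 * labor + disturbance()


# The same quantity of labor (20 units) gives a different output on each draw.
draws = [output(20) for _ in range(10_000)]
mean_output = sum(draws) / len(draws)

print(min(draws), max(draws))  # spread caused by the disturbance alone
print(mean_output)             # close to the systematic part, 10 + 2 * 20 = 50
```

Each individual draw differs from 50, but the average over many draws stays close to the systematic part, which is what a zero-mean disturbance would suggest.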
In any economic model, we want to include all the important and relevant explanatory variables, so the error term u is a “storage bin” for unobservable and/or unimportant factors affecting the level of consumption. As such, it adds noise that masks the relationship between X (income) and Y (consumption) (Griffith and Hill, p. 48).

Note that if we have omitted some important factor, or made any other serious specification error (omitted variable case), then the assumption $E(u_t) = 0$, which we will see in the next section, will be violated, and this will have serious consequences (Griffith and Hill, p. 48).

(2) Measurement error in Y [not in X!]

It may be the case that the dependent variable Y cannot be measured accurately, either because of data collection difficulties or because it is inherently unmeasurable and a proxy variable must be used in its stead. The disturbance term can in these circumstances be thought of as representing this measurement error. On the other hand, errors in measuring the independent variable(s) X (as opposed to the dependent variable Y) create a serious econometric problem, and hence this situation cannot be stated as a source of the existence of the disturbance term.

For example, it may be that a variable Z is exactly related to X:

$$Z_t = \beta_0 + \beta_1 X_t$$

but we observe $Y_t = Z_t + u_t$, where $u_t$ denotes the measurement error. Thus:

$$Y_t = \beta_0 + \beta_1 X_t + u_t$$

However, note that this justification is less satisfactory since (as we will see later) measurement error in X, as stated above, can cause considerable problems.

(3) Human Indeterminacy

Human behavior is such that actions taken under identical circumstances will differ in a random way. The disturbance term can be thought of as representing this inherent randomness in human behavior. In other words, people are not predictable in their behavior.
There is an element of unpredictability in all human behavior.

We might interpret our model as combining a systematic or deterministic component ($\beta_0 + \beta_1 X_t$) with a random or stochastic component ($u_t$) that represents the unpredictable element in the relationship.

To sum up, in the social sciences, controlled experiments are usually not possible. For example, an economist cannot hold Turkish national income constant for several years while examining the effect of the interest rate on investment. Since the economist cannot neutralize extraneous influences by holding them constant, the best alternative is to take them explicitly into account, by regressing Y on X and the extraneous factors (Wonnacott, 1979, p.23).

7. Assumptions of Classical Linear Regression Model (CLRM)

The following assumptions are important in CLRM [5].

Assumption 1 (A1): Linear regression model [the regression model is linear in the parameters although it may be nonlinear in variables]
In the simple regression model, the value of $Y_t$ for each value of $X_t$ is

$$Y_t = \beta_0 + \beta_1 X_t + u_t$$

Assumption 2 (A2): X values are fixed in repeated sampling, or X values must be independent of the error term: $\mathrm{Cov}(X_t, u_t) = 0$.
In many practical cases, assuming that $X_t$ is fixed across samples is not just the simplest assumption; it is also the most realistic assumption [6]. Keeping the value of income X fixed, say, at the level of 1,100 TL, we draw at random a family and observe its consumption expenditure (Y) as, say, 640 TL. Still keeping X at 1,100 TL, we draw at random another family and observe its Y value as 580 TL. In each of these drawings (i.e. repeated sampling), the value of X is fixed at 1,100 TL. This is the sense in which we are assuming the X’s are fixed in repeated sampling (or, across samples).
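The repeated-sampling idea in Assumption 2 can be sketched in a few lines of Python. This is an illustration only: the list of consumption values for families with an income of 1,100 TL is assumed for the sketch (it includes the 640 TL and 580 TL mentioned above), and the function name draw_family is hypothetical.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

X_FIXED = 1100  # income level held fixed across all drawings (TL)

# Illustrative consumption values (TL) of families whose income is 1,100 TL.
# These values are an assumption made for this sketch.
CONSUMPTION_AT_X = [580, 640, 700, 820, 880, 940]


def draw_family():
    """One drawing: X stays at X_FIXED, only Y is random."""
    return X_FIXED, random.choice(CONSUMPTION_AT_X)


# Repeated sampling: five drawings at the same income level.
sample = [draw_family() for _ in range(5)]
x_values = {x for x, _ in sample}

print(sample)    # Y differs from drawing to drawing
print(x_values)  # {1100}
```

Across drawings only Y varies; X takes the single value 1,100, which is the sense of "fixed in repeated sampling" used in A2.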
Assumption 3 (A3): Zero mean value of the disturbance $u_t$ [$E(u_t) = 0$], which is equivalent to assuming that $E(Y \mid X_t) = \beta_0 + \beta_1 X_t$.
o In other words, the expected value of the disturbance term for any X should be 0.

[5] Gujarati, D. (2009) Basic Econometrics, 5th Edition, McGraw-Hill, pp. 62-68.
[6] Murray, M. P. (2006) Econometrics: A Modern Introduction, Pearson International, pp. 41-42.

Assumption 4 (A4): Homoscedasticity, i.e. constant variance of $u_t$ [$\mathrm{Var}(u_t) = \sigma^2$].
o The population variance of the disturbance term ($u_t$) should be constant for all observations (homoscedasticity).
o If this condition is not satisfied, the OLS regression coefficients will be inefficient.
o In terms of our production example, this assumption implies that the variation in output is the same whether the quantity of labor is 20, 100, or any other number of units.

Assumption 5 (A5): No autocorrelation [7], or zero covariance between $u_t$ and $u_s$ [i.e., $\mathrm{Cov}(u_t, u_s) = 0$]. This assumption can be made stronger by assuming that the values of $u_t$ are all statistically independent [8].
o This condition states that there should be no systematic association between the values of the disturbance term in any two observations.
o If this condition is not satisfied, OLS will again give inefficient estimates.
o This assumption implies that output being higher than expected today should not lead to higher (or lower) than expected output tomorrow.
o Recall that the correlation between X and Y is given by

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively. Therefore, if the correlation between X and Y is zero ($\rho_{XY} = 0$), it implies that $\mathrm{Cov}(X, Y) = 0$.
As a result, no autocorrelation implies that $\mathrm{Cov}(u_t, u_s) = 0$ where $t \neq s$ [9].

[7] No autocorrelation means the correlation between any $u_t$ and $u_s$ ($t \neq s$) is zero.
[8] The value which the disturbance term assumes in one period does not depend on the value which it assumed in any other period.
[9] Recall that a zero value of the covariance indicates no linear dependence between X and Y.

o Note that the covariance between X and Y is given by

$$\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$$

Therefore, no autocorrelation implies zero covariance:

$$\mathrm{Cov}(u_t, u_s) = E\{[u_t - E(u_t)][u_s - E(u_s)]\} = 0$$

From Assumption 3, we know that $E(u_t) = 0$ and $E(u_s) = 0$. Thus, the no autocorrelation assumption implies that

$$\mathrm{Cov}(u_t, u_s) = E\{[u_t - 0][u_s - 0]\} = E(u_t u_s) = 0$$

Assumption 6 (A6): The number of observations (T) must be greater than the number of parameters to be estimated [10] (T > k + 1), and there must be no exact linear relationship between the independent variables (no perfect multicollinearity).
o Although this is stated as an assumption, it can easily be checked, so that it need not be assumed.

[10] There are k independent variables and one intercept term, hence k + 1.

ECON 301 - Introduction to Econometrics I
METU - Department of Economics
March 2014

LECTURE 02
POPULATION REGRESSION FUNCTION – SAMPLE REGRESSION FUNCTION

Outline of today’s lecture:
1. Population Regression Function (PRF) and Sample Regression Function (SRF)
   A. Homoscedasticity Case
   B. Heteroscedasticity Case
2. Gauss-Markov Assumptions
3. Sampling Distribution and Repeated Sampling
4. The Sampling Distribution of $\hat{\beta}_i$

1. Population Regression Function (PRF) and Sample Regression Function (SRF)

Suppose we have a small community of 24 families (this is our population). We are interested in studying the relationship between their weekly disposable income (X) and consumption (Y). We want to estimate the average consumption expenditures of this population (the population mean of consumption expenditures) at different income levels. The 24 families can be grouped into four income groups. Each family within a group has the same disposable income [hence, X’s are fixed!]. Note that this is the entire population, not a sample.

A. Homoscedasticity Case

We have a small community of 24 families (this is our population, hence the population size is N = 24).
Table 1 Consumption Expenditures for Different Levels of Disposable Income (Population, N=24) – Homoscedasticity Case

| Disposable Income ($X_t$) | Consumption ($Y_t$) | Average Consumption $E(Y \mid X = X_t)$ | Disturbances $u_t = Y_t - E(Y \mid X = X_t)$ | Conditional Mean of $u_t$, $E(u_t \mid X_t)$ | Conditional Variance of $u_t$, $\mathrm{Var}(u_t \mid X_t)$ |
|------|------|------|------|------|------|
| 1100 | 580, 640, 700, 820, 880, 940 | 760 | -180, -120, -60, 60, 120, 180 | 0 | 16800 |
| 1200 | 640, 700, 760, 880, 940, 1000 | 820 | -180, -120, -60, 60, 120, 180 | 0 | 16800 |
| 1300 | 700, 760, 820, 940, 1000, 1060 | 880 | -180, -120, -60, 60, 120, 180 | 0 | 16800 |
| 1400 | 760, 820, 880, 1000, 1060, 1120 | 940 | -180, -120, -60, 60, 120, 180 | 0 | 16800 |

In every income group the conditional variance is computed in the same way:

$$\mathrm{Var}(u_t \mid X_t) = \frac{(-180-0)^2 + (-120-0)^2 + (-60-0)^2 + (180-0)^2 + (120-0)^2 + (60-0)^2}{6} = 16800$$

Below we plot these data points on a diagram. The “bold” square dots are the actual observations. The conditional mean, or conditional expectation, is written $E(Y \mid X = X_t)$. It may be regarded as the average or expected consumption expenditure of households with the given disposable income X. In the figure, the “blank” square dots are the conditional means.

Figure 1 Consumption Expenditures for Different Levels of Disposable Income (Population) – Homoscedasticity Case

We might anticipate that consumption will be linearly related to disposable income. This is an initial assumption of our estimation. We could narrow this functional form to

$$E(Y \mid X = X_t) = \beta_0 + \beta_1 X_t$$

or simply

$$E(Y) = \beta_0 + \beta_1 X_t$$

This is the (linear) Population Regression Function (PRF) [1]: $\beta_0 + \beta_1 X_t$. Notice that here the PRF is the straight line connecting these conditional means. Clearly, consumption expenditures “on average” increase with disposable income.
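The conditional means and variances of Table 1, and the straight line through the conditional means, can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are hypothetical. The last consumption value of the 1100 group is taken as 940, which is the value consistent with the listed disturbances and with the stated group mean of 760.

```python
from statistics import mean, pvariance

# Population of Table 1: income level -> consumption of the six families.
# Assumption: the last value in the 1100 group is 940 (consistent with
# the disturbances -180, ..., 180 around a mean of 760).
population = {
    1100: [580, 640, 700, 820, 880, 940],
    1200: [640, 700, 760, 880, 940, 1000],
    1300: [700, 760, 820, 940, 1000, 1060],
    1400: [760, 820, 880, 1000, 1060, 1120],
}

# Conditional mean E(Y | X = x) and conditional (population) variance per group.
cond_mean = {x: mean(ys) for x, ys in population.items()}
cond_var = {x: pvariance(ys) for x, ys in population.items()}

# Straight line through two of the conditional means.
x1, x2 = 1100, 1200
beta1 = (cond_mean[x2] - cond_mean[x1]) / (x2 - x1)
beta0 = cond_mean[x1] - beta1 * x1

for x in population:
    print(x, cond_mean[x], cond_var[x], beta0 + beta1 * x)
print(beta0, beta1)  # the notes report beta0 = 100 and beta1 = 0.6
```

The population variance (division by N = 6, not N - 1) is used because the 24 families are the whole population, not a sample. The last column of the printout evaluates the line at each income level, so it can be compared directly with the conditional means.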
Geometrically, the Population Regression Function (PRF) is simply the locus of average consumption expenditures of our population at different income levels [2]. Analytically, the equation of the PRF [3] can be found using any two data points. For example, $E(Y \mid X = 1100) = 760$ and $E(Y \mid X = 1200) = 820$ can be used:

$$\beta_0 + 1100\,\beta_1 = 760$$
$$\beta_0 + 1200\,\beta_1 = 820$$

Subtracting the first equation from the second gives $100\,\beta_1 = 60$, so $\beta_1 = 0.6$ and $\beta_0 = 760 - 0.6 \times 1100 = 100$. Hence the equation of the PRF is as follows:

$$E(Y \mid X = X_t) = 100 + 0.6 X_t$$

Role of Disturbance

The PRF tells us the “average” consumption expenditures for a given level of household income. But we know that any “particular” household is unlikely to be on this function, since there are stochastic deviations from these average consumption expenditure levels at different levels of income. In terms of the PRF, the consumption expenditure of a particular household at a given income level can be expressed as (Data Generating Process, DGP):

$$Y_t = E(Y \mid X = X_t) + u_t$$

where $u_t$ is a random variable with mean 0 (disturbance).

[1] Or, True Regression Function.
[2] Or, “th...