188 Pages

Week 3&4

Course: GEO 4167, Spring 2012
School: University of Florida
II. Testing for Multicollinearity

When two or more independent variables in a regression model are highly correlated with one another (collinear), they contribute "redundant" explanatory information, so not all of them need to be in the model. Multicollinear relations among regressors can cause a host of problems, making it difficult, if not impossible, to isolate the effect of each individual regressor on the dependent variable.

Consequences of multicollinearity in multiple regression:

(1) Estimated OLS coefficients may test statistically insignificant when they should not (for example, when the same variables are clearly significant in simple bivariate models), and they may even have the wrong sign.
(2) All (or almost all) of the estimated beta coefficients may test as "not significantly different" from zero, while the overall model shows very high explanatory power in terms of adjusted R-square: an apparent contradiction.

Why? Multicollinear regressors inflate the standard errors of the coefficient estimates, which makes it more likely that the betas are found not significantly different from zero. In short, multicollinearity yields imprecise ("inefficient") estimators with inflated variances.

Consider a classic example from econometrics (a temporal model):

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

where Y = value of imports ($), X1 = GNP (Gross National Product) index of growth, X2 = CPI (Consumer Price Index relative to a base year), ε = stochastic disturbance (error) term, and i = 1, ..., n time periods (years).

Expectation: X1 and X2 are possibly collinear. Suppose that $r_{X_1,X_2} \approx .9972$, indicating a very strong positive correlation between the regressors.
Database: n = 16 years (1991-2006); year 4 is the CPI base year.

Year   Y: imports ($ millions)   X1: GNP ($ millions)   X2: CPI (base year 4 = 100)
  1           28.4                     635.7                   92.9
  2           32.0                     688.1                   94.5
  3           37.7                     753.0                   97.2
  4           40.6                     796.3                  100.0
  5           47.7                     868.5                  104.2
  6           52.9                     935.5                  109.8
  7           58.5                     982.3                  116.3
  8           64.0                   1,063.4                  121.3
  9           75.9                   1,171.1                  125.3
 10           94.4                   1,306.6                  133.1
 11          131.9                   1,412.9                  147.7
 12          126.9                   1,528.8                  161.2
 13          155.4                   1,702.2                  170.5
 14          185.8                   1,899.5                  181.5
 15          217.5                   2,127.6                  195.4
 16          260.9                   2,368.5                  217.4

Estimated multiple regression:

$$\hat{Y}_i = -101.49 + 0.08\,X_{1i} + 0.76\,X_{2i}$$

se(β̂1) = .0571428 (t = 1.40); se(β̂2) = .760 (t = 1.01)

Note: the critical value is $t_{\alpha/2} = 2.16$ at α = .05 with 13 d.f. In both cases we "fail to reject" $H_0: \beta_j = 0$ (j = 1, 2); yet the overall model is said to explain 95.8% of the variation in Y (adjusted R-square = .958; R-square = .970). Note also that F = 199.0 with prob(F) = .001, significant at the 99% level. Insignificant variables in a significant model: an inherent contradiction!

[Figure: residuals vs. predicted values of Y for the multiple regression model; the error terms do not appear to be random.]

But when we run the bivariate models we get entirely different results:

Imports = f(GNP): $\hat{Y}_i = -69.03 + 0.13\,X_{1i}$; se(β̂1) = .004079 (t = 31.87); R-square = .986
Imports = f(CPI): $\hat{Y}_i = -149.52 + 1.82\,X_{2i}$; se(β̂1) = .05911 (t = 30.79); R-square = .985

We now "reject" $H_0: \beta_1 = 0$ in each model: the estimated slope coefficients are significantly different from (greater than) zero at both the 95% and 99% confidence levels, and the error terms appear more random.

[Figure: residuals vs. predicted values of Y for model 2, Imports = f(CPI).]

Multicollinearity: Indicators & Remedial Measures (concerns over model efficiency)

(1) High correlation coefficients between regressors (r > .90), especially in cases involving large sample sizes.
(2) Non-significant t-tests for all (or nearly all) β's, while the F-test of $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ is rejected and both R-square and adjusted R-square are high.
(3) Parameter estimates whose signs are opposite to those expected from theory or other empirical evidence.

Remedial measures: (a) drop the collinear regressor(s) from the model and re-estimate; (b) apply Principal Components (PC) regression; or (c) apply a "stepwise regression" procedure (there are four standard types).

One could also use a structured test for multicollinearity known as the "Variance Inflation Factor" (VIF) test. Consider a typical OLS regression model Y = f(X1, X2, ..., Xk), where multicollinearity is suspected among the k regressors based on preliminary OLS runs or a correlation matrix.

VIF test: run k auxiliary least-squares regressions, one for each regressor $X_j$ (j = 1, ..., k), in which $X_j$ is regressed on all of the remaining regressors; the dependent variable Y is omitted from the analysis. Record the R-square of each auxiliary model, $R_j^2$. The VIF for a given $X_j$ (or parameter $\beta_j$) is

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad j = 1, \dots, k,$$

where $R_j^2$ is the coefficient of determination of the test model

$$E(X_j) = \alpha_0 + \alpha_1 X_1 + \dots + \alpha_{j-1} X_{j-1} + \alpha_{j+1} X_{j+1} + \dots + \alpha_k X_k.$$

In other words, each independent variable $X_j$ is regressed on all remaining independent variables of the original model, and an R-square value is recovered for each test model. If the j-th VIF is well above 10 (equivalently, $R_j^2$ greatly exceeds .90), multicollinearity is a potential problem: there is statistical evidence that the variance (and hence the standard error) of $\hat\beta_j$ may be "inflated".

For example, with k = 3 independent variables, $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$, we compute three VIF values, one for each j, from the following test models:

(1) $X_{1i} = \alpha_0 + \alpha_1 X_{2i} + \alpha_2 X_{3i} + \varepsilon_i$, to find the VIF of β1
(2) $X_{2i} = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{3i} + \varepsilon_i$, to find the VIF of β2
(3) $X_{3i} = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \varepsilon_i$, to find the VIF of β3

where $\mathrm{VIF}_j = 1/(1 - R_j^2)$.
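The Python sketch below (NumPy only) fits the imports regressions and computes VIFs by running the auxiliary regressions described above. The helper names `ols` and `vif` are our own, the data are transcribed from the table, and we have not checked that the printed estimates reproduce the reported figures to every decimal. With only two regressors, each auxiliary R-square equals $r_{X_1,X_2}^2$, so both VIFs reduce to $1/(1 - r^2)$.

```python
import numpy as np

def ols(X, y):
    """OLS of y on the columns of X plus an intercept.

    Returns (coefficients, standard errors, t-statistics, R-square)."""
    A = np.column_stack([np.ones(len(y)), X])
    n, kstar = A.shape                         # k* = k + 1 parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    mse = sse / (n - kstar)                    # sigma-hat^2 on n - k* d.f.
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))
    return beta, se, beta / se, 1.0 - sse / sst

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    of X on all the other columns (the dependent variable Y is not used)."""
    k = X.shape[1]
    out = np.empty(k)
    for j in range(k):
        r2_j = ols(np.delete(X, j, axis=1), X[:, j])[3]
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# Imports example (n = 16), transcribed from the table above
Y = np.array([28.4, 32.0, 37.7, 40.6, 47.7, 52.9, 58.5, 64.0,
              75.9, 94.4, 131.9, 126.9, 155.4, 185.8, 217.5, 260.9])
X1 = np.array([635.7, 688.1, 753.0, 796.3, 868.5, 935.5, 982.3, 1063.4,
               1171.1, 1306.6, 1412.9, 1528.8, 1702.2, 1899.5, 2127.6, 2368.5])
X2 = np.array([92.9, 94.5, 97.2, 100.0, 104.2, 109.8, 116.3, 121.3,
               125.3, 133.1, 147.7, 161.2, 170.5, 181.5, 195.4, 217.4])

X = np.column_stack([X1, X2])
beta, se, t, r2 = ols(X, Y)
print("Y = f(X1, X2):", beta.round(3), "t =", t[1:].round(2), "R2 =", round(r2, 3))
for name, x in (("GNP", X1), ("CPI", X2)):
    b, s, tt, rr = ols(x, Y)
    print(f"Y = f({name}):", b.round(3), "t =", tt[1:].round(2), "R2 =", round(rr, 3))

r = np.corrcoef(X1, X2)[0, 1]
print("r(X1, X2) =", round(r, 4), "VIFs =", vif(X).round(1))
```

Comparing the t-statistics of the two-regressor fit with those of the bivariate fits shows the pattern described above: individually weak slopes in the joint model, strong slopes on their own.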
Remove the variable associated with the largest VIF, and repeat the procedure if and when necessary.

Consider the following test results for the tri-variate model:

Coefficient   Variable   VIF
β1            X1         22.9*
β2            X2          5.8
β3            X3          3.2

*Note: 22.9 >> 10, indicating that X1 is highly collinear with {X2, X3} and that the variance and standard error of $\hat\beta_1$ may be inflated. The presence of X1 in the original model therefore adds redundant explanatory information and could render the model inefficient, in the sense that its estimators have far larger variances than they would without the redundant regressor. Nevertheless, the beta estimates are still unbiased.

Remedial measure: remove X1 from the model Y = f(X1, X2, X3), run Y = f(X2, X3) only, and test the VIFs again if necessary. In general, we eliminate highly collinear variables.

Note: An easier way to identify highly collinear regressors is to construct a k×k correlation matrix of Pearson product-moment correlation coefficients r for all pairwise combinations of independent variables. That is, for the vector of right-hand-side variables (regressors) X = (X1, X2, ..., Xk), construct the matrix of correlations between all {Xj, Xm} pairs:

$$R = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,k} \\ r_{2,1} & r_{2,2} & \cdots & r_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k,1} & r_{k,2} & \cdots & r_{k,k} \end{pmatrix}$$

This matrix is symmetric, as $r_{j,m} = r_{m,j}$, and the values along its principal diagonal are all 1.0 ($r_{j,m} = 1$ when j = m), since each variable is perfectly positively correlated with itself. Within R, search for excessively high values, $|r_{j,m}| > .90$, which indicate a strong linear correlation between variables j and m, and consider using only one of the two variables in the model, as the other adds redundant information. In our example, $r_{X_1,X_2} = .997$. Spearman rank correlation coefficients can also be used to search for multicollinearity.
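A minimal NumPy sketch of the correlation-matrix screen: it builds R from the columns of a data matrix and lists every pair above the .90 threshold, using the upper triangle only because R is symmetric. The function name `collinear_pairs`, the `spearman` option (which correlates column ranks and assumes no ties), and the synthetic data are our own illustrations.

```python
import numpy as np

def collinear_pairs(X, names, threshold=0.90, spearman=False):
    """Build the k x k correlation matrix R of the columns of X and list
    every pair (X_j, X_m), j < m, with |r_jm| above the threshold.

    spearman=True correlates ranks instead of raw values (assumes no ties)."""
    if spearman:
        X = X.argsort(axis=0).argsort(axis=0).astype(float)   # column-wise ranks
    R = np.corrcoef(X, rowvar=False)          # columns are variables
    k = R.shape[0]
    flagged = [(names[j], names[m], R[j, m])
               for j in range(k) for m in range(j + 1, k)   # upper triangle only
               if abs(R[j, m]) > threshold]
    return R, flagged

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)   # nearly a copy of x1
x3 = rng.normal(size=50)               # unrelated to x1 and x2
R, pairs = collinear_pairs(np.column_stack([x1, x2, x3]), ["X1", "X2", "X3"])
print(R.round(3))
print(pairs)
```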
One might also use Principal Components analysis to identify groups of highly correlated variables and then apply Principal Components Regression instead of least squares (an advanced topic, beyond the scope of this course).

In our previous example, Imports = f(GNP, CPI), the test model GNP = f(CPI), with an R-square of .9940, yields VIF = 1/(1 - .9940) ≈ 166.7, and the test model CPI = f(GNP), with an R-square of .9945, yields VIF = 1/(1 - .9945) ≈ 181.8. The results suggest that either variable could be dropped from the model to improve its overall efficiency. (With only two regressors, both test models have an R-square equal to $r_{X_1,X_2}^2$, so the two VIFs should in principle be identical: with r = .9972, VIF ≈ 1/(1 - .9944) ≈ 179. The slightly larger VIF reported for CPI reflects rounding and is not by itself a strong reason to retain GNP rather than CPI.)

Criteria for choosing a model: Mallows' Cp & PRESS

Mallows' Cp criterion (variable selection and bias)

Mallows' Cp is also useful as a variable-selection technique. It is based on the "total mean square error" (TMSE), where (theoretically) we wish to evaluate

$$\mathrm{TMSE} = \sum_{i=1}^{n} E\left\{\left[\hat{Y}_i - E(Y_i)\right]^2\right\} = \sum_{i=1}^{n} \left[E(\hat{Y}_i) - E(Y_i)\right]^2 + \sum_{i=1}^{n} \mathrm{Var}(\hat{Y}_i),$$

where $E(\hat{Y}_i)$ is the mean (expected value) of Y from the fitted SRF and $E(Y_i)$ is the mean (expected value) under the true model (PRF), which is unknown. The first sum is the squared bias of the subset model and the second is its total variance.

Objective: in theory, we seek to minimize

$$\Gamma_p = \frac{\mathrm{TMSE}(\text{subset model})}{\sigma^2},$$

comparing the TMSE of each "subset" regression model (i.e., at each step of a variable inclusion or exclusion process) with the variance σ² of the random error of the true model (associated with the PRF), which is unknown.

Note: small values of Γp imply that the subset regression model has a small TMSE relative to the error variance σ².
Note: both TMSE and σ² are unknown, so we must rely on sample estimates of these quantities from the SRFs.
Mallows' Cp provides an estimate of Γp and may be defined as

$$C_p = \frac{SSE_p}{MSE_k} + 2(p + 1) - n$$

where n = sample size; p = number of regressors in the "subset" model being evaluated; k = number of potential regressors available in the "full" model; SSE_p = sum of squared errors for the subset model with p regressors; and MSE_k = mean square error for the model containing all k independent variables,

$$MSE_k = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n - k^*},$$

that is, the sum of squared errors divided by the degrees of freedom (n - k*) of the regression model with k explanatory variables and k* = k + 1 parameter estimates (allowing for the Y-intercept or constant term).

Mallows' Cp allows us to select the "best" subset model. We look for (1) a small value of Cp (suggesting a small, and possibly minimal, TMSE); and (2) a value of Cp near p + 1, where p is the number of regressors in the subset model. These properties indicate that the chosen subset model is likely to be unbiased and efficient (provided that the LS assumptions hold for the subset model). A regression model (SRF) is said to be "unbiased" if $E(\hat{Y}) = E(Y)$, and for an unbiased model it can be shown that the expected value of Mallows' Cp is

$$E(C_p) \approx p + 1.$$

In other words, despite the potential bias in any subset model (or SRF), if Cp is near p + 1 the bias is small or insignificant and can essentially be ignored. Hence, looking for Cp ≈ p + 1 is one strategy in the model-selection process.

Alternatively, Mallows' Cp may be calculated from R-square values. For the subset model {X_m} with m regressors, taken from the set of all k regressors {X1, X2, ..., Xk},

$$C_p\{X_m\} = (n - k^*)\,\frac{1 - R^2\{X_m\}}{1 - R^2\{X_1, X_2, \dots, X_k\}} + 2(m + 1) - n,$$

where m + 1 is the number of β's estimated in the subset model (its m slopes plus the intercept), n is the number of sample observations, and k* = k + 1 (allowing for the estimation of the Y-intercept or constant term).

Model selection and parsimony

In evaluating a subset model, the choice of how many variables to retain should be subject to a test or model-choice criterion, and Mallows' Cp is a useful statistic for this purpose. Equivalently, Mallows' Cp may be written as

$$C_p = \frac{MSE_p}{MSE_k}\,(n - p - 1) - \left[n - 2(p + 1)\right],$$

where MSE_p = mean square error of the model with p regressors; MSE_k = mean square error of the model with all k candidate regressors; n = number of sample observations; and p = number of regressors in a final and/or "optimal" model. The optimal model should have a Cp value approximately equal to p + 1. If Cp >> p + 1, the subset model is likely biased, i.e. "under-specified", with at least one important independent variable omitted; values of Cp below p + 1 usually reflect sampling variability rather than bias. Among models with Cp ≈ p + 1, the one with fewer regressors is preferred (parsimony), since variables beyond those needed add little to the fit while increasing the likelihood of multicollinearity of a simple or complex form.

Recently, the PRESS criterion has been used as an alternative to Mallows' Cp. PRESS stands for the Predicted Sum of Squares and is defined as

$$\mathrm{PRESS} = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{(i)}\right)^2$$

where $\hat{Y}_{(i)}$ denotes the predicted value for the i-th observation of the dependent variable Y obtained when the subset regression model is estimated with the i-th data point omitted from the sample. The quantity $(Y_i - \hat{Y}_{(i)})$ is called the "deleted residual value" (DRV) for the i-th observation: a specialized error term (there are n of them) calculated with respect to each deleted observation for a given "candidate" subset model. In short, a candidate subset model is fit to the sample data n times, each time omitting one of the data points, and a predicted value for the omitted observation is recovered to obtain the corresponding DRV. The squared DRVs are then summed for that subset model to find PRESS. PRESS may also be used to compute a PRESS R-square for each subset model, $1 - \mathrm{PRESS}/SST$, where SST is the total sum of squares of Y.

Small DRVs indicate that the model under consideration predicts well and is not adversely affected by omitting individual observations; in other words, the model is "stable" in terms of its regression estimators. Consequently, one desires a model that minimizes PRESS (or maximizes PRESS R-square); this is often the model for which Mallows' Cp is approximately equal to p + 1, where p is the number of regressors in the model that minimizes PRESS.
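The following NumPy sketch scores every subset of a small candidate pool by Mallows' Cp and PRESS. The data are synthetic (by construction only columns 0 and 2 matter), and the helper names are our own. Instead of refitting n times, `press` uses the standard OLS identity that the deleted residual equals $e_i/(1 - h_{ii})$, where $h_{ii}$ is the i-th diagonal element of the hat matrix; this gives the same value as the leave-one-out procedure described above. `cp_from_r2` implements the R-square form of Cp so the two forms can be compared.

```python
import itertools
import numpy as np

def fit(X, y):
    """OLS with intercept; returns SSE, residuals and the hat-matrix diagonal h_ii."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    h = np.diag(A @ np.linalg.pinv(A))         # H = A (A'A)^-1 A'
    return resid @ resid, resid, h

def press(X, y):
    """PRESS = sum of squared deleted residuals, each equal to e_i / (1 - h_ii)."""
    _, resid, h = fit(X, y)
    return np.sum((resid / (1.0 - h)) ** 2)

def mallows_cp(X, y, cols):
    """Cp = SSE_p / MSE_k + 2(p + 1) - n for the subset `cols` of the k columns of X."""
    n, k = X.shape
    mse_k = fit(X, y)[0] / (n - (k + 1))       # full model, n - k* d.f.
    sse_p = fit(X[:, cols], y)[0]
    return sse_p / mse_k + 2 * (len(cols) + 1) - n

def cp_from_r2(r2_sub, r2_full, n, k, m):
    """R-square form: (n - k*)(1 - R2_sub) / (1 - R2_full) + 2(m + 1) - n."""
    return (n - (k + 1)) * (1 - r2_sub) / (1 - r2_full) + 2 * (m + 1) - n

def subset_table(X, y):
    """(columns, p, Cp, PRESS, PRESS R-square) for every non-empty subset."""
    n, k = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    rows = []
    for p in range(1, k + 1):
        for cols in itertools.combinations(range(k), p):
            pr = press(X[:, list(cols)], y)
            rows.append((cols, p, mallows_cp(X, y, list(cols)), pr, 1 - pr / sst))
    return rows

# Synthetic illustration: only columns 0 and 2 enter the true model
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=40)
for cols, p, cp, pr, pr2 in subset_table(X, y):
    print(cols, p, round(cp, 2), round(pr, 2), round(pr2, 3))
```

Note that the full model always gives Cp = k + 1 exactly, so Cp is informative only for the proper subsets.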
[Figure: the relationship between R², MSE (σ̂²), Cp, PRESS and p, with the reference line Cp = p + 1; in the example shown, PRESS is minimized at p = 6.]

Stepwise Regression and Model Identification

Another, more controversial, method is to employ "stepwise" regression; the merits of this approach are debated within the scientific/academic community. Stepwise regression (there are four major types) uses a screening procedure to determine which explanatory variables to retain and which to drop from the model, based on their explanatory power. It is a sequential method for adding and/or eliminating regressors to arrive at a final regression (subset) model that provides the best possible fit for the chosen stepwise procedure. Data mining? Perhaps. But it is justifiable as part of an inductive approach to model building under the Scientific Method.

Four types of stepwise procedures:
(1) Forward-Stepwise Regression (FSR);
(2) Backward-Stepwise Regression (BSR);
(3) General Stepwise Regression (GSR), combining FSR and BSR; and
(4) Min. MSE Stepwise (based on SSE, Cp, σ̂², and/or the PRESS criterion).
See the SPSS or NCSS manual for more information on stepwise procedures.

FSR: works from the pool of all k candidate regressors, Y = f(X1, X2, X3, X4, ..., Xk), but builds the model up from a constant.
Step 1. Run Y = f(1), a model with only a constant term, to assess the explanatory power of the mean of Y.
Step 2. Find the explanatory variable that explains the greatest amount of variation in the dependent variable Y and add it to the model, e.g., Y = f(1, X3).
Step 3. Find the explanatory variable that adds the next-greatest amount of explanatory power to the model, based on the residuals (unexplained variation) from the model in Step 2, or stop if no significant amount of additional explanation can be obtained from the remaining variables. Suppose that a second variable can be added: Y = f(1, X3, X4).
Step 4. Inspect the remaining variables to see if another can be added; if so, add that variable; if not, stop the procedure, and so on.

The sequential process: when does it stop? Continue the sequential expansion of the model until no further "significant" explanatory power is found among the variables not chosen in prior steps (i.e., stop when no additional variable can be added to account for unexplained variation in the dependent variable).

Note: forward stepwise selection procedures are widely used when multicollinearity is encountered.

Forward Selection Algorithm (based on the F-statistic), the most common form used by software packages.

Iteration 0. The procedure begins with the simplest form of the regression model, Y = f(constant), where the constant is the mean of Y: $Y = \beta_0 + \varepsilon$ at iteration i = 0, and the F-statistic (F-ratio) is calculated to assess the degree to which variation in Y is accounted for by its mean. Let F(i) denote the F-statistic for iteration i.

Iteration 1. Evaluate each predictor X(j), j = 1, ..., k, by fitting the models $Y = \beta_0 + \beta_1 X_{(j)} + \varepsilon$ by least squares, obtain SSE[X(j)], the sum of squared errors associated with variable X(j) based on each model's ability to predict Y, and find the smallest SSE[X(j)], i.e. the best-fit model at iteration i = 1. Label this value SSE[X(1)], the sum of squared errors associated with the regressor that entered the model at iteration 1. Next, calculate F(1) for the subset of regressors from iteration 1:

$$F(1) = \frac{SSY - SSE[X_{(1)}]}{MSE[X_{(1)}]}$$

where SSY is the total sum of squares of the dependent variable and MSE is the mean square error of the best model identified at iteration 1 (the same formulae as in ANOVA). At iteration 1 only one independent variable has entered the model, denoted generically as X(1); this does not mean it is X1!

Iteration 2. Evaluate each of the k - 1 remaining regressors X(j*) by fitting the models $Y = \beta_0 + \beta_1 X_{(1)} + \beta_2 X_{(j^*)} + \varepsilon$ by OLS, obtain SSE[X(1), X(j*)] for each, and again find the smallest value, i.e. the best-fit model at i = 2, based on the new regressor X(2) that enters the model during iteration 2. Then calculate

$$F(2) = \frac{SSY - SSE[X_{(1)}, X_{(2)}]}{MSE[X_{(1)}, X_{(2)}]}$$

Iteration 3. Evaluate each of the k - 2 remaining regressors X(j*) by fitting $Y = \beta_0 + \beta_1 X_{(1)} + \beta_2 X_{(2)} + \beta_3 X_{(j^*)} + \varepsilon$ by OLS, find the smallest SSE[X(1), X(2), X(j*)] (the best-fit model at i = 3, based on the new regressor X(3)), and calculate

$$F(3) = \frac{SSY - SSE[X_{(1)}, X_{(2)}, X_{(3)}]}{MSE[X_{(1)}, X_{(2)}, X_{(3)}]}$$

...and so on.

Statistical justification for termination of the procedure: if at the i-th iteration F(i) > F(i - 1), add X(i) to the model; if F(i) < F(i - 1), do not add X(i) and terminate the procedure. Upon termination at iteration i, the subset model contains p = i - 1 regressors. F(i) is also known as an "incremental F-statistic" or an "iterative F".

BSR starts with the full model and eliminates one variable at a time (the weakest) until F(i) reaches its maximum. BSR is problematic, however, when multicollinear variables are present, so great caution must be used.
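A minimal NumPy sketch of the forward-selection iterations: at each iteration the remaining regressor with the smallest SSE enters, and the procedure stops once F(i) fails to exceed F(i - 1). Two points are our own assumptions rather than the notes': F(i) is read as the ANOVA regression F-ratio, $[(SSY - SSE)/i]/MSE$, which coincides with the formula above at i = 1, and F(0) for the mean-only model is taken as 0. Many packages instead use a partial "F-to-enter" rule; this sketch does not implement that variant.

```python
import numpy as np

def sse_of(X, y, cols):
    """SSE of the OLS fit of y on an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

def forward_select(X, y):
    """Forward selection: at iteration i add the remaining regressor with the
    smallest SSE; stop as soon as F(i) does not exceed F(i-1)."""
    n, k = X.shape
    ssy = np.sum((y - y.mean()) ** 2)          # total sum of squares, SSY
    chosen, remaining = [], list(range(k))
    f_prev = 0.0                               # iteration 0 (mean only): F taken as 0
    while remaining:
        best = min(remaining, key=lambda j: sse_of(X, y, chosen + [j]))
        i = len(chosen) + 1                    # regressors in the candidate model
        sse = sse_of(X, y, chosen + [best])
        f_i = ((ssy - sse) / i) / (sse / (n - i - 1))   # ANOVA F = MSR / MSE
        if f_i <= f_prev:
            break                              # F(i) < F(i-1): do not add; terminate
        chosen.append(best)
        remaining.remove(best)
        f_prev = f_i
    return chosen                              # p = len(chosen) regressors

# Synthetic data: Y depends on columns 0 and 2 only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=100)
print(forward_select(X, y))   # indices of X(1), X(2), ... in order of entry
```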
GSR: a variable-selection method in which BSR and FSR are alternately employed: BSR(1), FSR(2), BSR(3), FSR(4), and so on. In this alternating sequence (beginning with BSR), the new subset model is evaluated at each step, and the algorithm terminates when either no remaining variable can be eliminated or no additional variable can be added to the model. This technique is commonly used as a check, to see whether it produces the same results as the stand-alone FSR or BSR method.

III. The Use of "Interaction Terms" in Model Building (for quantitative variables)

This regression approach is applied when the independent variables in a model "interact" with one another, and it can uncover hidden effects that cannot be found simply by examining scatter plots of the dependent variable Y against the individual Xi's. Scatter plots reveal the stand-alone functional relationship between Y and a given Xi, but they do not reveal more complex relationships between Y and two or more Xi's.

Non-interactive relationships. Consider the first-order linear model

$$E(Y) = 1 + 2X_1 - X_2.$$

When X2 = 0: E(Y) = 1 + 2X1
When X2 = 1: E(Y) = 2X1
When X2 = 2: E(Y) = -1 + 2X1

The slope coefficient of X1 is not affected by the value of X2; it remains fixed at 2.

[Figure: E(Y) against X1 for X2 = 0, 1, 2 (lines 1 + 2X1, 2X1 and -1 + 2X1). X1 and X2 are "non-interactive": the lines are parallel, only the Y-intercept changes, and the slope remains constant.]

Hence, the variable X2 has no discernible effect on the relationship between Y and X1.

Interactive relationships. Consider the interactive model (a partial second-order linear model), in which X1 and X2 are known to "interact":

$$E(Y) = 1 + 2X_1 - X_2 + X_1 X_2.$$

When X2 = 0: E(Y) = 1 + 2X1
When X2 = 1: E(Y) = 3X1
When X2 = 2: E(Y) = -1 + 4X1

The slope coefficient of X1 now depends on the value of X2 and can take on different values over the range of X2.
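A small Python check of the two worked examples: because E(Y) is linear in X1 at any fixed X2, evaluating it at X1 = 0 and X1 = 1 recovers the intercept and slope of each line. The function names are our own; the models are exactly the two given above.

```python
def expected_y(x1, x2, interactive):
    """E(Y) = 1 + 2*X1 - X2, plus the X1*X2 term when interactive=True."""
    return 1 + 2 * x1 - x2 + (x1 * x2 if interactive else 0)

def line_in_x1(x2, interactive):
    """Intercept and slope of E(Y) as a function of X1 at a fixed X2.
    E(Y) is linear in X1, so two evaluations determine the line exactly."""
    intercept = expected_y(0, x2, interactive)
    slope = expected_y(1, x2, interactive) - intercept
    return intercept, slope

for x2 in (0, 1, 2):
    print("X2 =", x2,
          "non-interactive:", line_in_x1(x2, False),
          "interactive:", line_in_x1(x2, True))
```

In the non-interactive model the slope stays at 2 for every X2; with the X1X2 term it becomes 2 + X2, i.e. 2, 3 and 4.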
[Figure: E(Y) against X1 for X2 = 0, 1, 2. X1 and X2 are "interactive": the lines intersect, and both the Y-intercept and the slope coefficient of X1 change with X2.]

Hence, the variable X2 has a discernible effect on the relationship between Y and X1.

Regression modeling and the "Expansion Method"

For k = 3 quantitative independent variables, where Y = f(X1, X2, X3) and the X's are known (or hypothesized) to interact multiplicatively, we can specify

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3 + \beta_6 X_2 X_3.$$

This is a partial second-order interactive model: using the expansion method, 3 variables have turned into 6 regressors!

Warning: the inclusion of interaction terms may produce multicollinear regressors, requiring pre-testing of the model for variance inflation and/or the use of stepwise procedures.

Suppose now that the X's are known (or hypothesized) to interact multiplicatively and that there is a known second-order polynomial relationship between Y and the X's. We can then specify the full second-order interactive model

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1^2 + \beta_5 X_2^2 + \beta_6 X_3^2 + \beta_7 X_1 X_2 + \beta_8 X_1 X_3 + \beta_9 X_2 X_3.$$

This is a full second-order interactive model: the expansion method turns 3 variables into k = 9 regressors and k* = 10 coefficients that must now be estimated.

To determine the "order" of a model, add up the observed or implied exponents of the explanatory variables within each right-hand-side term and find the largest sum. For example, $X_1^1 X_2^1$ has order 1 + 1 = 2 (second-order), and $X_2^2$ has order 2 (second-order).

Full model: all possible combinations of independent variables are considered and included in the specification.
Partial model: not all possible combinations of independent variables are considered and included in the specification.

A full third-order specification with 3 independent variables would turn into a model with 19 regressors!
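The term counts quoted for the expansion method (6, 9 and 19 regressors) can be reproduced by enumerating monomials. In this Python sketch, `full=True` allows repeated factors (squares, cubes) as in the full models, while `full=False` keeps only products of distinct variables, matching the partial second-order model; the function names and the `label` formatting are our own.

```python
from itertools import combinations, combinations_with_replacement
import numpy as np

def expansion_terms(k, order, full=True):
    """Index tuples for every term of degree 1..order in k variables.
    full=True allows repeated factors (squares, cubes); full=False keeps only
    products of distinct variables, as in the partial model."""
    pick = combinations_with_replacement if full else combinations
    return [c for d in range(1, order + 1) for c in pick(range(k), d)]

def expand(X, order, full=True):
    """Design matrix whose columns are the products X_a * X_b * ... for each term."""
    terms = expansion_terms(X.shape[1], order, full)
    return np.column_stack([np.prod(X[:, list(t)], axis=1) for t in terms]), terms

def label(term):
    # e.g. (0, 0, 1) -> "X1^2 X2"
    return " ".join(f"X{j + 1}" + (f"^{term.count(j)}" if term.count(j) > 1 else "")
                    for j in sorted(set(term)))

print([label(t) for t in expansion_terms(3, 2)])
for order, full in ((2, False), (2, True), (3, True)):
    print(order, "full" if full else "partial", len(expansion_terms(3, order, full)))
```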
Original model: Y = f(X1, X2, X3)

Expanded model (full third-order specification):
Y = f(X1, X2, X3, X1², X2², X3², X1X2, X1X3, X2X3, X1³, X2³, X3³, X1X2X3, X1²X2, X1²X3, X2²X1, X2²X3, X3²X1, X3²X2)

As you can see, the regression test model grows rapidly with the addition of independent variables or an increase in the order of the model. Interpretation of the estimated parameters becomes difficult in interactive models, and great caution must be exercised in using such a framework.

Suggested readings on the topic (highly recommended):
"Multiple Regression: Testing and Interpreting Interactions," L.S. Aiken and S.G. West (Sage, 1991).
"Interaction Effects in Multiple Regression," J. Jaccard, R. Turrisi, and C.K. Wan (Sage, 1990).

Nevertheless, Forward Stepwise Regression (FSR) offers much in the way of testing "expanded models", especially where limited degrees of freedom are an issue, i.e. where the number of regressors (k) becomes very large and/or the sample size n is relatively small.

Testing the Structural Stability of a Model (across two time periods)

Goal: to evaluate whether structural changes or shifts have occurred in a phenomenon that is being modeled using linear regression.

[Figure (climate modeling example): a series showing an obvious break or change in the process.]

Testing for Structural or Parameter Stability in Regression Models: the Chow Test

Suppose we are running a regression model to explain variation in Y (total rainfall) as a function of {X} or {X1, X2, ..., Xk}: one or more climate indices. We run the model on the same geographic area, but for two distinct time periods:

t = 1 (1935-1970 inclusive): 36 years
t = 2 (1971-2008 inclusive): 38 years

Our goal is to test whether there is any evidence of a structural change in the relationship between Y and the X's between the two time periods.
If the independent variable(s) adequately account for variability in the dependent variable, we would expect only minor differences in the model's overall explanatory power between the two time periods, and we would expect the relationships between variables (the estimated coefficients) to be fairly stable, allowing for the slight differences that always arise because we have two distinct "samples" (two SRFs).

Suppose the models are as follows:

Time period t = 1: $Y_{i,t=1} = \beta_{0,t=1} + \beta_{1,t=1} X_{i,t=1} + \varepsilon_{i,t=1}$, with i = 1, ..., n1 sample observations (n1 = 36);
Time period t = 2: $Y_{i,t=2} = \beta_{0,t=2} + \beta_{1,t=2} X_{i,t=2} + \varepsilon_{i,t=2}$, with i = 1, ..., n2 sample observations (n2 = 38).

Suppose that, for the bivariate models, the following scatterplots and trend lines are observed:

[Figure: total rainfall Y against X, with fitted lines SRF t=1 and SRF t=2; there appears to be slightly more error variance in period t = 2 than in t = 1.]

To test for structural changes, i.e. changes caused by possible differences in the intercept, the slope coefficient(s), or both, one can apply a Chow test.

Test assumptions: the error terms of the two models are (a) independently distributed (they do not covary); and (b) normally distributed and homoscedastic (i.e., of constant variance σ²): $\varepsilon_{t=1} \sim N(0, \sigma^2)$ and $\varepsilon_{t=2} \sim N(0, \sigma^2)$. Hence, as a pre-test, one must compute an F-ratio comparing the variances of the LS residuals recovered from each model. Recall that

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}, \qquad \text{where } \hat{\sigma}_1^2 > \hat{\sigma}_2^2.$$

Note: the subscripts here refer to the variance estimates for independent samples 1 and 2 (not necessarily time periods 1 and 2); the larger variance is always placed in the numerator of the ratio.
The F-ratio follows an F-distribution with (n1 - k*) and (n2 - k*) degrees of freedom for the numerator and denominator, respectively, where n1 (n2) is the sample size associated with the larger (smaller) error-variance estimate, and k* = k + 1, where k is the number of explanatory variables in both models (i.e., the number of estimated slope coefficients β_j, j > 0). Thus, the standard F-test can be used to evaluate the null hypothesis of homoscedasticity (equality of variance is equivalent to constant variance here),

$$H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2, \qquad F \sim F_{(n_1 - k^*),\,(n_2 - k^*)},$$

tested at a chosen level of significance α, typically α = .05 (the 95% confidence level).

Suppose the following results were obtained for the two regression models:

1. Error (residual) sum of squares ESS1 = 16,337.0, so the estimated error variance for t = 1 is $\hat{\sigma}^2_{t=1} = ESS_1/(n_1 - 2) = 16{,}337.0/34 = 480.5$;
2. Error (residual) sum of squares ESS2 = 24,134.4, so the estimated error variance for t = 2 is $\hat{\sigma}^2_{t=2} = ESS_2/(n_2 - 2) = 24{,}134.4/36 = 670.4$.

Since period t = 2 has the larger error variance, it is placed in the numerator: F = 670.4/480.5 ≈ 1.395, with [36 (num), 34 (den)] d.f. This is below the critical value (approximately 1.76 at α = .05), so in our study we cannot reject the null hypothesis of equal error variances (homoscedasticity) at the 95% confidence level.

[Figure: F probability density showing the computed F (≈ 1.39) in the non-rejection region, below the critical value 1.76; highly improbable values lie in the rejection region.]

Test procedure: if F < F(critical), fail to reject the null hypothesis; if F > F(critical), reject the null hypothesis at the (1 - α) × 100% confidence level.

Once the assumption of constant variance has been validated, we can carry out the Chow test in a series of steps:

1. Estimate the "pooled regression" model using the data for both time periods (in this case, all n1 + n2 = n = 74 observations):

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$

and recover the "restricted error sum of squares" (RESS), i.e. the ESS of the pooled model. Suppose that for our data, RESS = 47,355.0. The ESS of the pooled model is called "restricted" because it is obtained by imposing the restrictions that the intercept and slope terms of the two sub-period models are equal, $\beta_{0,t=1} = \beta_{0,t=2}$ and $\beta_{1,t=1} = \beta_{1,t=2}$; in short, there is no overall structural shift in the model's estimated coefficients (the SRFs are not significantly different from one another).

2. Compute the ESS of each of the two sub-period models and sum them to find the "unrestricted error sum of squares", UESS = ESS1 + ESS2; in our example UESS = 16,337.0 + 24,134.4 = 40,471.4, noticeably less than RESS (47,355.0). RESS can never be smaller than UESS, because the separate fits are free to choose different coefficients for each period. Under the Chow test's null hypothesis of no structural shift between the two periods (i.e., SRF1 and SRF2 are essentially the same statistically and differ only through sampling variability), RESS and UESS should not be significantly different! Note that UESS has dfChow = n1 + n2 - 2k* degrees of freedom, while RESS has n1 + n2 - k*; the difference, k*, is the number of restrictions.

3. The Chow test entails computing and evaluating an F-statistic of the form

$$F = \frac{(RESS - UESS)/k^*}{UESS/df_{Chow}}, \qquad df_{Chow} = n_1 + n_2 - 2k^*, \qquad F \sim F_{[k^*,\, n_1 + n_2 - 2k^*]}.$$

In our example, with dfChow = 38 + 36 - 4 = 70,

$$F = \frac{(47{,}355.0 - 40{,}471.4)/2}{40{,}471.4/70} = \frac{3{,}441.8}{578.16} = 5.95.$$
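The arithmetic of the variance-ratio pre-test and the Chow F can be sketched in Python as follows, applied to the summary figures of the rainfall example (k* = 2). The function names are our own, and `chow_from_data` is a hypothetical helper showing how RESS and UESS would be obtained from raw data with ordinary least-squares fits; critical values still have to be read from F tables.

```python
import numpy as np

def variance_ratio(ess1, n1, ess2, n2, kstar):
    """Pre-test F: larger error-variance estimate over the smaller one.
    Returns F and its (numerator, denominator) d.f."""
    v1, v2 = ess1 / (n1 - kstar), ess2 / (n2 - kstar)
    if v1 >= v2:
        return v1 / v2, (n1 - kstar, n2 - kstar)
    return v2 / v1, (n2 - kstar, n1 - kstar)

def chow_f(ress, uess, n1, n2, kstar):
    """Chow F = [(RESS - UESS)/k*] / [UESS/(n1 + n2 - 2k*)] and its d.f."""
    df = n1 + n2 - 2 * kstar
    return ((ress - uess) / kstar) / (uess / df), (kstar, df)

def ess(x, y):
    """Error sum of squares of an OLS fit of y on an intercept and x."""
    A = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

def chow_from_data(x1, y1, x2, y2):
    """Fit the pooled (restricted) and separate (unrestricted) models, then apply chow_f."""
    kstar = 1 + (1 if np.ndim(x1) == 1 else np.shape(x1)[1])
    ress = ess(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    uess = ess(x1, y1) + ess(x2, y2)
    return chow_f(ress, uess, len(y1), len(y2), kstar)

# Summary figures from the rainfall example (k* = 2)
F_var, df_var = variance_ratio(16337.0, 36, 24134.4, 38, 2)
F, df = chow_f(47355.0, 16337.0 + 24134.4, 36, 38, 2)
print(round(F_var, 3), df_var)   # 1.395 (36, 34)
print(round(F, 2), df)           # 5.95 (2, 70)
```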
At the $\alpha = .05$ significance level, the critical value of $F_{[k^*,\; n_1 + n_2 - 2k^*]} = F_{[2,\,70]}$ is approximately 3.12.

[Figure: F probability density with the non-rejection region below $F_\alpha \approx 3.12$ and the rejection region above it; the computed F = 5.95 lies in the rejection region.]

Test procedure: if $F < F_\alpha$, fail to reject the null hypothesis; if $F > F_\alpha$, reject the null hypothesis at the $(1 - \alpha) \times 100\%$ confidence level.

Since 5.95 > 3.12, we must reject the null hypothesis ($H_0$) of no structural change at the 95% confidence level. In other words, there is statistical evidence of a structural shift occurring between the two time periods (the sample regression functions are not equal).

Testing the Functional Form of a Model (Linear vs. Log-Linear)

Suppose we are torn between the results of a linear model and a log-linear version of the same model. Which model gives us the better results, and should we pick the transformed version over the untransformed model? For example, consider the following:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i,$$
$$\ln Y_i = \alpha_0 + \alpha_1 \ln X_{1i} + \alpha_2 \ln X_{2i} + \varepsilon_i,$$

where Y is suspended sediment flow in a river system, X1 is water discharge, and X2 is discharge lagged from a previous time period. Suppose that both models do comparably well:

Model                   Adjusted R-square   F-ratio
Model 1 (linear)        .739                24.0*
Model 2 (log-linear)    .775                28.2*
* significant at 95%

Functional Form (FF) Test Procedure: the MacKinnon, White, and Davidson (MWD) test

1. Estimate the linear model and recover the predicted values of the dependent variable, $\hat{Y}_i$;
2. Estimate the log-linear model and recover the predicted values of the dependent variable, $\widehat{\ln Y}_i$;
3. Calculate the variable $Z_{FF1,i} = \ln(\hat{Y}_i) - \widehat{\ln Y}_i$, defined as the ln value of the predicted value from the linear model (step 1) less the predicted value from the log-linear model (step 2).

Null and alternative hypotheses for the MWD test:
H0: Model is linear; the dependent variable Y is a linear function of the regressors.
H1: Model is log-linear; the dependent variable lnY is a function of the logarithms (ln's) of the regressors.
Note that the test can be applied using either base-10 logs or natural logs (ln).

4. Regress Y on the X's and $Z_{FF1}$ from step 3, and evaluate the coefficient associated with $Z_{FF1}$ using a standard t-test (under the null hypothesis, this coefficient is equal to zero). In short, run the model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 Z_{FF1,i} + \varepsilon_i$$

and inspect the t-value associated with $\hat\beta_3$. Reject H0 (Y is a linear function of the regressors) if $|t| > t_{\alpha/2}$ (critical), i.e., if $\hat\beta_3$ is significantly different from zero at $\alpha = .05$.

5. Calculate the variable $Z_{FF2,i} = \operatorname{antilog}(\widehat{\ln Y}_i) - \hat{Y}_i$, defined as the antilog of the predicted value from the log-linear model less the predicted value from the linear model (obtained in steps 1 and 2).

6. Regress lnY on the lnX's and $Z_{FF2}$ from step 5, and evaluate the coefficient associated with $Z_{FF2}$ using a standard t-test (under the null hypothesis, this coefficient is equal to zero). In short, run the model

$$\ln Y_i = \alpha_0 + \alpha_1 \ln X_{1i} + \alpha_2 \ln X_{2i} + \alpha_3 Z_{FF2,i} + \varepsilon_i$$

and inspect the t-value associated with $\hat\alpha_3$. Reject H1 (lnY is a function of the logs of the regressors) if $|t| > t_{\alpha/2}$ (critical), i.e., if $\hat\alpha_3$ is significantly different from zero at $\alpha = .05$.

The degrees of freedom for each t-test are $(n - k^* - 1)$, where, as before, $k^* = k + 1$. We lose one additional degree of freedom for the estimation of the test variable $Z_{FF(j)}$ in each re-run model (j = 1, 2).
Suppose that the following results were obtained for samples with n = 44 observations and models with $(n - k^* - 1) = (44 - 3 - 1) = 40$ d.f. (critical $t_{\alpha/2} = 2.02$):

Linear model with $Z_{FF1}$: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 Z_{FF1,i} + \varepsilon_i$; t = 2.14 > 2.02, so reject H0 (linear model).
Log-linear model with $Z_{FF2}$: $\ln Y_i = \alpha_0 + \alpha_1 \ln X_{1i} + \alpha_2 \ln X_{2i} + \alpha_3 Z_{FF2,i} + \varepsilon_i$; t = 1.79 < 2.02, so fail to reject H1 (log-linear model).

In sum, there is statistical evidence to support the use of the "double-log model," lnY = f(lnX1, lnX2), as the superior model based on the adjusted R-square, the F-ratio, and the results of the MacKinnon, White, and Davidson (MWD) test.

A Short Note on the Use of Scatterplots and Model Building: Interaction Terms Should Always Be Considered (When Applicable)

Consider the following data for the model Y = f(X1, X2, X1X2). When reviewing your data using scatterplots and correlations, always review potential interactions when applicable, i.e., when supported by theory or empirical evidence.

[Figure: scatterplots of Y against the regressors.]

In both of the first two scatterplots, the relationships between Y and X1 and between Y and X2 appear to be weak and/or random. Suppose that OLS results show that Y = f(X1) and Y = f(X2) are very weak or insignificant models with R² values < .10. Does this mean that X1 and X2 are poor explanatory variables? No. Note that the relationship between Y and X1X2 is very pronounced (a strong inverse statistical relationship). For the model Y = f(X1, X2, X1X2), only X1X2 offers any explanatory power (not X1 or X2 on their own); after one iteration, the forward stepwise procedure stops.

Issues in Model Building

As we add regressors to a model, the R-square never decreases (it almost always increases), and even the adjusted R-square value may increase.
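A small numerical illustration of both points above, using hypothetical data (a symmetric grid of X1 and X2 values, with Y driven only by the product X1X2): each variable alone shows no explanatory power, adding X2 leaves the R-square unchanged while the adjusted R-square falls, and adding the interaction term explains Y completely. The helper `fit_r2` is a hypothetical plain-Python OLS based on the normal equations, not a library function.

```python
def fit_r2(columns, y):
    """OLS of y on an intercept plus the given columns; returns (R^2, adjusted R^2)."""
    n, k = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    a = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the normal equations.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = a[r][c] / a[c][c]
            a[r] = [u - f * v for u, v in zip(a[r], a[c])]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, p))) / a[i][i]
    mean_y = sum(y) / n
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data: Y depends only on the interaction X1*X2 (inverse relationship).
levels = [-2.0, -1.0, 1.0, 2.0]
x1 = [a for a in levels for _ in levels]
x2 = [b for _ in levels for b in levels]
x1x2 = [a * b for a, b in zip(x1, x2)]
y = [10.0 - 3.0 * p for p in x1x2]

for label, cols in [("Y = f(X1)", [x1]), ("Y = f(X1, X2)", [x1, x2]),
                    ("Y = f(X1, X2, X1X2)", [x1, x2, x1x2])]:
    r2, adj = fit_r2(cols, y)
    print(f"{label:22s} R^2 = {r2:.3f}  adjusted R^2 = {adj:.3f}")
```

The grid is deliberately symmetric so that X1 and X2 are exactly uncorrelated with Y; with real data the marginal correlations would merely be weak.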
Simple models are preferred to complex models (especially when interpretation of the estimated beta values becomes difficult due to the use of interaction variables, higher-order specifications, transformations, etc.), yet we also seek to find the "best model": an inherent trade-off. Burning question: is the increase in the explanatory power of the model real, or is it due to chance? And can we test whether the improvement in explanatory power of a model with more regressors (versus fewer) is statistically significant? Answer: yes!

Model Comparison Test: Complete vs. Partial ("Reduced" or Subset) Model

The model comparison test (based on the F-ratio) is used to compare a "subset" model to a "complete" model; that is, to compare a condensed or reduced version of a model to an expanded version that potentially offers greater explanatory power through the inclusion of additional regressors. It is a formal test of the significance of additional explanatory variables brought into a model: it assesses whether these variables provide a significant increase in the explanatory power of the model, above and beyond what would be expected purely by chance.

Consider an original (reduced) model with g regressors,

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_g X_g,$$

versus an expanded (complete) model with (k − g) additional regressors,

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_g X_g + \beta_{g+1} X_{g+1} + \dots + \beta_k X_k.$$

Test question: have the additional (k − g) regressors added any real (significant) explanatory power to the model, above and beyond what would be expected randomly (e.g., by adding randomly generated variables to the model)?

Null hypothesis: $H_0: \beta_{g+1} = \beta_{g+2} = \dots = \beta_k = 0$
Alternative hypothesis: $H_a$: at least one of the (k − g) additional parameters being evaluated is non-zero.

Test: a modified F-test comparing the explanatory power of the reduced (subset) model, Y = f(X1, X2, ..., Xg), with that of the complete (full) model, Y = f(X1, X2, ..., Xg, Xg+1, ..., Xk).

Test statistic (F-ratio):

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]},$$

where $SSE_C \le SSE_R$ by construction, since adding (k − g) regressors cannot increase the sum of squared errors, and
$SSE_R$ = sum of squared errors for the "reduced" model;
$SSE_C$ = sum of squared errors for the "complete" model;
(k − g) = the number of additional regressors, i.e., the number of additional parameters estimated in the complete vs. the reduced model;
n = the sample size (number of observations); and
k + 1 = the number of parameters estimated in the complete model.

The test statistic follows an F-distribution with $V_1 = k - g$ degrees of freedom for the numerator and $V_2 = n - (k + 1)$ degrees of freedom for the denominator. Should $F > F_\alpha$, we must reject the null hypothesis.

[Figure: F probability density with the rejection region above $F_\alpha$.]

Application. Suppose that we are testing whether a full-blown 2nd-order interactive model outperforms a subset (partial 2nd-order) model. Consider the following specifications:

$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \quad \text{(partial 2nd-order model: the reduced model)}$$
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2 \quad \text{(full 2nd-order model: the complete model)}$$

where the regression results are used to test $H_0: \beta_4 = \beta_5 = 0$ vs. $H_a$: at least one of the parameters $\beta_4, \beta_5$ is non-zero, at the 95% confidence level ($\alpha = .05$). Suppose that the following information is obtained upon OLS estimation of the respective models with a sample size of n = 20 observations:

$SSE_R$ = 6.6333 (for g = 3 regressors)
$SSE_C$ = 2.7447 (with 2 additional regressors, so k = g + 2 = 5)

The F-statistic is

$$F = \frac{(6.6333 - 2.7447)/2}{2.7447/(20 - 6)} \approx 9.92.$$

Since $F > F_\alpha$, where $F_\alpha = 3.74$ (for $V_1 = k - g = 2$ and $V_2 = n - (k + 1) = 14$ d.f.), we reject the null hypothesis $H_0$ in favor of $H_a$.
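The model comparison (partial F) test reduces to a one-line formula once both models have been fitted. A minimal sketch, with the hypothetical function name `nested_f`, reproducing the second-order example above; the critical value 3.74 is copied from the notes rather than computed.

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """Partial F-test of a reduced model (g regressors) against a complete
    model (k regressors, k > g), both with an intercept and the same n obs.

    Returns (F, v1, v2) with v1 = k - g and v2 = n - (k + 1) d.f.
    """
    if sse_complete > sse_reduced:
        raise ValueError("the complete model cannot have a larger SSE")
    v1, v2 = k - g, n - (k + 1)
    f = ((sse_reduced - sse_complete) / v1) / (sse_complete / v2)
    return f, v1, v2


# Second-order example from the notes: n = 20, g = 3, k = 5.
f, v1, v2 = nested_f(6.6333, 2.7447, n=20, k=5, g=3)
F_CRIT = 3.74  # F(2, 14) at alpha = .05, as quoted in the notes
print(f"F = {f:.2f} on ({v1}, {v2}) d.f.; reject H0: {f > F_CRIT}")
```

The same function applies to any pair of nested OLS models fitted to the same observations, e.g., `nested_f(0.652778, 0.019231, n=8, k=3, g=1)` for the school-district dummy example later in these notes.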
Conclusion: there is statistical evidence that the inclusion of the two regressors yields significant additional explanatory power in the model (at 95% confidence).

IV. The Use of "Dummy Variables" and Extensions

Dummy variables are used for categorical or "qualitative" data (attributes measured at the nominal scale). They:
- allow us to import qualitative information directly into a regression model; and
- offer a compact way to test for the possibility of multiple models or SRFs (i.e., more than one PRF) using a single equation, as opposed to multiple equations.

Dummy variables are one or more independent variables that take on a binary form {1, 0} to denote the presence or absence of some attribute(s) in the sample observations.

Geographic applications (for spatial analysis): dummy variables (D) may be used to explore possible regional or locational differences in the relationships among variables.

[Figure: spatial distribution of observations in Regions A and B.]

$D_i = 1$ if Region A; $= 0$ if Region B (not Region A), for i = 1, ..., n observations.

One dummy variable would be included in the list of independent variables to differentiate observations from Region A versus those from Region B.

[Figure: spatial distribution of observations in Regions A, B, and C.]

$D_{1i} = 1$ if Region A; $= 0$ if not Region A
$D_{2i} = 1$ if Region B; $= 0$ if not Region B, for i = 1, ..., n observations.

If $D_{1i} = 0$ and $D_{2i} = 0$, then the i-th observation is from Region C, the default case. Two dummy variables would be included in the list of independent variables to differentiate observations from Regions A, B, and C. Note: it does not matter which region is assigned as the default case.

Suppose we had additional qualitative information (attribute I) for the study region: highland vs. lowland area.

$D_{1i} = 1$ if Region A; $= 0$ if not Region A
$D_{2i} = 1$ if Region B; $= 0$ if not Region B
$D_{3i} = 1$ if highland area; $= 0$ if lowland area (not highland), for i = 1, ..., n observations.
Three dummy variables would be included in the list of independent variables to differentiate observations from Regions A, B, and C and attribute I.

[Figure: Regions A, B, and C, with the highland area shown as a red cross-hatched area.]

Individual observations can now be completely differentiated by using three dummies to denote all possible locational attributes. In our example, there are V = 5 groups, based on multiple attributes:

Region A lowland: D1 = 1, D3 = 0 (D2 = 0)
Region A highland: D1 = 1, D3 = 1 (D2 = 0)
Region B lowland: D2 = 1, D3 = 0 (D1 = 0)
Region B highland: D2 = 1, D3 = 1 (D1 = 0)
Region C lowland: D1 = 0, D2 = 0, D3 = 0
Region C highland: not applicable in this case

Suppose we had yet another qualitative variable (attribute II) identifying a "protected area"; we can add a fourth dummy:

$D_{1i} = 1$ if Region A; $= 0$ if not Region A
$D_{2i} = 1$ if Region B; $= 0$ if not Region B
$D_{3i} = 1$ if highland area; $= 0$ if lowland area (not highland)
$D_{4i} = 1$ if in protected area; $= 0$ if not in protected area, for i = 1, ..., n observations.

[Figure: Regions A, B, and C, with the protected area delineated in green.]

In our example, there are now V = 10 groups, based on multiple (locational) attributes:

Region A lowland, protected: D1 = 1, D3 = 0, D4 = 1
Region A lowland, non-protected: D1 = 1, D3 = 0, D4 = 0
Region A highland, protected: D1 = 1, D3 = 1, D4 = 1
Region A highland, non-protected: D1 = 1, D3 = 1, D4 = 0
Region B lowland, protected: D2 = 1, D3 = 0, D4 = 1
Region B lowland, non-protected: D2 = 1, D3 = 0, D4 = 0
Region B highland, protected: D2 = 1, D3 = 1, D4 = 1
Region B highland, non-protected: D2 = 1, D3 = 1, D4 = 0
Region C lowland, protected: D1 = 0, D2 = 0, D3 = 0, D4 = 1
Region C lowland, non-protected: D1 = 0, D2 = 0, D3 = 0, D4 = 0
Region C highland (protected or non-protected): not applicable

Rule of thumb: for each categorical attribute, the number of dummy variables should always be one less than the number of categories (sub-samples or groupings) identified for that attribute; here, three regions require two region dummies, and each two-way attribute (highland/lowland, protected/non-protected) requires one dummy.
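The g − 1 rule can be made concrete with a short coding helper. This is a minimal sketch: `make_dummies` is a hypothetical function and the region labels are invented observations (pandas offers `get_dummies` with a `drop_first` option for the same purpose). The last lines show why a dummy is not created for the default case: with all g dummies, every row sums to 1 and duplicates the intercept column.

```python
def make_dummies(categories, default):
    """Code one categorical attribute as g - 1 binary {1, 0} dummy columns.

    `default` is the reference category: its observations get 0 on every
    dummy, so an intercept plus these columns is not perfectly collinear.
    Returns (dummy names, one row of dummy values per observation).
    """
    names = sorted(set(categories) - {default})
    rows = [[1 if c == name else 0 for name in names] for c in categories]
    return names, rows


# Hypothetical observations from Regions A, B and C; Region C is the default.
regions = ["A", "B", "C", "A", "C", "B"]
names, rows = make_dummies(regions, default="C")
print("dummies:", names)
for region, row in zip(regions, rows):
    print(region, row)

# Coding all g regions (no default) makes every row sum to 1, which is
# exactly the intercept column: the perfectly collinear "dummy trap".
_, all_rows = make_dummies(regions, default=None)
print("row sums with all g dummies:", [sum(r) for r in all_rows])
```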
For 2 regions (Regions A and B): 1 dummy variable (D)
if Region A, then D = 1
if Region B (not Region A), then D = 0

For 3 regions (Regions A, B, and C): 2 dummy variables (D1, D2)
if Region A, then D1 = 1; D2 = 0
if Region B, then D1 = 0; D2 = 1
default (not A or B: Region C), then D1 = 0; D2 = 0

There will always be a default case. If there are g sub-samples or groupings, there will be g − 1 dummies, and the g-th (default) case is represented when all dummies are equal to 0.

How many dummies (locational attributes)? Depending on where the sample observations come from, it could get complicated.

[Figure: study region showing highlands, two political jurisdictions (A and B), a park and park buffer zone, a floodplain, and a market with its accessible area/threshold distance (observed).]

One could also employ "interactive/interacting dummies" (IDs) or "multiple-attribute dummies" (MADs) to cover all scenarios:

D1 = 1 if highlands region; = 0 if not highlands region
D2 = 1 if jurisdiction A; = 0 if not A (i.e., jurisdiction B)

An interactive or interacting dummy may be specified: (D1D2) = 1 if highlands and jurisdiction A (note that highlands B cannot be directly identified by it).

A multiple-attribute dummy may also be utilized: D3 = 1 if highland region and jurisdiction B; = 0 otherwise.

N.B. Great caution must be used so as not to create perfectly collinear "sets" of dummy variables. For example, D1, D1D2, and D3 together are perfectly collinear, since D1 = D1D2 + D3 for every observation.

Differentiating between locations or some other attribute allows us to test for the possibility of multiple SRFs (PRFs) and for dummy-related "interactions" between independent variables or regressors. Hence, dummy variables are introduced into the regression model in both their "raw" and "interactive" forms, that is, as stand-alone regressors and as variables that interact with the other quantitative variables in the model. In other words, dummy variables may be used as...
- stand-alone variables, to assess "intercept-shifting" qualities; and
- interactive variables, to assess "slope-shifting" qualities.

In short, there is a three-fold purpose for incorporating dummy variables:
1. To test the assumption of "constancy" of the Y-intercept term among the various categorical groupings;
2. To test the assumption of "constancy" of the various slope coefficients based on categorical groupings; and
3. To test (via 1 and 2) the assumption of a single sample regression function (SRF), and implicitly the assumption of a single PRF, using a single equation, thus offering both convenience and expedience in the model estimation/evaluation process (ease of comparison).

Note: the use of dummy variables is very compatible with stepwise regression procedures.

[Figure: left, a single "aggregate" SRF (single PRF implied), Model (0): $Y = \beta_0 + \beta_1 X + \varepsilon$; right, the possibility of multiple SRFs (multiple PRFs implied), with separate SRFs for Region A and Region B alongside the assumed aggregate SRF.]

We could run separate equations for each region (A, B):

(1) $Y = \beta_0^{(A)} + \beta_1^{(A)} X + \varepsilon^{(A)}$, for i = 1, ..., n(A) observations
(2) $Y = \beta_0^{(B)} + \beta_1^{(B)} X + \varepsilon^{(B)}$, for i = 1, ..., n(B) observations

where n(A) + n(B) = n, allowing us to compare statistical results to see if there are obvious differences in the estimated coefficients of the models.

Burning question: if differences are found, are the differences "statistically significant"? This question is further complicated by the fact that many dummy variables may be introduced, thereby yielding many different equations to estimate individually (simply not convenient).

In our simple bivariate example, there are 4 possible outcomes:

I. $\beta_0^{(A)} = \beta_0^{(B)} = \beta_0$ and $\beta_1^{(A)} = \beta_1^{(B)} = \beta_1$: the two regression models (1) and (2) are identical (that is, the estimated coefficients are not significantly different from one another); the "aggregate model" (0) is the best model, and there is evidence of a single SRF (PRF). [Figure: one SRF with intercept $\beta_0$ and slope $\beta_1$.]

II.
$\beta_0^{(A)} \neq \beta_0^{(B)}$ and $\beta_1^{(A)} = \beta_1^{(B)} = \beta_1$: the two regression models (1) and (2) are identical in terms of their slopes but not in terms of their Y-intercepts, and there is evidence of multiple SRFs (PRFs). [Figure: two parallel SRFs with different intercepts.]

III. $\beta_0^{(A)} = \beta_0^{(B)} = \beta_0$ and $\beta_1^{(A)} \neq \beta_1^{(B)}$: the two regression models (1) and (2) are identical in terms of their Y-intercepts but not in terms of their slopes, and there is evidence of multiple SRFs (PRFs). [Figure: two SRFs with a common intercept and different slopes.]

IV. $\beta_0^{(A)} \neq \beta_0^{(B)}$ and $\beta_1^{(A)} \neq \beta_1^{(B)}$: the two regression models (1) and (2) differ in both their Y-intercepts and their slopes, and there is evidence of multiple SRFs (PRFs). [Figure: two SRFs with different intercepts and slopes.]

For our bivariate example, the dummy variable procedure involves running a single equation/regression model using one dummy (where D = 1 if Region A and D = 0 if Region B):

$$Y_i = \alpha_0 + \alpha_1 D_i + \beta_1 X_i + \beta_2 (D_i X_i) + \varepsilon_i, \qquad i = 1, \dots, n,$$

where $\alpha_0$ and $\alpha_1$ are intercept coefficients and $\beta_1$ and $\beta_2$ are slope coefficients. The "expected values" of Y are as follows:

$$E(Y_i \mid D_i = 0, X_i) = \alpha_0 + \beta_1 X_i,$$
$$E(Y_i \mid D_i = 1, X_i) = (\alpha_0 + \alpha_1) + (\beta_1 + \beta_2) X_i.$$

If $\alpha_1$ and $\beta_2$ are both found to be not significantly different from zero, then one may use the original aggregate model (implying that there is only one SRF and one underlying PRF, and that the dummy offers no additional explanatory power to the model). If either or both of $\alpha_1$ and $\beta_2$ are found to be significantly different from zero, then there is statistical evidence of multiple SRFs (and more than one PRF).

Interacting dummies with other variables

Adding one dummy variable (D) to a full-blown 2nd-order regression model with three regressors expands the number of estimated parameters from k* = 10 to k* = 20:

$$\begin{aligned} E(Y) = {} & \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1^2 + \beta_5 X_2^2 + \beta_6 X_3^2 + \beta_7 X_1 X_2 + \beta_8 X_1 X_3 + \beta_9 X_2 X_3 \\ & + \beta_{10} D + \beta_{11} D X_1 + \beta_{12} D X_2 + \beta_{13} D X_3 + \beta_{14} D X_1^2 + \beta_{15} D X_2^2 \\ & + \beta_{16} D X_3^2 + \beta_{17} D X_1 X_2 + \beta_{18} D X_1 X_3 + \beta_{19} D X_2 X_3 \end{aligned}$$

Note: interacting dummies (IDs, e.g., D1D2) and multiple-attribute dummies may also be brought into a model and specified in their raw form and/or as "interaction variables" (like the dummy variable above).

Recall our model GPA = f(per capita household disposable income), where the SRF is $\hat{Y}_i = 1.375 + .120 X_{1i}$, with an adjusted R-square of .746. Suppose that an additional (qualitative) dummy variable is brought into the model to differentiate observations (students) on the basis of geographic location, as students in the sample were known to have come from 2 distinct school districts while in high school.

Reconsider our data and results (actual vs. predicted values of Y). Suppose that we uncover a potential regularity in the error term with respect to location:

i   Areal unit   Y     Y(pred)   Error    Sign
1   C            4.0   3.9028    .0972    +
2   D            3.0   3.1806    -.1806   -
3   H            3.5   3.1806    .3194    +
4   A            2.0   2.4583    -.4583   -
5   F            3.0   2.8194    .1806    +
6   E            3.5   3.5417    -.0417   -
7   G            2.5   2.0972    .4028    +
8   B            2.5   2.8194    -.3194   -

[Map: study area from which the sample of n = 8 students was drawn (one randomly from each cell, A through H), divided into school districts I and II; the sign of each error term is shown, with "-" marking over-predicted Y and "+" marking under-predicted Y.]

Let us now specify a dummy variable (D) to denote school district (keying in on district II): $D_i = 1$ if the i-th observation/student came from school district II; $= 0$ if the i-th observation/student came from school district I. Rewriting our regression model in its complete form:
$$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_{1i} + \beta_3 (D_i X_{1i}) + \varepsilon_i,$$

where
if $D_i = 0$: $E(Y) = \beta_0 + \beta_2 X_{1i}$ (district I, the default case);
if $D_i = 1$: $E(Y) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X_{1i}$ (district II, the targeted case).

Estimation of this expanded model yields the following statistical results (summarized from the OLS output)*:

Coefficient   Estimate   t-stat   (prob)   Significant at 95%?
β0            .500       3.478    (.025)   yes
β1            1.365      7.968    (.001)   yes
β2            .166       16.125   (.000)   yes
β3            -.0641     -5.270   (.006)   yes

*All estimated coefficients were found to be significantly different from zero at the 95% confidence level. The adjusted R-square = .9887 (an increase of .242); s.e.e. ($\hat\sigma$) = .069; and F-ratio = 206.6 (p = .0000).

Prediction of GPA for X = 14.0 is as follows:
if school district I ($D_i = 0$): $\hat{Y} = \hat\beta_0 + \hat\beta_2 X = .500 + .166(14.0) \approx 2.82$ (original estimate: 3.06**)
if school district II ($D_i = 1$): $\hat{Y} = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 X + \hat\beta_3 X = .500 + 1.365 + .166(14.0) - .0641(14.0) \approx 3.29$ (original estimate: 3.06**)
**$\hat{Y} = 1.375 + .120(14) = 3.06$ (from the original model).

Note that the results suggest different "effects" of X on Y depending upon location.
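Recovering the district-specific SRFs from the single dummy-interaction equation is simple arithmetic on the estimated coefficients. A minimal sketch, with hypothetical function names, using the estimates reported above; it reproduces the intercepts (.5000, 1.8650), slopes (.1660, .1019), and predictions of about 2.82 and 3.29 at X = 14.

```python
# Estimated coefficients reported in the notes for
# Y = b0 + b1*D + b2*X + b3*(D*X), with D = 1 for school district II.
b0, b1, b2, b3 = 0.500, 1.365, 0.166, -0.0641


def district_line(d):
    """Return (intercept, slope) of the SRF implied for dummy value d."""
    return b0 + b1 * d, b2 + b3 * d


def predict_gpa(x, d):
    """Predicted GPA at income x for district dummy d (0 = I, 1 = II)."""
    intercept, slope = district_line(d)
    return intercept + slope * x


for d, label in [(0, "district I"), (1, "district II")]:
    intercept, slope = district_line(d)
    print(f"{label}: intercept {intercept:.4f}, slope {slope:.4f}, "
          f"GPA at X = 14: {predict_gpa(14.0, d):.2f}")
```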
Estimated coefficient                   School District I   School District II
Y-intercept term                        .5000               1.8650
Slope coefficient (effect of X on Y)    .1660               .1019

For D = 1 (district II), the estimated coefficients must be recovered: $\hat\beta_0 + \hat\beta_1 = .5000 + 1.365 = 1.865$ and $\hat\beta_2 + \hat\beta_3 = .1660 - .0641 = .1019$.

Using the F-statistic, we may now test $H_0: \beta_1 = \beta_3 = 0$:

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(0.652778 - 0.019231)/2}{0.019231/(8 - 4)} \approx 65.89.$$

Since $F > F_\alpha$, where $F_\alpha = 6.94$ (for $V_1 = k - g = 2$ and $V_2 = n - (k + 1) = 4$ d.f.), we reject the null hypothesis $H_0$ in favor of $H_a$. Conclusion: there is statistical evidence that the inclusion of the two additional regressors yields significant explanatory power in the model (at 95% confidence), above and beyond that attributable to chance.

Suppose that additional information is obtained regarding the accessibility of the schools; in particular, whether a student was "bused" or lived "within walking distance" (say, within 1.25 miles of the school).

[Map: school districts I and II with areal units A through H; an arrow/radius marks the walking-distance zone around the school.]

We may now specify a second dummy variable, D2, where D2 = 1 if the student was within walking distance and = 0 if not. In our example, only two students, both from district II, fall into this category. The new regression model takes the form

$$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 X_{1i} + \beta_4 (D_{1i} X_{1i}) + \beta_5 (D_{2i} X_{1i}) + \varepsilon_i,$$

where Y = student's GPA; X = per capita household disposable income; D1 = 1 if school district II (= 0 if not school district II); D2 = 1 if within walking distance (= 0 if not).
Estimation of this new expanded model using a stepwise regression procedure yields the following statistical results*:

Coefficient   Estimate   t-stat    (prob)   Significant at 95%?
β0            .5000      8.098     (.003)   yes
β1            1.2819     16.851    (.000)   yes
β3            .1666      37.550    (.000)   yes
β4            -.06205    -11.833   (.001)   yes
β5            .01063     4.323     (.022)   yes
β2            not significant (excluded by the stepwise procedure)

*All retained estimated coefficients were found to be significantly different from zero at 95% confidence. The adjusted R-square = .9979 (an increase of .0092); s.e.e. ($\hat\sigma$) = .030 (approx.); and F-ratio = 845.25 (p = .0000).

After the stepwise procedure, only one additional regressor was added to the model: the (D2X) interaction term, with a positive coefficient estimate and t > t(critical). Using the F-test, we can see whether any real additional explanation is provided by this interaction term, where $H_0: \beta_5 = 0$:

$$F = \frac{(0.019231 - 0.00266)/1}{0.00266/(8 - 5)} \approx 18.69 > F_{.05}(1, 3\ \text{d.f.}) = 10.13.$$

There is ample statistical evidence to support the contention that the additional regressor improved the overall performance of the model, as we reject the null hypothesis at the 95% confidence level. Moreover, the error terms are not significantly different from "normal" (based on the results of D'Agostino's tests), and the spatial distribution of the errors exhibits no definitive pattern.

[Map: signs of the error terms by areal unit across school districts I and II after re-estimation; several errors, marked (0), are extremely close to zero.]

Smoothing Spatial Data: Some Alternative Techniques

Consider the relationship between some dependent variable Y (depth, in ft.) and X (increasing distance from a point of reference) in a bathymetry model.

[Figure: depth (Y) against distance (X), showing a linear SRF and a smoothed data trend; the SRF does not pick up on nuances in the relationship (i.e., micro or localized trends).]

Several options exist.
Goal of smoothing methods: to reduce some of the variability while capturing the underlying global and local trends, and/or to model values across locations in order to obtain approximations or predictions where no data points are available. Three easy methods to consider:

1. Moving averages: a simple first-approximation technique, based on a window sliding along the spatial data series (equal weighting of points). The use of a moving average is common in time-series modeling.
2. Inverse-distance methods (inverse distance weighting): an extension of the moving-average technique that incorporates unequal weighting of points, with weights declining with increasing distance from the window's center in accordance with some distance-decay function, such as (1/d) or (1/d²), or negative exponential or inverse power functions.
3. Splines: polynomials (typically quadratic, cubic, or quintic) fitted to the data by running a polynomial regression within each window or sub-sample and later piecing the models together to produce a smooth curve or surface. The regressions are constrained to match where the segments meet (at "knot points"); this is also known as spline regression. (Note: one could vary the window size. In general, larger (smaller) window segments provide greater (lesser) smoothing.) Based on the work of Chambers, 1977; Montgomery and Peck, 1982.

[Figure: spline regression showing knot points, windows, and an interpolated value for 3-knot-point and 4-knot-point models; a splined SRF with 4 knot points compared with a linear SRF.]

4. A fourth option is "lowess regression," an acronym for locally weighted scatter-plot smoothing (which dates back to the work of Cleveland, 1979). Lowess regression (LR) involves the use of weights, iteratively estimated and assigned to sample observations based on the performance of "prediction residuals" from polynomial regressions run within a given window size, in an attempt to best fit the data. Under lowess regression:
Initial estimates of the weights are obtained by setting them inversely proportional to distance using a pre-determined function (typically cubic or exponential). The weights are then used to compute a "weighted linear regression"* for points within the window, allowing for the recovery of prediction residuals, defined as the difference between the actual values and those obtained from the regression model at the i-th iteration. The weights are then adjusted at each iteration, giving more weight to observations with low residuals in the previous pass and less weight to "outliers". (*We will talk more about weighted regression later.)

The process is repeated until the weights no longer change appreciably, i.e., when the model parameters converge and a best fit for the pre-defined smoothing parameter is found. By its very design, the lowess smoothing technique is much less susceptible to outliers than other techniques, and one can change the smoothing parameter to best fit the trend in the dependent variable. Issue: we assign the smoothing parameter and choose the window size; is this a little bit of data mining?

[Figure: crude birth rate against per capita GNP (1990), comparing an LS regression line with a lowess curve; lowess regression produces a smoothed version of the relationship, in this case providing a much better fit than the LS model.]

What is the right amount of smoothing? And what is the correct window size (is it representative of the scale)? A simple method for gauging the amount of smoothing is to examine the residuals (error terms) from the smoothing procedure. Note: if the span (proportion of points) of the smoothing window is too large, then the residuals will still retain the observed linear trend in the sample data. That is, if the spatial pattern or dependence found in the sample data is replicated by the error terms, the window is too large.
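The first, second, and fourth smoothers described above can be sketched in plain Python for a one-dimensional data series. These are minimal sketches with hypothetical function names, not the implementations behind the figures. The `lowess` function follows Cleveland's common formulation (tricube distance weights within the span of the frac × n nearest points, then bisquare robustness weights based on residuals), but it runs a fixed number of robustness passes instead of iterating until the weights converge as described in the notes; statsmodels provides a production version (`statsmodels.nonparametric.lowess`).

```python
import math


def moving_average(ys, half_window):
    """Centred moving average with equal weights (window shrinks at the ends)."""
    out = []
    for i in range(len(ys)):
        lo, hi = max(0, i - half_window), min(len(ys), i + half_window + 1)
        out.append(sum(ys[lo:hi]) / (hi - lo))
    return out


def idw(xs, ys, x0, power=2.0):
    """Inverse-distance-weighted estimate at x0 with weights 1 / d**power."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0)
        if d == 0.0:
            return y  # x0 coincides with a data point
        w = 1.0 / d ** power
        num += w * y
        den += w
    return num / den


def lowess(xs, ys, frac=0.5, iterations=3):
    """Locally weighted linear smoothing in the style of Cleveland (1979)."""
    n = len(xs)
    r = min(n, max(2, math.ceil(frac * n)))  # number of points in each span
    robust = [1.0] * n
    fitted = [0.0] * n
    scale = max(abs(y) for y in ys) or 1.0
    for step in range(iterations + 1):
        for i, x0 in enumerate(xs):
            h = sorted(abs(x - x0) for x in xs)[r - 1] or 1e-12  # local bandwidth
            w = [robust[j] * max(0.0, 1 - (abs(xs[j] - x0) / h) ** 3) ** 3
                 for j in range(n)]  # tricube distance weight x robustness weight
            sw = sum(w)
            if sw <= 0.0:
                fitted[i] = ys[i]  # no usable neighbours: keep the observation
                continue
            mx = sum(a * b for a, b in zip(w, xs)) / sw
            my = sum(a * b for a, b in zip(w, ys)) / sw
            sxx = sum(a * (b - mx) ** 2 for a, b in zip(w, xs))
            sxy = sum(a * (b - mx) * (c - my) for a, b, c in zip(w, xs, ys))
            slope = sxy / sxx if sxx > 0 else 0.0
            fitted[i] = my + slope * (x0 - mx)  # weighted linear fit evaluated at x0
        if step == iterations:
            break
        resid = [y - f for y, f in zip(ys, fitted)]
        s = sorted(abs(e) for e in resid)[n // 2]  # (upper) median |residual|
        if s <= 1e-10 * scale:
            break  # fit is already essentially exact
        # Bisquare robustness weights: large residuals (outliers) get weight 0.
        robust = [max(0.0, 1 - (e / (6 * s)) ** 2) ** 2 for e in resid]
    return fitted
```

The `frac` argument plays the role of the span discussed above: a larger value smooths more, and the residual check described in the notes (whether the residuals still carry the trend) is a natural way to choose it.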
One could start with an arbitrarily small window and increase its size until the dependence and/or the observed trend in the errors disappears (a sensitivity analysis based on the work of Trexler and Travis, 1993).

V. "Trend-Surface" (TS) Models and Kriging

TS modeling is an alternative to "kriging": a regression-based technique for modeling (best fitting) trends in spatial data and for predicting or interpolating values at locations where no sample data/observations exist.

Kriging is a popular geostatistical method for the interpolation of spatial data, generally involving the use of regionalized variables (including random and deterministic variables at various distances from a point of reference) to generate interpolated values from sample points.

Note: there are numerous kriging algorithms from which to choose, and the parameter choices and functional possibilities for modeling variability are many. The old adage of "garbage in, garbage out" certainly applies here! Kriging can be somewhat of a "crap shoot" unless great care is taken in understanding the modeling process, the limitations of each kriging method, and the limitations imposed by the data with regard to sample size, distribution of values, density, constant vs. non-constant variance, the presence and effect of outliers, and the nature and/or scale of the spatial process. In kriging, spatial interpolations are made possible by analyzing the sample data within a predetermined "search radius" (kernel) or chosen range, where weights are assigned to reflect spatial variability and the effect of surrounding data points. This is done under the limiting assumptions of whichever kriging technique is deemed optimal or "appropriate" based on (a) the attributes of the sample data and (b) the underlying research objective(s).
In defense of kriging, one could apply numerous kriging techniques and evaluate the interpolated values to see whether the various techniques produce "similar" estimates. Regression offers a much simpler alternative, provided that there is an adequate sample size and a fairly systematic spatial trend in a geo-referenced dependent variable ($Z_i$), one that can be modeled using location coordinates, for i = 1, ..., n sample observations (where n is sufficiently large and the LS assumptions are met). It is a simple way to model and generate interpolated values.

Trend-surface (TS) modeling is a method of explaining intra-regional variations and patterns in geo-coded data, based on fitting a surface of predicted values that best fits the observed surface and the prevailing spatial trends. It is a regression-based approach that can easily be applied without making "non-defensible" decisions on the choice of parameter inputs and weights, producing interpolated values based on observed spatial trends and the estimated coefficients of a best-fit model. It involves the prediction and modeling of a trend variable, a dependent variable ($Z_i$), from a series of sample observations (i = 1, ..., n) collected for n data points or "locations," represented in terms of $\{x_i, y_i\}$ coordinates. TS analysis typically relies on the use of polynomials and/or interaction variables.

[Figure: surface map of temperature spikes for a systematic point sample, n = 16.]

Once a "best fit" model is identified, interpolated values may be generated for locations where Z is unknown (only for locations within the confines of the study region). TS modeling can be used to explain, predict, or replicate both "localized" and "regionalized" trends in Z, relying on a regression platform that contains a polynomial expansion of {x, y} values, other explanatory variables (regressors), or dummy variables.

A trend surface showing a rise in Z with increasing values of y is relatively easy to model...
even with multiple trends Z y x A fairly complex trend surface based on a series of sample observations... much more difficult to model as there are multiple directional trends. Z y x Consider the following simple 2nd-order trend-surface model along a transect (with interaction terms): Zi = 0 + 1 xi + 2 yi + 3 (xy) + 4 xi2 + 5 yi 2 + i Such a quadratic model might be useful to fit a trend surface where the dominant trend, based on n sample observations, may look something the quadratic-like trends shown below. Z trend trend Z y x x y Theoretical polynomial trend surfaces (of order q=1,...5) From point data we can use regression to identify the SSRF Sample Surface Regression Function once an appropriate function form is identified. First-order polynomial (p=1) -- linear trend Second-order polynomial (p=2) -- quadratic trend Third-order polynomial (p=3) -- cubic trend Any trend-surface model of order q will allow one to model or "fit" a three-dimensional surface with (q-1)2 potential slope shifts. The appropriate order (specification) may be initially found or identified by visual inspection.. rotation of surface.. and/or by inspection of the average trends that occur along various transects or major axes. For example, the trend-surface below shows a possible 3rd-order polynomial (max) along the y-axis. Z x y A best-order model (q*) may be obtained by performing a regression-based sensitivity analysis, accompanied by a thorough assessment of the assumptions regarding error terms and structure, as well as careful attention to the various measures of model performance: adj. R-square, F, PRESS, parameter estimates and associated t-test, and the standard error of the estimate (RMSE). Hold-out samples can be used to test the model's ability to predict values at various locations across the study region. R.J. Johnston and others have warned about the distinct possibility of multicollinear regressors... 
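The 2nd-order model above can be estimated by ordinary least squares. The sketch below is a minimal, dependency-free Python illustration (normal equations solved by Gaussian elimination); the synthetic 4 x 4 grid and its "true" coefficients are made up for the example, and treating the sample's bounding box as the study region is a simplifying assumption. It returns the estimated betas and the adjusted R-square, and refuses to interpolate outside the study region. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quad_terms(x, y):
    """Regressors of the 2nd-order model, in the order beta_0 ... beta_5: 1, x, y, xy, x^2, y^2."""
    return [1.0, x, y, x * y, x * x, y * y]


def fit_quadratic_surface(xs, ys, z):
    """OLS fit via the normal equations (X'X) beta = X'z; returns (beta, adjusted R-square)."""
    rows = [quad_terms(x, y) for x, y in zip(xs, ys)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(k)]
    beta = solve(xtx, xtz)
    fitted = [sum(b * t for b, t in zip(beta, r)) for r in rows]
    sse = sum((zi - fi) ** 2 for zi, fi in zip(z, fitted))
    zbar = sum(z) / len(z)
    sst = sum((zi - zbar) ** 2 for zi in z)
    n = len(z)
    adj_r2 = 1.0 - (sse / (n - k)) / (sst / (n - 1))
    return beta, adj_r2


def interpolate(beta, xs, ys, x0, y0):
    """Predicted Z at (x0, y0), allowed only inside the sample's bounding box (assumed study region)."""
    if not (min(xs) <= x0 <= max(xs) and min(ys) <= y0 <= max(ys)):
        raise ValueError("location lies outside the study region")
    return sum(b * t for b, t in zip(beta, quad_terms(x0, y0)))


# Usage: a made-up, noise-free 4 x 4 systematic sample (n = 16) built from known coefficients
true_beta = [2.0, 0.5, -0.3, 0.1, 0.2, -0.05]
xs = [float(x) for x in range(4) for _ in range(4)]
ys = [float(y) for _ in range(4) for y in range(4)]
z = [sum(b * t for b, t in zip(true_beta, quad_terms(x, y))) for x, y in zip(xs, ys)]
beta_hat, adj_r2 = fit_quadratic_surface(xs, ys, z)
print([round(b, 4) for b in beta_hat], round(adj_r2, 4))
print(interpolate(beta_hat, xs, ys, 1.5, 2.5))
```

Because the synthetic data contain no error term, OLS recovers the generating coefficients (up to rounding) and the adjusted R-square is 1; with real observations one would also inspect the residuals, t-tests, and RMSE before trusting the interpolated values.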
so great caution must be exercised when using regression analysis to model trend surfaces.

In general, a q-th order trend-surface model may be specified as

$$Z_i = \sum_{j=0}^{q} \sum_{k=0}^{q} \beta_{j,k}\, x_i^{j} y_i^{k} + \varepsilon_i, \qquad j + k \le q$$

where β_{0,0} is the constant term and the other β_{j,k}'s are the slope coefficients. Given the likelihood of multicollinear relations, the use of "stepwise" regression procedures is highly recommended to avoid inflation of variance (inefficiency) due to collinear regressors.

Extensions: One way to lessen the chance of multicollinear relationships in a TS model is to introduce m additional "interactive" variables (V_g) into the model (g = 1,..., m). They may be continuous and/or dummy variables and would interact with the {x, y} components of the q-th order model. For example, consider

$$Z_i = \sum_{j=0}^{q} \sum_{k=0}^{q} \beta_{j,k}\, x_i^{j} y_i^{k} + \sum_{g=1}^{m} \sum_{j=0}^{q} \sum_{k=0}^{q} \beta_{g,j,k}\, V_{g,i}\, x_i^{j} y_i^{k} + \varepsilon_i, \qquad j + k \le q$$

where g = 1,..., m new explanatory variables have been added to the model; this is a great specification for real estate applications (Fik et al., 2003, Journal of Real Estate Economics). Note that the number of regressors would increase very rapidly, although it would be possible to eliminate a large proportion of them using a stepwise procedure or pre-test VIF regressions.

In an attempt to explain variation in loan concentration (Y) and model its surface (top right), one might consider adding variables such as housing density, home values, and home age to the model in a way that they interact with the {x, y} coordinates... given that there appear to be readily identifiable spatial relationships.

[Figure: surface of loan concentration (Y, top right) alongside surfaces of regressors X1, X2, X3]

Why add dummy variables to a TS model? To account for highly localized or regionalized trends that are discernible from the larger global trend(s), and to account for "discontinuities" or breaks, splines (knot points), holes in the trend surface, and/or dramatic changes in slope.
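The term counts and the collinearity warning above can be checked numerically. In this minimal, standard-library Python sketch (all function names are hypothetical), `poly_exponents(q)` lists the (j, k) exponent pairs with j + k <= q, and `n_regressors(q, m)` counts the columns when each of m interactive variables V_g multiplies every x^j y^k term; it assumes the V_g-alone term (j = k = 0) is included, as the double sum implies. `vif` computes variance inflation factors, VIF_j = 1 / (1 - R_j^2), from auxiliary regressions of each regressor on the others plus an intercept, which is one way to run the pre-test VIF regressions mentioned above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly_exponents(q):
    """(j, k) pairs of the x^j * y^k terms in a q-th order surface (j + k <= q)."""
    return [(j, k) for j in range(q + 1) for k in range(q + 1 - j)]


def n_regressors(q, m=0):
    """Coefficients in the q-th order model once m interactive variables V_g are added."""
    return len(poly_exponents(q)) * (1 + m)


def r_squared(y, cols):
    """R-square from an OLS regression of y on an intercept plus the given columns."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    xty = [sum(r[u] * yi for r, yi in zip(rows, y)) for u in range(k)]
    beta = solve(xtx, xty)
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(bb * t for bb, t in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(cols):
    """Variance inflation factor of each regressor column against all the others."""
    return [1.0 / (1.0 - r_squared(c, cols[:j] + cols[j + 1:])) for j, c in enumerate(cols)]


print(poly_exponents(2))       # the six terms of the 2nd-order model
print(n_regressors(3, m=3))    # cubic surface plus 3 interactive variables
x = [float(v) for v in range(1, 11)]
print(vif([x, [v * v for v in x]]))  # raw coordinate x and its square
```

For the made-up coordinates x = 1,..., 10, the VIFs of x and x^2 both come out at roughly 20, illustrating why the polynomial terms of a trend-surface model are prone to inflated variances.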
[Figure: Trend surfaces with discontinuities (breaks) and marked changes in slope (knot points) along a transect. Axis: Z]

[Figure: Trend surface of land values, Jacksonville, Florida (1995); Z = land value, estimated price per unit area. Note the discontinuity at the St. John's River, the dramatic shifts in slope, and the hole in the surface.]

[Figure: Smooth trend surface with a localized change in slope along a transect {x or y}. Left: observed surface vs. fitted surface (global trend) without dummies. Right: observed surface vs. fitted surface (global and localized trends) with dummies.]

Dummy variables could be assigned to pick up localized slope shifts or trends... to further enhance the performance of a predictive trend-surface model.

Generally, the number of estimated coefficients k* associated with the {x, y} terms in any given polynomial model of order p (including the intercept term, as determined by a vector of 1's), estimated using OLS regression, can be defined as

$$k^* = \frac{(p + 1)(p + 2)}{2}$$

Linear (p = 1): k* = 3; Z = f(1, x, y)
Quadratic (p = 2): k* = 6; Z = f(1, x, x^2, y, y^2, xy)
Cubic (p = 3): k* = 10; Z = f(1, x, x^2, x^3, y, y^2, y^3, xy, x^2y, xy^2)
Quartic (p = 4): k* = 15; Z = f(1, x, x^2, x^3, x^4, y, y^2, y^3, y^4, xy, x^2y, xy^2, x^3y, xy^3, x^2y^2)

Adding complexity (by increasing p) will never lower R-square and will usually raise it, but is the added complexity justified? Moving from a polynomial specification of Z = f{x, y} of order p to order p' = p + 1 adds k' = p' + 1 new {x, y} coefficients (e.g., going from quadratic to cubic adds 4). The F-test described earlier can help determine whether the variables added by increasing the polynomial order actually offer real additional explanatory power.

Trend-surface analysis/regression can also be used as a diagnostic tool to test for spatial dependence in the error structure of geo-referenced data.
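The counting rules and the order-comparison test can be written out directly. In this minimal Python sketch (function names are hypothetical), `k_star` implements k* = (p + 1)(p + 2)/2, `added_terms` gives the k' = p' + 1 new coefficients gained by moving from order p to p' = p + 1, and `partial_f` computes the standard nested-model (partial) F-statistic, assumed here to be the F-test the notes refer to. The SSE values in the usage line are made-up numbers, not results from any data set.

```python
def k_star(p):
    """Number of {x, y} coefficients (intercept included) in an order-p polynomial surface."""
    return (p + 1) * (p + 2) // 2


def added_terms(p):
    """New coefficients gained by moving from order p to order p' = p + 1 (k' = p' + 1)."""
    return k_star(p + 1) - k_star(p)


def partial_f(sse_reduced, sse_full, n, p):
    """Partial F for adding the order-(p + 1) terms to an order-p trend surface.

    sse_reduced: SSE of the order-p model; sse_full: SSE of the order-(p + 1) model.
    Returns (F, (numerator df, denominator df)) = (F, (k', n - k* of the larger model)).
    """
    q = added_terms(p)
    df_full = n - k_star(p + 1)
    if df_full <= 0:
        raise ValueError("too few observations for the higher-order model")
    f = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return f, (q, df_full)


for p in range(1, 5):
    print(p, k_star(p), added_terms(p))

# Hypothetical SSEs: quadratic (p = 2) vs. cubic (p' = 3) surface, n = 30 observations
print(partial_f(sse_reduced=50.0, sse_full=30.0, n=30, p=2))
```

The resulting F would be compared with the critical value of the F distribution with (k', n - k*) degrees of freedom at the chosen significance level; a significant F suggests that the higher-order terms add real explanatory power rather than just raising R-square.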