OLS_Assumptions08B
Course:
ESM 206, Spring 2008

School:
UCSB

Assumptions of Ordinary Least Squares Regression (Part 1)
ESM 206, Jan 17, 2008

Assumptions of OLS regression

1. Model is linear in parameters
2. The data are a random sample of the population
   (the errors are statistically independent of one another)
3. The expected value of the errors is always zero
4. The independent variables are not too strongly collinear
5. The independent variables are measured precisely
6. The residuals have constant variance
7. The errors are normally distributed

If assumptions 1-5 are satisfied, then the OLS estimator is unbiased.
If assumption 6 is also satisfied, then the OLS estimator has minimum variance of all unbiased estimators.
If assumption 7 is also satisfied, then we can do hypothesis testing using t and F tests.

How can we test these assumptions?
If assumptions are violated, what does this do to our conclusions?
How do we fix the problem?

1. Model not linear in parameters

Problem: Can't fit the model!
Diagnosis: Look at the model
Solutions:
  1. Re-frame the model
  2. Use nonlinear least squares (NLS) regression

2. Errors not independent

Problem: parameter estimates are biased
Diagnosis (1): look for correlation between residuals and another variable (not in the model)
  I.e., residuals are dominated by another variable, Z, which is not random with respect to the other independent variables
Diagnosis (2): look at the autocorrelation function of the residuals to find patterns in time or space
  I.e., observations that are nearby in time or space have residuals that are more similar than average
Solution (1): add the variable to the model
Solution (2): fit the model using generalized least squares (GLS)

[Figure: Consumption vs. Price, and residuals (residuals.RegModel.10) vs. Year, 1960-1990]

Autocorrelated residuals

[Figure: residual(t) vs. residual(t-1)]

Durbin-Watson test for autocorrelation
  Null hypothesis: no autocorrelation
  Only makes sense if observations are ordered in time

  Durbin-Watson test
  data: Consumption ~ Price
  DW = 0.1946, p-value = 2.699e-16
  alternative hypothesis: true autocorrelation is not 0

Looking at autocorrelation in Rcmdr

Durbin-Watson test:
  Fit model
  Models -> Numerical Diagnostics -> Durbin-Watson Test
Plots: add the residuals to the dataset
  Fit model
  Models -> Add Observation Statistics to Data
  Make scatterplot of residuals & Year

3. Average error not everywhere zero ("nonlinearity")

Problem: indicates that model is wrong
Diagnosis:
  - For one-variable regression, look for curvature in plot of Y vs. X
    [Figure: Consumption vs. Price]
  - For multiple regression, look for curvature in plot of observed Y vs. predicted Y
    Add "Fitted Values" to dataset; these are the "Predicted Y"
    [Figure: Chlorophyll.a vs. fitted.Chlor.model.1]
  - Look for curvature in plot of residuals vs. predicted Y
    [Figure: residuals.Chlor.model.1 vs. fitted.Chlor.model.1]
  - Look for curvature in partial-residual plots (also called component+residual plots [CR plots])
    Models -> Graphs -> Component + Residual Plots
    [Figure: Component+Residual(Chlorophyll.a) vs. NP and vs. Phosphorus]
Solutions:
  - If pattern is monotonic*, try transforming the independent variable
      Downward curving: use powers less than one (e.g., square root, log, inverse)
      Upward curving: use powers greater than one (e.g., square)
  - If not, try adding additional terms in the independent variable (e.g., quadratic)
  * Monotonic: always increasing or always decreasing

4. Independent variables are collinear

Problem: parameter estimates are imprecise
Diagnosis:
  - Look for correlations among independent variables
  - In regression output, none of the individual terms are significant, even though the model as a whole is
Solutions:
  - Live with it
  - Remove statistically redundant variables
Special case: variables that are perfectly correlated

Example regression output, y = b0 + b1.x1 + b2.x2

  Parameter   Est value   St dev     t student   Prob(>|t|)
  b0          16.37383    41.50584   0.394495    0.696315
  b1          1.986335    1.02642    1.935206    0.063504
  b2          -1.22964    2.131899   -0.57678    0.568867

  Residual St dev   31.6315
  R2                0.534192
  R2(adj)           0.499688
  F                 15.48191
  Prob(>F)          3.32E-05

  Settings listed on the slide (leading symbols lost): 0 = 0; 1 = 1; 2 = 0.5; XZ = 0.95

5. Independent variables not precise ("measurement error")

Problem: parameter estimates are biased
Diagnosis: know how your data were collected!
Solution: very hard
  - State space models
  - Restricted maximum likelihood (REML)
  - Use simulations to estimate bias
  - Consult a professional!

6. Errors have non-constant variance ("heteroskedasticity")

Problem:
  - Parameter estimates are unbiased
  - P-values are unreliable
Diagnosis: plot residuals (or studentized residuals) against fitted values
  [Figure: Residuals vs. Predicted chlorophyll-a]
Solutions:
  - Transform the dependent variable
      If residual variance increases with predicted value, try transforming with a power less than one
      May create nonlinearity in the model
      [Figure: Try square root transform; sqrt(Chlorophyll-a) Residual vs. sqrt(Chlorophyll-a) Predicted]
  - Fit a generalized linear model (GLM)
      For some distributions, the variance changes with the mean in predictable ways
  - Fit a generalized least squares model (GLS)
      Specifies how variance depends on one or more variables
  - Fit a weighted least squares regression (WLS)
      Also good when data points have differing amounts of precision
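
The Durbin-Watson statistic used to diagnose non-independent errors can be computed by hand from residuals ordered in time: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below is a minimal Python illustration, not the Rcmdr procedure from the lecture; the function name `durbin_watson` and both residual series are made up for illustration, and it returns only the statistic, not a p-value.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals ordered in time."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Made-up residual series for illustration only (not the course data).
drifting = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(durbin_watson(drifting), 3))     # 0.133
print(round(durbin_watson(alternating), 3))  # 3.5
```

Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation (neighboring residuals are similar, as in the drifting series), and values near 4 suggest negative autocorrelation.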
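
Weighted least squares, listed among the solutions for non-constant variance, can be sketched for a single independent variable as follows. This is an illustration under stated assumptions rather than the course's method: the function name `weighted_least_squares` is hypothetical, the data are invented, and the weights are taken to be inverse variances (1 / sd^2), which is a common choice when each observation's precision is known.

```python
def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    total = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted sums of squares and cross-products about the weighted means.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x has no variation")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Invented data: the assumed standard deviation grows with x, so later
# points are less precise and receive smaller weights.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.1, 7.9, 11.2, 13.6, 17.9]
sd = [0.5, 0.5, 1.0, 2.0, 4.0]
weights = [1.0 / s ** 2 for s in sd]

b0, b1 = weighted_least_squares(x, y, weights)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

With all weights equal, the function reduces to ordinary least squares, which gives a quick way to see how much the weighting changes the fit.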