Assumptions of Ordinary Least Squares Regression (Part 1)
ESM 206 Jan 17, 2008
Assumptions of OLS regression
1. Model is linear in parameters
2. The data are a random sample of the population
   The errors are statistically independent from one another
3. The expected value of the errors is always zero
4. The independent variables are not too strongly collinear
5. The independent variables are measured precisely
6. The errors have constant variance
7. The errors are normally distributed
If assumptions 1-5 are satisfied, then
   OLS estimator is unbiased
If assumption 6 is also satisfied, then
   OLS estimator has minimum variance of all linear unbiased estimators
If assumption 7 is also satisfied, then
   we can do hypothesis testing using t and F tests
How can we test these assumptions?
If assumptions are violated,
   what does this do to our conclusions?
   how do we fix the problem?
1. Model not linear in parameters
Problem: Can't fit the model!
Diagnosis: Look at the model
Solutions:
   1. Re-frame the model
   2. Use nonlinear least squares (NLS) regression
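As an illustration of re-framing, here is a minimal sketch that assumes a power-law model y = a * x^b, which is not linear in b. Taking logs gives log(y) = log(a) + b*log(x), which is linear in the parameters log(a) and b, so ordinary least squares applies. The data, the power-law form, and the helper name fit_line are hypothetical choices for this example. The log transform also implicitly assumes multiplicative errors.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical noise-free data generated from y = 2 * x**1.5 (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Re-framed model: log(y) = log(a) + b*log(x) is linear in log(a) and b
log_a, b = fit_line([math.log(xi) for xi in x], [math.log(yi) for yi in y])
a = math.exp(log_a)
print(a, b)
```

Models that cannot be re-framed this way are the case for solution 2, nonlinear least squares.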
2. Errors not independent
Problem: parameter estimates are biased
Diagnosis (1): look for correlation between residuals and another variable (not in the model)
   I.e., residuals are dominated by another variable, Z, which is not random with respect to the other independent variables
Solution (1): add the variable to the model
Diagnosis (2): look at autocorrelation function of residuals to find patterns in
   Time
   Space
   I.e., observations that are nearby in time or space have residuals that are more similar than average
Solution (2): fit model using generalized least squares (GLS)
[Figure: Consumption vs. Price; residuals.RegModel.10 vs. Year]
Autocorrelated residuals
[Figure: residual(t) vs. residual(t-1)]
Durbin-Watson test for autocorrelation
   Null hypothesis: no autocorrelation
   Only makes sense if observations are ordered in time

        Durbin-Watson test
   data: Consumption ~ Price
   DW = 0.1946, p-value = 2.699e-16
   alternative hypothesis: true autocorelation is not 0
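The DW value in the output above is a simple function of the time-ordered residuals: the sum of squared differences between successive residuals, divided by the sum of squared residuals. The sketch below computes only that statistic (not the p-value) on two hypothetical residual series. It does not use the Consumption data, and the function name durbin_watson is an assumption for this example.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), for residuals ordered in time."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series, for illustration only
smooth = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # neighbours are similar
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours flip sign

print(durbin_watson(smooth))
print(durbin_watson(alternating))
```

The statistic lies between 0 and 4. Values near 2 indicate little lag-1 autocorrelation, values near 0 indicate positive autocorrelation (as with DW = 0.1946 above), and values near 4 indicate negative autocorrelation.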
Looking at autocorrelation in Rcmdr
Durbin-Watson test:
   Fit model
   Models -> Numerical Diagnostics -> Durbin-Watson Test
Plots: add the residuals to the dataset
   Fit model
   Models -> Add Observation Statistics to Data
   Make scatterplot of residuals & Year
3. Average error not everywhere zero ("nonlinearity")
Problem: indicates that model is wrong
Diagnosis:
   For one-variable regression, look for curvature in plot of Y vs. X
[Figure: Consumption vs. Price]
3. Average error not everywhere zero ("nonlinearity")
Problem: indicates that model is wrong
Diagnosis:
   For multiple regression, look for curvature in plot of observed Y vs. predicted Y
      Add "Fitted Values" to dataset
      These are the "Predicted Y"
[Figure: Chlorophyll.a vs. fitted.Chlor.model.1]
3. Average error not everywhere zero ("nonlinearity")
Problem: indicates that model is wrong
Diagnosis:
   Look for curvature in plot of observed vs. predicted Y
   Look for curvature in plot of residuals vs. predicted Y
[Figure: residuals.Chlor.model.1 vs. fitted.Chlor.model.1]
3. Average error not everywhere zero ("nonlinearity")
Problem: indicates that model is wrong
Diagnosis:
   Look for curvature in plot of observed vs. predicted Y
   Look for curvature in plot of residuals vs. predicted Y
   Look for curvature in partial-residual plots (also component+residual plots [CR plots])
Models -> Graphs -> Component + Residual Plots
[Figure: Component+Residual plots, Component+Residual(Chlorophyll.a) vs. NP and vs. Phosphorus]
Average error not everywhere zero ("nonlinearity")
Solutions:
   If pattern is monotonic*, try transforming independent variable
      Downward curving: use powers less than one
         E.g. square root, log, inverse
      Upward curving: use powers greater than one
         E.g. square
   If not, try adding additional terms in the independent variable (e.g., quadratic)
* Monotonic: always increasing or always decreasing
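A minimal sketch of the transformation idea, assuming a hypothetical downward-curving relationship y = 3 + 2*sqrt(x). Regressing y on sqrt(x) (a power less than one) straightens the pattern. R-squared is used here only as a crude numeric summary; the diagnosis in these slides is visual. The data and the helper name r_squared are assumptions for this example.

```python
import math

def r_squared(x, y):
    """R^2 of the one-variable OLS fit y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical noise-free, downward-curving data (illustration only)
x = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
y = [3.0 + 2.0 * math.sqrt(xi) for xi in x]

r2_raw = r_squared(x, y)                               # untransformed x
r2_sqrt = r_squared([math.sqrt(xi) for xi in x], y)    # x raised to power 1/2
print(r2_raw, r2_sqrt)
```

With real data, the residual and component+residual plots should be re-checked after the transformation, since a higher R-squared alone does not show that the curvature is gone.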
4. Independent variables are collinear
Problem: parameter estimates are imprecise
Diagnosis:
   Look for correlations among independent variables
   In regression output, none of the individual terms are significant, even though the model as a whole is
Solutions:
   Live with it
   Remove statistically redundant variables
Special case: variables that are perfectly correlated
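The first diagnosis can be sketched numerically by computing the correlation between two independent variables. The example also reports the variance inflation factor, which is not mentioned in these slides and is brought in here as an assumption. For a model with exactly two predictors it equals 1 / (1 - r^2) and measures how much the variance of each slope estimate is inflated. The data and function names are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is nearly a copy of x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

r = pearson_r(x1, x2)
print(r, vif_two_predictors(r))
print(vif_two_predictors(0.95))   # e.g. a correlation of 0.95
```

In the special case of perfectly correlated variables, r^2 is 1, the division fails, and the separate slopes cannot be estimated at all.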
y = b0 + b1.x1 + b2.x2

Parameter   Est value   St dev     t student   Prob(>|t|)
b0          16.37383    41.50584   0.394495    0.696315
b1          1.986335    1.02642    1.935206    0.063504
b2          -1.22964    2.131899   -0.57678    0.568867

Residual St dev   31.6315
R2                0.534192
R2(adj)           0.499688
F                 15.48191
Prob(>F)          3.32E-05

β0 = 0; β1 = 1; β2 = 0.5; ρXZ = 0.95
5. Independent variables not precise ("measurement error")
Problem: parameter estimates are biased
Diagnosis: know how your data were collected!
Solution: very hard
   State space models
   Restricted maximum likelihood (REML)
   Use simulations to estimate bias
   Consult a professional!
6. Errors have non-constant variance ("heteroskedasticity")
Problem:
   Parameter estimates are unbiased
   P-values are unreliable
Diagnosis: plot residuals against fitted values
[Figure: Residuals vs. Predicted chlorophyll-a]
Errors have non-constant variance ("heteroskedasticity")
Problem:
   Parameter estimates are unbiased
   P-values are unreliable
Diagnosis: plot studentized residuals against fitted values
Solutions:
   Transform the dependent variable
      If residual variance increases with predicted value, try transforming with power less than one
Try square root transform
[Figure: sqrt(Chlorophyll-a) Residual vs. sqrt(Chlorophyll-a) Predicted]
Errors have non-constant variance ("heteroskedasticity")
Problem:
   Parameter estimates are unbiased
   P-values are unreliable
Diagnosis: plot studentized residuals against fitted values
Solutions:
   Transform the dependent variable
      May create nonlinearity in the model
   Fit a generalized linear model (GLM)
      For some distributions, the variance changes with the mean in predictable ways
   Fit a generalized least squares model (GLS)
      Specifies how variance depends on one or more variables
   Fit a weighted least squares regression (WLS)
      Also good when data points have differing amounts of precision
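A minimal sketch of weighted least squares for a single independent variable, assuming each observation has a known measurement standard deviation and using inverse-variance weights (a common choice, assumed here rather than taken from the slides). The data, the standard deviations, and the function name weighted_least_squares are hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimising sum(w_i * (y_i - b0 - b1*x_i)^2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data whose scatter grows with x (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.6, 10.9]
sd = [0.1, 0.2, 0.3, 0.4, 0.5]        # assumed known measurement sd
w = [1.0 / s ** 2 for s in sd]        # inverse-variance weights

wls_fit = weighted_least_squares(x, y, w)
ols_fit = weighted_least_squares(x, y, [1.0] * len(x))   # equal weights = OLS
print(wls_fit, ols_fit)
```

Precise observations (small sd) pull the fitted line more strongly than imprecise ones; with equal weights the function reduces to ordinary least squares.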