LINEAR REGRESSION AND CORRELATION

• Consider bivariate data consisting of ordered pairs of numerical values $(x, y)$.
• Often such data arise by setting an $x$ variable at certain fixed values (which we will call levels) and taking a random sample from the population of $Y$ that is assumed to exist at each of the levels of $x$.
• Here we are thinking of $x$ as not being a random variable, because we are considering only selected fixed values of $x$ (for sampling purposes).
• However, $Y$ is a random variable, and we define $Y$ on the population that exists at each level of $x$.
• Graphically, a scatterplot of the data depicting the y-values obtained by sampling the populations at each of the preselected x-values might appear as follows:

[Figure: scatterplot of the sampled y-values (vertical axis $y$) at each preselected level of $x$ (horizontal axis $x$).]

Our objectives given such data are usually twofold:
• Summarize the characteristics of the $Y$ populations across values of $x$ (Fit the Model).
• Interpolate between levels of $x$ to estimate parameters of $Y$ populations from which samples were not taken (Prediction).

• The center of our attention is usually on the means of the $Y$ populations, $E(Y)$, and especially their relationship to one another.
• Considering various relationships among these population means is called parametric modeling.
• The simple linear regression model says that the populations at each x-value are normally distributed and that the means of these normal distributions all fall on a straight line, called the regression line.
• Chapters 11 and 12 are mostly about investigating to what extent the relationship among the population means is linear.
• Let us begin by considering a linear relationship among population means.
• The equation of a straight line through the means $E(Y)$ across x-values can be written as
    $E(Y) = \beta_0 + \beta_1 x$
• Here $\beta_0$ is the intercept and $\beta_1$ is the slope of the line.

[Figure: the line of means, with $E(Y)$ on the vertical axis and $x$ on the horizontal axis.]
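As a small illustration of the line of means, the sketch below evaluates $E(Y) = \beta_0 + \beta_1 x$ at a few levels of $x$ and at one level in between. The parameter values, the levels, and the function name `mean_of_y` are hypothetical and chosen only for the example; they do not come from the notes.

```python
# Hypothetical parameter values, chosen only for illustration.
BETA_0 = 2.0   # intercept: mean of Y when x = 0
BETA_1 = 0.5   # slope: change in the mean of Y per unit increase in x


def mean_of_y(x, beta_0=BETA_0, beta_1=BETA_1):
    """Return E(Y) = beta_0 + beta_1 * x, the population mean at level x."""
    return beta_0 + beta_1 * x


# Hypothetical levels of x at which samples would be taken.
levels = [1.0, 2.0, 3.0, 4.0]
means_at_levels = [mean_of_y(x) for x in levels]

# Interpolation: the mean of a Y population at a level that was not sampled.
mean_at_unsampled_level = mean_of_y(2.5)

for x, m in zip(levels, means_at_levels):
    print(f"x = {x}: E(Y) = {m}")
print(f"x = 2.5 (not sampled): E(Y) = {mean_at_unsampled_level}")
```

With known parameters, interpolation is just evaluating the line at a new $x$. In practice $\beta_0$ and $\beta_1$ are unknown and have to be estimated from the samples.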
• The y-values observed at each x-value are assumed to be a random sample from a normal distribution with mean $E(Y) = \beta_0 + \beta_1 x$; i.e., the mean is a linear function of $x$.
• The variance of the normal distributions at each x-value is assumed to be the same (constant).
• Thus the y-values can be related to the x-values through the relationship
    $y = \beta_0 + \beta_1 x + \epsilon$    (1)
• Here $\epsilon$ is a random variable (called random error) with mean zero, i.e., $E(\epsilon) = 0$, and variance $\sigma^2$.
• This model says that sample values are random distances from the line $\mu = \beta_0 + \beta_1 x$ at each x-value.
• The unknown constants in Equation (1), $\beta_0$, $\beta_1$, and $\sigma^2$, are called the parameters of the model.
• The next question we consider is “How do we proceed to derive a good approximating line through the $Y$ population means, given only samples from some of the $Y$ populations?”
• In other words, we need to obtain good estimates of the parameters of the model using the observed data....
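The model in Equation (1) can be explored by simulation. The sketch below draws y-values at fixed levels of $x$ with normal errors of constant standard deviation, then checks that the distances from the true line average close to zero. The preview ends before any estimation method is given, so the `least_squares` function is an assumption: ordinary least squares is one standard way to estimate $\beta_0$ and $\beta_1$, not necessarily the method these notes go on to use. All parameter values, levels, and function names are hypothetical.

```python
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
BETA_0, BETA_1, SIGMA = 2.0, 0.5, 1.0


def simulate(levels, n_per_level, seed=0):
    """Draw n_per_level y-values at each fixed level x from y = beta_0 + beta_1*x + eps."""
    rng = random.Random(seed)
    xs, ys = [], []
    for x in levels:
        for _ in range(n_per_level):
            eps = rng.gauss(0.0, SIGMA)  # random error: mean 0, same sd at every level
            xs.append(x)
            ys.append(BETA_0 + BETA_1 * x + eps)
    return xs, ys


def least_squares(xs, ys):
    """Ordinary least squares estimates (an assumed method, not taken from the notes)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


xs, ys = simulate(levels=[1.0, 2.0, 3.0, 4.0, 5.0], n_per_level=2000)

# Distances of the sampled values from the true line of means.
errors = [y - (BETA_0 + BETA_1 * x) for x, y in zip(xs, ys)]
b0_hat, b1_hat = least_squares(xs, ys)

print("mean of errors:", statistics.fmean(errors))
print("sd of errors:", statistics.stdev(errors))
print("estimated intercept and slope:", b0_hat, b1_hat)
```

With many observations per level, the average error should be near 0, its standard deviation near the chosen `SIGMA`, and the estimates near the chosen `BETA_0` and `BETA_1`. With only a few observations the estimates will vary more from sample to sample.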
 Spring '10
 hao
 Correlation, Linear Regression
