Ch. 7 – Scatterplots, Association, and Correlation

So far, we’ve seen univariate data. This section, however, considers bivariate data and how two numerical variables are related. Methods of description are introduced here and formalized in Ch. 27.

Terminology:

  x                       y
  Explanatory variable    Response variable
  Independent variable    Dependent variable
  Predictor variable      Predicted variable

Notation:
- bivariate sample of size n: { (x_1, y_1), (x_2, y_2), …, (x_n, y_n) }
- sample means: x̄, ȳ
- sample std dev.: s_x, s_y

Displaying relationships:

Def’n: An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

A scatterplot is a graphical display of two quantitative variables.
- x-variable goes on the x-axis, y-variable on the y-axis
- origin (0,0) may be included

Look for:
- form of relationship (i.e. any obvious pattern)
- strength of relationship (i.e. closeness of fitting to a line)
- direction of relationship (i.e. positive or negative association)
- any unusual observations or outliers

  x   y
  1   1
  2   2
  4   1
  3   2

(graph of above data used to discuss scatterplot traits further)

Correlation:

Def’n: Pearson’s Sample Correlation Coefficient r is given by

$$
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i}
$$

where z_{x_i} is the “standardized” observation for x_i and z_{y_i} is the “standardized” observation for y_i, for i = 1, …, n.

(example graphs of correlation drawn in class: 1. strong positive linear; 2. weak positive linear; 3. strong negative linear; 4. no pattern; 5. parabola; 6. exponential)
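To make the definition of r concrete, here is a minimal Python sketch that computes r directly from standardized observations. The function name pearson_r is hypothetical, and the straight-line and parabola data sets are made up for illustration (they are not the graphs drawn in class); only the four-point data set comes from the notes above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation: sum of z_x * z_y products divided by (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size n >= 2")

    # Sample means
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations (divisor n - 1).
    # If either is 0 (a constant variable), r is undefined and the
    # division below raises ZeroDivisionError.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

    # Standardized observations
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]

    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


# The four-point data set from the scatterplot example in the notes
table_x = [1, 2, 4, 3]
table_y = [1, 2, 1, 2]
r_table = pearson_r(table_x, table_y)

# Illustrative data (an assumption, not from the notes): an exact line
line_x = [1, 2, 3, 4, 5]
line_y = [2 * x + 1 for x in line_x]
r_line = pearson_r(line_x, line_y)

# Illustrative data (an assumption, not from the notes): an exact parabola
parabola_x = [-2, -1, 0, 1, 2]
parabola_y = [x ** 2 for x in parabola_x]
r_parabola = pearson_r(parabola_x, parabola_y)

print(f"table data: r = {r_table:.4f}")
print(f"exact line: r = {r_line:.4f}")
print(f"parabola:   r = {r_parabola:.4f}")
```

For the four-point data set the products of deviations cancel, so r works out to 0 (up to floating-point rounding). The parabola also gives r of 0 even though y is completely determined by x, while the exact line with positive slope gives r of 1.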

Properties of r:
- r is a measure of the LINEAR relationship between two variables.
