CH. 5
Regression (124) - independent variable is defined (preselected); dependent variable is free to vary; predict Y from X.
Correlation (125) - have paired X and Y scores and want to see whether the variables are related and how strong the association is; no obvious IV; no preselected values of X or Y, so both are free to vary.
Bivariate - representation of the joint frequency of 2 variables.
Reversion toward the mean (126) - Galton.
Reversion line - best-fitting straight line.
Correlation coefficient (127) - number that describes the degree of association, or strength of relationship, between 2 variables.
Pearson product-moment correlation coefficient ($r_{XY}$ or $r$) (127) - measures the strength and direction of the linear relationship between two quantitative variables; most widely used.
Cross product (131) - product of the 2 deviations: $(X_i - \bar{X})(Y_i - \bar{Y})$.
Covariance (132) - measure of strength of association that is independent of the number of pairs of scores; mean of the cross-product sum (133): $S_{XY} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / n$; CPS reflects
Coefficient of determination ($r^2$) (135) - variance explained or accounted for.
Coefficient of nondetermination ($k^2$) - $k^2 = 1 - r^2$; variance not explained. (136)
$S_X^2$ and $S_Y^2$ - sample variances (measures of the dispersion of scores).
Common errors in correlation (138):
1) Treating r as a percentage of association between 2 variables; r is a measure of strength that ranges from -1 to 1.
2) Interpreting r in terms of arbitrary descriptive labels.
3) Inferring that because 2 variables are correlated, one causes the other.
Factors that affect the size of a correlation coefficient (140):
1) Nature of the relationship between X and Y: the correlation ratio, or eta squared ($\eta^2$), measures the strength of association between nonlinearly related variables. Eyeball test (141) - simplest evidence of nonlinearity.
2) Truncated (restricted) range - size of r will be reduced.
3) Spurious effects due to subgroups with different means or standard deviations (142).
Discontinuous distribution (144) - results when the sample is restricted to a small number of points along a continuum or when the sample contains outliers, i.e., extreme groups.
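Worked sketch for the covariance, r, r^2 and k^2 entries above. Assumptions not in the notes: the scores are made up for illustration, the helper names (`mean`, `covariance`, `pearson_r`) are hypothetical, and r is computed with the standard identity r = S_XY / (S_X * S_Y), using the same n divisor for the variances as for the covariance.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of scores."""
    return sum(values) / len(values)


def covariance(x, y):
    """S_XY: cross-product sum (CPS) divided by n, the number of pairs."""
    x_bar, y_bar = mean(x), mean(y)
    cps = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cps / len(x)


def pearson_r(x, y):
    """r = S_XY / (S_X * S_Y); S_X^2 is the covariance of X with itself."""
    s_x = sqrt(covariance(x, x))
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


# Made-up paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = covariance(x, y)
r = pearson_r(x, y)
r_squared = r ** 2          # coefficient of determination
k_squared = 1 - r_squared   # coefficient of nondetermination

print(f"S_XY = {s_xy:.3f}, r = {r:.3f}, r^2 = {r_squared:.3f}, k^2 = {k_squared:.3f}")
```

Here r^2 and k^2 always sum to 1, which is a quick check on hand calculations.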
Heteroscedasticity (145) - unequal dispersions.
Homoscedasticity - dispersions are uniform.
Spearman rank correlation ($r_S$) (147) - paired data are in ranks; measure of the monotonic relationship between 2 sets of ranks (149): $r_S = 1 - \frac{6 \sum (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}$, where $R_{X_i} - R_{Y_i}$ = difference between the ith person's ranks on X and Y, and n = number of pairs of ranks; coefficient = 1 if each person's X and Y ranks are equal.
CH. 6
Regression analysis (160) - paired data $(X_i, Y_i)$ where X is the independent variable with values $X_i$ that are selected in advance and Y is the dependent variable with values $Y_i$.
Multiple regression - simultaneous use of 2 or more predictors in predicting a dependent variable.
Predicting Y from X (162) (usual method) - best-fitting line should minimize some function of the error in predicting $Y_i$ from $X_i$; errors are vertical distances on the Y axis.
Prediction error or residual ($e_i$) - difference between the actual ith score ($Y_i$) and the predicted score ($Y'_i$): $e_i = Y_i - Y'_i$.
Principle of least squares - line of best fit
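Worked sketch for the Spearman formula and the residual definition above. Assumptions not in the notes: the ranks and scores are made up, the function names are hypothetical, the ranks contain no ties, and the slope and intercept use the standard least-squares formulas (slope = cross-product sum / sum of squared X deviations; intercept = Ybar - slope * Xbar), which the notes do not spell out.

```python
def spearman_rs(rank_x, rank_y):
    """r_S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i = Rx_i - Ry_i.

    Assumes the inputs are already ranks and contain no ties.
    """
    n = len(rank_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def least_squares_line(x, y):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cps = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    slope = cps / ss_x
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up ranks for five people, for illustration only.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
r_s = spearman_rs(rank_x, rank_y)

# Made-up paired scores; X values are treated as selected in advance.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(x, y)

# Residual e_i = Y_i - Y'_i: vertical distance from each point to the line.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yp for yi, yp in zip(y, predicted)]

print(f"r_S = {r_s:.3f}")
print(f"Y' = {intercept:.3f} + {slope:.3f} * X")
print("residuals:", [round(e, 3) for e in residuals])
```

The residuals from a least-squares line sum to zero (up to rounding), which gives a simple check on the fitted line.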
STATS 2402-04

