
Take a look at the pickup data, with price grouped by make.

    > boxplot(data$price ~ data$make, ylab="price")

[Figure: boxplots of price (vertical axis, roughly 5000 to 20000) for Dodge, Ford, and GMC.]

Doesn’t look like there is much going on.
R can fit the ANOVA model with the lm function:

    > model <- lm(data$price ~ data$make)
    > coef(model)
      (Intercept) data$makeFord  data$makeGMC
         6554.200      2313.717      1442.008
    > mean(data$price[data$make=="Dodge"])
    [1] 6554.2
    > mean(data$price[data$make=="Ford"])
    [1] 8867.917
    > mean(data$price[data$make=="GMC"])
    [1] 7996.208
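The output above suggests how to read the coefficients: the intercept is the mean of the baseline group (Dodge), and each remaining coefficient is that group's mean minus the baseline mean (for example, 6554.200 + 2313.717 = 8867.917 for Ford). A minimal sketch of the same check follows. It uses a small made-up data frame called `toy`, which is an assumption for illustration and not the pickup data.

```r
# Hypothetical toy data (NOT the pickup data from the slides).
toy <- data.frame(
  make  = factor(c("Dodge", "Dodge", "Ford", "Ford", "GMC", "GMC")),
  price = c(5000, 7000, 8000, 10000, 6500, 9500)
)

# One-way ANOVA fit; R picks the first factor level (Dodge) as baseline.
model <- lm(price ~ make, data = toy)
b <- coef(model)

# Group means computed directly from the data.
group_means <- tapply(toy$price, toy$make, mean)

# Group means rebuilt from the coefficients:
# baseline mean = intercept, other means = intercept + dummy coefficient.
fitted_means <- c(
  Dodge = unname(b["(Intercept)"]),
  Ford  = unname(b["(Intercept)"] + b["makeFord"]),
  GMC   = unname(b["(Intercept)"] + b["makeGMC"])
)

# The two rows should agree.
print(rbind(group_means, fitted_means))
```

Passing `data = toy` to `lm` gives shorter coefficient names (`makeFord`) than the `data$make` style used in the slides (`data$makeFord`); the fitted values are the same either way.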
And the anova function is useful to make a pretty table:

    > anova(model)
    Analysis of Variance Table

    Response: data$price
              Df     Sum Sq  Mean Sq F value Pr(>F)
    data$make  2   29571553 14785776  0.4628 0.6326
    Residuals 43 1373653582 31945432
    > 29571553/(1373653582+29571553)
    [1] 0.02107399

This last number is SSR/(SSE + SSR) = SSR/SST:

• Brand explains only 2% of our observed variability!
What else is going on in the ANOVA table?

• Mean Square values are sums of squares (SSR and SSE) divided by the degrees of freedom ($R - 1$ and $n - R$).
• If $\beta_1 = \cdots = \beta_{R-1} = 0$, then MSR/MSE is an “F” random variable, so the top-right value is $P(F > \mathrm{MSR}/\mathrm{MSE})$.
• This is the probability of observing a larger MSR/MSE, if the groupings do not matter.
• We have $P(F > \mathrm{MSR}/\mathrm{MSE}) = .63$, which does not indicate strong evidence against $\beta_1 = \cdots = \beta_{R-1} = 0$.

Some of this should seem familiar; we’ll see more detail later.
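The entries of the table can be reproduced by hand from the two sums of squares and their degrees of freedom. The sketch below copies those four numbers from the `anova(model)` output and uses base R's `pf` for the F tail probability; the variable names are chosen here for illustration.

```r
# Sums of squares and degrees of freedom copied from the anova(model) table.
ssr  <- 29571553     # sum of squares explained by make (SSR)
sse  <- 1373653582   # residual sum of squares (SSE)
df_r <- 2            # R - 1, with R = 3 makes
df_e <- 43           # n - R

# Mean squares: sums of squares divided by their degrees of freedom.
msr <- ssr / df_r
mse <- sse / df_e

# F statistic and its upper-tail probability P(F > MSR/MSE).
f_stat  <- msr / mse
p_value <- pf(f_stat, df_r, df_e, lower.tail = FALSE)

# Share of variability explained: SSR / SST.
r_squared <- ssr / (ssr + sse)

print(c(MSR = msr, MSE = mse, F = f_stat, p = p_value, R2 = r_squared))
```

The printed F and p values should match the `F value` and `Pr(>F)` columns up to rounding.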
Learning Check

1. Think-Pair-Share: In your own words, what is the difference between a conditional and a marginal distribution? Come up with your own example.
Correlation and covariance

$$\mathrm{cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big]$$

[Figure: scatterplot of y against x, with reference lines marking E[X] and E[Y].]

X and Y vary with each other around their means.
Correlation is the standardized covariance:

$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\mathrm{cov}(X, Y)}{\mathrm{sd}(X)\,\mathrm{sd}(Y)}$$

The correlation is scale invariant and the units of measurement don’t matter:

• It is always true that $-1 \le \mathrm{corr}(X, Y) \le 1$.

Correlation gives

• the direction ($-$ or $+$)
• and strength ($0 \to 1$)

of the linear relationship between X and Y.
Sample correlation and std. deviation

Recall:

• Sample Covariance is
$$s_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
(in units X times units Y)

• Sample Standard Deviation is
$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
(in units X)

• Sample Correlation is
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{1}{n - 1} \sum_{i=1}^{n} \frac{(X_i - \bar{X})}{s_x} \, \frac{(Y_i - \bar{Y})}{s_y}$$
(correlation is scale free!)
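The three formulas can be checked against R's built-in `cov`, `sd`, and `cor`. This is a minimal sketch on made-up vectors `x` and `y` (hypothetical values, not data from the course), including a quick check of the scale-free claim.

```r
# Hypothetical toy data for illustration only.
x <- c(1.0, 2.5, 3.0, 4.5, 6.0)
y <- c(2.0, 2.5, 4.0, 4.5, 7.0)
n <- length(x)

# Sample covariance, standard deviations, and correlation from the formulas.
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
s_x  <- sqrt(sum((x - mean(x))^2) / (n - 1))
s_y  <- sqrt(sum((y - mean(y))^2) / (n - 1))
r_xy <- s_xy / (s_x * s_y)

# Second form of r: products of standardized values, summed and divided by n - 1.
r_alt <- sum(((x - mean(x)) / s_x) * ((y - mean(y)) / s_y)) / (n - 1)

# The hand computations should agree with the built-ins.
stopifnot(
  isTRUE(all.equal(s_xy, cov(x, y))),
  isTRUE(all.equal(s_x, sd(x))),
  isTRUE(all.equal(r_xy, cor(x, y))),
  isTRUE(all.equal(r_xy, r_alt))
)

# Scale free: changing the units of x and y leaves the correlation unchanged.
r_rescaled <- cor(100 * x, y / 1000)
print(c(r_xy = r_xy, r_rescaled = r_rescaled))
```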
[Figure: four scatterplots on axes from -3 to 3, with corr = 1, corr = .5, corr = .8, and corr = -.8.]
Correlation only measures linear relationships:

• corr(X, Y) = 0 does not mean the variables are unrelated!

[Figure: two scatterplots, one with corr = 0.01 and one with corr = 0.72.]

Also be careful with influential observations.
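Both warnings can be illustrated with small constructed examples. The data below are made up for this sketch and are not the data behind the slide's plots: first a perfectly determined but nonlinear relationship with essentially zero correlation, then a single far-away point that creates a large correlation on its own.

```r
# 1) Exact nonlinear relationship: y is completely determined by x,
#    but the relationship is symmetric, so the correlation is (numerically) zero.
x <- seq(-3, 3, by = 0.5)
y <- -x^2
r_curve <- cor(x, y)

# 2) Influential observation: six points with no linear association...
x0 <- c(1, 2, 3, 1, 2, 3)
y0 <- c(1, 1, 1, 3, 3, 3)
r_without <- cor(x0, y0)

# ...plus one hypothetical outlying point at (20, 20).
r_with <- cor(c(x0, 20), c(y0, 20))

print(c(r_curve = r_curve, r_without = r_without, r_with = r_with))
```

Plotting the data before trusting a correlation is the simplest safeguard against both problems.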
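Correlation and regression

“Imagine” that $Y = b_0 + b_1 X + e$:

$$\mathrm{cov}(X, Y) = \mathrm{cov}(X, b_0 + b_1 X + e) = \mathrm{cov}(X, b_1 X) = b_1\,\mathrm{var}(X)$$

(The middle step uses $\mathrm{cov}(X, b_0) = 0$ for the constant $b_0$ and assumes $\mathrm{cov}(X, e) = 0$.)

Thus $\mathrm{corr}(X, Y) = b_1 \dfrac{\sigma_x}{\sigma_y}$, which suggests $b_1 = r_{xy} \dfrac{s_y}{s_x}$.

That is, $b_1$ is correlation times units Y per units X.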
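The suggested slope can be compared with what `lm` returns. This is a sketch on simulated data; the seed, sample size, and true coefficients are arbitrary assumptions made for illustration.

```r
# Hypothetical simulated data: Y = 3 + 1.5 X + noise.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(50)

# Slope suggested by the covariance argument: b1 = r_xy * s_y / s_x.
b1_formula <- cor(x, y) * sd(y) / sd(x)

# Equivalent form straight from the derivation: cov(X, Y) / var(X).
b1_cov <- cov(x, y) / var(x)

# Least-squares slope reported by lm.
b1_lm <- unname(coef(lm(y ~ x))["x"])

# All three should agree up to floating-point error.
print(c(formula = b1_formula, cov_over_var = b1_cov, lm = b1_lm))
```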
We used the definition of covariance to suggest what the slope, $b_1$, should be. What about the intercept, $b_0$?
