n, we not only want to better understand a distribution, but we also want to compare the distribution across subgroups or against another population or standard.
• How do you think the expected grade distribution might vary with gender?

Two Qualitative variables

[Figure: mosaic plot titled "Stat 2 Survey"; labels: sex (Female, Male), grade (A, B, C, DF)]

mosaicplot(table(video$sex, video$grade), main = "Stat 2 Survey")

How to read a Mosaic plot

There are 91 students in the survey. Think of them as spread out evenly in the box.

New Plot: Mosaic

Put all the females on one side of the box. There are 38.

Rearrange the females so that those who expect the same grade are together in the box. 8 of the 38 expect a C.

Mosaic plot

[Figure: mosaic plot titled "Stat 2 Survey"; labels: sex (Female, Male), grade (A, B, C, DF)]

A smaller fraction of females expect an A in comparison to males.
None of the males expect a C.

Case: East Bay Housing Market

load(url("http://www.stanford.edu/~vcs/StatData/SFHousing.rda"))

Warning: It's BIG

San Francisco Chronicle listings

Data
• Record: house sold in a particular time period
• Over 200,000 houses
• Subset to a dozen cities in the East Bay: about 25,000 houses

Variables:
• City
• County
• Price
• # bedrooms
• Lot square footage
• and 10 more

Relationship between city and sale price

Data types:
City: factor
Sale price: numeric

Examine a subset of the cities

someCities = c("Albany", "Berkeley", "El Cerrito", "Emeryville", "Piedmont",
               "Richmond", "Lafayette", "Walnut Creek",
               "Kensington", "Alameda", "Orinda", "Moraga")
# keep only the selected cities and houses priced under $2,000,000
shousing = housing[housing$city %in% someCities &
                   housing$price < 2000000, ]
dim(shousing)
[1] 20415 15

Boxplots

# las = 2 draws the axis labels perpendicular to the axis
boxplot(shousing$price ~ shousing$city, las = 2)

Cities ordered by median price

Relationship between price per square foot and total square foot

Both are quantitative

ppsf = shousing$price/shousing$bsqft
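Before plotting a ratio like the one above, it can help to check that every value is usable. The following is a minimal sketch on a small made-up data frame (the name `toy` and all its values are hypothetical, not taken from the SFHousing data). It assumes that some records could carry a zero or missing building area, which the slides do not state.

```r
# Hypothetical toy data; NOT taken from SFHousing.rda
toy <- data.frame(
  price = c(500000, 750000, 900000, 300000),
  bsqft = c(1000, 1500, 0, NA)     # 0 and NA mimic bad area records
)

# Same ratio as ppsf, computed on the toy data
toy_ppsf <- toy$price / toy$bsqft

# is.finite() is FALSE for Inf (division by zero) and for NA
ok <- is.finite(toy_ppsf)

# Keep only the usable ratios
toy_ppsf[ok]
```

Dropping or inspecting the non-finite entries first keeps a handful of bad records from distorting the axis range of the scatter plot.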
plot(ppsf ~ shousing$bsqft)

What's wrong with this plot?

Scatter plot

plot(ppsf ~ shousing$bsqft,                 # plot y against x
     pch = 19,                              # change plotting character to solid circle
     cex = 0.2,                             # shrink plotting character to 20%
     subset = shousing$city == "Berkeley",  # plot a subset of records
     main = "Berkeley",                     # title of plot
     xlab = "Area (ft^2)",                  # label for x axis
     ylab = "Price/ft^2")                   # label for y axis

Relationships between more than 2 variables
• Qualitative information can be conveyed in plots through color, plotting symbol, juxtaposed panels
• The following plot uses information from 4 variables: city, number of bedrooms, lot size (sq ft), and price per square foot

What do you see?

[Figure: plot using the 4 variables; visible labels: Berkeley, Piedmont, 1 bedrooms, ...]
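The idea of carrying qualitative information through color and plotting symbol can be sketched on a tiny made-up data frame. Everything here is an assumption for illustration: the `toy` data, its column names (`br`, `lsqft`, `ppsf`), and the color choices are hypothetical, and this is not necessarily how the plot in the slides was produced.

```r
# Hypothetical toy data; values are made up for illustration only
toy <- data.frame(
  city  = factor(c("Berkeley", "Berkeley", "Piedmont", "Piedmont")),
  br    = c(1, 3, 2, 4),                 # number of bedrooms
  lsqft = c(2500, 5000, 6000, 9000),     # lot size (sq ft)
  ppsf  = c(450, 380, 520, 470)          # price per sq ft
)

# One color per city: the factor's integer codes index the palette
palette2 <- c("blue", "red")
cityCols <- palette2[as.integer(toy$city)]

# One plotting symbol per bedroom count (pch codes 1 to 4)
brPch <- toy$br

# Four variables in one plot: x = lot size, y = price per sq ft,
# color = city, symbol = number of bedrooms
plot(toy$ppsf ~ toy$lsqft, col = cityCols, pch = brPch,
     xlab = "Lot size (ft^2)", ylab = "Price/ft^2")
legend("topright", legend = levels(toy$city), col = palette2, pch = 19)
```

Juxtaposed panels, the third option in the list above, would instead draw one such plot per city side by side.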
