[Figure: axis labels "# Bedrooms", "price per square foot", "price"]

[…] we not only want to better understand a distribution, but also to compare the distribution across subgroups, or against another population or standard.
•  How do you think the expected grade distribution might vary with gender?

Two qualitative variables

    mosaicplot(table(video$sex, video$grade), main = "Stat 2 Survey")

[Figure: mosaic plot "Stat 2 Survey"; axes: sex (Female, Male) and grade (A, B, C, DF)]

How to read a mosaic plot
There are 91 students in the survey. Think of them as spread out evenly in the box.

New Plot: Mosaic
Put all the females on one side of the box. There are 38.
Rearrange the females so that those who expect the same grade are together in the box. 8 of the 38 expect a C.

Mosaic plot
[Figure: mosaic plot "Stat 2 Survey"; axes: sex (Female, Male) and grade (A, B, C, DF)]
•  A smaller fraction of females than males expect an A.
•  None of the males expect a C.

Case: East Bay Housing Market

    load(url("http://www.stanford.edu/~vcs/StatData/SFHousing.rda"))

Warning: It's BIG

San Francisco Chronicle listings
Data
•  Record: house sold in a particular time period
•  Over 200,000 houses
•  Subset to a dozen cities in the East Bay (about 25,000 houses)
Variables:
•  City
•  County
•  Price
•  # bedrooms
•  Lot square footage
•  and 10 more

Relationship between city and sale price
Data types:
•  City: factor
•  Sale price: numeric

Examine a subset of the cities

    someCities = c("Albany", "Berkeley", "El Cerrito", "Emeryville",
                   "Piedmont", "Richmond", "Lafayette", "Walnut Creek",
                   "Kensington", "Alameda", "Orinda", "Moraga")
    shousing =
      housing[housing$city %in% someCities & housing$price < 2000000, ]
    dim(shousing)
    [1] 20415    15

Boxplots

    boxplot(shousing$price ~ shousing$city, las = 2)

[Figure: boxplots of sale price by city, with cities ordered by median price]

Relationship between price per square foot and total square feet
Both are quantitative.

    ppsf = shousing$price / shousing$bsqft
    plot(ppsf ~ shousing$bsqft)

What's wrong with this plot?
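The boxplot caption above mentions cities ordered by median price, but `boxplot(price ~ city)` on its own draws the boxes in factor-level order (alphabetical by default). One possible way to get the median ordering is to re-level the factor with `reorder()` first. The sketch below uses a small toy data frame whose values are invented for illustration. It is not the SFHousing data, and it is not necessarily how the original figure was made.

```r
# Toy stand-in for shousing; every value here is invented for illustration.
toy <- data.frame(
  city  = factor(c("Albany", "Albany", "Albany",
                   "Berkeley", "Berkeley", "Berkeley",
                   "Richmond", "Richmond", "Richmond")),
  price = c(550000, 600000, 650000,
            500000, 700000, 900000,
            300000, 350000, 400000)
)

# Re-level city so that the factor levels are sorted by median price.
toy$city <- reorder(toy$city, toy$price, FUN = median)

# Levels now run from the lowest-median city to the highest-median city.
cityOrder <- levels(toy$city)
print(cityOrder)

# Boxes are drawn in level order, so they appear sorted by median.
boxplot(price ~ city, data = toy, las = 2)
```

The same re-leveling could be applied to `shousing$city` before calling `boxplot()`, assuming `city` is stored as a factor as stated above.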
Scatter plot

    plot(ppsf ~ shousing$bsqft,                 # plot y against x
         pch = 19,                              # change plotting character to solid circle
         cex = 0.2,                             # shrink plotting character to 20%
         subset = shousing$city == "Berkeley",  # plot a subset of records
         main = "Berkeley",                     # title of plot
         xlab = "Area (ft^2)",                  # label for x axis
         ylab = "Price/ft^2")                   # label for y axis

Relationships between more than 2 variables
•  Qualitative information can be conveyed in plots through color, plotting symbol, juxtaposed panels
•  The following plot uses information from 4 variables: city, number of bedrooms, lot size (sq ft), and price per square ft

What do you see?

[Figure: panels labeled "Berkeley", "Piedmont", "1 bedrooms", ...]
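As a rough illustration of conveying qualitative information through color and plotting symbol, the sketch below puts four variables into one scatter plot: lot size on the x axis, price per square foot on the y axis, city as color, and number of bedrooms as plotting symbol. The data frame is a toy with invented values, and the column names `br` and `lsqft` are assumptions made for this example, not names confirmed for the housing data.

```r
# Toy stand-in for shousing; all values are invented for illustration.
toy <- data.frame(
  city  = factor(c("Berkeley", "Berkeley", "Berkeley",
                   "Piedmont", "Piedmont", "Piedmont")),
  br    = c(1, 2, 3, 2, 3, 4),                     # bedrooms (hypothetical column name)
  lsqft = c(2500, 4000, 5200, 6000, 8000, 12000),  # lot size (hypothetical column name)
  bsqft = c(700, 1100, 1600, 1500, 2200, 3100),
  price = c(420000, 610000, 800000, 950000, 1400000, 1900000)
)
toy$ppsf <- toy$price / toy$bsqft   # price per square foot

# Qualitative variable 1 (city) is mapped to color.
cityCols <- c(Berkeley = "steelblue", Piedmont = "firebrick")
ptCol <- cityCols[as.character(toy$city)]

# Qualitative variable 2 (bedrooms) is mapped to plotting symbol 1..4.
ptPch <- toy$br

plot(toy$lsqft, toy$ppsf, col = ptCol, pch = ptPch,
     main = "Toy data", xlab = "Lot size (ft^2)", ylab = "Price/ft^2")
legend("topright", legend = names(cityCols), col = cityCols,
       pch = 19, title = "City")
legend("bottomright", legend = sort(unique(toy$br)),
       pch = sort(unique(toy$br)), title = "Bedrooms")
```

Juxtaposed panels are the third option listed above. In base R, one way to get them is to call `par(mfrow = c(1, 2))` and then draw one `plot()` per city.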
