2-EDA-to-DM

# Next we study the connections between families plots

which could indicate different types of houses/properties being sold. The relationship is a little more straightforward for houses larger than 5,000 square feet, though, as there is a clear positive association.

[Figure 1 panels: Distribution of Sale Price (Count vs. Sale Value); Distribution of Lot Area, outliers suppressed (Count vs. Lot Area (SF)); Sale Price by Lot Area, outliers suppressed (Sale Price vs. Lot Area (SF)).]

Figure 1: Univariate and bivariate distributions of Sale Price (\$) and Lot Area (SF). Note: Two houses have more than 75,000 square feet of lot area and are suppressed in these images.

[Figure panels: Sale Price by Cluster (Ward's); Lot Area by Cluster (Ward's); Year Built by Cluster (Ward's).]

# Neighborhood Analysis

One of the questions we are interested in answering relates to the differences between neighborhoods, as they are defined by the Ames City Assessor's Office. Since there are 33 neighborhoods in Ames, this is not necessarily an easy or straightforward task. First of all, Figure 2 gives us an idea of the spatial distribution of the neighborhoods around Ames, along with the spatial distribution of sale price. We can see that some neighborhoods have many more houses than others, and that house values are not evenly
