2-EDA-to-DM

# There is some odd behavior with lot areas smaller

... measured on a scale as defined by the city assessor. As an example, the scale for one of these variables, overall quality, is given in Table 1. We also applied these scoring systems to the ordinal variables without associated numerical values, as most of the ordinal variables have similarly named levels. Since the scales of these ordinal variables are discrete, this all but rules out model-based clustering as a legitimate option, as the data is definitely not multivariate normal.

| Level | Value |
|-------|-------|
| Poor  | 1.0   |
| Poor  | 1.5   |
| BAvg  | 2.0   |
| BAvg  | 2.5   |
| Avrg  | 3.0   |
| AAvg  | 3.5   |
| Good  | 4.0   |
| VGd   | 4.5   |
| Excl  | 5.0   |
| Excl  | 5.5   |

Table 1: Levels of the Overall Quality variable, and the corresponding numeric values associated with them by the Ames Assessor's Office.

## EDA

To get an idea of the values some of the variables in this dataset take on, summary statistics for a selection of the numerical and ordinal variables in the dataset are shown in Tables 2 and 3, respectively. The most interesting variable in the dataset, perhaps, is sale price, which ranges from \$40,000 to \$615,000, and has a median of \$159,000. In Table 3 it appears the majority of homes have around average or good overall quality and condition.

|        | Sale Price (\$) | Total SF | Lot Area (SF) | Year Built |
|--------|-----------------|----------|---------------|------------|
| Min    | 40,000          | 428      | 1,008         | 1872       |
| Median | 159,000         | 2,390    | 9,239         | 1976       |
| Max    | 615,000         | 5,542    | 215,200       | 2010       |
| Mean   | 174,700         | 2,444    | 9,719         | 1972       |
| SD     | 7,033           | 765      | 8,172         | 30         |

Table 2: Summary statistics for a selection of numerical variables in the Ames housing dataset.
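As a rough illustration of how the five statistics reported in Table 2 can be computed for a single numeric variable, here is a minimal sketch using only the Python standard library. The function name `summarize` and the toy values are hypothetical and are not taken from the Ames data or from the authors' own analysis code.

```python
import statistics


def summarize(values):
    """Return min, median, max, mean, and sample SD for a list of numbers."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
    }


# Hypothetical toy lot areas in square feet (assumption: NOT real Ames values).
toy_lot_area = [1000, 5000, 9000, 12000, 20000]

summary = summarize(toy_lot_area)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.1f}")
```

Comparing the mean against the median in a summary like this is a quick way to spot right skew, which is what the very large maximum lot area in Table 2 suggests.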
Categorical variables - summary

| Overall Quality | Overall Condition | Basement Quality | Kitchen Quality |
|-----------------|-------------------|------------------|-----------------|
| Poor1.0: 1      | Poor1.0: 3        | None: 91         | Poor: 1         |
| Poor1.5: 8      | Poor1.5: 7        | Poor: 2          | Fair: 33        |
| Fair2.0: 1...   |                   |                  |                 |
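The idea of reusing one scoring system across ordinal variables with similarly named levels, and then counting homes per level, can be sketched as follows. The scale `TOY_SCALE`, the helper `encode`, and the sample data are all hypothetical simplifications for illustration; they are not the assessor's actual table or the authors' code.

```python
from collections import Counter

# Hypothetical, simplified scale (assumption): NOT the assessor's actual scoring table.
TOY_SCALE = {"Poor": 1.0, "Fair": 2.0, "Avrg": 3.0, "Good": 4.0, "Excl": 5.0}


def encode(levels, scale):
    """Map level names to numeric scores; levels missing from the scale become None."""
    return [scale.get(level) for level in levels]


# Hypothetical kitchen-quality labels for six homes (not real data).
toy_kitchen = ["Good", "Avrg", "Avrg", "Fair", "Excl", "Avrg"]

scores = encode(toy_kitchen, TOY_SCALE)
counts = Counter(toy_kitchen)

# Print a per-level frequency summary in scale order.
for level in TOY_SCALE:
    print(f"{level:<5}: {counts[level]}")
```

Levels that do not appear in the scale (for example, a "None" basement category) come back as `None` here, so they have to be handled explicitly before any numeric analysis.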
