Important Terms
• Population – Total set of subjects in which we are interested
• Sample – A subset of the population for which we have data
• Subject – Entities we measure (individuals)

Histogram Interpretation
• How many total students were sampled?
  60 + 82 + 60 + 41 = 243
• Which class has the highest / lowest frequency? What are those frequencies?
  Highest: “100-109” with 82
  Lowest: “120-129” with 41
• How many students have an IQ between 110 and 129?
  60 + 41 = 101

Stem-And-Leaf Plot
• A bar chart on its side
• The “stem” is all digits except the last one
• The last digit is the “leaf”
• Leaves are written in ascending order
• No commas
• If nothing falls in a row, write the row, but leave it blank

Example (HW 2.1-2.2)
eBay selling prices: 199 210 210 223 225 225 225 228 232 235

Sampling Methods
• Simple Random Sampling
  Every subject in the population has an equal chance of being selected
  Often done with a random number table
  Example: choosing a company somewhere in the U.S.
• Systematic
  Selecting every “k-th” subject
  Example: surveying every 10th person we meet downtown
• Convenience
  Individuals are easily found (e.g. internet surveys)
  Often the “laziest” way, so the answers are less reliable
• Stratified Sampling
  Taking some subjects from all possible groups
• Cluster Sampling
  Taking all subjects from some possible groups

Skewness

Outliers
• The mean is sensitive to outliers.
• The median is resistant to outliers.
• When outliers are present, it is best to use the median as the measure of central tendency.
• Example: average selling price of homes in the U.S.

Standard Deviation (Quiz Question)
• Roughly the average distance between a data point and the mean of the data.
• Measures how spread out the data distribution is.

Summary Stats Interpretation
• Mean: Average of the data set
• Median (also called Q2): About 50% of the data lie below (and above) this value
• Range: Difference between maximum and minimum
• Max & Min: Highest and lowest points in the data set
• Q1 and Q3: 25th and 75th percentiles
• Interquartile Range (IQR): Difference between Q3 and Q1

Box-Plot (HW 2.5-2.6)
• Proportion of data greater than 31 cents (Q1): .75
• Proportion of data greater than \$1.05 (Q3): .25
• Range = max - min = 206 - 2.6 = 203.4
• IQR = Q3 - Q1 = 105 - 31 = 74
• IQR = range of the middle half of the data.

Box-Plot Outlier Test (HW 2.5-2.6)
• Any point lying above Q3 + 1.5 × IQR is an outlier.
• Any point lying below Q1 - 1.5 × IQR is also an outlier.
• Are there any outliers on this box-plot?
  Q1 - 1.5 × IQR = 256 - 1.5 × (1105 - 256) = -1017.5
  Because there are no points beneath this cutoff, we have no lower outliers.
  Q3 + 1.5 × IQR = 1105 + 1.5 × (1105 - 256) = 2378.5
  Because the max is greater than this cutoff (320,000 > 2378.5), we have at least one upper outlier.

Mean & Median (HW 2.3-2.4)
This chart shows the number of grams of protein in loaves of various brands of bread.
• Compute the mean and median of the data set.
• What can you say about the shape of the distribution?
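$$\text{mean} = \frac{0 \cdot 15 + 1 \cdot 16 + 2 \cdot 21 + 3 \cdot 4}{56} = \frac{70}{56} = 1.25$$

For the median, find half the total count (56 / 2 = 28), so we need to find where bread #28 falls in the ordered data. Because the count is even, the median is the average of the 28th and 29th values.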
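The mean and median of the bread data can be checked with a short script. This is a minimal sketch, assuming the frequency table implied by the mean computation (0, 1, 2, and 3 grams with counts 15, 16, 21, and 4). The variable names are illustrative and do not come from the homework.

```python
from statistics import mean, median

# Assumed frequency table, read off the mean computation:
# grams of protein -> number of bread brands
freq = {0: 15, 1: 16, 2: 21, 3: 4}

# Expand the table into one value per loaf, in ascending order.
values = [grams for grams, count in sorted(freq.items()) for _ in range(count)]

n = len(values)                # total count (denominator of the mean)
data_mean = mean(values)       # weighted sum divided by n
data_median = median(values)   # averages the two middle values when n is even

print(f"n = {n}, mean = {data_mean}, median = {data_median}")
```

Expanding the table into a full list keeps the sketch close to the hand method: the median is found by position in the ordered data, not by a formula on the counts.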

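The box-plot outlier test can also be written as a small function. This is a sketch under the assumption that Q1, Q3, and the maximum are already known from the box-plot (256, 1105, and 320,000 in the worked example). The function name `outlier_fences` is hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR outlier test."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Values taken from the worked box-plot example.
q1, q3, data_max = 256, 1105, 320_000

lower, upper = outlier_fences(q1, q3)

# A point is an outlier if it lies below `lower` or above `upper`.
has_upper_outlier = data_max > upper

print(f"lower cutoff = {lower}, upper cutoff = {upper}")
print(f"upper outlier present: {has_upper_outlier}")
```

Only the maximum is tested here because the individual data points are not listed. With the full data set, every point would be compared against both cutoffs.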

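The stem-and-leaf rules listed earlier (stem is all digits but the last, leaves ascending, no commas, empty rows kept blank) can be sketched in code using the eBay selling prices. This is an illustrative helper, not part of the homework, and it assumes non-negative whole-number data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):           # ascending order
        stem, leaf = divmod(v, 10)     # last digit is the leaf
        leaves[stem].append(leaf)

    rows = []
    # Walk every stem from smallest to largest so empty rows still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))  # no commas
        rows.append(f"{stem} | {row}".rstrip())
    return rows


prices = [199, 210, 210, 223, 225, 225, 225, 228, 232, 235]
for line in stem_and_leaf(prices):
    print(line)
```

For these prices, the stem 20 has no leaves, so its row is printed with nothing after the bar.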