3 Statistics notes from book
1.1
Individuals – the objects described by a set of data. Individuals are sometimes people; when the objects we want to study are not people, we call them cases.
Variable – any characteristic of an individual. A variable can take different values for different individuals.
Categorical variable – places an individual into one of two or more groups or categories.
Quantitative variable – takes numerical values for which arithmetic operations such as adding and averaging make sense.
Distribution of a variable – tells us what values the variable takes and how often it takes these values.
Extreme values of a distribution are in the tails of the distribution.
Back-to-back stem plot – used for comparing two related distributions (e.g. male and female literacy rates).
Splitting each stem – gives two stems, one with leaves 0-4 and one with leaves 5-9.
Trim by removing the last digit or digits before making a stem plot.
Histogram – breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class.
  o The difference between a histogram and a bar graph is that a bar graph has spaces between its bars and a histogram does not.
Outlier – an individual value that falls outside the overall pattern.
Peaks are modes; a distribution with one peak is unimodal.
Symmetric – the values smaller and larger than the midpoint are mirror images of each other.
Skewed to the right – the right tail (larger values) is much longer than the left tail.
Time plot – plots each observation against the time at which it was measured. Always put time on the horizontal scale and the variable on the vertical scale.
Time series – measurements of a variable taken at regular intervals over time.
Trend – a persistent, long-term rise or fall.
Seasonal variation – a pattern that repeats itself at known regular intervals of time.
1.2 – describing distributions with numbers
Mean is the average value.
  o "x-bar": $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  o Sensitive to the influence of a few extreme observations.
  o Not a resistant measure.
Median is the middle value.
  o If n is odd, M is the center observation in the ordered list.
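    Location = (n+1)/2 up from the bottom of the list.
  o If n is even, M is the mean of the two center observations in the ordered list. Location = (n+1)/2 from the bottom, which falls halfway between the two center positions.
Upper quartile – the median of the upper half of the data (the third quartile, Q3).
Lower quartile – the median of the lower half of the data (the first quartile, Q1).
First quartile Q1 = median of the observations whose position in the ordered list is to the left of the overall median M.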

Third quartile Q3 = median of the observations whose position in the ordered list is to the right of the overall median M.
Five-number summary – for a set of observations, consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Box plot – a graph of the five-number summary. The central box spans Q1 and Q3. A line in the box marks M. Lines extend from the box out to the smallest and largest observations.
Interquartile range (IQR) – the distance between the first and third quartiles, IQR = Q3 - Q1.
1.5 x IQR rule for outliers – an observation is a suspected outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.
Variance – $s^2$
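The five-number summary and the 1.5 x IQR rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample data are made up for this example, the quartiles are computed as described in these notes (median of the observations to the left and right of M, leaving M itself out when n is odd), and at least two observations are required. Statistical software often uses slightly different quartile rules, so its results may differ a little.

```python
def median(ordered):
    """Median of an already sorted list: the center value if n is odd,
    the mean of the two center values if n is even."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the quartile rule in the notes."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # observations left of M
    upper = ordered[half + (n % 2):]    # observations right of M
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


def suspected_outliers(values):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data, chosen only to illustrate the rule.
data = [5, 7, 8, 9, 10, 12, 13, 14, 15, 40]
print(five_number_summary(data))   # (5, 8, 11.0, 14, 40)
print(suspected_outliers(data))    # [40]
```

Here IQR = 14 - 8 = 6, so anything below 8 - 9 = -1 or above 14 + 9 = 23 is flagged, which singles out 40.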

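As a small illustration of why the mean in section 1.2 is not a resistant measure while the median is, the sketch below adds one extreme observation to a short list and compares the two. The helper names and the numbers are hypothetical, chosen only for this example.

```python
def mean(values):
    """x-bar: the sum of the observations divided by n."""
    return sum(values) / len(values)


def median(values):
    """M: the center of the ordered list, at location (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: mean of the two center ones


# Hypothetical observations, then the same list with one extreme value added.
values = [30, 32, 35, 38, 40]
with_extreme = values + [400]

print(mean(values), median(values))              # 35.0 35
print(mean(with_extreme), median(with_extreme))  # mean jumps to about 95.8, median is 36.5
```

One extreme observation pulls the mean from 35 to about 95.8, while the median only moves from 35 to 36.5.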