# stat notes .pdf - Chapter 2 Categorical/Quantitative...

--------------------------------------------------------------------------
Chapter 2
- Categorical/Quantitative variables
- Frequency tables
- Marginal distribution
- Conditional distribution
- Bar graph
- Pie chart

Categorical variable
- Outcomes fall into different categories (summarized with %)
- Nominal: no natural ordering (arts vs. science)
- Ordinal: has a clear ordering (A vs. B vs. C vs. F)

Quantitative variable
- Outcomes can be measured on a numerical scale (summarized with mean, median, ...)

Frequency table / Relative frequency table
- Displays all categories of a single categorical variable with their relative frequencies
- The percentages should add up to 100%

Bar Chart
- Each bar represents a category

Pie Chart
- Can get confusing!

Contingency Table
- Shows the relationship between 2 categorical variables
- Displays two categorical variables simultaneously
- Marginal total: subtotal (each column and row)
- Grand total: total of everything (both columns and rows)
*note: always compare % not counts, as the totals are not equal!

Marginal distribution
- Displays the distribution of one variable only (looks at only one set of subtotals in the table)

Conditional distribution
- Displays the distribution of one variable for the cases that satisfy a condition on the other variable
*note: No relationship: all the % should be the same across conditions. A relationship: the % differ between conditions.

Simpson’s Paradox
- A trend that appears in several different groups of data disappears or reverses when these groups are combined
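The marginal and conditional distributions described above can be sketched in a few lines of Python. This is only an illustration: the table, its labels (`Arts`, `Science`, grades `A` to `C`), the counts, and the helper names are all hypothetical and do not come from the notes.

```python
# Hypothetical contingency table: rows = faculty, columns = letter grade.
# All counts are made up for illustration.
table = {
    "Arts":    {"A": 20, "B": 50, "C": 30},
    "Science": {"A": 60, "B": 90, "C": 50},
}

def row_totals(table):
    """Marginal totals for the row variable (one subtotal per row)."""
    return {row: sum(cells.values()) for row, cells in table.items()}

def column_totals(table):
    """Marginal totals for the column variable (one subtotal per column)."""
    totals = {}
    for cells in table.values():
        for col, count in cells.items():
            totals[col] = totals.get(col, 0) + count
    return totals

def percentages(counts):
    """Convert a dict of counts into percentages that add up to 100."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}

# Grand total: total of everything in the table.
grand_total = sum(row_totals(table).values())

# Marginal distribution of grade: uses only the column subtotals.
marginal_grade = percentages(column_totals(table))

# Conditional distribution of grade, given each faculty: one row at a time.
conditional_grade = {row: percentages(cells) for row, cells in table.items()}

print("Grand total:", grand_total)
print("Marginal distribution of grade:", marginal_grade)
for faculty, dist in conditional_grade.items():
    print(f"Grade given {faculty}:", dist)
```

In this made-up table the conditional percentages differ between the two rows (for example 20% A in Arts versus 30% A in Science), which is the pattern the note above reads as a relationship between the variables.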
--------------------------------------------------------------------------
Chapter 3
- Histogram
- Distributions of data
- Stem and leaf plots
- Box plot
- Mean/median
- Measures of spread
- Variance
- Standard deviation (SD)
- Interquartile range (IQR)
- Five number summary
- Outlier formula

Histogram
- Graphical display of the distribution of a quantitative variable
- Frequency is along the y-axis

Stem and leaf display
- Useful display of quantitative data
- Better for smaller data sets
- Stem: one or more leading digits
- Leaf: single trailing digit

Distribution for quantitative data
1. Shape:
- Unimodal, bimodal, multimodal (# of peaks)
- Symmetric or skewed?
- Skewed to the right: positively skewed; long right-hand tail
- Skewed to the left: negatively skewed; long left-hand tail
- Any outliers? (unusually large or small observations)
2. Centre
3. Spread
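As a rough illustration of the stem and leaf display described above, here is a minimal Python sketch. It assumes non-negative whole numbers, takes the last digit as the leaf and everything before it as the stem; the function name `stem_and_leaf` and the sample scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build the lines of a stem-and-leaf display.

    Assumes a non-empty list of non-negative integers.
    Stem = leading digit(s), leaf = single trailing digit.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)

    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Made-up data for illustration.
scores = [12, 15, 21, 21, 28, 40, 43]
for line in stem_and_leaf(scores):
    print(line)
```

Turned on its side, the display looks like a histogram with bins of width 10, which is why it also shows shape, centre, and spread for small data sets.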
Measures of Centre

Mean

Median
- Middle number of the ordered data
- Divides the data into two equal parts
- Odd number of observations (1, 3, 5, ...): the median is the middle observation, at position (n+1)/2 in the ordered data
- Even number of observations (2, 4, 6, ...): the median is the average of the two middle observations

Measures of Spread

Range

Interquartile range (IQR)
- The pth percentile (0<p<100): p% of the observations fall below it, and (100-p)% fall above it
- Quartiles (Q1, Q2, Q3)
- Q1 = first quartile = 25th percentile
- Q2 = second quartile = 50th percentile = median
- Q3 = third quartile = 75th percentile
- IQR = Q3 - Q1
*note: the first quartile is the median of the bottom half of the data; the third quartile is the median of the top half

Variance
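The notes name these measures without giving formulas, so the following Python sketch fills them in under stated assumptions. It takes the quartiles as medians of the lower and upper halves of the ordered data (leaving the middle value out of both halves when n is odd) and divides by n - 1 for the variance. Both are common textbook conventions, but neither is confirmed by the notes, and software packages may return slightly different quartiles. The function names and the data are hypothetical.

```python
import math

def mean(data):
    """Sum of the observations divided by the number of observations."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # excludes the median when n is odd
    upper_half = ordered[(n + 1) // 2 :]  # excludes the median when n is odd
    return median(lower_half), median(ordered), median(upper_half)

def sample_variance(data):
    """Average squared deviation from the mean, dividing by n - 1 (an assumption)."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

# Made-up data for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
q1, q2, q3 = quartiles(data)
print("mean:", mean(data))
print("median:", q2)
print("range:", max(data) - min(data))
print("IQR:", q3 - q1)
print("variance:", sample_variance(data))
print("SD:", math.sqrt(sample_variance(data)))  # SD is the square root of the variance
```

Because the standard deviation is the square root of the variance, it is measured in the same units as the observations, while the variance is in squared units.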
Standard deviation (SD)

Properties of Variance and SD
- Non-negative
- The more spread out the observations, the bigger the variance and SD
- SD has the same unit as the observations
- If the observations are all equal, both variance and SD are equal to 0

The 5-number Summary and box-plots
- Includes: minimum, Q1, Q2, Q3, maximum
- Boxplot provides a graphical display of the five-number summary
- Identifying outliers in box plots: -