Chapter 1
population – total set of subjects in which we are interested / sample – subset of pop. for whom we have data
descriptive statistics – methods for summarizing data (graphs, avgs, percentages, etc.)
inferential statistics – methods of making decisions / predictions about a population based on sample data.
design – planning how to obtain data / probability – helps to determine chances of an occurrence
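random sampling – each subject in the pop. has the same chance of being selected; crucial for experiments and studies because the sample tends to be representative of the pop.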
subjects – entities that are measured in a study
parameter – numerical summary of a population / statistic – numerical summary of sample taken from a pop.
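A minimal sketch (Python, standard library only) of the parameter vs. statistic distinction. The population of ages, the seed, and the sample size of 50 are made-up values for illustration, not from the course material.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 subjects (made-up values).
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: numerical summary of the whole population.
population_mean = mean(population)

# Simple random sample: every subject has the same chance of being picked.
sample = rng.sample(population, 50)

# Statistic: numerical summary of the sample, used to estimate the parameter.
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it (inferential statistics).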
Chapter 2
variable – any characteristic that is recorded for subjects in a study
categorical variable – observation that belongs to one of a set of categories (college major, dating status, etc.)
quantitative variable – observations take on numerical values that represent different magnitudes of the variable (GPA, etc.)
discrete quantitative variable – a count; finite number of possible values (“the number of ...”)
continuous quantitative variable – possible values form an interval; continuum of infinitely many possible values (time, height, dist., etc.)
frequency table – lists the possible values of a variable with the # of observations (counts) for each value
percentage table – shows proportions x 100 (percentages and proportions are relative frequencies)
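A short sketch of how a frequency table with proportions and percentages could be built in Python. The list of majors is made-up data, and `frequency_table` is a hypothetical helper name used only for this example.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, frequency, proportion, percentage)."""
    counts = Counter(values)              # frequency: # of observations per category
    n = len(values)
    rows = []
    for category, freq in counts.items():
        proportion = freq / n             # relative frequency
        percentage = 100 * proportion     # proportion x 100
        rows.append((category, freq, proportion, percentage))
    return rows

# Made-up observations of a categorical variable (college major).
majors = ["bio", "math", "bio", "art", "bio", "math", "art", "bio"]
table = frequency_table(majors)

print(f"{'category':<10}{'freq':>6}{'prop':>8}{'pct':>8}")
for category, freq, proportion, percentage in table:
    print(f"{category:<10}{freq:>6}{proportion:>8.3f}{percentage:>7.1f}%")
```

The proportions always sum to 1 and the percentages to 100, which is a quick check on any such table.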
categorical variable graphs – pie charts, bar graphs (Pareto chart – bar graph ordered by highest to lowest freq., Pareto
principle – a small subset of categories often contains most of the observations).
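A rough text-only sketch of a Pareto chart: categories ordered from highest to lowest frequency, with a running cumulative percentage to show the Pareto principle. The complaint categories and counts are made up for illustration.

```python
from collections import Counter

# Made-up categorical data: reason for a customer complaint.
reasons = (["late delivery"] * 12 + ["damaged item"] * 5
           + ["wrong item"] * 2 + ["other"] * 1)

# Pareto ordering: categories sorted from highest to lowest frequency.
ordered = Counter(reasons).most_common()

n = len(reasons)
cumulative = 0
for category, freq in ordered:
    cumulative += freq
    bar = "#" * freq  # text stand-in for a bar in the bar graph
    print(f"{category:<14}{bar:<13}{freq:>3}  cum. {100 * cumulative / n:5.1f}%")
```

Here the top two of the four categories account for 17 of the 20 observations, i.e. a small subset of categories holds most of the data.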
quantitative variable graphs
dot plot
stem-and-leaf plot
histogram
dot & s&l plots good for small data sets; histograms good for large data sets
histograms more flexible in defining intervals than s&l plots
data values retained in s&l and dot plots but not histograms
bar chart – for categorical variables, histogram – for quantitative variables
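A minimal sketch of a stem-and-leaf plot in Python, to show why the data values are retained. It assumes non-negative integers with stem = tens digit(s) and leaf = ones digit; the exam scores are made up, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers (stem = tens, leaf = ones)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Made-up exam scores (a small data set, where s&l plots work well).
scores = [67, 72, 75, 78, 81, 83, 83, 89, 90, 94]

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")

# Every original value can be rebuilt from its stem and leaf,
# which is not possible from histogram bar heights alone.
rebuilt = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]
```

A histogram of the same data would keep only the count in each interval, not the individual scores.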

shape – symmetric or skewed; skewed to the right if right tail > left tail, skewed to the left if vice versa
mode – highest pt. of the distribution; two distinct peaks (bimodal) can mean the pop. is polarized
time series – data set collected over time; graphically displayed by time plot; common pattern – trend
measures of center – mean and median
mean = sum of observations / number of observations; median – midpt. of obs. ordered from least to greatest
mean = $\bar{x} = \frac{\sum x}{n}$
symmetric shape: mean = median, right skewed: mean > median, left skewed: mean < median
outliers – observations that fall well above/below overall bulk of data
mean can be highly influenced by outliers / median is resistant to outliers / mode – value that occurs most freq.
numerical summary of observations is resistant if outliers have little to no effect on its value
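A quick sketch of resistance using Python's `statistics` module: one outlier is appended to a made-up list of incomes (in $1000s), and the mean, median, and mode are compared before and after.

```python
from statistics import mean, median, mode

# Made-up data: yearly incomes in $1000s.
incomes = [30, 32, 35, 35, 38, 40]
with_outlier = incomes + [400]  # one observation far above the bulk of the data

print("mean:  ", mean(incomes), "->", round(mean(with_outlier), 2))
print("median:", median(incomes), "->", median(with_outlier))
print("mode:  ", mode(incomes), "->", mode(with_outlier))
```

The outlier pulls the mean well above the median (the right-skewed pattern, mean > median), while the median and mode stay where they were.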
measures of spread – range and standard deviation
range: difference b/w largest and smallest observations
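deviation: difference b/w observation and mean ($x - \bar{x}$); positive and negative deviations cancel, so the sum of all deviations = 0
variance: roughly the avg of squared deviations (sum of squared deviations divided by n − 1 for a sample) / standard deviation: square root of variance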
standard deviation = $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
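A worked sketch of the measures of spread in Python, computed step by step and cross-checked against the `statistics` module. The five observations are made up, and the n − 1 divisor assumes the data are a sample rather than a full population.

```python
from math import sqrt, isclose
from statistics import mean, variance, stdev

# Made-up sample of n = 5 observations.
x = [2, 4, 4, 6, 9]
n = len(x)
x_bar = mean(x)                            # sample mean

deviations = [xi - x_bar for xi in x]      # x - x_bar for each observation
deviation_sum = sum(deviations)            # positive and negative deviations cancel

data_range = max(x) - min(x)               # largest minus smallest observation
s_squared = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance
s = sqrt(s_squared)                        # sample standard deviation

# Cross-check against the standard library (both use the n - 1 divisor).
assert isclose(s_squared, variance(x))
assert isclose(s, stdev(x))

print(f"sum of deviations = {deviation_sum}")
print(f"range = {data_range}, variance = {s_squared}, std. dev. = {s:.3f}")
```

Squaring the deviations before averaging is what keeps the positive and negative deviations from cancelling out.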
