entire collection of individuals or objects about which information is desired
subset of the population selected for study
|categorical data (qualitative)||
univariate data set when observations are categorical
univariate data set if each observation is a number
possible values of the variable correspond to isolated points on a number line
observations determined by counting
possible values form an entire interval on the number line
table that displays the possible categories along with the associated frequencies and/or relative frequencies
use with categorical data
horizontal axis used for category names
vertical axis used for frequency or relative frequency
looking for frequently and infrequently occurring categories
observes characteristics of a sample selected from one or more existing populations
goal is to draw conclusions about the corresponding population or about differences between two or more populations.
when investigator observes how response variable behaves when one or more of explanatory variables (factors) are manipulated
goal is to determine effect of manipulated factors
researcher controls who is in which group
|selection bias (undercoverage)||
tendency for samples to differ from the corresponding population as a result of systematic exclusion of some part of the population.
|response bias (measurement)||
tendency for samples to differ from the corresponding population because the method of observation tends to produce values that differ from the true value
|simple random sample (SRS)||
of size n is a sample that is selected from a population in a way that ensures that every different possible sample of desired size has same chance of being selected
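A minimal sketch of drawing an SRS in Python, assuming the whole population can be listed. The names `draw_srs` and `population` are illustrative and not from the source. `random.sample` draws without replacement, so every possible subset of size n has the same chance of being chosen.

```python
import random

def draw_srs(population, n, seed=None):
    """Return a simple random sample of size n, drawn without replacement."""
    rng = random.Random(seed)  # the seed is optional and only makes runs repeatable
    return rng.sample(list(population), n)

# Hypothetical population: ID numbers 1 through 100
population = range(1, 101)
sample = draw_srs(population, 10, seed=1)
print(sample)
```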
variables that have values that are controlled by the experimenter
variable that is not controlled by the experimenter and is measured as part of experiment
one that is not one of the explanatory variables in the study but is thought to affect the response variable
(of subjects to treatments or of treatments to trials)
to ensure that the experiment does not systematically favor one experimental condition (treatment) over another
|comparative bar chart||
used to give a visual comparison of two or more groups
accomplished by constructing two or more bar charts that use the same set of horizontal and vertical axes.
use the relative frequency to construct scale on vertical axis so we can make meaningful comparisons if sample sizes are not the same.
a compact way to summarize univariate numerical data
each number is broken into two pieces
used with a small to moderate number of observations (not large)
stem is the first part of the number and consists of the beginning digit(s)
leaf is the last part and consists of the final digit(s)
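As a rough illustration of the stem/leaf split, the sketch below treats the last digit as the leaf and everything before it as the stem. It assumes non-negative integers; the function name `stem_and_leaf` and the data are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).

    Assumes non-negative integers.
    """
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 23 -> stem 2, leaf 3
        display[stem].append(leaf)
    return dict(display)

# Made-up data for illustration
data = [12, 15, 21, 23, 23, 30, 47]
display = stem_and_leaf(data)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Repeated values (the two 23s here) each get their own leaf, so the display keeps every observation.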
an unusually small or large data value.
|relative frequency distribution||
calculated by dividing each frequency by the total number of observations in the data set
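The calculation can be sketched in a few lines of Python. The helper name `relative_frequencies` and the observations are invented for illustration.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: frequency / total number of observations}."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Made-up categorical observations
observations = ["A", "B", "A", "C", "A", "B", "A", "C"]
rel_freq = relative_frequencies(observations)
print(rel_freq)
```

The relative frequencies always add up to 1, which is a quick check on the arithmetic.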
graph of the frequency or relative frequency distribution
similar to a bar chart for categorical data.
discrete numerical data
works well for large data sets
horizontal and vertical scale
histogram with a single peak
histogram with two peaks
|positively skewed (right skewed)||
if upper tail of histogram stretches out much farther than lower tail
|negatively skewed (left skewed)||
if lower tail is much longer than the upper tail
vertical line of symmetry so that the part of the histogram to the left of the line is a mirror image to the part on the right.
most important graph based on bivariate numerical data
the point where a vertical line from the x-axis meets a horizontal line from the y-axis represents the observation
a fairly strong curved pattern indicates a strong relationship
|sample mean (average)||
sum of all observations in the sample divided by number of observations in the sample.
obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then:
sample median = the single middle value if n is odd
the average of the middle two values if n is even
|comparing mean and median||
median is the value on the measurement axis that separates the smoothed histogram into two equal parts
the mean is the balance point for the distribution
if histogram is symmetric (dividing point and balance point equal), mean and median are the same.
If histogram is unimodal with a longer upper tail (positive skew), the outlying values in the upper tail pull the mean up, so it will generally lie above the median. An unusually high exam score will raise the mean but has little or no effect on the median. The reverse holds for negative skew.
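A small numerical check of this behavior, using Python's standard `statistics` module. The exam scores are made up; the point is only that replacing the top score with an unusually high one moves the mean but leaves the median where it was.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]        # made-up exam scores, roughly symmetric
with_outlier = scores[:-1] + [100]   # top score replaced by an unusually high one

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```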
number of "S's" in the sample divided by n
largest observation - smallest observation
sum of squared deviations from the mean divided by n-1
sum of (value - mean) squared, divided by n - 1
|sample standard deviation||
the size of a "typical" or "representative" deviation from the mean
it is the positive square root of the sample variance and is denoted by s.
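The two definitions translate directly into code. This is a sketch with illustrative function names and made-up data, written out by hand rather than calling a library so that the n - 1 divisor is visible.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Positive square root of the sample variance (denoted s)."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 6, 9]   # made-up sample; mean is 5
print(sample_variance(data), sample_sd(data))
```

For this sample the squared deviations are 9, 1, 1, 1, 16, which sum to 28, so the variance is 28 / 4 = 7.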
|quartiles and interquartile range (IQR)||
IQR: measure of variability that is resistant to the effects of outliers
lower quartile = median of lower half of sample
upper quartile = median of upper half of sample
uses smallest observation
lower quartile = median of lower half
upper quartile = median of upper half
see figure 4.8 practice
more than 1.5(IQR) away from the nearest quartile (the nearest end of the box)
it is extreme if it is more than 3(IQR) from the nearest quartile and it is mild otherwise.
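A sketch of the quartile and outlier rules in Python. The function names and data are illustrative. One assumption not stated in these notes: when n is odd, the median is left out of both halves before the quartiles are computed; other conventions exist and give slightly different values.

```python
from statistics import median

def quartiles(xs):
    """Return (lower quartile, upper quartile) as medians of the two halves.

    Assumes at least two observations. With odd n, the middle value is
    excluded from both halves (an assumed convention).
    """
    s = sorted(xs)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])

def classify(value, q1, q3):
    """Label a value by its distance from the nearest quartile, in IQR units."""
    iqr = q3 - q1
    distance = max(q1 - value, value - q3)  # positive only outside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

data = [1, 10, 11, 12, 13, 14, 15, 16, 20, 40]   # made-up sample
q1, q3 = quartiles(data)
for value in (1, 20, 40):
    print(value, classify(value, q1, q3))
```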
(value - mean) divided by the standard deviation
tells us how many standard deviations the value is from the mean.
It is positive or negative according to whether the value lies above or below the mean
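A short sketch of the z-score calculation using the standard `statistics` module, where `stdev` is the sample standard deviation (n - 1 divisor). The function name `z_score` and the data are made up.

```python
from statistics import mean, stdev

def z_score(value, xs):
    """Number of sample standard deviations that value lies from the sample mean."""
    return (value - mean(xs)) / stdev(xs)

data = [2, 4, 4, 6, 9]   # made-up sample; mean is 5
print(z_score(9, data))  # above the mean, so positive
print(z_score(2, data))  # below the mean, so negative
```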
|correlation coefficient r||
measures the strength of any linear relationship between two numerical variables (Pearson's r)
sample regression line (least-squares line): the line that minimizes the sum of squared vertical deviations of the points from the line
y hat = a + bx
y hat is the prediction of y that results from substituting a particular x value into the equation
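To make the regression line and r concrete, here is a sketch that computes the intercept a, the slope b, and the correlation coefficient r from the usual sums of squares. The function name `least_squares` and the data are invented for illustration; the formulas b = Sxy / Sxx and a = ybar - b * xbar are the standard least-squares results rather than anything spelled out in these notes.

```python
import math

def least_squares(xs, ys):
    """Return (a, b, r) for the line y hat = a + b*x and Pearson's r."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    b = sxy / sxx                  # slope
    a = ybar - b * xbar            # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

xs = [1, 2, 3, 4, 5]   # made-up bivariate data
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares(xs, ys)
y_hat = a + b * 6      # prediction at x = 6
print(a, b, r, y_hat)
```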
Pieces of information about individuals organized in variables
Set of data identified with particular circumstances
No natural order
ex. gender/eye color
ex. categories ordered from strongly disagree to strongly agree
A measurement or count for which it makes sense to discuss the difference between values but not the ratio
ratio between values has intrinsic meaning
Ex. income, weight, or time
-what values the variables take and
-how often the variable takes those values
|mode||
The most commonly occurring value in a distribution
Average of a set
Midpoint: half of the observations are smaller and half are larger
P(A) = 1 - P(not A) or
P(not A) = 1 - P(A)
P(A and B) = P(A) * P(B), when A and B are independent
|The General Addition Rule||
For any two events A and B, P(A or B) = P(A) + P(B) - P(A and B).
The conditional probability of event B, given A, is
P(B | A) = P(A and B) / P(A)
compare P(B | A) and P(B | not A); if they are equal, A and B are independent
|The General Multiplication Rule||
For any two events A and B, P(A and B) = P(A) * P(B | A)
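The probability rules above can be checked on a small example. This sketch assumes one roll of a fair six-sided die with equally likely outcomes; the events A ("even") and B ("greater than 3") and the helper `prob` are made up for illustration. Exact fractions avoid rounding issues.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one roll of a fair six-sided die

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}      # roll is even
B = {4, 5, 6}      # roll is greater than 3
not_A = outcomes - A

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(A & B)
p_a_or_b = prob(A | B)
p_b_given_a = p_a_and_b / p_a
p_b_given_not_a = prob(B & not_A) / prob(not_A)

print("P(A or B)    =", p_a_or_b)
print("P(B | A)     =", p_b_given_a)
print("P(B | not A) =", p_b_given_not_a)
```

Here P(B | A) and P(B | not A) differ, so A and B are not independent, and P(A) * P(B) does not equal P(A and B). The general multiplication rule still holds.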