Terms  Definitions 

population 
entire collection of individuals or objects about which information is desired

sample 
subset of the population selected for study

categorical data (qualitative) 
univariate data set when observations are categorical

numerical data (quantitative) 
univariate data set if each observation is a number

discrete data 
possible values of the variable correspond to isolated points on a number line
observations determined by counting 
continuous data 
possible values form an entire interval on the number line

frequency distribution 
table that displays the possible categories along with the associated frequencies and/or relative frequencies

bar chart 
use with categorical data
horizontal axis used for category names; vertical axis used for frequency or relative frequency; look for frequently and infrequently occurring categories 
observational study 
observes characteristics of a sample selected from one or more existing populations
goal is to draw conclusions about the corresponding population or about differences between two or more populations 
experiment 
when an investigator observes how a response variable behaves when one or more explanatory variables (factors) are manipulated
goal is to determine the effect of the manipulated factors; the researcher controls who is in which group 
selection bias (undercoverage) 
tendency for samples to differ from the corresponding population as a result of systematic exclusion of some part of the population

response bias (measurement) 
tendency for samples to differ from the corresponding population because the method of observation tends to produce values that differ from the true value

simple random sample (SRS) 
a simple random sample of size n is a sample selected from a population in a way that ensures that every different possible sample of the desired size has the same chance of being selected
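
A minimal sketch of drawing an SRS, assuming a made-up population of 20 numbered individuals and a sample size of 5. `random.sample` draws without replacement, so every subset of size n is equally likely. The seed is fixed only so the sketch is repeatable.

```python
import random

# Hypothetical population: individuals labeled 1 through 20.
population = list(range(1, 21))

rng = random.Random(42)  # fixed seed, only for repeatability of the sketch
n = 5                    # desired sample size (assumed)

# Sampling without replacement: each possible set of n individuals
# has the same chance of being selected.
srs = rng.sample(population, n)
print(srs)
```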

explanatory variables (factors) 
variables that have values that are controlled by the experimenter

response variable 
variable that is not controlled by the experimenter and is measured as part of experiment

treatment 
experimental condition

extraneous variable 
one that is not one of the explanatory variables in the study but is thought to affect the response variable

random assignment (of subjects to treatments or of treatments to trials) 
used to ensure that the experiment does not systematically favor one experimental condition (treatment) over another
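
A minimal sketch of random assignment of subjects to two treatments, assuming eight hypothetical subject labels and two equal-sized groups. The group names are illustrative only.

```python
import random

# Hypothetical subjects (labels are made up for illustration).
subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

rng = random.Random(0)  # fixed seed, only for repeatability of the sketch
shuffled = subjects[:]  # copy so the original list is left unchanged
rng.shuffle(shuffled)   # chance, not the researcher, decides the order

half = len(shuffled) // 2
groups = {
    "treatment": shuffled[:half],
    "control": shuffled[half:],
}
print(groups)
```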

comparative bar chart 
used to give a visual comparison of two or more groups
accomplished by constructing two or more bar charts that use the same set of horizontal and vertical axes. Use relative frequency to construct the scale on the vertical axis so that comparisons are meaningful when sample sizes are not the same. 
stem-and-leaf display 
a compact way to summarize univariate numerical data
each number is broken into two pieces: the stem is the first part of the number and consists of the beginning digit(s); the leaf is the last part and consists of the final digit(s). Used with a small to moderate number of observations (not large). 
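
A minimal sketch of building such a display, assuming made-up two-digit values so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

# Made-up two-digit observations.
data = [12, 15, 21, 23, 23, 30, 34, 47]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)  # tens digit -> stem, ones digit -> leaf
    stems[stem].append(leaf)

# Print every stem from smallest to largest, including any empty ones.
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```

For this data the display has four rows: `1 | 25`, `2 | 133`, `3 | 04`, `4 | 7`.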
outlier (p103) 
an unusually small or large data value.

relative frequency distribution 
calculated by dividing the frequency by total # of observations in the data set
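
A minimal sketch of that calculation, assuming a small made-up categorical data set.

```python
from collections import Counter

# Made-up categorical observations.
observations = ["red", "blue", "red", "green", "red", "blue"]

freq = Counter(observations)  # frequency of each category
total = len(observations)     # total number of observations

# Relative frequency = frequency / total number of observations.
rel_freq = {category: count / total for category, count in freq.items()}
print(rel_freq)
```

The relative frequencies always add up to 1 (apart from rounding).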

histogram 
graph of the frequency or relative frequency distribution
for discrete numerical data it is similar to a bar chart for categorical data; works well for large data sets; has a horizontal and a vertical scale 
unimodal 
histogram with a single peak

bimodal 
histogram with two peaks

positively skewed (right skewed) 
if upper tail of histogram stretches out much farther than lower tail

negatively skewed (left skewed) 
if lower tail is much longer than the upper tail

symmetric 
vertical line of symmetry so that the part of the histogram to the left of the line is a mirror image to the part on the right.

scatterplot 
most important graph based on bivariate numerical data
each observation is plotted as the point where a vertical line from its value on the x-axis meets a horizontal line from its value on the y-axis; a fairly strong curved pattern indicates a strong relationship (though not a linear one) 
sample mean (average) 
sum of all observations in the sample divided by number of observations in the sample.

sample median 
obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then:
the sample median is the single middle value if n is odd, or the average of the middle two values if n is even 
comparing mean and median 
median is the value on the measurement axis that separates the smoothed histogram into two equal parts
the mean is the balance point for the distribution. If the histogram is symmetric (dividing point and balance point are equal), the mean and median are the same. If the histogram is unimodal with a longer upper tail (positive skew), the outlying values in the upper tail pull the mean up, so it will generally lie above the median. An unusually high exam score will raise the mean but does not affect the median, and vice versa for negative skew. 
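
A minimal sketch of the exam-score effect, assuming five made-up scores and then the same scores with the top one replaced by an unusually high value.

```python
import statistics

# Made-up exam scores, roughly symmetric.
scores = [70, 72, 75, 78, 80]

# Same scores, but the highest one is replaced by an unusually high value.
with_outlier = scores[:-1] + [100]

print(statistics.mean(scores), statistics.median(scores))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean moves from 75 to 79, while the median stays at 75.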
sample proportion 
number of "S's" (successes) in the sample divided by n

range 
largest observation - smallest observation

sample variance 
sum of squared deviations from the mean divided by n - 1
s^2 = sum of (value - mean)^2 over all observations, divided by n - 1 
sample standard deviation 
the size of a "typical" or "representative" deviation from the mean
it is the positive square root of the sample variance and is denoted by s. 
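
A minimal sketch of the sample variance and sample standard deviation computed step by step, assuming a made-up sample of eight values.

```python
import math

# Made-up sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
mean = sum(sample) / n  # 40 / 8 = 5.0

# Squared deviation of each value from the mean.
squared_devs = [(x - mean) ** 2 for x in sample]

variance = sum(squared_devs) / (n - 1)  # divide by n - 1, not n
std_dev = math.sqrt(variance)           # positive square root, denoted s

print(variance, std_dev)
```

Here the squared deviations sum to 32, so the variance is 32/7.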
quartiles and interquartile range (IQR) 
IQR: a measure of variability that is resistant to the effects of outliers
lower quartile = median of lower half of sample; upper quartile = median of upper half of sample; IQR = upper quartile - lower quartile 
five-number summary 
consists of five values: smallest observation, lower quartile, median, upper quartile, largest observation
lower quartile = median of lower half; upper quartile = median of upper half 
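
A minimal sketch of the five-number summary and the IQR, using the "median of each half" definition of the quartiles. It assumes the convention that the middle value is left out of both halves when n is odd. The function name and the data are made up for illustration.

```python
import statistics

def five_number_summary(data):
    """Return (smallest, lower quartile, median, upper quartile, largest)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumed convention: when n is odd, skip the middle value.
    upper_half = ordered[half + n % 2:]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

# Made-up sample of nine observations.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
smallest, lq, med, uq, largest = five_number_summary(data)
iqr = uq - lq  # interquartile range
print(smallest, lq, med, uq, largest, iqr)
```

For this sample the lower quartile is 3.5, the median is 7, the upper quartile is 11, and the IQR is 7.5.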
skeletal boxplot 
see figure 4.8 practice

outlier (p185) 
more than 1.5(IQR) away from the nearest quartile (the nearest end of the box)
it is extreme if it is more than 3(IQR) from the nearest quartile and it is mild otherwise. 
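
A minimal sketch of this rule with the usual 1.5(IQR) and 3(IQR) cutoffs. The function name and the quartile values are hypothetical.

```python
def classify(value, lower_quartile, upper_quartile):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    iqr = upper_quartile - lower_quartile
    # Distance beyond the nearest end of the box (0 if inside the box).
    distance = max(lower_quartile - value, value - upper_quartile, 0)
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

# Hypothetical quartiles: IQR = 7.5, so the cutoffs are 11.25 and 22.5.
lq, uq = 3.5, 11.0
for value in (15, 25, 40):
    print(value, classify(value, lq, uq))
```

With these quartiles, 15 is not an outlier, 25 is a mild outlier, and 40 is an extreme outlier.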
z-score 
z = (value - mean) divided by the standard deviation
tells us how many standard deviations the value is from the mean. It is positive or negative according to whether the value lies above or below the mean 
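
A minimal sketch of computing z-scores, assuming a made-up sample and using the sample mean and sample standard deviation.

```python
import statistics

# Made-up sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)    # sample standard deviation (n - 1 divisor)

# z-score: how many standard deviations each value is from the mean.
z_scores = [(x - mean) / s for x in sample]

for x, z in zip(sample, z_scores):
    print(x, round(z, 2))
```

Values below the mean get negative z-scores, values above it get positive ones, and a value equal to the mean gets 0.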
correlation coefficient r 
measures the strength of any linear relationship between two numerical variables; also called Pearson's correlation coefficient
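
A minimal sketch of computing r from sums of deviations, assuming made-up paired data. The function name is illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect increasing line and a perfect decreasing line.
x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))
print(pearson_r(x, [8, 6, 4, 2]))
```

r is always between -1 and +1; the two data sets here sit at those extremes.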

least-squares line 
sample regression line; the line that minimizes the sum of squared vertical deviations of the points from the line
y hat = a + bx, where a = intercept and b = slope; y hat is the prediction of y that results from substituting a particular x value into the equation 
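
A minimal sketch of fitting the line with the standard formulas b = Sxy / Sxx and a = (mean of y) - b(mean of x), assuming made-up paired data. The function name is illustrative only.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)

y_hat = a + b * 6  # prediction of y at x = 6
print(a, b, y_hat)
```

For this data the slope is 0.6 and the intercept is 2.2, so the prediction at x = 6 is about 5.8.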
Data 
Pieces of information about individuals organized in variables

dataset 
Set of data identified with particular circumstances

Nominal variables 
No natural order
ex. gender/eye color 
Ordinal variables 
Natural order
ex. categories ordered from strongly disagree to strongly agree 
Interval variables 
A measurement or count for which it makes sense to discuss the difference between the values but not the ratio
ex. Temperature 
Ratio variables 
ratio between values has intrinsic meaning
ex. income, weight, or time 
Distribution 
what values the variable takes and
how often the variable takes those values 
Mode 
The most commonly occurring value in a distribution
highest frequency 
Mean 
Average of a set

Median 
Midpoint: half of the observations are smaller and half are larger

Complement Rule 
P(A) = 1 - P(not A), or
P(not A) = 1 - P(A) (useful for "at least" questions) 
Multiplication Rule 
P(A and B) = P(A) * P(B), when A and B are independent

The General Addition Rule 
For any two events A and B, P(A or B) = P(A) + P(B) - P(A and B).

Conditional Probability 
Of event B, given A, is
P(B | A) = P(A and B) / P(A) 
Check Independence 
compare P(B | A) and P(B | not A); A and B are independent if the two are equal

The General Multiplication Rule 
For any two events A and B, P(A and B) = P(A) * P(B | A)
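
A minimal sketch that checks the complement, general addition, conditional probability, and general multiplication rules on one roll of a fair six-sided die. The events A (even number) and B (greater than 3) are made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die
A = {2, 4, 6}                # event A: even number
B = {4, 5, 6}                # event B: greater than 3
not_A = outcomes - A

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Conditional probabilities: P(B | A) = P(A and B) / P(A).
p_b_given_a = prob(A & B) / prob(A)
p_b_given_not_a = prob(not_A & B) / prob(not_A)

print(prob(not_A), 1 - prob(A))                      # complement rule
print(prob(A | B), prob(A) + prob(B) - prob(A & B))  # general addition rule
print(prob(A & B), prob(A) * p_b_given_a)            # general multiplication rule
print(p_b_given_a, p_b_given_not_a)                  # independence check
```

Here P(B | A) = 2/3 but P(B | not A) = 1/3, so A and B are not independent, and P(A) * P(B) = 1/4 does not equal P(A and B) = 1/3.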
