Statistics Flashcards

Terms Definitions
population
entire collection of individuals or objects about which information is desired
sample
subset of the population selected for study
categorical data (qualitative)
univariate data set when observations are categorical
numerical data (quantitative)
univariate data set if each observation is a number
discrete data
possible values of the variable correspond to isolated points on a number line
observations determined by counting
continuous data
possible values form an entire interval on the number line
frequency distribution
table that displays the possible categories along with the associated frequencies and/or relative frequencies
bar chart
use with categorical data
horizontal axis used for category names
vertical axis used for frequency or relative frequency
looking for frequently and infrequently occurring categories
observational study
observes characteristics of a sample selected from one or more existing populations
goal is to draw conclusions about the corresponding population or about differences between two or more populations.
experiment
study in which the investigator observes how a response variable behaves when one or more explanatory variables (factors) are manipulated
goal is to determine effect of manipulated factors
researcher controls who is in which group
selection bias (undercoverage)
tendency for samples to differ from the corresponding population as a result of systematic exclusion of some part of the population.
response bias (measurement)
tendency for samples to differ from the corresponding population because the method of observation tends to produce values that differ from the true value
simple random sample (SRS)
a sample of size n selected from a population in a way that ensures that every different possible sample of the desired size has the same chance of being selected
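A minimal sketch of drawing a simple random sample, assuming the population can be listed in full. The population of ID numbers and the helper name simple_random_sample are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers 1 through 100 (illustrative only).
population = list(range(1, 101))

def simple_random_sample(pop, n, seed=None):
    """Draw n distinct items so that every possible size-n subset is equally likely."""
    rng = random.Random(seed)
    # random.sample selects without replacement.
    return rng.sample(pop, n)

srs = simple_random_sample(population, 10, seed=1)
print(srs)
```

Passing a seed only makes the draw repeatable; leave it as None for a fresh sample each time.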
explanatory variables (factors)
variables that have values that are controlled by the experimenter
response variable
variable that is not controlled by the experimenter and is measured as part of experiment
treatment
experimental condition
extraneous variable
one that is not one of the explanatory variables in the study but is thought to affect the response variable
random assignment
(of subjects to treatments or of treatments to trials)
to ensure that the experiment does not systematically favor one experimental condition (treatment) over another
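One way random assignment of subjects to treatments might be sketched: shuffle the subjects, then deal them out to the treatments in turn. The subject labels, treatment names, and the helper randomly_assign are hypothetical.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them round-robin into treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects and treatments, for illustration only.
subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
groups = randomly_assign(subjects, ["drug", "placebo"], seed=0)
print(groups)
```

Because chance alone decides who lands in which group, no treatment is systematically favored.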
comparative bar chart
used to give a visual comparison of two or more groups
accomplished by constructing two or more bar charts that use the same set of horizontal and vertical axes.
use the relative frequency to construct scale on vertical axis so we can make meaningful comparisons if sample sizes are not the same.
stem-and-leaf display
a compact way to summarize univariate numerical data
each number is broken into two pieces
used with a small to moderate number of observations (not large)
stem is the first part of the number and consists of beginning digit(s)
leaf is the last part and consists of final digit(s)
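A small sketch of building a stem-and-leaf display, assuming non-negative whole numbers where the last digit is the leaf and the remaining digits are the stem. The exam scores are made-up data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted list of leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Hypothetical exam scores, for illustration only.
scores = [52, 67, 71, 73, 78, 84, 85, 85, 90, 96]
display = stem_and_leaf(scores)

# Print every stem in range, including any stem with no leaves.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```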
outlier (p103)
an unusually small or large data value.
relative frequency distribution
calculated by dividing each frequency by the total number of observations in the data set
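As a rough illustration, relative frequencies can be computed by counting each category and dividing by the number of observations. The eye-color data and the helper name relative_frequencies are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return each category's frequency divided by the total number of observations."""
    counts = Counter(observations)
    n = len(observations)
    return {category: count / n for category, count in counts.items()}

# Hypothetical categorical data, for illustration only.
colors = ["blue", "brown", "brown", "green", "brown", "blue", "brown", "blue"]
rel = relative_frequencies(colors)
print(rel)  # brown 0.5, blue 0.375, green 0.125
```

The relative frequencies always add up to 1, which is what makes groups of different sizes comparable.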
histogram
graph of the frequency or relative frequency distribution
similar to a bar chart for categorical data.
discrete numerical data
works well for large data sets
horizontal and vertical scale
unimodal
histogram with a single peak
bimodal
histogram with two peaks
positively skewed (right skewed)
if upper tail of histogram stretches out much farther than lower tail
negatively skewed (left skewed)
if lower tail is much longer than the upper tail
symmetric
vertical line of symmetry so that the part of the histogram to the left of the line is a mirror image to the part on the right.
scatterplot
most important graph based on bivariate numerical data
each observation is plotted as the point where a vertical line from its x value on the x-axis meets a horizontal line from its y value on the y-axis
a fairly strong curved pattern indicates a strong relationship
sample mean (average)
sum of all observations in the sample divided by number of observations in the sample.
sample median
obtained by first ordering the n observations from smallest to largest (with any repeated values included, so that every sample observation appears in the ordered list). Then the sample median is:
the single middle value if n is odd
the average of the middle two values if n is even
comparing mean and median
median is the value on the measurement axis that separates the smoothed histogram into two equal parts
the mean is the balance point for the distribution
if histogram is symmetric (dividing point and balance point equal), mean and median are the same.
If histogram is unimodal with a longer upper tail (positive skew), the outlying values in the upper tail pull the mean up so it will generally lie above the median. An unusually high exam score will raise the mean but does not affect the median. The reverse holds for negative skew: the mean generally lies below the median.
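A quick check of this behavior with made-up exam scores: replacing the top score with an unusually high one moves the mean but leaves the median where it was.

```python
import statistics

# Hypothetical exam scores, for illustration only.
scores = [70, 72, 75, 78, 80]
# Same scores with the top value replaced by an unusually high one.
skewed = [70, 72, 75, 78, 100]

print(statistics.mean(scores), statistics.median(scores))   # 75 75
print(statistics.mean(skewed), statistics.median(skewed))   # 79 75
```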
sample proportion
number of "S's" in the sample divided by n
range
largest observation - smallest observation
sample variance
sum of squared deviations from the mean divided by n-1
that is, add up (value - mean)^2 over all observations, then divide by n-1
sample standard deviation
the size of a "typical" or "representative" deviation from the mean
it is the positive square root of the sample variance and is denoted by s.
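A minimal sketch of the sample variance and sample standard deviation written out step by step, assuming at least two observations. The data values and helper names are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Positive square root of the sample variance (denoted s)."""
    return math.sqrt(sample_variance(values))

# Hypothetical data, for illustration only: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7
print(sample_std_dev(data))
```

Python's standard library offers statistics.variance and statistics.stdev, which use the same n - 1 divisor.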
quartiles and interquartile range (IQR)
IQR-measure of variability that is resistant to the effects of outliers
lower quartile = median of lower half of sample
upper quartile = median of upper half of sample
IQR = upper quartile - lower quartile
five-number summary
uses smallest observation
lower quartile = median of lower half
median
upper quartile = median of upper half
largest observation
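A sketch of the five-number summary using the median-of-halves rule for the quartiles. It assumes that when n is odd the overall median is left out of both halves, which is a common textbook convention but not the only one in use. The data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, lower quartile, median, upper quartile, largest).

    Assumes at least two observations; for odd n the middle value
    is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

# Hypothetical data, for illustration only.
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]
print(summary)  # (1, 3.5, 8, 13.5, 20)
print(iqr)      # 10.0
```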
skeletal boxplot
see figure 4.8 practice
outlier (p185)
more than 1.5(IQR) away from the nearest quartile (the nearest end of the box)
it is extreme if it is more than 3(IQR) from the nearest quartile and it is mild otherwise.
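A rough sketch of classifying a value with the usual boxplot thresholds of 1.5 and 3 times the IQR, measured from the nearest quartile. The quartile values passed in and the helper name classify_outlier are hypothetical.

```python
def classify_outlier(value, lower_q, upper_q):
    """Label a value as 'extreme', 'mild', or 'not an outlier'."""
    iqr = upper_q - lower_q
    # Distance beyond the nearest end of the box; 0 if the value is inside it.
    distance = max(lower_q - value, value - upper_q, 0)
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "not an outlier"

# Hypothetical quartiles: lower 3.5, upper 13.5, so IQR = 10.
print(classify_outlier(20, 3.5, 13.5))  # not an outlier
print(classify_outlier(30, 3.5, 13.5))  # mild
print(classify_outlier(50, 3.5, 13.5))  # extreme
```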
z-score
(value - mean) divided by the standard deviation
tells us how many standard deviations the value is from the mean.
It is positive or negative according to whether the value lies above or below the mean
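The z-score calculation is short enough to sketch directly. The mean of 70 and standard deviation of 10 are made-up numbers.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / std_dev

# Hypothetical exam with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5, above the mean
print(z_score(55, 70, 10))  # -1.5, below the mean
```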
correlation coefficient r
measures the strength of any linear relationship between two numerical variables (Pearson's correlation coefficient)
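A minimal sketch of computing r from paired data, using sums of products of deviations from the means. The data and the helper name pearson_r are hypothetical, and the sketch assumes neither variable is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

Values of r near 1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.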
least-squares line
sample regression line, the line that minimizes the sum of squared vertical deviations of the points from the line
y hat = a + bx
a=intercept
b=slope
y hat is the prediction of y that results from substituting a particular x value into the equation
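A sketch of fitting the least-squares line with the standard formulas: slope b equals the sum of products of deviations divided by the sum of squared x deviations, and intercept a equals mean of y minus b times mean of x. The data and helper name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)

# Prediction: substitute a particular x value into the equation.
y_hat = a + b * 6
print(round(a, 2), round(b, 2), round(y_hat, 2))  # 2.2 0.6 5.8
```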
Data
Pieces of information about individuals organized in variables
dataset
Set of data identified with particular circumstances
Nominal variables
No natural order
ex. gender/eye color
Ordinal variables
Natural order
ex. categories ordered from strongly disagree to strongly agree
Interval variables
A measurement or count for which it makes sense to discuss the difference between values but not the ratio
ex. Temperature
Ratio variables
ratio between values has intrinsic meaning
ex. income, weight, or time
Distribution
-what values the variable takes and
-how often the variable takes those values
Mode
The most commonly occurring value in a distribution
highest frequency
Mean
Average of a set
Median
Midpoint: half of the observations are smaller and half are larger
Complement Rule
P(A) = 1 - P(not A) or
P(not A) = 1 - P(A)
useful for "at least one" problems: P(at least one) = 1 - P(none)
Multiplication Rule
P(A and B) = P(A) * P(B), when A and B are independent
The General Addition Rule
For any two events A and B, P( A or B)= P(A) + P(B) - P(A and B).
Conditional Probability
Of event B, given A, is
P(B | A) = P(A and B) / P(A)
Check Independence
compare P(B | A) and P(B | not A); A and B are independent if the two are equal
The General Multiplication Rule
For any two events A and B, P(A and B) = P(A) * P(B | A)
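The probability rules on these cards can be checked by brute force on a small sample space. This is only a sketch: the two-dice setup and the events A and B are hypothetical, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: all 36 equally likely rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on one outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def a(o):
    """Event A: the first die shows 6."""
    return o[0] == 6

def b(o):
    """Event B: the two dice sum to at least 10."""
    return o[0] + o[1] >= 10

p_a = prob(a)
p_b = prob(b)
p_not_a = prob(lambda o: not a(o))
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
p_b_given_a = p_a_and_b / p_a
p_b_given_not_a = prob(lambda o: b(o) and not a(o)) / p_not_a

print(p_not_a == 1 - p_a)                  # complement rule: True
print(p_a_or_b == p_a + p_b - p_a_and_b)   # general addition rule: True
print(p_a_and_b == p_a * p_b_given_a)      # general multiplication rule: True
print(p_b_given_a, p_b_given_not_a)        # 1/2 1/10
```

Because P(B | A) and P(B | not A) differ here, A and B are not independent, so P(A) * P(B) would not give P(A and B) for these two events.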