| Terms | Definitions |
|
modified boxplot
|
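a display for quantitative data that graphs the five-number summary on an axis and plots outliers, if they exist, as separate points (the whiskers extend only to the most extreme observations that are not outliers)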
|
|
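The usual rule behind a modified boxplot flags an observation as a suspected outlier when it falls more than 1.5 × IQR below the first quartile or above the third quartile. Below is a minimal Python sketch of that rule. It assumes quartiles are the medians of the lower and upper halves of the sorted data, with the overall median excluded when n is odd, as in the quartile definitions in this list. Software packages may use slightly different quartile conventions. The function names are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When n is odd, the overall median is left out of both halves.
    Needs at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(upper)

def boxplot_outliers(data):
    """Return observations more than 1.5 * IQR beyond the quartiles."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 5, 6, 7, 8, 9, 30]   # made-up data
print(boxplot_outliers(data))          # [30]
```

Here Q1 = 4.5 and Q3 = 8.5, so IQR = 4 and the fences are -1.5 and 14.5. Only 30 lies beyond them.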
double-blind
|
when neither the subjects nor the people who have contact with them know which treatment each subject received
|
|
intersection
|
the event that all of the events occur
|
|
monotonic increasing function
|
Preserves the order of data. That is, if a > b, then f(a) > f(b)
|
|
probability
|
the proportion of times the outcome would occur in a very long series of repetitions
|
|
pth percentile
|
the value such that p percent of the observations fall at or below it (the median is the 50th percentile)
|
|
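Under the "at or below" wording above, the percentile of a value x is the percent of observations less than or equal to x. Here is a minimal sketch of that calculation. The function name and the scores are hypothetical, and textbooks and software differ in how they treat ties and whether they interpolate.

```python
def percentile_rank(data, x):
    """Percent of observations in data that are at or below x."""
    at_or_below = sum(1 for value in data if value <= x)
    return 100 * at_or_below / len(data)

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 98]   # made-up scores
print(percentile_rank(scores, 80))  # 70.0
```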
probability model
|
a mathematical description of a random phenomenon consisting of a sample space and a way of assigning probabilities to events
|
|
placebo
|
a dummy treatment
|
|
five-number summary
|
A summary of a data set that includes the minimum, lower quartile, median, upper quartile, and maximum.
|
|
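The five-number summary can be computed directly from sorted data. This sketch assumes the same quartile convention as the first- and third-quartile entries in this list: the medians of the lower and upper halves, excluding the overall median when n is odd. The function name is hypothetical. The interquartile range is then Q3 - Q1.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data; the overall median is excluded from both halves when
    n is odd. Needs at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 3, 4, 6, 7, 9, 12, 15]))  # (1, 3.5, 6.5, 10.5, 15)
```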
union
|
the event that at least one of the events in the collection occurs
|
|
conditional probability
|
the probability that an event will occur given that one or more other events have occurred
|
|
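Conditional probability is usually computed as P(A | B) = P(A and B) / P(B), provided P(B) > 0. With counts from a two-way table, the grand total cancels, so the ratio of counts gives the same answer. A small sketch follows. The function name and the table counts are hypothetical.

```python
from fractions import Fraction

def conditional_probability(count_a_and_b, count_b):
    """P(A | B) = P(A and B) / P(B), computed from table counts.

    Both probabilities share the same total as denominator, so the
    ratio of counts equals the ratio of probabilities.
    """
    if count_b == 0:
        raise ValueError("P(B) must be positive")
    return Fraction(count_a_and_b, count_b)

# Hypothetical table: 80 people exercise (B), and 60 of those
# also sleep at least 7 hours (A and B).
print(conditional_probability(60, 80))  # 3/4
```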
left-skewed
|
A density curve where the left side of the distribution extends in a long tail. (The mean is usually less than the median.)
|
|
monotonic function
|
a function that moves in only one direction (always increasing or always decreasing) as its argument t increases
|
|
subjects
|
what the units are called when they are human beings
|
|
study
|
a study is an experiment when we actually do something to people, animals, or objects in order to observe the response
|
|
factor
|
an explanatory variable in an experiment
|
|
block
|
a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
|
|
variable
|
any characteristic of an individual
|
|
quantitative variable
|
takes numerical values for which arithmetic operations such as adding and averaging make sense
|
|
distribution
|
tells you what values the variable takes and how often it takes these values
|
|
third quartile
|
the median of the upper half of the data
|
|
randomize
|
use impersonal chance to assign experimental units to treatments
|
|
right-skewed
|
The right tail of the graph is longer than the left; the mean is usually greater than the median
|
|
exploratory data analysis
|
using statistical tools and ideas to help you examine data in order to describe their main features
|
|
slope
|
the rate of change: for a regression line, the amount by which y hat changes when x increases by one unit.
|
|
probability sample
|
a sample chosen by chance
|
|
lurking variable
|
a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables
|
|
curved pattern
|
shows that the relationship is not linear
|
|
population
|
the entire group of individuals that we want information about
|
|
histogram
|
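a graph of the frequency distribution of a quantitative variable: each bar's height shows the count (or percent) of observations in one class, and the bars for adjacent classes touch.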
|
|
outlier
|
an individual observation that falls outside the overall pattern of the graph
|
|
extrapolation
|
the use of a regression line or curve for prediction far outside the range of values of the explanatory variable (x) that you used to obtain the line or curve.
|
|
voluntary response sample
|
consists of people who choose themselves by responding to a general appeal
|
|
correlation
|
measures the direction and strength of the linear relationship between two quantitative variables.
|
|
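One common formula for the correlation is r = (1 / (n - 1)) Σ z_x z_y, the average product of the standardized x and y values, using sample standard deviations. This is a minimal sketch of that formula, and the function name is hypothetical. Python 3.10 and later also provide `statistics.correlation`, which should agree for the same data.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, with sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```

The result always lies between -1 and 1. Points that fall exactly on an increasing line give r = 1.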
monotonic decreasing function
|
Reverses the order of data. That is, if a > b, then f(a) < f(b)
|
|
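The two monotonic-function entries above can be checked numerically by comparing the order of the data before and after a transformation. This sketch assumes positive data, so that log x is increasing and 1/x is decreasing. The helper name is hypothetical.

```python
import math

def order_of(values):
    """Indices of values listed from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])

data = [3.0, 10.0, 1.0, 7.0]                 # made-up positive data
increasing = [math.log(x) for x in data]     # log is increasing for x > 0
decreasing = [1 / x for x in data]           # 1/x is decreasing for x > 0

print(order_of(data))        # [2, 0, 3, 1]
print(order_of(increasing))  # [2, 0, 3, 1]  (order preserved)
print(order_of(decreasing))  # [1, 3, 0, 2]  (order reversed)
```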
symmetric
|
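having similarity in size, shape, and relative position of corresponding parts; for a distribution, the left and right sides of the graph are approximately mirror images of each other.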
|
|
z score
|
a measure of how many standard deviations an observation lies above or below the mean: z = (x - mean) / standard deviation
|
|
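The z-score formula is easy to apply directly. In this sketch the function name is hypothetical, and the mean of 70 and standard deviation of 8 are made-up values.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical test scores with mean 70 and standard deviation 8.
print(z_score(86, 70, 8))  # 2.0
print(z_score(66, 70, 8))  # -0.5
```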
normal probability plot
|
a display to help assess whether a distribution of data is approximately normal; if the plot is nearly straight, the data satisfy the nearly normal condition
|
|
nonresponse
|
occurs when an individual chosen for the sample can't be contacted or does not cooperate
|
|
explanatory variable
|
a variable that may help explain or influence changes in a response variable; in an experiment, it is the variable the researcher manipulates to see whether it affects the outcome variable
|
|
sampling
|
involves studying a part in order to gain information about the whole
|
|
exponential growth
|
increases by a fixed percentage of the previous total in each equal time period
|
|
dotplot
|
a graph for displaying quantitative data that plots a dot for each case against a single axis
|
|
undercoverage
|
occurs when some groups in the population are left out of the process of choosing the sample
|
|
statistical inference
|
draws conclusions about a population from sample data, answering specific questions with a known degree of confidence
|
|
statistics
|
the science of data
|
|
interquartile range
|
Q3 - Q1, the spread of the middle half of the data
|
|
statistical significance
|
an observed effect so large that it would rarely occur by chance
|
|
random
|
when individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions
|
|
outlier
|
an observation that lies outside the overall pattern of the other observations
|
|
influential observations
|
individual points that substantially change the regression line
|
|
bias
|
when the design of a study systematically favors certain outcomes
|
|
time plot
|
plots each observation against the time at which it was measured
|
|
trend
|
a long-term upward or downward movement over time
|
|
simulation
|
the imitation of chance behavior, based on a model that accurately reflects the experiment under consideration
|
|
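A simulation estimates a probability as the long-run proportion of repetitions in which the outcome occurs. This sketch assumes a model of four independent rolls of a fair die and estimates P(at least one six). The exact answer, 1 - (5/6)^4 ≈ 0.518, is available for comparison. The function name and seed are arbitrary choices.

```python
import random

def estimate_probability(trials, seed=1):
    """Estimate P(at least one six in four rolls of a fair die) by simulation."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(4)]
        if 6 in rolls:
            hits += 1
    return hits / trials

exact = 1 - (5 / 6) ** 4
print(exact, estimate_probability(100_000))
```

With more trials, the estimate tends to settle closer to the exact value.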
convenience sampling
|
chooses the individuals easiest to reach
|
|
independent
|
two events are independent when knowing that one occurs does not change the probability that we would assign to the other
|
|
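When both events have positive probability, independence is equivalent to the multiplication rule P(A and B) = P(A) × P(B). This sketch checks that rule on the 36 equally likely outcomes of rolling two fair dice. The event names are hypothetical examples.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """Check the multiplication rule P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return sum(o) == 7

def sum_is_8(o):
    return sum(o) == 8

print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_8))  # False
```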
median
|
the midpoint of a distribution
|
|
probability model
|
Used to calculate theoretical answers.
|
|
density curve
|
a curve that is always on or above the horizontal axis and has an area of exactly 1 (100%) beneath it; the median divides the area under the curve in half, and the mean is the 'balance point'; for a symmetric curve, the mean and median are the same
|
|
percentile
|
A point on the distribution below which a certain % of scores fall. Ex: scoring in the 99th percentile means only 1% scored higher than you
|
|
ogive
|
a line graph of a cumulative frequency or cumulative relative frequency distribution.
|
|
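An ogive connects points of the form (upper class boundary, cumulative relative frequency). This sketch computes those points, starting from 0 at the lowest boundary. It assumes classes of the form [lower, upper) and that every observation falls within the given boundaries. The function name and the ages are hypothetical.

```python
def cumulative_relative_frequencies(data, edges):
    """Points for an ogive: (boundary, fraction of data below that boundary).

    Classes are [edges[i], edges[i + 1]); all data are assumed to lie
    within [edges[0], edges[-1]).
    """
    n = len(data)
    points = [(edges[0], 0.0)]
    for upper in edges[1:]:
        below = sum(1 for x in data if x < upper)
        points.append((upper, below / n))
    return points

ages = [21, 23, 25, 28, 31, 34, 35, 38, 42, 47]   # made-up data
print(cumulative_relative_frequencies(ages, [20, 30, 40, 50]))
# [(20, 0.0), (30, 0.4), (40, 0.8), (50, 1.0)]
```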
observational study
|
observes individuals and measures variables of interest but does not attempt to influence the responses
|
|
least-squares regression line
|
the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible
|
|
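The least-squares line y hat = a + b·x has the standard closed form b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which equals r·s_y/s_x, and a = ȳ - b·x̄. This gives the slope and intercept from the entries elsewhere in this list. Below is a minimal sketch with a hypothetical function name and made-up data.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b * x minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx           # slope; equals r * s_y / s_x
    a = y_bar - b * x_bar   # intercept: y-hat when x = 0
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 3), round(b, 3))  # 2.2 0.6
```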
categorical variable
|
places an individual into one of several groups or categories
|
|
standard deviation
|
the square root of the variance
|
|
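Here is a minimal sketch of the variance and standard deviation. It assumes the sample convention of dividing by n - 1, which matches `statistics.variance` and `statistics.stdev`; the population versions divide by n. The function names are hypothetical.

```python
import math

def sample_variance(data):
    """Average squared deviation from the mean, dividing by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data; deviations squared sum to 32
print(sample_variance(data))       # 32/7, about 4.571
print(sample_sd(data))             # about 2.138
```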
response variable
|
measures an outcome of a study
|
|
strata
|
groups of similar individuals
|
|
sample space
|
the set of all possible outcomes
|
|
block design
|
the random assignment of units to treatments is carried out separately within each block
|
|
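In a block design, the random assignment happens separately inside each block. This sketch shuffles each block's units and then deals them out to the treatments in turn, so each treatment gets as nearly equal a share of every block as possible. The function name, block labels, and unit IDs are hypothetical.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    blocks maps a block name to its list of units. Within each block the
    units are shuffled, then dealt out to the treatments in turn.
    """
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical example: block on gender, two treatments.
blocks = {"women": ["W1", "W2", "W3", "W4"], "men": ["M1", "M2", "M3", "M4"]}
print(block_randomize(blocks, ["new drug", "placebo"], seed=42))
```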
complement
|
the event that a given event does not occur
|
|
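The union, intersection, and complement entries in this list correspond directly to set operations on a sample space. This sketch assumes one roll of a fair die with equally likely outcomes. It also checks the complement rule and the general addition rule.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # even number
B = {4, 5, 6}                       # greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

union = A | B                     # at least one of A, B occurs
intersection = A & B              # both A and B occur
complement_A = sample_space - A   # A does not occur

print(sorted(union), sorted(intersection), sorted(complement_A))
# [2, 4, 5, 6] [4, 6] [1, 3, 5]
print(prob(complement_A) == 1 - prob(A))                          # True
print(prob(union) == prob(A) + prob(B) - prob(intersection))      # True
```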
residual
|
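the difference between an observed value of the response variable and the value predicted by the regression line (residual = y - y hat); a residual plot is the scatterplot of the residuals against the explanatory variable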
|
|
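A residual is observed y minus predicted y. This sketch computes residuals for a given line y hat = a + b·x. The line and the four observations are made up for illustration and are not a least-squares fit. Plotting the returned residuals against the x values would give a residual plot.

```python
def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y, with y-hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical line y-hat = 1 + 2x and four observations.
xs = [0, 1, 2, 3]
ys = [1.5, 2.5, 5.5, 6.5]
print(residuals(xs, ys, a=1, b=2))  # [0.5, -0.5, 0.5, -0.5]
```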
lack of realism
|
the most serious potential weakness of experiments
|
|
event
|
any outcome or a set of outcomes of a random phenomenon
|
|
randomization
|
the use of chance to divide experimental units into groups
|
|
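Randomization for a completely randomized design can be as simple as shuffling the units and splitting them into equal groups. This is a minimal sketch that assumes the number of units is a multiple of the number of treatments. The function name, group labels, and seed are hypothetical.

```python
import random

def randomize(units, treatments, seed=None):
    """Shuffle units by chance, then split them into equal treatment groups."""
    shuffled = list(units)
    if len(shuffled) % len(treatments):
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}

groups = randomize(range(1, 21), ["treatment", "control"], seed=7)
print(groups)
```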
experimental units
|
the individuals on which the experiment is done
|
|
first quartile
|
the median of the lower half of the data
|
|
Simpson's paradox
|
refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group
|
|
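Simpson's paradox can be reproduced with a small two-way table. The counts below are hypothetical. Treatment A has the higher success rate for both easy and hard cases, but it handles mostly hard cases, so its combined rate is lower. Here case difficulty acts as the lurking variable.

```python
# Hypothetical counts: (successes, cases) for each treatment and case type.
data = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def rate(successes, cases):
    return successes / cases

for group in ("easy", "hard"):
    print(group, {t: rate(*data[t][group]) for t in data})

# Combine the groups by adding successes and cases for each treatment.
overall = {
    t: rate(sum(s for s, _ in data[t].values()), sum(n for _, n in data[t].values()))
    for t in data
}
print("combined", overall)   # A about 0.355, B about 0.745
```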
sample
|
items selected at random from a population and used to test hypotheses about the population
|
|
census
|
attempts to contact every individual in the entire population
|
|
disjoint events
|
events that have no outcomes in common
|
|
simple random sample
|
consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected
|
|
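Python's `random.sample` draws without replacement, and every set of k items is equally likely, so it can select a simple random sample from a list. The roster and seed below are hypothetical.

```python
import random

population = [f"student{i}" for i in range(1, 101)]   # hypothetical roster of 100
rng = random.Random(2024)                              # seeded for reproducibility
srs = rng.sample(population, 10)                       # SRS of size 10
print(srs)
```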
individuals
|
the objects described by a set of data
|
|
confounding variables
|
factors that cause differences between the experimental group and the control group other than the independent variable
|
|
matched pairs
|
a common form of blocking for comparing just two treatments
|
|
seasonal variation
|
a pattern that repeats itself at regular time intervals
|
|
joint probability
|
the probability of two events occurring together
|
|
mean
|
the ordinary arithmetic average; the most common measure of center
|
|
intercept
|
the value of y hat when x = 0
|
|
tree diagram
|
a branching diagram that shows all possible combinations or outcomes of an event
|
|
sample design
|
refers to the method used to choose the sample from the population
|
|
scatterplot
|
a graph of dots, each of which represents the values of two quantitative variables measured on the same individual.
|
|
transforming/reexpressing
|
applying a function such as the logarithm or square root to a quantitative variable
|
|
linear growth
|
increases by a fixed amount in each equal time period
|
|
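Linear growth gives equal differences between consecutive values, while exponential growth gives equal ratios. Taking logarithms turns an equal ratio into an equal difference, which is why a log transformation can straighten exponential growth. This sketch assumes a starting value of 100, a fixed increase of 5 per period, and a fixed 5% growth rate per period. All three values are illustrative.

```python
import math

start = 100.0
linear = [start + 5 * t for t in range(6)]            # adds 5 each period
exponential = [start * 1.05 ** t for t in range(6)]   # grows 5% each period
logs = [math.log(y) for y in exponential]              # log-transformed values

# Linear growth: constant differences.
print([round(b - a, 6) for a, b in zip(linear, linear[1:])])
# Exponential growth: constant ratios.
print([round(b / a, 6) for a, b in zip(exponential, exponential[1:])])
# After a log transformation: constant differences (equal to log 1.05).
print([round(b - a, 6) for a, b in zip(logs, logs[1:])])
```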
experiment
|
A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process
|
|
joint event
|
simultaneous occurrence of two events; its probability is called a joint probability
|
|
treatment
|
a specific experimental condition applied to the units
|