Terms  Definitions 

Mean 
The average: the sum of the data values divided by the number of values.

expected value 
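The mean of a random variable, also called its mathematical expectation: the sum of each possible value times its probability.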
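A minimal Python sketch of the expected-value formula E(X) = Σ x·P(X = x). It assumes the distribution is given as a dictionary mapping each value to its probability. The fair-die example is illustrative, and exact fractions avoid rounding when checking that the probabilities sum to 1.

```python
from fractions import Fraction

def expected_value(dist):
    """Return E(X) = sum of x * P(X = x) over all possible values x."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

# Illustrative example: a fair six-sided die
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
```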

resistant 
Relatively unaffected by extreme values.
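A small illustration of resistance on made-up data. Replacing one value with an extreme outlier moves the mean a lot but leaves the median unchanged.

```python
import statistics

data = [2, 3, 3, 4, 5]
with_outlier = data[:-1] + [50]   # replace the largest value with an extreme one

print(statistics.mean(data), statistics.median(data))                  # 3.4 3
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 12.4 3
```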

retrospective study 
An observational study that uses data that were already gathered, looking back at past records.

Observational study 
Observes individuals and measures variables of interest but does not attempt to influence the responses.

Variance 
The square of the standard deviation

Control Group 
The group that receives no treatment (or a placebo) and serves as the baseline for comparison in an experiment.

Variable 
Any characteristic of an individual that can take different values for different individuals.

distribution 
Describes what values a variable takes and how often it takes them. For a quantitative variable, a description generally includes shape, center, spread, and unusual features.

Independent 
Two events are considered independent if the occurrence of one event does not affect the likelihood (probability) of the occurrence of the other.
Ex: Event 1: the subject is male. Event 2: the subject is pregnant. These two events are not independent. General probability rule for "and" probabilities: P(A and B) = P(A) × P(B | A); if A and B are independent, this reduces to P(A and B) = P(A) × P(B).

Anecdotal Evidence 
Evidence that is haphazardly collected and usually based on a few striking individual cases; as such, it is not representative of the population as a whole and is not reliable.

Skewed 
Not symmetric; the longer tail can be on the left or the right.

Single-Blind 
When only one group (either the subjects or the evaluators) is blinded.

Symmetric 
A distribution where two sides are mirror images of each other

control 
Controlling the effects of lurking variables on the response, most simply by comparing two or more treatments.

sample surveys 
Estimate population parameters, so the sample needs to be as representative as possible.

range 
the maximum data value minus the minimum data value

Center 
Measured with either the median or the mean, depending on the shape of the data: the median for skewed data or data with outliers, the mean for roughly symmetric data.

addition rule 
Method for finding the probability that either or both of two events occur: P(A or B) = P(A) + P(B) - P(A and B).

Variable of Interest 
The characteristic or trait being measured/observed

mathematical model 
An equation used to imitate a relationship; usually used for prediction.

Response Bias 
Anything in a survey design that influences responses

scatterplot 
a graphed cluster of dots, each of which represents the values of two variables. The slope of the points suggests the direction of the relationship between the two variables. The amount of scatter suggests the strength of the correlation (little scatter indicates high correlation). (Also called a scattergram or scatter diagram.) (Myers Psychology 8e p. 031)

Experiment 
A study which imposes a treatment or intervention on individuals in order to observe their responses.

Discrete variables 
a variable that assumes values that can be counted

simple random sample 
Abbreviated SRS. Every individual in the population has an equal chance to be chosen, and every possible set of n individuals has an equal chance to be the sample selected. No grouping can be involved.
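A sketch of drawing an SRS in Python, assuming the population is available as a list. `random.Random.sample` draws without replacement, so every subset of size n is equally likely. The roster and seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every subset of n individuals is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)   # sampling without replacement

students = [f"student_{i}" for i in range(1, 31)]   # hypothetical roster of 30
print(simple_random_sample(students, 5, seed=1))
```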

modified boxplot 
A display for quantitative data that graphs the five-number summary on an axis and shows outliers (usually points more than 1.5 × IQR beyond the quartiles) individually, if they exist.
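A sketch of the numbers behind a modified boxplot. It assumes the common intro-course convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, and outliers are points more than 1.5 × IQR beyond the quartiles. Other software may compute quartiles slightly differently. The helper names and data are hypothetical, and at least two values are required.

```python
import statistics

def five_number_summary(values):
    """Min, Q1, median, Q3, max; quartiles are medians of the lower/upper halves."""
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]          # excludes the median when n is odd
    upper = xs[(n + 1) // 2 :]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def outliers_1_5_iqr(values):
    """Values more than 1.5 * IQR below Q1 or above Q3 (plotted individually)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

data = [1, 3, 4, 5, 5, 6, 7, 8, 30]
print(five_number_summary(data))  # (1, 3.5, 5, 7.5, 30)
print(outliers_1_5_iqr(data))     # [30]
```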

random error 
error that has a random distribution and can be attributed to chance

Observational Study 
A study in which no treatment is imposed. Such studies include research of available data or sample surveying.

discrete numerical data 
possible values are isolated points on the number line

Central Limit Theorem 
The sampling distribution model of the sample mean (or sample proportion) from a random sample is approximately normal for large n, regardless of the distribution of the population, as long as the observations are independent.

probability 
A number between 0 and 1 that reports the likelihood of an event's occurrence.

Linear Transformations 
When all data values are multiplied by a constant, have a constant added or subtracted, or undergo a combination of the two operations.
FORMULA: For y = a + bx, multiplying by a constant b multiplies the center by b and the spread by |b|; adding a constant a shifts the center by a but does not change the spread. So mean(y) = a + b·mean(x) and SD(y) = |b|·SD(x).

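A quick numerical check of how a linear transformation y = a + bx affects center and spread, using made-up values a = 5 and b = 3 and a hypothetical helper `linear_transform`.

```python
import math
import statistics

def linear_transform(values, a, b):
    """Apply y = a + b*x to every data value (hypothetical helper)."""
    return [a + b * x for x in values]

x = [10, 12, 15, 19]
a, b = 5, 3
y = linear_transform(x, a, b)

# Center: the mean is transformed the same way as each value
print(statistics.mean(y) == a + b * statistics.mean(x))                  # True
# Spread: only the multiplier matters; adding a leaves it unchanged
print(math.isclose(statistics.stdev(y), abs(b) * statistics.stdev(x)))  # True
```
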
right skewed 
A distribution with a long tail on the right side.

Mode 
A hump or local high point in the shape of a distribution.

Variance s² 
The variance s² of a set of observations is the sum of the squared deviations of the observations from their mean, divided by n - 1; it is roughly the average squared deviation.
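A minimal sketch of the sample variance and standard deviation on made-up data. It divides by n - 1, the usual sample convention, and cross-checks the result against Python's `statistics.variance`, which uses the same convention.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)   # 32 / 7, about 4.571
s = math.sqrt(s2)            # standard deviation = square root of the variance

# Cross-check against the standard library's sample variance
print(math.isclose(s2, statistics.variance(data)))   # True
```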

Direction, Form, and Strength 
the three things you look for when examining a scatterplot

least squares regression line 
Also known as the regression line or line of best fit: the line that minimizes the sum of the squares of the vertical distances from the actual points to the line.
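A sketch of the least squares formulas: slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. The data are made up for illustration.

```python
def least_squares_line(x, y):
    """Return (a, b) for y-hat = a + b*x, minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar   # forces the line through (x_bar, y_bar)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_line(x, y)   # slope about 1.99, intercept about 0.05
```

Because the intercept is computed as ȳ - b·x̄, the fitted line always passes through the point (x̄, ȳ).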

Simple Random Sample (of size = n) 
A sample in which (a) every individual in the population has an equal probability of being selected and (b) every possible sample of size n has an equal probability of being the sample chosen.

RA 
random assignment

categorical 
Data in which individual observations are category labels (responses that name a group) rather than numerical values.

Factor 
Explanatory variables in an experiment

categorical (qualitative) data 
Data that record a trait, quality, or category.

Minimum 
The smallest number in a distribution

blocking 
using extraneous factors to create experimental groups that are similar with respect to those factors, thereby filtering out their effect

stratified random sample 
The population is divided into several subpopulations (strata), and then an SRS is drawn from each one.

sample 
a representative subset of a population, examined in hope of learning about the population

Parameter 
A numerical summary that describes a variable's distribution in a given population, such as μ (the population mean) and σ (the population standard deviation).

Interquartile Range 
The difference between the 3rd quartile and the 1st quartile

Independence 
When the probability of one event occurring has no effect on the probability of a second event occurring.

nonresponse 
occurs when an individual chosen for the sample can't be contacted or does not cooperate

5 W's 
who, what, when, where, why, how

population 
the entire group of individuals or instances about whom we hope to learn

Undercoverage/NonResponse Bias 
A situation in which certain individuals are left out of the sample (undercoverage), or individuals originally chosen for the sample are missing from the data set (nonresponse).

standard error 
The estimated standard deviation of the sampling distribution of a sample statistic.

left skewed, right skewed, or approximately symmetric 
Descriptions of shape.

lurking variables 
a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied

placebo treatment 
a treatment that resembles the other treatments in an experiment in all apparent ways but that has no active ingredients

Statistically Significant 
When an observed difference is too large for us to believe that it occurred by chance alone.

block design 
a random assignment of units to treatments is carried out separately within each block

Individuals 
the objects described by a set of data. They may be people, animals or things.

Stratified sample 
a sample obtained by dividing the population into subgroups, called strata, according to various homogeneous characteristics and then selecting members from each stratum
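A sketch of stratified random sampling. It assumes a hypothetical `stratum_of` function that reports which stratum each individual belongs to, and it draws an equal number from each stratum; proportional allocation is another common choice. The population of students by grade level is made up.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then draw a separate SRS from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical population: (id, grade level) pairs, 25 students per grade
students = [(i, grade) for i, grade in enumerate([9, 10, 11, 12] * 25)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=42)
```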

y-intercept 
The predicted value of the response variable when the explanatory variable is zero.

Convenience sampling 
choosing those who are easiest to reach as your sample.

median 
The middle value of the ordered data: half the numbers are larger and half are smaller. With an even number of values, it is the average of the two middle values.

regression line 
A straight line that describes how a response variable y changes as an explanatory variable x changes

Response Variable 
The variable you hope to predict or explain (y).

Double Blind 
a test procedure in which the identity of those receiving the intervention is concealed from both the administrators and the subjects until after the test is completed

segmented bar charts 
Divide the area of a rectangle into segments proportional to the share of each category.

single blind 
when the subjects in an experiment do not know if they are in the treatment or control group

Simple Randomized Comparative Design 
Experimental units are randomly assigned to treatment groups, and the responses are compared across the groups.

confounding 
the effect (if any) of x on y is confounded with the effect of a lurking variable z

Binomial Probability Model 
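Models the number of successes X in n independent trials, each with the same probability p of success: P(X = k) = C(n, k) p^k (1 - p)^(n - k).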
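A minimal sketch of the binomial probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) for n independent trials with success probability p. The coin-flip example is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 3 heads in 5 fair coin flips
print(binomial_pmf(3, 5, 0.5))   # 0.3125
```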

Interval level of measurement 
A measurement level that ranks data and in which precise differences between units of measure exist, but there is no true zero point.

frequency distribution for categorical data 
a table that displays the possible categories along with the associated frequencies or relative frequencies

Power Regression 
A curve of fit for data that show a power relationship (y = a·x^b). Before choosing this model, you must determine that the pattern is not exponential.
A log-log transformation of both the explanatory and response variables straightens the data and allows us to use linear regression.

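A sketch of power regression via the log-log transformation. Taking logs of y = a·x^b gives log y = log a + b·log x, so a least squares line on (log x, log y) recovers b as its slope and a as e raised to its intercept. The data are generated from an exact power law for illustration and must be positive.

```python
import math

def fit_power_model(x, y):
    """Fit y = a * x**b by least squares on (log x, log y); requires x, y > 0."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (w - my) for u, w in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    log_a = my - b * mx          # intercept of the straightened (log-log) line
    return math.exp(log_a), b

x = [1, 2, 4, 8, 16]
y = [3 * v ** 1.5 for v in x]   # exact power relationship y = 3 x^1.5
a, b = fit_power_model(x, y)    # a is about 3, b is about 1.5
```
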
Third Quartile 
75th percentile

s_e 
The standard deviation of the residuals: s_e = sqrt(Σ residual² / (n - 2)).

randomization 
random assignment of experimental units to treatments or of treatments to trials

correlation measures the strength of only this type of relationship 
linear
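A sketch computing the correlation r as the sum of products of z-scores divided by n - 1. The second example is a perfect curved relationship whose correlation is 0, illustrating that r measures only linear association. The data are made up.

```python
import math

def correlation(x, y):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

x = [-2, -1, 0, 1, 2]
print(correlation(x, [2 * v + 1 for v in x]))   # 1, up to floating-point rounding
print(correlation(x, [v ** 2 for v in x]))      # 0 (up to rounding) despite a perfect curve
```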

Matched Pairs 
A design technique that involves matching each participant in the experimental group with a specific participant in the control group in order to eliminate the possibility that a third variable (and not the independent variable) caused changes in the dependent variable.

stemplot 
Shows the overall shape of a distribution while retaining each numerical value of the data.

Percentile 
The percent of the distribution that is at or to the left of the observation

Undercoverage 
When some groups in the population are represented less in the sample than they are in the population, or are left out entirely.

Experimental Design 
A design in which researchers manipulate an independent variable and measure a dependent variable to determine a cause-and-effect relationship.

Dependent variable 
a variable in correlation and regression analysis that cannot be controlled or manipulated

principles of experimental design 
control, randomize, replicate, block

Replacement 
Placing cards, names, etc. back into the sample space after each draw so that the probabilities stay the same from one draw to the next.

multiplication rule 
Method for finding the probability that both of two events occur: P(A and B) = P(A) × P(B | A), which reduces to P(A) × P(B) when the events are independent.

experimental units 
Individuals on which the experiment is done.

Categorical Variable 
records which of several groups or categories an individual belongs to

doubleblind experiment 
an experiment in which neither the subjects nor the individuals who measure the response know which treatment was received

Confidence Interval 
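An interval of plausible values used for estimating a parameter, usually of the form estimate ± margin of error, with a stated confidence level.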
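One common example, sketched in Python: a one-proportion z-interval, p̂ ± z*·sqrt(p̂(1 - p̂)/n). This assumes a random sample with at least 10 successes and 10 failures; the survey counts are hypothetical. The critical value z* comes from `statistics.NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """One-proportion z-interval: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)       # z* times standard error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(successes=520, n=1000)   # about (0.489, 0.551)
```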

design 
refers to the method used to choose the sample from the population

population parameter 
A numerically valued attribute of a population, e.g., the mean income of all employed people.

Confounding variable 
a variable that influences the outcome variable but cannot be separated from the other variables that influence the outcome variable

bar chart 
a display for categorical data that uses bar height to represent counts or percentages for each category

Simulation 
the imitation of chance behavior on a model that accurately reflects an experiment under consideration
Make sure your probability model mimics that of the true experiment.

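A sketch of a simulation whose probability model mimics the real chance process: each trial rolls a fair die four times, and we estimate P(at least one 6). The specific question, trial count, and seed are illustrative assumptions. The exact answer, 1 - (5/6)^4, is computed for comparison.

```python
import random

def simulate_at_least_one_six(trials=100_000, rolls=4, seed=None):
    """Estimate P(at least one 6 in `rolls` rolls of a fair die) by repeated simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Model one trial: each roll is an equally likely integer from 1 to 6
        if any(rng.randint(1, 6) == 6 for _ in range(rolls)):
            hits += 1
    return hits / trials

estimate = simulate_at_least_one_six(seed=2024)
exact = 1 - (5 / 6) ** 4   # about 0.518
```
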
Positively Associated 
High values of one variable tend to occur together with high values of the other, and low values with low values.

Standard Deviation 
A measure of spread that tells roughly the typical distance of the values from the mean.

population of interest 
all of the individuals from which subjects for an experiment may be drawn

Data value or datum 
a value in a data set

Continuous Random Variable 
A random variable that can take any value in an interval of numbers. The probability distribution of a continuous random variable is shown as a density curve, and probabilities are found as areas under the curve.
P(X = a) = 0: recall that the probability that a continuous random variable takes any single exact value is zero. Therefore P(X > x) = P(X ≥ x) for a CONTINUOUS random variable.

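A sketch of finding probabilities as areas under a density curve, using `statistics.NormalDist` as an example continuous model. The N(64, 2.5) height model is hypothetical. Because a single point has probability 0, `1 - cdf(67)` serves for both P(X > 67) and P(X ≥ 67).

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=2.5)   # hypothetical model: N(64, 2.5)

# P(X < 67): area under the density curve to the left of 67 (z = 1.2)
p_below = heights.cdf(67)
# P(62 < X < 67): difference of two areas
p_between = heights.cdf(67) - heights.cdf(62)
# P(X > 67) equals P(X >= 67) because P(X = 67) = 0
p_above = 1 - heights.cdf(67)
```
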
cumulative relative frequency plot 
a graph of a cumulative relative frequency distribution

Pilot 
Small trial run of a survey to see if questions are clear

advantage of histogram 
easy to see shape of distribution & good for large data sets

1 in k systematic sampling 
a sample selected from an ordered arrangement of a population by choosing a starting point at random from the first k individuals on the list and then selecting every kth individual thereafter
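A sketch of 1-in-k systematic sampling, assuming the population is already an ordered list. It picks a random start among the first k individuals, then takes every kth one. The roster and seed are hypothetical.

```python
import random

def systematic_sample(ordered_population, k, seed=None):
    """1-in-k systematic sample: random start among the first k, then every kth individual."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # index 0..k-1, i.e. one of the first k individuals
    return ordered_population[start::k]

roster = list(range(1, 101))   # hypothetical ordered list of 100 individuals
sample = systematic_sample(roster, k=10, seed=7)
```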

Simple Random Sample (SRS) 
A sample of size n consisting of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

