NCSU Stat 311 Terms Flashcards

Terms Definitions
the process of learning about a population by studying a sample
sample regression
estimates the association between x and y in the entire population
regression line
an estimate from a sample trying to describe the true regression line from the population
observational study
a statistical study in which the subjects are not modified (just observed) so that researchers can measure and record certain characteristics
experiment (experimental study)
A statistical study in which a "treatment" is applied to the subjects (i.e. they are modified) and researchers measure the effect of the treatment
lurking variable (confounding variable)
-other variables that may influence the response that are not studied
explanatory variable
variable that explains or causes the differences in another variable ("x", independent variable)
response variable
variable which is thought to depend on the value of the explanatory variable, ("y", dependent variable)
study question
the question about the population that the study is attempting to answer
population
the complete set of all individuals/objects the study is attempting to answer a question about, the whole group of individuals we are interested in
study subjects
the individuals actually measured in the study (i.e. the selected sample of individuals/objects from the population)
treatment
what the researcher does/gives to some or all of the study subjects; the factor whose effect is under study; also called the explanatory variable
response variable
the quantity or characteristic that is measured to determine the treatment effect
control group
group of subjects that has the same sources of variability as those receiving the treatment but does NOT receive the treatment; sometimes called the placebo group
confounding factor
any factor other than the experimental treatment that can affect the response variable in the experiment
completely randomized design
a design in which the treatments in the experiment are randomly assigned to the experimental units without using matched pairs or blocks
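A minimal sketch of random assignment in a completely randomized design. The function name, the subject IDs, and the two group labels are made up for illustration.

```python
import random

def completely_randomized_assignment(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical subject IDs; no matching or blocking is used
subjects = [f"subject_{i}" for i in range(1, 9)]
groups = completely_randomized_assignment(subjects, ["treatment", "control"], seed=1)
print(groups)
```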
people who make measurements
single blinding
subject doesn't know if he/she is in the treatment or control group
double blinding
neither RESEARCHERS nor SUBJECTS know where the participants are assigned between the control and treatment group
matched pair design
makes two measures on each subject
blocking design
-extension of completely randomized design
- put similar subjects into blocks, expect the blocks to differ with respect to the response variable
-then do a completely randomized experiment within each block
a group of subjects that are similar in some way
"blocks" refers to ...
"experimental units" refers to...
repeated time periods in which the blocks receive the varying treatments
scatter plot
used to compare variables
-must measure two variables on a common individual (an individual can be a person, place, or even time)
-then plot the two variables
positive association
this type of association occurs when the value of one variable tends to increase as the value of the other variable increases
negative association
this type of association occurs when the value of one variable tends to decrease as the value of the other variable tends to increase
non-linear association
this type of association occurs when two variables are related but the pattern follows a curve rather than a straight line
correlation
a number that indicates the strength and the direction of a straight-line relationship between two quantitative variables
strength of correlation
determined by the absolute value of the correlation, indicates the overall closeness of the points to a straight line
direction of the correlation
determined by the sign of the correlation
magnitude of r
absolute value of r, indicates the strength of the relationship
r = 1 or r = -1
indicates that there is a perfect linear relationship and all data points fall exactly on a straight line
squared correlation, r²
this is the proportion of variation in the response variable that is explained by the explanatory variable. It is always between 0 and 1 (inclusive).
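A short sketch of how r and r² could be computed by hand. The function name pearson_r and the data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
r_squared = r ** 2  # proportion of variation in y explained by x
print(round(r, 3), round(r_squared, 3))
```

The sign of r gives the direction, its absolute value gives the strength, and squaring it drops the sign.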

Referring to a correlation
correlation coefficient, used to measure linear relationship between x and y
the line of best fit
-this estimates the average value of y when you know x; individuals' values will vary around the predicted value
- can be used to give a prediction of a value of y, given a specific value of x
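A sketch of fitting the line of best fit by least squares and using it to predict y from x. The function name and the height/weight values are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: height in inches (x) and weight in pounds (y)
heights = [60, 62, 64, 66, 68]
weights = [110, 120, 125, 140, 150]

intercept, slope = least_squares_line(heights, weights)
predicted = intercept + slope * 65  # estimated average weight at height 65
print(round(predicted, 1))
```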
randomization test
a test on two groups when paired data is NOT available
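One common way to carry out a randomization test is to shuffle the group labels many times and see how often the shuffled difference in means is at least as large as the observed one. This is a sketch under that assumption; the function name, the number of shuffles, and the data are hypothetical.

```python
import random
import statistics

def randomization_test(group_a, group_b, reps=10_000, seed=0):
    """Approximate two-sided p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal the values to the two groups at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / reps

# Hypothetical responses from two independent (unpaired) groups
treatment = [24, 27, 31, 29, 35]
control = [20, 22, 25, 21, 26]
p_value = randomization_test(treatment, control)
print(p_value)
```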
sampling frame
a list of all individuals in the population
in hypothesis testing, population parameter =
null value
null hypothesis
-the statement being tested
-a statement that describes some aspect of the statistical behavior of a set of data
-this statement is treated as valid unless the actual behavior of the data contradicts this assumption
null value
-the specific # the parameter equals if the null hypothesis is true
- value of population parameter being tested in the null hypothesis
alternative hypothesis
- a statement that something is happening
- researchers typically hope to find evidence for this
- it may be a statement that the assumed status quo is false, or that there is a relationship, or there is a difference
two types of alternative hypothesis
one sided test, two sided test
one-sided test
when Ha specifies a single direction
two-sided test
when Ha includes values in both directions
p-value
the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming Ho is true
level of significance
(α) is the borderline for deciding that the p-value is low enough to justify choosing the alternative hypothesis
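A small worked example of a p-value and the level of significance, assuming a made-up coin-flip study: 15 heads in 20 flips, Ho: p = 0.5, Ha: p > 0.5 (one-sided). The function name and the numbers are hypothetical.

```python
import math

def binomial_upper_tail(n, k, p):
    """P(X >= k) when X counts successes in n independent trials with success chance p."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.05                                  # level of significance
p_value = binomial_upper_tail(20, 15, 0.5)    # chance of 15 or more heads if Ho is true
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)
```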
hypothesis testing about paired differences
matched pairs design
matched pairs design
taking two measures on the same subject to see if there is a difference between the two measurements
paired t-test
a one-sample t-test used on the sample of differences to examine whether the sample mean difference is significantly different from 0
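A sketch of the test statistic for a paired t-test: take the differences, then treat them as one sample. Turning t into a p-value needs a t table or software, which is left out here. The function name and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for testing whether the mean difference is 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se, n - 1

# Hypothetical scores measured twice on the same four subjects
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t, df = paired_t_statistic(before, after)
print(round(t, 2), df)
```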
sampling distribution
-describes the possible values the statistic might have when random samples are taken from a population

the distribution of statistics ("xbar" or "p hat") for all possible samples from the same population of a given sample size (n)
statistical inference
gives us methods for drawing conclusions about a population based on data from samples
confidence interval
an interval of values computed from sample data that is likely to include the true value of the population parameter
standard error
is the estimated standard deviation of the sampling distribution of the statistic
confidence level
proportion of samples for which the confidence interval will capture the true parameter; the % of time we expect the procedure to work; determines how frequently the observed interval contains the parameter
standard error of sample mean
s/√n, where (s) is the sample standard deviation and n is the sample size
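A sketch of the standard error of a sample mean and a rough interval built from it. The multiplier 2 is a rule-of-thumb assumption for roughly 95% confidence (a t multiplier would be more exact for small samples); the function names and data are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def rough_95_interval(sample):
    """Sample mean plus or minus 2 standard errors (rule-of-thumb multiplier)."""
    center = statistics.mean(sample)
    margin = 2 * standard_error_of_mean(sample)
    return center - margin, center + margin

# Hypothetical sample of five measurements
sample = [4, 8, 6, 5, 7]
se = standard_error_of_mean(sample)
low, high = rough_95_interval(sample)
print(round(se, 3), round(low, 2), round(high, 2))
```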
statistic
a number that summarizes a characteristic of the sample data, computed from the sample values; a known value that varies from sample to sample
is the distribution of possible values of the statistic for repeated samples of the same size taken from the same population
sampling distribution
mean of a sampling distribution
the average of all possible values of the statistic for repeated samples of the same size from a population
the standard deviation(SD) of a sampling distribution
measures the average distance of the possible values of the statistic from the mean of the sampling distribution, roughly speaking
there is a difference between N and n!
n= sample size (number of values in one sample/subgroup)
N= number of samples (number of subgroups)
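A simulation sketch of a sampling distribution that keeps n and N apart: N samples, each of size n, are drawn with replacement from a made-up population, and the sample means are summarized. The population, seed, n, and N are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(7)
population = list(range(1, 101))  # hypothetical population with mean 50.5

n = 25    # sample size: number of values in one sample
N = 2000  # number of samples drawn

sample_means = [statistics.mean(rng.choices(population, k=n)) for _ in range(N)]

mean_of_means = statistics.mean(sample_means)   # close to the population mean
sd_of_means = statistics.stdev(sample_means)    # SD of the sampling distribution
expected_sd = statistics.pstdev(population) / n ** 0.5

print(round(mean_of_means, 2), round(sd_of_means, 2), round(expected_sd, 2))
```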
Law of Large Numbers (LLN)
as you average more observations, sample mean settles down at population mean
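A quick simulation sketch of the Law of Large Numbers using rolls of a fair die, whose population mean is 3.5. The function name and seed are arbitrary choices.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# The averages should drift toward 3.5 as the number of rolls grows
for n_rolls in (10, 100, 10_000, 100_000):
    print(n_rolls, mean_of_die_rolls(n_rolls))
```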
graphs used for categorical variables
1. pie chart
2. bar graph
graphic representations for quantitative variables
1. histogram
2. stem-and-leaf plot
3. box plot
standard deviation
a value that measures the variability (spread) of data.
density curve
the outline of the histogram which approximates the overall pattern of a distribution

1. It is always on or above the horizontal axis
2. It has area of exactly 1 underneath it
standard normal distribution
-this is a normal distribution with a mean of 0 and a standard deviation of 1
-all other normal distributions are compared to this
z-score
(a standardized value) that is the distance between a specified value and the mean, measured in number of standard deviations
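A one-line sketch of standardizing a value. The function name and the example numbers (mean 100, standard deviation 15) are made up.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

print(z_score(130, 100, 15))  # a value two standard deviations above the mean
print(z_score(85, 100, 15))   # a value one standard deviation below the mean
```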
observation (individual)
an individual or the value of a single measurement
variable
a characteristic that can differ from one individual to the next
categorical variables
the observational units are being divided into categories, there is no special ordering of the categories
ordinal variables
the observational units are being divided into categories which have an order

basically a categorical variable with ordered categories
quantitative variables
-variables that take numerical values
- you should be able to do mathematical operations with these numbers such as adding, multiplying, etc.
(A social security number would not be one of these)
graphs for quantitative variables
1. Histogram
2. Stem-and-Leaf Plot
3. Dot Plot
Pie Chart
each slice of a pie corresponds to a category and the size of the angle of the slice shows the percentage of the individuals in the corresponding category
Bar Graph
-each category is presented as a bar
- the height of the bar represents the number (or percentage) of individuals in the corresponding category
range
highest value minus the lowest value
histogram
bar graph for a quantitative variable; the range of possible values is broken into intervals
frequency
actual number of individuals who fall into each interval (of a histogram)
relative frequency
proportion or percentage that are in an interval (of a histogram)
stem and leaf plot
every individual data value is shown
dot plot
display a dot for each observation along a number line
distribution
the overall pattern of how often the possible values occur
shape of a distribution
shows how values are distributed in a distribution
location, average, mean and median measure this
outliers
unusual values that do not fit with the rest of the pattern
(may be due to data entry errors or may be actual unusual values)
symmetric distribution
one half of the distribution is the mirror image of the other (bell shape)
bimodal distributions
has two peaks which can be caused by two or more groups of values in the sample
multimodal distribution
distribution with several peaks
median
the middle number of the data when it is ordered, 50% of the data is above it and 50% of the data is below it
two measures of the center
mean and median
symmetric distribution
(mean ? median)
mean = median
right skewed distribution
(mean ? median)
mean is greater than median
left skewed distribution
(mean ? median)
mean is less than median
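A small check of how skew pulls the mean away from the median. Both data sets are invented: the first has one very large value (right skew), the second one very small value (left skew).

```python
import statistics

right_skewed = [30, 32, 35, 38, 40, 250]  # long tail toward large values
left_skewed = [5, 60, 88, 90, 92, 95]     # long tail toward small values

for name, data in (("right skewed", right_skewed), ("left skewed", left_skewed)):
    print(name, round(statistics.mean(data), 2), statistics.median(data))
```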
First Quartile (Q1)
25% of the data is at or below this number
Third Quartile (Q3)
75% of the data is at or below this number
Inter-Quartile Range (IQR)
A value describing the spread over approximately the middle 50% of the data; equal to Q3 - Q1
the five number summary includes
1) minimum
2) Q1
3) median
4) Q3
5) maximum
box plot
a graphical representation of the 5 number summary
1.5*IQR Rule
an outlier is any value that lies more than one and a half times the length of the box (the IQR) below Q1 or above Q3
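A sketch of the five number summary and the 1.5*IQR rule. Textbooks and software use slightly different quartile conventions; this version assumes the "inclusive" method of Python's statistics.quantiles, and the data and function names are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one unusually large value
data = [2, 4, 5, 7, 8, 9, 11, 12, 40]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary, outliers)
```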
measures the distance of all individuals from the mean
strata
subgroups of the population which might have different responses to the question of interest
stratified sample
is a collection of samples taken in each stratum of the population
cluster samples
sampling technique used when natural groups are evident in a statistical population
systematic samples
select every k-th individual from the sampling frame
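A sketch of a systematic sample, assuming the usual approach of picking a random starting point among the first k individuals and then taking every k-th one. The frame here is just a made-up list of ID numbers.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k individuals, then take every k-th one."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

# Hypothetical sampling frame of 100 ID numbers
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```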
under coverage
sampling frame does not include all the population
over coverage
sampling frame includes individuals who are not in the population being examined
data entry errors
person recording the data makes mistakes
question wording error
the set up of the question can have a big influence on the answers
definition of statistics
a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty
individuals (observational units)
the objects described by the data set
(each student in the class is an observational unit or individual)
variables
characteristics of the individuals
(max speed, sex of the students, height, time of sleep)
sample
subgroup of the population examined to measure the variables and gather information
parameter
a number that describes a characteristic of the population. It is mostly a summary of a population. Its value is usually unknown.
statistic
summary of a sample; the value of this is usually known
census
taken to measure ALL individuals in the population
selection bias
this method of selection of participants favors a particular outcome
non response bias
some part of the individuals in the sample cannot be reached or do not respond, this creates a bias because respondents may differ in meaningful ways from non-respondents.
response bias
participants give incorrect information
response rate
the proportion of the sample that responded to the question
non-response rate
the proportion of the sample that didn't respond to the question
convenience samples
investigators choose individuals that are easy to reach
volunteer response samples
individuals decide whether to answer the questions or not
simple random sample
statistical significance
a result is unlikely to have occurred just by chance
practical significance
the difference from the claimed value we observe is actually meaningful
numbers in "stem" column of stem and leaf plot
first digit of each number in the data set
numbers in "leaf" column of stem and leaf plot
contains only the last digit of the number regardless of whether it falls before or after the decimal point
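A sketch of building a stem and leaf plot, assuming the data are non-negative whole numbers so the last digit is the leaf and everything in front of it is the stem. The function name and scores are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to the sorted list of its leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 85 -> stem 8, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical exam scores
scores = [72, 85, 91, 78, 85, 69, 94, 88]
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```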