| Terms |
Definitions |
|
Q3
|
Upper quartile
|
|
Variance
|
E(X^2) - [E(X)]^2
|
|
Heterogeneous
|
Not similar in makeup
|
|
Calculating the correlation when given a percentage (r^2, the percent of variation explained): convert the percentage to a decimal and take its square root.
|
..
|
|
mean
|
the arithmetic average of a distribution, obtained by adding the scores and then dividing the sum by the number of scores
|
|
Midrange
|
(xmin + xmax)/2. Pro: Easy to understand and calculate. Con: Influenced by extreme values and ignores most data values.
|
|
interval scale
|
Equality of units: equal distances between observation points on the scale. Can specify direction like an ordinal scale, but also indicates the size of the difference.
|
|
stratified sampling
|
divides the population of interest into non-overlapping homogeneous subgroups, then separate random samples are independently selected from each subgroup
|
|
COMPLEMENT
|
THE EVENT THAT A DOES NOT OCCUR
|
|
Standard Deviation
|
The square root of the variance.
|
|
Mean of differences is always equal
|
to the difference of the means, x bar a - x bar b
|
|
bell-shaped curve
|
frequency curve that resembles the outline of a bell, as the normal curve
|
|
experiment
|
A study where a treatment is deliberately imposed on each individual in the study before responses are measured in order to observe responses to the treatment. A valid experiment must have 1) control or comparison, 2) randomization and 3) replication.
|
|
statistical significance
|
a statistical statement of how likely it is that an obtained result occurred by chance.
|
|
Variable
|
Item that can vary or take on different values
|
|
Ordinal Scale
|
The data have the properties of nominal data but the order or rank of the data is meaningful. Nonnumeric or Numeric code can be used.
|
|
Sign of correlation coefficient
|
indicates positive or negative association.
|
|
dependent variable
|
variable whose value is determined by the values assumed by other variables
|
|
Completely Randomized
|
Type of experiment in which all experimental units have an equal chance of receiving any treatment
|
|
Standard scores
|
The number of standard deviations a data value lies above or below the mean: z = standard score = (data value - mean) / standard deviation
|
|
Stem and Leaf
|
simple arithmetic and easy to draw pictures that can be used to summarize data quickly. Shows both the rank order and the shape of distribution of data.
|
|
correlation coefficient
|
number that is a measure of the strength and direction of the correlation between two variables
|
|
Number of groups =
|
(# of A levels)(# of B levels)
|
|
Scatter plots don't show?
|
Cause and effect! Only experiments do that. Scatter plots show only whether or not there is a correlation.
|
|
Three Principles of Experimental Design
|
Control: use a control group for comparison with the treatment (Tx) group, so the treatment's effects can be separated from other influences such as lack of realism (double-blind is best).
Randomization: randomly assign people to groups (assignment does not depend on subjects' characteristics or on the judgment of the experimenter).
Replication: use enough experimental units to reduce chance variation (use as many subjects as possible).
|
|
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population
|
Under what conditions will the distribution of sample means be normal?
|
|
What is a nominal scale?
|
A nominal scale is a set of categories that have no numerical order
|
|
Standard deviation of x bar (standard deviation of the sampling distribution of x bar)
|
A measure of the variability of the values of the statistic x bar about mu; a measure of the variability of the sampling distribution of x bar, in other words, the "average" amount that the statistic x bar deviates from its associated parameter
|
|
Empirical Rule
|
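34%, 13.5%, 2.35% (the share of a normal distribution in the first, second, and third standard-deviation band on each side of the mean)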
|
|
Ratio scale examples
|
Kelvin, height, weight
|
|
X
|
The symbol for explanatory variable
|
|
Degrees of freedom for Pearson =
|
N-2
|
|
Outlier
|
unusually small or unusually large value in a data set
|
|
Random
|
Describes a phenomenon whose individual outcomes are uncertain but which gives a regular distribution of outcomes in the long run
|
|
Census
|
method of measuring a variable for every unit of a population
|
|
discrete
|
can take on only particular values and not other values
|
|
Context
|
The context ideally tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed.
|
|
randomize
|
use of impersonal chance to assign experimental units to treatments
|
|
probability
|
likelihood of the occurrence of an event
|
|
Voluntary Response
|
Sampling design where the individuals can choose on their own whether to participate in the sample
|
|
lagged variables
|
Taking earlier values of a (dependent) variable in the data, e.g., to see whether sales this month are correlated with sales in past months. Lag 1 is one period back. Lag 2 is two periods back.
|
|
Dispersion
|
How much variation is there in the data? How spread out are the data values? Are there unusual values?
|
|
How do you find a critical value?
|
Degrees of freedom
Alpha level= 0.05
Alternative hypothesis (directional or non-directional)
|
|
Sampling variability
|
The variability of sample results from one sample to the next, which we must measure in order to do inference effectively. Margin of error only covers sampling variability.
|
|
double blind
|
neither the subject nor the doctor, nurse, or whoever is evaluating the results knows which treatment the subject received.
|
|
Slope
|
Value of Beta 1 tells you "if the value of x goes up by one unit, you expect y to go up by a certain (slope) amount." This is correlation, not causation.
|
|
Population Variance
|
The sum of squared deviations from the mean divided by the population size.
|
|
Descriptive statistics
|
used to summarize and describe a set of data
|
|
Parameter symbols
|
mu, sigma, and p (mean of population, standard deviation of population, proportion of a population)
|
|
Using the same formula would mean that the sample variance would underestimate the population variance. The slightly different formula corrects the underestimation.
|
Define standard deviation.
|
|
Statistic Symbols
|
x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)
|
|
A nonsampling error
|
is a bias because it is a directional error.
- cannot be reduced by simply increasing the size of the sample.
|
|
Margin of error for 95% confidence
|
The maximum amount that a statistic value will differ from the parameter value for the middle 95% of the distribution of all possible statistics. (Note: 95% can be changed to any other level of confidence.)
|
|
center and spread, clusters and gaps, outliers and other unusual features, shape
|
the four things to always mention when describing a plot
|
|
What do you look for to see how good a survey actually is?
|
Wording of the exact question posed. Amount of nonresponse. Sampling design. Date of the survey.
|
|
Uniform Variance
|
(b-a)^2/12
|
|
Assumptions
|
Residuals are normally distributed. Independent (if you have cross-sectional data). Mean = 0, constant variance (in between those 2 bars).
|
|
BOX PLOTS
|
L=LOWEST DATA VALUE
Q1=FIRST QUARTILE
Q2,Md=MEDIAN
Q3=THIRD QUARTILE
IQR=Q3-Q1
H=HIGHEST DATA VALUE
|
|
Mu
|
The mean of the population
|
|
EVENT
|
A POSSIBLE OUTCOME OF AN EXPERIMENT
|
|
Scatterplot
|
Shows the relationship between two quantitative variables measured on the same individuals.
|
|
Process
|
Sequence of operations used in production, manufacturing, etc.
|
|
If P value is small
|
reject the null hypothesis that the slope is 0 (there is evidence the slope is not 0)
|
|
99.7%
|
Share of values in a normal distribution that fall within 3 standard deviations of the mean
|
|
Conditional Distribution
|
The distribution of a variable restricting the Who to consider only a smaller group of individuals is called a conditional distribution
|
|
Form
|
is there a straight line relationship between the variables? Does the graph curve slightly or sharply either up or down? Can you see a pattern?
|
|
correlated
|
degree to which two variables are associated
|
|
sample median
|
single middle value of odd ordered data set or average of two middle values of even ordered data set
|
|
Intercept
|
Beta naught; oftentimes it does not mean a whole lot
|
|
Skewed Right (Positively skewed)
|
The mean exceeds the median.
|
|
Mutually exclusive
|
when it is impossible for both outcomes to occur for a given individual
|
|
alpha
|
the first letter of the Greek alphabet (A, α).
|
|
completely randomized design
|
An experimental design where all individuals participating in the experiment are assigned at random to the treatments.
|
|
t-tests on each coefficient
|
Null hypothesis: Beta=0 (reject if p-value less than alpha)
|
|
percentile rank
|
the percentage of people who scored at or below a given score
|
|
Sampling error
|
error due to surveying a sample rather than taking a census of the entire population.
|
|
What is a population?
|
The entire set of participants (or objects) that are of interest to the research question.
|
|
Process in statistical control
|
A process whose inputs and outputs exhibit only natural variation when observed over time
|
|
Split Stem
|
is good to use when you have a large number of values very close together.
5:768
6:000112223333333344444444444
6:5555555555555555555556666666666666
6:77777777888889999999999999999
7:34567
|
|
Conditions necessary for a one-sample t procedure (using t* for C.I. or getting P-value from t table):
|
Normality of the original population & SRS. (Note: a t-distribution is robust with respect to nonnormality provided there are no outliers and no strong skewness. So, we can use a t-distribution procedure when n < 40 provided the data have no outliers. We must have an SRS, however.) Check (1) data collection and (2) if n < 40, check for outliers in a plot of the data; if n ≥ 40, apply the Central Limit Theorem.
|
|
Correlations of Y and X's
|
Want these to be close to 1 or -1; the sign means the slope is up or down. Also used to see if there is multicollinearity. Even if a correlation is low by itself, the variable may be correlated with the residuals from the other variables, and it may be significant with more variables.
|
|
How do you determine if a probability is due to chance?
|
look at the alpha level
A result is attributed to chance if the probability of obtaining it is greater than the alpha level (fail to reject).
A result is attributed to “nonchance” (reflective of other factors) if the probability of obtaining it is less than the alpha level (reject).
THIS MEANS IT IS STATISTICALLY SIGNIFICANT
|
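
The reject / fail-to-reject rule from the last card can be sketched in a few lines of Python. This is only an illustration, not part of the original cards: the function names `two_sided_p_from_z` and `decide` are hypothetical, the default alpha of 0.05 is taken from the critical-value card, and the p-value helper assumes a two-sided test whose statistic is a standard score (z) from a normal distribution.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard score z (assumes a normal test statistic)."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the card's rule: reject when the p-value is less than alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject" if p_value < alpha else "fail to reject"


if __name__ == "__main__":
    # Hypothetical z-scores chosen only to show both branches of the rule.
    for z in (0.5, 2.5):
        p = two_sided_p_from_z(z)
        print(f"z={z}: p={p:.4f} -> {decide(p)}")
```

A "reject" result is what the card calls statistically significant; "fail to reject" means the result is consistent with chance at the chosen alpha level.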