Stat Exam
#### Complete list of Terms and Definitions for Stat Exam

Q3 Upper quartile
Variance E(X^2) - [E(X)]^2
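As a quick check of the shortcut formula, the sketch below computes the variance of a fair six-sided die two ways: with E(X^2) - [E(X)]^2 and with the definition E[(X - E(X))^2]. The die is an illustrative assumption, not something taken from the card.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)

mean = sum(x * p for x in outcomes)                  # E(X)
mean_of_squares = sum(x ** 2 * p for x in outcomes)  # E(X^2)

# Shortcut formula: Var(X) = E(X^2) - [E(X)]^2
variance_shortcut = mean_of_squares - mean ** 2

# Definition: Var(X) = E[(X - E(X))^2]
variance_definition = sum((x - mean) ** 2 * p for x in outcomes)
```

Both routes give the same exact fraction, which is the point of the shortcut.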
Heterogeneous Not similar in makeup
Calculating correlation, given the percentage of variation explained (r^2) Convert the percentage to a decimal and take the square root; the sign of r matches the direction of the association.
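A minimal sketch of that conversion, assuming the percentage given is r^2 (the share of variation explained) and that the direction of the association is known separately. The function name and its `slope_sign` parameter are hypothetical.

```python
import math

def r_from_r_squared_percent(percent, slope_sign=1):
    """Recover r from r^2 given as a percentage; slope_sign is +1 or -1."""
    r_squared = percent / 100  # convert the percentage to a decimal
    # The square root gives the size of r; the sign comes from the direction.
    return math.copysign(math.sqrt(r_squared), slope_sign)

r_up = r_from_r_squared_percent(64)        # upward trend
r_down = r_from_r_squared_percent(64, -1)  # downward trend
```

With 64% explained, r is about 0.8 for an upward trend and about -0.8 for a downward one.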
mean the arithmetic average of a distribution, obtained by adding the scores and then dividing the sum by the number of scores
Midrange (xmin + xmax)/2. Pro: Easy to understand and calculate. Con: Influenced by extreme values and ignores most data values.
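The sketch below computes the midrange as (minimum + maximum)/2 and shows the "Con" in action: one extreme value moves it a long way. The data values are made up for illustration.

```python
def midrange(values):
    """Midpoint between the smallest and largest value."""
    return (min(values) + max(values)) / 2

typical = [2, 4, 5, 7, 9]
with_outlier = [2, 4, 5, 7, 90]  # same data, one extreme value

mid_typical = midrange(typical)
mid_outlier = midrange(with_outlier)
```

Only the two end values matter, so changing 9 to 90 shifts the midrange from 5.5 to 46.0 while the middle of the data stays put.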
interval scale equality of units: equal distances between observation points on the scale; can specify direction like an ordinal scale but also indicates the size of the difference
stratified sampling divides the population of interest into non-overlapping homogeneous subgroups, then separate random samples are independently selected from each subgroup
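A minimal sketch of stratified sampling using only the standard library. The strata names, unit labels, the 10% sampling fraction, and the fixed seed are all assumptions made for illustration.

```python
import random

# Hypothetical population already divided into non-overlapping strata.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(30)],
}

def stratified_sample(strata, fraction, seed=0):
    """Draw a separate simple random sample from each stratum."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }

sample = stratified_sample(strata, fraction=0.1)
```

Each stratum is sampled on its own, so every subgroup is guaranteed to appear in the final sample.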
COMPLEMENT THE EVENT THAT A DOES NOT OCCUR
Standard Deviation The square root of the variance.
Mean of differences is always equal to the difference of the means, x bar A - x bar B
bell-shaped curve frequency curve that resembles the outline of a bell, as the normal curve
experiment A study where a treatment is deliberately imposed on each individual in the study before responses are measured in order to observe responses to the treatment. A valid experiment must have 1) control or comparison, 2) randomization and 3) replication.
statistical significance a statistical statement of how likely it is that an obtained result occurred by chance.
Variable Item that can vary or take on different values
Ordinal Scale The data have the properties of nominal data but the order or rank of the data is meaningful. Nonnumeric or Numeric code can be used.
Sign of correlation coefficient indicates positive or negative association.
dependent variable variable whose value is determined by the values assumed by other variables
Completely Randomized Type of experiment in which all experimental units have an equal chance of receiving any treatment
Standard scores the number of standard deviations a data value lies above or below the mean: z = standard score = (data value - mean) / standard deviation
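A small sketch of the standard score calculation, z = (data value - mean) / standard deviation. The exam figures (mean 70, standard deviation 10) are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z_above = z_score(85, mean=70, sd=10)  # a score above the mean
z_below = z_score(60, mean=70, sd=10)  # a score below the mean
```

A positive z means the value is above the mean, a negative z means it is below.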
Stem and Leaf simple arithmetic and easy to draw pictures that can be used to summarize data quickly. Shows both the rank order and the shape of distribution of data.
correlation coefficient number that is a measure of the strength and direction of the correlation between two variables
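One common correlation coefficient is Pearson's r. The sketch below computes it from scratch so each piece is visible; the function name and the data (hours studied against quiz scores) are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the two spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
r_flipped = pearson_r(hours, [-s for s in scores])  # reverses the direction
```

The size of r reflects strength and its sign reflects direction: flipping the sign of every score flips the sign of r but not its size.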
Number of groups = (# of A levels)(# of B levels)
What don't scatter plots show? Cause and effect! Only experiments do that. Scatter plots show whether or not there is a correlation.
Three Principles of Experimental Design Control: use a control group and compare treatment (Tx) groups so you can separate out effects such as lack of realism (double-blind is best). Randomization: randomly assign people into groups (assignment does not depend on characteristics or rely on the judgement of the experimenter). Replication: use enough experimental units to reduce chance variation (use as many subjects as possible).
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population. Under what conditions will the distribution of sample means be normal?
What is a nominal scale? A nominal scale is a set of categories that have no numerical order
Standard deviation of x bar (standard deviation of the sampling distribution of x bar) A measure of the variability of the values of the statistic x bar about mu; a measure of the variability of the sampling distribution of x bar, in other words, the "average" amount that the statistic x bar deviates from its associated parameter
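For samples drawn independently (with replacement), the standard deviation of x bar equals sigma divided by the square root of n. The sketch below checks this by listing every possible sample of size 2 from a tiny hypothetical population; the population values are an assumption chosen to keep the enumeration small.

```python
import math
import statistics
from itertools import product

population = [2, 4, 6, 8]  # hypothetical tiny population
n = 2                      # sample size

# Every possible sample of size n, drawn with replacement.
sample_means = [statistics.mean(s) for s in product(population, repeat=n)]

sigma = statistics.pstdev(population)          # population standard deviation
sd_of_x_bar = statistics.pstdev(sample_means)  # spread of the sample means
predicted = sigma / math.sqrt(n)               # sigma / sqrt(n)
```

The 16 sample means are centered on mu, and their standard deviation matches sigma / sqrt(n).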
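Empirical Rule 34%, 13.5%, 2.35%: the approximate share of a normal distribution in the first, second, and third standard-deviation band on each side of the mean (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations)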
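The empirical rule figures are rounded. The sketch below computes the band areas from the standard normal CDF, written with `math.erf`. The exact values are close to 34.1%, 13.6%, and 2.1%; the 2.35 on the card comes from the rounded totals, (99.7 - 95)/2.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

band1 = phi(1) - phi(0)  # between the mean and 1 SD above it
band2 = phi(2) - phi(1)  # between 1 and 2 SDs above the mean
band3 = phi(3) - phi(2)  # between 2 and 3 SDs above the mean

within_3 = 2 * (band1 + band2 + band3)  # both sides of the mean
```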
Ratio scale examples Kelvin, height, weight
X The symbol for explanatory variable
Degrees of freedom for Pearson = N-2
Outlier unusually small or unusually large value in a data set
Random A phenomenon in which individual outcomes are uncertain but there is a regular distribution of outcomes in the long run
Census method of measuring a variable for every unit of a population
discrete can take on only particular values and not other values
Context The context ideally tells who was measured, what was measured, how the data were collected, where the data were collected, and when and why the study was performed.
randomize use of impersonal chance to assign experimental units to treatments
probability likelihood of the occurrence of an event
Voluntary Response Sampling design where the individuals can choose on their own whether to participate in the sample
lagged variables taking a variable in the data (the dependent variable) at earlier periods to see if, for example, sales this month are correlated with sales in past months. Lag 1 is one period back. Lag 2 is two periods back.
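To make the idea of a lag concrete, the sketch below lines up each month's sales with the values one and two periods earlier. The sales figures and the `lag` helper are hypothetical.

```python
# Hypothetical monthly sales figures, oldest first.
sales = [100, 120, 90, 130, 150, 140]

def lag(series, k):
    """Return the series shifted k periods back, padded with None."""
    return [None] * k + series[:len(series) - k]

lag1 = lag(sales, 1)  # previous month's sales
lag2 = lag(sales, 2)  # sales from two months earlier

# Each row: (this month, one period back, two periods back).
rows = list(zip(sales, lag1, lag2))
```

The first rows have no earlier data, so they carry `None` and would be dropped before computing a correlation between `sales` and a lagged column.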
Dispersion How much variation is there in the data? How spread out are the data values? Are there unusual values?
How to find a critical value? You need the degrees of freedom, the alpha level (e.g., 0.05), and the alternative hypothesis (directional or non-directional).
Sampling variability The variability of sample results from one sample to the next; something we must measure in order to do inference effectively. Margin of error only covers sampling variability.
double blind neither the subject nor the doctor, nurse, or whoever is evaluating the results knows which treatment the subject received.
Slope Value of Beta 1; tells you "if the value of x goes up by one unit, you expect y to go up by a certain (slope) amount". This describes correlation, not causation.
Population Variance The sum of squared deviations from the mean divided by the population size.
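A minimal sketch of where the slope comes from, assuming a simple least-squares line with one x variable. The data (ad spend against sales) and the function name are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # estimate of Beta 1
    intercept = mean_y - slope * mean_x  # estimate of Beta 0
    return intercept, slope

ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 10]
b0, b1 = least_squares_line(ad_spend, sales)
```

Here the slope is 2.3: for each extra unit of x, the fitted line predicts y to be 2.3 higher. That is a statement about association in the data, not about cause.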
Descriptive statistics used to summarize and describe a set of data
Parameter symbols mu, sigma, and p (mean of population, standard deviation of population, proportion of a population)
Using the same formula as for the population (dividing by n) would mean that the sample variance would underestimate the population variance. The slightly different formula (dividing by n - 1) corrects the underestimation. Define standard deviation.
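The two variance formulas differ only in the divisor: n for a population and n - 1 for a sample. The sketch below uses the standard library's `statistics.pvariance` and `statistics.variance` on made-up data to show the gap.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical scores

# Sum of squared deviations from the mean.
mean = statistics.mean(data)
sum_sq = sum((x - mean) ** 2 for x in data)

pop_var = statistics.pvariance(data)  # divides by n
samp_var = statistics.variance(data)  # divides by n - 1
```

Dividing by the smaller number always gives the larger result, which is how the sample formula compensates for the underestimate.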
Statistic Symbols x bar, s, p hat (mean of sample, standard deviation of sample, proportion of sample)
A nonsampling error is a bias because it is a directional error; it cannot be reduced by simply increasing the size of the sample.
Margin of error for 95% confidence The maximum amount that a statistic value will differ from the parameter value for the middle 95% of the distribution of all possible statistics. (Note: 95% can be changed to any other level of confidence.)
center and spread, clusters and gaps, outliers and other unusual features, shape the four things to always mention when describing a plot
What do you look for to see how good a survey actually is? Wording of exact question posed. Amount of nonresponse. Sampling design. Date of the survey.
Uniform Variance (b-a)^2/12
Assumptions Residuals are normally distributed; independent (if you have cross-sectional data); mean = 0; constant variance (residuals stay in between those 2 bars)
BOX PLOTS L=LOWEST DATA VALUE Q1=FIRST QUARTILE Q2,Md=MEDIAN Q3=THIRD QUARTILE IQR=Q3-Q1 H=HIGHEST VALUE
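For a continuous uniform distribution on [a, b], the variance is (b - a)^2/12. The sketch below confirms this with exact fractions, using E(X) = (a + b)/2 and E(X^2) = (a^2 + ab + b^2)/3. The endpoints 2 and 8 are an arbitrary example.

```python
from fractions import Fraction

def uniform_moments(a, b):
    """Mean and variance of a continuous uniform distribution on [a, b]."""
    a, b = Fraction(a), Fraction(b)
    mean = (a + b) / 2                     # E(X)
    mean_sq = (a * a + a * b + b * b) / 3  # E(X^2)
    return mean, mean_sq - mean ** 2       # Var = E(X^2) - [E(X)]^2

mean, variance = uniform_moments(2, 8)
closed_form = Fraction((8 - 2) ** 2, 12)   # (b - a)^2 / 12
```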
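The values a box plot is drawn from can be computed as below. This sketch uses `statistics.quantiles` with its default method; textbooks use several slightly different quartile conventions, so hand-computed quartiles may differ a little. The data are hypothetical.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical data set

low = min(data)                              # L, lowest data value
q1, q2, q3 = statistics.quantiles(data, n=4) # Q1, median, Q3
high = max(data)                             # H, highest value
iqr = q3 - q1                                # interquartile range
```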
Mu The mean of the population
EVENT A POSSIBLE OUTCOME OF AN EXPERIMENT
Scatterplot Shows the relationship between two quantitative variables measured on the same individuals.
Process Sequence of operations used in production, manufacturing, etc.
If P value is small reject the null hypothesis that the slope is 0; the slope is significantly different from 0
99.7% Fall within 3 standard deviations of the mean
Conditional Distribution The distribution of a variable restricting the Who to consider only a smaller group of individuals is called a conditional distribution
Form is there a straight line relationship between the variables? Does the graph curve slightly or sharply either up or down? Can you see a pattern?
correlated degree to which two variables are associated
sample median single middle value of odd ordered data set or average of two middle values of even ordered data set
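A short sketch of the two cases in the median definition (odd and even number of values). The function name and data are for illustration only.

```python
def sample_median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

odd_case = sample_median([7, 1, 3])      # ordered: 1, 3, 7
even_case = sample_median([7, 1, 3, 5])  # ordered: 1, 3, 5, 7
```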
Intercept Beta naught (Beta 0); oftentimes does not mean a whole lot
Skewed Right (Positively skewed) The mean exceeds the median.
Mutually exclusive when it is impossible for both outcomes to occur for a given individual
alpha the first letter of the Greek alphabet (Α, α).
completely randomized design An experimental design where all individuals participating in the experiment are assigned at random to the treatments.
t-tests on each coefficient Null hypothesis: Beta=0 (reject if p-value less than alpha)
percentile rank the percentage of people who scored at or below a given score
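A minimal sketch of a percentile rank, taken here as the percentage of scores at or below a given score. The class scores are made up.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

class_scores = [55, 60, 65, 70, 70, 80, 90, 95, 100, 100]
rank_of_70 = percentile_rank(class_scores, 70)
```

Five of the ten scores are at or below 70, so its percentile rank is 50.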
Sampling error error due to surveying a sample rather than taking a census of the entire population.
What is a population? The entire set of participants (or objects) that are of interest to the research question.
Process in statistical control A process whose inputs and outputs exhibit natural variation when observed over time
Split Stem is good to use when you have a large number of values very close together.

    5:768
    6:000112223333333344444444444
    6:5555555555555555555556666666666666
    6:77777777888889999999999999999
    7:34567
Conditions necessary for a one-sample t procedure (using t* for C.I. or getting P-value from t table): Normality of the original population and an SRS. (Note: a t-distribution is robust with respect to nonnormality provided there are no outliers and no strong skewness. So, we can use a t-distribution procedure when n < 40 provided the data have no outliers. We must have an SRS, however.) Check (1) data collection and (2) if n < 40, check for outliers in a data plot; if n ≥ 40, apply the Central Limit Theorem.
Correlations of Y and X's want to be close to 1 or -1; the sign means the slope is up or down. Also use the correlations among the X's to see if there is multicollinearity. Even if a variable's correlation with Y is low by itself, it may still be correlated with the residuals left over from the other variables, so it may be significant with more variables.
How do you determine if a result is due to chance? Look at the alpha level. A result is attributed to chance if the probability of obtaining it is greater than the alpha level (fail to reject). A result is attributed to "nonchance" (reflective of other factors) if the probability of obtaining it is less than the alpha level (reject). THIS MEANS IT IS STATISTICALLY SIGNIFICANT.
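The decision rule can be written as a one-line comparison. This is a sketch of the rule only, not of how a p-value is computed; the function name and the example p-values are hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the alpha level."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis"

small_p = decide(0.01)  # below alpha
large_p = decide(0.20)  # above alpha
```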