Mean (resistant?): Not resistant.
Strata: The subgroups into which the population is divided.
Confounding: Two variables (explanatory variables or lurking variables) are confounded when their effects on a response variable cannot be distinguished from each other.
Statistic: A value calculated from data to summarize aspects of the data.
Residual: The difference between the actual value of y and the predicted value of y for a specific x value (actual minus predicted).
Variable: A characteristic of the individuals; it can take on different values for different individuals. The 2 types are categorical and quantitative (ex. in an experiment measuring hippo height differences, height is the ________).
Scatterplot: A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual. The explanatory variable always goes on the x-axis (explanatory = x, response = y). If there is no explanatory-response relationship, either variable can go on the horizontal axis.
Event: Any collection of outcomes from the sample space of a chance experiment.
Retrospective study: An observational study in which subjects are selected and then their previous conditions or behaviors are determined.
Variance: The standard deviation squared; it is a measure of spread.
Correlation: The numerical measure r of the strength and direction of the linear relationship between two quantitative variables.
Outlier: An individual value that falls outside the overall pattern.
Stemplot: Used for small sets of quantitative data. Stems are arranged with the smallest on top and the largest on the bottom; do not skip stems; the last digit of every number is split off as its leaf; write each stem only once, but keep a leaf for every observation, repeats included. Turned on its side, it resembles a histogram.
Median: Arrange all observations in order of size, from smallest to largest. If n is odd, M is the center observation.
If n is even, M is the mean of the two center observations in the ordered list.
Convenience sampling: Chooses the individuals easiest to reach.
Contingency table: Displays counts (or percents) of individuals falling into categories on two or more variables.
Voluntary response sampling: People choose for themselves whether to participate after a general request (ex. an online survey).
Shifting: Adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR.
Sampling variability: The natural tendency of randomly drawn samples to differ from one another.
5-number summary: The minimum, first quartile, median, third quartile, and maximum.
y-intercept: The predicted value of the response variable when the explanatory variable is zero.
Standard deviation: Roughly, the average distance that a typical data point lies from the mean of its distribution.
Normal distributions: These play a large role in statistics, but they are rather special and not "normal" in the sense of being average or natural. They are described by a special family of bell-shaped, symmetric density curves called Normal curves.
r^2 in regression: The coefficient of determination, r^2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x: r^2 = (SSM - SSE)/SSM, where SSM is the sum of squared deviations of y about its mean and SSE is the sum of squared residuals.
68-95-99.7 rule: In a Normal distribution, about 68% of the data fall within 1 standard deviation of the mean, 95% fall within 2 standard deviations, and 99.7% fall within 3 standard deviations.
Null hypothesis: The hypothesis that states there is no difference between two or more sets of data.
Re-express data: We do this by taking the logarithm, the square root, the reciprocal, or some other mathematical operation on all values in the data set.
Single blind: When the subjects in an experiment do not know whether they are in the treatment or control group.
Histogram: Breaks the range of values of a variable into classes and displays only the count or percent of the observations that fall into each class.
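As a rough illustration of the median, 5-number summary, and shifting entries, here is a minimal Python sketch. The data set `scores`, the helper names `five_number_summary` and `iqr`, and the choice to compute quartiles as the medians of the lower and upper halves (excluding the overall median when n is odd) are all assumptions made for this example; other quartile conventions exist and can give slightly different values.

```python
from math import isclose
from statistics import mean, median, stdev


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two observations. When n is odd, the overall median
    is left out of both halves (an assumed convention for this sketch).
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


def iqr(data):
    """Interquartile range: Q3 minus Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


# Hypothetical data, made up for illustration only.
scores = [3, 7, 8, 5, 12, 14, 21, 13, 18]
shifted = [x + 10 for x in scores]  # shifting: add a constant to every value

print("summary:        ", five_number_summary(scores))
print("shifted summary:", five_number_summary(shifted))

# Measures of center and position move by the constant...
print("mean change:", mean(shifted) - mean(scores))
# ...but measures of spread do not change.
print("same SD: ", isclose(stdev(scores), stdev(shifted)))
print("same IQR:", isclose(iqr(scores), iqr(shifted)))
```

Every entry of the shifted summary should be 10 larger than the original, while the standard deviation and IQR stay the same.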
Statistically significant: The results are too unusual to be plausibly explained by chance alone.
Extrapolation: The use of a regression line or curve for prediction outside the domain of values of the explanatory variable x that you used to obtain the line or curve. Such predictions cannot be trusted.
Standard deviation: A measure of spread; it tells roughly the average distance each value is away from the mean.
Changing center and spread: Changing the center and spread of a variable is equivalent to changing its units.
Shape, center, and spread: The three descriptions of the overall pattern of a distribution.
Degrees of freedom: Because the deviations from the mean always sum to zero, only n-1 of the squared deviations can vary freely.
Pilot: A small trial run of a survey to see if the questions are clear.
Density curve: A curve that is always on or above the horizontal axis and has area exactly 1 underneath it. It describes the overall pattern of a distribution. The area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.
Simple random sample: A simple random sample of size n is one in which each set of n elements in the population has an equal chance of selection.
Slope of the least-squares regression line: b = r(s_y/s_x). This equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
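The slope formula can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: the data in `xs` and `ys` are made up, the helper names `correlation` and `least_squares_line` are hypothetical, and the intercept formula a = ȳ - b·x̄ is a standard fact about least-squares lines that the cards above do not state. It also computes r^2 from (SSM - SSE)/SSM for comparison with the square of r.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values,
    dividing by n - 1 (the degrees of freedom)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


def least_squares_line(xs, ys):
    """Return (a, b, r) for the line y-hat = a + b*x, using b = r * s_y / s_x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)  # the line passes through (x-bar, y-bar)
    return a, b, r


# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = least_squares_line(xs, ys)

# Residual = actual y minus predicted y for each x.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# r^2 as the fraction of variation in y explained by the regression.
ssm = sum((y - mean(ys)) ** 2 for y in ys)  # squared deviations about the mean
sse = sum(e ** 2 for e in residuals)        # squared residuals
r_squared = (ssm - sse) / ssm

print("intercept a:", a)
print("slope b:", b)
print("r:", r)
print("r^2 from sums of squares:", r_squared)
print("matches r*r:", isclose(r_squared, r * r))
```

For a least-squares line the residuals should sum to (approximately) zero, and the value of (SSM - SSE)/SSM should agree with r squared up to floating-point rounding.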
