Topic 1
*Statistics The science of collecting, classifying, and interpreting data.
*Observational Study Observe a group and measure quantities of interest. This is passive data collection, and the purpose of the study is to describe the group.
*Passive Data Collection One does not attempt to influence the group.
*Experiment Deliberately impose treatments on groups in order to observe responses. The purpose is to study whether the treatments cause a change in the responses.
*Population The entire group of interest.
*Sample A part of the population selected to draw conclusions about the entire population.
*Census A sample that attempts to include the entire population.
*Parameter A number that describes the population.
*Statistic A number produced from a sample (a subset of the group of interest) that estimates a population parameter.
*Experimental Group A collection of experimental units subjected to a difference in treatment, imposed by the experimenter.
*Control Group A collection of experimental units subjected to the same conditions as those in an experimental group except that no treatment is imposed.
*Confounding Effects Occur when a study has multiple factors and you can’t tell which factor causes a change in the variable of interest.
*Shape of a Distribution Can reveal much information. By shape, we mean a general statement of how the data values are distributed.
*Variable Any characteristic or quantity to be measured on units in a study.
*Categorical Variable Places a unit into one of several categories. (Ex: gender, race, political party)
*Quantitative Variable Takes on numerical values for which arithmetic makes sense. (Ex: SAT score, # of siblings, cost of textbooks)
*Univariate Data has one variable.
*Bivariate Data has two variables.
*Multivariate Data has three or more variables.
*Frequency Number of times the value occurs in the data.
*Relative Frequency Proportion of the data with the value.
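The frequency and relative frequency definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the survey data are made up for the example and are not part of the course material.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency)."""
    n = len(values)
    if n == 0:
        return {}  # no observations, so nothing to tabulate
    counts = Counter(values)  # frequency: number of times each value occurs
    # relative frequency: proportion of the data with that value
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical categorical variable: political party of 8 surveyed units
parties = ["D", "R", "D", "I", "R", "D", "D", "R"]
table = frequency_table(parties)

for value, (freq, rel_freq) in sorted(table.items()):
    print(value, freq, rel_freq)
```

The relative frequencies in a table like this always add up to 1, because every observation falls into exactly one value.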
*Histogram Bar graph of binned data where the height of the bar above each bin denotes the frequency (or relative frequency) of values in the bin.
*# of Histogram Bins (general rule) # of bins = sqrt(# of observations)
*Symmetric Data Data has roughly the same mirror image on each side of a center value.
*Skewed Data Data has one side that is much longer than the other relative to the mode (peak value).
*Skewed to the Right Data has a cluster of values at the left, and the values trail off much farther to the right than to the left.
*Multimodal Data Data has more than one mode. There is no reason to expect the mean to be larger than the median, less than the median, or approximately the same as the median.
*Measures of Central Tendency (typical)
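As a rough sketch of the histogram entries above, the code below applies the square-root rule for the number of bins and counts the values in equal-width bins. Rounding the square root up, using equal-width bins, and the data set itself are assumptions made for this example; the names `sqrt_bin_count` and `bin_counts` are hypothetical.

```python
import math
import statistics

def sqrt_bin_count(n_observations):
    """General rule: # of bins = sqrt(# of observations), rounded up (assumption)."""
    return max(1, math.ceil(math.sqrt(n_observations)))

def bin_counts(values):
    """Return (lowest value, bin width, list of frequencies per bin)."""
    k = sqrt_bin_count(len(values))
    low, high = min(values), max(values)
    width = (high - low) / k
    if width == 0:
        width = 1.0  # all values equal: fall back to an arbitrary width
    counts = [0] * k
    for v in values:
        # the largest value would land in bin k, so clamp it into the last bin
        index = min(int((v - low) / width), k - 1)
        counts[index] += 1
    return low, width, counts

# Hypothetical data, skewed to the right: a cluster at the left, long right tail
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 9, 12, 15, 20, 30]
low, width, counts = bin_counts(data)
print(counts)

mean = statistics.mean(data)
median = statistics.median(data)
print(mean, median)
```

With 16 observations the rule gives 4 bins, and most of the values fall in the first bin. In this particular sample the mean is larger than the median, which is the typical pattern for data skewed to the right.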
