Topic 1
*Statistics
The science of collecting, classifying, and interpreting data.
*Observational Study
Observe a group and measure quantities of interest. This is passive
data collection, and the purpose of the study is to describe the group.
*Passive Data Collection
One does not attempt to influence the group.
*Experiment
Deliberately impose treatments on groups in order to observe responses. The
purpose is to study whether the treatments cause a change in the responses.
*Population
The entire group of interest.
*Sample
A part of the population selected to draw conclusions about the entire population.
*Census
A sample that attempts to include the entire population.
*Parameter
A number that describes the population.
*Statistic
A number produced from a sample (a subset of the group of interest)
that estimates a population parameter.
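
As a rough illustration of the difference between a parameter and a statistic, the sketch below treats a small made-up list as the whole population, draws a random sample from it, and compares the two means. The function name, the population values, and the sample size are all assumptions chosen for the example.

```python
import random
from statistics import mean

def parameter_and_statistic(population, sample_size, seed=0):
    """Return (population mean, sample mean) for one random sample."""
    rng = random.Random(seed)                     # fixed seed so the run is repeatable
    sample = rng.sample(population, sample_size)  # sample without replacement
    parameter = mean(population)                  # describes the entire population
    statistic = mean(sample)                      # computed from the sample only
    return parameter, statistic

# Hypothetical population: the whole numbers 1 through 100.
population = list(range(1, 101))
parameter, statistic = parameter_and_statistic(population, sample_size=10)
print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The statistic changes from sample to sample, while the parameter stays fixed for a given population.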
*Experimental Group
A collection of experimental units subjected to a difference in
treatment, imposed by the experimenter.
*Control Group
A collection of experimental units subjected to the same conditions as
those in an experimental group except that no treatment is imposed.
*Confounding Effects
When you have multiple factors in a study and you can’t tell which
factor causes a change in the variable of interest.
*Shape of a Distribution
The shape can reveal much information. By shape, we mean a
general description of how the data values are distributed.
*Variable
Any characteristic or quantity to be measured on units in a study.
*Categorical Variable
Places a unit into one of several categories. (Ex: gender, race,
political party)
*Quantitative Variable
Takes on numerical values for which arithmetic makes sense. (Ex:
SAT score, # of siblings, cost of textbooks)
*Univariate
Data has one variable.
*Bivariate
Data has two variables.
*Multivariate
Data has three or more variables.
*Frequency
Number of times the value occurs in the data.
*Relative Frequency
Proportion of the data with the value.
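
A minimal sketch of how frequency and relative frequency could be tabulated for a small data set. The function name and the data values are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each distinct value to (frequency, relative frequency)."""
    counts = Counter(values)   # frequency: number of times each value occurs
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical categorical data.
data = ["A", "B", "A", "C", "A", "B"]
table = frequency_table(data)
for value, (freq, rel_freq) in sorted(table.items()):
    print(value, freq, round(rel_freq, 3))
```

The relative frequencies always add up to 1, because each one is a share of the whole data set.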
*Histogram
Bar graph of binned data where the height of the bar above each bin denotes
the frequency (or relative frequency) of values in the bin.
*# of Histogram Bins
(general rule)
# of bins = sqrt(# of observations)
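
The general rule above can be sketched in code as follows. Rounding the square root to the nearest whole number and using equal-width bins are assumptions made here; the notes do not say how to round or how to place the bin edges. The function names are also hypothetical.

```python
import math

def sqrt_bin_count(n_observations):
    """General rule: number of bins is about the square root of the number of observations."""
    return max(1, round(math.sqrt(n_observations)))  # rounding is an assumption

def histogram_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical data: 25 observations, so the rule suggests 5 bins.
values = list(range(25))
n_bins = sqrt_bin_count(len(values))
counts = histogram_counts(values, n_bins)
print(n_bins, counts)
```

Dividing each count by the number of observations would turn the frequencies into relative frequencies.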
*Symmetric Data
Data has roughly the same mirror image on each side of a center value.
*Skewed Data
Data has one side which is much longer than the other relative to the mode
(peak value).
*Skewed to the Right
Data has a cluster of values at the left, and the values trail off much
farther to the right than to the left.
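
One informal way to see right skew in numbers is to compare the mean and the median: a long right tail tends to pull the mean above the median. The sketch below uses made-up data and a hypothetical helper name, and the comparison is a rough check rather than a formal test of skewness.

```python
from statistics import mean, median

def mean_minus_median(values):
    """Positive results hint at right skew; results near zero hint at symmetry."""
    return mean(values) - median(values)

# Hypothetical data: a cluster at the left with values trailing off to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]
symmetric = [1, 2, 3, 4, 5]

print(mean_minus_median(right_skewed))
print(mean_minus_median(symmetric))
```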
*Multimodal Data
Data has more than one mode. One would not expect any particular
relationship between the mean and the median: the mean may be larger than
the median, less than the median, or approximately the same.
*Measures of Central Tendency (typical)
 Spring '09
 young
 Normal Distribution, Standard Deviation, Probability theory, System R
