Cluster Analysis (CA)
What is CA?
The purpose is to develop a formal method for classifying
observations into groups (a.k.a. clusters).
These groups contain observations that are similar to
each other.
No prior grouping of the observations is known.
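As a rough illustration of grouping observations without prior group labels, the sketch below uses k-means. K-means is one common clustering technique, chosen here as a stand-in; it is not necessarily the method these notes develop. The function name `kmeans`, the use of the first `k` points as starting centers, and the example data are all hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Minimal k-means sketch: alternate between assigning each point to
    its nearest center and moving each center to its cluster's mean."""
    # Hypothetical deterministic start: the first k points are the centers.
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Two well-separated hypothetical groups, interleaved in the input order.
pts = [(1, 1), (8, 8), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centers = kmeans(pts, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The result depends on the starting centers, so in practice k-means is usually rerun from several starts.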
Test #1 Answers
STAT 873
Fall 2013
Complete the problems below. Make sure to fully explain all answers and show your work to receive full credit!
1) (26 total points) Suppose researchers want to estimate a person's body fat percentage (Y) by the person's
Project #1 Answers
STAT 873
Fall 2013
Complete the problems below. Within each part, include your R output with the code that produced it
and any additional information needed to explain your answer. Note that you will need to edit your output a
Testing means
Inference procedures for one mean vector
Univariate case
The univariate test for a mean taught in an introductory
statistics class involves the hypotheses:
$H_0: \mu = \mu_0$
$H_a: \mu \neq \mu_0$
for some constant $\mu_0$. The test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
where $\bar{x}$ is the sample
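This sketch assumes the statistic is the usual one-sample $t$ statistic from an introductory class. It computes that statistic only. The function name and the data are hypothetical. Under $H_0$ the statistic would be compared with a $t$ distribution with $n-1$ degrees of freedom. The Python standard library does not provide that distribution, so no p-value is computed here.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Test statistic for H0: mu = mu0 versus Ha: mu != mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)     # sample standard deviation (divisor n - 1)
    return (xbar - mu0) / (s / math.sqrt(n))

t = one_sample_t([2.0, 4.0, 6.0], mu0=2.0)
print(round(t, 4))  # 1.7321, i.e. sqrt(3)
```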
Introduction to R
The R installation file for Windows can be downloaded from
http://cran.r-project.org/bin/windows/base/. Select the
Download R 3.*.* for Windows link. You can simply
execute the file on your computer to install (all the
installa
Principal Components Analysis (PCA)
What is PCA?
From Johnson (1998):
PCA involves a mathematical procedure that transforms
a set of correlated variables into a smaller set of
uncorrelated variables called principal components
(PCs). These PCs are l
Nearest neighbor classification (NNC)
Unlike DA, NNC does not rely on any distributional
assumptions! Put simply, NNC works by looking at the K
observations closest to an observation of interest. This
observation of interest is classified as coming from
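The voting step can be sketched as follows, assuming Euclidean distance and a simple majority vote among the $K$ nearest observations. The text is cut off before it states the notes' exact rule, for example how ties are broken. The function name, the tie handling (the first-encountered label among tied counts wins), and the data are therefore hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, new_x, k=3):
    """Classify new_x by majority vote among its k nearest training
    observations. No distributional assumptions are needed."""
    # Indices of the k training observations closest to new_x.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], new_x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common orders tied counts by first encounter.
    return votes.most_common(1)[0][0]

train_x = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_x, train_y, (1.5, 1.5)))  # A
print(knn_classify(train_x, train_y, (6.5, 6.5)))  # B
```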
Introduction to matrix algebra
The basics
What is a matrix? From the regression analysis book of
Kutner, Nachtsheim, and Neter (2004, p. 176):
A matrix is a rectangular array of elements arranged in
rows and columns.
Example:
The dimension or size of
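A matrix can be stored as a list of rows. The hypothetical helper below checks that the array is rectangular and reports its size as (rows, columns). It assumes the common convention that an $r \times c$ matrix has $r$ rows and $c$ columns. The matrix `A` is an illustration, not the example from the notes.

```python
def dimension(matrix):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # A matrix is a rectangular array: every row needs the same length.
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("rows have different lengths, so this is not a matrix")
    return n_rows, n_cols

A = [[1, 2, 3],
     [4, 5, 6]]
print(dimension(A))  # (2, 3), a 2 x 3 matrix
```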
Logistic regression models
The purpose of the next two sub-sections is to examine
how to use logistic and multinomial regression models to
obtain a probability that an observation belongs to a
particular population. The logistic regression model will
be u
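For a single explanatory variable $x$, the logistic regression model gives the probability of belonging to one of two populations as $\pi = e^{\beta_0 + \beta_1 x}/(1 + e^{\beta_0 + \beta_1 x})$. The sketch below evaluates that probability. The coefficient values are hypothetical; in practice they would be estimated from data. The branch on the sign of $\beta_0 + \beta_1 x$ only avoids overflow in `math.exp` and does not change the value.

```python
import math

def logistic_prob(x, beta0, beta1):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    eta = beta0 + beta1 * x  # linear predictor, equal to the log-odds
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients, for illustration only.
p = logistic_prob(2.0, beta0=-1.0, beta1=0.8)
print(round(p, 4))  # 0.6457
```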
Graphics
Why plot data?
1) Plotting your data should usually be one of the first steps
taken after you obtain data, in order to:
Look for trends
Discover unusual observations (outliers)
Suggest items to examine in a more sophisticated
statistical
Factor Analysis (FA)
What is FA?
FA has the same objectives as PCA:
1. Discover the true dimension of the data
2. Try to interpret new variables
However, FA goes about achieving these objectives
differently from PCA. This can lead to more
ea
Data, distributions, and correlation
The basics
Experimental unit: the object on which information is
collected. This information is organized into
observed variable values.
Univariate data: information measured on one variable
Multivariate data
Discriminant Analysis (DA)
What is DA?
Suppose N observations are known to come from K
different populations (or groups), and each observation's
group is known. DA allows for the construction of a
mathematical rule to classify observations into the
populations
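One simple way to build such a rule has two steps. First, estimate each population's mean vector from the observations whose groups are known. Then assign a new observation to the population with the nearest mean. This minimum-distance rule is a simplified stand-in, not the full discriminant rule developed in the notes. It matches linear DA only under strong assumptions: equal prior probabilities and a common covariance matrix proportional to the identity. The function names and data are hypothetical.

```python
import math
import statistics

def fit_means(x, groups):
    """Mean vector of each known population (group)."""
    means = {}
    for g in set(groups):
        members = [xi for xi, gi in zip(x, groups) if gi == g]
        means[g] = [statistics.mean(col) for col in zip(*members)]
    return means

def classify(means, new_x):
    """Assign new_x to the population whose mean vector is closest."""
    return min(means, key=lambda g: math.dist(means[g], new_x))

x = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.0), (7.0, 8.0), (8.0, 7.5), (7.5, 9.0)]
groups = [1, 1, 1, 2, 2, 2]
means = fit_means(x, groups)
print(classify(means, (3.0, 2.5)))  # 1
print(classify(means, (7.0, 7.0)))  # 2
```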
Test #2 Answers
STAT 873
Fall 2013
Complete the problems below. Make sure to fully explain all answers and show your work to receive full credit!
1) (16 total points) A representative sample of 200 patients who were admitted to an intensive care unit at a