Fall 2015
Stat 407 Homework 2
Name_
Reading Assignment: We are still in Chapters 1 and 2 of Everitt and Hothorn. The multivariate normal
distribution is discussed in section 1.6 of Chapter 1.
Written Assignment Due: Monday, September 21, 2015, in class.
Although you may work together, yo
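$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \beta_3 x_{2i} e^{x_{1i}}$$
$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}$$
$$1 - \pi_i = P(Y_i = 2 \mid (x_{1i}, x_{2i}, \ldots, x_{pi}))$$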
Logistic regression relates the log-odds that the i-th case
was sampled from population 1 to a linear function of
some par
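The log-odds relationship can be illustrated with a minimal R sketch. The coefficient and covariate values below are made-up assumptions, not values from the course notes; `plogis` is base R's inverse-logit function.

```r
# Hypothetical coefficients (beta0, beta1, beta2) and one case's covariates;
# these numbers are illustrative assumptions only.
beta <- c(-1.0, 0.5, 0.25)
x_i  <- c(2.0, 4.0)

# Linear predictor: beta0 + beta1 * x1i + beta2 * x2i
eta_i <- beta[1] + sum(beta[-1] * x_i)

# Inverse logit gives the probability that case i came from population 1
pi_i <- plogis(eta_i)

# Taking the log-odds of pi_i recovers the linear predictor
log_odds <- log(pi_i / (1 - pi_i))
```

The round trip from linear predictor to probability and back is the whole content of the model equation: the covariates act linearly on the log-odds scale, not on the probability scale.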
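$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right],$$
where
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} \quad\text{and}\quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}$$
is the $p \times p$ population covariance matrix.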
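As a rough sketch, the density f(x) can be evaluated directly in base R. The function name `dmvnorm_sketch` and the parameter values are hypothetical, chosen only for illustration; this is not a function from any package used in the course.

```r
# Density of a p-variate normal distribution evaluated at a single point x.
# dmvnorm_sketch is a hypothetical helper name, not a library function.
dmvnorm_sketch <- function(x, mu, Sigma) {
  p   <- length(mu)
  dev <- x - mu
  # Quadratic form (x - mu)' Sigma^{-1} (x - mu)
  q <- drop(t(dev) %*% solve(Sigma, dev))
  exp(-q / 2) / ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}

# Illustrative (assumed) bivariate mean vector and covariance matrix
mu    <- c(0, 0)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)

# Density at the mean, where the exponent is zero
f_at_mean <- dmvnorm_sketch(mu, mu, Sigma)
```

At x = mu the exponential term equals 1, so the value reduces to the normalizing constant alone. Using `solve(Sigma, dev)` avoids forming the inverse matrix explicitly.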
Now consider a $p \times 1$ random vector $X = [X_1, X_2, \ldots, X_p]'$.
The d
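We reject $H_0$ when
$$t^2 = \frac{(\bar{X}-\mu_0)^2}{s^2/n} = n(\bar{X}-\mu_0)(s^2)^{-1}(\bar{X}-\mu_0)$$
is large; that is, when
$$n(\bar{X}-\mu_0)(s^2)^{-1}(\bar{X}-\mu_0) > t^2_{(n-1)}(\alpha/2),$$
i.e., the squared standardized distance exceeds the upper $\alpha$
percentile of a central F-distribution with $(1, n-1)$ df.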
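The rejection rule can be checked numerically with a short R sketch. The simulated sample, the null value `mu0`, and the level `alpha` are assumptions made for illustration; `qt`, `qf`, and `t.test` are base R functions.

```r
# Simulated sample; the data, mu0, and alpha are illustrative assumptions.
set.seed(1)
x     <- rnorm(20, mean = 0.5, sd = 1)
mu0   <- 0
alpha <- 0.05
n     <- length(x)

# Squared standardized distance between the sample mean and mu0
t2 <- n * (mean(x) - mu0)^2 / var(x)

# The squared t critical value equals the upper-alpha F(1, n - 1) percentile
crit_t2 <- qt(1 - alpha / 2, df = n - 1)^2
crit_F  <- qf(1 - alpha, df1 = 1, df2 = n - 1)

# Reject H0 when the squared distance exceeds the critical value
reject <- t2 > crit_F
```

The two critical values agree because the square of a t random variable with n - 1 df follows a central F-distribution with (1, n - 1) df.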
Notice that rejecting H0 whe
Statistics 407 Homework 1
Reading Assignment: Hothorn and Everitt, Chapters 1 and 2
Written Assignment Due Date: Wednesday, September 9, 2015, in class. Please submit a paper
copy of your solutions. Be sure to put your name on the
paper you submit.
Although you may w
Stat 407
Exam 1 Solutions
Fall 2015
1. (56pts) The following data set contains measurements on breakfast cereals for a single
serving.
     Cereal        Calories   Total Carbs (mg)   Fiber (g)   Sugars (g)   Protein (g)
1    Grape.nuts    200        47                 5           7            6
2    Raisin.Bran   170        43                 7           18           4
lab_1.R
Feiyi Wu
Exercise 1:
a <- 4+2
b <- a * 3
c <- b - 6
d <- c/3
d
# [1] 4
tips <- read.csv("http://www.ggobi.org/book/data/tips.csv")
dim(tips)
# [1] 244   8
head(tips)
#   obs totbill  tip sex smoker day  time size
# 1   1   16.99 1.01   F     No Sun Night
#6.
(a)
Crossvalidation Method:
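[Classification table cross-tabulating actual sex and species against the classification, with categories Mg male, Mg female, Mm male, Mm female, Mf male, Mf female. The cell layout did not survive extraction. Counts in extracted order: 3, 0, 9, 4, 6, 12, 2, 1, 2, 2, 0, 0, 1, 0, 0, 2, 4, 2, 0, 2, 4, 9, 0, 0, 0, 0, 1, 0, 13, 3, 0, 0, 0, 0, 2, 17]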
The estimat
A factor analysis would use correlations among responses to
the 80 questions to determine if they can be grouped into
six sub-groups that reflect variation in six postulated factors.
Marketing researchers postulate that consumer choices are
based on a
PCs do not require the assumption of multivariate
normality.
If multivariate normality holds, however, some distributional
properties of PCs can be established.
Corresponding eigenvalues give the variation in principal
component scores in th
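
The link between eigenvalues and the variation in principal component scores can be checked with a small R sketch. The simulated data and the mixing matrix are assumptions made purely for illustration.

```r
# Simulated data with correlated columns (an illustrative assumption)
set.seed(1)
z <- matrix(rnorm(200 * 3), ncol = 3)
X <- z %*% matrix(c(2, 0,   0,
                    1, 1,   0,
                    0, 0.5, 0.5), nrow = 3, byrow = TRUE)

# Eigen-decomposition of the sample covariance matrix
eig <- eigen(cov(X))

# Principal component scores: centered data projected onto the eigenvectors
scores <- scale(X, center = TRUE, scale = FALSE) %*% eig$vectors

# Sample variance of each score column, to compare with eig$values
score_var <- apply(scores, 2, var)
```

Comparing `score_var` with `eig$values` shows that each eigenvalue is the sample variance of the scores along its eigenvector. No normality assumption enters this calculation.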
For example, individuals who are good credit risks may
constitute population 1 and those who are bad risks may
constitute population 2. Variables that may be used to
discriminate between populations or to classify individuals
into one of the two popu
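$$X_1 = \begin{bmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{p1} \end{bmatrix}, \quad X_2 = \begin{bmatrix} X_{12} \\ X_{22} \\ \vdots \\ X_{p2} \end{bmatrix}, \quad \ldots, \quad X_n = \begin{bmatrix} X_{1n} \\ X_{2n} \\ \vdots \\ X_{pn} \end{bmatrix}$$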
Better data, better clusters: The decision on which
variables to collect information on is important. There may be
10 variables that can jointly produce very useful clusters. If
your data set does n
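The following assumptions are needed to make
1. $X_{11}, X_{12}, \ldots, X_{1n_1} \sim N(\mu_1, \Sigma_1)$.
2. $X_{21}, X_{22}, \ldots, X_{2n_2} \sim N(\mu_2, \Sigma_2)$.
3. $\Sigma_1 = \Sigma_2$.
4. $X_{11}, X_{12}, \ldots, X_{1n_1}$ are independent.
5. $X_{21}, X_{22}, \ldots, X_{2n_2}$ are independent.
6. $X_{11}, X_{12}, \ldots, X_{1n_1}$ are independent of $X_{21}, X_{22}, \ldots, X_{2n_2}$.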